Posts by Astro

61) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1497)
Posted 5 May 2006 by Profile Astro
Post:
And now if we can just get BOINC to ONLY preempt an application after it does a checkpoint, then we'll REALLY be cruisin'!

This was posted to the boinc alpha mail list yesterday by JM7 (the creator of the scheduler)

John.McLeod@xxxxxxxxxx.com to boinc_dev
May 4 (1 day ago)

I have been working on the CPU scheduler to see what I can do to make it
work as the doc says it should.

What I have at the moment:

The CPU scheduler checks the necessity to preempt:
1) If one of the events that could cause entry to EDF occurs.
(Checkpoint after process swap time, files downloaded, task exit, ...).
2) At least once every 10 minutes. (Just to be safe). What should this
frequency be? 10 minutes? an hour? the time between allowed checkpoints?

The CPU scheduler selects tasks to run if:
1) There are not enough runnable tasks scheduled to meet 1 per CPU
allowed. (Startup / task complete / running task suspended ...).
2) A checkpoint has been reached after the process swap time.
3) One or more results have recently entered the state of requiring EDF.

Enforcement is immediate. If a result has reached its checkpoint after
process swap time, and the CPU scheduler has scheduled it for another
process time, then it gets the full time allotted to it (default another
hour + time to checkpoint).

AND

John.McLeod@xxxxxxxxx.com to elst93, boinc_dev
May 4 (1 day ago)

How often to check whether pre-emption is needed probably should not be user
configurable, because someone is going to set the number way too large.

If the process doesn't checkpoint, it will either complete (and the system
will fall under case 1, not enough runnable results running) OR another process
will require attention in order to meet its deadline, in which case that
process will start running.

One further note, if a process does actually make it to a checkpoint, it
will then be removed from memory when it suspends - this suspend will
happen within a second or two of the checkpoint.

jm7

It seems from this that it's already being looked into.
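
For anyone trying to follow the two lists in jm7's first message, here is a minimal sketch of how those conditions might be written down. This is not BOINC's actual code: the class, field, and function names are all made up for illustration, and only the conditions themselves (and the 10 minute figure he floats) come from the mail.

```python
from dataclasses import dataclass

# "At least once every 10 minutes" is the interval floated in the mail;
# jm7 himself asks whether it should be something else.
CHECK_INTERVAL_SECONDS = 10 * 60


@dataclass
class SchedulerState:
    """Hypothetical snapshot of what the scheduler would need to know."""
    cpus_allowed: int                  # the "1 per CPU allowed" limit
    runnable_tasks_scheduled: int      # tasks currently scheduled to run
    checkpoint_after_swap_time: bool   # a task checkpointed after its swap time
    results_newly_requiring_edf: int   # results that just entered the EDF state
    edf_trigger_event: bool            # files downloaded, task exit, etc.
    seconds_since_last_check: float


def should_check_preemption(state: SchedulerState) -> bool:
    """First list: an EDF-related event occurred, or the periodic timer expired."""
    return (state.edf_trigger_event
            or state.checkpoint_after_swap_time
            or state.seconds_since_last_check >= CHECK_INTERVAL_SECONDS)


def should_select_tasks(state: SchedulerState) -> bool:
    """Second list: too few tasks scheduled, a checkpoint was hit, or new EDF results."""
    return (state.runnable_tasks_scheduled < state.cpus_allowed
            or state.checkpoint_after_swap_time
            or state.results_newly_requiring_edf > 0)


idle = SchedulerState(1, 1, False, 0, False, 30.0)
checkpointed = SchedulerState(1, 1, True, 0, False, 30.0)
print(should_select_tasks(idle), should_select_tasks(checkpointed))
```

The point of the sketch is only that a task which never checkpoints is not touched until it exits or the timer and an EDF need force the issue, which is what the second mail describes.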
62) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1475)
Posted 4 May 2006 by Profile Astro
Post:
This computer is headless. Remote access only. Hence no screensaver.

Mike, I use VNC to see the graphics on my remote monitorless, keyboardless, and mouseless puter. I click on the WU from the task tab and then view graphics. No screensaver here either. If it's a service install you're hosed.

tony
63) Message boards : Cafe RALPH : SOMEone PLEASE vote SOMEONE for user of the day!! (Message 1449)
Posted 1 May 2006 by Profile Astro
Post:
I was under the impression that UOTD was picked at random each day, and that votes had nothing to do with it. I've seen on new projects where someone would be UOTD until someone else submitted a new profile. Maybe I'll make a profile to test this. If it's not this, then something is turned off on the back end.

tony

[edit] I made a profile, but it didn't switch, must be back end
64) Message boards : RALPH@home bug list : Debugger Stuff (Message 1316)
Posted 23 Apr 2006 by Profile Astro
Post:
and 5.4.6 isn't far off either.
65) Message boards : RALPH@home bug list : Network error (Message 1280)
Posted 20 Apr 2006 by Profile Astro
Post:
Carlos, try a manual project update on other projects and look at the message tab. I think the message you're seeing just means the project server is down.

I just tried and got this:

4/20/2006 3:56:08 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
4/20/2006 3:56:08 PM|ralph@home|Reason: Requested by user
4/20/2006 3:56:08 PM|ralph@home|(not requesting new work or reporting completed tasks)
4/20/2006 3:56:13 PM|ralph@home|Scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi succeeded
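
If you want to compare message tab output between projects, a small parser helps. The sketch below assumes the layout seen in the lines above (timestamp, project, message text, separated by "|"); the names LogEntry and parse_line are mine, not anything from BOINC.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str
    text: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line as pasted from the message tab."""
    stamp, project, text = line.split("|", 2)
    # US-style date with a 12-hour clock, as in the pasted lines.
    when = datetime.strptime(stamp.strip(), "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, text)


LOG = """\
4/20/2006 3:56:08 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
4/20/2006 3:56:08 PM|ralph@home|Reason: Requested by user
4/20/2006 3:56:08 PM|ralph@home|(not requesting new work or reporting completed tasks)
4/20/2006 3:56:13 PM|ralph@home|Scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi succeeded
"""

entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]
elapsed = (entries[-1].when - entries[0].when).total_seconds()
succeeded = entries[-1].text.endswith("succeeded")
print(f"{entries[0].project}: request took {elapsed:.0f} s, succeeded={succeeded}")
```

A request that ends in "succeeded" a few seconds after it was sent means the scheduler is reachable, which is the check suggested to Carlos.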
66) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1229)
Posted 18 Apr 2006 by Profile Astro
Post:
Thanks, I aborted it. WU in question

Now, it's been sent out to: RoliLSD
67) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1226)
Posted 18 Apr 2006 by Profile Astro
Post:
Thank you

This old machine just keeps chugging and I have no problem letting it continue as long as it may be useful. I don't care if it wants to run 2000 hours. I'm getting interested to see if it'll finish at 100% (that's 100 days from now at this rate, LOL).

There is no debug software on that old machine. It's stuffed under an end table in the corner (ultra microATX frame), has no mouse, no keyboard, no monitor, and is only viewable via RealVNC. It is hooked to a UPS, but my fear is that memory leaks will be what causes this to stop crunching and not some other error.

It's now at 3.5633% done, 122:54:06, stage Full atom relax, Model 1, Step 34155, 124:39:17 remaining, oh yeah, there's 24 red dots now (whatever the red dots are)

My Ralph prefs:

Resource share (if you participate in multiple BOINC projects, this is the proportion of your resources used by RALPH@home): 10
Percentage of CPU time used for graphics: not selected
Number of frames per second for graphics: not selected
Target CPU run time: 4 hours
Miscellaneous
Should RALPH@home send you email newsletters? yes
Should RALPH@home show your computers on its web site? yes
Default computer location: home

my general prefs:

Processor usage
Do work while computer is running on batteries? (matters only for portable computers): yes
Do work while computer is in use? yes
Do work only between the hours of: (no restriction)
Leave applications in memory while preempted? (suspended applications will consume swap space if 'yes'): yes
Switch between applications every (recommended: 60 minutes): 180 minutes
On multiprocessors, use at most: 1 processors
Disk and memory usage
Use no more than: 400 GB disk space
Leave at least (values smaller than 0.001 are ignored): 0.25 GB disk space free
Use no more than: 85% of total disk space
Write to disk at most every: 600 seconds
Use no more than: 100% of total virtual memory
Network usage
Connect to network about every (determines size of work cache; maximum 10 days): 3 days
Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection): no
Disconnect when done? (matters only if you have a modem, ISDN or VPN connection): no
Maximum download rate: 200 KB/s
Maximum upload rate: 200 KB/s
Use network only between the hours of (enforced by versions 4.46 and greater): (no restriction)
Skip image file verification? (check this ONLY if your Internet provider modifies image files; UMTS does this, for example; skipping verification reduces the security of BOINC): no
68) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1222)
Posted 18 Apr 2006 by Profile Astro
Post:
Tony, I agree with you as well. The directive from the dev's is *not* to abort work units unless specifically asked to. http://ralph.bakerlab.org/forum_thread.php?id=18

If it's not giving you any problems and progressing properly, let it crunch!

David, that's part of the question I need answered, it's a timex, in that it keeps crunching, graphics work well, all the bits move, % done advances, CPU time advances, and even the "estimate to completion" moves, but it keeps getting higher. This could be due to the way win98 counts time.

I see it run "Ab initio", then switch to "full atom relax", then it loops back to "ab initio" and starts all over again. All the while staying on "model 1". Is this how others see it working? I was thinking it did "ab initio", then "full atom relax", and then switched to the next model, but I'm not sure which way is "normal".

tony
69) Message boards : RALPH@home bug list : Old - Bug reports for Windows Ver - 5.00 (and higher) (Message 1220)
Posted 18 Apr 2006 by Profile Astro
Post:
Feet1st, the "no finished file" message means Boinc.exe (the daemon) lost contact with Boincmgr.exe (the manager). The move to a network drive may have something to do with it. I assume the manager is on a local puter and the daemon is on the network drive.

tony
70) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1216)
Posted 18 Apr 2006 by Profile Astro
Post:
Older versions can be helpful, see the following WU (this is very typical as I scan my results page):

79398   566   7 Apr 2006 22:24:47 UTC   8 Apr 2006  8:51:57 UTC  Over  Client error  Computing     212.18   0.40    ---
81388  1531   8 Apr 2006 14:53:27 UTC  15 Apr 2006 18:53:05 UTC  Over  Client error  Computing     130.30   0.27    ---
88134  2175  15 Apr 2006 18:53:29 UTC  18 Apr 2006 11:42:31 UTC  Over  Success       Done       15,423.92  24.23  24.23

Notice how the first two users had "client error computing" (these were "unhandled exceptions"), and yet I did it successfully? This tells them that the WU itself can be finished and isn't bad in all cases; it's just that some conditional difference exists between the first two users and myself. The question that can help debug becomes "what's different between the first two users and the third?"

If I had aborted it, they wouldn't have this info to work with.
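
To make the comparison concrete, here is a rough sketch that takes the three rows above and splits the hosts into those that failed and those that finished. I am assuming the first column is the result ID, the second the computer ID, and the first number after the client state the CPU time in seconds; the Result class and summarize function are just illustration, not a RALPH or BOINC tool.

```python
from dataclasses import dataclass


@dataclass
class Result:
    result_id: int      # assumed: first column of the results page
    host_id: int        # assumed: second column (computer ID)
    outcome: str
    cpu_seconds: float  # assumed: CPU time in seconds


RESULTS = [
    Result(79398, 566, "Client error", 212.18),
    Result(81388, 1531, "Client error", 130.30),
    Result(88134, 2175, "Success", 15423.92),
]


def summarize(results):
    """Report whether any host finished the WU and which hosts did not."""
    ok = [r.host_id for r in results if r.outcome == "Success"]
    failed = [r.host_id for r in results if r.outcome != "Success"]
    return {"completable": bool(ok), "ok_hosts": ok, "failed_hosts": failed}


summary = summarize(RESULTS)
print(summary)
```

A WU with both lists non-empty is the interesting case: the work itself can finish, so the difference has to be in the hosts.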
71) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1215)
Posted 18 Apr 2006 by Profile Astro
Post:
Carlos, have they said they can't use older results to help debug the future versions? Have they said "always delete all results after a new version comes out"? I haven't seen that. Hence, I'm reporting it and waiting for further instructions from Mod9/developers. I'd hate to dump it if it can be useful. Maybe they've already found the problem. If so, someone should say something. This is an alpha project. BOINC Alpha wants reports from previous versions. They still have the 4.99 threads listed for use, which says they still want reports or haven't "closed" them yet. Either way, I want someone to tell me if this is useful (see my first and succeeding posts). I will continue to post this until someone says otherwise.

I question my posts qualifying for acceptance to this thread, but it started as a 1% bug. Mod9 can feel free to move or delete it. All I need is some guidance from management as to how I can best help them.

tony
72) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1213)
Posted 18 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK this unit is STILL running. I suspended boincsimap as it would cause an error which forced boinc to be restarted. It's now at 89:32:49 cpu time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running, it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony

It's still chugging along, but from the model and step numbers I think it's going in circles.

Cpu time 104:11:17, 2.795% done, 108:17:52 remaining
graph shows 17 red dots
Stage: full atom relax, Model 1, step 31473.

It was at model 1, step 31878 many hours ago. This is not being switched, paused, or removed from memory.

OK, I just watched it switch back to Model 1 Step 130, now it's 2.8155% done and there are NO red dots, but plenty of teal ones, and they're in a completely different pattern from what it just was. CPU time 104:30:12. It had gotten up to model 1 step 31535 (was the last I saw and only a few minutes had passed so it could have gone much beyond that).

in the time it took to type this the steps jumped up to 28000ish. it only stayed in Ab initio maybe a minute or two, and is now in full atom relax, and my red dots are back. I think the scale prevents me from seeing them all yet.

OK, still running, 22 red dots, model 1, step 32190.
114:42:18 cpu time, 3.306% done, 117:51:40 remaining
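
For what it's worth, the readings in this post can be turned into a crude projection. The sketch below takes the CPU time and percent done pairs quoted above and extrapolates linearly from the first to the last one. It assumes progress stays linear, which a looping WU may well not do, and the function names are just mine.

```python
def to_hours(hms: str) -> float:
    """Convert an 'H:MM:SS' CPU time string to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


# (CPU time, percent done) pairs as reported in the post above.
READINGS = [
    ("89:32:49", 2.291),
    ("96:40:29", 2.539),
    ("104:11:17", 2.795),
    ("104:30:12", 2.8155),
    ("114:42:18", 3.306),
]


def project_remaining_hours(readings) -> float:
    """Linear extrapolation from the first to the last reading."""
    (t_first, pct_first), (t_last, pct_last) = readings[0], readings[-1]
    rate = (pct_last - pct_first) / (to_hours(t_last) - to_hours(t_first))  # percent per hour
    return (100.0 - pct_last) / rate


remaining = project_remaining_hours(READINGS)
print(f"{remaining:.0f} CPU hours left, about {remaining / 24:.0f} days")
```

On these numbers the projection comes out at roughly 100 days of CPU time, nowhere near the "117:51:40 remaining" that BOINC shows.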
73) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1212)
Posted 18 Apr 2006 by Profile Astro
Post:
Is switching from ab initio to full atom relax to ab initio to full atom relax and on and on and on within the same model normal?
74) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1211)
Posted 18 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK this unit is STILL running. I suspended boincsimap as it would cause an error which forced boinc to be restarted. It's now at 89:32:49 cpu time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running, it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony

It's still chugging along, but from the model and step numbers I think it's going in circles.

Cpu time 104:11:17, 2.795% done, 108:17:52 remaining
graph shows 17 red dots
Stage: full atom relax, Model 1, step 31473.

It was at model 1, step 31878 many hours ago. This is not being switched, paused, or removed from memory.

OK, I just watched it switch back to Model 1 Step 130, now it's 2.8155% done and there are NO red dots, but plenty of teal ones, and they're in a completely different pattern from what it just was. CPU time 104:30:12. It had gotten up to model 1 step 31535 (was the last I saw and only a few minutes had passed so it could have gone much beyond that).

in the time it took to type this the steps jumped up to 28000ish. it only stayed in Ab initio maybe a minute or two, and is now in full atom relax, and my red dots are back. I think the scale prevents me from seeing them all yet.
75) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1210)
Posted 18 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK this unit is STILL running. I suspended boincsimap as it would cause an error which forced boinc to be restarted. It's now at 89:32:49 cpu time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running, it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony

It's still chugging along, but from the model and step numbers I think it's going in circles.

Cpu time 104:11:17, 2.795% done, 108:17:52 remaining
graph shows 17 red dots
Stage: full atom relax, Model 1, step 31473.

It was at model 1, step 31878 many hours ago. This is not being switched, paused, or removed from memory.
76) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1205)
Posted 17 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK this unit is STILL running. I suspended boincsimap as it would cause an error which forced boinc to be restarted. It's now at 89:32:49 cpu time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running, it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony
77) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1202)
Posted 17 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK this unit is STILL running. I suspended boincsimap as it would cause an error which forced boinc to be restarted. It's now at 89:32:49 cpu time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running, it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?
78) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1185)
Posted 15 Apr 2006 by Profile Astro
Post:
on another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted boinc after getting a fatal error message, and seeing this wu at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the cpu time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.
79) Message boards : RALPH@home bug list : RALPH Version News! - Version 4.97 (Win/Lin/Mac) released! (Message 1152)
Posted 14 Apr 2006 by Profile Astro
Post:
Stupid HBLR's

http://ralph.bakerlab.org/workunit.php?wuid=76718
http://ralph.bakerlab.org/workunit.php?wuid=76792
http://ralph.bakerlab.org/workunit.php?wuid=74618
http://ralph.bakerlab.org/workunit.php?wuid=74617
http://ralph.bakerlab.org/workunit.php?wuid=74616
http://ralph.bakerlab.org/workunit.php?wuid=74586
http://ralph.bakerlab.org/workunit.php?wuid=74584
http://ralph.bakerlab.org/workunit.php?wuid=75896
http://ralph.bakerlab.org/workunit.php?wuid=74836
80) Message boards : Cafe RALPH : Forum Moderator Contact thread (Message 1018)
Posted 30 Mar 2006 by Profile Astro
Post:
I have not been able to enter my ID number or project URL to my BOINC application. I never received an email from you with the numbers and have tried to use the numbers which I found on my membership listing. It is not being accepted. Can you help?

Chuck Etienne
cetienne@woh.rr.com

Log into your account at ralph, then create a password under "your account", then open the attachment wizard and use http://ralph.bakerlab.org/ as the URL, then select "use existing account" and enter your email and newly created password.

tony



©2024 University of Washington
http://www.bakerlab.org