Message boards : RALPH@home bug list : Bug reports for Ralph 5.20
Author | Message |
---|---|
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
This version has some BOINC-related fixes in the watchdog and graphics. |
Nikolay A. Saharov Joined: 17 Feb 06 Posts: 6 Credit: 25,102 RAC: 0 |
Hi, I have Ralph WU result 149188 stuck in the BOINC Manager queue at 100% with a time of 1:20:42. Its status is "Running", but the graphics window shows the result completed at 67.2% with a time of 1:20:45. CPU usage is 50%, and only the other WU is actually running (I have a P4 2.6 GHz with HT, i.e. 2 logical CPUs). In other words, 2 Ralph WUs are "running", but only one uses its 50% of the CPU and the other sits at 0%. PS: This result has now completed successfully and was reported with the messages "BOINC :: Watchdog shutting down..." and "BOINC :: BOINC support services shutting down..." {Edit} No other problems. {Edit 2} There was something like what is described in this post. |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
Quote: "This version has some boinc-related fixes in the watchdog and graphics." I confirmed the graphics have been fixed. They work more smoothly than before. |
Niehaus Joined: 22 Feb 06 Posts: 10 Credit: 2,707 RAC: 0 |
My Ralph calculates the WUs to 100% but doesn't send them. They stay "active", but there is no further calculation; the program continues with my Rosetta WUs... Oh, it DID send the WU after some time, sorry!!! |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
I ran 3 work units. Two actually completed, but I suspect the "dormant" bug is still present: the first work unit (149120) completed in EXACTLY 1 hour with 36 min of CPU time, and another (149885) completed in EXACTLY 2 hours with 81 min of CPU time. The third (149194) errored out with: Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_hom013__614_2_1 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)). The upload indicated a watchdog shutdown. Mike |
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0 |
Quote: "I ran 3 work units." The "dormant" bug was in this one too: https://ralph.bakerlab.org/workunit.php?wuid=131456 Result: https://ralph.bakerlab.org/result.php?resultid=148875 While unattended, my computer went into sleep mode, so the result only started to upload when I got back to it. This means my computer sat idle for some time when it could have crunched something else. :-( So I aborted the next WU and set Ralph to No new work until you have this sorted out. I don't want a computer sitting in sleep mode until I can get to it again so it can continue crunching; in the worst case that could be a whole day. :-( EDIT: Can't you make the watchdog reactivate the WU after it has been idle for, let's say, 3 minutes? Or 5 minutes? When it's not crunching, my computer goes into sleep mode after 15 minutes. "I'm trying to maintain a shred of dignity in this world." - Me |
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
This is a bug that was introduced after 5.16, so I hope they can find it and fix it completely rather than adding another safety mechanism. |
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
Quote: "My Ralph calculates the WUs to 100% but doesn't send them, and they are still "active" but there is no further calculation, the program continues with my rosetta WUs..." I've noticed this too with 5.19 and 5.20. My preference is set to 2 hours and my crunching interval is 2:01. The WUs I've been getting happen to finish early, say at 1:45; they go to 100% but then pause instead of completing. Nothing else, such as downloads, has triggered early rescheduling. The next time the WU gets crunch time, it completes immediately and uploads. It's not causing any problems, but it's definitely different behavior, and after about 5 in a row (not counting one that errored out), it doesn't seem coincidental. Sample result |
Honza Joined: 16 Feb 06 Posts: 9 Credit: 1,962 RAC: 0 |
3 WUs went fine; the 4th got stuck at 100% for hours: https://ralph.bakerlab.org/result.php?resultid=150036. 3 more to go... |
Honza Joined: 16 Feb 06 Posts: 9 Credit: 1,962 RAC: 0 |
(Too late to edit.) Another one is sitting idle at 100%: https://ralph.bakerlab.org/result.php?resultid=150039, so 2 of 6 got stuck at the finish in my case. |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
Rom tells me it is waiting for the watchdog to finish for debugging. Here is his response: "When I added code .... to wait until the thread is finished, it stalls for up to 30 minutes waiting until watchdog makes its next check." I think the watchdog can take up to 2x the CPU run-time preference, which may explain the longer stalls. |
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
Quote: "Rom tells me it is waiting for the watchdog to finish for debugging." Does this mean it was intentionally implemented for debugging purposes? You could have saved us some investigation if you had told us. Anyway, it's good to know that the cause is known and won't delay further development. |
crossworks Joined: 19 May 06 Posts: 2 Credit: 510 RAC: 0 |
How long should you wait before aborting WUs stuck at 100%? Why does my firewall show a lot of traffic for the BOINC Ralph client even though it's stuck at 100%? I have all other projects suspended to see if the WU will report. |
NJMHoffmann Joined: 17 Feb 06 Posts: 8 Credit: 1,270 RAC: 0 |
Quote: "How long before you should abort WUs stuck at 100%? Why does my firewall show a lot of traffic for the BOINC Ralph client even though it's stuck at 100%? I have all other projects suspended to see if the WU will report." Wild guess: the client is downloading (BIIIG) symbol tables for the debug output?? Norbert |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Rosetta_betta_5.20 Windows, end of stderr: "# This process generated 2 decoys from 2 attempts BOINC :: Watchdog shutting down... Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x77F9193C Engaging BOINC Windows Runtime Debugger... </stderr_txt>" I aborted this result because it was running at 0.0000% CPU, i.e. STUCK: https://ralph.bakerlab.org/result.php?resultid=150083 With 5.19 I waited 6 hours to see what would happen, and rebooted too -:( *My runtime preference for Ralph is 1 hour. But I will not do this anymore -:) CPU temperature changes can crack silicon and render my 7 GHz computer inoperative. *I am also crunching for Rosetta (CASP7), which needs more CPU power! Now, if the CPU temperature drops below 60 C and the alarm sounds, I immediately act to find the cause -:) So, IF I go to sleep, I stop crunching for Ralph first. *A stuck condition may occur while I'm asleep. Thanks |
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0 |
Quote: "Rom tells me it is waiting for the watchdog to finish for debugging." Yes, but my problem is that my computer goes into sleep mode after 15 minutes, and what then? Then it stays asleep until I get to it and can wake it again. And then, if I'm unlucky, I can sit and wait with an idle computer for an hour until the clock triggers the upload. No, I'm staying on No new work here. :-( |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Quote: "Rom tells me it is waiting for the watchdog to finish for debugging." Hi everybody: Rom and I fixed this silly watchdog thing. I'm sending out work now with Ralph 5.21! Thanks for helping us out with this. |
RodEllery Joined: 20 Feb 06 Posts: 5 Credit: 8,820 RAC: 0 |
Had 4-5 computing errors over the weekend with 5.20, all with a similar error (see below). WU: 132525; Outcome: Client error; Client state: Computing; Exit status: 1 (0x1); Computer ID: 913; Report deadline: 8 Jun 2006 23:40:23 UTC; CPU time: 0.550792; stderr out: <core_client_version>5.4.9</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR:: Exit at: .fragments.cc line:767 </stderr_txt> Validate state: Invalid -- RodEllery |
crossworks Joined: 19 May 06 Posts: 2 Credit: 510 RAC: 0 |
Quote: "Had 4-5 computing errors over weekend with 5.20." I got that error when I killed 5.20.exe in Windows Task Manager because I thought it was stuck. With the next unit, I waited about 2 hours after it reached 100%, and it reported. |
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0 |
Quote: "Hi everybody: Rom and I fixed this silly watchdog thing. I'm sending out work now with Ralph 5.21! Thanks for helping us out with this." OK, let me give it a try again. |
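
Rom's explanation relayed by dekim in this thread (a shutdown that waits for the watchdog thread to finish can stall until the watchdog makes its next check) can be illustrated with a small sketch. This is a hypothetical Python example, not the actual Rosetta or BOINC watchdog code: the `Watchdog` class, its `interval_s` and `interruptible` parameters, and `shutdown_delay` are names invented for illustration. It contrasts a watchdog that sleeps out its full interval with one that waits on a `threading.Event`, which a shutdown request can wake immediately.

```python
# Hypothetical sketch: not the actual Rosetta/BOINC watchdog code.
import threading
import time
from typing import Callable


class Watchdog:
    """Hypothetical periodic watchdog thread (illustration only)."""

    def __init__(self, interval_s: float, check: Callable[[], None], interruptible: bool) -> None:
        self.interval_s = interval_s
        self.check = check
        self.interruptible = interruptible
        self.checks = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while True:
            if self.interruptible:
                # Event.wait() returns True as soon as stop() sets the event,
                # so shutdown does not have to wait for the next check.
                if self._stop.wait(self.interval_s):
                    return
            else:
                # time.sleep() cannot be cut short by stop(), so the thread
                # only notices the shutdown request after a full interval.
                time.sleep(self.interval_s)
                if self._stop.is_set():
                    return
            self.checks += 1
            self.check()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Ask the watchdog to exit, then block until its thread has finished.
        self._stop.set()
        self._thread.join()


def shutdown_delay(interval_s: float, interruptible: bool) -> float:
    """Seconds that stop() blocks when called right after start()."""
    wd = Watchdog(interval_s, check=lambda: None, interruptible=interruptible)
    wd.start()
    t0 = time.monotonic()
    wd.stop()
    return time.monotonic() - t0


if __name__ == "__main__":
    print(f"sleep-based watchdog: {shutdown_delay(1.0, interruptible=False):.2f} s")
    print(f"event-based watchdog: {shutdown_delay(30.0, interruptible=True):.2f} s")
```

With `interruptible=False`, `stop()` blocks for up to one full `interval_s` before `join()` returns, which mirrors the kind of stall reported above; with `interruptible=True` it returns almost at once. The thread does not say how 5.21 actually fixed the problem.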
©2024 University of Washington
http://www.bakerlab.org