Bug reports for Ralph 5.20

Message boards : RALPH@home bug list : Bug reports for Ralph 5.20



dekim
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 1751 - Posted: 3 Jun 2006, 1:11:00 UTC

This version has some boinc-related fixes in the watchdog and graphics.
Nikolay A. Saharov

Joined: 17 Feb 06
Posts: 6
Credit: 25,102
RAC: 0
Message 1752 - Posted: 3 Jun 2006, 5:10:38 UTC
Last modified: 3 Jun 2006, 5:37:51 UTC

Hi,

I have Ralph WU result 149188 stuck in the BOINC Manager queue at 100% with a time of 1:20:42. Its status is "Running". But the graphics window shows the result at 67.2% complete, with a time of 1:20:45.

CPU usage is 50%, and only the other WU is actually running (I have a P4 2.6 GHz with HT, i.e. 2 logical CPUs). In other words, 2 Ralph WUs are running, but only one uses CPU (50%) and the other is at 0%.

PS: This result has now completed successfully and was reported with these messages:
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

{Edit} No other problems.
{Edit 2} It was something like what is described in this post.
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1753 - Posted: 3 Jun 2006, 7:37:58 UTC - in response to Message 1751.  
Last modified: 3 Jun 2006, 7:38:10 UTC

This version has some boinc-related fixes in the watchdog and graphics.
I can confirm the graphics have been fixed. They work more smoothly than before.
Niehaus

Joined: 22 Feb 06
Posts: 10
Credit: 2,707
RAC: 0
Message 1755 - Posted: 3 Jun 2006, 14:21:16 UTC
Last modified: 3 Jun 2006, 14:33:34 UTC

My Ralph crunches the WUs to 100% but doesn't send them. They are still "active", but there is no further calculation, and the program continues with my Rosetta WUs...


Oh, it DID send the WU after some time, sorry!!!
Mike Gelvin

Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 1756 - Posted: 3 Jun 2006, 14:53:44 UTC
Last modified: 3 Jun 2006, 14:55:15 UTC

I ran 3 work units.

Two actually completed, but I suspect the "dormant" bug is still present: the first work unit (149120) completed in EXACTLY 1 hour with 36 min of CPU time, and the other one (149885) completed in EXACTLY 2 hours with 81 min of CPU time.
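The idle time implied by these figures is wall-clock time minus CPU time. A minimal Python sketch of that arithmetic, assuming each task had a logical CPU to itself for the whole run (otherwise part of the gap is time BOINC gave to other tasks rather than dormant time); `idle_minutes` is a hypothetical helper name:

```python
def idle_minutes(wall_minutes: float, cpu_minutes: float) -> float:
    """Return wall-clock minutes in which the task used no CPU.

    Hypothetical helper: assumes a single-threaded task that had a logical
    CPU to itself, so wall time minus CPU time is time spent dormant.
    """
    if cpu_minutes > wall_minutes:
        raise ValueError("CPU time cannot exceed wall time for a single-threaded task")
    return wall_minutes - cpu_minutes


# (result ID, wall-clock minutes, CPU minutes) as reported in the post above
reports = [(149120, 60, 36), (149885, 120, 81)]
for result_id, wall, cpu in reports:
    print(f"{result_id}: {idle_minutes(wall, cpu)} idle minutes out of {wall}")
# 149120: 24 idle minutes out of 60
# 149885: 39 idle minutes out of 120
```

On those figures, roughly 24 of 60 and 39 of 120 minutes passed without CPU time, which fits the suspicion of a dormant stretch, though the numbers alone cannot rule out ordinary scheduling.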

The third (149194) errored out with:
Unrecoverable error for result t296__CASP7_ABINITIO_SAVE_ALL_OUT_hom013__614_2_1 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003))

The upload indicated a watchdog shutdown.

Mike

Fuzzy Hollynoodles

Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1757 - Posted: 3 Jun 2006, 16:21:32 UTC - in response to Message 1756.  
Last modified: 3 Jun 2006, 16:23:53 UTC

I ran 3 work units.

Two actually completed, but I suspect the "dormant" bug is still present: the first work unit (149120) completed in EXACTLY 1 hour with 36 min of CPU time, and the other one (149885) completed in EXACTLY 2 hours with 81 min of CPU time.

...

Mike


The "dormant" bug was in this one also: https://ralph.bakerlab.org/workunit.php?wuid=131456

Result: https://ralph.bakerlab.org/result.php?resultid=148875

And while I was away, my computer went into sleep mode, so the WU only started to upload when I got back to my computer. This means my computer was idle for some time when it could have been crunching something else. :-(

So I aborted the next WU and have set Ralph to No new work until you have this sorted out. I don't want my computer sitting in sleep mode for a long time until I can get to it again so it can continue crunching. In the worst case that could be a whole day. :-(


EDIT: Can't you make a watchdog reactivate the WU after it has been idle for, let's say, 3 minutes? Or 5 minutes? When it isn't crunching, my computer goes into sleep mode after 15 minutes.
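One way such an idle check could work is to sample the task's CPU time periodically and flag it once CPU time has not advanced for a set amount of wall-clock time. The Python sketch below is hypothetical (the class name `StallDetector` and the 180-second default, taken from the "3 minutes" above, are assumptions); it only detects a stall and leaves out how the WU would be reactivated, which depends on BOINC internals not described in this thread:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StallDetector:
    """Hypothetical sketch: flag a task whose CPU time has stopped advancing."""

    idle_limit_s: float = 180.0  # "3 minutes" from the suggestion above (assumption)
    _last_cpu_s: Optional[float] = field(default=None, init=False)
    _last_progress_wall_s: float = field(default=0.0, init=False)

    def sample(self, wall_s: float, cpu_s: float) -> bool:
        """Record one (wall-clock, CPU-time) reading; return True if stalled."""
        if self._last_cpu_s is None or cpu_s > self._last_cpu_s:
            # First reading, or the task used CPU since the last reading:
            # remember when progress was last seen.
            self._last_cpu_s = cpu_s
            self._last_progress_wall_s = wall_s
            return False
        # No CPU progress: stalled once the idle stretch reaches the limit.
        return wall_s - self._last_progress_wall_s >= self.idle_limit_s


detector = StallDetector()
print(detector.sample(0, 10.0))    # False: first reading
print(detector.sample(60, 70.0))   # False: CPU time advanced
print(detector.sample(120, 70.0))  # False: idle for only 60 s
print(detector.sample(240, 70.0))  # True: idle for 180 s
```

A real watchdog would also have to tell a genuine stall apart from a task that BOINC has deliberately suspended or swapped out, since both look the same from CPU time alone.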


"I'm trying to maintain a shred of dignity in this world." - Me

tralala

Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1758 - Posted: 3 Jun 2006, 17:54:54 UTC - in response to Message 1757.  


EDIT: Can't you make a watchdog reactivate the WU after it has been idle for, let's say, 3 minutes? Or 5 minutes? When it isn't crunching, my computer goes into sleep mode after 15 minutes.


This is a bug that was introduced after 5.16, so I hope they can find it and fix it completely rather than adding another safety mechanism.
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1759 - Posted: 3 Jun 2006, 20:15:42 UTC - in response to Message 1755.  
Last modified: 3 Jun 2006, 20:18:41 UTC

My Ralph crunches the WUs to 100% but doesn't send them. They are still "active", but there is no further calculation, and the program continues with my Rosetta WUs...


Oh, it DID send the WU after some time, sorry!!!


I've noticed this too with 5.19 and 5.20. My run-time pref is set to 2 hours and my crunching interval is 2:01. The WUs I've been getting happen to finish early, say at 1:45, and go to 100% but then pause instead of completing. Nothing else, such as a download, has triggered early rescheduling. The next time the WU gets crunch time, it completes immediately and uploads.

It's not causing any problems, but it's definitely different behavior, and after about 5 in a row (not counting one that errored out) it doesn't seem coincidental.

sample result
Honza

Joined: 16 Feb 06
Posts: 9
Credit: 1,962
RAC: 0
Message 1761 - Posted: 4 Jun 2006, 9:20:02 UTC

3 WUs went fine; the 4th got stuck at 100% for hours - https://ralph.bakerlab.org/result.php?resultid=150036.
3 more to go...
Honza

Joined: 16 Feb 06
Posts: 9
Credit: 1,962
RAC: 0
Message 1762 - Posted: 4 Jun 2006, 12:53:26 UTC

(Too late to edit.) Another one is sitting idle at 100% - https://ralph.bakerlab.org/result.php?resultid=150039 - so 2 of 6 got stuck at the finish in my case.
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 1763 - Posted: 4 Jun 2006, 15:44:20 UTC

Rom tells me it is waiting for the watchdog to finish for debugging.

Here is his response:

"When I added code .... to wait until
the thread is finished, it stalls for up to 30 minutes waiting until
watchdog makes its next check."

I think the watchdog can take up to 2x the cpu run time pref, which may explain the longer stalls.
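The stall Rom describes is what happens when a shutdown path joins a watchdog thread that only wakes up on a fixed timer: the join cannot return until the next check. A common way to avoid that is to have the watchdog wait on an event that shutdown can set. The Python sketch below illustrates that general pattern only; it is not the Rosetta or BOINC code (which this thread does not show) and not necessarily how 5.21 fixed it. The `Watchdog` class and its parameters are hypothetical; only the 30-minute interval comes from the quote above:

```python
import threading
import time
from typing import Callable


class Watchdog:
    """Hypothetical watchdog thread that runs `check` every `interval_s` seconds."""

    def __init__(self, interval_s: float, check: Callable[[], None]) -> None:
        self._interval_s = interval_s
        self._check = check
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False when the timeout expires (time for a check)
        # and True as soon as stop() sets the event, so the loop exits at once.
        # A plain time.sleep(self._interval_s) here would make stop() block
        # for up to a full interval.
        while not self._stop.wait(self._interval_s):
            self._check()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    dog = Watchdog(interval_s=30 * 60, check=lambda: print("watchdog check"))
    dog.start()
    started = time.monotonic()
    dog.stop()  # returns almost immediately instead of after up to 30 minutes
    print(f"watchdog stopped after {time.monotonic() - started:.3f} s")
```

The design choice is that the wait itself is the timer, so there is no separate sleep that shutdown has to outlast.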
tralala

Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1764 - Posted: 4 Jun 2006, 16:04:37 UTC - in response to Message 1763.  

Rom tells me it is waiting for the watchdog to finish for debugging.

Here is his response:

"When I added code .... to wait until
the thread is finished, it stalls for up to 30 minutes waiting until
watchdog makes its next check."

I think the watchdog can take up to 2x the cpu run time pref, which may explain the longer stalls.


Does this mean it was intentionally implemented for debugging purposes? You could have saved us some investigation if you had told us. Anyway, it's good to know that the cause is known and won't delay further development.
crossworks

Joined: 19 May 06
Posts: 2
Credit: 510
RAC: 0
Message 1765 - Posted: 4 Jun 2006, 19:28:55 UTC

How long should you wait before aborting WUs stuck at 100%? And why does my firewall show a lot of traffic for the BOINC Ralph client even though it's stuck at 100%? I have suspended all other projects to see if the WU will report.
NJMHoffmann

Joined: 17 Feb 06
Posts: 8
Credit: 1,270
RAC: 0
Message 1766 - Posted: 4 Jun 2006, 20:48:49 UTC - in response to Message 1765.  

How long should you wait before aborting WUs stuck at 100%? And why does my firewall show a lot of traffic for the BOINC Ralph client even though it's stuck at 100%? I have suspended all other projects to see if the WU will report.

Wild guess: The client is downloading (BIIIG) symbol tables for the debug output??

Norbert
Carlos_Pfitzner

Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1767 - Posted: 4 Jun 2006, 22:13:48 UTC
Last modified: 4 Jun 2006, 22:21:14 UTC

Rosetta_beta_5.20 Windows

# This process generated 2 decoys from 2 attempts


BOINC :: Watchdog shutting down...


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x77F9193C

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
I aborted this result because it was running at 0.0000% CPU, i.e. STUCK:

https://ralph.bakerlab.org/result.php?resultid=150083

With 5.19 I waited 6 hours to see what would happen, and rebooted too -:(
*My run-time preference for Ralph is 1 hour.

But I will not do this anymore -:)
CPU temperature changes can crack silicon
and render my 7 GHz computer inoperable.
*Also, I am crunching for Rosetta too (CASP7), which needs more CPU power!

Now, if the CPU temperature drops below 60 C and the alarm sounds,
I immediately act to find the cause -:)

So, IF I go to sleep, I stop crunching for Ralph first.
*A stuck condition may occur while I'm asleep.
Thanks
Fuzzy Hollynoodles

Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1769 - Posted: 5 Jun 2006, 2:50:21 UTC - in response to Message 1763.  

Rom tells me it is waiting for the watchdog to finish for debugging.

Here is his response:

"When I added code .... to wait until
the thread is finished, it stalls for up to 30 minutes waiting until
watchdog makes its next check."

I think the watchdog can take up to 2x the cpu run time pref, which may explain the longer stalls.


Yes, but my problem is that my computer goes into sleep mode after 15 minutes, and what then? Then it sleeps until I get to it and can wake it up again. And then, if I'm unlucky, I sit waiting with an idle computer for an hour until the clock triggers the upload.

No, I'm still on No new work here. :-(



"I'm trying to maintain a shred of dignity in this world." - Me

Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1770 - Posted: 5 Jun 2006, 2:57:30 UTC - in response to Message 1769.  

Hi everybody: Rom and I fixed this silly watchdog thing. I'm sending out work now with ralph 5.21! Thanks for helping us out with this.

Rom tells me it is waiting for the watchdog to finish for debugging.

Here is his response:

"When I added code .... to wait until
the thread is finished, it stalls for up to 30 minutes waiting until
watchdog makes its next check."

I think the watchdog can take up to 2x the cpu run time pref, which may explain the longer stalls.


Yes, but my problem is that my computer goes into sleep mode after 15 minutes, and what then? Then it sleeps until I get to it and can wake it up again. And then, if I'm unlucky, I sit waiting with an idle computer for an hour until the clock triggers the upload.

No, I'm still on No new work here. :-(




RodEllery

Joined: 20 Feb 06
Posts: 5
Credit: 8,820
RAC: 0
Message 1772 - Posted: 5 Jun 2006, 15:29:47 UTC

Had 4-5 computing errors over the weekend with 5.20.

All with a similar error. See below.

WU: 132525
Outcome: Client error
Client state: Computing
Exit status: 1 (0x1)
Computer ID: 913
Report deadline: 8 Jun 2006 23:40:23 UTC
CPU time: 0.550792
stderr out: <core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .fragments.cc line:767

</stderr_txt>


Validate state: Invalid

--
RodEllery
crossworks

Joined: 19 May 06
Posts: 2
Credit: 510
RAC: 0
Message 1773 - Posted: 5 Jun 2006, 16:00:20 UTC - in response to Message 1772.  
Last modified: 5 Jun 2006, 16:00:44 UTC

Had 4-5 computing errors over the weekend with 5.20.

All with a similar error. See below.

WU: 132525
Outcome: Client error
Client state: Computing
Exit status: 1 (0x1)
Computer ID: 913
Report deadline: 8 Jun 2006 23:40:23 UTC
CPU time: 0.550792
stderr out: <core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .fragments.cc line:767

</stderr_txt>


Validate state: Invalid

--
RodEllery

I got that error when I killed 5.20.exe in Windows Task Manager because I thought it was stuck. With the next unit, I waited about 2 hours after it reached 100%, and it reported.

Fuzzy Hollynoodles

Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1775 - Posted: 5 Jun 2006, 19:55:19 UTC - in response to Message 1770.  

Hi everybody: Rom and I fixed this silly watchdog thing. I'm sending out work now with ralph 5.21! Thanks for helping us out with this.



OK, let me give it another try.



"I'm trying to maintain a shred of dignity in this world." - Me




©2024 University of Washington
http://www.bakerlab.org