Bug reports for Ralph 5.08

Message boards : RALPH@home bug list : Bug reports for Ralph 5.08

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1438 - Posted: 30 Apr 2006, 2:59:58 UTC

Thanks in advance for posting errors! In this round, we're going to try to track the last sources of errors in Rosetta, with help from some more powerful backtracing.
ID: 1438

[AF>France>Est>Lorraine]Le Zam
Joined: 2 Mar 06
Posts: 9
Credit: 3,278
RAC: 0
Message 1443 - Posted: 30 Apr 2006, 18:23:34 UTC

Hello again, I have this problem:

30/04/2006 19:19:51|ralph@home|Computation for result JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0 finished
30/04/2006 19:19:53|ralph@home|Starting result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 using rosetta_beta version 508
30/04/2006 19:19:53|ralph@home|Unrecoverable error for result JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> <file_name>JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)

What can I do?
ID: 1443

[AF>France>Est>Lorraine]Le Zam
Joined: 2 Mar 06
Posts: 9
Credit: 3,278
RAC: 0
Message 1444 - Posted: 30 Apr 2006, 21:08:40 UTC - in response to Message 1443.  

An error again:
30/04/2006 22:27:32|ralph@home|Computation for result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 finished
30/04/2006 22:27:34|ralph@home|Unrecoverable error for result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> <file_name>JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
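For anyone collecting these reports, the "Unrecoverable error" lines pasted above can be pulled apart mechanically. This is only a rough Python sketch, assuming the pipe-delimited layout seen in the messages in this thread; the names `TransferError` and `parse_transfer_error` are made up for illustration and are not part of BOINC or Rosetta.

```python
import re
from typing import NamedTuple, Optional


class TransferError(NamedTuple):
    """Fields recovered from one 'Unrecoverable error' log line (hypothetical type)."""
    timestamp: str
    project: str
    result_name: str
    file_name: str
    error_code: int


# Assumed layout, taken only from the lines pasted in this thread:
#   <timestamp>|<project>|Unrecoverable error for result <name> (<file_xfer_error> ... )
LINE_RE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) \("
    r".*?<file_name>(?P<file>[^<]+)</file_name>"
    r".*?<error_code>(?P<code>-?\d+)</error_code>"
)


def parse_transfer_error(line: str) -> Optional[TransferError]:
    """Return the parsed fields, or None if the line is not a file transfer error."""
    m = LINE_RE.search(line.strip())
    if m is None:
        return None
    return TransferError(
        timestamp=m["timestamp"],
        project=m["project"],
        result_name=m["result"],
        file_name=m["file"],
        error_code=int(m["code"]),
    )


if __name__ == "__main__":
    # Sample copied from the second error reported above.
    sample = (
        "30/04/2006 22:27:34|ralph@home|Unrecoverable error for result "
        "JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> "
        "<file_name>JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> "
        "<error_code>-161</error_code> <error_message></error_message>"
        "</file_xfer_error>)"
    )
    print(parse_transfer_error(sample))
```

Lines that do not match (for example the "Computation for result ... finished" messages) come back as `None`, so a whole message log can be filtered by keeping only the non-`None` results.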

ID: 1444

doc :)
Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1445 - Posted: 30 Apr 2006, 21:48:09 UTC

Got that file xfer error on my first 5.08 unit too; the next 3 (on 2 different PCs) went OK. (I was away for the last 3 weeks, so my knowledge of what happened in that time is not the best :))

30/04/2006 08:44:00|ralph@home|Computation for task JUMP_RELAX_ALLBARCODE06_1tul_SAVE_ALL_OUT_463_4_0 finished
30/04/2006 08:44:02|ralph@home|Unrecoverable error for result JUMP_RELAX_ALLBARCODE06_1tul_SAVE_ALL_OUT_463_4_0 (<file_xfer_error> <file_name>JUMP_RELAX_ALLBARCODE06_1tul_SAVE_ALL_OUT_463_4_0_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
ID: 1445

Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 1446 - Posted: 1 May 2006, 0:47:22 UTC - in response to Message 1444.  

Rhiju commented on a similar error in this post, but I am not certain yours is a Watchdog stop. The file errors are the same, and I know Rhiju had a post about that somewhere too, but I can't find it right now. He will pick this thread up again tomorrow if he does not catch it sometime tonight. My recollection is that this will self-correct.

ID: 1446

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1450 - Posted: 1 May 2006, 4:34:40 UTC - in response to Message 1446.  
Last modified: 1 May 2006, 4:36:02 UTC

Hi, thanks for posting. I've noticed that these workunits, which are testing a new science protocol, are occasionally not producing outfiles, and I figured out why thanks to these first tests on ralph! It's not a watchdog problem, but something that will be easy to fix in the app. I'll fix the problem and re-test on ralph before these ever go to rosetta@home. In the meantime, I'll send out some other tests.

ID: 1450

Hans Sveen
Joined: 17 Feb 06
Posts: 11
Credit: 368,311
RAC: 0
Message 1452 - Posted: 1 May 2006, 17:48:16 UTC

Hello!
My 2 latest WUs both crashed with exit code 1 (Incorrect function). The WUs are 88566 (which has also crashed for someone else!) and 85760. I hope this will help you sort out the last(?) errors!

With regards, and thank you!


Hans Sveen
Oslo, Norway

ID: 1452

anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 1457 - Posted: 2 May 2006, 3:46:57 UTC

Do the WUs still go by nstruct and not the user time setting?

https://ralph.bakerlab.org/result.php?resultid=101375

Anders n
ID: 1457

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 1477 - Posted: 5 May 2006, 1:17:25 UTC
Last modified: 5 May 2006, 1:23:17 UTC

Ralph@home, I think you have problems with the new 5.08 version.
I just received 11 WUs, and within the first hour 8 of them had errored out with "Computation errors". The shortest ran for 1 minute 32 seconds; the longest ran for 59 minutes 57 seconds.
I am running them on an AMD dual 848 computer using Linux (Ralph computer id 2601). Preferences are set to leave applications in memory, as this seems to have stopped the errors I was getting on 28/29 April, when a number of Rosetta and Ralph WUs errored out on 3 different machines.
These WUs arrive on my machine with an estimated processing time of 8 hours 39 minutes. With only 3 left, I doubt I will get anything (credit or science output) from them, as I expect all of them to fail.
The WUs that have failed so far are:
https://ralph.bakerlab.org/workunit.php?wuid=89629
https://ralph.bakerlab.org/workunit.php?wuid=89630
https://ralph.bakerlab.org/workunit.php?wuid=89653
https://ralph.bakerlab.org/workunit.php?wuid=89654
https://ralph.bakerlab.org/workunit.php?wuid=89656
https://ralph.bakerlab.org/workunit.php?wuid=89664
https://ralph.bakerlab.org/workunit.php?wuid=89665
https://ralph.bakerlab.org/workunit.php?wuid=89696
Thank you.

ID: 1477

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1479 - Posted: 5 May 2006, 2:26:29 UTC - in response to Message 1477.  

Thanks, Conan. It looks like a problem with that workunit type; we're looking into it. Glad we tested these on ralph before sending them to rosetta@home!

ID: 1479

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 1483 - Posted: 5 May 2006, 3:11:43 UTC

You're welcome. Of the 3 I have left, 2 have so far run for over 2 hours 20 minutes and are over 60% done, so they may finish. We will wait and see.
ID: 1483

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 1485 - Posted: 5 May 2006, 8:03:28 UTC

Lost another WU. That's 9 out of 11 that failed. 2 did finish though.
The WU that failed with "Computational Error" was
https://ralph.bakerlab.org/workunit.php?wuid=89697, which failed after almost 3 hours.
WUs 89649 and 89650 were both successful.
ID: 1485

Divide Overflow
Joined: 15 Feb 06
Posts: 12
Credit: 128,027
RAC: 0
Message 1489 - Posted: 5 May 2006, 17:20:01 UTC

ID: 1489

Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 1491 - Posted: 5 May 2006, 18:06:07 UTC - in response to Message 1489.  
Last modified: 5 May 2006, 18:08:34 UTC

v5.08 has generated mixed results on my Linux box. Although several WU’s completed successfully, I’ve also had several result in computational errors:

https://ralph.bakerlab.org/result.php?resultid=102100
https://ralph.bakerlab.org/result.php?resultid=102101
https://ralph.bakerlab.org/result.php?resultid=102102
https://ralph.bakerlab.org/result.php?resultid=102103
https://ralph.bakerlab.org/result.php?resultid=102372
https://ralph.bakerlab.org/result.php?resultid=102373
https://ralph.bakerlab.org/result.php?resultid=102374
https://ralph.bakerlab.org/result.php?resultid=102375

These results came from a Work Unit type that had a problem. See this post suggesting they be aborted.
ID: 1491

Fuzzy Hollynoodles
Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1498 - Posted: 5 May 2006, 23:15:31 UTC

This one errored out:

https://ralph.bakerlab.org/workunit.php?wuid=89670
(apparently for others also)

Result: https://ralph.bakerlab.org/result.php?resultid=102139


"I'm trying to maintain a shred of dignity in this world." - Me

ID: 1498

©2024 University of Washington
http://www.bakerlab.org