Bug Report for Ralph 5.41

Message boards : RALPH@home bug list : Bug Report for Ralph 5.41

Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2560 - Posted: 30 Nov 2006, 2:01:16 UTC

Ralph has been updated to 5.41. This update fixes several previously reported bugs:

1. Checkpointing continued even after Rosetta had finished.
2. An "error" was reported when deleting some intermediate files after they were gzipped.
3. Watchdog failure: when a run is stuck and caught by the watchdog, the results, if there are any, will now be returned and validated. Credit will be assigned accordingly.
4. Some other bugs related to Rosetta science.

A new feature that reads the Rosetta command from an input file has been added, which gives us a more flexible interface for setting up runs on BOINC.

Thanks to everyone for your support, and please report bugs here!
ID: 2560
Krzychu P.
Joined: 16 Feb 06
Posts: 19
Credit: 25,687
RAC: 0
Message 2561 - Posted: 30 Nov 2006, 6:55:49 UTC

I get (the Polish text "Niepoprawna funkcja." is Windows' "Incorrect function." message):
2006-11-30 08:38:51|ralph@home|Unrecoverable error for result DOC_R061113_1BQL_p2_fa_relax_from_native_unbound_1514_3_0 (Niepoprawna funkcja. (0x1) - exit code 1 (0x1))

<core_client_version>5.7.5</core_client_version>
<![CDATA[
<message>
Niepoprawna funkcja. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2846143
ABORT: bad to aa_rotno_to_packedrotno
aa,rot1/2/3/4: MET 11 3 0 0 0
chi no 2 nchi 3 aav 1 is_chi_proton_rotamer(aa,aav,i) 0
ERROR:: Exit at: .rotamer_functions.cc line:1441

</stderr_txt>
]]>

I don't know what this means. Is it a bug, or is my computer malfunctioning?
ID: 2561
Krzychu P.
Joined: 16 Feb 06
Posts: 19
Credit: 25,687
RAC: 0
Message 2562 - Posted: 30 Nov 2006, 9:26:42 UTC

Another error:
2006-11-30 11:17:33|ralph@home|Unrecoverable error for result DOC_R061113_1DFJ_p1_fa_relax_from_native_unbound_reduced_cycles_1515_4_0 (Niepoprawna funkcja. (0x1) - exit code 1 (0x1))

<core_client_version>5.7.5</core_client_version>
<![CDATA[
<message>
Niepoprawna funkcja. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2845892
ABORT: bad to aa_rotno_to_packedrotno
aa,rot1/2/3/4: GLN 14 0 2 4 0
chi no 1 nchi 3 aav 1 is_chi_proton_rotamer(aa,aav,i) 0
ERROR:: Exit at: .rotamer_functions.cc line:1441

</stderr_txt>
]]>
ID: 2562
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2563 - Posted: 30 Nov 2006, 21:48:57 UTC

Regarding the new feature to run Rosetta from a command file and a more flexible interface for setting up runs on BOINC...

I just wanted to confirm... this will make things more flexible for... the project team, right? Or if users can make use of this in some way, then of course we'll want to know how to utilize it and what it can do. If it is for the project team, another brief sentence about what this will enable you to do in the future would be nice. It will... ??? allow you to set up the runtime parameters for a set of WUs faster? and in a way that's less prone to error? ...or to produce more WUs on the server with less overhead? Or what?

=============

Also, I wasn't clear on what you fixed in item 3. Was the watchdog not stopping WUs? Or were the completed models not being properly preserved?
ID: 2563
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2564 - Posted: 30 Nov 2006, 23:18:09 UTC - in response to Message 2560.  
Last modified: 1 Dec 2006, 0:03:21 UTC

1. bug of do checkpointing even after Rosetta is finished.

This is good news!! I'm looking forward to it! Right now I can see a Rosetta 5.40 app sleeping at the 100% mark on both of my hosts (Linux and Windows). This usually lasts half a day to a full day, depending on how deep the STD is and how much it jumps up and down, and meanwhile the app holds on to plenty of MBs of memory and swap (keep-in-memory while preempted).

Peter
ID: 2564
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2565 - Posted: 30 Nov 2006, 23:21:41 UTC - in response to Message 2564.  

Message 2564 - Posted 30 Nov 2006 23:18:09 UTC - in response to Message ID 2560. [Edit this post]
Last modified: 1 Dec 2006 0:03:21 UTC

1. bug of do checkpointing even after Rosetta is finished.

This is good news!! I'm looking forward!

Huh? Am I mad? It took me only a few seconds to re-edit my message. Did the server's clock jump ahead? It's 23:21 UTC now!

Peter
ID: 2565
Chu
Volunteer moderator
Project developer
Project scientist
Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2566 - Posted: 30 Nov 2006, 23:24:55 UTC - in response to Message 2563.  

The command line file was added for the project team. To test many Rosetta parameters without changing the executable, we made them input arguments on the command line. One consequence is that the Rosetta command line keeps getting longer, which makes it difficult to remember and difficult to set up (and lets more errors slip through). The file is meant to help with that. In my personal opinion, this is a positive step, though there is still a long way to go, toward a friendlier control interface for Rosetta, such as a graphical interface with pull-down menus in the future.
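As a rough illustration of the idea (not Rosetta's actual file format or flag names, which are not shown in this thread), here is a minimal Python sketch that keeps a long command line in a file and expands it at startup. It relies on the standard library's argparse option `fromfile_prefix_chars`; the flags `--nstruct`, `--cpu-run-time` and `--protocol` and the file name `run.args` are hypothetical.

```python
import argparse
import os
import tempfile


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars makes argparse expand "@path" into the
    # arguments stored in that file, one argument per line.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    # All three flags below are hypothetical, chosen only for illustration.
    parser.add_argument("--nstruct", type=int, default=1)
    parser.add_argument("--cpu-run-time", type=int, default=3600)
    parser.add_argument("--protocol", default="abrelax")
    return parser


def parse_from_file(path: str) -> argparse.Namespace:
    """Parse a run configuration whose arguments live in a text file."""
    return build_parser().parse_args(["@" + path])


# Usage: write a small argument file, then parse it.
with tempfile.TemporaryDirectory() as tmp:
    args_path = os.path.join(tmp, "run.args")
    with open(args_path, "w") as fh:
        fh.write("--nstruct\n126\n--protocol\ndocking\n")
    args = parse_from_file(args_path)

print(args)
```

The point of the design is that the long argument list is edited and reviewed as a file, while the executable itself stays unchanged.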

Sorry for not being clearer about the watchdog issue. The watchdog did stop WUs when a run was found to be stuck or running too long, and it did preserve the models that had been completed. However, the old behavior would throw an error if no model had been generated before the run was caught by the watchdog; with the fix, this should no longer happen. The empty result file (not really empty, as it says it comes from a watchdog error) will be returned and recognized by the validator, and credit will be assigned.
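The behavior described above can be sketched roughly as follows. This is only a toy model of the logic, assuming a plain-text result file; the marker text and the names `build_result` and `validate` are hypothetical and are not taken from the Rosetta or BOINC sources.

```python
# Hypothetical marker line written when the watchdog stops a run
# before any model has been completed.
WATCHDOG_MARKER = "# watchdog: run stopped before any model was completed"


def build_result(models, stopped_by_watchdog):
    """Return the text of the result file to send back to the server."""
    if models:
        # Completed models are always preserved and returned.
        return "\n".join(models) + "\n"
    if stopped_by_watchdog:
        # New behavior: return a marker instead of raising an error.
        return WATCHDOG_MARKER + "\n"
    raise RuntimeError("run ended without producing a model")


def validate(result_text):
    """Classify a result file as 'models', 'watchdog' or 'invalid'."""
    lines = [line for line in result_text.splitlines() if line.strip()]
    if not lines:
        return "invalid"
    if lines == [WATCHDOG_MARKER]:
        return "watchdog"
    return "models"


print(validate(build_result([], stopped_by_watchdog=True)))
print(validate(build_result(["model_1", "model_2"], stopped_by_watchdog=True)))
```

In this sketch the validator can grant credit for both the "models" and the "watchdog" outcomes and reject only a truly empty file.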
ID: 2566
Krzychu P.
Joined: 16 Feb 06
Posts: 19
Credit: 25,687
RAC: 0
Message 2567 - Posted: 1 Dec 2006, 7:02:26 UTC

After about 40 minutes of computing:

2006-12-01 08:07:29|ralph@home|Unrecoverable error for result 1mkyA_ETABLE_TEST_ABRELAX_rhh13sm6atrrep__1519_8_0 (Niepoprawna funkcja. (0x1) - exit code 1 (0x1))

<stderr_txt>
fullatom_setup.cc: CHANGING fa_max_dis to 6!!!!!!!!!!!!!
setting hydrogen_interaction_cutoff to: 6
# random seed: 2845518
# cpu_run_time_pref: 3600
fullatom_setup.cc: CHANGING fa_max_dis to 6!!!!!!!!!!!!!
setting hydrogen_interaction_cutoff to: 6
# cpu_run_time_pref: 3600
sin_cos_range ERROR: -1.#IND000 is outside of [-1,+1] range
sin_cos_range ERROR: -1.#IND000 is outside of [-1,+1] range
ERROR:: Exit at: .fullatom_energy.cc line:2002
ID: 2567
Chu
Volunteer moderator
Project developer
Project scientist
Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2569 - Posted: 1 Dec 2006, 20:38:46 UTC - in response to Message 2567.  

Hi Krzychu P., thanks for reporting all these errors. If I remember correctly, I have also seen you report similar errors for previous versions of the Ralph application and for other types of WUs. From the stderr output, it looks like the Rosetta simulations ran into some bad conditions that triggered premature exits. Although we are not 100% sure what went wrong, it is most likely due to a corrupted database file. The WUs you reported problems with seem to run fine on other clients' computers (on both Ralph and BOINC) with a fairly good success rate. In other words, this type of error seems to happen on your computer much more frequently than average, which leads me to suspect that your computer may have trouble handling those input database files. I am not an expert on computer hardware or BOINC setup; I just want to bring this to your attention. Do you also use this computer to run Rosetta@Home? Have you seen similar errors for WUs from Rosetta@Home? Have you noticed any other signs of a potential hardware or software problem? From the stderr file, I can see your computer is running a non-English version of the operating system. Could that be why some of the files are not read in correctly? Maybe some other experts here have a better idea.
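A side note on the log itself: "-1.#IND000" is how Microsoft's C runtime prints an indeterminate NaN (not-a-number) value. The sketch below is a hypothetical Python re-creation of a range check like the one named in the log, not the Rosetta code; it only shows why a NaN always fails such a check, and it says nothing about where the NaN came from.

```python
def sin_cos_range(x: float) -> float:
    """Return x if it is a valid sine or cosine value, otherwise raise."""
    # Every ordered comparison with NaN is False, so a NaN input
    # can never satisfy -1 <= x <= 1 and always ends up here.
    if not (-1.0 <= x <= 1.0):
        raise ValueError(f"{x!r} is outside of [-1,+1] range")
    return x


for value in (0.5, 1.5, float("nan")):
    try:
        print("ok:", sin_cos_range(value))
    except ValueError as err:
        print("error:", err)
```

Once a NaN enters a calculation (for example from bad input data), it propagates through later arithmetic, so the check that reports it may be far from the original cause.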

Again, thank you for your contribution and support for our project.
ID: 2569
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2570 - Posted: 2 Dec 2006, 4:29:39 UTC

Thanks for the reply Chu. I was glad to see the improved descriptions on Rosetta.
ID: 2570
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2571 - Posted: 2 Dec 2006, 10:58:18 UTC

WU https://ralph.bakerlab.org/result.php?resultid=331672

<file_name>CAPRI_11_t27j_SMALLPERTURBATION_DOCKING_1520_7_0_0</file_name>

Is it working as it should, given that no credit was assigned?

Anders n
ID: 2571
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2572 - Posted: 2 Dec 2006, 15:54:11 UTC - in response to Message 2571.  

Oops, it's an <error_code>-161</error_code> on that one and on this one:

https://ralph.bakerlab.org/result.php?resultid=331673

Anders n

ID: 2572
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 2573 - Posted: 2 Dec 2006, 22:59:51 UTC

Old error 161 is still with us

https://ralph.bakerlab.org/result.php?resultid=331711

stderr out

<core_client_version>5.2.14</core_client_version>
<stderr_txt>
Graphics are disabled due to configuration...
# random seed: 2845454
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures built 126 (nstruct) times
This process generated 126 decoys from 126 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message><file_xfer_error>
<file_name>CAPRI_11_t27j_SMALLPERTURBATION_DOCKING_1520_12_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

</message>

Validate state Invalid
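
For anyone collecting these reports, here is a small Python sketch that pulls the file name and error code out of a `<file_xfer_error>` fragment like the one above. It assumes the fragment has the same tag order as in this post; the function name `find_transfer_errors` is hypothetical, and a regular expression is used because the pasted fragment is not guaranteed to be well-formed XML.

```python
import re

# Matches a <file_xfer_error> block whose first two children are
# <file_name> and <error_code>, as in the stderr output quoted above.
FILE_XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]*)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)


def find_transfer_errors(report: str):
    """Return a list of (file_name, error_code) pairs found in report."""
    return [
        (match.group("name"), int(match.group("code")))
        for match in FILE_XFER_ERROR.finditer(report)
    ]


sample = """<message><file_xfer_error>
<file_name>CAPRI_11_t27j_SMALLPERTURBATION_DOCKING_1520_12_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

</message>"""

print(find_transfer_errors(sample))
```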
ID: 2573
Chu
Volunteer moderator
Project developer
Project scientist
Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2574 - Posted: 3 Dec 2006, 4:51:16 UTC - in response to Message 2573.  

I think those errors (exit 161) come from a batch of problematic WUs rather than from a general problem with the application like the one we have seen before.
ID: 2574
sslickerson
Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 2576 - Posted: 3 Dec 2006, 17:05:42 UTC

Several errors: 333210, 333209
ID: 2576
blackbird
Joined: 19 Feb 06
Posts: 2
Credit: 12,029
RAC: 0
Message 2577 - Posted: 4 Dec 2006, 14:40:02 UTC

WU 291498 crashed on SUSE Linux 10.1 - kernel 2.6.18.2-jen37-default
stderr.txt:
SIGSEGV: segmentation violation
Stack trace (34 frames):
[0x8a47b6f]
[0x8a638bc]
[0xa7ff4420]
[0x8ae4300]
[0x8ae5bb9]
[0x8ab48d7]
[0x8a9a1ba]
[0x8a9a22d]
[0x8a9a690]
[0x8a9766a]
[0x8a98ba2]
[0x8a90e32]
[0x8a2bce1]
[0x879288c]
[0x81d88d4]
[0x8788300]
[0x804cf1d]
[0x84f40cd]
[0x861a7fd]
[0x861c62c]
[0x861e38f]
[0x862b559]
[0x84f858e]
[0x862ec65]
[0x804d695]
[0x87433d5]
[0x8745e9f]
[0x8746cc0]
[0x82ef851]
[0x84c998f]
[0x85dd0a3]
[0x85dd14c]
[0x8ac2da4]
[0x8048111]

tail stdout.txt:
Cycle: 41 -- Monte_carlo:: best_score,low_score: -502.661 -502.661
Cycle: 42 -- Monte_carlo:: best_score,low_score: -502.661 -502.661
Cycle: 43 -- Monte_carlo:: best_score,low_score: -502.661 -502.661
Cycle: 44 -- Monte_carlo:: best_score,low_score: -502.661 -502.661
Cycle: 45 -- Monte_carlo:: best_score,low_score: -502.661 -502.661
pose_docking_mcm --2 out of 45 monte carlo cycles accepted
***** pose_docking_monte_carlo_minimize *****

***** pose_docking_minimize_trial *****
WARNING!!! position and full_coord disagree!

ID: 2577
[B^S] JoeB@Ky
Joined: 11 Oct 06
Posts: 8
Credit: 39,098
RAC: 0
Message 2578 - Posted: 5 Dec 2006, 3:09:34 UTC - in response to Message 2560.  

ID: 2578
[B^S] JoeB@Ky
Joined: 11 Oct 06
Posts: 8
Credit: 39,098
RAC: 0
Message 2579 - Posted: 5 Dec 2006, 3:13:07 UTC - in response to Message 2560.  

1. The time-to-completion clock still only counts up, then suddenly updates to a lower number.
2. Percent completed does not change with CPU time or time to completion.
ID: 2579
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2580 - Posted: 5 Dec 2006, 13:37:41 UTC

Joe, are you reporting something new that's occurring? Or are you new to Rosetta and Ralph? Because the way you've described it so far could be the normal behavior described here and here.
ID: 2580
Saenger
Joined: 28 Feb 06
Posts: 13
Credit: 67,395
RAC: 0
Message 2581 - Posted: 5 Dec 2006, 18:59:22 UTC

Here's my last result:
stderr out

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
Graphics are disabled due to configuration...
# random seed: 2844030
# cpu_run_time_pref: 14400
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
No heartbeat from core client for 31 sec - exiting
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures built 34 (nstruct) times
This process generated 34 decoys from 34 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>t103_test_LARS_constraints_6_IGNORE_THE_RESTS_00001_0000492_0.pdb_1528_16_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

Validate state Invalid
Claimed credit 20.4061201355507
Granted credit 0
application version 5.41


The WU has been crunched by another guy/gal, also with an error.
Greetings from Sänger
ID: 2581

©2024 University of Washington
http://www.bakerlab.org