Message boards : RALPH@home bug list : Bug Report for Ralph 5.41
Author | Message |
---|---|
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Ralph has been updated to 5.41. This update fixes several previously reported bugs: 1. Checkpointing was still performed even after Rosetta had finished. 2. An "error" occurred when deleting some intermediate files after they were gzipped. 3. Watchdog failure: when a run gets stuck and is caught by the watchdog, any results it has produced will now be returned and validated, and credit will be assigned accordingly. 4. Some other bugs related to the Rosetta science code. A new feature has also been added that reads the Rosetta command from an input file, which gives a more flexible interface for setting up runs on BOINC. Thanks for everyone's support, and please report bugs here! |
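To make the idea of reading a command from an input file concrete, here is a minimal Python sketch. It is not Rosetta's actual parser: the function names and the `-example_*` flags are hypothetical. It assumes one or more arguments per line, whole-line `#` comments, and shell-style quoting, and it appends any arguments given directly on the command line after the ones read from the file.

```python
import shlex
from typing import Iterable, List


def read_command_file(lines: Iterable[str]) -> List[str]:
    """Turn the lines of a command file into an argv-style list.

    Blank lines and lines starting with '#' are skipped; every other line is
    split with shell-like quoting rules, so a long command can span many lines.
    """
    args: List[str] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        args.extend(shlex.split(stripped))
    return args


def build_argv(file_lines: Iterable[str], extra: List[str]) -> List[str]:
    """Arguments from the file first, then any passed directly on the command line."""
    return read_command_file(file_lines) + list(extra)


if __name__ == "__main__":
    # Hypothetical flags for illustration only; they are not real Rosetta options.
    example = [
        "# hypothetical command file",
        "-example_mode relax",
        "",
        "-example_count 10  -example_label 'test run'",
    ]
    print(build_argv(example, ["-example_seed", "42"]))
```

Keeping a long argument list in a file means it can be edited, reviewed and reused without retyping it on one ever-growing command line.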
Krzychu P. Joined: 16 Feb 06 Posts: 19 Credit: 25,687 RAC: 0 |
I get: 2006-11-30 08:38:51|ralph@home|Unrecoverable error for result DOC_R061113_1BQL_p2_fa_relax_from_native_unbound_1514_3_0 (Incorrect function. (0x1) - exit code 1 (0x1)) <core_client_version>5.7.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2846143 ABORT: bad to aa_rotno_to_packedrotno aa,rot1/2/3/4: MET 11 3 0 0 0 chi no 2 nchi 3 aav 1 is_chi_proton_rotamer(aa,aav,i) 0 ERROR:: Exit at: .rotamer_functions.cc line:1441 </stderr_txt> ]]> I don't know what this means. Is it a bug, or is something wrong with my computer? |
Krzychu P. Joined: 16 Feb 06 Posts: 19 Credit: 25,687 RAC: 0 |
Another error: 2006-11-30 11:17:33|ralph@home|Unrecoverable error for result DOC_R061113_1DFJ_p1_fa_relax_from_native_unbound_reduced_cycles_1515_4_0 (Incorrect function. (0x1) - exit code 1 (0x1)) <core_client_version>5.7.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2845892 ABORT: bad to aa_rotno_to_packedrotno aa,rot1/2/3/4: GLN 14 0 2 4 0 chi no 1 nchi 3 aav 1 is_chi_proton_rotamer(aa,aav,i) 0 ERROR:: Exit at: .rotamer_functions.cc line:1441 </stderr_txt> ]]> |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Regarding the new feature to run Rosetta from a command file and a more flexible interface for setting up runs on BOINC... I just wanted to confirm: this will make things more flexible for the project team, right? Or, if users can make use of it in some way, then of course we'll want to know how to use it and what it can do. If it's for the project team, another brief sentence about what this will enable you to do in the future would be nice. Will it allow you to set up the runtime parameters for a set of WUs faster, and in a way that's less prone to error? Or to produce more WUs on the server with less overhead? Or what? ============= Also, I wasn't clear on what you fixed in item 3. Was the watchdog not stopping WUs? Or were the completed models not being properly preserved? |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Quote: "1. bug of do checkpointing even after Rosetta is finished." This is good news!! I'm looking forward to it! Right now I can see a Rosetta 5.40 app sleeping at the 100% mark on both of my hosts (Linux and Windows). This usually lasts half a day to a full day, depending on how deep the STD (short-term debt) is and how it jumps up and down, and it holds plenty of MBs of memory and swap (keep-in-memory while preempted). Peter |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Message 2564 - Posted 30 Nov 2006 23:18:09 UTC - in response to Message ID 2560. Huh? Am I mad? It took me a few seconds to re-edit my message. Did the server's clock jump? It's 23:21 UTC now! Peter |
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Quote: "Regarding the new feature to run Rosetta from a command file and a more flexible interface for setting up runs on BOINC..." The command-line file was added for the project team. To test many Rosetta parameters without changing the executable, we made them input arguments on the command line. One consequence is that the Rosetta command line keeps getting longer, which makes it difficult to remember and difficult to set up (and more errors can slip through). The file is meant to help with that. In my personal opinion, this is a positive step, though there is still a long way to go, toward a friendlier control interface for Rosetta, such as a graphical interface with pull-down menus in the future. Sorry for not being clearer about the watchdog issue. It did stop WUs that were found to be stuck or running too long, and it did preserve models that had been completed. However, the old behavior threw an error if no model had been generated before the watchdog caught the run; with the fix, this should no longer happen. The empty result file (not really empty, as it states that it comes from a watchdog error) will be returned and recognized by the validator, and credit will be assigned. |
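A minimal Python sketch of the watchdog change described above, written only to make the before/after behavior concrete. Everything in it (`RunState`, the `WATCHDOG_STOP` marker, the validator check) is hypothetical and does not come from the Rosetta or BOINC code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunState:
    """Hypothetical snapshot of a work unit at the moment the watchdog fires."""
    completed_models: List[str] = field(default_factory=list)


def watchdog_result_old(state: RunState) -> str:
    # Old behavior as described: no finished model means the run ends in an error.
    if not state.completed_models:
        raise RuntimeError("watchdog stop with no completed models")
    return "\n".join(state.completed_models)


def watchdog_result_new(state: RunState) -> str:
    # New behavior as described: a result file is always returned. With no
    # models it is "empty" apart from a note that the watchdog stopped the run.
    lines = ["WATCHDOG_STOP"]  # hypothetical marker text
    lines.extend(state.completed_models)
    return "\n".join(lines)


def validate(result_text: str) -> bool:
    # Hypothetical validator rule: a watchdog result is accepted (and credited)
    # when it carries the marker, even if it contains no models.
    return result_text.startswith("WATCHDOG_STOP")
```

The point of the sketch is the second function: the "nothing finished yet" case becomes a normal, recognizable result instead of an error exit.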
Krzychu P. Joined: 16 Feb 06 Posts: 19 Credit: 25,687 RAC: 0 |
After about 40 minutes of computing: 2006-12-01 08:07:29|ralph@home|Unrecoverable error for result 1mkyA_ETABLE_TEST_ABRELAX_rhh13sm6atrrep__1519_8_0 (Incorrect function. (0x1) - exit code 1 (0x1)) <stderr_txt> fullatom_setup.cc: CHANGING fa_max_dis to 6!!!!!!!!!!!!! setting hydrogen_interaction_cutoff to: 6 # random seed: 2845518 # cpu_run_time_pref: 3600 fullatom_setup.cc: CHANGING fa_max_dis to 6!!!!!!!!!!!!! setting hydrogen_interaction_cutoff to: 6 # cpu_run_time_pref: 3600 sin_cos_range ERROR: -1.#IND000 is outside of [-1,+1] range sin_cos_range ERROR: -1.#IND000 is outside of [-1,+1] range ERROR:: Exit at: .fullatom_energy.cc line:2002
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Quote: "After about 40 minutes of computing:" Hi Krzychu P., thanks for reporting all these errors. If I remember correctly, you have also reported similar errors for previous versions of the Ralph application and for other types of WUs. From the stderr output, it looks like the Rosetta simulations entered some bad conditions that triggered premature exits. Although we are not 100% sure what went wrong, the most likely cause is a corrupted database file. The WUs you have reported problems with seem to run fine on other clients' computers (on both Ralph and BOINC) with a fairly good success rate. In other words, this type of error seems to happen on your computer much more often than average, which leads me to suspect that your computer may have trouble handling those input database files. I am not an expert on computer hardware or BOINC setup; I just want to bring this to your attention. Do you also use this computer to run Rosetta@Home? Have you seen similar errors in WUs from Rosetta@Home? Have you noticed any other signs of a potential hardware or software problem? From the stderr file, I can see your computer is running a non-English version of the operating system. Could that be the reason some of the files are not read correctly? Maybe other experts here have a better idea. Again, thank you for your contribution and support for our project. |
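One generic way to test the corrupted-file theory is to compare checksums of the local database files against those of a known-good copy. The sketch below is a hypothetical Python helper, not a Rosetta or BOINC tool; the directory to check and the expected MD5 values are assumptions that would have to come from a copy known to be good.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory: Path, expected: Dict[str, str]) -> List[str]:
    """List files (sorted by name) that are missing or whose MD5 differs from the expected value."""
    bad: List[str] = []
    for name, want in sorted(expected.items()):
        path = directory / name
        if not path.is_file() or md5_of_file(path) != want:
            bad.append(name)
    return bad
```

An empty list means every listed file matches the reference copy; any names returned point to files worth re-downloading before suspecting the hardware.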
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Thanks for the reply, Chu. I was glad to see the improved descriptions on Rosetta. |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
WU https://ralph.bakerlab.org/result.php?resultid=331672 <file_name>CAPRI_11_t27j_SMALLPERTURBATION_DOCKING_1520_7_0_0</file_name> Is it working as it should, given that no credit was assigned? Anders n |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
WU https://ralph.bakerlab.org/result.php?resultid=331672: oops, it's an <error_code>-161</error_code> on that one, and also on this one: https://ralph.bakerlab.org/result.php?resultid=331673 Anders n |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
Old error 161 is still with us https://ralph.bakerlab.org/result.php?resultid=331711 stderr out <core_client_version>5.2.14</core_client_version> <stderr_txt> Graphics are disabled due to configuration... # random seed: 2845454 # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures built 126 (nstruct) times This process generated 126 decoys from 126 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message><file_xfer_error> <file_name>CAPRI_11_t27j_SMALLPERTURBATION_DOCKING_1520_12_1_0</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> </message> Validate state Invalid |
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Quote: "Old error 161 is still with us" I think those errors (exit 161) come from a batch of problematic WUs, rather than being a general problem with the application as we have seen before. |
sslickerson Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0 |
|
blackbird Joined: 19 Feb 06 Posts: 2 Credit: 12,029 RAC: 0 |
WU 291498 crashed on SUSE Linux 10.1 - kernel 2.6.18.2-jen37-default stderr.txt: SIGSEGV: segmentation violation Stack trace (34 frames): [0x8a47b6f] [0x8a638bc] [0xa7ff4420] [0x8ae4300] [0x8ae5bb9] [0x8ab48d7] [0x8a9a1ba] [0x8a9a22d] [0x8a9a690] [0x8a9766a] [0x8a98ba2] [0x8a90e32] [0x8a2bce1] [0x879288c] [0x81d88d4] [0x8788300] [0x804cf1d] [0x84f40cd] [0x861a7fd] [0x861c62c] [0x861e38f] [0x862b559] [0x84f858e] [0x862ec65] [0x804d695] [0x87433d5] [0x8745e9f] [0x8746cc0] [0x82ef851] [0x84c998f] [0x85dd0a3] [0x85dd14c] [0x8ac2da4] [0x8048111] tail stdout.txt: Cycle: 41 -- Monte_carlo:: best_score,low_score: -502.661 -502.661 Cycle: 42 -- Monte_carlo:: best_score,low_score: -502.661 -502.661 Cycle: 43 -- Monte_carlo:: best_score,low_score: -502.661 -502.661 Cycle: 44 -- Monte_carlo:: best_score,low_score: -502.661 -502.661 Cycle: 45 -- Monte_carlo:: best_score,low_score: -502.661 -502.661 pose_docking_mcm --2 out of 45 monte carlo cycles accepted ***** pose_docking_monte_carlo_minimize ***** ***** pose_docking_minimize_trial ***** WARNING!!! position and full_coord disagree! |
[B^S] JoeB@Ky Joined: 11 Oct 06 Posts: 8 Credit: 39,098 RAC: 0 |
Quote: "Ralph has been updated to 5.41. In this update, several previously found bugs were fixed. Those are:" |
[B^S] JoeB@Ky Joined: 11 Oct 06 Posts: 8 Credit: 39,098 RAC: 0 |
1. The time-to-completion clock still only counts up, then suddenly updates to a lower number. 2. Percent Completed does not change with CPU time or Time to Completion. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
|
Saenger Joined: 28 Feb 06 Posts: 13 Credit: 67,395 RAC: 0 |
Here's my last result: stderr out The WU was also crunched by another person, who got an error as well. Greetings from Sänger |
©2024 University of Washington
http://www.bakerlab.org