Bug reports for Ralph 5.37 through 5.40

Message boards : RALPH@home bug list : Bug reports for Ralph 5.37 through 5.40

Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2493 - Posted: 7 Nov 2006, 5:58:05 UTC - in response to Message 2484.  

Sure, I am happy to answer that. A normal output file for a Rosetta model contains the xyz position coordinates of each atom plus some other necessary information. Such a file is too large for a BOINC application, since thousands and thousands of them have to be handled. Rosetta therefore uses a clever trick to compress the output, called "silent_output". In this mode, only the variables (or degrees of freedom) of the simulation are written out, and these are normally the backbone and sidechain torsion angles (phi, psi and chi). This reduces the size of the output file by at least 30-fold. However, it requires us to reconstruct the normal Rosetta output model files (with xyz positions in them) from the silent output files, and doing so relies on a critical assumption: every covalent bond connecting two atoms has an "ideal" value for its length. Similarly, the angle formed by any three connected atoms has its own ideal value. Taking these ideal values together with the phi/psi/chi angles, we can restore the positions of all the atoms in the protein model, and we often refer to a structure with "ideal" bond lengths and angles as an "idealized structure". For ab initio prediction, the output model is always "idealized", as it is folded with "ideal" geometry and an optimal set of phi/psi/chi angles.

However, there are also some other important tests which require starting from experimentally solved protein structures (native structures). The bond geometries in these structures normally differ slightly from the ideal ones (the ideal values are computed as an average over a large distribution of these values from experimental structures). So in order to run these tests on BOINC, we need to add new functions that allow us to reconstruct protein models from non-idealized bond geometries and phi/psi/chi angles. On the client side, almost nothing changes except that the silent output file has one number for each residue which indicates whether it has ideal or non-ideal bonds. The increase in file size with this new feature is trivial, but it opens the door for us to do large-scale tests on experimentally solved structures, to understand better what the features of these structures are and how we can make Rosetta models more like those native structures.

Hopefully this answers your question.
May I ask? I hope a good description can be written for when 5.38 comes to Rosetta... WHY is "...outputting structures with non-ideal backbone and sidechain geometries" an improvement? I know, useful to the science... please explain more, on the surface it sounds to a layperson like a step backwards. Also, what impact will this have on the user experience? Will it mean we'll see larger upload sizes on results?

ID: 2493
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2494 - Posted: 7 Nov 2006, 16:57:35 UTC

Thanks for the description... now try for something the size of your new release announcement. I'd suggest focusing on the last two sentences. I'm not sure how the ideal compares to the native myself, so I wasn't sure how to reword it. But basically just focus on the question "How does this new feature help advance the science of Rosetta?"... and state that there is only a minor change to upload file size.
ID: 2494
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2495 - Posted: 7 Nov 2006, 17:00:45 UTC
Last modified: 7 Nov 2006, 17:47:37 UTC

The graphic on v5.38 doesn't function properly for this 1vls WU and this 2ci2I WU. In order to rotate the native structure, you have to click in the low energy box.
ID: 2495
Leffe

Joined: 19 Feb 06
Posts: 10
Credit: 3,683
RAC: 0
Message 2496 - Posted: 7 Nov 2006, 18:45:44 UTC


here's 1 more:

07/11/2006 21:07:38|ralph@home|Starting task fibril_abeta40_test1_1463_122_0 using rosetta_beta version 538
07/11/2006 21:07:38||Suspending work fetch because computer is overcommitted.
07/11/2006 21:42:56||Allowing work fetch again.
07/11/2006 21:43:00|Predictor @ Home|Sending scheduler request to http://predictor.scripps.edu/predictor_cgi/cgi
07/11/2006 21:43:00|Predictor @ Home|Reason: To fetch work
07/11/2006 21:43:00|Predictor @ Home|Requesting 86400 seconds of new work
07/11/2006 21:43:00||Rescheduling CPU: application exited
07/11/2006 21:43:00|ralph@home|Computation for task fibril_abeta40_test1_1463_122_0 finished
07/11/2006 21:43:00||Resuming round-robin CPU scheduling.
07/11/2006 21:43:00|Einstein@Home|Starting task h1_1250.0_S5R1__1329_S5R1a_1 using einstein_S5R1 version 424
07/11/2006 21:43:02|ralph@home|Unrecoverable error for result fibril_abeta40_test1_1463_122_0 (<file_xfer_error> <file_name>fibril_abeta40_test1_1463_122_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
07/11/2006 21:43:02|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds
07/11/2006 21:43:02|Predictor @ Home|Scheduler request succeeded
07/11/2006 21:43:02|Predictor @ Home|No work from project
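
For anyone collecting these reports, the failing task and error code can be pulled out of the pipe-separated message log automatically. The sketch below is an assumption-laden illustration: `find_xfer_errors` is a hypothetical helper, and the pattern only covers the `timestamp|project|message` layout seen in this post, not the bracketed layout some other clients print.

```python
import re

# Matches the "Unrecoverable error" message text as it appears in the log above.
XFER_ERROR = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(<file_xfer_error>\s*<file_name>(?P<file>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)

def find_xfer_errors(lines):
    """Return one dict per file transfer error found in the given log lines."""
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        timestamp, project, message = parts
        match = XFER_ERROR.search(message)
        if match:
            hits.append({
                "timestamp": timestamp,
                "project": project,
                "result": match.group("result"),
                "file_name": match.group("file"),
                "error_code": int(match.group("code")),
            })
    return hits

# Two lines copied from the log above: one error, one unrelated message.
SAMPLE_LOG = [
    "07/11/2006 21:43:02|ralph@home|Unrecoverable error for result "
    "fibril_abeta40_test1_1463_122_0 (<file_xfer_error> "
    "<file_name>fibril_abeta40_test1_1463_122_0_0</file_name> "
    "<error_code>-161</error_code></file_xfer_error>)",
    "07/11/2006 21:43:02|Predictor @ Home|Scheduler request succeeded",
]

for hit in find_xfer_errors(SAMPLE_LOG):
    print(hit["project"], hit["result"], hit["error_code"])
```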

ID: 2496
Pieface

Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 2497 - Posted: 7 Nov 2006, 20:02:52 UTC

Still getting file transfer errors on 5.39:

RESID 319463

</stderr_txt>
<message>
<file_xfer_error>
<file_name>DOC_1CHO_p2_fa_relax_from_native_1462_4_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

ID: 2497
Wiesi

Joined: 14 Oct 06
Posts: 3
Credit: 397
RAC: 0
Message 2498 - Posted: 7 Nov 2006, 21:07:03 UTC

another one:

Result:
https://ralph.bakerlab.org/result.php?resultid=319535

Wu:
https://ralph.bakerlab.org/workunit.php?wuid=277224

<file_name>fibril_abeta40_test1_1463_172_2_0</file_name>
<error_code>-161</error_code>
ID: 2498
Profile Saenger
Joined: 28 Feb 06
Posts: 13
Credit: 67,395
RAC: 0
Message 2499 - Posted: 7 Nov 2006, 21:51:53 UTC
Last modified: 7 Nov 2006, 22:39:02 UTC

I've got one as well:
BOINC messages:
Die 07 Nov 2006 21:25:01 CET|ralph@home|Resuming task fibril_abeta40_test1_1463_19_2 using rosetta_beta version 538
Die 07 Nov 2006 23:14:51 CET|ralph@home|Computation for task fibril_abeta40_test1_1463_19_2 finished
Die 07 Nov 2006 23:14:52 CET|ralph@home|Unrecoverable error for result fibril_abeta40_test1_1463_19_2 (<file_xfer_error> <file_name>fibril_abeta40_test1_1463_19_2_0</file_name> <error_code>-161</error_code></file_xfer_error>)
Die 07 Nov 2006 23:14:52 CET|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds


From the stderr.txt:
</stderr_txt>
<message>
<file_xfer_error>
<file_name>fibril_abeta40_test1_1463_19_2_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>


As you can see, all other Wus failed as well.
Greetings from Sänger
ID: 2499
Profile Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 2502 - Posted: 8 Nov 2006, 0:09:57 UTC

>> A few more with the "error -161" and "file_xfer_error"

https://ralph.bakerlab.org/result.php?resultid=314519 31 decoys
https://ralph.bakerlab.org/result.php?resultid=314520 6 decoys
https://ralph.bakerlab.org/result.php?resultid=314523 10 decoys
https://ralph.bakerlab.org/result.php?resultid=314524 6 decoys
https://ralph.bakerlab.org/result.php?resultid=314525 2 decoys
https://ralph.bakerlab.org/result.php?resultid=314553 5 decoys

My preference is set to 6 hours (21600 seconds), and all these workunits ran for between 18696.36 and 21488.33 seconds, showing that they were within minutes of completion when they failed.
Is the problem something to do with completing the data, finalising the data in the file and then preparing to upload that data?
I have seen this error crop up since about version 5.28, so it is not new, just more prevalent.

Also this workunit
https://ralph.bakerlab.org/result.php?resultid=314522
completed, producing 11 decoys but had a very low credit given compared to other successful results on the same computer (5.72 c/h, usual is 12 c/h plus), any reason for this?

All units are 5.38.
ID: 2502
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2503 - Posted: 8 Nov 2006, 1:34:15 UTC

Thank you all for the help. We are sorry that the file transfer bug was not completely fixed in 5.39 and that we have had several updates in the last few days. The bug is very sneaky: it is only hit by a special combination of command line flags, and only for some protein targets under certain conditions, which makes local debugging difficult. Anyway, we believe it should be completely fixed in 5.40, as shown by our preliminary local test. We will put more tests on RALPH soon to confirm the fix. Thanks again for your patience and generous support!
ID: 2503
Leffe

Joined: 19 Feb 06
Posts: 10
Credit: 3,683
RAC: 0
Message 2505 - Posted: 8 Nov 2006, 9:21:47 UTC

got a screensaver lockdown, only restart cured the problem

08/11/2006 12:16:37|ralph@home|Resuming task 1n0u__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1473_8_0 using rosetta_beta version 540

ID: 2505
Profile Krzychu P.

Joined: 16 Feb 06
Posts: 19
Credit: 25,687
RAC: 0
Message 2506 - Posted: 8 Nov 2006, 9:56:35 UTC

I get:

2006-11-08 12:01:17|ralph@home|Unrecoverable error for result 1dtj__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1473_8_0 (Niepoprawna funkcja. (0x1) - exit code 1 (0x1))

<core_client_version>5.6.5</core_client_version>
<![CDATA[
<message>
Niepoprawna funkcja. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2854769
# cpu_run_time_pref: 3600
ABORT: bad to aa_rotno_to_packedrotno
aa,rot1/2/3/4: MET 11 3 0 0 0
chi no 2 nchi 3 aav 2 is_chi_proton_rotamer(aa,aav,i) 0
ERROR:: Exit at: .rotamer_functions.cc line:1509

</stderr_txt>
]]>
ID: 2506
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2507 - Posted: 8 Nov 2006, 13:54:54 UTC - in response to Message 2462.  

Another one 5.37 (WinXP) after 105 seconds produced a readable stack trace of all threads and a list of modules (BOINC Windows Runtime Debugger). It's a bit large, I'll keep a copy of it.

Among 5 successfully crunched 5.40's, crunched silently overnight, there was one failed resultid=320612 after 2:30 min runtime, exit code -529697949 (0xe06d7363), again with extensive debugging output.
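
The two forms of that exit code are the same 32-bit value read as signed and as unsigned, which is easy to confirm. In the sketch below, `to_unsigned32` and `to_signed32` are hypothetical helper names. As far as I know, 0xE06D7363 is widely documented as the exception code the Microsoft Visual C++ runtime raises for a C++ `throw`; its low three bytes spell "msc" in ASCII. Treat that reading as an assumption about this crash rather than a diagnosis.

```python
def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as signed (two's complement)."""
    return value - (1 << 32) if value & 0x80000000 else value

exit_code = -529697949
print(hex(to_unsigned32(exit_code)))   # -> 0xe06d7363
print(to_signed32(0xE06D7363))         # -> -529697949

# The low three bytes of the code, read as ASCII.
print((to_unsigned32(exit_code) & 0xFFFFFF).to_bytes(3, "big"))  # -> b'msc'
```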

Peter
ID: 2507
Papagiorgio

Joined: 2 Nov 06
Posts: 3
Credit: 26,100
RAC: 0
Message 2508 - Posted: 8 Nov 2006, 14:31:15 UTC

Another -161 file_xfer_error:

2006-11-07 12:56:43 [ralph@home] Unrecoverable error for result
DOC_1JHL_p2_fa_relax_from_native_1462_3_1 (<file_xfer_error>

resultid=315657

ID: 2508
Profile sslickerson

Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 2509 - Posted: 8 Nov 2006, 17:01:55 UTC

Validation errors: 322305, 321713,

Computation error: 319459


ID: 2509
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2510 - Posted: 8 Nov 2006, 20:10:06 UTC - in response to Message 2507.  

Sweet -- that debug output will be useful. We love it when the app "loads the PDB symbols". I'm letting Bin know; I bet he can debug this quickly.

ID: 2510
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2511 - Posted: 8 Nov 2006, 20:18:56 UTC - in response to Message 2509.  

Chu and David K. have tracked down the source of the validation errors and are working on the fix. It's just for the "fibril" workunits.

ID: 2511
Starbuck N Syble

Joined: 21 Feb 06
Posts: 2
Credit: 8,343
RAC: 0
Message 2512 - Posted: 8 Nov 2006, 22:42:52 UTC - in response to Message 2470.  

I get this message in a new small window:
"Microsoft Visual C++ Runtime Library
Runtime Error!
Program: ...ralph.bakerlab.orgrosetta_beta_5.38_windows_intelx86.exe
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information."
I got the same error yesterday and today from v5.40. WinXP, US English system, other stuff running, not in screensaver mode. Not sure what other diagnostic info I can provide or where it comes from. HTH, Good luck.
ID: 2512
Profile sslickerson

Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 2513 - Posted: 9 Nov 2006, 0:49:16 UTC

Computation Error: 323402


ID: 2513
genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 2514 - Posted: 9 Nov 2006, 2:28:24 UTC
Last modified: 9 Nov 2006, 3:15:54 UTC

Got this watchdog timeout: resultid=323419 with debugging info.


ID: 2514
Itay Perl

Joined: 18 Aug 06
Posts: 1
Credit: 1,482
RAC: 0
Message 2516 - Posted: 9 Nov 2006, 11:44:56 UTC

http://img146.imageshack.us/my.php?image=rosettasu7.gif

Is this alright?
ID: 2516



©2024 University of Washington
http://www.bakerlab.org