Message boards : RALPH@home bug list : minirosetta v1.48-1.51 bug thread
Author | Message |
---|---|
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
This (long anticipated, yes I know) new release has a whole lot of new features and bug fixes, revolving mainly around making the app more stable, making it obey user runtimes more precisely, and giving us more information when things go wrong. We are (as always, but with this one particularly) keen to hear how this app runs out there on your computers. Any feedback is invaluable to us in getting minirosetta as stable as rosetta++ and other projects. There is more new science on its way too, but the priority right now is code stability.
1.48 Release CHANGELOG
- Faster loop closing in FoldCST/Abinitio (affects cc_* cc2_* cs_* WUs); should help with overrunning WUs.
- Bug fix concerning intermittent crashes in _rlbd_ jobs.
- Bug fix for a potential instability in handling text files (affects all types of WUs).
- Bug fix in the checkpointing machinery: states were not being correctly restored, probably contributing to long runtimes (affects cc_* cc2_* cs_* WUs).
- Increased the density of checkpoints to lose less time on restarts and to address the weird "backjumping" of the time reported in this thread.
- Added checkpointing to the Loopclosing part of FoldCST (affects cc_* cc2_* cs_* WUs).
- Added checkpointing to Looprelax.
- The Watchdog has been checked and improved, and now returns information on the aborted jobs to help us figure out how the remaining long-running models come about.
- The watchdog will now abort if the runtime exceeds your preferred runtime + 4 hours. In other words, the WUs should not overrun by more than around 4 hours. If they do, please let us know!!
Thank you all for helping us fix all these problems. Especially all of you who temporarily switched over from Rosetta@HOME, we really appreciate your effort.
What's next? We will be submitting a whole variety of different WUs on RALPH to see if we have improved the stability or have inadvertently created new problems. Then we will either release another app here (1.49) to address outstanding issues or move this version (1.48) directly to BOINC. Fingers crossed.
EDIT: 1.49 CHANGELOG Something screwed up during the 1.48 release of the database. Supplying the database post-facto seemed to help only those who hadn't already grabbed the new app, so this release is merely a copy of the previous one with the proper database, to make sure all the clients are downloading the database correctly. The apps are identical to 1.48 though!
EDIT: 1.50 CHANGELOG Minor update, essentially identical to 1.49. Added another error-reporting mechanism and repaired the symbol store mechanism to help us figure out remaining problems. |
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
As a first preliminary report: This (long anticipated, yes i know ) new release ...... is mistakenly dated to December 12, 2008 (probably a copy of the 1.47 release). Peter |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
whoops yes - thanks for noticing that ;) |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
Update: something went wrong with the database during the update. This probably has nothing to do with the new application itself but with the fact that our update machine went down last week, so this update was done from a new machine that evidently failed to update the database correctly. We're fixing that right now; the project will be down for a few hours before we get this sorted out. Sorry for the delay. Mike |
sslickerson Send message Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0 |
Would this be why the last 26 WU or so have failed on my PC with the following message, or is this something different? <core_client_version>6.5.0</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: in::file::zip minirosetta_database.zip does not exist! ERROR:: Exit from: ....srcappspublicboincminirosetta.cc line: 83 called boinc_finish </stderr_txt> ]]> |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
Yes! Ignore that database error message. For some reason the database did not get uploaded to the server when I did the update on Sunday. Something to do with the move to a new update machine, I suspect. |
Paul D. Buck Send message Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
I don't know if you like success stories, but I have run 4 tasks now I think on OS-X Intel and they all have completed successfully. |
I _ quit Send message Joined: 13 Jan 09 Posts: 44 Credit: 88,562 RAC: 0 |
8 tasks on Win XP Home SP3 and no errors so far. Had a few 1-hour runs before I updated the prefs to 4 hrs. |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
Awesome guys! Keep me posted on what you see out there. The error rate so far is looking fabulous. I'll probably update the app once more today to fix an issue with the symbol store such that we get code traces in cases where it still fails. Mike :) |
Paul D. Buck Send message Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
Awesome guys! Keep me posted on what you see out there. The error rate so far is looking fabulous. I only got one 1.5 task so that will be all I have to report ... so, my latest bug report is that version 1.5 is repelling the creation of new tasks ... |
Ian_D Send message Joined: 16 Feb 06 Posts: 16 Credit: 39,518 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=1250838 <core_client_version>6.4.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> Watchdog active. ERROR: target_strands.size() ERROR:: Exit from: ....srcprotocolsabinitioTemplateJumpSetup.cc line: 94 called boinc_finish </stderr_txt> ]]> |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
More checkpointing is great! But... this is a bit extreme. My write to disk at MOST every... setting is at 1800 seconds. My harddrive will never be able to spin down and go in to power saver mode all night long if the checkpoints continue at this pace.
1/20/2009 3:31:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:32:28 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:32:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:33:14 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:33:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:02 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:34:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:42 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:35:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:35:28 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:36:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:36:17 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:36:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:36:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:37:24 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:37:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:38:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:38:21 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:38:43 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:39:02 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:39:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:39:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=1250838 Awesome!! Our new debug tools are working. This rare error (I've never seen it in thousands of runs) would have gone unnoticed before and led to a segfault. Now it at least gets caught and we can find its cause. Thanks! |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
More checkpointing is great! But... this is a bit extreme. My write to disk at MOST every... setting is at 1800 seconds. My harddrive will never be able to spin down and go in to power saver mode all night long if the checkpoints continue at this pace. Hmm ok, i'll look into this. |
Paul D. Buck Send message Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
Need more work ... I have run with 1.50 and both were success ... of course the Mac Application has been stable for me, even the awful 1.47 which really farbled up my XP machines ... well, I be doing my part ... :) |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
as you wish ... |
Paul D. Buck Send message Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
as you wish ... Well, it is the only defect I have found so far on OS-X ...I can't get work ... :) of course, 1.47 works well on OS-X no hung tasks, no long running tasks ... no illegal functions ... so ... well, I will, be looking to add a windows machine the next drop ... Anyway, got three more tasks ... thanks ... |
HA-SOFT, s.r.o. Send message Joined: 19 Jan 09 Posts: 6 Credit: 19,644 RAC: 0 |
I still have problems on my new W2008 X64 server. Every task of 1.5 minirosetta hangs at startup with 3MB memory and stdout: [2009- 1-21 9:52:36:] :: BOINC :: boinc_init() Created shared memory segment These tasks hang and I have to kill them from taskbar. After killing, stderr is: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x778806CF read attempt to address 0x00000004 Engaging BOINC Windows Runtime Debugger... |
Paul D. Buck Send message Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
First ever error Task on OS-X ... I got this error:
Which seems to be the same error reported below ... |
sslickerson Send message Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0 |
My Windows Vista 64 laptop has received about 8 WU and all have completed successfully without error so this looks good, hopefully I will be able to attach to Rosetta soon! |
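
For anyone who wants to compare checkpoint spacing against their own "write to disk at most every" preference, as in feet1st's report above, here is a minimal Python sketch. It assumes the BOINC message log format shown in that post (`timestamp|project|[checkpoint_debug] result NAME checkpointed`); the function name `checkpoint_intervals` and the constant `PREFERRED_INTERVAL_S` are made up for illustration and are not part of BOINC or minirosetta.

```python
from collections import defaultdict
from datetime import datetime

# First five lines of the [checkpoint_debug] log posted in this thread.
LOG = """\
1/20/2009 3:31:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:32:28 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:32:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:33:14 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:33:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
"""

# The "write to disk at most every" value quoted in the thread, in seconds.
PREFERRED_INTERVAL_S = 1800


def checkpoint_intervals(log_text):
    """Return {task_name: [seconds between consecutive checkpoints]}."""
    last_seen = {}
    intervals = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split("|")
        if len(parts) != 3:
            continue  # not a "time|project|message" line
        words = parts[2].split()
        # Expected message: "[checkpoint_debug] result NAME checkpointed"
        if (len(words) != 4 or words[0] != "[checkpoint_debug]"
                or words[3] != "checkpointed"):
            continue
        when = datetime.strptime(parts[0].strip(), "%m/%d/%Y %I:%M:%S %p")
        task = words[2]
        if task in last_seen:
            intervals[task].append((when - last_seen[task]).total_seconds())
        last_seen[task] = when
    return dict(intervals)


if __name__ == "__main__":
    for task, gaps in checkpoint_intervals(LOG).items():
        mean_gap = sum(gaps) / len(gaps)
        print(f"{task}: mean gap {mean_gap:.0f} s "
              f"(preference: {PREFERRED_INTERVAL_S} s)")
```

Pasting a longer stretch of your own message log into `LOG` gives per-task gaps that can be quoted directly in a bug report.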
©2024 University of Washington
http://www.bakerlab.org