Message boards : RALPH@home bug list : Bug reports for Ralph 5.52-5.54
Author | Message |
---|---|
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Bug? After 16H 28min it was switched out for Rosetta. It had finished when I woke up this morning, and it reported 3H 5min. Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
anders n, do the messages indicate that the task was preempted and restarted at all during the night? Almost sounds like it was removed from memory and reverted back to a prior checkpoint. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoting feet1st: "anders n, do the messages indicate that the task was preempted and restarted at all during the night? Almost sounds like it was removed from memory and reverted back to a prior checkpoint.") It was preempted but not removed from memory. Anders n |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Sorry, more work coming now!
|
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
On second thought, I'm a little worried about our recent Linux build based on your comment. I think we changed the way libraries are used in the build -- let me see if I can fix this.
|
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
OK, 5.54 should have the old-style build with static libraries -- let's see how it goes. I'm seeing at least one host that had consistent success up through 5.51, then has been giving shared library errors in 5.52 and 5.53. So I'll see if it returns good results with 5.54. |
Thomas Leibold Send message Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
(Quoting Rhiju: "Sorry, more work coming now!") Thanks! I got one and it appears to be running fine using Ralph 5.53 with the newly installed libstdc++.so.6 which means that this was the only library that was missing. I saw the news that Ralph 5.54 fixes the library issue, but haven't gotten that version of the client yet. |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi Thomas, good to hear your client is working again. I also haven't seen any more libstdc++.so.6 errors after the release of 5.54. So hopefully that's fixed for other Linux users who aren't as library-savvy as you. Thanks for posting! |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Just for giggles, I ended and started BOINC while a 5.54 RNA task was in progress... lost all of my progress on model #2, over 30min of work. The first model took just over 40min to complete, so it should have been some 3/4 through with model 2. Do the RNA WUs have any checkpointing? Or is progress only saved upon model completion? |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
No problems so far... :) returned 12 valid 5.54, 1 valid 5.53, 50 valid 5.52 WU's all on Windows boxes. No errors yet. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Bug? A new WU is "stuck": https://ralph.bakerlab.org/result.php?resultid=467564 Now at 16H 33 min at 69,7%. I have stopped other projects so it will not be preempted this time. Let's hope for the best. Anders n |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
It is now at 30H 19 min. The watchdog should have done its work by now. What do you want me to do? Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Anders n, what is your runtime preference? The watchdog should step in within about 15min of crossing 4x the preferred runtime. Has it made no progress on model numbers in that time? |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoting feet1st: "Anders n, what is your runtime preference? The watchdog should step in within about 15min of crossing 4x the preferred runtime.") It is now at 40H 50 min. Still at 69,7%. My preferred runtime is 4H. I cannot open graphics on this WU. Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I'd suggest you suspend it, then resume it again; there are a few possible outcomes. Knowing the outcome may be of use to the Project Team. 1) It may end with errors. 2) It may restart the same model and end up in the same state of not being able to complete the model. (I'd only let it run 4 hrs this time; I've not seen any tasks that should take that long to complete a model on a 2 GHz CPU, and you would see a model complete by the % completed changing.) 3) It may then complete the model normally in <2 hrs. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoting feet1st: "I'd suggest you suspend it, then resume it again and there are a few possible outcomes. Knowing the outcome may be of use to the Project Team.") Test 1: Suspend and resume - the task continues where it was. Test 2: Restarting BOINC - it resumes at 2H 47min and I can now view graphics. Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Oops, yep, you must be keeping the application in memory (a good thing). I wasn't thinking. Yes, ending BOINC and restarting... basically we're forcing the WU to pick up from the last checkpoint. Curious, what % done did it say upon restart? |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoting feet1st: "Oops, yep, you must be keeping the application in memory (a good thing). I wasn't thinking. Yes, ending BOINC and restarting... basically we're forcing the WU to pick up from the last checkpoint.") 69,7%. It started decoy 10. Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I've got several that seem to have ended normally (lasted through to my 24hr time preference), but the reported WU shows a "No heartbeat from core client for 31 sec - exiting" message: https://ralph.bakerlab.org/result.php?resultid=459533 (v5.52), https://ralph.bakerlab.org/result.php?resultid=466179 (v5.52), https://ralph.bakerlab.org/result.php?resultid=467168 (v5.54) |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
I may be on to what is happening with my Mac. A Ralph task was preempted, Rosetta continued, and all worked as usual. The Rosetta task finished and Ralph was to continue. For several seconds the Ralph task showed as running but the CPU time was not ticking; then the time started to go up, but I could not view graphics on that task again. I restarted BOINC and everything was back to normal. Anders n |
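
For reference, the watchdog rule feet1st describes in this thread (it should step in within about 15min of crossing 4x the preferred runtime) can be worked through for anders n's stuck task. This is only a sketch of the arithmetic as stated in the posts, not the actual Ralph/Rosetta watchdog code; the function and constant names are made up for illustration.

```python
from datetime import timedelta

# Both values come from feet1st's post; the names are hypothetical.
WATCHDOG_MULTIPLIER = 4                 # "4x the preferred runtime"
WATCHDOG_GRACE = timedelta(minutes=15)  # "within about 15min"


def watchdog_deadline(preferred_runtime: timedelta) -> timedelta:
    """Latest CPU time by which the watchdog is expected to have acted."""
    return preferred_runtime * WATCHDOG_MULTIPLIER + WATCHDOG_GRACE


def is_overdue(cpu_time: timedelta, preferred_runtime: timedelta) -> bool:
    """True if the task has run past the expected watchdog cutoff."""
    return cpu_time > watchdog_deadline(preferred_runtime)


if __name__ == "__main__":
    preferred = timedelta(hours=4)  # anders n's stated preference
    reported = [
        timedelta(hours=16, minutes=33),
        timedelta(hours=30, minutes=19),
        timedelta(hours=40, minutes=50),
    ]
    print("expected cutoff:", watchdog_deadline(preferred))
    for cpu_time in reported:
        print(cpu_time, "overdue:", is_overdue(cpu_time, preferred))
```

With a 4H preference the cutoff works out to 16H 15min, so all three runtimes anders n reported for the stuck WU are past the point where the watchdog was expected to act.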