Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher
Author | Message |
---|---|
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Hello, I have a problem with the 5.06 WUs: they are so big that it takes more than 1 hour on a fast computer to complete 1 decoy. Anders n |
Dotsch Send message Joined: 4 Mar 06 Posts: 12 Credit: 13,725 RAC: 0 |
I have some problems with 5.06 on Windows 98 : <core_client_version>5.2.13</core_client_version> <message> - exit code -164 (0xffffff5c) </message> <stderr_txt> LoadLibraryA( dbghelp95.dll ): GetLastError = 1157 LoadLibraryA( dbghelp.dll ): GetLastError = 1157 </stderr_txt> Result ID : https://ralph.bakerlab.org/result.php?resultid=100666 |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi Feet1st, these are great suggestions, as usual! We've come to expect them. I'm about to post 5.08, and I'll ask that ralph users use similar preferences to their r@h preferences, as you suggest. I think the checkpointing and watchdog issues have largely been resolved, thankfully, and we've moved on to testing real science. As for keeping work on ralph, we haven't quite got that figured out. We'd like to have jobs go out instantly to clients when we post the new app or test a new scientific mode on ralph, so that we get feedback ASAP. The problem is that if we've flooded the clients with jobs with the previous app or previous jobs, there's typically a wait for those clients to free up again. In the future, if we can get trickle-messages implemented, we could send out a purge request. Still, I hear you ... I'll keep sending out work and ask others to do the same. Feet1st, you noticed how bad the problem was with 5.05; has your client tried any 5.06? |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi Mike: this is a silly thing that we haven't quite been able to fix, but should happen rarely on rosetta@home. That ralph workunit was a test that our watchdog timer properly aborts really long running jobs. So we're very glad to see it worked on your computer! If you ever run into similar super-long workunits on Rosetta@home (hopefully not!), you'll eventually get credit granted to it, because that's our policy. Thanks for posting! 4/28/2006 12:53:48 AM||Rescheduling CPU: files downloaded |
tralala Send message Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
"As for keeping work on ralph, we haven't quite got that figured out. We'd like to have jobs go out instantly to clients when we post the new app or test a new scientific mode on ralph, so that we get feedback ASAP. The problem is that if we've flooded the clients with jobs with the previous app or previous jobs, there's typically a wait for those clients to free up again." That's easy to solve: limit the daily quota to five or fewer. That means clients grab new jobs instantly but can't pile up big caches. At the moment it works as follows: the first 20 clients pile up 20 WUs each and no more work is available. These hosts are busy with them for several days, so you get your work returned late. With 5 WU/day the first 80 clients grab 5 WUs each and are busy with them for only a day or less. I'd even say 3 WU/day is a good quota. Short deadlines have a similar effect, but it seems you reset them to match those of Rosetta. |
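The quota arithmetic in the post above can be checked with a short sketch. It assumes a batch of 400 WUs (implied by "20 clients pile up 20 WUs each") and treats the cap as a simple per-host limit; `hosts_served` is a hypothetical helper for illustration, not part of BOINC.

```python
def hosts_served(batch_size: int, per_host_cap: int) -> int:
    """Number of hosts that receive work if each grabs up to per_host_cap WUs."""
    if per_host_cap <= 0:
        raise ValueError("per_host_cap must be positive")
    # Ceiling division: the last host may receive a partial share.
    return -(-batch_size // per_host_cap)


# Assumed batch size: 20 clients x 20 WUs each, as described in the post.
batch = 20 * 20

for cap in (20, 5, 3):
    print(f"cap {cap:2d} WU/host -> {hosts_served(batch, cap)} hosts get work")
```

With a cap of 20 only 20 hosts get work, while a cap of 5 spreads the same batch over 80 hosts, which matches the numbers in the post.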
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Yes, a quota of 3-5 would keep most of the hosts with work, and if you need fast answers to a test batch, set the return date to 1-3 days and they will be crunched first. Anders n |
JKeck {pirate} Send message Joined: 16 Feb 06 Posts: 14 Credit: 153,095 RAC: 0 |
I would think for the daily quota 2 would be the minimum and the max 4 or 8. You would want to have a chance at getting multiple tasks running on multi-CPU hosts. BOINC WIKI BOINCing since 2002/12/8 |
tralala Send message Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
"I would think for the daily quota 2 would be the minimum and the max 4 or 8. You would want to have a chance at getting multiple tasks running on multi-CPU hosts." The daily quota is per CPU. So if you have a dual-core or a Hyperthreading-enabled P4 you get 6 WU/day if the daily quota is 3 WU/day. |
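The per-CPU rule described above is a single multiplication. The sketch below only restates the post's example (3 WU/day on a 2-CPU host); `host_daily_limit` is a hypothetical name, not the BOINC scheduler's actual code.

```python
def host_daily_limit(quota_per_cpu: int, cpu_count: int) -> int:
    """Daily WU limit for a host when the project quota applies per CPU."""
    if quota_per_cpu < 0 or cpu_count < 1:
        raise ValueError("quota must be >= 0 and cpu_count >= 1")
    return quota_per_cpu * cpu_count


# Example from the post: quota of 3 WU/day on a dual-core or Hyperthreading host.
print(host_daily_limit(quota_per_cpu=3, cpu_count=2))
```

So even a low per-CPU quota still lets a multi-CPU host run one task per CPU.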
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
ROM, I currently have a rosetta_beta_5.06 that has been running 14 hours+ with 1.04% for progress. I have debug capability on this computer; any suggestions, or just abort? It's labeled: WATCHDOG_KILL_VERY_LONG_JOBS_414_3. I notice that 2 others ran this unit and it died at 1.5 hours and 1.8 hours. Running on Win2000 SP4, leave in memory is set. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
"It's labeled: WATCHDOG_KILL_VERY_LONG_JOBS_414_3" I've seen other posts that this WU was specially designed to TEST the watchdog. It is INTENDED to have the watchdog step in and end it for you. So if you abort, you essentially leave the watchdog less proven. He'll get it! But that SHOULD be the reason why the others "failed". |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
ROM, https://ralph.bakerlab.org/workunit.php?wuid=83793 Now at 24 hours and still stuck at 1.04%. |
William Senn Send message Joined: 16 Feb 06 Posts: 4 Credit: 30,895 RAC: 0 |
Hi, I got two erroneous results but had not reported them here yet; sorry for being so late. resultid=98902 resultid=99919 App version 5.06 (both). The other 2 earlier workunits completed successfully. Greetings, William Senn |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
ROM, 36 hours and still stuck at 1.04%... the watchdog is NOT working... is anyone out there? |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
"36 hours and still stuck at 1.04%... the watchdog is NOT working... is anyone out there?" Hi Mike, have you checked the graphics to see if the steps or % have changed? The % there should show more digits (1.04??), not just 1.04 as in BOINC Manager. Anders n |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
"36 hours and still stuck at 1.04%... the watchdog is NOT working... is anyone out there?" This computer is headless. Remote access only. Hence no screensaver. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Looks like your normal WUs are the 4-hour default... so we're now well past the 4x preference guideline I've seen posted elsewhere... so it is time to abort. Since we're here on Ralph, the diagnostic info should prove useful for study. Hopefully it's something they fixed in the versions after 5.06. Ironic... given your photo that your computer is "headless" :):) |
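The rule of thumb mentioned above (abort once a WU has run more than 4x the runtime preference) can be written as a small check. This is only a sketch of the guideline as recalled in the post, not the actual watchdog logic in the Rosetta app; `should_abort` and its defaults (4-hour preference, factor of 4) are assumptions taken from the post.

```python
def should_abort(elapsed_hours: float,
                 preferred_runtime_hours: float = 4.0,
                 factor: float = 4.0) -> bool:
    """True once a WU has run longer than factor x the runtime preference."""
    return elapsed_hours > preferred_runtime_hours * factor


# Elapsed times reported in this thread for the stuck WU.
for hours in (14, 24, 36):
    print(hours, should_abort(hours))
```

Under these assumptions the threshold is 16 hours, so the 14-hour report was still inside the guideline while the 24-hour and 36-hour reports were past it.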
Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
"This computer is headless. Remote access only. Hence no screensaver." Mike, I use VNC to see the graphics on my remote monitorless, keyboardless, and mouseless puter. I click on the WU from the task tab and then view graphics. No screensaver here either. If it's a service install you're hosed. tony |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
"This computer is headless. Remote access only. Hence no screensaver." It is a service install. I forgot about the "View Graphics" button; I do VNC into this computer. OK... 1.041% complete after 40 hours. Stage: Full atom relax, Mode 1, Step 100, Accepted RMSD 50.36, Accepted Energy -19.40622, whatever this all means. |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi Mike: thanks very much for posting. This sounds weird. The job should have been killed by the watchdog. In fact we sent out these workunits to test that infinite loops are aborted by the watchdog, and they've been "successful" in that they've mostly returned without keeping computers in infinite loops. For now, please either abort or follow mod9's suggestion of suspending and restarting a few times. If this occurs again, please post! |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
"This computer is headless. Remote access only. Hence no screensaver." Starting and stopping did indeed reset the time to 0 (I had to reboot for other reasons). I am going to allow it to build back up... at over 24 hours I will report back. It's the Max Time Setting (24 hrs) that appears to not be working. |
©2024 University of Washington
http://www.bakerlab.org