Message boards : RALPH@home bug list : Discussion of the "1% Hang" issue
Author | Message |
---|---|
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
Sorry for intervening, but I'm trying to understand how to tell the difference between the various bugs. Carlos, does your Rosetta executable keep running, consuming 100% of CPU time? You can check via Windows Task Manager (Ctrl-Alt-Del) or a tool like Process Explorer (free, standalone exe, no install required; I've been using it for years). I ask because I've never encountered a Rosetta WU that was "stuck" consuming 100% CPU time ad infinitum. The ones I've seen "stuck" were all stopped: loaded in memory, and BOINC thought they were running, but "top" or "ps" revealed that Rosetta wasn't running (it was "SN" = stopped, nice). Killing just the Rosetta task (not ./boinc or anything else, which has been running happily and continuously for 1+ month now) makes BOINC restart the WU with a different random seed, and it then finishes OK (on the handful of occasions I've encountered so far). |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
Carlos, you have a winner there. Please don't abort it; keep it in memory, because you may have the WU the testers need to fix this. I'd wait until instructed what to do next. Remember, it's Sunday. Leaving Ralph or that WU suspended is important to Ralph and is the whole reason Ralph even exists. I wish I had what you have, I really do. tony |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Quote: "Sorry for intervening, but I'm trying to understand how to tell the difference between the various bugs." I use this one: http://www.iarsn.com/taskinfo.html and YES, Rosetta is "stuck", consuming 100% CPU time ad infinitum. *Not exactly 100% but 99.98%; the remaining 0.02% is used by the network. |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
Carlos, go to the "Work" tab, highlight the stuck WU, then select "Suspend". That should stop it but keep it in memory until they get a chance to respond, and you can continue crunching other work. tony |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
Quote: "I have one that is stuck: WU, Computer. It has been going for 2 days, 20 hours, 58 minutes and 4 seconds of CPU time. This machine is currently estimating 8 hours for completion of other results." Hi John, two others had this issue, and Mod9 sent for help. David Kim responded with this. He hasn't advised further. You could read the whole thread and get a better feel for his intentions. tony [Edit] Mod9 wants to keep this thread just about reporting bugs. He started this thread for discussions about this bug. I have much material there. |
Stargazer257 Joined: 16 Feb 06 Posts: 6 Credit: 17,492 RAC: 0 |
I have two ver 4.90 WUs that "appear" to hang at 1%, but they actually just appear to have slowed down to a crawl. Both of them are acting similarly in that they race up to Step 34,000 (Model 1) in about 30 minutes and then slowly creep forward, accomplishing only 50-100 additional steps in 30 additional minutes of processing time. I had rebooted both hosts when they "appeared" to be stuck (at 4+ hours of processing time), and they both reset to 0:00 (since they must not have "checkpointed"). I will keep them running as long as they progress forward, and will report my results regardless. They are still in Model 1 at this time. WU10642 WU11437 BTW, is there a fixed number of steps in Model 1, i.e., a goal if you will, to know how close a WU is to completing Model 1 and checkpointing? |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quote: "I have two ver 4.90 WUs that "appear" to hang at 1%, but they actually just appear to have slowed down to a crawl. Both of them are acting similarly in that they race up to Step 34,000 (Model 1) in about 30 minutes and then slowly creep forward, accomplishing only 50-100 additional steps in 30 additional minutes of processing time." There is a specific number of steps for a model, but it is different for each combination of WU type and run-parameter setup. So from the user side you can only determine how many steps are in a model when the WU finishes the first model. |
Stargazer257 Joined: 16 Feb 06 Posts: 6 Credit: 17,492 RAC: 0 |
|
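
The three cases discussed in this thread (a task that is stopped and uses no CPU, one that burns CPU without advancing, and one that is only crawling) can be told apart from a few manual readings of the step counter. The following is only a rough sketch under stated assumptions, not anything taken from Rosetta or BOINC: the `Sample` type, the `classify` function, both thresholds, and the 34,075 reading (picked from within the reported 50-100 step range) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One manual reading of a work unit (hypothetical structure)."""
    minutes: float  # CPU minutes elapsed when the reading was taken
    step: int       # step counter shown for the current model


def _rate(a: Sample, b: Sample) -> float:
    """Steps per minute between two readings."""
    elapsed = b.minutes - a.minutes
    if elapsed <= 0:
        raise ValueError("readings must be in increasing time order")
    return (b.step - a.step) / elapsed


def classify(samples: List[Sample], cpu_percent: float,
             idle_cpu: float = 1.0, crawl_ratio: float = 0.01) -> str:
    """Label a task 'stopped', 'hung', 'crawling', or 'progressing'.

    idle_cpu and crawl_ratio are arbitrary illustrative thresholds.
    """
    if len(samples) < 2:
        raise ValueError("need at least two readings")
    # Loaded in memory but using no CPU: the "stopped" case.
    if cpu_percent < idle_cpu:
        return "stopped"
    early = _rate(samples[0], samples[1])     # pace at the start
    recent = _rate(samples[-2], samples[-1])  # pace between the last two readings
    # Burning CPU with no forward movement: the "hung" case.
    if recent <= 0:
        return "hung"
    # Still advancing, but far slower than at the start.
    if recent < crawl_ratio * early:
        return "crawling"
    return "progressing"


# Readings modelled on the report above: 34,000 steps in the first
# 30 minutes, then an assumed 75 more in the next 30 minutes.
readings = [Sample(0, 0), Sample(30, 34000), Sample(60, 34075)]
print(classify(readings, cpu_percent=99.98))  # crawling
```

A check like this only separates "slow" from "stuck"; it cannot say how far a model is from finishing, since the total step count is not known until the first model completes.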
©2024 University of Washington
http://www.bakerlab.org