Bug reports for rosetta_beta_5.77 and rosetta_5.69

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 3329 - Posted: 20 Aug 2007, 17:38:28 UTC
Last modified: 21 Aug 2007, 17:08:24 UTC

Please post bug reports for rosetta_beta 5.77 and/or rosetta 5.69 here. The same bug was fixed in both versions (the stable version and the development version).
ID: 3329
m.mitch
Joined: 12 May 06
Posts: 16
Credit: 154,608
RAC: 0
Message 3332 - Posted: 29 Aug 2007, 0:43:30 UTC
Last modified: 29 Aug 2007, 0:43:56 UTC

The work unit 550011 is stuck at about 94% complete, with 10 minutes left to run. The BOINC Manager says it's running but there is no CPU use.

It's on a Linux box. The work unit had been suspended, but I didn't notice whether that was the direct cause. The box was also rebooted.

Should it be aborted?


ID: 3332
mdettweiler
Joined: 4 Apr 07
Posts: 11
Credit: 1,010
RAC: 0
Message 3333 - Posted: 29 Aug 2007, 15:47:44 UTC - in response to Message 3332.  
Last modified: 29 Aug 2007, 15:48:16 UTC

NO. Rosetta bases its progress bar and time-to-completion estimates on your preferred run time. For some of the larger workunits that seem to be very common nowadays, that preferred run time is less (sometimes drastically less) than the time actually required to complete one model, which is the minimum needed to complete a WU. So if the workunit goes over your preferred run time, the estimate sticks at about 10 minutes left, and from then on the time left shrinks and the % done rises only very slowly, because Rosetta really has no idea how long the workunit is going to take. The % done and time left to completion, at least for Rosetta/RALPH workunits, are just rough estimates. With the new, bigger workunits, if you have a lower run time set (which is recommended for RALPH anyway), most of your workunits will probably go over unless you have a very fast, modern CPU.

Long story short, this is normal, so don't abort the workunit; let it run. Some workunits can take up to 4 hours per model (a couple close to 5, even) on my P4 3.2 GHz HT, so in my case they'll take at least that long, no matter what time preferences are set. Rosetta doesn't know ahead of time how long they'll take, so once a workunit goes over your preferred run time, all it can do is make underestimates so people don't freak out about it going over 100%. :-)
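
For anyone who wants to see why an estimate tied to the preferred run time can sit near "94%, 10 minutes left" for a long while, here is a rough Python sketch. It is only a toy model, not Rosetta's or BOINC's actual code: the function name `estimate_progress`, the 10-minute floor, and the formula itself are all assumptions picked to reproduce the symptom.

```python
def estimate_progress(elapsed_s, preferred_s, floor_s=600.0):
    """Hypothetical model of a progress estimate tied to a preferred run time.

    elapsed_s   -- CPU seconds spent on the workunit so far
    preferred_s -- the user's preferred run time, in seconds
    floor_s     -- assumed minimum "time left" shown once the task runs long
    Returns (fraction_done, remaining_s).
    """
    # Before the preferred run time is close, remaining time counts down
    # normally; afterwards it is pinned at the floor (assumption).
    remaining_s = max(preferred_s - elapsed_s, floor_s)
    # Fraction done is derived from elapsed vs. estimated total, so it
    # creeps toward 100% but never reaches it while the task keeps running.
    fraction_done = elapsed_s / (elapsed_s + remaining_s)
    return fraction_done, remaining_s


if __name__ == "__main__":
    preferred = 3600  # 1 hour preferred run time (example value)
    for hours in (0.5, 1, 2, 3, 5):
        frac, left = estimate_progress(hours * 3600, preferred)
        print(f"{hours:>4} h elapsed: {frac:6.1%} done, {left / 60:.0f} min left")
```

Under these assumptions, a task that has been running well past its preferred run time always shows 10 minutes left and a percentage in the mid to high 90s. The numbers only move if the task is actually accumulating CPU time.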
ID: 3333
m.mitch
Joined: 12 May 06
Posts: 16
Credit: 154,608
RAC: 0
Message 3334 - Posted: 30 Aug 2007, 1:28:01 UTC - in response to Message 3333.  

I don't think it's normal for the BOINC Manager to report the work unit as running but the CPU to be inactive.


ID: 3334
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 3335 - Posted: 30 Aug 2007, 4:04:52 UTC

Restarting BOINC is the first thing to try when a WU seems stuck.


ID: 3335
mdettweiler
Joined: 4 Apr 07
Posts: 11
Credit: 1,010
RAC: 0
Message 3336 - Posted: 30 Aug 2007, 4:41:01 UTC - in response to Message 3334.  

Oh! Sorry, I made a blooper: I didn't notice that you said the CPU wasn't active at all. If the CPU were being used and the progress and time to completion were as you described, what I said would be correct, but it doesn't apply when the task is using no CPU time at all. In that case, I would recommend that you abort the WU.

Sorry! :-(

ID: 3336
m.mitch
Joined: 12 May 06
Posts: 16
Credit: 154,608
RAC: 0
Message 3337 - Posted: 30 Aug 2007, 6:34:30 UTC - in response to Message 3336.  

No probs Anonymous, I have duly blown it out of the water. Just as well too: I'd left it unsuspended and have no idea how much crunching time it wasted.

Cheers


ID: 3337
ramostol
Joined: 29 Mar 07
Posts: 24
Credit: 31,121
RAC: 0
Message 3338 - Posted: 30 Aug 2007, 9:17:42 UTC

I commented on a similar problem on a Rosetta message board some time ago (39305). My experience is that if a BOINC project is running but using no CPU (more correctly: using so little CPU time that it is practically unnoticeable), it happens because other programs hog the CPU in such a way that the Rosetta crunching is performed not in the Rosetta process but in the kernel_task process.

To bring the situation back to normal, examine the active processes on your computer. If you see a quite active kernel_task process, that would confirm the theory. Then look through all processes to find a program/process that is using lots of CPU while doing nothing sensible, and quit it. You should then see kernel_task shrink and the Rosetta process use CPU as normal.

What I did not mention in my original message is that this is probably also the cause of the occasionally reported problem of Rosetta processes running for days and days without being able to stop. Since BOINC/Rosetta uses the CPU time of the Rosetta process to decide when to terminate it in accordance with your default settings, it knows nothing of the computing going on inside the kernel_task process and will let the process continue for a looooong time.
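
For the Linux box mentioned earlier in the thread, a rough way to check whether a task is really getting no CPU is to sample its accumulated CPU time twice and compare. The Python sketch below reads `/proc/<pid>/stat`, where fields 14 and 15 are user and system time in clock ticks. The helper names (`cpu_seconds_from_stat`, `cpu_seconds`, `is_using_cpu`) and the thresholds are made up for illustration; this is not part of BOINC or Rosetta, and it does not apply to kernel_task, which is not a Linux process.

```python
import os
import time


def cpu_seconds_from_stat(stat_line, ticks_per_second):
    """Parse one /proc/<pid>/stat line and return user + system CPU seconds."""
    # The command name (field 2) is in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before tokenising the rest.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    utime_ticks = int(rest[11])
    stime_ticks = int(rest[12])
    return (utime_ticks + stime_ticks) / ticks_per_second


def cpu_seconds(pid):
    """Return the CPU seconds accumulated so far by process `pid` (Linux only)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        line = stat_file.read()
    return cpu_seconds_from_stat(line, os.sysconf("SC_CLK_TCK"))


def is_using_cpu(pid, interval_s=10.0, min_cpu_s=0.1):
    """Sample CPU time twice; True if the process gained at least min_cpu_s."""
    before = cpu_seconds(pid)
    time.sleep(interval_s)
    return cpu_seconds(pid) - before >= min_cpu_s


# Example use (the PID is whatever `ps` or `top` shows for the science app):
#   print(is_using_cpu(12345))
```

If `is_using_cpu` keeps returning False while the BOINC Manager shows the task as running, the task is probably stalled rather than just slow to update its progress.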
ID: 3338