Message boards : RALPH@home bug list : Bug reports for Ralph 5.33 and 5.34
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
1. We are trying out some more features that should get lower energy decoys for the same amount of computation. 2. Some slight fixes for docking graphics. 3. Added options to allow simulations constrained by data from solution x-ray scattering experiments. Let us know if you see anything weird! |
Pieface Send message Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
Here are a couple of wu's that err'd on 5.33: result 298467 and result 298499 Both got: <core_client_version>5.4.11</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2887450 ERROR:: Exit at: .pose_routines.cc line:126 |
zombie67 [MM] Send message Joined: 8 Aug 06 Posts: 75 Credit: 2,396,363 RAC: 6,299 |
|
Sadir Send message Joined: 21 Feb 06 Posts: 6 Credit: 1,419 RAC: 0 |
|
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Quoting Pieface: "Here are a couple of wu's that err'd on 5.33:" Thanks - I think I've fixed that problem in 5.34! We'll see... |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
OK, we're looking at it -- most of those WUs failed. Same problem with WU FRA_t389... |
Nikolay A. Saharov Send message Joined: 17 Feb 06 Posts: 6 Credit: 25,102 RAC: 0 |
|
Tobie Send message Joined: 4 Oct 06 Posts: 3 Credit: 582 RAC: 0 |
My WUs came up with errors. Results: 301114 300324 <core_client_version>5.4.11</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2883800 ERROR:: Exit at: .minimize.cc line:2089 </stderr_txt> |
Bin Qian Send message Joined: 13 Feb 06 Posts: 3 Credit: 4,483 RAC: 0 |
Quoting Tobie: "My WUs came up with errors." Thanks for reporting these errors! I've tracked down the bug and fixed it. The bug only affects a particular set of command line options. The fix will be included in the next update. |
Conrad Poohs Send message Joined: 29 Aug 06 Posts: 9 Credit: 1,955 RAC: 0 |
I have a WU which seems to be 'stuck', i.e. the run time and the time to go are increasing at about 1 sec per sec but % done is remaining at 39.901. I have tried stopping and restarting the BOINC client (as this has worked in similar situations before) to no avail. This WU is hogging my BOINC processing as my PC (DELL 4600 Intel P4 3.00GHz WinXP SP2) is now overcommitted. Any answers please, as I am only willing to give this WU another couple of hours unless it moves on. Regards, Andy G |
Conrad Poohs Send message Joined: 29 Aug 06 Posts: 9 Credit: 1,955 RAC: 0 |
Oops, forgot to say that WU is 263878. Regards, Andy G. |
Conrad Poohs Send message Joined: 29 Aug 06 Posts: 9 Credit: 1,955 RAC: 0 |
Above WU finally finished all of a sudden, having reduced run time back down to 2h 53m from over 6h (it actually suddenly changed to 90-something % complete and reduced time to go to a couple of minutes and run time to 2h 52m then completed). As it actually clocked up considerably more runtime than it says I feel slightly cheated as I assume credits will be granted according to reported run time. Regards, Andy G. |
SafeAggie Send message Joined: 5 Oct 06 Posts: 6 Credit: 4,207 RAC: 0 |
Unrecoverable error for result 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1385_26_0 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)) 10/25/2006 7:15:46 AM|ralph@home|Unrecoverable error for result 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1385_27_0 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003)) resultid=301379 resultid=301375 |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Andy, credits are based on the number of models you crunch, not raw CPU time (see Aug 23rd update). The time remaining INCREASING as the WU runs is normal. The CPU time spent on the WU will always be reduced if you end and restart BOINC, so I suspect that's what happened in your case. It is just a question of how much time is lost. Basically you lose all work done since the last checkpoint was established. Checkpoints are always established when a model reaches completion. They are also established periodically within a model, but for some types of WUs it can be more than an hour between checkpoints. As for hogging your CPU... once BOINC feels overcommitted, it runs earliest deadline first. Since RALPH has short deadlines, it's common for its WUs to have the earliest deadline in your list of tasks. Don't worry about the time hogging; BOINC keeps track of this and will make it up to the other projects. In short, I don't see anything in your description that alarms me as being out of the ordinary. If you have further questions, perhaps start a new thread, either here on Ralph, or over on Rosetta. |
Conrad Poohs Send message Joined: 29 Aug 06 Posts: 9 Credit: 1,955 RAC: 0 |
Feet1st. Thanks very much for the info. I had got my head around most of that, it was just that it was the first time I had seen the time to go increase to quite such an extent (out to over 6 hours for a WU that estimated run time at about 3 hours). It also seems to me that the WU didn't checkpoint for some 3 hours but it was the large increase in run time that threw me. Thanks again. Regards, Andy G. |
[B^S] Gamma^Ray Send message Joined: 20 Oct 06 Posts: 4 Credit: 1,038 RAC: 0 |
The first WU I ran on v5.34 (Workunit 264440, Result ID 304006) errored with: stderr out <core_client_version>5.4.11</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2883885 # cpu_run_time_pref: 3600 ERROR:: Exit at: .minimize.cc line:2089 G^R |
Conan Send message Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
The freezing and lock-up problems are entirely related to the new graphics that were introduced in Ralph 5.32. Within a very short time of the BOINC screensaver for Ralph/Rosetta coming on, one of two things happens. Either the work unit stops/freezes and the processor may or may not keep going in the background with the Task Manager saying that all is well, or the graphics stop doing anything and Task Manager says 'not responding' with the CPU back to idle. In the first case I often had to reboot or the unit would eventually be killed by the watchdog. In the second I could manually kill the work unit in Task Manager. Only one WU between Result Id 295099 and 295112 (13 wu's) actually completed successfully (295104), but it reported late due to the computer closing down with these screen errors whilst I was away. With Ralph 5.34 things have not changed at all, see Result Id 302900. I am having the same issues with Rosetta 5.32 and 5.34 and I have reported over there as well. My solution has been to stop using the BOINC screensaver. This also stops my other projects from showing their graphics, but every work unit since turning off the graphics has been successful, see Result Id 302935 to 302942 (8 results). Rosetta has had no more problems either. My 2 Linux machines have no problems as they do not have graphics, and 3 other Windows XP machines have no problem either due to BOINC being installed as a service, so no graphics. The 2 Windows XP machines with graphics are the only ones in trouble. |
Brian B Send message Joined: 17 Feb 06 Posts: 9 Credit: 2,632 RAC: 0 |
Hi all. I have a wu/result that seems to have a checkpoint issue. I have noticed several times now that prior to shutting the laptop down to take with me, the wu might be at 5 hours CPU time and up around 11 hours (and increasing) to complete with Progress stuck at 1.00%. After arriving at my destination and booting the laptop back up, the wu will be restarted back to 0 hours CPU time with To complete at around 1:29 and increasing and Progress back to 0%. This has happened several times now. Sorry, but I am going to abort this wu, especially since it's back to zero again and it has probably already run around 10 hours total or so. Let me know if there is any more info I can supply. Thanks! |
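
The checkpoint behaviour feet1st describes in the thread (a restart loses all work done since the last checkpoint) can be illustrated with a minimal Python sketch. This is not BOINC or Rosetta code. The function names and the checkpoint times are hypothetical and chosen only to show the arithmetic, on the assumption that a task resumes from the most recent checkpoint written before it was stopped.

```python
def cpu_time_after_restart(checkpoint_times, stop_time):
    """CPU time (seconds) a task resumes from when it is stopped at stop_time.

    checkpoint_times: CPU times (seconds) at which checkpoints were written.
    Assumption: the task resumes from the latest checkpoint at or before
    stop_time, or from zero if no checkpoint was ever written.
    """
    earlier = [t for t in checkpoint_times if t <= stop_time]
    return max(earlier, default=0)


def cpu_time_lost(checkpoint_times, stop_time):
    """CPU time (seconds) thrown away by stopping the task at stop_time."""
    return stop_time - cpu_time_after_restart(checkpoint_times, stop_time)


if __name__ == "__main__":
    hour = 3600
    # A task that never checkpointed and was stopped after 5 hours
    # restarts from zero, so all 5 hours are lost.
    print(cpu_time_lost([], 5 * hour))
    # A task that checkpointed at 1 h and 2 h and was stopped at 2.5 h
    # resumes from the 2 h mark and loses half an hour.
    print(cpu_time_lost([1 * hour, 2 * hour], int(2.5 * hour)))
```

Under these assumptions, a WU that shows 5 hours of CPU time and then restarts at 0 hours behaves like the first case: no checkpoint was written during the run, so the whole run is discarded on restart.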
©2024 University of Washington
http://www.bakerlab.org