Message boards : RALPH@home bug list : Bug reports for 5.56-5.59
Author | Message |
---|---|
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
My observations of % complete resetting to zero upon restart are from Windows as well. You have to remove the task from memory. I did so by ending BOINC completely rather than changing my settings. To reproduce: crunch 2 models, then end BOINC and restart. |
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Updates in 5.59: I think this is the last update. Everything ran pretty smoothly in 5.58. This just has some small updates in the science, to get back some useful scores for each decoy, and a small set of fixes for the symmetric FOLD_AND_DOCK workunits. |
ashriel Send message Joined: 3 Mar 07 Posts: 11 Credit: 648 RAC: 0 |
5.59, default: 1 hour, WU s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1906_17, Win2000. Time: 06 minutes - Percentage: 10 - Time left: 4h 16m; Time: 30 minutes - Percentage: 50 - Time left: 1h 33m; Time: 50 minutes - Percentage: 83 - Time left: 0h 17m; Time: 60 minutes - Percentage: 86 - Time left: 0h 15m (Model 1, Step 67622); Time: 75 minutes - Percentage: 88 - Time left: 0h 15m (Model 1, Step 67717); Time: 80 minutes - Percentage: 100 - Time left: - (Model ?, Step ?). a) The remaining time is strange; it was mostly OK in 5.57/5.58. b) The steps are very slow (sorry, I only started watching them after 60 minutes). c) Model 1 takes very long. |
ashriel Send message Joined: 3 Mar 07 Posts: 11 Credit: 648 RAC: 0 |
5.59, default: 1 hour, WU 1fkaA_BOINC_INCREASECYCLES10_RNA_ABINITIO-1fkaA-chunk005__1901_4, Win2000. Time: 15 minutes - Percentage: 25 - Time left: 2h 55m (Model 1, Step 271.000); Time: 40 minutes - Percentage: 67 - Time left: 0h 45m (Model 2, Step 235.000); Time: 50 minutes - Percentage: 83 - Time left: 0h 17m (Model 2, Step 409.000); Time: 55 minutes - Percentage: 100 - Time left: - (Model ?, Step ?). Less than 1 hour and more than 1 model, but the remaining time is strange. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Maion, I believe your time remaining is working just the way Rhiju intended it to. Once the remaining time estimate gets below 10 minutes, time starts moving slower. This is to avoid exceeding 100%. So, basically, once you get below a 10 minute estimated time remaining, the estimate is not on track anymore. The client is unsure exactly when it will finish, but in each case, the 15 and 17 minute estimates were not far from right. ...But Rhiju assures us they won't be sending WUs which take more than an hour per model on Rosetta. And so on Rosetta, with shorter WUs, the estimates should appear better. The 1hr time preference is always going to be the toughest to provide a good estimate on, as it is the time preference that will see the most variation (in percentage terms) between the actual time and the preference. |
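To make the slow-down described above concrete, here is a minimal sketch of one curve that behaves that way. It is not the Rosetta code: the function name `reported_fraction`, the constant `SLOW_ZONE_S`, and the exponential shape past the 10-minute mark are all assumptions chosen for illustration. The only parts taken from the thread are the 10-minute threshold and the goal of not exceeding 100%.

```python
import math

SLOW_ZONE_S = 600.0  # assumed: the 10-minute threshold mentioned in the post


def reported_fraction(cpu_s: float, target_s: float) -> float:
    """Hypothetical progress curve: linear at first, then slowing near the end."""
    # Point at which the linear estimate would show 10 minutes remaining.
    knee_s = max(target_s - SLOW_ZONE_S, 0.0)
    if cpu_s <= knee_s:
        # Before the slow zone: plain CPU time over runtime preference.
        return cpu_s / target_s
    # Inside the slow zone: the remaining fraction decays exponentially,
    # so the reported value keeps rising but approaches 1.0 ever more slowly.
    knee_fraction = knee_s / target_s
    overshoot_s = cpu_s - knee_s
    remaining = (1.0 - knee_fraction) * math.exp(-overshoot_s / SLOW_ZONE_S)
    return 1.0 - remaining


# Print the hypothetical percentage at a few elapsed times for a 1-hour preference.
for minutes in (30, 50, 60, 75):
    print(minutes, round(100 * reported_fraction(minutes * 60, 3600), 1))
```

With a 1-hour preference the linear part alone gives 50/60 = 83% at 50 minutes, which is the figure both logs above show at that point. What the real application does after that point may well differ from this curve.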
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Thanks, Feet1st, that's a great explanation. We indeed try to keep the avg time per model at less than one hour; actually our ralph runs help us calibrate this! |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Still seems to end WUs prematurely. If you restart an RNA task that's already completed 30 models... then it will end, regardless of preferred runtime. This is that additional "Completed 30 RNA decoys." message I've been mentioning. Here's a v5.58 example. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Can anyone explain the new text on Mac results? It looks like this: "Rosetta@home Macintosh Stack Size checker. Original size: 8388608. Maximum size: 0. RLIM_INFINITY 67108864" Anders n |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I did confirm this morning that even when the % completed resets to zero, when you restart the task it does seem to know to move time ahead quicker. I had some 10 or 12 hrs into a task on my 24hr preference and every 5 second tick it was subtracting 15 seconds from the estimated time remaining. So, even though the estimate went to 24+10hrs, if you study it for a minute you can see that it knows better than that. This must be due to the BOINC correction factor applied to the % completed and the current CPU time into the task. |
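A quick back-of-the-envelope check of those numbers, as a sketch only. It assumes the observed rate (15 seconds subtracted per 5 second tick) stays constant for the rest of the task, which the post does not claim; the function name `hours_until_zero` is made up for this example.

```python
def hours_until_zero(displayed_remaining_h: float,
                     drop_per_tick_s: float = 15.0,
                     tick_s: float = 5.0) -> float:
    """Wall-clock hours until the displayed estimate hits zero at a constant rate."""
    # Seconds of estimate removed per second of wall-clock time (3.0 here).
    rate = drop_per_tick_s / tick_s
    return displayed_remaining_h / rate


# Displayed estimate after the restart: 24 + 10 hours.
print(round(hours_until_zero(24 + 10), 1))
```

Under that assumption the 34 hour display would run out after roughly 34 / 3 hours, which is in the same range as the 12 to 14 hours actually left on a 24 hour preference with 10 to 12 hours already done.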
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
"Can anyone explain the new text on MAC results?" Rhiju explained: "...I'm also reporting the stack sizes in stderr.txt which is returned from your clients to our server, so I can get some info." I think he was trying to determine whether stack size had any correlation with Mac failures. |
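For anyone curious what a stack size report looks like in practice, here is a small sketch that reads the process stack limits on a Unix-like system (macOS or Linux) using Python's standard `resource` module. This is not the Rosetta checker, whose source is not shown in this thread; the function name `describe_stack_limit` and the output wording are assumptions.

```python
import resource  # standard library, Unix-like systems only


def describe_stack_limit() -> str:
    """Report the soft and hard stack size limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value: int) -> str:
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"

    return f"Stack size: soft={fmt(soft)}, hard={fmt(hard)}"


print(describe_stack_limit())
```

For scale, the 8388608 in the reported text is 8 MiB and the 67108864 is 64 MiB.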
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
@feet1st thanks :) |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Rhiju, on this issue of the % completed resetting when a task is restarted after being kicked out of memory... I'm puzzled. Before the progress % changes, a restart would not have impacted the calculations. Why does it now? I mean, it seems Rosetta used to know the correct total CPU time spent so far when it recomputed progress at the end of each model. So... where did it get that number? ...and isn't THAT the number to use now, rather than the one that resets upon restart? |
Conan Send message Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
Had 2 WU's fail with stderr out <core_client_version>5.8.15</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2693807 ERROR:: Exit at: .loop_relax.cc line:1688 </stderr_txt> ]]> https://ralph.bakerlab.org/result.php?resultid=486602 https://ralph.bakerlab.org/result.php?resultid=486603 Both only ran for 16 minutes, BAK workunit type. |
UBT - Terry Send message Joined: 13 Nov 06 Posts: 2 Credit: 68,467 RAC: 0 |
I've also had a couple like this one: 06/04/2007 19:19:53|ralph@home|Computation for task te00_1_NMRREF_1_te00_1_idid_model_06_core_0001IGNORE_THE_REST_idl_1917_44_0 finished. It jumped from 53% or thereabouts up to 100%, finishing in only 38 minutes. Not sure if this is a bug or if it's meant to do that. I'm running at 1.86 GHz using BOINC 5.8.15 on Windows XP. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Terry, looks like you have a 1 hour runtime preference?? You are completing the first model in something over 30 minutes, and so your % complete shows the fraction, say 35min/60min preference = 58% complete... and then it hits the end of the model and determines that you don't have time to start a second one, so it completes the task. In short, the estimate doesn't predict whether you will cut out early, and until you complete model 1, it really doesn't have any way to know if you are likely to or not. |
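The arithmetic in that explanation can be sketched as follows. The percentage formula (CPU time over runtime preference) is the one given in the post; the stopping rule, which assumes the next model will take about as long as the average so far, is only a guess at how "don't have time to start a second one" might be decided. Both function names are hypothetical.

```python
def percent_complete(cpu_s: float, pref_s: float) -> float:
    """Progress shown while a model is running: CPU time over runtime preference."""
    return min(100.0, 100.0 * cpu_s / pref_s)


def has_time_for_another_model(cpu_s: float, models_done: int, pref_s: float) -> bool:
    """Assumed rule: start another model only if an average-length one still fits."""
    avg_model_s = cpu_s / models_done
    return cpu_s + avg_model_s <= pref_s


pref_s = 60 * 60   # 1 hour runtime preference
cpu_s = 35 * 60    # first model finishes after 35 minutes

print(round(percent_complete(cpu_s, pref_s)))        # 35/60 of the preference
print(has_time_for_another_model(cpu_s, 1, pref_s))  # 35 + 35 minutes exceeds 60
```

With these inputs the display sits near 58% right up to the end of model 1, then the task finishes because a second 35 minute model would overrun the hour. That matches the jump from the 50s to 100% described above.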
Thomas Leibold Send message Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
Had 2 WU's fail ... Got one of those too: <core_client_version>5.8.15</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 # random seed: 2693814 ERROR:: Exit at: loop_relax.cc line:1688 </stderr_txt> ]]> Workunit 430740 on Linux Server. |
Thomas Leibold Send message Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
Workunit 431258 had problems with downloading two of its parts: Fri 06 Apr 2007 04:33:04 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk_.fasta.gz Fri 06 Apr 2007 04:33:04 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk__1ffk.fragments.gz Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|Incomplete read of 66.000000 < 5KB for 1mhk_.fasta.gz - truncating Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk_.fasta.gz Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Throughput 623 bytes/sec Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk_RNA.pdb.gz Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk_.fasta.gz Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk_RNA.pdb.gz Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Throughput 3265 bytes/sec Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk__pairing.pdat.gz Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] MD5 check failed for 1mhk_RNA.pdb.gz Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] expected 43fa6b24e2ed0b12d7d949aaa6952085, got 398a6a6e30c8d9493c75a549173bcd93 Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk_RNA.pdb.gz Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk__1ffk.fragments.gz Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Throughput 159807 bytes/sec Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk__pairing.pdat.gz Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Throughput 162 bytes/sec Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk__1ffk.fragments.gz Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] MD5 check failed for 1mhk__pairing.pdat.gz Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] expected 
6a8599df2728416df250dcde0449ece6, got 4b92756f68af0bf0c557fdb008fb878c Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk__pairing.pdat.gz <core_client_version>5.8.15</core_client_version> <![CDATA[ <message> WU download error: couldn't get input files: <file_xfer_error> <file_name>1mhk_.fasta.gz</file_name> <error_code>-200</error_code> </file_xfer_error> </message> ]]> |
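The "MD5 check failed ... expected ..., got ..." lines in that log come from comparing the checksum of the downloaded bytes against the checksum the server advertised. A minimal sketch of that kind of check, using Python's standard `hashlib`, is below. It is not the BOINC client's code; `md5_hex` and `verify_download` are names invented for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def verify_download(data: bytes, expected_md5: str) -> bool:
    """True if the downloaded bytes match the expected checksum."""
    return md5_hex(data) == expected_md5.strip().lower()


# A truncated or corrupted download produces a different digest,
# which is what the "expected ..., got ..." log lines report.
payload = b"example file contents"
good = md5_hex(payload)
print(verify_download(payload, good))        # intact copy
print(verify_download(payload[:-1], good))   # truncated copy
```

A short read like the "Incomplete read of 66.000000 < 5KB" line earlier in the log would fail this kind of comparison, since the bytes on disk are not the bytes the checksum was computed over.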
Inais Send message Joined: 30 Jul 06 Posts: 12 Credit: 13,115 RAC: 0 |
Same problem on 4 WU's (result ID, workunit ID, sent, reported, status): 491751, 431342, 10 Apr 2007 6:23:00 UTC, 10 Apr 2007 6:35:14 UTC, Over / Client error / Downloading, 0.00 0.00 ---; 491750, 431340, 10 Apr 2007 6:23:00 UTC, 10 Apr 2007 6:35:14 UTC, Over / Client error / Downloading, 0.00 0.00 ---; 491745, 431276, 10 Apr 2007 6:18:50 UTC, 10 Apr 2007 6:23:00 UTC, Over / Client error / Downloading, 0.00 0.00 ---; 491743, 431275, 10 Apr 2007 6:18:50 UTC, 10 Apr 2007 6:23:00 UTC, Over / Client error / Downloading, 0.00 0.00 ---. I wish I can fly like a bird in the sky |
©2024 University of Washington
http://www.bakerlab.org