Bug reports for 5.56-5.59

Message boards : RALPH@home bug list : Bug reports for 5.56-5.59

Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2979 - Posted: 1 Apr 2007, 23:10:36 UTC

My observations of % complete resetting to zero upon restart are from Windows as well. You have to remove the task from memory. I did so by ending BOINC completely rather than changing my settings: crunch 2 models, then end BOINC and restart.
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2980 - Posted: 2 Apr 2007, 1:19:52 UTC
Last modified: 2 Apr 2007, 1:21:36 UTC

Updates in 5.59
I think this is the last update. Everything ran pretty smoothly in 5.58. This just has
some small updates in the science, to get back some useful scores for each decoy and
a small set of fixes for the symmetric FOLD_AND_DOCK workunits.

Profile ashriel

Joined: 3 Mar 07
Posts: 11
Credit: 648
RAC: 0
Message 2982 - Posted: 2 Apr 2007, 14:05:59 UTC
Last modified: 2 Apr 2007, 14:58:13 UTC

5.59, default: 1 hour, WU s029__BOINC_SYMM_FOLD_AND_DOCK_RELAX-s029_-truncate_hom014__1906_17, Win2000

Time: 06 Minutes - Percentage: 10 - Time left: 4h 16m
Time: 30 Minutes - Percentage: 50 - Time left: 1h 33m
Time: 50 Minutes - Percentage: 83 - Time left: 0h 17m
Time: 60 Minutes - Percentage: 86 - Time left: 0h 15m (Model 1, Step 67622)
Time: 75 Minutes - Percentage: 88 - Time left: 0h 15m (Model 1, Step 67717)
Time: 80 Minutes - Percentage:100 - Time left: - (Model ?, Step ?)

a) The remaining time is strange - it was mostly ok in 5.57/5.58.
b) The steps are very slow (sorry, I only started watching them after 60 minutes)
c) Model 1 takes very long
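
As a rough cross-check of point a), the remaining time implied by each reading under a purely linear assumption is elapsed / fraction done, minus elapsed. The sketch below applies that to three of the readings above. The function name is made up for illustration, and this is not how BOINC or Rosetta actually computes its estimate; it only shows how far the displayed "Time left" sits from the naive figure.

```python
def naive_remaining_minutes(elapsed_min, percent_done):
    """Remaining time if progress were linear: total = elapsed / fraction done."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total = elapsed_min * 100.0 / percent_done
    return total - elapsed_min


# (elapsed minutes, percent shown, "Time left" shown, converted to minutes)
readings = [(6, 10, 4 * 60 + 16), (30, 50, 60 + 33), (50, 83, 17)]

for elapsed, pct, shown in readings:
    naive = naive_remaining_minutes(elapsed, pct)
    print(f"{elapsed:>3} min at {pct}%: naive {naive:.0f} min left, client showed {shown} min")
```

For the first two readings the naive figure is 54 and 30 minutes, against the 4h 16m and 1h 33m the client displayed.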
Profile ashriel

Joined: 3 Mar 07
Posts: 11
Credit: 648
RAC: 0
Message 2983 - Posted: 2 Apr 2007, 14:06:00 UTC
Last modified: 2 Apr 2007, 14:55:30 UTC

5.59, default: 1 hour, WU 1fkaA_BOINC_INCREASECYCLES10_RNA_ABINITIO-1fkaA-chunk005__1901_4, Win2000

Time: 15 Minutes - Percentage: 25 - Time left: 2h 55m (Model 1, Step 271.000)
Time: 40 Minutes - Percentage: 67 - Time left: 0h 45m (Model 2, Step 235.000)
Time: 50 Minutes - Percentage: 83 - Time left: 0h 17m (Model 2, Step 409.000)
Time: 55 Minutes - Percentage:100 - Time left: - (Model ?, Step ?)

Under 1 hour and more than 1 model, but the remaining time is strange.
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2985 - Posted: 2 Apr 2007, 17:02:01 UTC
Last modified: 2 Apr 2007, 17:32:10 UTC

Maion, I believe your time remaining is working just the way Rhiju intended. Once the remaining time estimate gets below 10 minutes, time starts moving slower. This is to avoid exceeding 100%. So once you get below a 10-minute estimated time remaining, the estimate is no longer on track: the client is unsure exactly when it will finish. But in each case, the 15- and 17-minute estimates were not far from right.

...But Rhiju assures us they won't be sending WUs which take more than an hour per model on Rosetta. So on Rosetta, with shorter WUs, the estimates should appear better. The 1hr time preference is always going to be the toughest to provide a good estimate for, as it is the time preference that will see the most variation (in percentage terms) between the actual time and the preference.
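
To make the "time starts moving slower" behaviour concrete, here is a minimal sketch of one way such a countdown could work. The 10-minute threshold comes from the description above; the scaling rule, the constant name and the function name are assumptions made for illustration, not Rosetta's actual formula.

```python
SLOWDOWN_THRESHOLD_S = 600.0  # 10 minutes, per the description above


def next_remaining(remaining_s, tick_s):
    """Advance a remaining-time estimate by one tick (hypothetical rule).

    Above the threshold the estimate drops by the full tick. Below it, the
    drop is scaled by remaining/threshold, so the estimate shrinks more and
    more slowly and never reaches zero while the task is still running.
    """
    if remaining_s > SLOWDOWN_THRESHOLD_S:
        return remaining_s - tick_s
    return remaining_s - tick_s * (remaining_s / SLOWDOWN_THRESHOLD_S)


remaining = 900.0  # 15 minutes left
for _ in range(5):
    remaining = next_remaining(remaining, 5.0)
    print(f"{remaining:.1f} s")
```

With a rule like this the displayed estimate stalls near the end instead of hitting zero (and the progress bar 100%) before the task has really finished.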
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2986 - Posted: 2 Apr 2007, 18:22:50 UTC - in response to Message 2985.  

Thanks, Feet1st, that's a great explanation. We indeed try to keep the avg time per model at less than one hour; actually our ralph runs help us calibrate this!

Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2987 - Posted: 2 Apr 2007, 18:37:16 UTC
Last modified: 2 Apr 2007, 18:38:59 UTC

Still seems to end WUs prematurely. If you restart an RNA task that's already completed 30 models... then it will end, regardless of preferred runtime.

This is the "Completed 30 RNA decoys." additional message I've been mentioning.

Here's a v5.58 example.
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2988 - Posted: 2 Apr 2007, 18:41:52 UTC
Last modified: 2 Apr 2007, 18:42:22 UTC

Can anyone explain the new text on MAC results?

It looks like this

Rosetta@home Macintosh Stack Size checker.
Original size: 8388608.
Maximum size: 0.
RLIM_INFINITY 67108864

Anders n
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2989 - Posted: 2 Apr 2007, 18:43:58 UTC

I did confirm this morning that even when the % completed resets to zero when you restart the task, it does seem to know to move time ahead quicker. I had some 10 or 12 hrs into a task on my 24hr preference, and on every 5-second tick it was subtracting 15 seconds from the estimated time remaining. So, even though the estimate went to 24+10hrs, if you study it for a minute you can see that it knows better than that. This must be due to the BOINC correction factor applied to the % completed and the current CPU time into the task.
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2990 - Posted: 2 Apr 2007, 19:06:56 UTC - in response to Message 2988.  
Last modified: 2 Apr 2007, 19:08:21 UTC

Rhiju explained:
...I'm also reporting the stack sizes in stderr.txt which is returned from your clients to our server, so I can get some info.


I think he was trying to determine if stack size had any correlation to Mac failures.
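
For anyone curious what those numbers look like on their own machine, here is a small sketch that reads the stack size limits with Python's standard `resource` module (Unix-like systems only). Mapping "Original size" and "Maximum size" in the Rosetta output to the soft and hard limits is an assumption; the thread doesn't say how the checker derives them.

```python
import resource  # standard library, available on Unix-like systems only


def stack_limits():
    """Return the (soft, hard) stack size limits of this process, in bytes.

    A limit equal to resource.RLIM_INFINITY means "unlimited".
    """
    return resource.getrlimit(resource.RLIMIT_STACK)


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


soft, hard = stack_limits()
print("soft:", soft, "hard:", hard, "RLIM_INFINITY:", resource.RLIM_INFINITY)
print("8388608 bytes =", to_mib(8388608), "MiB")
print("67108864 bytes =", to_mib(67108864), "MiB")
```

The two figures in the quoted output work out to 8 MiB and 64 MiB.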
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2991 - Posted: 2 Apr 2007, 19:15:00 UTC

@feet1st thanks :)
Profile ashriel

Joined: 3 Mar 07
Posts: 11
Credit: 648
RAC: 0
Message 2992 - Posted: 2 Apr 2007, 21:09:21 UTC
Last modified: 2 Apr 2007, 21:10:26 UTC

Because I don't really know what information would be helpful, I posted everything in detail.
But I doubt the details are of much real help.
Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2993 - Posted: 2 Apr 2007, 21:19:54 UTC

Rhiju, on this issue of the % completed resetting when a task is restarted after being kicked out of memory...

I'm puzzled. Before the progress % changes, a restart would not have impacted the calculations. Why does it now? I mean, it seems Rosetta used to know the correct total CPU time spent so far when it recomputed progress at the end of each model. So... where did it get that number? ...and isn't THAT the number to use now, rather than the one that resets upon restart?
Profile UBT - Terry
Joined: 13 Nov 06
Posts: 2
Credit: 68,467
RAC: 0
Message 2997 - Posted: 6 Apr 2007, 12:27:56 UTC

Got this error message for this WU:
06/04/2007 11:15:05|ralph@home|Reason: Unrecoverable error for result 1mhk__BOINC_RNA_ABINITIO-1mhk_-_1918_27_1 ( - exit code -1073741819 (0xc0000005))
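
The two numbers in that message are the same value: the decimal form is the exit code read as a signed 32-bit integer, and the hex form is the same bits read as unsigned. 0xc0000005 is the Windows status code for an access violation. A quick sketch of the conversion (the function name is just for illustration):

```python
def to_unsigned32_hex(exit_code):
    """Render a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned32_hex(-1073741819))  # prints 0xc0000005
```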


Profile Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 2998 - Posted: 6 Apr 2007, 15:50:08 UTC

Had 2 WU's fail with

stderr out
<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 2693807
ERROR:: Exit at: .loop_relax.cc line:1688

</stderr_txt>
]]>

https://ralph.bakerlab.org/result.php?resultid=486602
https://ralph.bakerlab.org/result.php?resultid=486603

Both only ran for 16 minutes, BAK workunit type.
Profile UBT - Terry
Joined: 13 Nov 06
Posts: 2
Credit: 68,467
RAC: 0
Message 2999 - Posted: 6 Apr 2007, 18:27:03 UTC
Last modified: 6 Apr 2007, 18:32:41 UTC

I've also had a couple like this one:
06/04/2007 19:19:53|ralph@home|Computation for task te00_1_NMRREF_1_te00_1_idid_model_06_core_0001IGNORE_THE_REST_idl_1917_44_0 finished
It jumped from 53% or thereabouts up to 100%, finishing in only 38 mins???
Not sure if this is a bug or if it's meant to do that.
I'm running at 1.86 GHz using BOINC 5.8.15 on Win XP.

Profile feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3000 - Posted: 6 Apr 2007, 22:59:18 UTC

Terry, looks like you have a 1 hour runtime preference??

You are completing the first model in something over 30 minutes, and so your % complete shows the fraction, say 35min/60min preference = 58% complete... and then it hits the end of the model and determines that you don't have time to start a second one, so it completes the task.

In short, the estimate doesn't predict if you will cut out early, and until you complete model 1, it really doesn't have any way to know if you are likely to or not.
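
The behaviour described above can be sketched in a few lines. This is only an illustration under a stated assumption: that the decision to start another model is based on the average time per model so far. The thread doesn't spell out Rosetta's actual rule, and both function names are hypothetical.

```python
def percent_complete(elapsed_min, runtime_pref_min):
    """Progress shown while a model is running: elapsed / preferred runtime."""
    return min(100.0, 100.0 * elapsed_min / runtime_pref_min)


def can_start_another_model(elapsed_min, models_done, runtime_pref_min):
    """Assume the next model takes as long as the average of those so far."""
    if models_done == 0:
        return True  # nothing to base an estimate on yet
    avg_per_model = elapsed_min / models_done
    return elapsed_min + avg_per_model <= runtime_pref_min


print(round(percent_complete(35, 60)))     # progress just before model 1 ends
print(can_start_another_model(35, 1, 60))  # room for a second 35-minute model?
```

With a 60-minute preference and a 35-minute first model, the progress sits around 58% and there is no room for a second model, so the task finishes and the display jumps to 100%.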
Thomas Leibold

Joined: 25 Feb 07
Posts: 27
Credit: 77,464
RAC: 0
Message 3001 - Posted: 7 Apr 2007, 3:20:38 UTC - in response to Message 2998.  
Last modified: 7 Apr 2007, 3:22:09 UTC

Had 2 WU's fail ...


Got one of those too:
<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 2693814
ERROR:: Exit at: loop_relax.cc line:1688

</stderr_txt>
]]>

Workunit 430740 on Linux Server.
Thomas Leibold

Joined: 25 Feb 07
Posts: 27
Credit: 77,464
RAC: 0
Message 3002 - Posted: 7 Apr 2007, 3:29:13 UTC

Workunit 431258 had problems with downloading two of its parts:

Fri 06 Apr 2007 04:33:04 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk_.fasta.gz
Fri 06 Apr 2007 04:33:04 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk__1ffk.fragments.gz
Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|Incomplete read of 66.000000 < 5KB for 1mhk_.fasta.gz - truncating
Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk_.fasta.gz
Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Throughput 623 bytes/sec
Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk_RNA.pdb.gz
Fri 06 Apr 2007 04:33:06 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk_.fasta.gz
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk_RNA.pdb.gz
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Throughput 3265 bytes/sec
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[file_xfer] Started download of file 1mhk__pairing.pdat.gz
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] MD5 check failed for 1mhk_RNA.pdb.gz
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] expected 43fa6b24e2ed0b12d7d949aaa6952085, got 398a6a6e30c8d9493c75a549173bcd93
Fri 06 Apr 2007 04:33:12 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk_RNA.pdb.gz
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk__1ffk.fragments.gz
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Throughput 159807 bytes/sec
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Finished download of file 1mhk__pairing.pdat.gz
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[file_xfer] Throughput 162 bytes/sec
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk__1ffk.fragments.gz
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] MD5 check failed for 1mhk__pairing.pdat.gz
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] expected 6a8599df2728416df250dcde0449ece6, got 4b92756f68af0bf0c557fdb008fb878c
Fri 06 Apr 2007 04:33:13 AM PDT|ralph@home|[error] Checksum or signature error for 1mhk__pairing.pdat.gz

<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>1mhk_.fasta.gz</file_name>
  <error_code>-200</error_code>
</file_xfer_error>

</message>
]]>
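
The "MD5 check failed ... expected ..., got ..." lines mean the digest of the downloaded file didn't match the one the server advertised. A minimal sketch of that kind of check, using Python's standard `hashlib`; the function names are made up for illustration and this is not the BOINC client's own code:

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Return the MD5 digest of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large input files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file's MD5 matches the expected hex digest."""
    return md5_hex(path) == expected_md5.strip().lower()
```

A truncated or corrupted transfer, like the "Incomplete read" logged above, produces a different digest, so a check of this kind fails.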

Profile Inais
Joined: 30 Jul 06
Posts: 12
Credit: 13,115
RAC: 0
Message 3003 - Posted: 10 Apr 2007, 7:30:08 UTC

Same problem on 4 WU's

Result ID | Work unit ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
491751 | 431342 | 10 Apr 2007 6:23:00 UTC | 10 Apr 2007 6:35:14 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---
491750 | 431340 | 10 Apr 2007 6:23:00 UTC | 10 Apr 2007 6:35:14 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---
491745 | 431276 | 10 Apr 2007 6:18:50 UTC | 10 Apr 2007 6:23:00 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---
491743 | 431275 | 10 Apr 2007 6:18:50 UTC | 10 Apr 2007 6:23:00 UTC | Over | Client error | Downloading | 0.00 | 0.00 | ---

I wish I can fly like a bird in the sky

©2024 University of Washington
http://www.bakerlab.org