Bug reports for Ralph 5.36



Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2439 - Posted: 31 Oct 2006, 2:22:22 UTC

Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong.
wraith

Joined: 31 Oct 06
Posts: 4
Credit: 382
RAC: 0
Message 2440 - Posted: 31 Oct 2006, 13:59:22 UTC - in response to Message 2439.  
Last modified: 31 Oct 2006, 14:55:50 UTC

Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong.


https://ralph.bakerlab.org/result.php?resultid=306545

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
Graphics are disabled due to configuration...
input_etable: reading etable... dsolv
input_etable: WARNING etable types don't match! 
              expected dsolv,606 got dsolv,721
Graphics are disabled due to configuration...
input_etable: reading etable... dsolv
input_etable: WARNING etable types don't match! 
              expected dsolv,606 got dsolv,721
Graphics are disabled due to configuration...
input_etable: reading etable... dsolv
input_etable: WARNING etable types don't match! 
              expected dsolv,606 got dsolv,721
Graphics are disabled due to configuration...
input_etable: reading etable... dsolv
input_etable: WARNING etable types don't match! 
              expected dsolv,606 got dsolv,721
Graphics are disabled due to configuration...
input_etable: reading etable... dsolv
input_etable: WARNING etable types don't match! 
              expected dsolv,606 got dsolv,721
Graphics are disabled due to configuration...
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE ::     0 starting structures built        29 (nstruct) times
This process generated      0 decoys from       0 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>1b72__ETABLE_TEST_ABRELAX_rhh13sm6__1387_15_3_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
genes

Joined: 16 Feb 06
Posts: 45
Credit: 43,706
RAC: 20
Message 2441 - Posted: 1 Nov 2006, 0:55:42 UTC
Last modified: 1 Nov 2006, 1:45:23 UTC

Had this one happen a few minutes ago --

https://ralph.bakerlab.org/result.php?resultid=307179

I clicked "show graphics", and the graphics came up, then froze a few seconds later. The Ralph app was still running and, together with the graphics, was using two CPUs of my quad-CPU (2 virtual) system. I couldn't close the graphics; when I tried, I got a Windows message box (...not responding, end now?), which I canceled a few times to wait (to no avail). When I chose "end now", the Ralph app terminated with a computation error.

I do use the screensaver, which mostly works, but occasionally errors out or needs to be killed like this.

Leffe

Joined: 19 Feb 06
Posts: 10
Credit: 3,683
RAC: 0
Message 2443 - Posted: 1 Nov 2006, 5:15:33 UTC

got one: 01/11/2006 08:06:15|ralph@home|Unrecoverable error for result BENCH_ABRELAX_SAVE_ALL_OUT_1a19A_BARCODE_R64_1423_10_0 ( - exit code -1073741819 (0xc0000005))
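The two numbers in that log line are the same 32-bit value, printed once as a signed integer and once in hex. A minimal sketch of the conversion (the helper name `to_ntstatus_hex` is made up for illustration); on Windows, 0xC0000005 is the status code for an access violation:

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Exit code taken from the log line above.
print(to_ntstatus_hex(-1073741819))
```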

wraith

Joined: 31 Oct 06
Posts: 4
Credit: 382
RAC: 0
Message 2444 - Posted: 1 Nov 2006, 11:01:47 UTC
Last modified: 1 Nov 2006, 11:54:39 UTC

Another 'watchdog timeout' on a...

1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1424_31_0

https://ralph.bakerlab.org/result.php?resultid=307852

This is also a machine that did 30 decoys on a similar job w/5.34

https://boinc.bakerlab.org/rosetta/result.php?resultid=44225313
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39021279
KC0ISW

Joined: 17 Feb 06
Posts: 20
Credit: 11,725
RAC: 0
Message 2446 - Posted: 2 Nov 2006, 2:09:23 UTC

Result ID 308969
Name 2reb__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1429_24_0
Workunit 272034
Created 2 Nov 2006 1:55:37 UTC
Sent 2 Nov 2006 2:04:02 UTC
Received 2 Nov 2006 2:52:50 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 4918
Report deadline 6 Nov 2006 2:04:02 UTC
CPU time 2192.522693
stderr out <core_client_version>5.7.1</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 3600
# random seed: 2875416
WARNING! error deleting file .xx2reb.out
======================================================
DONE :: 1 starting structures built 1 (nstruct) times
This process generated 1 decoys from 1 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 6.63670628586472
Granted credit 6.63670628586472
application version 5.36

Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2447 - Posted: 2 Nov 2006, 9:30:09 UTC
Last modified: 2 Nov 2006, 10:18:01 UTC

Also, version 5.36 is checkpointing after reaching 100% (instead of reporting the result) and is then preempted by other apps (possibly for a long time, because of negative STD, i.e. short-term debt).

Peter
Krzychu P.

Joined: 16 Feb 06
Posts: 19
Credit: 25,687
RAC: 0
Message 2448 - Posted: 3 Nov 2006, 7:57:11 UTC


2006-11-03 09:26:26|ralph@home|Unrecoverable error for result 1mkyA_BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1443_33_0 (Incorrect function. (0x1) - exit code 1 (0x1))


<core_client_version>5.6.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2873154
# random seed: 2873154
# cpu_run_time_pref: 3600
ERROR:: Exit at: .fullatom_energy.cc line:1969

</stderr_txt>
Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2449 - Posted: 3 Nov 2006, 10:12:32 UTC

Result 1dcj__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS__1444_25_0 had a predicted crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here.

I have not noticed any previous Ralph results (or results for other projects) being overpredicted like this. My benchmark values have not changed significantly in the last few days; Ralph's DCF is 0.898411. The fpops_est is the same as on previous results, 40000000000000. Or have I overlooked something?

By the way, this is now at least my second 5.36 WU in a row that stayed preempted after reaching 100%.

Peter
Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2518 - Posted: 9 Nov 2006, 12:36:29 UTC - in response to Message 2449.  

Result 1dcj__.....__1444_25_0 had a predicted crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here. [...] Or have I overlooked something?

I'm sorry, it seems to be the same with all Ralph results; I probably missed it previously.

Peter
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2521 - Posted: 9 Nov 2006, 16:48:49 UTC

Pepo:

The time remaining INCREASING as the WU runs is normal.

...odd, but normal. The time remaining is really based on the completed % and the time taken so far... and the completed % only changes materially when a model is completed (which can sometimes take several hours).
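
The effect described above can be sketched with a simple linear extrapolation. This is only an illustration, assuming the remaining time is computed as elapsed time scaled by the unfinished fraction; the function name and the numbers are made up, and the actual BOINC client may blend in other estimates:

```python
def remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: the total is assumed to be elapsed / fraction_done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# While a model is being built the completed fraction stays flat,
# so the extrapolated remaining time grows with the elapsed time.
early = remaining_seconds(600, 0.25)   # 10 minutes in, 25% reported
later = remaining_seconds(1800, 0.25)  # 30 minutes in, still 25% reported
print(early, later)
```

Once a model completes and the fraction jumps, the same formula pulls the estimate back down.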
Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2523 - Posted: 9 Nov 2006, 20:31:08 UTC - in response to Message 2521.  

The time remaining INCREASING as the WU runs is normal.

I understand the reasons for the remaining time increasing, but this was the initial prediction, before the result was run, while it was in the "ready to run" state. I probably missed the predicted time on previous results.

Is it the same for other users?

Peter
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2541 - Posted: 17 Nov 2006, 19:14:36 UTC

The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate. Unfortunately BOINC doesn't recognize that the project has such a target. So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours... BOINC isn't aware of that yet, and still predicts 12 hrs. Once you crunch and report 2hr WUs, BOINC should adjust its initial estimates.
Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 2545 - Posted: 18 Nov 2006, 21:19:18 UTC - in response to Message 2541.  

The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate. Unfortunately BOINC doesn't recognize that the project has such a target.

Until approx. 8 Nov I used the default target time. Then (after noticing I can change it :-) I changed it to 2 hours.
So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours... BOINC isn't aware of that yet, and still predicts 12 hrs. Once you crunch and report 2hr WUs, BOINC should adjust its initial estimates.

Between attaching the host to Ralph and the moment I noticed the long prediction, it crunched 4-5 WUs (with the default target time), with crunch times ranging from 40 to 80 minutes (probably depending on the CPU frequency).

I'd rather believe the prediction will stabilize with time and depends only on the fixed fpops_est value (40000000000000), the usually fixed target time, the computer speed and the calculated DCF. If I'm right, my DCF should eventually stabilize somewhere around 0.15-0.20?

Peter
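
The arithmetic behind that guess can be worked through with the numbers quoted in this thread. This is a rough sketch, assuming the client's initial estimate is simply fpops_est divided by the benchmarked speed and multiplied by the DCF; the function names are made up, and the benchmark speed below is backed out from the reported prediction rather than measured:

```python
FPOPS_EST = 40_000_000_000_000  # fpops_est quoted in the thread
DCF_REPORTED = 0.898411         # Ralph DCF quoted in the thread
PREDICTED_S = 12.5 * 3600       # the roughly 12:30 h initial prediction


def estimate_seconds(fpops_est: float, flops: float, dcf: float) -> float:
    """Assumed form of the initial estimate: work / speed * correction."""
    return fpops_est / flops * dcf


# Benchmark speed implied by the reported prediction (an inference).
implied_flops = FPOPS_EST * DCF_REPORTED / PREDICTED_S


def dcf_for_runtime(runtime_s: float) -> float:
    """DCF at which the estimate would equal the given runtime."""
    return runtime_s * implied_flops / FPOPS_EST


print(f"implied benchmark: {implied_flops / 1e6:.0f} MFLOPS")
print(f"DCF matching a 2 h target: {dcf_for_runtime(2 * 3600):.3f}")
print(f"DCF matching a 1:19 h run: {dcf_for_runtime(79 * 60):.3f}")
```

Under these assumptions the implied benchmark is close to 800 MFLOPS, and a DCF of about 0.14 would match a 2-hour target, which is near the low end of the 0.15-0.20 range guessed above.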
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2547 - Posted: 20 Nov 2006, 20:34:37 UTC

Well, that's the thing that BOINC doesn't understand about how Rosetta (and Ralph) work. They actually crunch as best they can to meet your runtime objective. So, regardless of the GHz of your machine, it will produce as many models as it takes to fill the target runtime. This is why credit is granted on models crunched, rather than the BOINC claimed credit.

What I'm trying to say is that regardless of the speed of your machine, a 2hr runtime preference is still 2 hours. But with a faster machine, you can crunch more models in 2hrs. And because you can crunch models faster, you are less likely to dramatically exceed your preference just to complete the first model.

Some of the larger WUs can take several hours to crunch one model. And one model is the minimum you can report with. On slow machines this may be more like 6-8 hours for a single model. This is regardless of the runtime preference, because the minimum you can report is one completed model... and this is perhaps the type of thing that may have thrown your estimates off.
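
The behaviour described above can be sketched as a small loop. This is a hypothetical stopping rule for illustration only (always finish one model, then start another only if it is expected to fit within the target); the real Rosetta logic is not shown in this thread and may differ:

```python
def models_completed(target_s: float, model_s: float) -> tuple[int, float]:
    """Return (models built, total runtime) under the assumed stopping rule."""
    if model_s <= 0:
        raise ValueError("model_s must be positive")
    # At least one model must be completed before anything can be reported.
    models, elapsed = 1, model_s
    # Start another model only if it is expected to finish within the target.
    while elapsed + model_s <= target_s:
        models += 1
        elapsed += model_s
    return models, elapsed


fast = models_completed(2 * 3600, 20 * 60)   # fast host: 20 min per model
slow = models_completed(2 * 3600, 7 * 3600)  # slow host: 7 h per model
print(fast, slow)
```

With a 2-hour preference the fast host fills the target with several models, while the slow host overshoots it with a single 7-hour model, which is the case that can throw the estimates off.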




©2024 University of Washington
http://www.bakerlab.org