Message boards : RALPH@home bug list : Bug reports for Ralph 5.36
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong. |
wraith Send message Joined: 31 Oct 06 Posts: 4 Credit: 382 RAC: 0 |
Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong. https://ralph.bakerlab.org/result.php?resultid=306545 <core_client_version>5.4.9</core_client_version> <stderr_txt> Graphics are disabled due to configuration... input_etable: reading etable... dsolv input_etable: WARNING etable types don't match! expected dsolv,606 got dsolv,721 Graphics are disabled due to configuration... input_etable: reading etable... dsolv input_etable: WARNING etable types don't match! expected dsolv,606 got dsolv,721 Graphics are disabled due to configuration... input_etable: reading etable... dsolv input_etable: WARNING etable types don't match! expected dsolv,606 got dsolv,721 Graphics are disabled due to configuration... input_etable: reading etable... dsolv input_etable: WARNING etable types don't match! expected dsolv,606 got dsolv,721 Graphics are disabled due to configuration... input_etable: reading etable... dsolv input_etable: WARNING etable types don't match! expected dsolv,606 got dsolv,721 Graphics are disabled due to configuration... Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 0 starting structures built 29 (nstruct) times This process generated 0 decoys from 0 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>1b72__ETABLE_TEST_ABRELAX_rhh13sm6__1387_15_3_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Had this one happen a few minutes ago -- https://ralph.bakerlab.org/result.php?resultid=307179 I clicked "show graphics", and the graphics came up, then froze a few seconds later. The Ralph app was still running, and together with the graphics it was using up 2 CPUs of my quad-CPU (2 virtual) system. I couldn't close the graphics; when I tried, I got a Windows message box (...not responding, end now?), which I canceled a few times to wait (to no avail), and when I chose "end now", the Ralph app terminated with a computation error. I do use the screensaver, which mostly works, but occasionally errors out or needs to be killed like this. |
Leffe Send message Joined: 19 Feb 06 Posts: 10 Credit: 3,683 RAC: 0 |
got one: 01/11/2006 08:06:15|ralph@home|Unrecoverable error for result BENCH_ABRELAX_SAVE_ALL_OUT_1a19A_BARCODE_R64_1423_10_0 ( - exit code -1073741819 (0xc0000005)) |
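The report above shows the same exit code twice, once as a signed decimal and once in hex. A minimal sketch of how the two relate, assuming the code is a 32-bit value (the helper name `to_unsigned32` is made up for illustration); Windows documents 0xC0000005 as an access violation:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned value."""
    return code & 0xFFFFFFFF


# Exit code taken from the error line in the post above.
exit_code = -1073741819
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
```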
wraith Send message Joined: 31 Oct 06 Posts: 4 Credit: 382 RAC: 0 |
Another 'watchdog timeout' on a... 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1424_31_0 https://ralph.bakerlab.org/result.php?resultid=307852 This is also a machine that did 30 decoys on a similar job w/5.34 https://boinc.bakerlab.org/rosetta/result.php?resultid=44225313 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39021279 |
KC0ISW Send message Joined: 17 Feb 06 Posts: 20 Credit: 11,725 RAC: 0 |
Result ID 308969 Name 2reb__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1429_24_0 Workunit 272034 Created 2 Nov 2006 1:55:37 UTC Sent 2 Nov 2006 2:04:02 UTC Received 2 Nov 2006 2:52:50 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 4918 Report deadline 6 Nov 2006 2:04:02 UTC CPU time 2192.522693 stderr out <core_client_version>5.7.1</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 3600 # random seed: 2875416 WARNING! error deleting file .xx2reb.out ====================================================== DONE :: 1 starting structures built 1 (nstruct) times This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> ]]> Validate state Valid Claimed credit 6.63670628586472 Granted credit 6.63670628586472 application version 5.36 |
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Version 5.36 also checkpoints after reaching 100% (instead of reporting the result) and is then preempted by other apps (possibly for a long time, because of negative short-term debt, STD). Peter |
Krzychu P. Send message Joined: 16 Feb 06 Posts: 19 Credit: 25,687 RAC: 0 |
2006-11-03 09:26:26|ralph@home|Unrecoverable error for result 1mkyA_BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1443_33_0 (Incorrect function. (0x1) - exit code 1 (0x1)) <core_client_version>5.6.5</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 2873154 # random seed: 2873154 # cpu_run_time_pref: 3600 ERROR:: Exit at: .fullatom_energy.cc line:1969 </stderr_txt> |
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Result 1dcj__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS__1444_25_0 predicted a crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here. I have not noticed any previous Ralph results (or results for other projects) with such overprediction. My benchmark values have not changed significantly in the last few days; Ralph's DCF is 0.898411. The fpops_est is the same as on previous results, 40000000000000. Or have I overlooked something? By the way, this is now at least my second 5.36 WU in a row to stay preempted after reaching 100%. Peter |
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Result 1dcj__.....__1444_25_0 predicted a crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here. [...] Or have I overlooked something? I'm sorry, it seems to be the same with all Ralph results; I probably missed it previously. Peter |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Pepo: The time remaining INCREASING as the WU runs is normal. ...odd, but normal. The time remaining is really based on the completed % and the time taken so far... and the completed % only changes materially when a model is completed (which can sometimes take several hours). |
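A minimal sketch of why the remaining time can climb, assuming the estimate is a plain linear extrapolation from elapsed time and fraction done (the exact formula the BOINC client uses is not given in this thread, and `estimate_remaining` is a made-up name):

```python
def estimate_remaining(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: remaining = elapsed * (1 - f) / f."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# While a model is still being computed, the fraction done stays put
# (10% here, an arbitrary value) but elapsed time keeps growing,
# so the estimated remaining time grows too.
for elapsed in (600, 1800, 3600):
    print(elapsed, round(estimate_remaining(elapsed, 0.10)))
```

Once a model completes and the fraction done jumps, the same formula brings the estimate back down.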
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
The time remaining INCREASING as the WU runs is normal. I understand the reasons for the increasing remaining time, but this was the initial prediction, made before the WU started running, in the "ready to run" state. I probably missed the predicted time on previous results. Is it the same for other users? Peter |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate. Unfortunately BOINC doesn't recognize that the project has such a target. So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours... BOINC isn't aware of that yet, and still predicts 12 hrs. Once you crunch and report 2hr WUs, BOINC should adjust its initial estimates. |
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate. Unfortunately BOINC doesn't recognize that the project has such a target. Approx. until 8.11. I used the default target time. Then (after noticing I can change it :-) I modified it to 2 hours. So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours... BOINC isn't aware of that yet, and still predicts 12 hrs. Once you crunch and report 2hr WUs, BOINC should adjust its initial estimates. From the time I attached the host to Ralph until the moment I noticed the long prediction, it crunched 4-5 WUs (with the default target time), with crunch times ranging between 40 and 80 minutes (probably depending on the CPU frequency). I'd rather believe the prediction will stabilize with time and depends only on the fixed fpops_est value (40000000000000), the usually fixed target time, the computer speed and the calculated DCF. If I'm right, my DCF should eventually stabilize somewhere around 0.15-0.20? Peter |
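The DCF question above can be checked with a little arithmetic. This is a sketch under one assumption: that the client's initial estimate is fpops_est divided by the host's benchmarked speed, multiplied by the DCF. The fpops_est, DCF and 12:30 prediction come from the posts; the function names and the back-calculated host speed are illustrative only.

```python
FPOPS_EST = 40_000_000_000_000  # fpops_est quoted in the thread
DCF = 0.898411                  # Ralph's DCF as reported
PREDICTED_S = 12.5 * 3600       # initial prediction of "around 12:30 hours"


def initial_estimate(fpops_est: float, benchmark_flops: float, dcf: float) -> float:
    """Assumed estimate: floating-point ops / host speed, scaled by the DCF."""
    return fpops_est / benchmark_flops * dcf


# Back out the host speed that would produce the 12:30 prediction.
implied_flops = FPOPS_EST * DCF / PREDICTED_S


def dcf_for_runtime(runtime_s: float) -> float:
    """DCF at which the assumed estimate equals the given runtime."""
    return runtime_s * implied_flops / FPOPS_EST


print(f"implied host speed: {implied_flops:.3e} flops")
print(f"DCF for a 2 h target: {dcf_for_runtime(2 * 3600):.4f}")
print(f"DCF for a 1:19 runtime: {dcf_for_runtime(79 * 60):.4f}")
```

Under that assumption, a 2-hour target corresponds to a DCF of about 0.14, close to the lower end of the 0.15-0.20 guess, and the observed 1:19 runtime to just under 0.1.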
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Well, that's the thing that BOINC doesn't understand about how Rosetta (and Ralph) work. They actually crunch as best they can to meet your runtime objective. So, regardless of the GHz of your machine, it will produce more models to fill the target runtime. This is why credit is granted on models crunched, rather than the BOINC claimed credit. What I'm trying to say is that regardless of the speed of your machine, a 2hr runtime preference is still 2 hours. But with a faster machine, you can crunch more models in 2hrs. And because you can crunch models faster, you are less likely to dramatically exceed your preference just to complete the first model. Some of the larger WUs can take several hours to crunch one model. And one model is the minimum you can report with. On slow machines this may be more like 6-8 hours for a single model. This is regardless of the runtime preference, because the minimum you can report is one completed model... and this is perhaps the type of thing that may have thrown your estimates off. |
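The behaviour described above can be sketched as a small simulation. The stopping rule here is an assumption (start another model only if it still fits in the target, but always finish the first one); the thread does not say how Rosetta actually decides, and `models_completed` is a hypothetical helper.

```python
def models_completed(target_runtime_s: float, seconds_per_model: float):
    """Return (models, total_runtime_s) for one task under the assumed rule."""
    if seconds_per_model <= 0:
        raise ValueError("seconds_per_model must be positive")
    models, elapsed = 0, 0.0
    # The first model always runs to completion, even past the target.
    while models == 0 or elapsed + seconds_per_model <= target_runtime_s:
        elapsed += seconds_per_model
        models += 1
    return models, elapsed


two_hours = 2 * 3600
print(models_completed(two_hours, 600))       # fast host: many models
print(models_completed(two_hours, 7 * 3600))  # slow host: one long model
```

With the same 2-hour preference, the fast host fills the window with models, while the slow host reports a single model and overruns the target by hours.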
©2024 University of Washington
http://www.bakerlab.org