Message boards : RALPH@home bug list : Bug reports for Ralph 5.02
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
We're testing some new features (see news on the main page). Please pay special attention to jobs that appear stuck or appear to be taking too long! We're hoping a new watchdog thread will catch them. |
Nikolay A. Saharov Joined: 17 Feb 06 Posts: 6 Credit: 25,102 RAC: 0 |
I have 3 errored results:
1. 92417 and 92455 finished with the message
2. 91907 finished with the text:
|
Yeti Joined: 19 Feb 06 Posts: 32 Credit: 316,371 RAC: 853 |
|
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
Out of seven results, six have crashed: https://ralph.bakerlab.org/results.php?userid=1266
Although I have BOINC 5.4.3 installed, I didn't get a large crash dump. |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
Had one die this morning with 0xc0000005, result: resultid
Looks like the old "died while swapping" problem.
4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 using rosetta_beta version 502
4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 using rosetta_beta version 502
4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.124_1 (removed from memory)
4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.132_3 (removed from memory)
4/22/2006 6:49:41 AM|ralph@home|Pausing task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 (removed from memory)
4/22/2006 6:49:41 AM|SETI@home Beta Test|Restarting task 01jn01aa.27448.448.572166.3.124_1 using setiathome_enhanced version 511
4/22/2006 6:49:43 AM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 ( - exit code -1073741819 (0xc0000005)) |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
Oops, my bad: that points to the one from yesterday. The one from this morning, which the log entries go with, is here: 91953 |
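The exit code in the log above, -1073741819, is the same 32-bit value as the hex status 0xc0000005 (the Windows access-violation status), just read as a signed integer. The Python sketch below shows the conversion in both directions; the function names are hypothetical and it only does the arithmetic, it does not look up what a status means.

```python
def to_windows_status(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex status Windows reports."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def from_windows_status(status: int) -> int:
    """Reinterpret an unsigned 32-bit status (e.g. 0xc0000005) as a signed exit code."""
    status &= 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return status - (1 << 32) if status & 0x80000000 else status


print(to_windows_status(-1073741819))   # 0xc0000005
print(from_windows_status(0xC0000005))  # -1073741819
```

The same trick explains why BOINC logs often print both forms side by side, as in "exit code -1073741819 (0xc0000005)".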
Snake Doctor Joined: 16 Feb 06 Posts: 37 Credit: 998,880 RAC: 0 |
Just got this one here. This was on a Mac dual G4 running Mac OS X 10.4.6, BOINC 5.3.28.
WU: NO_CHECK_7486h002_dec123_1.pdb_407_19_0
From this error message, it looks like a file problem:
<message><file_xfer_error> <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_19_0_0</file_name> <error_code>-161</error_code> <error_message></error_message> </file_xfer_error> </message> |
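Many reports in this thread paste the same `<file_xfer_error>` fragment. If you are collecting several of them, a small parser can pull out the file name and error code. This is a minimal Python sketch that assumes the fragment is well-formed XML exactly as pasted above; only the tag names come from the posts, the function name is hypothetical, and it makes no claim about what error code -161 means.

```python
import xml.etree.ElementTree as ET


def parse_file_xfer_error(fragment: str) -> dict:
    """Extract file name, error code and message from a <file_xfer_error> fragment."""
    root = ET.fromstring(fragment)
    # The fragment may be wrapped in <message>...</message> or stand alone.
    err = root if root.tag == "file_xfer_error" else root.find("file_xfer_error")
    if err is None:
        raise ValueError("no <file_xfer_error> element found")
    return {
        "file_name": (err.findtext("file_name") or "").strip(),
        "error_code": int(err.findtext("error_code", default="0")),
        "error_message": (err.findtext("error_message") or "").strip(),
    }


sample = """<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec123_1.pdb_407_19_0_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message>"""

print(parse_file_xfer_error(sample))
```

Running it over each pasted fragment gives a quick tally of which results failed with the same code.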
[B^S] Doug Worrall Joined: 16 Feb 06 Posts: 10 Credit: 1,515 RAC: 0 |
Snake Doctor wrote: "Just got this one here."
Hello, I thought all the WUs that were fubarred were because my PC crashed yesterday. Of the last 8 WUs, only 1 worked well. I'm still having difficulties tweaking this new system. The same WU as above was stuck at 1.47% after 1 hour; some I have let run 2 to 3 hours before aborting. I just saw in Tam's thread that we should let the new feature handle these WUs. Will edit this with the correct information.
Sincerely, Sluger |
Daxl Joined: 1 Mar 06 Posts: 2 Credit: 55,301 RAC: 0 |
All 6 WUs have crashed on my laptop: P4-M 2.2 GHz, 512 MB memory (XP SP2)
WU-83346 Error -161
WU-83301 Error -161
WU-83275 Watchdog kill
WU-83276 Error -161
WU-83302 Error -161
WU-83282 Error -161
-----------------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885665
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec124_1.pdb_407_9_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
----------------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885638
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 5 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec129_1.pdb_407_16_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
-----------------------------------------------------------------------------
greetz DAXL |
Daxl Joined: 1 Mar 06 Posts: 2 Credit: 55,301 RAC: 0 |
6 out of 12 WUs have crashed and 6 were aborted, on my Athlon 64-3000, 1 GB memory (XP SP2)
WU-83315 Error -161
WU-83216 Error -161
WU-83217 Error -161
WU-83218 Error -161
WU-83219 Error -161
WU-83222 Error -161
---------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885631
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 5 (nstruct) times
# This process generated 3 decoys from 3 attempts
# 0 starting pdbs were skipped
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec184_1.pdb_407_3_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
---------------------------------------------------------------------
greetz DAXL |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
On WinXP 64-bit: https://ralph.bakerlab.org/result.php?resultid=91614
<core_client_version>5.2.13</core_client_version>
<stderr_txt>
# random seed: 3886628
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>FACONTACTS_RECENTER_NOFILTERS_1pgx__399_1_0_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message>
Anyway, what is the watchdog? |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
22.04.2006 17:50:28|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_8_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
WU Result
<core_client_version>5.2.13</core_client_version>
<stderr_txt>
# random seed: 3885686
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message> |
casio7131 Joined: 20 Mar 06 Posts: 15 Credit: 12,660 RAC: 0 |
8 results where the watchdog killed the run. I think it might be killing them a bit too early, because this machine doesn't usually get stuck or error out very often.
22/04/2006 6:41:39 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ail__399_7_0 (Incorrect function. (0x1) - exit code 1 (0x1)) https://ralph.bakerlab.org/result.php?resultid=91955
22/04/2006 6:43:50 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0_0</file_name> <error_code>-161</error_code></file_xfer_error>) https://ralph.bakerlab.org/result.php?resultid=92014
22/04/2006 11:15:56 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ubi__399_8_0 (Incorrect function. (0x1) - exit code 1 (0x1)) https://ralph.bakerlab.org/result.php?resultid=92058
22/04/2006 11:15:59 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0_0</file_name> <error_code>-161</error_code></file_xfer_error>) https://ralph.bakerlab.org/result.php?resultid=91941
23/04/2006 2:52:01 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec184_1.pdb_406_9_2 (Incorrect function. (0x1) - exit code 1 (0x1)) https://ralph.bakerlab.org/result.php?resultid=93253
23/04/2006 2:52:06 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec124_1.pdb_407_3_0 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec124_1.pdb_407_3_0_0</file_name> <error_code>-161</error_code></file_xfer_error>) https://ralph.bakerlab.org/result.php?resultid=92833
23/04/2006 7:35:36 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_12_1 (Incorrect function. (0x1) - exit code 1 (0x1)) https://ralph.bakerlab.org/result.php?resultid=93254
23/04/2006 7:35:42 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1 (<file_xfer_error> <file_name>HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1_0</file_name> <error_code>-161</error_code></file_xfer_error>) https://ralph.bakerlab.org/result.php?resultid=93255 |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Thanks for the posts. We think we've tracked down the two most common errors. The watchdog does seem to be a little too aggressive; we'll see how things go with Ralph 5.03!
Quoting casio7131: "8 results where watchdog killed the run. i think that it might be killing it a bit too early because this machine doesn't usually get stuck or error out too often." |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running? |
Leffe Joined: 19 Feb 06 Posts: 10 Credit: 3,683 RAC: 0 |
WinXP Pro SP2, BOINC 5.2.13, Ralph 5.02
23/04/2006 12:50:50|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec08_1.pdb_407_3_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec08_1.pdb_407_3_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>) |
Robert Everly Joined: 16 Feb 06 Posts: 10 Credit: 2,333 RAC: 0 |
All three of my 5.02 WUs were killed by the watchdog thread: resultid=91985, resultid=91973, resultid=91972.
I still have my settings to leave applications in memory when switching. Is it possible that the watchdog thread is counting the time the app sits suspended in memory? My systems are set to switch projects every hour, and all of mine aborted very, very close to the one-hour mark. |
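To illustrate the question above: a "score stayed the same too long" watchdog behaves very differently depending on which clock it uses. The Python sketch below is hypothetical and is not the Rosetta implementation; the class name, the 1800-second timeout, and the fake clocks are all assumptions for illustration. With a wall-clock timer, an hour spent suspended in memory looks like an hour without score progress; with a CPU-time timer (such as `time.process_time`, the default here), suspended time does not count.

```python
import time
from typing import Callable, Optional


class ScoreWatchdog:
    """Hypothetical stagnation watchdog: flags a run whose score has not changed
    for longer than `timeout` seconds, as measured by `clock`."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.process_time):
        self.timeout = timeout
        self.clock = clock
        self.last_score: Optional[float] = None
        self.last_change = clock()

    def update(self, score: float) -> None:
        """Record the latest score; restart the timer whenever the score changes."""
        if score != self.last_score:
            self.last_score = score
            self.last_change = self.clock()

    def should_kill(self) -> bool:
        """True if the score has been unchanged for longer than the timeout."""
        return self.clock() - self.last_change > self.timeout


class FakeClock:
    """Manually advanced clock, used here to simulate a suspended task."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


wall, cpu = FakeClock(), FakeClock()
wd_wall = ScoreWatchdog(timeout=1800, clock=wall)  # assumed 30-minute timeout
wd_cpu = ScoreWatchdog(timeout=1800, clock=cpu)
for wd in (wd_wall, wd_cpu):
    wd.update(-120.5)

# Task sits suspended in memory for one hour: wall time advances, CPU time does not.
wall.t += 3600
print(wd_wall.should_kill())  # True
print(wd_cpu.should_kill())   # False
```

If the real watchdog measured wall-clock time, that would be consistent with kills landing near a one-hour project-switch interval, but only the developers can confirm which clock it uses.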
paul and kirsty yates Joined: 16 Feb 06 Posts: 11 Credit: 949 RAC: 0 |
|
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
|
©2024 University of Washington
http://www.bakerlab.org