| Author | Message |
|
|
|
We're testing some new features (see news on main page).
Please pay special attention to jobs that appear stuck or appear to be taking too long! We're hoping a new watchdog thread will catch them.
____________
|
|
|
|
|
|
I have 3 errored results:
1. 92417 and 92455 finished with the message
<message>Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\fragments.cc line:687
</stderr_txt>
2. 91907 finished with the text:
<stderr_txt>
# random seed: 3886793
# cpu_run_time_pref: 7200
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>FACONTACTS_RECENTER_NOFILTERS_1dhn__399_6_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
____________
 |
|
|
|
|
|
So far, all 5.02-WUs have crashed:
http://ralph.bakerlab.org/result.php?resultid=91805
http://ralph.bakerlab.org/result.php?resultid=91808
http://ralph.bakerlab.org/result.php?resultid=91965
Several more are in progress; let's see what's going on.
____________

Supporting BOINC, a great concept ! |
|
|
|
|
|
Here is one with a large crash-dump:
http://ralph.bakerlab.org/result.php?resultid=92882
____________

Supporting BOINC, a great concept ! |
|
|
|
|
|
Out of seven, six have crashed:
http://ralph.bakerlab.org/results.php?userid=1266
Although I have 5.4.3 installed, I didn't get a large crash-dump.
____________
|
|
|
|
|
|
Had one die this morning with 0xc0000005, result: resultid
Looks like the old "died while swapping" problem.
4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 using rosetta_beta version 502
4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 using rosetta_beta version 502
4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.124_1 (removed from memory)
4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.132_3 (removed from memory)
4/22/2006 6:49:41 AM|ralph@home|Pausing task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 (removed from memory)
4/22/2006 6:49:41 AM|SETI@home Beta Test|Restarting task 01jn01aa.27448.448.572166.3.124_1 using setiathome_enhanced version 511
4/22/2006 6:49:43 AM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 ( - exit code -1073741819 (0xc0000005))
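The two forms of the exit code in the last log line are the same 32-bit value: -1073741819 is the signed reading of 0xc0000005, the Windows access-violation status. A minimal sketch of the conversion (the helper name `to_unsigned32` is made up for illustration):

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


# The decimal exit code from the log line above.
print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
```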
|
|
|
|
|
|
oops, my bad, that points to the one from yesterday, the one this morning that the log entries go with is here: 91953 |
|
|
|
|
|
Just got this one here.
This was on a Mac dual G4 running Mac OS 10.4.6, BOINC 5.3.28
WU - NO_CHECK_7486h002_dec123_1.pdb_407_19_0
Looks like a file problem from this error message -
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec123_1.pdb_407_19_0_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message>
____________
|
|
|
|
|
Hello,
I thought all the w/u that were fubarred failed because my PC crashed yesterday.
Of the last 8 w/u, only 1 worked well. I am still having difficulties tweaking this new system. The same w/u above was stuck at 1.47% after 1 hour; some I have let go for 2 to 3 hours
before aborting. I just saw in Tam's thread that we should let the new feature handle these
w/u. I will edit this with the correct information.
Sincerely
Sluger
____________
 |
|
|
|
|
|
All 6 WUs have crashed on my laptop: P4-M 2.2 GHz, 512 MB memory (XP SP2)
WU-83346 Error -161
WU-83301 Error -161
WU-83275 Watchdog kill
WU-83276 Error -161
WU-83302 Error -161
WU-83282 Error -161
-----------------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885665
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec124_1.pdb_407_9_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
----------------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885638
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 5 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec129_1.pdb_407_16_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
-----------------------------------------------------------------------------
greetz DAXL |
|
|
|
|
|
6 out of 12 WUs have crashed - 6 aborted
On my Athlon 64-3000 1GB Memory (XP SP2)
WU-83315 Error -161
WU-83216 Error -161
WU-83217 Error -161
WU-83218 Error -161
WU-83219 Error -161
WU-83222 Error -161
---------------------------------------------------------------------
<core_client_version>5.4.4</core_client_version>
<stderr_txt>
# random seed: 3885631
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 5 (nstruct) times
# This process generated 3 decoys from 3 attempts
# 0 starting pdbs were skipped
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec184_1.pdb_407_3_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
---------------------------------------------------------------------
greetz DAXL
|
|
|
|
|
|
on winxp 64bit
http://ralph.bakerlab.org/result.php?resultid=91614
<core_client_version>5.2.13</core_client_version>
<stderr_txt>
# random seed: 3886628
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>FACONTACTS_RECENTER_NOFILTERS_1pgx__399_1_0_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message>
Anyway, what is the watchdog?
____________
|
|
|
|
|
|
22.04.2006 17:50:28|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_8_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
WU
Result
<core_client_version>5.2.13</core_client_version>
<stderr_txt>
# random seed: 3885686
# cpu_run_time_pref: 3600
**********************************************************************
Rosetta score stayed the same too long. Watchdog is killing the run!
**********************************************************************
</stderr_txt>
<message><file_xfer_error>
<file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
</message>
____________
|
|
|
|
|
|
8 results where the watchdog killed the run. I think it might be killing them a bit too early, because this machine doesn't usually get stuck or error out very often.
22/04/2006 6:41:39 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ail__399_7_0 (Incorrect function. (0x1) - exit code 1 (0x1))
http://ralph.bakerlab.org/result.php?resultid=91955
22/04/2006 6:43:50 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
http://ralph.bakerlab.org/result.php?resultid=92014
22/04/2006 11:15:56 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ubi__399_8_0 (Incorrect function. (0x1) - exit code 1 (0x1))
http://ralph.bakerlab.org/result.php?resultid=92058
22/04/2006 11:15:59 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
http://ralph.bakerlab.org/result.php?resultid=91941
23/04/2006 2:52:01 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec184_1.pdb_406_9_2 (Incorrect function. (0x1) - exit code 1 (0x1))
http://ralph.bakerlab.org/result.php?resultid=93253
23/04/2006 2:52:06 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec124_1.pdb_407_3_0 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec124_1.pdb_407_3_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
http://ralph.bakerlab.org/result.php?resultid=92833
23/04/2006 7:35:36 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_12_1 (Incorrect function. (0x1) - exit code 1 (0x1))
http://ralph.bakerlab.org/result.php?resultid=93254
23/04/2006 7:35:42 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1 (<file_xfer_error> <file_name>HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1_0</file_name> <error_code>-161</error_code></file_xfer_error>)
http://ralph.bakerlab.org/result.php?resultid=93255
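For anyone who wants to count the error kinds in a message log like the one above, here is a rough sketch. It assumes the `timestamp|project|message` layout seen in these lines; the function names and the category labels are invented for illustration, and only two of the log lines are included as sample data.

```python
import re
from collections import Counter

# Two sample lines copied from the log above.
LOG = """\
22/04/2006 6:41:39 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ail__399_7_0 (Incorrect function. (0x1) - exit code 1 (0x1))
22/04/2006 6:43:50 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
"""

ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) \((.*)\)$")
XFER_RE = re.compile(r"<error_code>(-?\d+)</error_code>")
EXIT_RE = re.compile(r"exit code (-?\d+)")


def classify(line):
    """Return (result_name, error_kind), or None if the line is not an error line."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    match = ERROR_RE.search(parts[2])
    if not match:
        return None
    result, detail = match.groups()
    xfer = XFER_RE.search(detail)
    if xfer:
        return result, "file_xfer_error " + xfer.group(1)
    exit_code = EXIT_RE.search(detail)
    if exit_code:
        return result, "exit code " + exit_code.group(1)
    return result, "other"


def tally(text):
    """Count error kinds across all lines of a message log."""
    counts = Counter()
    for line in text.splitlines():
        hit = classify(line)
        if hit:
            counts[hit[1]] += 1
    return counts


print(tally(LOG))
```

Pasting the full log into `LOG` gives a per-kind count that can be compared against the results pages.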
____________
|
|
|
|
|
|
Thanks for the posts. We think we've tracked down the
two most common errors. The watchdog does seem to be
a little too aggressive... we'll see how things
go for ralph 5.03!
____________
|
|
|
|
|
|
Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?
____________
|
|
|
|
|
|
win xp pro sp2
boinc 5.2.13
Ralph 5.02
23/04/2006 12:50:50|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec08_1.pdb_407_3_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec08_1.pdb_407_3_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
____________
|
|
|
|
|
|
All three of my 5.02 WUs were killed by the watchdog thread.
resultid=91985
resultid=91973
resultid=91972
I still have my settings to leave the app in memory when switching. Is it possible that the watchdog thread is taking that time into consideration? I have my systems set to switch projects every hour. All of mine aborted very very close to the one hour mark.
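To illustrate the question, here is a toy sketch of how a "score stayed the same too long" check would behave if it measured wall-clock time instead of CPU time. This is not the Rosetta watchdog, whose internals are not shown in this thread; the class name, the 30-minute limit, and the timeline are all invented for the example.

```python
class ScoreWatchdog:
    """Flags a run whose score has not changed for `limit` seconds of the given clock."""

    def __init__(self, limit):
        self.limit = limit
        self.last_score = None
        self.last_change = None

    def should_kill(self, score, now):
        # Any score change resets the stall timer.
        if score != self.last_score:
            self.last_score = score
            self.last_change = now
            return False
        return now - self.last_change >= self.limit


# Hypothetical timeline: the task runs for 10 minutes, then sits suspended
# in memory for 60 minutes while another project runs, then resumes.
# Each sample is (wall_seconds, cpu_seconds, score).
TIMELINE = [
    (0, 0, -50.0),
    (600, 600, -52.0),
    (4200, 600, -52.0),  # just resumed: 60 min of wall time, no CPU time used
]


def first_kill(clock_index, limit=1800):
    """Return the clock reading at which the watchdog fires, or None if it never does."""
    dog = ScoreWatchdog(limit)
    for sample in TIMELINE:
        if dog.should_kill(sample[2], sample[clock_index]):
            return sample[clock_index]
    return None


print("wall clock:", first_kill(0))
print("CPU time:  ", first_kill(1))
```

In this toy model the wall-clock version fires as soon as the task resumes, while the CPU-time version does not, which would match kills clustering around the project switch interval.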
____________
|
|
|
|
|
|
I also got a watchdog kill :(
on this one and this one

____________
 |
|
|
|
|
|
The dog is barking bad :)
http://ralph.bakerlab.org/results.php?hostid=2049
Anders n
____________
|
|
|
|
|
Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?
"Rhiju"
See this post
____________
Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact |
|
|
|
|
OK. I set the crunching time to 2 hours and the dog shut up.
This means that it should have something to do with switching tasks.
Anders n
____________
|
|
|