Bug Reports for Minirosetta v1.36

Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36



Tim
Joined: 5 Apr 08
Posts: 5
Credit: 138,356
RAC: 0
Message 4260 - Posted: 10 Oct 2008, 22:01:06 UTC

Task 1112070 has been running for 9hrs 25mins so far...
Dotsch
Joined: 4 Mar 06
Posts: 12
Credit: 12,000
RAC: 0
Message 4262 - Posted: 11 Oct 2008, 22:09:39 UTC - in response to Message 4251.  

ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763

All my WUs abort with this error on MacOS (Intel) 10.5.5 after 20 to 610 seconds of runtime.
Inais
Joined: 30 Jul 06
Posts: 12
Credit: 13,115
RAC: 0
Message 4263 - Posted: 11 Oct 2008, 22:50:34 UTC
Last modified: 11 Oct 2008, 23:03:25 UTC

Errors on these WUs:

983615
988440
986225
089250

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<stderr_txt>
======================================================
DONE :: 1 starting structures 6714.73 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t315__IGNORE_THE_REST_2F6KA_3_5089_1_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

WU 987449 ran up to 50.755% progress, then started initializing again and restarted at zero time and zero progress.


I wish I could fly like a bird in the sky
BigMike
Joined: 23 Feb 06
Posts: 63
Credit: 58,730
RAC: 0
Message 4264 - Posted: 11 Oct 2008, 23:14:04 UTC

This one died almost immediately:

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763
called boinc_finish

</stderr_txt>
]]>

==Mike
Don't believe everything you think.
BigMike
Joined: 23 Feb 06
Posts: 63
Credit: 58,730
RAC: 0
Message 4265 - Posted: 12 Oct 2008, 6:36:20 UTC - in response to Message 4264.  

ERROR: NANs occured in hbonding!


I've had quite a few more of these:
1120465 1125654 1125657 1125717 1125913

==Mike
Don't believe everything you think.
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4266 - Posted: 12 Oct 2008, 8:32:24 UTC - in response to Message 4264.  

OK, I have come back from my short break and enabled work fetch.
Not much has improved with these "hombench" work units, has it?
Instead of work units that ran until they were terminated, the new ones die within seconds.

I am also getting the same error message as Mike on nearly all work units. See:
1120652
1120653
1125933
1126061
1126062
1126064
1126067
1126079
1126147
1126153

Also had 1126041 fail with a validate error.

None of my current batch of work units has completed; all have failed.

The problems with "hombench" go back over a month now, both here and on Rosetta. What is the problem with them?
Has our testing not shown you where the problems lie?

Please sort out the problems with this work unit type.

Conan.
Dotsch
Joined: 4 Mar 06
Posts: 12
Credit: 12,000
RAC: 0
Message 4267 - Posted: 12 Oct 2008, 8:47:38 UTC - in response to Message 4262.  

ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763

All my WUs abort with this error on MacOS (Intel) 10.5.5 after 20 to 610 seconds of runtime.

Same problem on my Windows system. All WUs errored out with the same failure.

The Mac finished one WU successfully, but it did not validate.
Azurrio
Joined: 27 Jun 07
Posts: 12
Credit: 8,020
RAC: 0
Message 4268 - Posted: 12 Oct 2008, 20:43:05 UTC
Last modified: 12 Oct 2008, 20:44:37 UTC

Here are my failures:
1, 2, 3, 4, 5 and 6.
All seem to have the same error:
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763
called boinc_finish

</stderr_txt>
]]>
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4269 - Posted: 13 Oct 2008, 1:44:16 UTC

This one ran for 42 hours and was still on model 1. I stopped and restarted it until the watchdog turned it in.

hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t328__IGNORE_THE_REST_2CG4A_13_5095_1_0

EvoDude
Joined: 18 Feb 06
Posts: 28
Credit: 639,833
RAC: 0
Message 4270 - Posted: 13 Oct 2008, 12:09:27 UTC

All WUs in the new series seem to be failing after a short time (1 to 15 minutes). The affected tasks all have names beginning with 'hombench_tex_'.

This is happening on both of my Vista machines with the latest BOINC client installed. I'm tempted to abort the remaining 50 or so tasks.
AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 159,951
RAC: 0
Message 4271 - Posted: 13 Oct 2008, 20:33:11 UTC

Five failures:
1127424, 1127480, 1127566, 1128390 and 1122258.
All have the name: hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_... and all have the same error:

ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763
called boinc_finish


And this one ran too long.
stderr out:
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
**********************************************************************
Rosetta is going too long. Watchdog is ending the run!
CPU time: 49551.9 seconds. Greater than 3X preferred time: 14400 seconds
**********************************************************************
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (20 frames):
[0x89f8027]
[0x8a22720]
[0xffffe420]
[0x887f36b]
[0x82f934a]
[0x830358d]
[0x8749cb6]
[0x8945431]
[0x874b4a6]
[0x874e760]
[0x8749400]
[0x8066196]
[0x8084942]
[0x8092d68]
[0x808c16e]
[0x805e8f8]
[0x809795c]
[0x804bed3]
[0x8a7e21c]
[0x8048111]

Exiting...

</stderr_txt>
]]>
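The watchdog line in this log can be checked by hand. With cpu_run_time_pref at 14400 s (4 hours), three times the preference is 43200 s (12 hours), and this task had used 49551.9 s (about 13.8 hours), so it was past the cutoff. A minimal sketch, assuming the rule is exactly what the message states (end the run once CPU time is greater than 3X the preferred time); `watchdog_should_end` and `WATCHDOG_FACTOR` are hypothetical names, not Rosetta's actual code:

```python
# Hypothetical sketch of the rule quoted in the watchdog message:
# "CPU time ... Greater than 3X preferred time". Not Rosetta's real code.
WATCHDOG_FACTOR = 3  # assumed from "Greater than 3X preferred time"


def watchdog_should_end(cpu_time_s: float, cpu_run_time_pref_s: float) -> bool:
    """Return True when CPU time exceeds WATCHDOG_FACTOR times the preference."""
    return cpu_time_s > WATCHDOG_FACTOR * cpu_run_time_pref_s


if __name__ == "__main__":
    pref = 14400.0      # from "# cpu_run_time_pref: 14400"
    cpu_time = 49551.9  # from "CPU time: 49551.9 seconds"
    limit = WATCHDOG_FACTOR * pref
    print(f"cutoff: {limit:.0f} s ({limit / 3600:.1f} h)")
    print(f"used:   {cpu_time:.1f} s ({cpu_time / 3600:.1f} h)")
    print("watchdog ends run:", watchdog_should_end(cpu_time, pref))
```

The same arithmetic explains the 6-hour-preference reports elsewhere in this thread: under this assumed rule the watchdog would not step in until 18 hours of CPU time.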
LEONARI
Joined: 12 Mar 06
Posts: 5
Credit: 108,342
RAC: 0
Message 4272 - Posted: 14 Oct 2008, 9:29:38 UTC

Rosetta Mini 1.36 task "homebench_tex_cst_looprelax_tex_cst_t315_IGNORE_THE_REST_1GKPA_16_5148_1_0" locks up in the initialisation phase. At this point it has been running for 03:27:55 without any progress at all, so I am aborting it.
My client details are below:
BOINC client version 5.10.45 for windows_intelx86
log flags: task, file_xfer, sched_ops
Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
Data directory: C:\Program Files\BOINC
Processor: 1 GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz [x86 Family 15 Model 2 Stepping 7]
Processor features: fpu tsc sse mmx
OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00)
Memory: 511.43 MB physical, 1.21 GB virtual
Disk: 17.70 GB total, 2.26 GB free
Local time is UTC +1 hours
rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 97037; location: home; project prefs: default
ralph@home|URL: https://ralph.bakerlab.org/; Computer ID: 1760; location: home; project prefs: default
SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 1960189; location: work; project prefs: default
General prefs: from http://setiathome.ssl.berkeley.edu/ (last modified 08-Jun-2006 10:33:55)
Host location: work
General prefs: no separate prefs for work; using your defaults
Reading preferences override file
Preferences limit memory usage when active to 255.71MB
Preferences limit memory usage when idle to 460.29MB
Preferences limit disk usage to 2.26GB

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4273 - Posted: 14 Oct 2008, 11:23:38 UTC - in response to Message 4266.  

I have had just 1 "hombench_tex" work unit complete OK.

But I have had another 62 "hombench_tex" work units fail:
56 with the error already reported in this thread,
and 6 with a "validate error".

I am aborting all "hombench_mtyka" work units that I get, as they all run well past my set preference (a 6-hour preference, but running for over 11 hours); if I let them go, the watchdog kills them and gives a small amount of credit for the effort.

These problems affect both Windows and Linux machines.
AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 159,951
RAC: 0
Message 4274 - Posted: 14 Oct 2008, 16:03:27 UTC
Last modified: 14 Oct 2008, 16:04:35 UTC

At this moment I have a RALPH and a Rosetta task running at the same time on a single-core computer, both using 50% CPU.
The RALPH task has the status 'Waiting to run' in BOINC Manager, and it is a hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_... task, so that is the one I'm going to abort.
Something is seriously wrong with these tasks.
[AF>France>TDM>Centre]Jeannot Le Tazon
Joined: 11 Jun 06
Posts: 3
Credit: 1,754
RAC: 0
Message 4275 - Posted: 14 Oct 2008, 16:23:43 UTC

Hello,
Some WUs are valid:
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t325__IGNORE_THE_REST_1YIXA_16_5151_1_0
resultid=1123704
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
======================================================
DONE :: 1 starting structures 4940.01 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>

and some are invalid:
resultid=1130000
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t322__IGNORE_THE_REST_2GVHA_6_5150_1_1
ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763


Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4276 - Posted: 15 Oct 2008, 8:41:59 UTC

Two hombench_mtyka_foldcst_ tasks with "Maximum disk usage exceeded" and error -177, both on WinXP SP3, client 6.3.14:

hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_2APJA_3_5159_1_0 exited after 0 seconds with -177 (0xffffffffffffff4f)
<core_client_version>6.3.14</core_client_version>
Maximum disk usage exceeded


hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_1ZITA_10_5159_1_0 exited after 4972.422 seconds with -177 (0xffffffffffffff4f)
<core_client_version>6.3.14</core_client_version>
Maximum disk usage exceeded

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
[...] (repeated ~720 times)
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

.Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E
#Engaging BOINC Windows Runtime Debugger...

Judging by the debugger output, I'd bet the running thread has a broken stack...

----

A bunch of hombench_tex_ tasks failed with "Incorrect function. - exit code 1" after 60-800 seconds; WinXP SP3, client 6.3.14 and Linux client 6.2.4:

hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t288__IGNORE_THE_REST_1T2MA_5_5137_1_1
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t328__IGNORE_THE_REST_2CFXA_4_5154_1_0
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t328__IGNORE_THE_REST_2CFXA_3_5154_1_0
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t315__IGNORE_THE_REST_1KCXA_17_5148_1_0
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t293__IGNORE_THE_REST_1VQ1A_2_5139_1_0

Peter
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4277 - Posted: 15 Oct 2008, 9:02:01 UTC
Last modified: 15 Oct 2008, 9:47:32 UTC

I've noticed that my hombench_mtyka_looprelax_ccd_close_looprelax_t286__IGNORE_THE_REST_1BWP__13_5163_1_0 is still running (now at 03:02:29, 76.011%, 01:25:16 to go), although it was meant to be preempted approximately 1:18 hours (or 39 CPU minutes) ago. Linux P-III, 6.2.4 client.

Peter

Edit: It finished correctly.
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4278 - Posted: 15 Oct 2008, 12:05:00 UTC - in response to Message 4275.  

some are invalid:
resultid=1130000
hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t322__IGNORE_THE_REST_2GVHA_6_5150_1_1
ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763

hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t325__IGNORE_THE_REST_1DPMA_12_5151_1_1 exited with error 1.

<core_client_version>6.3.14</core_client_version>
Incorrect function. (0x1) - exit code 1 (0x1)
# cpu_run_time_pref: 7200
No heartbeat from core client for 30 sec - exiting
# cpu_run_time_pref: 7200

ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763
called boinc_finish


Peter
HTH
Joined: 6 Mar 06
Posts: 9
Credit: 10,226
RAC: 0
Message 4279 - Posted: 15 Oct 2008, 14:47:51 UTC
Last modified: 15 Oct 2008, 14:50:25 UTC

https://ralph.bakerlab.org/result.php?resultid=1136498

stderr out
<core_client_version>6.3.14</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763
called boinc_finish

</stderr_txt>
]]>

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4280 - Posted: 16 Oct 2008, 1:31:04 UTC - in response to Message 4274.  

At this moment I have a RALPH and a Rosetta task running at the same time on a single-core computer, both using 50% CPU.
The RALPH task has the status 'Waiting to run' in BOINC Manager, and it is a hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_... task, so that is the one I'm going to abort.
Something is seriously wrong with these tasks.


I too have had the same experience with a four-core machine running 7 BOINC work units: 4 Docking and 3 RALPH. The RALPH work units showed "Waiting to run" but were still running. I had to stop BOINC and restart it to get just 4 WUs running.

This has happened on more than one machine and more than once. All with the 'hombench_tex' name.

The new 'hombench_mtyka' work units now seem to be working OK, as none of my latest ones has gone past my 6-hour preference.




©2020 University of Washington
http://www.bakerlab.org