Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36
Author | Message |
---|---|
Tim Joined: 5 Apr 08 Posts: 5 Credit: 138,356 RAC: 0 |
Task 1112070 has been running for 9hrs 25mins so far... |
Dotsch Joined: 4 Mar 06 Posts: 12 Credit: 13,725 RAC: 0 |
ERROR: NANs occured in hbonding! All my WUs abort with this error on Mac OS X (Intel) 10.5.5 after 20 to 610 seconds of runtime. |
Inais Joined: 30 Jul 06 Posts: 12 Credit: 13,115 RAC: 0 |
Errors on these WUs: 983615 988440 986225 089250 <core_client_version>6.2.19</core_client_version> <![CDATA[ <stderr_txt> ====================================================== DONE :: 1 starting structures 6714.73 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t315__IGNORE_THE_REST_2F6KA_3_5089_1_1_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> WU 987449 ran up to 50.755% progress, then went back to initializing and started again at zero time and zero progress. I wish I can fly like a bird in the sky |
BigMike Joined: 23 Feb 06 Posts: 63 Credit: 58,730 RAC: 0 |
This one died almost immediately: <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: NANs occured in hbonding! ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763 called boinc_finish </stderr_txt> ]]> ==Mike Don't believe everything you think. |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
"This one died almost immediately:" OK, I have come back from my short break and enabled work fetch. Not much has improved on these "hombench" work units, has it? Instead of work units that ran until they were terminated, the new ones die within seconds. I am also getting the same error message as Mike on nearly all work units; see 1120652 1120653 1125933 1126061 1126062 1126064 1126067 1126079 1126147 1126153. I also had 1126041 fail with a validate error. None of my current batch of work units has completed; all have failed. The problems with "hombench" go back over a month now, both here and on Rosetta. What is the problem with them? Has our testing not shown you where the problems lie? Please sort out the problems with this work unit type. Conan. |
Dotsch Joined: 4 Mar 06 Posts: 12 Credit: 13,725 RAC: 0 |
ERROR: NANs occured in hbonding! Same problem on my Windows system. All WUs errored out with the same failure. The Mac finished one WU successfully, but it did not validate. |
Azurrio Joined: 27 Jun 07 Posts: 12 Credit: 8,020 RAC: 0 |
Here are my failures: 1, 2, 3, 4, 5 and 6. All seem to have the same error: <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Funktio ei kelpaa. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: NANs occured in hbonding! ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763 called boinc_finish </stderr_txt> ]]> |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This one ran 42hrs... still on model 1. I ended and restarted until the Watchdog turned it in. hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t328__IGNORE_THE_REST_2CG4A_13_5095_1_0 |
EvoDude Joined: 18 Feb 06 Posts: 28 Credit: 639,833 RAC: 0 |
All of the new series of WUs seem to be failing after a short time (1 to 15 minutes). The affected files all have 'hombench_tex_' at the beginning of their names. This is happening on both of my Vista machines with the latest BOINC client installed. I'm tempted to dump the remaining 50 or so files. |
AdeB Joined: 22 Dec 07 Posts: 61 Credit: 161,367 RAC: 0 |
Five failures: 1127424, 1127480, 1127566, 1128390 and 1122258. All have the name: hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_... and all have the same error: ERROR: NANs occured in hbonding! ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763 called boinc_finish And this one ran too long. stderr out: <core_client_version>5.10.45</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 14400 ********************************************************************** Rosetta is going too long. Watchdog is ending the run! CPU time: 49551.9 seconds. Greater than 3X preferred time: 14400 seconds ********************************************************************** called boinc_finish SIGSEGV: segmentation violation Stack trace (20 frames): [0x89f8027] [0x8a22720] [0xffffe420] [0x887f36b] [0x82f934a] [0x830358d] [0x8749cb6] [0x8945431] [0x874b4a6] [0x874e760] [0x8749400] [0x8066196] [0x8084942] [0x8092d68] [0x808c16e] [0x805e8f8] [0x809795c] [0x804bed3] [0x8a7e21c] [0x8048111] Exiting... </stderr_txt> ]]> |
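The watchdog message in the log above implies a simple rule: the run is ended once CPU time exceeds three times the preferred run time. Here is a minimal Python sketch of that check, assuming the rule is exactly what the message states. The function name and the `factor` parameter are hypothetical and are not taken from the Rosetta source.

```python
def exceeds_watchdog_limit(cpu_seconds: float,
                           preferred_seconds: float,
                           factor: float = 3.0) -> bool:
    """Return True when CPU time is past `factor` times the preferred run time.

    Assumption: the cutoff is a plain multiple of the preference, as the
    log text "Greater than 3X preferred time" suggests.
    """
    return cpu_seconds > factor * preferred_seconds


# Values from the log above: cpu_run_time_pref is 14400 s, CPU time is 49551.9 s.
limit = 3.0 * 14400
print(limit)
print(exceeds_watchdog_limit(49551.9, 14400))
```

With the reported values the limit works out to 43,200 seconds (12 hours), so a CPU time of 49,551.9 seconds is over it, which matches the watchdog ending the run.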
LEONARI Joined: 12 Mar 06 Posts: 5 Credit: 108,342 RAC: 0 |
Rosetta Mini 1.36 Task: -"homebench_tex_cst_looprelax_tex_cst_t315_IGNORE_THE_REST_1GKPA_16_5148_1_0" locks up in the initialisation phase. At this point in time, this task has been running for 03:27:55 without any progress at all! It will now be aborted. My account details are below: - BOINC client version 5.10.45 for windows_intelx86 log flags: task, file_xfer, sched_ops Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3 Data directory: C:\Program Files\BOINC Processor: 1 GenuineIntel Mobile Intel(R) Pentium(R) 4 - M CPU 2.20GHz [x86 Family 15 Model 2 Stepping 7] Processor features: fpu tsc sse mmx OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00) Memory: 511.43 MB physical, 1.21 GB virtual Disk: 17.70 GB total, 2.26 GB free Local time is UTC +1 hours rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 97037; location: home; project prefs: default ralph@home|URL: https://ralph.bakerlab.org/; Computer ID: 1760; location: home; project prefs: default SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 1960189; location: work; project prefs: default General prefs: from http://setiathome.ssl.berkeley.edu/ (last modified 08-Jun-2006 10:33:55) Host location: work General prefs: no separate prefs for work; using your defaults Reading preferences override file Preferences limit memory usage when active to 255.71MB Preferences limit memory usage when idle to 460.29MB Preferences limit disk usage to 2.26GB |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
"This one died almost immediately:" I have had just 1 "hombench_tex" work unit complete OK, but another 62 "hombench_tex" work units have failed: 56 with the error already reported in this thread, and 6 with "validate error". I am aborting all the "hombench_mtyka" work units that I get, as they all go way past my set preferences (6-hour preference, running for over 11). If I let them go, the watchdog kills them and gives a small amount of credit for the effort. These problems affect both Windows and Linux machines. |
AdeB Joined: 22 Dec 07 Posts: 61 Credit: 161,367 RAC: 0 |
At this moment I have a RALPH and a Rosetta task running at the same time on a single-core computer, both using 50% CPU. The RALPH task has the status 'Waiting to run' in BOINC Manager, and it is a hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_...-task, so that is the one I'm going to abort. Something is seriously wrong with these tasks. |
[AF>France>TDM>Centre]Jeannot Le Tazon Joined: 11 Jun 06 Posts: 3 Credit: 1,754 RAC: 0 |
Hello, Some WUs are valid: hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t325__IGNORE_THE_REST_1YIXA_16_5151_1_0 resultid=1123704 <core_client_version>6.2.18</core_client_version> <![CDATA[ <stderr_txt> ====================================================== DONE :: 1 starting structures 4940.01 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> ]]> and some are invalid: resultid=1130000 hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t322__IGNORE_THE_REST_2GVHA_6_5150_1_1 ERROR: NANs occured in hbonding! ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763 |
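The stderr excerpts posted in this thread fall into a few recognisable patterns, so they can be sorted mechanically when comparing many results. Here is a rough Python sketch that buckets a stderr text by the marker strings seen in the posts above. The function name and the category labels are hypothetical, and the markers are only the ones quoted in this thread, not a complete list of what the application can print.

```python
import re


def classify_stderr(stderr_text: str) -> str:
    """Return a coarse category for one task's stderr text.

    The marker strings are copied from logs quoted in this thread;
    "occured" is spelled as the application prints it.
    """
    if "NANs occured in hbonding" in stderr_text:
        return "nan_in_hbonding"
    if "Watchdog is ending the run" in stderr_text:
        return "watchdog_timeout"
    if re.search(r"This process generated \d+ decoys from \d+ attempts", stderr_text):
        return "completed"
    return "unknown"


valid = "DONE :: 1 starting structures 4940.01 cpu seconds This process generated 2 decoys from 2 attempts"
invalid = "ERROR: NANs occured in hbonding! ERROR:: Exit from: hbonds_geom.cc line: 763"
print(classify_stderr(valid))
print(classify_stderr(invalid))
```

The error checks come first on purpose: a log that reports an error should not be counted as completed just because it also contains a decoy summary. Note that a "completed" log can still end in a validate error or a file transfer error, as reported earlier in the thread, so this only describes what the application itself printed.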
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Two hombench_mtyka_foldcst_ tasks with "Maximum disk usage exceeded" and error -177, both on WinXP SP3, client 6.3.14: hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_2APJA_3_5159_1_0 exited after 0 seconds with -177 (0xffffffffffffff4f) <core_client_version>6.3.14</core_client_version> hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_1ZITA_10_5159_1_0 exited after 4972.422 seconds with -177 (0xffffffffffffff4f) <core_client_version>6.3.14</core_client_version> According to dbg output, I'd bet the running thread's got a broken stack... ---- A bunch of hombench_tex_ tasks failed with "Incorrect function. - exit code 1" after 60-800 seconds; WinXP SP3, client 6.3.14 and Linux client 6.2.4: hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t288__IGNORE_THE_REST_1T2MA_5_5137_1_1 hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t328__IGNORE_THE_REST_2CFXA_4_5154_1_0 hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t328__IGNORE_THE_REST_2CFXA_3_5154_1_0 hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t315__IGNORE_THE_REST_1KCXA_17_5148_1_0 hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t293__IGNORE_THE_REST_1VQ1A_2_5139_1_0 Peter |
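The two forms of the exit status shown above, -177 and 0xffffffffffffff4f, are the same number: the hex value is -177 written as a 64-bit two's complement integer. A small Python sketch of the conversion follows. The helper name `to_signed` is hypothetical, and the 64-bit width is an assumption read off the 16 hex digits in the message.

```python
def to_signed(value: int, bits: int = 64) -> int:
    """Interpret an unsigned integer as a two's complement signed value."""
    if value >= 1 << (bits - 1):
        # The top bit is set, so the value represents a negative number.
        return value - (1 << bits)
    return value


print(to_signed(0xffffffffffffff4f))  # -177
```

The same helper works the other way round for a quick sanity check: `-177 & 0xffffffffffffffff` gives back the unsigned form.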
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
I've noticed that my hombench_mtyka_looprelax_ccd_close_looprelax_t286__IGNORE_THE_REST_1BWP__13_5163_1_0 is still running (now at 03:02:29, 76.011%, 01:25:16 to go), although it was meant to be preempted approx. 1:18 hours (or 39 CPU minutes) ago. Linux P-III, 6.2.4 client. Peter Edit: It finished correctly. |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
some are invalid : hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_cst_t325__IGNORE_THE_REST_1DPMA_12_5151_1_1 exited with error 1. <core_client_version>6.3.14</core_client_version> Peter |
HTH Joined: 6 Mar 06 Posts: 9 Credit: 10,226 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=1136498 stderr out <core_client_version>6.3.14</core_client_version> <![CDATA[ <message> Funktio ei kelpaa. (0x1) - exit code 1 (0x1) </message> <stderr_txt> ERROR: NANs occured in hbonding! ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763 called boinc_finish </stderr_txt> ]]> |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
"At this moment I have a RALPH and a Rosetta task running at the same time on a single-core computer, both using 50% CPU." I too have had this same experience, on a four-core machine running 7 BOINC work units: 4 Docking and 3 RALPH. The RALPH work units had "waiting to run" next to them but were still running. I had to stop BOINC and restart it to get just 4 WUs running. This has happened on more than one machine and more than once, always with the 'hombench_tex' name. The new 'hombench_mtyka' work units now seem to be working OK, as none of my latest have gone past my 6-hour preference. |
©2024 University of Washington
http://www.bakerlab.org