Bug reports for version 5.93

Message boards : RALPH@home bug list : Bug reports for version 5.93

Ingemar
Volunteer moderator
Project developer
Project scientist

Joined: 7 Mar 07
Posts: 9
Credit: 76
RAC: 0
Message 3593 - Posted: 4 Jan 2008, 6:25:19 UTC
Last modified: 4 Jan 2008, 6:30:35 UTC

Please report any weird behavior of rosetta version 5.93!
ID: 3593
Dr Who Fan
Joined: 2 Sep 06
Posts: 76
Credit: 107,857
RAC: 0
Message 3599 - Posted: 12 Jan 2008, 9:06:49 UTC
Last modified: 12 Jan 2008, 9:14:59 UTC

This Work Unit 649494 exited with a "161" error:

Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 10396
CPU time 6315.28125

stderr out
<core_client_version>6.1.0</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1553865
======================================================
DONE :: 1 starting structures 6315.06 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_27_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
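
For anyone collecting these reports, the failing output file and its error code can be pulled out of a pasted result with a few lines of Python. This is only a minimal sketch: `find_xfer_errors` is a hypothetical helper, not part of BOINC or Rosetta, and it assumes the `<file_xfer_error>` block looks like the one pasted above.

```python
import re

# Hypothetical helper (not part of BOINC): find <file_xfer_error> blocks in
# pasted result text and return the file name and numeric error code.
XFER_RE = re.compile(
    r"<file_xfer_error>\s*<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*</file_xfer_error>"
)

def find_xfer_errors(text):
    """Return a list of (file_name, error_code) tuples found in text."""
    return [(m.group("name"), int(m.group("code"))) for m in XFER_RE.finditer(text)]

# Sample copied from the result above.
sample = """<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_27_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>"""

errors = find_xfer_errors(sample)
print(errors)
```

The same pattern should also match the single-line form that shows up in the BOINC Manager message log, since it only requires optional whitespace between the tags.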
ID: 3599
Dr Who Fan
Joined: 2 Sep 06
Posts: 76
Credit: 107,857
RAC: 0
Message 3600 - Posted: 12 Jan 2008, 9:14:18 UTC

This Work Unit 649484 exited with a "161" error for me and my wingman.
Details below from my result id:

Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 4500
Report deadline 16 Jan 2008 1:09:00 UTC
CPU time 7364.203125
stderr out

<core_client_version>6.1.0</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1553875
======================================================
DONE :: 1 starting structures 7363.47 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_17_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
ID: 3600
Dr Who Fan
Joined: 2 Sep 06
Posts: 76
Credit: 107,857
RAC: 0
Message 3608 - Posted: 14 Jan 2008, 8:02:11 UTC

Another 161 error to report:

https://ralph.bakerlab.org/result.php?resultid=732877

Exit status 0 (0x0)
Computer ID 4500
Report deadline 16 Jan 2008 1:09:00 UTC
CPU time 7364.203125
stderr out

<core_client_version>6.1.0</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1553875
======================================================
DONE :: 1 starting structures 7363.47 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_17_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 18.0472218006911

ID: 3608
Dr Who Fan
Joined: 2 Sep 06
Posts: 76
Credit: 107,857
RAC: 0
Message 3609 - Posted: 14 Jan 2008, 8:05:31 UTC

Another 161 error to report:

https://ralph.bakerlab.org/result.php?resultid=732906

Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 10396
Report deadline 16 Jan 2008 3:04:44 UTC
CPU time 6187.515625
stderr out

<core_client_version>6.1.0</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1553752
======================================================
DONE :: 1 starting structures 6186.97 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2892_40_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 16.9298421717622
Granted credit 0
application version 5.93

ID: 3609
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 3612 - Posted: 14 Jan 2008, 11:12:48 UTC

Another "161" error for trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_53_1

workunit 649520 has now been sent to a third cruncher


<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 36000
# random seed: 1553839
# cpu_run_time_pref: 36000
======================================================
DONE :: 1 starting structures 35646.5 cpu seconds
This process generated 6 decoys from 6 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_53_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


ID: 3612
BigMike
Joined: 23 Feb 06
Posts: 63
Credit: 58,730
RAC: 0
Message 3629 - Posted: 16 Jan 2008, 3:35:26 UTC

Wow ... that didn't take long...

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
ERROR:: Unable to determine sequence length from pdb file
ERROR:: Exit from: .pose.cc line: 1983

</stderr_txt>
]]>


Don't believe everything you think.
ID: 3629
Eric Ogletree
Joined: 27 Aug 07
Posts: 1
Credit: 24,361
RAC: 0
Message 3638 - Posted: 16 Jan 2008, 15:45:54 UTC

Got four of them here. Hope it helps. :)

16/01/2008 1:27:23 AM|ralph@home|Reason: Unrecoverable error for result mini_-1a32_-test_2898_200_0 (<file_xfer_error> <file_name>mini_-1a32_-test_2898_200_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)

16/01/2008 5:34:01 AM|ralph@home|Task mini_-1a32_-test_2898_193_0 exited with zero status but no 'finished' file

16/01/2008 5:57:55 AM|ralph@home|Task mini_-1a32_-test_2898_206_0 exited with zero status but no 'finished' file

16/01/2008 8:36:34 AM|ralph@home|Reason: Unrecoverable error for result mini_-1a32_-test_2898_193_0 (<file_xfer_error> <file_name>mini_-1a32_-test_2898_193_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
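
Message-log lines like these can be grouped by task to see which tasks logged more than one event. The sketch below assumes the `timestamp|project|message` layout seen in the four lines above; `split_log_line` and `count_events_per_task` are made-up names for illustration, not BOINC functions.

```python
import re
from collections import Counter

# Assumed layout, taken from the pasted lines: "timestamp|project|message".
# The task name is assumed to follow the word "result" or "Task".
TASK_RE = re.compile(r"(?:result|Task) (\S+)")

def split_log_line(line):
    """Split one log line into (timestamp, project, message)."""
    timestamp, project, message = line.split("|", 2)
    return timestamp.strip(), project.strip(), message.strip()

def count_events_per_task(lines):
    """Count how many log lines mention each task name."""
    counts = Counter()
    for line in lines:
        _, _, message = split_log_line(line)
        match = TASK_RE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts

log_lines = [
    "16/01/2008 1:27:23 AM|ralph@home|Reason: Unrecoverable error for result mini_-1a32_-test_2898_200_0 (<file_xfer_error> <file_name>mini_-1a32_-test_2898_200_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)",
    "16/01/2008 5:34:01 AM|ralph@home|Task mini_-1a32_-test_2898_193_0 exited with zero status but no 'finished' file",
    "16/01/2008 5:57:55 AM|ralph@home|Task mini_-1a32_-test_2898_206_0 exited with zero status but no 'finished' file",
    "16/01/2008 8:36:34 AM|ralph@home|Reason: Unrecoverable error for result mini_-1a32_-test_2898_193_0 (<file_xfer_error> <file_name>mini_-1a32_-test_2898_193_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)",
]

counts = count_events_per_task(log_lines)
for task, n in counts.items():
    print(task, n)
```

In the lines above, task `mini_-1a32_-test_2898_193_0` appears twice: once for exiting without a 'finished' file and once for the -161 error.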
ID: 3638
ramostol
Joined: 29 Mar 07
Posts: 24
Credit: 31,121
RAC: 0
Message 3652 - Posted: 20 Jan 2008, 11:17:06 UTC

You probably know this now, but anyhow:

(Some?) trunc_solit WUs seem unable to create proper output files.

3 invalid results for trunc_solit_BOINC_ABRELAX_-trunc_solit-_2934_25
ID: 3652
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3653 - Posted: 20 Jan 2008, 14:38:18 UTC

Getting the same error as some others here with WU type 'trunc_solit'

WU 732726
WU 732758
WU 732769
WU 732802
WU 732803
WU 733276
WU 736191

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
# random seed: 1553847
======================================================
DONE :: 1 starting structures 21007.6 cpu seconds
This process generated 30 decoys from 30 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>trunc_solit_BOINC_ABRELAX_-trunc_solit-_2891_45_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

ID: 3653
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3660 - Posted: 21 Jan 2008, 10:56:12 UTC

WU 736639

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
ERROR:: Unable to obtain total_residue & sequence.
start pdb file must be provided.
ERROR:: Exit from: input_pdb.cc line: 2968
# cpu_run_time_pref: 21600

ID: 3660
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3664 - Posted: 22 Jan 2008, 21:22:01 UTC

I have a WU running at the moment (1 of 4, but I'm not sure exactly which one) that is behaving very strangely.
I noticed this morning that I had a Ralph WU that had completed at 100% after 17:29:51 but was still showing as running at High Priority.

Suspending and resuming made no difference, so I stopped BOINC Manager and restarted it.

The WU appeared to have gone, but on checking further I found that it had gone back to a process time of 4 hours 12 minutes and was running normally again, though still at High Priority.

Is this normal for these 2h4o_BOINC_TWIST type work units?
ID: 3664
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3666 - Posted: 23 Jan 2008, 14:10:36 UTC - in response to Message 3664.  

I suspect that the WU resets and starts again, so I possibly lost up to 17 hours of processing time.
Of the 4 I received, 1 has now completed normally without an indication of problems.
2 more are now up to 15 and 16 hours at 98.5% with 9 minutes 56 seconds left on both. One has switched to let another project run but the other is running at High Priority.
ID: 3666
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3667 - Posted: 23 Jan 2008, 22:57:59 UTC - in response to Message 3666.  

OK, WU 736936 finished without error and within the normal 6 hour preference range. I believe that this is the WU that got to 17:29:51 and then went back to normal after I restarted BOINC Manager, but I can't prove that; it could have been one of the following WUs.

WU 736937 ran for 16:24:27 (59067.94 seconds) and then returned a computation error. That was a lot of wasted effort. Here is the error output:

59067.941307
stderr out

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
# random seed: 1551605
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 16.1773 for 900 seconds
**********************************************************************
GZIP SILENT FILE: ./xx2h4o.out
*** glibc detected *** corrupted double-linked list: 0xae7e1098 ***
SIGABRT: abort called
Stack trace (14 frames):
[0x8da3037]
[0x8d9de2c]
[0xb7f8c420]
[0x8e0e444]
[0x8e2330f]
[0x8e28532]
[0x8e28653]
[0x8e0e9b4]
[0x8d9fab7]
[0x8d9ff27]
[0x8d2023d]
[0x8d20f35]
[0x8d9a0c5]
[0x8e3aa1a]

Exiting...
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0x8da3037]
[0x8d9de2c]
[0xb7f8c420]
[0x8cad54d]
[0x8c11820]
[0x8c14e33]
[0x804c7c2]
[0x8a835ed]
[0x8a8586f]
[0x89363de]
[0x89380e3]
[0x893ba27]
[0x898ad7a]
[0x85e96d6]
[0x87289d2]
[0x8728af2]
[0x8e07384]
[0x8048111]

Exiting...
FILE_LOCK::unlock(): close failed.: Bad file descriptor
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0x8da3037]
[0x8d9de2c]
[0xb7f00420]
[0x8cad54d]
[0x8c11820]
[0x8c14e33]
[0x804c7c2]
[0x8a835ed]
[0x8a8586f]
[0x89363de]
[0x8938119]
[0x893ba27]
[0x898ad7a]
[0x85e96d6]
[0x87289d2]
[0x8728af2]
[0x8e07384]
[0x8048111]

Exiting...

WU 736938 ran for 21:48:59 (78,539.06 seconds) and was validated, but returned a very poor credit amount for such a long process time.

Both of the last two WUs were stopped by the Watchdog for being stuck.
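
To compare the run times quoted in this post with the `cpu_run_time_pref` of 21600 seconds shown in the stderr output, the hh:mm:ss values can be converted to seconds. A minimal sketch, where `hms_to_seconds` is just an illustrative helper name:

```python
def hms_to_seconds(hms):
    """Convert an 'hh:mm:ss' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

PREFERRED_SECONDS = 21600  # cpu_run_time_pref from the stderr output (6 hours)

for wu, run_time in [("736937", "16:24:27"), ("736938", "21:48:59")]:
    seconds = hms_to_seconds(run_time)
    ratio = seconds / PREFERRED_SECONDS
    print(f"WU {wu}: {seconds} s, {ratio:.2f}x the preferred run time")
```

The whole-second results (59067 and 78539) agree with the CPU times reported above once the fractional part is dropped.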
ID: 3667
RAD-Poland
Joined: 6 Apr 07
Posts: 6
Credit: 100,029
RAC: 0
Message 3668 - Posted: 24 Jan 2008, 10:23:44 UTC
Last modified: 24 Jan 2008, 10:25:03 UTC

Workunit 652259

<core_client_version>5.10.10</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 3600
# random seed: 1551090
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
CPU time: 17762 seconds. Greater than 4X preferred time: 3600 seconds
**********************************************************************
GZIP SILENT FILE: ./xxgp04.out
SIGSEGV: segmentation violation
Stack trace (25 frames):
[0x8da3037]
...

Validate state Invalid
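
The watchdog message in this result reads as a simple threshold test: CPU time greater than 4 times the preferred run time. The sketch below only restates that message as code; it is not taken from the Rosetta source, and `exceeds_cpu_limit` and the default `factor=4` are assumptions based on the log text alone.

```python
def exceeds_cpu_limit(cpu_seconds, preferred_seconds, factor=4):
    """Return True when CPU time is greater than factor * preferred time.

    The factor of 4 is read off the log line "Greater than 4X preferred
    time"; the real watchdog may apply other conditions as well.
    """
    return cpu_seconds > factor * preferred_seconds

# Values from this result: 17762 s of CPU time, 3600 s preferred.
print(exceeds_cpu_limit(17762, 3600))

# Values from WU 736937 earlier in the thread: 59067.94 s, 21600 s preferred.
print(exceeds_cpu_limit(59067.94, 21600))
```

The first case is over the limit (17762 > 14400). The second is not (59067.94 < 86400), which fits the earlier log, where the watchdog instead reported the run as stuck at one score for 900 seconds.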
ID: 3668
Basilaris
Joined: 16 Feb 06
Posts: 2
Credit: 10,006
RAC: 0
Message 3669 - Posted: 24 Jan 2008, 18:55:46 UTC

2h4o_Boinc_Twist_Angle_Symm_Fold_and_Dock-2h4o_-native__2970_18_0 stopped making progress at Model 2, Step: 34817, RMSDE 1.187E+004, Energy: -68.98463. Time and percent complete kept advancing, but nothing happened. After restarting it was the same: it went up to step 34817 and stopped. The graphics were faulty too.
ID: 3669
Keith T.
Joined: 4 May 07
Posts: 13
Credit: 10,923
RAC: 0
Message 3672 - Posted: 25 Jan 2008, 23:47:35 UTC

https://ralph.bakerlab.org/workunit.php?wuid=651754 2h4o__BOINC_TWIST_RINGS_TWIST_ANGLE_SYMM_FOLD_AND_DOCK-2h4o_-native__2970_79 got stuck on the 9th decoy for over an hour at least twice.

I eventually changed the CPU run time down to 4 hours from 8 to get the WU to finish before its deadline. I did try exiting BOINC a few times as well. The WU was stuck on the 9th decoy and restarted the same one at least twice.

Keith
ID: 3672
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3684 - Posted: 4 Feb 2008, 5:22:35 UTC

Just finished this one.
It took over 24 hours before the watchdog stopped it. Should have claimed 300 credits but was granted 80, for less than 3.5 cr/h. Pretty miserable.

So work units are still not adhering to preferences. The bug is not fixed.
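
The credit rate quoted here can be checked with a quick calculation. This sketch uses only the figures from this post (80 granted, 300 expected, a run of just over 24 hours); `credits_per_hour` is an illustrative name, and because the run took more than 24 hours the results are upper bounds.

```python
def credits_per_hour(credits, hours):
    """Average credit earned per hour of run time."""
    return credits / hours

RUN_HOURS = 24  # the post says "over 24 hours", so real rates are a bit lower

granted_rate = credits_per_hour(80, RUN_HOURS)
expected_rate = credits_per_hour(300, RUN_HOURS)
print(f"granted:  {granted_rate:.2f} cr/h")
print(f"expected: {expected_rate:.2f} cr/h")
```

At exactly 24 hours the granted rate comes to about 3.33 cr/h, consistent with the "less than 3.5 cr/h" figure, against 12.5 cr/h for the expected claim.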
ID: 3684
[B^S] JoeB@Ky
Joined: 11 Oct 06
Posts: 8
Credit: 39,098
RAC: 0
Message 3703 - Posted: 10 Feb 2008, 5:35:08 UTC

I had 2 WUs load on my 2.13 GHz C2D with about a 1 hr run time. Both were stuck at ~84.3/84.4% after running 1:42/1:46 hrs. I let them stay that way for an additional ~2.25 hours before aborting them yesterday PM. No such problems on my 3.4 GHz P4 w/HT; the 2 WUs on it now loaded at ~2.0 hr run time, and after 1:07:04 run time the first one is 86.7% done, with no freeze-up.
I just downloaded the code file listed in the news bulletin on the BOINC Synergy web site and put it in the Ralph project folder on the C2D box. I noticed at that time that there was a similar file named "minirosetta_1.03_windows_intelx86" dated 1-15-08, but it didn't have the .pbd file extension on the end of it. The RALPH directory on my P4 box already has the current 1.07 code file with the .pbd extension. That might be why it wasn't working right on the C2D box!
ID: 3703
quimillo
Joined: 14 Feb 08
Posts: 4
Credit: 10,604
RAC: 0
Message 3741 - Posted: 14 Feb 2008, 21:10:24 UTC

task tol5__BOINC_SYMM_FOLD_AND_DOCK_RELAX_ONLY-tol5_-lowres_dock_-dock_3218__3305_1_0
using rosetta_beta version 593

CPU time stopped at: 04:38:41

Progress: 100%

Status: Running, high priority

BOINC client version 5.10.28 for i686-pc-linux-gnu

What should I do?
ID: 3741