Bug report for Rosetta version 5.97

Message boards : RALPH@home bug list : Bug report for Rosetta version 5.97

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 4126 - Posted: 21 Jun 2008, 3:24:47 UTC

Please post any 5.97 bugs here. This version is identical to 5.96 but includes a fix for a rare infinite loop in the BOINC API ("the dreaded t405 issue").
Jipsu
Joined: 11 Mar 08
Posts: 26
Credit: 76,448
RAC: 0
Message 4128 - Posted: 21 Jun 2008, 12:18:55 UTC

This stderr might be interesting, although the task still succeeded.

https://ralph.bakerlab.org/result.php?resultid=1045195
John Hunt
Joined: 16 Mar 07
Posts: 10
Credit: 28,654
RAC: 0
Message 4129 - Posted: 21 Jun 2008, 12:29:22 UTC

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4131 - Posted: 21 Jun 2008, 13:55:03 UTC

The old "t405" problem is still rearing its ugly head; see this result:

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
# cpu_run_time_pref: 21600
# random seed: 1176122
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period
======================================================
DONE :: 1 starting structures 20983.1 cpu seconds
This process generated 11 decoys from 11 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish
*** glibc detected *** free(): invalid next size (normal): 0x0959bb58 ***
SIGABRT: abort called
Stack trace (19 frames):
[0x8e1b51f]
[0x8e15eac]
[0xb7f6b420]
[0x8e87104]
[0x8e9c06f]
[0x8ea10d5]
[0x8ea13b3]
[0x8e71d61]
[0x8e73789]
[0x8070d59]
[0x87cdb6a]
[0x8e8764f]
[0x8e17abc]
[0x8e17bc7]
[0x86290f2]
[0x8768b46]
[0x8768c66]
[0x8e80044]
[0x8048111]

Exiting...

</stderr_txt>
]]>

Validate state Invalid
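For reference, the exit-code line in the log above reports one value three ways: 193 in decimal, 0xc1 in hex, and -63, which is what 193 becomes when read as a signed 8-bit number. The Python sketch below reproduces that arithmetic; `signed_views` is a hypothetical helper for illustration only, not part of BOINC or Rosetta.

```python
def signed_views(code: int, bits: int) -> tuple[int, str, int]:
    """Return (unsigned value, hex string, two's-complement signed value)
    for an integer code interpreted at the given bit width."""
    mask = (1 << bits) - 1
    unsigned = code & mask  # wrap the code into the unsigned range
    # If the top bit is set, the signed reading is negative.
    signed = unsigned - (1 << bits) if unsigned >> (bits - 1) else unsigned
    return unsigned, hex(unsigned), signed


print(signed_views(193, 8))    # (193, '0xc1', -63)
print(signed_views(-177, 64))  # (18446744073709551439, '0xffffffffffffff4f', -177)
```

With a 64-bit width, the same helper gives the hex form in which negative codes such as -177 are printed later in this thread.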
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4132 - Posted: 21 Jun 2008, 14:58:27 UTC
Last modified: 21 Jun 2008, 15:00:55 UTC

Linux, BOINC 6.2.4, t419N_autoalign_IGNORE_THE_REST_renumbered_4388_2_1: "Maximum disk usage exceeded", plus many repetitions of
Unrecognized XML in parse_init_data_file: fraction_done_update_period
Skipping: 1.000000
Skipping: /fraction_done_update_period

in the stderr_txt.

Peter
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 4133 - Posted: 21 Jun 2008, 16:52:03 UTC

You can ignore the unrecognized XML parse output in stderr.txt. There is still a possibility of getting access violations with the t405 jobs. We have to track down and fix the cause. What we are really concerned about is fixing the client stalling issue. This is a really bad problem since without any information sent back from the client, we can't determine the status of tasks and obviously a stalled client is a very bad situation.
David Emigh
Joined: 6 Jan 08
Posts: 27
Credit: 26,482
RAC: 0
Message 4137 - Posted: 22 Jun 2008, 13:56:07 UTC
Last modified: 22 Jun 2008, 13:59:38 UTC

This one threw a "Compute Error" at 0 seconds:

t419N_autoalign_IGNORE_THE_REST_renumbered_4293_3_1

The error message was:

Maximum disk usage exceeded

RALPH and Rosie sittin' in a tree,
F - O - L - D - I - N - G!
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4139 - Posted: 23 Jun 2008, 21:01:28 UTC - in response to Message 4133.  

> There is still a possibility of getting access violations with the t405 jobs. We have to track down and fix the cause. What we are really concerned about is fixing the client stalling issue. This is a really bad problem since without any information sent back from the client, we can't determine the status of tasks and obviously a stalled client is a very bad situation.

If my tasks stall again (as some have in the past), which files from my slot/project folders would be useful to you?

Peter
Odysseus
Joined: 4 May 07
Posts: 23
Credit: 16,331
RAC: 0
Message 4141 - Posted: 24 Jun 2008, 10:44:30 UTC

My G5 got a computation error on t419N_autoalign_IGNORE_THE_REST_renumbered_4320_3:
exit status 1 (0x1), “Unable to determine sequence length from pdb file”.

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4142 - Posted: 24 Jun 2008, 11:53:39 UTC

Another "t405" WU errored out; see this WU. It has the same error as the one I reported previously.
Path7
Joined: 11 Feb 08
Posts: 56
Credit: 4,974
RAC: 0
Message 4143 - Posted: 24 Jun 2008, 19:05:54 UTC
Last modified: 24 Jun 2008, 19:20:33 UTC

Hello all,
Running Ubuntu 7.10 x86, the following WU failed with error -177 (0xffffffffffffff4f):
t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2
The stderr output reports "ERROR:: Unable to determine sequence length from pdb file",
while the BOINC messages show:
di 24 jun 2008 20:22:51 CEST|ralph@home|Aborting task t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2: exceeded disk limit: 569.28MB > 476.84MB
di 24 jun 2008 20:22:52 CEST|ralph@home|Computation for task t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2 finished
di 24 jun 2008 20:22:52 CEST|ralph@home|Output file t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2_0 for task t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2 absent
Note: "di" is Dutch shorthand for Tuesday (dinsdag).

Edit: BOINC disk usage is limited to 2.16 GB.

Have a nice day,
Path7.
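For reference, the two figures in the abort message are consistent with BOINC's "MB" meaning 2^20 bytes: 476.84 is what 500,000,000 bytes comes to in those units. That would make it a per-task limit, separate from the 2.16 GB overall BOINC disk limit mentioned in the edit. The 500,000,000-byte value is an inference from the arithmetic, not a figure confirmed by the project; the sketch below only checks that the numbers line up.

```python
MIB = 1 << 20  # assumption: the "MB" in BOINC's abort message is 2**20 bytes


def bytes_to_mb(n_bytes: float) -> float:
    """Convert a byte count to the MB units used in the abort message."""
    return n_bytes / MIB


# Hypothetical per-task limit, inferred from the "476.84MB" in the log.
task_limit_bytes = 500_000_000
print(f"{bytes_to_mb(task_limit_bytes):.2f}MB")  # 476.84MB

# Usage reported in the log, converted back to bytes for comparison.
used_bytes = 569.28 * MIB
print(used_bytes > task_limit_bytes)  # True: the task exceeded its own limit
```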
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4144 - Posted: 24 Jun 2008, 21:35:08 UTC - in response to Message 4143.  

> t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2
> di 24 jun 2008 20:22:51 CEST|ralph@home|Aborting task t419N_autoalign_IGNORE_THE_REST_renumbered_4285_6_2: exceeded disk limit: 569.28MB > 476.84MB

t419N_autoalign_IGNORE_THE_REST_renumbered_4406_8_1 also failed with "Maximum disk usage exceeded". The same happened to all three wingmen (Mac, Linux, Vista).

Peter




©2024 University of Washington
http://www.bakerlab.org