minirosetta v1.48-1.51 bug thread

Message boards : RALPH@home bug list : minirosetta v1.48-1.51 bug thread

Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4450 - Posted: 19 Jan 2009, 8:17:50 UTC - in response to Message 4449.  

As a first preliminary report:
This (long-anticipated, yes, I know) new release ...
... is mistakenly dated December 12, 2008 (probably copied from the 1.47 release).

Peter
ID: 4450
sslickerson
Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 4453 - Posted: 19 Jan 2009, 19:27:56 UTC - in response to Message 4452.  

Would this be why the last 26 WU or so have failed on my PC with the following message, or is this something different?

<core_client_version>6.5.0</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: in::file::zip minirosetta_database.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 83
called boinc_finish

</stderr_txt>
]]>
ID: 4453
Paul D. Buck
Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4455 - Posted: 20 Jan 2009, 7:51:17 UTC

I don't know if you like success stories, but I think I have run 4 tasks on OS-X Intel now, and they have all completed successfully.
ID: 4455
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4456 - Posted: 20 Jan 2009, 15:34:38 UTC

8 tasks on Win XP Home SP3 and no errors so far.
I had a few 1-hour runs before I updated the prefs to 4 hrs.
ID: 4456
Paul D. Buck
Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4458 - Posted: 20 Jan 2009, 19:55:04 UTC - in response to Message 4457.  

> Awesome guys! Keep me posted on what you see out there. The error rate so far is looking fabulous.
>
> I'll probably update the app once more today to fix an issue with the symbol store such that we get code traces in cases where it still fails.
>
> Mike :)


I only got one 1.5 task, so that will be all I have to report ... so my latest bug report is that version 1.5 is repelling the creation of new tasks ...
ID: 4458
Ian_D
Joined: 16 Feb 06
Posts: 16
Credit: 39,518
RAC: 0
Message 4459 - Posted: 20 Jan 2009, 20:52:47 UTC

https://ralph.bakerlab.org/result.php?resultid=1250838

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Watchdog active.

ERROR: target_strands.size()
ERROR:: Exit from: ..\..\src\protocols\abinitio\TemplateJumpSetup.cc line: 94
called boinc_finish

</stderr_txt>
]]>

ID: 4459
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4460 - Posted: 20 Jan 2009, 21:42:33 UTC
Last modified: 20 Jan 2009, 21:44:44 UTC

More checkpointing is great! But... this is a bit extreme. My "write to disk at most every" setting is 1800 seconds. My hard drive will never be able to spin down and go into power-saver mode all night long if the checkpoints continue at this pace.


1/20/2009 3:31:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:32:28 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:32:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:33:14 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:33:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:02 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:34:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:42 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:34:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:35:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:35:28 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:36:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:36:17 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:36:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:36:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:37:24 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:37:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:38:04 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:38:21 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:38:43 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:39:02 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
1/20/2009 3:39:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed
1/20/2009 3:39:44 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0 checkpointed
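
To put a number on how often these checkpoints arrive, here is a minimal Python sketch that parses `[checkpoint_debug]` lines in the format shown above and reports the mean gap between checkpoints for each task. The function name `checkpoint_intervals` is hypothetical, and the parser assumes every line follows the `M/D/YYYY H:MM:SS AM|project|[checkpoint_debug] result TASK checkpointed` layout of this excerpt, in chronological order.

```python
from collections import defaultdict
from datetime import datetime


def checkpoint_intervals(lines):
    """Return {task_name: mean seconds between successive checkpoints}.

    Hypothetical helper; assumes chronologically ordered lines shaped like:
    'M/D/YYYY H:MM:SS AM|project|[checkpoint_debug] result TASK checkpointed'
    """
    stamps = defaultdict(list)
    for line in lines:
        if "[checkpoint_debug]" not in line:
            continue  # skip unrelated client messages
        when, _project, message = line.split("|", 2)
        t = datetime.strptime(when.strip(), "%m/%d/%Y %I:%M:%S %p")
        # message words: "[checkpoint_debug]", "result", TASK, "checkpointed"
        task = message.split()[2]
        stamps[task].append(t)

    means = {}
    for task, times in stamps.items():
        if len(times) >= 2:  # need at least one gap to average
            span = (times[-1] - times[0]).total_seconds()
            means[task] = span / (len(times) - 1)
    return means


if __name__ == "__main__":
    log = [
        "1/20/2009 3:31:58 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed",
        "1/20/2009 3:32:39 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed",
        "1/20/2009 3:33:22 PM|ralph@home|[checkpoint_debug] result test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0 checkpointed",
    ]
    for task, gap in checkpoint_intervals(log).items():
        print(f"{task}: {gap:.1f} s between checkpoints")
```

Fed the full excerpt above, this works out to roughly 40 s between checkpoints for the t332 task and roughly 44 s for the t342 task, against the 1800 s preference.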

ID: 4460
Paul D. Buck
Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4463 - Posted: 20 Jan 2009, 23:11:44 UTC

Need more work ...

I have run two tasks with 1.50 and both were successful ... of course, the Mac application has been stable for me, even the awful 1.47, which really farbled up my XP machines ... well, I'll be doing my part ... :)
ID: 4463
Paul D. Buck
Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4465 - Posted: 21 Jan 2009, 5:31:15 UTC - in response to Message 4464.  

> as you wish ...


Well, it is the only defect I have found so far on OS-X ... I can't get work ... :)

Of course, 1.47 works well on OS-X: no hung tasks, no long-running tasks ... no illegal functions ... so ... well, I will be looking to add a Windows machine for the next drop ...

Anyway, got three more tasks ... thanks ...
ID: 4465
HA-SOFT, s.r.o.
Joined: 19 Jan 09
Posts: 6
Credit: 19,644
RAC: 0
Message 4466 - Posted: 21 Jan 2009, 9:10:21 UTC

I still have problems on my new W2008 x64 server.
Every task of minirosetta 1.5 hangs at startup using 3 MB of memory, with this stdout:

[2009- 1-21 9:52:36:] :: BOINC :: boinc_init()
Created shared memory segment

These tasks hang and I have to kill them from the taskbar. After I kill them, stderr is:


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x778806CF read attempt to address 0x00000004

Engaging BOINC Windows Runtime Debugger...
ID: 4466
Paul D. Buck
Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4467 - Posted: 21 Jan 2009, 9:35:58 UTC

First-ever task error on OS-X ... I got this error:


<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Watchdog active.

ERROR: target_strands.size()
ERROR:: Exit from: src/protocols/abinitio/TemplateJumpSetup.cc line: 94
called boinc_finish

</stderr_txt>


This seems to be the same error reported above (Message 4459) ...
ID: 4467
sslickerson
Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 4468 - Posted: 21 Jan 2009, 13:54:50 UTC

My Windows Vista 64 laptop has received about 8 WU and all have completed successfully without error so this looks good, hopefully I will be able to attach to Rosetta soon!


ID: 4468
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4469 - Posted: 21 Jan 2009, 14:35:22 UTC

Every task given to me so far has completed OK.
I see there are no more jobs in the queue, so I take it this test is coming to an end?
ID: 4469
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4470 - Posted: 21 Jan 2009, 15:55:30 UTC - in response to Message 4469.  

> so i take it this test is coming to an end?


Don't rush it! There were still 1000 failures and 3000 successes. I'm not sure when the last of the DB packaging problems were cleared. But you've barely given a 24hr runtime any time to complete.

Greg, typically they release a few tasks; if there are no clear problems like the DB packaging and successes start coming in, they release a few thousand tasks over the course of a day or so. Then, when they've made the final adjustments, explained most of the reported errors and confirmed the results, they do a final push of 10,000+ tasks (again over the course of a day or two) to really seek out those rare and intermittent problems. THEN they send it over to Rosetta.
ID: 4470
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4471 - Posted: 21 Jan 2009, 16:19:36 UTC - in response to Message 4470.  


OK, thanks for the explanation. I just wasn't sure what to expect, since this is my first time on RALPH.
ID: 4471
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 4472 - Posted: 21 Jan 2009, 17:33:55 UTC

test_cc_1_8_nocst4_hb_t367__IGNORE_THE_REST_1WOLA_4_6830_2_0 ended with the same rare error as Ian_D and Paul D. Buck.

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Watchdog active.
# cpu_run_time_pref: 14400

ERROR: target_strands.size()
ERROR:: Exit from: src/protocols/abinitio/TemplateJumpSetup.cc line: 94
called boinc_finish

Snags

ID: 4472
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4474 - Posted: 21 Jan 2009, 22:14:45 UTC - in response to Message 4462.  

> Hmm ok, i'll look into this.


Ya, this afternoon I've been running two tasks for about 4.25 hrs and I've got 450 "checkpointed" messages in my Messages tab, sometimes showing two checkpoints on the same task within the same second.

Task names:
test_cc_1_8_nocst4_hb_t342__IGNORE_THE_REST_2G0QA_13_6824_1_0
test_cc_1_8_nocst4_hb_t332__IGNORE_THE_REST_1X7OA_6_6823_1_0
(both are running v1.50)
ID: 4474
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4475 - Posted: 22 Jan 2009, 1:00:19 UTC - in response to Message 4474.  
Last modified: 22 Jan 2009, 1:02:22 UTC


Interesting how you guys are getting checkpoint messages; I don't see them in my BOINC Manager. Is that due to me using version 6 and you using version 5?
ID: 4475
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4476 - Posted: 22 Jan 2009, 5:09:57 UTC

Greg, this is one of the BOINC debug messages. You get them by setting up a cc_config.xml file. In this case you need "checkpoint_debug" set to 1. You also need at least the first three flags set to 1.
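
A minimal Python sketch that produces such a cc_config.xml. It assumes that "the first three" means the `task`, `file_xfer` and `sched_ops` log flags (the ones the client normally has on); the helper name `build_cc_config` is hypothetical. The printed XML would be saved as cc_config.xml in the BOINC data directory, where the client reads it at startup.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Hypothetical helper: build a cc_config.xml body enabling <log_flags>."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"  # 1 = flag on
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumption: "the first three" flags are task, file_xfer and sched_ops.
    print(build_cc_config(["task", "file_xfer", "sched_ops", "checkpoint_debug"]))
```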

Otherwise, the checkpoints are pretty transparent. But in my case, my BOINC data directory is on my second drive, and it cannot spin down when idle now, because it is never idle. Normally the drive is set to spin down when not in use; then every 15-30 minutes BOINC kicks in and wants to write something, so the drive spins up to do so and goes back to sleep. The timer that makes the drive sleep is longer than the time between checkpoints with this new app.

It really should be honoring the BOINC "write to disk at most every" setting. I'm not clear why, but many projects do not honor that setting; they take their checkpoints whenever they are able, regardless. ...which was fine, until they started checkpointing every 2 minutes :)
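
What honoring the preference could look like, as a minimal Python sketch: the application only writes a checkpoint once the user's "write to disk at most every" interval has elapsed since the previous write. The class and method names are hypothetical; a real BOINC application written in C/C++ would instead ask the client library via `boinc_time_to_checkpoint()` and report `boinc_checkpoint_completed()` after saving its state.

```python
import time


class CheckpointThrottle:
    """Hypothetical helper: permit a checkpoint only after `min_interval`
    seconds (the user's "write to disk at most every" value) have passed."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        # Simplifying assumption: treat start-up as the most recent write.
        self.last_write = clock()

    def time_to_checkpoint(self):
        return self.clock() - self.last_write >= self.min_interval

    def checkpoint_completed(self):
        self.last_write = self.clock()


if __name__ == "__main__":
    fake_now = [0.0]
    throttle = CheckpointThrottle(1800, clock=lambda: fake_now[0])
    writes = 0
    # Simulate a worker that *could* checkpoint every 40 s for 2 hours.
    for step in range(1, 181):
        fake_now[0] = step * 40.0
        if throttle.time_to_checkpoint():
            writes += 1  # a real app would save its state here
            throttle.checkpoint_completed()
    print(f"{writes} checkpoints written in 2 hours")
```

With an 1800 s preference, the demo writes 4 checkpoints out of 180 opportunities. The injectable `clock` is there only so the behaviour can be checked without waiting in real time.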
ID: 4476
Evan
Joined: 23 Dec 07
Posts: 75
Credit: 69,584
RAC: 0
Message 4477 - Posted: 22 Jan 2009, 9:28:05 UTC

> I don't see that in my boinc manager


You can find it in your slots directory (v6.2.19): look for the boinc_checkpoint_count file. Open it and you will see how many checkpoints you have, along with a list of the checkpoints.

My work unit has been running for about 34 minutes and I have accumulated 43 checkpoints.

ID: 4477



©2024 University of Washington
http://www.bakerlab.org