Minirosetta 1.95

Message boards : RALPH@home bug list : Minirosetta 1.95

AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4927 - Posted: 22 Aug 2009, 9:50:58 UTC

<message>
Maximum memory exceeded
</message>

in task 1578464

AdeB
ID: 4927
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4928 - Posted: 22 Aug 2009, 14:46:55 UTC

ID: 4928
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 4930 - Posted: 25 Aug 2009, 8:36:05 UTC

https://ralph.bakerlab.org/workunit.php?wuid=1397739
I killed this WU (and 5 others) because after 36 minutes it was still at 0% (initializing).
RAM usage? Over 500 MB!
ID: 4930
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 4931 - Posted: 25 Aug 2009, 9:07:41 UTC
Last modified: 25 Aug 2009, 9:09:00 UTC

242l_A_50_I_ddg_predictions_82409_001_MUT.242l_A_50_I_.out_12117_1_0

According to the graphics window, this one was still initializing after 52 minutes. A minute or so after I closed the window, it uploaded and now reports that it completed one decoy successfully. As my target runtime is 4 hours, I suspect that it is not simply a matter of an error in the graphics. The question, then, is whether it would be just as well to abort WUs that don't initialize within x minutes and report them here, as stefanob has done, or whether there is valuable information in the 4.41 MB upload that would be lost if the WU is not allowed to end on its own.


Snags
ID: 4931
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 4932 - Posted: 25 Aug 2009, 13:15:29 UTC - in response to Message 4931.  


The question, then, is whether it would be just as well to abort WUs that don't initialize within x minutes and report them here, as stefanob has done, or whether there is valuable information in the 4.41 MB upload that would be lost if the WU is not allowed to end on its own.
Snags


I killed the WU because I could not work on my PC (over 1 GB of RAM)!
I think debugging is important, but this is too much for my poor notebook.
I am going back to Rosetta 1.97 (90 MB per WU).
ID: 4932
AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4933 - Posted: 25 Aug 2009, 19:32:12 UTC

I had to kill this task. It made other applications crash and claimed almost all of the memory (including virtual memory).

AdeB
ID: 4933
Nflight
Joined: 1 Nov 07
Posts: 5
Credit: 36,103
RAC: 0
Message 4934 - Posted: 26 Aug 2009, 0:13:24 UTC

Work Unit: 1aye D 35 a ddg predictions 82009 003 MUT D 35 A .out 12105 1 1

Taking up 563 K of RAM

Now going on 16 hours.

Ending it now; it locks up my system from time to time!

Work Unit 1395454


ID: 4934
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 4935 - Posted: 26 Aug 2009, 2:07:53 UTC

238l_A_103_V_ddg_predictions_82409_010_MUT.238l_A_103_V_.out_12126_1_0

Another one. I was away from the computer while this one ran, but it ended itself in less than an hour with the same report as the last WU. I'm sorry I wasn't around to note how much memory it claimed. Obviously, if it's taking up all the computer's memory, it's going to get killed by the cruncher. Has anyone tried quitting (not suspending) BOINC, or even just that task, thus removing it from memory, and then restarting? I wonder whether it would error out immediately on restart and, further, whether that error report would contain anything more useful than a simple abort.
I vaguely recall the project adding code to end those WUs that never seem to start, and I assume that's what is catching mine, though I see nothing obvious in the stderr out unless the clue is this line: reached end of minirosetta::main(). I further assume that claiming one decoy is just a way to give me some credit without having to run a special validator script. If my assumptions are correct, the question of most interest would be: why did my WUs end gracefully while Nflight's ran on for 16 hours?

Snags
ID: 4935
RodEllery
Joined: 20 Feb 06
Posts: 5
Credit: 8,820
RAC: 0
Message 4936 - Posted: 26 Aug 2009, 14:31:54 UTC

Killed this task.
Target time 2 hrs. No progress after 6.5 hrs.
ID: 4936
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 4937 - Posted: 27 Aug 2009, 5:07:53 UTC

theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0

21 decoys completed but output file absent:

Thu Aug 27 00:03:14 2009|ralph@home|Computation for task theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0 finished
Thu Aug 27 00:03:14 2009|ralph@home|Output file theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0_0 for task theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0 absent

stderr out:
======================================================
DONE :: 21 starting structures 13985.5 cpu seconds
This process generated 21 decoys from 21 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

ID: 4937
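For anyone triaging results like the one above, the failing file and error code can be pulled out of pasted task output mechanically. The following is a minimal sketch, not project code: `find_xfer_errors` and `XFER_RE` are hypothetical names, and the sample text is copied from the excerpt above. If memory serves, BOINC's error number list defines -161 as ERR_NOT_FOUND, which would be consistent with the "absent" output file line in the client log.

```python
import re

# Sample copied from the stderr excerpt posted above. Note that the pasted
# <message> element is never closed, so a strict XML parser would reject it.
SAMPLE = """\
<message>
<file_xfer_error>
<file_name>theta_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_2_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""

# Hypothetical pattern (an assumption, not a BOINC API): matches one complete
# <file_xfer_error> element and captures the file name and numeric code.
XFER_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def find_xfer_errors(text):
    """Return a list of (file_name, error_code) tuples found in text."""
    return [(m.group("name"), int(m.group("code"))) for m in XFER_RE.finditer(text)]


errors = find_xfer_errors(SAMPLE)
print(errors)
```

A regular expression is used here instead of an XML parser only because excerpts pasted into forum posts are often truncated.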
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4938 - Posted: 27 Aug 2009, 10:33:00 UTC

Just had two 6-hour WUs fail due to:

<error_code>-161</error_code>
</file_xfer_error>

Other than the transfer issue, the WUs completed OK and each generated a number of decoys.

See 1582694 and 1583046

Conan


ID: 4938
himmelskasper
Joined: 24 May 09
Posts: 1
Credit: 2,638
RAC: 0
Message 4939 - Posted: 27 Aug 2009, 17:22:05 UTC

Hmm, I keep getting calculation faults (at 40%) :( What is the problem? See you,

hK
ID: 4939
Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 4940 - Posted: 28 Aug 2009, 5:20:52 UTC

ID: 4940
BigMike
Joined: 23 Feb 06
Posts: 63
Credit: 58,730
RAC: 0
Message 4941 - Posted: 28 Aug 2009, 6:03:50 UTC

This one only ran for a few seconds:

Incorrect function. (0x1) - exit code 1 (0x1)

ERROR: ERROR: Unable to open silent_input file: 'default.out'
ERROR:: Exit from: ..\..\src\core\io\silent\SilentFileData.cc line: 96
BOINC:: Error reading and gzipping output datafile: default.out

==Mike
Don't believe everything you think.
ID: 4941
I _ quit
Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4942 - Posted: 28 Aug 2009, 9:58:20 UTC

4 x the same error:

ERROR: ERROR: Unable to open silent_input file: 'default.out'
ERROR:: Exit from: ..\..\src\core\io\silent\SilentFileData.cc line: 96
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

https://ralph.bakerlab.org/result.php?resultid=1583639
https://ralph.bakerlab.org/result.php?resultid=1583638
https://ralph.bakerlab.org/result.php?resultid=1583662
https://ralph.bakerlab.org/result.php?resultid=1583661

This happens 5 to 9 seconds into the task.
ID: 4942
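Several distinct failure signatures have now been reported in this thread, so a quick way to sort pasted stderr output by signature may help when comparing results. This is only an illustrative sketch: the labels and the `classify` helper are hypothetical, while the signature strings themselves are copied from the posts above.

```python
# Hypothetical triage table: (label, substring to look for).
# The labels are invented for illustration; the substrings are quoted
# from error reports in this thread.
SIGNATURES = [
    ("memory_limit", "Maximum memory exceeded"),
    ("missing_silent_input", "Unable to open silent_input file: 'default.out'"),
    ("upload_error", "<file_xfer_error>"),
]


def classify(stderr_text):
    """Return the labels of all known signatures present in stderr_text."""
    return [label for label, needle in SIGNATURES if needle in stderr_text]


sample = """ERROR: ERROR: Unable to open silent_input file: 'default.out'
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish"""

print(classify(sample))  # ['missing_silent_input']
```

Plain substring matching is deliberate: the reports differ in surrounding whitespace and path formatting, but the quoted messages are stable across them.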
svincent
Joined: 4 Apr 08
Posts: 34
Credit: 51,768
RAC: 0
Message 4943 - Posted: 28 Aug 2009, 16:01:03 UTC

I'm getting Ralph workunits for 1.95 while Rosetta is at 1.97. How come?
ID: 4943
Path7
Joined: 11 Feb 08
Posts: 56
Credit: 4,974
RAC: 0
Message 4944 - Posted: 28 Aug 2009, 19:45:46 UTC - in response to Message 4943.  
Last modified: 28 Aug 2009, 19:48:56 UTC

I'm getting Ralph workunits for 1.95 while Rosetta is at 1.97. How come?

Good question.
It looks to me like a quick fix at Rosetta@home.
Perhaps the techs would like to answer the question of why Ralph is still at minirosetta 1.95?

And an error:
proteinG_PCS_BOINC_abrelax.1xcycles.v1_NOGZ_SAVE_ALL_OUT_12136_6_1

ERROR: ERROR: Unable to open silent_input file: 'default.out'
ERROR:: Exit from: ..\..\src\core\io\silent\SilentFileData.cc line: 96
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

This was the second run of this WU; the first run had the same error.

Have a nice day,
Path7.
ID: 4944
svincent
Joined: 4 Apr 08
Posts: 34
Credit: 51,768
RAC: 0
Message 4945 - Posted: 28 Aug 2009, 20:16:39 UTC

Task 1585116 on Mac sat for over an hour apparently initialising (0% progress: model 0 step 0), yet it completed OK. I used the Sampler to get a dump of what it was doing while initialising; it's way too big to post here, but I will email it if it's of interest.

I've had the same issue with a few recent Rosetta@home workunits.
ID: 4945
Murasaki
Joined: 1 Aug 09
Posts: 7
Credit: 2,202
RAC: 0
Message 4946 - Posted: 30 Aug 2009, 22:05:34 UTC

Task 1585136
myoglobin_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_13_1

======================================================
DONE :: 6 starting structures 3145.11 cpu seconds
This process generated 6 decoys from 6 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>myoglobin_PCS_BOINC_abrelax.1xcycles.v1_SAVE_ALL_OUT_12132_13_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


An earlier attempt to crunch this WU by another user resulted in the same xfer error.
ID: 4946
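Both this task and the earlier theta_PCS report finished all of their decoys before the upload failed, so it can be useful to see how much CPU time each structure cost. A rough sketch, assuming the "DONE ::" summary line always has the shape shown in the two reports; `DONE_RE` and `seconds_per_structure` are hypothetical names:

```python
import re

# Assumed shape of the summary line, based only on the two examples
# quoted in this thread.
DONE_RE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)


def seconds_per_structure(line):
    """Return CPU seconds per starting structure, or None if not parseable."""
    m = DONE_RE.search(line)
    if m is None:
        return None
    count, cpu_seconds = int(m.group(1)), float(m.group(2))
    if count == 0:
        return None  # avoid dividing by zero on an empty run
    return cpu_seconds / count


for line in (
    "DONE :: 6 starting structures 3145.11 cpu seconds",
    "DONE :: 21 starting structures 13985.5 cpu seconds",
):
    print(line, "->", seconds_per_structure(line))
```

For the two lines quoted in this thread this works out to roughly 524 and 666 CPU seconds per structure respectively.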
RodEllery
Joined: 20 Feb 06
Posts: 5
Credit: 8,820
RAC: 0
Message 4947 - Posted: 31 Aug 2009, 22:36:29 UTC

This task errored after 5+ hours with a file upload error.
ID: 4947



©2024 University of Washington
http://www.bakerlab.org