Hopefully all the graphics work now (no more blacked-out graphics windows for current tasks).
As always, your feedback is highly appreciated!
Mike
In the Message Boards: RALPH@home bug list forum, you might want to check whether some of the older threads still need to be sticky, such as:
Bug Reports for Ralph Server Update to BOINC version 5.9.2
Looks like you don't have a link to the 1.58 bugs thread on your home page yet.
I finally decided to take the time to report this petty issue. Note that I didn't have a v1.58 task to take the screenshot from at the time, but from what I've seen so far, v1.58 has the same scaling issue. The graphics have behaved this way for quite some time, at least on my machines, regardless of the specific machine, Rosetta version, or task type.
I use Windows XP and a display resolution of 1024 x 768, which I believe is pretty standard. By default, when I open the graphics of a running task, they come up in a less-than-full-screen window (see image below). Note that the title bar seems to be scaled in as part of the Low Energy and RMSD boxes, and perhaps all of the boxes in the top half of the graphic. The Low Energy box and the Native box should have the same displayable height, but they don't.
As this one ran on 1.56, it wasn't showing an RMSD history at all. But if it were, you'd see the RMSD history take longer to appear in the display than the energy graph. And if you maximize to full screen, you will see more of the RMSD history than with the original window size (the title bar is scaled to a smaller share of the window, which leaves more room to show the RMSD).
Let me know if I'm the lone wolf on this issue, or what further video or other details may be relevant.

Feet1st, both Ralph and Rosie open the graphics in a reduced window on my system, and I am running widescreen at 1680 x 1050.
Once I click on the window's maximize box to bring it to full screen, it does.
"...it does" what when maximized? Scale properly? My "square" boxes still aren't square, they are short by the thickness of the titlebar.
@Feet1st: Hi there, I hadn't noticed it before, but I see the same thing on my widescreen, Win XP SP2, task lr6_E_score12.
It looks as if the top of the Ralph graphics is consumed by the title bar (usually blue) of the viewer application.
The same model, when displayed as a screensaver (without the title bar), looks OK.
Feet1st, I am also using XP with the same resolution. At full screen, the Native box is approximately 6 cm high, while the Low Energy box is about 5 cm. The energy graph takes a fair time to appear from the left, and the RMSD graph takes even longer, as you described. It is probably being exaggerated because I am going through some slow-moving proteins (csttest) on 1.57.
I see the same box sizes for the low energy and native models in the Rosie graphics as here on Ralph: 6 cm for low energy and 7 cm for native (not including the extra mm) on both Rosie and Ralph. The low energy model here on Ralph looks a bit close to the top of its box compared with Rosie, but then again it's hard to compare two different proteins and their rotations.
I am noticing that the top of the low energy model is getting clipped by the title bar here on Ralph. This happens with 1.54 on Rosie and 1.56 on Ralph. I also started my first 1.58 and see the same dimensions and the same issue: whatever strand is at the top of the low energy box gets a few mm cut off if it happens to be a ribbon strand in a long vertical loop. In stick form, the model fits into the box. The box size did not change.
1.58 errors are still being reported in the thread for 1.55 errors, since there's no reference to the 1.58 errors thread on the home page.
I have just had 9 errors on 1.58 with this message:
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
[2009- 2-11 13:51:53:] :: BOINC:: Initializing ... ok.
[2009- 2-11 13:51:53:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Trying to access options object.
Success.
src/protocols/abinitio/AbrelaxApplication.cc:217
src/protocols/abinitio/AbrelaxApplication.cc:237
src/protocols/abinitio/AbrelaxApplication.cc:295
src/protocols/abinitio/AbrelaxApplication.cc:317
src/protocols/abinitio/AbrelaxApplication.cc:324
src/protocols/abinitio/AbrelaxApplication.cc:326
src/protocols/abinitio/AbrelaxApplication.cc:328
src/protocols/abinitio/AbrelaxApplication.cc:330
src/protocols/abinitio/AbrelaxApplication.cc:335
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/cst_1_8_nativecst_b1.0_cen_0.1.foldcst_chunk_general.t373_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
They are dying almost as soon as they start (after 14 to 20 minutes).
See this result as an example.
This 1.58 workunit:
http://ralph.bakerlab.org/workunit.php?wuid=1151935
keeps giving me error messages like these:
2/12/2009 9:34:00 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:34:00 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:34:00 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
2/12/2009 9:34:41 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:34:41 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:34:41 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
2/12/2009 9:35:22 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:35:22 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:35:23 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
2/12/2009 9:36:04 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:36:04 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:36:04 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
2/12/2009 9:36:45 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:36:45 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:36:45 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
2/12/2009 9:37:26 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
2/12/2009 9:37:26 AM|ralph@home|If this happens repeatedly you may need to reset the project.
Also, when I start its graphics window from the Simple View, the window comes up solid black where the graphics should be. If I hover my cursor over this window, I get the circle indicating that it is waiting for something, although without indicating what. If I then try to close the graphics window, the graphics program doesn't respond, and it must be aborted. The CPU time used so far is frozen at 0:26:55, and this workunit doesn't seem to be using the CPU even though it's listed as Running. If I try to start the graphics from the Tasks tab of the Advanced View, the results are similar, except that after a second or so a white block appears near the center of the black graphics window.
Since I've finally succeeded in setting the CPU time percentage to 90% instead of 100%, this could be a factor in triggering the problem.
During this time, the task monitor shows one CPU core running normally; different views disagree on how the other core is running, except that they all agree it's using significantly less than the requested 90% of CPU time. An actual usage of about 30% is typical for these views, but one shows about 50%.
This workunit looks like it needs to be aborted, but is there something else I need to try first?
I'm using BOINC 6.2.28 under Vista SP1.
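A restart loop like the one above can be summarized mechanically before reporting it. The sketch below is a hypothetical helper (not part of BOINC or minirosetta); it assumes the BOINC Manager messages have the `date time AM/PM|project|text` layout seen in the pasted log, and it counts the "exited with zero status but no 'finished' file" messages per task along with the seconds between them.

```python
import re
from collections import defaultdict
from datetime import datetime

# Layout inferred from the pasted BOINC Manager messages (an assumption, not a spec):
# "<M/D/YYYY> <h:mm:ss> <AM|PM>|<project>|Task <name> exited with zero status but no 'finished' file"
EXIT_RE = re.compile(
    r"^(?P<ts>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\|(?P<project>[^|]*)\|"
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file$"
)


def summarize_restart_loops(lines):
    """Return {task: (exit_count, [seconds between consecutive exits])}."""
    exits = defaultdict(list)
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M:%S %p")
            exits[m.group("task")].append(ts)
    summary = {}
    for task, times in exits.items():
        # Gap between each exit and the one before it, in seconds.
        gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
        summary[task] = (len(times), gaps)
    return summary
```

Fed the 1VHSA messages above, it reports exits roughly 41 seconds apart, which makes a restart loop easy to tell apart from a single crash.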
This workunit went into Compute error status before anyone replied, so I decided to return it with an Update.
It had the lockfile problem repeatedly, but didn't make any information about that visible to the user until the workunit finished.
If someone else finishes that workunit successfully, could you ask them whether they are using 100% CPU?
This one, 1150435, was a long one: it took about 5 hours to complete one decoy but was given a validate error. I notice that its second attempt ended in immediate failure.
Also, is it my imagination, or are these units (cst__1_8_nativecst_b1.0_cen_0.1_hb_t313 etc.) using more RAM?
This 1.58 workunit:
http://ralph.bakerlab.org/workunit.php?wuid=1151935
It looks like the second try it ran to completion ...
It looks to me like my assertion that running at less than 100% CPU causes issues ... like the "can't acquire lockfile" error ...
New error ... it's random; many tasks work, but I have at least 5 with this error:
ERROR: [ERROR] Error opening RBSeg file 'core_1VKBA_18_noloop_loops.txt'
ERROR:: Exit from: src/protocols/loops/LoopClass.cc line: 443
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Not sure why the mix of success and failure ... the failures:
1308740
1308739
1307455
1307355
1307354
Hello all,
Having the same kind of error as Paul D. Buck:
ERROR: [ERROR] Error opening RBSeg file 'core_1B70B_12_noloop_loops.txt'
ERROR:: Exit from: ..\..\src\protocols\loops\LoopClass.cc line: 443
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
loopbuild_chunk_cheat_3_5_hb_t306__IGNORE_THE_REST_1B70B_12_7794_1_0
Windows XP – Boinc 5.10.45
This WU had the same error again when it was returned by another computer.
Have a nice day,
Path7.
Hello,
And the same error here: loopbuild_chunk_2_7_hb_t332__IGNORE_THE_REST_1X7OA_7_7809_1_0
ERROR: [ERROR] Error opening RBSeg file 'core_1X7OA_7_noloop_loops.txt'
ERROR:: Exit from: src/protocols/loops/LoopClass.cc line: 443
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Gentoo linux - Boinc 5.10.45
AdeB
Also had one repetition of the line 330 error:
ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out
Task 1309122
Just as important, I have also had quite a few successes ... though not as good a record as 1.54 on Rosetta, where I have not had an error in a couple of weeks ...
Hmm, was this just a reissue of the task that should have been canceled? I see that the wingman also had the same error ...
This 1.58 workunit:
http://ralph.bakerlab.org/workunit.php?wuid=1151935
It looks like the second try it ran to completion ...
It looks to me like my assertion that running at less than 100% CPU causes issues ... like the "can't acquire lockfile" error ...
Mike, have you thought of adding some debug code to the parts of the next version of minirosetta that have anything to do with the lockfile, and occasionally recording the percentage of the CPU time BOINC runs?
Mike,
Also consider debug code for any parts that can exit from the program without setting the status.
Here are some of the messages so far from a workunit that will likely need this debug code to determine just what's going on:
2/14/2009 12:29:53 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:29:53 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:29:53 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:30:34 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:30:34 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:30:35 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:31:16 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:31:16 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:31:16 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:31:57 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:31:57 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:31:57 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:32:38 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:32:38 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:32:39 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:33:20 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:33:20 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:33:20 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:34:01 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:34:01 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:34:01 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:34:42 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:34:42 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:34:42 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:35:23 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:35:23 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/14/2009 12:35:23 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
2/14/2009 12:36:04 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
2/14/2009 12:36:04 PM|ralph@home|If this happens repeatedly you may need to reset the project.
http://ralph.bakerlab.org/workunit.php?wuid=1156395
04:40:56 of CPU time so far with 6 hours requested, and it is no longer changing even while the workunit is running. Task Manager indicates that it is using 0% CPU time.
This is with a 1.58 workunit at 90% CPU, under BOINC 6.2.28 with 32-bit Vista SP1 with a dual-core AMD CPU. I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though.
Now that workunit has ended with a Computation error after a lot of these messages (not visible until the workunit ends):
[2009- 2-14 11:55:25:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:57:29:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:58:11:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:58:52:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:59:33:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 12: 0:14:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 12: 0:56:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
Hope these results are at least useful in tracking down the lockfile problem.
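The stderr timestamps above pad single-digit fields with spaces (for example `[2009- 2-14 12: 0:14:]`), which trips up a naive date parser. A minimal sketch, again a hypothetical helper rather than anything shipped with minirosetta, that collects the start time of each attempt that ended in "Can't acquire lockfile - exiting":

```python
import re
from datetime import datetime

# minirosetta stderr stamps pad fields with spaces, e.g. "[2009- 2-14 12: 0:14:]".
STAMP_RE = re.compile(
    r"^\[(\d{4})-\s*(\d{1,2})-\s*(\d{1,2})\s+(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2}):\]"
)
LOCK_FAIL = "Can't acquire lockfile - exiting"


def lockfile_failures(stderr_lines):
    """Return start times of attempts that were followed by a lockfile failure."""
    failures = []
    last_start = None
    for raw in stderr_lines:
        line = raw.strip()
        m = STAMP_RE.match(line)
        if m:
            # Remember when the most recent attempt started.
            last_start = datetime(*map(int, m.groups()))
        elif line == LOCK_FAIL and last_start is not None:
            failures.append(last_start)
            last_start = None
    return failures


def gaps_seconds(times):
    """Seconds between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

On the excerpt above, consecutive failed attempts are 41 to 42 seconds apart, matching the restart cadence in the BOINC Manager messages; the helper says nothing about why the lock was unavailable.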
I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:
2/14/2009 2:56:12 PM|ralph@home|Resetting project
2/14/2009 2:56:18 PM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
2/14/2009 2:56:33 PM|ralph@home|Sending scheduler request: To fetch work. Requesting 3853 seconds of work, reporting 0 completed tasks
2/14/2009 2:56:38 PM|ralph@home|Scheduler request succeeded: got 0 new tasks
Looks like there's also a problem in your reset procedure.
I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though.
Ha! There may be the clue I was missing!
I don't recall whether I had been looking at the graphics or not, but it is likely that I had been when I saw a failure, until I gave up in disgust.
Though there are no tasks right now, perhaps the test would be to run a few tasks without looking at the graphics and some while looking at them.
I am not sure why launching the graphics application would cause this issue, but this could be the missing clue ... and why I never saw the issue in Einstein even when I had the setting that caused this issue in Rosetta ...
What is MOST interesting is that you can launch the graphics at 100% and have no issue, but that the switch to pause the application would cause it.
Oh, and if you don't want to lose that much CPU, you can use 99% like I did and get the same effect ...
Oh, and if you don't want to lose that much CPU you can use 99% like I did and get the same effect ...
I tried 99% for a while but had two problems with this setting:
1. Problems making this setting reduce the CPU percentage at all - now fixed.
2. Problems getting Task Manager to show me such small gaps in CPU usage.
I may try again soon at 95%, though.
Mike, in order to save time in testing, you may want to consider these ideas:
1. Try to send a larger share of any workunits aimed at the lockfile problem to machines known to have had these problems recently.
2. If these same machines get a workunit aimed at testing anything else, immediately put a copy of that workunit back on the queue to be sent to machines not in this group.
Better check just what that Ralph@home reset procedure does. Since the reset, I haven't been able to connect to Rosetta@home, either through BOINC or through its website. I have a Rosetta@home result I haven't been able to send, or I'd try resetting the Rosetta@home project also.
Has Rosetta@home been offline for several hours, or is this part of the result of the Ralph@home reset attempt?
It's been nearly 24 hours and Rosetta is still down. It's almost as if a system failure happened there. The problems there have nothing to do with Ralph or with the problem you are having deleting the file.
This looks like a long-running model: resultid=1309799
Name: loopbuild_chunk_2_7_B_hb_t286__IGNORE_THE_REST_1YZFA_5_7846_1_0
Outcome: Validate error
stderr out: . . .
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
Hbond tripped !!!
BOINC:: CPU time: 28889.1s, 14400s + 14400s[2009- 2-14 15:24: 2:] :: BOINC
======================================================
DONE :: 2 starting structures 28889.1 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
called boinc_finish
AdeB
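Reading the watchdog line above as "CPU time, run-time preference + extra allowance" (an interpretation of the log, not documented behaviour), the model ran past 14400 s + 14400 s = 28800 s, roughly 8 hours against a 4-hour preference. A small hypothetical parser for that line, under the same assumption:

```python
import re

# Hypothetical parser for the watchdog line pasted above. Treating the second
# term as an extra allowance on top of the preference is an assumption.
CPU_LINE_RE = re.compile(
    r"CPU time: (?P<cpu>[\d.]+)s, (?P<pref>\d+)s \+ (?P<extra>\d+)s"
)


def watchdog_overrun(line):
    """Return seconds of CPU time beyond pref + extra, or None if the line doesn't match."""
    m = CPU_LINE_RE.search(line)
    if m is None:
        return None
    cpu = float(m.group("cpu"))
    limit = int(m.group("pref")) + int(m.group("extra"))
    return cpu - limit
```

For the result above this gives an overrun of about 89 seconds past the assumed 28800-second limit.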
At least we got a little work again ...
A failed workunit:
http://ralph.bakerlab.org/workunit.php?wuid=1160172
Some typical messages from it:
2/19/2009 12:19:42 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
2/19/2009 12:20:22 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
2/19/2009 12:20:22 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/19/2009 12:20:22 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
2/19/2009 12:21:04 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
2/19/2009 12:21:04 AM|ralph@home|If this happens repeatedly you may need to reset the project.
These messages are repeated many times.
I'm now running at 95% CPU, in order to help pin down the cause of this problem.
http://ralph.bakerlab.org/result.php?resultid=1311870
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009- 2-17 6:40:25:] :: BOINC:: Initializing ... ok.
[2009- 2-17 6:40:25:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Trying to access options object.
Success.
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
ERROR: ERROR: FragmentIO: could not open file cs_aa_1ji8A09_05.200_v1_3.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>
http://ralph.bakerlab.org/result.php?resultid=1311551
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009- 2-15 22:23:26:] :: BOINC:: Initializing ... ok.
[2009- 2-15 22:23:26:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Trying to access options object.
Success.
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_chunk_cheat_3_5.loopbuild_chunk.t326_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
ERROR: [ERROR] Error opening RBSeg file 'core_2GHRA_10_noloop_loops.txt'
ERROR:: Exit from: ..\..\src\protocols\loops\LoopClass.cc line: 443
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>
These messages are repeated many times.
I'm now running at 95% CPU, in order to help pin down the cause of this problem.
The one hint is that POSSIBLY only those tasks where the screen saver activates, or where you use the graphics application, ALONG with CPU throttling, may be linked ... can you make note of that?
Invalid? Huh?
http://ralph.bakerlab.org/result.php?resultid=1316610
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
[2009- 2-20 5:36:36:] :: BOINC:: Initializing ... ok.
[2009- 2-20 5:36:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Trying to access options object.
Success.
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_mamaln_ideal.loopbuild.t312_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 1 starting structures 3034.98 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>
These messages are repeated many times.
I'm now running at 95% CPU, in order to help pin down the cause of this problem.
The one hint is that POSSIBLY only those tasks where the screen saver activates, or where you use the graphics application, ALONG with CPU throttling, may be linked ... can you make note of that?
Since then, I've had two 1.58 workunits complete successfully with no graphics application activation. Still running at 95% CPU.
Doing the same for 1.54 over on Rosetta@home doesn't trigger the problem.
In other words, the combination of all of the following triggers the problem for me: 1.58, less than 100% CPU, activating the graphics after the workunit has started without them, and shutting down the graphics window. Running 1.58 at less than 100% CPU but with no graphics doesn't trigger it for me. The problem doesn't trigger for 1.54. I haven't tested the other possibilities yet.
I use an all-black screen saver these days.
Just FYI, ALL tasks assigned to me completed OK. NO compute errors!
The last few times I looked at the System Status, the File Deleter was not running. Does it need to be running more often?
It looks like three tasks failed with the same error, and it is not one I have seen before:
ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
BOINC:: Error reading and gzipping output datafile: default.out
1325665
1325692
1325691
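The failed assertion reads as a check that a volume name is a Windows drive designator: exactly two characters, a letter followed by a colon. Restated in Python as a sketch (how minirosetta derives `vol_a` from a file name is not visible in the log, so which input actually failed the check is unknown):

```python
import string


def is_drive_volume(vol_a: str) -> bool:
    """Restatement of the condition asserted at FileName.cc line 41.

    ASCII letters approximate std::isalpha in the default "C" locale.
    The length test comes first, so indexing is safe for short strings.
    """
    return len(vol_a) == 2 and vol_a[0] in string.ascii_letters and vol_a[1] == ":"
```

So an empty volume, a bare letter, or a longer string such as `C:\` would all fail the check.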
I've had 4 workunits on Mac OS X 10.4.11 that all failed after apparently successful completion:
</stderr_txt>
<message>
<file_xfer_error>
<file_name>homobench_natrelax_t312__8094_1_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
Workunit IDs:
1170292
1170291
1170290
1170289
It appears that in each case they had previously been sent to a Windows machine, where they failed (as noted by Paul Buck) in the manner shown below, but at the start rather than at the end:
ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
This one had a result file over 1 MB. Its name is cc_2_2_mamcstmix_cen_bounded_0.1_hb_t311__IGNORE_THE_REST_1B0NA_6_8133_1_0
It also ended after less than 15 hours despite my 24-hour runtime preference. It looks like it hit the 99-model limit.
____________
|
|
|
|
|
|
Validate errors, all on first-time runs:
1329199
1329195
1329205
1329212
1329219
1329224
|
|
|
|
|
|
The error is: "Unable to open weights."
1331227
1331235
|
|
|
|
|
validate error:
all first time runs
1329199
1329195
1329205
1329212
1329219
1329224
What causes a validation error? It would appear that my 6 errant work units are well on the way to having successful second runs. |
|
|
|
|
|
There are going to be quite a few errors coming through until the system sorts itself out after the closure. I have had a good number of ghost work units reportedly downloaded to me, but I can't find them. They have already been reported as successes and awarded points, but were handed out again. |
|
|
|
|
|
This workunit displayed a black window when I pressed the Show graphics button under the Tasks tab in BOINC Manager. It completed successfully after about 55 minutes and I got credit for it. Could the reason be that the task was a resend? Is there an ETA for when 1.58 will be deployed on the main Rosetta project? |
|
|
|
|
|
A few tasks lately have produced large output files and ended prematurely (compared with my 24-hour runtime preference), presumably due to the 99-model limit. I thought you would want to be aware of them.
Runtime    Result size    Task
11:41      1.6 MB         1343012
10 hrs     unknown        1339103
21:41      2.56 MB        1343325
These are all the loopbuild tasks of various flavors.
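Posters in this thread attribute these early finishes to a 99-model cap. Assuming a task simply stops at whichever comes first, the runtime preference or 99 models, its expected runtime can be sketched as below. The constant and function names are hypothetical, and the real minirosetta stopping logic may differ (for example, it may also stop when one more model would overrun the preference).

```python
MODEL_CAP = 99  # assumed cap, per the "99 model limit" discussed in this thread


def expected_runtime_hours(target_hours: float, hours_per_model: float) -> float:
    """Hypothetical: runtime of a task that stops at the runtime target
    or at the model cap, whichever comes first."""
    if hours_per_model <= 0:
        raise ValueError("hours_per_model must be positive")
    # Time needed to produce MODEL_CAP models at this per-model rate
    time_to_cap = MODEL_CAP * hours_per_model
    return min(target_hours, time_to_cap)
```

Under this assumption, a 24-hour preference with models taking about 7 minutes each (`expected_runtime_hours(24, 7 / 60)`) ends after about 11.55 hours, the kind of shortfall reported above.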
____________
|
|
|
|
|
|
One WU refuses to upload at all. Others before and after it are uploading correctly:
13.3.2009 21:38:39|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
13.3.2009 21:38:39|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1f5s_99_8360_1_0_0
13.3.2009 21:38:39|Poem@Home|Restarting task Peptide_387_1236485967_1676836187_0 using poem version 100
13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_511_2640_0_1236930257_1 using malariacontrolBeta version 612
13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_415_2640_0_1236930257_0 using malariacontrolBeta version 612
13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_414_2640_0_1236930257_1 using malariacontrolBeta version 612
13.3.2009 21:38:40|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1
13.3.2009 21:38:40|ralph@home|Temporarily failed upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0: transient upload error
13.3.2009 21:38:40|ralph@home|Backing off 3 hr 5 min 27 sec on upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
...
15.3.2009 9:34:49|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
15.3.2009 9:34:51|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1
15.3.2009 9:34:51|ralph@home|Temporarily failed upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0: transient upload error
15.3.2009 9:34:51|ralph@home|Backing off 2 hr 45 min 43 sec on upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
What is the problem?
|
|
|
|
|
I have had a good number of ghost work units reportedly downloaded to me but I can't find them.
Same here. I have about 40 WUs that R@H thinks are "in progress", but I never saw them. Something's broken...
==Mike
____________
Don't believe everything you think. |
|
|
|
|
|
Another lockfile problem:
http://ralph.bakerlab.org/result.php?resultid=1358791
I'm still running at 95% CPU time but don't think I enabled graphics at any time for this workunit. |
|
|
|
|
|
This mamaln task was preempted at about 98% complete. When BOINC got back to it, it finished immediately (16 seconds later). There were no other suspicious messages about the task.
The status page shows the scheduler is active, but I'm getting this when I try to update:
Scheduler request failed: Server returned nothing (no headers, no data)
____________
|
|
|
|
|
|
http://ralph.bakerlab.org/result.php?resultid=1357308
____________
|
|
|
|
|
|
http://ralph.bakerlab.org/result.php?resultid=1357307
http://ralph.bakerlab.org/result.php?resultid=1357306
http://ralph.bakerlab.org/result.php?resultid=1357304
http://ralph.bakerlab.org/result.php?resultid=1357303
http://ralph.bakerlab.org/result.php?resultid=1357302
http://ralph.bakerlab.org/result.php?resultid=1357274
http://ralph.bakerlab.org/result.php?resultid=1357231
____________
|
|
|
|
|
I have had a good number of ghost work units reportedly downloaded to me but I can't find them.
Same here. I have about 40 WUs that R@H thinks are "in progress", but I never saw them.
It just did it to me again. Three WUs completed... 37 non-existent ones "in progress". And I've reached my daily quota.
==Mike
____________
Don't believe everything you think. |
|
|
|
|
|
http://ralph.bakerlab.org/result.php?resultid=1357309
http://ralph.bakerlab.org/result.php?resultid=1357310
http://ralph.bakerlab.org/result.php?resultid=1357311
http://ralph.bakerlab.org/result.php?resultid=1357312
____________
|
|
|
|
|
|
OK, my errors may be because of my DEP settings. I told DEP to exempt BOINC; we'll see if that helps.
____________
|
|
|
|
|
|
My first death in ages ... I have run a ton of tasks recently and had only this one death:
1362077 0x006C43DD write attempt to address 0x00000000
Pretty sure this is an address reported previously ... |
|
|
|
|
|
All my errors keep coming from:
Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005286C6 read attempt to address 0x06CA4FF8
____________
|
|
|
|
|
|
Stage "unk"?? So... it's "unknown"? (and truncated?)
The protein seemed to slip off the pane too. I just saw black until it later came into view.


____________
|
|
|
|
|
http://ralph.bakerlab.org/result.php?resultid=1357309
http://ralph.bakerlab.org/result.php?resultid=1357310
http://ralph.bakerlab.org/result.php?resultid=1357311
http://ralph.bakerlab.org/result.php?resultid=1357312
In each case, when your task failed another task was generated for the work unit, and the next person was able to run it without error.
Seems to point to some problem on your machine. Overclocking? Dust bunnies clogging cooling system? Memory failing?
____________
|
|
|
|
|
|
A couple of recent compute errors on Mac OS X 10.4.11
Workunit 1210811
ERROR: Cannot open PDB file "1a17_201.pdb"
ERROR:: Exit from: src/core/io/pdb/pose_io.cc line: 179
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Workunit 1210764
ERROR: Cannot open PDB file "1ad6_197.pdb"
ERROR:: Exit from: src/core/io/pdb/pose_io.cc line: 179
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
|
|
|
|
|
|
This time a PDB error on Windows, with this one:
1369045
Cannot open PDB file "1xqo_213.pdb"
ERROR:: Exit from: ..\..\src\core\io\pdb\pose_io.cc line: 179
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish |
|
|
|
|
|
This WU
http://ralph.bakerlab.org/result.php?resultid=1371832
gave me a compute error. |
|
|
|
|
|
Just had a bunch of WUs fail at the start (Mac OS X 10.4.11), all in the same way. Sample output below:
1213821
1213853
1213852
1213701
1213693
1213692
1213677
ERROR: [ERROR] Unable to open constraints file: t297_.cst.best.multi
ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
|
|
|
|
|
|
http://ralph.bakerlab.org/workunit.php?wuid=1185064
One WU seems not willing to upload at all. Others before and after it are
uploading correctly:
13.3.2009 21:38:40|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1
As a reminder: it's still stuck in my upload queue, with 12 h of CPU time on a quad-core 2.66 GHz machine. Should I finally cancel it, or are you still interested in it?
|
|
|
|
|
|
Another lockfile problem:
http://ralph.bakerlab.org/result.php?resultid=1374201
Running at 95% CPU, with BOINC 6.2.28 under Vista SP1 with graphics disabled.
Will try resetting Ralph@home soon. |
|
|
|
|
|
Tried resetting Ralph@home, got these error messages:
3/22/2009 9:04:01 AM|ralph@home|Resetting project
3/22/2009 9:04:06 AM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
Similar problem with Rosetta@home, except with the 1.54 executable.
I use BOINC 6.2.28 under Vista SP1, with graphics not enabled.
My attempt to delete this file manually failed because I couldn't find the directory containing it, or anything under the BOINC directory specific to the Ralph@home project.
I intend to leave both Ralph@home and Rosetta@home on no new tasks until I get some usable advice on how to complete the resets. |
|
|
|
|
|
Validate errors
1374039
1374038
1374037
All ran for around 5 hours. No doubt the second runs will take half the time or less, as has happened with previous work units.
|
|
|
|
|
|
More on the lockfile problem:
When this problem shows up, expect a few subdirectories of BOINC\slots to have three files each, unrelated to any workunit in progress and including the lockfile. I suspect that Rosetta@home and Ralph@home workunits are unable to run successfully if assigned to any of these slots, even if workunits from other BOINC projects can. Attempts to manually delete these files also fail.
However, the following may have helped for me:
1. Set Rosetta@home and/or Ralph@home to no new tasks and wait until all of their tasks complete.
2. Do an update for whichever project still has unreported tasks.
3. Suspend all workunits and network activity.
4. Shut down the BOINC client, then find the boinc.exe process and kill it.
5. Reboot.
6. If those subdirectories of BOINC\slots have disappeared, enable network activity and do another reset on Rosetta@home and/or Ralph@home.
7. If these resets complete without error messages, it's safe to resume activity on any other BOINC projects, then allow new tasks on Rosetta@home and/or Ralph@home.
However, I haven't completed any workunits for either Rosetta@home or Ralph@home since doing this, so it will be at least tomorrow before I can check if this actually took care of the lockfile problem, at least temporarily.
I wouldn't be surprised if this procedure includes some unnecessary steps, but I wanted to report it before trying to find out which ones. |
|
|
|
|
|
Didn't help enough - the first Rosetta@home 1.54 workunit completed after the above procedure had the lockfile problem again, but two more started since then and not complete yet haven't had that problem yet.
Suggestion: Modify minirosetta so that it checks for a lockfile as it starts up, preferably before trying to create one, and if this first check finds a lockfile, reduce the number of times minirosetta is allowed to restart before it is able to write the first checkpoint.
Suggestion: Modify minirosetta so that it reports which slot it was run under if it is able to do this, since the problem looks likely to repeat for any minirosetta workunit run in a slot where a previous workunit's lockfile was not erased when the previous workunit completed and was reported.
Suggestion: Check the procedure used for failed workunits to see if it leaves a lockfile behind after abandoning efforts to restart the workunit.
Suggestion: Check what program is supposed to delete the lockfiles for workunits that have been completed and reported.
Suggestion: Check if BOINC allows any way to request that a workunit be restarted, but in a different slot.
Suggestion: If BOINC is supposed to clean up the slots after workunits complete and are reported, check if BOINC 6.2.28 is known to have any problems with doing this.
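The first suggestion above, checking for a leftover lockfile at startup and reducing the restart allowance if one is found, might look something like the Python sketch below. It is hypothetical and not minirosetta's actual code; the function names, the lockfile path, and the restart numbers are all assumptions. Creating the file with `O_CREAT | O_EXCL` makes the existence check and the creation a single atomic step, so a leftover lockfile is detected rather than silently reused.

```python
import os


def try_create_lock(path: str) -> bool:
    """Atomically create the lockfile.

    Returns True if we created it, or False if a lockfile was already there
    (for example, one left behind by a previous workunit in this slot).
    """
    try:
        # O_EXCL fails with FileExistsError if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    # Record our PID, which could help identify who left a stale lock.
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def restart_budget(lock_created: bool, normal: int = 5, reduced: int = 1) -> int:
    """Hypothetical policy: allow fewer restarts before the first
    checkpoint when a lockfile was already present at startup."""
    return normal if lock_created else reduced
```

Reporting the slot directory alongside a `False` result would also cover the second suggestion about identifying which slot the workunit ran in.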
I haven't had any 1.58 workunits since trying the procedure, so I don't know whether these continued problems also apply to 1.58.
I often let BOINC run for a few days between reboots.
I still use BOINC 6.2.28 under Vista SP1, with 95% CPU time. |
|
|
|
|
|
Robert,
Thanks for all the work on the lock file ... I hope we can figure out what is going on with this ...
On my side, I have a validate error, though the task seems to have failed with a different error that was not reported as an error:
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
What does this mean? Beats me ... |
|
|
|
|
|
New error:
ERROR: aFrame->nr_frags()
ERROR:: Exit from: ..\..\src\core\fragment\FragSet.cc line: 168
|
|
|
|
|
|
More problems on Mac OS X 10.4.11
WU's 1376869,1376870,1376871 failed: see below
ERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 137 fold_tree nres: 156589050
ERROR:: Exit from: src/core/conformation/Conformation.cc line: 224
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
|
|
|
|
|
|
Thanks for reporting. Some input and output files for the WUs ending with "BOINC_MPZN_with_zinc_loop_modeling" were not compressed properly and therefore caused premature failures/exits. Sorry about that.
More problems on Mac O S X 10.4.11
WU's 1376869,1376870,1376871 failed: see below
ERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 137 fold_tree nres: 156589050
ERROR:: Exit from: src/core/conformation/Conformation.cc line: 224
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
|
|
|
|
|
|
It seems that the work units that I downloaded this morning have an incomplete nomenclature. They are missing the final _0 or _1 that indicates whether it is a first or second attempt. |
|
|
|
|
It seems that the work units that I downloaded this morning have an incomplete nomenclature. They are missing the final _0 or _1 that indicates whether it is a first or second attempt.
Correction: the names are correct in the task list, but the suffix is missing from the work details on the website. |
|
|
|
|
|
This workunit 1223308 gave a validate error on Mac: it claimed to generate 99 decoys from 99 attempts in 12 minutes. That seems unlikely.
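For scale, the stderr output for this workunit reports 782.2 CPU seconds for all 99 decoys, which works out to:

$$\frac{782.2\ \text{s}}{99\ \text{decoys}} \approx 7.9\ \text{s per decoy}, \qquad \frac{782.2\ \text{s}}{60\ \text{s/min}} \approx 13.0\ \text{min}$$

That is under eight seconds of CPU time per decoy.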
The end of stderr output
Starting work on structure: _1UFBA_5_00097
Starting work on structure: _1UFBA_5_00098
Starting work on structure: _1UFBA_5_00099
======================================================
DONE :: 1 starting structures 782.2 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt> |
|
|
|
|
|
Another unzipping issue with workunit 1223005 on Mac
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/frb_0_8_el_chosen.foldcst_chunk_general_cf.t325_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>
|
|
|
|
|
|
Error after a few seconds on Windows XP.
The other WU also gives an error.
It's the same error already posted for Mac:
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz |
|
|
|
|
|
Several incidents of the error reported by svincent below,
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245
Task IDs:
1378040
1378221
1384682
1379800
I noted on at least two of them that the other wingman also had the task fail ... configuration issue?
Another error in this latest batch is:
ERROR: aFrame->nr_frags()
ERROR:: Exit from: ..\..\src\core\fragment\FragSet.cc line: 168
Task ID: 1376838
The only good news I suppose is that the failures happen almost right away... |
|
|
|
|
|
compute error with:
1390773
1390766
both with message:
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Looks like a similar fault to some already posted.
|
|
|
|
|
|
Some WUs take longer to complete than the default runtime.
I'm not sure, but they all seem to be frb_1_8_template_enriched_hb_t286.
For example, this one: link |
|
|
|
|
|
@Manuel Lupotto: Yes, I also had 2 WUs starting with frb_1_8_ which took over 2 hours to complete a single decoy:
frb_1_8_el_chosen_hb_t286__SAVE_ALL_OUT_IGNORE_THE_REST_1ESCA_11_8901_1_0
frb_1_8_bestfrag_hb_t297__SAVE_ALL_OUT_IGNORE_THE_REST_1VJGA_10_8858_1_0
Not a real problem I think, since Rosetta@home has a 3 hour default runtime.
My last WU ended with an Unhandled Exception Detected:
lb_save_all_out_hb_t369__SAVE_ALL_OUT_1HHSA_3_8759_1_1
I was the second one to crunch this WU, the first time it ended with the same error.
Have a nice day,
Path7.
|
|
|
|
|
compute error with:
1390773
1390766
both with message:
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
looks a similar fault to some already posted
Also had the same/similar error on Result 1391999
Result 1394906
ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
____________
 |
|
|
|
|
|
Seven days and no bad tasks? I know we don't get that many in total ... but ... isn't it about time to promote 1.58 to operational while we chase these last buglets?
I know 1.54 is decent, but, 1.58 is marginally better in stability ...
What about it guys? |
|
|
|
|
|
I agree. On the 10th of March I asked when it was going over to the main project, and I have yet to get an answer. |
|
|
|
|
I agree. On the 10th of March I asked when it was going over to the main project, I'm yet to get a answer
They also haven't said anything about the bad tasks in ages ... |
|
|
|
|
|
This task seemed to have hung. I tried the graphics on it and got a black window. The window would not close, so I got a GPF for my troubles. After that the task seemed to hang completely: it went into high-priority mode and made no progress on the percentage complete, so I aborted it. |
|
|
|
|
|
1utg__BOINC_ABINITIO_IGNORE_THE_REST-MOO18--1utg_-_9087_1_0 died with a computation error after 4201 seconds.
Error -1073741819 (0xffffffffc0000005)
It looks to have completed, or at least started work on, 17 models before crashing.
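The exit code here appears to be the same Windows access violation (0xc0000005) seen in another report in this thread, printed as a signed 32-bit integer; the hex form in parentheses is that value sign-extended to 64 bits. A quick Python check of the conversion (the helper name is illustrative only):

```python
STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows status code for an access violation


def as_unsigned(code: int, bits: int = 32) -> int:
    """Reinterpret a negative exit code as an unsigned integer of the given width."""
    return code & ((1 << bits) - 1)


print(hex(as_unsigned(-1073741819)))      # 0xc0000005
print(hex(as_unsigned(-1073741819, 64)))  # 0xffffffffc0000005
```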
|
|
|
|
|
|
I've had a number of WUs with names like this:
rest3d85_ip40_2g3r.patchdock.3.pdb_0002_fa_dock.xml_score12_pert38_DOCK_9104
All 4 ran through 99 models in just 3 hours. That will throw off work fetch for folks on Rosetta with a longer runtime preference.
____________
|
|
|
|
|
|
These 3 tasks, all named broker_lb_test2_hb*, failed on Mac after apparently successful completion due to some file error.
1421321
1421322
1421323
</stderr_txt>
<message>
<file_xfer_error>
<file_name>broker_lb_test2_hb_t363__IGNORE_THE_REST_9214_1_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
|
|
|
|
|
|
This task crashed on Mac with a segmentation violation.
1421317
The crash log is in the task file; here's the first part:
Thread 0 Crashed:
0 ...etta_1.59_i686-apple-darwin 0x009579fe __ZNK4core7scoring7methods10VDW_Energy19residue_pair_energyERKNS_12conformation7ResidueES6_RKNS_4pose4PoseERKNS0_13ScoreFunctionERNS0_17TwoBodyEMapVectorE + 1534
1 ...etta_1.59_i686-apple-darwin 0x00189b81 __ZNK4core7scoring13ScoreFunctionclERNS_4pose4PoseE + 5171
2 ...etta_1.59_i686-apple-darwin 0x004e9985 __ZN9protocols8abinitio12AbrelaxMover5applyERN4core4pose4PoseE + 5993
3 ...etta_1.59_i686-apple-darwin 0x004d0a99 __ZN9protocols3jd214JobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 4041
4 ...etta_1.59_i686-apple-darwin 0x00ace519 __ZN9protocols3jd219BOINCJobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 41
5 ...etta_1.59_i686-apple-darwin 0x0010f73c __ZN9protocols8abinitio11Broker_mainEv + 812
6 ...etta_1.59_i686-apple-darwin 0x0000402c _main + 2532
7 ...etta_1.59_i686-apple-darwin 0x00001eee __start + 216
8 ...etta_1.59_i686-apple-darwin 0x00001e15 start + 41
|
|
|
|
|
|
I now have a minirosetta 1.59 workunit. Is it time to create a new thread for 1.59? |
|
|
|
|
|
On this 1.59 workunit, I ran into the lockfile problem on structure _U16X13X_00019, but my wingman chose a shorter workunit length and therefore didn't even try that structure:
http://ralph.bakerlab.org/result.php?resultid=1422995
http://ralph.bakerlab.org/workunit.php?wuid=1260423
I use BOINC 6.2.28 under 32-bit Vista SP1 on that machine.
Although my machine still uses settings intended to check for the lockfile problem, I'm having to reboot it more often to get past problems with the router that lets a recently installed newer computer reach the internet, so I'm less likely to actually see such problems. |
|
|