RALPH@home

minirosetta 1.58

mtyka
Forum moderator
Project developer
Project scientist

Joined: Mar 19 08
Posts: 79
ID: 4144
Credit: 0
RAC: 0
Message 4642 - Posted 1 Feb 2009 23:33:22 UTC

    Hopefully all the graphics work now (no more blacked-out graphics windows).

    As always, your feedback is highly appreciated!

    Mike

    robertmiles

    Joined: Jan 13 09
    Posts: 79
    ID: 5137
    Credit: 246,177
    RAC: 399
    Message 4644 - Posted 1 Feb 2009 23:53:12 UTC - in response to Message 4642.

      In the Message Boards: RALPH@home bug list forum, you might want to check if some of the older threads still need stickies, such as:

      Bug Reports for Ralph Server Update to BOINC version 5.9.2

      robertmiles

      Joined: Jan 13 09
      Posts: 79
      ID: 5137
      Credit: 246,177
      RAC: 399
      Message 4648 - Posted 2 Feb 2009 6:44:05 UTC - in response to Message 4642.

        Looks like you don't have a link to the 1.58 bugs thread on your home page yet.

        feet1st

        Joined: Mar 7 06
        Posts: 312
        ID: 1028
        Credit: 110,522
        RAC: 0
        Message 4654 - Posted 2 Feb 2009 20:44:31 UTC

          Last modified: 2 Feb 2009 20:48:31 UTC

          I finally decided to take the time to report this petty issue. Note that I didn't have a v1.58 task to take the screenshot from at the time, but it seems to be the same scaling issue there from what I've seen so far. The graphic has behaved this way for quite some time, at least on my machines. The problem appears regardless of the specific machine, Rosetta version, or task type.

          I use Windows XP and a display resolution of 1024 x 768, which is pretty standard I believe. By default, when I open the graphic of a running task, it comes up in a less than full-screen window (see image below). Note that the title bar seems to be scaled in as part of the low energy and RMSD boxes, and perhaps all boxes on the top half of the graphic. The low energy box and the Native box should have the same displayable height, but they don't.

          As this one ran on 1.56, it wasn't showing an RMSD history at all. But if it were, you'd see it take longer to appear in the display than the energy graph. And if you maximize to full screen, you will see more of the RMSD history than with the original window size (the title bar, being scaled to less of the window, leaves more room to show the RMSD).

          Let me know if I'm the lone wolf on this issue or what further video or other details may be relevant.



          I _ quit

          Joined: Jan 13 09
          Posts: 44
          ID: 5136
          Credit: 88,562
          RAC: 0
          Message 4655 - Posted 2 Feb 2009 21:58:17 UTC - in response to Message 4654.

            Last modified: 2 Feb 2009 21:58:39 UTC

            Feet1st, both Ralph and Rosie open the graphics in a reduced window on my system, and I am running widescreen at 1680 x 1050.

            Once I click on the Windows box to bring it to full screen, it does.

            feet1st

            Joined: Mar 7 06
            Posts: 312
            ID: 1028
            Credit: 110,522
            RAC: 0
            Message 4656 - Posted 2 Feb 2009 22:30:33 UTC

              "...it does" what when maximized? Scale properly? My "square" boxes still aren't square, they are short by the thickness of the titlebar.

              Aegis Maelstrom

              Joined: Jan 19 09
              Posts: 12
              ID: 5159
              Credit: 4,751
              RAC: 0
              Message 4657 - Posted 2 Feb 2009 23:02:31 UTC

                Last modified: 2 Feb 2009 23:07:24 UTC

                @feet1st: Hi there, I hadn't noticed it before, but I see the same thing on my widescreen, Win XP SP2, task lr6_E_score12.

                It looks as if the top of the Ralph graphics is consumed by the title (usually blue) bar of the viewer application.

                The same model when displayed as a screensaver (without the title bar) looks O.K.

                Evan

                Joined: Dec 23 07
                Posts: 75
                ID: 3893
                Credit: 69,584
                RAC: 0
                Message 4658 - Posted 2 Feb 2009 23:11:41 UTC - in response to Message 4654.

                  Feet1st, I am also using XP with the same resolution. On full screen the Native box is approximately 6 cm high while the low energy box is about 5 cm. The energy graph takes a fair time to appear from the left, and the RMSD graph takes even longer, as you have described. The effect is probably exaggerated because I am going through some slow-moving proteins (csttest) on 1.57.

                  I _ quit

                  Joined: Jan 13 09
                  Posts: 44
                  ID: 5136
                  Credit: 88,562
                  RAC: 0
                  Message 4659 - Posted 3 Feb 2009 1:06:44 UTC

                    Last modified: 3 Feb 2009 1:14:15 UTC

                    I see in the Rosie graphics the same size of boxes for the low energy and native models as those here on Ralph: 6 cm for low energy and 7 cm for native (not counting the extra millimetres) on both Rosie and Ralph. The low energy model here on Ralph looks a bit close to the top of the box compared with Rosie, but then again it's hard to compare two different proteins and their rotations to each other.

                    I am noticing that the top of the low energy model is getting clipped by the title bar here on Ralph. This is in the 1.54 Rosie and the 1.56 Ralph. I also started my first 1.58 task and see the same dimensions and the same issue: the top of whatever strand is at the top of the low energy box gets a few millimetres cut off if it happens to be a ribbon strand in a long vertical loop. In stick form the model fits into the box. The box size did not change.

                    robertmiles

                    Joined: Jan 13 09
                    Posts: 79
                    ID: 5137
                    Credit: 246,177
                    RAC: 399
                    Message 4681 - Posted 11 Feb 2009 12:47:19 UTC - in response to Message 4642.

                      1.58 errors are still being reported in the thread for 1.55 errors, since there's no reference to the 1.58 errors thread on the home page.

                      Conan

                      Joined: Feb 16 06
                      Posts: 344
                      ID: 145
                      Credit: 1,325,239
                      RAC: 792
                      Message 4685 - Posted 12 Feb 2009 3:51:02 UTC

                        Last modified: 12 Feb 2009 3:55:55 UTC

                        Have just had 9 errors for 1.58 with this error

                        process exited with code 1 (0x1, -255)
                        </message>
                        <stderr_txt>
                        [2009- 2-11 13:51:53:] :: BOINC:: Initializing ... ok.
                        [2009- 2-11 13:51:53:] :: BOINC :: boinc_init()
                        BOINC:: Setting up shared resources ... ok.
                        BOINC:: Setting up semaphores ... ok.
                        BOINC:: Updating status ... ok.
                        BOINC:: Registering timer callback... ok.
                        BOINC:: Worker initialized successfully.
                        Registering options..
                        Trying to access options object.
                        Success.
                        src/protocols/abinitio/AbrelaxApplication.cc:217
                        src/protocols/abinitio/AbrelaxApplication.cc:237
                        src/protocols/abinitio/AbrelaxApplication.cc:295
                        src/protocols/abinitio/AbrelaxApplication.cc:317
                        src/protocols/abinitio/AbrelaxApplication.cc:324
                        src/protocols/abinitio/AbrelaxApplication.cc:326
                        src/protocols/abinitio/AbrelaxApplication.cc:328
                        src/protocols/abinitio/AbrelaxApplication.cc:330
                        src/protocols/abinitio/AbrelaxApplication.cc:335
                        Registered extra options.
                        Initializing core...
                        Initializing options.... ok
                        Options::initialize()
                        Options::adding_options()
                        Options::initialize() Check specs.
                        Options::initialize() End reached
                        Loaded options.... ok
                        Processed options.... ok
                        Initializing random generators... ok
                        Initialization complete.
                        Setting WU description ...
                        Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
                        Unpacking WU data ...
                        Unpacking data: ../../projects/ralph.bakerlab.org/cst_1_8_nativecst_b1.0_cen_0.1.foldcst_chunk_general.t373_.mtyka.boinc_files.zip
                        Setting database description ...
                        Setting up checkpointing ...
                        Setting up folding (abrelax) ...

                        ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
                        ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
                        BOINC:: Error reading and gzipping output datafile: default.out
                        called boinc_finish

                        They are dying almost as soon as they start (after 14 to 20 minutes).

                        See this result as an example.

                        robertmiles

                        Joined: Jan 13 09
                        Posts: 79
                        ID: 5137
                        Credit: 246,177
                        RAC: 399
                        Message 4686 - Posted 12 Feb 2009 16:20:10 UTC - in response to Message 4642.

                          Last modified: 12 Feb 2009 16:22:56 UTC

                          This 1.58 workunit:

                          http://ralph.bakerlab.org/workunit.php?wuid=1151935

                          keeps giving me error messages like these:

                          2/12/2009 9:34:00 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:34:00 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                          2/12/2009 9:34:00 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
                          2/12/2009 9:34:41 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:34:41 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                          2/12/2009 9:34:41 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
                          2/12/2009 9:35:22 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:35:22 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                          2/12/2009 9:35:23 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
                          2/12/2009 9:36:04 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:36:04 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                          2/12/2009 9:36:04 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
                          2/12/2009 9:36:45 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:36:45 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                          2/12/2009 9:36:45 AM|ralph@home|Restarting task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 using minirosetta version 158
                          2/12/2009 9:37:26 AM|ralph@home|Task loopbuild_fcst_hb_t374__IGNORE_THE_REST_1VHSA_6_7770_3_0 exited with zero status but no 'finished' file
                          2/12/2009 9:37:26 AM|ralph@home|If this happens repeatedly you may need to reset the project.
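
For anyone who wants to measure how regular a restart loop like the one above is, here is a minimal sketch in Python. It assumes the pipe-delimited `timestamp|project|message` layout seen in these BOINC messages; the function name `exit_intervals` and the shortened task name `X` in the sample are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Message text that marks one failed run in the log excerpt above.
EXIT_MARKER = "exited with zero status but no 'finished' file"

def exit_intervals(log_text):
    """Return the seconds between consecutive 'no finished file' exits."""
    stamps = []
    for line in log_text.splitlines():
        # Expected layout (an assumption): timestamp|project|message
        parts = line.strip().split("|", 2)
        if len(parts) == 3 and EXIT_MARKER in parts[2]:
            stamps.append(datetime.strptime(parts[0], "%m/%d/%Y %I:%M:%S %p"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

# Three exits with the same timestamps as the first lines of the excerpt.
sample = """2/12/2009 9:34:00 AM|ralph@home|Task X exited with zero status but no 'finished' file
2/12/2009 9:34:00 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/12/2009 9:34:41 AM|ralph@home|Task X exited with zero status but no 'finished' file
2/12/2009 9:35:22 AM|ralph@home|Task X exited with zero status but no 'finished' file"""

print(exit_intervals(sample))  # [41, 41]
```

A near-constant interval (about 41 seconds here) suggests the task fails at the same point on every restart rather than at random.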

                          Also, when I start its graphics window within the Simple View, the window comes up, but with solid black where the graphics should be. If I hover my cursor over this window, I get the circle indicating that it is waiting for something, although without indicating what. If I then try to close the graphics window, the graphics program doesn't respond to this attempt, and must be aborted. The CPU time used so far is frozen at 0:26:55, and this workunit doesn't seem to be using the CPU even though it's listed as Running. If I try to start the graphics from within the Tasks tab of the Advanced View, the results are similar except that after a second or so, there's a white block near the center of the black graphics window.

                          Since I've finally succeeded in setting the percentage of CPU time used to 90% instead of 100%, this could be a factor in triggering the problem.

                          During this time, the task monitor shows one CPU core running normally, and different views disagree on how the other is running except that they all agree that it's using significantly less than the requested 90% of the CPU time. An actual usage of about 30% is typical for these views, but one shows about 50% instead.

                          This workunit looks like it needs to be aborted, but is there something else I need to try first?

                          I'm using BOINC 6.2.28 under Vista SP1.

                          robertmiles

                          Joined: Jan 13 09
                          Posts: 79
                          ID: 5137
                          Credit: 246,177
                          RAC: 399
                          Message 4688 - Posted 12 Feb 2009 17:55:05 UTC - in response to Message 4686.

                            This workunit went into Compute error status before anyone replied, so I decided to return it with an Update.

                            It had the lockfile problem, repeatedly, but didn't make any information about that visible to the user before the workunit finished.

                            If someone else finishes that workunit with a success, could you ask him or her whether he or she is using 100% CPU?

                            Evan

                            Joined: Dec 23 07
                            Posts: 75
                            ID: 3893
                            Credit: 69,584
                            RAC: 0
                            Message 4689 - Posted 12 Feb 2009 18:35:58 UTC

                              This one, 1150435, was a long one, taking about 5 hours to complete one decoy, but it was given a validate error. I notice that its second attempt ended in immediate failure.

                              Also, is it my imagination or are these units (cst__1_8_nativecst_b1.0_cen_0.1_hb_t313 etc) using more RAM?

                              Paul D. Buck

                              Joined: Jan 14 09
                              Posts: 62
                              ID: 5139
                              Credit: 33,293
                              RAC: 0
                              Message 4692 - Posted 12 Feb 2009 20:46:09 UTC - in response to Message 4686.

                                This 1.58 workunit:

                                http://ralph.bakerlab.org/workunit.php?wuid=1151935


                                It looks like the second try it ran to completion ...

                                It looks to me like my assertion that running at less than 100% CPU causes issues ... like the "can't acquire lockfile" error ...

                                Paul D. Buck

                                Joined: Jan 14 09
                                Posts: 62
                                ID: 5139
                                Credit: 33,293
                                RAC: 0
                                Message 4693 - Posted 13 Feb 2009 7:41:22 UTC

                                  Last modified: 13 Feb 2009 7:45:04 UTC

                                  New error ... random, many tasks work, but I have at least 5 with this error:

                                  ERROR: [ERROR] Error opening RBSeg file 'core_1VKBA_18_noloop_loops.txt'
                                  ERROR:: Exit from: src/protocols/loops/LoopClass.cc line: 443
                                  BOINC:: Error reading and gzipping output datafile: default.out
                                  called boinc_finish


                                  Not sure why the mix of success and failure ... the failures:

                                  1308740
                                  1308739
                                  1307455
                                  1307355
                                  1307354

                                  Path7

                                  Joined: Feb 11 08
                                  Posts: 56
                                  ID: 4036
                                  Credit: 4,974
                                  RAC: 0
                                  Message 4694 - Posted 13 Feb 2009 21:09:48 UTC

                                    Hello all,

                                    Having the same kind of error as Paul D. Buck:

                                    ERROR: [ERROR] Error opening RBSeg file 'core_1B70B_12_noloop_loops.txt'
                                    ERROR:: Exit from: ..\..\src\protocols\loops\LoopClass.cc line: 443
                                    BOINC:: Error reading and gzipping output datafile: default.out
                                    called boinc_finish

                                    loopbuild_chunk_cheat_3_5_hb_t306__IGNORE_THE_REST_1B70B_12_7794_1_0

                                    Windows XP – Boinc 5.10.45

                                    This WU had the same error again when it was returned by another computer.

                                    Have a nice day,
                                    Path7.

                                    AdeB

                                    Joined: Dec 22 07
                                    Posts: 61
                                    ID: 3888
                                    Credit: 121,238
                                    RAC: 465
                                    Message 4695 - Posted 13 Feb 2009 21:26:36 UTC - in response to Message 4694.

                                      Hello,

                                      And the same error here: loopbuild_chunk_2_7_hb_t332__IGNORE_THE_REST_1X7OA_7_7809_1_0

                                      ERROR: [ERROR] Error opening RBSeg file 'core_1X7OA_7_noloop_loops.txt'
                                      ERROR:: Exit from: src/protocols/loops/LoopClass.cc line: 443
                                      BOINC:: Error reading and gzipping output datafile: default.out
                                      called boinc_finish


                                      Gentoo linux - Boinc 5.10.45

                                      AdeB

                                      Paul D. Buck

                                      Joined: Jan 14 09
                                      Posts: 62
                                      ID: 5139
                                      Credit: 33,293
                                      RAC: 0
                                      Message 4696 - Posted 14 Feb 2009 10:02:43 UTC

                                        Also had one repetition of the line 330 error:


                                        ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
                                        ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
                                        BOINC:: Error reading and gzipping output datafile: default.out


                                        Task 1309122

                                        Just as important, have also had quite a few successes too ... though not as good as 1.54 running on Rosetta where I have not had an error in a couple weeks ...

                                        Hmm, was this just a reissue of the task that should have been canceled? I see that the wingman also had the same error ...

                                        robertmiles

                                        Joined: Jan 13 09
                                        Posts: 79
                                        ID: 5137
                                        Credit: 246,177
                                        RAC: 399
                                        Message 4697 - Posted 14 Feb 2009 16:52:36 UTC - in response to Message 4692.

                                          This 1.58 workunit:

                                          http://ralph.bakerlab.org/workunit.php?wuid=1151935


                                          It looks like the second try it ran to completion ...

                                          It looks to me like my assertion that running at less than 100% CPU causes issues ... like the "can't acquire lockfile" error ...


                                          Mike, have you thought of adding some debug code to the parts of the next version of minirosetta that have anything to do with the lockfile, and occasionally recording the percentage of the CPU time BOINC runs?

                                          robertmiles

                                          Joined: Jan 13 09
                                          Posts: 79
                                          ID: 5137
                                          Credit: 246,177
                                          RAC: 399
                                          Message 4698 - Posted 14 Feb 2009 19:05:12 UTC - in response to Message 4697.

                                            This 1.58 workunit:

                                            http://ralph.bakerlab.org/workunit.php?wuid=1151935


                                            It looks like the second try it ran to completion ...

                                            It looks to me like my assertion that running at less than 100% CPU causes issues ... like the "can't acquire lockfile" error ...


                                            Mike, have you thought of adding some debug code to the parts of the next version of minirosetta that have anything to do with the lockfile, and occasionally recording the percentage of the CPU time BOINC runs?


                                            Mike,

                                            Also, debug code for any parts that can do an exit from the program without setting the status.

                                            Some of the messages so far from a workunit likely to need this debug code to determine just what's going on:

                                            2/14/2009 12:29:53 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:29:53 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:29:53 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:30:34 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:30:34 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:30:35 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:31:16 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:31:16 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:31:16 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:31:57 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:31:57 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:31:57 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:32:38 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:32:38 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:32:39 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:33:20 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:33:20 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:33:20 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:34:01 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:34:01 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:34:01 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:34:42 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:34:42 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:34:42 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:35:23 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:35:23 PM|ralph@home|If this happens repeatedly you may need to reset the project.
                                            2/14/2009 12:35:23 PM|ralph@home|Restarting task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 using minirosetta version 158
                                            2/14/2009 12:36:04 PM|ralph@home|Task loopbuild_chunk_2_7_B_hb_t327__IGNORE_THE_REST_1Z7UA_9_7851_1_0 exited with zero status but no 'finished' file
                                            2/14/2009 12:36:04 PM|ralph@home|If this happens repeatedly you may need to reset the project.


                                            http://ralph.bakerlab.org/workunit.php?wuid=1156395

                                            04:40:56 CPU so far with 6 hours requested, and no longer changing even while the workunit is running. Task Manager indicates that it is using 0% CPU time.

                                            This is with a 1.58 workunit at 90% CPU, under BOINC 6.2.28 with 32-bit Vista SP1 with a dual-core AMD CPU. I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though.

                                            robertmiles

                                            Joined: Jan 13 09
                                            Posts: 79
                                            ID: 5137
                                            Credit: 246,177
                                            RAC: 399
                                            Message 4701 - Posted 14 Feb 2009 20:52:51 UTC - in response to Message 4698.

                                              http://ralph.bakerlab.org/workunit.php?wuid=1156395

                                              04:40:56 CPU so far with 6 hours requested, and no longer changing even while the workunit is running. Task Manager indicates that it is using 0% CPU time.

                                              This is with a 1.58 workunit at 90% CPU, under BOINC 6.2.28 with 32-bit Vista SP1 with a dual-core AMD CPU. I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though.


                                              Now that workunit has ended with a Computation error after a lot of these messages (not visible until the workunit ends):

                                              [2009- 2-14 11:55:25:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:56: 7:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:56:48:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:57:29:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:58:11:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:58:52:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 11:59:33:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 12: 0:14:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting
                                              [2009- 2-14 12: 0:56:] :: BOINC:: Initializing ... ok.
                                              Can't acquire lockfile - exiting

                                              Hope these results are at least useful in tracking down the lockfile problem.
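
For anyone who wants to measure how often the task restarts, here is a minimal Python sketch. It assumes the stderr text has been saved as plain text. The `LOG` excerpt is copied from the lines above; the function names are made up for illustration and are not part of BOINC or minirosetta.

```python
import re
from datetime import datetime

# Excerpt of the stderr lines quoted above (note the space-padded fields).
LOG = """\
[2009- 2-14 11:55:25:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
"""

# Matches stamps such as "[2009- 2-14 11:56: 7:]", tolerating padding spaces.
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")


def failed_start_times(text):
    """Return the timestamp of every start that was followed by a lockfile failure."""
    times = []
    last_start = None
    for line in text.splitlines():
        match = STAMP.search(line)
        if match:
            last_start = datetime(*map(int, match.groups()))
        elif "Can't acquire lockfile" in line and last_start is not None:
            times.append(last_start)
            last_start = None
    return times


def gaps_in_seconds(times):
    """Seconds between consecutive failed starts."""
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


times = failed_start_times(LOG)
gaps = gaps_in_seconds(times)
print(len(times), "failed starts; gaps between them (s):", gaps)
```

On this excerpt the gaps come out at roughly 41 to 42 seconds, which suggests the client is retrying on a fixed interval rather than at random.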

                                              Profile robertmiles

                                              Joined: Jan 13 09
                                              Posts: 79
                                              ID: 5137
                                              Credit: 246,177
                                              RAC: 399
                                              Message 4702 - Posted 14 Feb 2009 21:01:15 UTC - in response to Message 4701.

                                                I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:

                                                2/14/2009 2:56:12 PM|ralph@home|Resetting project
                                                2/14/2009 2:56:18 PM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
                                                2/14/2009 2:56:33 PM|ralph@home|Sending scheduler request: To fetch work. Requesting 3853 seconds of work, reporting 0 completed tasks
                                                2/14/2009 2:56:38 PM|ralph@home|Scheduler request succeeded: got 0 new tasks

                                                Looks like there's also a problem in your reset procedure.

                                                Profile Paul D. Buck

                                                Joined: Jan 14 09
                                                Posts: 62
                                                ID: 5139
                                                Credit: 33,293
                                                RAC: 0
                                                Message 4703 - Posted 14 Feb 2009 21:49:21 UTC - in response to Message 4701.

                                                  I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though.


                                                  Ha! There may be the clue I was missing!

                                                  I don't recall whether I had been looking at the graphics or not, but it is likely that I had been whenever I saw a failure, until I gave up in disgust.

                                                  Though there are no tasks right now, perhaps the test would be to run a few tasks without looking at the graphics and a few where you do look.

                                                  I am not sure why the launching of the graphics application would cause this issue but this could be the missing clue ... and why I never saw the issue in Einstein even when I had the setting that caused this issue in Rosetta ...

                                                  MOST interesting is that you can launch graphics at 100% and have no issue, but that the switch to pause the application would cause it.

                                                  Oh, and if you don't want to lose that much CPU you can use 99% like I did and get the same effect ...

                                                  Profile robertmiles

                                                  Joined: Jan 13 09
                                                  Posts: 79
                                                  ID: 5137
                                                  Credit: 246,177
                                                  RAC: 399
                                                  Message 4704 - Posted 14 Feb 2009 23:47:10 UTC - in response to Message 4703.

                                                    Oh, and if you don't want to lose that much CPU you can use 99% like I did and get the same effect ...


                                                    I tried 99% for a while but had two problems with this setting:

                                                    1. Problems making this setting reduce the CPU percentage at all - now fixed.

                                                    2. Problems getting Task Manager to show me such small gaps in CPU usage.

                                                    I may try again soon at 95%, though.



                                                    Mike, in order to save time in testing, you may want to consider these ideas:

                                                    1. Try to send a larger share of any workunits aimed at the lockfile problem to machines known to have had these problems recently.

                                                    2. If these same machines get a workunit aimed at testing anything else, immediately put a copy of that workunit back on the queue to be sent to machines not in this group.
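
The two ideas above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names, the `boost` factor, and the host IDs are all hypothetical, and nothing here reflects how the BOINC scheduler is actually written.

```python
def share_weight(targets_lockfile: bool, host_affected: bool, boost: float = 3.0) -> float:
    """Idea 1: relative chance of sending this workunit to this host.

    Lockfile-test workunits are favoured on hosts that recently showed
    the problem. The boost value of 3.0 is an arbitrary placeholder.
    """
    if targets_lockfile and host_affected:
        return boost
    return 1.0


def requeue_copy(targets_lockfile: bool, host_affected: bool) -> bool:
    """Idea 2: should a copy go back on the queue for unaffected hosts?

    True when an affected host receives a workunit that is testing
    something other than the lockfile problem.
    """
    return host_affected and not targets_lockfile


affected_hosts = {101, 102}  # made-up host IDs, for illustration only

for host in (101, 200):
    for targets_lockfile in (True, False):
        affected = host in affected_hosts
        print(host, targets_lockfile,
              share_weight(targets_lockfile, affected),
              requeue_copy(targets_lockfile, affected))
```

The point of the second function is that a failure on an affected host would otherwise be hard to tell apart from a genuine bug in whatever that workunit was meant to test.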

                                                    Profile robertmiles

                                                    Joined: Jan 13 09
                                                    Posts: 79
                                                    ID: 5137
                                                    Credit: 246,177
                                                    RAC: 399
                                                    Message 4705 - Posted 15 Feb 2009 4:15:04 UTC - in response to Message 4702.

                                                      Last modified: 15 Feb 2009 4:17:41 UTC

                                                      I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:

                                                      2/14/2009 2:56:12 PM|ralph@home|Resetting project
                                                      2/14/2009 2:56:18 PM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
                                                      2/14/2009 2:56:33 PM|ralph@home|Sending scheduler request: To fetch work. Requesting 3853 seconds of work, reporting 0 completed tasks
                                                      2/14/2009 2:56:38 PM|ralph@home|Scheduler request succeeded: got 0 new tasks

                                                      Looks like there's also a problem in your reset procedure.


                                                      Better check just what that Ralph@home reset procedure does. Since the reset, I haven't been able to connect to Rosetta@home, either through BOINC or through its website. I have a Rosetta@home result I haven't been able to send, or I'd try resetting the Rosetta@home project also.

                                                      Has Rosetta@home been offline for several hours, or is this part of the result of the Ralph@home reset attempt?

                                                      I _ quit

                                                      Joined: Jan 13 09
                                                      Posts: 44
                                                      ID: 5136
                                                      Credit: 88,562
                                                      RAC: 0
                                                      Message 4706 - Posted 15 Feb 2009 9:37:07 UTC - in response to Message 4705.

                                                        I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:

                                                        2/14/2009 2:56:12 PM|ralph@home|Resetting project
                                                        2/14/2009 2:56:18 PM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
                                                        2/14/2009 2:56:33 PM|ralph@home|Sending scheduler request: To fetch work. Requesting 3853 seconds of work, reporting 0 completed tasks
                                                        2/14/2009 2:56:38 PM|ralph@home|Scheduler request succeeded: got 0 new tasks

                                                        Looks like there's also a problem in your reset procedure.


                                                        Better check just what that Ralph@home reset procedure does. Since the reset, I haven't been able to connect to Rosetta@home, either through BOINC or through its website. I have a Rosetta@home result I haven't been able to send, or I'd try resetting the Rosetta@home project also.

                                                        Has Rosetta@home been offline for several hours, or is this part of the result of the Ralph@home reset attempt?



                                                        It's been nearly 24 hrs and Rosetta is still down. It's almost like a system failure happened there. The problems there have nothing to do with Ralph or with the problem you are having deleting the file.

                                                        AdeB

                                                        Joined: Dec 22 07
                                                        Posts: 61
                                                        ID: 3888
                                                        Credit: 121,238
                                                        RAC: 465
                                                        Message 4707 - Posted 15 Feb 2009 11:27:17 UTC

                                                          Last modified: 15 Feb 2009 11:28:03 UTC

                                                          This looks like a long-running model: resultid=1309799
                                                          Name: loopbuild_chunk_2_7_B_hb_t286__IGNORE_THE_REST_1YZFA_5_7846_1_0
                                                          Outcome: Validate error
                                                          stderr out:

                                                          . . .
                                                          BOINC:: Worker startup.
                                                          Starting watchdog...
                                                          Watchdog active.
                                                          # cpu_run_time_pref: 14400
                                                          Hbond tripped !!!
                                                          BOINC:: CPU time: 28889.1s, 14400s + 14400s[2009- 2-14 15:24: 2:] :: BOINC
                                                          ======================================================
                                                          DONE :: 2 starting structures 28889.1 cpu seconds
                                                          This process generated 3 decoys from 3 attempts
                                                          ======================================================
                                                          called boinc_finish


                                                          AdeB
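
A quick check of the numbers in that stderr output, as a small Python sketch. It assumes that "14400s + 14400s" means the preferred run time plus an equal grace period before the watchdog ends the task; that reading is a guess from the log line, not something confirmed by the developers.

```python
def watchdog_limit(cpu_run_time_pref: int, grace: int = 14400) -> int:
    """CPU seconds allowed before the watchdog steps in.

    Assumption: the limit is the preferred run time plus a grace period,
    as the "14400s + 14400s" in the log line suggests.
    """
    return cpu_run_time_pref + grace


cpu_run_time_pref = 14400     # from "# cpu_run_time_pref: 14400"
reported_cpu_time = 28889.1   # from "BOINC:: CPU time: 28889.1s"

limit = watchdog_limit(cpu_run_time_pref)
overrun = reported_cpu_time - limit
print("limit:", limit, "s; overrun:", round(overrun, 1), "s")
```

Under that assumption the task ran about a minute and a half past an 8-hour limit, twice the 4 hours requested, which fits the "long-running model" description.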

                                                          Profile Paul D. Buck

                                                          Joined: Jan 14 09
                                                          Posts: 62
                                                          ID: 5139
                                                          Credit: 33,293
                                                          RAC: 0
                                                          Message 4708 - Posted 19 Feb 2009 7:01:57 UTC

                                                            At least we got a little work again ...

                                                            Profile robertmiles

                                                            Joined: Jan 13 09
                                                            Posts: 79
                                                            ID: 5137
                                                            Credit: 246,177
                                                            RAC: 399
                                                            Message 4709 - Posted 19 Feb 2009 8:11:41 UTC

                                                              A failed workunit:

                                                              http://ralph.bakerlab.org/workunit.php?wuid=1160172

                                                              Some typical messages from it:

                                                              2/19/2009 12:19:42 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
                                                              2/19/2009 12:20:22 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
                                                              2/19/2009 12:20:22 AM|ralph@home|If this happens repeatedly you may need to reset the project.
                                                              2/19/2009 12:20:22 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
                                                              2/19/2009 12:21:04 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
                                                              2/19/2009 12:21:04 AM|ralph@home|If this happens repeatedly you may need to reset the project.

                                                              These messages are repeated many times.

                                                              I'm now running at 95% CPU, in order to help pin down the cause of this problem.

                                                              Profile Ian_D

                                                              Joined: Feb 16 06
                                                              Posts: 16
                                                              ID: 321
                                                              Credit: 39,518
                                                              RAC: 0
                                                              Message 4710 - Posted 19 Feb 2009 18:23:25 UTC

                                                                Last modified: 19 Feb 2009 18:25:15 UTC

                                                                http://ralph.bakerlab.org/result.php?resultid=1311870

                                                                <core_client_version>6.4.5</core_client_version>
                                                                <![CDATA[
                                                                <message>
                                                                Incorrect function. (0x1) - exit code 1 (0x1)
                                                                </message>
                                                                <stderr_txt>
                                                                [2009- 2-17 6:40:25:] :: BOINC:: Initializing ... ok.
                                                                [2009- 2-17 6:40:25:] :: BOINC :: boinc_init()
                                                                BOINC:: Setting up shared resources ... ok.
                                                                BOINC:: Setting up semaphores ... ok.
                                                                BOINC:: Updating status ... ok.
                                                                BOINC:: Registering timer callback... ok.
                                                                BOINC:: Worker initialized successfully.
                                                                Registering options..
                                                                Trying to access options object.
                                                                Success.
                                                                Registered extra options.
                                                                Initializing core...
                                                                Initializing options.... ok
                                                                Options::initialize()
                                                                Options::adding_options()
                                                                Options::initialize() Check specs.
                                                                Options::initialize() End reached
                                                                Loaded options.... ok
                                                                Processed options.... ok
                                                                Initializing random generators... ok
                                                                Initialization complete.
                                                                Setting WU description ...
                                                                Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
                                                                Setting database description ...
                                                                Setting up checkpointing ...
                                                                Setting up folding (abrelax) ...

                                                                ERROR: ERROR: FragmentIO: could not open file cs_aa_1ji8A09_05.200_v1_3.gz
                                                                ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
                                                                BOINC:: Error reading and gzipping output datafile: default.out
                                                                called boinc_finish

                                                                </stderr_txt>
                                                                ]]>

                                                                http://ralph.bakerlab.org/result.php?resultid=1311551



                                                                <core_client_version>6.4.5</core_client_version>
                                                                <![CDATA[
                                                                <message>
                                                                Incorrect function. (0x1) - exit code 1 (0x1)
                                                                </message>
                                                                <stderr_txt>
                                                                [2009- 2-15 22:23:26:] :: BOINC:: Initializing ... ok.
                                                                [2009- 2-15 22:23:26:] :: BOINC :: boinc_init()
                                                                BOINC:: Setting up shared resources ... ok.
                                                                BOINC:: Setting up semaphores ... ok.
                                                                BOINC:: Updating status ... ok.
                                                                BOINC:: Registering timer callback... ok.
                                                                BOINC:: Worker initialized successfully.
                                                                Registering options..
                                                                Trying to access options object.
                                                                Success.
                                                                Registered extra options.
                                                                Initializing core...
                                                                Initializing options.... ok
                                                                Options::initialize()
                                                                Options::adding_options()
                                                                Options::initialize() Check specs.
                                                                Options::initialize() End reached
                                                                Loaded options.... ok
                                                                Processed options.... ok
                                                                Initializing random generators... ok
                                                                Initialization complete.
                                                                Setting WU description ...
                                                                Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
                                                                Unpacking WU data ...
                                                                Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_chunk_cheat_3_5.loopbuild_chunk.t326_.mtyka.boinc_files.zip
                                                                Setting database description ...
                                                                Setting up checkpointing ...

                                                                ERROR: [ERROR] Error opening RBSeg file 'core_2GHRA_10_noloop_loops.txt'
                                                                ERROR:: Exit from: ..\..\src\protocols\loops\LoopClass.cc line: 443
                                                                BOINC:: Error reading and gzipping output datafile: default.out
                                                                called boinc_finish

                                                                </stderr_txt>
                                                                ]]>

                                                                Profile Paul D. Buck

                                                                Joined: Jan 14 09
                                                                Posts: 62
                                                                ID: 5139
                                                                Credit: 33,293
                                                                RAC: 0
                                                                Message 4711 - Posted 19 Feb 2009 19:44:06 UTC - in response to Message 4709.

                                                                  These messages are repeated many times.

                                                                  I'm now running at 95% CPU, in order to help pin down the cause of this problem.


                                                                  The one hint is that POSSIBLY only those tasks where you have the screen saver activate or use the graphics application ALONG with CPU throttling may be linked ... can you make note of that?

                                                                  Profile Ian_D

                                                                  Joined: Feb 16 06
                                                                  Posts: 16
                                                                  ID: 321
                                                                  Credit: 39,518
                                                                  RAC: 0
                                                                  Message 4712 - Posted 20 Feb 2009 20:35:22 UTC

                                                                    Invalid, Huh ?

                                                                    http://ralph.bakerlab.org/result.php?resultid=1316610

                                                                    <core_client_version>6.4.5</core_client_version>
                                                                    <![CDATA[
                                                                    <stderr_txt>
                                                                    [2009- 2-20 5:36:36:] :: BOINC:: Initializing ... ok.
                                                                    [2009- 2-20 5:36:36:] :: BOINC :: boinc_init()
                                                                    BOINC:: Setting up shared resources ... ok.
                                                                    BOINC:: Setting up semaphores ... ok.
                                                                    BOINC:: Updating status ... ok.
                                                                    BOINC:: Registering timer callback... ok.
                                                                    BOINC:: Worker initialized successfully.
                                                                    Registering options..
                                                                    Trying to access options object.
                                                                    Success.
                                                                    Registered extra options.
                                                                    Initializing core...
                                                                    Initializing options.... ok
                                                                    Options::initialize()
                                                                    Options::adding_options()
                                                                    Options::initialize() Check specs.
                                                                    Options::initialize() End reached
                                                                    Loaded options.... ok
                                                                    Processed options.... ok
                                                                    Initializing random generators... ok
                                                                    Initialization complete.
                                                                    Setting WU description ...
                                                                    Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
                                                                    Unpacking WU data ...
                                                                    Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_mamaln_ideal.loopbuild.t312_.mtyka.boinc_files.zip
                                                                    Setting database description ...
                                                                    Setting up checkpointing ...
                                                                    BOINC:: Worker startup.
                                                                    Starting watchdog...
                                                                    Watchdog active.
                                                                    ======================================================
                                                                    DONE :: 1 starting structures 3034.98 cpu seconds
                                                                    This process generated 5 decoys from 5 attempts
                                                                    ======================================================

                                                                    BOINC :: Watchdog shutting down...
                                                                    BOINC :: BOINC support services shutting down cleanly ...
                                                                    called boinc_finish

                                                                    </stderr_txt>
                                                                    ]]>


                                                                    Profile robertmiles

                                                                    Joined: Jan 13 09
                                                                    Posts: 79
                                                                    ID: 5137
                                                                    Credit: 246,177
                                                                    RAC: 399
                                                                    Message 4713 - Posted 21 Feb 2009 12:03:56 UTC - in response to Message 4711.

                                                                      Last modified: 21 Feb 2009 12:29:46 UTC

                                                                      These messages are repeated many times.

                                                                      I'm now running at 95% CPU, in order to help pin down the cause of this problem.


                                                                      The one hint is that POSSIBLY only those tasks where you have the screen saver activate or use the graphics application ALONG with CPU throttling may be linked ... can you make note of that?


                                                                      Since then, I've had two 1.58 workunits complete successfully with no graphics application activation. Still running at 95% CPU.

                                                                      Doing the same for 1.54 over on Rosetta@home doesn't trigger the problem.

                                                                      In other words, the combination of all of the following triggers the problem for me: 1.58, less than 100% CPU, activating graphics after the workunit starts without them, shutting down the graphics window. Running 1.58 at less than 100% CPU, but with no graphics, doesn't trigger it for me. The problem doesn't trigger for 1.54. I haven't tested the other possibilities yet.

                                                                      I use an all-black screen saver these days.
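
The observations in this post can be written down as a small Python lookup, which may help when comparing notes between testers. This is only a summary of what was reported on one machine; the function name and parameters are made up for illustration, and `None` marks combinations that were not tested here.

```python
from typing import Optional


def lockfile_problem_observed(app_version: str,
                              cpu_limit_percent: int,
                              graphics_opened_then_closed: bool) -> Optional[bool]:
    """Return True/False for tested combinations, None for untested ones."""
    if app_version == "1.54":
        # Reported: the problem doesn't trigger for 1.54.
        return False
    if app_version == "1.58" and cpu_limit_percent < 100:
        # Reported: triggers only when graphics were opened after the
        # workunit started and the graphics window was then closed.
        return graphics_opened_then_closed
    # Everything else (for example 1.58 at 100% CPU) was not tested here.
    return None


print(lockfile_problem_observed("1.58", 95, True))
print(lockfile_problem_observed("1.58", 95, False))
print(lockfile_problem_observed("1.58", 100, True))
```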

                                                                      I _ quit

                                                                      Joined: Jan 13 09
                                                                      Posts: 44
                                                                      ID: 5136
                                                                      Credit: 88,562
                                                                      RAC: 0
                                                                      Message 4714 - Posted 22 Feb 2009 22:25:50 UTC

                                                                        Just FYI, ALL tasks assigned to me completed ok. NO compute errors!

                                                                        Profile robertmiles

                                                                        Joined: Jan 13 09
                                                                        Posts: 79
                                                                        ID: 5137
                                                                        Credit: 246,177
                                                                        RAC: 399
                                                                        Message 4715 - Posted 26 Feb 2009 13:42:35 UTC

                                                                          The last few times I looked at the System Status, the File Deleter was not running. Does it need to be running more often?

                                                                          Profile Paul D. Buck

                                                                          Joined: Jan 14 09
                                                                          Posts: 62
                                                                          ID: 5139
                                                                          Credit: 33,293
                                                                          RAC: 0
                                                                          Message 4719 - Posted 3 Mar 2009 7:01:12 UTC

                                                                            It looks like three tasks with the same error... and it is not one I have seen before:

                                                                            ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
                                                                            ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
                                                                            BOINC:: Error reading and gzipping output datafile: default.out

                                                                            1325665
                                                                            1325692
                                                                            1325691
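A note for readers decoding the first ERROR line: the asserted condition appears to require that a string named vol_a be a two-character drive specifier such as "C:". The sketch below restates only that condition in C++. The function name is hypothetical, and nothing here is taken from FileName.cc beyond the expression quoted in the log.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not from the Rosetta source: it mirrors the
// condition printed in the ERROR line for a volume string vol_a.
bool looks_like_drive_volume(const std::string& vol_a) {
    return vol_a.length() == 2
        // Cast to unsigned char so std::isalpha is well defined for any byte.
        && std::isalpha(static_cast<unsigned char>(vol_a[0]))
        && vol_a[1] == ':';
}
```

Under this reading, "C:" satisfies the condition, while an empty string or a longer string such as "C:\" does not. The log alone does not show what value was actually passed.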

                                                                            svincent

                                                                            Joined: Apr 4 08
                                                                            Posts: 34
                                                                            ID: 4182
                                                                            Credit: 51,768
                                                                            RAC: 0
                                                                            Message 4720 - Posted 3 Mar 2009 17:16:55 UTC

                                                                              I've had 4 workunits on Mac OS X 10.4.11 that all failed after apparently successful completion:

                                                                              </stderr_txt>
                                                                              <message>
                                                                              <file_xfer_error>
                                                                              <file_name>homobench_natrelax_t312__8094_1_1_0</file_name>
                                                                              <error_code>-161</error_code>
                                                                              </file_xfer_error>


                                                                              Workunit IDs

                                                                              1170292
                                                                              1170291
                                                                              1170290
                                                                              1170289

                                                                              It appears in each case, that they had previously been sent to a Windows machine where they failed (as noted by Paul Buck) in the manner shown below, but at the start, not at the end:

                                                                              ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
                                                                              ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
                                                                              BOINC:: Error reading and gzipping output datafile: default.out
                                                                              called boinc_finish

                                                                              Profile feet1st

                                                                              Joined: Mar 7 06
                                                                              Posts: 312
                                                                              ID: 1028
                                                                              Credit: 110,522
                                                                              RAC: 0
                                                                              Message 4721 - Posted 4 Mar 2009 23:48:42 UTC

                                                                                Last modified: 4 Mar 2009 23:51:12 UTC

                                                                                This one had a result file over 1MB. Its name is cc_2_2_mamcstmix_cen_bounded_0.1_hb_t311__ IGNORE_THE_REST_1B0NA_6_8133_1_0
                                                                                It also ended after under 15 hrs on my 24hr preference. Looks like it hit the 99 model limit.
                                                                                ____________

                                                                                Evan

                                                                                Joined: Dec 23 07
                                                                                Posts: 75
                                                                                ID: 3893
                                                                                Credit: 69,584
                                                                                RAC: 0
                                                                                Message 4722 - Posted 5 Mar 2009 18:27:44 UTC

                                                                                  validate error:
                                                                                  all first time runs
                                                                                  1329199
                                                                                  1329195
                                                                                  1329205
                                                                                  1329212
                                                                                  1329219
                                                                                  1329224

                                                                                  Profile Paul D. Buck

                                                                                  Joined: Jan 14 09
                                                                                  Posts: 62
                                                                                  ID: 5139
                                                                                  Credit: 33,293
                                                                                  RAC: 0
                                                                                  Message 4723 - Posted 6 Mar 2009 6:46:55 UTC

                                                                                    The error is: "Unable to open weights."

                                                                                    1331227
                                                                                    1331235

                                                                                    Evan

                                                                                    Joined: Dec 23 07
                                                                                    Posts: 75
                                                                                    ID: 3893
                                                                                    Credit: 69,584
                                                                                    RAC: 0
                                                                                    Message 4724 - Posted 6 Mar 2009 9:33:09 UTC

                                                                                      validate error:
                                                                                      all first time runs
                                                                                      1329199
                                                                                      1329195
                                                                                      1329205
                                                                                      1329212
                                                                                      1329219
                                                                                      1329224


                                                                                      What causes a validation error? It would appear that my 6 errant work units are well on the way to having successful second runs.

                                                                                      Evan

                                                                                      Joined: Dec 23 07
                                                                                      Posts: 75
                                                                                      ID: 3893
                                                                                      Credit: 69,584
                                                                                      RAC: 0
                                                                                      Message 4725 - Posted 9 Mar 2009 23:42:46 UTC

                                                                                        There are going to be quite a few errors coming through until the system sorts itself out after the closure. I have had a good number of ghost work units reportedly downloaded to me, but I can't find them. They have already been reported as successes and awarded points, but they were handed out again.

                                                                                        Speedy

                                                                                        Joined: Dec 4 06
                                                                                        Posts: 8
                                                                                        ID: 2327
                                                                                        Credit: 1,985
                                                                                        RAC: 0
                                                                                        Message 4726 - Posted 10 Mar 2009 7:57:10 UTC

                                                                                          Last modified: 10 Mar 2009 8:04:33 UTC

                                                                                          This workunit displayed a black window when I pushed the Show graphics button under the Tasks tab in BOINC Manager. It completed successfully after about 55 minutes, and I got credit for it. Could the reason be that the task was a resend? Is there an ETA for when 1.58 will be deployed on the main Rosetta?

                                                                                          Profile feet1st

                                                                                          Joined: Mar 7 06
                                                                                          Posts: 312
                                                                                          ID: 1028
                                                                                          Credit: 110,522
                                                                                          RAC: 0
                                                                                          Message 4727 - Posted 13 Mar 2009 19:00:36 UTC

                                                                                            Last modified: 13 Mar 2009 19:01:33 UTC

                                                                                            A few tasks lately have produced large outfiles, and ended prematurely (as compared to my 24hr runtime preference). Presumably due to the 99 model limit. But I thought you would want to be aware of them.

                                                                                            runtime   result size   task
                                                                                            11:41     1.6MB         1343012
                                                                                            10hrs     unknown       1339103
                                                                                            21:41     2.56MB        1343325

                                                                                            These are all the loopbuild tasks of various flavors.
                                                                                            ____________

                                                                                            Profile Pentti Kiesi

                                                                                            Joined: Jan 2 09
                                                                                            Posts: 2
                                                                                            ID: 5105
                                                                                            Credit: 111,437
                                                                                            RAC: 0
                                                                                            Message 4728 - Posted 15 Mar 2009 7:49:43 UTC

                                                                                              One WU seems not willing to upload at all. Others before and after it are
                                                                                              uploading correctly:

                                                                                              13.3.2009 21:38:39|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
                                                                                              13.3.2009 21:38:39|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1f5s_99_8360_1_0_0
                                                                                              13.3.2009 21:38:39|Poem@Home|Restarting task Peptide_387_1236485967_1676836187_0 using poem version 100
                                                                                              13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_511_2640_0_1236930257_1 using malariacontrolBeta version 612
                                                                                              13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_415_2640_0_1236930257_0 using malariacontrolBeta version 612
                                                                                              13.3.2009 21:38:39|malariacontrol.net|Restarting task wu_510_414_2640_0_1236930257_1 using malariacontrolBeta version 612
                                                                                              13.3.2009 21:38:40|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1
                                                                                              13.3.2009 21:38:40|ralph@home|Temporarily failed upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0: transient upload error
                                                                                              13.3.2009 21:38:40|ralph@home|Backing off 3 hr 5 min 27 sec on upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0


                                                                                              ...

                                                                                              15.3.2009 9:34:49|ralph@home|Started upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0
                                                                                              15.3.2009 9:34:51|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1
                                                                                              15.3.2009 9:34:51|ralph@home|Temporarily failed upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0: transient upload error
                                                                                              15.3.2009 9:34:51|ralph@home|Backing off 2 hr 45 min 43 sec on upload of loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0

                                                                                              What is the problem?

                                                                                              BigMike

                                                                                              Joined: Feb 23 06
                                                                                              Posts: 63
                                                                                              ID: 738
                                                                                              Credit: 58,730
                                                                                              RAC: 0
                                                                                              Message 4729 - Posted 15 Mar 2009 7:56:03 UTC - in response to Message 4725.

                                                                                                I have had a good number of ghost work units reportedly downloaded to me but I can't find them.

                                                                                                Same here. I have about 40 WU's that R@H thinks are "in progress", but I never saw them. Something's broken...

                                                                                                ==Mike

                                                                                                ____________
                                                                                                Don't believe everything you think.

                                                                                                Profile robertmiles

                                                                                                Joined: Jan 13 09
                                                                                                Posts: 79
                                                                                                ID: 5137
                                                                                                Credit: 246,177
                                                                                                RAC: 399
                                                                                                Message 4730 - Posted 15 Mar 2009 12:39:00 UTC

                                                                                                  Another lockfile problem:

                                                                                                  http://ralph.bakerlab.org/result.php?resultid=1358791

                                                                                                  I'm still running at 95% CPU time but don't think I enabled graphics at any time for this workunit.

                                                                                                  Profile feet1st

                                                                                                  Joined: Mar 7 06
                                                                                                  Posts: 312
                                                                                                  ID: 1028
                                                                                                  Credit: 110,522
                                                                                                  RAC: 0
                                                                                                  Message 4731 - Posted 15 Mar 2009 14:39:19 UTC

                                                                                                    Last modified: 15 Mar 2009 14:41:19 UTC

                                                                                                    This mamaln task was preempted at about 98% complete. When BOINC got back to it, it finished almost immediately (16 seconds later). There were no other suspicious messages about the task.

                                                                                                    Status page shows scheduler is active, but I'm getting these when I try to update.
                                                                                                    Scheduler request failed: Server returned nothing (no headers, no data)
                                                                                                    ____________

                                                                                                    Profile KC0ISW

                                                                                                    Joined: Feb 17 06
                                                                                                    Posts: 20
                                                                                                    ID: 452
                                                                                                    Credit: 11,725
                                                                                                    RAC: 0
                                                                                                    Message 4732 - Posted 15 Mar 2009 20:34:55 UTC

                                                                                                      http://ralph.bakerlab.org/result.php?resultid=1357308
                                                                                                      ____________

                                                                                                      Profile KC0ISW

                                                                                                      Joined: Feb 17 06
                                                                                                      Posts: 20
                                                                                                      ID: 452
                                                                                                      Credit: 11,725
                                                                                                      RAC: 0
                                                                                                      Message 4733 - Posted 15 Mar 2009 20:38:24 UTC


                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357307
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357306
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357304
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357303
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357302
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357274
                                                                                                        http://ralph.bakerlab.org/result.php?resultid=1357231

                                                                                                        ____________

                                                                                                        BigMike

                                                                                                        Joined: Feb 23 06
                                                                                                        Posts: 63
                                                                                                        ID: 738
                                                                                                        Credit: 58,730
                                                                                                        RAC: 0
                                                                                                        Message 4734 - Posted 15 Mar 2009 22:19:12 UTC - in response to Message 4729.

                                                                                                          I have had a good number of ghost work units reportedly downloaded to me but I can't find them.

                                                                                                          Same here. I have about 40 WU's that R@H thinks are "in progress", but I never saw them.


                                                                                                          It just did it to me again. Three WU's completed...37 non-existent ones "in progress". And I've reached my daily "quota".

                                                                                                          ==Mike

                                                                                                          ____________
                                                                                                          Don't believe everything you think.

                                                                                                          Profile KC0ISW

                                                                                                          Joined: Feb 17 06
                                                                                                          Posts: 20
                                                                                                          ID: 452
                                                                                                          Credit: 11,725
                                                                                                          RAC: 0
                                                                                                          Message 4735 - Posted 16 Mar 2009 5:10:03 UTC

                                                                                                            http://ralph.bakerlab.org/result.php?resultid=1357309
                                                                                                            http://ralph.bakerlab.org/result.php?resultid=1357310
                                                                                                            http://ralph.bakerlab.org/result.php?resultid=1357311
                                                                                                            http://ralph.bakerlab.org/result.php?resultid=1357312
                                                                                                            ____________

                                                                                                            Profile KC0ISW

                                                                                                            Joined: Feb 17 06
                                                                                                            Posts: 20
                                                                                                            ID: 452
                                                                                                            Credit: 11,725
                                                                                                            RAC: 0
                                                                                                            Message 4736 - Posted 16 Mar 2009 5:22:15 UTC - in response to Message 4735.

                                                                                                              OK, my errors may be because of DEP settings. I told DEP to overlook BOINC; we'll see if that helps.
                                                                                                              ____________

                                                                                                              Profile Paul D. Buck

                                                                                                              Joined: Jan 14 09
                                                                                                              Posts: 62
                                                                                                              ID: 5139
                                                                                                              Credit: 33,293
                                                                                                              RAC: 0
                                                                                                              Message 4737 - Posted 16 Mar 2009 8:39:12 UTC

                                                                                                                First death in like forever ... I have run off a ton of tasks recently and only the one death:

                                                                                                                1362077 0x006C43DD write attempt to address 0x00000000

                                                                                                                Pretty sure this is an address reported previously ...

                                                                                                                Profile KC0ISW

                                                                                                                Joined: Feb 17 06
                                                                                                                Posts: 20
                                                                                                                ID: 452
                                                                                                                Credit: 11,725
                                                                                                                RAC: 0
                                                                                                                Message 4738 - Posted 16 Mar 2009 14:47:46 UTC

                                                                                                                  All my errors keep coming from:

                                                                                                                  Unhandled Exception Record -
                                                                                                                  Reason: Access Violation (0xc0000005) at address 0x005286C6 read attempt to address 0x06CA4FF8
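
                                                                                                                  For comparing crash reports like the one above against addresses reported earlier in the thread, a small parser can pull out the fields. This is only a sketch: the regular expression is modeled on the single line quoted here, and the names FAULT_RE and parse_fault are made up for illustration.

```python
import re

# Pattern modeled on the exception line quoted in this post (an assumption:
# other BOINC or Windows versions may word the line differently).
FAULT_RE = re.compile(
    r"Access Violation \((0x[0-9A-Fa-f]+)\) at address (0x[0-9A-Fa-f]+) "
    r"(read|write) attempt to address (0x[0-9A-Fa-f]+)"
)


def parse_fault(line):
    """Return the exception code, faulting address, access type and target, or None."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    code, pc, access, target = match.groups()
    return {
        "code": int(code, 16),      # exception code, e.g. 0xc0000005
        "pc": int(pc, 16),          # address of the faulting instruction
        "access": access,           # "read" or "write"
        "target": int(target, 16),  # address the program tried to touch
    }


line = ("Reason: Access Violation (0xc0000005) at address 0x005286C6 "
        "read attempt to address 0x06CA4FF8")
fault = parse_fault(line)
print(hex(fault["pc"]), fault["access"], hex(fault["target"]))
```

                                                                                                                  Two reports with the same "pc" value and the same access type are likely the same crash site, which is the comparison people in this thread have been making by eye.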

                                                                                                                  ____________

                                                                                                                  Profile feet1st

                                                                                                                  Joined: Mar 7 06
                                                                                                                  Posts: 312
                                                                                                                  ID: 1028
                                                                                                                  Credit: 110,522
                                                                                                                  RAC: 0
                                                                                                                  Message 4739 - Posted 16 Mar 2009 19:55:10 UTC

                                                                                                                    Last modified: 16 Mar 2009 20:04:45 UTC

                                                                                                                    Stage "unk"?? So... it's "unknown"? (and truncated?)

                                                                                                                    The protein seemed to slip off the pane too. I just saw black until it later came into view.



                                                                                                                    ____________

                                                                                                                    Profile feet1st

                                                                                                                    Joined: Mar 7 06
                                                                                                                    Posts: 312
                                                                                                                    ID: 1028
                                                                                                                    Credit: 110,522
                                                                                                                    RAC: 0
                                                                                                                    Message 4740 - Posted 16 Mar 2009 20:12:17 UTC - in response to Message 4735.

                                                                                                                      http://ralph.bakerlab.org/result.php?resultid=1357309
                                                                                                                      http://ralph.bakerlab.org/result.php?resultid=1357310
                                                                                                                      http://ralph.bakerlab.org/result.php?resultid=1357311
                                                                                                                      http://ralph.bakerlab.org/result.php?resultid=1357312


                                                                                                                      In each case, when your task failed another task was generated for the work unit, and the next person was able to run it without error.

                                                                                                                      Seems to point to some problem on your machine. Overclocking? Dust bunnies clogging cooling system? Memory failing?
                                                                                                                      ____________

                                                                                                                      svincent

                                                                                                                      Joined: Apr 4 08
                                                                                                                      Posts: 34
                                                                                                                      ID: 4182
                                                                                                                      Credit: 51,768
                                                                                                                      RAC: 0
                                                                                                                      Message 4742 - Posted 17 Mar 2009 16:45:54 UTC

                                                                                                                        A couple of recent compute errors on Mac OS X 10.4.11

                                                                                                                        Workunit 1210811

                                                                                                                        ERROR: Cannot open PDB file "1a17_201.pdb"
                                                                                                                        ERROR:: Exit from: src/core/io/pdb/pose_io.cc line: 179
                                                                                                                        BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                        called boinc_finish

                                                                                                                        Workunit 1210764

                                                                                                                        ERROR: Cannot open PDB file "1ad6_197.pdb"
                                                                                                                        ERROR:: Exit from: src/core/io/pdb/pose_io.cc line: 179
                                                                                                                        BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                        called boinc_finish


                                                                                                                        Evan

                                                                                                                        Joined: Dec 23 07
                                                                                                                        Posts: 75
                                                                                                                        ID: 3893
                                                                                                                        Credit: 69,584
                                                                                                                        RAC: 0
                                                                                                                        Message 4743 - Posted 17 Mar 2009 16:54:21 UTC

                                                                                                                          Last modified: 17 Mar 2009 16:55:10 UTC

                                                                                                                          A PDB error this time on windows with this one:

                                                                                                                          1369045

                                                                                                                          Cannot open PDB file "1xqo_213.pdb"
                                                                                                                          ERROR:: Exit from: ..\..\src\core\io\pdb\pose_io.cc line: 179
                                                                                                                          BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                          called boinc_finish

                                                                                                                          Profile [Toscana]SickBoy88

                                                                                                                          Joined: Jan 27 09
                                                                                                                          Posts: 3
                                                                                                                          ID: 5177
                                                                                                                          Credit: 17,581
                                                                                                                          RAC: 0
                                                                                                                          Message 4744 - Posted 18 Mar 2009 12:53:20 UTC

                                                                                                                            This WU
                                                                                                                            http://ralph.bakerlab.org/result.php?resultid=1371832
                                                                                                                            gave me a compute error.

                                                                                                                            svincent

                                                                                                                            Joined: Apr 4 08
                                                                                                                            Posts: 34
                                                                                                                            ID: 4182
                                                                                                                            Credit: 51,768
                                                                                                                            RAC: 0
                                                                                                                            Message 4745 - Posted 18 Mar 2009 20:09:54 UTC

                                                                                                                              Just had a bunch of WUs fail at the start (Mac OS X 10.4.11), all in the same way; sample output below.

                                                                                                                              1213821
                                                                                                                              1213853
                                                                                                                              1213852
                                                                                                                              1213701
                                                                                                                              1213693
                                                                                                                              1213692
                                                                                                                              1213677

                                                                                                                              ERROR: [ERROR] Unable to open constraints file: t297_.cst.best.multi
                                                                                                                              ERROR:: Exit from: src/core/scoring/constraints/ConstraintIO.cc line: 330
                                                                                                                              BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                              called boinc_finish


                                                                                                                              Profile Pentti Kiesi

                                                                                                                              Joined: Jan 2 09
                                                                                                                              Posts: 2
                                                                                                                              ID: 5105
                                                                                                                              Credit: 111,437
                                                                                                                              RAC: 0
                                                                                                                              Message 4747 - Posted 19 Mar 2009 12:15:40 UTC - in response to Message 4728.


                                                                                                                                http://ralph.bakerlab.org/workunit.php?wuid=1185064

                                                                                                                                One WU seems not willing to upload at all. Others before and after it are
                                                                                                                                uploading correctly:
                                                                                                                                13.3.2009 21:38:40|ralph@home|[error] Error reported by file upload server: [loopbuild_mamaln_full_hb_t303__IGNORE_THE_REST_1te2_99_8360_1_0_0] locked by file_upload_handler PID=-1

                                                                                                                                As a reminder: this one is still hanging in my upload queue. 12h CPU time on a Quad 2.66GHz. Should I cancel it at last, or are you still interested in it?

                                                                                                                                Profile robertmiles

                                                                                                                                Joined: Jan 13 09
                                                                                                                                Posts: 79
                                                                                                                                ID: 5137
                                                                                                                                Credit: 246,177
                                                                                                                                RAC: 399
                                                                                                                                Message 4748 - Posted 22 Mar 2009 14:01:52 UTC

                                                                                                                                  Another lockfile problem:

                                                                                                                                  http://ralph.bakerlab.org/result.php?resultid=1374201

                                                                                                                                  Running at 95% CPU, with BOINC 6.2.28 under Vista SP1 with graphics disabled.

                                                                                                                                  Will try resetting Ralph@home soon.

                                                                                                                                  Profile robertmiles

                                                                                                                                  Joined: Jan 13 09
                                                                                                                                  Posts: 79
                                                                                                                                  ID: 5137
                                                                                                                                  Credit: 246,177
                                                                                                                                  RAC: 399
                                                                                                                                  Message 4749 - Posted 22 Mar 2009 15:20:49 UTC - in response to Message 4748.

                                                                                                                                    Tried resetting Ralph@home, got these error messages:

                                                                                                                                    3/22/2009 9:04:01 AM|ralph@home|Resetting project
                                                                                                                                    3/22/2009 9:04:06 AM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe

                                                                                                                                    Similar problem with Rosetta@home, except with the 1.54 executable.

                                                                                                                                    I use BOINC 6.2.28 under Vista SP1, with graphics not enabled.

                                                                                                                                    An attempt to manually delete this file failed when I couldn't find the directory containing it, or even anything under the BOINC directory specific to the Ralph@home project.

                                                                                                                                    I intend to leave both Ralph@home and Rosetta@home on no new tasks until I get some usable advice on how to complete the resets.

                                                                                                                                    Evan

                                                                                                                                    Joined: Dec 23 07
                                                                                                                                    Posts: 75
                                                                                                                                    ID: 3893
                                                                                                                                    Credit: 69,584
                                                                                                                                    RAC: 0
                                                                                                                                    Message 4750 - Posted 22 Mar 2009 18:54:11 UTC

                                                                                                                                      Validate errors
                                                                                                                                      1374039
                                                                                                                                      1374038
                                                                                                                                      1374037

                                                                                                                                      All ran for around 5 hours. No doubt the second run will take half the time or less, as has happened with previous work units.

                                                                                                                                      Profile robertmiles

                                                                                                                                      Joined: Jan 13 09
                                                                                                                                      Posts: 79
                                                                                                                                      ID: 5137
                                                                                                                                      Credit: 246,177
                                                                                                                                      RAC: 399
                                                                                                                                      Message 4751 - Posted 23 Mar 2009 2:38:44 UTC - in response to Message 4749.

                                                                                                                                        Last modified: 23 Mar 2009 2:39:46 UTC

                                                                                                                                        More on the lockfile problem:

                                                                                                                                        When this problem shows up, expect a few subdirectories of BOINC\slots to have three files each, unrelated to any workunit in progress and including the lockfile. I suspect that Rosetta@home and Ralph@home workunits are unable to run successfully if assigned to any of these slots, even if workunits from other BOINC projects can. Attempts to manually delete these files also fail.
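
                                                                                                                                        A rough way to look for such slots without opening each one by hand is sketched below. It rests on assumptions: the lockfile name "boinc_lockfile" and the other file names are placeholders that may not match your install, and the limit of three files is just the pattern described above. The demo runs on a throwaway directory, not a real BOINC\slots folder.

```python
import tempfile
from pathlib import Path

# Assumption: the lockfile is called "boinc_lockfile"; change it if yours differs.
LOCKFILE_NAME = "boinc_lockfile"


def find_suspect_slots(slots_dir, max_files=3):
    """Return slot directories that hold a lockfile and at most max_files files."""
    suspects = []
    for slot in sorted(Path(slots_dir).iterdir()):
        if not slot.is_dir():
            continue
        files = [p for p in slot.iterdir() if p.is_file()]
        has_lock = any(p.name == LOCKFILE_NAME for p in files)
        if has_lock and len(files) <= max_files:
            suspects.append(slot)
    return suspects


# Demo on a temporary directory laid out like a slots folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Slot 0: lockfile plus two leftovers (illustrative names), looks abandoned.
    (root / "0").mkdir()
    for name in (LOCKFILE_NAME, "stderr.txt", "init_data.xml"):
        (root / "0" / name).touch()
    # Slot 1: empty, nothing to report.
    (root / "1").mkdir()
    # Slot 2: lockfile plus many files, looks like a task in progress.
    (root / "2").mkdir()
    for name in (LOCKFILE_NAME, "a", "b", "c", "d"):
        (root / "2" / name).touch()
    demo_suspects = [slot.name for slot in find_suspect_slots(root)]
    print(demo_suspects)
```

                                                                                                                                        A slot flagged this way is only a candidate: a task that has just started could also have few files, so check against the running tasks before drawing conclusions.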

                                                                                                                                        However, the following may have helped for me: Set Rosetta@home and/or Ralph@home to no new tasks and wait until all tasks for either of them complete. Do an update for either that has tasks not reported yet. Suspend all workunits and network activity. Shut down the BOINC client, then find process boinc.exe and kill it. Reboot. If these subdirectories of BOINC\slots have disappeared, enable network activity and do another reset on Rosetta@home and/or Ralph@home. If these resets complete without error messages, it's safe to resume activity on any other BOINC projects, then allow new tasks on Rosetta@home and/or Ralph@home.

                                                                                                                                        However, I haven't completed any workunits for either Rosetta@home or Ralph@home since doing this, so it will be at least tomorrow before I can check if this actually took care of the lockfile problem, at least temporarily.

                                                                                                                                        I wouldn't be surprised if this procedure includes some unnecessary steps, but wanted to report this before any effort to find out.

                                                                                                                                        Profile robertmiles

                                                                                                                                        Joined: Jan 13 09
                                                                                                                                        Posts: 79
                                                                                                                                        ID: 5137
                                                                                                                                        Credit: 246,177
                                                                                                                                        RAC: 399
                                                                                                                                        Message 4752 - Posted 23 Mar 2009 16:46:06 UTC - in response to Message 4751.

                                                                                                                                          Didn't help enough: the first Rosetta@home 1.54 workunit completed after the above procedure had the lockfile problem again, but two more that started since then and are not yet complete haven't had the problem so far.

                                                                                                                                          Suggestion: Modify minirosetta so that it checks for a lockfile as it starts up, preferably before trying to create one, and if this first check finds a lockfile, reduce the number of times minirosetta is allowed to restart before it is able to write the first checkpoint.

                                                                                                                                          Suggestion: Modify minirosetta so that it reports which slot it was run under if it is able to do this, since the problem looks likely to repeat for any minirosetta workunit run in a slot where a previous workunit's lockfile was not erased when the previous workunit completed and was reported.

                                                                                                                                          Suggestion: Check the procedure used for failed workunits to see if it leaves a lockfile behind after abandoning efforts to restart the workunit.

                                                                                                                                          Suggestion: Check what program is supposed to delete the lockfiles for workunits that have been completed and reported.

                                                                                                                                          Suggestion: Check if BOINC allows any way to request that a workunit be restarted, but in a different slot.

                                                                                                                                          Suggestion: If BOINC is supposed to clean up the slots after workunits complete and are reported, check if BOINC 6.2.28 is known to have any problems with doing this.
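
                                                                                                                                          To make the first two suggestions concrete, here is a minimal sketch of a startup check. It is written in Python for brevity, not in minirosetta's own language, and it is not how minirosetta or BOINC actually behave. The lockfile name, the two restart limits and the function name are all hypothetical, and a plain existence test stands in for whatever locking the real code does.

```python
import os
import tempfile

LOCKFILE_NAME = "boinc_lockfile"  # assumed name, may differ in practice
DEFAULT_RESTARTS = 5              # hypothetical normal restart limit
REDUCED_RESTARTS = 1              # hypothetical limit when a stale lock is found


def check_slot_on_startup(slot_dir):
    """Look for a leftover lockfile before creating one, and report the slot."""
    lock_path = os.path.join(slot_dir, LOCKFILE_NAME)
    stale = os.path.exists(lock_path)  # first check, before any create attempt
    if not stale:
        # No leftover lock: create ours, failing if someone else beat us to it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
    return {
        "slot": os.path.basename(os.path.abspath(slot_dir)),  # for the error report
        "stale_lock": stale,
        "restart_budget": REDUCED_RESTARTS if stale else DEFAULT_RESTARTS,
    }


# Demo: the first start finds a clean slot, the second finds the leftover lock.
with tempfile.TemporaryDirectory() as tmp:
    slot = os.path.join(tmp, "3")
    os.mkdir(slot)
    first_start = check_slot_on_startup(slot)
    second_start = check_slot_on_startup(slot)
    print(first_start)
    print(second_start)
```

                                                                                                                                          Reporting the slot name along with the stale-lock flag would show whether failures cluster in particular slots, which is the pattern I suspect.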

                                                                                                                                          I haven't had any 1.58 workunits since trying the procedure, so I don't know whether these continued problems also apply to 1.58.

                                                                                                                                          I often let BOINC run for a few days between reboots.

                                                                                                                                          I still use BOINC 6.2.28 under Vista SP1, with 95% CPU time.

                                                                                                                                          Profile Paul D. Buck

                                                                                                                                          Joined: Jan 14 09
                                                                                                                                          Posts: 62
                                                                                                                                          ID: 5139
                                                                                                                                          Credit: 33,293
                                                                                                                                          RAC: 0
                                                                                                                                          Message 4753 - Posted 23 Mar 2009 17:06:10 UTC

                                                                                                                                            Robert,

                                                                                                                                            Thanks for all the work on the lock file ... I hope we can figure out what is going on with this ...

                                                                                                                                            For my part, I have a Validate error, though the task seems to have failed with another error that was not reported as an error:

                                                                                                                                            ERROR: dis==0 in pairtermderiv!
                                                                                                                                            ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338

                                                                                                                                            What does this mean? Beats me ...

                                                                                                                                            Profile Paul D. Buck

                                                                                                                                            Joined: Jan 14 09
                                                                                                                                            Posts: 62
                                                                                                                                            ID: 5139
                                                                                                                                            Credit: 33,293
                                                                                                                                            RAC: 0
                                                                                                                                            Message 4754 - Posted 24 Mar 2009 18:54:31 UTC

                                                                                                                                              New error:
                                                                                                                                              ERROR: aFrame->nr_frags()
                                                                                                                                              ERROR:: Exit from: ..\..\src\core\fragment\FragSet.cc line: 168

                                                                                                                                              svincent

                                                                                                                                              Joined: Apr 4 08
                                                                                                                                              Posts: 34
                                                                                                                                              ID: 4182
                                                                                                                                              Credit: 51,768
                                                                                                                                              RAC: 0
                                                                                                                                              Message 4755 - Posted 24 Mar 2009 21:24:14 UTC

                                                                                                                                                More problems on Mac OS X 10.4.11

                                                                                                                                                WU's 1376869,1376870,1376871 failed: see below

                                                                                                                                                ERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 137 fold_tree nres: 156589050
                                                                                                                                                ERROR:: Exit from: src/core/conformation/Conformation.cc line: 224
                                                                                                                                                BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                called boinc_finish

                                                                                                                                                </stderr_txt>

                                                                                                                                                Chu
                                                                                                                                                Forum moderator
                                                                                                                                                Project developer
                                                                                                                                                Project scientist

                                                                                                                                                Joined: Sep 26 06
                                                                                                                                                Posts: 61
                                                                                                                                                ID: 1900
                                                                                                                                                Credit: 12,545
                                                                                                                                                RAC: 0
                                                                                                                                                Message 4756 - Posted 24 Mar 2009 22:27:50 UTC - in response to Message 4755.

                                                                                                                                                  Last modified: 25 Mar 2009 5:27:38 UTC

                                                                                                                                                  Thanks for your reporting. Some input and output files were not compressed properly for the WUs ending with "BOINC_MPZN_with_zinc_loop_modeling" and therefore caused pre-matured failures/exits. Sorry about it.

                                                                                                                                                  More problems on Mac OS X 10.4.11

                                                                                                                                                  WU's 1376869,1376870,1376871 failed: see below

                                                                                                                                                  ERROR: Conformation: fold_tree nres should match conformation nres. conformation nres: 137 fold_tree nres: 156589050
                                                                                                                                                  ERROR:: Exit from: src/core/conformation/Conformation.cc line: 224
                                                                                                                                                  BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                  called boinc_finish

                                                                                                                                                  </stderr_txt>

                                                                                                                                                  Evan

                                                                                                                                                  Joined: Dec 23 07
                                                                                                                                                  Posts: 75
                                                                                                                                                  ID: 3893
                                                                                                                                                  Credit: 69,584
                                                                                                                                                  RAC: 0
                                                                                                                                                  Message 4757 - Posted 27 Mar 2009 12:00:35 UTC

                                                                                                                                                    Last modified: 27 Mar 2009 12:01:43 UTC

                                                                                                                                                    It seems that the work units that I downloaded this morning have an incomplete nomenclature. They are missing the final _0 or _1 that indicates whether it is a first or second attempt.

                                                                                                                                                    Evan

                                                                                                                                                    Joined: Dec 23 07
                                                                                                                                                    Posts: 75
                                                                                                                                                    ID: 3893
                                                                                                                                                    Credit: 69,584
                                                                                                                                                    RAC: 0
                                                                                                                                                    Message 4758 - Posted 27 Mar 2009 13:32:14 UTC - in response to Message 4757.

                                                                                                                                                      It seems that the work units that I downloaded this morning have an incomplete nomenclature. They are missing the final _0 or _1 that indicates whether it is a first or second attempt.


                                                                                                                                                      Correction! They are correct on the task list but missing from the work details on the website.
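One possible explanation, offered as an assumption rather than a confirmed fact about this site, is that the trailing `_0` or `_1` belongs to the task (result) name and is appended to the workunit name, so a page that shows the workunit name would not include it. The sketch below splits a task name on that assumption; the function name `split_result_name` is made up for illustration, and the sample name is one quoted later in this thread.

```python
def split_result_name(result_name):
    """Split 'workunit_N' into (workunit_name, N).

    Assumes the text after the last underscore is the attempt number.
    """
    workunit, sep, suffix = result_name.rpartition("_")
    if not sep or not suffix.isdigit():
        raise ValueError(f"no numeric suffix in {result_name!r}")
    return workunit, int(suffix)


name = "frb_1_8_el_chosen_hb_t286__SAVE_ALL_OUT_IGNORE_THE_REST_1ESCA_11_8901_1_0"
workunit, attempt = split_result_name(name)
print(workunit)  # the name without the final suffix
print(attempt)   # the attempt number as an integer
```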

                                                                                                                                                      svincent

                                                                                                                                                      Joined: Apr 4 08
                                                                                                                                                      Posts: 34
                                                                                                                                                      ID: 4182
                                                                                                                                                      Credit: 51,768
                                                                                                                                                      RAC: 0
                                                                                                                                                      Message 4759 - Posted 27 Mar 2009 17:48:08 UTC

                                                                                                                                                        Last modified: 27 Mar 2009 17:49:06 UTC

                                                                                                                                                        Workunit 1223308 gave a Validate Error on Mac: it claimed to generate 99 decoys from 99 attempts in 12 minutes. Seems unlikely.

                                                                                                                                                        The end of stderr output

                                                                                                                                                        Starting work on structure: _1UFBA_5_00097
                                                                                                                                                        Starting work on structure: _1UFBA_5_00098
                                                                                                                                                        Starting work on structure: _1UFBA_5_00099
                                                                                                                                                        ======================================================
                                                                                                                                                        DONE :: 1 starting structures 782.2 cpu seconds
                                                                                                                                                        This process generated 99 decoys from 99 attempts
                                                                                                                                                        ======================================================

                                                                                                                                                        BOINC :: Watchdog shutting down...
                                                                                                                                                        BOINC :: BOINC support services shutting down cleanly ...
                                                                                                                                                        called boinc_finish

                                                                                                                                                        </stderr_txt>
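To put a number on how fast that is, the two summary lines in the stderr output above can be turned into CPU seconds per decoy. This is a minimal sketch that assumes the summary lines always have the wording shown in this post; the regular expressions and the function name `seconds_per_decoy` are illustrative, not part of minirosetta.

```python
import re

# Patterns modelled on the two summary lines quoted in this post.
DONE_RE = re.compile(r"DONE :: \d+ starting structures\s+([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def seconds_per_decoy(stderr_text):
    """Return CPU seconds per decoy, or None if the summary is missing."""
    done = DONE_RE.search(stderr_text)
    decoys = DECOY_RE.search(stderr_text)
    if not done or not decoys or int(decoys.group(1)) == 0:
        return None
    return float(done.group(1)) / int(decoys.group(1))


sample = """DONE :: 1 starting structures 782.2 cpu seconds
This process generated 99 decoys from 99 attempts"""
rate = seconds_per_decoy(sample)
print(f"{rate:.1f} s per decoy")  # prints "7.9 s per decoy"
```

That works out to roughly 8 CPU seconds per decoy for this task.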

                                                                                                                                                        svincent

                                                                                                                                                        Joined: Apr 4 08
                                                                                                                                                        Posts: 34
                                                                                                                                                        ID: 4182
                                                                                                                                                        Credit: 51,768
                                                                                                                                                        RAC: 0
                                                                                                                                                        Message 4760 - Posted 27 Mar 2009 18:01:50 UTC

                                                                                                                                                          Another unzipping issue with workunit 1223005 on Mac

                                                                                                                                                          Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
                                                                                                                                                          Unpacking WU data ...
                                                                                                                                                          Unpacking data: ../../projects/ralph.bakerlab.org/frb_0_8_el_chosen.foldcst_chunk_general_cf.t325_.mtyka.boinc_files.zip
                                                                                                                                                          Setting database description ...
                                                                                                                                                          Setting up checkpointing ...
                                                                                                                                                          Setting up folding (abrelax) ...

                                                                                                                                                          ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
                                                                                                                                                          ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245
                                                                                                                                                          BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                          called boinc_finish

                                                                                                                                                          </stderr_txt>

                                                                                                                                                          Tonno

                                                                                                                                                          Joined: Nov 23 06
                                                                                                                                                          Posts: 16
                                                                                                                                                          ID: 2269
                                                                                                                                                          Credit: 49,841
                                                                                                                                                          RAC: 0
                                                                                                                                                          Message 4761 - Posted 27 Mar 2009 21:42:15 UTC - in response to Message 4760.

                                                                                                                                                            Last modified: 27 Mar 2009 21:44:23 UTC

                                                                                                                                                            Error after a few seconds on Windows XP.
                                                                                                                                                            The other WU also gives an error.
                                                                                                                                                            It is the same error already posted for Mac:
                                                                                                                                                            ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz

                                                                                                                                                            Profile Paul D. Buck

                                                                                                                                                            Joined: Jan 14 09
                                                                                                                                                            Posts: 62
                                                                                                                                                            ID: 5139
                                                                                                                                                            Credit: 33,293
                                                                                                                                                            RAC: 0
                                                                                                                                                            Message 4762 - Posted 28 Mar 2009 9:38:55 UTC

                                                                                                                                                              Several incidents of the error reported by svincent, shown below:

                                                                                                                                                              ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
                                                                                                                                                              ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245

                                                                                                                                                              Task IDs:
                                                                                                                                                              1378040
                                                                                                                                                              1378221
                                                                                                                                                              1384682
                                                                                                                                                              1379800

                                                                                                                                                              I noted on at least two of them that the other wingman also had the task fail ... configuration issue?

                                                                                                                                                              Another error in this latest batch is:

                                                                                                                                                              ERROR: aFrame->nr_frags()
                                                                                                                                                              ERROR:: Exit from: ..\..\src\core\fragment\FragSet.cc line: 168

                                                                                                                                                              Task ID: 1376838

                                                                                                                                                              The only good news I suppose is that the failures happen almost right away...
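Since the same exit location shows up in this thread with both Unix-style and Windows-style paths, a small script can normalize the "Exit from" lines and count how often each location occurs. This is only a sketch: the regular expression is modelled on the lines quoted in this thread, and the function name `exit_location` is made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the "ERROR:: Exit from: <file> line: <n>" lines in this thread.
EXIT_RE = re.compile(r"ERROR:: Exit from: (\S+) line: (\d+)")


def exit_location(line):
    """Return (source_path, line_number) with a normalized path, or None."""
    match = EXIT_RE.search(line)
    if not match:
        return None
    path = match.group(1).replace("\\", "/")
    # Drop leading "../" segments so Windows and Mac reports compare equal.
    while path.startswith("../"):
        path = path[3:]
    return path, int(match.group(2))


lines = [
    "ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245",
    r"ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245",
    r"ERROR:: Exit from: ..\..\src\core\fragment\FragSet.cc line: 168",
]
counts = Counter(exit_location(line) for line in lines)
for (path, number), count in counts.most_common():
    print(f"{count} x {path}:{number}")
```

With the three sample lines, the two FragmentIO.cc reports are counted as the same location.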

                                                                                                                                                              Evan

                                                                                                                                                              Joined: Dec 23 07
                                                                                                                                                              Posts: 75
                                                                                                                                                              ID: 3893
                                                                                                                                                              Credit: 69,584
                                                                                                                                                              RAC: 0
                                                                                                                                                              Message 4763 - Posted 28 Mar 2009 19:32:58 UTC

                                                                                                                                                                compute error with:

                                                                                                                                                                1390773
                                                                                                                                                                1390766

                                                                                                                                                                both with message:

                                                                                                                                                                ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
                                                                                                                                                                ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
                                                                                                                                                                BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                                called boinc_finish

                                                                                                                                                                Looks like a similar fault to some already posted.

                                                                                                                                                                Tonno

                                                                                                                                                                Joined: Nov 23 06
                                                                                                                                                                Posts: 16
                                                                                                                                                                ID: 2269
                                                                                                                                                                Credit: 49,841
                                                                                                                                                                RAC: 0
                                                                                                                                                                Message 4764 - Posted 29 Mar 2009 9:19:07 UTC - in response to Message 4763.

                                                                                                                                                                  Some WUs take longer to complete than the default runtime.
                                                                                                                                                                  I'm not sure, but it seems that they are all frb_1_8_template_enriched_hb_t286.
                                                                                                                                                                  For example, this one: link

                                                                                                                                                                  Path7

                                                                                                                                                                  Joined: Feb 11 08
                                                                                                                                                                  Posts: 56
                                                                                                                                                                  ID: 4036
                                                                                                                                                                  Credit: 4,974
                                                                                                                                                                  RAC: 0
                                                                                                                                                                  Message 4765 - Posted 29 Mar 2009 12:18:58 UTC

@ Manuel Lupotto: Yes, I also had 2 WUs starting with frb_1_8_ which took over 2 hours to complete a single decoy.

                                                                                                                                                                    frb_1_8_el_chosen_hb_t286__SAVE_ALL_OUT_IGNORE_THE_REST_1ESCA_11_8901_1_0
                                                                                                                                                                    frb_1_8_bestfrag_hb_t297__SAVE_ALL_OUT_IGNORE_THE_REST_1VJGA_10_8858_1_0
                                                                                                                                                                    Not a real problem I think, since Rosetta@home has a 3 hour default runtime.

                                                                                                                                                                    My last WU ended with an Unhandled Exception Detected:

                                                                                                                                                                    lb_save_all_out_hb_t369__SAVE_ALL_OUT_1HHSA_3_8759_1_1

I was the second one to crunch this WU; the first time it ended with the same error.

                                                                                                                                                                    Have a nice day,
                                                                                                                                                                    Path7.

                                                                                                                                                                    Profile Conan
                                                                                                                                                                    Joined: Feb 16 06
                                                                                                                                                                    Posts: 344
                                                                                                                                                                    ID: 145
                                                                                                                                                                    Credit: 1,325,239
                                                                                                                                                                    RAC: 792
                                                                                                                                                                    Message 4766 - Posted 31 Mar 2009 3:14:14 UTC - in response to Message 4763.

                                                                                                                                                                      compute error with:

                                                                                                                                                                      1390773
                                                                                                                                                                      1390766

                                                                                                                                                                      both with message:

                                                                                                                                                                      ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
                                                                                                                                                                      ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
                                                                                                                                                                      BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                                      called boinc_finish

Looks like a similar fault to some already posted.


Also had the same/similar error on Result 1391999 and Result 1394906:

                                                                                                                                                                      ERROR: ERROR: FragmentIO: could not open file aa9mer.1_3.gz
                                                                                                                                                                      ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 245
                                                                                                                                                                      BOINC:: Error reading and gzipping output datafile: default.out
                                                                                                                                                                      called boinc_finish

                                                                                                                                                                      Profile Paul D. Buck

                                                                                                                                                                      Joined: Jan 14 09
                                                                                                                                                                      Posts: 62
                                                                                                                                                                      ID: 5139
                                                                                                                                                                      Credit: 33,293
                                                                                                                                                                      RAC: 0
                                                                                                                                                                      Message 4767 - Posted 7 Apr 2009 8:23:51 UTC

Seven days and no bad tasks? I know we don't get that many total ... but ... isn't it about time to promote 1.58 to operational while we chase these last buglets?

I know 1.54 is decent, but 1.58 is marginally better in stability ...

What about it, guys?

                                                                                                                                                                        Speedy

                                                                                                                                                                        Joined: Dec 4 06
                                                                                                                                                                        Posts: 8
                                                                                                                                                                        ID: 2327
                                                                                                                                                                        Credit: 1,985
                                                                                                                                                                        RAC: 0
                                                                                                                                                                        Message 4768 - Posted 7 Apr 2009 21:54:54 UTC

I agree. On the 10th of March I asked when it was going over to the main project; I have yet to get an answer.

                                                                                                                                                                          Profile Paul D. Buck

                                                                                                                                                                          Joined: Jan 14 09
                                                                                                                                                                          Posts: 62
                                                                                                                                                                          ID: 5139
                                                                                                                                                                          Credit: 33,293
                                                                                                                                                                          RAC: 0
                                                                                                                                                                          Message 4769 - Posted 8 Apr 2009 5:52:14 UTC - in response to Message 4768.

I agree. On the 10th of March I asked when it was going over to the main project; I have yet to get an answer.

They also haven't said anything about the bad tasks in, like, forever ...

                                                                                                                                                                            Profile Paul D. Buck

                                                                                                                                                                            Joined: Jan 14 09
                                                                                                                                                                            Posts: 62
                                                                                                                                                                            ID: 5139
                                                                                                                                                                            Credit: 33,293
                                                                                                                                                                            RAC: 0
                                                                                                                                                                            Message 4770 - Posted 9 Apr 2009 21:05:28 UTC

This task seemed to have hung; I tried graphics on it and got a black window. The window would not close, so I got a GPF for my troubles. The task seems to have "hung" after that: it went into high priority mode with no advance in the percentage complete, so I shot it.

                                                                                                                                                                              I _ quit

                                                                                                                                                                              Joined: Jan 13 09
                                                                                                                                                                              Posts: 44
                                                                                                                                                                              ID: 5136
                                                                                                                                                                              Credit: 88,562
                                                                                                                                                                              RAC: 0
                                                                                                                                                                              Message 4771 - Posted 10 Apr 2009 10:17:23 UTC

                                                                                                                                                                                1utg__BOINC_ABINITIO_IGNORE_THE_REST-MOO18--1utg_-_9087_1_0 died with a computation error after 4201 seconds.

                                                                                                                                                                                Error -1073741819 (0xffffffffc0000005)
It looks to have completed, or at least started work on, 17 models before crashing.
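
For reference, the decimal and hex numbers in that error line are the same value. The sketch below assumes the exit code is reported as a signed 32-bit integer; the helper name `to_unsigned32` is made up for illustration. 0xC0000005 is the Windows status code for an access violation.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    # Masking with 0xFFFFFFFF keeps only the low 32 bits, which turns the
    # negative two's-complement reading into its unsigned equivalent.
    return code & 0xFFFFFFFF


exit_code = -1073741819  # value quoted in the error line above
print(hex(to_unsigned32(exit_code)))
```

The longer form in the report, 0xffffffffc0000005, looks like the same 32-bit value sign-extended to 64 bits.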

                                                                                                                                                                                Profile feet1st

                                                                                                                                                                                Joined: Mar 7 06
                                                                                                                                                                                Posts: 312
                                                                                                                                                                                ID: 1028
                                                                                                                                                                                Credit: 110,522
                                                                                                                                                                                RAC: 0
                                                                                                                                                                                Message 4772 - Posted 14 Apr 2009 16:30:15 UTC

                                                                                                                                                                                  Last modified: 14 Apr 2009 16:30:31 UTC

                                                                                                                                                                                  I've had a number of WUs with names like this:
                                                                                                                                                                                  rest3d85_ip40_2g3r.patchdock.3.pdb_0002_fa_dock.xml_score12_pert38_DOCK_9104

All 4 ran through 99 models in just 3 hours. That will throw off work fetch for folks on Rosetta with a longer runtime preference.

                                                                                                                                                                                  svincent

                                                                                                                                                                                  Joined: Apr 4 08
                                                                                                                                                                                  Posts: 34
                                                                                                                                                                                  ID: 4182
                                                                                                                                                                                  Credit: 51,768
                                                                                                                                                                                  RAC: 0
                                                                                                                                                                                  Message 4773 - Posted 15 Apr 2009 15:40:12 UTC

These 3 tasks, all named broker_lb_test2_hb*, failed on Mac after apparently successful completion, due to some file error.

                                                                                                                                                                                    1421321
                                                                                                                                                                                    1421322
                                                                                                                                                                                    1421323

                                                                                                                                                                                    </stderr_txt>
                                                                                                                                                                                    <message>
                                                                                                                                                                                    <file_xfer_error>
                                                                                                                                                                                    <file_name>broker_lb_test2_hb_t363__IGNORE_THE_REST_9214_1_0_0</file_name>
                                                                                                                                                                                    <error_code>-161</error_code>
                                                                                                                                                                                    </file_xfer_error>


                                                                                                                                                                                    svincent

                                                                                                                                                                                    Joined: Apr 4 08
                                                                                                                                                                                    Posts: 34
                                                                                                                                                                                    ID: 4182
                                                                                                                                                                                    Credit: 51,768
                                                                                                                                                                                    RAC: 0
                                                                                                                                                                                    Message 4774 - Posted 15 Apr 2009 15:43:28 UTC

                                                                                                                                                                                      This task crashed on Mac with a segmentation violation.

                                                                                                                                                                                      1421317

                                                                                                                                                                                      Crash log is in the task file: here's the first bit:


                                                                                                                                                                                      Thread 0 Crashed:
                                                                                                                                                                                      0 ...etta_1.59_i686-apple-darwin 0x009579fe __ZNK4core7scoring7methods10VDW_Energy19residue_pair_energyERKNS_12conformation7ResidueES6_RKNS_4pose4PoseERKNS0_13ScoreFunctionERNS0_17TwoBodyEMapVectorE + 1534
                                                                                                                                                                                      1 ...etta_1.59_i686-apple-darwin 0x00189b81 __ZNK4core7scoring13ScoreFunctionclERNS_4pose4PoseE + 5171
                                                                                                                                                                                      2 ...etta_1.59_i686-apple-darwin 0x004e9985 __ZN9protocols8abinitio12AbrelaxMover5applyERN4core4pose4PoseE + 5993
                                                                                                                                                                                      3 ...etta_1.59_i686-apple-darwin 0x004d0a99 __ZN9protocols3jd214JobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 4041
                                                                                                                                                                                      4 ...etta_1.59_i686-apple-darwin 0x00ace519 __ZN9protocols3jd219BOINCJobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 41
                                                                                                                                                                                      5 ...etta_1.59_i686-apple-darwin 0x0010f73c __ZN9protocols8abinitio11Broker_mainEv + 812
                                                                                                                                                                                      6 ...etta_1.59_i686-apple-darwin 0x0000402c _main + 2532
                                                                                                                                                                                      7 ...etta_1.59_i686-apple-darwin 0x00001eee __start + 216
                                                                                                                                                                                      8 ...etta_1.59_i686-apple-darwin 0x00001e15 start + 41

                                                                                                                                                                                      Profile robertmiles

                                                                                                                                                                                      Joined: Jan 13 09
                                                                                                                                                                                      Posts: 79
                                                                                                                                                                                      ID: 5137
                                                                                                                                                                                      Credit: 246,177
                                                                                                                                                                                      RAC: 399
                                                                                                                                                                                      Message 4775 - Posted 16 Apr 2009 2:46:01 UTC

                                                                                                                                                                                        I now have a minirosetta 1.59 workunit. Is it time to create a new thread for 1.59?

                                                                                                                                                                                        Profile robertmiles

                                                                                                                                                                                        Joined: Jan 13 09
                                                                                                                                                                                        Posts: 79
                                                                                                                                                                                        ID: 5137
                                                                                                                                                                                        Credit: 246,177
                                                                                                                                                                                        RAC: 399
                                                                                                                                                                                        Message 4790 - Posted 18 Apr 2009 20:51:40 UTC - in response to Message 4775.

                                                                                                                                                                                          Last modified: 18 Apr 2009 20:52:41 UTC

                                                                                                                                                                                          On this 1.59 workunit, I ran into the lockfile problem on structure _U16X13X_00019, but my wingman chose a shorter workunit length and therefore didn't even try that structure:

                                                                                                                                                                                          http://ralph.bakerlab.org/result.php?resultid=1422995

                                                                                                                                                                                          http://ralph.bakerlab.org/workunit.php?wuid=1260423

                                                                                                                                                                                          I use BOINC 6.2.28 under 32-bit Vista SP1 on that machine.

                                                                                                                                                                                          Although my machine still uses settings intended to check for the lockfile problem, I'm having to reboot my machine more often to get past problems with the router I'm using to allow a recently installed newer computer to reach the internet, and therefore less likely to actually see such problems.

                                                                                                                                                                                          Copyright © 2017 University of Washington

                                                                                                                                                                                          Last Modified: 20 Nov 2008 19:41:56 UTC