RALPH@home

Bug reports for Ralph 5.52-5.54

Rhiju
Forum moderator
Project developer
Project scientist

Joined: Feb 14 06
Posts: 161
ID: 4
Credit: 3,725
RAC: 0
Message 2866 - Posted 14 Mar 2007 22:07:09 UTC

    Last modified: 15 Mar 2007 3:28:07 UTC

    Hopefully this will fix the problem with RNA WUs running over time. Also, please post if you had difficulties with 5.51 on your machine, e.g., with graphics!
    ____________

    feet1st

    Joined: Mar 7 06
    Posts: 312
    ID: 1028
    Credit: 110,522
    RAC: 1
    Message 2867 - Posted 15 Mar 2007 1:53:24 UTC

      Last modified: 15 Mar 2007 1:54:49 UTC

      Got work? ...and what "major problem" are you addressing? I mean, there have been a few quirky things going on; I just wanted to pin down a little what you've found.
      ____________

      Rhiju
      Forum moderator
      Project developer
      Project scientist

      Joined: Feb 14 06
      Posts: 161
      ID: 4
      Credit: 3,725
      RAC: 0
      Message 2868 - Posted 15 Mar 2007 3:28:28 UTC - in response to Message 2867.

        Sorry, clarified below. Also sending out more work.

        ____________

        feet1st

        Joined: Mar 7 06
        Posts: 312
        ID: 1028
        Credit: 110,522
        RAC: 1
        Message 2869 - Posted 15 Mar 2007 12:45:30 UTC

          Well, the RNA tasks seemed to have the following issues:

          1) Not showing models, only nstructs in the reported results (and presumably in the graphic as well?)

          2) Were the steps incrementing from the start of the run through the end?

          3) "Stage" remaining at "initializing . . ." during the entire run.

          4) Producing 30 nstructs regardless of the runtime preference. In most cases this actually made them end "early".

          5) Unusually high number of validation errors.

          ...and so it sounds like these 5 have all been addressed by v5.52? And I believe all of the above ONLY pertained to the new RNA work.

          There have also been some problems with -107 return codes with Access Violations. Have problems in this area been addressed as well?

          ============= ...and the wish list...

          New users are still very confused by the way percent completed and estimated runtime are presented. They end up resetting the project, reinstalling BOINC, and so on. And those are the patient ones who try to get it to "work properly" and post that they are having problems. We can only speculate how many others just throw up their hands and disconnect.

          What are the plans for advance posting of new releases and having an advanced copy available for download? Is this only as time permits? Or a practice that is no longer required since BOINC has better compression?
          ____________

          feet1st

          Joined: Mar 7 06
          Posts: 312
          ID: 1028
          Credit: 110,522
          RAC: 1
          Message 2870 - Posted 15 Mar 2007 13:09:28 UTC

            I'm crunching model 11 of this guy:
            http://ralph.bakerlab.org/result.php?resultid=457544 2f88__BOINC_INCREASE_CYCLES_RNA_ABINITIO-2f88_-_1844_55_0

            And it seems to be modeling the blue end of the strand and moving it around, and the graphic shows a speck of blue popping in and out on the side opposite the blue end. It isn't attached to anything and has no defined shape, just a short line. It's not close enough to the rest of the action to be a sidechain.
            ____________

            Rhiju
            Forum moderator
            Project developer
            Project scientist

            Joined: Feb 14 06
            Posts: 161
            ID: 4
            Credit: 3,725
            RAC: 0
            Message 2871 - Posted 15 Mar 2007 18:37:04 UTC - in response to Message 2869.

              Hi feet1st, thanks for summarizing the issues. I've tried to put in fixes for all five RNA issues. I did put in a fix for a known problem with the graphics (where it gets confused about drawing the last segment of the molecule), but I'm not sure whether that will solve the -107 issue.

              Let me know if that is helping. Certainly, the validate rate seems higher from my end! I'll send out more work, too.

              As for advance copies for download on r@h, we've now compressed the app for Windows and Linux, so that seems to address the bandwidth issue for (most) users with multiple machines. I think we had trouble compressing the Mac builds, but if you point us to a good compressor, we'll try it of course!


              ____________

              Rhiju
              Forum moderator
              Project developer
              Project scientist

              Joined: Feb 14 06
              Posts: 161
              ID: 4
              Credit: 3,725
              RAC: 0
              Message 2872 - Posted 15 Mar 2007 21:37:41 UTC - in response to Message 2871.

                I'm seeing a freaky blue dot showing up at random spots on 1qxa_RNA workunits. Let me see if I can find the fix today. Any other problems showing up out there?
                ____________

                feet1st

                Joined: Mar 7 06
                Posts: 312
                ID: 1028
                Credit: 110,522
                RAC: 1
                Message 2873 - Posted 15 Mar 2007 22:34:38 UTC - in response to Message 2872.

                  Great! Glad to hear they are ALL addressed. I realize several of them were not "major" problems, but each one prompted a number of concerned reports on the boards from confused observers.

                  About the freaky blue dot on 1qxa_RNA workunits: I'm also seeing it on 1kka__...RNA... and 2f88__...RNA... (I've got a second 2f88 and am seeing the same "freaky blue dot" on it as well).
                  ____________

                  feet1st

                  Joined: Mar 7 06
                  Posts: 312
                  ID: 1028
                  Credit: 110,522
                  RAC: 1
                  Message 2874 - Posted 16 Mar 2007 13:16:58 UTC

                    Since no one seems to have any other problems...

                    Wondering how difficult it might be to change the preferences page so that, when a runtime preference is set, a "maximum runtime allowed" field is also shown. It could simply show 5x the preference (I believe 5x is the point at which the watchdog would end a task).

                    I KNOW the screen says "target", but many people seem to treat it as an absolute "limit". Showing a maximum as well would make it very clear where the project considers the limit to be.

                    Otherwise, how difficult would it be to predict that a given machine will not complete a model for a given task before its configured preferred runtime? If there were some way to avoid sending a long-running DOC task to a slow PC with a short preference, that would be another way to minimize the problem. But then there are other users who do understand how it works and just want to get results back to you faster and make good use of their old, slow machines. If a machine were slow enough, it would be entirely possible that no tasks would complete within the preferred RT.
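A minimal sketch of the two checks suggested above, for illustration only. The 5x factor is feet1st's recollection of the watchdog threshold, not a confirmed project constant, and the function names and the per-model time estimate are hypothetical.

```python
# Assumed multiple of the runtime preference at which the watchdog ends a task
# (feet1st's recollection, not a confirmed Rosetta/Ralph constant).
WATCHDOG_FACTOR = 5


def max_runtime_allowed(preferred_hours: float, factor: float = WATCHDOG_FACTOR) -> float:
    """Hours after which the watchdog would end a task, for display next to the 'target'."""
    if preferred_hours <= 0:
        raise ValueError("runtime preference must be positive")
    return preferred_hours * factor


def completes_a_model(est_hours_per_model: float, preferred_hours: float) -> bool:
    """True if this host is expected to finish at least one model within its preference."""
    return est_hours_per_model <= preferred_hours


print(max_runtime_allowed(4))     # 20 (hours, for a 4-hour preference)
print(completes_a_model(2.5, 1))  # False: slow host, short preference
```

A scheduler using something like `completes_a_model` would need a per-host, per-task estimate of time per model, which is exactly the hard part feet1st raises; the sketch just assumes one is available.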
                    ____________

                    Conan

                    Joined: Feb 16 06
                    Posts: 344
                    ID: 145
                    Credit: 1,309,534
                    RAC: 0
                    Message 2875 - Posted 17 Mar 2007 0:48:37 UTC

                      I have 6 workunits that sat on 2 of my computers for many, many hours without progressing.
                      I had to abort all of them. 4 were on one machine with all 4 cores allocated to WUs, but the time was not moving and the CPUs were at zero usage; the other computer had 2 WUs going, which also showed no CPU usage.
                      BOINC Manager showed over an hour (17%-19%) done for two of the units and over 4 hours (66%-69%) done on the other 4 units, but my results show zero CPU time. How can this be?

                      http://ralph.bakerlab.org/result.php?resultid=457366 (17% complete)
                      http://ralph.bakerlab.org/result.php?resultid=457365 (17%+ complete)

                      http://ralph.bakerlab.org/result.php?resultid=458259 (66%+ complete)
                      http://ralph.bakerlab.org/result.php?resultid=458261 (66%+ complete)
                      http://ralph.bakerlab.org/result.php?resultid=458336 (66%+ complete)
                      http://ralph.bakerlab.org/result.php?resultid=458337 (66%+ complete)


                      ____________

                      Thomas Leibold

                      Joined: Feb 25 07
                      Posts: 27
                      ID: 2684
                      Credit: 77,464
                      RAC: 0
                      Message 2876 - Posted 17 Mar 2007 3:33:11 UTC

                        Last modified: 17 Mar 2007 3:33:36 UTC

                        I thought the Linux client was statically linked? I'm getting an error because of a missing shared library (the C++ Standard Library, libstdc++.so.6) with the 5.52 Ralph client.

                        <stderr_txt>
                        rosetta_beta_5.52_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

                        </stderr_txt>

                        Workunits 408735, 408642, 408607
                        OS: SuSE Linux 9.3
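The loader message above follows a fixed pattern, so a host owner checking many failed results could pull the missing library names out of each stderr_txt automatically. A minimal sketch, assuming the glibc loader wording shown above; the function name is hypothetical.

```python
import re

# Wording of glibc's dynamic loader error, as it appears in the stderr_txt above.
LOADER_ERROR = re.compile(
    r"error while loading shared libraries: (?P<lib>[^:]+): "
    r"cannot open shared object file"
)


def missing_libraries(stderr_txt: str) -> list[str]:
    """Return each shared library the loader could not open, in order, without duplicates."""
    found: list[str] = []
    for match in LOADER_ERROR.finditer(stderr_txt):
        lib = match.group("lib").strip()
        if lib not in found:
            found.append(lib)
    return found


report = """<stderr_txt>
rosetta_beta_5.52_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
</stderr_txt>"""
print(missing_libraries(report))  # ['libstdc++.so.6']
```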

                        Ty

                        Joined: Feb 18 06
                        Posts: 1
                        ID: 551
                        Credit: 4,431
                        RAC: 0
                        Message 2877 - Posted 17 Mar 2007 5:44:08 UTC

                          Work Unit 408043, rosetta_beta 5.52, BOINC 5.4.11
                          Before starting, the computer showed an initial time to completion of 54.xx min.
                          While the computer worked on the work unit, time to completion increased instead of decreasing, and progress remained at 1.000% until completion.
                          The computer finished the work unit in 2368.02 seconds (about 39.5 minutes),
                          then displayed progress at 100% and uploaded.
                          Computer: AMD Sempron 3400+ Win XP Pro x64 SP2
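The rising time to completion fits a progress value frozen at 1.000%. As a simplified illustration, assume the remaining time is extrapolated purely from elapsed time and reported fraction done. This is not the BOINC client's actual formula, which may also weigh in the workunit's initial estimate. It only shows why extrapolating from a frozen progress value makes the estimate climb.

```python
def estimated_remaining(elapsed_s: float, fraction_done: float) -> float:
    """Naive time to completion: elapsed * (1 - f) / f.

    Simplified on purpose; real clients may blend this with the
    workunit's initial estimate, which this sketch ignores.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1 - fraction_done) / fraction_done


# With progress stuck at 1%, each minute of work adds about 99 minutes
# to the estimate, so "time to completion" counts up instead of down.
for minutes in (5, 10, 20):
    print(minutes, round(estimated_remaining(minutes * 60, 0.01) / 60))
```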

                          Conrad Poohs

                          Joined: Aug 29 06
                          Posts: 9
                          ID: 1758
                          Credit: 1,955
                          RAC: 0
                          Message 2878 - Posted 17 Mar 2007 10:23:20 UTC

                            This WU ran for two hours, my setting is four, and produced 30 nstructs 1 decoys.
                            ____________

                            LudwigVonDrake

                            Joined: Aug 9 06
                            Posts: 1
                            ID: 1679
                            Credit: 46,669
                            RAC: 0
                            Message 2879 - Posted 17 Mar 2007 15:36:41 UTC

                              I'm getting this error also: the same missing libstdc++.so.6 shared-library error that Thomas Leibold reported with the 5.52 Linux client.
                              ____________

                              John Hunt

                              Joined: Mar 16 07
                              Posts: 10
                              ID: 2766
                              Credit: 28,654
                              RAC: 1
                              Message 2880 - Posted 17 Mar 2007 17:24:19 UTC

                                Last modified: 17 Mar 2007 18:19:02 UTC

                                This WU ran for almost two hours (although WU length in my profile was at default = 1 hr) showing 1% complete all the way until it completed successfully.

                                <edit>
                                Looks like the same thing is going to happen with this WU. I'll let it run.



                                ____________

                                John Hunt

                                Joined: Mar 16 07
                                Posts: 10
                                ID: 2766
                                Credit: 28,654
                                RAC: 1
                                Message 2881 - Posted 17 Mar 2007 19:16:45 UTC


                                  The WU from my edit above finished in 51 mins; % complete stayed at 1% all the way through.

                                  ____________

                                  [B^S] JoeB@Ky

                                  Joined: Oct 11 06
                                  Posts: 8
                                  ID: 1990
                                  Credit: 39,098
                                  RAC: 0
                                  Message 2882 - Posted 17 Mar 2007 21:03:01 UTC

                                    Every WU I have run under 5.52 has stuck at 1.000% completed no matter how long I let it run. After 2-3 hours I have been aborting them. No problems with any of the previous sets of WUs.

                                    anders n

                                    Joined: Feb 16 06
                                    Posts: 166
                                    ID: 91
                                    Credit: 131,419
                                    RAC: 0
                                    Message 2884 - Posted 18 Mar 2007 8:16:34 UTC

                                      Last modified: 18 Mar 2007 9:15:48 UTC

                                      Bug?

                                      I have one WU, http://ralph.bakerlab.org/result.php?resultid=463216,
                                      that is now at 7H 15min on a preference of 4 H.
                                      It is at 76.9% and has been there for at least the last 2H.
                                      I can't get it to show graphics.
                                      I have 2 cores on that Mac, and on the other core a Rosetta task is running
                                      where I can see graphics OK.

                                      I'll let it run and see how it turns out.

                                      Anders n

                                      [edit] 8H 15 min, still at 76.9% [/edit]
                                      ____________

                                      Thomas Leibold

                                      Joined: Feb 25 07
                                      Posts: 27
                                      ID: 2684
                                      Credit: 77,464
                                      RAC: 0
                                      Message 2885 - Posted 18 Mar 2007 19:20:30 UTC - in response to Message 2876.

                                        <stderr_txt>
                                        rosetta_beta_5.52_i686-pc-linux-gnu: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory
                                        </stderr_txt>


                                        Looks like my computer tried to get some Ralph work done last night: at least 15 of those errors.
                                        I have installed libstdc++.so.6.0.3 (there is no official package for SuSE 9.3, but I found a third-party package that happened to include this library because they needed it too).
                                        Of course, now the project is out of work, so I don't know whether that solved the problem or whether other shared libraries are missing as well.

                                        Michael.L

                                        Joined: Nov 26 06
                                        Posts: 5
                                        ID: 2278
                                        Credit: 1,173
                                        RAC: 0
                                        Message 2886 - Posted 18 Mar 2007 22:49:59 UTC

                                          Last modified: 18 Mar 2007 22:59:20 UTC

                                          18/03/2007 22:39:55|ralph@home|Starting BENCH_04JUMPING_SAVE_ALL_OUT_-1hz6A-_NATIVE_PAIR_7_57_BARCODE_R51H_1845_25_1
                                          18/03/2007 22:39:56|ralph@home|Starting task BENCH_04JUMPING_SAVE_ALL_OUT_-1hz6A-_NATIVE_PAIR_7_57_BARCODE_R51H_1845_25_1 using rosetta_beta version 552
                                          -----
                                          Why two starts? The two messages are consecutive.
                                          For a horrible second I thought Rosie had me down as a dual core, but then I saw that this WU is the only one I had remaining.

                                          anders n

                                          Joined: Feb 16 06
                                          Posts: 166
                                          ID: 91
                                          Credit: 131,419
                                          RAC: 0
                                          Message 2887 - Posted 19 Mar 2007 5:20:20 UTC - in response to Message 2884.

                                            After 16H 28min it was switched out for Rosetta. It had finished by the time I woke up
                                            this morning and reported 3H and 5 min.

                                            Anders n

                                            ____________

                                            feet1st

                                            Joined: Mar 7 06
                                            Posts: 312
                                            ID: 1028
                                            Credit: 110,522
                                            RAC: 1
                                            Message 2888 - Posted 19 Mar 2007 15:24:17 UTC

                                              anders n, do the messages indicate that the task was preempted and restarted at all during the night? Almost sounds like it was removed from memory and reverted back to a prior checkpoint.
                                              ____________

                                              anders n

                                              Joined: Feb 16 06
                                              Posts: 166
                                              ID: 91
                                              Credit: 131,419
                                              RAC: 0
                                              Message 2889 - Posted 19 Mar 2007 15:36:09 UTC - in response to Message 2888.

                                                It was preempted but not removed from memory.

                                                Anders n

                                                ____________

                                                Rhiju
                                                Forum moderator
                                                Project developer
                                                Project scientist

                                                Joined: Feb 14 06
                                                Posts: 161
                                                ID: 4
                                                Credit: 3,725
                                                RAC: 0
                                                Message 2890 - Posted 19 Mar 2007 23:48:04 UTC - in response to Message 2885.

                                                  Sorry, more work coming now!

                                                  ____________

                                                  Rhiju
                                                  Forum moderator
                                                  Project developer
                                                  Project scientist

                                                  Joined: Feb 14 06
                                                  Posts: 161
                                                  ID: 4
                                                  Credit: 3,725
                                                  RAC: 0
                                                  Message 2891 - Posted 20 Mar 2007 0:23:33 UTC - in response to Message 2890.

                                                    Last modified: 20 Mar 2007 0:32:43 UTC

                                                    On second thought, I'm a little worried about our recent Linux build based on your comment. I think we changed the way libraries are used in the build; let me see if I can fix this.

                                                    ____________

                                                    Rhiju
                                                    Forum moderator
                                                    Project developer
                                                    Project scientist

                                                    Joined: Feb 14 06
                                                    Posts: 161
                                                    ID: 4
                                                    Credit: 3,725
                                                    RAC: 0
                                                    Message 2892 - Posted 20 Mar 2007 1:33:04 UTC - in response to Message 2891.

                                                      OK, 5.54 should have the old-style build with static libraries; let's see how it goes. I'm seeing at least one host that had consistent success up through 5.51 but has been giving shared-library errors with 5.52 and 5.53, so I'll see if it returns good results with 5.54.
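Whether a given Linux build actually ended up static can be checked on a host by running `ldd` on the application binary. glibc's ldd prints "not a dynamic executable" for a statically linked program and marks each unresolved dependency with "=> not found". The helper below is a hypothetical sketch that classifies such output, assuming the usual glibc ldd format. The sample text is illustrative, not captured from a Ralph host.

```python
def classify_ldd(output: str) -> tuple[str, list[str]]:
    """Return ("static", []) or ("dynamic", [unresolved libraries]) from ldd output."""
    # glibc's ldd reports a statically linked program this way; some newer
    # versions print "statically linked" (e.g. for static-PIE binaries) instead.
    if "not a dynamic executable" in output or "statically linked" in output:
        return "static", []
    missing = []
    for line in output.splitlines():
        # Dependency lines look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return "dynamic", missing


# Illustrative ldd output for a dynamically linked build with one missing library.
sample = "\tlibstdc++.so.6 => not found\n\tlibc.so.6 => /lib/libc.so.6 (0x00110000)"
print(classify_ldd(sample))                       # ('dynamic', ['libstdc++.so.6'])
print(classify_ldd("\tnot a dynamic executable"))  # ('static', [])
```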

                                                      ____________

                                                      Thomas Leibold

                                                      Joined: Feb 25 07
                                                      Posts: 27
                                                      ID: 2684
                                                      Credit: 77,464
                                                      RAC: 0
                                                      Message 2893 - Posted 20 Mar 2007 2:52:25 UTC - in response to Message 2890.

                                                        Thanks! I got a workunit, and it appears to be running fine using Ralph 5.53 with the newly installed libstdc++.so.6, which means this was the only library that was missing.

                                                        I saw the news that Ralph 5.54 fixes the library issue, but I haven't gotten that version of the client yet.

                                                        Rhiju
                                                        Forum moderator
                                                        Project developer
                                                        Project scientist

                                                        Joined: Feb 14 06
                                                        Posts: 161
                                                        ID: 4
                                                        Credit: 3,725
                                                        RAC: 0
                                                        Message 2894 - Posted 20 Mar 2007 17:59:00 UTC - in response to Message 2893.

                                                          Hi Thomas, good to hear your client is working again. I also haven't seen any more
                                                          libstdc++.so.6 errors since the release of 5.54, so hopefully that's fixed for other Linux users who
                                                          aren't as library-savvy as you. Thanks for posting!

                                                          ____________

                                                          feet1st

                                                          Joined: Mar 7 06
                                                          Posts: 312
                                                          ID: 1028
                                                          Credit: 110,522
                                                          RAC: 1
                                                          Message 2895 - Posted 20 Mar 2007 21:49:10 UTC

                                                            Just for giggles, I shut down and restarted BOINC while a 5.54 RNA task was in progress... and lost all of my progress on model #2, over 30 min of work. The first model took just over 40 min to complete, so it should have been about 3/4 of the way through model 2.

                                                            Do the RNA WUs have any checkpointing? Or is progress only saved upon model completion?
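feet1st's question can be made concrete with a small calculation. Assuming, purely for illustration, that the task writes a checkpoint only when a model (decoy) completes and that every model takes about the same time, a restart always resumes at the start of the unfinished model and everything done inside it is lost. A minimal Python sketch; `resume_point` is a hypothetical helper, not Rosetta/Ralph code:

```python
from typing import Tuple


def resume_point(minutes_run: float, minutes_per_model: float) -> Tuple[int, float]:
    """Return (model number restarted at, minutes of work lost).

    Hypothetical model: a checkpoint is saved only when a model finishes,
    and every model takes the same time, so work inside the current model
    is discarded on restart.
    """
    models_done = int(minutes_run // minutes_per_model)
    lost = minutes_run - models_done * minutes_per_model
    return models_done + 1, lost


# Roughly feet1st's case: model 1 took ~40 min, then ~30 min into model 2.
model, lost = resume_point(minutes_run=70.0, minutes_per_model=40.0)
print(model, lost)  # 2 30.0
```

The same assumption would fit Anders n's later report that a restarted task "started decoy 10".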
                                                            ____________

                                                            genes

                                                            Joined: Feb 16 06
                                                            Posts: 45
                                                            ID: 57
                                                            Credit: 43,300
                                                            RAC: 0
                                                            Message 2896 - Posted 21 Mar 2007 11:18:54 UTC

                                                              Last modified: 21 Mar 2007 11:23:31 UTC

                                                              No problems so far... :) I've returned 12 valid 5.54, 1 valid 5.53 and 50 valid 5.52 WUs, all on Windows boxes. No errors yet.
                                                              ____________

                                                              anders n

                                                              Joined: Feb 16 06
                                                              Posts: 166
                                                              ID: 91
                                                              Credit: 131,419
                                                              RAC: 0
                                                              Message 2898 - Posted 21 Mar 2007 15:58:27 UTC - in response to Message 2887.

                                                                Last modified: 21 Mar 2007 15:59:06 UTC

                                                                Bug?

                                                                I have one WU (http://ralph.bakerlab.org/result.php?resultid=463216)
                                                                that is now at 7H 15min with a runtime preference of 4H.
                                                                It is at 76.9% and has been stuck there for at least the last 2H.
                                                                I can't get it to show graphics.
                                                                I have 2 cores on that Mac, and on the other core there is a Rosetta task running
                                                                where I can see graphics OK.

                                                                I'll let it run and see how it turns out.

                                                                Anders n

                                                                [edit] 8H 15min, still at 76.9% [/edit]


                                                                After 16H 28min it was switched out for Rosetta. It was finished when I woke up
                                                                this morning and reported 3H 5min.

                                                                Anders n

                                                                A new WU "stuck": http://ralph.bakerlab.org/result.php?resultid=467564

                                                                Now at 16H 33min at 69.7%.
                                                                I have stopped other projects so it will not be preempted this time.
                                                                Let's hope for the best.

                                                                Anders n



                                                                ____________

                                                                anders n

                                                                Joined: Feb 16 06
                                                                Posts: 166
                                                                ID: 91
                                                                Credit: 131,419
                                                                RAC: 0
                                                                Message 2899 - Posted 22 Mar 2007 4:49:06 UTC - in response to Message 2898.


                                                                  A new WU "stuck": http://ralph.bakerlab.org/result.php?resultid=467564

                                                                  Now at 16H 33min at 69.7%.
                                                                  I have stopped other projects so it will not be preempted this time.
                                                                  Let's hope for the best.

                                                                  Anders n


                                                                  It is now at 30H 19min.
                                                                  The watchdog should have done its work by now.
                                                                  What do you want me to do?

                                                                  Anders n

                                                                  ____________

                                                                  feet1st

                                                                  Joined: Mar 7 06
                                                                  Posts: 312
                                                                  ID: 1028
                                                                  Credit: 110,522
                                                                  RAC: 1
                                                                  Message 2900 - Posted 22 Mar 2007 13:28:32 UTC

                                                                    Last modified: 22 Mar 2007 13:29:14 UTC

                                                                    Anders n, what is your runtime preference? The watchdog should step in within about 15min of crossing 4x the preferred runtime.

                                                                    Has it made no progress on model numbers in that time?
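With the 4H preference Anders n reports, that rule puts the watchdog deadline at about 4 × 4H + 15min = 16H 15min, well short of the 30H 19min quoted above. A minimal Python sketch of the check, assuming the 4x multiplier and ~15 min interval exactly as stated in this post; the constant and function names are hypothetical, not taken from the Rosetta/Ralph source:

```python
from datetime import timedelta

# Assumed values, taken from the rule of thumb in this post rather than
# from the application's source code.
WATCHDOG_MULTIPLIER = 4
WATCHDOG_CHECK_INTERVAL = timedelta(minutes=15)


def watchdog_deadline(preferred_runtime: timedelta) -> timedelta:
    """Elapsed time by which the watchdog should have stepped in."""
    return WATCHDOG_MULTIPLIER * preferred_runtime + WATCHDOG_CHECK_INTERVAL


def watchdog_overdue(elapsed: timedelta, preferred_runtime: timedelta) -> bool:
    """True if a task is still running past the expected watchdog deadline."""
    return elapsed > watchdog_deadline(preferred_runtime)


pref = timedelta(hours=4)
print(watchdog_deadline(pref))  # 16:15:00
print(watchdog_overdue(timedelta(hours=30, minutes=19), pref))  # True
```

Whether the real watchdog measures wall-clock time or CPU time is not stated in the thread; the sketch just compares whatever elapsed figure it is given.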
                                                                    ____________

                                                                    anders n

                                                                    Joined: Feb 16 06
                                                                    Posts: 166
                                                                    ID: 91
                                                                    Credit: 131,419
                                                                    RAC: 0
                                                                    Message 2901 - Posted 22 Mar 2007 15:53:53 UTC - in response to Message 2900.

                                                                      Anders n, what is your runtime preference? The watchdog should step in within about 15min of crossing 4x the preferred runtime.

                                                                      Has it made no progress on model numbers in that time?


                                                                      It is now at 40H 50min.

                                                                      Still at 69.7%.

                                                                      My runtime preference is 4H.

                                                                      I cannot open graphics on this WU.

                                                                      Anders n

                                                                      ____________

                                                                      feet1st

                                                                      Joined: Mar 7 06
                                                                      Posts: 312
                                                                      ID: 1028
                                                                      Credit: 110,522
                                                                      RAC: 1
                                                                      Message 2902 - Posted 22 Mar 2007 16:09:51 UTC

                                                                        I'd suggest you suspend it and then resume it. There are a few possible outcomes, and knowing which one happens may be of use to the Project Team.

                                                                        1) it may end with errors.

                                                                        2) it may restart the same model and end up in the same state of not being able to complete the model. (I'd only let it run 4 hrs this time; I've not seen any task that should take that long to complete a model on a 2 GHz CPU. You would see a model complete by the % completed changing.)

                                                                        3) it may then complete the model normally in <2hrs.
                                                                        ____________

                                                                        anders n

                                                                        Joined: Feb 16 06
                                                                        Posts: 166
                                                                        ID: 91
                                                                        Credit: 131,419
                                                                        RAC: 0
                                                                        Message 2903 - Posted 22 Mar 2007 16:24:41 UTC - in response to Message 2902.

                                                                          Last modified: 22 Mar 2007 16:25:40 UTC

                                                                          I'd suggest you suspend it and then resume it. There are a few possible outcomes, and knowing which one happens may be of use to the Project Team.

                                                                          1) it may end with errors.

                                                                          2) it may restart the same model and end up in the same state of not being able to complete the model. (I'd only let it run 4 hrs this time; I've not seen any task that should take that long to complete a model on a 2 GHz CPU. You would see a model complete by the % completed changing.)

                                                                          3) it may then complete the model normally in <2hrs.


                                                                          Test 1: Suspend and resume

                                                                          - The task continues where it was.

                                                                          Test 2: Restarting BOINC

                                                                          - It resumes at 2H 47min and I can now view graphics.

                                                                          Anders n
                                                                          ____________

                                                                          feet1st

                                                                          Joined: Mar 7 06
                                                                          Posts: 312
                                                                          ID: 1028
                                                                          Credit: 110,522
                                                                          RAC: 1
                                                                          Message 2904 - Posted 22 Mar 2007 19:20:58 UTC

                                                                            Oops, yep, you must be keeping the application in memory (a good thing). I wasn't thinking. Yes, by ending BOINC and restarting it, we're basically forcing the WU to pick up from the last checkpoint.

                                                                            Curious, what % done did it say upon restart?
                                                                            ____________

                                                                            anders n

                                                                            Joined: Feb 16 06
                                                                            Posts: 166
                                                                            ID: 91
                                                                            Credit: 131,419
                                                                            RAC: 0
                                                                            Message 2905 - Posted 22 Mar 2007 19:27:00 UTC - in response to Message 2904.

                                                                              Last modified: 22 Mar 2007 19:27:48 UTC

                                                                              Oops, yep, you must be keeping the application in memory (a good thing). I wasn't thinking. Yes, by ending BOINC and restarting it, we're basically forcing the WU to pick up from the last checkpoint.

                                                                              Curious, what % done did it say upon restart?


                                                                              69.7%. It started decoy 10.

                                                                              Anders n
                                                                              ____________

                                                                              feet1st

                                                                              Joined: Mar 7 06
                                                                              Posts: 312
                                                                              ID: 1028
                                                                              Credit: 110,522
                                                                              RAC: 1
                                                                              Message 2906 - Posted 22 Mar 2007 22:18:12 UTC

                                                                                I've got several that seem to have ended normally (they lasted through to my 24hr runtime preference), but the reported results show a "No heartbeat from core client for 31 sec - exiting" message.

                                                                                http://ralph.bakerlab.org/result.php?resultid=459533 v5.52
                                                                                http://ralph.bakerlab.org/result.php?resultid=466179 v5.52
                                                                                http://ralph.bakerlab.org/result.php?resultid=467168 v5.54
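The message suggests the application listens for periodic heartbeats from the BOINC core client and exits when they stop arriving. A minimal Python sketch of that kind of check, assuming a 30-second threshold (consistent with the "31 sec" in the message); `HeartbeatMonitor` and its methods are hypothetical and are not the BOINC API:

```python
import time
from typing import Callable

# Assumed threshold, chosen to match the "31 sec" in the reported message.
HEARTBEAT_TIMEOUT_S = 30.0


class HeartbeatMonitor:
    """Tracks when the last heartbeat arrived from the core client."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last = clock()

    def on_heartbeat(self) -> None:
        """Record that a heartbeat message was just received."""
        self._last = self._clock()

    def should_exit(self) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return self._clock() - self._last > HEARTBEAT_TIMEOUT_S


# Simulated clock: last heartbeat at t=20 s, checked again at t=51 s.
now = [0.0]
monitor = HeartbeatMonitor(clock=lambda: now[0])
now[0] = 20.0
monitor.on_heartbeat()
now[0] = 51.0
print(monitor.should_exit())  # True (31 s without a heartbeat)
```

Injecting the clock keeps the sketch testable without actually waiting 30 seconds.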
                                                                                ____________

                                                                                anders n

                                                                                Joined: Feb 16 06
                                                                                Posts: 166
                                                                                ID: 91
                                                                                Credit: 131,419
                                                                                RAC: 0
                                                                                Message 2908 - Posted 24 Mar 2007 9:31:46 UTC

                                                                                  I may be on to what is happening with my Mac.
                                                                                  A Ralph task was preempted, Rosetta continued, and all worked as usual.
                                                                                  The Rosetta task finished and Ralph was to continue.
                                                                                  For several seconds the Ralph task showed as running, but the CPU time did not tick;
                                                                                  then the time started to go up, but I could not view graphics on
                                                                                  that task again.
                                                                                  I restarted BOINC and everything was back to normal.

                                                                                  Anders n
                                                                                  ____________

                                                                                  anders n

                                                                                  Joined: Feb 16 06
                                                                                  Posts: 166
                                                                                  ID: 91
                                                                                  Credit: 131,419
                                                                                  RAC: 0
                                                                                  Message 2909 - Posted 24 Mar 2007 21:12:18 UTC - in response to Message 2908.

                                                                                    I may be on to what is happening with my Mac.
                                                                                    A Ralph task was preempted, Rosetta continued, and all worked as usual.
                                                                                    The Rosetta task finished and Ralph was to continue.
                                                                                    For several seconds the Ralph task showed as running, but the CPU time did not tick;
                                                                                    then the time started to go up, but I could not view graphics on
                                                                                    that task again.
                                                                                    I restarted BOINC and everything was back to normal.

                                                                                    Anders n


                                                                                    The same thing happened again; this time it was a switch between Einstein
                                                                                    and Ralph.

                                                                                    Anders n
                                                                                    ____________

                                                                                    anders n

                                                                                    Joined: Feb 16 06
                                                                                    Posts: 166
                                                                                    ID: 91
                                                                                    Credit: 131,419
                                                                                    RAC: 0
                                                                                    Message 2910 - Posted 25 Mar 2007 12:33:01 UTC

                                                                                      A Mac issue.
                                                                                      Ralph showed as running but nothing was happening.
                                                                                      The time did not count up, and when watching graphics everything was frozen.
                                                                                      It was after ~11000 steps in a model of this WU: http://ralph.bakerlab.org/result.php?resultid=470124
                                                                                      After restarting BOINC everything went back to normal.

                                                                                      Anders n
                                                                                      ____________

                                                                                      anders n

                                                                                      Joined: Feb 16 06
                                                                                      Posts: 166
                                                                                      ID: 91
                                                                                      Credit: 131,419
                                                                                      RAC: 0
                                                                                      Message 2912 - Posted 27 Mar 2007 7:57:14 UTC

                                                                                        This is a Mac issue; my XP computers are running just fine.
                                                                                        As soon as a Ralph or Rosetta task is preempted and then resumed, it
                                                                                        "hangs": it shows as running, but the only thing happening is
                                                                                        that the time counts up.
                                                                                        If I recall right, BOINC only switches tasks at checkpoints,
                                                                                        and if the task doesn't progress, there is no checkpoint.
                                                                                        And it looks like the watchdog doesn't catch this problem :(

                                                                                        Anders n

                                                                                        ____________


                                                                                        Copyright © 2017 University of Washington

                                                                                        Last Modified: 20 Nov 2008 19:41:56 UTC