RALPH@home

Bug reports for rosetta_beta_5.77 and rosetta_5.69



dekim
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 20 06
Posts: 213
ID: 1
Credit: 479,238
RAC: 182
Message 3329 - Posted 20 Aug 2007 17:38:28 UTC

    Last modified: 21 Aug 2007 17:08:24 UTC

    Please post any bug reports for rosetta_beta 5.77 and/or rosetta 5.69 here. The same bug was fixed in both the stable and development versions.
    ____________

    m.mitch

    Joined: May 12 06
    Posts: 15
    ID: 1393
    Credit: 99,828
    RAC: 25
    Message 3332 - Posted 29 Aug 2007 0:43:30 UTC

      Last modified: 29 Aug 2007 0:43:56 UTC

      The work unit 550011 is stuck at about 94% complete, with 10 minutes left to run. The BOINC Manager says it's running but there is no CPU use.

      It's on a Linux box. The work unit had been suspended, but I didn't notice whether that was the direct cause; the box was also rebooted.

      Should it be aborted?
      ____________


      Click here to join the #1 Aussie Alliance on RALPH

      mdettweiler

      Joined: Apr 4 07
      Posts: 11
      ID: 2886
      Credit: 1,010
      RAC: 0
      Message 3333 - Posted 29 Aug 2007 15:47:44 UTC - in response to Message 3332.

        Last modified: 29 Aug 2007 15:48:16 UTC

        NO. Rosetta bases its progress bar and time-to-completion estimates on your preferred run time, which, for some of the larger workunits that seem to be very common nowadays, is less (sometimes drastically) than the time actually required to complete one model (the minimum needed to complete a WU). Thus, if the workunit goes over your preferred run time, the estimate sticks at about 10 minutes left, then shrinks that figure and raises the % done very slowly, because it really has no idea how long the workunit is going to take. The % done and time left to completion, at least for Rosetta/RALPH workunits, are just rough estimates. With the new, bigger workunits, if you have a lower run time set (which is recommended for RALPH anyway), most of your workunits will probably go over, unless you have a very fast, modern CPU.

        Long story short, this is normal, so don't abort the workunit; let it run. Some workunits can take up to 4 hours (a couple close to 5, even) per model on my P4 3.2 GHz HT, so in my case they'll take at the very least that long, no matter what run time preference is set. Rosetta doesn't know ahead of time how long they'll take, so once a workunit goes over your preferred run time, all it can do is underestimate so people don't freak out if it goes over 100%. :-)
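
To make the behaviour described above concrete, here is a rough Python sketch of an estimator that is tied to the preferred run time and never lets the displayed time remaining drop below about 10 minutes. This is not Rosetta's actual code: the function name `estimate_progress`, the 600-second floor, and the formula are all assumptions chosen only to reproduce the symptom. It is also simpler than the description, because it holds the remaining time at the floor instead of shrinking it slowly.

```python
def estimate_progress(elapsed_s, preferred_s, floor_s=600.0):
    """Return (fraction_done, remaining_s) for a task.

    Hypothetical model: the remaining time counts down towards the
    preferred run time but is never shown as less than floor_s, so the
    fraction done keeps creeping towards 1.0 without ever reaching it.
    """
    # Time left according to the user's preferred run time, clamped
    # so that it never falls below the floor (about 10 minutes).
    remaining_s = max(preferred_s - elapsed_s, floor_s)
    # Fraction done is derived from elapsed time and the clamped remainder.
    fraction_done = elapsed_s / (elapsed_s + remaining_s)
    return fraction_done, remaining_s


if __name__ == "__main__":
    preferred = 3600.0  # a one-hour preferred run time, chosen for illustration
    for elapsed in (1800.0, 3000.0, 5400.0, 9400.0):
        done, left = estimate_progress(elapsed, preferred)
        print(f"elapsed {elapsed:6.0f} s -> {done:6.1%} done, {left:5.0f} s left")
```

Under these assumptions a task that has run well past a one-hour preference shows roughly 94% done with 10 minutes left, which matches the numbers in the original report while saying nothing about whether the task is actually using the CPU.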
        ____________

        m.mitch

        Joined: May 12 06
        Posts: 15
        ID: 1393
        Credit: 99,828
        RAC: 25
        Message 3334 - Posted 30 Aug 2007 1:28:01 UTC - in response to Message 3333.

          I don't think it's normal for the BOINC Manager to report the work unit as running but the CPU to be inactive.
          ____________



          anders n

          Joined: Feb 16 06
          Posts: 166
          ID: 91
          Credit: 131,419
          RAC: 0
          Message 3335 - Posted 30 Aug 2007 4:04:52 UTC

            Restarting BOINC is the first thing to do when a WU seems stuck.


            ____________

            mdettweiler

            Joined: Apr 4 07
            Posts: 11
            ID: 2886
            Credit: 1,010
            RAC: 0
            Message 3336 - Posted 30 Aug 2007 4:41:01 UTC - in response to Message 3334.

              Oh! Sorry. I made a blooper--I didn't notice that you said the CPU was not active at all. If the CPU were being used, yet the progress and time to completion were as you said, then what I said would be correct, but not when the task is using no CPU time at all. In that case, I would recommend that you abort the WU.

              Sorry! :-(

              ____________

              m.mitch

              Joined: May 12 06
              Posts: 15
              ID: 1393
              Credit: 99,828
              RAC: 25
              Message 3337 - Posted 30 Aug 2007 6:34:30 UTC - in response to Message 3336.

                No probs Anonymous, I have duly blown it out of the water. Just as well too, I'd left it unsuspended and have no idea how much crunching time it wasted.

                Cheers
                ____________



                ramostol

                Joined: Mar 29 07
                Posts: 24
                ID: 2840
                Credit: 31,121
                RAC: 0
                Message 3338 - Posted 30 Aug 2007 9:17:42 UTC

                  I commented on a similar problem on a Rosetta message board some time ago (39305). In my experience, if a BOINC project is running but using no CPU (more precisely: so little CPU time that it is practically unnoticeable), it is because other programs hog the CPU in such a way that the Rosetta crunching is performed not in the Rosetta process but in the kernel_task process.

                  To bring the situation back to normal, examine the active processes on your computer. A quite active kernel_task process would confirm the theory. Then look through all processes to find one using lots of CPU while doing nothing sensible, and quit it. You should then see kernel_task shrink and the Rosetta process use CPU as normal.

                  What I did not mention in my original message is that this is probably also the cause of the occasionally reported problem of Rosetta processes running for days and days without stopping. Since BOINC/Rosetta uses the CPU time registered for the Rosetta process to decide when to terminate it in accordance with your default settings, it knows nothing of the computing going on inside the kernel_task process and lets the process continue for a looooong time.
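
For the Linux case reported earlier in this thread, one way to check whether a task is really accumulating CPU time is to sample its CPU counters twice and compare them. The Python sketch below does that by reading `/proc/<pid>/stat`. It assumes a Linux system with `/proc` mounted, so it does not apply to systems that show a kernel_task process, and the helper names `parse_cpu_ticks`, `cpu_seconds` and `looks_stalled` are made up for this example. The 10-second interval and 1% threshold are arbitrary.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before tokenising the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_seconds(pid):
    """Total CPU time (user + system) used so far by pid, in seconds."""
    with open(f"/proc/{pid}/stat") as fh:
        ticks = parse_cpu_ticks(fh.read())
    return ticks / os.sysconf("SC_CLK_TCK")


def looks_stalled(pid, interval=10.0, min_cpu_fraction=0.01):
    """True if pid used less than min_cpu_fraction of one core over interval."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    after = cpu_seconds(pid)
    return (after - before) / interval < min_cpu_fraction


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the PID of the science application, not of the BOINC client itself.
    pid = int(sys.argv[1])
    print("stalled" if looks_stalled(pid) else "using CPU")
```

A task that the BOINC Manager lists as running but that this check reports as stalled over several samples fits the "running but no CPU use" symptom. A task that is slowly accumulating CPU time fits the harmless case where only the progress estimate is off.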


                  Copyright © 2017 University of Washington

                  Last Modified: 20 Nov 2008 19:41:56 UTC