RALPH@home

Bug reports for Ralph 5.36

Rhiju
Forum moderator
Project developer
Project scientist

Joined: Feb 14 06
Posts: 161
ID: 4
Credit: 3,725
RAC: 0
Message 2439 - Posted 31 Oct 2006 2:22:22 UTC

    Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong.
    ____________

    wraith

    Joined: Oct 31 06
    Posts: 4
    ID: 2142
    Credit: 382
    RAC: 0
    Message 2440 - Posted 31 Oct 2006 13:59:22 UTC - in response to Message 2439.

      Last modified: 31 Oct 2006 14:55:50 UTC

      Hi, thanks for your input so far. This will likely be the last ralph update before we update rosetta@home, unless something goes wrong.


      http://ralph.bakerlab.org/result.php?resultid=306545


      <core_client_version>5.4.9</core_client_version>
      <stderr_txt>
      Graphics are disabled due to configuration...
      input_etable: reading etable... dsolv
      input_etable: WARNING etable types don't match!
      expected dsolv,606 got dsolv,721
      Graphics are disabled due to configuration...
      input_etable: reading etable... dsolv
      input_etable: WARNING etable types don't match!
      expected dsolv,606 got dsolv,721
      Graphics are disabled due to configuration...
      input_etable: reading etable... dsolv
      input_etable: WARNING etable types don't match!
      expected dsolv,606 got dsolv,721
      Graphics are disabled due to configuration...
      input_etable: reading etable... dsolv
      input_etable: WARNING etable types don't match!
      expected dsolv,606 got dsolv,721
      Graphics are disabled due to configuration...
      input_etable: reading etable... dsolv
      input_etable: WARNING etable types don't match!
      expected dsolv,606 got dsolv,721
      Graphics are disabled due to configuration...
      Too many restarts with no progress. Keep application in memory while preempted.
      ======================================================
      DONE :: 0 starting structures built 29 (nstruct) times
      This process generated 0 decoys from 0 attempts
      ======================================================


      BOINC :: Watchdog shutting down...
      BOINC :: BOINC support services shutting down...

      </stderr_txt>
      <message>
      <file_xfer_error>
      <file_name>1b72__ETABLE_TEST_ABRELAX_rhh13sm6__1387_15_3_0</file_name>
      <error_code>-161</error_code>
      </file_xfer_error>

      </message>

      genes

      Joined: Feb 16 06
      Posts: 45
      ID: 57
      Credit: 43,300
      RAC: 0
      Message 2441 - Posted 1 Nov 2006 0:55:42 UTC

        Last modified: 1 Nov 2006 1:45:23 UTC

        Had this one happen a few minutes ago --

        http://ralph.bakerlab.org/result.php?resultid=307179

        I clicked "show graphics", and the graphics came up, then froze a few seconds later. The Ralph app was still running and, together with the graphics, was using 2 CPUs of my quad-CPU (2 virtual) system. I couldn't close the graphics; when I tried, I got a Windows message box ("...not responding, end now?"), which I canceled a few times to wait (to no avail). When I chose "end now", the Ralph app terminated with a computation error.

        I do use the screensaver, which mostly works, but occasionally errors out or needs to be killed like this.

        ____________

        Leffe

        Joined: Feb 19 06
        Posts: 10
        ID: 596
        Credit: 3,683
        RAC: 0
        Message 2443 - Posted 1 Nov 2006 5:15:33 UTC

          got one: 01/11/2006 08:06:15|ralph@home|Unrecoverable error for result BENCH_ABRELAX_SAVE_ALL_OUT_1a19A_BARCODE_R64_1423_10_0 ( - exit code -1073741819 (0xc0000005))

          ____________

          wraith

          Joined: Oct 31 06
          Posts: 4
          ID: 2142
          Credit: 382
          RAC: 0
          Message 2444 - Posted 1 Nov 2006 11:01:47 UTC

            Last modified: 1 Nov 2006 11:54:39 UTC

            Another 'watchdog timeout' on a...

            1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1424_31_0

            http://ralph.bakerlab.org/result.php?resultid=307852

            This is also a machine that did 30 decoys on a similar job w/5.34

            http://boinc.bakerlab.org/rosetta/result.php?resultid=44225313
            http://boinc.bakerlab.org/rosetta/workunit.php?wuid=39021279

            KC0ISW

            Joined: Feb 17 06
            Posts: 20
            ID: 452
            Credit: 11,725
            RAC: 0
            Message 2446 - Posted 2 Nov 2006 2:09:23 UTC

              Result ID 308969
              Name 2reb__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1429_24_0
              Workunit 272034
              Created 2 Nov 2006 1:55:37 UTC
              Sent 2 Nov 2006 2:04:02 UTC
              Received 2 Nov 2006 2:52:50 UTC
              Server state Over
              Outcome Success
              Client state Done
              Exit status 0 (0x0)
              Computer ID 4918
              Report deadline 6 Nov 2006 2:04:02 UTC
              CPU time 2192.522693
              stderr out <core_client_version>5.7.1</core_client_version>
              <![CDATA[
              <stderr_txt>
              # cpu_run_time_pref: 3600
              # random seed: 2875416
              WARNING! error deleting file .\xx2reb.out
              ======================================================
              DONE :: 1 starting structures built 1 (nstruct) times
              This process generated 1 decoys from 1 attempts
              ======================================================


              BOINC :: Watchdog shutting down...
              BOINC :: BOINC support services shutting down...

              </stderr_txt>
              ]]>


              Validate state Valid
              Claimed credit 6.63670628586472
              Granted credit 6.63670628586472
              application version 5.36

              ____________

              Pepo

              Joined: Sep 8 06
              Posts: 104
              ID: 1812
              Credit: 36,890
              RAC: 0
              Message 2447 - Posted 2 Nov 2006 9:30:09 UTC

                Last modified: 2 Nov 2006 10:18:01 UTC

                Also, version 5.36 checkpoints after reaching 100% (instead of reporting the result) and is then preempted by other apps (possibly for a long time, because of negative short-term debt, STD).

                Peter

                Krzychu P.

                Joined: Feb 16 06
                Posts: 19
                ID: 114
                Credit: 10,236
                RAC: 0
                Message 2448 - Posted 3 Nov 2006 7:57:11 UTC


                  2006-11-03 09:26:26|ralph@home|Unrecoverable error for result 1mkyA_BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS__1443_33_0 (Incorrect function. (0x1) - exit code 1 (0x1))


                  <core_client_version>5.6.5</core_client_version>
                  <![CDATA[
                  <message>
                  Incorrect function. (0x1) - exit code 1 (0x1)
                  </message>
                  <stderr_txt>
                  # random seed: 2873154
                  # random seed: 2873154
                  # cpu_run_time_pref: 3600
                  ERROR:: Exit at: .\fullatom_energy.cc line:1969

                  </stderr_txt>
                  ____________

                  Pepo

                  Joined: Sep 8 06
                  Posts: 104
                  ID: 1812
                  Credit: 36,890
                  RAC: 0
                  Message 2449 - Posted 3 Nov 2006 10:12:32 UTC

                    Result 1dcj__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS__1444_25_0 had a predicted crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here.

                    I have not noticed any previous Ralph results (or results for other projects) being overpredicted like this. My benchmark values have not changed significantly in the last few days, and Ralph's DCF is 0.898411. The fpops_est is the same as on previous results, 40000000000000. Or have I overlooked something?

                    By the way, this is at least my second 5.36 WU in a row to stay preempted after reaching 100%.

                    Peter

                    Pepo

                    Joined: Sep 8 06
                    Posts: 104
                    ID: 1812
                    Credit: 36,890
                    RAC: 0
                    Message 2518 - Posted 9 Nov 2006 12:36:29 UTC - in response to Message 2449.

                      Result 1dcj__.....__1444_25_0 had a predicted crunch time of around 12:30 hours but finished after 1:19 hours, roughly the same as my other Ralph 5.36 results here. [...] Or have I overlooked something?

                      I'm sorry, it seems to be the same with all Ralph results; I probably missed it previously.

                      Peter

                      feet1st

                      Joined: Mar 7 06
                      Posts: 312
                      ID: 1028
                      Credit: 110,522
                      RAC: 0
                      Message 2521 - Posted 9 Nov 2006 16:48:49 UTC

                        Pepo:

                        The time remaining INCREASING as the WU runs is normal.

                        ...odd, but normal. The time remaining is really based on the completed % and the time taken so far... and the completed % only changes materially when a model is completed (which can sometimes take several hours).
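
A minimal sketch of the extrapolation described above, assuming the remaining time is a simple linear projection from elapsed time and fraction done. The function name and the exact formula are illustrative assumptions, not taken from the BOINC source.

```python
def estimate_remaining_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Linear extrapolation (an assumed formula): total = elapsed / fraction_done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    projected_total = elapsed_seconds / fraction_done
    return projected_total - elapsed_seconds


# While one long model is being built, fraction_done stays put (10% here),
# so the remaining-time estimate grows as elapsed time grows.
for elapsed_minutes in (30, 60, 90):
    remaining = estimate_remaining_seconds(elapsed_minutes * 60, 0.10)
    print(f"{elapsed_minutes} min elapsed -> {remaining / 60:.0f} min remaining")
```

Under this assumption the estimate only drops when a model completes and the completed percentage jumps.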
                        ____________

                        Pepo

                        Joined: Sep 8 06
                        Posts: 104
                        ID: 1812
                        Credit: 36,890
                        RAC: 0
                        Message 2523 - Posted 9 Nov 2006 20:31:08 UTC - in response to Message 2521.

                          The time remaining INCREASING as the WU runs is normal.

                          I understand the reasons for the remaining time increasing, but this was the initial prediction, made before the WU was run, while it was still in the "ready to run" state. I probably missed the predicted time on previous results.

                          Is it the same among other users?

                          Peter

                          feet1st

                          Joined: Mar 7 06
                          Posts: 312
                          ID: 1028
                          Credit: 110,522
                          RAC: 0
                          Message 2541 - Posted 17 Nov 2006 19:14:36 UTC

                            The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate, but BOINC doesn't recognize that the project has such a target. So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours, BOINC isn't aware of that yet and still predicts 12 hrs. Once you crunch and report 2 hr WUs, BOINC should adjust its initial estimates.
                            ____________

                            Pepo

                            Joined: Sep 8 06
                            Posts: 104
                            ID: 1812
                            Credit: 36,890
                            RAC: 0
                            Message 2545 - Posted 18 Nov 2006 21:19:18 UTC - in response to Message 2541.

                              The initial prediction (unfortunately) is based on the last WU you crunched. I say "unfortunately" because it's pretty CLEAR that the project's target runtime would be a more reliable estimate, but BOINC doesn't recognize that the project has such a target.

                              Until about 8 Nov I used the default target time. Then (after noticing I could change it :-) I set it to 2 hours.
                              So, if your last Ralph WUs ran 12 hrs, and then you ran out of work and changed your runtime preference to 1 or 2 hours, BOINC isn't aware of that yet and still predicts 12 hrs. Once you crunch and report 2 hr WUs, BOINC should adjust its initial estimates.

                              Between attaching the host to Ralph and the moment I noticed the long prediction, it crunched 4-5 WUs (with the default target time), with crunch times ranging from 40 to 80 minutes (probably depending on the CPU frequency).

                              I'd rather believe the prediction will stabilize with time and depends only on the fixed fpops_est value (40000000000000), the usually fixed target time, the computer speed and the calculated DCF. If I'm right, my DCF should eventually stabilize somewhere around 0.15-0.20?

                              Peter
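
The relationship Peter describes can be sketched as follows. This assumes the client's initial prediction is roughly fpops_est divided by the host's benchmarked floating-point speed, multiplied by the DCF; that formula is an assumption about the client of the time, and the function names are hypothetical. The input numbers (fpops_est, DCF 0.898411, 12:30 predicted, 1:19 actual, 2 hour target) are the ones quoted in this thread.

```python
FPOPS_EST = 40_000_000_000_000  # fpops_est quoted in the thread
DCF_AT_PREDICTION = 0.898411    # Ralph DCF quoted in the thread


def predicted_seconds(fpops_est: float, host_flops: float, dcf: float) -> float:
    """Assumed prediction formula: estimated work / benchmark speed * DCF."""
    return fpops_est / host_flops * dcf


def implied_host_flops(fpops_est: float, dcf: float, predicted_s: float) -> float:
    """Benchmark speed that would produce the given prediction."""
    return fpops_est * dcf / predicted_s


def dcf_matching_runtime(current_dcf: float, predicted_s: float, actual_s: float) -> float:
    """DCF at which the prediction would equal the actual runtime."""
    return current_dcf * actual_s / predicted_s


predicted = 12.5 * 3600  # the 12:30 hour initial prediction
flops = implied_host_flops(FPOPS_EST, DCF_AT_PREDICTION, predicted)
dcf_for_2h_target = dcf_matching_runtime(DCF_AT_PREDICTION, predicted, 2 * 3600)
dcf_for_observed = dcf_matching_runtime(DCF_AT_PREDICTION, predicted, 79 * 60)

print(f"implied benchmark speed: {flops / 1e9:.2f} GFLOPS")
print(f"DCF matching a 2 h target: {dcf_for_2h_target:.3f}")
print(f"DCF matching the 1:19 result: {dcf_for_observed:.3f}")
```

If the assumed formula holds, a 2 hour target gives a DCF of roughly 0.14, close to the 0.15-0.20 range guessed above, while the observed 1:19 runtime gives roughly 0.09.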

                              feet1st

                              Joined: Mar 7 06
                              Posts: 312
                              ID: 1028
                              Credit: 110,522
                              RAC: 0
                              Message 2547 - Posted 20 Nov 2006 20:34:37 UTC

                                Well, that's the thing that BOINC doesn't understand about how Rosetta (and Ralph) work. They actually crunch as best they can to meet your runtime objective. So, regardless of the GHz of your machine, it will produce more models to fill the target runtime. This is why credit is granted on models crunched, rather than the BOINC claimed credit.

                                What I'm trying to say is that regardless of the speed of your machine, a 2hr runtime preference is still 2 hours. But with a faster machine, you can crunch more models in 2hrs. And because you can crunch models faster, you are less likely to dramatically exceed your preference just to complete the first model.

                                Some of the larger WUs can take several hours to crunch one model. And one model is the minimum you can report with. On slow machines this may be more like 6-8 hours for a single model. This is regardless of the runtime preference, because the minimum you can report is one completed model... and this is perhaps the type of thing that may have thrown your estimates off.
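
The behaviour described above can be sketched as a simple loop. This is only an illustration under stated assumptions: it supposes every model takes the same time and that another model is started only if it is expected to fit in the target runtime. The function name and the stopping rule are hypothetical, not Rosetta's actual code.

```python
from typing import Tuple


def models_completed(target_runtime_s: float, seconds_per_model: float) -> Tuple[int, float]:
    """Return (models, total_runtime_s) for a hypothetical fill-the-target loop."""
    if seconds_per_model <= 0:
        raise ValueError("seconds_per_model must be positive")
    models = 0
    elapsed = 0.0
    # At least one model is always finished; after that, another model is
    # started only if it is expected to fit within the target runtime.
    while models == 0 or elapsed + seconds_per_model <= target_runtime_s:
        elapsed += seconds_per_model
        models += 1
    return models, elapsed


two_hours = 2 * 3600
fast = models_completed(two_hours, 20 * 60)   # fast host: 20 min per model
slow = models_completed(two_hours, 7 * 3600)  # slow host: 7 h per model
print(f"fast host: {fast[0]} models in {fast[1] / 3600:.1f} h")
print(f"slow host: {slow[0]} model in {slow[1] / 3600:.1f} h")
```

In this sketch the fast host fills the 2 hour preference with several models, while the slow host overruns it because one completed model is the minimum it can report.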
                                ____________


                                Copyright © 2017 University of Washington

                                Last Modified: 20 Nov 2008 19:41:56 UTC