RALPH@home

Bug reports for Ralph 5.02

Rhiju
Forum moderator
Project developer
Project scientist

Joined: Feb 14 06
Posts: 161
ID: 4
Credit: 3,725
RAC: 0
Message 1289 - Posted 22 Apr 2006 5:20:20 UTC

    We're testing some new features (see news on main page).
    Please pay special attention to jobs that appear stuck or appear to be taking too long! We're hoping a new watchdog thread will catch them.
    ____________

    Nikolay A. Saharov

    Joined: Feb 17 06
    Posts: 6
    ID: 380
    Credit: 15,472
    RAC: 3
    Message 1293 - Posted 22 Apr 2006 7:15:38 UTC

      Last modified: 22 Apr 2006 7:27:13 UTC

      I have 3 errored results:
      1. 92417 and 92455 finished with the message


      <message>Incorrect function. (0x1) - exit code 1 (0x1)
      </message>
      <stderr_txt>
      ERROR:: Exit at: .\\fragments.cc line:687

      </stderr_txt>

      2. 91907 finished with the text:

      <stderr_txt>
      # random seed: 3886793
      # cpu_run_time_pref: 7200
      **********************************************************************
      Rosetta score stayed the same too long. Watchdog is killing the run!
      **********************************************************************

      </stderr_txt>
      <message><file_xfer_error>
      <file_name>FACONTACTS_RECENTER_NOFILTERS_1dhn__399_6_0_0</file_name>
      <error_code>-161</error_code>
      </file_xfer_error>

      </message>

      ____________

      Yeti

      Joined: Feb 19 06
      Posts: 30
      ID: 581
      Credit: 49,557
      RAC: 0
      Message 1294 - Posted 22 Apr 2006 8:09:35 UTC

        So far, all 5.02-WUs have crashed:

        http://ralph.bakerlab.org/result.php?resultid=91805

        http://ralph.bakerlab.org/result.php?resultid=91808

        http://ralph.bakerlab.org/result.php?resultid=91965

        Several more in progress; let's see what's going on.

        ____________


        Supporting BOINC, a great concept !

        Yeti

        Joined: Feb 19 06
        Posts: 30
        ID: 581
        Credit: 49,557
        RAC: 0
        Message 1295 - Posted 22 Apr 2006 11:21:30 UTC

          Here is one with a large crash-dump:

          http://ralph.bakerlab.org/result.php?resultid=92882

          ____________


          Supporting BOINC, a great concept !

          tralala

          Joined: Apr 12 06
          Posts: 52
          ID: 1266
          Credit: 15,257
          RAC: 0
          Message 1296 - Posted 22 Apr 2006 11:34:06 UTC

            Six out of seven have crashed:

            http://ralph.bakerlab.org/results.php?userid=1266

            Although I have 5.4.3 installed, I didn't get a large crash dump.
            ____________

            Pieface

            Joined: Feb 16 06
            Posts: 64
            ID: 234
            Credit: 203,513
            RAC: 0
            Message 1297 - Posted 22 Apr 2006 13:40:18 UTC

              Had one die this morning with 0xc0000005, result: resultid

              Looks like the old died-while-swapping problem.

              4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 using rosetta_beta version 502
              4/22/2006 5:49:41 AM|ralph@home|Restarting task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 using rosetta_beta version 502
              4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.124_1 (removed from memory)
              4/22/2006 5:49:41 AM|SETI@home Beta Test|Pausing task 01jn01aa.27448.448.572166.3.132_3 (removed from memory)
              4/22/2006 6:49:41 AM|ralph@home|Pausing task FACONTACTS_RECENTER_NOFILTERS_1ew4A_399_2_0 (removed from memory)
              4/22/2006 6:49:41 AM|SETI@home Beta Test|Restarting task 01jn01aa.27448.448.572166.3.124_1 using setiathome_enhanced version 511
              4/22/2006 6:49:43 AM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 ( - exit code -1073741819 (0xc0000005))
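
The exit code in the last log line is one value written two ways: -1073741819 is 0xc0000005 read as a signed 32-bit integer. On Windows, 0xC0000005 is the NTSTATUS code for an access violation. A minimal sketch of the conversion follows; the function names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a (possibly negative) 32-bit exit code as unsigned."""
    return code & 0xFFFFFFFF


def to_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


# The two forms shown in the log line above.
print(hex(to_unsigned32(-1073741819)))
print(to_signed32(0xC0000005))
```

This only shows that the decimal and hex forms in the log agree; it says nothing about what caused the crash.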

              Pieface

              Joined: Feb 16 06
              Posts: 64
              ID: 234
              Credit: 203,513
              RAC: 0
              Message 1299 - Posted 22 Apr 2006 15:47:34 UTC

                Oops, my bad: that link points to the one from yesterday. The one from this morning, which the log entries go with, is here: 91953

                Snake Doctor

                Joined: Feb 16 06
                Posts: 37
                ID: 44
                Credit: 996,938
                RAC: 0
                Message 1300 - Posted 22 Apr 2006 16:00:19 UTC

                  Just got this one here.

                  This was on a MAC Dual G4 running MAC OS 10.4.6, BOINC 5.3.28

                  WU - NO_CHECK_7486h002_dec123_1.pdb_407_19_0

                  Looks like a file problem from this error message -

                  <message><file_xfer_error>
                  <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_19_0_0</file_name>
                  <error_code>-161</error_code>
                  <error_message></error_message>
                  </file_xfer_error>

                  </message>
                  ____________

                  [B^S] Doug Worrall

                  Joined: Feb 16 06
                  Posts: 10
                  ID: 220
                  Credit: 1,515
                  RAC: 0
                  Message 1301 - Posted 22 Apr 2006 16:45:48 UTC - in response to Message 1300.

                    Just got this one here.

                    This was on a MAC Dual G4 running MAC OS 10.4.6, BOINC 5.3.28

                    WU - NO_CHECK_7486h002_dec123_1.pdb_407_19_0

                    Looks like a file problem from this error message -

                    <message><file_xfer_error>
                    <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_19_0_0</file_name>
                    <error_code>-161</error_code>
                    <error_message></error_message>
                    </file_xfer_error>

                    </message>


                    Hello,
                    I thought all the w/u that were fubarred were because my PC crashed yesterday.
                    Of the last 8 w/u, only 1 worked well. I am still having difficulties tweaking this new system. The same w/u as above stuck at 1.47% at 1 hour; some I have let go 2 to 3 hours
                    before aborting. I just saw in the Tam thread that we should let the new feature handle these
                    w/u. Will edit this with the correct information.
                    Sincerely
                    Sluger



                    ____________

                    Daxl

                    Joined: Mar 1 06
                    Posts: 2
                    ID: 906
                    Credit: 55,301
                    RAC: 0
                    Message 1302 - Posted 22 Apr 2006 17:02:48 UTC

                      All 6 WUs have crashed on my laptop: P4-M 2.2 GHz, 512 MB memory (XP SP2)

                      WU-83346 Error -161
                      WU-83301 Error -161
                      WU-83275 Watchdog kill
                      WU-83276 Error -161
                      WU-83302 Error -161
                      WU-83282 Error -161

                      -----------------------------------------------------------------------------
                      <core_client_version>5.4.4</core_client_version>
                      <stderr_txt>
                      # random seed: 3885665
                      # cpu_run_time_pref: 3600
                      **********************************************************************
                      Rosetta score stayed the same too long. Watchdog is killing the run!
                      **********************************************************************

                      </stderr_txt>
                      <message><file_xfer_error>
                      <file_name>NO_CHECK_7486h002_dec124_1.pdb_407_9_0_0</file_name>
                      <error_code>-161</error_code>
                      </file_xfer_error>
                      </message>
                      ----------------------------------------------------------------------------

                      <core_client_version>5.4.4</core_client_version>
                      <stderr_txt>
                      # random seed: 3885638
                      # cpu_run_time_pref: 3600
                      # DONE :: 1 starting structures built 5 (nstruct) times
                      # This process generated 1 decoys from 1 attempts
                      # 0 starting pdbs were skipped

                      </stderr_txt>
                      <message><file_xfer_error>
                      <file_name>NO_CHECK_7486h002_dec129_1.pdb_407_16_1_0</file_name>
                      <error_code>-161</error_code>
                      </file_xfer_error>
                      </message>
                      -----------------------------------------------------------------------------

                      greetz DAXL

                      Daxl

                      Joined: Mar 1 06
                      Posts: 2
                      ID: 906
                      Credit: 55,301
                      RAC: 0
                      Message 1303 - Posted 22 Apr 2006 17:22:58 UTC

                        6 out of 12 WUs have crashed - 6 aborted
                        On my Athlon 64-3000 1GB Memory (XP SP2)

                        WU-83315 Error -161
                        WU-83216 Error -161
                        WU-83217 Error -161
                        WU-83218 Error -161
                        WU-83219 Error -161
                        WU-83222 Error -161

                        ---------------------------------------------------------------------

                        <core_client_version>5.4.4</core_client_version>
                        <stderr_txt>
                        # random seed: 3885631
                        # cpu_run_time_pref: 3600
                        # DONE :: 1 starting structures built 5 (nstruct) times
                        # This process generated 3 decoys from 3 attempts
                        # 0 starting pdbs were skipped

                        </stderr_txt>
                        <message><file_xfer_error>
                        <file_name>NO_CHECK_7486h002_dec184_1.pdb_407_3_0_0</file_name>
                        <error_code>-161</error_code>
                        </file_xfer_error>
                        </message>
                        ---------------------------------------------------------------------
                        greetz DAXL

                        [B^S] suguruhirahara

                        Joined: Mar 5 06
                        Posts: 40
                        ID: 992
                        Credit: 6,001
                        RAC: 0
                        Message 1304 - Posted 22 Apr 2006 17:33:31 UTC

                          on winxp 64bit

                          http://ralph.bakerlab.org/result.php?resultid=91614

                          <core_client_version>5.2.13</core_client_version>
                          <stderr_txt>
                          # random seed: 3886628
                          # cpu_run_time_pref: 3600
                          **********************************************************************
                          Rosetta score stayed the same too long. Watchdog is killing the run!
                          **********************************************************************

                          </stderr_txt>
                          <message><file_xfer_error>
                          <file_name>FACONTACTS_RECENTER_NOFILTERS_1pgx__399_1_0_0</file_name>
                          <error_code>-161</error_code>
                          <error_message></error_message>
                          </file_xfer_error>

                          </message>

                          Anyway, what is the watchdog?
                          ____________

                          Psycodad

                          Joined: Feb 16 06
                          Posts: 14
                          ID: 202
                          Credit: 295
                          RAC: 0
                          Message 1305 - Posted 22 Apr 2006 17:38:08 UTC

                            Last modified: 22 Apr 2006 17:38:48 UTC

                            22.04.2006 17:50:28|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_8_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)


                            WU
                            Result





                            <core_client_version>5.2.13</core_client_version>
                            <stderr_txt>
                            # random seed: 3885686
                            # cpu_run_time_pref: 3600
                            **********************************************************************
                            Rosetta score stayed the same too long. Watchdog is killing the run!
                            **********************************************************************

                            </stderr_txt>
                            <message><file_xfer_error>
                            <file_name>NO_CHECK_7486h002_dec123_1.pdb_407_8_1_0</file_name>
                            <error_code>-161</error_code>
                            <error_message></error_message>
                            </file_xfer_error>

                            </message>
                            ____________

                            casio7131

                            Joined: Mar 20 06
                            Posts: 15
                            ID: 1151
                            Credit: 12,660
                            RAC: 0
                            Message 1307 - Posted 23 Apr 2006 2:15:07 UTC

                              8 results where the watchdog killed the run. I think it might be killing them a bit too early, because this machine doesn't usually get stuck or error out very often.

                              22/04/2006 6:41:39 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ail__399_7_0 (Incorrect function. (0x1) - exit code 1 (0x1))
                              http://ralph.bakerlab.org/result.php?resultid=91955
                              22/04/2006 6:43:50 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                              http://ralph.bakerlab.org/result.php?resultid=92014
                              22/04/2006 11:15:56 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1ubi__399_8_0 (Incorrect function. (0x1) - exit code 1 (0x1))
                              http://ralph.bakerlab.org/result.php?resultid=92058
                              22/04/2006 11:15:59 PM|ralph@home|Unrecoverable error for result FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0 (<file_xfer_error> <file_name>FACONTACTS_RECENTER_NOFILTERS_1who__399_6_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                              http://ralph.bakerlab.org/result.php?resultid=91941
                              23/04/2006 2:52:01 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec184_1.pdb_406_9_2 (Incorrect function. (0x1) - exit code 1 (0x1))
                              http://ralph.bakerlab.org/result.php?resultid=93253
                              23/04/2006 2:52:06 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec124_1.pdb_407_3_0 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec124_1.pdb_407_3_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                              http://ralph.bakerlab.org/result.php?resultid=92833
                              23/04/2006 7:35:36 AM|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec123_1.pdb_407_12_1 (Incorrect function. (0x1) - exit code 1 (0x1))
                              http://ralph.bakerlab.org/result.php?resultid=93254
                              23/04/2006 7:35:42 AM|ralph@home|Unrecoverable error for result HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1 (<file_xfer_error> <file_name>HOMO_7486_h002_1_LOOPRLX_7486h002_dec08_1.pdb_406_15_1_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                              http://ralph.bakerlab.org/result.php?resultid=93255
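
For anyone sorting through a long message log like the one above, here is a rough sketch of how the "Unrecoverable error" lines could be tallied by kind. It assumes the `date time|project|message` layout seen in the logs posted in this thread; `classify` and `tally` are hypothetical helper names, not BOINC tools, and the sample lines are copied from posts above.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Patterns for the two error shapes that appear in this thread.
XFER_RE = re.compile(r"<error_code>(-?\d+)</error_code>")
EXIT_RE = re.compile(r"exit code (-?\d+)")


def classify(line: str) -> Optional[str]:
    """Return a short label for an 'Unrecoverable error' line, else None."""
    parts = line.split("|", 2)  # timestamp | project | message
    if len(parts) != 3 or "Unrecoverable error" not in parts[2]:
        return None
    message = parts[2]
    match = XFER_RE.search(message)
    if match:
        return f"file_xfer_error {match.group(1)}"
    match = EXIT_RE.search(message)
    if match:
        return f"exit code {match.group(1)}"
    return "other"


def tally(lines: Iterable[str]) -> Counter:
    """Count error labels, ignoring lines that are not errors."""
    return Counter(label for label in map(classify, lines) if label is not None)


SAMPLE = [
    "22/04/2006 6:41:39 PM|ralph@home|Unrecoverable error for result "
    "FACONTACTS_RECENTER_NOFILTERS_1ail__399_7_0 "
    "(Incorrect function. (0x1) - exit code 1 (0x1))",
    "22/04/2006 6:43:50 PM|ralph@home|Unrecoverable error for result "
    "FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0 (<file_xfer_error> "
    "<file_name>FACONTACTS_RECENTER_NOFILTERS_1a32__399_8_0_0</file_name> "
    "<error_code>-161</error_code></file_xfer_error>)",
    "4/22/2006 5:49:41 AM|ralph@home|Restarting task "
    "FACONTACTS_RECENTER_NOFILTERS_1a68__399_7_0 using rosetta_beta version 502",
]

for label, count in tally(SAMPLE).items():
    print(label, count)
```

The labels only group lines by the codes they contain; they do not interpret what the codes mean.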

                              ____________

                              Rhiju
                              Forum moderator
                              Project developer
                              Project scientist

                              Joined: Feb 14 06
                              Posts: 161
                              ID: 4
                              Credit: 3,725
                              RAC: 0
                              Message 1308 - Posted 23 Apr 2006 3:10:04 UTC - in response to Message 1307.

                                Thanks for the posts. We think we've tracked down the
                                two most common errors. The watchdog does seem to be
                                a little too aggressive... we'll see how things
                                go for ralph 5.03!


                                ____________

                                Rhiju
                                Forum moderator
                                Project developer
                                Project scientist

                                Joined: Feb 14 06
                                Posts: 161
                                ID: 4
                                Credit: 3,725
                                RAC: 0
                                Message 1312 - Posted 23 Apr 2006 8:33:15 UTC - in response to Message 1308.

                                  Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?

                                  ____________

                                  Leffe

                                  Joined: Feb 19 06
                                  Posts: 10
                                  ID: 596
                                  Credit: 3,683
                                  RAC: 0
                                  Message 1315 - Posted 23 Apr 2006 10:31:21 UTC

                                    win xp pro sp2
                                    boinc 5.2.13
                                    Ralph 5.02


                                    23/04/2006 12:50:50|ralph@home|Unrecoverable error for result NO_CHECK_7486h002_dec08_1.pdb_407_3_1 (<file_xfer_error> <file_name>NO_CHECK_7486h002_dec08_1.pdb_407_3_1_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)

                                    ____________

                                    Robert Everly

                                    Joined: Feb 16 06
                                    Posts: 10
                                    ID: 276
                                    Credit: 2,333
                                    RAC: 0
                                    Message 1317 - Posted 23 Apr 2006 13:26:36 UTC

                                      Last modified: 23 Apr 2006 13:29:27 UTC

                                      All three of my 5.02 WUs were killed by the watchdog thread.

                                      resultid=91985
                                      resultid=91973
                                      resultid=91972

                                      I still have my settings to leave the app in memory when switching. Is it possible that the watchdog thread is taking that time into consideration? I have my systems set to switch projects every hour. All of mine aborted very very close to the one hour mark.
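
To make the question above concrete, here is a toy sketch of a "score stayed the same too long" check. This is not the Rosetta watchdog; `StallWatchdog`, the 1800-second timeout, and the scores are all made up for illustration. It only shows why the choice of clock matters: if the check counts time the task spends suspended in memory, a one-hour project switch alone can exceed the timeout, while a clock that only advances during computation does not.

```python
from typing import Callable, Optional


class StallWatchdog:
    """Flags a run whose score has not changed for longer than timeout_s."""

    def __init__(self, timeout_s: float, clock: Callable[[], float]):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_score: Optional[float] = None
        self.last_change = clock()

    def check(self, score: float) -> bool:
        """Record the score; return True if it has been stuck too long."""
        now = self.clock()
        if score != self.last_score:
            self.last_score = score
            self.last_change = now
            return False
        return now - self.last_change > self.timeout_s


def simulate(count_suspended_time: bool) -> bool:
    """Ten minutes of work, then a one-hour suspension, then one check."""
    t = {"now": 0.0}  # fake clock so the example runs instantly
    dog = StallWatchdog(timeout_s=1800, clock=lambda: t["now"])
    dog.check(-12.5)   # first score
    t["now"] += 600    # ten minutes of real computation
    dog.check(-13.0)   # score improved, timer resets
    if count_suspended_time:
        t["now"] += 3600  # the clock keeps running while suspended
    return dog.check(-13.0)


print("clock counts suspended time:", simulate(True))
print("clock ignores suspended time:", simulate(False))
```

In real Python code the two behaviours would roughly correspond to passing `time.monotonic` (keeps running while the process is idle) or `time.process_time` (only advances while the process uses CPU) as the clock.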
                                      ____________

                                      paul and kirsty yates

                                      Joined: Feb 16 06
                                      Posts: 11
                                      ID: 310
                                      Credit: 949
                                      RAC: 0
                                      Message 1318 - Posted 23 Apr 2006 13:51:24 UTC

                                        Last modified: 23 Apr 2006 13:53:42 UTC

                                        i also got a watchdog killing :(

                                        on this one this one


                                        ____________

                                        anders n

                                        Joined: Feb 16 06
                                        Posts: 166
                                        ID: 91
                                        Credit: 131,419
                                        RAC: 0
                                        Message 1321 - Posted 23 Apr 2006 16:34:10 UTC

                                          The dog is barking bad :)

                                          http://ralph.bakerlab.org/results.php?hostid=2049

                                          Anders n
                                          ____________

                                          Moderator9
                                          Forum moderator

                                          Joined: Feb 16 06
                                          Posts: 251
                                          ID: 210
                                          Credit: 0
                                          RAC: 0
                                          Message 1323 - Posted 23 Apr 2006 17:27:43 UTC - in response to Message 1312.

                                            Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?

                                            "Rhiju"
                                            See this post

                                            ____________
                                            Moderator9
                                            RALPH@home FAQs
                                            RALPH@home Guidelines
                                            Moderator Contact

                                            anders n

                                            Joined: Feb 16 06
                                            Posts: 166
                                            ID: 91
                                            Credit: 131,419
                                            RAC: 0
                                            Message 1332 - Posted 24 Apr 2006 3:56:37 UTC - in response to Message 1321.

                                              The dog is barking bad :)

                                              http://ralph.bakerlab.org/results.php?hostid=2049

                                              Anders n



                                              OK. I set the crunching time to 2 h and the dog shut up.

                                              This suggests it has something to do with switching tasks.

                                              Anders n
                                              ____________


                                              Copyright © 2017 University of Washington

                                              Last Modified: 20 Nov 2008 19:41:56 UTC