RALPH@home

Bug reports for Ralph 5.03

Rhiju
Forum moderator
Project developer
Project scientist

Joined: Feb 14 06
Posts: 161
ID: 4
Credit: 3,725
RAC: 0
Message 1309 - Posted 23 Apr 2006 3:18:39 UTC

    We've tried to make the watchdog a little less aggressive about aborting, and are having it give us back the reason for aborting. Let us know if you think these jobs are getting killed too soon or too late. Thanks!
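
For readers following the thread, here is a minimal sketch of the kind of stall check a watchdog like this might perform: it flags a job whose reported progress has not changed for a set time and returns a human-readable reason. This is not the actual RALPH/Rosetta watchdog code. The names `StallWatchdog`, `WatchdogVerdict`, and `stall_limit_s` are hypothetical, and the one-hour limit in the usage note is only an illustrative value.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WatchdogVerdict:
    """Result of one watchdog check (hypothetical type for illustration)."""
    abort: bool
    reason: Optional[str] = None


class StallWatchdog:
    """Hypothetical watchdog: flags a job whose progress has stopped changing."""

    def __init__(self, stall_limit_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.stall_limit_s = stall_limit_s  # assumed tunable, not a real setting
        self.clock = clock
        self.last_progress: Optional[float] = None
        self.last_change = clock()

    def check(self, progress: float) -> WatchdogVerdict:
        """Call periodically with the job's current percent complete."""
        now = self.clock()
        if self.last_progress is None or progress != self.last_progress:
            # Progress moved (or this is the first sample): reset the timer.
            self.last_progress = progress
            self.last_change = now
            return WatchdogVerdict(abort=False)
        stalled_for = now - self.last_change
        if stalled_for >= self.stall_limit_s:
            reason = (f"no progress for {stalled_for:.0f} s "
                      f"(stuck at {progress:.4f}%)")
            return WatchdogVerdict(abort=True, reason=reason)
        return WatchdogVerdict(abort=False)
```

A caller would poll `check()` on a timer, for example `StallWatchdog(stall_limit_s=3600)`, and abort the job with the returned reason once `abort` is true. Raising or lowering the limit is the "too soon or too late" trade-off asked about above.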
    ____________

    Rhiju
    Forum moderator
    Project developer
    Project scientist

    Joined: Feb 14 06
    Posts: 161
    ID: 4
    Credit: 3,725
    RAC: 0
    Message 1313 - Posted 23 Apr 2006 8:33:48 UTC - in response to Message 1309.

      Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?

      We've tried to make the watchdog a little less aggressive about aborting, and are having it give us back the reason for aborting. Let us know if you think these jobs are getting killed too soon or too late. Thanks!


      ____________

      tralala

      Joined: Apr 12 06
      Posts: 52
      ID: 1266
      Credit: 15,257
      RAC: 0
      Message 1319 - Posted 23 Apr 2006 14:01:46 UTC - in response to Message 1309.

        This WU was aborted by the watchdog on another machine but finished OK on my machine:

        http://ralph.bakerlab.org/workunit.php?wuid=82603

        Do you still receive the finished models if the watchdog kills a WU which gets stuck on model x?
        ____________

        Snake Doctor

        Joined: Feb 16 06
        Posts: 37
        ID: 44
        Credit: 996,938
        RAC: 0
        Message 1320 - Posted 23 Apr 2006 15:50:18 UTC - in response to Message 1313.

          Last modified: 23 Apr 2006 15:56:34 UTC

          Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?

          We've tried to make the watchdog a little less aggressive about aborting, and are having it give us back the reason for aborting. Let us know if you think these jobs are getting killed too soon or too late. Thanks!



          I am running on Macs. I have a dual G4 that has a 5.03 job, and a G4 laptop that was running two 5.01 jobs. One of the jobs on the laptop hung at 1.4295% for 12 hours. I restarted BOINC and it errored, but I can't get it to report (it is still stuck on my Tasks tab). I have a second one on that machine that I aborted, and it is stuck in the Tasks tab too.

          This may be a BOINC thing. I had upgraded to BOINC 5.4.4 per instructions from Rom for error checking. It looked OK, but it is really not running right.

          Anyway, the 5.03 WU on the G4 seems to be running fine. I changed the run time setting for it last night to 4 hours, and set the system to "remove apps from memory" to bang on it a while. It is at about 90% after 3:58 CPU time. I looked at the graphics last night and they seemed to be fine.

          EDIT/UPDATE - The two WU stuck in my task tab finally reported

          here is the one that was stuck for 12 hours.
          here is the one I aborted manually.

          Regards
          Phil
          ____________

          Divide Overflow

          Joined: Feb 15 06
          Posts: 12
          ID: 6
          Credit: 128,027
          RAC: 0
          Message 1324 - Posted 23 Apr 2006 17:34:06 UTC

            Last modified: 23 Apr 2006 17:36:04 UTC

            The watchdog seems to be more of a junkyard dog. ;) It killed off two of my v5.02 WUs that seemed to be running just fine:

            93821
            93726
            ____________

            Snake Doctor

            Joined: Feb 16 06
            Posts: 37
            ID: 44
            Credit: 996,938
            RAC: 0
            Message 1325 - Posted 23 Apr 2006 17:36:09 UTC

              (sorry for the second post, Darned 1 hour edit limit)

              Well, my Mac G4 reported in this result for the only 5.03 WU I have had.

              It looks very normal to me.

              I do know the graphics were working (in fact they seemed faster somehow). I had no problems that I am aware of.

              Regards
              Phil
              ____________

              Fuzzy Hollynoodles

              Joined: Feb 19 06
              Posts: 37
              ID: 585
              Credit: 2,089
              RAC: 0
              Message 1326 - Posted 23 Apr 2006 18:17:09 UTC

                Last modified: 23 Apr 2006 18:22:02 UTC

                I had this one: http://ralph.bakerlab.org/workunit.php?wuid=83796

                Result: http://ralph.bakerlab.org/result.php?resultid=94327

                I looked at the graphics when I saw it running, and it was stuck at about 1% without any movement at all. So I suppose the watchdog did its job by killing it after some time.

                It ran about 80 minutes on my computer. It could have been killed a little sooner, I think, as it was totally dead when I looked after about 45 minutes. I see that for the others who ran it, it was killed after less time.



                ____________

                "I'm trying to maintain a shred of dignity in this world." - Me

                MatthewBChambers

                Joined: Mar 13 06
                Posts: 4
                ID: 1102
                Credit: 5,367
                RAC: 0
                Message 1328 - Posted 23 Apr 2006 21:37:13 UTC

                  Host ID:
                  http://ralph.bakerlab.org/show_host_detail.php?hostid=2404

                  Result ID:
                  http://ralph.bakerlab.org/result.php?resultid=94285



                  Here is my 5.03 bug (in Windows XP, full details to follow):

                  4/23/2006 12:59:18 PM|ralph@home|Unrecoverable error for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 (<file_xfer_error> <file_name>NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)


                  Here is the context:
                  4/23/2006 12:36:34 PM|ralph@home|Resuming result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 using rosetta_beta version 503
                  4/23/2006 12:36:34 PM|boincsimap|Pausing result 60420100.007375_0 (left in memory)
                  4/23/2006 12:59:15 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
                  4/23/2006 12:59:15 PM|ralph@home|Reason: To fetch work
                  4/23/2006 12:59:15 PM|ralph@home|Requesting 43200 seconds of new work
                  4/23/2006 12:59:17 PM||request_reschedule_cpus: process exited
                  4/23/2006 12:59:17 PM|ralph@home|Computation for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 finished
                  4/23/2006 12:59:17 PM|Predictor @ Home|Resuming result abeta_7_135392_2 using mfoldB125 version 428
                  4/23/2006 12:59:18 PM|ralph@home|Unrecoverable error for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 (<file_xfer_error> <file_name>NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
                  4/23/2006 12:59:20 PM|ralph@home|Scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi succeeded
                  4/23/2006 12:59:21 PM|ralph@home|No work from project
                  4/23/2006 1:03:26 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
                  4/23/2006 1:03:26 PM|ralph@home|Reason: To fetch work
                  4/23/2006 1:03:26 PM|ralph@home|Requesting 43200 seconds of new work, and reporting 1 results
                  4/23/2006 1:03:31 PM|ralph@home|Scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi succeeded
                  4/23/2006 1:03:31 PM|ralph@home|No work from project


                  Here is the startup info for the computer:
                  4/15/2006 8:22:55 PM||Starting BOINC client version 5.2.13 for windows_intelx86
                  4/15/2006 8:22:55 PM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
                  4/15/2006 8:22:55 PM||Data directory: C:\Program Files\BOINC
                  4/15/2006 8:22:56 PM||Processor: 1 GenuineIntel x86 Family 6 Model 8 Stepping 6 863MHz
                  4/15/2006 8:22:56 PM||Memory: 383.30 MB physical, 1.29 GB virtual
                  4/15/2006 8:22:56 PM||Disk: 24.41 GB total, 19.33 GB free
                  4/15/2006 8:22:56 PM|rosetta@home|Computer ID: 197494; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|boincsimap|Computer ID: 17955; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|Einstein@Home|Computer ID: 594228; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|LHC@home|Computer ID: 142531; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|Predictor @ Home|Computer ID: 237773; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|SETI@home|Computer ID: 2330542; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|SZTAKI Desktop Grid|Computer ID: 17392; location: home; project prefs: default
                  4/15/2006 8:22:56 PM|World Community Grid|Computer ID: 31989; location: ; project prefs: default
                  4/15/2006 8:22:56 PM||General prefs: from ralph@home (last modified 2006-04-15 20:06:57)
                  4/15/2006 8:22:56 PM||General prefs: no separate prefs for home; using your defaults
                  4/15/2006 8:22:57 PM||Remote control not allowed; using loopback address
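
For anyone sifting through long message logs like the one above, here is a small sketch that pulls out the file transfer errors. It assumes only what the pasted lines show: each line is `timestamp|project|message`, and a failed transfer carries `<file_xfer_error>`, `<file_name>`, and `<error_code>` tags in the message. The helper names `parse_line` and `transfer_error` are hypothetical and are not part of BOINC.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    timestamp: str
    project: str   # empty for client-wide messages such as startup lines
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if malformed."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*parts)


# Patterns based only on the tags visible in the pasted log lines.
ERROR_CODE_RE = re.compile(r"<error_code>(-?\d+)</error_code>")
FILE_NAME_RE = re.compile(r"<file_name>([^<]+)</file_name>")


def transfer_error(entry: LogEntry) -> Optional[Tuple[str, int]]:
    """Return (file name, error code) if the entry reports a transfer error."""
    if "<file_xfer_error>" not in entry.message:
        return None
    code = ERROR_CODE_RE.search(entry.message)
    name = FILE_NAME_RE.search(entry.message)
    if code is None or name is None:
        return None
    return name.group(1), int(code.group(1))
```

Running every line of a saved log through `parse_line` and then `transfer_error` gives a quick list of which output files failed and with which code, which is easier to paste into a bug report than the full log.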

                  Fuzzy Hollynoodles

                  Joined: Feb 19 06
                  Posts: 37
                  ID: 585
                  Credit: 2,089
                  RAC: 0
                  Message 1330 - Posted 24 Apr 2006 2:30:39 UTC

                    My next one ran without erroring out, so I don't know if there was any watchdog or it was supposed to run normally.

                    http://ralph.bakerlab.org/workunit.php?wuid=83916

                    Result: http://ralph.bakerlab.org/result.php?resultid=94405


                    ____________

                    "I'm trying to maintain a shred of dignity in this world." - Me

                    casio7131

                    Joined: Mar 20 06
                    Posts: 15
                    ID: 1151
                    Credit: 12,660
                    RAC: 0
                    Message 1331 - Posted 24 Apr 2006 2:46:00 UTC

                      24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                      http://ralph.bakerlab.org/result.php?resultid=94190
                      ____________

                      anders n

                      Joined: Feb 16 06
                      Posts: 166
                      ID: 91
                      Credit: 131,419
                      RAC: 0
                      Message 1333 - Posted 24 Apr 2006 12:37:39 UTC

                        Last modified: 24 Apr 2006 13:03:56 UTC

                        A new error for me.

                        http://ralph.bakerlab.org/result.php?resultid=94349

                        This was error no 2 on this WU.

                        And another one

                        http://ralph.bakerlab.org/result.php?resultid=94350


                        Anders n

                        Edit no 2
                        ____________

                        Mike Gelvin

                        Joined: Feb 17 06
                        Posts: 50
                        ID: 468
                        Credit: 55,397
                        RAC: 0
                        Message 1334 - Posted 24 Apr 2006 13:28:59 UTC - in response to Message 1331.

                          24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                          http://ralph.bakerlab.org/result.php?resultid=94190



                          I thought credit was supposed to be granted on these?
                          ____________

                          Moderator9
                          Forum moderator

                          Joined: Feb 16 06
                          Posts: 251
                          ID: 210
                          Credit: 0
                          RAC: 0
                          Message 1335 - Posted 24 Apr 2006 14:31:01 UTC - in response to Message 1334.

                            24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                            http://ralph.bakerlab.org/result.php?resultid=94190



                            I thought credit was supposed to be granted on these?

                            The system IS supposed to award the claimed credits for "Watchdog"-terminated Work Units. But the ones I have seen so far have always had some model information reported back. Yours seems to have a -161 error, implying that something file-related is in play. Rhiju will have to explain why it did not get awarded.

                            As you may recall, in RALPH the credit will not be awarded after the fact, but we still need to know why it did not get awarded in the first place before this deploys to Rosetta.

                            ____________
                            Moderator9
                            RALPH@home FAQs
                            RALPH@home Guidelines
                            Moderator Contact

                            Yeti

                            Joined: Feb 19 06
                            Posts: 30
                            ID: 581
                            Credit: 49,557
                            RAC: 0
                            Message 1336 - Posted 24 Apr 2006 14:59:10 UTC

                              Hm, if you are looking for results that have been "killed" by the watchdog and did not get credits, look at:

                              http://ralph.bakerlab.org/results.php?userid=581

                              There you can see several results that have not been credited.


                              ____________


                              Supporting BOINC, a great concept !

                              Rhiju
                              Forum moderator
                              Project developer
                              Project scientist

                              Joined: Feb 14 06
                              Posts: 161
                              ID: 4
                              Credit: 3,725
                              RAC: 0
                              Message 1337 - Posted 24 Apr 2006 19:25:55 UTC - in response to Message 1335.

                                Thanks for the post; Moderator9 is right about what happened.
                                Sorry about the annoying file transfer error. I've fixed it, and we will be testing the fix on Ralph 5.04 later today.

                                24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
                                http://ralph.bakerlab.org/result.php?resultid=94190



                                I thought credit was supposed to be granted on these?

                                The system IS supposed to award the claimed credits for "Watchdog"-terminated Work Units. But the ones I have seen so far have always had some model information reported back. Yours seems to have a -161 error, implying that something file-related is in play. Rhiju will have to explain why it did not get awarded.

                                As you may recall, in RALPH the credit will not be awarded after the fact, but we still need to know why it did not get awarded in the first place before this deploys to Rosetta.


                                ____________

                                Rhiju
                                Forum moderator
                                Project developer
                                Project scientist

                                Joined: Feb 14 06
                                Posts: 161
                                ID: 4
                                Credit: 3,725
                                RAC: 0
                                Message 1339 - Posted 25 Apr 2006 1:52:53 UTC - in response to Message 1326.

                                  GREAT! I actually forced an infinite loop in that one. Very glad it was killed by the watchdog.

                                  I had this one: http://ralph.bakerlab.org/workunit.php?wuid=83796

                                  Result: http://ralph.bakerlab.org/result.php?resultid=94327

                                  I looked at the graphics when I saw it running, and it was stuck at about 1% without any movement at all. So I suppose the watchdog did its job by killing it after some time.

                                  It ran about 80 minutes on my computer. It could have been killed a little sooner, I think, as it was totally dead when I looked after about 45 minutes. I see that for the others who ran it, it was killed after less time.




                                  ____________


                                  Copyright © 2017 University of Washington

                                  Last Modified: 20 Nov 2008 19:41:56 UTC