RALPH@home

Bug reports for Ralph 5.21


Message boards : RALPH@home bug list : Bug reports for Ralph 5.21

Rhiju
Forum moderator
Project developer
Project scientist

Joined: Feb 14 06
Posts: 161
ID: 4
Credit: 3,725
RAC: 0
Message 1771 - Posted 5 Jun 2006 3:00:13 UTC

    In Ralph 5.20 we made some fixes to the graphics, and Rom put in some changes to clean up how the different threads (e.g. Rosetta itself, the graphics, the watchdog) shut down at the end of the run. We expect this to reduce errors!

    In Ralph 5.21 we fixed a small bug introduced in 5.20 where the application would hang for a while after Rosetta finished, waiting for the watchdog to check in (which could take up to an hour!).
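    The post does not say how Rosetta's watchdog is written, so what follows is only a hypothetical Python sketch of the general shutdown pattern. A watchdog that does a plain time.sleep() for its whole check-in interval and then polls a flag can make the main thread wait up to that full interval at exit, which is one way the hang described above can arise. Waiting on a threading.Event with a timeout instead lets the main thread wake the watchdog immediately. CHECK_INTERVAL_S, watchdog() and run_science() are illustrative names, not Rosetta code.

```python
import threading
import time

# Hypothetical check-in interval; the post only says it "could take up to an hour".
CHECK_INTERVAL_S = 3600.0

def watchdog(stop, log):
    """Hypothetical watchdog loop that can be woken early at shutdown."""
    # Event.wait() returns True as soon as `stop` is set and False on timeout,
    # so the thread never sleeps through a shutdown request.
    while not stop.wait(timeout=CHECK_INTERVAL_S):
        log.append("watchdog check")  # a real health check would go here
    log.append("watchdog exiting")

def run_science(stop):
    """Stand-in for the main computation; signals the watchdog when done."""
    time.sleep(0.1)
    stop.set()

def main():
    stop = threading.Event()
    log = []
    wd = threading.Thread(target=watchdog, args=(stop, log))
    start = time.monotonic()
    wd.start()
    run_science(stop)
    wd.join()  # returns almost immediately instead of after up to an hour
    return time.monotonic() - start, log

if __name__ == "__main__":
    elapsed, log = main()
    print(f"shut down after {elapsed:.2f} s: {log}")
```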
    ____________

    [B^S] sTrey

    Joined: Feb 15 06
    Posts: 58
    ID: 36
    Credit: 15,430
    RAC: 0
    Message 1774 - Posted 5 Jun 2006 17:04:10 UTC

      Last modified: 5 Jun 2006 17:06:25 UTC

      My one 5.21 WU ran before I got up, but from the log I'd say it finished without the delay. Could use some more WUs :) to check graphics etc.
      One thing: the watchdog timer bug is said to have been introduced with 5.20, but I definitely saw the delayed-finishing behavior in 5.19. Makes me wonder.

      doc :)

      Joined: Feb 16 06
      Posts: 46
      ID: 60
      Credit: 4,437
      RAC: 0
      Message 1777 - Posted 5 Jun 2006 22:36:20 UTC

        Three successes without the delay or any errors as far as I can tell (I was not there most of the time).

        And I agree, that bug was introduced before 5.20. I never had any 5.20 work, but I definitely remember having that bug with 5.19. Not that it matters much, as long as it is fixed :)

        Fuzzy Hollynoodles

        Joined: Feb 19 06
        Posts: 37
        ID: 585
        Credit: 2,089
        RAC: 0
        Message 1778 - Posted 6 Jun 2006 3:13:57 UTC

          One finished successfully:

          http://ralph.bakerlab.org/workunit.php?wuid=129796

          Result: http://ralph.bakerlab.org/result.php?resultid=151527

          It uploaded fine after finishing.

          But it was a huge protein! After 3 hours it was still on the first model. And it ran almost 4 hours, even though my Target CPU run time is set to 2 hours.



          ____________

          "I'm trying to maintain a shred of dignity in this world." - Me

          Fuzzy Hollynoodles

          Joined: Feb 19 06
          Posts: 37
          ID: 585
          Credit: 2,089
          RAC: 0
          Message 1779 - Posted 6 Jun 2006 7:38:30 UTC

            And the next finished fine:

            http://ralph.bakerlab.org/workunit.php?wuid=134423

            Result: http://ralph.bakerlab.org/result.php?resultid=152501

            No problems so far. :-)


            ____________

            "I'm trying to maintain a shred of dignity in this world." - Me

            Astro

            Joined: Feb 16 06
            Posts: 141
            ID: 48
            Credit: 32,977
            RAC: 0
            Message 1780 - Posted 6 Jun 2006 11:33:56 UTC

              Last modified: 6 Jun 2006 11:49:57 UTC

              So far so good on my Celeron 500, Win98SE, 256 MB RAM, screensaver disabled. These are all the results for that machine (note: bottom two are 5.20, top two are 5.21):

              Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
              151320 | 128850 | 5 Jun 2006 7:44:20 UTC | 6 Jun 2006 0:59:45 UTC | Over | Success | Done | 13,805.00 | 6.72 | 6.72
              150599 | 132927 | 5 Jun 2006 5:34:20 UTC | 5 Jun 2006 18:05:24 UTC | Over | Success | Done | 7,576.00 | 3.69 | 3.69
              149963 | 132447 | 4 Jun 2006 2:58:27 UTC | 4 Jun 2006 22:16:14 UTC | Over | Success | Done | 27,478.00 | 13.37 | 13.37
              149454 | 132032 | 3 Jun 2006 5:51:31 UTC | 3 Jun 2006 16:34:09 UTC | Over | Success | Done | 14,206.00 | 6.91 | 6.91

              Also good on my P4 1.8, WinXP, screensaver enabled (top is 5.21, rest are 5.19):
              150342 | 132671 | 5 Jun 2006 3:13:50 UTC | 6 Jun 2006 10:40:40 UTC | Over | Success | Done | 16,171.36 | 24.95 | 24.95
              150341 | 132670 | 5 Jun 2006 3:13:50 UTC | 5 Jun 2006 11:10:55 UTC | Over | Success | Done | 13,574.19 | 20.95 | 20.95
              144445 | 128098 | 1 Jun 2006 5:56:15 UTC | 4 Jun 2006 0:35:55 UTC | Over | Success | Done | 14,410.27 | 23.46 | 23.46
              144444 | 128097 | 1 Jun 2006 5:56:15 UTC | 2 Jun 2006 11:40:34 UTC | Over | Success | Done | 13,161.00 | 21.42 | 21.42
              144443 | 128096 | 1 Jun 2006 5:56:15 UTC | 2 Jun 2006 3:43:23 UTC | Over | Success | Done | 13,093.22 | 21.31 | 21.31
              143910 | 126962 | 1 Jun 2006 0:06:20 UTC | 2 Jun 2006 3:43:23 UTC | Over | Success | Done | 13,946.11 | 22.70 | 22.70

              Also good on my AMD64 3700 "Mobile" Socket 754 laptop, screensaver enabled (top two are 5.21, rest are 5.19):
              150593 | 132921 | 5 Jun 2006 5:16:58 UTC | 6 Jun 2006 10:52:52 UTC | Over | Success | Done | 13,880.64 | 51.21 | 51.21
              150592 | 132920 | 5 Jun 2006 5:16:58 UTC | 5 Jun 2006 13:43:50 UTC | Over | Success | Done | 14,694.41 | 54.21 | 54.21
              144382 | 128035 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 19:56:12 UTC | Over | Success | Done | 13,925.86 | 51.51 | 51.51
              144254 | 127907 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 11:40:12 UTC | Over | Success | Done | 13,994.19 | 51.77 | 51.77
              144253 | 127906 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 3:45:43 UTC | Over | Success | Done | 14,501.66 | 53.64 | 53.64

              Also good on my AMD64 3700 San Diego, although I have the screensaver disabled on this one. Would you like me to re-enable the screensaver to see if I get fatal Windows errors? (top is 5.21, next is 5.20, rest are 5.19)
              151133 | 133451 | 5 Jun 2006 7:04:09 UTC | 5 Jun 2006 22:17:50 UTC | Over | Success | Done | 13,768.64 | 54.29 | 54.29
              150091 | 129771 | 4 Jun 2006 11:03:49 UTC | 4 Jun 2006 20:18:34 UTC | Over | Success | Done | 9,495.28 | 37.54 | 37.54
              148489 | 131139 | 2 Jun 2006 11:47:20 UTC | 4 Jun 2006 0:35:48 UTC | Over | Success | Done | 12,861.34 | 50.85 | 50.85
              145068 | 128721 | 1 Jun 2006 7:42:42 UTC | 3 Jun 2006 11:26:40 UTC | Over | Success | Done | 13,851.55 | 54.76 | 54.76
              145067 | 128720 | 1 Jun 2006 7:42:42 UTC | 2 Jun 2006 11:39:56 UTC | Over | Success | Done | 13,212.41 | 52.24 | 52.24

              Carlos_Pfitzner

              Joined: Feb 16 06
              Posts: 182
              ID: 296
              Credit: 22,792
              RAC: 0
              Message 1782 - Posted 6 Jun 2006 14:17:34 UTC

                Last modified: 6 Jun 2006 14:46:43 UTC

                Rosetta_betta_5.21 Windows

                No bugs yet!


                It needs about 110 MB of RAM, and total memory use (RAM + swap) is about 325 MB,
                which means any PC with 256 MB of physical RAM can run it without problems.

                The graphics screen uses at most 10% CPU.

                So I consider Ralph 5.21 good enough to replace the current Rosetta 5.16.

                btw: I have successfully crunched the following WUs (5.21) without any problems:
                http://ralph.bakerlab.org/result.php?resultid=150394
                http://ralph.bakerlab.org/result.php?resultid=150487
                http://ralph.bakerlab.org/result.php?resultid=151970
                http://ralph.bakerlab.org/result.php?resultid=151971
                http://ralph.bakerlab.org/result.php?resultid=152703
                http://ralph.bakerlab.org/result.php?resultid=152868
                http://ralph.bakerlab.org/result.php?resultid=153082
                http://ralph.bakerlab.org/result.php?resultid=153169
                http://ralph.bakerlab.org/result.php?resultid=153204

                PS: Why not use 3DNow! to speed up floating-point operations (Athlon XP+)?
                *On Einstein, crunching time went from 6 hours per WU to 1 hour per WU because of 3DNow!,
                and the CPU ran 5 °C hotter -:)

                Thanks,
                ____________

                tralala

                Joined: Apr 12 06
                Posts: 52
                ID: 1266
                Credit: 15,257
                RAC: 0
                Message 1783 - Posted 6 Jun 2006 14:29:31 UTC

                  Last modified: 6 Jun 2006 14:31:37 UTC

                  I had four good results with 5.21.
                  However, I noticed that no checkpointing was done within a model, only between models. On my fast computer a model took between 10 and 25 minutes to complete. For this WU, for example, it took 25 minutes between checkpoints (models), which can translate into over an hour on a slow Mac.

                  Over at Rosetta people are "complaining" that it may take between 90 and 120 minutes for a WU to reach its first checkpoint.

                  What happened to more frequent checkpointing?
                  ____________

                  NJMHoffmann

                  Joined: Feb 17 06
                  Posts: 8
                  ID: 395
                  Credit: 1,270
                  RAC: 0
                  Message 1784 - Posted 6 Jun 2006 19:39:43 UTC - in response to Message 1783.

                    Last modified: 6 Jun 2006 19:48:49 UTC

                    > For this WU for example it took 25 minutes between the checkpoints (models) which can translate in over an hour on a slow Mac.
                    And this was one of the small targets (t314 with Nres=106).

                    > Over at Rosetta people are "complaining" that it may take between 90-120 minutes for a WU to reach its first checkpoint.
                    I am running two WUs for the slightly bigger target t299 (Nres=180). One is at 40 min., one at 35. No checkpoint yet.
                    (Edit: finished first models at 44/38 mins.)

                    You can imagine what the t296 with Nres=445 looked like.

                    > What happened to more often checkpointing?
                    Inquiring minds want to know :-)

                    Norbert (waiting for the BOINC client, which waits for a checkpoint before switching tasks)
                    ____________

                    Astro

                    Joined: Feb 16 06
                    Posts: 141
                    ID: 48
                    Credit: 32,977
                    RAC: 0
                    Message 1785 - Posted 6 Jun 2006 20:25:43 UTC

                      Last modified: 6 Jun 2006 20:26:01 UTC

                      OOPS, spoke too soon. On my AMD64 3700 I got excited about a graphics fix, so I turned the screensaver ON.

                      wuid=133453

                      Got another fatal Windows error.


                      Result ID 151135
                      Name t307__CASP7_ABRELAX_SAVE_ALL_OUT_CONTACT_hom001__649_260_0
                      Workunit 133453
                      Created 5 Jun 2006 5:31:35 UTC
                      Sent 5 Jun 2006 7:04:09 UTC
                      Received 6 Jun 2006 20:16:34 UTC
                      Server state Over
                      Outcome Client error
                      Client state Computing
                      Exit status -1073741811 (0xffffffffc000000d)
                      Computer ID 2172
                      Report deadline 9 Jun 2006 7:04:09 UTC
                      CPU time 13127.953125
                      stderr out <core_client_version>5.4.9</core_client_version>
                      <message>
                      - exit code -1073741811 (0xc000000d)
                      </message>
                      <stderr_txt>
                      # random seed: 3033888
                      # cpu_run_time_pref: 14400

                      </stderr_txt>


                      Validate state Invalid
                      Claimed credit 51.7612201395287
                      Granted credit 0
                      application version 5.21
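                      The "Exit status" line gives the same number three ways: -1073741811 is the signed 32-bit reading of 0xC000000D, and 0xffffffffc000000d is that value sign-extended to 64 bits. As a sketch, the conversion can be reproduced in Python by masking to the word size; the helper names are illustrative only. On Windows, 0xC000000D is the NTSTATUS code STATUS_INVALID_PARAMETER; what triggered it in this run is not established in the thread.

```python
def as_hex32(status):
    """Reinterpret a signed 32-bit exit status as its unsigned hex form."""
    return f"0x{status & 0xFFFFFFFF:08x}"

def as_hex64(status):
    """The same value sign-extended to 64 bits."""
    return f"0x{status & 0xFFFFFFFFFFFFFFFF:016x}"

print(as_hex32(-1073741811))  # 0xc000000d
print(as_hex64(-1073741811))  # 0xffffffffc000000d
```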

                      Rom Walton (BOINC)
                      Forum moderator
                      Project developer

                      Joined: Mar 10 06
                      Posts: 21
                      ID: 1057
                      Credit: 5,515
                      RAC: 0
                      Message 1788 - Posted 6 Jun 2006 20:31:03 UTC

                        Tony,

                        What kind of graphics adapter do you have on that machine?

                        ____________

                        Astro

                        Joined: Feb 16 06
                        Posts: 141
                        ID: 48
                        Credit: 32,977
                        RAC: 0
                        Message 1790 - Posted 6 Jun 2006 21:14:29 UTC - in response to Message 1788.

                          Last modified: 6 Jun 2006 21:20:18 UTC

                          > Tony,
                          >
                          > What kind of graphics adapter do you have on that machine?

                          AMD64 3700 San Diego processor, Asus A8N-E motherboard, Asus EN6200TC256/TD/64M/A PCI Express video card, 1 GB OCZ Gold RAM. This is my only machine giving this fatal Windows error. It has Nvidia chipsets in both the motherboard and the video card.

                          tony

                          Display: "Plug and Play Monitor on NVIDIA GeForce 6200 TurboCache(TM)"

                          It also says: "ASUS OSD provides access to dynamically adjust parameters in D3D or OpenGL games by hotkeys."

                          Graphics card info:
                          GeForce 6200 TurboCache
                          Video BIOS version: 5.44.02.11
                          IRQ: 18
                          Bus: PCI Express x16
                          Memory: 256 MB
                          ForceWare version: 71.24
                          TV encoder type: Nvidia integrated

                          Rhiju
                          Forum moderator
                          Project developer
                          Project scientist

                          Joined: Feb 14 06
                          Posts: 161
                          ID: 4
                          Credit: 3,725
                          RAC: 0
                          Message 1791 - Posted 7 Jun 2006 1:32:59 UTC - in response to Message 1777.

                            Last modified: 7 Jun 2006 1:42:02 UTC

                            Hi: sorry, the delay bug may have been introduced in 5.19. I'm glad you all posted about it to give us a chance to fix it.

                            I think the most common refrain on the Ralph message boards is that there's not enough work. So we're trying a new strategy for our workunit queue -- from now on, there will always be work in the queue to help us debug continuously! We didn't do this before because we needed quick turnaround for certain new workunits every couple of days. Now we've changed the workunit buffer size and priority system to let us send out new jobs quickly while maintaining a trickle of regular jobs at other times. Does that sound OK to everyone?
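                            The post does not describe how the buffer and priority system is implemented. As a purely hypothetical sketch of the idea, a priority queue lets an urgent batch jump ahead of a standing backlog of regular jobs, so new jobs go out quickly while the regular ones keep trickling out whenever nothing urgent is waiting. WorkQueue, URGENT and REGULAR are illustrative names, not the project's scheduler.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number is sent out first.
URGENT, REGULAR = 0, 1

class WorkQueue:
    """Toy job queue: urgent batches jump ahead, regular jobs keep trickling."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority level

    def add(self, name, priority=REGULAR):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def next_job(self):
        """Return the next job name to send, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = WorkQueue()
q.add("regular-1")
q.add("regular-2")
q.add("urgent-1", priority=URGENT)
print([q.next_job() for _ in range(4)])  # ['urgent-1', 'regular-1', 'regular-2', None]
```

                            The sequence counter keeps jobs of equal priority in submission order, which heapq would not otherwise guarantee.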



                            ____________

                            Rhiju
                            Forum moderator
                            Project developer
                            Project scientist

                            Joined: Feb 14 06
                            Posts: 161
                            ID: 4
                            Credit: 3,725
                            RAC: 0
                            Message 1792 - Posted 7 Jun 2006 1:41:40 UTC - in response to Message 1784.

                              This was a mistake, thanks very much for pointing it out. We had some of the jobs sent out without checkpointing, and now we're switching back. There will be a delay, though, because those jobs are already in the queue... for all the jobs we sent out, we've made sure that the time between decoys is, on average, less than half an hour (but longer on a Mac).

                              For the crazy t296 protein, we're not doing the second, super-long "relax" stage of the Rosetta protocol. But the first stage takes long enough on those guys that we should put some checkpoints in that stage too... this will require changing the app, and I can try doing this over the next week. Again, thanks for the suggestion!
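                              As a rough sketch of what checkpointing inside a stage (not just at the end of each model) means, the Python below saves progress every few steps and resumes from the last saved point after a restart. Everything here is hypothetical: the JSON file, run_stage(), and the step counter stand in for Rosetta's real state, which would also need coordinates and random-number state. A real BOINC application would usually also ask the client whether now is a good time via boinc_time_to_checkpoint() and report with boinc_checkpoint_completed().

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write state via a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # rename is atomic on POSIX filesystems

def load_checkpoint(path):
    """Resume from the last checkpoint, or start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"model": 0, "step": 0}

def run_stage(path, n_models, steps_per_model, checkpoint_every):
    state = load_checkpoint(path)
    while state["model"] < n_models:
        while state["step"] < steps_per_model:
            state["step"] += 1  # stand-in for one Monte Carlo move
            if state["step"] % checkpoint_every == 0:
                save_checkpoint(path, state)  # mid-model checkpoint
        state["model"] += 1
        state["step"] = 0
        save_checkpoint(path, state)  # end-of-model checkpoint
    return state
```

                              Writing to a temporary file and renaming avoids leaving a half-written checkpoint if the task is killed mid-save.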



                              ____________

                              [B^S] suguruhirahara

                              Joined: Mar 5 06
                              Posts: 40
                              ID: 992
                              Credit: 6,001
                              RAC: 0
                              Message 1793 - Posted 7 Jun 2006 1:47:04 UTC - in response to Message 1791.

                                > So we're trying a new strategy for our workunit queue -- from now on, there will always be work in the queue to help us debug continuously! ... Does that sound OK to everyone?
                                It sounds okay.

                                I got an error with this result.
                                http://ralph.bakerlab.org/result.php?resultid=152634

                                Outcome Client error
                                Client state Computing
                                Exit status 3 (0x3)

                                ____________

                                Carlos_Pfitzner

                                Joined: Feb 16 06
                                Posts: 182
                                ID: 296
                                Credit: 22,792
                                RAC: 0
                                Message 1794 - Posted 7 Jun 2006 2:48:46 UTC

                                  To Rom Walton (BOINC)

                                  Because 5.20 had the freeze bug and I had a lot of 5.20 WUs
                                  in my queue, I reset the Ralph project...

                                  All went well, it worked, and I started receiving 5.21 WUs -:)

                                  The bug is these phantom WUs still listed on the server side:


                                  Result ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
                                  150394 | 132722 | 5 Jun 2006 5:07:36 UTC | 5 Jun 2006 15:01:10 UTC | Over | Success | Done | 3,454.54 | 19.77 | 19.77
                                  150241 | 132571 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
                                  150240 | 132570 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
                                  150239 | 132569 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
                                  150238 | 132568 | 5 Jun 2006 1:58:11 UTC | 9 Jun 2006 1:58:11 UTC | In Progress | Unknown | New | --- | --- | ---
                                  150237 | 132567 | 5 Jun 2006 1:58:11 UTC | 9 Jun 2006 1:58:11 UTC | In Progress | Unknown | New | --- | --- | ---
                                  150083 | 128443 | 4 Jun 2006 6:53:49 UTC | 4 Jun 2006 21:40:58 UTC | Over | Client error | Computing | 3,273.47 | 18.47 | ---


                                  I hope a fix can be made on the BOINC server side to avoid these phantoms.

                                  *All the WUs listed above were 5.20 WUs in my queue that the reset got rid of.

                                  However, they are still listed on the server as "In Progress",
                                  waiting for the deadline to pass before changing to "No reply" -:(

                                  Thanks
                                  ____________

                                  [B^S] sTrey

                                  Joined: Feb 15 06
                                  Posts: 58
                                  ID: 36
                                  Credit: 15,430
                                  RAC: 0
                                  Message 1796 - Posted 7 Jun 2006 6:59:30 UTC - in response to Message 1791.

                                    > I think the most common refrain on ralph message boards is that there's not enough work. So we're trying a new strategy for our workunit queue -- from now on, there will always be work in the queue to help us debug continuously! We didn't do this before because we needed quick turnaround for certain new workunits every couple days. Now we've changed the workunit buffer size and priority system to let us send out new jobs quickly while maintaining a trickle of regular jobs at other times. Does that sound OK to everyone?


                                    Sounds great, thanks!

                                    4 results, no delay, no graphics problems so far. As soon as it works off a little debt, it will be back to crunching :)

                                    feet1st

                                    Joined: Mar 7 06
                                    Posts: 312
                                    ID: 1028
                                    Credit: 110,522
                                    RAC: 0
                                    Message 1805 - Posted 9 Jun 2006 20:55:42 UTC - in response to Message 1791.

                                      Last modified: 9 Jun 2006 20:55:52 UTC

                                      > So we're trying a new strategy for our workunit queue

                                      That will be great. That way, when we post on Rosetta suggesting someone join Ralph, they can actually get down to testing right away.

                                      You might want to post a message on the homepage to alert people to this, and suggest how they should adjust resource share to control Ralph's crunching, rather than counting on the lack of WUs to be the limiting factor.
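                                      BOINC splits a host's processing time among attached projects in proportion to their resource shares, so a small Ralph share relative to Rosetta caps how much Ralph work gets done once work is always available. The arithmetic is sketched below with made-up share values (900 and 100); the client's scheduler also weighs deadlines and debt, so this is only the long-run target, not an exact prediction.

```python
def share_fractions(shares):
    """Long-run fraction of processing time per project, proportional to resource share."""
    total = sum(shares.values())
    return {project: s / total for project, s in shares.items()}

# Hypothetical shares: Rosetta set to 900, Ralph set to 100.
fractions = share_fractions({"Rosetta@home": 900, "RALPH@home": 100})
print(fractions["RALPH@home"])  # 0.1
```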

                                      ____________


                                      Copyright © 2017 University of Washington

                                      Last Modified: 20 Nov 2008 19:41:56 UTC