RALPH@home

Bug Reports for 5.45

Chu
Forum moderator
Project developer
Project scientist

Joined: Sep 26 06
Posts: 61
ID: 1900
Credit: 12,545
RAC: 0
Message 2715 - Posted 27 Jan 2007 22:09:57 UTC

    Last modified: 28 Jan 2007 1:53:28 UTC

    Ralph has been updated to 5.45. This update includes a fix for the long-known graphics problem, and we would like to test it here on Ralph first. In beta tests on our local Windows and Mac hosts, various Rosetta jobs that used to crash within 5 to 10 minutes with graphics on now run much more stably. Given these encouraging results, we have turned the sidechain drawing and mouse-rotation features back on. Please give it a try, either by turning on graphics in the BOINC Manager or by enabling the BOINC screensaver. If you spot any problems, please report them here (detailed descriptions of errors are preferred). Thanks.

    For Mac users: even with the fix, we still see the graphics frame sometimes freeze suddenly because the graphics thread gets stuck (somewhere in the GLUT library). When this happens, the graphics window can be closed without any problem but cannot be reopened. The effect is limited to the graphics thread; the worker thread still runs properly (you can see progress increasing) and returns valid results when it finishes. (Before the fix, the problem used to crash both the graphics thread and the worker thread and trigger a segmentation violation or bus error.) If you see similar behavior with Ralph jobs, please keep the WU crunching and check whether it does produce results properly in the end. Thanks.

    For Windows users: we have not seen any problems so far in our local tests and would like to see how it goes on Ralph.
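
The separation described above, where a stalled graphics thread leaves the worker thread untouched, can be pictured with a small sketch. This is not the Rosetta code: it is a minimal Python illustration under the assumption that the worker publishes snapshots through a lock-protected object, and every name in it (SharedPose, worker, graphics, run_demo) is hypothetical.

```python
import threading
import time


class SharedPose:
    """Latest structure snapshot shared by the worker and graphics threads (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._coords = []
        self._steps_done = 0

    def publish(self, coords):
        # The worker holds the lock only long enough to swap in a finished snapshot.
        with self._lock:
            self._coords = list(coords)
            self._steps_done += 1

    def snapshot(self):
        # The graphics thread copies the data out, then draws without holding the lock.
        with self._lock:
            return list(self._coords), self._steps_done


def worker(pose, n_steps):
    # Stand-in for the science code: it never waits for the graphics thread.
    for step in range(n_steps):
        pose.publish([(step, step + 1, step + 2)])


def graphics(pose, stop_event, frames):
    # Stand-in for the drawing loop: it only reads copies of the shared state.
    while not stop_event.is_set():
        _coords, steps = pose.snapshot()
        frames.append(steps)
        time.sleep(0.001)


def run_demo(n_steps=1000):
    pose = SharedPose()
    stop_event = threading.Event()
    frames = []
    g = threading.Thread(target=graphics, args=(pose, stop_event, frames), daemon=True)
    w = threading.Thread(target=worker, args=(pose, n_steps))
    g.start()
    w.start()
    w.join()            # the result depends only on the worker finishing
    stop_event.set()
    g.join(timeout=1.0)
    return pose.snapshot()[1]
```

In this arrangement the worker's progress depends on the graphics thread only for the brief moment the lock is held, which is one way a frozen display can coexist with a WU that still finishes and returns a result.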

    feet1st

    Joined: Mar 7 06
    Posts: 312
    ID: 1028
    Credit: 110,522
    RAC: 0
    Message 2717 - Posted 28 Jan 2007 1:44:10 UTC

      Yippie!! Project TFlops here we come!

      Do you plan to do several batches of Ralph testing? People need time to suspend Rosetta so they can enable the screensaver to test the Ralph tasks, and then time to catch some tasks available on the server etc. etc.

      1,000 tasks, twice a day for a few days?

      Keep in mind, most users now do not use the screensaver. And most Ralph users also run Rosetta, so we're going to have to do a little jockeying around to do some good tests.

      KSMarksPsych

      Joined: Feb 16 06
      Posts: 40
      ID: 72
      Credit: 8,226
      RAC: 0
      Message 2718 - Posted 28 Jan 2007 3:35:52 UTC

        I just successfully completed one WU.

        Opened the graphics window and played around rotating the protein.

        Using BOINC 5.8.6a. P4 2.8, 512 of RAM, XP Pro.

        KSMarksPsych

        Joined: Feb 16 06
        Posts: 40
        ID: 72
        Credit: 8,226
        RAC: 0
        Message 2719 - Posted 28 Jan 2007 3:38:07 UTC - in response to Message 2718.

          I just successfully completed one WU.

          Opened the graphics window and played around rotating the protein.

          Using BOINC 5.8.6a. P4 2.8, 512 of RAM, XP Pro.



          This WU


          [aside]What happened to message editing... or was it never here?[/aside]

          Chu
          Forum moderator
          Project developer
          Project scientist

          Joined: Sep 26 06
          Posts: 61
          ID: 1900
          Credit: 12,545
          RAC: 0
          Message 2720 - Posted 28 Jan 2007 4:37:01 UTC - in response to Message 2717.

            Good point. How should we proceed? Right after the update this afternoon, we sent out about 600 WUs, and half of them are already done. However, my guess is that most of them were crunched without graphics at all, since people may not have learned of the update in time to enable graphics. We do need to send several batches for testing, and I just want to spread the word a little more before doing so. There are two ways people can help with testing:

            1. Keep the screensaver disabled but manually enable graphics within the BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). This way Rosetta@home does not have to be suspended, but it requires more attention from users.

            2. Suspend Rosetta@home first and enable the BOINC screensaver. My only concern is that TFlops for Rosetta may drop temporarily and Ralph may not have enough WUs to feed all the testing hosts, so a lot of time would be wasted.

            I personally prefer the first option, but if anybody has a better solution, please let us know. Meanwhile, we will send out graphics-testing WUs periodically to get enough coverage before drawing conclusions.

            Chu
            Forum moderator
            Project developer
            Project scientist

            Joined: Sep 26 06
            Posts: 61
            ID: 1900
            Credit: 12,545
            RAC: 0
            Message 2721 - Posted 28 Jan 2007 4:39:38 UTC - in response to Message 2718.

              Great, one positive data point, thanks for the report. If possible, try to leave the graphics window open even if you are not in front of your computer all the time.

              anders n

              Joined: Feb 16 06
              Posts: 166
              ID: 91
              Credit: 131,419
              RAC: 0
              Message 2724 - Posted 28 Jan 2007 8:03:18 UTC

                I'm new to Mac, but when I try to zoom in and out on the graphics it just rotates.

                Is it just me or is something not right?

                Anders n

                anders n

                Joined: Feb 16 06
                Posts: 166
                ID: 91
                Credit: 131,419
                RAC: 0
                Message 2725 - Posted 28 Jan 2007 8:04:35 UTC - in response to Message 2724.

                  I'm new to Mac, but when I try to zoom in and out on the graphics it just rotates.

                  Is it just me or is something not right?

                  Anders n


                  Hmmm where did edit go???

                  It works like it should on the windows computers :)

                  Anders n

                  Tom Philippart

                  Joined: Jun 24 06
                  Posts: 4
                  ID: 1554
                  Credit: 883
                  RAC: 0
                  Message 2726 - Posted 28 Jan 2007 10:00:10 UTC

                    http://ralph.bakerlab.org/result.php?resultid=407247
                    Windows Vista x64
                    I pressed "show graphics", left them on, and played with them a lot during the whole runtime of the WU. No problems!

                    [AF>France>TDM>Centre]Jeannot Le Tazon

                    Joined: Jun 11 06
                    Posts: 3
                    ID: 1513
                    Credit: 1,754
                    RAC: 0
                    Message 2727 - Posted 28 Jan 2007 10:46:57 UTC - in response to Message 2720.

                      1. Keep the screensaver disabled but manually enable graphics within the BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above).

                      Wu http://ralph.bakerlab.org/result.php?resultid=407583 OK

                      anders n

                      Joined: Feb 16 06
                      Posts: 166
                      ID: 91
                      Credit: 131,419
                      RAC: 0
                      Message 2729 - Posted 28 Jan 2007 13:55:13 UTC

                        http://ralph.bakerlab.org/result.php?resultid=406892

                        Not a graphics problem, but a stuck WU.

                        Anders n

                        feet1st

                        Joined: Mar 7 06
                        Posts: 312
                        ID: 1028
                        Credit: 110,522
                        RAC: 0
                        Message 2730 - Posted 28 Jan 2007 15:38:52 UTC - in response to Message 2720.

                          1. Keep the screensaver disabled but manually enable graphics within the BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). This way Rosetta@home does not have to be suspended, but it requires more attention from users.

                          2. Suspend Rosetta@home first and enable the BOINC screensaver. My only concern is that TFlops for Rosetta may drop temporarily and Ralph may not have enough WUs to feed all the testing hosts, so a lot of time would be wasted.

                          I personally prefer the first option, but if anybody has a better solution, please let us know. Meanwhile, we will send out graphics-testing WUs periodically to get enough coverage before drawing conclusions.


                          Ya, my TFlops comment was optimistically looking forward to the new code rolling out to Rosetta, with fewer users there having problems or confusion, or leaving due to failures.

                          I think just do what you're doing: keep small amounts of work coming at various times of day (think dial-up, each day after work). But I just wanted to point out that this test has enough special circumstances around it that it needs more time than most you've done before here on Ralph.

                          Speaking of TFlops, were you able to devise thread safety without too much of a performance impact? I've always been curious how many conformations would be showing if the graphic actually showed each and every one of them.

                          I picked up two DOC WUs last night on the PC that I was trying (and having problems with) previously, running the 24hr time pref., so they're 6.5hrs in without any graphics enabled. Then I'll be using my PC most of today, and I have suspended Rosetta and enabled the ss for tonight.

                          ...2 DOC WUs, one using 204MB, the other using 177MB. So, I'll ask again: is there a simple way we can tell that a given WU was designed for high memory systems?

                          anders n

                          Joined: Feb 16 06
                          Posts: 166
                          ID: 91
                          Credit: 131,419
                          RAC: 0
                          Message 2732 - Posted 28 Jan 2007 16:46:30 UTC

                            I was running 2 WUs at the same time on my Mac,
                            1 with the graphics window on and 1 without.
                            I did not get a true picture of how much CPU power the graphics
                            take (because the WU without graphics got stuck), but after
                            3 h of runtime the graphics WU was 18 min behind.

                            Anders n
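
Taken at face value, the figures in this report give a rough upper estimate of the graphics cost. The sketch below only restates that arithmetic; the function name is hypothetical, and the estimate assumes both WUs would otherwise have advanced at the same rate, which the stuck WU makes uncertain.

```python
def graphics_overhead(runtime_min, lag_min):
    """Fraction of the run time by which the WU with graphics trailed the other one."""
    return lag_min / runtime_min


# 3 h of runtime, with the graphics WU 18 min behind
overhead = graphics_overhead(runtime_min=3 * 60, lag_min=18)
print(f"{overhead:.0%}")  # 18 / 180 -> 10%
```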

                            Chu
                            Forum moderator
                            Project developer
                            Project scientist

                            Joined: Sep 26 06
                            Posts: 61
                            ID: 1900
                            Credit: 12,545
                            RAC: 0
                            Message 2733 - Posted 28 Jan 2007 18:07:05 UTC - in response to Message 2730.

                              The current fix should not have any impact on performance compared to before.

                              We can define a high memory requirement in our job submission script so that a batch is sent only to clients with more memory. For most Rosetta jobs the default value should be fine, but Rosetta design, which is coming along, will probably require more memory than usual.

                              feet1st

                              Joined: Mar 7 06
                              Posts: 312
                              ID: 1028
                              Credit: 110,522
                              RAC: 0
                              Message 2734 - Posted 28 Jan 2007 18:25:07 UTC

                                My point was just that I am observing a Ralph WU that takes 200MB to run. That is high enough that I know such a WU should probably be given the "high memory only" designation on the server; or perhaps it isn't running correctly. But, to my knowledge, I have no way to tell (since I do have a high memory machine) whether this "high memory only" designation has properly been made. If there were something in the WU name, or in an XML file somewhere that we could check, we'd know to notify you when we observe memory use beyond your plan. Perhaps an "HM" or "LM" designation somewhere in the WU name.
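
A naming convention like the one suggested here could be checked mechanically. The sketch below is purely illustrative: the "_HM_" / "_LM_" token, the example WU names, and the 150 MB limit are all assumptions made for the example, not anything the project has defined.

```python
import re
from typing import Optional

# Hypothetical convention: an "HM" or "LM" token delimited by underscores in the WU name.
_TAG = re.compile(r"(?:^|_)(HM|LM)(?:_|$)")


def memory_class(wu_name: str) -> Optional[str]:
    """Return "HM", "LM", or None when the name carries no designation."""
    match = _TAG.search(wu_name)
    return match.group(1) if match else None


def exceeds_plan(wu_name: str, observed_mb: float, low_limit_mb: float = 150.0) -> bool:
    """True when a WU not marked HM uses more memory than the assumed limit."""
    return memory_class(wu_name) != "HM" and observed_mb > low_limit_mb


# Made-up WU names, with the memory figures reported in this thread
print(exceeds_plan("DOC_example_LM_1446_0", 204))  # worth reporting
print(exceeds_plan("DOC_example_HM_1446_0", 204))  # within plan
```

With something like this, a volunteer watching memory use would only need the WU name and the observed figure to decide whether a report is worthwhile.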

                                Viromancy

                                Joined: Jan 20 07
                                Posts: 7
                                ID: 2554
                                Credit: 1,425
                                RAC: 0
                                Message 2735 - Posted 28 Jan 2007 20:34:22 UTC

                                  Failed WU here.

                                  Same type of error that forced me to stop crunching Rosetta altogether after decreasing stability in ver 5.43 resulted in around 75% of WUs aborting prematurely. I never had this problem at all with any WUs from the other BOINC applications I run (World Community Grid/Malaria Control), and it was very rare with Rosetta before version 5.43. Had one instance of the same with version 5.44 here. Also, along with others, I saw three odd, unrelated WU failures with ver 5.44 just before 5.45 was introduced: here, here and here. I know these latter aren't ver 5.45, but for the sake of completeness I thought it was worth mentioning.

                                  I don't use graphics, at all. All these errors, and almost all of the constant errors being thrown up by Rosetta ver 5.43, occurred while the application was running in the background and the machine was otherwise idle.

                                  anders n

                                  Joined: Feb 16 06
                                  Posts: 166
                                  ID: 91
                                  Credit: 131,419
                                  RAC: 0
                                  Message 2738 - Posted 29 Jan 2007 5:21:11 UTC

                                    1 more stuck WU on my Mac.

                                    http://ralph.bakerlab.org/result.php?resultid=406892

                                    I will set the target time to 4 h to see if the problem disappears.

                                    Anders n

                                    Chu
                                    Forum moderator
                                    Project developer
                                    Project scientist

                                    Joined: Sep 26 06
                                    Posts: 61
                                    ID: 1900
                                    Credit: 12,545
                                    RAC: 0
                                    Message 2740 - Posted 29 Jan 2007 16:50:41 UTC - in response to Message 2735.

                                      Hi Viromancy, I am a little surprised to hear that, even with graphics disabled, you got a 75% failure rate on Rosetta@Home; our current statistics show that the number stays below 10% on average for the Windows platform. The error message you got is certainly one of the symptoms related to graphics, but it is definitely not limited to that. May I ask whether you have experienced any stability issues with your machine in general? We certainly do not want to lose users because of application stability, and that is why we are working on improving it. Maybe you can check whether this has improved in 5.45, and if the failure rate goes down significantly, you may consider attaching back to Rosetta@Home.

                                      BTW, the last three failures mentioned in your post were caused by problems in the Rosetta science code, and catching those is exactly the purpose of running the alpha test.

                                      Chu
                                      Forum moderator
                                      Project developer
                                      Project scientist

                                      Joined: Sep 26 06
                                      Posts: 61
                                      ID: 1900
                                      Credit: 12,545
                                      RAC: 0
                                      Message 2741 - Posted 29 Jan 2007 16:52:38 UTC - in response to Message 2738.

                                        Thanks Anders n, that might be due to a bad trajectory.

                                        Viromancy

                                        Joined: Jan 20 07
                                        Posts: 7
                                        ID: 2554
                                        Credit: 1,425
                                        RAC: 0
                                        Message 2742 - Posted 29 Jan 2007 18:44:54 UTC - in response to Message 2740.

                                          The error message you got is certainly one of the symptoms related to graphics, but it is definitely not limited to that. May I ask whether you have experienced any stability issues with your machine in general?


                                          Hi Chu. Apologies for the long post.

                                          No, I've never had any stability issue with my machine for any applications I run on it, with the sole exception that it doesn't like running the BOINC manager at the same time as I'm ripping DVDs. Other than that, it's rock solid. It's fairly well overclocked - I'm running a Core2Duo E6700 at 3.46 GHz, and my PC6400-rated RAM is actually running as PC8200 - but it's tested completely stable, and several months of running both cores at 100% capacity 24/7 has never generated a single error for any BOINC application WU except Rosetta. Rosetta, though, became very touchy about running. It would inevitably fail a WU that was pre-empted and swapped out to allow something else to run. I had to leave it running all the time on one core.

                                          We certainly do not want to lose users because of application stability, and that is why we are working on improving it. Maybe you can check whether this has improved in 5.45, and if the failure rate goes down significantly, you may consider attaching back to Rosetta@Home.


                                          I was quite puzzled and a bit disturbed at how the failure rate on Rosetta got more and more pronounced over time without any change to my machine's configuration or any other evidence of instability. I kept going for as long as possible because I liked crunching Rosetta and I'd accumulated a very respectable number of WUs. But the failure rate was becoming alarming, and on the 15th-16th January this year some 75-80% of all WUs aborted prematurely. That's when I regretfully had to call a halt. I joined RALPH to see whether the newer versions were more stable, with an eye to going back to Rosetta when they're implemented. It's hard to tell, since the fairly irregular availability of work means I don't have a large WU base to draw conclusions from, but both 5.45 and 5.44 before it seem more stable than 5.43 on my machine; for one thing, they can both be swapped in and out to allow other BOINC applications to run without causing problems.

                                          Out of curiosity, since the beta versions seemed more stable, I allowed my BOINC manager to download some new Rosetta workunits under 5.43 on Jan 27th. Sure enough, the first three it tried to run all failed with access violations, here, here and here. The fourth WU succeeded. By that stage, though, I'd had enough again and shut it down.

                                          I have no idea why this is happening, and the 10% failure rate you mention would have been, if anything, an overestimate of the situation during the first few months I was crunching. The problems really seem to stem from the introduction of 5.43, which is puzzling since I don't use the graphics. I'll certainly try Rosetta again when 5.43 is upgraded, but I'd be a lot happier if I knew what was going wrong.


                                          Chu
                                          Forum moderator
                                          Project developer
                                          Project scientist

                                          Joined: Sep 26 06
                                          Posts: 61
                                          ID: 1900
                                          Credit: 12,545
                                          RAC: 0
                                          Message 2743 - Posted 29 Jan 2007 19:18:43 UTC - in response to Message 2742.

                                            If there is any apology to make, that should be from us. Thank you for your time and effort helping us.

                                            I see, you were having problems when a Rosetta job was pre-empted and swapped in and out with other BOINC applications. This is consistent with my earlier speculation that your problem is probably not graphics-related. Honestly, I don't know exactly what has gone wrong either, but it could be related to the BOINC API we were using for Rosetta 5.43 (though that does not explain why the problem did not happen universally on all other clients' machines). The 5.45 currently being tested on Ralph has been built with the newest version of the BOINC API, and that might help solve your problem. The plan is to put it on Rosetta@Home either later today or tomorrow. So please give it a try when it is upgraded and see if things improve on your side. Again, thank you for your generous contribution to our project.

                                            feet1st

                                            Joined: Mar 7 06
                                            Posts: 312
                                            ID: 1028
                                            Credit: 110,522
                                            RAC: 0
                                            Message 2745 - Posted 30 Jan 2007 3:13:13 UTC

                                              My previously problematic machine just went 18hrs, ss active, without a burp. It successfully completed 3 WUs and is still crunching on a fourth. While first getting these WUs I had enabled my screen saver, went to take a shower, forgot I had left Rosetta active too, and by the time I got back to this machine it was hung already. ...the Rosetta WU, not Ralph!

                                              ...so I'd say things are looking great on Windows.

                                              Billy

                                              Joined: Jan 29 07
                                              Posts: 13
                                              ID: 2592
                                              Credit: 5,855
                                              RAC: 0
                                              Message 2746 - Posted 30 Jan 2007 14:22:59 UTC

                                                It isn't possible to test this update on my Mac as there are no work units. I did get 2 work units on one day, but they ran and I didn't notice them, so I couldn't turn on the graphics.

                                                Chu
                                                Forum moderator
                                                Project developer
                                                Project scientist

                                                Joined: Sep 26 06
                                                Posts: 61
                                                ID: 1900
                                                Credit: 12,545
                                                RAC: 0
                                                Message 2748 - Posted 30 Jan 2007 21:05:25 UTC - in response to Message 2746.

                                                  It has now been updated on Rosetta@Home, and you will get plenty of WUs to crunch. Just be aware that there is still a minor unsolved problem on Mac platforms. See here


                                                  Rhiju
                                                  Forum moderator
                                                  Project developer
                                                  Project scientist

                                                  Joined: Feb 14 06
                                                  Posts: 161
                                                  ID: 4
                                                  Credit: 3,725
                                                  RAC: 0
                                                  Message 2749 - Posted 31 Jan 2007 2:53:59 UTC - in response to Message 2748.

                                                    Work units of the form

                                                    s018__CASP7_ASSEMBLE_SAVE_ALL_OUT_hom001__IGNORE_THE_REST_s018__BOINC_LOOP_RELAX__1446_0.clean.out.2

                                                    are acting a little wacky -- I'm working on the fix!




                                                    ____________

                                                    Profile Conan

                                                    Joined: Feb 16 06
                                                    Posts: 344
                                                    ID: 145
                                                    Credit: 1,309,534
                                                    RAC: 0
                                                    Message 2750 - Posted 31 Jan 2007 10:24:55 UTC

                                                      > Had this one fail. I was not at the computer, so the BOINC screensaver was not in use; I am still using the standard Windows one. All others have progressed with no trouble so far.

                                                      http://ralph.bakerlab.org/result.php?resultid=411601

                                                      <message>
                                                      - exit code -1073741819 (0xc0000005)
                                                      </message>
                                                      <stderr_txt>
                                                      # random seed: 2755617


                                                      Unhandled Exception Detected...

                                                      - Unhandled Exception Record -
                                                      Reason: Access Violation (0xc0000005) at address 0x00681A55 read attempt to address 0x7BCFB090

                                                      Engaging BOINC Windows Runtime Debugger...
                                                      ____________
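For anyone matching these reports up: the two numbers on the exit code line are the same value. The negative decimal is the signed 32-bit reading of 0xc0000005, the Windows access-violation status named in the exception record. A minimal Python sketch of the conversion follows; the helper name exit_code_to_hex is made up for illustration and is not part of BOINC or Rosetta.

```python
def exit_code_to_hex(code: int) -> str:
    """Render a signed 32-bit exit code as unsigned hex.

    Hypothetical helper for illustration only; masking with 0xFFFFFFFF
    reinterprets a negative value as its 32-bit two's-complement pattern.
    """
    return hex(code & 0xFFFFFFFF)


# Exit code copied from the report above.
print(exit_code_to_hex(-1073741819))  # 0xc0000005
```

Positive codes pass through unchanged, so exit_code_to_hex(1) gives 0x1.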

                                                      Profile Bober [B@P]

                                                      Joined: Jun 18 06
                                                      Posts: 6
                                                      ID: 1538
                                                      Credit: 15,427
                                                      RAC: 0
                                                      Message 2751 - Posted 31 Jan 2007 11:04:16 UTC - in response to Message 2750.

                                                        I had two WUs with the same error:
                                                        result 411470
                                                        result 411652

                                                        ____________

                                                        tallguy-13088

                                                        Joined: Feb 17 06
                                                        Posts: 10
                                                        ID: 376
                                                        Credit: 121,701
                                                        RAC: 0
                                                        Message 2752 - Posted 1 Feb 2007 1:33:58 UTC

                                                          Hello,

                                                          I just aborted two RALPH work Units. They were:

                                                          s018__CASP7_ASSEMBLE_SAVE_ALL_OUT_hom001__IGNORE_THE_REST_s018__BOINC_LOOP_RELAX__1446_0.clean.out.1_1670_3

                                                          - and -

                                                          s018__CASP7_ASSEMBLE_SAVE_ALL_OUT_hom001__IGNORE_THE_REST_s018__BOINC_LOOP_RELAX__1446_0.clean.out.2_1670_3

                                                          Both were at 100%, PRE-EMPTED and still accumulating time while other projects were active. Earlier this evening, both had accumulated 10+ hours apiece. Upon restarting BOINC Manager (v5.4.9), unit #1 dropped back to 5.442% completion (at 49m 16s accumulated time) and the second went back to 10.442% completion (at 45m 09s accumulated time). The graphics stated the second was in "stage assembly" for the process.

                                                          I am running W2K Build 2195 Service Pack 4 on dual Xeon 2.8 GHz cores. Ralph@Home was at 5.45. If there is any more info you need, please reply to this post. Thanks!
                                                          ____________

                                                          =Lupus=

                                                          Joined: Sep 23 06
                                                          Posts: 4
                                                          ID: 1893
                                                          Credit: 15,762
                                                          RAC: 0
                                                          Message 2754 - Posted 1 Feb 2007 7:24:59 UTC

                                                            Result 412972 had the same 0xc0000005 error. I was not even near the "show grafx" button! Good luck in bug-hunting,

                                                            =Lupus=

                                                            Profile Conan

                                                            Joined: Feb 16 06
                                                            Posts: 344
                                                            ID: 145
                                                            Credit: 1,309,534
                                                            RAC: 0
                                                            Message 2755 - Posted 1 Feb 2007 7:25:54 UTC

                                                              > Got a different one this time.
                                                              It had got to 100.00% but the BOINC Manager said it was still running. So I checked in my System Monitor (I am using Linux on this Opteron 275 machine) and it said that 1 of my 4 CPUs was idle and the other 3 were at 100%. This then changed, with the idle state moving from CPU to CPU until all 4 cores were swapping it around between them. The WU also still held 166 MB of memory.
                                                              I had to abort it; then all CPUs ran at 100% again.

                                                              This workunit http://ralph.bakerlab.org/workunit.php?wuid=364578

                                                              genes

                                                              Joined: Feb 16 06
                                                              Posts: 45
                                                              ID: 57
                                                              Credit: 43,300
                                                              RAC: 0
                                                              Message 2757 - Posted 1 Feb 2007 11:40:19 UTC

                                                                I am still having problems when I display the graphics, notably when I enable the screensaver on [url=http://ralph.bakerlab.org/show_host_detail.php?hostid=2016]this computer{/url].

                                                                I currently have an ATI x850x graphics card installed, and the installed driver is 7-1_xp_dd_ccc_wdm_enu_40211 (catalyst version). Here is what I saw happen: the BOINC screensaver was running, and over time I saw the CPDN graphics, the QMC graphics, and either Ralph or Rosetta (both of which are 5.45). The last graphics I saw were from Ralph or Rosetta, then I came back and saw that the "VPU recover" feature had activated (display driver resets instead of hanging, and prepares a crash report for ATI). I allowed it to submit the report, and the Rosetta/Ralph WU did not crash, but finished normally, so I can't point you to the bad WU.

                                                                Later today I will put back the NVidia card that I also use with this machine (a GeForce FX5950), and see how that behaves.

                                                                ____________

                                                                genes

                                                                Joined: Feb 16 06
                                                                Posts: 45
                                                                ID: 57
                                                                Credit: 43,300
                                                                RAC: 0
                                                                Message 2758 - Posted 1 Feb 2007 11:41:47 UTC

                                                                  Rats. I typo'ed the link, and I can't edit it. I'll try again. It's this computer
                                                                  ____________

                                                                  genes

                                                                  Joined: Feb 16 06
                                                                  Posts: 45
                                                                  ID: 57
                                                                  Credit: 43,300
                                                                  RAC: 0
                                                                  Message 2760 - Posted 3 Feb 2007 3:06:32 UTC

                                                                    I have the NVidia card installed (it's a GeForce FX5950), and I haven't seen any graphics problems since, either with Ralph or Rosetta. I'm using driver version 93.71. So much for ATI.

                                                                    ____________

                                                                    Viromancy

                                                                    Joined: Jan 20 07
                                                                    Posts: 7
                                                                    ID: 2554
                                                                    Credit: 1,425
                                                                    RAC: 0
                                                                    Message 2761 - Posted 3 Feb 2007 20:10:39 UTC

                                                                      Well, after all the head scratching in the thread above, it seems I've finally managed to crack the Rosetta Weirdness on my machine. And in some respects it's obvious, while in others it's baffling. It seems I managed to pick a totally borderline overclock setting for my 2.66 GHz C2D. Every other application and BOINC client program ran at 3.46 GHz without any problem, and that included all the overclock stress-test applications I ran. Apparently, though, Rosetta from mid 5.43 onwards doesn't.

                                                                      So after going mad when 5.45 didn't work, I tried dropping the effective clock to 3.40 GHz and Vcore down to 5.125V. Rosetta ver 5.45 now appears to be totally stable for a 1.7% reduction in overclocked processor speed, right up to a 24 hour WU timing. I think I can live with that :-) Bloody peculiar, though. Maybe Rosetta should get prepared for being used as an OC stability check, because nothing else showed any effect; though admittedly I didn't try computing prime numbers for 12 hours...
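The 1.7% figure checks out against the two clock speeds quoted in the post. A quick Python sketch of the arithmetic (the function name is illustrative only):

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Clock speeds in GHz, as quoted in the post: 3.46 before, 3.40 after.
print(round(percent_reduction(3.46, 3.40), 1))  # 1.7
```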

                                                                      zombie67 [MM]

                                                                      Joined: Aug 8 06
                                                                      Posts: 70
                                                                      ID: 1666
                                                                      Credit: 1,419,520
                                                                      RAC: 352
                                                                      Message 2763 - Posted 5 Feb 2007 4:32:32 UTC

                                                                        Please turn on RAC decay.

                                                                        http://boinc.berkeley.edu/project_tasks.php
                                                                        ____________
                                                                        Dublin, CA
                                                                        SETI.USA - Stats - My stuff - BOINC IRC chat

                                                                        Profile Conan

                                                                        Joined: Feb 16 06
                                                                        Posts: 344
                                                                        ID: 145
                                                                        Credit: 1,309,534
                                                                        RAC: 0
                                                                        Message 2764 - Posted 5 Feb 2007 20:36:19 UTC

                                                                          > What happened to the crediting system?
                                                                          It is back to "what you get is what you claim". I checked one person's work units and he is getting up to 50 credits an hour (398 credits on an 8 hour WU with not that many decoys done) on the latest batch. Sure beats the 14 to low 20's that I get for my 6 hours processing per WU.
                                                                          ____________
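As a rough check on the "up to 50 credits an hour" figure, the quoted 398 credits on an 8 hour WU works out as below. This is only a sketch: the function name is illustrative, and it assumes the WU really ran for the full 8 hours.

```python
def credits_per_hour(credits: float, hours: float) -> float:
    """Average credit rate for one work unit."""
    return credits / hours


# 398 credits over an 8 hour WU, as quoted in the post.
print(credits_per_hour(398, 8))  # 49.75
```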

                                                                          Profile feet1st

                                                                          Joined: Mar 7 06
                                                                          Posts: 312
                                                                          ID: 1028
                                                                          Credit: 110,522
                                                                          RAC: 0
                                                                          Message 2780 - Posted 7 Feb 2007 14:38:48 UTC

                                                                            I just got one of these WUs:
                                                                            1who__BOINC_ABINITIO_CONTROL2__1749_26_0 using rosetta_beta version 545
                                                                            the graphic doesn't show the sidechains.
                                                                            ____________

                                                                            Profile Conan

                                                                            Joined: Feb 16 06
                                                                            Posts: 344
                                                                            ID: 145
                                                                            Credit: 1,309,534
                                                                            RAC: 0
                                                                            Message 2782 - Posted 8 Feb 2007 0:37:24 UTC

                                                                              > Just had 4 Work Units fail, all at 1 hour processing time, I am expecting the other 2 to fail as well.
                                                                              All the work units got 'stuck' and the Watchdog says it ended the run, but this is not the case.
                                                                              All 4 work units on the Boinc Manager said that they were still running with NO CPU usage but still using up to 308 MB of RAM for each WU. All 4 got to 1 hour (my preferences are for 6 hours) and then said they were 100% complete but the WU did not release the CPU to go to another task.

                                                                              http://ralph.bakerlab.org/result.php?resultid=420621
                                                                              http://ralph.bakerlab.org/result.php?resultid=420709
                                                                              http://ralph.bakerlab.org/result.php?resultid=420761
                                                                              http://ralph.bakerlab.org/result.php?resultid=420767

                                                                              Thanks
                                                                              ____________

                                                                              Chu
                                                                              Forum moderator
                                                                              Project developer
                                                                              Project scientist

                                                                              Joined: Sep 26 06
                                                                              Posts: 61
                                                                              ID: 1900
                                                                              Credit: 12,545
                                                                              RAC: 0
                                                                              Message 2783 - Posted 8 Feb 2007 4:56:57 UTC - in response to Message 2782.

                                                                                Sounds like some problem interfacing with BOINC Manager. Those WUs themselves are fine, and several of the ones you killed actually showed that they were stuck at score 0, which means this did not happen in the middle of a simulation. Next time, could you please close the BOINC Manager and re-open it to see if any of these WUs will be finished and reported? If that does not help, then go ahead and kill them. In addition, it seems to be specific to your Linux hosts, but not Windows, right?


                                                                                Chu
                                                                                Forum moderator
                                                                                Project developer
                                                                                Project scientist

                                                                                Joined: Sep 26 06
                                                                                Posts: 61
                                                                                ID: 1900
                                                                                Credit: 12,545
                                                                                RAC: 0
                                                                                Message 2784 - Posted 8 Feb 2007 4:59:03 UTC - in response to Message 2780.

                                                                                  In the early stage of some simulations, we carry out a low-resolution search and thus sidechains will not be shown. Usually the first box will show either "search backbone" (no sidechains) or "search_full_atom" (with sidechains).


                                                                                  Profile anders n

                                                                                  Joined: Feb 16 06
                                                                                  Posts: 166
                                                                                  ID: 91
                                                                                  Credit: 131,419
                                                                                  RAC: 0
                                                                                  Message 2789 - Posted 11 Feb 2007 18:02:20 UTC

                                                                                    Last modified: 11 Feb 2007 18:08:46 UTC

                                                                                    exit code 1 (0x1)
                                                                                    </message>
                                                                                    <stderr_txt>
                                                                                    # random seed: 2742357
                                                                                    ERROR:: Exit at: .\loop_relax.cc line:1803

                                                                                    On several WU-s

                                                                                    like this one http://ralph.bakerlab.org/result.php?resultid=424908

                                                                                    Anders n

                                                                                    EDIT same goes for MAC http://ralph.bakerlab.org/result.php?resultid=425079
                                                                                    ____________
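When collecting many reports like this one, it can help to pull the source file and line number out of the stderr text automatically. A minimal Python sketch follows. The pattern is inferred only from the single "Exit at" line quoted above, so other Rosetta error lines may well be formatted differently, and the names EXIT_AT and find_exit_location are hypothetical.

```python
import re

# Assumed format, inferred from the one line quoted in the report:
#   ERROR:: Exit at: <file> line:<number>
EXIT_AT = re.compile(r"ERROR:: Exit at: (?P<file>\S+) line:(?P<line>\d+)")


def find_exit_location(stderr_text: str):
    """Return (file, line) for the first 'Exit at' marker, or None if absent."""
    match = EXIT_AT.search(stderr_text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line"))


# Sample text copied from the report above.
sample = "# random seed: 2742357\nERROR:: Exit at: .\\loop_relax.cc line:1803\n"
print(find_exit_location(sample))
```

Grouping failed results by the returned pair would show quickly whether the errors on different hosts all exit at the same place.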

                                                                                    Profile Inais

                                                                                    Joined: Jul 30 06
                                                                                    Posts: 12
                                                                                    ID: 1634
                                                                                    Credit: 13,115
                                                                                    RAC: 0
                                                                                    Message 2790 - Posted 11 Feb 2007 19:52:22 UTC

                                                                                      A lot of WUs are coming up with a counting error after around 50 seconds.

                                                                                      1 WU reported "waiting for storage". This is the first time I have seen that.
                                                                                      ____________
                                                                                      I wish I can fly like a bird in the sky

                                                                                      Bjarke

                                                                                      Joined: Feb 25 06
                                                                                      Posts: 5
                                                                                      ID: 796
                                                                                      Credit: 5,523
                                                                                      RAC: 0
                                                                                      Message 2791 - Posted 11 Feb 2007 21:04:10 UTC

                                                                                        LOTS of errors! The WUs run for approx. 70 seconds before failure.

                                                                                        my results
                                                                                        ____________


                                                                                        Copyright © 2017 University of Washington

                                                                                        Last Modified: 20 Nov 2008 19:41:56 UTC