RALPH@home

Switching between projects with applications removed from memory

Profile dekim
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 20 06
Posts: 210
ID: 1
Credit: 469,656
RAC: 1,937
Message 4 - Posted 15 Feb 2006 21:06:16 UTC

    A known bug of Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted." This bug affects users who are involved in multiple BOINC projects and do not leave applications in memory.

    We may have fixed this bug on Windows platforms by building the application with Visual Studio 2005 instead of Visual Studio 2003.
    ____________

    Profile UBT - Halifax--lad

    Joined: Feb 15 06
    Posts: 29
    ID: 7
    Credit: 2,723
    RAC: 0
    Message 28 - Posted 16 Feb 2006 8:16:09 UTC - in response to Message 4.

      A known bug of Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted." This bug affects users who are involved in multiple BOINC projects and do not leave applications in memory.

      We may have fixed this bug on Windows platforms by building the application with Visual Studio 2005 instead of Visual Studio 2003.


      Indeed you seem to have done so. I had to reset my computer halfway through a WU to install some updates.

      BOINC took the WU out of memory, which it wasn't supposed to, but I had forgotten to set that option in the first place.

      When I came back on and BOINC loaded, it just carried on from where it had left off.

      ____________

      genes

      Joined: Feb 16 06
      Posts: 45
      ID: 57
      Credit: 43,300
      RAC: 0
      Message 38 - Posted 16 Feb 2006 13:22:41 UTC

        I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations to either test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully.

        OK, so "school" is going to have "leave in memory" OFF, all others will have it ON. Otherwise, "school" will be like "home". Good thing I can use the same resource shares for these two. Then, of course, I have to visit all the projects and update on all their web sites, or stuff will be hopelessly confused.

        ____________

        KWSN Sir Clark

        Joined: Feb 16 06
        Posts: 4
        ID: 62
        Credit: 21
        RAC: 0
        Message 77 - Posted 16 Feb 2006 21:37:46 UTC

          One of mine got unceremoniously ditched from memory when I was allowing another project to download more work.......it errored out, even though it was set to remain in memory.
          ____________


          www.chris-kent.co.uk aka Chief.com

          Profile [B^S] Doug Worrall

          Joined: Feb 16 06
          Posts: 10
          ID: 220
          Credit: 1,515
          RAC: 0
          Message 84 - Posted 16 Feb 2006 22:30:30 UTC


            As a Linux user, running a Rosetta WU is a "don't quit BOINC" issue, and I was
            hoping this bug will be fixed by Ralph. Presently, with memory set to "saved" in
            the General Preferences, if I "quit" a Rosetta WU by rebooting (quitting BOINC),
            the Rosetta WU is fubarred 70% of the time.
            Still waiting on some WUs to crunch.
            \"Salude\"
            Sluger

            ____________

            Dimitris Hatzopoulos

            Joined: Feb 16 06
            Posts: 31
            ID: 303
            Credit: 2,308
            RAC: 0
            Message 87 - Posted 16 Feb 2006 22:59:28 UTC

              I wonder how exactly the process of "removing app from memory" is handled by BOINC and the science app.

              Would e.g. Rosetta lose any data it computed since its last "checkpoint" (writing temporary results to disk every x minutes or y progress)?

              I know I could look at the source of some open-source science app like SETI, but ... I thought I'd save a bit of time asking :-)
              ____________

              genes

              Joined: Feb 16 06
              Posts: 45
              ID: 57
              Credit: 43,300
              RAC: 0
              Message 108 - Posted 17 Feb 2006 2:37:13 UTC

                Last modified: 17 Feb 2006 2:41:30 UTC

                OK, the machine on which I had set "leave in memory" to OFF had an error on the one WU that it got:

                http://ralph.bakerlab.org/result.php?resultid=1666

                It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW (no new work) on that machine for now so I won't lose any more work.

                This is the machine BTW:

                http://ralph.bakerlab.org/show_host_detail.php?hostid=76

                ____________

                Moderator9
                Forum moderator

                Joined: Feb 16 06
                Posts: 251
                ID: 210
                Credit: 0
                RAC: 0
                Message 117 - Posted 17 Feb 2006 4:32:52 UTC - in response to Message 108.

                  Last modified: 17 Feb 2006 4:47:49 UTC

                  OK, the machine on which I had set "leave in memory" to OFF had an error on the one WU that it got:

                  http://ralph.bakerlab.org/result.php?resultid=1666

                  It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW (no new work) on that machine for now so I won't lose any more work.

                  This is the machine BTW:

                  http://ralph.bakerlab.org/show_host_detail.php?hostid=76


                  For the purposes of the ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.

                  Certainly the time spent processing for Ralph is valuable in testing the next generation of Rosetta applications, but credit is not a priority for the testing. The more Work Units you can process, the better for the test. For Ralph project details, please see this thread.

                  ____________
                  Moderator9
                  RALPH@home FAQs
                  RALPH@home Guidelines
                  Moderator Contact

                  genes

                  Joined: Feb 16 06
                  Posts: 45
                  ID: 57
                  Credit: 43,300
                  RAC: 0
                  Message 170 - Posted 18 Feb 2006 0:46:09 UTC - in response to Message 117.


                    For the purposes of the ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.


                    Yes, I agree. I expect to lose Ralph WUs, but I didn't want to ruin Rosetta WUs, so I am not allowing that machine to get any more Rosetta work for the duration of the test. I now have a new Ralph WU on that machine, but with a 4.84 app version. What's new in 4.84?

                    ____________

                    Profile Contact

                    Joined: Feb 16 06
                    Posts: 10
                    ID: 74
                    Credit: 92,601
                    RAC: 0
                    Message 171 - Posted 18 Feb 2006 0:46:48 UTC - in response to Message 4.

                      A known bug of Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted."

                      Seems OK for me on XP. Even after an OS reboot, WUs resume properly and are valid.
                      Will try on 98 soon.
                      ____________


                      genes

                      Joined: Feb 16 06
                      Posts: 45
                      ID: 57
                      Credit: 43,300
                      RAC: 0
                      Message 179 - Posted 18 Feb 2006 2:47:15 UTC

                        Had another WU crash, report here:

                        http://ralph.bakerlab.org/forum_thread.php?id=2#178

                        ____________

                        Moderator9
                        Forum moderator

                        Joined: Feb 16 06
                        Posts: 251
                        ID: 210
                        Credit: 0
                        RAC: 0
                        Message 188 - Posted 18 Feb 2006 4:26:19 UTC - in response to Message 170.

                          Last modified: 18 Feb 2006 4:30:04 UTC


                          For the purposes of the ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.


                          Yes, I agree. I expect to lose Ralph WUs, but I didn't want to ruin Rosetta WUs, so I am not allowing that machine to get any more Rosetta work for the duration of the test. I now have a new Ralph WU on that machine, but with a 4.84 app version. What's new in 4.84?


                          Frankly, changes are happening so fast now that I do not know what went into the minor update. Perhaps David Kim will chime in on that. But don't be afraid of destroying Work Units. If you beat up a few, the project can learn from that.
                          ____________

                          Dimitris Hatzopoulos

                          Joined: Feb 16 06
                          Posts: 31
                          ID: 303
                          Credit: 2,308
                          RAC: 0
                          Message 189 - Posted 18 Feb 2006 4:31:07 UTC

                            Last modified: 18 Feb 2006 4:33:43 UTC

                            Any suggestions on the kinds of stress tests we should try on RALPH WUs, to "speed things up"? Any recommended settings? I have the number of hours to run set to 4. Is there a point in reducing it even more (if one doesn't care about the download overheads) to get more WU samples? Or in reducing "Switch between applications every" to 30 min (from 60), again to "force" more removals from memory?

                            Also, is there a phase in Rosetta's progress (e.g. <10% progress) in which a WU is more susceptible to the dreaded "Computation error", due to checkpointing or whatever?

                            Every time a user manually requests an update, BOINC does a request_reschedule_cpus, which removes currently running apps from memory and resumes or starts others. So one can manually force multiple removals from memory without having to wait 60 min.
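
A rough sketch of the arithmetic behind the switch-interval question, written as an editorial illustration rather than as BOINC's real scheduling logic. It assumes (hypothetically) that a WU is preempted every time it has used one full switch interval of CPU time; the function name `removals_per_wu` is made up for this example.

```python
def removals_per_wu(cpu_hours, switch_minutes):
    """Estimate how often a WU is removed from memory before it finishes.

    Assumption (not BOINC's real scheduler): the WU is preempted each time
    it has accumulated one full switch interval of CPU time.
    """
    if switch_minutes <= 0:
        raise ValueError("switch_minutes must be positive")
    total_minutes = cpu_hours * 60
    # Ceiling division: number of time slices needed to finish the WU.
    slices = -(-total_minutes // switch_minutes)
    # The last slice ends with the WU finishing, not with a preemption.
    return max(int(slices) - 1, 0)


if __name__ == "__main__":
    for interval in (60, 30):
        print(interval, "min switch:", removals_per_wu(4, interval), "removals")
```

Under that assumption, halving the interval from 60 to 30 minutes roughly doubles the number of removals a 4-hour WU goes through, which is the effect the post is hoping for.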
                            ____________

                            Profile Angus

                            Joined: Feb 17 06
                            Posts: 10
                            ID: 385
                            Credit: 1,007
                            RAC: 0
                            Message 226 - Posted 18 Feb 2006 17:41:42 UTC

                              http://ralph.bakerlab.org/result.php?resultid=3669

                              Nothing in log for almost 30 minutes prior to error. Not task switching.

                              NOT left in memory. Just crashed.
                              ____________

                              genes

                              Joined: Feb 16 06
                              Posts: 45
                              ID: 57
                              Credit: 43,300
                              RAC: 0
                              Message 247 - Posted 18 Feb 2006 22:13:57 UTC

                                I have a 4.85 WU now. Are these new changes for the "Leave In Memory = NO" bug?

                                ____________

                                Profile dekim
                                Forum moderator
                                Project administrator
                                Project developer
                                Project scientist

                                Joined: Jan 20 06
                                Posts: 210
                                ID: 1
                                Credit: 469,656
                                RAC: 1,937
                                Message 276 - Posted 19 Feb 2006 1:08:25 UTC

                                  The recent app update had a few fixes in the CPU run time code. We should continue to test for the leave-in-memory bug to get an idea of what fraction of computers are actually having this problem. I am going to update the production R@h application soon since the success rates so far look better.

                                  We are still seeing a few of the "0xffffffffc0000005" crashes, and I am not sure if they are all due to preemption or also include random crashes that are common on Windows platforms.

                                  The major change for Windows was switching to Visual Studio 2005 from 2003. There were some significant compiler fixes, particularly with optimization, and we were hoping that the change would produce a more stable build. It has definitely fixed some other issues we were having with specific types of experiments, which were not affecting results and science but were showing some unexpected but benign behaviour. The optimized Windows build with VS2005 now produces results that are very consistent with the Linux build given the same random seed.
                                  ____________

                                  genes

                                  Joined: Feb 16 06
                                  Posts: 45
                                  ID: 57
                                  Credit: 43,300
                                  RAC: 0
                                  Message 284 - Posted 19 Feb 2006 1:45:47 UTC

                                    Thanks for the info. :-)
                                    ____________

                                    Dimitris Hatzopoulos

                                    Joined: Feb 16 06
                                    Posts: 31
                                    ID: 303
                                    Credit: 2,308
                                    RAC: 0
                                    Message 407 - Posted 21 Feb 2006 2:54:03 UTC

                                      Last modified: 21 Feb 2006 2:55:56 UTC

                                      Can we now test the newest "production" R@H (Win/v4.82 and Linux/v4.81) executables with "Leave preempted app in mem"=NO?

                                      Otherwise, we still can't test RALPH (for this particular bug) and still run Rosetta@Home on the same PC, as suggested in the RALPH FAQ.


                                      ____________

                                      Profile dekim
                                      Forum moderator
                                      Project administrator
                                      Project developer
                                      Project scientist

                                      Joined: Jan 20 06
                                      Posts: 210
                                      ID: 1
                                      Credit: 469,656
                                      RAC: 1,937
                                      Message 408 - Posted 21 Feb 2006 3:40:51 UTC

                                        Yes, the applications are now equivalent.
                                        ____________

                                        River~~

                                        Joined: Feb 20 06
                                        Posts: 20
                                        ID: 637
                                        Credit: 503
                                        RAC: 0
                                        Message 430 - Posted 21 Feb 2006 17:08:23 UTC

                                          hi David,

                                          a similar question based around the keep in memory issue.

                                          Am I right that where a machine is turned off daily, it would be useful to have the CPU time set long enough to force every WU to experience at least one power cycle? So with the machine left on for 7 hrs/day, I'd set the CPU time well over 7 hrs, for example.

                                          River~~
                                          ____________

                                          Aglarond

                                          Joined: Feb 16 06
                                          Posts: 11
                                          ID: 184
                                          Credit: 1,094
                                          RAC: 0
                                          Message 470 - Posted 22 Feb 2006 12:53:02 UTC

                                            Hi, is switching between projects the same problem as my PC going to standby? As I have a laptop running Rosetta, I usually go to standby when I want to take it elsewhere. And Rosetta (also Ralph) always crashes like this:

                                            22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file
                                            22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project.
                                            22. 2. 2006 13:53:32||Rescheduling CPU: application exited
                                            22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486

                                            This may help you find the problem: when I go to standby mode and then wake up my laptop after a very short time (5 sec), Rosetta continues normally and the graphics window also continues as if nothing happened. But if I leave it in standby just a little longer (15 sec), Rosetta crashes and the graphics window closes. The same happens with some other BOINC projects.
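
For anyone collecting such failures, here is a minimal Python sketch that scans message-log text for the "exited with zero status" line quoted above. It assumes the `timestamp|project|message` layout seen in those four lines; the function name `find_unfinished_exits` and the regular expression are illustrative and not part of BOINC.

```python
import re

# Sample taken from the log lines quoted in the post above.
SAMPLE_LOG = """\
22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file
22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project.
22. 2. 2006 13:53:32||Rescheduling CPU: application exited
22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486
"""

# Assumed message wording, copied from the sample; other client versions may differ.
EXIT_PATTERN = re.compile(
    r"^Result (\S+) exited with zero status but no 'finished' file$"
)


def find_unfinished_exits(log_text):
    """Return (timestamp, project, result_name) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # not in the assumed three-field layout
        timestamp, project, message = parts
        match = EXIT_PATTERN.match(message.strip())
        if match:
            hits.append((timestamp, project, match.group(1)))
    return hits


if __name__ == "__main__":
    for hit in find_unfinished_exits(SAMPLE_LOG):
        print(hit)
```

Counting these hits per result name gives a quick view of which WUs restarted after a standby or a project switch.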
                                            ____________

                                            Moderator9
                                            Forum moderator

                                            Joined: Feb 16 06
                                            Posts: 251
                                            ID: 210
                                            Credit: 0
                                            RAC: 0
                                            Message 477 - Posted 22 Feb 2006 14:41:41 UTC - in response to Message 470.

                                              Last modified: 22 Feb 2006 14:44:50 UTC

                                              Hi, is switching between projects the same problem as my PC going to standby? As I have a laptop running Rosetta, I usually go to standby when I want to take it elsewhere. And Rosetta (also Ralph) always crashes like this:

                                              22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file
                                              22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project.
                                              22. 2. 2006 13:53:32||Rescheduling CPU: application exited
                                              22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486

                                              This may help you find the problem: when I go to standby mode and then wake up my laptop after a very short time (5 sec), Rosetta continues normally and the graphics window also continues as if nothing happened. But if I leave it in standby just a little longer (15 sec), Rosetta crashes and the graphics window closes. The same happens with some other BOINC projects.


                                              Sleep or standby mode is actually very different from an application swap. However, most laptops do not crash projects when they sleep. While activity suspends just as you might expect, the system should snapshot the application and then sleep. It looks as though your system is having some kind of problem reloading after sleep. It could be caused by a number of things, but it is not likely a Ralph issue.

                                              I assume your battery is fully charged, but are you certain it is in good condition? If not, this can cause a sleeping system to crash. Try running the BOINC system on battery power for a half hour or so and see if the system fails.

                                              ____________

                                              Aglarond

                                              Joined: Feb 16 06
                                              Posts: 11
                                              ID: 184
                                              Credit: 1,094
                                              RAC: 0
                                              Message 513 - Posted 23 Feb 2006 1:59:47 UTC - in response to Message 477.


                                                Sleep or standby mode is actually very different from an application swap. However, most laptops do not crash projects when they sleep. While activity suspends just as you might expect, the system should snapshot the application and then sleep. It looks as though your system is having some kind of problem reloading after sleep. It could be caused by a number of things, but it is not likely a Ralph issue.

                                                I assume your battery is fully charged, but are you certain it is in good condition? If not, this can cause a sleeping system to crash. Try running the BOINC system on battery power for a half hour or so and see if the system fails.


                                                Hmm... no other apps have ever had problems with it, only BOINC projects. But it could still be a problem with my laptop. I tried it running on AC power and also on battery power, both with the same result.

                                                ____________

                                                Aglarond

                                                Joined: Feb 16 06
                                                Posts: 11
                                                ID: 184
                                                Credit: 1,094
                                                RAC: 0
                                                Message 514 - Posted 23 Feb 2006 2:30:18 UTC - in response to Message 513.

                                                  I have now looked into the WU that was running when I tried to switch apps in BOINC (without leaving them in memory) and also when I put my laptop into standby. This is part of its output:

                                                  <stderr_txt>
                                                  ...
                                                  No heartbeat from core client for 31 sec - exiting
                                                  ...
                                                  </stderr_txt>

                                                  Do you think this could be the reason why Rosetta exits after my system wakes up from standby? It doesn't exit when I wake up my laptop within just a few seconds. The behavior is similar with other BOINC projects.
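
The stderr line suggests a heartbeat timeout: the science app exits when it has not heard from the core client for too long. Below is a toy Python model of that idea, offered as a sketch only. The 30-second threshold is inferred from the "31 sec" in the message above, and the class and function names are hypothetical; this is not BOINC's actual code.

```python
# Assumed threshold, inferred from "No heartbeat from core client for 31 sec".
HEARTBEAT_TIMEOUT_SEC = 30


class HeartbeatWatchdog:
    """Toy model of a science app that exits when the client goes silent."""

    def __init__(self, timeout_sec=HEARTBEAT_TIMEOUT_SEC):
        self.timeout_sec = timeout_sec
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        """Record that the core client was heard from at time `now`."""
        self.last_heartbeat = now

    def should_exit(self, now):
        """True if the last heartbeat is older than the timeout."""
        return (now - self.last_heartbeat) > self.timeout_sec


def survives_standby(standby_sec, timeout_sec=HEARTBEAT_TIMEOUT_SEC):
    """Return True if the app keeps running after a standby of this length.

    Assumption: wall-clock time jumps forward by `standby_sec` on wake-up,
    and the app checks the heartbeat age before the client sends a new one.
    """
    dog = HeartbeatWatchdog(timeout_sec)
    dog.heartbeat(now=100.0)
    wake_time = 100.0 + standby_sec
    return not dog.should_exit(now=wake_time)


if __name__ == "__main__":
    for seconds in (5, 31, 600):
        print(seconds, "sec standby, survives:", survives_standby(seconds))
```

In this model a 5-second standby is harmless while anything past the timeout ends the task. The 15-second case reported earlier would fit only if entering and leaving standby adds enough wall-clock time to cross the threshold, which is a guess rather than something the log confirms.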
                                                  ____________

                                                  tgm

                                                  Joined: Feb 19 06
                                                  Posts: 5
                                                  ID: 598
                                                  Credit: 1,066
                                                  RAC: 0
                                                  Message 558 - Posted 24 Feb 2006 6:06:30 UTC

                                                    Removing Rosetta beta 4.87 work units from memory on one of my Windows machines is definitely FAILING with end state "client error". This machine is a DUAL PROCESSOR P3 750 w/ 512MB RAM running Windows Server 2003.

                                                    I have three examples:

                                                    http://ralph.bakerlab.org/workunit.php?wuid=5559
                                                    http://ralph.bakerlab.org/workunit.php?wuid=5560
                                                    http://ralph.bakerlab.org/workunit.php?wuid=5561

                                                    I have now switched my configuration to keep WUs in memory and performed an update. We'll see what happens.

                                                    Curiously, I have another WU running on a Fedora box that is showing some other bizarre behavior, but I'll start a new post for that one.

                                                    Colin Porter

                                                    Joined: Feb 16 06
                                                    Posts: 3
                                                    ID: 283
                                                    Credit: 24
                                                    RAC: 0
                                                    Message 620 - Posted 25 Feb 2006 14:00:01 UTC

                                                      YOU MAY HAVE CRACKED IT.

                                                      Until today I have not been able to complete a WU with "Leave applications in memory while preempted" set to either YES or NO. As soon as a switch occurred, for whatever reason, the WU would error out. The difference today is that Ralph is running 4.89.

                                                      My Results

                                                      ____________

                                                      Dimitris Hatzopoulos

                                                      Joined: Feb 16 06
                                                      Posts: 31
                                                      ID: 303
                                                      Credit: 2,308
                                                      RAC: 0
                                                      Message 649 - Posted 25 Feb 2006 19:21:13 UTC - in response to Message 558.

                                                        Removing Rosetta beta 4.87 work units from memory on one of my Windows machines is definitely FAILING with end state "client error". This machine is a DUAL PROCESSOR P3 750 w/ 512MB RAM running Windows Server 2003.

                                                        I have now switched my configuration to keep WUs in memory and performed an update. We'll see what happens.

                                                        Curiously, I have another WU running on a Fedora box that is showing some other bizarre behavior, but I'll start a new post for that one.


                                                        I think this is the case when a slower machine (P3/750) takes too long to complete the first model and it gets pre-empted and removed from RAM / VM before even the first checkpoint is reached.

                                                        In which case you need to keep in RAM while pre-empted and/or increase times between app switching to a higher value from default 60min, to e.g. 4hr in your case.
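
To make that hypothesis concrete, here is a small Python simulation, a sketch under stated assumptions rather than Rosetta's real checkpoint logic. It assumes one checkpoint at the end of each model, a preemption at the end of every switch interval, and a restart from the last checkpoint when the app is not left in memory. The function name and all parameters are hypothetical.

```python
def minutes_to_finish(model_minutes, models_needed, switch_minutes,
                      leave_in_memory, max_slices=1000):
    """CPU minutes used until `models_needed` models are done, or None if stuck.

    Assumptions: a checkpoint is written only when a model completes, every
    slice of `switch_minutes` ends with a preemption, and without
    leave-in-memory the app restarts from its last checkpoint.
    """
    done = 0      # models completed, i.e. safely checkpointed
    partial = 0   # minutes of progress on the current model
    cpu = 0       # total CPU minutes spent so far
    for _ in range(max_slices):
        remaining = switch_minutes
        while remaining > 0:
            need = model_minutes - partial
            if need <= remaining:
                # The current model finishes inside this slice.
                remaining -= need
                cpu += need
                partial = 0
                done += 1
                if done == models_needed:
                    return cpu
            else:
                # The slice ends in the middle of a model.
                partial += remaining
                cpu += remaining
                remaining = 0
        if not leave_in_memory:
            partial = 0  # progress since the last checkpoint is lost
    return None  # never finished within max_slices


if __name__ == "__main__":
    # A slow machine: 90 minutes per model, 60-minute switch interval.
    print(minutes_to_finish(90, 2, 60, leave_in_memory=True))
    print(minutes_to_finish(90, 2, 60, leave_in_memory=False))
    print(minutes_to_finish(90, 2, 240, leave_in_memory=False))
```

With a model that takes longer than the switch interval, the run without leave-in-memory never reaches a checkpoint in this model, while leaving the app in memory or lengthening the interval to 4 hours lets it finish. The sketch only covers lost progress; it does not explain an outright crash.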
                                                        ____________

                                                        tgm

                                                        Joined: Feb 19 06
                                                        Posts: 5
                                                        ID: 598
                                                        Credit: 1,066
                                                        RAC: 0
                                                        Message 691 - Posted 27 Feb 2006 3:42:37 UTC - in response to Message 649.

                                                          I think this is the case when a slower machine (P3/750) takes too long to complete the first model and it gets pre-empted and removed from RAM / VM before even the first checkpoint is reached.

                                                          In which case you need to keep in RAM while pre-empted and/or increase times between app switching to a higher value from default 60min, to e.g. 4hr in your case.


                                                          I sort of doubt this is the case. I know one of the WUs got up to more than 60% before it crashed.

                                                          Dimitris Hatzopoulos

                                                          Joined: Feb 16 06
                                                          Posts: 31
                                                          ID: 303
                                                          Credit: 2,308
                                                          RAC: 0
                                                          Message 701 - Posted 27 Feb 2006 10:10:08 UTC - in response to Message 691.

                                                            I sort of doubt this is the case. I know one of the WUs got up to more than 60% before it crashed.


                                                            Due to the way "new" Rosetta WUs work (a variable number of models during a fixed time period, e.g. 8 hr), you might want to focus more on the Model / Step statistic, rather than % progress.

                                                            In that regard, the WU stderr output provided isn't very helpful for remote diagnostics. In my case, I got errors similar to yours (for R@h, not RALPH) on a machine which had multiple reboots over the previous 3 days, due to power problems.
                                                            ____________

                                                            Moderator9
                                                            Forum moderator

                                                            Joined: Feb 16 06
                                                            Posts: 251
                                                            ID: 210
                                                            Credit: 0
                                                            RAC: 0
                                                            Message 705 - Posted 27 Feb 2006 17:19:52 UTC - in response to Message 701.

                                                              Last modified: 27 Feb 2006 17:26:43 UTC

                                                              I sort of doubt this is the case. I know one of the wu's got up to more than 60% before it crashed.


                                                              Due to the way "new" Rosetta WUs work (a variable number of models during a fixed time period, e.g. 8 hr), you might want to focus more on the Model / Step statistic rather than % progress.

                                                              In that regard, the WU stderr output provided isn't very helpful for remote diagnostics. In my case, I got errors similar to yours (for R@h, not RALPH) on a machine which had multiple reboots over the previous 3 days, due to power problems.


                                                              You are correct about this. So much so that the explanation of all of this has been updated in the Rosetta FAQs in this post (I will do it here when there is some time).

                                                              Look below the green edit line for information specific to the 1% diagnostic info. Some of this might help remote diagnostics as well, but that is a specialized issue. If you can use the remote functions in BOINC you could still use the graphic.

                                                              ____________
                                                              Moderator9
                                                              RALPH@home FAQs
                                                              RALPH@home Guidelines
                                                              Moderator Contact

                                                              Aaron Finney

                                                              Joined: Feb 16 06
                                                              Posts: 56
                                                              ID: 98
                                                              Credit: 1,457
                                                              RAC: 0
                                                              Message 875 - Posted 14 Mar 2006 16:21:02 UTC - in response to Message 4.

                                                                Last modified: 14 Mar 2006 16:21:17 UTC

                                                                Had a problem with this on a workunit that had run for 60 hours, application version 4.92:

                                                                3/13/2006 7:40:03 PM||Suspending computation and network activity - user request
                                                                3/13/2006 7:40:03 PM|climateprediction.net|Pausing result sulphur_id14_000856696_0 (removed from memory)
                                                                3/13/2006 7:40:03 PM|ralph@home|Pausing result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 (removed from memory)
                                                                3/13/2006 7:40:04 PM|ralph@home|Unrecoverable error for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 ( - exit code -1073741819 (0xc0000005))
                                                                3/13/2006 7:40:04 PM||request_reschedule_cpus: process exited
                                                                3/13/2006 7:40:04 PM|ralph@home|Computation for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 finished
                                                                3/13/2006 7:40:05 PM||request_reschedule_cpus: process exited
                                                                3/13/2006 7:40:07 PM||Resuming computation and network activity
                                                                3/13/2006 7:40:07 PM||request_reschedule_cpus: Resuming activities
                                                                3/13/2006 7:40:07 PM||Allowing work fetch again.
                                                                3/13/2006 7:40:07 PM||Resuming round-robin CPU scheduling.
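
                                                                The negative exit code in the log above is the signed view of a 32-bit Windows status value, which is why BOINC prints both forms. A minimal sketch of the conversion, assuming Python; the helper name decode_exit_code and the small lookup table are illustrative and not part of BOINC. The three names are the standard NTSTATUS names for codes reported in this thread.

```python
# Hypothetical helper (not part of BOINC): convert the signed exit code
# printed in the BOINC message log to its unsigned 32-bit hex form and,
# where known, the standard NTSTATUS name.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}


def decode_exit_code(code: int) -> tuple[str, str]:
    """Return (hex string, name) for a signed 32-bit exit code."""
    unsigned = code & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    return f"0x{unsigned:08x}", NTSTATUS_NAMES.get(unsigned, "unknown")


if __name__ == "__main__":
    for code in (-1073741819, -1073741811, -1073741502, -164):
        print(code, *decode_exit_code(code))
```

                                                                Codes that are not in the table, such as -164, come back as "unknown"; that one does not look like an NTSTATUS value and is probably set by the application or by BOINC itself.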

                                                                ____________

                                                                Profile dekim
                                                                Forum moderator
                                                                Project administrator
                                                                Project developer
                                                                Project scientist

                                                                Joined: Jan 20 06
                                                                Posts: 210
                                                                ID: 1
                                                                Credit: 469,656
                                                                RAC: 1,937
                                                                Message 883 - Posted 16 Mar 2006 18:29:37 UTC

                                                                  The current Windows application has a fix that we want to test for this issue. The last batch of work units has a default CPU run time of 8 hours. Please let us know whether Windows app version 4.93 still crashes when BOINC switches to another app and it is not left in memory, or whether the fix helps.
                                                                  ____________

                                                                  [B^S] sTrey

                                                                  Joined: Feb 15 06
                                                                  Posts: 58
                                                                  ID: 36
                                                                  Credit: 15,430
                                                                  RAC: 0
                                                                  Message 887 - Posted 16 Mar 2006 23:07:20 UTC - in response to Message 38.

                                                                    I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations to either test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully.


                                                                    Duh, thanks genes for pointing out the fact that different venues, few as they are, can be used in this way, even with the same host. With one machine and multiple projects I wasn't going to change my memory settings for this test, but on seeing this I reconfigured to help out. It also alleviates a bit of the strain on my box's vmem since I'm running cpdn's seasonal attribution project and it's quite a hog.

                                                                    Aglarond

                                                                    Joined: Feb 16 06
                                                                    Posts: 11
                                                                    ID: 184
                                                                    Credit: 1,094
                                                                    RAC: 0
                                                                    Message 888 - Posted 17 Mar 2006 0:14:09 UTC - in response to Message 887.

                                                                      It also alleviates a bit of the strain on my box's vmem since I'm running cpdn's seasonal attribution project and it's quite a hog.


                                                                      Careful with cpdn's seasonal attribution project. This is from their forums:
                                                                      If you have the option 'remove from memory' when preempting, and the boinc default of 1 hour between swapping, the chances are that you have thrown away the model each time you preempt. This project's defaults are 2 hours and 'keep in memory' for obvious reasons.
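
                                                                      To put rough numbers on that warning, here is a minimal sketch, assuming Python and assuming checkpoints are written at evenly spaced intervals (real applications usually checkpoint at model boundaries, so this is a simplification). The function name work_kept and the figures are illustrative only.

```python
def work_kept(run_minutes: float, checkpoint_minutes: float) -> float:
    """Minutes of work that survive if the app is unloaded after run_minutes.

    Simplifying assumption: a checkpoint is written every checkpoint_minutes,
    and anything computed after the last checkpoint is lost on unload.
    """
    if checkpoint_minutes <= 0:
        raise ValueError("checkpoint_minutes must be positive")
    completed_checkpoints = int(run_minutes // checkpoint_minutes)
    return completed_checkpoints * checkpoint_minutes


if __name__ == "__main__":
    # Switch every 60 min, first checkpoint only after 90 min: nothing is kept.
    print(work_kept(60, 90))
    # Switch every 120 min with the same checkpoint spacing: 90 min are kept.
    print(work_kept(120, 90))
```

                                                                      With "leave applications in memory" enabled the question does not arise, because the process is only suspended and never has to restart from a checkpoint.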

                                                                      ____________

                                                                      scottLobster

                                                                      Joined: Feb 17 06
                                                                      Posts: 1
                                                                      ID: 427
                                                                      Credit: 826
                                                                      RAC: 0
                                                                      Message 889 - Posted 17 Mar 2006 0:36:17 UTC - in response to Message 883.

                                                                        The current Windows application has a fix that we want to test for this issue. The last batch of work units has a default CPU run time of 8 hours. Please let us know whether Windows app version 4.93 still crashes when BOINC switches to another app and it is not left in memory, or whether the fix helps.


                                                                        Just did a few switches between Rosetta and Ralph with leave in memory disabled. Seems to work fine. Rosetta didn't crash either. I'll leave it like this overnight and see what happens.

                                                                        ____________

                                                                        [B^S] sTrey

                                                                        Joined: Feb 15 06
                                                                        Posts: 58
                                                                        ID: 36
                                                                        Credit: 15,430
                                                                        RAC: 0
                                                                        Message 890 - Posted 17 Mar 2006 2:45:03 UTC - in response to Message 888.

                                                                          Last modified: 17 Mar 2006 2:46:37 UTC

                                                                          Careful with cpdn's seasonal attribution project. This is from their forums:
                                                                          If you have the option 'remove from memory' when preempting, and the boinc default of 1 hour between swapping, the chances are that you have thrown away the model each time you preempt. This project's defaults are 2 hours and 'keep in memory' for obvious reasons.


                                                                          Thanks for the warning. I keep all my projects in memory and will continue to do so with everything except this project during this test. Just happy to have it pointed out that I can use venues to have one project get tossed from memory on suspend, and the rest left in.

                                                                          OTOH I'm not sure it's working. I added prefs for "school" and changed my computer to that venue, then did an update and saw the new venue message. My Ralph wu had not yet run. However it's since run for 2 hrs and been suspended, but rosetta beta is still in memory.

                                                                          p.s. I keep meaning to take out the sig but can't edit it out once posted, I'll go change my default.

                                                                          ____________

                                                                          Stargazer257

                                                                          Joined: Feb 16 06
                                                                          Posts: 6
                                                                          ID: 116
                                                                          Credit: 17,492
                                                                          RAC: 0
                                                                          Message 892 - Posted 17 Mar 2006 6:26:26 UTC

                                                                            Last modified: 17 Mar 2006 6:29:10 UTC

                                                                            So far, so good. Have run about 10 WUs on five different hosts (all WinXP SP2). No problems while changing settings to not stay resident in memory, and none so far with applications switching in and out. Knock on wood....
                                                                            ____________



                                                                            [B^S] sTrey

                                                                            Joined: Feb 15 06
                                                                            Posts: 58
                                                                            ID: 36
                                                                            Credit: 15,430
                                                                            RAC: 0
                                                                            Message 895 - Posted 17 Mar 2006 16:22:37 UTC - in response to Message 892.

                                                                              Last modified: 17 Mar 2006 16:43:58 UTC

                                                                              So Aglarond was right to warn me.
                                                                              I added separate prefs for "school" and changed my computer's venue on this project only, and updated, hoping to have Ralph removed from memory when suspended but everything else stay resident. Overnight all my projects were removed from memory, not just Ralph. [Even though it reported the venue correctly per project.] So apparently one can't fool around claiming one computer is in two places at once... Ralph has behaved fine so far, for the 6 hours it's run, but I have switched back to keeping everything in memory.

                                                                              KB7RZF

                                                                              Joined: Feb 16 06
                                                                              Posts: 7
                                                                              ID: 49
                                                                              Credit: 1,426
                                                                              RAC: 0
                                                                              Message 896 - Posted 17 Mar 2006 18:17:40 UTC

                                                                                Did some playing around with just RALPH running. I changed prefs to take everything out of memory, I exited BOINC, restarted, suspended, rebooted, everything I could think of, and so far RALPH has not errored out on me. Seems to be working well so far.

                                                                                Jeremy

                                                                                doc :)

                                                                                Joined: Feb 16 06
                                                                                Posts: 46
                                                                                ID: 60
                                                                                Credit: 4,437
                                                                                RAC: 0
                                                                                Message 900 - Posted 18 Mar 2006 1:52:19 UTC

                                                                                  no crash through removing from memory here so far either (changed my prefs for rosetta to 1h workunits and put my app switch time to 90 minutes to avoid removing rosettas from memory :))
                                                                                  i still get random crashes when i do have the graphics open though (the exit code -1073741811 (0xc000000d) thing)

                                                                                  [B^S] sTrey

                                                                                  Joined: Feb 15 06
                                                                                  Posts: 58
                                                                                  ID: 36
                                                                                  Credit: 15,430
                                                                                  RAC: 0
                                                                                  Message 909 - Posted 19 Mar 2006 0:57:20 UTC

                                                                                    Last modified: 19 Mar 2006 0:59:19 UTC

                                                                                    FWIW Ralph had behaved fine both when swapped and not, but it didn't survive a PC restart forced by a Windows lockup. It was not the active project at the time, chkdsk found nothing scrambled, and none of the other projects lost their work (even cpdn seasonal!) -- but the Ralph WU, which was at hour 14 of 16, has restarted at zero. Bummer.

                                                                                    Profile Fuzzy Hollynoodles

                                                                                    Joined: Feb 19 06
                                                                                    Posts: 37
                                                                                    ID: 585
                                                                                    Credit: 2,089
                                                                                    RAC: 0
                                                                                    Message 914 - Posted 19 Mar 2006 7:12:48 UTC

                                                                                      Last modified: 19 Mar 2006 7:14:25 UTC

                                                                                      Rosetta crashed BIG time!

                                                                                      3/19/2006 8:07:15 AM|rosetta@home|Pausing result HOMSti_homDB019_1tif__352_1732_1 (removed from memory)
                                                                                      3/19/2006 8:07:15 AM|SETI@home Beta Test|Restarting result 01jl01ab.16610.114.798576.3.175_4 using setiathome_enhanced version 506
                                                                                      3/19/2006 8:07:16 AM||Rescheduling CPU: project op

                                                                                      ...

                                                                                      3/19/2006 8:07:24 AM|rosetta@home|Unrecoverable error for result HOMSti_homDB019_1tif__352_1732_1 ( - exit code -164 (0xffffff5c))
                                                                                      3/19/2006 8:07:24 AM||Rescheduling CPU: process exited
                                                                                      3/19/2006 8:07:24 AM|rosetta@home|Computation for result HOMSti_homDB019_1tif__352_1732_1 finished

                                                                                      This WU: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=10786875
                                                                                      Result: http://boinc.bakerlab.org/rosetta/result.php?resultid=13549302

                                                                                      I see though that this WU has crashed for somebody else, so maybe it's a coincidence? Though I don't think it is.

                                                                                      The Ralph WU runs fine. I've tried to force it to run by suspending the others and then resuming them, so that the Ralph WU is preempted, and no crashes (so far).


                                                                                      ____________

                                                                                      "I'm trying to maintain a shred of dignity in this world." - Me

                                                                                      Marky-UK

                                                                                      Joined: Feb 16 06
                                                                                      Posts: 5
                                                                                      ID: 115
                                                                                      Credit: 1,530
                                                                                      RAC: 0
                                                                                      Message 917 - Posted 19 Mar 2006 12:02:40 UTC

                                                                                        Rosetta's just crashed for me too: http://ralph.bakerlab.org/workunit.php?wuid=17490

                                                                                        Unrecoverable error for result HB_BARCODE_30_1enh__352_83_0 ( - exit code -1073741811 (0xc000000d))
                                                                                        ____________

                                                                                        Profile Fuzzy Hollynoodles

                                                                                        Joined: Feb 19 06
                                                                                        Posts: 37
                                                                                        ID: 585
                                                                                        Credit: 2,089
                                                                                        RAC: 0
                                                                                        Message 918 - Posted 19 Mar 2006 16:31:09 UTC

                                                                                          It happened again.

                                                                                          3/19/2006 5:36:43 PM|rosetta@home|Pausing result HB_BARCODE_30_1bq9A_351_14302_0 (removed from memory)
                                                                                          3/19/2006 5:36:44 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1bq9A_351_14302_0 ( - exit code -164 (0xffffff5c))
                                                                                          3/19/2006 5:36:44 PM||Rescheduling CPU: process exited
                                                                                          3/19/2006 5:36:44 PM|rosetta@home|Computation for result HB_BARCODE_30_1bq9A_351_14302_0 finished
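
                                                                                          For anyone who wants to pull the failures out of a long message log, here is a minimal sketch, assuming Python and assuming the pipe-separated "timestamp|project|message" layout of the lines pasted above. The names ERROR_RE and find_crashes are illustrative and not part of BOINC.

```python
import re

# Matches the error text exactly as it appears in the pasted log lines.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def find_crashes(lines):
    """Return (timestamp, project, result name, exit code) for each crash."""
    crashes = []
    for line in lines:
        parts = line.strip().split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # not in the assumed three-field layout
        timestamp, project, message = parts
        match = ERROR_RE.search(message)
        if match:
            crashes.append(
                (timestamp, project, match.group("result"), int(match.group("code")))
            )
    return crashes


if __name__ == "__main__":
    sample = [
        "3/19/2006 5:36:43 PM|rosetta@home|Pausing result HB_BARCODE_30_1bq9A_351_14302_0 (removed from memory)",
        "3/19/2006 5:36:44 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1bq9A_351_14302_0 ( - exit code -164 (0xffffff5c))",
        "3/19/2006 5:36:44 PM||Rescheduling CPU: process exited",
    ]
    for crash in find_crashes(sample):
        print(crash)
```

                                                                                          Logs copied in other layouts (for example with the project name in square brackets) would need a different split step.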

                                                                                          Rosetta WU: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11537937
                                                                                          Result: http://boinc.bakerlab.org/rosetta/result.php?resultid=14251499

                                                                                          So this is it, I'm changing back to keeping WUs in memory while preempted until you get this bug fixed. Otherwise you devs should say that we can't crunch Rosetta and Ralph WUs on the same computer!



                                                                                          ____________

                                                                                          "I'm trying to maintain a shred of dignity in this world." - Me

                                                                                          Profile Contact

                                                                                          Joined: Feb 16 06
                                                                                          Posts: 10
                                                                                          ID: 74
                                                                                          Credit: 92,601
                                                                                          RAC: 0
                                                                                          Message 930 - Posted 20 Mar 2006 1:14:21 UTC

                                                                                            Looks good. No matter what I do, I can't get Ralph to fail under Win98 or XP while switching with the app removed from memory.
                                                                                            ____________


                                                                                            Mike Gelvin

                                                                                            Joined: Feb 17 06
                                                                                            Posts: 50
                                                                                            ID: 468
                                                                                            Credit: 55,397
                                                                                            RAC: 0
                                                                                            Message 938 - Posted 22 Mar 2006 0:28:26 UTC - in response to Message 918.

                                                                                              It happened again.

                                                                                              3/19/2006 5:36:43 PM|rosetta@home|Pausing result HB_BARCODE_30_1bq9A_351_14302_0 (removed from memory)
                                                                                              3/19/2006 5:36:44 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1bq9A_351_14302_0 ( - exit code -164 (0xffffff5c))
                                                                                              3/19/2006 5:36:44 PM||Rescheduling CPU: process exited
                                                                                              3/19/2006 5:36:44 PM|rosetta@home|Computation for result HB_BARCODE_30_1bq9A_351_14302_0 finished

                                                                                              Rosetta WU: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11537937
                                                                                              Result: http://boinc.bakerlab.org/rosetta/result.php?resultid=14251499

                                                                                              So this is it, I'm changing back to keeping WUs in memory while preempted until you get this bug fixed. Otherwise you devs should say that we can't crunch Rosetta and Ralph WUs on the same computer!




                                                                                              You can crunch Ralph and Rosetta on the same computer. But if you don’t leave the apps in memory, then Rosetta (from the Rosetta project, NOT Ralph) does indeed have a good chance of failing. They have not yet updated Rosetta with the Ralph “leave in memory” fix.
                                                                                              When Rosetta gets fixed, I suspect you will see an app version greater than or equal to 4.93.

                                                                                              ____________

                                                                                              Big Whiskey

                                                                                              Joined: Mar 21 06
                                                                                              Posts: 3
                                                                                              ID: 1157
                                                                                              Credit: 3,342
                                                                                              RAC: 0
                                                                                              Message 943 - Posted 22 Mar 2006 3:14:52 UTC

                                                                                                My crash was BIGGER!!!

                                                                                                When I opened BOINC manager it said it was running Rosetta, and I tried to open Show Graphics. I caught a brief look at the SETI screen before it closed itself. I tried a few more times until it didn't open at all.

                                                                                                So I did a restart without suspending BOINC. Wrong move, apparently!
                                                                                                When BOINC started again I had lost several WUs: one Seti, one Rosetta and two Seti Betas. And with what was left, BOINC would run each one for a minute and then switch to the next.
                                                                                                I checked the message log file and found that five minutes before I restarted, Rosetta and Ralph had started switching WUs every second for some reason.

                                                                                                All the units that I lost have the same error code:

                                                                                                2006-03-21 17:29:11 [ralph@home] Unrecoverable error for result HB_BARCODE_30_1fna__352_134_0 ( - exit code -1073741502 (0xc0000142))
                                                                                                2006-03-21 17:29:12 [rosetta@home] Unrecoverable error for result FA_RLXti_hom027_1tit__362_302_0 ( - exit code -1073741502 (0xc0000142))
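
For anyone reading these log lines: the negative exit codes are the signed 32-bit view of the hexadecimal Windows NTSTATUS values shown in parentheses. Below is a minimal Python sketch of that conversion, assuming the codes reported in this thread. The helper names (`NTSTATUS_NAMES`, `to_ntstatus`, `describe`) are made up for illustration and are not part of BOINC or Rosetta; the status names are the standard Windows ones for these three values.

```python
# Hypothetical lookup table covering only the codes reported in this thread.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}


def to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code the way the BOINC log does, plus a status name."""
    status = to_ntstatus(exit_code)
    name = NTSTATUS_NAMES.get(status, "unknown")
    return f"{exit_code} = 0x{status:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741502, -1073741811, -1073741819):
        print(describe(code))
```

The name only says how the process died; it does not by itself show which part of the application is at fault.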





                                                                                                ____________

                                                                                                Profile Greg C. TNO

                                                                                                Joined: Mar 26 06
                                                                                                Posts: 1
                                                                                                ID: 1186
                                                                                                Credit: 51,485
                                                                                                RAC: 0
                                                                                                Message 1003 - Posted 28 Mar 2006 2:53:29 UTC

                                                                                                  I just started running Ralph on several machines. So far NOT A SINGLE ONE has crashed, stalled or pulled the 1% trick. I have it set to remove applications from memory and have set Rosetta to no new work. The reliability of this version (4.92) seems great.

                                                                                                  If you're curious, I'm running older equipment and Win2k Pro. Right now I have Ralph on a slew of PIII 600Es and an AMD Athlon XP 1500. I could be more specific if it is relevant.

                                                                                                  Regards.
                                                                                                  ____________

                                                                                                  david baker

                                                                                                  Joined: Mar 25 06
                                                                                                  Posts: 3
                                                                                                  ID: 1180
                                                                                                  Credit: 411
                                                                                                  RAC: 0
                                                                                                  Message 1004 - Posted 28 Mar 2006 4:16:41 UTC - in response to Message 1003.

                                                                                                    I just started running Ralph on several machines. So far NOT A SINGLE ONE has crashed, stalled or pulled the 1% trick. I have it set to remove applications from memory and have set Rosetta to no new work. The reliability of this version (4.92) seems great.

                                                                                                    If you're curious, I'm running older equipment and Win2k Pro. Right now I have Ralph on a slew of PIII 600Es and an AMD Athlon XP 1500. I could be more specific if it is relevant.

                                                                                                    Regards.


                                                                                                    That is great! What are other people finding?

                                                                                                    ____________

                                                                                                    Rayflic

                                                                                                    Joined: Feb 16 06
                                                                                                    Posts: 2
                                                                                                    ID: 269
                                                                                                    Credit: 2,886
                                                                                                    RAC: 0
                                                                                                    Message 1039 - Posted 7 Apr 2006 22:00:10 UTC

                                                                                                      A couple of problems (today)

                                                                                                      4/6/2006 10:46:16 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1hz6_420_2926_0 ( - exit code -1073741811 (0xc000000d))


                                                                                                      4/7/2006 12:55:47 PM|ralph@home|Unrecoverable error for result BARCODE_30_1ten__NATIVE_374_23_0 ( - exit code -1073741811 (0xc000000d))

                                                                                                      ____________

                                                                                                      Profile [AF>France>Est>Lorraine]Le Zam

                                                                                                      Joined: Mar 2 06
                                                                                                      Posts: 9
                                                                                                      ID: 929
                                                                                                      Credit: 3,278
                                                                                                      RAC: 0
                                                                                                      Message 1083 - Posted 12 Apr 2006 8:42:03 UTC

                                                                                                        Hello, I got this error today at 1%:
                                                                                                        12/04/2006 10:52:10|ralph@home|Unrecoverable error for result HBLR_1.0_1b72_378_92_1 ( - exit code -1073741819 (0xc0000005))
                                                                                                        Go ahead.
                                                                                                        Thanks, and have fun with Ralph!!!
                                                                                                        ____________

                                                                                                        Profile [AF>France>Est>Lorraine]Le Zam

                                                                                                        Joined: Mar 2 06
                                                                                                        Posts: 9
                                                                                                        ID: 929
                                                                                                        Credit: 3,278
                                                                                                        RAC: 0
                                                                                                        Message 1100 - Posted 12 Apr 2006 19:11:54 UTC

                                                                                                          Another couple of bad work units!!!

                                                                                                          12/04/2006 16:41:38|ralph@home|Unrecoverable error for result HBLR_1.0_1ogw_377_17_2 ( - exit code -1073741819 (0xc0000005))

                                                                                                          12/04/2006 17:25:40|ralph@home|Unrecoverable error for result HBLR_1.0_1r69_378_63_2 ( - exit code -1073741819 (0xc0000005))

                                                                                                          Bye

                                                                                                          ____________

                                                                                                          Profile [AF>France>Est>Lorraine]Le Zam

                                                                                                          Joined: Mar 2 06
                                                                                                          Posts: 9
                                                                                                          ID: 929
                                                                                                          Credit: 3,278
                                                                                                          RAC: 0
                                                                                                          Message 1127 - Posted 13 Apr 2006 16:04:15 UTC

                                                                                                            13/04/2006 13:31:58|ralph@home|Starting result FACONTACTS_NOFILTERS_1c9oA_381_9_0 using rosetta_beta version 499
                                                                                                            13/04/2006 13:53:31|ralph@home|Unrecoverable error for result FACONTACTS_NOFILTERS_1c9oA_381_9_0 ( - exit code -1073741819 (0xc0000005))
                                                                                                            13/04/2006 13:53:31||request_reschedule_cpus: process exited
                                                                                                            13/04/2006 13:53:31|ralph@home|Computation for result FACONTACTS_NOFILTERS_1c9oA_381_9_0 finished

                                                                                                            13/04/2006 18:17:40|ralph@home|Unrecoverable error for result FACONTACTS_NOFILTERS_1cc8A_381_9_0 (aborted via GUI RPC)
                                                                                                            I aborted this WU: it had reached only 1.31% after 2 h 48 min.
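
To see how long a result ran before it failed, the pipe-delimited log lines above can be parsed directly. This is a rough Python sketch, assuming the `timestamp|project|message` layout and the day/month/year dates used in this post; other clients in this thread log in different date formats, so the format string would need adjusting. The function names `parse_line` and `time_to_failure` are hypothetical.

```python
from datetime import datetime

# Two lines copied from the log excerpt in this post.
LOG = """\
13/04/2006 13:31:58|ralph@home|Starting result FACONTACTS_NOFILTERS_1c9oA_381_9_0 using rosetta_beta version 499
13/04/2006 13:53:31|ralph@home|Unrecoverable error for result FACONTACTS_NOFILTERS_1c9oA_381_9_0 ( - exit code -1073741819 (0xc0000005))
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = line.split("|", 2)
    # Assumption: dates are day/month/year, as in this post.
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def time_to_failure(log_text, result_name):
    """Return the time between 'Starting result' and 'Unrecoverable error'."""
    started = failed = None
    for line in log_text.splitlines():
        if result_name not in line:
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Starting result"):
            started = when
        elif message.startswith("Unrecoverable error"):
            failed = when
    if started is None or failed is None:
        return None  # the log does not contain both events
    return failed - started


if __name__ == "__main__":
    print(time_to_failure(LOG, "FACONTACTS_NOFILTERS_1c9oA_381_9_0"))
```

The result is wall-clock time between the two log entries, not CPU time, so it includes any period the task spent preempted.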

                                                                                                            ____________


                                                                                                            Copyright © 2017 University of Washington

                                                                                                            Last Modified: 20 Nov 2008 19:41:56 UTC