Message boards : Current tests : Switching between projects with applications removed from memory
Author | Message |
---|---|
Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
Hi, is switching between projects the same problem as putting my PC into standby? I run Rosetta on a laptop, and I usually put it into standby when I want to take it elsewhere. Rosetta (and also RALPH) always crashes like this:

    22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file
    22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project.
    22. 2. 2006 13:53:32||Rescheduling CPU: application exited
    22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486

This may help you find the problem: when I put the laptop into standby and wake it up after a very short time (5 sec), Rosetta continues normally and the graphics window carries on as if nothing happened. But if I leave it in standby just a little longer (15 sec), Rosetta crashes and the graphics window closes. The same happens with some other BOINC projects. |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Aglarond wrote: "Hi, is switching between projects the same problem as with going to standby with my PC? As I have laptop running rosetta, I usually go to standby when I want to take it elsewhere. And Rosetta (also Ralph) always crashes like this:"

Sleep or standby mode is actually very different from an application swap. Most laptops do not crash projects when they sleep: activity suspends just as you might expect, and the system should snapshot the application and then sleep. It looks as though your system is having some kind of problem reloading after sleep. This could be caused by a number of things, but it is not likely a RALPH issue. I assume your battery is fully charged, but are you certain it is in good condition? If not, this can cause a sleeping system to crash. Try running BOINC on battery power for half an hour or so and see if the system fails. Moderator9 RALPH@home FAQs RALPH@home Guidelines Moderator Contact |
Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
Hmm... no other applications have ever had problems with it, only BOINC projects. Still, it could be a problem with my laptop. I tried it both on AC power and on battery power, with the same result. |
Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
Now I looked at the WU that was running when I switched apps in BOINC (without leaving them in memory) and also when I put my laptop into standby. This is part of its output:

    <stderr_txt>
    ...
    No heartbeat from core client for 31 sec - exiting
    ...
    </stderr_txt>

Do you think this could be why Rosetta exits after my system wakes up from standby? It doesn't exit when I wake the laptop after just a few seconds. Other BOINC projects behave similarly. |
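The "No heartbeat from core client" message suggests that the science application watches for periodic signals from the BOINC client and quits when they stop arriving. Below is a minimal Python sketch of such a watchdog. The class and function names are hypothetical, and the 30-second threshold is an assumption inferred from the "31 sec" line above, not taken from the BOINC source; the real client/API logic may measure the gap differently (for example, by counting missed timer ticks).

```python
from dataclasses import dataclass

# Assumed threshold, inferred from "No heartbeat from core client for 31 sec".
HEARTBEAT_TIMEOUT_S = 30.0


@dataclass
class HeartbeatWatchdog:
    """Hypothetical watchdog: remembers when the client last signalled the app."""

    last_heartbeat: float
    timeout: float = HEARTBEAT_TIMEOUT_S

    def on_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat arrives from the client.
        self.last_heartbeat = now

    def should_exit(self, now: float) -> bool:
        # Give up only when the silence is strictly longer than the timeout.
        return now - self.last_heartbeat > self.timeout


def survives_standby(standby_s: float) -> bool:
    """Simulate a standby gap with no heartbeats, checked right after wake-up."""
    dog = HeartbeatWatchdog(last_heartbeat=0.0)
    return not dog.should_exit(now=standby_s)


if __name__ == "__main__":
    for gap in (5, 31):
        status = "keeps running" if survives_standby(gap) else "exits"
        print(f"{gap} s without heartbeat -> {status}")
```

In this toy model a 5-second gap stays under the threshold and a 31-second gap does not; it does not by itself account for the 15-second standby case reported above.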
tgm Joined: 19 Feb 06 Posts: 5 Credit: 1,066 RAC: 0 |
Removing rosetta beta 4.87 work units from memory on one of my Windows machines is definitely FAILING with end state "client error". This machine is a dual-processor P3 750 with 512 MB RAM running Windows Server 2003. I have three examples:

https://ralph.bakerlab.org/workunit.php?wuid=5559
https://ralph.bakerlab.org/workunit.php?wuid=5560
https://ralph.bakerlab.org/workunit.php?wuid=5561

I have now switched my configuration to keep WUs in memory and performed an update. We'll see what happens. Curiously, I have another WU running on a Fedora box that is showing some other bizarre behavior, but I'll start a new post for that one. |
Colin Porter Joined: 16 Feb 06 Posts: 3 Credit: 24 RAC: 0 |
YOU MAY HAVE CRACKED IT. Until today I had not been able to complete a WU with "Leave applications in memory while preempted" set to either YES or NO. As soon as a switch occurred, for whatever reason, the WU would error out. The difference today is that RALPH is running 4.89. My Results |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
tgm wrote: "Removing rosetta beta 4.87 work units from memory on one of my windows machines is definitely FAILING with end state client error. This machine is a DUAL PROCESSOR P3 750 w/ 512MB ram running on Windows Server 2003."

I think this happens when a slower machine (P3/750) takes too long to complete the first model, so the task is preempted and removed from RAM / VM before it even reaches its first checkpoint. In that case you need to keep applications in RAM while preempted and/or raise the time between application switches from the default 60 minutes to something higher, e.g. 4 hours in your case. |
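The reasoning above can be checked with a little arithmetic. This sketch assumes, hypothetically, that a task writes a checkpoint only when it finishes a model, that a task removed from memory restarts from its last checkpoint, and that each scheduling slice gives it the full switch interval of CPU time. The 90-minute model time is an illustrative value, not a measurement.

```python
import math


def models_per_slice(switch_min: float, model_min: float) -> int:
    """Models that can finish (and be checkpointed) within one slice."""
    return math.floor(switch_min / model_min)


def lost_per_slice(switch_min: float, model_min: float) -> float:
    """Minutes of work discarded when the task is removed from memory at the
    end of a slice, i.e. the time spent on the unfinished model."""
    return switch_min - models_per_slice(switch_min, model_min) * model_min


if __name__ == "__main__":
    model_min = 90.0  # hypothetical time for one model on a slow machine
    for switch_min in (60.0, 240.0):
        done = models_per_slice(switch_min, model_min)
        lost = lost_per_slice(switch_min, model_min)
        print(f"switch every {switch_min:.0f} min: {done} model(s) saved, "
              f"{lost:.0f} min lost per slice")
```

Under these assumptions, the default 60-minute interval never saves a model and redoes the same work every slice, while a 4-hour interval saves two models per slice. Keeping the application in memory avoids discarding the in-progress model at all.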
tgm Joined: 19 Feb 06 Posts: 5 Credit: 1,066 RAC: 0 |
Dimitris Hatzopoulos wrote: "I think this is the case when a slower machine (P3/750) takes too long to complete the first model and it gets pre-empted and removed from RAM / VM before even the first checkpoint is reached."

I sort of doubt this is the case. I know one of the WUs got past 60% before it crashed. |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
tgm wrote: "I sort of doubt this is the case. I know one of the wu's got up to more than 60% before it crashed."

Because of the way the "new" Rosetta WUs work (a variable number of models during a fixed time period, e.g. 8 hours), you might want to focus on the Model / Step statistic rather than the % progress. In that regard, the stderr output of the WUs isn't very helpful for remote diagnostics. In my case, I got errors similar to yours (for R@h, not RALPH) on a machine that had been rebooted several times over the previous 3 days because of power problems. |
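To see why the % progress figure says little about saved work, here is a hypothetical sketch that assumes progress is simply elapsed CPU time divided by the target run time (8 hours, as in the example above), while checkpoints happen only at model boundaries. Both model times are made-up values.

```python
import math

TARGET_MIN = 8 * 60  # fixed target run time from the example: 8 hours


def progress(elapsed_min: float, target_min: float = TARGET_MIN) -> float:
    """Fraction shown as progress, assuming it is purely time-based."""
    return min(elapsed_min / target_min, 1.0)


def checkpointed_models(elapsed_min: float, model_min: float) -> int:
    """Models completed (and therefore saved) after elapsed_min minutes."""
    return math.floor(elapsed_min / model_min)


if __name__ == "__main__":
    elapsed = 288  # 60% of 480 minutes
    for model_min in (100, 300):  # hypothetical fast and slow models
        print(f"{progress(elapsed):.0%} done, model time {model_min} min: "
              f"{checkpointed_models(elapsed, model_min)} model(s) saved")
```

Both runs report the same 60%, yet one has two models saved and the other has none, which is why the model count is the more useful number when judging what a preemption costs.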
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
tgm wrote: "I sort of doubt this is the case. I know one of the wu's got up to more than 60% before it crashed."

You are correct about this, so much so that the explanation of all of this has been updated in the Rosetta FAQs in this post (I will do the same here when there is some time). Look below the green edit line for information specific to the 1% diagnostic info. Some of it might help with remote diagnostics as well, but that is a specialized issue: if you can use the remote functions in BOINC, you could still use the graphic. Moderator9 RALPH@home FAQs RALPH@home Guidelines Moderator Contact |
Aaron Finney Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0 |
Had a problem with this on a work unit that had run for 60 hours, application version 4.92:

    3/13/2006 7:40:03 PM||Suspending computation and network activity - user request
    3/13/2006 7:40:03 PM|climateprediction.net|Pausing result sulphur_id14_000856696_0 (removed from memory)
    3/13/2006 7:40:03 PM|ralph@home|Pausing result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 (removed from memory)
    3/13/2006 7:40:04 PM|ralph@home|Unrecoverable error for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 ( - exit code -1073741819 (0xc0000005))
    3/13/2006 7:40:04 PM||request_reschedule_cpus: process exited
    3/13/2006 7:40:04 PM|ralph@home|Computation for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 finished
    3/13/2006 7:40:05 PM||request_reschedule_cpus: process exited
    3/13/2006 7:40:07 PM||Resuming computation and network activity
    3/13/2006 7:40:07 PM||request_reschedule_cpus: Resuming activities
    3/13/2006 7:40:07 PM||Allowing work fetch again.
    3/13/2006 7:40:07 PM||Resuming round-robin CPU scheduling.
|
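The log shows the exit code twice, as a signed 32-bit integer (-1073741819) and in hex (0xc0000005); both are the same 32-bit value. The sketch below does the conversion and names the value using a deliberately tiny, non-exhaustive table of standard Windows NTSTATUS codes. The same arithmetic applies to the -1073741811 (0xc000000d) code mentioned later in the thread.

```python
# A small subset of standard Windows NTSTATUS values (not exhaustive).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe_exit_code(code: int) -> str:
    """Format an exit code the way the log does, plus a name if known."""
    value = to_unsigned32(code)
    name = KNOWN_STATUS.get(value, "unknown status")
    return f"{code} -> 0x{value:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741819, -1073741811):
        print(describe_exit_code(code))
```

Decoding the number only names the kind of failure (an access violation here); it does not say which code path triggered it.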
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
The current Windows application has a fix for this issue that we want to test. The last batch of work units has a default CPU run time of 8 hours. Please let us know whether Windows app version 4.93 still crashes when switching to another app without being left in memory, or whether the fix helps. |
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations, either to test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully. Duh, thanks, genes, for pointing out that different venues, few as they are, can be used in this way, even with the same host. With one machine and multiple projects I wasn't going to change my memory settings for this test, but on seeing this I reconfigured to help out. It also alleviates a bit of the strain on my box's vmem, since I'm running cpdn's seasonal attribution project and it's quite a hog. |
Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
sTrey wrote: "It also alleviates a bit of the strain on my box's vmem since I'm running cpdn's seasonal attribution project and it's quite a hog."

Be careful with cpdn's seasonal attribution project. This is from their forums: "If you have the option 'remove from memory' when preempting, and the BOINC default of 1 hour between swapping, the chances are that you have thrown away the model each time you preempt. This project's defaults are 2 hours and 'keep in memory' for obvious reasons." |
scottLobster Joined: 17 Feb 06 Posts: 1 Credit: 826 RAC: 0 |
dekim wrote: "The current windows application has a fix that we want to test for this issue. The last batch of work units have default cpu run times of 8 hours. Please let us know if the windows app version 4.93 continues to crash when switching to another app and not left in memory or if the fix helps."

Just did a few switches between Rosetta and RALPH with "leave in memory" disabled. It seems to work fine; Rosetta didn't crash either. I'll leave it like this overnight and see what happens. |
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
Aglarond wrote: "Carefully with cpdn's seasonal attribution project. This is from their forums:"

Thanks for the warning. I keep all my projects in memory and will continue to do so with everything except this project during this test. I'm just happy to have it pointed out that I can use venues to have one project removed from memory on suspend while the rest stay in. On the other hand, I'm not sure it's working. I added prefs for "school" and changed my computer to that venue, then did an update and saw the new venue message. My RALPH WU had not yet run. However, it has since run for 2 hours and been suspended, but rosetta beta is still in memory. P.S. I keep meaning to take out the sig, but I can't edit it out once posted; I'll go change my default. |
Stargazer257 Joined: 16 Feb 06 Posts: 6 Credit: 17,492 RAC: 0 |
|
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
So Aglarond was right to warn me. I added separate prefs for "school", changed my computer's venue on this project only, and updated, hoping to have RALPH removed from memory when suspended while everything else stayed resident. Overnight all my projects were removed from memory, not just RALPH (even though the venue was reported correctly per project). So apparently one can't fool BOINC into believing one computer is in two places at once... RALPH has behaved fine so far, for the 6 hours it has run, but I have switched back to keeping everything in memory. |
KB7RZF Joined: 16 Feb 06 Posts: 7 Credit: 1,426 RAC: 0 |
I did some testing with just RALPH running. I changed my prefs to remove everything from memory, then exited BOINC, restarted it, suspended, rebooted, everything I could think of, and so far RALPH has not errored out on me. It seems to be working well so far. Jeremy |
doc :) Joined: 16 Feb 06 Posts: 46 Credit: 4,437 RAC: 0 |
No crashes from removal from memory here so far either (I changed my Rosetta prefs to 1-hour work units and set my app switch time to 90 minutes so Rosetta tasks are not removed from memory :)). I still get random crashes when I have the graphics open, though (the exit code -1073741811 (0xc000000d) thing). |
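This workaround relies on each task finishing within a single time slice. A hypothetical sketch that counts how often a task is switched out, assuming it gets the CPU for the full switch interval each turn and that nothing else (suspends, reboots) interrupts it. It compares 1-hour tasks with a 90-minute switch time against the 8-hour default run time mentioned earlier in the thread under the BOINC default of 60 minutes.

```python
import math


def preemptions(run_min: float, switch_min: float) -> int:
    """Number of times a task is switched out before it finishes,
    assuming every turn on the CPU lasts a full switch interval."""
    turns = math.ceil(run_min / switch_min)
    return max(turns - 1, 0)


if __name__ == "__main__":
    print("1 h task, 90 min switch:", preemptions(60, 90))
    print("8 h task, 60 min switch:", preemptions(8 * 60, 60))
```

With zero preemptions, the "leave in memory" setting never comes into play for those tasks, so this setup sidesteps the problem rather than exercising the fix.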
©2024 University of Washington
http://www.bakerlab.org