Switching between projects with applications removed from memory

Message boards : Current tests : Switching between projects with applications removed from memory



Aglarond

Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 470 - Posted: 22 Feb 2006, 12:53:02 UTC

Hi, is switching between projects the same problem as putting my PC into standby? I have a laptop running Rosetta, and I usually put it into standby when I want to take it elsewhere. When I do, Rosetta (and also Ralph) always crashes like this:

22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file
22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project.
22. 2. 2006 13:53:32||Rescheduling CPU: application exited
22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486

This may help you find the problem: when I put the laptop into standby and wake it up after a very short time (5 sec), Rosetta continues normally, and the graphics window also continues as if nothing happened. But if I leave it in standby just a little longer (15 sec), Rosetta crashes and the graphics window closes. The same happens with some other BOINC projects.
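
For anyone who wants to scan a longer message log for this failure, here is a minimal Python sketch. It assumes only what the lines quoted above show: each line is `timestamp|project|message`, with an empty project field for client-wide messages. The names `LogLine`, `parse_log_line` and `crashed_results` are made up for illustration and are not part of BOINC.

```python
from typing import NamedTuple


class LogLine(NamedTuple):
    timestamp: str
    project: str  # empty string for client-wide messages
    message: str


def parse_log_line(line: str) -> LogLine:
    # Split on the first two pipes only, so a pipe inside the message survives.
    timestamp, project, message = line.split("|", 2)
    return LogLine(timestamp.strip(), project.strip(), message.strip())


def crashed_results(lines):
    """Return names of results reported as exiting without a 'finished' file."""
    marker = "exited with zero status but no 'finished' file"
    names = []
    for raw in lines:
        entry = parse_log_line(raw)
        if entry.message.startswith("Result ") and marker in entry.message:
            # The message reads "Result <name> exited with ...".
            names.append(entry.message.split()[1])
    return names
```

Feeding it the four lines above would pick out the one BARCODE result; the date format is left as a plain string because it differs between client locales.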
ID: 470
Moderator9
Volunteer moderator

Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 477 - Posted: 22 Feb 2006, 14:41:41 UTC - in response to Message 470.  
Last modified: 22 Feb 2006, 14:44:50 UTC

Sleep or standby mode is actually very different from an application swap. However, most laptops do not crash projects when they sleep. While activity suspends just as you might expect, the system should snapshot the application and then sleep. It looks as though your system is having some kind of problem reloading after sleep. It could be caused by a number of things, but it is not likely a Ralph issue.

I assume your battery is fully charged, but are you certain it is in good condition? If not, this can cause a sleeping system to crash. Try running the BOINC system on battery power for a half hour or so and see if the system fails.

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact
ID: 477
Aglarond

Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 513 - Posted: 23 Feb 2006, 1:59:47 UTC - in response to Message 477.  


Hmm... no other apps have ever had problems with standby, only BOINC projects. Still, it could be a problem with my laptop. I tried it running on AC power and also on battery power, both with the same result.

ID: 513
Aglarond

Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 514 - Posted: 23 Feb 2006, 2:30:18 UTC - in response to Message 513.  

Now I have looked into the WU that was running when I tried to switch apps in BOINC (without leaving them in memory) and also when I put my laptop into standby. This is part of it:

<stderr_txt>
...
No heartbeat from core client for 31 sec - exiting
...
</stderr_txt>

Do you think this could be the reason why Rosetta exits after my system wakes up from standby? It doesn't exit when I wake up my laptop after just a few seconds. The behavior is similar with other BOINC projects.
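
One plausible reading of that stderr line, sketched in Python. This is a guess at the mechanism, not the actual BOINC code: the application remembers when it last heard from the core client and exits once the wall-clock gap passes a timeout. The 30-second value is only inferred from the "31 sec" in the message, and `HeartbeatWatchdog` and `exits_after_gap` are hypothetical names.

```python
HEARTBEAT_TIMEOUT_SEC = 30.0  # assumed threshold, inferred from the "31 sec" message


class HeartbeatWatchdog:
    """Tracks the last heartbeat time and decides whether the app should exit."""

    def __init__(self, timeout_sec=HEARTBEAT_TIMEOUT_SEC, now=0.0):
        self.timeout_sec = timeout_sec
        self.last_heartbeat = now

    def heartbeat(self, now):
        # Called whenever the core client signals that it is still alive.
        self.last_heartbeat = now

    def should_exit(self, now):
        # Wall-clock time keeps advancing while the machine sleeps,
        # so a standby shows up here as one large gap.
        return now - self.last_heartbeat > self.timeout_sec


def exits_after_gap(gap_sec, timeout_sec=HEARTBEAT_TIMEOUT_SEC):
    """True if the app would exit when it checks the clock gap_sec after the
    last heartbeat, before the client has managed to send a fresh one."""
    watchdog = HeartbeatWatchdog(timeout_sec, now=0.0)
    return watchdog.should_exit(now=gap_sec)
```

In this model a gap of a few seconds is harmless and a gap over the timeout is fatal. The gap the application sees may be longer than the time the lid was closed, since suspend and resume themselves take time, so the sketch does not try to reproduce the exact 5 sec / 15 sec figures.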
ID: 514
tgm

Joined: 19 Feb 06
Posts: 5
Credit: 1,066
RAC: 0
Message 558 - Posted: 24 Feb 2006, 6:06:30 UTC

Removing rosetta beta 4.87 work units from memory on one of my windows machines is definitely FAILING with end state client error. This machine is a DUAL PROCESSOR P3 750 w/ 512MB ram running on Windows Server 2003.

I have three examples:

https://ralph.bakerlab.org/workunit.php?wuid=5559
https://ralph.bakerlab.org/workunit.php?wuid=5560
https://ralph.bakerlab.org/workunit.php?wuid=5561

I have now switched my configuration to keep wu's in memory and performed an update. We'll see what happens.

Curiously, I have another wu running on a Fedora box that is showing some other bizarre behavior, but I'll start a new post for that one.
ID: 558
Colin Porter

Joined: 16 Feb 06
Posts: 3
Credit: 24
RAC: 0
Message 620 - Posted: 25 Feb 2006, 14:00:01 UTC

YOU MAY HAVE CRACKED IT.

Until today I have not been able to complete a WU with "Leave applications in memory while preempted" set to either YES or NO. As soon as a switch occurred, for whatever reason, the WU would error out. The difference today is that Ralph is running 4.89.

My Results

ID: 620
Dimitris Hatzopoulos

Joined: 16 Feb 06
Posts: 31
Credit: 2,308
RAC: 0
Message 649 - Posted: 25 Feb 2006, 19:21:13 UTC - in response to Message 558.  

I think this is a case where a slower machine (P3/750) takes too long to complete the first model, so the WU gets pre-empted and removed from RAM / VM before even the first checkpoint is reached.

In that case you need to keep applications in RAM while pre-empted and/or increase the time between application switches from the default 60 min to a higher value, e.g. 4 hr in your case.
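
To put rough numbers on that, here is a small Python sketch. It assumes the application checkpoints only when a model finishes, and that a WU removed from memory restarts from its last checkpoint. The 90 minutes per model is a made-up figure for a slow machine, and `cpu_minutes_retained` is a hypothetical helper, not anything in BOINC.

```python
def cpu_minutes_retained(slice_min, model_min, slices, leave_in_memory=False):
    """CPU minutes of work that survive after `slices` scheduling slices.

    slice_min       -- minutes the app runs before BOINC switches away
    model_min       -- minutes needed to finish one model (= one checkpoint)
    leave_in_memory -- if True, nothing is lost on a switch
    """
    if leave_in_memory:
        return slice_min * slices
    retained = 0
    for _ in range(slices):
        # Each slice resumes from the last checkpoint; the partly finished
        # model is thrown away when the app is removed from memory.
        completed_models = slice_min // model_min
        retained += completed_models * model_min
    return retained


# Default 60 min switch interval, 90 min per model: nothing is ever kept.
print(cpu_minutes_retained(60, 90, slices=4))
# A single 4 hr slice: two full models (180 min) are kept.
print(cpu_minutes_retained(240, 90, slices=1))
```

Under these assumptions the default 60 min interval never lets a 90 min model reach its checkpoint, which is why either a longer switch interval or keeping the application in memory helps.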
ID: 649
tgm

Joined: 19 Feb 06
Posts: 5
Credit: 1,066
RAC: 0
Message 691 - Posted: 27 Feb 2006, 3:42:37 UTC - in response to Message 649.  

I sort of doubt this is the case. I know one of the wu's got up to more than 60% before it crashed.
ID: 691
Dimitris Hatzopoulos

Joined: 16 Feb 06
Posts: 31
Credit: 2,308
RAC: 0
Message 701 - Posted: 27 Feb 2006, 10:10:08 UTC - in response to Message 691.  

I sort of doubt this is the case. I know one of the wu's got up to more than 60% before it crashed.


Due to the way "new" Rosetta WUs work (variable # Models during a fixed time period e.g. 8hr), you might want to focus more on the Model / Step statistic, rather than % progress.

In that regard, the WU stderr output provided isn't very helpful for remote diagnostics. In my case, I got errors similar to yours (for R@h, not RALPH) on a machine which had multiple reboots over the previous 3 days due to power problems.
ID: 701
Moderator9
Volunteer moderator

Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 705 - Posted: 27 Feb 2006, 17:19:52 UTC - in response to Message 701.  
Last modified: 27 Feb 2006, 17:26:43 UTC

You are correct about this. So much so that the explanation of all of this has been updated in the Rosetta FAQs in this post (I will do it here when there is some time).

Look below the green edit line for information specific to the 1% diagnostic info. Some of this might help remote diagnostics as well, but that is a specialized issue. If you can use the remote functions in BOINC you could still use the graphic.

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact
ID: 705
Aaron Finney

Joined: 16 Feb 06
Posts: 56
Credit: 1,457
RAC: 0
Message 875 - Posted: 14 Mar 2006, 16:21:02 UTC - in response to Message 4.  
Last modified: 14 Mar 2006, 16:21:17 UTC

Had a problem with this on a workunit that had run for 60 hours, application version 4.92:

3/13/2006 7:40:03 PM||Suspending computation and network activity - user request
3/13/2006 7:40:03 PM|climateprediction.net|Pausing result sulphur_id14_000856696_0 (removed from memory)
3/13/2006 7:40:03 PM|ralph@home|Pausing result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 (removed from memory)
3/13/2006 7:40:04 PM|ralph@home|Unrecoverable error for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 ( - exit code -1073741819 (0xc0000005))
3/13/2006 7:40:04 PM||request_reschedule_cpus: process exited
3/13/2006 7:40:04 PM|ralph@home|Computation for result TEST_HOMOLOG_ABINITIO_hom008_1fna__220_3_2 finished
3/13/2006 7:40:05 PM||request_reschedule_cpus: process exited
3/13/2006 7:40:07 PM||Resuming computation and network activity
3/13/2006 7:40:07 PM||request_reschedule_cpus: Resuming activities
3/13/2006 7:40:07 PM||Allowing work fetch again.
3/13/2006 7:40:07 PM||Resuming round-robin CPU scheduling.
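
The negative exit code and the hex value in that log are the same 32-bit number, once read as signed and once as unsigned. A minimal Python sketch of the conversion follows; `describe_exit_code` is a made-up helper, and the lookup table covers only the two Windows NTSTATUS codes that appear in this thread.

```python
# Standard Windows NTSTATUS values; only the two seen in this thread are listed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def describe_exit_code(signed_code: int) -> str:
    # Reinterpret the signed 32-bit exit code as unsigned.
    unsigned = signed_code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_code(-1073741819))
```

So -1073741819 is an access violation, i.e. the application touched memory it should not have, which points at the application rather than at the BOINC scheduler.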

ID: 875
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 883 - Posted: 16 Mar 2006, 18:29:37 UTC

The current windows application has a fix that we want to test for this issue. The last batch of work units have default cpu run times of 8 hours. Please let us know if the windows app version 4.93 continues to crash when switching to another app and not left in memory or if the fix helps.
ID: 883
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 887 - Posted: 16 Mar 2006, 23:07:20 UTC - in response to Message 38.  

I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations to either test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully.


Duh, thanks genes for pointing out the fact that different venues, few as they are, can be used in this way, even with the same host. With one machine and multiple projects I wasn't going to change my memory settings for this test, but on seeing this I reconfigured to help out. It also alleviates a bit of the strain on my box's vmem since I'm running cpdn's seasonal attribution project and it's quite a hog.
ID: 887
Aglarond

Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 888 - Posted: 17 Mar 2006, 0:14:09 UTC - in response to Message 887.  

It also alleviates a bit of the strain on my box's vmem since I'm running cpdn's seasonal attribution project and it's quite a hog.


Be careful with cpdn's seasonal attribution project. This is from their forums:
If you have the option 'remove from memory' when preempting, and the boinc default of 1 hour between swapping, the chances are that you have thrown away the model each time you preempt. This project's defaults are 2 hours and 'keep in memory' for obvious reasons.

ID: 888
scottLobster

Joined: 17 Feb 06
Posts: 1
Credit: 826
RAC: 0
Message 889 - Posted: 17 Mar 2006, 0:36:17 UTC - in response to Message 883.  

The current windows application has a fix that we want to test for this issue. The last batch of work units have default cpu run times of 8 hours. Please let us know if the windows app version 4.93 continues to crash when switching to another app and not left in memory or if the fix helps.


Just did a few switches between Rosetta and Ralph with leave in memory disabled. Seems to work fine. Rosetta didn't crash either. I'll leave it like this overnight and see what happens.

ID: 889
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 890 - Posted: 17 Mar 2006, 2:45:03 UTC - in response to Message 888.  
Last modified: 17 Mar 2006, 2:46:37 UTC


Thanks for the warning. I keep all my projects in memory and will continue to do so with everything except this project during this test. Just happy to have it pointed out that I can use venues to have one project get tossed from memory on suspend, and the rest left in.

OTOH I'm not sure it's working. I added prefs for "school" and changed my computer to that venue, then did an update and saw the new venue message. My Ralph wu had not yet run. However it's since run for 2 hrs and been suspended, but rosetta beta is still in memory.

p.s. I keep meaning to take out the sig but can't edit it out once posted, I'll go change my default.

ID: 890
Stargazer257

Joined: 16 Feb 06
Posts: 6
Credit: 17,492
RAC: 0
Message 892 - Posted: 17 Mar 2006, 6:26:26 UTC
Last modified: 17 Mar 2006, 6:29:10 UTC

So far, so good. Have run about 10 WUs on five different hosts (all WinXP SP2). No problems while changing settings to not stay resident in memory, and none so far with applications switching in and out. Knock on wood....


ID: 892
[B^S] sTrey

Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 895 - Posted: 17 Mar 2006, 16:22:37 UTC - in response to Message 892.  
Last modified: 17 Mar 2006, 16:43:58 UTC

So Aglarond was right to warn me.
I added separate prefs for "school", changed my computer's venue on this project only, and updated, hoping to have Ralph removed from memory when suspended but everything else stay resident. Overnight all my projects were removed from memory, not just Ralph [even though it reported the venue correctly per project]. So apparently one can't fool around claiming one computer is in two places at once... Ralph has behaved fine so far, for the 6 hours it's run, but I have switched back to keeping everything in memory.
ID: 895
KB7RZF

Joined: 16 Feb 06
Posts: 7
Credit: 1,426
RAC: 0
Message 896 - Posted: 17 Mar 2006, 18:17:40 UTC

Did some playing around with just RALPH running. I changed prefs to take everything out of memory, then exited BOINC, restarted, suspended, rebooted, everything I could think of, and so far RALPH has not errored out on me. Seems to be working well so far.

Jeremy
ID: 896
doc :)

Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 900 - Posted: 18 Mar 2006, 1:52:19 UTC

no crash through removing from memory here so far either (changed my prefs for rosetta to 1h workunits and put my app switch time to 90 minutes to avoid removing rosettas from memory :))
i still get random crashes when i do have the graphics open though (the exit code -1073741811 (0xc000000d) thing)
ID: 900




©2024 University of Washington
http://www.bakerlab.org