Switching between projects with applications removed from memory

Message boards : Current tests : Switching between projects with applications removed from memory

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 4 - Posted: 15 Feb 2006, 21:06:16 UTC

A known bug in Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted." This bug affects users who participate in multiple BOINC projects and do not leave applications in memory.

We may have fixed this bug on Windows platforms by building the application with Visual Studio 2005 instead of Visual Studio 2003.

UBT - Halifax--lad
Joined: 15 Feb 06
Posts: 29
Credit: 2,723
RAC: 0
Message 28 - Posted: 16 Feb 2006, 8:16:09 UTC - in response to Message 4.  

A known bug in Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted." This bug affects users who participate in multiple BOINC projects and do not leave applications in memory.

We may have fixed this bug on Windows platforms by building the application with Visual Studio 2005 instead of Visual Studio 2003.


Indeed, you seem to have done so. I had to restart my computer halfway through a WU to install some updates.

BOINC took the WU out of memory, which it wasn't supposed to do, but I had forgotten to set that option in the first place.

When I came back on and BOINC loaded, it just carried on from where it had left off.

Join us in Chat (see the forum) Click the Sig


Join UBT

genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 38 - Posted: 16 Feb 2006, 13:22:41 UTC

I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations to either test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully.

OK, so "school" will have "leave in memory" OFF and all the others will have it ON. Otherwise, "school" will be like "home". Good thing I can use the same resource shares for these two. Then, of course, I have to visit all the projects and update the settings on each of their websites, or things will get hopelessly confused.


KWSN Sir Clark
Joined: 16 Feb 06
Posts: 4
Credit: 21
RAC: 0
Message 77 - Posted: 16 Feb 2006, 21:37:46 UTC

One of mine got unceremoniously ditched from memory while I was allowing another project to download more work. It errored out, even though it was set to remain in memory.


www.chris-kent.co.uk aka Chief.com

[B^S] Doug Worrall
Joined: 16 Feb 06
Posts: 10
Credit: 1,515
RAC: 0
Message 84 - Posted: 16 Feb 2006, 22:30:30 UTC


As a Linux user, running a Rosetta WU is a "don't quit BOINC" issue for me, and I was hoping this bug would be fixed through RALPH. Presently I have applications set to be "saved" in memory in the General Preferences. If I quit a Rosetta WU by rebooting (quitting BOINC), the Rosetta WU is fubarred 70% of the time.
Still waiting on some WUs to crunch.
"Salude"
Sluger


Dimitris Hatzopoulos
Joined: 16 Feb 06
Posts: 31
Credit: 2,308
RAC: 0
Message 87 - Posted: 16 Feb 2006, 22:59:28 UTC

I wonder how exactly the process of "removing app from memory" is handled by BOINC and science app.

Would, e.g., Rosetta lose any data it had computed since its last "checkpoint" (writing temporary results to disk every x minutes or y progress)?

I know I could look at the source of some open-source science app like SETI, but ... I thought I'd save a bit of time asking :-)
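The general idea behind this question can be illustrated with a minimal checkpoint/restart sketch in Python. It is not Rosetta's or BOINC's actual code: the file name, the JSON format and the `save_checkpoint`/`load_checkpoint`/`run` helpers are hypothetical. It assumes that when an app is removed from memory, its process is killed and later restarted, and it resumes from the last checkpoint on disk, so any work done since that checkpoint is recomputed. (Real BOINC applications coordinate checkpoint timing through API calls such as `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`.)

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Write the checkpoint to a temporary file, then rename it over the old one,
    so an interrupted write never replaces the previous good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, state) from the last checkpoint, or (0, 0) for a fresh start."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return 0, 0
    return data["step"], data["state"]


def run(path, total_steps, checkpoint_every, kill_at=None):
    """Run a toy computation, resuming from the last checkpoint if there is one.

    kill_at (hypothetical) simulates the process being removed from memory at that step.
    Returns (step resumed from, step reached, state reached).
    """
    step, state = load_checkpoint(path)
    resumed_from = step
    while step < total_steps:
        if kill_at is not None and step == kill_at:
            break  # process gone: everything since the last checkpoint is lost
        state += step  # stand-in for one unit of real work
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, state)
    return resumed_from, step, state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint.json")
        print(run(path, 100, 10, kill_at=37))  # (0, 37, 666): last checkpoint was at step 30
        print(run(path, 100, 10))  # (30, 100, 4950): steps 30-36 are recomputed
```

In this model nothing that was checkpointed is lost, but the 7 steps done after the step-30 checkpoint have to be redone, and the final result matches an uninterrupted run. A crash on restart, like the bug discussed in this thread, would mean the resume path itself is broken rather than just some time being lost.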

genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 108 - Posted: 17 Feb 2006, 2:37:13 UTC
Last modified: 17 Feb 2006, 2:41:30 UTC

OK, the machine I had set "leave in memory" to OFF had an error on its one WU that it got:

https://ralph.bakerlab.org/result.php?resultid=1666

It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW on that machine for now so I won't lose any more work.

This is the machine BTW:

https://ralph.bakerlab.org/show_host_detail.php?hostid=76


Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 117 - Posted: 17 Feb 2006, 4:32:52 UTC - in response to Message 108.  
Last modified: 17 Feb 2006, 4:47:49 UTC

OK, the machine I had set "leave in memory" to OFF had an error on its one WU that it got:

https://ralph.bakerlab.org/result.php?resultid=1666

It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW on that machine for now so I won't lose any more work.

This is the machine BTW:

https://ralph.bakerlab.org/show_host_detail.php?hostid=76


For the purposes of ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.

Certainly the time spent processing for Ralph is valuable in testing the next generation of Rosetta applications, but credit is not a priority for the testing. The more Work Units you can process, the better for the test. For Ralph project details, please see this thread.

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact

genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 170 - Posted: 18 Feb 2006, 0:46:09 UTC - in response to Message 117.  


For the purposes of ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.


Yes, I agree. I expect to lose Ralph WUs, but I didn't want to ruin Rosetta WUs, so I am not allowing that machine to get any more Rosetta work for the duration of the test. I now have a new Ralph WU on that machine, but with app version 4.84. What's new in 4.84?


Contact
Joined: 16 Feb 06
Posts: 19
Credit: 132,286
RAC: 0
Message 171 - Posted: 18 Feb 2006, 0:46:48 UTC - in response to Message 4.  

A known bug of rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted."

Seems OK for me on XP. Even after an OS reboot, WUs resume properly and are valid.
Will try on 98 soon.

genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 179 - Posted: 18 Feb 2006, 2:47:15 UTC

Had another WU crash, report here:

https://ralph.bakerlab.org/forum_thread.php?id=2#178


Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 188 - Posted: 18 Feb 2006, 4:26:19 UTC - in response to Message 170.  
Last modified: 18 Feb 2006, 4:30:04 UTC


For the purposes of ALPHA testing, you should expect to lose processing time. That is just the nature of testing. If loss of processing time is important to you, please consider whether the test project is the best use of your system.


Yes, I agree. I expect to lose Ralph WUs, but I didn't want to ruin Rosetta WUs, so I am not allowing that machine to get any more Rosetta work for the duration of the test. I now have a new Ralph WU on that machine, but with app version 4.84. What's new in 4.84?


Frankly, changes are happening so fast now that I do not know what went into the minor update. Perhaps David Kim will chime in on that. But don't be afraid of destroying Work Units. If you beat up a few the project can learn from that.
Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact

Dimitris Hatzopoulos
Joined: 16 Feb 06
Posts: 31
Credit: 2,308
RAC: 0
Message 189 - Posted: 18 Feb 2006, 4:31:07 UTC
Last modified: 18 Feb 2006, 4:33:43 UTC

Any suggestions on the kinds of stress tests we should try on RALPH WUs to "speed things up"? Any recommended settings? I have # hours to run set to 4. Is there a point in reducing it even more (if one doesn't care about the download overhead) to get more WU samples? Or in reducing "Switch between applications every" from 60 to 30 minutes, again to "force" more removals from memory?

Also, is there a phase in Rosetta's progress (e.g. <10% progress) in which a WU is more susceptible to the dreaded "Computation error", due to checkpointing or whatever?

Every time a user manually requests an update, BOINC does a request_reschedule_cpus, which removes currently running apps from memory and resumes or starts others. So one can manually force multiple removals from memory without having to wait 60 minutes.

Angus
Joined: 17 Feb 06
Posts: 10
Credit: 1,007
RAC: 0
Message 226 - Posted: 18 Feb 2006, 17:41:42 UTC

https://ralph.bakerlab.org/result.php?resultid=3669

Nothing in the log for almost 30 minutes before the error. Not task switching.

NOT left in memory. Just crashed.

genes
Joined: 16 Feb 06
Posts: 45
Credit: 43,300
RAC: 0
Message 247 - Posted: 18 Feb 2006, 22:13:57 UTC

I have a 4.85 WU now. Are these new changes for the "Leave In Memory = NO" bug?


dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 276 - Posted: 19 Feb 2006, 1:08:25 UTC

The recent app update had a few fixes in the CPU run-time code. We should continue to test for the leave-in-memory bug to get an idea of what fraction of computers are actually having this problem. I am going to update the production R@h application soon, since the success rates so far look better.

We are still seeing a few of the "0xffffffffc0000005" crashes, and I am not sure whether they are all preemption crashes or also include the random crashes that are common on Windows platforms. The major change for Windows was switching from Visual Studio 2003 to Visual Studio 2005. There were some significant compiler fixes, particularly in optimization, and we were hoping that the change would produce a more stable build. It has definitely fixed some other issues we were having with specific types of experiments; those issues were not affecting results or science, but they showed some unexpected though benign behaviour. The optimized Windows build with VS2005 now produces results that are very consistent with the Linux build given the same random seed.

genes
Credit: 43,300
RAC: 0
Message 284 - Posted: 19 Feb 2006, 1:45:47 UTC

Thanks for the info. :-)

Dimitris Hatzopoulos
Posts: 31
Credit: 2,308
RAC: 0
Message 407 - Posted: 21 Feb 2006, 2:54:03 UTC
Last modified: 21 Feb 2006, 2:55:56 UTC

Can we now test the newest "production" R@H (Win/v4.82 and Linux/v4.81) executables with "Leave preempted app in mem"=NO ?

Otherwise, we still can't test RALPH (for this particular bug) and also run Rosetta@Home on the same PC, as the RALPH FAQ suggests.



dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 408 - Posted: 21 Feb 2006, 3:40:51 UTC

Yes, the applications are now equivalent.

River~~
Joined: 20 Feb 06
Posts: 20
Credit: 503
RAC: 0
Message 430 - Posted: 21 Feb 2006, 17:08:23 UTC

Hi David,

A similar question about the keep-in-memory issue.

Am I right that, where a machine is turned off daily, it would be useful to set the CPU time long enough to force every WU to experience at least one power cycle? So with the machine left on for 7 hrs/day, I'd set the CPU time well over 7 hrs, for example.

River~~
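River's reasoning can be checked with simple arithmetic. A WU that needs more powered-on time than the machine's daily uptime cannot finish within one session, so it is guaranteed to live through at least one shutdown; a WU that needs less may or may not, depending on when it starts. The Python sketch below is a hypothetical model, not anything taken from BOINC: `guaranteed_power_cycles` and `cpu_share` are made-up names, and it assumes one contiguous on-period per day and a fixed CPU share. The `cpu_share` parameter matters on machines running several projects, because a WU then gets only part of the CPU and needs more powered-on time than its CPU time.

```python
import math


def guaranteed_power_cycles(cpu_hours, uptime_hours_per_day, cpu_share=1.0):
    """Minimum number of shutdowns a WU must survive, however lucky its start time.

    Assumptions (hypothetical model): the machine is on for one contiguous
    block per day, the WU only runs while the machine is on, and it receives
    a fixed fraction cpu_share of one CPU while running.
    """
    if uptime_hours_per_day <= 0 or not 0 < cpu_share <= 1:
        raise ValueError("need uptime_hours_per_day > 0 and 0 < cpu_share <= 1")
    powered_on_hours = cpu_hours / cpu_share  # time the WU needs with the machine on
    # Best case: the WU starts right at power-on, so it needs this many daily sessions...
    sessions = math.ceil(powered_on_hours / uptime_hours_per_day)
    # ...and there is one shutdown between each pair of consecutive sessions.
    return max(sessions - 1, 0)


if __name__ == "__main__":
    for cpu_hours in (6, 7, 8, 15):
        print(cpu_hours, guaranteed_power_cycles(cpu_hours, 7))  # 0, 0, 1, 2 shutdowns
```

With 7 hours of uptime, a 7-hour WU can still finish in a single session if it starts right at power-on, whereas an 8-hour WU must survive at least one restart, which supports setting the CPU time well over 7 hours.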



©2024 University of Washington
http://www.bakerlab.org