Message boards : Current tests : Switching between projects with applications removed from memory
Author | Message |
---|---|
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
A known bug of Rosetta is that the application will die when preempted if your general preferences are not set to "Leave applications in memory while preempted." This bug affects users who are involved in multiple BOINC projects and do not leave applications in memory. We may have fixed this bug for Windows platforms by using Visual Studio 2005 to build the application instead of Visual Studio 2003. |
UBT - Halifax--lad Send message Joined: 15 Feb 06 Posts: 29 Credit: 2,723 RAC: 0 |
Quoting dekim: "A known bug of Rosetta is that the application will die when preempted if your general preferences are not set to 'Leave applications in memory while preempted.' This bug affects users who are involved in multiple BOINC projects and do not leave applications in memory." Indeed, you seem to have done so. I had to restart my computer halfway through a WU to install some updates. BOINC took the WU out of memory, which it wasn't supposed to do, but I had forgotten to set that option in the first place. When I came back on and BOINC loaded, it just carried on from where it had left off. |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
I'm having a problem with this, but not the one you're trying to fix. BOINC simply does not have enough "venues" to set up custom situations to either test specific things or to tune resources for specific machines. And since it doesn't allow "local control", we have to balance carefully. OK, so "school" is going to have "leave in memory" OFF, all others will have it ON. Otherwise, "school" will be like "home". Good thing I can use the same resource shares for these two. Then, of course, I have to visit all the projects and update on all their web sites, or stuff will be hopelessly confused. |
KWSN Sir Clark Send message Joined: 16 Feb 06 Posts: 4 Credit: 21 RAC: 0 |
One of mine got unceremoniously ditched from memory when I was allowing another project to download more work.......it errored out, even though it was set to remain in memory. www.chris-kent.co.uk aka Chief.com |
[B^S] Doug Worrall Send message Joined: 16 Feb 06 Posts: 10 Credit: 1,515 RAC: 0 |
As a Linux user, running a Rosetta w/u is a "non-quit BOINC" issue, and I was hoping this bug will be fixed by Ralph. At present, with memory set to "saved" in the General Preferences, if I "quit" a Rosetta w/u by rebooting (quitting BOINC), the Rosetta w/u is fubarred 70% of the time. Still waiting on some w/u to crunch. "Salude" Sluger |
Dimitris Hatzopoulos Send message Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
I wonder how exactly the process of "removing app from memory" is handled by BOINC and science app. Would e.g. Rosetta lose any data it computed, since its last "checkpoint" (writing temporary results to disk every x minutes or y progress?) I know I could look at the source of some open-source science app like SETI, but ... I thought I'd save a bit of time asking :-) |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
OK, the machine I had set "leave in memory" to OFF had an error on its one WU that it got: https://ralph.bakerlab.org/result.php?resultid=1666 It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW on that machine for now so I won't lose any more work. This is the machine BTW: https://ralph.bakerlab.org/show_host_detail.php?hostid=76 |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Yes, I agree. I expect to lose Ralph WU's, but I didn't want to ruin Rosetta WU's, so I am not allowing that machine to get any more Rosetta for the duration of the test. I now have a new Ralph WU on that machine, but with a 4.84 app version. What's new in 4.84? |
Contact Send message Joined: 16 Feb 06 Posts: 20 Credit: 137,458 RAC: 2 |
|
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Had another WU crash, report here: https://ralph.bakerlab.org/forum_thread.php?id=2#178 |
Dimitris Hatzopoulos Send message Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
Any suggestions on the kinds of stress-tests we should try on RALPH WUs, to "speed things up"? Any recommended settings? I have # hours to run set to 4. Is there a point in reducing it even more (if one doesn't care about the download overheads) to get more WU samples? Or in reducing "Switch between applications every" to 30 min (from 60), again to "force" more removals from memory? Also, is there a phase in Rosetta's progress (e.g. <10% progress) in which a WU is more susceptible to the dreaded "Computation error", due to checkpointing or whatever? I ask because every time a user manually requests an update, BOINC does a request_reschedule_cpus, which removes currently running apps from memory and resumes/starts others. So one can manually force multiple removals from memory without having to wait 60 min. |
Angus Send message Joined: 17 Feb 06 Posts: 10 Credit: 1,007 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=3669 Nothing in log for almost 30 minutes prior to error. Not task switching. NOT left in memory. Just crashed. |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
I have a 4.85 WU now. Are these new changes for the "Leave In Memory = NO" bug? |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
The recent app update had a few fixes in the CPU run time code. We should continue to test for the leave-in-memory bug to get an idea of what fraction of computers are actually having this problem. I am going to update the production R@h application soon since the success rates so far look better. We are still seeing a few of the "0xffffffffc0000005" crashes and I am not sure if they are all due to preemption crashes or also include random crashes that are common on Windows platforms. The major change for Windows was switching to Visual Studio 2005 from 2003. There were some significant compiler fixes, particularly with optimization, and we were hoping that the change would produce a more stable build. It has definitely fixed some other issues we were having with specific types of experiments that were not affecting results and science but were showing some unexpected but benign behaviour. The optimized Windows build with VS2005 now produces results that are very consistent with the Linux build given the same random seed. |
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Thanks for the info. :-) |
Dimitris Hatzopoulos Send message Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
Can we now test the newest "production" R@H (Win/v4.82 and Linux/v4.81) executables with "Leave preempted app in mem"=NO ? Otherwise, we still can't test RALPH (for this particular bug) and still run Rosetta@Home on same PC, as suggested per RALPH FAQ |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
Yes, the applications are now equivalent. |
River~~ Send message Joined: 20 Feb 06 Posts: 20 Credit: 503 RAC: 0 |
hi David, a similar question based around the keep in memory issue. Am I right that where a machine is turned off daily, it would be useful to have the cpu time set long enough to force every WU to experience at least one power cycle? So with the machine left on for 7hrs/day, I'd set the cpu time well over 7hrs for example. River~~ |
Aglarond Send message Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
Hi, is switching between projects the same problem as going to standby with my PC? As I have a laptop running Rosetta, I usually go to standby when I want to take it elsewhere. And Rosetta (also Ralph) always crashes like this: 22. 2. 2006 13:53:32|ralph@home|Result BARCODE_30_1cc8A_215_35_0 exited with zero status but no 'finished' file 22. 2. 2006 13:53:32|ralph@home|If this happens repeatedly you may need to reset the project. 22. 2. 2006 13:53:32||Rescheduling CPU: application exited 22. 2. 2006 13:53:32|ralph@home|Restarting result BARCODE_30_1cc8A_215_35_0 using rosetta_beta version 486 This may help you find the problem: when I go to standby mode and then wake up my laptop after a very short time (5 sec), Rosetta continues normally and the graphics window also continues as if nothing happened. But if I leave it in standby just a little longer (15 sec), Rosetta crashes and the graphics window closes. The same happens with some other BOINC projects. |
Aglarond Send message Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
Hmm.. no other apps have ever had problems with it, except BOINC projects. But it could still be a problem with my laptop. I tried it running on AC power and also on battery power, both with the same result. |
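
On the question raised above about whether an application loses the work computed since its last checkpoint when it is removed from memory: the general idea can be sketched as below. This is only a toy illustration in Python, not Rosetta's or BOINC's actual code. The function names (`run_work_unit`, `load_checkpoint`, `save_checkpoint`), the JSON checkpoint file, and the `kill_after` parameter that simulates preemption are all assumptions made up for the example.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file, then rename it over the old checkpoint.

    A process killed in the middle of the write leaves the previous
    checkpoint intact instead of a half-written file.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_work_unit(total_steps, path, checkpoint_every=100, kill_after=None):
    """Toy work unit: sums 0..total_steps-1, checkpointing as it goes.

    kill_after (hypothetical) simulates the app being removed from memory
    after that many steps in this run; the in-memory state is discarded.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if kill_after is not None and done_this_run >= kill_after:
            return None  # simulated removal from memory
        state["total"] += state["step"]
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "ckpt.json")
        run_work_unit(1000, ckpt, kill_after=250)   # "preempted" at step 250
        print(load_checkpoint(ckpt)["step"])        # 200: steps 200-249 were lost
        print(run_work_unit(1000, ckpt))            # 499500, resumed from step 200
```

In this sketch, only the work done after the most recent checkpoint has to be repeated on restart. A crash on preemption, as reported in this thread, is a different failure: the restart path itself is not working.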
©2024 University of Washington
http://www.bakerlab.org