Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here
Author | Message |
---|---|
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
We would like volunteers to test this known bug. |
William Senn Joined: 16 Feb 06 Posts: 4 Credit: 30,895 RAC: 0 |
"We would like volunteers to test this known bug." I could do this for you if you want me to. I have everything set up, though nothing has downloaded yet. William Senn |
Krunchin-Keith [USA] Joined: 15 Feb 06 Posts: 6 Credit: 638 RAC: 0 |
I am testing this with host 34. I have a 30-minute time slice, so all work units should get swapped out at least once. The first, wuid 1410 result 1445, finished after 00:57:53 but has not been reported yet. The second, wuid 1373 result 1408, shows a computational error after 00:55:38 of run time: at 07:53:33 AM it was paused (removed from memory), then at 07:53:34 AM it showed an unrecoverable error ( - exit code -164 (0xffffff5c)) and finished. I have 15 more to process. I am going on vacation (sorry, bad timing), so I will let them run and they will be reported when finished, but I will not be able to comment on them. I'm set for no more work, so only those now showing for my one host will run. When I return I can do more and put some of my other krunchers on this too. |
KWSN Sir Clark Joined: 16 Feb 06 Posts: 4 Credit: 21 RAC: 0 |
This unit errored out when I took Sztaki off No New Work. It seemed to be dumped out of memory for some reason, even though I'm set up to keep apps in memory. https://ralph.bakerlab.org/result.php?resultid=1611 Messages are from BoincView (columns: Host, Project, Date, Message): CK ck --- 16/02/2006 21:31:31 request_reschedule_cpus: project op; CK ck BBC Climate Change Experiment 16/02/2006 21:31:31 Restarting result hadcm3l_0h9f_00022824_0 using hadcm3l version 507; CK ck ralph@home 16/02/2006 21:31:31 Pausing result HBLR_1.0_1dtj_206_25_0 (removed from memory); CK ck ralph@home 16/02/2006 21:31:32 Unrecoverable error for result HBLR_1.0_1dtj_206_25_0 ( - exit code -1073741819 (0xc0000005)); CK ck --- 16/02/2006 21:31:33 request_reschedule_cpus: process exited; CK ck ralph@home 16/02/2006 21:31:33 Computation for result HBLR_1.0_1dtj_206_25_0 finished; CK ck SZTAKI Desktop Grid 16/02/2006 21:31:35 Sending scheduler request to http://szdg.lpds.sztaki.hu/szdg/cgi-bin/scheduler www.chris-kent.co.uk aka Chief.com |
UBT - Halifax--lad Joined: 15 Feb 06 Posts: 29 Credit: 2,723 RAC: 0 |
"This unit errored out when I took Sztaki off No New Work. It seemed to be dumped out of the memory for some reason even though I'm set up to keep apps in memory." Double-check the preempt-in-memory setting in your preferences on this project. For some reason mine was set to No when I joined up; it usually says Yes automatically on other projects I join. That may be why it was removed from memory. |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
This seems similar to the one reported in message nr 42: I'm running seti/albert/ralph at 1/3 each, swapping out every hour, on a Win XP P4 with HT. I was watching one ralph WU on the screensaver when the swap hit. The screensaver went away (and that unit seems to be OK), but the other ralph WU that was running concurrently bombed out: Unrecoverable error for result barcode_30_1bq9a_native_208_2_0 ( - exit code -164 (0xffffff5c)) CPID 263, WU 1920, Res ID 1972 |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
"Double check your preempt in memory setting on the preferences on this project, for some reason mine was set to No when I joined up, usually it automatically says Yes on other projects I join, that may be why it was removed from memory." The "Leave in memory when preempted" setting is a BOINC GLOBAL default that "propagates" across all projects. BOINC uses the config from the project with the newest time-stamp. That is, if you run SETI/Rosetta/RALPH and change the RALPH preferences today, those settings (e.g. "Leave app in mem" set to NO) will be used by all your other projects. It wasn't quite clear to me either, and I had to look for this "detail". |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Oops -- I should have put this here, but instead I put it in the "current tests" area. Well, here it is anyway. OK, the machine I had set "leave in memory" to OFF had an error on its one WU that it got: https://ralph.bakerlab.org/result.php?resultid=1666 It's not getting any more at the moment (no work from project). It also just had a Rosetta WU error out. I set Rosetta to NNW on that machine for now so I won't lose any more work. This is the machine BTW: https://ralph.bakerlab.org/show_host_detail.php?hostid=76 |
Aaron Finney Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0 |
"We would like volunteers to test this known bug." WORKS HERE, at least when forced using "suspend" under the projects tab. Didn't work on the old rosetta! The king is dead! Long live the king! Now I'll let it churn for a few days and see if it does it on its own.. hehe.. |
UBT - Halifax--lad Joined: 15 Feb 06 Posts: 29 Credit: 2,723 RAC: 0 |
"Double check your preempt in memory setting on the preferences on this project, for some reason mine was set to No when I joined up, usually it automatically says Yes on other projects I join, that may be why it was removed from memory." Yes, I know that, but it defaults to No on this project, so people need to be aware of that and possibly set RALPH up on a different preference (home, school or work) with preempt set to No. That way it won't interfere with other project preferences. |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
"this seems similar to the one reported in message nr 42:" I double-checked my settings on this one: my general prefs came from ralph and I have leave in memory set to 'no'. |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
"this seems similar to the one reported in message nr 42:" Just happened again. It looks like when two ralph WUs are running at the same time and get swapped out simultaneously, one of them dies a terrible death. 2006-02-17 16:20:54 [ralph@home] Unrecoverable error for result BARCODE_30_1aiu__NATIVE_210_28_0 ( - exit code -1073741819 (0xc0000005)) |
Dimitris Hatzopoulos Joined: 16 Feb 06 Posts: 31 Credit: 2,308 RAC: 0 |
AFAIK "Leave in memory" is a global default, not per project or per location (work/home/school), so setting it independently isn't that easy (if you share the same PC between Rosetta and Ralph); see my deciding on resource share |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
Got my first WU and it errored out after an hour of crunching :( 18.02.2006 00:01:10|SETI@home|Restarting result 18dc04aa.7932.30433.467306.1.218_1 using setiathome version 418 18.02.2006 00:01:10|ralph@home|Pausing result BARCODE_30_1c9oA_NATIVE_210_6_0 (removed from memory) 18.02.2006 00:01:11|ralph@home|Unrecoverable error for result BARCODE_30_1c9oA_NATIVE_210_6_0 ( - exit code -164 (0xffffff5c)) 18.02.2006 00:01:12||request_reschedule_cpus: process exited 18.02.2006 00:01:12|ralph@home|Computation for result BARCODE_30_1c9oA_NATIVE_210_6_0 finished |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Oh, well, just had a 4.84 WU crash: 2/17/2006 8:39:37 PM|ralph@home|Unrecoverable error for result BARCODE_30_1tig__NATIVE_210_3_0 ( - exit code -1073741819 (0xc0000005)) This one: https://ralph.bakerlab.org/result.php?resultid=2785 This computer: https://ralph.bakerlab.org/show_host_detail.php?hostid=76 So far, both WUs this machine has had have crashed. It has "Leave in Memory" set to "NO". |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
Looks like we are still seeing this problem on some machines. Can people who are having crashes upon preemption check whether keeping applications in memory fixes the problem? And can people who are not seeing this problem respond as well? Thanks! |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
OK, I have just set that machine's "Leave in Memory" to YES. It has had 2/2 failures. Hopefully it'll get some more work soon. I have another machine which has *always* had Leave in Memory set to YES return a good result. This one: https://ralph.bakerlab.org/show_host_detail.php?hostid=81 |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
I switched "keep app in memory" to NO and I am getting this bug: https://ralph.bakerlab.org/forum_thread.php?id=33 However, when I used YES, the bug was the same? |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
|
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
"this seems similar to the one reported in message nr 42:" By George, I think you've got it! Running two ralph 4.85s this time, still 1/3 each with seti and albert. Both ralph units were running and swapped out simultaneously with no abend. I'll watch to make sure both finish OK. |
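
The log lines quoted in this thread give each exit code twice, as a signed decimal and as hex, for example -164 (0xffffff5c) and -1073741819 (0xc0000005). The following is a minimal sketch for checking that the two forms agree when comparing reports, assuming the hex form is the 32-bit two's-complement view of the signed value. The helper names `exit_code_to_hex` and `hex_to_exit_code` are hypothetical and not part of BOINC.

```python
def exit_code_to_hex(code: int) -> str:
    """Return the 32-bit two's-complement hex form of a signed exit code."""
    # Masking with 0xFFFFFFFF maps negative values into the unsigned 32-bit range.
    return f"0x{code & 0xFFFFFFFF:08x}"


def hex_to_exit_code(text: str) -> int:
    """Interpret a 32-bit hex string (with or without 0x) as a signed integer."""
    value = int(text, 16) & 0xFFFFFFFF
    # If the top bit is set, the value is negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


# The two exit codes reported in this thread.
for code in (-164, -1073741819):
    print(code, exit_code_to_hex(code))
# -164 0xffffff5c
# -1073741819 0xc0000005
```

On Windows, 0xc0000005 is the status code for an access violation, which may help when sorting the reports above into the two failure types seen so far.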
©2024 University of Washington
http://www.bakerlab.org