Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here
Author | Message |
---|---|
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
Here we go!
24.02.2006 16:12:01|climateprediction.net|Restarting result sulphur_in3i_100869742_0 using sulphur_cycle version 422
24.02.2006 16:12:01|ralph@home|Pausing result BARCODE_30_2chf__219_2_0 (removed from memory)
24.02.2006 16:12:03|ralph@home|Unrecoverable error for result BARCODE_30_2chf__219_2_0 ( - exit code -164 (0xffffff5c))
24.02.2006 16:12:03||request_reschedule_cpus: process exited
(Links: Result, WU)
Preferences still set to "Leave in Memory" - No |
Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0 |
The same error with ResultID = 9547:
25. 2. 2006 1:02:16|ralph@home|Starting result BARCODE_30_1bk2__221_3_0 using rosetta_beta version 488
25. 2. 2006 2:02:16|ralph@home|Pausing result BARCODE_30_1bk2__221_3_0 (removed from memory)
25. 2. 2006 2:02:16|boincsimap|Starting result 200602217.018957_3 using simap version 507
25. 2. 2006 2:02:17|ralph@home|Unrecoverable error for result BARCODE_30_1bk2__221_3_0 ( - exit code -164 (0xffffff5c))
25. 2. 2006 2:02:17||Rescheduling CPU: application exited
25. 2. 2006 2:02:17|ralph@home|Computation for result BARCODE_30_1bk2__221_3_0 finished |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quoting Psycodad: "Here we go!"
Isn't running CPDN with "leave in memory" turned off costing you a lot of cycles for that project? |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
Actually, I did not pay attention to this and was not aware that setting "leave in memory" to no could cost a lot of cycles. How can I test or see whether this preference costs me a lot of cycles? Anyway, the first project switch with RALPH version 4.88 went well. I will keep an eye on it ^^ Edit: Sorry for my bad English. |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quoting Psycodad: "Actually i did not pay attention to this and was not aware about the fact, that leaving in memory set to no, could cost a lot of cylces"
First, your English is fine. CPDN only checkpoints every 15 minutes. If you do not keep the application in memory during switching, it could lose as much as 15 minutes every time CPDN stops and restarts. So the recommendation at CPDN is to set "keep applications in memory" to Yes to prevent this loss of time. If you watch the CPDN CPU time just before it swaps out and just after it restarts, you will see the time drop back a little (up to 15 minutes). A faster test is to pick a time when the CPU time on the CPDN WU shows something like xxx:10:xx, suspend the CPDN model, and then resume it. You will see it lose some time when it restarts. The time loss really mounts up over the length of a model run. |
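To put rough numbers on the checkpoint loss described above, here is a minimal sketch. The 15-minute interval is the figure quoted in the post; the function name, the restart count, and the assumption that restarts land at a uniformly random point between checkpoints are illustrative only and do not come from CPDN or BOINC.

```python
def checkpoint_loss_minutes(checkpoint_interval_min, restarts):
    """Estimate CPU minutes redone after restarts without 'leave in memory'.

    Returns (average, worst_case). The average assumes each restart lands at a
    uniformly random point between two checkpoints, which is an assumption.
    """
    worst_case = checkpoint_interval_min * restarts
    average = worst_case / 2
    return average, worst_case


# 15 is the CPDN checkpoint interval quoted in the post; 24 restarts per day
# is a hypothetical figure chosen only for illustration.
average, worst_case = checkpoint_loss_minutes(15, 24)
print(f"average {average:.0f} min/day, worst case {worst_case:.0f} min/day")
```

Under these assumptions the loss scales linearly with the number of restarts, which is why it adds up over a long model run.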
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
Thank you for this information! |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
It happened again! This WU crashed after switching to CPDN. It was the second time :(
25.02.2006 21:25:23|climateprediction.net|Restarting result sulphur_in3i_100869742_0 using sulphur_cycle version 422
25.02.2006 21:25:23|ralph@home|Pausing result BARCODE_30_1aiu__221_19_0 (removed from memory)
25.02.2006 21:25:28|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
25.02.2006 21:25:33|climateprediction.net|Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
25.02.2006 21:25:36|ralph@home|Unrecoverable error for result BARCODE_30_1aiu__221_19_0 ( - exit code -1073741819 (0xc0000005))
25.02.2006 21:25:36||request_reschedule_cpus: process exited
25.02.2006 21:25:36|ralph@home|Computation for result BARCODE_30_1aiu__221_19_0 finished
(Links: Result, WU)
Should I set the preferences to "Leave in memory" - yes next time and see what happens then? |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
Quoting Psycodad: "It happened again!"
Yes, I would try that and see if it fixes the problem. Let us know what happens. |
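The logs in this thread print each exit code twice, once as a signed number and once as hex (for example -164 and 0xffffff5c). The sketch below shows how the two forms relate through 32-bit two's complement. The helper names are made up for illustration and are not part of BOINC. As a side note, 0xc0000005 is the Windows STATUS_ACCESS_VIOLATION status code.

```python
def exit_code_hex(code):
    """Format a signed 32-bit exit code as unsigned hex, as seen in the logs."""
    return f"0x{code & 0xFFFFFFFF:x}"


def exit_code_signed(hex_text):
    """Convert unsigned 32-bit hex text back to the signed exit code."""
    value = int(hex_text, 16)
    # Values with the top bit set represent negative numbers in two's complement.
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


print(exit_code_hex(-164))
print(exit_code_signed("0xc0000005"))
```

Both directions are handy when searching for an error: some references list the signed value and others the hex form.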
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Many Client errors on this computer due to swap outs. https://ralph.bakerlab.org/show_host_detail.php?hostid=611 |
sslickerson Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0 |
Swap out failure:
3/27/2006 4:05:54 PM|ralph@home|Computation for result BARCODE_30_1cc8A_215_1_1 finished
3/27/2006 4:05:54 PM|ralph@home|Output file BARCODE_30_1cc8A_215_1_1_0 for result BARCODE_30_1cc8A_215_1_1 exceeds size limit.
3/27/2006 4:05:54 PM|ralph@home|File size: 148638304.000000 bytes. Limit: 25000000.000000 bytes
3/27/2006 4:05:54 PM||Allowing work fetch again.
3/27/2006 4:05:54 PM||Resuming round-robin CPU scheduling.
3/27/2006 4:05:55 PM|ralph@home|Unrecoverable error for result BARCODE_30_1cc8A_215_1_1 (<file_xfer_error> <file_name>BARCODE_30_1cc8A_215_1_1_0</file_name> <error_code>-131</error_code> <error_message></error_message></file_xfer_error>)
3/27/2006 4:05:56 PM|ralph@home|Started upload of BARCODE_30_1cc8A_215_1_1_1
(Links: Result, Workunit, Other Recent failures) |
sslickerson Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0 |
Had this failure after I pulled Ralph out of memory. My fault, sorry :(
2/27/2006 11:53:05 PM|ralph@home|Unrecoverable error for result HOMSdc_homDB024_1dcj__229_3_0 (Incorrect function. (0x1) - exit code 1 (0x1))
2/27/2006 11:53:05 PM||request_reschedule_cpus: process exited
2/27/2006 11:53:05 PM|ralph@home|Computation for result HOMSdc_homDB024_1dcj__229_3_0 finished
2/27/2006 11:54:08 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
2/27/2006 11:54:08 PM|ralph@home|Reason: To fetch work
2/27/2006 11:54:08 PM|ralph@home|Requesting 8640 seconds of new work, and reporting 1 results
2/27/2006 11:54:13 PM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
2/27/2006 11:54:13 PM|ralph@home|Message from server: No work sent
2/27/2006 11:54:13 PM|ralph@home|Message from server: (reached daily quota of 1 results)
2/27/2006 11:54:13 PM|ralph@home|No work from project
(Links: Result, WUID, Host) |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
4 successful runs and one client error (Incorrect function. (0x1) - exit code 1 (0x1)) since resetting to leave app in memory. XP Pro. Not sure when the switch to 4.90 happened during this run. |
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
Quote: "It happened again!"
All right ^^ The test with preferences set to "Leave in memory" - yes finished without any problems. For the next WU, I will change it back to "no". |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
It is safe to say this problem was not fixed with 4.91. I ran 3 units with remove from memory; all failed:
3/4/2006 1:08:11 AM|ralph@home|Pausing result BARCODE_30_1opd__236_10_0 (removed from memory)
3/4/2006 1:08:11 AM|SETI@home|Starting result 04ap01ab.1337.20720.97162.1.69_2 using setiathome version 418
3/4/2006 1:08:13 AM|ralph@home|Unrecoverable error for result BARCODE_30_1opd__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 1:08:13 AM||request_reschedule_cpus: process exited
3/4/2006 1:08:13 AM|ralph@home|Computation for result BARCODE_30_1opd__236_10_0 finished
https://ralph.bakerlab.org/result.php?resultid=13917
3/4/2006 2:59:48 AM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
3/4/2006 4:22:24 AM|ralph@home|Pausing result BARCODE_30_1fna__236_10_0 (removed from memory)
3/4/2006 4:22:24 AM|SETI@home|Starting result 21mr01aa.28899.15153.17324.1.120_0 using setiathome version 418
3/4/2006 4:22:26 AM|ralph@home|Unrecoverable error for result BARCODE_30_1fna__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 4:22:26 AM||request_reschedule_cpus: process exited
3/4/2006 4:22:26 AM|ralph@home|Computation for result BARCODE_30_1fna__236_10_0 finished
https://ralph.bakerlab.org/result.php?resultid=13918
3/4/2006 12:45:18 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:18 PM|ralph@home|Pausing result BARCODE_30_1bm8__236_10_0 (removed from memory)
3/4/2006 12:45:18 PM|SETI@home|Starting result 15fe03aa.3675.3361.892344.1.108_1 using setiathome version 418
3/4/2006 12:45:20 PM|SETI@home|Finished download of 07au01ab.18308.26112.359658.1.146
3/4/2006 12:45:20 PM|SETI@home|Throughput 93799 bytes/sec
3/4/2006 12:45:20 PM|SETI@home|Finished download of 07au01ab.18308.26112.359658.1.141
3/4/2006 12:45:20 PM|SETI@home|Throughput 201484 bytes/sec
3/4/2006 12:45:20 PM|SETI@home|Started download of 07au01ab.18308.26112.359658.1.142
3/4/2006 12:45:21 PM|ralph@home|Unrecoverable error for result BARCODE_30_1bm8__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 12:45:21 PM||request_reschedule_cpus: process exited
3/4/2006 12:45:21 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:21 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:21 PM|ralph@home|Computation for result BARCODE_30_1bm8__236_10_0 finished
https://ralph.bakerlab.org/result.php?resultid=13919
I will now change over to leave in memory and report back. |
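For anyone checking their own message log for this pattern, the sketch below pairs each "Pausing result ... (removed from memory)" line with a later "Unrecoverable error" line for the same result. It assumes one log entry per line in the `time|project|message` layout seen in this thread; the function name and regular expressions are illustrative and are not part of BOINC.

```python
import re

PAUSE_RE = re.compile(r"Pausing result (\S+) \(removed from memory\)")
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) \(.*exit code (-?\d+)")


def failures_after_swap_out(lines):
    """Map result name -> exit code for results that failed after a swap-out."""
    paused = set()
    failures = {}
    for line in lines:
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) != 3:
            continue  # not a log entry in the assumed layout
        message = parts[2]
        match = PAUSE_RE.search(message)
        if match:
            paused.add(match.group(1))
            continue
        match = ERROR_RE.search(message)
        if match and match.group(1) in paused:
            failures[match.group(1)] = int(match.group(2))
    return failures


# Lines copied from the first failure reported in the post above.
SAMPLE = [
    "3/4/2006 1:08:11 AM|ralph@home|Pausing result BARCODE_30_1opd__236_10_0 (removed from memory)",
    "3/4/2006 1:08:11 AM|SETI@home|Starting result 04ap01ab.1337.20720.97162.1.69_2 using setiathome version 418",
    "3/4/2006 1:08:13 AM|ralph@home|Unrecoverable error for result BARCODE_30_1opd__236_10_0 ( - exit code -1073741819 (0xc0000005))",
    "3/4/2006 1:08:13 AM||request_reschedule_cpus: process exited",
]
print(failures_after_swap_out(SAMPLE))
```

The scan only looks at the order of lines, not the timestamps, so it works with the different date formats that appear in this thread.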
Psycodad Joined: 16 Feb 06 Posts: 14 Credit: 2,157 RAC: 0 |
I have to agree with Mike. This WU crashed after 5 hours of crunching. Before this failure it switched several times without any problems.
06.03.2006 20:48:25|ralph@home|Pausing result BARCODE_30_1a19A_225_3_1 (removed from memory)
06.03.2006 20:48:25|Einstein@Home|Restarting result r1_0197.0__162_S4R2a_0 using albert version 437
06.03.2006 20:49:14|ralph@home|Unrecoverable error for result BARCODE_30_1a19A_225_3_1 ( - exit code -1073741819 (0xc0000005))
06.03.2006 20:49:14||request_reschedule_cpus: process exited
06.03.2006 20:49:14|ralph@home|Computation for result BARCODE_30_1a19A_225_3_1 finished
(Links: WU, Result) |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
I only had one more work unit when I switched over to leave app in memory. It completed successfully. https://ralph.bakerlab.org/result.php?resultid=13901 |
pisi78 Joined: 16 Feb 06 Posts: 7 Credit: 2,020 RAC: 0 |
https://ralph.bakerlab.org/workunit.php?wuid=11287 Naturally, with the application not kept in memory and running SETI and Einstein :) |
MatthewBChambers Joined: 13 Mar 06 Posts: 4 Credit: 5,367 RAC: 0 |
Hi, I am not sure whether this belongs here or not. It is an error from Rosetta, but it happened while I had RALPH going.
Here are the BOINC 'startup' messages if it helps
|
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Eight work units in a row have completed without a hitch. All were 4.94, running 8 hours of CPU time with swap outs (out of memory) every 2 hours. Version 4.95 is in the queue. Win 2000 SP4, Intel Pentium 4 2.40GHz. Looks like this one is put to bed. Thanks! |
©2024 University of Washington
http://www.bakerlab.org