Report "failure when switching projects without keeping applications in memory" bugs here

Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here

Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 592 - Posted: 25 Feb 2006, 0:25:13 UTC

Here we go!


24.02.2006 16:12:01|climateprediction.net|Restarting result sulphur_in3i_100869742_0 using sulphur_cycle version 422
24.02.2006 16:12:01|ralph@home|Pausing result BARCODE_30_2chf__219_2_0 (removed from memory)
24.02.2006 16:12:03|ralph@home|Unrecoverable error for result BARCODE_30_2chf__219_2_0 ( - exit code -164 (0xffffff5c))
24.02.2006 16:12:03||request_reschedule_cpus: process exited



Result
WU

Preferences still set to "Leave in Memory" - No

Aglarond
Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 597 - Posted: 25 Feb 2006, 1:25:34 UTC

The same error, with ResultID = 9547:

25. 2. 2006 1:02:16|ralph@home|Starting result BARCODE_30_1bk2__221_3_0 using rosetta_beta version 488
25. 2. 2006 2:02:16|ralph@home|Pausing result BARCODE_30_1bk2__221_3_0 (removed from memory)
25. 2. 2006 2:02:16|boincsimap|Starting result 200602217.018957_3 using simap version 507
25. 2. 2006 2:02:17|ralph@home|Unrecoverable error for result BARCODE_30_1bk2__221_3_0 ( - exit code -164 (0xffffff5c))
25. 2. 2006 2:02:17||Rescheduling CPU: application exited
25. 2. 2006 2:02:17|ralph@home|Computation for result BARCODE_30_1bk2__221_3_0 finished

Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 604 - Posted: 25 Feb 2006, 3:54:59 UTC - in response to Message 592.  

Isn't running CPDN with "leave in memory" turned off costing you a lot of cycles for that project?

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact

Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 630 - Posted: 25 Feb 2006, 14:51:34 UTC

Actually, I did not pay attention to this and was not aware that setting "Leave in memory" to "no" could cost a lot of cycles.

How can I test or see whether this preference is costing me a lot of cycles?

Anyway, the first project switch with Ralph version 4.88 went well.
I will keep an eye on it ^^


Edit: Sorry for my bad English

Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 631 - Posted: 25 Feb 2006, 15:10:57 UTC - in response to Message 630.  
Last modified: 25 Feb 2006, 15:13:20 UTC

First, your English is fine.

CPDN only checkpoints every 15 minutes. If you do not keep applications in memory during switching, it could lose as much as 15 minutes every time CPDN stops and restarts. So the recommendation at CPDN is to set "keep applications in memory" = YES to prevent this loss of time.

If you watch the CPDN CPU time just before it swaps out and just after it restarts, you will see the time drop back a little (up to 15 minutes). A faster test is to pick a time when the CPU time on the CPDN WU is showing something like xxx:10:xx, suspend the CPDN model, and then resume it. You will see it lose some time when it restarts. The time loss really mounts up over the length of a model run.
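
As a rough illustration of how that loss adds up, here is a minimal Python sketch. It assumes that each swap-out lands at a uniformly random point inside a checkpoint interval, so the expected loss per switch is half the interval and the worst case is the full interval. The function names and the figure of 24 switches are hypothetical and not taken from CPDN or BOINC.

```python
def expected_loss_minutes(checkpoint_interval_min: float, switches: int) -> float:
    """Expected CPU minutes lost, assuming each switch falls at a uniformly
    random point within a checkpoint interval (an assumption, not a measurement)."""
    return switches * checkpoint_interval_min / 2.0


def worst_case_loss_minutes(checkpoint_interval_min: float, switches: int) -> float:
    """Upper bound: every switch happens just before the next checkpoint."""
    return switches * checkpoint_interval_min


if __name__ == "__main__":
    # Hypothetical example: a 15-minute checkpoint interval and 24 switches.
    interval, switches = 15, 24
    print("expected loss (min):", expected_loss_minutes(interval, switches))
    print("worst-case loss (min):", worst_case_loss_minutes(interval, switches))
```

With "keep applications in memory" set to yes, the suspended application stays resident and resumes where it stopped, so this loss should not occur on an ordinary project switch.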

Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 632 - Posted: 25 Feb 2006, 15:28:16 UTC

Thank you for this information!

Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 652 - Posted: 25 Feb 2006, 20:50:23 UTC
Last modified: 25 Feb 2006, 20:50:40 UTC

It happened again!
This WU crashed after switching to CPDN. This is the second time :(



25.02.2006 21:25:23|climateprediction.net|Restarting result sulphur_in3i_100869742_0 using sulphur_cycle version 422
25.02.2006 21:25:23|ralph@home|Pausing result BARCODE_30_1aiu__221_19_0 (removed from memory)
25.02.2006 21:25:28|climateprediction.net|Sending scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
25.02.2006 21:25:33|climateprediction.net|Scheduler request to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
25.02.2006 21:25:36|ralph@home|Unrecoverable error for result BARCODE_30_1aiu__221_19_0 ( - exit code -1073741819 (0xc0000005))
25.02.2006 21:25:36||request_reschedule_cpus: process exited
25.02.2006 21:25:36|ralph@home|Computation for result BARCODE_30_1aiu__221_19_0 finished




Result

WU


Should I set the preference to "Leave in memory" = yes next time and see what happens?
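
For readers comparing the exit codes in these logs: the client prints each code both as a signed decimal and as its 32-bit hexadecimal form. The small Python sketch below shows the conversion in both directions. The helper names are made up for illustration and are not part of BOINC. The value 0xc0000005 is the Windows status code for an access violation. The meaning of -164 is not stated in this thread, so none is assumed here.

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code the way the log shows it in parentheses."""
    return hex(exit_code & 0xFFFFFFFF)


def to_signed32(value: int) -> int:
    """Convert an unsigned 32-bit value back to the signed decimal form."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    # Exit codes that appear in the logs posted in this thread.
    for code in (-164, -1073741819, 1):
        print(code, to_unsigned_hex(code))
```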

Moderator9
Volunteer moderator
Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 679 - Posted: 26 Feb 2006, 14:06:17 UTC - in response to Message 652.  

Yes, I would try that and see if it fixes the problem. Let us know what happens.

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 699 - Posted: 27 Feb 2006, 8:48:08 UTC

Many client errors on this computer due to swap-outs.

https://ralph.bakerlab.org/show_host_detail.php?hostid=611

sslickerson
Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 709 - Posted: 27 Feb 2006, 23:34:34 UTC
Last modified: 27 Feb 2006, 23:43:18 UTC

Swap out failure:

3/27/2006 4:05:54 PM|ralph@home|Computation for result BARCODE_30_1cc8A_215_1_1 finished
3/27/2006 4:05:54 PM|ralph@home|Output file BARCODE_30_1cc8A_215_1_1_0 for result BARCODE_30_1cc8A_215_1_1 exceeds size limit.
3/27/2006 4:05:54 PM|ralph@home|File size: 148638304.000000 bytes. Limit: 25000000.000000 bytes
3/27/2006 4:05:54 PM||Allowing work fetch again.
3/27/2006 4:05:54 PM||Resuming round-robin CPU scheduling.
3/27/2006 4:05:55 PM|ralph@home|Unrecoverable error for result BARCODE_30_1cc8A_215_1_1 (<file_xfer_error> <file_name>BARCODE_30_1cc8A_215_1_1_0</file_name> <error_code>-131</error_code> <error_message></error_message></file_xfer_error>)
3/27/2006 4:05:56 PM|ralph@home|Started upload of BARCODE_30_1cc8A_215_1_1_1

Result

Workunit

Other Recent failures




sslickerson
Joined: 15 Feb 06
Posts: 17
Credit: 4,006
RAC: 0
Message 724 - Posted: 28 Feb 2006, 6:54:02 UTC

Had this failure after I pulled Ralph out of memory--My fault, sorry :(

2/27/2006 11:53:05 PM|ralph@home|Unrecoverable error for result HOMSdc_homDB024_1dcj__229_3_0 (Incorrect function. (0x1) - exit code 1 (0x1))
2/27/2006 11:53:05 PM||request_reschedule_cpus: process exited
2/27/2006 11:53:05 PM|ralph@home|Computation for result HOMSdc_homDB024_1dcj__229_3_0 finished
2/27/2006 11:54:08 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
2/27/2006 11:54:08 PM|ralph@home|Reason: To fetch work
2/27/2006 11:54:08 PM|ralph@home|Requesting 8640 seconds of new work, and reporting 1 results
2/27/2006 11:54:13 PM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
2/27/2006 11:54:13 PM|ralph@home|Message from server: No work sent
2/27/2006 11:54:13 PM|ralph@home|Message from server: (reached daily quota of 1 results)
2/27/2006 11:54:13 PM|ralph@home|No work from project

Result

WUID

Host




Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 776 - Posted: 1 Mar 2006, 20:59:32 UTC

4 successful runs and one client error (Incorrect function. (0x1) - exit code 1 (0x1)) since resetting to leave app in memory.
XP Pro.
Not sure when the switch to 4.90 happened during this run.

Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 777 - Posted: 1 Mar 2006, 22:41:46 UTC - in response to Message 679.  

All right ^^
The test with the preference set to "Leave in memory" = yes finished without any problems.
For the next WU, I will change it back to "no".

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 814 - Posted: 5 Mar 2006, 7:10:05 UTC

It is safe to say this problem was not fixed with 4.91.

I ran 3 units with remove from memory, all failed:

3/4/2006 1:08:11 AM|ralph@home|Pausing result BARCODE_30_1opd__236_10_0 (removed from memory)
3/4/2006 1:08:11 AM|SETI@home|Starting result 04ap01ab.1337.20720.97162.1.69_2 using setiathome version 418
3/4/2006 1:08:13 AM|ralph@home|Unrecoverable error for result BARCODE_30_1opd__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 1:08:13 AM||request_reschedule_cpus: process exited
3/4/2006 1:08:13 AM|ralph@home|Computation for result BARCODE_30_1opd__236_10_0 finished

https://ralph.bakerlab.org/result.php?resultid=13917

3/4/2006 2:59:48 AM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
3/4/2006 4:22:24 AM|ralph@home|Pausing result BARCODE_30_1fna__236_10_0 (removed from memory)
3/4/2006 4:22:24 AM|SETI@home|Starting result 21mr01aa.28899.15153.17324.1.120_0 using setiathome version 418
3/4/2006 4:22:26 AM|ralph@home|Unrecoverable error for result BARCODE_30_1fna__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 4:22:26 AM||request_reschedule_cpus: process exited
3/4/2006 4:22:26 AM|ralph@home|Computation for result BARCODE_30_1fna__236_10_0 finished

https://ralph.bakerlab.org/result.php?resultid=13918


3/4/2006 12:45:18 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:18 PM|ralph@home|Pausing result BARCODE_30_1bm8__236_10_0 (removed from memory)
3/4/2006 12:45:18 PM|SETI@home|Starting result 15fe03aa.3675.3361.892344.1.108_1 using setiathome version 418
3/4/2006 12:45:20 PM|SETI@home|Finished download of 07au01ab.18308.26112.359658.1.146
3/4/2006 12:45:20 PM|SETI@home|Throughput 93799 bytes/sec
3/4/2006 12:45:20 PM|SETI@home|Finished download of 07au01ab.18308.26112.359658.1.141
3/4/2006 12:45:20 PM|SETI@home|Throughput 201484 bytes/sec
3/4/2006 12:45:20 PM|SETI@home|Started download of 07au01ab.18308.26112.359658.1.142
3/4/2006 12:45:21 PM|ralph@home|Unrecoverable error for result BARCODE_30_1bm8__236_10_0 ( - exit code -1073741819 (0xc0000005))
3/4/2006 12:45:21 PM||request_reschedule_cpus: process exited
3/4/2006 12:45:21 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:21 PM||request_reschedule_cpus: files downloaded
3/4/2006 12:45:21 PM|ralph@home|Computation for result BARCODE_30_1bm8__236_10_0 finished

https://ralph.bakerlab.org/result.php?resultid=13919

I will now change over to leave in memory and report back.
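
The pattern in the three logs above (a "removed from memory" pause followed shortly by an unrecoverable error for the same result) can be picked out of a saved message log automatically. The following Python sketch assumes the `timestamp|project|message` layout seen in this thread. The function name and regular expressions are illustrative assumptions, not part of the BOINC client.

```python
import re

# Message patterns copied from the log lines posted in this thread.
PAUSED = re.compile(r"Pausing result (\S+) \(removed from memory\)")
FAILED = re.compile(r"Unrecoverable error for result (\S+) .*exit code (-?\d+)")


def failures_after_swap_out(lines):
    """Return (timestamp, project, result, exit_code) for every result that
    failed after having been paused and removed from memory."""
    swapped_out = set()
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        timestamp, project, message = parts
        paused = PAUSED.search(message)
        if paused:
            swapped_out.add((project, paused.group(1)))
            continue
        failed = FAILED.search(message)
        if failed and (project, failed.group(1)) in swapped_out:
            hits.append((timestamp, project, failed.group(1), int(failed.group(2))))
    return hits
```

To use it, paste the client messages into a text file and pass the open file object (or a list of lines) to `failures_after_swap_out`. Lines with an empty project field, such as the `request_reschedule_cpus` entries, are simply skipped because they match neither pattern.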


Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 822 - Posted: 6 Mar 2006, 19:50:12 UTC

I have to agree with Mike.

This WU crashed after 5 hours of crunching.
Before this failure, it switched several times without any problems.

06.03.2006 20:48:25|ralph@home|Pausing result BARCODE_30_1a19A_225_3_1 (removed from memory)
06.03.2006 20:48:25|Einstein@Home|Restarting result r1_0197.0__162_S4R2a_0 using albert version 437
06.03.2006 20:49:14|ralph@home|Unrecoverable error for result BARCODE_30_1a19A_225_3_1 ( - exit code -1073741819 (0xc0000005))
06.03.2006 20:49:14||request_reschedule_cpus: process exited
06.03.2006 20:49:14|ralph@home|Computation for result BARCODE_30_1a19A_225_3_1 finished


WU
Result

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 823 - Posted: 6 Mar 2006, 20:02:22 UTC

I only had one more work unit when I switched over to leave app in memory. It completed successfully.

https://ralph.bakerlab.org/result.php?resultid=13901

pisi78
Joined: 16 Feb 06
Posts: 7
Credit: 2,020
RAC: 0
Message 828 - Posted: 7 Mar 2006, 12:37:41 UTC

https://ralph.bakerlab.org/workunit.php?wuid=11287


Naturally, with applications not kept in memory and running SETI and Einstein :)


MatthewBChambers
Joined: 13 Mar 06
Posts: 4
Credit: 5,367
RAC: 0
Message 961 - Posted: 23 Mar 2006, 5:07:10 UTC

Hi, I am not sure whether this belongs here. It is an error from Rosetta, but it happened while I had Ralph running.


3/22/2006 6:47:26 PM|rosetta@home|Starting result FA_RLXwh_hom015_1who__362_15_0 using rosetta version 482
3/22/2006 6:47:27 PM||request_reschedule_cpus: process exited
3/22/2006 7:27:20 PM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
3/22/2006 7:27:20 PM|rosetta@home|Reason: To report results
3/22/2006 7:27:20 PM|rosetta@home|Reporting 1 results
3/22/2006 7:27:25 PM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
3/22/2006 7:47:28 PM|climateprediction.net|Restarting result sulphur_igi8_000861200_0 using sulphur_cycle version 422
3/22/2006 7:47:28 PM|rosetta@home|Pausing result FA_RLXwh_hom015_1who__362_15_0 (removed from memory)
3/22/2006 7:47:30 PM|rosetta@home|Unrecoverable error for result FA_RLXwh_hom015_1who__362_15_0 ( - exit code -164 (0xffffff5c))
3/22/2006 7:47:30 PM||request_reschedule_cpus: process exited
3/22/2006 7:47:30 PM|rosetta@home|Computation for result FA_RLXwh_hom015_1who__362_15_0 finished



Here are the BOINC 'startup' messages if it helps

3/21/2006 11:44:05 AM||Starting BOINC client version 5.2.13 for windows_intelx86
3/21/2006 11:44:05 AM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
3/21/2006 11:44:05 AM||Data directory: C:\Program Files\BOINC
3/21/2006 11:44:06 AM||Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 1.60GHz
3/21/2006 11:44:06 AM||Memory: 511.23 MB physical, 1.22 GB virtual
3/21/2006 11:44:06 AM||Disk: 37.30 GB total, 8.68 GB free
3/21/2006 11:44:06 AM|rosetta@home|Computer ID: 57038; location: home; project prefs: default
3/21/2006 11:44:06 AM|boincsimap|Computer ID: 6371; location: home; project prefs: default
3/21/2006 11:44:06 AM|climateprediction.net|Computer ID: 265183; location: ; project prefs: default
3/21/2006 11:44:06 AM|Einstein@Home|Computer ID: 450344; location: home; project prefs: default
3/21/2006 11:44:06 AM|LHC@home|Computer ID: 77254; location: ; project prefs: default
3/21/2006 11:44:06 AM|Predictor @ Home|Computer ID: 169507; location: home; project prefs: default
3/21/2006 11:44:06 AM|ralph@home|Computer ID: 1791; location: ; project prefs: default
3/21/2006 11:44:06 AM|SETI@home|Computer ID: 1585570; location: home; project prefs: default
3/21/2006 11:44:06 AM|SZTAKI Desktop Grid|Computer ID: 11497; location: home; project prefs: default
3/21/2006 11:44:06 AM|World Community Grid|Computer ID: 22915; location: ; project prefs: default
3/21/2006 11:44:06 AM||General prefs: from ralph@home (last modified 2006-03-18 11:57:57)
3/21/2006 11:44:06 AM||General prefs: using your defaults
3/21/2006 11:44:07 AM||Remote control not allowed; using loopback address

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 1001 - Posted: 27 Mar 2006, 20:01:21 UTC

Eight work units in a row have completed without a hitch. All were 4.94 running 8 hours of CPU time with swap outs (out of memory) every 2 hours.

Version 4.95 is in the Queue.

Win 2000 SP4 Intel Pent 4 2.40GHz

Looks like this one is put to bed. Thanks!

©2024 University of Washington
http://www.bakerlab.org