Posts by David@home

1) Message boards : RALPH@home bug list : removed from memory by benchmark (Message 819)
Posted 5 Mar 2006 by David@home
Post:
When the benchmark ran it forced RALPH out of memory. Not sure how this can be managed better.


This was a problem with BOINC, not with the science apps. I'm not sure which version fixed it (it might be in the development version), but try updating to the current version for your computer.

--Nathan


Cool, if a newer dev version of BOINC handles this better then that is good. Part of the alpha test should be to give feedback on the BOINC infrastructure as well, if it highlights an issue, but it sounds like this is already covered off.
2) Message boards : RALPH@home bug list : Report - Previously Unclassified Work Unit Errors (Message 818)
Posted 5 Mar 2006 by David@home
Post:
This WU had unrecoverable error result 13723

in BOINC log:

05/03/2006 20:37:03|ralph@home|Unrecoverable error for result BARCODE_30_1c8cA_236_4_0 ( - exit code -1073741819 (0xc0000005))

XP Pro SP2, Intel P4 single CPU no HT. BOINC 5.2.13
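For anyone puzzling over the two numbers in that log line: BOINC prints the Windows exit status as a signed 32-bit integer, so -1073741819 and 0xc0000005 are the same bit pattern. 0xc0000005 is the Windows access violation status. A minimal Python sketch of the conversion (the helper name to_unsigned_32 is just for illustration):

```python
def to_unsigned_32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned 32-bit bit pattern."""
    return code & 0xFFFFFFFF


exit_code = -1073741819  # value reported in the BOINC log line above
status = to_unsigned_32(exit_code)
print(hex(status))  # 0xc0000005
```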

3) Message boards : RALPH@home bug list : removed from memory by benchmark (Message 816)
Posted 5 Mar 2006 by David@home
Post:
The way to solve it is to manually do a benchmark when a RALPH or Rosetta WU is not in the cache, if at all possible.

That is NOT a solution - that is a work-around.

A solution would be to fix the Rosetta client application so that it does not lose the work when suspended and removed from memory for the scheduled benchmark calibration.


Exactly. For test projects it is to be expected that you spend time doing things to help, but for production experiments you just want to let BOINC do its stuff. You should not have to micromanage BOINC. Look at the success of the BBC climate project. It must be the fastest-growing project, which will be due in part to how easy it is to run: download, install, enter an email address to register and that's it.

BOINC was developed initially alongside SETI@home, which has a 60-second checkpoint cycle. Newer projects that use longer checkpoint intervals do not fit well. E.g. a user who only runs BOINC as a screen saver, or set to run when idle, may need a long elapsed time to accumulate the CPU time between checkpoints, and so could lose a significant amount of work each time the application is removed from memory. The BOINC infrastructure needs to manage running the benchmark differently. BOINC should not need to remove clients from memory; e.g. could it check for free RAM before running the benchmark and just suspend the clients? Can Rosetta use a different checkpoint algorithm?
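To put rough numbers on the checkpoint point above, here is a small Python sketch. It assumes a task is equally likely to be removed from memory at any moment between checkpoints, so on average half a checkpoint interval of CPU work is redone. The 1 hour interval and the 25% CPU share are hypothetical figures for illustration, not measured Rosetta values.

```python
def expected_lost_cpu_seconds(checkpoint_interval_s: float) -> float:
    """Expected CPU time lost when a task is removed from memory at a random moment.

    Assumes the interruption is equally likely at any point between two
    checkpoints, so on average half an interval of work has to be redone.
    """
    return checkpoint_interval_s / 2


def wall_clock_per_checkpoint_s(checkpoint_interval_s: float, cpu_share: float) -> float:
    """Elapsed (wall-clock) time needed to accumulate one checkpoint interval of CPU time.

    cpu_share is the fraction of wall-clock time the task actually gets the CPU,
    e.g. on a machine that only crunches while the screen saver is running.
    """
    return checkpoint_interval_s / cpu_share


# Hypothetical comparison: the 60 s SETI@home cycle versus an assumed 1 hour
# checkpoint, on a machine that gives the task 25% of its time.
for interval in (60, 3600):
    lost = expected_lost_cpu_seconds(interval)
    wall = wall_clock_per_checkpoint_s(interval, cpu_share=0.25)
    print(f"checkpoint every {interval} s of CPU: ~{lost:.0f} s CPU lost per eviction, "
          f"{wall / 3600:.2f} h elapsed between checkpoints")
```

The point of the second function is that a screen-saver or idle-only user stretches the gap between checkpoints in real time, which makes each eviction by the benchmark more likely to land mid-interval.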



4) Message boards : RALPH@home bug list : removed from memory by benchmark (Message 809)
Posted 3 Mar 2006 by David@home
Post:
More an observation, but one of concern after reading the FAQ about checkpoint times and that it is best to keep Rosetta in memory etc. When the benchmark ran it forced RALPH out of memory. Not sure how this can be managed better.


2006-03-03 09:29:51 [ralph@home] Resuming computation for result BARCODE_30_2ci2I_237_4_0 using rosetta_beta version 4.92
2006-03-03 09:31:38 [---] Suspending computation and network activity - running CPU benchmarks
2006-03-03 09:31:38 [ralph@home] Pausing result BARCODE_30_2ci2I_237_4_0 (removed from memory)
2006-03-03 09:31:39 [---] request_reschedule_cpus: process exited
2006-03-03 09:31:40 [---] Running CPU benchmarks
2006-03-03 09:32:37 [---] Benchmark results:
2006-03-03 09:32:37 [---] Number of CPUs: 1
2006-03-03 09:32:37 [---] 1369 double precision MIPS (Whetstone) per CPU
2006-03-03 09:32:37 [---] 2854 integer MIPS (Dhrystone) per CPU
2006-03-03 09:32:37 [---] Finished CPU benchmarks
2006-03-03 09:32:37 [---] Resuming computation and network activity
2006-03-03 09:32:37 [---] schedule_cpus: must schedule
2006-03-03 09:32:37 [ralph@home] Restarting result BARCODE_30_2ci2I_237_4_0 using rosetta_beta version 4.92
5) Message boards : RALPH@home bug list : Report - Previously Unclassified Work Unit Errors (Message 807)
Posted 3 Mar 2006 by David@home
Post:
This WU finished using v 4.92 and claimed credit but contained some interesting messages so worth a look by the experts:

# Exception caught in nstruct loop ii=1 i=40
# num_decoys:39 attempts:40 cpu_run_time:26311.8

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C910E03 write attempt to address 0x00000000

# cpu_run_time_pref: 28800

WU result is resultid 14051
6) Message boards : RALPH@home bug list : Nice work and new core-clients, but NO WORK !! (Message 794)
Posted 2 Mar 2006 by David@home
Post:
What should I do??
Without work, no tests ... :(


Just leave the project running; the BOINC core client will keep trying to connect and download work. RALPH is a test project and releases work when something needs to be tested. I am in the same situation: I aborted my 4.90 WUs and downloaded 4.91, but there is no work. That is a shame, as with SETI@home being down I could have crunched quite a bit of RALPH work.

7) Message boards : RALPH@home bug list : application not staying in memory (Message 782)
Posted 2 Mar 2006 by David@home
Post:

Hmmm, perhaps Windows decided it was time to run one of those FindFast utilities that scan your hard disks?


I disabled the Indexing Service on my PC a long time ago, as fast search is a pointless CPU-wasting activity IMHO (My Computer > drive letter > right mouse click > Properties > General tab, and uncheck "Allow Indexing Service to index this disk for fast file searching"). No Google or MSN desktop search either :-) The PC would only have been running SETI at the time. :-(

I have aborted the 4.90 WUs as per the news; does anybody know if v 4.91 has any updates to try to address this issue? The project seems to carry on from the last checkpoint, but the loss of credit would be an issue in the production environment. E.g. if this were to happen one hour from the end of a 10 hour run, you would only get credit for the last hour of CPU time. Looking at the result returned, this WU dropped out of memory three times, so this would be a common problem in production, at least on my PC.

http://ralph.bakerlab.org/result.php?resultid=12783
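A simplified Python model of the credit problem described above, assuming (as the post suggests) that the science work resumes from the last checkpoint but the CPU-time counter used for claiming credit restarts from zero after each drop out of memory. The function name and the cumulative-hours input format are hypothetical.

```python
def claimed_cpu_hours(total_cpu_hours: float, drop_times_hours: list[float]) -> float:
    """CPU time that would be claimed for credit if the counter restarts at each drop.

    drop_times_hours are the cumulative CPU hours at which the application was
    removed from memory. Only the time after the last drop is counted.
    """
    if not drop_times_hours:
        return total_cpu_hours
    last_drop = max(drop_times_hours)
    return total_cpu_hours - last_drop


# The example from the post: a 10 hour run that drops out one hour before the end.
print(claimed_cpu_hours(10.0, [9.0]))  # 1.0
```

With several drops, as in the result linked above, only the stretch after the final one is counted, so the earlier drops make no difference to the claim in this model.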


8) Message boards : RALPH@home bug list : application not staying in memory (Message 773)
Posted 1 Mar 2006 by David@home
Post:
Alas not good news.

I updated to BOINC v 5.2.13 and I am currently running Rosetta Beta 4.90. I am still getting the client dropping out of memory when it is swapped for another project, even though "reside in memory" is set on.

e.g.

28/02/2006 23:43:04|ralph@home|Result HOMSdi_homDB018_1di2__228_10_0 exited with zero status but no 'finished' file
28/02/2006 23:43:04|ralph@home|If this happens repeatedly you may need to reset the project.

When this happens you lose all credit for the work done up to this point, and credit calculation restarts when the client becomes active again. Not an issue for RALPH, but one which would stop me running it on the live Rosetta system.

The PC was only running SETI@home at the time above; there was no user activity, and no backup, antivirus etc. was running. The PC has 1GB of RAM, so there is no issue with physical memory availability.



Any ideas?
9) Message boards : RALPH@home bug list : Rosetta does not give up CPU time to cleanmgr.exe (Message 571)
Posted 24 Feb 2006 by David@home
Post:
It kind of defeats the purpose to disable the process if I WANT to run it, yeah?



No. RALPH is a test project, so this test will show whether or not the compress-files check is what is causing the issue with RALPH. You can restore it afterwards as per the instructions.
10) Message boards : RALPH@home bug list : application not staying in memory (Message 561)
Posted 24 Feb 2006 by David@home
Post:
I can update BOINC Mgr and see if this helps.





©2024 University of Washington
http://www.bakerlab.org