Posts by Dimitris Hatzopoulos

21) Message boards : RALPH@home bug list : Discussion of the "1% Hang" issue (Message 341)
Posted 20 Feb 2006 by Dimitris Hatzopoulos
Post:
Sorry for intervening, but I'm trying to understand how to tell the difference of various bugs.

Carlos, does your Rosetta executable keep running, consuming 100% of CPU time? You can check via the Windows Task Manager (Ctrl-Alt-Del etc.) or with a tool like ProcessExplorer (free, standalone exe, no install required; I've been using it for years).

Because I've never encountered a Rosetta WU that "stuck", consuming 100% CPU time, ad infinitum. The ones I've seen "stuck" were all stopped (loaded in memory, BOINC thought they were running, but "top" or "ps" revealed that Rosetta wasn't running, it was "SN"=stopped,nice).

And killing just the Rosetta task (not ./boinc or anything else, which has been happily running continuously for 1+ month now) will have BOINC restart the WU with a different random seed, and it'll finish OK this time (on the handful of occasions I've encountered so far).
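
The "ps" check described above can be sketched as a small decoder for the STAT column. The state meanings are paraphrased from the procps "ps" man page, where "S" is listed as interruptible sleep and "T" as stopped, so "SN" reads as a sleeping, niced process; either way it is not on the CPU. The function names and the `looks_hung` heuristic are assumptions made for illustration, not anything BOINC or Rosetta provides.

```python
# Meanings paraphrased from the procps `ps` man page (PROCESS STATE CODES).
PRIMARY_STATES = {
    "D": "uninterruptible sleep (usually I/O)",
    "R": "running or runnable",
    "S": "interruptible sleep (waiting for an event)",
    "T": "stopped by job control signal",
    "Z": "defunct (zombie)",
}
MODIFIERS = {
    "<": "high priority",
    "N": "low priority (nice)",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def describe_stat(stat: str) -> list[str]:
    """Translate a ps STAT string such as 'SN' into readable labels."""
    if not stat:
        return []
    labels = [PRIMARY_STATES.get(stat[0], f"unknown state {stat[0]!r}")]
    labels += [MODIFIERS.get(ch, f"unknown flag {ch!r}") for ch in stat[1:]]
    return labels


def looks_hung(stat: str, cpu_percent: float) -> bool:
    """Hypothetical heuristic: not runnable and using essentially no CPU."""
    return stat[:1] != "R" and cpu_percent < 1.0
```

For example, `describe_stat("SN")` yields the labels for interruptible sleep and low priority, and `looks_hung("SN", 0.0)` is true, which matches the "loaded in memory but not running" symptom.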
22) Message boards : Number crunching : rosetta_beta_4.83_i686-pc-linux-gnu -> frozen (Message 267)
Posted 19 Feb 2006 by Dimitris Hatzopoulos
Post:
Could be a coincidence, but also in my case, the problems of Rosetta v4.80 process being stuck "frozen" (not consuming any CPU time) appear on a Linux (Debian Sarge kernel 2.4.27) machine with only 256MB RAM (but plenty of virtual).

No other science apps had a problem on that machine (tried 6 projects), including WCG/HPF, which is running an older version of Rosetta (v4.21).

PS: I haven't attached this (underspec'ed) machine to RALPH.
23) Message boards : Current tests : Switching between projects with applications removed from memory (Message 189)
Posted 18 Feb 2006 by Dimitris Hatzopoulos
Post:
Any suggestions on the kinds of stress-tests we should try on RALPH WUs, to "speed things up"? Any recommended settings? I have # hours to run set to 4. Is there a point in reducing it even more (if one doesn't care about the download overheads) to get more WU samples? Or reduce "Switch between applications every" to 30min? (from 60) again to "force" more removes from mem?

Also, is there a phase in Rosetta's progress (e.g. <10% progress) that a WU is more susceptible to the dreaded "Computation error", due to checkpointing or whatever?

I ask because every time a user manually requests an update, BOINC does a request_reschedule_cpus, which removes currently running apps from memory and resumes/starts others. So one can manually force multiple "remove app from memory" actions without having to wait 60min.
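
A minimal sketch of scripting those manual updates, assuming the `boinccmd` command-line tool shipped with current BOINC clients is on the PATH and that its `--project URL update` operation has the same effect as clicking "Update" in the manager. The project URL, the repeat count and the pause are placeholders chosen for illustration.

```python
import subprocess
import time


def build_update_command(project_url: str) -> list[str]:
    """Build the argv for a manual project update via boinccmd."""
    return ["boinccmd", "--project", project_url, "update"]


def force_updates(project_url: str, repeats: int = 5, pause_s: float = 120.0) -> None:
    """Request several updates in a row, pausing between them.

    Each update is expected to make the client reschedule CPUs; with
    "leave in memory" set to No this should swap apps out of memory.
    """
    for _ in range(repeats):
        subprocess.run(build_update_command(project_url), check=True)
        time.sleep(pause_s)
```

Calling `force_updates("http://ralph.bakerlab.org/")` (URL assumed) would then exercise the removal path a few times per run instead of once per 60min switch interval.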
24) Message boards : Number crunching : Resource share: RALPH instead of Rosetta@home? (Message 173)
Posted 18 Feb 2006 by Dimitris Hatzopoulos
Post:
genes, thx for info, now that I had more time, I played with RALPH settings to find how to do separate configs, without having settings "spill over".

I've set the 1 PC which joined RALPH to "work" and "work"'s general settings include the "Leave app in memory when preempted"=NO. I'm not going to run R@H on this one for the time being (as long as I want to test if R v4.84 solved the issue we have with R v4.81)

Apparently a host (PC) can be in location "work" for project X and in location "home" for project Y (had to look in account_*.xml files, field "host_venue")
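
A small sketch of that lookup, for anyone who wants to see which location each project has assigned to a host. It assumes the account_*.xml files sit in the BOINC data directory and contain a "host_venue" element as described above; the "master_url" element used as the key is an assumption, with the file name as a fallback. Regular expressions are used instead of an XML parser in case the files are not strictly well-formed.

```python
import re
from pathlib import Path

VENUE_RE = re.compile(r"<host_venue>(.*?)</host_venue>", re.S)
URL_RE = re.compile(r"<master_url>(.*?)</master_url>", re.S)


def venues_by_project(boinc_dir: str) -> dict[str, str]:
    """Map each project (by URL, else file name) to its host_venue value."""
    result = {}
    for path in sorted(Path(boinc_dir).glob("account_*.xml")):
        text = path.read_text(errors="replace")
        venue = VENUE_RE.search(text)
        url = URL_RE.search(text)
        key = url.group(1).strip() if url else path.name
        # An empty string means the file has no host_venue element.
        result[key] = venue.group(1).strip() if venue else ""
    return result
```

Running `venues_by_project(".")` from inside the BOINC directory would show, for instance, "work" for one project and "home" for another on the same PC.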
25) Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here (Message 163)
Posted 17 Feb 2006 by Dimitris Hatzopoulos
Post:

Yes I know that but it defaults to No on this project so people need to be aware of that and possibly set RALPH up on a different preference (homes, school or work) with preempt set to No, that way it wont interfere with other project preferences



AFAIK "Leave in memory" is a global default, not per project or per location (work/home/school), so setting it independently isn't as easy (if you share the same PC between Rosetta and RALPH); see my deciding on resource share
26) Message boards : Number crunching : Resource share: RALPH instead of Rosetta@home? (Message 155)
Posted 17 Feb 2006 by Dimitris Hatzopoulos
Post:
Just to clarify previous comments, I'm not talking about credits AT ALL.

It's only about the technical aspects of running RALPH and R@H side by side on the same PC or on PCs sharing some projects, so the "leave app in mem"=NO setting for RALPH can "spill-over" and break R@H WUs.
27) Message boards : Number crunching : Resource share: RALPH instead of Rosetta@home? (Message 154)
Posted 17 Feb 2006 by Dimitris Hatzopoulos
Post:
How should we do it?

I run mostly Rosetta because that is the science research application, and put a lower resource share on Ralph. But it is a matter of personal preference of course. :)


Sure, but if RALPH is supposed to also test against the 1% bug, we would need to run the PC WITHOUT the "Leave in mem when preempted" option, in which case the ordinary Rosetta WUs will bomb. After setting "Leave in mem when preempted" to "No", I just had a R WU bomb with "Computational error" after 7+ hr of CPU time...

So, if I understand correctly, running both Rosetta and RALPH on same PC with same BOINC settings would be counter-productive.

And btw, as far as I can tell, there is no way to change the "leave app in memory" per project or per PC or per location (home/work/school).

So, it seems to me at first glance that the ONLY way to test RALPH for the "Leave in mem" issue, without killing Rosetta's WUs, would be to test RALPH on PCs that only run RALPH (not R@H) plus any projects unaffected by this setting, and which don't share projects with the PCs running Rosetta (so I can set RALPH's "Leave app in mem" to "No", but this setting won't "propagate" to the R@H production PCs).

Maybe I am missing something?
28) Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here (Message 110)
Posted 17 Feb 2006 by Dimitris Hatzopoulos
Post:
Double check your preempt in memory setting on the preferences on this project, for some reason mine was set to No when I joined up, usually it automatically says Yes on other projects I join, that may be why it was removed from memory

"Leave in memory when preempted" is a BOINC GLOBAL default that "propagates" across all projects. BOINC uses the config from the project with the newest time-stamp.

i.e. if you run SETI/Rosetta/RALPH and change RALPH today, those RALPH settings (e.g. "Leave app in mem" to NO) will be used by all other projects.

It wasn't quite clear to me either, and I had to look for this "detail".
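
The "newest time-stamp wins" rule above can be illustrated with a toy model. This is only a sketch of the rule as described in this post, not BOINC's actual code; the class, field and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GlobalPrefs:
    source_project: str          # project whose website last saved these prefs
    mod_time: float              # time of the last edit (epoch seconds)
    leave_apps_in_memory: bool


def effective_prefs(candidates: list[GlobalPrefs]) -> GlobalPrefs:
    """Return the prefs with the newest modification time.

    Raises ValueError if `candidates` is empty.
    """
    return max(candidates, key=lambda p: p.mod_time)


prefs = [
    GlobalPrefs("SETI", 1_139_000_000.0, True),
    GlobalPrefs("Rosetta", 1_139_500_000.0, True),
    GlobalPrefs("RALPH", 1_140_100_000.0, False),  # edited most recently
]
winner = effective_prefs(prefs)
```

Here `winner.source_project` is "RALPH" and `winner.leave_apps_in_memory` is False, so under this rule the "No" setting would apply to SETI and Rosetta work on the same host as well.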
29) Message boards : Number crunching : Resource share: RALPH instead of Rosetta@home? (Message 92)
Posted 16 Feb 2006 by Dimitris Hatzopoulos
Post:
How should we do it?

Is RALPH running same WUs as R@H, with the newer executable code, in which case I set "No new work" for R@H and continue with RALPH-only?

Or is RALPH just test WUs with more or less "random" data, just to test things. In which case I'd keep RALPH with relatively small resource share?
30) Message boards : Current tests : Switching between projects with applications removed from memory (Message 87)
Posted 16 Feb 2006 by Dimitris Hatzopoulos
Post:
I wonder how exactly the process of "removing app from memory" is handled by BOINC and science app.

Would e.g. Rosetta lose any data it computed, since its last "checkpoint" (writing temporary results to disk every x minutes or y progress?)

I know I could look at the source of some open-source science app like SETI, but ... I thought I'd save a bit of time asking :-)
31) Message boards : Number crunching : Need for Linux testing? (Message 86)
Posted 16 Feb 2006 by Dimitris Hatzopoulos
Post:
Should I have my Linux machine attach to RALPH? (I've already joined the beta via WinXP)





©2024 University of Washington
http://www.bakerlab.org