Posts by mtyka

21) Message boards : RALPH@home bug list : minirosetta 1.58 (Message 4642)
Posted 1 Feb 2009 by mtyka
Post:
Hopefully all the graphics work now (no more blackout current windows).

As always, your feedback is highly appreciated!

Mike
22) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4641)
Posted 1 Feb 2009 by mtyka
Post:
I'm seeing the same quirk in progress times that robertmiles and others have already reported. I've got a bunch of tasks with names of the form csttest_1_8_nativecst_harm*, all of which, under Mac OS X 10.4.11, are supposed to take approximately 1 hour to complete. What I'm seeing is that, after say 45 minutes, progress is apparently only 15% complete and there is 1:20:00 left. It's stepping very slowly at this point in a stage called MoverBase+Minimization. Nevertheless the tasks complete in about one hour as they're supposed to.



You have to remember that the percentage bar is merely a cosmetic feature; it's not a "real" percentage bar. It's just a really crude estimate of the time left. Now, there is NO way of knowing how long the job will take before you've finished the first decoy.
I've recently changed this estimate to be much more conservative. We estimate it as percentage = 100 * time_spent / max_time,
where max_time is USERTIME + 4 hrs, not USERTIME, because the WU cannot run more than 4 hrs past the user time (due to the watchdog).
After the first decoy is completed, the program can make a slightly more educated guess about how long it's actually going to take, so the percentages get more accurate as more decoys are produced.
So if you're 45 minutes into the first decoy, 15% is about correct: with a 1-hour target your max runtime is 60 + 240 = 300 minutes, and 45/300 = 0.15.
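The crude pre-first-decoy estimate above is a single line of arithmetic. Here is a minimal Python sketch of it, assuming times are measured in minutes and that the 4-hour watchdog margin is simply added to the user's target runtime, as described; the function and constant names are hypothetical, not minirosetta's own.

```python
# Hypothetical names; the 4-hour margin is the watchdog limit described in the post.
WATCHDOG_MARGIN_MINUTES = 4 * 60


def crude_progress_percent(minutes_spent, user_target_minutes):
    """Pre-first-decoy estimate: percentage = 100 * time_spent / max_time."""
    # max_time is USERTIME + 4 hrs, not USERTIME, since the watchdog
    # will not let the WU run longer than that.
    max_time = user_target_minutes + WATCHDOG_MARGIN_MINUTES
    return 100.0 * minutes_spent / max_time
```

For the case in the quoted post, `crude_progress_percent(45, 60)` gives 15.0, which is why the bar looks slow early on even though the task finishes on time.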

I'm sorry there's no better way to do this, but rosetta goes through many different stages in making a decoy, and it's simply impossible to know how long it's going to take.

Mike



23) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4620)
Posted 31 Jan 2009 by mtyka
Post:
Since we are on 1.56 now, should there be a new thread for that?

By the way, very interesting color choice for the accepted-energy line and the other line above the RMSD box. What do the blue and yellow mean, and isn't there a purplish color in there as well?


Nope, it's just prettiness. No extra meaning, I'm afraid.
24) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4612)
Posted 30 Jan 2009 by mtyka
Post:
A little treat for you guys ;)
25) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4609)
Posted 29 Jan 2009 by mtyka
Post:
Yeah, the graphics do work after all. It's just the settings thing, as you rightly point out, Nathan. Thanks so much, by the way, for identifying this bug. It's a major bugfix in the BOINC API, and I would never have found (or even suspected) it if I hadn't seen your trace!

OK, so the graphics app needs to be recompiled too. OK, that's no problem. :)

26) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4606)
Posted 29 Jan 2009 by mtyka
Post:
In fact, in this WU
http://ralph.bakerlab.org/result.php?resultid=1278720
the graphics don't work.


Yes, I noticed as soon as I fired up my client. It's OK,
I'll track this down in the next version. I think it's just to do with the fact that I only updated the app and not the graphics_app.
27) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4603)
Posted 29 Jan 2009 by mtyka
Post:
Hmm, I think the graphics don't work with this one. Not to worry.
28) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4602)
Posted 29 Jan 2009 by mtyka
Post:
I have 8 minirosetta 1.54 workunits from r@h that have completed successfully now without failures.

--Nathan



I've just uploaded 1.55 with a fix for this problem. Could you change your settings back again and test that it works? I want to see it work correctly with the CPU overrides!
29) Message boards : RALPH@home bug list : minirosetta v1.55 bug thread (Message 4601)
Posted 29 Jan 2009 by mtyka
Post:
All right, while things over on BOINC are looking marvelous, here's another update on RALPH.

1.55 has three new things:

a) Fixes for the validator rejections that occur when the watchdog kicks in and when it reports "too many restarts with no progress"

b) A very detailed debug-information header, which will hopefully help trace the problem in the options system

c) ANOTHER bug fix in the BOINC API, this time in the user preferences. This bug led directly to the phenomenon that Brotherbard managed to point out by running his app in GDB. Awesome! Read about it here
Ramostol, this is relevant for you too; I think it's the same bug.
You two, could you set your settings back to restrict to specific days and see if it works now? It did here :)


The lockfile issue remains the last one we don't have even the faintest handle on. Apparently it has to do with setting the client to use less than 100% of the CPU.

Anyway, please post reports here.
30) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4598)
Posted 28 Jan 2009 by mtyka
Post:
Has anyone with the lock-file problem tried the solution I suggested? I am curious if that is the cause of that problem ...


Paul,

Can you point me to what you read about lockfile problems on Einstein?
(I read your post over on BOINC.)

Mike
31) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4596)
Posted 28 Jan 2009 by mtyka
Post:
I have 8 minirosetta 1.54 workunits from r@h that have completed successfully now without failures.

--Nathan



Fabulous. So we have a workaround, at least.
I'm trying to track down the problem, but it's proving difficult.
32) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4594)
Posted 28 Jan 2009 by mtyka
Post:
Mike is off to break the app on his computer by setting these :D

Let's see what happens... mooohahah ;)
33) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4592)
Posted 28 Jan 2009 by mtyka
Post:
lazypug, thanks for that. Yes, I've found this bug recently too; I will put in a fix in the next update.
34) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4591)
Posted 28 Jan 2009 by mtyka
Post:
I ran the minirosetta 1.54 app in gdb and here is the stack trace:


Genius.



I'm not sure if minirosetta is doing anything special with the global prefs,

Nope.


I would suspect this is a BOINC defect. I'm running BOINC 6.2.18, and have not tested this on other versions, nor do I have work from any other project on this machine at the moment so cannot test if other projects fail like this too.

--Nathan



Nathan, awesome! Let me have a look at the code now; at least we have a handle on what failed. Maybe I can fix it in the next release. We should notify the BOINC people too if this is an API error.

So what exactly is the week override setting?

What happens if you remove it?

Does the app run fine and through to the end?


-- Mike
35) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4580)
Posted 28 Jan 2009 by mtyka
Post:

OK, you power users, I need your help. We are seeing in our statistics
that a lot of people are getting these errors over on BOINC:

<message>
too many exit(0)s
</message>

or lots of these:

Can't acquire lockfile - exiting
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting


These are one and the same problem; sometimes the actual error messages just don't get saved. If you see this, please let me know how it looks from your point of view.
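To make "one and the same problem" concrete, here is a minimal Python sketch of how a lockfile that is still held could produce both messages. It assumes, without confirmation from this thread, that the app takes a non-blocking exclusive lock on a file in its slot directory, exits with status 0 when it can't, and that the client relaunches it until it has counted too many such exits. The file name `boinc_lockfile`, the limit `MAX_PREMATURE_EXITS` and all helper functions are hypothetical; the sketch uses Unix `fcntl.flock`, so it runs on Linux and Mac OS X but not Windows.

```python
import fcntl
import os

# Hypothetical name and limit, for illustration only.
LOCKFILE_NAME = "boinc_lockfile"
MAX_PREMATURE_EXITS = 5


def try_acquire_lockfile(path):
    """Take a non-blocking exclusive lock; return the fd, or None if it is already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # BlockingIOError when another open file holds the lock
        os.close(fd)
        return None
    return fd


def launch_app(slot_dir, log):
    """One simulated app start: initialize, then try the slot's lockfile."""
    log.append("BOINC:: Initializing ... ok.")
    fd = try_acquire_lockfile(os.path.join(slot_dir, LOCKFILE_NAME))
    if fd is None:
        # The real app would call exit(0) at this point.
        log.append("Can't acquire lockfile - exiting")
    return fd


def client_restarts(slot_dir, log):
    """Simulated client: relaunch after each premature exit, give up after too many."""
    for _ in range(MAX_PREMATURE_EXITS):
        fd = launch_app(slot_dir, log)
        if fd is not None:
            return fd
    log.append("too many exit(0)s")
    return None
```

The interesting scenario is a leftover instance that still holds the lock. Every relaunch then prints the pair of lines seen in the log above until the client gives up with "too many exit(0)s". If the messages from the individual attempts are lost, only that final error remains.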

Those of you who had lockfile problems, how did you solve them?

What client versions do you use?

Are these clients somehow stuck?

I need info. I'm pretty stuck with this one, and it accounts for a hell of a lot of failures.


Some people never seem to get them and some get them all the time, if not every time.

Mike
36) Message boards : RALPH@home bug list : minirosetta v1.48-1.51 bug thread (Message 4578)
Posted 27 Jan 2009 by mtyka
Post:
I was starting to think that must be what was occurring. And I was also thinking that once the model was completed, these temp files should be deletable. So, great minds think alike I guess... and so do we!

Glad the info was of some good use.


Yeah, the non-deleting is, I think, due to a logic flaw when it gets to the end of the decoy. It's not suuuuuper urgent; I'll include a fix with the next update. All it does is accumulate a few more structures until the WU ends, and frankly that's rarely going to be more than 5-10 MB.

Mike
37) Message boards : RALPH@home bug list : minirosetta v1.48-1.51 bug thread (Message 4575)
Posted 27 Jan 2009 by mtyka
Post:
Feet1st:

Looking at this, the checkpointing is working fine. As I said, rosetta will now buffer checkpoints up to a limit and then *dump all of them at the same time* (rather than spreading them over time, which would keep the disk spinning).
So those large blocks of files are from when it dumped all those checkpoints in one go.
Sadly we cannot just write the last checkpoint; we need all intermediate checkpoints to correctly restore the state of the program.

HOWEVER: you've found another problem: minirosetta is not deleting all the checkpoints after it's
done with them!!! In theory it should delete checkpoints after each decoy.

I'll look into that.
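Here is a minimal Python sketch of the buffering scheme described above. Checkpoints are held in memory until a limit is reached. They are then written to disk in one burst, and every file from the current decoy is removed once the decoy is finished. That removal is the cleanup step that was being missed. The class, its methods, the file names and the limit are all hypothetical illustrations, not minirosetta's actual checkpointing code.

```python
import os


class CheckpointBuffer:
    """Hypothetical sketch: buffer checkpoints in memory and write them in one batch."""

    def __init__(self, directory, limit):
        self.directory = directory
        self.limit = limit
        self.pending = []  # (name, data) pairs not yet on disk
        self.written = []  # paths written during the current decoy

    def add(self, name, data):
        # Every intermediate checkpoint is kept: restoring needs all of them.
        self.pending.append((name, data))
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self):
        # One burst of writes instead of a steady trickle keeps disk activity short.
        for name, data in self.pending:
            path = os.path.join(self.directory, name)
            with open(path, "wb") as fh:
                fh.write(data)
            self.written.append(path)
        self.pending.clear()

    def finish_decoy(self):
        # Once the decoy is done its checkpoints are useless: drop the
        # buffered ones and delete everything already written to disk.
        self.pending.clear()
        for path in self.written:
            if os.path.exists(path):
                os.remove(path)
        self.written.clear()
```

With a limit of 3, nothing reaches the disk until the third checkpoint arrives. Then all three are written together, and `finish_decoy()` leaves the directory empty again. Skipping that last call is how leftover files would pile up until the WU ends.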

Cheers, Mike
38) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4574)
Posted 27 Jan 2009 by mtyka
Post:

Did you ever increase the size of the shared memory segment?

It is POSSIBLE that the original configuration is too limiting and that may be causing the error ... there are directions for increasing the size that *MAY* help ...


Yes, I did that some time ago.

When I checked it this morning, I had three workunits (1 from ralph and 2 from rosetta) that were stuck at a very low completion (around 0.111 to 0.240) and the two rosetta ones were just shy of 8 hours with my time setting at 2 hours. Show graphics does not show anything.

I needed to reboot due to installing some system updates and when BOINC came back up they started over at zero time and two of them failed right away, with one getting stuck again. I'm down to just this one workunit on my machine and I noticed that it is running around 200% CPU usage. It appears that both the main thread and the watchdog thread are stuck in __spin_lock.

--Nathan



Hi Nathan,

You, as well as ramostol, are having a strange problem on Mac OS X that I've not seen anywhere else yet. It always seems to start with an error just after the semaphore initialization and then fails a little further down.
I'm not sure how to approach this. I could send you a directory with a debug build and see if I can get a trace or something. But it's going to be nigh impossible to debug this from here since, I'm sad to say, it does not happen on our Mac OS X machines.
39) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4573)
Posted 27 Jan 2009 by mtyka
Post:
Having problems with Mac OS X on a Mac Pro on both ralph and rosetta. So far not a single 1.54 task has completed successfully. I've stopped downloading more.

http://ralph.bakerlab.org/results.php?hostid=16351
http://boinc.bakerlab.org/rosetta/results.php?hostid=585071



Did you ever increase the size of the shared memory segment?

It is POSSIBLE that the original configuration is too limiting and that may be causing the error ... there are directions for increasing the size that *MAY* help ...


Did you mean *me* or brotherbard?

Now, *I* did indeed increase the shared-memory buffer size required for mini;
I think it will now use about 3 MB (per app, I guess).

To quote the page that Paul D. Buck pointed out:
"The amount of shared memory available on a Mac is configured at boot time. Once the shared memory system has been initialized it is not possible to change the shared memory configuration[1]. At present the same amount of shared memory is configured on any Mac (about 4MB), regardless of the number of processors or the amount of total memory available."

Holy smokes?! When setting up this app I considered 3 MB to be quite conservative on, you know, machines that routinely have 1 GB and more. But this might be an issue if the default config is closer to 4 MB.
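A quick back-of-the-envelope check shows why those two numbers are worrying together. This is a Python sketch that assumes the "about 4MB" from the quoted page is a single system-wide pool and that each running minirosetta task claims its own ~3 MB segment (Mike's own "per app, I guess"). Neither assumption is confirmed here, and if the Mac limit is per segment rather than a total, the arithmetic changes. The function names are hypothetical.

```python
MB = 1024 * 1024

# Figures taken from the posts above; both are approximate.
MAC_DEFAULT_POOL = 4 * MB   # "about 4MB", from the quoted page
MINI_SEGMENT = 3 * MB       # "about 3MB (per app i guess)"


def tasks_that_fit(pool_bytes, per_task_bytes):
    """How many tasks can each get their own segment out of a fixed pool."""
    return pool_bytes // per_task_bytes


def shortfall_bytes(pool_bytes, per_task_bytes, running_tasks):
    """How far the combined demand of the running tasks exceeds the pool (0 if it fits)."""
    return max(0, running_tasks * per_task_bytes - pool_bytes)
```

Under these assumptions only one task fits (`tasks_that_fit(MAC_DEFAULT_POOL, MINI_SEGMENT)` is 1), and a second concurrent task on a multi-core Mac would already be 2 MB short. That would be consistent with failures that only show up on some Macs.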

Let me talk to the original authors of the graphics app and find out more; we might be onto something here...


What else?

Glad to hear the 99-decoy limit is working! Yeah!

There is still something seriously fishy in the options system: I see a bunch of traces that end straight after "Initializing options..ok". Version 1.56 is in preparation and will hopefully reveal more about this bug.


This:
ERROR: Option matching -loop:close_loops not found in command line top-level context

is OK; it's just old WUs executing with the new version, which no longer supports this option. Not to worry.




40) Message boards : RALPH@home bug list : minirosetta v1.54 bug thread (Message 4559)
Posted 26 Jan 2009 by mtyka
Post:
Forget the last post, my bad! I checked using Memtest86+ V2.10, and the last DIMM always produced a SINGLE fault, not always in the same place, no matter what settings I used (voltage, CAS, etc.).

Oh well, I've removed what I think is the offender, so let's see how we go!?!

Sorry about the red herrings


WOW! Are you saying those random crashes are likely due to the faulty DIMM? Awesome, that removes a whole bunch of random failures! We've been very confused by them.

Mike





©2024 University of Washington
http://www.bakerlab.org