minirosetta v1.43 bug thread

Message boards : RALPH@home bug list : minirosetta v1.43 bug thread

James
Volunteer moderator
Project developer
Project scientist

Joined: 22 Jun 06
Posts: 19
Credit: 278
RAC: 0
Message 4370 - Posted: 1 Dec 2008, 6:08:47 UTC

This is a minor update to v1.42, which was posted last week, and contains the following fixes:

- Excessive memory usage and long running jobs - jobs submitted with minirosetta v1.43 shouldn't have the same problems with memory and runtimes as earlier versions.
- Validator errors - there was a small bug in v1.42 that resulted in results being called invalid by the BOINC server. This is now fixed.
- Checkpoint errors and restarting jobs - we have finer-grained checkpointing in our full-atom refinement mode, which means that there should be fewer errors and less wasted time.
- NaNs in hbonding - we have a more aggressive fix that tests for the NaN condition and continues more gracefully. This has been a tricky bug to track down, but we think that this is a big step forward.
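
To illustrate the general idea behind the last item (test for the NaN condition, then continue gracefully), here is a minimal sketch. It is not the minirosetta code: the function names, the toy energy formula, and the fallback value are all hypothetical and only show the pattern.

```python
import math


def toy_hbond_energy(distance, angle_cos):
    """Toy stand-in for a hydrogen-bond score term (NOT Rosetta's real formula)."""
    return -angle_cos / (distance * distance)


def safe_hbond_energy(distance, angle_cos, fallback=0.0):
    """Return the toy energy, or `fallback` when the result is not a finite number.

    The fallback value is an assumption made for this sketch; a real
    application might instead skip the term or reject the current model.
    """
    try:
        energy = toy_hbond_energy(distance, angle_cos)
    except ZeroDivisionError:
        # Python raises on float division by zero instead of returning inf/NaN.
        return fallback
    if not math.isfinite(energy):
        # Catches NaN as well as +/- infinity.
        return fallback
    return energy
```

The point of the guard is that one bad geometry yields a neutral value instead of letting a NaN propagate through the total score.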

Please post bugs to this thread, and thank you very much for your patience. Cheers,

James


mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4372 - Posted: 3 Dec 2008, 0:29:11 UTC

We've loaded a whole bunch of stuff onto the queue and things are looking good from our point of view. Most of the errors we're seeing are download errors. This is typical when lots of clients try to get the new apps at once, and it should ease up shortly.

Anything from your end?
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4373 - Posted: 3 Dec 2008, 5:19:02 UTC
Last modified: 3 Dec 2008, 5:22:07 UTC

Really liking the stats on the homepage!! (down to about a 9% failure rate)

I'm 22hrs into a 24hr run on this guy:
fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0

First off... wow! 64 models so far, and this is a large protein!

I brought up the graphic... all I see graphed is the RMSD on the right. No energy, and no cross hair of the two. Also... haven't seen any of the red dots of any of the prior 64 models, which seems unlikely to be correct.

She's running with 177MB of memory. I presume it's already classified as a high memory task... but I really wish you could find room in those short little task names to provide some indication of a task's minimum memory. I mean, I've got 1GB for an HT processor, can't say I ever have memory problems... and I can end up crunching most anything you put out... but it would be nice if we could see in the WU name that this one only goes to machines with 512MB or whatever, so we know NOT to report "high memory problems" on the task.
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4374 - Posted: 3 Dec 2008, 5:29:28 UTC

P.S. I'm the first to admit that if the graphic is the ONLY problem I can report, then things are looking great!

I'm on WinXP. I tried suspending/resuming tasks and projects; they stop using CPU on command, as they should.
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4375 - Posted: 3 Dec 2008, 18:26:38 UTC - in response to Message 4374.  

What's wrong with the graphics?
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4376 - Posted: 3 Dec 2008, 18:31:03 UTC - in response to Message 4373.  


I brought up the graphic... all I see graphed is the RMSD on the right. No energy, and no cross hair of the two. Also... haven't seen any of the red dots of any of the prior 64 models, which seems unlikely to be correct.

Ahh, I see. Sorry, I didn't see your post when I posted the above message.
Hmm, I'll see if I can figure out what's going on here.


She's running with 177MB of memory. I presume it's already classified as a high memory task...


Really? Ahem, I'd say 177MB of memory is actually fairly *low*. I consider anything over 250MB a high memory job. I thought the minimum requirement for R@H WUs was 250MB? David?


Path7

Joined: 11 Feb 08
Posts: 56
Credit: 4,974
RAC: 0
Message 4378 - Posted: 3 Dec 2008, 19:07:23 UTC

This WU:
cc_0_6_nocst4_homo_bench_foldcst_chunk_general_t303__olange_IGNORE_THE_REST _2AH5A_3_5823_3_0
caught my attention because it ran continuously for 11250 seconds (no other project ran in the meantime) and generated 2 decoys.
Runtime preference: 2 hours (7200 seconds)
Switch between applications (setting): 60 minutes (3600 seconds).
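
For reference, the figures above can be put side by side with a quick calculation. This is only a sketch of the arithmetic; the function name is made up for illustration, and it says nothing about how the application itself decides when to stop.

```python
def runtime_summary(cpu_seconds, target_seconds, decoys):
    """Compare the observed run time with the runtime preference."""
    overrun = cpu_seconds - target_seconds
    return {
        "overrun_seconds": overrun,
        "overrun_fraction": overrun / target_seconds,
        # Average only; individual decoys may take very different times.
        "seconds_per_decoy": cpu_seconds / decoys if decoys else None,
    }


summary = runtime_summary(cpu_seconds=11250, target_seconds=7200, decoys=2)
print(summary)
```

With these inputs the task ran 4050 seconds (about 56%) past the 2-hour preference, at an average of 5625 seconds per decoy.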

I wonder: did this WU make any checkpoints?

Have a nice day,
Path7.
olange

Joined: 27 Nov 08
Posts: 2
Credit: 0
RAC: 0
Message 4379 - Posted: 3 Dec 2008, 20:31:11 UTC - in response to Message 4378.  

Hi Path7, thanks for the report.

The application should make checkpoints pretty regularly. Also, the runtime looks really long. I will run this WU locally and check things out.

What CPU was this WU running on?

-Oliver
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4380 - Posted: 3 Dec 2008, 20:52:01 UTC

I've looked into the graphics problem. I can reproduce it here; it seems to come from the way the graph is scaled, and it is merely cosmetic. I'll try to get this fixed with the next graphics update (which is separate from the main application update).

Mike
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4381 - Posted: 3 Dec 2008, 20:55:11 UTC - in response to Message 4376.  

Really? Ahem, I'd say 177MB of memory is actually fairly *low*. I consider anything over 250MB a high memory job. I thought the minimum requirement for R@H WUs was 250MB? David?


256MB is the minimum for the SYSTEM! Not the max for a task. Leave some room for an operating system and a browser window or two there.

177MB is above the average, which tends to be closer to 120MB. So, my point is that we're sitting here observing the ~60MB increase, not knowing whether you are already aware of it, or whether the tasks have been properly set up to only run on machines with more than the minimum requirement. ...I guess if I had a machine with the minimum memory and saw it, then I would know it needs to be pointed out. But, as it stands, I have no way to tell.
Conan

Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4382 - Posted: 3 Dec 2008, 21:03:56 UTC

I have been getting the following error on just one of my hosts (This Host):

CPU time 0
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>

The run time is zero, with of course no decoys generated.

This has now been happening on this host since Version 1.42 with 1188054
Now on Version 1.43 with 1188071
1188409
1188547
1193391

Also had another validate error on 1190103
It ran for over 12,000 seconds yet generated no decoys.

Hope this helps and hope you can help my computer as well, Conan.
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4383 - Posted: 3 Dec 2008, 21:22:33 UTC - in response to Message 4381.  

feet1st,
Oh wow, I see. Hmm, I will discuss this with DK in a minute. For my WUs the memory requirements (per JOB!) are around:

relax_benchmark (rlb_**) around 100-200MB
homology_benchmark (*_homo_bench_*) around 150-330MB (the proteins here are much, much bigger)
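
As a rough sketch of how a volunteer could use these ranges, the lookup below maps a task name to the expected range and checks an observed figure against it. The glob patterns are my reading of the name fragments quoted above, and the table and function names are hypothetical; nothing like this exists in BOINC or minirosetta as far as this thread shows.

```python
from fnmatch import fnmatchcase

# (label, glob pattern, low MB, high MB), taken from the ranges quoted above.
# The patterns are an assumption about how the name fragments should match.
EXPECTED_MEMORY_MB = [
    ("relax_benchmark", "*rlb_*", 100, 200),
    ("homology_benchmark", "*_homo_bench_*", 150, 330),
]


def expected_memory(task_name):
    """Return (label, low_mb, high_mb) for the first matching pattern, else None."""
    for label, pattern, low, high in EXPECTED_MEMORY_MB:
        if fnmatchcase(task_name, pattern):
            return label, low, high
    return None


def is_within_expected(task_name, observed_mb):
    """True/False if the task type is known, None if no pattern matches."""
    match = expected_memory(task_name)
    if match is None:
        return None
    _, low, high = match
    return low <= observed_mb <= high


task = "fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0"
print(expected_memory(task), is_within_expected(task, 177))
```

By this table, the 177MB reported earlier in the thread for the rlb task falls inside the 100-200MB range for relax_benchmark jobs.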

Thanks for this info!

Mike
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4384 - Posted: 3 Dec 2008, 21:26:36 UTC


Also had another validate error on 1190103
It ran for over 12,000 seconds yet generated no decoys.


Ah yes, we saw that and we have a fix. A flag that prevents this error was missing from that WU, so our future WUs should not produce this particular validate error anymore.


Your other errors are stranger. We'll look into them.

You say it's only a single machine that has them? Have you tried restarting/reinstalling BOINC on it? Is there anything particular about that box?

Path7

Joined: 11 Feb 08
Posts: 56
Credit: 4,974
RAC: 0
Message 4385 - Posted: 3 Dec 2008, 23:31:36 UTC - in response to Message 4379.  
Last modified: 3 Dec 2008, 23:36:46 UTC

Hi Path7, thanks for the report.
................ I will run this WU locally and check things out.

What CPU was this WU running on?

-Oliver

Hi olange/Oliver,

Thanks for your reply.
The "olange" WU ran on a single-core AMD Sempron 3000+ (1.8 GHz) running Ubuntu 8.04.

I hope this helps,
Path7.




©2024 University of Washington
http://www.bakerlab.org