Message boards : RALPH@home bug list : minirosetta v1.43 bug thread
Author | Message |
---|---|
James Volunteer moderator Project developer Project scientist Send message Joined: 22 Jun 06 Posts: 19 Credit: 278 RAC: 0 |
This is a minor update to v1.42, which was posted earlier last week. It contains the following fixes:
- Excessive memory usage and long-running jobs: jobs submitted with minirosetta v1.43 shouldn't have the same problems with memory and runtimes as earlier versions.
- Validator errors: there was a small bug in v1.42 that resulted in results being called invalid by the BOINC server. This is now fixed.
- Checkpoint errors and restarting jobs: we have finer-grained checkpointing in our full-atom refinement mode, which means that there should be fewer errors and less wasted time.
- NaNs in hbonding: we have a more aggressive fix that tests for the NaN condition and continues more gracefully. This has been a tricky bug to track down, but we think that this is a big step forward.
Please post bugs to this thread, and thank you very much for your patience.
Cheers, James |
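As a rough illustration of what "tests for the NaN condition and continues more gracefully" can mean in practice, here is a minimal Python sketch. It is not the minirosetta code, which is not shown in this thread: the function name `sum_hbond_energies` and the idea of skipping a bad term and counting it are assumptions made purely for illustration.

```python
import math

def sum_hbond_energies(energies):
    """Sum per-bond energy terms, skipping NaN terms instead of failing.

    Hypothetical helper: the real minirosetta handling is not shown in the thread.
    """
    total = 0.0
    skipped = 0
    for e in energies:
        if math.isnan(e):
            skipped += 1  # record the bad term and carry on with the rest
            continue
        total += e
    return total, skipped

# One bad term out of three: the sum still completes.
total, skipped = sum_hbond_energies([-1.5, float("nan"), -0.5])
print(total, skipped)
```

The point of the pattern is that a single NaN no longer poisons the whole sum, while the skip count keeps the problem visible for debugging.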
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
We've loaded a whole bunch of stuff onto the queue and things are looking good from our point of view. Most of the errors we're seeing are download errors. This is typical when lots of clients try to get the new apps and should ease up shortly. Anything from your end? |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Really liking the stats on the homepage!! (down to about a 9% failure rate) I'm 22 hrs into a 24 hr run on this guy: fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0 First off... wow! 64 models so far, and this is a large protein! I brought up the graphic... all I see graphed is the RMSD on the right. No energy, and no crosshair of the two. Also... I haven't seen any of the red dots for any of the prior 64 models, which seems unlikely to be correct. She's running with 177MB of memory. I presume it's already classified as a high-memory task... but I really wish you could find room in those short little task names to provide some indication of a task's minimum memory. I mean, I've got 1GB for an HT processor and can't say I ever have memory problems... and I can end up crunching most anything you put out... but it would be nice if we could see in the WU name that this one only goes to machines with 512MB or whatever, so we know NOT to report "high memory problems" on the task. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
P.S. I'm the first to admit that if the graphic is the ONLY problem I can report, then things are looking great! I'm on WinXP. I tried suspending/resuming tasks and projects, they seem to stop using CPU on command. As they should. |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
What's wrong with the Graphics ? |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
Ahh - I see - sorry, I didn't see your post when I posted the above message. Hmm - I'll see if I can work out what's going on here.
Really? Ahem - I'd say 177MB of memory is actually fairly *low*. Over 250 MB I consider a high-memory job. I thought the minimum requirement for R@H WUs is 250 MB? David? |
Path7 Send message Joined: 11 Feb 08 Posts: 56 Credit: 4,974 RAC: 0 |
This WU: cc_0_6_nocst4_homo_bench_foldcst_chunk_general_t303__olange_IGNORE_THE_REST _2AH5A_3_5823_3_0 attracted my attention because it ran continuously for 11250 seconds (no other project ran in the meantime) and generated 2 decoys. Runtime preference: 2 hours (7200 seconds). Switch between applications (setting): 60 minutes (3600 seconds). I wonder: did this WU make any checkpoints? Have a nice day, Path7. |
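To put the numbers in this report side by side, here is a small worked calculation in Python. It uses only the three figures quoted in the post; the variable names are illustrative.

```python
# Figures from the report above, all in seconds.
runtime = 11250            # continuous run time of the WU
runtime_preference = 7200  # 2-hour runtime preference
switch_interval = 3600     # "switch between applications" setting

# How far the task ran past the preferred runtime.
overrun = runtime - runtime_preference

# How many switch intervals passed without another project getting a turn.
intervals_elapsed = runtime / switch_interval

print(f"Overrun: {overrun} s ({overrun / 60:.1f} min)")
print(f"Switch intervals elapsed: {intervals_elapsed:.1f}")
```

The task ran 4050 seconds past the 2-hour preference and through more than three switch intervals without yielding, which is what prompts the question about checkpoints.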
olange Send message Joined: 27 Nov 08 Posts: 2 Credit: 0 RAC: 0 |
Hi Path7, thanks for the report. The application should make checkpoints pretty regularly. Also, the runtime looks really long. I will run this WU locally and check things out. What CPU was this WU running on? -Oliver |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
I've looked into the graphics problem - I can totally reproduce this here. It seems to have to do with the way it scales the graph and is merely cosmetic. I'll try to get this fixed with the next graphics update (which is separate from the main application update). Mike |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Really? Ahem - I'd say 177MB of memory is actually fairly *low*. Over 250 MB I consider a high-memory job. I thought the minimum requirement for R@H WUs is 250 MB? David?
256MB is the minimum for the SYSTEM! Not the max for a task. Leave some room for an operating system and a browser window or two there. 177MB is above average, which tends to be closer to 120MB. So, my point is that we're sitting here observing the ~60MB increase, and not knowing if you are already aware of it, or if the tasks have been properly set up to only run on machines with more than the minimum requirement. ...I guess if I had a machine with the minimum memory and saw it, then I would know it needs to be pointed out. But, as it stands, I have no way to tell. |
Conan Send message Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
I have been getting the following error on just one of my hosts (This Host): CPU time 0, stderr out: <core_client_version>5.10.45</core_client_version> <![CDATA[ <message> Maximum CPU time exceeded </message> The run time is zero, with of course no decoys generated. This has been happening on this host since version 1.42 (1188054) and now on version 1.43 (1188071, 1188409, 1188547, 1193391). I also had another validate error on 1190103: it ran for over 12,000 seconds yet generated no decoys. Hope this helps, and I hope you can help my computer as well, Conan. |
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
feet1st: Oh wow - I see. Hmm - I will discuss this with DK in a minute. For my WUs the memory requirements (per JOB!) are around: relax_benchmark (rlb_**) around 100-200MB; homology_benchmark (*_homo_bench_*) around 150-330MB (the proteins here are much, much bigger). Thanks for this info! Mike |
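The per-job ranges quoted above can be turned into a quick lookup by work-unit name, which is roughly what feet1st is asking to be able to do by eye. This is only a sketch: the glob patterns are a loose reading of `rlb_**` and `*_homo_bench_*`, the function name `estimate_memory_mb` is hypothetical, and none of it reflects how the RALPH@home scheduler actually assigns tasks.

```python
from fnmatch import fnmatchcase

# Approximate per-job memory ranges in MB, taken from the post above.
# The glob patterns are an assumption about how the WU names are laid out.
MEMORY_RANGES_MB = [
    ("*_homo_bench_*", (150, 330)),  # homology benchmark, much bigger proteins
    ("*_rlb_*", (100, 200)),         # relax benchmark
]

def estimate_memory_mb(wu_name):
    """Return the (low, high) memory range in MB for a WU name, or None if unknown."""
    for pattern, mem_range in MEMORY_RANGES_MB:
        if fnmatchcase(wu_name, pattern):
            return mem_range
    return None

# The relax-benchmark WU reported earlier in this thread.
print(estimate_memory_mb(
    "fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0"
))
```

By this reading, the 177MB observed on the `rlb` task sits inside the expected 100-200MB range rather than indicating a memory problem.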
mtyka Volunteer moderator Project developer Project scientist Send message Joined: 19 Mar 08 Posts: 79 Credit: 0 RAC: 0 |
Ah yes - we saw that and we have a fix. A flag was missing from that WU that prevents this error, so our future WUs should not produce this particular validate error anymore. Your other errors are stranger - we'll look into it. You say it's only a single machine that has them? Have you tried restarting/reinstalling BOINC on it? Is there anything particular about that box? |
Path7 Send message Joined: 11 Feb 08 Posts: 56 Credit: 4,974 RAC: 0 |
Hi Path7, thanks for the report. Hi olange/Oliver, Thanks for your reaction. The “olange” WU ran on a single core AMD sempron 3000+ 1.8 GHz, Ubuntu 8.04. I hope this helps, Path7. |
©2024 University of Washington
http://www.bakerlab.org