Message boards : RALPH@home bug list : Bug Reports for 5.45
Author | Message |
---|---|
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Ralph has been updated to 5.45. This update includes a fix for the long-known graphics problem, and we would like to test it here on RALPH first. In beta tests on our local Windows and Mac hosts, various Rosetta jobs that used to crash within 5 to 10 minutes with graphics on now run much more stably. Given these encouraging results, we have turned the sidechain drawing and mouse-rotation features back on. Please give it a try, either by turning on graphics in BOINC Manager or by enabling the BOINC screensaver. If you spot any problem, please report it to us here (more detailed descriptions of errors are preferred). Thanks. For Mac users: even with the fix, we still sometimes see the graphics frame suddenly freeze because the graphics thread gets trapped (somewhere in the GLUT library). When this happens, the graphics window can be closed without any problem but cannot be reopened. The effect is limited to the graphics thread; the worker thread still runs properly (you can see the progress increase) and returns valid results when it finishes. (Before the fix, this used to crash both the graphics thread and the worker thread and trigger a segmentation violation or bus error.) If you see similar behavior for Ralph jobs, please keep the WU crunching and see whether it does indeed produce results properly in the end. Thanks. For Windows users: we have not seen any problems so far in our local tests and would like to see how it goes on Ralph. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Yippie!! Project TFlops here we come! Do you plan to do several batches of Ralph testing? People need time to suspend Rosetta so they can enable the screensaver to test the Ralph tasks, and then time to catch some tasks available on the server etc. etc. 1,000 tasks, twice a day for a few days? Keep in mind, most users now do not use the screensaver. And most Ralph users also run Rosetta, so we're going to have to do a little jockeying around to do some good tests. |
KSMarksPsych Send message Joined: 16 Feb 06 Posts: 40 Credit: 8,226 RAC: 0 |
I just successfully completed one WU. Opened the graphics window and played around rotating the protein. Using BOINC 5.8.6a. P4 2.8, 512 of RAM, XP Pro. |
KSMarksPsych Send message Joined: 16 Feb 06 Posts: 40 Credit: 8,226 RAC: 0 |
I just successfully completed one WU. This WU [aside]What happened to message editing... or was it never here?[/aside] |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Good point. How should we proceed? Right after the update this afternoon, we sent out about 600 WUs, and half of them are already done. However, my guess is that most of them were crunched without graphics at all, since people may not have known about the update in time to enable graphics. We do need to send several batches for testing, and I just want to spread the word a little more before doing so. There are two ways people can help with testing: 1. Keep the screensaver disabled but manually enable graphics within BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). This way Rosetta@home does not have to be suspended, but more attention from users is required. 2. Suspend Rosetta@home first and enable the BOINC screensaver. My only concern is that TFlops for Rosetta may drop temporarily and Ralph may not have enough WUs to feed all the testing hosts, so a lot of time will be wasted. I personally prefer the first option, but if anybody has a better solution, please let us know. Meanwhile, we will send out graphics-testing WUs periodically to provide enough coverage before drawing a conclusion. Yippie!! Project TFlops here we come! |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Great, one positive data point, thanks for the report. If possible, try to leave the graphics window open even if you do not stay in front of your computer all the time. I just successfully completed one WU. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
I'm new to Mac, but when I try to zoom in and out on the graphics it just rotates. Is it just me or is something not right? Anders n |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
I'm new to Mac, but when I try to zoom in and out on the graphics Hmmm where did edit go??? It works like it should on the Windows computers :) Anders n |
Tom Philippart Send message Joined: 24 Jun 06 Posts: 4 Credit: 883 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=407247 Windows Vista x64 I pressed "show graphics" and left them on and played a lot with them during the whole runtime of the WU, no problems! |
[AF>France>TDM>Centre]Jeannot Le Tazon Send message Joined: 11 Jun 06 Posts: 3 Credit: 1,754 RAC: 0 |
1. Keep the screensaver disabled but manually enable graphics within BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). WU https://ralph.bakerlab.org/result.php?resultid=407583 OK |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
1. Keep the screensaver disabled but manually enable graphics within BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). This way Rosetta@home does not have to be suspended, but more attention from users is required. Ya, my TFlops comment was optimistically looking forward to the new code rolling out to Rosetta and fewer users there having problems or confusion, or leaving due to failures. I think just do what you're doing: keep small amounts of work coming at various times of day (think dial-up, each day after work). But I just wanted to point out that this test has enough special circumstances around it that it needs more time than most you've done before here on Ralph. Speaking of TFlops, were you able to devise thread safety without too much of a performance impact? I've always been curious how many conformations would be showing if the graphic actually showed each and every one of them. I picked up two DOC WUs last night on the PC that I was trying (and having problems with) previously, running with a 24hr time preference, so they're 6.5hrs in without any graphics enabled. I'll be using my PC most of today and have suspended Rosetta and enabled the screensaver for tonight. ...2 DOC WUs, one using 204MB, the other using 177MB. So, I'll ask again: is there a simple way we can tell that a given WU was designed for high-memory systems? |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
I was running 2 WUs at the same time on my Mac: 1 with the graphics window on, 1 without. I did not get a true picture of how much CPU power the graphics take (because the WU without graphics got stuck), but after 3 h of runtime the graphics WU was 18 min behind. Anders n |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
The current fix should not have any impact on performance compared to before. We can define a high memory requirement in our job submission script so that a batch is sent only to clients with larger memory. For most Rosetta jobs the default value should be fine, but Rosetta design, which is coming along, will probably require more memory than usual. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
My point was just that I am observing a Ralph WU that takes 200MB to run. That is high enough that I know such a WU should probably be given the "high memory only" designation on the server; or perhaps it isn't running correctly. But, to my knowledge, I have no way to tell (since I do have a high-memory machine) whether this "high memory only" designation has properly been made. If there were something in the WU name, or in an XML file somewhere, that we could check, we'd know to notify you when we observe memory use beyond your plan. Perhaps an "HM" or "LM" designation somewhere in the WU name. |
Viromancy Send message Joined: 20 Jan 07 Posts: 7 Credit: 1,425 RAC: 0 |
Failed WU here. Same type of error that forced me to stop crunching Rosetta altogether, after decreasing stability with ver 5.43 resulted in around 75% of WUs aborting prematurely. I never had this problem at all with any WUs from the other BOINC applications I run (World Community Grid/Malaria Control), and it was very rare with Rosetta before version 5.43. Had one instance of the same with version 5.44 here. Also, along with others, I saw three odd, unrelated WU failures with ver 5.44 just before 5.45 was introduced: here, here and here. I know these latter aren't ver 5.45, but for the sake of completeness I thought it was worth mentioning. I don't use graphics at all. All these errors, and almost all of the constant errors thrown up by Rosetta ver 5.43, occurred while the application was running in the background and the machine was otherwise idle. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
1 more stuck WU on my Mac. https://ralph.bakerlab.org/result.php?resultid=406892 I will set the target time to 4 h to see if the problem disappears. Anders n |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Hi Viromancy, I am a little surprised to hear that even with graphics disabled you got a 75% failure rate for Rosetta@home; in our current statistics, that number stays below 10% on average for the Windows platform. The error message you got is certainly one of the symptoms related to graphics, but it is definitely not limited to that. May I ask whether you have experienced any stability issues with your machine in general? We certainly do not want to lose users because of application stability, which is why we are working on improving it. Maybe you can check whether this is improved in 5.45, and if the failure rate goes down significantly, you may consider attaching back to Rosetta@home. BTW, the last three failures mentioned in your post were caused by problems in the Rosetta science code, and catching those is exactly the purpose of running the alpha test. Failed WU here. |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Thanks Anders n, that might be due to a bad trajectory. 1 more stuck WU on my Mac. |
Viromancy Send message Joined: 20 Jan 07 Posts: 7 Credit: 1,425 RAC: 0 |
The error message you got is certainly one of the symptoms related to graphics, but definitely not limited to that. May I ask if you have experienced any stability issue with your machine in general? Hi Chu. Apologies for the long post. No, I've never had any stability issue with my machine for any applications I run on it, with the sole exception that it doesn't like running the BOINC manager at the same time as I'm ripping DVDs. Other than that, it's rock solid. It's fairly well overclocked -I'm running a Core2Duo E6700 at 3.46 GHz, and my PC6400-rated RAM is actually running as PC8200 - but it's tested completely stable and several months of running both cores at 100% capacity 24/7 has never generated a single error for any BOINC application WU except Rosetta. Rosetta, though, became very touchy about running. It would inevitably fail a WU that was pre-empted and swapped out to allow something else to run. I had to leave it running all the time on one core. We certainly do not want to lose users because of application stability and that is why we are trying to work on improving it. Maybe you can check whether this is improved in 5.45 and if the failure rate goes down significantly, you may consider attaching back to Rosetta@Home. I was quite puzzled and a bit disturbed at how the failure rate on Rosetta got more and more pronounced over time without any change to my machine's configuration or any other evidence of instability. I kept going for as long as possible because I liked crunching Rosetta and I'd accumulated a very respectable number of WUs. But the failure rate was becoming alarming, and on the 15th-16th January this year some 75-80% of all WUs aborted prematurely. That's when I regretfully had to call a halt. I joined RALPH to see whether the newer versions were more stable with an eye to going back to Rosetta when they're implemented.
It's hard to tell, since the fairly irregular availability of work means I don't have a large WU base to draw conclusions from, but both 5.45 and 5.44 before it seem more stable than 5.43 on my machine; for one thing, they can both be swapped in and out to allow other BOINC applications to run without causing problems. Out of curiosity, since the beta versions seemed more stable, I allowed my BOINC manager to download some new Rosetta workunits under 5.43 on Jan 27th. Sure enough, the first three it tried to run all failed with access violations, here, here and here. The fourth WU succeeded. By that stage, though, I'd had enough again and shut it down. I have no idea why this is happening, and the 10% failure rate you mention would have been, if anything, an overestimate of the situation during the first few months I was crunching. The problems really seem to stem from the introduction of 5.43; which is puzzling since I don't use the graphics. I'll certainly try Rosetta again when 5.43 is upgraded, but I'd be a lot happier if I knew what was going wrong. |
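Editorial sketch: feet1st's suggestion earlier in the thread, an "HM" or "LM" designation somewhere in the WU name, could be checked on the volunteer side roughly as below. Nothing here reflects an actual RALPH or Rosetta naming scheme. The underscore-delimited tag, the example WU names, and the 150 MB limit are all assumptions made purely for illustration.

```python
import re
from typing import Optional

# Hypothetical convention: an underscore-delimited "HM" or "LM" token in the WU name.
# The project has not confirmed any such scheme; this only illustrates the suggestion.
_MEM_TAG = re.compile(r"(?:^|_)(HM|LM)(?:_|$)")

# Assumed limit (in MB) above which a WU not tagged HM looks worth reporting.
LOW_MEM_LIMIT_MB = 150


def memory_class(wu_name: str) -> Optional[str]:
    """Return 'HM', 'LM', or None if the name carries no memory tag."""
    match = _MEM_TAG.search(wu_name)
    return match.group(1) if match else None


def worth_reporting(wu_name: str, observed_mb: float) -> bool:
    """True when a WU that is not tagged HM uses more memory than the assumed limit."""
    return memory_class(wu_name) != "HM" and observed_mb > LOW_MEM_LIMIT_MB
```

With a tag like this, a volunteer who sees a 204MB task could tell at a glance whether that usage was planned (HM) or is worth a bug report (LM or untagged).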
©2024 University of Washington
http://www.bakerlab.org