Bug Reports for 5.45

Message boards : RALPH@home bug list : Bug Reports for 5.45

Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2715 - Posted: 27 Jan 2007, 22:09:57 UTC
Last modified: 28 Jan 2007, 1:53:28 UTC

Ralph has been updated to 5.45. This update includes a fix for the long-known graphics problem, and we would like to test it here on RALPH first. In our beta tests on local Windows and Mac hosts, various Rosetta jobs that used to crash within 5 to 10 minutes with graphics on are now running much more stably. Given these encouraging results, we have turned the sidechain drawing and mouse-rotation features back on. Please give it a try, either by turning on graphics in BOINC Manager or by enabling the BOINC screensaver. If you spot any problem, please report it to us here (more detailed descriptions of errors are preferred). Thanks.

For Mac users: even with the fix, we still see that the graphics frame sometimes freezes suddenly because the graphics thread gets stuck (somewhere in the GLUT library). When this happens, the graphics window can be closed without any problem but cannot be reopened. The effect is limited to the graphics thread; the worker thread still runs properly (you can see progress increasing) and returns valid results when it finishes. (Before the fix, both the graphics thread and the worker thread used to crash, triggering a segmentation violation or bus error.) If you see similar behavior with Ralph jobs, please keep the WU crunching and check whether it does produce results properly in the end. Thanks.

For Windows users: we have not seen any problems so far in our local tests and would like to see how it goes on Ralph.
ID: 2715
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2717 - Posted: 28 Jan 2007, 1:44:10 UTC

Yippie!! Project TFlops here we come!

Do you plan to do several batches of Ralph testing? People need time to suspend Rosetta so they can enable the screensaver to test the Ralph tasks, and then time to catch some tasks available on the server etc. etc.

1,000 tasks, twice a day for a few days?

Keep in mind, most users now do not use the screensaver. And most Ralph users also run Rosetta, so we're going to have to do a little jockeying around to do some good tests.
ID: 2717
KSMarksPsych

Joined: 16 Feb 06
Posts: 40
Credit: 8,226
RAC: 0
Message 2718 - Posted: 28 Jan 2007, 3:35:52 UTC

I just successfully completed one WU.

Opened the graphics window and played around rotating the protein.

Using BOINC 5.8.6a. P4 2.8, 512 MB of RAM, XP Pro.
ID: 2718
KSMarksPsych

Joined: 16 Feb 06
Posts: 40
Credit: 8,226
RAC: 0
Message 2719 - Posted: 28 Jan 2007, 3:38:07 UTC - in response to Message 2718.  


This WU


[aside]What happened to message editing... or was it never here?[/aside]
ID: 2719
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2720 - Posted: 28 Jan 2007, 4:37:01 UTC - in response to Message 2717.  

Good point. How should we proceed? Right after the update this afternoon, we sent out about 600 WUs, and half of them are already done. However, my guess is that most of them were crunched without graphics at all, as people may not have learned of the update in time to enable graphics. We do need to send several batches for testing, and I just want to spread the word a little more before doing so. There are two ways people can help with testing:

1. Keep the screensaver disabled, but manually enable graphics within BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above). This way Rosetta@home does not have to be suspended, but it requires more attention from users.

2. Suspend Rosetta@Home first and enable the BOINC screensaver. My only concern is that TFlops for Rosetta may drop temporarily and Ralph may not have enough WUs to feed all the testing hosts, so a lot of time would be wasted.

I personally prefer the first option, but if anybody has a better solution, please let us know. Meanwhile, we will send out graphics testing WUs periodically to get enough coverage before drawing a conclusion.

ID: 2720
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2721 - Posted: 28 Jan 2007, 4:39:38 UTC - in response to Message 2718.  

Great, one positive data point; thanks for the report. If possible, try to leave the graphics window open even when you are not in front of your computer.

ID: 2721
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2724 - Posted: 28 Jan 2007, 8:03:18 UTC

I'm new to Mac, but when I try to zoom in and out on the graphics, it just rotates.

Is it just me, or is something not right?

Anders n
ID: 2724
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2725 - Posted: 28 Jan 2007, 8:04:35 UTC - in response to Message 2724.  

Hmmm, where did the edit option go???

It works like it should on the windows computers :)

Anders n

ID: 2725
Tom Philippart

Joined: 24 Jun 06
Posts: 4
Credit: 883
RAC: 0
Message 2726 - Posted: 28 Jan 2007, 10:00:10 UTC

https://ralph.bakerlab.org/result.php?resultid=407247
Windows Vista x64
I pressed "show graphics" and left them on and played a lot with them during the whole runtime of the WU, no problems!
ID: 2726
[AF>France>TDM>Centre]Jeannot Le Tazon

Joined: 11 Jun 06
Posts: 3
Credit: 1,754
RAC: 0
Message 2727 - Posted: 28 Jan 2007, 10:46:57 UTC - in response to Message 2720.  

1. Keep the screensaver disabled, but manually enable graphics within BOINC Manager by pushing the "show graphics" button (as reported by KSMarksPsych above).

Wu https://ralph.bakerlab.org/result.php?resultid=407583 OK
ID: 2727
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2729 - Posted: 28 Jan 2007, 13:55:13 UTC

https://ralph.bakerlab.org/result.php?resultid=406892

Not a graphics problem, but a stuck WU.

Anders n
ID: 2729
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2730 - Posted: 28 Jan 2007, 15:38:52 UTC - in response to Message 2720.  

Ya, my TFlops comment was optimistically looking forward to the new code rolling out to Rosetta, and to fewer users there having problems or confusion, or leaving due to failures.

I think you should just keep doing what you're doing: keep small amounts of work coming at various times of day (think dial-up users, each day after work). But I wanted to point out that this test has enough special circumstances around it that it needs more time than most of the tests you've done here on Ralph before.

Speaking of TFlops, were you able to achieve thread safety without too much of a performance impact? I've always been curious how many conformations would be shown if the graphics actually displayed each and every one of them.

Last night I picked up two DOC WUs on the PC I was testing (and having problems with) previously. With a 24-hour run-time preference, they're 6.5 hours in without any graphics enabled. I'll be using my PC most of today, and I have suspended Rosetta and enabled the screensaver for tonight.

...2 DOC WUs, one using 204MB and the other using 177MB. So I'll ask again: is there a simple way we can tell that a given WU was designed for high-memory systems?

ID: 2730
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2732 - Posted: 28 Jan 2007, 16:46:30 UTC

I was running 2 WUs at the same time on my Mac, 1 with the graphics window open and 1 without. I did not get a true picture of how much CPU power the graphics take (because the WU without graphics got stuck), but after 3 h of runtime the graphics WU was 18 min behind.

Anders n
ID: 2732
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2733 - Posted: 28 Jan 2007, 18:07:05 UTC - in response to Message 2730.  

The current fix should not have any impact on the performance as compared to before.

We can define a high memory requirement in our job submission script so that a batch is only sent to clients with more memory. For most Rosetta jobs the default value should be fine, but with Rosetta design coming along, some jobs will probably require more memory than usual.
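As a rough illustration of what such a memory requirement does, here is a minimal Python sketch of a scheduler-side check, assuming each WU carries a declared memory bound and each host reports its RAM plus the fraction of it BOINC may use. BOINC workunits do have a memory-bound field (`rsc_memory_bound`), but the class names, fields, and numbers below are hypothetical, and this is not the project's or BOINC's actual scheduler code.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Host:
    ram_bytes: int        # physical RAM reported by the client
    ram_max_used: float   # hypothetical: fraction of RAM the user lets BOINC use


@dataclass
class WorkUnit:
    name: str
    memory_bound_bytes: int  # hypothetical: per-WU requirement set in the submission script


def can_send(wu: WorkUnit, host: Host) -> bool:
    """Send the WU only if the host's usable RAM covers the WU's declared requirement."""
    usable_bytes = host.ram_bytes * host.ram_max_used
    return usable_bytes >= wu.memory_bound_bytes


# Hypothetical batches: a default job and a memory-hungry design job.
default_wu = WorkUnit("default_job", 100 * MB)
design_wu = WorkUnit("design_job", 400 * MB)

small_host = Host(ram_bytes=512 * MB, ram_max_used=0.5)   # 256 MB usable
big_host = Host(ram_bytes=2048 * MB, ram_max_used=0.5)    # 1024 MB usable

for host in (small_host, big_host):
    print([wu.name for wu in (default_wu, design_wu) if can_send(wu, host)])
```

With these made-up numbers, the 512 MB host would only be sent the default job, while the 2 GB host would be sent both.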


ID: 2733
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2734 - Posted: 28 Jan 2007, 18:25:07 UTC

My point was just that I am observing a Ralph WU that takes 200MB to run. That is high enough that such a WU should probably be given the "high memory only" designation on the server, or perhaps it isn't running correctly. But, to my knowledge, I have no way to tell (since I do have a high-memory machine) whether this "high memory only" designation has been made properly. If there were something in the WU name, or in an XML file somewhere, that we could check, we'd know to notify you when we observe memory use beyond your plan. Perhaps an "HM" or "LM" designation somewhere in the WU name.
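To make the naming proposal concrete, here is a minimal Python sketch of how a volunteer might check a WU against such a tag. The "HM"/"LM" tags come from the suggestion above; the underscore-separated name layout, the 128 MB / 512 MB limits, and the rule that untagged WUs count as "LM" are all hypothetical assumptions, not anything the project has adopted.

```python
from typing import Optional

# Hypothetical memory plans (MB) for the proposed "HM"/"LM" name tags.
PLANNED_LIMIT_MB = {"LM": 128, "HM": 512}


def memory_tag(wu_name: str) -> Optional[str]:
    """Return 'HM' or 'LM' if it appears as an underscore-separated token in the WU name."""
    for token in wu_name.split("_"):
        if token in PLANNED_LIMIT_MB:
            return token
    return None


def should_report(wu_name: str, observed_mb: float) -> bool:
    """True if observed memory use exceeds the plan implied by the tag.

    Untagged WUs are assumed (hypothetically) to be low-memory "LM" jobs.
    """
    tag = memory_tag(wu_name) or "LM"
    return observed_mb > PLANNED_LIMIT_MB[tag]


# Hypothetical WU names; the memory figures are the two DOC WUs reported in this thread.
print(should_report("DOC_example_HM_0001", 204))  # False: within the hypothetical HM plan
print(should_report("DOC_example_LM_0002", 177))  # True: over the hypothetical LM plan
```

Putting the tag in the name, rather than in a separate file, would let volunteers check it from BOINC Manager's task list alone, which is the convenience the suggestion is after.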
ID: 2734
Viromancy

Joined: 20 Jan 07
Posts: 7
Credit: 1,425
RAC: 0
Message 2735 - Posted: 28 Jan 2007, 20:34:22 UTC

Failed WU here.

Same type of error that forced me to stop crunching Rosetta altogether after decreasing stability in ver 5.43 resulted in around 75% of WUs aborting prematurely. I never had this problem with any WUs from the other BOINC applications I run (World Community Grid/Malaria Control), and it was very rare with Rosetta before version 5.43. I had one instance of the same error with version 5.44 here. Also, along with others, I saw three odd, unrelated WU failures with ver 5.44 just before 5.45 was introduced: here, here and here. I know these latter aren't ver 5.45, but for the sake of completeness I thought they were worth mentioning.

I don't use graphics, at all. All these errors, and almost all of the constant errors being thrown up by Rosetta ver 5.43, occurred while the application was running in the background and the machine was otherwise idle.
ID: 2735
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 2738 - Posted: 29 Jan 2007, 5:21:11 UTC

One more stuck WU on my Mac.

https://ralph.bakerlab.org/result.php?resultid=406892

I will set the target time to 4 h to see if the problem disappears.

Anders n
ID: 2738
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2740 - Posted: 29 Jan 2007, 16:50:41 UTC - in response to Message 2735.  

Hi Viromancy, I am a little surprised to hear that, even with graphics disabled, you got a failure rate as high as 75% for Rosetta@Home; according to our current statistics, that number stays below 10% on average for the Windows platform. The error message you got is certainly one of the symptoms related to graphics, but it is definitely not limited to graphics. May I ask whether you have experienced any stability issues with your machine in general? We certainly do not want to lose users because of application stability, and that is why we are working on improving it. Maybe you can check whether this has improved in 5.45, and if the failure rate goes down significantly, you might consider reattaching to Rosetta@Home.

BTW, the last three failures mentioned in your post were caused by problems in the Rosetta science code, and catching exactly that kind of problem is the purpose of running the alpha test.

ID: 2740
Chu
Volunteer moderator
Project developer
Project scientist

Joined: 26 Sep 06
Posts: 61
Credit: 12,545
RAC: 0
Message 2741 - Posted: 29 Jan 2007, 16:52:38 UTC - in response to Message 2738.  

Thanks Anders n, that might be due to a bad trajectory.

ID: 2741
Viromancy

Joined: 20 Jan 07
Posts: 7
Credit: 1,425
RAC: 0
Message 2742 - Posted: 29 Jan 2007, 18:44:54 UTC - in response to Message 2740.  

The error message you got is certainly one of the symptoms related to graphics, but definitely not limited to that. May I ask if you have experienced any stability issue with your machine in general?


Hi Chu. Apologies for the long post.

No, I've never had any stability issues with my machine for any application I run on it, with the sole exception that it doesn't like running the BOINC manager at the same time as I'm ripping DVDs. Other than that, it's rock solid. It's fairly well overclocked (I'm running a Core2Duo E6700 at 3.46 GHz, and my PC6400-rated RAM is actually running as PC8200), but it has tested completely stable, and several months of running both cores at 100% capacity 24/7 have never generated a single error for any BOINC application WU except Rosetta. Rosetta, though, became very touchy about running. It would inevitably fail a WU that was pre-empted and swapped out to allow something else to run. I had to leave it running all the time on one core.

We certainly do not want to lose users because of application stability and that is why we are trying to work on improving it. Maybe you can check whether this is improved in 5.45 and if the failure rate goes down significantly, you might consider attaching back to Rosetta@Home.


I was quite puzzled and a bit disturbed at how the failure rate on Rosetta got more and more pronounced over time without any change to my machine's configuration or any other evidence of instability. I kept going for as long as possible because I liked crunching Rosetta and I'd accumulated a very respectable number of WUs. But the failure rate was becoming alarming, and on the 15th-16th January this year some 75-80% of all WUs aborted prematurely. That's when I regretfully had to call a halt. I joined RALPH to see whether the newer versions were more stable with an eye to going back to Rosetta when they're implemented. It's hard to tell, since the fairly irregular availability of work means I don't have a large WU base to draw conclusions from, but both 5.45 and 5.44 before it seem more stable than 5.43 on my machine; for one thing, they can both be swapped in and out to allow other BOINC applications to run without causing problems.

Out of curiosity, since the beta versions seemed more stable, I allowed my BOINC manager to download some new Rosetta workunits under 5.43 on Jan 27th. Sure enough, the first three it tried to run all failed with access violations, here, here and here. The fourth WU succeeded. By that stage, though, I'd had enough again and shut it down.

I have no idea why this is happening, and the 10% failure rate you mention would have been, if anything, an overestimate of the situation during the first few months I was crunching. The problems really seem to stem from the introduction of 5.43; which is puzzling since I don't use the graphics. I'll certainly try Rosetta again when 5.43 is upgraded, but I'd be a lot happier if I knew what was going wrong.


ID: 2742



©2024 University of Washington
http://www.bakerlab.org