Bug reports for Ralph 5.21

Message boards : RALPH@home bug list : Bug reports for Ralph 5.21

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1771 - Posted: 5 Jun 2006, 3:00:13 UTC

In Ralph 5.20 we made some fixes to the graphics, and Rom put in some code to clean up how the different threads (e.g. Rosetta itself, the graphics, the watchdog) shut down at the end of the run. We expect this to reduce errors!

In Ralph 5.21 we fixed a small bug introduced in 5.20 where the application would hang for a while after Rosetta finished, waiting for the watchdog to check in (which could take up to an hour!).
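Rosetta's own threading code isn't shown here, but a finished run waiting up to an hour for the watchdog's next check-in is the kind of delay you get when a helper thread sleeps through its whole polling interval before it can notice a shutdown. A minimal Python sketch of the usual remedy, assuming a watchdog that polls on a fixed interval: it waits on a `threading.Event` with a timeout instead of sleeping, so setting the event at shutdown wakes it at once. The `Watchdog` class and the one-hour interval are hypothetical illustrations, not the actual 5.21 fix.

```python
import threading
import time


class Watchdog:
    """Hypothetical watchdog thread that checks on the worker periodically."""

    def __init__(self, check_interval_s: float = 3600.0) -> None:
        self.check_interval_s = check_interval_s
        self._stop = threading.Event()
        self.checks = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True as soon as the event
        # is set, so a shutdown request interrupts the wait immediately.
        # A plain time.sleep(check_interval_s) would block for the full
        # interval (up to an hour) before the thread could exit.
        while not self._stop.wait(timeout=self.check_interval_s):
            self.checks += 1  # the periodic health check would go here

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    dog = Watchdog(check_interval_s=3600.0)
    dog.start()
    t0 = time.monotonic()
    dog.shutdown()  # returns right away instead of after up to an hour
    print(f"shutdown took {time.monotonic() - t0:.3f} s")
```

The design point is simply that the thread's "sleep" doubles as its stop signal, so the main thread never has to wait out the polling interval.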
[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1774 - Posted: 5 Jun 2006, 17:04:10 UTC
Last modified: 5 Jun 2006, 17:06:25 UTC

My one 5.21 WU ran before I got up, but from the log I'd say it finished without the delay. Could use some more WUs :) to check graphics etc.
One thing: the watchdog timer bug is said to have been introduced in 5.20, but I definitely saw the delayed-finishing behavior in 5.19. Makes me wonder.
doc :)
Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1777 - Posted: 5 Jun 2006, 22:36:20 UTC

3 successes without the delay or any errors as far as I can tell (I was not there most of the time).

And I agree, that bug was introduced before 5.20. I never had any 5.20 work, but I definitely remember having that bug with 5.19. Not that it is that important as long as it is fixed, though :)
Fuzzy Hollynoodles
Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1778 - Posted: 6 Jun 2006, 3:13:57 UTC

One finished successfully:

https://ralph.bakerlab.org/workunit.php?wuid=129796

Result: https://ralph.bakerlab.org/result.php?resultid=151527

It uploaded fine after finishing.

But it was a huge protein! After 3 hours it was still on the first model. And it ran for almost 4 hours, even though my Target CPU run time is set to 2 hours.

"I'm trying to maintain a shred of dignity in this world." - Me
Fuzzy Hollynoodles
Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1779 - Posted: 6 Jun 2006, 7:38:30 UTC

And the next finished fine:

https://ralph.bakerlab.org/workunit.php?wuid=134423

Result: https://ralph.bakerlab.org/result.php?resultid=152501

No problems so far. :-)
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1780 - Posted: 6 Jun 2006, 11:33:56 UTC
Last modified: 6 Jun 2006, 11:49:57 UTC

So far so good on my Celeron 500, Win98SE, 256 MB RAM, screensaver disabled. These are all the results for that machine (note: the bottom two are 5.20, the top two are 5.21):

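Result ID | WU ID | Sent | Reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
151320 | 128850 | 5 Jun 2006 7:44:20 UTC | 6 Jun 2006 0:59:45 UTC | Over | Success | Done | 13,805.00 | 6.72 | 6.72
150599 | 132927 | 5 Jun 2006 5:34:20 UTC | 5 Jun 2006 18:05:24 UTC | Over | Success | Done | 7,576.00 | 3.69 | 3.69
149963 | 132447 | 4 Jun 2006 2:58:27 UTC | 4 Jun 2006 22:16:14 UTC | Over | Success | Done | 27,478.00 | 13.37 | 13.37
149454 | 132032 | 3 Jun 2006 5:51:31 UTC | 3 Jun 2006 16:34:09 UTC | Over | Success | Done | 14,206.00 | 6.91 | 6.91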

Also good on my P4 1.8 GHz, WinXP, screensaver enabled (top is 5.21, rest are 5.19):
Result ID | WU ID | Sent | Reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
150342 | 132671 | 5 Jun 2006 3:13:50 UTC | 6 Jun 2006 10:40:40 UTC | Over | Success | Done | 16,171.36 | 24.95 | 24.95
150341 | 132670 | 5 Jun 2006 3:13:50 UTC | 5 Jun 2006 11:10:55 UTC | Over | Success | Done | 13,574.19 | 20.95 | 20.95
144445 | 128098 | 1 Jun 2006 5:56:15 UTC | 4 Jun 2006 0:35:55 UTC | Over | Success | Done | 14,410.27 | 23.46 | 23.46
144444 | 128097 | 1 Jun 2006 5:56:15 UTC | 2 Jun 2006 11:40:34 UTC | Over | Success | Done | 13,161.00 | 21.42 | 21.42
144443 | 128096 | 1 Jun 2006 5:56:15 UTC | 2 Jun 2006 3:43:23 UTC | Over | Success | Done | 13,093.22 | 21.31 | 21.31
143910 | 126962 | 1 Jun 2006 0:06:20 UTC | 2 Jun 2006 3:43:23 UTC | Over | Success | Done | 13,946.11 | 22.70 | 22.70

Also good on my AMD64 3700 "Mobile" Socket 754 laptop, screensaver enabled (top two are 5.21, rest are 5.19):
Result ID | WU ID | Sent | Reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
150593 | 132921 | 5 Jun 2006 5:16:58 UTC | 6 Jun 2006 10:52:52 UTC | Over | Success | Done | 13,880.64 | 51.21 | 51.21
150592 | 132920 | 5 Jun 2006 5:16:58 UTC | 5 Jun 2006 13:43:50 UTC | Over | Success | Done | 14,694.41 | 54.21 | 54.21
144382 | 128035 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 19:56:12 UTC | Over | Success | Done | 13,925.86 | 51.51 | 51.51
144254 | 127907 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 11:40:12 UTC | Over | Success | Done | 13,994.19 | 51.77 | 51.77
144253 | 127906 | 1 Jun 2006 5:31:50 UTC | 2 Jun 2006 3:45:43 UTC | Over | Success | Done | 14,501.66 | 53.64 | 53.64

Also good on my AMD64 3700 San Diego, although I have the screensaver disabled on this one. Would you like me to re-enable the screensaver to see if I get fatal Windows errors? (top is 5.21, next is 5.20, rest are 5.19):
Result ID | WU ID | Sent | Reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
151133 | 133451 | 5 Jun 2006 7:04:09 UTC | 5 Jun 2006 22:17:50 UTC | Over | Success | Done | 13,768.64 | 54.29 | 54.29
150091 | 129771 | 4 Jun 2006 11:03:49 UTC | 4 Jun 2006 20:18:34 UTC | Over | Success | Done | 9,495.28 | 37.54 | 37.54
148489 | 131139 | 2 Jun 2006 11:47:20 UTC | 4 Jun 2006 0:35:48 UTC | Over | Success | Done | 12,861.34 | 50.85 | 50.85
145068 | 128721 | 1 Jun 2006 7:42:42 UTC | 3 Jun 2006 11:26:40 UTC | Over | Success | Done | 13,851.55 | 54.76 | 54.76
145067 | 128720 | 1 Jun 2006 7:42:42 UTC | 2 Jun 2006 11:39:56 UTC | Over | Success | Done | 13,212.41 | 52.24 | 52.24

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1782 - Posted: 6 Jun 2006, 14:17:34 UTC
Last modified: 6 Jun 2006, 14:46:43 UTC

Rosetta_beta_5.21 (Windows)

NO bugs yet!


It needs about 110 MB of RAM, and total memory use (RAM + swap) is about 325 MB.

That means any PC with 256 MB of physical RAM can run it without problems.

The graphics screen uses at most 10% CPU.

So I consider Ralph 5.21 good enough to replace the current Rosetta 5.16.

BTW, I have successfully crunched the following 5.21 WUs without any problems:
https://ralph.bakerlab.org/result.php?resultid=150394
https://ralph.bakerlab.org/result.php?resultid=150487
https://ralph.bakerlab.org/result.php?resultid=151970
https://ralph.bakerlab.org/result.php?resultid=151971
https://ralph.bakerlab.org/result.php?resultid=152703
https://ralph.bakerlab.org/result.php?resultid=152868
https://ralph.bakerlab.org/result.php?resultid=153082
https://ralph.bakerlab.org/result.php?resultid=153169
https://ralph.bakerlab.org/result.php?resultid=153204

PS: Why not use 3DNow! to speed up floating-point operations (Athlon XP+)?
*On Einstein, crunching time went from 6 hours per WU to 1 hour per WU because of 3DNow!,
and the CPU ran 5 °C hotter -:)

Thanks,
tralala
Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1783 - Posted: 6 Jun 2006, 14:29:31 UTC
Last modified: 6 Jun 2006, 14:31:37 UTC

I had four good results with 5.21.
However, I noticed that no checkpoints were written while a model was running, only when it finished. On my fast computer a model took between 10 and 25 minutes. For this WU, for example, it took 25 minutes between checkpoints (models), which could translate into over an hour on a slow Mac.

Over at Rosetta people are "complaining" that it may take between 90-120 minutes for a WU to reach its first checkpoint.

What happened to more often checkpointing?
NJMHoffmann
Joined: 17 Feb 06
Posts: 8
Credit: 1,270
RAC: 0
Message 1784 - Posted: 6 Jun 2006, 19:39:43 UTC - in response to Message 1783.  
Last modified: 6 Jun 2006, 19:48:49 UTC

For this WU for example it took 25 minutes between the checkpoints (models) which can translate in over an hour on a slow Mac.
And this was one of the small (t314 with Nres=106) targets.

Over at Rosetta people are "complaining" that it may take between 90-120 minutes for a WU to reach its first checkpoint.
I am running two WUs for the slightly bigger target 299 (Nres=180). One is at 40 min, the other at 35. No checkpoint so far.
(Edit: finished first models at 44/38 min.)

You can imagine what t296, with Nres=445, looked like.

What happened to more often checkpointing?
Inquiring minds want to know :-)

Norbert (waiting for the BOINC client, which waits for a checkpoint before switching tasks)
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1785 - Posted: 6 Jun 2006, 20:25:43 UTC
Last modified: 6 Jun 2006, 20:26:01 UTC

OOPS, spoke too soon. On my AMD64 3700 I got excited about a graphics fix, so I turned ON the screensaver.

wuid=133453

Got another fatal Windows error.


Result ID 151135
Name t307__CASP7_ABRELAX_SAVE_ALL_OUT_CONTACT_hom001__649_260_0
Workunit 133453
Created 5 Jun 2006 5:31:35 UTC
Sent 5 Jun 2006 7:04:09 UTC
Received 6 Jun 2006 20:16:34 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xffffffffc000000d)
Computer ID 2172
Report deadline 9 Jun 2006 7:04:09 UTC
CPU time 13127.953125
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 3033888
# cpu_run_time_pref: 14400

</stderr_txt>


Validate state Invalid
Claimed credit 51.7612201395287
Granted credit 0
application version 5.21
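The two forms of the exit status above are the same number. -1073741811 is the signed 32-bit value; reinterpreted as unsigned it is 0xC000000D, which Windows defines as the NTSTATUS code STATUS_INVALID_PARAMETER, and 0xffffffffc000000d is that value sign-extended to 64 bits. A small sketch of the conversion (the helper names are hypothetical; it says nothing about what caused the error):

```python
def exit_code_to_hex(code: int) -> str:
    """Render a signed 32-bit exit code as the unsigned hex value Windows uses."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def sign_extend_64(code: int) -> str:
    """Render the same code sign-extended to 64 bits, as in the 'Exit status' line."""
    return f"0x{code & 0xFFFFFFFFFFFFFFFF:016x}"


print(exit_code_to_hex(-1073741811))  # 0xc000000d
print(sign_extend_64(-1073741811))    # 0xffffffffc000000d
print(exit_code_to_hex(3))            # 0x00000003
```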
Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 1788 - Posted: 6 Jun 2006, 20:31:03 UTC

Tony,

What kind of graphics adapter do you have on that machine?

Astro
Posts: 141
Credit: 32,977
RAC: 0
Message 1790 - Posted: 6 Jun 2006, 21:14:29 UTC - in response to Message 1788.  
Last modified: 6 Jun 2006, 21:20:18 UTC

AMD64 3700 San Diego processor, Asus A8N-E mobo, Asus EN6200TC256/TD/64M/A PCI Express video card, 1 GB OCZ Gold RAM. This is my only machine giving this fatal Windows error, and it has Nvidia chipsets on both the mobo and the video card.

tony

Display: "Plug and Play Monitor on NVIDIA GeForce 6200 TurboCache(TM)"

It says: "ASUS OSD provides access to dynamically adjust parameters in D3D or OpenGL games by hotkeys."

Graphics card info
GeForce 6200TurboCache
Video Bios Version, 5.44.02.11
IRQ 18
PCI Express X16
256 MB memory
ForceWare Version 71.24
TV Encoder Type: Nvidia integrated
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1791 - Posted: 7 Jun 2006, 1:32:59 UTC - in response to Message 1777.  
Last modified: 7 Jun 2006, 1:42:02 UTC

Hi: sorry, the delay bug may have been introduced in 5.19. I'm glad you all posted about it to give us a chance to fix it.

I think the most common refrain on ralph message boards is that there's not enough work. So we're trying a new strategy for our workunit queue -- from now on, there will always be work in the queue to help us debug continuously! We didn't do this before because we needed quick turnaround for certain new workunits every couple days. Now we've changed the workunit buffer size and priority system to let us send out new jobs quickly while maintaining a trickle of regular jobs at other times. Does that sound OK to everyone?
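The new buffer and priority settings aren't described in detail. As a rough, hypothetical illustration of the idea (not the actual RALPH scheduler configuration), a two-level priority queue lets urgent jobs jump ahead of a standing backlog of regular jobs, so there is always something to send:

```python
import heapq
import itertools
from typing import List, Optional, Tuple

URGENT, REGULAR = 0, 1  # lower number = sent first (hypothetical levels)


class WorkQueue:
    """Hypothetical two-level job queue: urgent jobs first, regular jobs as filler."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # keeps FIFO order within a priority level

    def add(self, name: str, priority: int = REGULAR) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def next_job(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


q = WorkQueue()
for i in range(3):
    q.add(f"regular_{i}")
q.add("new_test_job", URGENT)
print([q.next_job() for _ in range(4)])
# ['new_test_job', 'regular_0', 'regular_1', 'regular_2']
```

The sequence counter breaks ties so that jobs of equal priority go out in the order they were queued.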

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1792 - Posted: 7 Jun 2006, 1:41:40 UTC - in response to Message 1784.  

This was a mistake, thanks very much for pointing it out. We had some of the jobs sent out without checkpointing, and now we're switching back. There will be a delay, though, because those jobs are already in the queue... for all the jobs we sent out, we've made sure that the time between decoys is, on average, less than half an hour (but longer on a Mac).

For the crazy t296 protein, we're not doing the second, super-long "relax" stage of the Rosetta protocol. But the first stage takes long enough on those guys that we should put some checkpoints in that stage too... this will require changing the app, and I can try doing this over the next week. Again, thanks for the suggestion!
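Adding checkpoints inside a long stage usually means checking, at each step boundary where the state is consistent, whether enough time has passed since the last save. BOINC's C API provides `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()` for this; the Python sketch below only mimics the pattern, with hypothetical names (`run_stage`, `do_step`, `save_state`) and an injectable clock so it can be tested. It is an assumption about how such checkpoints could work, not the change made to the app.

```python
import time
from typing import Callable, List


def run_stage(
    n_steps: int,
    do_step: Callable[[int], None],
    save_state: Callable[[int], None],
    checkpoint_interval_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Run a long stage, checkpointing at step boundaries at most once per interval.

    Returns the step indices after which a checkpoint was written.
    """
    checkpoints: List[int] = []
    last_checkpoint = clock()
    for step in range(n_steps):
        do_step(step)
        # Only look at the clock between steps, where the state is consistent.
        if clock() - last_checkpoint >= checkpoint_interval_s:
            save_state(step)
            checkpoints.append(step)
            last_checkpoint = clock()
    return checkpoints


if __name__ == "__main__":
    fake_now = [0.0]

    def one_minute_step(i: int) -> None:
        fake_now[0] += 60.0  # pretend each step takes 60 s of CPU time

    print(run_stage(20, one_minute_step, lambda s: None, 300.0, lambda: fake_now[0]))
    # [4, 9, 14, 19]
```

With one-minute steps and a five-minute interval, a checkpoint lands after every fifth step, so an interrupted run loses at most about one interval plus one step of work rather than a whole model.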

suguruhirahara
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1793 - Posted: 7 Jun 2006, 1:47:04 UTC - in response to Message 1791.  

So we're trying a new strategy for our workunit queue -- from now on, there will always be work in the queue to help us debug continuously! ... Does that sound OK to everyone?
It sounds okay.

I got an error with this result.
https://ralph.bakerlab.org/result.php?resultid=152634

Outcome Client error
Client state Computing
Exit status 3 (0x3)

Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1794 - Posted: 7 Jun 2006, 2:48:46 UTC

To Rom Walton (BOINC)

Because 5.20 had the freeze bug and I had a lot of 5.20 WUs in my queue, I did a project reset for Ralph...

All went well: it worked, and I started receiving 5.21 WUs -:)

The bug is these phantom WUs listed on the server side:
Result ID | WU ID | Sent | Reported or deadline | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
150394 | 132722 | 5 Jun 2006 5:07:36 UTC | 5 Jun 2006 15:01:10 UTC | Over | Success | Done | 3,454.54 | 19.77 | 19.77
150241 | 132571 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
150240 | 132570 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
150239 | 132569 | 5 Jun 2006 1:58:12 UTC | 9 Jun 2006 1:58:12 UTC | In Progress | Unknown | New | --- | --- | ---
150238 | 132568 | 5 Jun 2006 1:58:11 UTC | 9 Jun 2006 1:58:11 UTC | In Progress | Unknown | New | --- | --- | ---
150237 | 132567 | 5 Jun 2006 1:58:11 UTC | 9 Jun 2006 1:58:11 UTC | In Progress | Unknown | New | --- | --- | ---
150083 | 128443 | 4 Jun 2006 6:53:49 UTC | 4 Jun 2006 21:40:58 UTC | Over | Client error | Computing | 3,273.47 | 18.47 | ---


I hope a fix can be made on the BOINC server side to avoid these phantoms.

*ALL of the WUs listed above were in my queue (5.20), and the reset got rid of them.

However, they are still listed on the server as "In progress",
waiting for the deadline to pass before changing to "No reply" -:(
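A server could spot such phantoms if the client lists the tasks it still holds in each scheduler request: anything the server shows as "In progress" for that host but the client doesn't mention has been lost. The BOINC server's `resend_lost_results` option works along these lines. The sketch below only shows the set difference, with a hypothetical `find_phantoms` helper, and assumes for illustration that the client reported none of the reset tasks.

```python
from typing import Iterable, Set


def find_phantoms(server_in_progress: Iterable[int], client_reported: Iterable[int]) -> Set[int]:
    """Results the server thinks a host is working on but the host no longer has."""
    return set(server_in_progress) - set(client_reported)


# "In Progress" result IDs from the listing above; after the project reset
# the client is assumed to hold none of them.
in_progress = [150241, 150240, 150239, 150238, 150237]
print(sorted(find_phantoms(in_progress, client_reported=[])))
# [150237, 150238, 150239, 150240, 150241]
```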

Thanks
[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1796 - Posted: 7 Jun 2006, 6:59:30 UTC - in response to Message 1791.  

Sounds great, thanks!

4 results, no delay, no graphics problems so far. As soon as it works off a little debt, I'll be back crunching :)
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 1805 - Posted: 9 Jun 2006, 20:55:42 UTC - in response to Message 1791.  
Last modified: 9 Jun 2006, 20:55:52 UTC

So we're trying a new strategy for our workunit queue

That will be great. That way when we post on Rosetta suggesting someone join Ralph... they can actually get down to testing right away.

You might want to post a message on the homepage to alert people to this, and suggest how they should adjust their resource share to control Ralph's crunching, rather than counting on the lack of WUs to be the limiting factor.

©2024 University of Washington
http://www.bakerlab.org