Bug reports for 5.66-5.68

Message boards : RALPH@home bug list : Bug reports for 5.66-5.68

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 3146 - Posted: 24 May 2007, 2:49:44 UTC
Last modified: 24 May 2007, 2:50:01 UTC

Ralph 5.66 fixed a problem where the graphics thread was crashing when sidechains were shown.
Ralph 5.67 fixes an issue in the output of symmetric proteins.
Thanks in advance for your posts! The posts for 5.65 helped a lot.

ID: 3146
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3153 - Posted: 24 May 2007, 22:16:37 UTC
Last modified: 24 May 2007, 22:16:56 UTC

My sidechains still fall off on 5.67, as they did with 5.65
Example screenshot
ID: 3153
Odysseus
Joined: 4 May 07
Posts: 23
Credit: 16,331
RAC: 0
Message 3154 - Posted: 24 May 2007, 23:54:12 UTC

My Mac G4/733 crashed after more than ten hours of crunching 1gidA_BOINC_MG_SASAPAIR_ALLRES_RNA_ABINITIO_SAVE_ALL_OUT_BARCODE_RNA_CONTACT_RNA_LONG_RANGE_CONTACT_RNA_SASA-1gidA-_2068_172; last time I looked it was showing only about ten minutes to go but hadn’t decremented that time for quite a while. (The percent done was over 98% and continuing to increment.) Exit status 1 (0x1), with the all-too-familiar “SIGBUS: bus error” message in the output file. Once again, the crash occurred either while the display was blacked out (having displayed the screensaver for a minute) or when I interrupted it. BTW this system has always been set to work while in use and to keep apps in memory, so I don’t understand why starting and stopping the graphics should be a problem—if that’s indeed the case.
ID: 3154
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 3157 - Posted: 25 May 2007, 7:46:22 UTC - in response to Message 3153.  

Hi feet1st -- yeah, it's because Rosetta changes its fold while the graphics thread is still finishing its drawing. At one point we considered freezing Rosetta until each graphics frame finishes, but we were worried about the performance cost! So these large molecules may continue to get rendered in freaky ways!


ID: 3157
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 3158 - Posted: 25 May 2007, 7:48:55 UTC - in response to Message 3154.  
Last modified: 25 May 2007, 7:52:43 UTC

Thanks for the post. I doubt that it is the graphics start and stop, but it might be. Please do post again if you find your Mac crashing when you play with the graphics. Those are tough bugs to fix, because a lot of the graphics stuff is out of our direct control. The good news (well, maybe bad to start with) is that the BOINC infrastructure will be moving to a new way of doing graphics that is apparently more robust, I think by the end of the summer. So after we iron out the kinks, that might help with the graphics-related errors...

Incidentally, those workunits do take a long time (we have implemented checkpointing so that work should be saved frequently in case of crashes), and Mac G4s are pretty slow for running Rosetta, unfortunately.


ID: 3158
KC0ISW
Joined: 17 Feb 06
Posts: 20
Credit: 11,725
RAC: 0
Message 3159 - Posted: 25 May 2007, 10:34:30 UTC

https://ralph.bakerlab.org/result.php?resultid=529432
ID: 3159
KC0ISW
Joined: 17 Feb 06
Posts: 20
Credit: 11,725
RAC: 0
Message 3160 - Posted: 25 May 2007, 10:58:21 UTC

https://ralph.bakerlab.org/result.php?resultid=530490

errored at 0.71%
ID: 3160
KC0ISW
Joined: 17 Feb 06
Posts: 20
Credit: 11,725
RAC: 0
Message 3161 - Posted: 25 May 2007, 10:59:41 UTC

https://ralph.bakerlab.org/result.php?resultid=530509
ID: 3161
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3162 - Posted: 25 May 2007, 13:15:44 UTC - in response to Message 3157.  

...and so it's not JUST the large ones? But they take longer to render, and so I'm more likely to spot it there?

So, I am seeing the backbone of the next contortion, and the sidechains from the last? ...or perhaps vice versa.

I personally am a computer programmer. I understand the challenge and the performance concern, and personally feel that the graphic is just a nice thing for curious participants to keep us interested and involved. ...but I fear that many confuse the graphic with the science. They see a graphic that "doesn't work right", and they start to question the integrity of the science being done as well.

I already know the science is top-notch. But others do not take that for granted. So, I hope your mind will continue crunching on this issue until you can find a happy compromise that will yield proper graphics, as well as efficient crunching.

I take it that when you resolved the thread-safety issues with the graphic thread, you devised a means of sharing the same memory, which is more efficient than the double-buffer approach I had envisioned. But... have you tested at all how MUCH more efficient? It may just be a 1% cost to push stuff out to a new memory area for the graphic thread.

Would it be possible to have the graphic thread grab a semaphore once it has rendered a frame? Then, if that semaphore is in use, a new frame of data gets pushed out to the isolated "graphic-only" memory area, and the semaphore is freed.

I'm thinking that the graphic thread probably regulates itself as far as frames per second, etc. Such an approach would allow you to push bytes around only once per frame rendered. So you might be able to crunch 100 steps and only have the overhead for one frame of memory copy.

I note that my approach actually renders a "stale" frame, rather than the one actually in progress at this microsecond, because it pushes out the current model at the end of the rendering of the last frame, rather than just before rendering the next frame; but I don't think anyone would mind. The result would be similar to how the football game you see on your television has a satellite delay from the ACTUAL game being played 2,000 miles away.
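
To make the idea concrete, here is a rough sketch in Python of the handoff described above. It is only an illustration under assumptions: the class name SnapshotHandoff, its methods, and the toy two-field model are all made up for this example, and a threading.Event stands in for the semaphore. It says nothing about how Rosetta or the BOINC graphics API are actually written.

```python
import copy
import threading


class SnapshotHandoff:
    """Hypothetical helper: passes complete copies of the model to the graphics thread."""

    def __init__(self, initial_model):
        self._lock = threading.Lock()
        self._frame_wanted = threading.Event()  # stands in for the "semaphore"
        self._snapshot = copy.deepcopy(initial_model)
        self.copies_made = 0

    def request_frame(self):
        """Graphics thread: call once a frame has been rendered."""
        self._frame_wanted.set()

    def maybe_publish(self, model):
        """Compute thread: call between steps, when the model is self-consistent."""
        if not self._frame_wanted.is_set():
            return False  # cheap path: most steps make no copy at all
        # Clear first, so a request arriving during the copy is not lost.
        self._frame_wanted.clear()
        snapshot = copy.deepcopy(model)
        with self._lock:
            self._snapshot = snapshot  # swap in a whole new snapshot
        self.copies_made += 1
        return True

    def latest(self):
        """Graphics thread: the most recent complete (possibly stale) snapshot."""
        with self._lock:
            return self._snapshot


# Single-threaded walk-through: 100 compute steps, one frame request.
model = {"backbone": 0, "sidechains": 0}
handoff = SnapshotHandoff(model)
handoff.request_frame()
for step in range(1, 101):
    model["backbone"] = step      # each "fold" updates both parts of the model
    model["sidechains"] = step
    handoff.maybe_publish(model)

frame = handoff.latest()
print(handoff.copies_made, frame)
```

In this walk-through only one copy is made across the 100 steps, and the frame the graphics side sees is from step 1: stale, but with backbone and sidechains from the same step. The real cost would depend on how large the model is and how often frames are requested, which would have to be measured.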
ID: 3162
Deborah Goldsmith
Joined: 16 Feb 06
Posts: 3
Credit: 253,789
RAC: 0
Message 3163 - Posted: 25 May 2007, 21:23:30 UTC

This is on an Intel Mac (MacBook Pro). 5.68 seems to use far more memory than earlier versions. The VM size is 1.6 GB, and the working set is over 600 MB. This is having a big impact on the machine.

ID: 3163
Odysseus
Joined: 4 May 07
Posts: 23
Credit: 16,331
RAC: 0
Message 3164 - Posted: 26 May 2007, 0:02:51 UTC

A very weird one from my Mac G4/733: when I came in to work this morning I saw that a Ralph task appeared frozen at 13.586% done, although its status showed as Running. Opening the graphics window, I saw this:



Note that the window says it’s 54.35% complete, contradicting BOINC Manager (although the times agree exactly), and that it seems to have lost track of my account—and even its own version number: “rosetta@home v0”!

Looking in the Messages tab I found:
Thu May 24 17:37:41 2007|ralph@home|Starting 1eyvA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS-1eyvA-frags83__2069_6_0
Thu May 24 17:37:42 2007|ralph@home|Starting task 1eyvA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS-1eyvA-frags83__2069_6_0 using rosetta_beta version 567
Thu May 24 17:44:37 2007|ralph@home|Sending scheduler request: Requested by user
Thu May 24 17:44:37 2007|ralph@home|Reporting 1 tasks
Thu May 24 17:44:42 2007|ralph@home|Scheduler RPC succeeded [server version 509]
Thu May 24 17:44:42 2007|ralph@home|Deferring communication for 4 min 2 sec
Thu May 24 17:44:42 2007|ralph@home|Reason: requested by project
That last bit was from my Updating to get yesterday’s crash reported. Then there was nothing for the remaining fifteen hours or so, aside from a SETI@home download a few minutes after the above messages were logged—so apparently it had prevented BOINC from crunching all night. I suspended the task; other projects resumed OK. A little while later I tried resuming the task, and it still seemed stuck, so I quit and relaunched BOINC. The WU seemed to have disappeared without a trace: no log entries indicating an upload or a report. Just to top the strangeness off, the result doesn’t seem to be on the website; I can’t find it anywhere in my account, under that host or elsewhere.
ID: 3164
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3166 - Posted: 26 May 2007, 3:13:10 UTC

5.9MB task downloads? THAT's new! People will want to be aware of that in the new release notes on Rosetta.
ID: 3166
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3167 - Posted: 26 May 2007, 4:18:05 UTC
Last modified: 26 May 2007, 4:43:16 UTC

3 failures
Work Unit https://ralph.bakerlab.org/result.php?resultid=529839 failed with Exit code 1, Error exit from: hbonds.cc line: 648
Work Unit https://ralph.bakerlab.org/result.php?resultid=531669 failed with Exit code 1, Error exit from: hbonds.cc line: 624
Work Unit https://ralph.bakerlab.org/result.php?resultid=531753 failed with Exit code 193, SIGSEGV, Segmentation Violation.

Hope this helps.
ID: 3167
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 3168 - Posted: 26 May 2007, 7:03:35 UTC - in response to Message 3163.  
Last modified: 26 May 2007, 7:06:33 UTC

I have this running on XP, and it takes at least 1.2 GB of VM.

Anders n

EDIT

It took 5 h 20 min to do 1 model on a P4 2.8.
ID: 3168
Bjarke
Joined: 25 Feb 06
Posts: 5
Credit: 5,523
RAC: 0
Message 3169 - Posted: 26 May 2007, 10:45:31 UTC

This WU uses A LOT of memory. My laptop has only 512 MB of RAM, so the WU uses more than 95% of the pagefile (1.5 to 1.6 GB). Now, after running for 3 hours 27 minutes, the WU has paused and the status field in BOINC shows the message "Waiting for memory". BOINC then just switched to another WU.

What should I do?
ID: 3169
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 3170 - Posted: 26 May 2007, 11:12:55 UTC - in response to Message 3169.  

Hi Bjarke

I'm not on the team, "just" a tester like you :)

Is there a chance for you to increase the VM?

Maybe it would get the WU kicking again.

Anders n

ID: 3170
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 3171 - Posted: 26 May 2007, 11:40:37 UTC - in response to Message 3167.  

Another 2 failed after only 3 minutes

https://ralph.bakerlab.org/result.php?resultid=532538
https://ralph.bakerlab.org/result.php?resultid=532562

both failed with this error

process exited with code 1 (0x1)
trouble finding jump_templates_RNA_basepairs_v2.dat
ERROR:: Exit from: read_paths.cc line: 360
ID: 3171
Bjarke
Joined: 25 Feb 06
Posts: 5
Credit: 5,523
RAC: 0
Message 3172 - Posted: 26 May 2007, 16:35:01 UTC - in response to Message 3170.  
Last modified: 26 May 2007, 16:35:50 UTC

Thanks for the tip, though I've already tried that without luck. Anyway, it seems that my computer resumed working on that WU after a while; unfortunately, it ended with a failure.

The result, for anyone interested: 531974
ID: 3172
Trog Dog
Joined: 8 Aug 06
Posts: 38
Credit: 41,996
RAC: 0
Message 3173 - Posted: 27 May 2007, 2:14:55 UTC

Got a SIGSEGV error on this WU.
ID: 3173
Thomas Leibold
Joined: 25 Feb 07
Posts: 27
Credit: 77,464
RAC: 0
Message 3174 - Posted: 27 May 2007, 6:00:26 UTC

Errors on workunits 460745 and 463219:

<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
trouble finding jump_templates_RNA_basepairs_v2.dat
ERROR:: Exit from: read_paths.cc line: 360

</stderr_txt>
]]>

Same error in both cases. Both workunits failed for other users as well.
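
For anyone tallying reports like these, a small Python sketch of pulling the source file and line number out of the error text may help. The function name error_location and the regular expression are illustrative assumptions, not part of BOINC or Rosetta; the sample strings are copied from error messages posted in this thread.

```python
import re
from collections import Counter

# Matches both phrasings seen in this thread:
#   "ERROR:: Exit from: read_paths.cc line: 360"
#   "Error exit from: hbonds.cc line: 648"
EXIT_PATTERN = re.compile(
    r"(?:ERROR::\s*Exit|Error exit) from:\s*(\S+)\s+line:\s*(\d+)"
)


def error_location(stderr_text):
    """Return (source_file, line_number), or None if no exit location is found."""
    match = EXIT_PATTERN.search(stderr_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


reports = [
    "trouble finding jump_templates_RNA_basepairs_v2.dat\n"
    "ERROR:: Exit from: read_paths.cc line: 360",
    "Exit code 1, Error exit from: hbonds.cc line: 648",
    "Exit code 1, Error exit from: hbonds.cc line: 624",
    "Exit code 193, SIGSEGV, Segmentation Violation.",
]

# Count how many reports share each exit location; None collects the
# reports (such as the SIGSEGV) that name no source file or line.
tally = Counter(error_location(text) for text in reports)
for location, count in tally.items():
    print(location, count)
```

Grouping by file and line makes it easier to see which failures are the same bug (for example, the missing jump_templates_RNA_basepairs_v2.dat file) and which are separate problems.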
ID: 3174



©2024 University of Washington
http://www.bakerlab.org