Message boards : RALPH@home bug list : Bug reports for 5.66-5.68
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Ralph 5.66 fixed a problem where the graphics thread was crashing when sidechains were shown. Ralph 5.67 fixes an issue in the output of symmetric proteins. Thanks in advance for your posts! The posts for 5.65 helped a lot. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
|
Odysseus Joined: 4 May 07 Posts: 23 Credit: 16,331 RAC: 0 |
My Mac G4/733 crashed after more than ten hours of crunching 1gidA_BOINC_MG_SASAPAIR_ALLRES_RNA_ABINITIO_SAVE_ALL_OUT_BARCODE_RNA_CONTACT_RNA_LONG_RANGE_CONTACT_RNA_SASA-1gidA-_2068_172; last time I looked it was showing only about ten minutes to go but hadn’t decremented that time for quite a while. (The percent done was over 98% and continuing to increment.) Exit status 1 (0x1), with the all-too-familiar “SIGBUS: bus error” message in the output file. Once again, the crash occurred either while the display was blacked out (having displayed the screensaver for a minute) or when I interrupted it. BTW this system has always been set to work while in use and to keep apps in memory, so I don’t understand why starting and stopping the graphics should be a problem, if that’s indeed the case. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi feet1st -- yeah, it's because Rosetta changes its fold while the graphics thread finishes its drawing. We considered at one point freezing Rosetta until each graphics frame finishes, but were worried about the performance cost! So these large molecules may continue to get rendered in freaky ways! "My sidechains still fall off on 5.67, as they did with 5.65." |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Thanks for the post. I doubt that it is the graphics start and stop, but it might be. Please do post again if you find your Mac crashing when you play with graphics. Those are tough bugs to fix, because a lot of the graphics stuff is out of our direct control. The good news (well, maybe bad to start with) is that the BOINC infrastructure will be moving to a new way of doing graphics that is apparently more robust, I think by the end of the summer. So after we iron out the kinks, that might help the graphics-related errors... Incidentally, those workunits do take a long time (we have implemented checkpointing so that work should be saved frequently in case of crashes), and Mac G4s are pretty slow for running Rosetta, unfortunately. "My Mac G4/733 crashed after more than ten hours of crunching 1gidA_BOINC_MG_SASAPAIR_ALLRES_RNA_ABINITIO_SAVE_ALL_OUT_BARCODE_RNA_CONTACT_RNA_LONG_RANGE_CONTACT_RNA_SASA-1gidA-_2068_172; last time I looked it was showing only about ten minutes to go but hadn’t decremented that time for quite a while. (The percent done was over 98% and continuing to increment.) Exit status 1 (0x1), with the all-too-familiar “SIGBUS: bus error” message in the output file. Once again, the crash occurred either while the display was blacked out (having displayed the screensaver for a minute) or when I interrupted it. BTW this system has always been set to work while in use and to keep apps in memory, so I don’t understand why starting and stopping the graphics should be a problem, if that’s indeed the case." |
KC0ISW Joined: 17 Feb 06 Posts: 20 Credit: 11,725 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=529432 |
KC0ISW Joined: 17 Feb 06 Posts: 20 Credit: 11,725 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=530490 errored at 0.71% |
KC0ISW Joined: 17 Feb 06 Posts: 20 Credit: 11,725 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=530509 |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
"Hi feet1st -- yeah, it's because Rosetta changes its fold while the graphics thread finishes its drawing. We considered at one point freezing Rosetta until each graphics frame finishes, but were worried about the performance cost! So these large molecules may continue to get rendered in freaky ways!" ...and so it's not JUST the large ones? But they take longer to render, so I'm more likely to spot it there? So, I am seeing the backbone of the next contortion and the sidechains from the last? ...or perhaps vice versa. I personally am a computer programmer. I understand the challenge and performance concern, and personally feel that the graphic is just a nice thing for curious participants to keep us interested and involved. ...but I fear that many confuse the graphic with the science. They see a graphic that "doesn't work right", and they start to question the integrity of the science being done as well. I already know the science is top-notch. But others do not take that for granted. So, I hope your mind will continue crunching on this issue until you can find a happy compromise that will yield proper graphics as well as efficient crunching. I take it that as you resolved the thread-safety issues with the graphic thread, you devised a means of sharing the same memory, which is more efficient than the double-buffer approach I had envisioned. But... have you tested at all how MUCH more efficient? It may just be a 1% cost to push stuff out to a new memory area for the graphic thread. Would it be possible to have the graphic thread grab a semaphore once it has rendered a frame? And then if that semaphore is in use, a new frame of data gets pushed out to the isolated "graphic-only" memory area, and the semaphore is freed. I'm thinking that the graphic thread probably regulates itself so far as frames per second, etc. And such an approach would allow you to only push bytes around once per frame rendered. So you might be able to crunch 100 steps and only have the overhead for one frame of memory copy. I note that my approach actually renders a "stale" frame, rather than the one actually in progress at this microsecond, because it pushes out the current model at the end of the rendering of the last frame, rather than just before rendering the next frame; but I don't think anyone would mind. The result would be similar to how the football game you see on your television has a satellite delay from the ACTUAL game being played 2,000 miles away. |
Deborah Goldsmith Joined: 16 Feb 06 Posts: 3 Credit: 253,789 RAC: 0 |
This is on an Intel Mac (MacBook Pro). 5.68 seems to use far more memory than earlier versions. The VM size is 1.6 GB, and the working set is over 600 MB. This is causing a big impact on the machine. |
Odysseus Joined: 4 May 07 Posts: 23 Credit: 16,331 RAC: 0 |
A very weird one from my Mac G4/733: when I came in to work this morning I saw that a Ralph task appeared frozen at 13.586% done, although its status showed as Running. Opening the graphics window, I saw this: Note that the window says it’s 54.35% complete, contradicting BOINC Manager (although the times agree exactly), and that it seems to have lost track of my account, and even its own version number: “rosetta@home v0”! Looking in the Messages tab I found: Thu May 24 17:37:41 2007|ralph@home|Starting 1eyvA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS-1eyvA-frags83__2069_6_0 Thu May 24 17:37:42 2007|ralph@home|Starting task 1eyvA_BOINC_NOFILTERS_ABRELAX_SAVE_ALL_OUT_NEWRELAXFLAGS-1eyvA-frags83__2069_6_0 using rosetta_beta version 567 Thu May 24 17:44:37 2007|ralph@home|Sending scheduler request: Requested by user Thu May 24 17:44:37 2007|ralph@home|Reporting 1 tasks Thu May 24 17:44:42 2007|ralph@home|Scheduler RPC succeeded [server version 509] Thu May 24 17:44:42 2007|ralph@home|Deferring communication for 4 min 2 sec Thu May 24 17:44:42 2007|ralph@home|Reason: requested by project. That last bit was from my Updating to get yesterday’s crash reported. Then there was nothing for the remaining fifteen hours or so, aside from a SETI@home download a few minutes after the above messages were logged, so apparently it had prevented BOINC from crunching all night. I suspended the task; other projects resumed OK. A little while later I tried resuming the task, and it still seemed stuck, so I quit and relaunched BOINC. The WU seemed to have disappeared without a trace: no log entries indicating an upload or a report. Just to top the strangeness off, the result doesn’t seem to be on the website; I can’t find it anywhere in my account, under that host or elsewhere. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
5.9MB task downloads? THAT's new! People will want to be aware of that in the new release notes on Rosetta. |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
3 failures Work Unit https://ralph.bakerlab.org/result.php?resultid=529839 failed with Exit code 1, Error exit from: hbonds.cc line: 648 Work Unit https://ralph.bakerlab.org/result.php?resultid=531669 failed with Exit code 1, Error exit from: hbonds.cc line: 624 Work Unit https://ralph.bakerlab.org/result.php?resultid=531753 failed with Exit code 193, SIGSEGV, Segmentation Violation. Hope this helps. |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
"This is on an Intel Mac (MacBook Pro). 5.68 seems to use far more memory than earlier versions. The VM size is 1.6 GB, and the working set is over 600 MB. This is causing a big impact on the machine." I have this running on an XP machine and it takes at least 1.2 GB of VM. Anders n EDIT: It took 5 h 20 min to do 1 model on a P4 2.8. |
Bjarke Joined: 25 Feb 06 Posts: 5 Credit: 5,523 RAC: 0 |
This WU uses A LOT of memory. My laptop has only got 512 MB of RAM, so the WU uses above 95% of the pagefile (1.5 to 1.6 GB). Now, after running for 3 hours 27 minutes, the WU pauses and the status field in BOINC shows the message "Waiting for memory". BOINC then just switched to another WU. What should I do? |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
"This WU uses A LOT of memory. My laptop has only got 512 MB of RAM, so the WU uses above 95% of the pagefile (1.5 to 1.6 GB). Now, after running for 3 hours 27 minutes, the WU pauses and the status field in BOINC shows the message 'Waiting for memory'. BOINC then just switched to another WU." Hi Bjarke, I'm not on the team, "just" a tester like you :) Is there a chance for you to increase the VM? Maybe it would get the WU kicking again. Anders n |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
3 failures Another 2 failed after only 3 minutes https://ralph.bakerlab.org/result.php?resultid=532538 https://ralph.bakerlab.org/result.php?resultid=532562 both failed with this error process exited with code 1 (0x1) trouble finding jump_templates_RNA_basepairs_v2.dat ERROR:: Exit from: read_paths.cc line: 360 |
Bjarke Joined: 25 Feb 06 Posts: 5 Credit: 5,523 RAC: 0 |
"This WU uses A LOT of memory. My laptop has only got 512 MB of RAM, so the WU uses above 95% of the pagefile (1.5 to 1.6 GB). Now, after running for 3 hours 27 minutes, the WU pauses and the status field in BOINC shows the message 'Waiting for memory'. BOINC then just switched to another WU." Thanks for the tip, though I've already tried that without luck. Anyway, it seems that my computer resumed working on that WU after a while; unfortunately it came out with a failure. The result, for anyone interested: 531974 |
Trog Dog Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
Got a SIGSEGV error on this WU. |
Thomas Leibold Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
Error on workunit 460745 and 463219: <core_client_version>5.8.15</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 trouble finding jump_templates_RNA_basepairs_v2.dat ERROR:: Exit from: read_paths.cc line: 360 </stderr_txt> ]]> Same error in both cases. Both workunits failed for other users as well. |
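
feet1st's suggestion earlier in the thread (have the graphics thread signal after each rendered frame, and have the compute side copy the model into a "graphic-only" buffer only when that signal is pending) can be sketched roughly as below. This is a minimal illustration in Python, not Rosetta's actual code: the class `FrameHandoff`, its method names, and the flat coordinate list are all hypothetical, and a `threading.Event` plus a lock stand in for the semaphore described in the post. The demo drives both sides from one loop so the outcome is deterministic.

```python
import threading


class FrameHandoff:
    """Hypothetical snapshot handoff between a compute thread and a graphics thread."""

    def __init__(self, initial_coords):
        self._snapshot = list(initial_coords)  # graphics-only copy of the model
        self._lock = threading.Lock()          # guards _snapshot and the request flag
        self._want_frame = threading.Event()   # set by graphics after each render
        self._want_frame.set()                 # ask for the first frame right away
        self.copies = 0                        # how many memory copies were paid for

    def publish_if_requested(self, live_coords):
        """Compute side: call after every step; copies only if a frame was requested."""
        with self._lock:
            if not self._want_frame.is_set():
                return False
            self._snapshot = list(live_coords)
            self._want_frame.clear()
            self.copies += 1
            return True

    def take_frame(self):
        """Graphics side: return a self-consistent snapshot, then request the next one."""
        with self._lock:
            frame = list(self._snapshot)
            self._want_frame.set()
            return frame


def demo():
    # Assumed toy model: three coordinates that all change together on each step.
    handoff = FrameHandoff([0.0, 0.0, 0.0])
    frames = []
    for step in range(1, 101):
        coords = [float(step)] * 3
        handoff.publish_if_requested(coords)
        if step % 50 == 0:  # the renderer runs far less often than the compute steps
            frames.append(handoff.take_frame())
    return handoff.copies, frames


if __name__ == "__main__":
    copies, frames = demo()
    print(copies, frames)
```

In this toy run, 100 compute steps cost only two copies, and each rendered frame is internally consistent but stale (the second frame shows step 51, not step 100), which matches the "satellite delay" trade-off described in the post. Whether the copy cost would be acceptable in the real application is something only profiling could show.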
©2024 University of Washington
http://www.bakerlab.org