Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43
Author | Message |
---|---|
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
I just put some docking WUs on Ralph for a graphics stability test; let's see what comes out of that. Well, I'm now back at my problem PC, and was about to blow the train whistle (Whoo hooo!) when I saw my screensaver MOVING, and it was on model 94. But then I noticed it had only crunched the first WU for almost exactly 3 hrs. Now that I've updated the project, it shows that the watchdog ended it. I commonly saw that same symptom on Rosetta once I activated the screensaver on the same host. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Whoo - Hooo!!! Both WUs are still crunching away, and the screensaver ran all night. I didn't have Docking WUs, but with 5.41 I couldn't get ANY WU to run that long with the screensaver active. They've both run on my HT Windows PC for about 9 hours; the screensaver was active and cut out normally when I woke up this morning. This looks very promising. Unless they fail, I won't have any other feedback until late tonight Central time, when they should have completed their 24 hr runtime preference. |
UBT - Mikeejones Joined: 22 Mar 06 Posts: 2 Credit: 3,174 RAC: 0 |
Not really a problem, but I noticed during this WU (13/12/2006 15:24:08|ralph@home|Starting task eps1__BOINC_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_CRYOEM_truncate_hom001__1555_46_1 using rosetta_beta version 543) that when the screensaver is in full screen, the bottom of the graphics disappears off the bottom edge: I can't see the line that includes 'accepted energy' etc. It's only just visible when set to a window. Just thought I'd mention it. Mike |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
I have tried to force a graphics error... without any luck. Not that my computers have a history of graphics errors. Anders n |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Great, this is all looking good! From our end, error rates are extremely low, and we have at least a couple of reports of formerly problematic machines being "cured". I'll do a rosetta@home update later today, and we'll monitor rosetta@home very carefully over the next few days. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
My two WUs are both still crunching strong, 21 hrs into a 24 hr run. This is on the machine that couldn't run for more than about 3 hrs without failing in some way or presenting a screensaver that was, well, cumbersome to regain control of the PC from. So, I think you're good to go! |
Jon Bahen Joined: 6 Dec 06 Posts: 3 Credit: 1,007 RAC: 0 |
We are trying to increase stability in this release... We have turned off mouse rotation and sidechains temporarily. Please let us know if you can force a crash by playing with the "show graphics" option from the BOINC manager, or with your screensaver! If I open the show graphics window, then minimize and re-open it, the window is just black and then crashes. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
The first 24 hr WU reported just fine. The second has another hour to go (that shows you how much CPU the graphics consumed during the day with the screensaver!) ...I just got more Ralph 5.43 WUs downloaded? |
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Did it freeze (and you had to manually kill it), or did it just crash by itself? Thanks. |
Jon Bahen Joined: 6 Dec 06 Posts: 3 Credit: 1,007 RAC: 0 |
It froze. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I got another hang too. It had a similar feel to what occurred before, only this time there was zero CPU being used, and TWO rosetta.exe's were shown in the Applications tab of Windows. I had 2 WUs run fine for 24 hrs, then got two more that were crunching when I went to bed. These apparently ended prematurely and two more came down; these were apparently both invoked at some point as the screensaver, and the 1gvp task was being displayed (according to the processing time shown) at the time it hung. When I woke this AM, the screensaver was non-responsive. It contacted the project directly from Rosetta.exe, uploaded some diagnostics for you, and was still unresponsive, so I right-clicked the BOINC icon in the task bar and chose SNOOZE. This then caused both WUs to fail. Here's what my messages since last night look like:
12/13/2006 11:40:37 PM|ralph@home|Computation for task eps1__BOINC_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_CRYOEM_truncate_hom001__1555_2_0 finished
12/13/2006 11:40:37 PM|ralph@home|Starting task 2chf__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS_frags83__1560_3_0 using rosetta_beta version 543
12/13/2006 11:40:39 PM|ralph@home|Started upload of file eps1__BOINC_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_CRYOEM_truncate_hom001__1555_2_0_0
12/13/2006 11:41:07 PM|ralph@home|Finished upload of file eps1__BOINC_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_CRYOEM_truncate_hom001__1555_2_0_0
12/13/2006 11:41:07 PM|ralph@home|Throughput 5822 bytes/sec
12/13/2006 11:41:23 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
12/13/2006 11:41:23 PM|ralph@home|Reason: Requested by user
12/13/2006 11:41:23 PM|ralph@home|Reporting 1 tasks
12/13/2006 11:41:28 PM|ralph@home|Scheduler request succeeded
12/14/2006 2:01:58 AM||Rescheduling CPU: application exited
12/14/2006 2:01:58 AM|ralph@home|Computation for task 1shfA_BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_3_0 finished
12/14/2006 2:01:58 AM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
12/14/2006 2:01:58 AM|ralph@home|Reason: To fetch work
12/14/2006 2:01:58 AM|ralph@home|Requesting 306440 seconds of new work
12/14/2006 2:02:00 AM|ralph@home|Started upload of file 1shfA_BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_3_0_0
12/14/2006 2:02:07 AM|ralph@home|Finished upload of file 1shfA_BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_3_0_0
12/14/2006 2:02:07 AM|ralph@home|Throughput 1934 bytes/sec
12/14/2006 2:02:08 AM|ralph@home|Scheduler request succeeded
12/14/2006 2:02:10 AM|ralph@home|Started download of file frags83_1fna_.fasta.gz
12/14/2006 2:02:10 AM|ralph@home|Started download of file frags83_1fna_.psipred_ss2.gz
12/14/2006 2:02:15 AM|ralph@home|Finished download of file frags83_1fna_.fasta.gz
12/14/2006 2:02:15 AM|ralph@home|Throughput 38 bytes/sec
12/14/2006 2:02:15 AM|ralph@home|Finished download of file frags83_1fna_.psipred_ss2.gz
12/14/2006 2:02:15 AM|ralph@home|Throughput 310 bytes/sec
12/14/2006 2:02:15 AM|ralph@home|Started download of file boinc_frags83_aa1fna_03_05.200_v1_3.gz
12/14/2006 2:02:15 AM|ralph@home|Started download of file boinc_frags83_aa1fna_09_05.200_v1_3.gz
12/14/2006 2:02:24 AM|ralph@home|Finished download of file boinc_frags83_aa1fna_09_05.200_v1_3.gz
12/14/2006 2:02:24 AM|ralph@home|Throughput 24131 bytes/sec
12/14/2006 2:02:24 AM|ralph@home|Started download of file frags83_1fna.pdb.gz
12/14/2006 2:02:27 AM|ralph@home|Finished download of file frags83_1fna.pdb.gz
12/14/2006 2:02:27 AM|ralph@home|Throughput 6361 bytes/sec
12/14/2006 2:02:27 AM|ralph@home|Started download of file frags83_1gvp_.fasta.gz
12/14/2006 2:02:29 AM|ralph@home|Finished download of file frags83_1gvp_.fasta.gz
12/14/2006 2:02:29 AM|ralph@home|Throughput 113 bytes/sec
12/14/2006 2:02:29 AM|ralph@home|Started download of file frags83_1gvp_.psipred_ss2.gz
12/14/2006 2:02:31 AM|ralph@home|Finished download of file frags83_1gvp_.psipred_ss2.gz
12/14/2006 2:02:31 AM|ralph@home|Throughput 882 bytes/sec
12/14/2006 2:02:31 AM|ralph@home|Started download of file boinc_frags83_aa1gvp_03_05.200_v1_3.gz
12/14/2006 2:02:32 AM|ralph@home|Finished download of file boinc_frags83_aa1fna_03_05.200_v1_3.gz
12/14/2006 2:02:32 AM|ralph@home|Throughput 49290 bytes/sec
12/14/2006 2:02:32 AM|ralph@home|Started download of file boinc_frags83_aa1gvp_09_05.200_v1_3.gz
12/14/2006 2:02:33 AM||Rescheduling CPU: files downloaded
12/14/2006 2:02:33 AM|ralph@home|Starting task 1fna__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 using rosetta_beta version 543
12/14/2006 2:02:40 AM|ralph@home|Finished download of file boinc_frags83_aa1gvp_09_05.200_v1_3.gz
12/14/2006 2:02:40 AM|ralph@home|Throughput 26148 bytes/sec
12/14/2006 2:02:40 AM|ralph@home|Started download of file frags83_1gvp.pdb.gz
12/14/2006 2:02:43 AM|ralph@home|Finished download of file frags83_1gvp.pdb.gz
12/14/2006 2:02:43 AM|ralph@home|Throughput 7169 bytes/sec
12/14/2006 2:02:48 AM|ralph@home|Finished download of file boinc_frags83_aa1gvp_03_05.200_v1_3.gz
12/14/2006 2:02:48 AM|ralph@home|Throughput 47573 bytes/sec
12/14/2006 2:02:49 AM||Rescheduling CPU: files downloaded
12/14/2006 3:41:12 AM||Rescheduling CPU: application exited
12/14/2006 3:41:12 AM|ralph@home|Computation for task 2chf__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS_frags83__1560_3_0 finished
12/14/2006 3:41:12 AM|ralph@home|Starting task 1gvp__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 using rosetta_beta version 543
12/14/2006 3:41:14 AM|ralph@home|Started upload of file 2chf__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS_frags83__1560_3_0_0
12/14/2006 3:41:21 AM|ralph@home|Finished upload of file 2chf__BOINC_POSE_ABRELAX_VARY_SC_BOND_ANGLES_NEWRELAXFLAGS_frags83__1560_3_0_0
12/14/2006 3:41:21 AM|ralph@home|Throughput 586 bytes/sec
12/14/2006 8:37:49 AM||Suspending computation - user request
12/14/2006 8:37:49 AM|ralph@home|Pausing task 1fna__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 (left in memory)
12/14/2006 8:37:49 AM|ralph@home|Pausing task 1gvp__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 (left in memory)
12/14/2006 8:37:49 AM||Suspending network activity - user request
12/14/2006 8:37:51 AM|ralph@home|rosetta_beta not responding to screensaver, exiting
12/14/2006 8:38:00 AM|ralph@home|Unrecoverable error for result 1gvp__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 ( - exit code -1 (0xffffffff))
12/14/2006 8:38:00 AM|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/14/2006 8:38:00 AM||Rescheduling CPU: application exited
12/14/2006 8:38:00 AM|ralph@home|Computation for task 1gvp__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 finished
12/14/2006 8:40:03 AM|ralph@home|Unrecoverable error for result 1fna__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 ( - exit code -1073741819 (0xc0000005))
12/14/2006 8:40:03 AM|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds
12/14/2006 8:40:03 AM||Rescheduling CPU: application exited
12/14/2006 8:40:03 AM|ralph@home|Computation for task 1fna__BOINC_POSE_ABRELAX_NEWRELAXFLAGS_frags83__1559_7_0 finished
12/14/2006 8:40:34 AM||Resuming computation
12/14/2006 8:40:34 AM||Rescheduling CPU: Resuming computation
12/14/2006 8:40:34 AM||Resuming round-robin CPU scheduling.
12/14/2006 8:41:12 AM||Resuming network activity
12/14/2006 8:41:12 AM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
12/14/2006 8:41:12 AM|ralph@home|Reason: Requested by user
12/14/2006 8:41:12 AM|ralph@home|Requesting 345600 seconds of new work, and reporting 4 completed tasks |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Oh, I forgot: that host isn't the only one of mine with Ralph WUs anymore. So that's the link to the host where I've been running the screensaver tests. Perhaps bringing back hyperthreading impacted it? |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Feet1st, I'm a little confused. Did you do something (like activate hyperthreading) that might have caused that host to start crashing? It crunched one workunit fine, but the rest appear to be watchdog problems or outright crashes. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Correct, I used to have BOINC set to use only one CPU. I had done this back on Rosetta 5.41 to see if it affected the screensaver issue. It didn't seem to. I retained that setting on 5.42 by default, and later got more adventurous and changed BOINC to allow "both" to run. So the PC has had hyperthreading enabled all along; it's just that BOINC is now set to run on 2 CPUs. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
OK, thanks for the clarification. Some follow-up news: we can't seem to reproduce the hanging of particular workunits, even if we use the same random seed. However, we do have a PC that hangs at least once a day, kind of like one of your hosts! And Chu has been able to attach a debugger to see where it gets stuck... very useful. Anyway, Chu is now testing to see if it behaves better when we disallow any display of graphics... |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
I had a WU fail today; this message was in the log: 12/14/2006 9:14:07 PM|ralph@home|Unrecoverable error for result 1ten__BOINC_POSE_ABRELAX_VARY_ALL_BOND_ANGLES_VARY_ALL_BOND_DISTANCES_NEWRELAXFLAGS_frags83__1561_15_0 ( - exit code -1073741819 (0xc0000005)) This result: resultid=362757 When I came back to the computer, there was a Windows error message on the screen: "Please tell Microsoft about this problem...". I don't know if graphics were involved, since I was out; however, I do have graphics enabled on this machine, and it is a multiprocessor machine. hostid=2016 |
fastdude Joined: 13 Dec 06 Posts: 4 Credit: 113 RAC: 0 |
I've got a live one! I can't do anything with the graphics (no rotate, zoom, etc.), as with all these "beta / test" WUs, so no crash from that. But when viewing the graphics, the image is very slow to show: a black screen for a few seconds, then very slow to update. The WU is at 1% but has taken 50 min of CPU. Then, when I go back to BOINC while typing this message, the WU shows 1.5% complete and 4 min of CPU time. It didn't crash, though. boinc 5.75 rosetta 5.43 ralph rosetta_beta 5.43 |
fastdude Joined: 13 Dec 06 Posts: 4 Credit: 113 RAC: 0 |
I've got a live one! Another of the same type of POS WU. It has done 1 1/2 h but is still showing 1%, and the graphics perform like a stunned mullet. |
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Sidechains, zooming, and rotating have been disabled in the current application to help us narrow down the cause of the graphics crash. So it is normal that you cannot do anything on the screen, and since Rosetta spends most of its time in high-resolution refinement (moving the backbone a little and refining sidechains), it is also normal to hardly observe changes on the screen. However, the step number and CPU time should change frequently to show that the WU is still alive. If the WU is still working on generating its first model, it shows the progress at 1% for a while. Not sure about the slow graphics updating. Do you have other Windows applications running at the same time that also share CPU, memory, and other resources? |
Chu Volunteer moderator Project developer Project scientist Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
Hi gene, that job just crashed and did not freeze your computer, right? From users' reports and my local tests, it looks like if a frozen WU is forced to terminate, it reports exit code 1073807364 (0x40010004). If a WU just crashes by itself without freezing the host computer, it reports exit code -1073741819 (0xc0000005). |
©2024 University of Washington
http://www.bakerlab.org