Bug reports for Ralph 5.23

Author	Message
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0	Message 1807 - Posted: 10 Jun 2006, 1:25:34 UTC We fixed something in the interaction of Rosetta with BOINC to trigger more informative debugging messages upon crashes. Please continue to post what goes wrong! ID: 1807

Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0	Message 1815 - Posted: 11 Jun 2006, 11:18:05 UTC - in response to Message 1807.
We fixed something in the interaction of Rosetta with BOINC to trigger more informative debugging messages upon crashes. Please continue to post what goes wrong!
We will, as soon as we're able to get some. :-/
"I'm trying to maintain a shred of dignity in this world." - Me
ID: 1815

Neal Joined: 6 Mar 06 Posts: 4 Credit: 34,698 RAC: 0	Message 1816 - Posted: 11 Jun 2006, 18:11:08 UTC How long has the site been "Down for maintenance"? Neal ID: 1816

[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0	Message 1817 - Posted: 11 Jun 2006, 18:32:17 UTC At least as long as this thread has been going, so at least 24 hours. Going to blow past workunit deadlines soon :( ID: 1817

Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0	Message 1818 - Posted: 11 Jun 2006, 18:44:09 UTC Looks like it 'went down' sometime late Friday. Maybe when they restarted the Rosetta server after the BOINC upgrade they forgot to check on poor old second-cousin Ralphie! ID: 1818

Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0	Message 1819 - Posted: 11 Jun 2006, 22:37:37 UTC - in response to Message 1818. Last modified: 11 Jun 2006, 22:38:03 UTC
It was a database problem, and the database guy on our team was unavailable. Ralph's back!
Looks like it 'went down' sometime late Friday. Maybe when they restarted the Rosetta server after the BOINC upgrade they forgot to check on poor old second-cousin Ralphie!
ID: 1819

IceQueen41 Joined: 22 Feb 06 Posts: 6 Credit: 9,473 RAC: 0	Message 1820 - Posted: 12 Jun 2006, 2:33:49 UTC Not sure if this is a bug or not, but in the graphics for this WU, the protein simply isn't there, though everything else appears to be running properly. Also, I'm sure this has been mentioned (haven't been around much lately), but with most of the WU graphics, when opened a second (and then third, etc) time, the information at the bottom is shifted down so that the bottom line (the project URL & Accepted Energy) is not visible. Other than that, everything looks good so far. ID: 1820

Sadir Joined: 21 Feb 06 Posts: 6 Credit: 1,419 RAC: 0	Message 1821 - Posted: 12 Jun 2006, 8:32:14 UTC Last modified: 12 Jun 2006, 8:36:38 UTC
1) I don't know if this was mentioned earlier, but the progress counter doesn't work well:
6 min ... 1.020%
24 min ... 1.041% (8:21:56 to completion)
40 min ... 1.042% (8:38:22 to completion)
1:09:30 ... completion
(I saw this also with 5.22)
2) More memory needed?
12/06/2006 09:42:53|ralph@home|Message from server: Your computer has only 402116608 bytes of memory; workunit requires 97883392 more bytes
ID: 1821
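
Note on the memory message above: the two figures the server reports add up to the memory bound it is checking against. The host has 402116608 bytes and is short by 97883392 bytes, so the implied requirement is 500,000,000 bytes. A minimal sketch of that arithmetic follows; the function name is illustrative and is not part of BOINC.

```python
def required_memory_bytes(available: int, shortfall: int) -> int:
    """Memory bound implied by a 'requires N more bytes' scheduler message.

    Hypothetical helper for illustration only; not a BOINC API.
    """
    return available + shortfall


available = 402_116_608  # "Your computer has only ... bytes of memory"
shortfall = 97_883_392   # "workunit requires ... more bytes"

required = required_memory_bytes(available, shortfall)
print(required)  # 500000000
```

This only restates what the message says; it does not show how much memory the application actually uses while running.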

doc :) Joined: 16 Feb 06 Posts: 46 Credit: 4,437 RAC: 0	Message 1823 - Posted: 12 Jun 2006, 15:15:34 UTC Sadir: the percentage complete thing is perfectly normal, the first model just took that long. No errors with 5.23 so far here, a couple of successful WUs finished. ID: 1823

Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 1825 - Posted: 13 Jun 2006, 11:24:34 UTC
I have a funny (interesting) one. On my laptop (which has been pretty much flawless at Ralph, as opposed to my AMD64 3700 San Diego, which experiences the "fatal windows" error) I've seen something happen twice in 24 hours. Either the Rosetta 5.22 screensaver or the Ralph 5.23 screensaver is showing on my screen when I return from some personal task, and the graphic will NOT go away by moving the mouse or pressing a key. I had another window open but couldn't see it. The mouse would still work behind the graphic: if I just clicked all over I could hear it interacting, but the Rosetta graphic would not release my screen. I ended up pressing the power button on both occasions, only to see the HD activity light blink and hear the Windows log-off WAV, but the Rosetta graphic was still on the screen all the way to shutdown, when the screen went dead.
Since mine is the only report of this, it was on both Rosetta and Ralph, and hasn't happened with the laptop before, I will be doing some adware/malware/virus/other scans to see if the problem is on my end.
tony
ID: 1825

Aglarond Joined: 16 Feb 06 Posts: 11 Credit: 1,094 RAC: 0	Message 1826 - Posted: 14 Jun 2006, 10:12:56 UTC Last modified: 14 Jun 2006, 10:15:00 UTC
Strange error in FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0 . At first it was running normally, but several Simap WUs had errors. Later a strange error message appeared, something I've never seen before:
"Runtime error! Program: ...alph.bakerlab.orgrosetta_beta_5.23_windows_intelx86.exe This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information."
Screenshot is here: Ralph_error.gif (7.76KB)
My messages:
14. 6. 2006 10:50:47|ralph@home|Unrecoverable error for result FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
14. 6. 2006 10:50:47|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds
14. 6. 2006 10:50:47||Rescheduling CPU: application exited
14. 6. 2006 10:50:47|ralph@home|Computation for task FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0 finished
Full message log: Ralph_error_log_14june2006.txt (40KB)
The computer where this error happened is a PIII 500MHz, 160MB RAM, WinXP Home SP2, running only antivirus and BOINC with Simap and Ralph. (It has 512MB virtual memory, which is probably not enough for some bigger WUs.) After this WU finished (with error), Simap stopped producing errors and finished its next WU successfully.
ID: 1826

Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 1828 - Posted: 15 Jun 2006, 10:55:11 UTC
Rom had mentioned there might be a fix to the fatal windows errors in 5.23. When it was released, I set the box I usually got these errors with to NNW/NNT for all other projects and suspended them, so I'd run nothing but 5.23. I'm not ready to say "it's Fixed", but so far it sure looks good.
177824 158140 14 Jun 2006 19:37:15 UTC 15 Jun 2006 10:43:14 UTC Over Success Done 13,545.38 53.30 53.30
177438 157770 14 Jun 2006 15:16:32 UTC 15 Jun 2006 7:36:07 UTC Over Success Done 14,102.19 55.50 55.50
176725 156687 14 Jun 2006 7:37:56 UTC 15 Jun 2006 1:10:40 UTC Over Success Done 13,275.25 52.24 52.24
176356 156712 14 Jun 2006 3:52:59 UTC 14 Jun 2006 20:28:26 UTC Over Success Done 13,985.28 53.74 53.74
175612 151093 13 Jun 2006 20:01:42 UTC 14 Jun 2006 16:47:14 UTC Over Success Done 14,084.34 54.12 54.12
174950 155410 13 Jun 2006 13:25:22 UTC 14 Jun 2006 15:16:32 UTC Over Success Done 14,111.06 54.22 54.22
174529 155065 13 Jun 2006 9:17:55 UTC 14 Jun 2006 7:37:56 UTC Over Success Done 14,101.56 54.18 54.18
174341 154879 13 Jun 2006 6:09:29 UTC 14 Jun 2006 3:52:59 UTC Over Success Done 14,346.30 55.12 55.12
173772 154359 12 Jun 2006 22:33:06 UTC 13 Jun 2006 20:01:42 UTC Over Success Done 14,103.70 54.19 54.19
173541 154155 12 Jun 2006 19:11:15 UTC 13 Jun 2006 10:39:56 UTC Over Success Done 14,441.13 55.49 55.49
170677 146450 11 Jun 2006 22:52:42 UTC 13 Jun 2006 9:17:55 UTC Over Success Done 13,161.97 50.57 50.57
170660 146482 11 Jun 2006 22:52:42 UTC 13 Jun 2006 3:03:42 UTC Over Success Done 14,275.66 54.85 54.85
170659 146481 11 Jun 2006 22:52:42 UTC 12 Jun 2006 8:30:20 UTC Over Success Done 13,858.98 53.25 53.25
ID: 1828

william Joined: 3 Jun 06 Posts: 4 Credit: 74 RAC: 0	Message 1829 - Posted: 16 Jun 2006, 14:45:21 UTC
Is this WU a troubled WU? FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0
I have left it running now for 1:34:10 and it is only at 1.623%, and the finish time is up to 3333:00:01 now and still climbing.
ID: 1829

william Joined: 3 Jun 06 Posts: 4 Credit: 74 RAC: 0	Message 1830 - Posted: 16 Jun 2006, 14:49:02 UTC - in response to Message 1829.
Is this WU a troubled WU? FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0 I have left it running now for 1:34:10 and it is only at 1.623%, and the finish time is up to 3333:00:01 now and still climbing.
Never mind, it just hit 100% while I was typing this into the forums, so I am not sure what happened as of yet. Time was 1:37:27 total.
ID: 1830

Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 1831 - Posted: 16 Jun 2006, 14:51:23 UTC - in response to Message 1829.
Is this WU a troubled WU? FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0 I have left it running now for 1:34:10 and it is only at 1.623%, and the finish time is up to 3333:00:01 now and still climbing.
The progress indicator isn't linear. What you'll see are jumps in percentage. All WUs start at 1% and slowly proceed higher until one model is done. Then it jumps to another percentage, and the digits to the right of the decimal point slowly proceed again until the next model is done. I.e., you might see this if you checked the status every 10 min: 1.000, 1.001, 1.002, 1.003, 12.000, 12.001, 12.002, 12.003, 24.000, 24.001, etc., until the time runs out, where it jumps to 100%, uploads and reports. The number of models depends on protein size and puter speed (for the most part). Every WU will run at least ONE model regardless of time (except where terminated by the "watchdog timer").
Does this help?
tony
ID: 1831
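
The step-wise behaviour tony describes can be sketched in a few lines. This is an illustration under stated assumptions, not Rosetta's or BOINC's actual code: `reported_percent`, `est_models` and `creep` are hypothetical names, and the linear extrapolation in `naive_eta_hours` is just one simple way a client could turn "percent done" into a time estimate.

```python
def reported_percent(models_done: int, est_models: int, creep: float = 0.0) -> float:
    """Step-wise progress: floor of 1%, a jump per finished model, tiny creep in between.

    Hypothetical model of the behaviour described in the post above.
    """
    if models_done >= est_models:
        return 100.0
    base = max(1.0, 100.0 * models_done / est_models)
    return base + creep


def naive_eta_hours(elapsed_hours: float, percent: float) -> float:
    """Remaining time if progress were linear in elapsed time (it is not)."""
    return elapsed_hours * 100.0 / percent - elapsed_hours


# While the first model is still running, the indicator barely moves...
print(reported_percent(0, 8, creep=0.002))
# ...and jumps once a model completes.
print(reported_percent(1, 8))

# william's figures: 1:34:10 elapsed at 1.623% done.
elapsed = 1 + 34 / 60 + 10 / 3600
print(naive_eta_hours(elapsed, 1.623))
```

With william's figures, the linear estimate comes out at several days of remaining work even though the workunit finished a few minutes later, which is why a very large "time to completion" during the first model is not by itself a sign of a stuck WU. The estimate his client displayed was different again, so the real estimator evidently uses more than this simple ratio.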

william Joined: 3 Jun 06 Posts: 4 Credit: 74 RAC: 0	Message 1832 - Posted: 16 Jun 2006, 14:56:26 UTC - in response to Message 1831.
Is this WU a troubled WU? FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0 I have left it running now for 1:34:10 and it is only at 1.623%, and the finish time is up to 3333:00:01 now and still climbing.
The progress indicator isn't linear. What you'll see are jumps in percentage. All WUs start at 1% and slowly proceed higher until one model is done. Then it jumps to another percentage, and the digits to the right of the decimal point slowly proceed again until the next model is done. I.e., you might see this if you checked the status every 10 min: 1.000, 1.001, 1.002, 1.003, 12.000, 12.001, 12.002, 12.003, 24.000, 24.001, etc., until the time runs out, where it jumps to 100%, uploads and reports. The number of models depends on protein size and puter speed (for the most part). Every WU will run at least ONE model regardless of time (except where terminated by the "watchdog timer"). Does this help? tony
Yes, and watchdog did terminate that WU on this computer. These are the results of it:
<core_client_version>5.4.9</core_client_version>
<stderr_txt>
# random seed: 2998884
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 0 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
And this is the computer, though I do not know if this will help the project:
CPU type: GenuineIntel Intel(R) Celeron(R) CPU 2.60GHz
Number of CPUs: 1
Operating System: Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00)
Memory: 510.98 MB
Cache: 976.56 KB
Swap space: 1248.2 MB
Total disk space: 37.26 GB
Free disk space: 22.39 GB
Measured floating point speed: 1329.23 million ops/sec
Measured integer speed: 2718.87 million ops/sec
Average upload rate: 1.41 KB/sec
Average download rate: 132.77 KB/sec
ID: 1832

Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 1833 - Posted: 16 Jun 2006, 15:29:22 UTC - in response to Message 1832. Last modified: 16 Jun 2006, 15:32:54 UTC
BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down...
This is normal; it just means the WU completed successfully and didn't need the watchdog, so it shut it down. If you're talking about wuid=160649, then it completed successfully and has been credited. See the "result ID" for that WU below:
Result ID: 180410
Name: FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0
Workunit: 160649
Created: 15 Jun 2006 22:54:54 UTC
Sent: 15 Jun 2006 23:51:21 UTC
Received: 16 Jun 2006 14:42:46 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 2910
Report deadline: 19 Jun 2006 23:51:21 UTC
CPU time: 5847.390625
stderr out:
<core_client_version>5.4.9</core_client_version>
<stderr_txt>
# random seed: 2998884
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 0 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
Validate state: Valid
Claimed credit: 13.6983915054668
Granted credit: 13.6983915054668
Application version: 5.23
If the protein is huge, your puter old, or your runtime is set low, then this is what you should be seeing with your future WUs. It's helping.
tony
ID: 1833
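
For anyone reading their own result pages, the stderr block quoted above already contains the numbers needed to see what happened: the preferred run time was 3600 seconds, one decoy (model) was produced from one attempt, and the reported CPU time was about 5847 seconds. A rough sketch of pulling those values out follows; the function and dictionary keys are made up for illustration, and the regular expressions assume the exact line wording shown in this thread.

```python
import re

# stderr text as quoted in the result above
STDERR = """# random seed: 2998884
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 0 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down..."""


def summarize(stderr_text: str, cpu_time_s: float) -> dict:
    """Extract run-time preference and decoy counts (hypothetical helper)."""
    pref = int(re.search(r"cpu_run_time_pref:\s*(\d+)", stderr_text).group(1))
    decoys, attempts = re.search(
        r"generated (\d+) decoys from (\d+) attempts", stderr_text
    ).groups()
    return {
        "run_time_pref_s": pref,
        "decoys": int(decoys),
        "attempts": int(attempts),
        # Positive value: the run went past the preferred run time.
        "over_pref_s": cpu_time_s - pref,
    }


summary = summarize(STDERR, 5847.390625)
print(summary)
```

Here the single model took longer than the one-hour preference, which fits tony's point that every WU runs at least one model regardless of the run-time setting.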

Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 1834 - Posted: 17 Jun 2006, 11:00:00 UTC - in response to Message 1828. Last modified: 17 Jun 2006, 11:05:16 UTC
Rom had mentioned there might be a fix to the fatal windows errors in 5.23. When it was released, I set the box I usually got these errors with to NNW/NNT for all other projects and suspended them, so I'd run nothing but 5.23. I'm not ready to say "it's Fixed", but so far it sure looks good.
177824 158140 14 Jun 2006 19:37:15 UTC 15 Jun 2006 10:43:14 UTC Over Success Done 13,545.38 53.30 53.30
177438 157770 14 Jun 2006 15:16:32 UTC 15 Jun 2006 7:36:07 UTC Over Success Done 14,102.19 55.50 55.50
176725 156687 14 Jun 2006 7:37:56 UTC 15 Jun 2006 1:10:40 UTC Over Success Done 13,275.25 52.24 52.24
176356 156712 14 Jun 2006 3:52:59 UTC 14 Jun 2006 20:28:26 UTC Over Success Done 13,985.28 53.74 53.74
175612 151093 13 Jun 2006 20:01:42 UTC 14 Jun 2006 16:47:14 UTC Over Success Done 14,084.34 54.12 54.12
174950 155410 13 Jun 2006 13:25:22 UTC 14 Jun 2006 15:16:32 UTC Over Success Done 14,111.06 54.22 54.22
174529 155065 13 Jun 2006 9:17:55 UTC 14 Jun 2006 7:37:56 UTC Over Success Done 14,101.56 54.18 54.18
174341 154879 13 Jun 2006 6:09:29 UTC 14 Jun 2006 3:52:59 UTC Over Success Done 14,346.30 55.12 55.12
173772 154359 12 Jun 2006 22:33:06 UTC 13 Jun 2006 20:01:42 UTC Over Success Done 14,103.70 54.19 54.19
173541 154155 12 Jun 2006 19:11:15 UTC 13 Jun 2006 10:39:56 UTC Over Success Done 14,441.13 55.49 55.49
170677 146450 11 Jun 2006 22:52:42 UTC 13 Jun 2006 9:17:55 UTC Over Success Done 13,161.97 50.57 50.57
170660 146482 11 Jun 2006 22:52:42 UTC 13 Jun 2006 3:03:42 UTC Over Success Done 14,275.66 54.85 54.85
170659 146481 11 Jun 2006 22:52:42 UTC 12 Jun 2006 8:30:20 UTC Over Success Done 13,858.98 53.25 53.25
OK, OK I'm convinced. I haven't had any errors of any kind with 5.23.
The following WUs can be added to the list of successes for my error-prone puter:
181942 162058 16 Jun 2006 18:44:02 UTC 17 Jun 2006 10:47:32 UTC Over Success Done 14,364.88 56.53 56.53
181682 161805 16 Jun 2006 14:44:34 UTC 17 Jun 2006 4:16:03 UTC Over Success Done 13,954.83 54.92 54.92
181002 161170 16 Jun 2006 8:41:29 UTC 17 Jun 2006 0:34:09 UTC Over Success Done 13,478.86 53.04 53.04
180751 160956 16 Jun 2006 4:39:34 UTC 16 Jun 2006 20:44:26 UTC Over Success Done 13,756.77 54.14 54.14
180432 151974 15 Jun 2006 23:29:54 UTC 16 Jun 2006 15:04:45 UTC Over Success Done 14,181.53 55.81 55.81
179254 159529 15 Jun 2006 11:09:29 UTC 16 Jun 2006 11:14:50 UTC Over Success Done 14,375.77 56.57 56.57
178840 159127 15 Jun 2006 7:36:07 UTC 16 Jun 2006 8:41:29 UTC Over Success Done 14,242.95 56.05 56.05
178558 158860 15 Jun 2006 4:15:02 UTC 15 Jun 2006 23:29:54 UTC Over Success Done 13,769.56 54.19 54.19
178127 158438 14 Jun 2006 23:29:32 UTC 15 Jun 2006 20:00:12 UTC Over Success Done 14,343.39 56.44 56.44
I've set the other projects back to "allow new work" and "resumed" them. Thanks for fixing this.
ID: 1834

Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0	Message 1835 - Posted: 17 Jun 2006, 20:10:13 UTC - in response to Message 1821. Last modified: 17 Jun 2006, 20:37:25 UTC
1) I don't know if this was mentioned earlier, but the progress counter doesn't work well: 6 min ... 1.020% 24 min ... 1.041% (8:21:56 to completion) 40 min ... 1.042% (8:38:22 to completion) 1:09:30 ... completion (I saw this also with 5.22) 2) More memory needed? 12/06/2006 09:42:53|ralph@home|Message from server: Your computer has only 402116608 bytes of memory; workunit requires 97883392 more bytes
I think there is no need for more memory -:) Jobs are finishing OK, w/o any errors, in normal run time! What is needed is to "fix" this misleading message on both (Ralph and Rosetta).
ps: I have successfully run Rosetta with 64 MB RAM. Only with that low RAM do I get about 70 page faults per second, but I am getting the above message on PCs with 256 MB physical RAM -:(
Yeah! Really, this %done indicator is not working smoothly.
ps: on Rosetta too, I had already aborted a WU by hand because of this; I believed that it was stuck -:( It was at about 3 hours CPU time, at 1.00n%. My preferred run time is 2 hours, and boincview was predicting 190 hours of CPU time to completion -;
ps: I really don't want to run a single job for 190 hours :!:
Thanks
ID: 1835

william Joined: 3 Jun 06 Posts: 4 Credit: 74 RAC: 0	Message 1836 - Posted: 17 Jun 2006, 22:20:50 UTC - in response to Message 1835.
1) I don't know if this was mentioned earlier, but the progress counter doesn't work well: 6 min ... 1.020% 24 min ... 1.041% (8:21:56 to completion) 40 min ... 1.042% (8:38:22 to completion) 1:09:30 ... completion (I saw this also with 5.22) 2) More memory needed? 12/06/2006 09:42:53|ralph@home|Message from server: Your computer has only 402116608 bytes of memory; workunit requires 97883392 more bytes
I think there is no need for more memory -:) Jobs are finishing OK, w/o any errors, in normal run time! What is needed is to "fix" this misleading message on both (Ralph and Rosetta). ps: I have successfully run Rosetta with 64 MB RAM. Only with that low RAM do I get about 70 page faults per second, but I am getting the above message on PCs with 256 MB physical RAM -:( Yeah! Really, this %done indicator is not working smoothly. ps: on Rosetta too, I had already aborted a WU by hand because of this; I believed that it was stuck -:( It was at about 3 hours CPU time, at 1.00n%. My preferred run time is 2 hours, and boincview was predicting 190 hours of CPU time to completion -; ps: I really don't want to run a single job for 190 hours :!: Thanks
I think Ralph needs to work on the progress bar area of the program. I went through with no problems; it was just the first time I saw that from this project, so I thought I would say something. But if that is normal, then there is nothing to worry about.
ID: 1836