Message boards : RALPH@home bug list : Bug reports for Ralph 5.16
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi sTrey: This is a really good idea -- allowing users to set a preference for "big jobs". It's an idea that has come up a few times on the message boards, and we've contacted the BOINC team about it. For now, we're sending these jobs to machines with larger memories -- and we're tracking down ways to reduce the memory requirement. (Quoted post: "The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes.") |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Yes, these were bad WUs. They've been cancelled and resent with corrected FASTA files. Thanks for posting! (Quoted post: "I Have a few -") |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
An update for those of you who noticed the unusual memory growth of Ralph with MAPRELAX jobs. I've partly traced the problem to something that was introduced in the last month in the BOINC Windows API. It only causes a growing memory footprint on Windows (not Linux, which is why I didn't see it originally) and only on those particular jobs. I'm contacting the BOINC team about it -- hopefully this will be fixed by the next Ralph. It may also be a useful lead into reducing memory requirements for Windows machines. Thanks to sTrey and others for bringing this to our attention! |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
I've had TWO bite the dust today, and one success.
wuid=112719: Result ID 128199; Name t283_HOMOLOG_ABRELAX_hom001__532_59_0; Workunit 112719; Created 17 May 2006 7:54:43 UTC; Sent 17 May 2006 8:34:07 UTC; Received 17 May 2006 10:39:17 UTC; Server state Over; Outcome Client error; Client state Computing; Exit status -1073741811 (0xffffffffc000000d); Computer ID 2172; Report deadline 21 May 2006 8:34:07 UTC; CPU time 5152; stderr out: <core_client_version>5.4.9</core_client_version> <message> - exit code -1073741811 (0xc000000d) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3055630 </stderr_txt>; Validate state Invalid; Claimed credit 19.2013819424423; Granted credit 0; application version 5.16
AND
wuid=112469: Result ID 127949; Name t283_HOMOLOG_ABRELAX_hom003__532_23_0; Workunit 112469; Created 17 May 2006 7:54:15 UTC; Sent 17 May 2006 8:35:02 UTC; Received 17 May 2006 20:27:12 UTC; Server state Over; Outcome Client error; Client state Computing; Exit status -1 (0xffffffffffffffff); Computer ID 2173; Report deadline 21 May 2006 8:35:02 UTC; CPU time 3093.84375; stderr out: <core_client_version>5.4.9</core_client_version> <message> - exit code -1 (0xffffffff) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3055266 </stderr_txt>; Validate state Invalid; Claimed credit 11.4215356293639; Granted credit 0; application version 5.16 |
Big Whiskey Joined: 21 Mar 06 Posts: 3 Credit: 3,342 RAC: 0 |
Watchdog has fallen asleep on work unit 83799. Progress is stuck at 1.03%, with CPU time at 23 hours and 24 hours to completion, and both are rising. Nothing is showing in graphics but a black screen. It seems to be missing some files. I found these messages in the stdout text file in the slots folder:
WARNING:: paths.txt file not found!! Setting all paths to .
Searching for dat file: .1enh.dat
Searching for dat file: .1enh.dat
WARNING!! .dat file not found!
WARNING: CONSTRAINT FILE NOT FOUND Searched for: .1enh_.cst Running without distance constraints
WARNING: DIPOLAR CONSTRAINT FILE NOT FOUND Searched for: .1enh_.dpl Dipolar constraints will not be used
Looking for dssp file: .1enh.dssp dssp file not found
Looking for secondary structure assignment file: .1enh_.ssa ssa file not found
I'm going to have to retire this watchdog in the next day. WOOF |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
OK, I've gotten one more. I think I might see a pattern, to some extent: for both the 5.12s and the 5.16s that I've had the Windows fault on, my screensaver was running each time. Is anyone else seeing this? WUs I've run while awake and using this computer have completed successfully. Anyway, here's last night's faulty WU:
wuid=112721: Result ID 128201; Name t283_HOMOLOG_ABRELAX_hom003__532_59_0; Workunit 112721; Created 17 May 2006 7:54:43 UTC; Sent 17 May 2006 8:34:07 UTC; Received 18 May 2006 11:08:21 UTC; Server state Over; Outcome Client error; Client state Computing; Exit status -1073741811 (0xffffffffc000000d); Computer ID 2172; Report deadline 21 May 2006 8:34:07 UTC; CPU time 13684.0625; stderr out: <core_client_version>5.4.9</core_client_version> <message> - exit code -1073741811 (0xc000000d) </message> <stderr_txt> # random seed: 3055230 # cpu_run_time_pref: 14400 # DONE :: 1 starting structures built 19 (nstruct) times # This process generated 19 decoys from 19 attempts </stderr_txt>; Validate state Invalid; Claimed credit 51.0001767443229; Granted credit 0; application version 5.16 |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=132116 ERROR:: Exit at: .barcode_classes.cc line:500 Anders n |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
This version doesn't cause an error on my computer. Graphics: OK. Work tasks: OK. Here are two of the completed tasks on my computer: https://ralph.bakerlab.org/workunit.php?wuid=117061 https://ralph.bakerlab.org/workunit.php?wuid=115799 I appreciate the developers' great work. :) Anyway, has the cause of the errors been identified yet? |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
... I see. Please keep working to eliminate errors and add new functions. |
Jose Joined: 25 Apr 06 Posts: 7 Credit: 77 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=133961 |
Astro Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
This is indeed a strange bug, which seems to be related to graphics, but not all graphics. My P4 1.8 with S3 onboard video doesn't have any errors with 5.16. My AMD 64 3700, 1 M ram, with a PCI-Express Asus EN6200TC256 video card does have problems. I've listed part of my results page below. The errors were while the screensaver was running. The one success early on was when the machine was in constant use. The other success came after I turned off the screensaver in Windows.
Columns: Result ID; Workunit; Sent; Received; Server state; Outcome; Client state; CPU time; Claimed credit; Granted credit
132586; 111400; 20 May 2006 4:33:05 UTC; 21 May 2006 16:25:53 UTC; Over; Success; Done; 14,851.61; 58.65; 58.65
132585; 111399; 20 May 2006 4:33:05 UTC; 21 May 2006 16:25:53 UTC; Over; Success; Done; 13,593.53; 53.68; 53.68
132584; 111344; 20 May 2006 4:33:05 UTC; 21 May 2006 4:02:36 UTC; Over; Success; Done; 14,273.86; 56.37; 56.37
132122; 116565; 19 May 2006 17:03:21 UTC; 20 May 2006 4:18:50 UTC; Over; Client error; Computing; 0.00; 0.00; ---
132108; 116551; 19 May 2006 17:03:21 UTC; 20 May 2006 4:18:50 UTC; Over; Client error; Computing; 0.00; 0.00; ---
132106; 116549; 19 May 2006 17:03:21 UTC; 20 May 2006 4:18:50 UTC; Over; Client error; Computing; 0.00; 0.00; ---
128201; 112721; 17 May 2006 8:34:07 UTC; 18 May 2006 11:08:21 UTC; Over; Client error; Computing; 13,684.06; 51.00; ---
128200; 112720; 17 May 2006 8:34:07 UTC; 17 May 2006 16:21:57 UTC; Over; Success; Done; 14,411.83; 53.71; 53.71
128199; 112719; 17 May 2006 8:34:07 UTC; 17 May 2006 10:39:17 UTC; Over; Client error; Computing; 5,152.00; 19.20; ---
My AMD64 3700 laptop doesn't have these errors (I don't think the one shown was the same fatal Windows error).
131159; 115609; 19 May 2006 7:27:43 UTC; 20 May 2006 10:18:42 UTC; Over; Success; Done; 14,053.16; 51.88; 51.88
131089; 115539; 19 May 2006 7:27:43 UTC; 20 May 2006 19:30:06 UTC; Over; Success; Done; 15,001.38; 55.38; 55.38
131088; 115538; 19 May 2006 7:27:43 UTC; 19 May 2006 21:05:09 UTC; Over; Success; Done; 14,072.97; 51.95; 51.95
127951; 112471; 17 May 2006 8:35:02 UTC; 18 May 2006 21:13:39 UTC; Over; Success; Done; 14,273.81; 52.69; 52.69
127950; 112470; 17 May 2006 8:35:02 UTC; 18 May 2006 11:08:46 UTC; Over; Success; Done; 14,932.47; 55.13; 55.13
127949; 112469; 17 May 2006 8:35:02 UTC; 17 May 2006 20:27:12 UTC; Over; Client error; Computing; 3,093.84; 11.42; --- |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
mmciastro and others: great observations! This definitely looks like a problem in the way Rosetta's "I'm finished" call interacts with BOINC -- maybe the graphics thread is not getting shut down properly. I'm sending a note to Rom. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Jose: great, thanks for posting this! This is actually a very rare error code that you are seeing. A list of top errors is at http://www.romwnet.org/dasblogce/ -- have you seen this when you run other BOINC apps, e.g. SETI@home? |
Jose Joined: 25 Apr 06 Posts: 7 Credit: 77 RAC: 0 |
That bug has been reported in some of the Rosetta work units that have failed. The other BOINC applications I have run have not reported that error. |
wizzszz Joined: 28 Apr 06 Posts: 17 Credit: 1,128 RAC: 0 |
I have no "Searching..." or "Accepted" graphics at all!!! The "Lowest" one is broken into many pieces... And the step counter is awfully slow!!! It has been running for 8 minutes now, and the step is only at about 2600! The Rosetta WU I am crunching (CASP7, too) got far beyond step 100,000 within 8 minutes of crunching... |
Basilaris Joined: 16 Feb 06 Posts: 2 Credit: 10,006 RAC: 0 |
I have the same low step rate, but the workunits finish in the normal time after perhaps 4000 steps or so. Graphics are ok in my workunits. |
rob147147 Joined: 11 May 06 Posts: 1 Credit: 312 RAC: 0 |
Had this work unit manage to crash my computer twice: https://ralph.bakerlab.org/result.php?resultid=136079 It was running fine for about an hour, but I had noticed it hadn't checkpointed at all in that time. Suddenly my computer crashed... I initially presumed it had nothing to do with the work unit. After a quick reboot I started BOINC back up and restarted the work unit from scratch, since it had never made a checkpoint. It again seemed to be running fine, but after about 55 minutes my computer crashed again. After another reboot and starting BOINC up again, the work unit froze after 8 seconds... I had to end the process in Task Manager, so the work unit gave me the computing error... If you require any more info, please let me know... Rob |
wizzszz Joined: 28 Apr 06 Posts: 17 Credit: 1,128 RAC: 0 |
OK, I didn't see that it was already in the relax phase... But the relax phase should come a little later, not at step 2600!?? |
doc :) Joined: 16 Feb 06 Posts: 46 Credit: 4,437 RAC: 0 |
wizzszz: there are different types of WUs; some (rare, or at least I didn't have many of them in the past) start with the relax stage without ever doing the faster ab initio stuff. Back to topic :) I just got this WU and got nothing in the Searching and Accepted boxes, just like wizzszz in his screenshot. I was able to get the structure into view in the low energy screen by moving it around randomly; it was somewhere offscreen, looking OK at first, but it started to get randomly broken after a while. Now, at 1.561% (or maybe a little earlier), everything looks normal: all pictures are where they should be, and the structure in the low energy window is in the center now too when I move it around. |
doc :) Joined: 16 Feb 06 Posts: 46 Credit: 4,437 RAC: 0 |
Exact same behavior of the graphics on this WU too: nothing in Accepted and Searching, randomly broken stuff in Low Energy. At 1.561% all pictures look normal, and the Accepted energy graph looks like it's starting from the beginning, as if it were a new model, while it's actually still on model 1. |
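
A note on the exit codes in Astro's failed results: the server page shows exit status -1073741811 as 0xffffffffc000000d, while the client's stderr shows the same value as 0xc000000d (and -1 as 0xffffffffffffffff versus 0xffffffff). These are the same signed number reinterpreted as an unsigned 64-bit or 32-bit value. The sketch below reproduces that conversion; the helper name `to_unsigned` is made up for illustration and is not part of BOINC or Rosetta. The 32-bit form 0xC000000D is documented by Microsoft as the NTSTATUS value STATUS_INVALID_PARAMETER, which may help when searching for the cause.

```python
def to_unsigned(exit_status: int, bits: int = 32) -> int:
    """Reinterpret a signed exit status as an unsigned value of the given width.

    Hypothetical helper for illustration only.
    """
    return exit_status & ((1 << bits) - 1)


# The two exit statuses reported in the thread.
for status in (-1073741811, -1):
    as32 = hex(to_unsigned(status))            # form shown in the client stderr
    as64 = hex(to_unsigned(status, bits=64))   # form shown on the server result page
    print(status, as32, as64)
```

Masking with 2^bits - 1 is just two's-complement reinterpretation, so the two hex spellings on the result page do not indicate two different errors.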
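
For anyone who wants to check the screensaver pattern Astro describes against their own results list, a rough sketch of tallying outcomes from copied result rows follows. It assumes the rows are pasted as plain space-separated text in the column order result ID, workunit, sent, received, server state, outcome, client state, CPU time, claimed credit, granted credit; the regular expression and variable names are illustrative assumptions, and only three of Astro's rows are included.

```python
import re
from collections import Counter

# Three rows from Astro's list, as space-separated text (assumed input format).
FLAT_ROWS = (
    "132586 111400 20 May 2006 4:33:05 UTC 21 May 2006 16:25:53 UTC "
    "Over Success Done 14,851.61 58.65 58.65 "
    "132122 116565 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC "
    "Over Client error Computing 0.00 0.00 --- "
    "128199 112719 17 May 2006 8:34:07 UTC 17 May 2006 10:39:17 UTC "
    "Over Client error Computing 5,152.00 19.20 ---"
)

# One result row: two IDs, two UTC timestamps, states, then three numbers.
ROW = re.compile(
    r"(?P<result>\d+) (?P<wu>\d+) "
    r"(?P<sent>\d{1,2} \w{3} \d{4} [\d:]+ UTC) "
    r"(?P<received>\d{1,2} \w{3} \d{4} [\d:]+ UTC) "
    r"Over (?P<outcome>Success|Client error) (?:Done|Computing) "
    r"(?P<cpu>[\d,.]+) (?P<claimed>[\d.]+) (?P<granted>[\d.]+|---)"
)

matches = list(ROW.finditer(FLAT_ROWS))

# Count how many results ended in each outcome.
tally = Counter(m["outcome"] for m in matches)

# CPU seconds per result, with the thousands separator removed.
cpu_seconds = [float(m["cpu"].replace(",", "")) for m in matches]

print(tally)
print(cpu_seconds)
```

Splitting the rows by whether the screensaver was active would need that information recorded separately, since the results page does not include it.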
©2024 University of Washington
http://www.bakerlab.org