Message boards : RALPH@home bug list : Bug reports for Ralph 5.16
Author | Message |
---|---|
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes. If this large memory usage is in fact due to the type of wu and not leakage or some other bug, it would be nice if we could set willingness to crunch memory-gobbling tasks in host preferences. Of course there are no host preferences, but even a project-wide preference setting would be helpful for many of us. |
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
Just missed the edit-post deadline... I waited for a checkpoint, then restarted boinc. At the time of the restart it was just beginning model 8. ralph was consuming 217MB memory, peak 405MB, VM size 468MB. After the restart, it went up to about 93MB, peak 95MB, VM 117MB. It's growing though; as I type this it's at 130MB memory, peak 13, VM 145MB. |
TCU Computer Science Joined: 16 Feb 06 Posts: 5 Credit: 241,166 RAC: 0 |
d287__CASP7_ABRELAX_521_7 has been running for 6 hours and shows only 1.044% progress. This is running on a Mac. |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
d287__CASP7_ABRELAX_521_7: Let it run. It is a test Work Unit for CASP7. It is probably just a large Work Unit. Do not be surprised if it suddenly jumps to 100% at the end of the first model. Do not stop BOINC or Rosetta or it will start over at 0%. If it gets to the point where it has run longer than about 5 times the setting for "Time" in your preferences, it will either be stopped by the "Watchdog" or you might want to consider aborting it manually at that time. Keep us posted. Moderator9 |
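The rule of thumb above can be sketched as a small check. This is only an illustration: the function name is hypothetical, the factor of 5 is the moderator's approximate guideline rather than a documented Watchdog constant, and the 4-hour "Time" preference (14400 seconds) is an assumed example value.

```python
def exceeds_runtime_budget(cpu_seconds, run_time_pref_seconds, factor=5.0):
    """Return True once CPU time passes `factor` times the "Time" preference.

    Hypothetical helper: the default factor of 5 is a rule of thumb,
    not a value taken from the actual Watchdog code.
    """
    return cpu_seconds > factor * run_time_pref_seconds


# Assumed 4-hour preference, so the cutoff works out to 20 hours of CPU time.
four_hours = 4 * 3600
print(exceeds_runtime_budget(6 * 3600, four_hours))   # 6 hours of CPU time
print(exceeds_runtime_budget(23 * 3600, four_hours))  # 23 hours of CPU time
```

With these assumed values, a task at 6 hours is still well inside the budget, while one at 23 hours is past the point where aborting manually becomes reasonable.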
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi sTrey: This is a really good idea -- allowing users to set a preference for "big jobs". It's an idea that has come up a few times on the message boards, and we've contacted the BOINC team about it. For now, we're sending these jobs to machines with larger memories -- and we're tracking down ways to reduce the memory requirement. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Yes these were bad WUs. They've been cancelled and resent with corrected FASTA files. Thanks for posting! I Have a few - |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
An update for those of you who noticed the unusual growing memory of ralph with MAPRELAX jobs. I've partly pinpointed the problem to something that was introduced in the last month in the BOINC Windows API. It only causes a growing memory footprint on Windows (not Linux, which is why I didn't see it originally) and only on those particular jobs. I'm contacting the BOINC team about it -- hopefully this will be fixed by the next ralph. It may also be a useful lead into reducing memory requirements for Windows machines. Thanks to sTrey and others for bringing this to our attention! |
Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
I've had TWO byte the dust today, and one success. wuid=112719 Result ID 128199 Name t283_HOMOLOG_ABRELAX_hom001__532_59_0 Workunit 112719 Created 17 May 2006 7:54:43 UTC Sent 17 May 2006 8:34:07 UTC Received 17 May 2006 10:39:17 UTC Server state Over Outcome Client error Client state Computing Exit status -1073741811 (0xffffffffc000000d) Computer ID 2172 Report deadline 21 May 2006 8:34:07 UTC CPU time 5152 stderr out <core_client_version>5.4.9</core_client_version> <message> - exit code -1073741811 (0xc000000d) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3055630 </stderr_txt> Validate state Invalid Claimed credit 19.2013819424423 Granted credit 0 application version 5.16 AND wuid=112469 Result ID 127949 Name t283_HOMOLOG_ABRELAX_hom003__532_23_0 Workunit 112469 Created 17 May 2006 7:54:15 UTC Sent 17 May 2006 8:35:02 UTC Received 17 May 2006 20:27:12 UTC Server state Over Outcome Client error Client state Computing Exit status -1 (0xffffffffffffffff) Computer ID 2173 Report deadline 21 May 2006 8:35:02 UTC CPU time 3093.84375 stderr out <core_client_version>5.4.9</core_client_version> <message> - exit code -1 (0xffffffff) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 3055266 </stderr_txt> Validate state Invalid Claimed credit 11.4215356293639 Granted credit 0 application version 5.16 |
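The two forms of each exit status in the reports above are the same 32-bit value shown as a signed and as an unsigned number. A minimal sketch of the conversion follows (the helper name is made up for illustration). The first value, 0xC000000D, is the code Windows documents as STATUS_INVALID_PARAMETER; what triggered it in these work units is not established in this thread.

```python
def to_unsigned32(exit_status):
    """Reinterpret a signed 32-bit exit status as its unsigned equivalent."""
    return exit_status & 0xFFFFFFFF


# The two exit statuses quoted in the failed results above.
for status in (-1073741811, -1):
    print(status, hex(to_unsigned32(status)))
```

Masking with 0xFFFFFFFF keeps only the low 32 bits, which is why the longer 0xffffffffc000000d form on the result page and the shorter 0xc000000d form in the stderr message refer to the same code.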
Big Whiskey Joined: 21 Mar 06 Posts: 3 Credit: 3,342 RAC: 0 |
Watchdog has fallen asleep on this work unit 83799 . Progress stuck at 1.03%, CPU time 23 hours and 24 hours to completion and both are rising. Nothing showing in graphics but a black screen. It seems to be missing some files. I found these messages in the stdout text file in the Slots file. WARNING:: paths.txt file not found!! Setting all paths to . Searching for dat file: .1enh.dat Searching for dat file: .1enh.dat WARNING!! .dat file not found! WARNING: CONSTRAINT FILE NOT FOUND Searched for: .1enh_.cst Running without distance constraints WARNING: DIPOLAR CONSTRAINT FILE NOT FOUND Searched for: .1enh_.dpl Dipolar constraints will not be used Looking for dssp file: .1enh.dssp dssp file not found Looking for secondary structure assignment file: .1enh_.ssa ssa file not found I'm going to have to retire this watchdog in the next day. WOOF |
Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
OK, I've gotten one more. I think I might see a pattern to some extent. of both the 5.12's and 5.16's that I've had the windows fault on, each time it involved my screensaver running at the time. Is anyone else seeing this?? wus I've run while awake and using this puter have been done successfully. anyway, here's last nites faulty wu wuid=112721 Result ID 128201 Name t283_HOMOLOG_ABRELAX_hom003__532_59_0 Workunit 112721 Created 17 May 2006 7:54:43 UTC Sent 17 May 2006 8:34:07 UTC Received 18 May 2006 11:08:21 UTC Server state Over Outcome Client error Client state Computing Exit status -1073741811 (0xffffffffc000000d) Computer ID 2172 Report deadline 21 May 2006 8:34:07 UTC CPU time 13684.0625 stderr out <core_client_version>5.4.9</core_client_version> <message> - exit code -1073741811 (0xc000000d) </message> <stderr_txt> # random seed: 3055230 # cpu_run_time_pref: 14400 # DONE :: 1 starting structures built 19 (nstruct) times # This process generated 19 decoys from 19 attempts </stderr_txt> Validate state Invalid Claimed credit 51.0001767443229 Granted credit 0 application version 5.16 |
Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=132116 ERROR:: Exit at: .barcode_classes.cc line:500 Anders n |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
This version doesn't cause an error on my computer. Graphic : OK Work Tasks : OK Here are two of completed tasks on my computer. https://ralph.bakerlab.org/workunit.php?wuid=117061 https://ralph.bakerlab.org/workunit.php?wuid=115799 I appreciate developers for great work. :) Anyway, has a cause of errors been identified already? |
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
... In part yes, and with the help of the people running RALPH they will eliminate it. Moderator9 |
suguruhirahara Joined: 5 Mar 06 Posts: 40 Credit: 11,320 RAC: 0 |
... I see. Please keep working to eliminate errors and add new functions. |
Jose Joined: 25 Apr 06 Posts: 7 Credit: 77 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=133961 |
Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
This is indeed a strange bug, which seems to be related to graphics, but not all graphics. My P4 1.8 with S3 onboard video doesn't have any errors with 5.16. My AMD 64 3700, 1 M ram, with a PCI-Express Asus EN6200TC256 video card does have problems. I've listed part of my results page below. The errors were while the screensaver was running. The one success early on was when the machine was in constant use. The other success came after I turned off the screensaver in Windows.
132586 111400 20 May 2006 4:33:05 UTC 21 May 2006 16:25:53 UTC Over Success Done 14,851.61 58.65 58.65
132585 111399 20 May 2006 4:33:05 UTC 21 May 2006 16:25:53 UTC Over Success Done 13,593.53 53.68 53.68
132584 111344 20 May 2006 4:33:05 UTC 21 May 2006 4:02:36 UTC Over Success Done 14,273.86 56.37 56.37
132122 116565 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
132108 116551 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
132106 116549 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
128201 112721 17 May 2006 8:34:07 UTC 18 May 2006 11:08:21 UTC Over Client error Computing 13,684.06 51.00 ---
128200 112720 17 May 2006 8:34:07 UTC 17 May 2006 16:21:57 UTC Over Success Done 14,411.83 53.71 53.71
128199 112719 17 May 2006 8:34:07 UTC 17 May 2006 10:39:17 UTC Over Client error Computing 5,152.00 19.20 ---
My AMD64 3700 laptop doesn't have these errors (I don't think the one shown was the same fatal Windows error).
131159 115609 19 May 2006 7:27:43 UTC 20 May 2006 10:18:42 UTC Over Success Done 14,053.16 51.88 51.88
131089 115539 19 May 2006 7:27:43 UTC 20 May 2006 19:30:06 UTC Over Success Done 15,001.38 55.38 55.38
131088 115538 19 May 2006 7:27:43 UTC 19 May 2006 21:05:09 UTC Over Success Done 14,072.97 51.95 51.95
127951 112471 17 May 2006 8:35:02 UTC 18 May 2006 21:13:39 UTC Over Success Done 14,273.81 52.69 52.69
127950 112470 17 May 2006 8:35:02 UTC 18 May 2006 11:08:46 UTC Over Success Done 14,932.47 55.13 55.13
127949 112469 17 May 2006 8:35:02 UTC 17 May 2006 20:27:12 UTC Over Success Done 3,093.84 11.42 --- |
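For anyone comparing machines the same way, the pasted result rows can be tallied with a short script. This is a rough sketch: the row layout (result ID, work unit ID, sent and received times, server state, outcome, client state, CPU time, credits) is inferred from the listing above, and the regular expression and names are illustrative rather than any official BOINC format.

```python
import re

# Rows copied from the two listings above (desktop, then laptop).
DESKTOP_ROWS = """\
132586 111400 20 May 2006 4:33:05 UTC 21 May 2006 16:25:53 UTC Over Success Done 14,851.61 58.65 58.65
132585 111399 20 May 2006 4:33:05 UTC 21 May 2006 16:25:53 UTC Over Success Done 13,593.53 53.68 53.68
132584 111344 20 May 2006 4:33:05 UTC 21 May 2006 4:02:36 UTC Over Success Done 14,273.86 56.37 56.37
132122 116565 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
132108 116551 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
132106 116549 19 May 2006 17:03:21 UTC 20 May 2006 4:18:50 UTC Over Client error Computing 0.00 0.00 ---
128201 112721 17 May 2006 8:34:07 UTC 18 May 2006 11:08:21 UTC Over Client error Computing 13,684.06 51.00 ---
128200 112720 17 May 2006 8:34:07 UTC 17 May 2006 16:21:57 UTC Over Success Done 14,411.83 53.71 53.71
128199 112719 17 May 2006 8:34:07 UTC 17 May 2006 10:39:17 UTC Over Client error Computing 5,152.00 19.20 ---
"""

LAPTOP_ROWS = """\
131159 115609 19 May 2006 7:27:43 UTC 20 May 2006 10:18:42 UTC Over Success Done 14,053.16 51.88 51.88
131089 115539 19 May 2006 7:27:43 UTC 20 May 2006 19:30:06 UTC Over Success Done 15,001.38 55.38 55.38
131088 115538 19 May 2006 7:27:43 UTC 19 May 2006 21:05:09 UTC Over Success Done 14,072.97 51.95 51.95
127951 112471 17 May 2006 8:35:02 UTC 18 May 2006 21:13:39 UTC Over Success Done 14,273.81 52.69 52.69
127950 112470 17 May 2006 8:35:02 UTC 18 May 2006 11:08:46 UTC Over Success Done 14,932.47 55.13 55.13
127949 112469 17 May 2006 8:35:02 UTC 17 May 2006 20:27:12 UTC Over Client error Computing 3,093.84 11.42 ---
"""

# Assumed layout: result ID, work unit ID, timestamps, "Over", outcome, client state, CPU time.
ROW = re.compile(
    r"^(?P<result>\d+) (?P<wu>\d+) .*? Over (?P<outcome>Success|Client error) "
    r"(?P<state>\w+) (?P<cpu>[\d,.]+) "
)


def tally(rows):
    """Count outcomes per pasted listing; lines that do not match are skipped."""
    counts = {}
    for line in rows.splitlines():
        match = ROW.match(line)
        if match:
            outcome = match["outcome"]
            counts[outcome] = counts.get(outcome, 0) + 1
    return counts


print("desktop:", tally(DESKTOP_ROWS))
print("laptop:", tally(LAPTOP_ROWS))
```

The counts only summarize what was pasted; they say nothing about whether the screensaver was running for any given result, which still has to come from the poster's own notes.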
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=133961 Jose, Your result posted a fountain of very valuable error data. I have sent a message to Rhiju with a link and asked him to review it. Thanks for attaching here, this should be VERY helpful. We should hear something back soon. Moderator9 |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
mmciastro and others: great observations! This definitely looks like a problem in the way Rosetta's "I'm finished" call interacts with BOINC -- maybe the graphics thread is not getting shut down properly. I'm sending a note to Rom. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Jose: great, thanks for posting this! This is actually a very rare error code that you are seeing. You can see a list of top errors at http://www.romwnet.org/dasblogce/. Have you seen this if you run other BOINC apps, e.g. Seti@home? https://ralph.bakerlab.org/result.php?resultid=133961 |
Jose Joined: 25 Apr 06 Posts: 7 Credit: 77 RAC: 0 |
That bug has been reported in some of the Rosetta Work Units that have failed. Other BOINC applications I have run have not reported that error. |
©2023 University of Washington
http://www.bakerlab.org