Bug reports for Ralph 5.16

[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1649 - Posted: 16 May 2006, 4:36:24 UTC
Last modified: 16 May 2006, 4:36:58 UTC

The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes.

If this large memory usage is in fact due to the type of wu and not leakage or some other bug, it would be nice if we could set willingness to crunch memory-gobbling tasks in host preferences.

Of course there are no host preferences, but even a project-wide preference setting would be helpful for many of us.
ID: 1649 · Report as offensive    Reply Quote
[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1651 - Posted: 16 May 2006, 5:33:09 UTC - in response to Message 1649.  

The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes.


Just missed the edit-post deadline... I waited for a checkpoint then restarted boinc. At the time of the restart it was just beginning model 8. ralph was consuming 217MB memory peak 405, VM size 468MB.

On the restart, it went up to about 93 MB peak 95, VM 117 MB. It's growing though; as I type this it's at 130MB memory peak 13, VM 145MB.

ID: 1651 · Report as offensive    Reply Quote
TCU Computer Science

Joined: 16 Feb 06
Posts: 5
Credit: 241,166
RAC: 0
Message 1653 - Posted: 16 May 2006, 18:05:41 UTC

d287__CASP7_ABRELAX_521_7

has been running for 6 hours and shows only 1.044% progress. This is running on a Mac.
ID: 1653 · Report as offensive    Reply Quote
Moderator9
Volunteer moderator

Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 1654 - Posted: 16 May 2006, 18:11:14 UTC - in response to Message 1653.  

d287__CASP7_ABRELAX_521_7

has been running for 6 hours and shows only 1.044% progress. This is running on a Mac.


Let it run. It is a test Work Unit for CASP7. It is probably just a large Work Unit. Do not be surprised if it suddenly jumps to 100% at the end of the first model. Do not stop BOINC or Rosetta, or it will start over at 0%.

If it gets to the point where it has run longer than about 5 times the setting for "Time" in your preferences, it will either be stopped by the "Watchdog" or you might want to consider aborting it manually at that time.
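
As a rough illustration of that rule of thumb, here is a minimal Python sketch. It is not how the Watchdog itself is implemented; the function name and the 4-hour example preference are assumptions made only for this example.

```python
def past_abort_threshold(cpu_seconds: float, time_pref_seconds: float,
                         factor: float = 5.0) -> bool:
    """Return True once a task has run longer than about `factor` times
    the "Time" setting from the preferences (both given in seconds)."""
    return cpu_seconds > factor * time_pref_seconds


# Hypothetical example: a 4-hour "Time" preference (14400 seconds),
# so the threshold is 5 * 14400 = 72000 seconds (20 hours).
time_pref = 4 * 3600
print(past_abort_threshold(6 * 3600, time_pref))   # False: 6 hours is well under 20
print(past_abort_threshold(23 * 3600, time_pref))  # True: 23 hours is past 20
```

With a different "Time" preference the threshold scales accordingly, so the same run time can be fine on one host and overdue on another.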

Keep us posted.

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact
ID: 1654 · Report as offensive    Reply Quote
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1656 - Posted: 16 May 2006, 20:44:25 UTC - in response to Message 1649.  

Hi sTrey: This is a really good idea -- allowing users to set a preference for "big jobs". It's an idea that has come up a few times on the message boards, and we've contacted the BOINC team about it. For now, we're sending these jobs to machines with larger memories -- and we're tracking down ways to reduce the memory requirement.

The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes.

If this large memory usage is in fact due to the type of wu and not leakage or some other bug, it would be nice if we could set willingness to crunch memory-gobbling tasks in host preferences.

Of course there are no host preferences, but even a project-wide preference setting would be helpful for many of us.


ID: 1656 · Report as offensive    Reply Quote
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1657 - Posted: 16 May 2006, 20:47:08 UTC - in response to Message 1647.  

Yes, these were bad WUs. They've been cancelled and resent with corrected FASTA files. Thanks for posting!

I have a few -

    BOINC 5.4.9, Ralph 5.16
    GenuineIntel Intel(R) Pentium(R) M processor 1.86GHz
    Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
    Memory 2039.37 MB
    cache 76.56 KB
    swap space 932.3 MB
    65.54 GB



resultid=125665 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236
resultid=125589 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236
resultid=125413 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236

The machine is also running Rosetta

Regards
Phil



ID: 1657 · Report as offensive    Reply Quote
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1658 - Posted: 17 May 2006, 3:41:41 UTC - in response to Message 1651.  

An update for those of you who noticed the unusual growing memory of ralph with MAPRELAX jobs. I've partly pinpointed the problem to something that was introduced in the last month in the BOINC Windows API. It only causes a growing memory footprint on Windows (not Linux, which is why I didn't see it originally) and only on those particular jobs. I'm contacting the BOINC team about it -- hopefully this will be fixed by the next ralph. It may also be a useful lead into reducing memory requirements for Windows machines. Thanks to sTrey and others for bringing this to our attention!

The MAPRELAX wu I reported earlier is now just shy of halfway through its 4-hour task. 149MB in memory, peak of 405MB, VM size 416MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart boinc to lessen the competition until this wu finishes.


Just missed the edit-post deadline... I waited for a checkpoint then restarted boinc. At the time of the restart it was just beginning model 8. ralph was consuming 217MB memory peak 405, VM size 468MB.

On the restart, it went up to about 93 MB peak 95, VM 117 MB. It's growing though; as I type this it's at 130MB memory peak 13, VM 145MB.



ID: 1658 · Report as offensive    Reply Quote
Profile Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1659 - Posted: 17 May 2006, 20:45:19 UTC

I've had TWO bite the dust today, and one success.

wuid=112719
Result ID 128199
Name t283_HOMOLOG_ABRELAX_hom001__532_59_0
Workunit 112719
Created 17 May 2006 7:54:43 UTC
Sent 17 May 2006 8:34:07 UTC
Received 17 May 2006 10:39:17 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xffffffffc000000d)
Computer ID 2172
Report deadline 21 May 2006 8:34:07 UTC
CPU time 5152
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3055630

</stderr_txt>


Validate state Invalid
Claimed credit 19.2013819424423
Granted credit 0
application version 5.16

AND

wuid=112469
Result ID 127949
Name t283_HOMOLOG_ABRELAX_hom003__532_23_0
Workunit 112469
Created 17 May 2006 7:54:15 UTC
Sent 17 May 2006 8:35:02 UTC
Received 17 May 2006 20:27:12 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1 (0xffffffffffffffff)
Computer ID 2173
Report deadline 21 May 2006 8:35:02 UTC
CPU time 3093.84375
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1 (0xffffffff)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3055266

</stderr_txt>


Validate state Invalid
Claimed credit 11.4215356293639
Granted credit 0
application version 5.16
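
For anyone comparing the two forms of the exit status in the reports above: the negative number and the hex value are the same 32 bits, read as signed versus unsigned. A minimal Python sketch of the conversion follows; the function name is made up for this example. As far as I know, 0xC000000D matches the Windows NTSTATUS value STATUS_INVALID_PARAMETER, but whether that is what the application actually hit is an assumption.

```python
def exit_status_to_hex(status: int) -> str:
    """Render a signed 32-bit exit status as unsigned 32-bit hex."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value
    # as an unsigned 32-bit number.
    return f"0x{status & 0xFFFFFFFF:08x}"


print(exit_status_to_hex(-1073741811))  # 0xc000000d
print(exit_status_to_hex(-1))           # 0xffffffff
```

The longer forms shown on the result page (0xffffffffc000000d and 0xffffffffffffffff) look like the same values sign-extended to 64 bits.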
ID: 1659 · Report as offensive    Reply Quote
Big Whiskey
Joined: 21 Mar 06
Posts: 3
Credit: 3,342
RAC: 0
Message 1666 - Posted: 18 May 2006, 6:44:15 UTC
Last modified: 18 May 2006, 6:46:40 UTC

Watchdog has fallen asleep on this work unit 83799. Progress is stuck at 1.03%, with CPU time at 23 hours and 24 hours to completion, and both are rising. Nothing is showing in graphics but a black screen.

It seems to be missing some files. I found these messages in the stdout text file in the slots folder.

WARNING:: paths.txt file not found!!
Setting all paths to .

Searching for dat file: .1enh.dat
Searching for dat file: .1enh.dat
WARNING!! .dat file not found!

WARNING: CONSTRAINT FILE NOT FOUND
Searched for: .1enh_.cst
Running without distance constraints

WARNING: DIPOLAR CONSTRAINT FILE NOT FOUND
Searched for: .1enh_.dpl
Dipolar constraints will not be used

Looking for dssp file: .1enh.dssp
dssp file not found
Looking for secondary structure assignment file: .1enh_.ssa
ssa file not found

I'm going to have to retire this watchdog in the next day.
WOOF
ID: 1666 · Report as offensive    Reply Quote
Profile Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1668 - Posted: 18 May 2006, 10:57:37 UTC
Last modified: 18 May 2006, 10:58:24 UTC

OK, I've gotten one more. I think I might see a pattern, to some extent. On both the 5.12s and 5.16s that I've had the Windows fault on, my screensaver was running each time. Is anyone else seeing this? WUs I've run while awake and using this computer have completed successfully.

Anyway, here's last night's faulty WU:


wuid=112721

Result ID 128201
Name t283_HOMOLOG_ABRELAX_hom003__532_59_0
Workunit 112721
Created 17 May 2006 7:54:43 UTC
Sent 17 May 2006 8:34:07 UTC
Received 18 May 2006 11:08:21 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xffffffffc000000d)
Computer ID 2172
Report deadline 21 May 2006 8:34:07 UTC
CPU time 13684.0625
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 3055230
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 19 (nstruct) times
# This process generated 19 decoys from 19 attempts

</stderr_txt>


Validate state Invalid
Claimed credit 51.0001767443229
Granted credit 0
application version 5.16
ID: 1668 · Report as offensive    Reply Quote
Profile anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 1674 - Posted: 19 May 2006, 17:19:30 UTC

https://ralph.bakerlab.org/result.php?resultid=132116

ERROR:: Exit at: .barcode_classes.cc line:500


Anders n
ID: 1674 · Report as offensive    Reply Quote
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1680 - Posted: 20 May 2006, 12:36:36 UTC
Last modified: 20 May 2006, 12:37:22 UTC

This version doesn't cause an error on my computer.

Graphic : OK
Work Tasks : OK

Here are two of the completed tasks on my computer.
https://ralph.bakerlab.org/workunit.php?wuid=117061
https://ralph.bakerlab.org/workunit.php?wuid=115799

I appreciate the developers' great work. :)
Anyway, has the cause of the errors been identified yet?
ID: 1680 · Report as offensive    Reply Quote
Moderator9
Volunteer moderator

Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 1681 - Posted: 20 May 2006, 13:53:27 UTC - in response to Message 1680.  

...
Anyway, has the cause of the errors been identified yet?

In part yes, and with the help of the people running RALPH they will eliminate it.
Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact
ID: 1681 · Report as offensive    Reply Quote
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1682 - Posted: 21 May 2006, 8:21:47 UTC - in response to Message 1681.  

...
Anyway, has the cause of the errors been identified yet?

In part yes, and with the help of the people running RALPH they will eliminate it.


I see. Please keep working to eliminate errors and add new functions.
ID: 1682 · Report as offensive    Reply Quote
Jose
Joined: 25 Apr 06
Posts: 7
Credit: 77
RAC: 0
Message 1683 - Posted: 21 May 2006, 13:12:41 UTC

https://ralph.bakerlab.org/result.php?resultid=133961


ID: 1683 · Report as offensive    Reply Quote
Profile Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1684 - Posted: 21 May 2006, 16:12:21 UTC
Last modified: 21 May 2006, 16:15:40 UTC

This is indeed a strange bug, which seems to be related to graphics, but not all graphics. My P4 1.8 with S3 onboard video doesn't have any errors with 5.16. My AMD 64 3700, 1 M ram, with a PCI-Express Asus EN6200TC256 video card does have problems. I've listed part of my results page below. The errors occurred while the screensaver was running. The one success early on was when the machine was in constant use. The other success came after I turned off the screensaver in Windows.

Result ID  Workunit  Sent                      Received                  Server state  Outcome       Client state  CPU time   Claimed credit  Granted credit
132586     111400    20 May 2006 4:33:05 UTC   21 May 2006 16:25:53 UTC  Over          Success       Done          14,851.61  58.65           58.65
132585     111399    20 May 2006 4:33:05 UTC   21 May 2006 16:25:53 UTC  Over          Success       Done          13,593.53  53.68           53.68
132584     111344    20 May 2006 4:33:05 UTC   21 May 2006 4:02:36 UTC   Over          Success       Done          14,273.86  56.37           56.37
132122     116565    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
132108     116551    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
132106     116549    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
128201     112721    17 May 2006 8:34:07 UTC   18 May 2006 11:08:21 UTC  Over          Client error  Computing     13,684.06  51.00           ---
128200     112720    17 May 2006 8:34:07 UTC   17 May 2006 16:21:57 UTC  Over          Success       Done          14,411.83  53.71           53.71
128199     112719    17 May 2006 8:34:07 UTC   17 May 2006 10:39:17 UTC  Over          Client error  Computing     5,152.00   19.20           ---

My AMD64 3700 laptop doesn't have these errors (I don't think the one shown was the same fatal windows error).

Result ID  Workunit  Sent                      Received                  Server state  Outcome       Client state  CPU time   Claimed credit  Granted credit
131159     115609    19 May 2006 7:27:43 UTC   20 May 2006 10:18:42 UTC  Over          Success       Done          14,053.16  51.88           51.88
131089     115539    19 May 2006 7:27:43 UTC   20 May 2006 19:30:06 UTC  Over          Success       Done          15,001.38  55.38           55.38
131088     115538    19 May 2006 7:27:43 UTC   19 May 2006 21:05:09 UTC  Over          Success       Done          14,072.97  51.95           51.95
127951     112471    17 May 2006 8:35:02 UTC   18 May 2006 21:13:39 UTC  Over          Success       Done          14,273.81  52.69           52.69
127950     112470    17 May 2006 8:35:02 UTC   18 May 2006 11:08:46 UTC  Over          Success       Done          14,932.47  55.13           55.13
127949     112469    17 May 2006 8:35:02 UTC   17 May 2006 20:27:12 UTC  Over          Client error  Computing     3,093.84   11.42           ---
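
To make the difference between the two machines easier to see, the outcomes in the two lists above can be tallied. This is only a small Python sketch: the result IDs and outcomes are copied from the lists, while the variable and function names are made up for the example.

```python
from collections import Counter

# (result ID, outcome) pairs copied from the two result lists above.
pcie_card_rows = [
    (132586, "Success"), (132585, "Success"), (132584, "Success"),
    (132122, "Client error"), (132108, "Client error"), (132106, "Client error"),
    (128201, "Client error"), (128200, "Success"), (128199, "Client error"),
]
laptop_rows = [
    (131159, "Success"), (131089, "Success"), (131088, "Success"),
    (127951, "Success"), (127950, "Success"), (127949, "Client error"),
]


def tally(rows):
    """Count how many results ended with each outcome."""
    return Counter(outcome for _result_id, outcome in rows)


for name, rows in (("PCI-Express card machine", pcie_card_rows),
                   ("laptop", laptop_rows)):
    counts = tally(rows)
    print(f"{name}: {counts['Success']} success, "
          f"{counts['Client error']} client error")
```

On these rows that comes to 4 successes and 5 client errors on the machine with the PCI-Express card, against 5 successes and 1 client error on the laptop.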

ID: 1684 · Report as offensive    Reply Quote
Moderator9
Volunteer moderator

Joined: 16 Feb 06
Posts: 251
Credit: 0
RAC: 0
Message 1685 - Posted: 21 May 2006, 16:47:03 UTC - in response to Message 1683.  

https://ralph.bakerlab.org/result.php?resultid=133961


Jose,

Your result posted a fountain of very valuable error data. I have sent a message to Rhiju with a link and asked him to review it. Thanks for attaching here, this should be VERY helpful. We should hear something back soon.

Moderator9
RALPH@home FAQs
RALPH@home Guidelines
Moderator Contact
ID: 1685 · Report as offensive    Reply Quote
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1686 - Posted: 21 May 2006, 19:21:30 UTC - in response to Message 1684.  

mmciastro and others: great observations! This definitely looks like a problem in the way Rosetta's "I'm finished" call interacts with BOINC -- maybe the graphics thread is not getting shut down properly. I'm sending a note to Rom.


This is indeed a strange bug, which seems to be related to graphics, but not all graphics. My P4 1.8 with S3 onboard video doesn't have any errors with 5.16. My AMD 64 3700, 1 M ram, with a PCI-Express Asus EN6200TC256 video card does have problems. I've listed part of my results page below. The errors occurred while the screensaver was running. The one success early on was when the machine was in constant use. The other success came after I turned off the screensaver in Windows.

Result ID  Workunit  Sent                      Received                  Server state  Outcome       Client state  CPU time   Claimed credit  Granted credit
132586     111400    20 May 2006 4:33:05 UTC   21 May 2006 16:25:53 UTC  Over          Success       Done          14,851.61  58.65           58.65
132585     111399    20 May 2006 4:33:05 UTC   21 May 2006 16:25:53 UTC  Over          Success       Done          13,593.53  53.68           53.68
132584     111344    20 May 2006 4:33:05 UTC   21 May 2006 4:02:36 UTC   Over          Success       Done          14,273.86  56.37           56.37
132122     116565    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
132108     116551    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
132106     116549    19 May 2006 17:03:21 UTC  20 May 2006 4:18:50 UTC   Over          Client error  Computing     0.00       0.00            ---
128201     112721    17 May 2006 8:34:07 UTC   18 May 2006 11:08:21 UTC  Over          Client error  Computing     13,684.06  51.00           ---
128200     112720    17 May 2006 8:34:07 UTC   17 May 2006 16:21:57 UTC  Over          Success       Done          14,411.83  53.71           53.71
128199     112719    17 May 2006 8:34:07 UTC   17 May 2006 10:39:17 UTC  Over          Client error  Computing     5,152.00   19.20           ---

My AMD64 3700 laptop doesn't have these errors (I don't think the one shown was the same fatal windows error).

Result ID  Workunit  Sent                      Received                  Server state  Outcome       Client state  CPU time   Claimed credit  Granted credit
131159     115609    19 May 2006 7:27:43 UTC   20 May 2006 10:18:42 UTC  Over          Success       Done          14,053.16  51.88           51.88
131089     115539    19 May 2006 7:27:43 UTC   20 May 2006 19:30:06 UTC  Over          Success       Done          15,001.38  55.38           55.38
131088     115538    19 May 2006 7:27:43 UTC   19 May 2006 21:05:09 UTC  Over          Success       Done          14,072.97  51.95           51.95
127951     112471    17 May 2006 8:35:02 UTC   18 May 2006 21:13:39 UTC  Over          Success       Done          14,273.81  52.69           52.69
127950     112470    17 May 2006 8:35:02 UTC   18 May 2006 11:08:46 UTC  Over          Success       Done          14,932.47  55.13           55.13
127949     112469    17 May 2006 8:35:02 UTC   17 May 2006 20:27:12 UTC  Over          Client error  Computing     3,093.84   11.42           ---



ID: 1686 · Report as offensive    Reply Quote
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1687 - Posted: 21 May 2006, 19:32:26 UTC - in response to Message 1683.  

Jose: great, thanks for posting this! This is actually a very rare error code that you are seeing. You can see a list of top errors at http://www.romwnet.org/dasblogce/
Have you seen this if you run other BOINC apps, e.g. Seti@home?

https://ralph.bakerlab.org/result.php?resultid=133961



ID: 1687 · Report as offensive    Reply Quote
Jose
Joined: 25 Apr 06
Posts: 7
Credit: 77
RAC: 0
Message 1688 - Posted: 22 May 2006, 5:00:33 UTC - in response to Message 1687.  
Last modified: 22 May 2006, 5:40:00 UTC

Jose: great, thanks for posting this! This is actually a very rare error code that you are seeing. You can see a list of top errors at http://www.romwnet.org/dasblogce/
Have you seen this if you run other BOINC apps, e.g. Seti@home?

https://ralph.bakerlab.org/result.php?resultid=133961




That bug has been reported in some of the Rosetta Work Units that have failed.
Other BOINC applications I have run have not reported that error.
ID: 1688 · Report as offensive    Reply Quote



©2024 University of Washington
http://www.bakerlab.org