Bug reports for Ralph 5.16

Message boards : RALPH@home bug list : Bug reports for Ralph 5.16



Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1656 - Posted: 16 May 2006, 20:44:25 UTC - in response to Message 1649.  

Hi sTrey: This is a really good idea: allowing users to set a preference for "big jobs". It's an idea that has come up a few times on the message boards, and we've contacted the BOINC team about it. For now, we're sending these jobs to machines with more memory, and we're tracking down ways to reduce the memory requirement.

Quote (Message 1649):

The MAPRELAX WU I reported earlier is now just shy of halfway through its 4-hour task: 149 MB in memory, peak of 405 MB, VM size 416 MB. Not as bad as some, but more than I'm comfortable with. I'm going to suspend some other projects and restart BOINC to lessen the competition until this WU finishes.

If this large memory usage is in fact due to the type of WU and not to leakage or some other bug, it would be nice if we could set our willingness to crunch memory-gobbling tasks in host preferences.

Of course there are no host preferences, but even a project-wide preference setting would be helpful for many of us.


ID: 1656
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1657 - Posted: 16 May 2006, 20:47:08 UTC - in response to Message 1647.  

Yes, these were bad WUs. They've been cancelled and resent with corrected FASTA files. Thanks for posting!

Quote (Message 1647):

I have a few:

    BOINC 5.4.9, Ralph 5.16
    GenuineIntel Intel(R) Pentium(R) M processor 1.86GHz
    Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
    Memory 2039.37 MB
    cache 76.56 KB
    swap space 932.3 MB
    65.54 GB



resultid=125665 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236
resultid=125589 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236
resultid=125413 -
ERROR:: Unable to obtain sequence information. fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236

The machine is also running Rosetta.

Regards
Phil



ID: 1657
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1658 - Posted: 17 May 2006, 3:41:41 UTC - in response to Message 1651.  

An update for those of you who noticed Ralph's unusually growing memory use on MAPRELAX jobs. I've partly pinpointed the problem to something that was introduced in the BOINC Windows API within the last month. It only causes a growing memory footprint on Windows (not Linux, which is why I didn't see it originally) and only on those particular jobs. I'm contacting the BOINC team about it; hopefully this will be fixed by the next Ralph release. It may also be a useful lead for reducing memory requirements on Windows machines. Thanks to sTrey and others for bringing this to our attention!

Quote (Message 1651):

Just missed the edit-post deadline... I waited for a checkpoint, then restarted BOINC. At the time of the restart it was just beginning model 8; Ralph was consuming 217 MB of memory (peak 405 MB), VM size 468 MB.

On the restart, it went up to about 93 MB (peak 95 MB), VM 117 MB. It's growing, though; as I type this it's at 130 MB of memory (peak 13), VM 145 MB.



ID: 1658
Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1659 - Posted: 17 May 2006, 20:45:19 UTC

I've had TWO bite the dust today, and one success.

wuid=112719
Result ID 128199
Name t283_HOMOLOG_ABRELAX_hom001__532_59_0
Workunit 112719
Created 17 May 2006 7:54:43 UTC
Sent 17 May 2006 8:34:07 UTC
Received 17 May 2006 10:39:17 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xffffffffc000000d)
Computer ID 2172
Report deadline 21 May 2006 8:34:07 UTC
CPU time 5152
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3055630

</stderr_txt>


Validate state Invalid
Claimed credit 19.2013819424423
Granted credit 0
application version 5.16

AND

wuid=112469
Result ID 127949
Name t283_HOMOLOG_ABRELAX_hom003__532_23_0
Workunit 112469
Created 17 May 2006 7:54:15 UTC
Sent 17 May 2006 8:35:02 UTC
Received 17 May 2006 20:27:12 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1 (0xffffffffffffffff)
Computer ID 2173
Report deadline 21 May 2006 8:35:02 UTC
CPU time 3093.84375
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1 (0xffffffff)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 3055266

</stderr_txt>


Validate state Invalid
Claimed credit 11.4215356293639
Granted credit 0
application version 5.16
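Each failed result above gives its exit status in two hex forms: the result page shows it sign-extended to 64 bits (0xffffffffc000000d, 0xffffffffffffffff), while the stderr block shows the 32-bit code (0xc000000d, 0xffffffff). These are the same value. A minimal Python sketch of the conversion, using a hypothetical helper name `exit_status_hex`:

```python
def exit_status_hex(status: int, bits: int = 32) -> str:
    """Show a signed exit status as an unsigned hex code of the given width.

    Masking with (2**bits - 1) yields the two's-complement bit pattern,
    which matches how the negative statuses are printed in the results above.
    """
    mask = (1 << bits) - 1
    return f"0x{status & mask:0{bits // 4}x}"


# Exit statuses reported in this thread (hypothetical helper, illustration only).
for status in (-1073741811, -1):
    print(status, exit_status_hex(status), exit_status_hex(status, 64))
```

On Windows, 0xC000000D is the NTSTATUS code STATUS_INVALID_PARAMETER; the conversion only identifies the code and says nothing about what caused it in these WUs.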
ID: 1659
Big Whiskey

Joined: 21 Mar 06
Posts: 3
Credit: 3,342
RAC: 0
Message 1666 - Posted: 18 May 2006, 6:44:15 UTC
Last modified: 18 May 2006, 6:46:40 UTC

Watchdog has fallen asleep on work unit 83799. Progress is stuck at 1.03%; CPU time is at 23 hours and time to completion at 24 hours, and both are rising. The graphics show nothing but a black screen.

It seems to be missing some files. I found these messages in the stdout text file in the slots folder:

WARNING:: paths.txt file not found!!
Setting all paths to .

Searching for dat file: .1enh.dat
Searching for dat file: .1enh.dat
WARNING!! .dat file not found!

WARNING: CONSTRAINT FILE NOT FOUND
Searched for: .1enh_.cst
Running without distance constraints

WARNING: DIPOLAR CONSTRAINT FILE NOT FOUND
Searched for: .1enh_.dpl
Dipolar constraints will not be used

Looking for dssp file: .1enh.dssp
dssp file not found
Looking for secondary structure assignment file: .1enh_.ssa
ssa file not found

I'm going to have to retire this watchdog in the next day.
WOOF
ID: 1666
Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1668 - Posted: 18 May 2006, 10:57:37 UTC
Last modified: 18 May 2006, 10:58:24 UTC

OK, I've gotten one more. I think I might see a pattern, to some extent: of both the 5.12s and 5.16s that I've had the Windows fault on, each one involved my screensaver running at the time. Is anyone else seeing this? WUs I've run while awake and using this computer have completed successfully.

Anyway, here's last night's faulty WU:


wuid=112721

Result ID 128201
Name t283_HOMOLOG_ABRELAX_hom003__532_59_0
Workunit 112721
Created 17 May 2006 7:54:43 UTC
Sent 17 May 2006 8:34:07 UTC
Received 18 May 2006 11:08:21 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xffffffffc000000d)
Computer ID 2172
Report deadline 21 May 2006 8:34:07 UTC
CPU time 13684.0625
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# random seed: 3055230
# cpu_run_time_pref: 14400
# DONE :: 1 starting structures built 19 (nstruct) times
# This process generated 19 decoys from 19 attempts

</stderr_txt>


Validate state Invalid
Claimed credit 51.0001767443229
Granted credit 0
application version 5.16
ID: 1668
anders n

Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 1674 - Posted: 19 May 2006, 17:19:30 UTC

https://ralph.bakerlab.org/result.php?resultid=132116

ERROR:: Exit at: .barcode_classes.cc line:500


Anders n
ID: 1674
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1680 - Posted: 20 May 2006, 12:36:36 UTC
Last modified: 20 May 2006, 12:37:22 UTC

This version doesn't cause an error on my computer.

Graphic : OK
Work Tasks : OK

Here are two completed tasks from my computer:
https://ralph.bakerlab.org/workunit.php?wuid=117061
https://ralph.bakerlab.org/workunit.php?wuid=115799

I appreciate the developers' great work. :)
Anyway, has the cause of the errors been identified yet?
ID: 1680
suguruhirahara

Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1682 - Posted: 21 May 2006, 8:21:47 UTC - in response to Message 1681.  

Quote (Message 1681):
...
Anyway, has a cause of errors been identified already?

In part yes, and with the help of the people running RALPH they will eliminate it.


I see. Please keep working to eliminate errors and add new functions.
ID: 1682
Jose

Joined: 25 Apr 06
Posts: 7
Credit: 77
RAC: 0
Message 1683 - Posted: 21 May 2006, 13:12:41 UTC

https://ralph.bakerlab.org/result.php?resultid=133961


ID: 1683
Astro

Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1684 - Posted: 21 May 2006, 16:12:21 UTC
Last modified: 21 May 2006, 16:15:40 UTC

This is indeed a strange bug, which seems to be related to graphics, but not all graphics. My P4 1.8 with S3 onboard video doesn't have any errors with 5.16. My AMD 64 3700, 1 M RAM, with a PCI-Express Asus EN6200TC256 video card does have problems. I've listed part of my results page below. The errors occurred while the screensaver was running. The one success early on was when the machine was in constant use. The other success came after I turned off the screensaver in Windows.

Result ID | WU ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
132586 | 111400 | 20 May 2006 4:33:05 UTC | 21 May 2006 16:25:53 UTC | Over | Success | Done | 14,851.61 | 58.65 | 58.65
132585 | 111399 | 20 May 2006 4:33:05 UTC | 21 May 2006 16:25:53 UTC | Over | Success | Done | 13,593.53 | 53.68 | 53.68
132584 | 111344 | 20 May 2006 4:33:05 UTC | 21 May 2006 4:02:36 UTC | Over | Success | Done | 14,273.86 | 56.37 | 56.37
132122 | 116565 | 19 May 2006 17:03:21 UTC | 20 May 2006 4:18:50 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
132108 | 116551 | 19 May 2006 17:03:21 UTC | 20 May 2006 4:18:50 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
132106 | 116549 | 19 May 2006 17:03:21 UTC | 20 May 2006 4:18:50 UTC | Over | Client error | Computing | 0.00 | 0.00 | ---
128201 | 112721 | 17 May 2006 8:34:07 UTC | 18 May 2006 11:08:21 UTC | Over | Client error | Computing | 13,684.06 | 51.00 | ---
128200 | 112720 | 17 May 2006 8:34:07 UTC | 17 May 2006 16:21:57 UTC | Over | Success | Done | 14,411.83 | 53.71 | 53.71
128199 | 112719 | 17 May 2006 8:34:07 UTC | 17 May 2006 10:39:17 UTC | Over | Client error | Computing | 5,152.00 | 19.20 | ---

My AMD64 3700 laptop doesn't have these errors (I don't think the one shown was the same fatal Windows error).

Result ID | WU ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (s) | Claimed credit | Granted credit
131159 | 115609 | 19 May 2006 7:27:43 UTC | 20 May 2006 10:18:42 UTC | Over | Success | Done | 14,053.16 | 51.88 | 51.88
131089 | 115539 | 19 May 2006 7:27:43 UTC | 20 May 2006 19:30:06 UTC | Over | Success | Done | 15,001.38 | 55.38 | 55.38
131088 | 115538 | 19 May 2006 7:27:43 UTC | 19 May 2006 21:05:09 UTC | Over | Success | Done | 14,072.97 | 51.95 | 51.95
127951 | 112471 | 17 May 2006 8:35:02 UTC | 18 May 2006 21:13:39 UTC | Over | Success | Done | 14,273.81 | 52.69 | 52.69
127950 | 112470 | 17 May 2006 8:35:02 UTC | 18 May 2006 11:08:46 UTC | Over | Success | Done | 14,932.47 | 55.13 | 55.13
127949 | 112469 | 17 May 2006 8:35:02 UTC | 17 May 2006 20:27:12 UTC | Over | Client error | Computing | 3,093.84 | 11.42 | ---

ID: 1684
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1686 - Posted: 21 May 2006, 19:21:30 UTC - in response to Message 1684.  

mmciastro and others: great observations! This definitely looks like a problem in the way Rosetta's "I'm finished" call interacts with BOINC; maybe the graphics thread is not getting shut down properly. I'm sending a note to Rom.


ID: 1686
Rhiju
Volunteer moderator
Project developer
Project scientist

Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1687 - Posted: 21 May 2006, 19:32:26 UTC - in response to Message 1683.  

Jose: great, thanks for posting this! This is actually a very rare error code that you are seeing. You can see a list of top errors at http://www.romwnet.org/dasblogce/.
Have you seen this when running other BOINC apps, e.g. SETI@home?

ID: 1687
Jose

Joined: 25 Apr 06
Posts: 7
Credit: 77
RAC: 0
Message 1688 - Posted: 22 May 2006, 5:00:33 UTC - in response to Message 1687.  
Last modified: 22 May 2006, 5:40:00 UTC

That bug has been reported in some of the Rosetta work units that have failed.
Other BOINC applications I have run have not reported that error.
ID: 1688
wizzszz

Joined: 28 Apr 06
Posts: 17
Credit: 1,128
RAC: 0
Message 1689 - Posted: 23 May 2006, 12:41:30 UTC

I have no "Searching..." or "Accepted" graphics at all!!!
The "Lowest" is broken into many pieces....

And the step counter is awfully slow!!!

It has been running for 8 minutes now, and the step count is only at about 2600!
The Rosetta WU I am crunching (also CASP7) got far
beyond step 100,000 within 8 minutes of crunching...


ID: 1689
Basilaris

Joined: 16 Feb 06
Posts: 2
Credit: 10,006
RAC: 0
Message 1690 - Posted: 23 May 2006, 13:08:07 UTC

I have the same low step rate, but the workunits finish in the normal time after perhaps 4000 steps or so.

Graphics are ok in my workunits.
ID: 1690
rob147147

Joined: 11 May 06
Posts: 1
Credit: 312
RAC: 0
Message 1692 - Posted: 23 May 2006, 22:32:22 UTC
Last modified: 23 May 2006, 22:33:38 UTC

This work unit managed to crash my computer twice:
https://ralph.bakerlab.org/result.php?resultid=136079

It was running fine for about an hour, but I had noticed it hadn't checkpointed at all in that time. Suddenly my computer crashed; I initially presumed it had nothing to do with the work unit. After a quick reboot I started BOINC up again, and the work unit started over from scratch because it had never made a checkpoint. It again seemed to be running fine, but after about 55 minutes my computer crashed again. After another reboot and starting BOINC up again, the work unit froze after 8 seconds; I had to end the process in Task Manager, so the work unit gave me a computing error...

If you require any more info, please let me know...

Rob
ID: 1692
wizzszz

Joined: 28 Apr 06
Posts: 17
Credit: 1,128
RAC: 0
Message 1693 - Posted: 24 May 2006, 1:16:44 UTC - in response to Message 1691.  

Quote (Message 1691):

...

The relax phase is always slower, but the graphics should not look like that. Rhiju is aware there are problems with the graphics. I am not sure, but I think they are fixing it.


OK, I didn't see that it was already in the relax phase...
But the relax phase should come a little later, not at step 2600!?
ID: 1693
doc :)

Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1696 - Posted: 25 May 2006, 3:58:07 UTC

wizzszz: there are different types of WUs; some (rare, or at least I didn't get many of them in the past) start with the relax stage without ever doing the faster ab initio stage.

Back to topic :)
I just got this WU.
I got nothing in the Searching and Accepted boxes, just like wizzszz in his screenshot. I was able to get the structure into the low-energy window by moving it around at random; it was somewhere offscreen and looked OK at first, but started to become randomly broken after a while. Now, at 1.561% (or maybe a little earlier), everything looks normal: all the pictures are where they should be, and the structure in the low-energy window is now centered when I move it around.
ID: 1696
doc :)

Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1697 - Posted: 25 May 2006, 8:12:13 UTC

Exact same behavior of the graphics on this WU too:
nothing in Accepted and Searching, randomly broken structure in the low-energy window. At 1.561% all the pictures look normal, and the accepted-energy graph looks like it is starting from the beginning, as if it were a new model, while it is actually still on model 1.
ID: 1697




©2024 University of Washington
http://www.bakerlab.org