Minirosetta Beta 3.14

Message boards : RALPH@home bug list : Minirosetta Beta 3.14

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 221
Credit: 527,409
RAC: 0
Message 5325 - Posted: 15 Jun 2011, 4:29:26 UTC

Please post issues and bugs here. We are particularly interested in excessive disk usage and memory errors. We do expect some jobs to use up to 600-700 MB of memory, and we'll submit these to higher-memory clients. We are also interested in a possible deadlock between the main application and the graphics app, where CPU usage goes to zero for both apps.

BigMike
Joined: 23 Feb 06
Posts: 63
Credit: 58,730
RAC: 0
Message 5328 - Posted: 15 Jun 2011, 16:35:46 UTC

Had one blow up on a sin/cos range error.

=Mike
Don't believe everything you think.

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 221
Credit: 527,409
RAC: 0
Message 5329 - Posted: 15 Jun 2011, 22:29:44 UTC

Thanks for the info. That's a known issue with that type of job.

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5331 - Posted: 16 Jun 2011, 1:28:51 UTC

One where BOINC thinks the workunit is still running, but it's using no CPU time at all now:

http://ralph.bakerlab.org/workunit.php?wuid=1802588

Elapsed 07:01:04
48.46% progress and no longer changing
To completion 06:27:09

I normally don't have the graphics portion showing, but when I asked for it, it came up solid black.

Anything special I need to do to send back useful information on why?

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5332 - Posted: 16 Jun 2011, 2:06:23 UTC
Last modified: 16 Jun 2011, 2:08:41 UTC

A few more details:

The workunit not using CPU time had a 530 MB maximum working set size.

Was running in 32-bit mode. Are there any plans to offer a 64-bit version of this application? Even if its main advantage is only to help computers like mine, which seem to have a limit of around 4 GB on the total memory that can be assigned to the entire set of 32-bit programs (BOINC or not) in memory at once?

More memory is installed, but seems useful mainly for 64-bit programs.

I haven't found a task name for the graphics app. What should I be looking for?

My other computer also has a 3.14 workunit, running in high priority mode but at least still showing increasing progress.

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5333 - Posted: 16 Jun 2011, 3:07:41 UTC
Last modified: 16 Jun 2011, 3:20:02 UTC

I've now found something that might be the graphics application:

Minirosetta Beta 3.14 - Windows Internet Explorer

Listed under Applications under Windows Task Manager, not under Processes, and therefore shown without any task name.

Have not found any way to show the resource usage of anything listed only as an application.

Total disk usage by all programs about 1 MB per minute, and mainly by system programs.

Total network usage about 1 MB per minute, mainly by boincmgr.exe and boinc.exe.


BOINC 6.10.58
64-bit Vista SP2, with almost all updates offered except Internet Explorer 9


My other computer has already returned its 3.14 workunit hours sooner than its previous estimated time to completion; already marked as a success. Same versions of BOINC and Windows.

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5334 - Posted: 16 Jun 2011, 4:09:00 UTC

I've now identified:

Minirosetta Beta 3.14 - Windows Internet Explorer

It was the browser window under which I entered the last few messages.


CPU time at last checkpoint of the faulty workunit: 03:33:00

CPU time for the workunit: 03:33:15

Could this indicate a problem with resuming normal operation after checkpoints? I've forgotten just which BOINC project has often been showing workunits stopping any use of CPU time about that soon after a checkpoint lately. Would a separate thread used mainly for checking for such conditions be useful?
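
On the idea of a separate thread that checks for this condition, here is a minimal sketch in Python. It is not how minirosetta or the BOINC client implements its watchdog. The class name `StallWatchdog`, the `heartbeat` method, and the timeout values are all hypothetical and chosen only for illustration. The worker reports progress, and a second thread raises a flag if no progress has been reported for longer than a timeout.

```python
import threading
import time


class StallWatchdog:
    """Hypothetical helper: flags a worker that has stopped reporting progress."""

    def __init__(self, stall_timeout, poll_interval):
        self.stall_timeout = stall_timeout    # seconds without progress before flagging
        self.poll_interval = poll_interval    # seconds between checks
        self._lock = threading.Lock()
        self._last_progress = time.monotonic()
        self._stop = threading.Event()
        self.stalled = threading.Event()      # set once a stall is detected
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def heartbeat(self):
        """Called by the worker whenever it makes progress (e.g. after a checkpoint)."""
        with self._lock:
            self._last_progress = time.monotonic()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_progress
            if idle > self.stall_timeout:
                self.stalled.set()
                return


def demo():
    """Simulates a worker that reports progress a few times, then hangs."""
    dog = StallWatchdog(stall_timeout=0.5, poll_interval=0.05)
    dog.start()
    for _ in range(3):
        time.sleep(0.05)
        dog.heartbeat()
    healthy_while_working = not dog.stalled.is_set()
    # No more heartbeats from here on: the watchdog should notice.
    dog.stalled.wait(timeout=5.0)
    flagged_after_hang = dog.stalled.is_set()
    dog.stop()
    return healthy_while_working, flagged_after_hang
```

A real watchdog would also need to decide what to do once the flag is set (log, abort the task, or restart from the last checkpoint). That policy is left out of the sketch.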


I've added up the memory currently reported as in use by 32-bit programs. About 1.7 GB total, so I don't expect any problem from that.


robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5335 - Posted: 16 Jun 2011, 5:01:36 UTC

I decided to inspect the list of files in the slot for the failed workunit; it appears that the last file modified there was about 6 hours ago.

I also inspected the file lists under minirosetta-database and found that the sections for metal ions do not appear to list aluminum, even though it is connected to the brain damage in one of the later stages of Alzheimer's, or copper, even though the human brain's natural defense against Alzheimer's uses a copper-binding protein. I assume that is not important for this workunit, but how important is it for Rosetta@Home workunits aimed at Alzheimer's?

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5336 - Posted: 16 Jun 2011, 7:27:53 UTC
Last modified: 16 Jun 2011, 7:45:46 UTC

Still more:

I clicked on the workunit, then Show graphics. Another window opened, all black inside. I clicked on the X to close that window and got a Windows error message for minirosetta_graphics_3.13_windows_x86_64.exe. The details are too long to copy, but I used the Snipping Tool to capture pictures of them.

If those details would be useful, how do I send the pictures?

Windows Task Manager does not list any program with that name among the programs now running or suspended, and did not when I started this series of messages.

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 221
Credit: 527,409
RAC: 0
Message 5338 - Posted: 16 Jun 2011, 20:33:42 UTC

robertmiles,

Sounds like it might be a deadlock issue. You can manually kill the minirosetta process. We'll look into this further. Let us know if it happens again.

robertmiles
Joined: 13 Jan 09
Posts: 82
Credit: 306,886
RAC: 178
Message 5339 - Posted: 17 Jun 2011, 0:44:36 UTC - in response to Message 5338.  

Thanks for replying.

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 581
Credit: 1,028,705
RAC: 148
Message 5343 - Posted: 5 Jul 2011, 17:59:16 UTC
Last modified: 5 Jul 2011, 18:00:15 UTC

These all error out after a few seconds on Win7:

2056476
2056475
2056474
2056471
2056465


ERROR: unrecognized aa LIG
ERROR:: Exit from: ..\..\..\src\core\io\pdb\file_data.cc line: 641
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

svincent
Joined: 4 Apr 08
Posts: 34
Credit: 51,768
RAC: 0
Message 5344 - Posted: 5 Jul 2011, 19:30:21 UTC

Failing on Mac also, with a slightly different error message:


ERROR: Cannot open PDB file "2p9hA_suc_0001.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 199
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Task 2056185

Rocco Moretti
Volunteer moderator
Project developer
Project scientist
Joined: 18 May 10
Posts: 11
Credit: 30,188
RAC: 0
Message 5345 - Posted: 5 Jul 2011, 23:20:52 UTC - in response to Message 5343.  

ERROR: unrecognized aa LIG


Sorry about that - there was a file missing from the input files. It should be corrected in newer submissions.

ERROR: Cannot open PDB file "2p9hA_suc_0001.pdb"


A different input file issue - also should be corrected with newer submissions.

--

(I will double check my input files before submitting.
I will double check my input files before submitting.
I will double check my input files before submitting. ...)

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 581
Credit: 1,028,705
RAC: 148
Message 5346 - Posted: 6 Jul 2011, 5:20:36 UTC - in response to Message 5345.  


(I will double check my input files before submitting.
I will double check my input files before submitting.
I will double check my input files before submitting. ...)



:-)

Conan
Joined: 16 Feb 06
Posts: 362
Credit: 1,368,421
RAC: 0
Message 5347 - Posted: 7 Jul 2011, 1:16:42 UTC
Last modified: 7 Jul 2011, 1:21:09 UTC

Had this error on 3 of my last few work units

ERROR: unrecognized aa LIG
ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 641
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

See 2056632
2056714
2057573

Also had the following error on another 2 work units

ERROR: Cannot open PDB file "2p9hA_suc_0001.pdb"
ERROR:: Exit from: ..\..\..\src\core\import_pose\import_pose.cc line: 199
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

See 2057602
2057618

Conan
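
The Windows and Unix-style logs in this thread point at the same source lines; only the path prefix and separators differ. A small sketch, assuming the log format quoted above, shows one way to group reports by exit site. The function names and the `SAMPLE` list are made up for illustration, and each ID is paired with the log as quoted by the poster who reported it.

```python
import re
from collections import defaultdict

# Matches lines such as: ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 641
EXIT_RE = re.compile(r"^ERROR:: Exit from: (?P<path>\S+) line: (?P<line>\d+)$")


def normalize_source_path(path):
    """Converts backslashes to slashes and drops leading '..' components."""
    parts = path.replace("\\", "/").split("/")
    return "/".join(p for p in parts if p not in ("..", ".", ""))


def group_exit_sites(reports):
    """Groups (id, log_text) pairs by normalized (source file, line number)."""
    sites = defaultdict(list)
    for report_id, text in reports:
        for raw in text.splitlines():
            match = EXIT_RE.match(raw.strip())
            if match:
                key = (normalize_source_path(match.group("path")),
                       int(match.group("line")))
                sites[key].append(report_id)
    return dict(sites)


# Logs copied from posts in this thread (one ID per report, for brevity).
SAMPLE = [
    ("2056476", "ERROR: unrecognized aa LIG\n"
                r"ERROR:: Exit from: ..\..\..\src\core\io\pdb\file_data.cc line: 641"),
    ("2056632", "ERROR: unrecognized aa LIG\n"
                "ERROR:: Exit from: src/core/io/pdb/file_data.cc line: 641"),
    ("2056185", 'ERROR: Cannot open PDB file "2p9hA_suc_0001.pdb"\n'
                "ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 199"),
    ("2057602", 'ERROR: Cannot open PDB file "2p9hA_suc_0001.pdb"\n'
                r"ERROR:: Exit from: ..\..\..\src\core\import_pose\import_pose.cc line: 199"),
]

if __name__ == "__main__":
    for (path, line), ids in sorted(group_exit_sites(SAMPLE).items()):
        print(f"{path}:{line} <- {', '.join(ids)}")
```

Grouping this way makes it easier to see that reports from different platforms are the same failure rather than separate bugs.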

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 581
Credit: 1,028,705
RAC: 148
Message 5348 - Posted: 8 Jul 2011, 6:32:01 UTC

2058828

ERROR: ERROR: FragmentIO: could not open file frags_w_cs_wt_200.11mers
ERROR:: Exit from: ..\..\..\src\core\fragment\FragmentIO.cc line: 230
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Conan
Joined: 16 Feb 06
Posts: 362
Credit: 1,368,421
RAC: 0
Message 5354 - Posted: 20 Jul 2011, 12:14:10 UTC

Had two errors with the following error code

ERROR: ct == final_atoms
ERROR:: Exit from: ..\..\..\src\core\scoring\rms_util.cc line: 524
BOINC:: Error reading and gzipping output datafile: default.out


On 2078013
and 2078108

Both failed for the resend as well.

Conan

Pieface
Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 5355 - Posted: 20 Jul 2011, 13:18:05 UTC

Same error here on 2077145; my wingman's unit died also.

ERROR: ct == final_atoms
ERROR:: Exit from: ..\..\..\src\core\scoring\rms_util.cc line: 524
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Pieface
Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 5356 - Posted: 20 Jul 2011, 13:41:23 UTC
Last modified: 20 Jul 2011, 13:59:33 UTC

Anyone having watchdog problems with the cleft.cyca.CYCA... units? I have three all gone past the 12hr target point and bouncing between 9:59 and 10:00 minutes remaining. Longest one is at about 13 hrs 25 mins. Going to let them run this morning to see if they finish on their own.

edit: morning eyes, time for a shower, changed 'deft' to 'cleft'


©2018 University of Washington
http://www.bakerlab.org