Rosetta mini 3.18

Message boards : RALPH@home bug list : Rosetta mini 3.18

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5403 - Posted: 1 Nov 2011, 11:43:52 UTC

New version of Rosetta mini. New features, or simply a debug release?

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5404 - Posted: 1 Nov 2011, 11:50:22 UTC
Last modified: 1 Nov 2011, 12:19:13 UTC

These work units take a very long time to run (around 10 hours on a 6-hour preference) and return very poor credit for the effort (31 points).
See WU 2305565

I have another two of these running at the moment and both are already over 9 hours and still going.

I also have a 3.17 WU that is over 9 hours as well. Most only run 1 to 2 hours.

Conan

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5405 - Posted: 1 Nov 2011, 13:07:26 UTC
Last modified: 1 Nov 2011, 13:08:42 UTC

Well, I have had another 2 work units run for 10 hours 11 minutes before reporting as successful, but the points are even lower than before:
WU 2306108 gave 8.77 points
WU 2306211 gave 17.87 points

I have aborted WU 2306685 at 27,849 seconds (over 7 hours) and WU 2306474 at 25,481 seconds (7 hours) as they were going to take over 10 hours and give very little in return.

My preference is 6 hours, so I take a close look at anything that runs longer than that.

All long-running work units have been on my two Linux machines; Windows is not affected.
Also, both 3.17 and 3.18 work units have been affected.

Conan

robertmiles
Joined: 13 Jan 09
Posts: 100
Credit: 331,865
RAC: 0
Message 5406 - Posted: 1 Nov 2011, 16:19:43 UTC - in response to Message 5404.  

These work units take a very long time to run (around 10 hours on a 6-hour preference) and return very poor credit for the effort (31 points).
See WU 2305565

I have another two of these running at the moment and both are already over 9 hours and still going.

I also have a 3.17 WU that is over 9 hours as well. Most only run 1 to 2 hours.

Conan


These could be looking for a bug I saw earlier in 3.14. If so, here is something useful to watch for: the workunit stops using any CPU time at all, WITHOUT telling the BOINC manager that there is a problem so that another workunit can run instead. If that's the problem, the workunit can easily sit there, not really running, for days, since the time limit detection can't run either.
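
The stall described above (wall-clock time keeps advancing while the task accumulates no CPU time) can be sketched as a check over periodic samples. This is only an illustration, not BOINC's or Rosetta's actual watchdog: the function name `is_stalled`, the sample format, and the 600-second window are all assumptions.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (wall_clock_seconds, cpu_seconds_used_so_far).
Sample = Tuple[float, float]


def is_stalled(samples: Sequence[Sample],
               max_idle_wall: float = 600.0,
               min_cpu_progress: float = 1.0) -> bool:
    """Return True if CPU time advanced by less than min_cpu_progress
    over at least max_idle_wall seconds of wall-clock time."""
    if len(samples) < 2:
        return False  # not enough data to judge
    latest_wall, latest_cpu = samples[-1]
    # Walk backwards to the newest sample that is old enough to compare with.
    for wall, cpu in reversed(samples[:-1]):
        if latest_wall - wall >= max_idle_wall:
            return (latest_cpu - cpu) < min_cpu_progress
    return False  # observation window is still too short
```

A check like this would have to run outside the stalled task (for example in a separate monitoring process), since the point made above is that the task's own time limit detection stops running too.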

Rocco Moretti
Volunteer moderator
Project developer
Project scientist
Joined: 18 May 10
Posts: 11
Credit: 30,188
RAC: 0
Message 5408 - Posted: 1 Nov 2011, 17:20:40 UTC - in response to Message 5403.  

New version of Rosetta mini. New features, or simply a debug release?


As mentioned by cmiles over in the Rosetta@home forums, minirosetta 3.18 is identical to minirosetta_beta 3.17, and is identical to the version (3.17) currently being run on Rosetta@home. (The difference in numbering is because of technical reasons.)

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5409 - Posted: 2 Nov 2011, 0:57:31 UTC

A few errors are starting to show up today.

First two are on Linux
WU 2316336 shows this

ERROR: seqpos <= size()
ERROR:: Exit from: src/core/conformation/Conformation.hh line: 267
BOINC:: Error reading and gzipping output datafile: default.out

WU 2049320 shows
ERROR: Illegal value specified for option -run: protocol : medal_abinitio

Next two are on Windows
WU 2316455 and WU 2316209

Both show this

ERROR: seqpos <= size()
ERROR:: Exit from: d:\boinc_build\minirosetta_beta_3.17\rosetta_source\src\core/conformation/Conformation.hh line: 267
BOINC:: Error reading and gzipping output datafile: default.out
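
Since the same failure shows up on both Linux and Windows with different build paths, it can help to reduce each stderr log to a short signature before comparing work units. The sketch below does that with regular expressions built only from the log lines quoted in this thread; the function name `error_signature` and the choice of signature fields are assumptions, not part of any BOINC or Rosetta tooling.

```python
import re
from typing import Optional, Tuple

# Patterns derived from the log lines quoted in this thread.
EXIT_RE = re.compile(
    r"^ERROR::\s*Exit from:\s*(?P<path>.+?)\s+line:\s*(?P<line>\d+)$")
ERROR_RE = re.compile(r"^ERROR:\s*(?P<msg>.+)$")

Signature = Tuple[str, Optional[str], Optional[int]]


def error_signature(stderr_text: str) -> Optional[Signature]:
    """Return (message, source file name, line number) for the first error.

    The source file is reduced to its base name so that Linux and Windows
    build paths compare equal. Returns None if no ERROR line is present.
    """
    message = None
    for raw in stderr_text.splitlines():
        line = raw.strip()
        exit_match = EXIT_RE.match(line)  # check first: it also starts "ERROR:"
        if exit_match:
            source = re.split(r"[\\/]", exit_match.group("path"))[-1]
            return (message or "", source, int(exit_match.group("line")))
        error_match = ERROR_RE.match(line)
        if error_match and message is None:
            message = error_match.group("msg")
    return (message, None, None) if message is not None else None
```

With this, the Linux and Windows logs above both reduce to the same tuple (message `seqpos <= size()`, file `Conformation.hh`, line 267), which suggests a single underlying failure and not two platform-specific ones.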

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5410 - Posted: 2 Nov 2011, 17:02:08 UTC - in response to Message 5408.  

Thanks for the answer!!

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5412 - Posted: 2 Nov 2011, 23:57:01 UTC

A few more Windows errors
WU 2305445
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

WU 2304479
"Maximum Elapsed Time Exceeded"

WU 2325467 and WU 2324219

ERROR: seqpos <= size()
ERROR:: Exit from: d:\boinc_build\minirosetta_beta_3.17\rosetta_source\src\core/conformation/Conformation.hh line: 267
BOINC:: Error reading and gzipping output datafile: default.out

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5413 - Posted: 4 Nov 2011, 9:19:52 UTC

Some validate errors
2338233
2338190

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid

Trotador
Joined: 7 May 10
Posts: 33
Credit: 14,751,627
RAC: 0
Message 5414 - Posted: 4 Nov 2011, 17:22:46 UTC
Last modified: 4 Nov 2011, 17:30:35 UTC

Hi

Many validation errors today, around 90 out of 320 units. Most of them finish in less than 100-200 seconds, with a few of them reaching 600 or 1000 seconds. So far all the wingmen have also failed on these workunits, both on Linux and on W7.

Regarding the extra-long units with very low scores, I think they are all TO538... units. It first happened with the beta 3.17 (TBC), and the subsequent releases behave the same way. Lately I tend to abort them.

regards

Edit: I've noticed that many units with validation errors are over 200 seconds

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5415 - Posted: 6 Nov 2011, 18:12:22 UTC

2361173

ERROR: seqpos <= size()
ERROR:: Exit from: d:\boinc_build\minirosetta_beta_3.17\rosetta_source\src\core/conformation/Conformation.hh line: 267
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

[SG-FC] dingdong
Joined: 17 Mar 09
Posts: 18
Credit: 5,046,648
RAC: 0
Message 5416 - Posted: 6 Nov 2011, 18:54:36 UTC

Work Unit ID 2074393: 50 minutes without activity, CPU load = 0%

Trotador
Joined: 7 May 10
Posts: 33
Credit: 14,751,627
RAC: 0
Message 5417 - Posted: 8 Nov 2011, 23:13:25 UTC

2KZU_... units are erroring just at the start

ERROR: in::file::boinc_wu_zip 4-boinc-submit/2KZU_chromodomain.zip does not exist!
ERROR:: Exit from: src/apps/public/boinc/minirosetta.cc line: 168
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


A problem with the names of the files.

svincent
Joined: 4 Apr 08
Posts: 34
Credit: 51,768
RAC: 0
Message 5418 - Posted: 11 Nov 2011, 2:50:03 UTC

Task 2379638 gave a Validate Error, but without anything noteworthy appearing in the log file.

Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
Validate state Invalid

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5422 - Posted: 17 Nov 2011, 22:27:56 UTC

Validate errors
2429175
2429148

# cpu_run_time_pref: 7200
======================================================
DONE :: 32 starting structures 7150.81 cpu seconds
This process generated 32 decoys from 32 attempts
======================================================
BOINC :: WS_max 8.35789e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5423 - Posted: 19 Nov 2011, 14:36:27 UTC

Heaps of bugs/errors

The following error appeared on one Windows machine, but it now seems to be processing OK
WU 2435485
WU 2436196
WU 2436362
WU 2436427
WU 2436489
WU 2436965

app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>minirosetta_database_rev45517.zip</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>
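
For anyone collecting these download failures across many work units, the fragment above is well-formed XML and can be read with Python's standard library. A minimal sketch, assuming the fragment has exactly the three child elements shown here; the function name `parse_file_xfer_error` is made up for illustration and is not part of BOINC.

```python
import xml.etree.ElementTree as ET


def parse_file_xfer_error(fragment: str) -> dict:
    """Extract file name, error code and message from a
    <file_xfer_error> fragment as pasted in a task's stderr."""
    root = ET.fromstring(fragment)
    if root.tag != "file_xfer_error":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "file_name": root.findtext("file_name", default=""),
        "error_code": int(root.findtext("error_code", default="0")),
        "error_message": root.findtext("error_message", default=""),
    }
```

Grouping the results by `file_name` and `error_code` would show whether every failed task points at the same download (here `minirosetta_database_rev45517.zip`) or at several different files.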

WU 2436832 had the following error
ERROR:Option matching -in:file:boinc_wu_fix:zip not found in command line top-level context

ALL of the following Linux Work Units have VALIDATE ERRORS

WU 2441745
WU 2441633
WU 2441023
WU 2439837
WU 2439085
WU 2438445
WU 2438250
WU 2438113
WU 2437946
WU 2437831
WU 2437828
WU 2437700
WU 2437662
WU 2433162
WU 2433080
WU 2441742
WU 2441538
WU 2441458
WU 2441308
WU 2441223
WU 2440847
WU 2440310
WU 2438270
WU 2437847
WU 2437609
WU 2437451
WU 2437440
WU 2437361
WU 2437278
WU 2436307
WU 2429070
WU 2429068

There seems to be a problem

Conan

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5424 - Posted: 20 Nov 2011, 0:09:38 UTC

ALL LINUX WORK UNITS GET VALIDATE ERRORS

NONE are successful

Most Windows WUs validate, but some are now starting to get validate errors as well.

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5427 - Posted: 20 Nov 2011, 8:09:31 UTC

Aside from random validate errors on Win7, I see this:
I set my WUs to 1h, but some WUs run for more than 3h.
The 1h WUs give me 13 to 16 points; the 3h WUs give me 4 to 7 points.
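
The gap is easier to see as credit per hour of run time. A small sketch using only the figures quoted in this post; the function name `credit_per_hour` is made up for illustration, and the 3h tasks are treated as exactly 3 hours even though some ran longer, so their real rate is at most what is printed.

```python
def credit_per_hour(points: float, hours: float) -> float:
    """Credit granted divided by run time in hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return points / hours


# Figures quoted above: 1h tasks earn 13-16 points, 3h tasks earn 4-7 points.
short_rate = (credit_per_hour(13, 1), credit_per_hour(16, 1))
long_rate = (credit_per_hour(4, 3), credit_per_hour(7, 3))

print(f"1h tasks: {short_rate[0]:.1f} to {short_rate[1]:.1f} points/hour")
print(f"3h tasks: {long_rate[0]:.1f} to {long_rate[1]:.1f} points/hour")
```

On these numbers the long tasks earn roughly 1.3 to 2.3 points per hour against 13 to 16 for the short ones, so the long-running tasks pay several times less per hour as well as taking longer.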

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5432 - Posted: 4 Dec 2011, 19:33:55 UTC

This batch seems very good! Not a single error...

©2024 University of Washington
http://www.bakerlab.org