MiniRosetta Beta 3.26

Message boards : RALPH@home bug list : MiniRosetta Beta 3.26

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5511 - Posted: 5 Apr 2012, 1:19:54 UTC

New Application.

Not a very good one, as nearly all work units are failing with various errors.

Some failed work units are as follows:
2640054
2639860
2639851
2639832

Conan

See also the MiniRosetta Beta 3.24 thread, as I posted there as well.

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5512 - Posted: 5 Apr 2012, 1:23:13 UTC

The 6 Work Unit Limit is a bit of a pain.

If the project sends out faulty work, then I can't get any more for the day to test whether some work units actually work or not.
This will spread the work around, I suppose, but slow down getting the work returned.

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5513 - Posted: 5 Apr 2012, 7:37:05 UTC

Same here on Windows XP:
2641583
2641588
2641590

ERROR: [ERROR] Error opening symmetry file '/work/dimaio/projects/casp9/T0524/run_12/symmdef/3imhA_101_C4.symm'
ERROR:: Exit from: ..\..\..\src\core\conformation\symmetry\SymmData.cc line: 535
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 5514 - Posted: 5 Apr 2012, 9:44:21 UTC

CASP9_bv_benchmark_hybridization_run48_T0518_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_17843_2_0
ERROR: [ERROR] Error opening symmetry file '/work/dimaio/projects/casp9/T0518/run_12/symmdef/3h3lA_201_C2.symm'
ERROR:: Exit from: src/core/conformation/symmetry/SymmData.cc line: 535
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

CASP9_bv_benchmark_hybridization_run48_T0563_0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_17886_5_0
ERROR: [ERROR] Error opening symmetry file '/work/dimaio/projects/casp9/T0563/run_12/symmdef/1unbA_301_C3.symm'
ERROR:: Exit from: src/core/conformation/symmetry/SymmData.cc line: 535
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

CASP9_bv_benchmark_hybridization_run48_T0521_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_17845_6_0
ERROR: [ERROR] Error opening symmetry file '/work/dimaio/projects/casp9/T0521/run_12/symmdef/3l19B_102_C2.symm'
ERROR:: Exit from: src/core/conformation/symmetry/SymmData.cc line: 535
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

The C1 units appear to be fine on my Mac.
CASP9_bv_benchmark_hybridization_run48_T0561_2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_17884_5_0

Currently crunching another C1, so we'll see if it holds up. It's about 45 minutes in, with a preferred CPU run time of 4 hours.
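
For anyone tallying these reports, here is a minimal Python sketch that pulls the target and the symmetry tag out of work unit names like the ones above and groups the failures by symmetry. The regular expression and the helper name `parse_wu_name` are my own assumptions, based only on the four names in this post, not on any official naming scheme.

```python
import re
from collections import defaultdict

# Assumed pattern: ..._T<4 digits>_<index>_C<n>_... as seen in the names in this post.
WU_PATTERN = re.compile(r"_(T\d{4})_\d+_(C\d+)_")

def parse_wu_name(name):
    """Return (target, symmetry) from a work unit name, or None if it does not match."""
    match = WU_PATTERN.search(name)
    return (match.group(1), match.group(2)) if match else None

# (work unit name, failed?) pairs taken from this post
reports = [
    ("CASP9_bv_benchmark_hybridization_run48_T0518_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_17843_2_0", True),
    ("CASP9_bv_benchmark_hybridization_run48_T0563_0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_17886_5_0", True),
    ("CASP9_bv_benchmark_hybridization_run48_T0521_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_17845_6_0", True),
    ("CASP9_bv_benchmark_hybridization_run48_T0561_2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_17884_5_0", False),
]

# Group the targets of failed work units by their symmetry tag
failures_by_symmetry = defaultdict(list)
for name, failed in reports:
    parsed = parse_wu_name(name)
    if parsed and failed:
        target, symmetry = parsed
        failures_by_symmetry[symmetry].append(target)

for symmetry in sorted(failures_by_symmetry):
    print(symmetry, failures_by_symmetry[symmetry])
```

With only these four reports, the failures all land under C2 and C3 and nothing under C1, which matches what I'm seeing so far.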

Best,
Snags

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5515 - Posted: 5 Apr 2012, 13:53:13 UTC

2642639

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB

Engaging BOINC Windows Runtime Debugger...

- Registers -
eax=0222bb88 ebx=004536c0 ecx=00000000 edx=015f6a28 esi=0222bc10 edi=015f65e0
eip=7c812afb esp=0222bb84 ebp=0222bbd8
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206

- Callstack -
ChildEBP RetAddr Args to Child
0222bbd8 0041262e e06d7363 00000001 00000003 0222bc04 kernel32!_RaiseException@16+0x0
0222bc10 004125e5 0222bc20 01431280 012bd338 012c1ab0 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222bc30 008c7a3e 04536c00 0222d21c 0222d21c 00000000 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222bc48 008c80cf 004536c0 0222bd74 00a2bb72 e3d4513a minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::get+0x0
0222bce8 00a2c15d 0222bd74 0222d21c e3d453e6 0000008d minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::get+0x0
0222be34 008dc042 0222c05c 0222d21c 0222c104 0222d2cc minirosetta_beta_3.26_windows_i!cppdb::mutex::~mutex+0x0
0222d4ec 008e5985 069a0490 00000000 069a0490 00000001 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::get+0x0
0222d554 00e97fc4 00000001 1a3dd8c0 069a0490 00000000 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::get+0x0
0222d5ac 004fd499 1a3dd8c0 00000001 069a0490 1a3d5510 minirosetta_beta_3.26_windows_i!cppdb::backend::static_driver::in_use+0x0
0222de3c 005025b2 069a0490 e3d4337a 060a54f8 1a270d48 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222dea8 00a34e37 069a0490 1a270d48 01199e1b 00000000 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222dec0 00a3543b 0222eb48 e3d4333a 0222eb48 000000dd minirosetta_beta_3.26_windows_i!cppdb::mutex::~mutex+0x0
0222dee8 00c71ebf 0222eb48 1a3d5510 00000000 06a15688 minirosetta_beta_3.26_windows_i!cppdb::mutex::~mutex+0x0
0222e51c 00c5ffea 0222eb48 e3d4052a 063a62d8 06a9aaf8 minirosetta_beta_3.26_windows_i!cppdb::mutex::~mutex+0x0
0222e8f8 0060b673 0222eb48 e3d404be 063a62d8 06a9aaf8 minirosetta_beta_3.26_windows_i!cppdb::mutex::~mutex+0x0
0222e96c 0060b84c 0222eb48 06a9aaf8 063a6354 00000008 minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
0222e984 0060c36c 0222eb48 06a9aaf8 e3d40786 00000000 minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
0222ea54 005f4e1d 0222eb48 e3d40136 02ac4610 02ac4610 minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
0222ece4 00612a72 00000000 e3d400da 02ac4620 0222ecec minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222ed08 006097df 00000000 e3d40092 00000000 0000000d minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
0222ed40 00405450 00000000 e3d402ca 00000000 00000000 minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
0222ef18 004056fd 0000001b 0222ef30 00052310 0222ef30 minirosetta_beta_3.26_windows_i!+0x0
0222ff30 0041814e 00400000 00000000 00052357 0000000a minirosetta_beta_3.26_windows_i!+0x0
0222ffc0 7c817077 00000000 00000000 7ffd5000 e06d7363 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
0222fff0 00000000 004181a1 00000000 00000000 00000000 kernel32!_BaseProcessStart@4+0x0

*** Dump of thread ID 3832 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 701008.000000, User Time: 300432.000000, Wait Time: 2061862.000000

- Registers -
eax=00000000 ebx=00000000 ecx=0410f898 edx=00000304 esi=00000000 edi=0410ff64
eip=7c91e514 esp=0410ff34 ebp=0410ff8c
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
0410ff30 7c91d21a 7c8023f1 00000000 0410ff64 7c801e1a ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
0410ff34 7c8023f1 00000000 0410ff64 7c801e1a 00000002 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0]
0410ff8c 7c802455 00000064 00000000 0410ffb4 004080a8 kernel32!_SleepEx@8+0x0
0410ff9c 004080a8 00000064 0000000c 19b4dcec 404e91a7 kernel32!_Sleep@4+0x0
0410ffb4 7c80b729 00000000 0000000c 00000002 00000000 minirosetta_beta_3.26_windows_i!+0x0
0410ffec 00000000 00408090 00000000 00000000 eb832e98 kernel32!_BaseThreadStart@8+0x0

*** Dump of thread ID 1272 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 500720.000000, User Time: 0.000000, Wait Time: 2061778.000000

- Registers -
eax=00000000 ebx=05c59600 ecx=7c802413 edx=ffffffff esi=00000000 edi=075dfe48
eip=7c91e514 esp=075dfe18 ebp=075dfe70
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206

- Callstack -
ChildEBP RetAddr Args to Child
075dfe14 7c91d21a 7c8023f1 00000000 075dfe48 00000031 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
075dfe18 7c8023f1 00000000 075dfe48 00000031 00000000 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0]
075dfe70 7c802455 000007d0 00000000 075dff68 00619853 kernel32!_SleepEx@8+0x0
075dfe80 00619853 000007d0 e6ab12ba 00000050 05c596f0 kernel32!_Sleep@4+0x0
075dff68 00619a47 00000000 004150c3 00000000 e6ab127a minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
075dffa8 0041514d 060afd50 075dffec 7c80b729 05c596f0 minirosetta_beta_3.26_windows_i!cppdb::backend::driver::connect+0x0
075dffb4 7c80b729 05c596f0 00000050 060afd50 05c596f0 minirosetta_beta_3.26_windows_i!cppdb::atomic_counter::atomic_counter+0x0
075dffec 00000000 004150e9 05c596f0 00000000 08560000 kernel32!_BaseThreadStart@8+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

TPCBF
Joined: 20 Jun 11
Posts: 30
Credit: 27,776
RAC: 0
Message 5516 - Posted: 5 Apr 2012, 17:44:13 UTC - in response to Message 5512.  
Last modified: 5 Apr 2012, 17:46:15 UTC

The 6 Work Unit Limit is a bit of a pain.

If the project sends out faulty work, then I can't get any more for the day to test whether some work units actually work or not.
This will spread the work around, I suppose, but slow down getting the work returned.

Conan
What 6 WU Limit?
In the last couple of days I had up to 20 of those quickly failing 3.24 ones, if I counted right, and right now I have 9 of the 3.26 Beta WUs in the queue (one currently running)...
OK, make that 8 in the queue and one just finished successfully...

Ralf

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5517 - Posted: 6 Apr 2012, 0:25:57 UTC - in response to Message 5516.  

The 6 Work Unit Limit is a bit of a pain.

If the project sends out faulty work, then I can't get any more for the day to test whether some work units actually work or not.
This will spread the work around, I suppose, but slow down getting the work returned.

Conan
What 6 WU Limit?
In the last couple of days I had up to 20 of those quickly failing 3.24 ones, if I counted right, and right now I have 9 of the 3.26 Beta WUs in the queue (one currently running)...
OK, make that 8 in the queue and one just finished successfully...

Ralf


When you get a number of errors (as in quite a lot of them), the project limits how many work units you get so that a lot of work is not 'trashed'.
However, if the project itself sends out faulty work units, you get the same result, and then you are limited in how many work units you can get PER MACHINE.

A number of others and I hit this limit on some of our computers; we could then only get 6 WUs for the whole day.
Once a few successful WUs go through, the limit is lifted and we can again get as many as we can handle.
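
To make the behaviour I'm describing concrete, here is a toy model in Python. It is only a sketch of what we see from the user side: the class name, the ceiling of 100, and the halve-on-error / double-on-success rule are illustrative assumptions, not the actual BOINC scheduler code or this project's settings. Only the floor of 6 per day comes from what we observed.

```python
class HostQuota:
    """Toy per-host daily work unit quota (hypothetical, not BOINC's real code)."""

    def __init__(self, max_per_day=100, min_per_day=6):
        self.max_per_day = max_per_day  # assumed ceiling for a healthy host
        self.min_per_day = min_per_day  # the 6-per-day floor reported in this thread
        self.quota = max_per_day

    def report(self, success):
        """Update the quota after one result comes back."""
        if success:
            # Assumed recovery rule: double until the ceiling is reached.
            self.quota = min(self.max_per_day, self.quota * 2)
        else:
            # Assumed penalty rule: halve, but never drop below the floor.
            self.quota = max(self.min_per_day, self.quota // 2)
        return self.quota


host = HostQuota()
for _ in range(5):           # five faulty work units in a row
    host.report(success=False)
print(host.quota)            # quota after the errors: 6

for _ in range(5):           # then five good ones
    host.report(success=True)
print(host.quota)            # quota after recovery: 100
```

The point of the sketch is that the host is penalised the same way whether the errors come from the machine or from a bad batch, which is exactly the problem when the batch itself is faulty.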

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5519 - Posted: 7 Apr 2012, 13:55:47 UTC
Last modified: 7 Apr 2012, 13:56:06 UTC

2647170

ERROR: [ERROR] Error opening symmetry file '/work/dimaio/projects/casp9/T0555/run_12/symmdef/1yc9A_201_C3.symm'
ERROR:: Exit from: ..\..\..\src\core\conformation\symmetry\SymmData.cc line: 535
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

Saenger
Joined: 28 Feb 06
Posts: 13
Credit: 67,395
RAC: 0
Message 5520 - Posted: 9 Apr 2012, 21:14:58 UTC

18,800 seconds for 1.95 credits, that's incredible:
2646029 	21363 	5 Apr 2012 23:19:34 UTC 	9 Apr 2012 21:11:20 UTC 	Over 	Success 	Done 	18,800.09 	187.68 	1.95


Any idea what went so terribly wrong?
https://ralph.bakerlab.org/result.php?resultid=2646029
Greetings from Sänger

Rocco Moretti
Volunteer moderator
Project developer
Project scientist
Joined: 18 May 10
Posts: 11
Credit: 30,188
RAC: 0
Message 5521 - Posted: 10 Apr 2012, 1:17:40 UTC - in response to Message 5520.  

18,800 seconds for 1.95 credits, that's incredible:

Any idea what went so terribly wrong?


From the stderr output, it looks like your BOINC client actually ran the executable twice: once for 99 decoys, and a second time for just a single decoy, the output file of which likely overwrote the output file from the first run. This means that although you crunched for 100 decoys' worth of time, you only sent back (and got credit for) one decoy.

Why BOINC re-ran the minirosetta application, I don't know. It might have something to do with the "No heartbeat from core client for 30 sec - exiting" line. If I had to guess, your BOINC Manager was not running or was unresponsive when the minirosetta application finished, so the client didn't recognize that the application was done, restarted it, and overwrote the results.
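
To illustrate the failure mode I'm guessing at, here is a small Python sketch. It is not the minirosetta or BOINC code: the file name `default.out` is taken from the logs in this thread, while `write_decoys`, `count_decoys`, and the one-line-per-decoy format are made up for the illustration. It shows how a second run that opens the output file in write mode discards what the first run saved, whereas append mode would keep both.

```python
import os
import tempfile

def write_decoys(path, count, mode):
    """Write `count` placeholder decoy records to `path` using the given file mode."""
    with open(path, mode) as handle:
        for i in range(count):
            handle.write(f"decoy {i}\n")

def count_decoys(path):
    """Count the decoy records currently stored in `path`."""
    with open(path) as handle:
        return sum(1 for _ in handle)

with tempfile.TemporaryDirectory() as workdir:
    out = os.path.join(workdir, "default.out")

    # First run: 99 decoys. Second run reopens with "w", which truncates the file.
    write_decoys(out, 99, "w")
    write_decoys(out, 1, "w")
    overwritten = count_decoys(out)

    # Same two runs, but the second one appends instead of truncating.
    write_decoys(out, 99, "w")
    write_decoys(out, 1, "a")
    appended = count_decoys(out)

print(overwritten, appended)  # 1 100
```

If something like the first case happened here, it would explain 100 decoys' worth of CPU time with credit for only one.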

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5528 - Posted: 16 Apr 2012, 9:52:02 UTC

2650831

Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev48292.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/input_CASP9_bz_benchmark_hybridization_run52_T0596_0_C1_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

</stderr_txt>
]]>

Validate state Invalid

rilian
Joined: 7 Sep 07
Posts: 8
Credit: 74,079
RAC: 0
Message 5529 - Posted: 16 May 2012, 14:12:48 UTC - in response to Message 5528.  
Last modified: 16 May 2012, 14:14:19 UTC

ERROR: Cannot open PDB file "/work/brunette/experiments/alignment_challenge/raptor_difficult_cases/T0540/native//T0540.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 184
BOINC:: Error reading and gzipping output datafile: default.out

some brunette poses are invalid :D

https://ralph.bakerlab.org/result.php?resultid=2658197

On a side note, I think the error is in the file path format, with the double slash here: //T0540.pdb
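
A quick way to check that suspicion, as a Python sketch: on POSIX systems, repeated slashes inside a path are treated as a single separator, so the doubled slash by itself should be harmless there. If that holds, the more likely problem is that the absolute /work/brunette/... path simply does not exist on a volunteer's machine, but that part is my assumption, not something the log proves.

```python
import posixpath

# Path copied from the error message above
reported = "/work/brunette/experiments/alignment_challenge/raptor_difficult_cases/T0540/native//T0540.pdb"

# normpath collapses the repeated separator, mirroring how POSIX resolves the path
normalized = posixpath.normpath(reported)
print(normalized)

# Both spellings name the same file as far as path resolution is concerned
print(normalized == reported.replace("//", "/"))
```
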
I am a member of the Rosetta Ukraine team

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5531 - Posted: 25 May 2012, 20:05:35 UTC

I'm running version 3.26. Why not 3.31?

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5532 - Posted: 26 May 2012, 6:26:59 UTC

2659979

======================================================
DONE :: 1 starting structures 4896.75 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 4.53599e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid

Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 5534 - Posted: 14 Jun 2012, 1:02:26 UTC

On work unit 2666791 I ran into this error:

<message>
couldn't start Can't write init file: -108: -108
</message>

Conan

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5535 - Posted: 20 Jun 2012, 20:20:36 UTC

A LOT of errors on my 32-bit Windows 7 machine:

2670362
2670356
2670348
2670333

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2012- 6-20 9:15:33:] :: BOINC:: Initializing ... ok.
[2012- 6-20 9:15:33:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

</stderr_txt>
]]>

Validate state Invalid

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5541 - Posted: 23 Jun 2012, 8:17:10 UTC

Again, errors
2676163


CPU time 7071.837
stderr out

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2012- 6-23 6:27:19:] :: BOINC:: Initializing ... ok.
[2012- 6-23 6:27:19:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev48292.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 7200

</stderr_txt>
]]>

Validate state Invalid

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 840
Credit: 1,888,960
RAC: 0
Message 5545 - Posted: 28 Jun 2012, 18:42:57 UTC - in response to Message 5541.  



<core_client_version>7.0.25</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>


Validate state Invalid


Usual error
2693765

TPCBF
Joined: 20 Jun 11
Posts: 30
Credit: 27,776
RAC: 0
Message 5546 - Posted: 29 Jun 2012, 4:43:57 UTC - in response to Message 5545.  

Since the latest batch started the other day, I get roughly one compute error for each dozen or so WUs that go through just fine...

Ralf

Snagletooth
Joined: 4 May 07
Posts: 67
Credit: 134,427
RAC: 0
Message 5547 - Posted: 29 Jun 2012, 13:52:33 UTC - in response to Message 5531.  

I'm running version 3.26. Why not 3.31?


I have this question as well. The 3.30 version included the fix for the Mac slowdown problem, which affected every type of work unit. The fix will presumably be included in every new version going forward, so why would it not be used on RALPH?

©2024 University of Washington
http://www.bakerlab.org