minirosetta v1.55 bug thread

Message boards : RALPH@home bug list : minirosetta v1.55 bug thread

mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4601 - Posted: 29 Jan 2009, 5:55:09 UTC
Last modified: 29 Jan 2009, 5:57:46 UTC

All right, while things over on BOINC are looking marvelous, here's another update on RALPH.

1.55 has three new things:

a) Fixes to deal with validator rejections when the watchdog kicks in and when it says "too many restarts with no progress".

b) A very detailed debug information header, which will hopefully help trace the problem in the options system.

c) ANOTHER bug fix in the BOINC API, this time in the user preferences. This bug led directly to the phenomenon that Brotherbard managed to point out by running his app in GDB. Awesome! Read about it here.
Ramostol, this is relevant for you too; I think that's the same bug.
You two, could you set your settings back to restrict to specific days and see if it works now? It did here :)


The lock file issue remains the last issue that we don't even have the faintest handle on. Apparently it has to do with setting the client to not allocate 100% of CPU.

Anyway, please post reports here.
ID: 4601
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4603 - Posted: 29 Jan 2009, 6:48:06 UTC

Hmm, I think the graphics don't work with this one. Not to worry.
ID: 4603
[Toscana]SickBoy88

Joined: 27 Jan 09
Posts: 3
Credit: 17,581
RAC: 0
Message 4605 - Posted: 29 Jan 2009, 17:26:09 UTC

In fact, in this WU
https://ralph.bakerlab.org/result.php?resultid=1278720
the graphics don't work.
ID: 4605
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4606 - Posted: 29 Jan 2009, 19:29:09 UTC - in response to Message 4605.  

In fact, in this WU
https://ralph.bakerlab.org/result.php?resultid=1278720
the graphics don't work.


Yes, I noticed as soon as I fired up my client. It's OK, I'll track this down in the next version. I think it's just to do with the fact that I only updated the app and not the graphics_app.
ID: 4606
HA-SOFT, s.r.o.

Joined: 19 Jan 09
Posts: 6
Credit: 19,644
RAC: 0
Message 4607 - Posted: 29 Jan 2009, 19:49:35 UTC

Wow! Can confirm that my W2008 X64 server works ok for version 1.55.

Good to know it before preparing my new blade with 20 cores.

ID: 4607
Brotherbard

Joined: 16 Feb 06
Posts: 15
Credit: 76,109
RAC: 0
Message 4608 - Posted: 29 Jan 2009, 19:56:09 UTC - in response to Message 4603.  

I have 4 1.55 workunits that all start great even with weekday time limits set. Still waiting for them to finish.

Looking at the graphics app it looks like it has the same error as the main app. Here is the gdb output:

Initializing options.... ok 
Loaded options.... ok 
Processed options.... ok 
core.init: command: /Library/Application Support/BOINC Data/projects/ralph.bakerlab.org/minirosetta_graphics_1.54_i686-apple-darwin
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-1656248255 seed_offset=0 real_seed=-1656248255
Initializing random generators... ok 
core.init.random: RandomGenerator:init: Normal mode, seed=-1656248255 RG_type=mt19937
Initialization complete. 
Opened semaphore

Breakpoint 1, 0x9603b4a9 in malloc_error_break ()
(gdb) bt
#0  0x9603b4a9 in malloc_error_break ()
#1  0x96036497 in szone_error ()
#2  0x95f60463 in szone_free ()
#3  0x95f602cd in free ()
#4  0x000a7cb6 in WEEK_PREFS::~WEEK_PREFS ()
#5  0x007d24a9 in GLOBAL_PREFS::~GLOBAL_PREFS ()
#6  0x001ad5a8 in get_shmem_name ()
#7  0x001ad634 in boinc_graphics_get_shmem ()
#8  0x00085dd8 in protocols::boinc::Boinc::attach_shared_memory ()
#9  0x000074b9 in app_graphics_init ()
#10 0x0000ca75 in boinc_graphics_loop ()
#11 0x000087f6 in main ()

And the stderrgfx.txt has a "Non-aligned pointer being freed (2)" error for each of the weekday settings, just like the science app did.

The graphics start up fine if the weekday prefs are not set when BOINC first starts up. But if the prefs were set when I started BOINC then even if I reset the preferences the graphics apps will still not start up.

--Nathan

ID: 4608
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4609 - Posted: 29 Jan 2009, 22:45:21 UTC - in response to Message 4608.  

Yeah, the graphics do work after all. It's just the settings thing, as you rightly point out, Nathan. Thanks so much, btw, for identifying this bug. It's a major bugfix in the BOINC API and I would never have found (or even suspected) it if I hadn't seen your trace!

OK, so the graphics app needs to be recompiled too. That's no problem. :)

ID: 4609
Brotherbard

Joined: 16 Feb 06
Posts: 15
Credit: 76,109
RAC: 0
Message 4610 - Posted: 30 Jan 2009, 0:11:29 UTC - in response to Message 4608.  

The 4 workunits have run to completion successfully. https://ralph.bakerlab.org/results.php?userid=82

--Nathan
ID: 4610
cenit

Joined: 26 Apr 08
Posts: 5
Credit: 25,392
RAC: 0
Message 4611 - Posted: 30 Jan 2009, 15:27:49 UTC - in response to Message 4610.  
Last modified: 30 Jan 2009, 15:28:11 UTC

I've got some new 1.56 WUs with really nice graphics!
ID: 4611
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4612 - Posted: 30 Jan 2009, 17:17:03 UTC - in response to Message 4611.  

A little treat for you guys ;)
ID: 4612
svincent

Joined: 4 Apr 08
Posts: 34
Credit: 51,768
RAC: 0
Message 4613 - Posted: 30 Jan 2009, 18:25:43 UTC
Last modified: 30 Jan 2009, 18:26:55 UTC

Task: 1283328
Workunit: 1117100
Name: testD_cc_1_8_nocst4_hb_t360__IGNORE_THE_REST_2DO9A_7_7074_1_1
OS: Mac OS X 10.4.11

failed at initialization

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-30 7: 3:49:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
ERROR: Unique best command line context option match not found for -user_tag

</stderr_txt>
]

(edited for readability)
ID: 4613
I _ quit

Joined: 13 Jan 09
Posts: 44
Credit: 88,562
RAC: 0
Message 4614 - Posted: 30 Jan 2009, 20:21:22 UTC

since we are on 1.56 now, should there be a new thread for that?

btw..very interesting color choice for the accepted energy line and the other line above the rmsd box. what does the blue and yellow mean and isn't there a purplish color in there as well?

just completed 2 of the 1.56 tasks with no problems on 4 hr run time.
ID: 4614
AdeB

Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4619 - Posted: 31 Jan 2009, 0:24:13 UTC
Last modified: 31 Jan 2009, 0:24:50 UTC

This task was valid and it got credit, but there are some error messages in stderr out:

...
Starting work on structure: S_shuffle_00012 <--- F_00007_0000861_0
Fullatom mode .. 
Hbond tripped.

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
called boinc_finish
...


AdeB
ID: 4619
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4620 - Posted: 31 Jan 2009, 0:35:02 UTC - in response to Message 4614.  

since we are on 1.56 now, should there be a new thread for that?

btw..very interesting color choice for the accepted energy line and the other line above the rmsd box. what does the blue and yellow mean and isn't there a purplish color in there as well?


Nope, it's just prettiness. No extra meaning, I'm afraid.
ID: 4620
Paul D. Buck

Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4621 - Posted: 31 Jan 2009, 4:17:02 UTC

Well, I am so disappointed ... no failure from any system ... sigh ... some other folks have all the fun ...

I suppose that is good news ... but seriously, running on 3-4 systems and not a failure in sight...

Almost as sad is that I have been running Rosetta on all systems but the linux box (too slow) and no failures there either ...
ID: 4621
Paul D. Buck

Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4622 - Posted: 31 Jan 2009, 7:23:11 UTC
Last modified: 31 Jan 2009, 7:25:04 UTC

HOLD THE PRESSES!

I got one failure

Windows XP Pro (32-bit), i7 ...

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000


App version 1.56 ... no idea what the gobbledygook in the rest of the error is about ...

AT LAST ...

Of course it will be no fun if this is another of those .. "cannot reproduce" errors ...

{edit}
You know how we can attract testers ... crashed tasks get paid for by the project ... especially if the data finds a bug ... just a thought ... :)
ID: 4622
Aegis Maelstrom

Joined: 19 Jan 09
Posts: 12
Credit: 4,751
RAC: 0
Message 4623 - Posted: 31 Jan 2009, 12:05:27 UTC

Version 1.55, Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0.

Repeated "exited with zero status but no 'finished' file" problem.
BOINC logs:

2009-01-31 00:48:10|ralph@home|Starting _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0
2009-01-31 00:48:22|ralph@home|Starting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:04:19|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:04:19|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:05:12|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:14:32|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:14:32|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:14:49|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:31:09|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:31:09|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:32:25|ralph@home|Sending scheduler request: To fetch work. Requesting 1173 seconds of work, reporting 0 completed tasks
(...)
2009-01-31 01:48:42|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:48:42|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:49:46|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:06:39|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:06:40|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:07:45|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:15:11|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:15:11|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:15:28|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:31:56|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:31:56|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:31:56|ralph@home|Temporarily failed upload of _CAPRI17_T39_2_.sjf_br_one_docking.protocol__7228_256_0_0: connect() failed
2009-01-31 02:33:18|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
.

As you can see, quite a waste of computing time. Before that, another CAPRI17 task finished without any visible problems.
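For anyone who wants to tally these from their own client log, here is a minimal Python sketch. It assumes the `timestamp|project|message` layout seen in the excerpt above; the function names are made up for illustration and are not part of BOINC. The excerpt above shows the failure message seven times for this one task.

```python
from collections import Counter
from datetime import datetime

# Message text as it appears in the BOINC log excerpt above.
FAILURE_MARKER = "exited with zero status but no 'finished' file"


def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.strip().split("|", 2)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), project, message


def count_failed_exits(lines):
    """Count, per task name, how often the failure marker appears."""
    counts = Counter()
    for line in lines:
        if line.count("|") < 2:
            continue  # skip "(...)" elisions and blank lines
        _, _, message = parse_line(line)
        if message.startswith("Task ") and message.endswith(FAILURE_MARKER):
            task = message[len("Task "):-len(FAILURE_MARKER)].strip()
            counts[task] += 1
    return counts


# Sample lines copied from the log in this post.
sample = [
    "2009-01-31 01:04:19|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file",
    "2009-01-31 01:04:19|ralph@home|If this happens repeatedly you may need to reset the project.",
    "(...)",
    "2009-01-31 01:14:32|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file",
]
print(count_failed_exits(sample))
```

To run it on a saved log, pass the open file object in place of `sample`.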

Finally the scheduler closed a time window for RALPH and started another project.

By then the task had run for over 24 minutes and was over 17% complete (though that last number is not really meaningful). Interestingly, boinccmd.exe --get_results reported a current CPU time of 1455 sec, the same as the final CPU time, but a checkpoint CPU time of only 1252 sec.

Finally, I turned RALPH back on in the morning just to see what would happen:
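A quick bit of arithmetic on those two figures, as a Python sketch. It assumes that a restarted task resumes from its last checkpoint, so CPU time spent after the checkpoint would have to be redone; the function name is only for illustration.

```python
def cpu_time_since_checkpoint(current_cpu_s, checkpoint_cpu_s):
    """CPU seconds accumulated after the last checkpoint was written."""
    return current_cpu_s - checkpoint_cpu_s


# Figures reported by boinccmd.exe --get_results in this post.
current_cpu = 1455
checkpoint_cpu = 1252

lost = cpu_time_since_checkpoint(current_cpu, checkpoint_cpu)
share = lost / current_cpu
print(f"{lost} s since last checkpoint ({share:.1%} of CPU time)")
```

Whether that time is actually lost on every restart depends on how the app checkpoints, which these numbers alone do not show.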

2009-01-31 11:34:25|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
2009-01-31 11:34:39|ralph@home|Computation for task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 finished
.

It finished with a so-called success, but with 2 decoys, low credit and a "Too many restarts with no progress. Keep application in memory while preempted." notice.

I hope it helps.

Best Regards and have a nice weekend!
a.m.
ID: 4623
AdeB

Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4625 - Posted: 31 Jan 2009, 21:05:13 UTC - in response to Message 4620.  

since we are on 1.56 now, should there be a new thread for that?

btw..very interesting color choice for the accepted energy line and the other line above the rmsd box. what does the blue and yellow mean and isn't there a purplish color in there as well?


Nope, it's just prettiness. No extra meaning, I'm afraid.


Triggered by this message, I tried to look at the graphics of task 1285083, but the graphics screen didn't open.
I'm running Gentoo Linux on an AMD Athlon XP with 512MB memory.

In version 1.54 the graphics did work, the only problem there (and in previous versions) was the presentation of Total credit and RAC.
ID: 4625
Paul D. Buck

Joined: 14 Jan 09
Posts: 62
Credit: 33,293
RAC: 0
Message 4626 - Posted: 31 Jan 2009, 22:36:52 UTC

On OS-X, no graphics with 1.56 ... so I too have the problem with graphics as AdeB reported below for Linux...
ID: 4626
Tonno

Joined: 23 Nov 06
Posts: 16
Credit: 49,841
RAC: 0
Message 4627 - Posted: 1 Feb 2009, 0:25:34 UTC - in response to Message 4626.  

Compute error

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
[2009- 1-31 18: 7:40:] :: BOINC:: Initializing ... ok.
[2009- 1-31 18: 7:40:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/mtyka_lr5_D_score12_normal.zip
Setting database description ...
Setting up checkpointing ...
Initializing score function:
Initializing relax mover:
Starting protocol...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on PDB structure: S_rb1_2d4f_native_0000_withcon_00001 <--- S_rb1_2d4f_native_0000.pdb


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000

Engaging BOINC Windows Runtime Debugger...
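The two numbers in the exit code line of this report are the same 32-bit value, once read as signed and once as unsigned. A minimal Python sketch of the conversion (the helper name is made up for illustration):

```python
def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Exit code from the <message> block in this report.
print(to_unsigned_hex(-1073741819))  # prints 0xc0000005
```

That is the Access Violation code named in the Unhandled Exception Record, and the faulting address 0x005124B3 is the same one reported in message 4622.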

ID: 4627



©2024 University of Washington
http://www.bakerlab.org