Bug Reports for Minirosetta v1.36

Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36

Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4281 - Posted: 16 Oct 2008, 8:40:46 UTC - in response to Message 4280.  

The new 'hombench_mtyka' work units seem to now be working ok, as none of my latest have gone past my 6 hour preference.

Some do, but some do not...

One more hombench_mtyka_foldcst_ task with "Maximum disk usage exceeded" and error -177, on WinXP SP3, 6.3.14 client :

hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_1ZITA_1_5159_1_1 exited after 5916.234 seconds with -177 (0xffffffffffffff4f)
<core_client_version>6.3.14</core_client_version>
Maximum disk usage exceeded

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
[...] (again repeated hundreds of times)
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E
Engaging BOINC Windows Runtime Debugger...
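
The repeated sin_cos_range lines above are consistent with a NaN reaching a range check: every ordered comparison involving NaN is false, so a test of the form -1 <= x <= 1 rejects it. "1.#QNAN00" looks like the way older Microsoft C runtimes print a quiet NaN. A minimal Python sketch of that behaviour follows; the function name in_sin_cos_range is hypothetical and is not taken from the Rosetta source.

```python
import math

def in_sin_cos_range(x: float) -> bool:
    """Return True if x is a legal sine/cosine value, i.e. within [-1, +1]."""
    # Any ordered comparison with NaN is False, so NaN always fails this check.
    return -1.0 <= x <= 1.0

nan = float("nan")
print(in_sin_cos_range(0.5))  # an ordinary value passes
print(in_sin_cos_range(nan))  # NaN is rejected by the range test
print(math.isnan(nan))        # an explicit NaN test identifies the real cause
```

This only shows why a NaN would trigger the message; it says nothing about where the NaN came from in the actual run.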

According to dbg output, I'd again bet on a broken stack...

Peter
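
The two forms of the exit status in the log above are the same number: 0xffffffffffffff4f is -177 read as a 64-bit two's-complement value. As far as I know, BOINC defines -177 as ERR_RSC_LIMIT_EXCEEDED, which would match the "Maximum disk usage exceeded" message, but treat that name as an assumption. A small Python sketch of the conversion (the helper names are illustrative only):

```python
def to_twos_complement_hex(value: int, bits: int = 64) -> str:
    """Format a signed integer as the hex of its two's-complement bit pattern."""
    return hex(value & ((1 << bits) - 1))

def from_twos_complement(raw: int, bits: int = 64) -> int:
    """Interpret an unsigned bit pattern as a signed two's-complement integer."""
    if raw >= 1 << (bits - 1):
        return raw - (1 << bits)
    return raw

print(to_twos_complement_hex(-177))              # 0xffffffffffffff4f
print(from_twos_complement(0xFFFFFFFFFFFFFF4F))  # -177
```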
ID: 4281
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4282 - Posted: 16 Oct 2008, 10:04:06 UTC - in response to Message 4281.  
Last modified: 16 Oct 2008, 10:32:37 UTC

The new 'hombench_mtyka' work units seem to now be working ok, as none of my latest have gone past my 6 hour preference.

Some do, but some do not...

One more hombench_mtyka_foldcst_ task with "Maximum disk usage exceeded" and error -177, on WinXP SP3, 6.3.14 client :

hombench_mtyka_foldcst_loopbuild_tex_cst_foldcst_loopbuild_tex_cst_t286__IGNORE_THE_REST_1ZITA_1_5159_1_1 exited after 5916.234 seconds with -177 (0xffffffffffffff4f)
<core_client_version>6.3.14</core_client_version>
Maximum disk usage exceeded

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
[...] (again repeated hundreds of times)
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E
Engaging BOINC Windows Runtime Debugger...

According to dbg output, I'd again bet on a broken stack...

Peter


Yes Pepo, you are correct; the ones I was referring to were the "hombench_mtyka_looprelax" work units, which seem to run OK.

The "hombench_mtyka_foldcst" work units all seem to fail, in my case by running well past the set preference times.

My previous longest before I aborted it was over 11 hours on a 6 hour preference.

I have been aborting this type of work unit whenever I see one, but alas, two WUs downloaded and started while I was at work.

They have now exceeded my previous longest by running (so far, as they have not finished yet) for 17 hours 21 minutes and 14 hours 36 minutes.

As it has now been so long, I will wait for BOINC to end the runs once they have gone past 3 times my preference time, so they will be ended soon.

A lot of work for only a little return as I know I will probably only get 20 credits for them.

EDIT: I have changed my mind after seeing that our reports seem to be falling on deaf ears at the moment. I have just checked today's returned work units and it appears I was incorrect about the "hombench_mtyka_looprelax" work units being OK.
I have just one success; all the others have been "Validate Error".
So after I don't know how many hours processing numerous work units and then getting nothing for them, I will abort all of what I have and set my computers to "No New Work" until this current batch of work units is gone.
To me it is just wasting my time, with no return and no response from the Ralph/Rosetta team as to what the problems are.
I did get a response on the Rosetta forums, but they seem to think the "hombench_mtyka" type work units are all OK. I say try processing a few and see what happens.
ID: 4282
Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4283 - Posted: 16 Oct 2008, 11:11:43 UTC - in response to Message 4282.  
Last modified: 16 Oct 2008, 11:12:43 UTC

The new 'hombench_mtyka' work units seem to now be working ok, as none of my latest have gone past my 6 hour preference.

Some do, but some do not...


Yes Pepo, you are correct; the ones I was referring to were the "hombench_mtyka_looprelax" work units, which seem to run OK.

The "hombench_mtyka_foldcst" work units all seem to fail, in my case by running well past the set preference times.

EDIT: I have changed my mind after seeing that our reports seem to be falling on deaf ears at the moment. I have just checked today's returned work units and it appears I was incorrect about the "hombench_mtyka_looprelax" work units being OK.
I did get a response on the Rosetta forums, but they seem to think the "hombench_mtyka" type work units are all OK. I say try processing a few and see what happens.

I've indeed got no errored "hombench_mtyka_looprelax" tasks, but I have crunched only a few. From what I've crunched in the last weeks (both ralph and rosetta): 63 x hombench_mtyka_* (arriving since 22.9., 6 failed, roughly a 10% failure ratio) and 6 x hombench_tex_* (since 2 days ago, all failed):

03 x hombench_mtyka_foldcst_boinc_test* - 2 failed,
11 x hombench_mtyka_foldcst_loopbuild_boinctest* - 0 failed,
03 x hombench_mtyka_foldcst_loopbuild_test1* - 1 failed,
04 x hombench_mtyka_foldcst_loopbuild_tex_cst_* - 3 failed,
03 x hombench_mtyka_foldcst_simple_* - 0 failed,
09 x hombench_mtyka_looprelax_ccd_close_* - 0 failed,
21 x hombench_mtyka_looprelax_ccd_moves_* - 0 failed,
09 x hombench_mtyka_looprelax_test_full_* - 0 failed,
06 x hombench_tex_looprelax_tex* - 6 failed.

The list is sorted alphabetically; it would probably be better to sort it by time of appearance.
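
The counts in the list above can be grouped by batch family to see where the failures concentrate. The following is only a sketch: the numbers are copied from the list, the batch names are shortened by dropping their common prefix, and the helper name summarize is made up for illustration.

```python
# (batch name without the common prefix, tasks crunched, tasks failed)
BATCHES = [
    ("mtyka_foldcst_boinc_test", 3, 2),
    ("mtyka_foldcst_loopbuild_boinctest", 11, 0),
    ("mtyka_foldcst_loopbuild_test1", 3, 1),
    ("mtyka_foldcst_loopbuild_tex_cst_", 4, 3),
    ("mtyka_foldcst_simple_", 3, 0),
    ("mtyka_looprelax_ccd_close_", 9, 0),
    ("mtyka_looprelax_ccd_moves_", 21, 0),
    ("mtyka_looprelax_test_full_", 9, 0),
    ("tex_looprelax_tex", 6, 6),
]

def summarize(prefix):
    """Return (tasks crunched, tasks failed) over batches whose name starts with prefix."""
    rows = [(n, f) for name, n, f in BATCHES if name.startswith(prefix)]
    total = sum(n for n, _ in rows)
    failed = sum(f for _, f in rows)
    return total, failed

for prefix in ("mtyka_", "mtyka_foldcst_", "mtyka_looprelax_", "tex_"):
    total, failed = summarize(prefix)
    print(f"{prefix:<18} {failed:>2}/{total:<3} failed ({failed / total:.1%})")
```

On these numbers all six mtyka failures fall in the foldcst batches (6 of 24), while none of the 39 mtyka looprelax tasks in this sample failed.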

Peter
ID: 4283
AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4284 - Posted: 16 Oct 2008, 19:41:14 UTC - in response to Message 4280.  

At this moment I have a ralph and a rosetta task running at the same time on a single-core computer, both using 50% CPU.
The ralph task has the status 'Waiting to run' in BOINC Manager, and it is a hombench_tex_looprelax_tex_cst_oneparam_looprelax_tex_...-task, so that is the one I'm going to abort.
Something is seriously wrong with these tasks.


I too have had this same experience with a four-core machine running 7 BOINC work units: 4 Docking and 3 Ralph. The Ralph work units had "waiting to run" next to them but were still running. I had to stop BOINC and restart it to get just 4 WUs running.

This has happened on more than one machine and more than once. All with the 'hombench_tex' name.


Some time later I saw the same thing happening on my other PC with a rosetta work unit. This work unit was the only one running; when I checked 20 minutes later it had the status 'Waiting to run' but was still running side by side with a Seti work unit. I let them run and both work units returned a valid result.
So apparently these tasks sometimes don't react to the BOINC command to suspend. That is a bad thing from the point of view of any other BOINC project, but it does not screw up the science.
ID: 4284
Barraud Denis
Joined: 5 Apr 07
Posts: 3
Credit: 84,809
RAC: 0
Message 4285 - Posted: 22 Oct 2008, 13:18:25 UTC

The hombench units are seriously bugged: a lot of them use a long stretch of CPU time with no reward for the time used. I think the watchdog should be turned OFF, or these units should be rewarded more.


Task ID 1137005
Name hombench_mtyka_looprelax_ccd_close_looprelax_t374__IGNORE_THE_REST_1YVOA_12_5182_1_0
Workunit 1002825
Created 14 Oct 2008 16:30:27 UTC
Sent 16 Oct 2008 22:18:51 UTC
Received 18 Oct 2008 5:05:06 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 12276
Report deadline 20 Oct 2008 22:18:51 UTC
CPU time 21089.7
stderr out

<core_client_version>6.3.14</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 21088.6 cpu seconds
This process generated 42 decoys from 42 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 92.2812617237414
Granted credit 0
application version 1.36
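
For reference, the key figures can be pulled out of a stderr block like the one above with a few regular expressions. This is a rough sketch, assuming the three line formats seen in this particular log; it is not an official parser, and the variable names are arbitrary.

```python
import re

# Lines copied from the stderr output above
stderr_txt = """\
# cpu_run_time_pref: 21600
DONE :: 1 starting structures 21088.6 cpu seconds
This process generated 42 decoys from 42 attempts
"""

pref_s = int(re.search(r"cpu_run_time_pref:\s*(\d+)", stderr_txt).group(1))
cpu_s = float(
    re.search(r"starting structures\s+([\d.]+)\s+cpu seconds", stderr_txt).group(1)
)
decoys, attempts = map(
    int, re.search(r"generated (\d+) decoys from (\d+) attempts", stderr_txt).groups()
)

print("finished within preferred run time:", cpu_s <= pref_s)
print("decoys / attempts:", decoys, "/", attempts)
print("seconds per decoy:", round(cpu_s / decoys, 1))
```

Read this way, the run stopped just short of the 21600 second preference and every attempt produced a decoy, so nothing in the client-side output hints at why validation failed.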
ID: 4285
AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 4290 - Posted: 23 Oct 2008, 18:34:23 UTC

Two work units ran too long, both ending with an error after calling boinc_finish.

1141183:
Rosetta is going too long. Watchdog is ending the run!
CPU time: 50277.8 seconds. Greater than 3X preferred time: 14400 seconds
**********************************************************************
called boinc_finish
SIGSEGV: segmentation violation

1142108:
Rosetta is going too long. Watchdog is ending the run!
CPU time: 50018.7 seconds. Greater than 3X preferred time: 14400 seconds
**********************************************************************
called boinc_finish
SIGILL: illegal instruction
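
The watchdog messages above suggest a simple rule: end the run once CPU time exceeds three times the preferred run time. The sketch below assumes that the 14400 seconds in the message is the preferred run time itself (4 hours), giving a limit of 43200 seconds; the function name and the exact rule are inferred from the log text, not from the Rosetta source.

```python
WATCHDOG_FACTOR = 3  # assumption, taken from "Greater than 3X preferred time"

def watchdog_overrun(cpu_time_s: float, preferred_s: float,
                     factor: int = WATCHDOG_FACTOR) -> float:
    """Seconds by which cpu_time_s exceeds factor * preferred_s (negative if under)."""
    return cpu_time_s - factor * preferred_s

# CPU times and preferred time copied from the two task logs above
for task_id, cpu_time_s in ((1141183, 50277.8), (1142108, 50018.7)):
    over = watchdog_overrun(cpu_time_s, 14400)
    print(f"task {task_id}: {over:.1f} s past the {WATCHDOG_FACTOR * 14400} s limit")
```

Under that reading both tasks ran roughly two hours past the limit before the watchdog stepped in.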
ID: 4290

©2024 University of Washington
http://www.bakerlab.org