Bug reports for Ralph 5.33 and 5.34

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2416 - Posted: 19 Oct 2006, 18:45:47 UTC
Last modified: 24 Oct 2006, 3:53:24 UTC

1. We are trying out some more features that should get lower energy decoys for the same amount of computation.

2. Some slight fixes for docking graphics.

3. Added options to allow simulations constrained by data from solution x-ray scattering experiments.

Let us know if you see anything weird!
ID: 2416
Pieface
Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 2417 - Posted: 20 Oct 2006, 12:49:45 UTC

Here are a couple of WUs that errored on 5.33:

result 298467
and
result 298499

Both got:

<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2887450
ERROR:: Exit at: .pose_routines.cc line:126

ID: 2417
zombie67 [MM]
Joined: 8 Aug 06
Posts: 75
Credit: 2,396,363
RAC: 6,299
Message 2418 - Posted: 21 Oct 2006, 4:34:11 UTC

Here's another one.
Reno, NV
Team: SETI.USA
ID: 2418
Sadir
Joined: 21 Feb 06
Posts: 6
Credit: 1,419
RAC: 0
Message 2421 - Posted: 23 Oct 2006, 21:03:18 UTC

Same problem with WU FRA_t389...
for example:
result 264439
result 264406
result 264336

ID: 2421
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2422 - Posted: 24 Oct 2006, 3:03:51 UTC - in response to Message 2417.  

Thanks - I think I've fixed that problem in 5.34! We'll see...

ID: 2422
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 2423 - Posted: 24 Oct 2006, 3:04:27 UTC - in response to Message 2421.  

OK, we're looking at it -- most of those WUs failed.

ID: 2423
Nikolay A. Saharov
Joined: 17 Feb 06
Posts: 6
Credit: 25,102
RAC: 0
Message 2424 - Posted: 24 Oct 2006, 3:33:12 UTC

Results:
300497
300283


<core_client_version>5.6.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2884013
ERROR:: Exit at: .minimize.cc line:2089

</stderr_txt>
]]>


ID: 2424
Tobie
Joined: 4 Oct 06
Posts: 3
Credit: 582
RAC: 0
Message 2425 - Posted: 24 Oct 2006, 9:23:20 UTC
Last modified: 24 Oct 2006, 10:12:18 UTC

My WUs came up with errors.

Results:

301114
300324

<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2883800
ERROR:: Exit at: .minimize.cc line:2089

</stderr_txt>

ID: 2425
Bin Qian
Joined: 13 Feb 06
Posts: 3
Credit: 4,483
RAC: 0
Message 2427 - Posted: 24 Oct 2006, 16:52:26 UTC - in response to Message 2425.  

Thanks for reporting these errors! I've tracked down the bug and fixed it. The bug only affects a particular set of command line options.

The bug fix will be included in the next update.

ID: 2427
Conrad Poohs
Joined: 29 Aug 06
Posts: 9
Credit: 1,955
RAC: 0
Message 2428 - Posted: 25 Oct 2006, 4:46:45 UTC

I have a WU which seems to be 'stuck', i.e. the run time and the time to go are both increasing at about 1 sec per sec, but % done remains at 39.901.

I have tried stopping and restarting the BOINC client (as this has worked in similar situations before), to no avail.

This WU is hogging my BOINC processing, as my PC (Dell 4600, Intel P4 3.00 GHz, WinXP SP2) is now overcommitted.

Any answers, please? I am only willing to give this WU another couple of hours unless it moves on.

Regards,
Andy G
ID: 2428
Conrad Poohs
Joined: 29 Aug 06
Posts: 9
Credit: 1,955
RAC: 0
Message 2429 - Posted: 25 Oct 2006, 4:50:03 UTC

Oops, forgot to say that WU is 263878.

Regards,
Andy G.
ID: 2429
Conrad Poohs
Joined: 29 Aug 06
Posts: 9
Credit: 1,955
RAC: 0
Message 2430 - Posted: 25 Oct 2006, 6:33:28 UTC

The above WU finally finished all of a sudden, with the run time reduced from over 6h back down to 2h 53m (it suddenly jumped to 90-something % complete, the time to go dropped to a couple of minutes and the run time to 2h 52m, and then it completed).
As it actually clocked up considerably more run time than it reports, I feel slightly cheated, as I assume credits will be granted according to the reported run time.

Regards,
Andy G.
ID: 2430
SafeAggie
Joined: 5 Oct 06
Posts: 6
Credit: 4,207
RAC: 0
Message 2431 - Posted: 25 Oct 2006, 11:21:27 UTC

Unrecoverable error for result 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_ALL_BOND_DISTANCES_SAVE_ALL_OUT__1385_26_0 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003))

10/25/2006 7:15:46 AM|ralph@home|Unrecoverable error for result 1hz6A_BOINC_NATIVEJUMPS_CLOSE_CHAINBREAKS_VARY_ALL_BOND_ANGLES_SAVE_ALL_OUT__1385_27_0 (One or more arguments are invalid (0x80000003) - exit code -2147483645 (0x80000003))

resultid=301379
resultid=301375
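
For anyone puzzled by the two numbers in that message: -2147483645 and 0x80000003 are the same 32-bit value, read once as a signed (two's complement) integer and once as unsigned hex. A small Python sketch of the conversion (the helper names are made up for illustration, they are not part of BOINC):

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is the value minus 2**32.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def to_unsigned32(value: int) -> int:
    """Interpret a (possibly negative) integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


print(to_signed32(0x80000003))           # -2147483645
print(hex(to_unsigned32(-2147483645)))   # 0x80000003
```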
ID: 2431
feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2432 - Posted: 25 Oct 2006, 23:50:11 UTC
Last modified: 26 Oct 2006, 0:39:31 UTC

Andy
Credits are based on the number of models you crunch, not raw CPU time (see Aug 23rd update).

The time remaining INCREASING as the WU runs is normal.

The CPU time spent on the WU will always be reduced if you end and restart BOINC. So, I suspect that's what happened in your case. It is just a question of how much time is lost.

Basically you lose all work done since the last checkpoint was established. Checkpoints are always established when a model reaches completion; they are also established periodically within a model, but for some types of WUs it can be more than an hour between checkpoints.
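
To make the checkpoint point concrete, here is a toy Python sketch of how much CPU time is thrown away on a restart. This is not BOINC or Rosetta code; the function name and all the numbers are invented for illustration.

```python
def cpu_time_after_restart(cpu_time_at_shutdown, checkpoint_times):
    """Return (cpu_time_kept, cpu_time_lost) in seconds.

    checkpoint_times holds the CPU times (in seconds) at which the task
    wrote a checkpoint. On restart the task resumes from the latest
    checkpoint reached before shutdown, or from zero if there was none.
    """
    reached = [t for t in checkpoint_times if t <= cpu_time_at_shutdown]
    kept = max(reached, default=0)
    return kept, cpu_time_at_shutdown - kept


# Hypothetical task: stopped at 6h of CPU time, last checkpoint at 2h 53m.
print(cpu_time_after_restart(6 * 3600, [3600, 2 * 3600 + 53 * 60]))  # (10380, 11220)

# Hypothetical task that never checkpointed: everything is lost.
print(cpu_time_after_restart(5 * 3600, []))  # (0, 18000)
```

The longer the gap between checkpoints, the more a restart can cost, which is why a WU with no checkpoint for hours can appear to go back to zero.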

As for hogging your CPU... once BOINC feels overcommitted, it runs earliest deadline first. Since RALPH has short deadlines, it's common for its WUs to have the earliest deadline in your list of tasks. Don't worry about the time hogging; BOINC keeps track of this and will make it up to the other projects.
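
A rough illustration of "earliest deadline first" in Python. This is a simplification, not the BOINC client's actual scheduler, and the task names and deadlines are invented:

```python
from datetime import datetime


def earliest_deadline_first(tasks):
    """Return tasks ordered by deadline, soonest first.

    tasks is a list of (name, deadline) pairs.
    """
    return sorted(tasks, key=lambda task: task[1])


# Hypothetical task list; the RALPH task has the shortest deadline.
tasks = [
    ("rosetta_wu", datetime(2006, 11, 8)),
    ("ralph_wu", datetime(2006, 10, 29)),
    ("other_project_wu", datetime(2006, 11, 20)),
]

run_order = [name for name, _ in earliest_deadline_first(tasks)]
print(run_order)  # ['ralph_wu', 'rosetta_wu', 'other_project_wu']
```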

In short, I don't see anything in your description that alarms me as being out of the ordinary. If you have further questions, perhaps start a new thread, either here on Ralph, or over on Rosetta.
ID: 2432
Conrad Poohs
Joined: 29 Aug 06
Posts: 9
Credit: 1,955
RAC: 0
Message 2433 - Posted: 26 Oct 2006, 6:49:10 UTC

Feet1st.

Thanks very much for the info. I had got my head around most of that; it was just the first time I had seen the time to go increase to quite such an extent (out to over 6 hours for a WU with an estimated run time of about 3 hours). It also seems to me that the WU didn't checkpoint for some 3 hours, but it was the large increase in run time that threw me.

Thanks again.

Regards,
Andy G.
ID: 2433
[B^S] Gamma^Ray
Joined: 20 Oct 06
Posts: 4
Credit: 1,038
RAC: 0
Message 2434 - Posted: 28 Oct 2006, 5:52:34 UTC

The first WU I ran on v5.34,
Workunit 264440, Result ID 304006, errored with:

stderr out:
<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2883885
# cpu_run_time_pref: 3600
ERROR:: Exit at: .minimize.cc line:2089

G^R




ID: 2434
Conan
Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 2435 - Posted: 28 Oct 2006, 11:21:44 UTC

The freezing and lock-up problems are entirely related to the new graphics that were introduced in Ralph 5.32. Within a very short time of the BOINC screensaver for Ralph/Rosetta coming on, one of two things happens: either the work unit stops/freezes (the processor may or may not keep going in the background, with Task Manager saying that all is well), or the graphics stop doing anything and Task Manager says 'not responding' with the CPU back to idle. In the first case I often had to reboot, or the unit would eventually be killed by the watchdog. In the second I could manually kill the work unit in Task Manager.
Only one WU between Result Id 295099 and 295112 (13 WUs) actually completed successfully (295104), but it reported late because the computer shut down with these screen errors whilst I was away.
With Ralph 5.34 things have not changed at all; see Result Id 302900.
I am having the same issues with Rosetta 5.32 and 5.34, and I have reported them over there as well.
My solution has been to stop using the BOINC screensaver. This also stops my other projects from showing their graphics, but every work unit since turning off the graphics has been successful; see Result Id 302935 to 302942 (8 results).
Rosetta has had no more problems either.

My 2 Linux machines have no problems as they do not have graphics, and 3 other Windows XP machines have no problems either because BOINC is installed as a service, so no graphics.
The 2 Windows XP machines with graphics are the only ones in trouble.
ID: 2435
Brian B
Joined: 17 Feb 06
Posts: 9
Credit: 2,632
RAC: 0
Message 2442 - Posted: 1 Nov 2006, 2:46:09 UTC

Hi all. I have a WU/result that seems to have a checkpoint issue. I have noticed several times now that, prior to shutting the laptop down to take with me, the WU might be at 5 hours CPU time and up around 11 hours (and increasing) to complete, with Progress stuck at 1.00%. After arriving at my destination and booting the laptop back up, the WU is restarted at 0 hours CPU time, with To complete at around 1:29 and increasing and Progress back to 0%. This has happened several times now. Sorry, but I am going to abort this WU, especially since it's back to zero again and it's probably already run around 10 hours total or so. Let me know if there is any more info I can supply. Thanks!
ID: 2442



©2024 University of Washington
http://www.bakerlab.org