Bug Reports for Rosetta Mini Versions 1.+

j2satx

Joined: 17 Feb 06
Posts: 42
Credit: 168,797
RAC: 0
Message 3774 - Posted: 20 Feb 2008, 15:49:41 UTC


2/20/2008 12:04:01 AM|ralph@home|Task score13_hb_envtest62_A_5croA_3299_3750_0 exited with zero status but no 'finished' file

Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 3775 - Posted: 20 Feb 2008, 16:20:28 UTC - in response to Message 3772.  
Last modified: 20 Feb 2008, 16:30:15 UTC

I have been getting this kind of report continuously over the last few days!

Could you point to any of your failed results? (Hidden computers.)

To be more specific, the URL links to the mentioned results score13_hb_envtest62_A_1tig__3299_3942_0 and score13_hb_envtest62_A_1a19A_3299_3939_0.

I'm occasionally getting the "exited with zero status" messages too. The last time, I suppose (I'm only making assumptions from the logs and the task's stdout), it was because exiting BOINC did not notify the preempted Ralph task (although it did notify 3 other running/preempted tasks). The lock file was therefore not removed, and 37 seconds after the next start a few hours later, the task said "Can't acquire lockfile - exiting" and the client said "Task .... exited with zero status but no 'finished' file". (The task was then correctly restarted and crunched until a successful end.)

Peter
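
The stale-lock scenario described in this post can be sketched as follows. This is only a simplified model under stated assumptions, not BOINC's actual locking code: the file name `lockfile`, the helper names, and the use of exclusive file creation are all hypothetical choices made for illustration.

```python
import os
import tempfile

LOCKFILE_NAME = "lockfile"  # hypothetical name, for illustration only


def acquire_lock(slot_dir):
    """Try to create the lock file atomically; return True on success."""
    path = os.path.join(slot_dir, LOCKFILE_NAME)
    try:
        # O_EXCL makes the creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(slot_dir):
    """Remove the lock file; a task that is never told to quit skips this."""
    os.remove(os.path.join(slot_dir, LOCKFILE_NAME))


def demo():
    with tempfile.TemporaryDirectory() as slot_dir:
        first = acquire_lock(slot_dir)   # original task starts
        # The client exits without notifying the task: no release_lock() call.
        second = acquire_lock(slot_dir)  # restarted task finds the stale lock
        release_lock(slot_dir)           # the lock is finally cleared
        third = acquire_lock(slot_dir)   # the next restart succeeds
        release_lock(slot_dir)
        return first, second, third


print(demo())
```

In this model the second start fails only because the first owner never cleaned up, which matches the sequence in the post: one failed restart, then a normal run.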

j2satx
Joined: 17 Feb 06
Posts: 42
Credit: 168,797
RAC: 0
Message 3779 - Posted: 21 Feb 2008, 16:40:25 UTC


1.08 won't run on my Windows W2K.....AMD or Intel, doesn't matter.

Is anyone getting good results with W2K?

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 3781 - Posted: 21 Feb 2008, 18:17:34 UTC

I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that.

j2satx
Joined: 17 Feb 06
Posts: 42
Credit: 168,797
RAC: 0
Message 3782 - Posted: 21 Feb 2008, 19:02:07 UTC - in response to Message 3781.  

I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that.


It happens.

I'm watching three run on WXP now.......they look normal so far.

Evan
Joined: 23 Dec 07
Posts: 75
Credit: 69,584
RAC: 0
Message 3783 - Posted: 22 Feb 2008, 0:14:52 UTC

Here is another one that stalled. On restarting boinc it completed successfully.
score13_hb_envtest62_A_1opd__3299_4390_1
819291

KC0ISW
Joined: 17 Feb 06
Posts: 20
Credit: 11,725
RAC: 0
Message 3785 - Posted: 23 Feb 2008, 7:06:55 UTC

https://ralph.bakerlab.org/result.php?resultid=786341

j2satx
Joined: 17 Feb 06
Posts: 42
Credit: 168,797
RAC: 0
Message 3786 - Posted: 23 Feb 2008, 16:45:42 UTC - in response to Message 3781.  

I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that.


Still errors immediately on W2K.

Computer Project Date ID Message
775I65G01 ralph@home 2/23/2008 10:22:19 AM 3492 Task score13_hb_envtest62_A_1a19A_3299_5276_0 exited with zero status but no 'finished' file

Thomas Leibold
Joined: 25 Feb 07
Posts: 27
Credit: 77,464
RAC: 0
Message 3787 - Posted: 23 Feb 2008, 19:56:40 UTC

Linux x86 (SuSE 9.3), Boinc 5.10.21: While most workunits succeed I saw very similar errors for the following workunits:

674358 with mini-Rosetta 1.07
697535 with mini-Rosetta 1.08
698221 with mini-Rosetta 1.08

Stderr.txt shows: "Too many restarts with no progress. Keep application in memory while preempted." as well as exit code -161.

Evan
Joined: 23 Dec 07
Posts: 75
Credit: 69,584
RAC: 0
Message 3791 - Posted: 24 Feb 2008, 19:13:26 UTC
Last modified: 24 Feb 2008, 19:15:06 UTC

Here is yet another one that needed restarting. 700691
score13_hb_envtest62_A_1a19A_3299_5298_0

The error message states:
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x63697461 read attempt to address 0x63697461

Sadir
Joined: 21 Feb 06
Posts: 6
Credit: 1,419
RAC: 0
Message 3792 - Posted: 25 Feb 2008, 10:20:10 UTC

Unable to run R@H :(

I am repeatedly getting
"25/02/2008 11:15:18|ralph@home|Restarting task score13_hb_envtest62_A_1tig__3299_6149_0 using minirosetta version 108
25/02/2008 11:15:19|ralph@home|Task score13_hb_envtest62_A_1tig__3299_6149_0 exited with zero status but no 'finished' file
25/02/2008 11:15:19|ralph@home|If this happens repeatedly you may need to reset the project."
even after resetting the project.

I aborted WUs 705458 and 705801 after a few minutes of getting those messages.

I've placed minirosetta_1.08_windows_intelx86.pdb in the RALPH@home project directory, but it does not seem to be working...

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 3793 - Posted: 25 Feb 2008, 18:33:49 UTC

I'm working on another build that hopefully will fix the w2k problems. Still not sure why the symbols file isn't working. Hopefully I'll have a version update up later today.

feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3794 - Posted: 25 Feb 2008, 20:10:57 UTC

This Mini 1.08 task ran for 69 min on WinXP and now shows "waiting to run", but 100% complete with "---" time to complete. I suspect it will go through normally once Ralph gets resource share back; I've just never seen such an issue on Windows before.

...it was only using 90MB while it was running, so good progress there!

Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 3795 - Posted: 25 Feb 2008, 20:22:44 UTC - in response to Message 3794.  

This Mini 1.08 task ran for 69 min on WinXP and now shows "waiting to run", but 100% complete with "---" time to complete. I suspect it will go through normally once Ralph gets resource share back; I've just never seen such an issue on Windows before.

Yes, it happens, I can see this occasionally, on various projects.

It is enough for the app code to checkpoint at 100% (sort of "YES, I've got it!!") after finishing its last functional block, just before exit. The varying combination of different projects, their STDs, and task lengths will then take the opportunity to punish the application by preempting it at that point.

Peter

feet1st
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 3796 - Posted: 25 Feb 2008, 21:13:55 UTC

I should have mentioned, I'm running BOINC 5.10.20

I just suspended Rosetta tasks to force some time back to Ralph. It ran another 2.5 min, then the % complete was revised back to 87.7% and it continued crunching.

I've got a 1hr RT preference. So that puts the task over the target, and the time remaining is just over 10 minutes, so that % complete is probably just the recomputed 10 minutes deal when task runs long. Now, even after watching for 30 seconds, the % complete hasn't budged. (usually, once you are down in the 10min range, the % complete adjusts just a smidge every 5 seconds).
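
The 87.7% figure is consistent with a simple "always 10 minutes left" rule. The sketch below is only a guess at the arithmetic, assuming percent complete is computed as elapsed time divided by elapsed time plus 10 minutes once the task runs past its target; the function name and the exact formula are assumptions, not taken from the mini source.

```python
def fraction_done(elapsed_min, remaining_min=10.0):
    """Percent complete if the task always claims `remaining_min` are left."""
    return elapsed_min / (elapsed_min + remaining_min)


# 69 min before preemption plus another 2.5 min after resuming.
elapsed = 69 + 2.5
print(f"{fraction_done(elapsed):.1%}")  # prints 87.7%
```

Under this assumption the displayed value can only creep upward slowly, since each extra minute of run time also pushes the estimated finish out by a minute.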

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 3797 - Posted: 25 Feb 2008, 22:48:48 UTC

Yes, the 10-minute remaining time is in mini as it is in rosetta++. The percent complete will always show that there are 10 minutes left near the end of the run. I've also added the watchdog to mini for the next version update, since some users are reporting jobs being stuck (although it is not confirmed that jobs are really getting stuck).

Jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made and checkpoints aren't made that often during the full-atom refinement stage. Slower computers can run for hours before a checkpoint is made! However, we will be adding more checkpoints so this doesn't happen.
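
A toy model of why sparse checkpoints make a healthy job look stuck: the displayed percent complete changes only on the steps where a checkpoint is written. The function and parameter names here are hypothetical, and the step counts are made up for illustration; this is not the mini code.

```python
def reported_progress(total_steps, checkpoint_steps):
    """Return the percent complete a user would see after each work step."""
    shown = 0.0
    history = []
    for step in range(1, total_steps + 1):
        # Real work happens on every step, but the display is refreshed
        # only when a checkpoint is written.
        if step in checkpoint_steps:
            shown = step / total_steps
        history.append(shown)
    return history


# Ten steps of work with checkpoints only after steps 2 and 9.
history = reported_progress(10, checkpoint_steps={2, 9})
print(history)
```

Between steps 2 and 9 the displayed value stays at 0.2 even though the job is working the whole time, which is the effect more frequent checkpoints are meant to remove.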

Evan
Joined: 23 Dec 07
Posts: 75
Credit: 69,584
RAC: 0
Message 3798 - Posted: 25 Feb 2008, 23:11:57 UTC

jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made and checkpoints aren't made that often during the full-atom refinement stage.


That may be so, but in the mini 1.08 cases I reported the cpu usage was running at about 1%. In my book that means stuck!

Pepo
Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 3801 - Posted: 26 Feb 2008, 13:50:39 UTC - in response to Message 3798.  

jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made and checkpoints aren't made that often during the full-atom refinement stage.

That may be so, but in the mini 1.08 cases I reported the cpu usage was running at about 1%. In my book that means stuck!

There seem to be two definitions of "stuck" in broad use, which should be distinguished:

- tasks which do consume CPU time but do not progress (%-wise) for maybe hours (because the percentage is not being updated); these might or might not checkpoint, and accordingly will or will not be preempted after their timeslot, and
- tasks which do not consume any CPU time (and accordingly do not progress %-wise :-) but probably still exchange heartbeat messages with the client, so the client leaves them in their latent state, hoping they might checkpoint soon. These "sleepy" tasks block one CPU from BOINC computations and might stay in that state until they are manually suspended and/or the client is restarted.

Peter
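
The two meanings of "stuck" above can be told apart from two measurements over an observation window: CPU time used and change in percent complete. The sketch below is a minimal illustration; the names and the 5% idle threshold are assumptions chosen for the example, not values used by BOINC.

```python
from enum import Enum


class TaskState(Enum):
    PROGRESSING = "progressing"
    BUSY_NO_PROGRESS = "consuming CPU, percent complete not updating"
    SLEEPY = "no CPU use and no progress"


def classify(cpu_seconds, progress_delta, wall_seconds, idle_cpu_ratio=0.05):
    """Classify a task from what was observed over `wall_seconds`."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    if progress_delta > 0:
        return TaskState.PROGRESSING
    if cpu_seconds / wall_seconds <= idle_cpu_ratio:
        return TaskState.SLEEPY
    return TaskState.BUSY_NO_PROGRESS


# About 1% CPU over ten minutes with no progress: the "sleepy" kind.
print(classify(cpu_seconds=6, progress_delta=0.0, wall_seconds=600))
# Full CPU use with no visible progress: probably just between checkpoints.
print(classify(cpu_seconds=590, progress_delta=0.0, wall_seconds=600))
```

Only the first kind needs a restart or a watchdog; the second kind usually just needs more patience or more frequent progress updates.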

tallguy-13088
Joined: 17 Feb 06
Posts: 10
Credit: 121,701
RAC: 0
Message 3803 - Posted: 26 Feb 2008, 22:31:13 UTC
Last modified: 26 Feb 2008, 22:55:19 UTC

Dekim,

This might help with the W2K issue. I'm running dual Xeon 2.4Ghz on W2K (most recently "pulled" fixes around Oct 2007) under BOINC 5.10.30. I have been consistently seeing

<core_client_version>5.10.30</core_client_version> <![CDATA[ <message> - exit code 1647259450 (0x622f2f3a) </message> ]]>


since about Feb. 21. Here is an example of both the W/U and the RESULTS

W/U

RESULT

Hope this helps. Let me know if I can provide additional info. BTW, I d/l the Symbols but it sounds like you are having problems with them working. Good Luck!

Update: Rats! I had the 1.07 Symbol file and I see you have updated it subsequently! Just D/L'ed it.
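
One observation that may or may not help with diagnosis: the exit code quoted above, and the access-violation address 0x63697461 reported earlier in the thread, both decode to printable ASCII when read as four little-endian bytes. The sketch below only does that decoding. Reading the values as text is an assumption (it could be coincidence), and the helper name is made up for illustration.

```python
def as_ascii(value):
    """Interpret a 32-bit value as four little-endian bytes of text."""
    raw = value.to_bytes(4, "little")
    # Show non-printable bytes as '.' so the result is always readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


exit_code = 1647259450
print(hex(exit_code))        # 0x622f2f3a
print(as_ascii(exit_code))   # ://b
print(as_ascii(0x63697461))  # atic
```

If these really are fragments of strings, that would hint at string data ending up where a return value or pointer was expected, but nothing in the thread confirms this.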

AdeB
Joined: 22 Dec 07
Posts: 61
Credit: 161,367
RAC: 0
Message 3809 - Posted: 29 Feb 2008, 16:54:00 UTC

Error in resultid=798264

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
Failed to find rotamer: 3 0 2
Amongst options: 3 2 2
3 3 3
3 3 2
3 2 1
3 2 3
3 1 1
1 3 2
3 1 2
1 2 2
3 3 1
1 2 1
1 2 3
3 1 3
1 3 1
2 2 2
1 3 3
2 2 1
2 2 3
2 1 1
2 3 2
2 1 2
1 1 2
1 1 1
2 3 3
2 1 3
1 1 3
2 3 1
ERROR:: Exit from: src/core/scoring/dunbrack/SingleResidueDunbrackLibrary.tmpl.hh line: 142
called boinc_finish

</stderr_txt>
]]>
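
A quick check of the rotamer list in the stderr output above, offered as a sketch rather than a diagnosis: it parses the options copied from the log and compares them with the requested triple. The variable names are illustrative, and what the three numbers mean inside the Dunbrack library code is not stated in the log.

```python
from itertools import product

# Options copied from the stderr output above, in the same order.
LOG_OPTIONS = """
3 2 2
3 3 3
3 3 2
3 2 1
3 2 3
3 1 1
1 3 2
3 1 2
1 2 2
3 3 1
1 2 1
1 2 3
3 1 3
1 3 1
2 2 2
1 3 3
2 2 1
2 2 3
2 1 1
2 3 2
2 1 2
1 1 2
1 1 1
2 3 3
2 1 3
1 1 3
2 3 1
"""

options = [tuple(int(x) for x in line.split())
           for line in LOG_OPTIONS.strip().splitlines()]
requested = (3, 0, 2)

# Does the list contain every combination of 1, 2 and 3 in three positions?
covers_all = set(options) == set(product((1, 2, 3), repeat=3))

print(len(options), covers_all, requested in options)
```

The list holds all 27 combinations of the values 1 to 3, so the lookup fails only because the requested triple contains a 0 in its second position.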



©2024 University of Washington
http://www.bakerlab.org