Message boards : RALPH@home bug list : Bug Reports for Rosetta Mini Versions 1.+
Author | Message |
---|---|
j2satx Joined: 17 Feb 06 Posts: 42 Credit: 168,797 RAC: 0 |
2/20/2008 12:04:01 AM|ralph@home|Task score13_hb_envtest62_A_5croA_3299_3750_0 exited with zero status but no 'finished' file |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
I have been getting this kind of report continuously over the last few days! To be more specific, the URL links to the mentioned results score13_hb_envtest62_A_1tig__3299_3942_0 and score13_hb_envtest62_A_1a19A_3299_3939_0. I'm occasionally getting the "exited with zero status" messages too. Last time, I suppose (I'm only inferring this from the logs and the task's stdout), exiting BOINC did not notify the preempted Ralph task, although it did notify 3 other running/preempted tasks. The lock file was therefore not removed, and 37 seconds after the next start a few hours later the task said "Can't acquire lockfile - exiting" and the client said "Task .... exited with zero status but no 'finished' file". (The task was then correctly restarted and crunched until a successful end.) Peter |
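The failure mode Peter describes can be sketched in a few lines. This is only an illustration under stated assumptions: the lock here is a plain file whose existence means "held", and the names `acquire_lock`, `run_task` and `lockfile` are hypothetical. How BOINC or the mini application actually implements its lock is not shown in this thread and may differ (for example, it may use OS-level file locks).

```python
import os
import tempfile


def acquire_lock(path):
    """Create the lock file exclusively; return False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def run_task(slot_dir, clean_exit=True):
    """Hypothetical task run: take the lock, work, release only on a clean exit."""
    lock = os.path.join(slot_dir, "lockfile")  # hypothetical file name
    if not acquire_lock(lock):
        return "can't acquire lockfile"
    # ... the real work would happen here ...
    if clean_exit:
        os.remove(lock)
    return "ran"


with tempfile.TemporaryDirectory() as slot:
    first = run_task(slot, clean_exit=False)  # task never told to shut down
    second = run_task(slot)                   # next start finds the stale lock
```

In this sketch the second start fails purely because the first one never got the chance to clean up, which matches the sequence in the post: no notification on exit, then "Can't acquire lockfile" on the next start.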
j2satx Joined: 17 Feb 06 Posts: 42 Credit: 168,797 RAC: 0 |
1.08 won't run on my Windows W2K.....AMD or Intel, doesn't matter. Is anyone getting good results with W2K? |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that. |
j2satx Joined: 17 Feb 06 Posts: 42 Credit: 168,797 RAC: 0 |
"I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that." It happens. I'm watching three run on WXP now; they look normal so far. |
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
Here is another one that stalled. On restarting boinc it completed successfully. score13_hb_envtest62_A_1opd__3299_4390_1 819291 |
KC0ISW Joined: 17 Feb 06 Posts: 20 Credit: 11,725 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=786341 |
j2satx Joined: 17 Feb 06 Posts: 42 Credit: 168,797 RAC: 0 |
"I've noticed that and will try to fix it today. That is why there have been more failures. A compiler setting must have been changed. Sorry about that." It still errors immediately on W2K. Computer: 775I65G01; Project: ralph@home; Date: 2/23/2008 10:22:19 AM; ID: 3492; Message: Task score13_hb_envtest62_A_1a19A_3299_5276_0 exited with zero status but no 'finished' file |
Thomas Leibold Joined: 25 Feb 07 Posts: 27 Credit: 77,464 RAC: 0 |
Linux x86 (SuSE 9.3), BOINC 5.10.21: While most workunits succeed, I saw very similar errors for the following workunits: 674358 with mini-Rosetta 1.07, 697535 with mini-Rosetta 1.08, and 698221 with mini-Rosetta 1.08. Stderr.txt shows "Too many restarts with no progress. Keep application in memory while preempted." as well as exit code -161. |
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
Here is yet another one that needed restarting. 700691 score13_hb_envtest62_A_1a19A_3299_5298_0 The error message states: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x63697461 read attempt to address 0x63697461 |
Sadir Joined: 21 Feb 06 Posts: 6 Credit: 1,419 RAC: 0 |
Unable to run R@H :( I am repeatedly getting "25/02/2008 11:15:18|ralph@home|Restarting task score13_hb_envtest62_A_1tig__3299_6149_0 using minirosetta version 108 25/02/2008 11:15:19|ralph@home|Task score13_hb_envtest62_A_1tig__3299_6149_0 exited with zero status but no 'finished' file 25/02/2008 11:15:19|ralph@home|If this happens repeatedly you may need to reset the project." even after resetting the project. I aborted WUs 705458 and 705801 after a few minutes of getting those messages. I've placed minirosetta_1.08_windows_intelx86.pdb in the Ralph@Home project directory, but it does not seem to be working... |
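For anyone wanting to count how often a task hits this message, here is a small sketch that parses client log lines in the `timestamp|project|message` layout quoted above. The function name `count_unfinished_exits` is made up for illustration, and the pipe-delimited layout is assumed only from the lines pasted in this thread; other client versions may log differently.

```python
from collections import Counter

MARKER = "exited with zero status but no 'finished' file"


def count_unfinished_exits(lines):
    """Count, per task name, how often the 'no finished file' message appears."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not in the timestamp|project|message layout
        _timestamp, _project, message = parts
        if message.startswith("Task ") and message.endswith(MARKER):
            task = message[len("Task "):-len(MARKER)].strip()
            counts[task] += 1
    return counts


log = [
    "25/02/2008 11:15:18|ralph@home|Restarting task score13_hb_envtest62_A_1tig__3299_6149_0 using minirosetta version 108",
    "25/02/2008 11:15:19|ralph@home|Task score13_hb_envtest62_A_1tig__3299_6149_0 exited with zero status but no 'finished' file",
    "25/02/2008 11:15:19|ralph@home|If this happens repeatedly you may need to reset the project.",
]
counts = count_unfinished_exits(log)
```

A task whose count keeps climbing across restarts is a reasonable candidate for aborting, as was done here.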
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
I'm working on another build that hopefully will fix the w2k problems. Still not sure why the symbols file isn't working. Hopefully I'll have a version update up later today. |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This Mini 1.08 task ran for 69 min on WinXP and now shows "waiting to run", but 100% complete with "---" time to complete. I suspect it will go through normally once Ralph gets resource share back, I've just never seen such an issue on Windows before. ...it was only using 90MB while it was running, so good progress there! |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
"This Mini 1.08 task ran for 69 min on WinXP and now shows "waiting to run", but 100% complete with "---" time to complete. I suspect it will go through normally once Ralph gets resource share back, I've just never seen such an issue on Windows before." Yes, it happens; I see this occasionally on various projects. It is enough for the app to checkpoint at 100% (sort of "YES I've got it!!") after finishing its last functional block, just before exit; the varying combination of different projects, their STDs and task lengths will then take the opportunity to punish the application. Peter |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I should have mentioned that I'm running BOINC 5.10.20. I just suspended Rosetta tasks to force some time back to Ralph. It ran another 2.5 min, the % complete was revised back to 87.7%, and it continued crunching. I've got a 1hr RT preference, so that puts the task over the target, and the time remaining is just over 10 minutes, so that % complete is probably just the recomputed "10 minutes" deal when a task runs long. Now, even after watching for 30 seconds, the % complete hasn't budged. (Usually, once you are down in the 10 min range, the % complete adjusts just a smidge every 5 seconds.) |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
Yes, the 10 minutes of remaining time is in mini as it is in rosetta++. Near the end of the run, the percent complete will always show that there are 10 minutes left. I've also added the watchdog to mini for the next version update, since some users are reporting jobs being stuck (although it is not confirmed that jobs are really getting stuck). Jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made, and checkpoints aren't made that often during the full-atom refinement stage. Slower computers can run for hours before a checkpoint is made! However, we will be adding more checkpoints so this doesn't happen. |
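The numbers feet1st reported fit a simple reading of the "10 minutes left" rule. The sketch below assumes the displayed fraction is elapsed / (elapsed + 10 minutes) once a task has passed its target run time. That formula is an assumption that happens to match the figures in this thread, not something taken from the mini or rosetta++ source, and `displayed_fraction` is a made-up name.

```python
def displayed_fraction(elapsed_s, remaining_s=600.0):
    """Fraction done if the app always claims `remaining_s` seconds are left."""
    return elapsed_s / (elapsed_s + remaining_s)


# 69 minutes reported earlier, plus the further 2.5 minutes after the restart.
elapsed = (69 + 2.5) * 60
pct = round(100 * displayed_fraction(elapsed), 1)
print(pct)  # 87.7
```

Under that assumption the 87.7% figure is just 71.5 / 81.5, so the drop back from 100% is a display recomputation rather than lost work.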
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
"jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made and checkpoints aren't made that often during the full-atom refinement stage." That may be so, but in the mini 1.08 cases I reported, the CPU usage was running at about 1%. In my book that means stuck! |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
"jobs may appear stuck because the current version of mini only updates the percent complete after a checkpoint is made and checkpoints aren't made that often during the full-atom refinement stage." There seem to be two definitions of "stuck" in broad use, which should be distinguished: (1) tasks which do consume CPU time but do not progress (%-wise) for maybe hours (because they do not update it); these might or might not checkpoint, and accordingly will or will not be preempted after their timeslot; and (2) those which do not consume any CPU time (and accordingly do not progress %-wise :-) but probably still exchange heartbeat messages with the client, so the client leaves them in their latent state, hoping they might checkpoint soon. These "sleepy" tasks block one CPU from BOINC computations and might stay in that state until they are manually suspended and/or the client is restarted. Peter |
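Peter's two kinds of "stuck" can be told apart from two observations of the same task taken some time apart. This is a rough sketch only: the `Sample` type, the `classify` function and the one-second CPU threshold are all hypothetical choices for illustration, not anything the BOINC client does.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_seconds: float    # accumulated CPU time of the task
    fraction_done: float  # reported progress, 0.0 to 1.0


def classify(earlier, later, min_cpu_delta=1.0):
    """Label a task from two samples taken some time apart."""
    if later.fraction_done > earlier.fraction_done:
        return "progressing"
    if later.cpu_seconds - earlier.cpu_seconds >= min_cpu_delta:
        # Case 1: burning CPU, but the percentage is not being updated.
        return "busy, no visible progress"
    # Case 2: the "sleepy" task that holds a CPU slot without using it.
    return "idle"


busy = classify(Sample(3600.0, 0.40), Sample(7200.0, 0.40))
sleepy = classify(Sample(3600.0, 0.40), Sample(3600.2, 0.40))
```

The first case may still finish by itself once a checkpoint is reached; the second is the one that, per the post, tends to need a manual suspend or a client restart.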
tallguy-13088 Joined: 17 Feb 06 Posts: 10 Credit: 121,701 RAC: 0 |
Dekim, this might help with the W2K issue. I'm running dual Xeon 2.4GHz on W2K (most recently "pulled" fixes around Oct 2007) under BOINC 5.10.30. I have been consistently seeing <core_client_version>5.10.30</core_client_version> <![CDATA[ <message> - exit code 1647259450 (0x622f2f3a) </message> ]]> since about Feb. 21. Here is an example of both the W/U and the RESULTS: W/U RESULT. Hope this helps. Let me know if I can provide additional info. BTW, I downloaded the symbols, but it sounds like you are having problems getting them to work. Good luck! Update: Rats! I had the 1.07 symbol file and I see you have updated it since! Just downloaded it. |
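One observation that may or may not be useful: both this exit code (0x622f2f3a) and the faulting address in Evan's access-violation report (0x63697461) consist entirely of printable ASCII bytes. The sketch below decodes a 32-bit value in little-endian byte order, as it would sit in memory on x86. The helper name `as_ascii` is made up, and reading these values as text is only a hint that string data may have ended up where a number or pointer was expected; it is not a diagnosis.

```python
def as_ascii(value):
    """Return the 4 bytes of a 32-bit value (little-endian) as text, or None."""
    raw = value.to_bytes(4, "little")
    if all(32 <= b < 127 for b in raw):
        return raw.decode("ascii")
    return None  # at least one byte is not printable ASCII


print(as_ascii(0x622F2F3A))  # ://b
print(as_ascii(0x63697461))  # atic
print(as_ascii(0xC0000005))  # None (the access-violation status code itself)
```

Fragments like "://b" and "atic" look like pieces of longer strings, which is why values of this shape are often worth mentioning in a crash report.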
AdeB Joined: 22 Dec 07 Posts: 61 Credit: 161,367 RAC: 0 |
Error in resultid=798264 <core_client_version>5.10.28</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 14400 Failed to find rotamer: 3 0 2 Amongst options: 3 2 2 3 3 3 3 3 2 3 2 1 3 2 3 3 1 1 1 3 2 3 1 2 1 2 2 3 3 1 1 2 1 1 2 3 3 1 3 1 3 1 2 2 2 1 3 3 2 2 1 2 2 3 2 1 1 2 3 2 2 1 2 1 1 2 1 1 1 2 3 3 2 1 3 1 1 3 2 3 1 ERROR:: Exit from: src/core/scoring/dunbrack/SingleResidueDunbrackLibrary.tmpl.hh line: 142 called boinc_finish </stderr_txt> ]]> |
©2024 University of Washington
http://www.bakerlab.org