Message boards : RALPH@home bug list : Rosetta min 1.03
Author | Message |
---|---|
Barraud Denis Joined: 5 Apr 07 Posts: 3 Credit: 84,809 RAC: 0 |
This program is probably bugged. When these WUs start, the BOINC Manager freezes and stops working. The two BOINC processes stay in memory; they start calculating and then immediately stop the WUs. The BOINC Manager is completely blocked and cannot be used, and no work gets done. The only solution I found to recover BOINC is to kill the two BOINC processes in the XP Task Manager, then edit the file client_state.xml: in the <project> sections of the Ralph and Rosetta projects, add the line <suspended_via_gui/> after <send_job_log>0</send_job_log> to suspend the project. The right fix would be to stop the Rosetta mini 1.03 WUs, or to wait for the new version 1.04. |
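For anyone who would rather script the edit described above than do it by hand, here is a minimal sketch in Python. It works on the file contents as plain text. The function name `suspend_projects` and the idea of selecting projects by a substring such as "ralph" are assumptions made for illustration; only the two tags come from the post. Stop BOINC and keep a backup copy of the file before trying anything like this.

```python
import re

SUSPEND_TAG = "<suspended_via_gui/>"

def suspend_projects(state_xml: str, name_markers: list[str]) -> str:
    """Add <suspended_via_gui/> after <send_job_log> in matching <project> blocks.

    name_markers is a hypothetical selector: a block is patched only if it
    contains one of these substrings (case-insensitive), e.g. "ralph".
    """
    def patch(match: re.Match) -> str:
        block = match.group(0)
        lowered = block.lower()
        # Skip blocks that are already suspended or belong to other projects.
        if SUSPEND_TAG in block:
            return block
        if not any(marker.lower() in lowered for marker in name_markers):
            return block
        # Insert the tag on its own line right after the send_job_log element.
        return re.sub(
            r"(<send_job_log>\d+</send_job_log>)",
            lambda m: m.group(1) + "\n    " + SUSPEND_TAG,
            block,
            count=1,
        )

    # "<project>" is matched literally, so other elements whose names merely
    # start with "project" are left alone.
    return re.sub(r"<project>.*?</project>", patch, state_xml, flags=re.DOTALL)
```

Running the function twice gives the same result as running it once, because blocks that already contain the tag are left untouched.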
Dr Who Fan Joined: 2 Sep 06 Posts: 76 Credit: 107,857 RAC: 0 |
I saw the same thing happen on one of my machines and had to use Task Manager to kill BOINC and all its subprocesses. The strange thing is that the Rosetta min 1.03 task did not produce an error log in BOINC and seems to have restarted and be running OK. I'll keep watch on my machines to see what happens when or if they complete. |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Shortly after my Rosetta Mini 1.03 started, Kaspersky Antivirus began to consume 100% CPU for more than 2 minutes (no idea whether Rosetta Mini triggered it or something else did, but I'd bet it was Rosetta Mini), approx. until Rosetta Mini exited. A bit later all other Boinc (5.10.35) applications exited too (possibly a missed heartbeat?); Boinc Manager and the client were left unresponsive, and the client did not respond to GUI RPCs. I tried to start another Boinc Manager, but it was left somewhere in memory and the GUI did not appear. After being left untouched for 3/4 of an hour, both Managers finally appeared and the client started the apps again. Last lines from stdoutdae.txt:
.....
12:42:27 [Einstein@Home] [task_debug] result h1_0703.70_S5R2__151_S5R3a_0 checkpointed
12:43:00 [SETI@home] [task_debug] Process for 13dc06ab.1691.2117.6.6.33_0 exited
12:43:00 [SETI@home] [task_debug] task_state=EXITED for 13dc06ab.1691.2117.6.6.33_0 from handle_exited_app
12:43:00 [SETI@home] Computation for task 13dc06ab.1691.2117.6.6.33_0 finished
12:43:00 [SETI@home] [task_debug] result state=FILES_UPLOADING for 13dc06ab.1691.2117.6.6.33_0 from CS::app_finished
12:43:00 [ralph@home] Starting mini_abrelax-1c8cA-test_james_2821_6_2
12:43:00 [ralph@home] [cpu_sched] Starting mini_abrelax-1c8cA-test_james_2821_6_2 (initial)
12:43:01 [ralph@home] [task_debug] task_state=EXECUTING for mini_abrelax-1c8cA-test_james_2821_6_2 from start
12:43:01 [ralph@home] Starting task mini_abrelax-1c8cA-test_james_2821_6_2 using minirosetta version 103
12:45:00 [DepSpid] [task_debug] result spider_153965_0 checkpointed
12:48:41 [ralph@home] [task_debug] Process for mini_abrelax-1c8cA-test_james_2821_6_2 exited
12:48:41 [ralph@home] [task_debug] task_state=EXITED for mini_abrelax-1c8cA-test_james_2821_6_2 from handle_exited_app
12:48:41 [ralph@home] [sched_op_debug] Deferring communication for 20 min 1 sec
12:48:41 [ralph@home] [sched_op_debug] Reason: Unrecoverable error for result mini_abrelax-1c8cA-test_james_2821_6_2 ( - exit code -1073741819 (0xc0000005))
12:48:41 [ralph@home] [task_debug] result state=COMPUTE_ERROR for mini_abrelax-1c8cA-test_james_2821_6_2 from CS::report_result_error
12:48:41 [ralph@home] [task_debug] Process for mini_abrelax-1c8cA-test_james_2821_6_2 exited
Because I left Boinc running, after the long delay it could finally flush its internal buffers and output some additional error lines. These came after the resurrection:
12:48:41 [ralph@home] [task_debug] exit code -1073741819 (0xc0000005):
13:12:48 [ralph@home] Computation for task mini_abrelax-1c8cA-test_james_2821_6_2 finished
13:12:48 [ralph@home] Output file mini_abrelax-1c8cA-test_james_2821_6_2_0 for task mini_abrelax-1c8cA-test_james_2821_6_2 absent
13:12:48 [ralph@home] [task_debug] result state=COMPUTE_ERROR for mini_abrelax-1c8cA-test_james_2821_6_2 from CS::app_finished
13:36:41 [SETI@home] Starting 13dc06ab.1691.2117.6.6.189_0
13:36:41 [SETI@home] [cpu_sched] Starting 13dc06ab.1691.2117.6.6.189_0 (initial)
13:36:41 [SETI@home] [task_debug] task_state=EXECUTING for 13dc06ab.1691.2117.6.6.189_0 from start
13:36:41 [SETI@home] Starting task 13dc06ab.1691.2117.6.6.189_0 using setiathome_enhanced version 527
....
All other results from the WU 647168 crashed, with lots of error lines like:
00307e28 0202f099 0202f09a 0202f09b 0202f09c 0202f09d minirosetta_1.03_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0202f098'
00307e2c 0202f09a 0202f09b 0202f09c 0202f09d 0202f09e minirosetta_1.03_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0202f099'
00307e30 0202f09b 0202f09c 0202f09d 0202f09e 0202f09f minirosetta_1.03_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0202f09a'
My result 733546 produced BOINC Windows Runtime Debugger output instead. Peter |
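A note on the exit code in the log above: the client prints it both as a signed decimal (-1073741819) and as hex (0xc0000005). They are the same 32-bit value, and 0xC0000005 is the Windows status code for an access violation. The conversion can be sketched in Python as follows; the function name `ntstatus_hex` is made up for this example.

```python
def ntstatus_hex(exit_code: int) -> str:
    """Show a signed 32-bit process exit code as unsigned hexadecimal."""
    # Masking with 0xFFFFFFFF reinterprets the negative number as the
    # unsigned 32-bit value that Windows status codes are usually quoted in.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

print(ntstatus_hex(-1073741819))  # prints 0xc0000005
```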
[B^S] sTrey Joined: 15 Feb 06 Posts: 58 Credit: 15,430 RAC: 0 |
Other experiences are in another thread. This app did nasty things to 2/3 of my hosts, so Ralph is set to NNW. Could the project people please say when we can allow work again without the risk of running the toxic "mini" app? Thanks |
Dr Who Fan Joined: 2 Sep 06 Posts: 76 Credit: 107,857 RAC: 0 |
I have had it with the RALPH@home project! The "Rosetta min 1.03" app crashed my whole PC - I have just spent 5+ hours rebuilding it from install disks and backups. I have detached all 3 of my machines from the project as a result and will not reattach. |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
My Ralph's "slots\2" contained a folder "minirosetta_database", containing another 69 subfolders and 240 files, total 634 kB. After start, the Boinc client seemed to be busy with these files looong after having started. One minute after start and unsuccessful attempts to connect, the Manager then asked whether to try to connect again... And the client was still checking Ralph's files. An hour long... It was enough to remove the "..\Boinc\slots\2\minirosetta_database" tree, everything was then fine. Peter |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
We are currently looking into this bug. In order to debug this new app, we will have to send out more work units. Sorry for all the troubles. |
Luuklag Joined: 5 Jan 08 Posts: 15 Credit: 80 RAC: 0 |
I set my runtime to 2 hours, yet BOINC says the expected runtime is 5,30 hours, so 2.25 times the enforced runtime. This can't be any good, can it? |
Luuklag Joined: 5 Jan 08 Posts: 15 Credit: 80 RAC: 0 |
Well, that WU locks up the whole of BOINC. Is there any way I can pause Ralph and continue with Rosie until this is fixed? |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
dekim wrote: We are currently looking into this bug. Actually a harmless issue with remarkable consequences :-) Boinc devs wrote: Here's what was happening: dekim wrote: In order to debug this new app, we will have to send out more work units. Sorry for all the troubles. I think, for anyone who is willing to continue testing Rosetta mini: if Boinc hangs next time, go to "..\Boinc\slots\*\minirosetta_database" and delete all read-only files in the subdirectories. (Actually just in the "..\.svn" subdirectories, but maybe it is the same, if just one Rosetta mini WU was running.) Boinc will immediately continue its work. Peter |
dekim Volunteer moderator Project administrator Project developer Project scientist Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
When I said "we", I mainly meant David Anderson. I had to create more workunits for David Anderson to debug the client. Once he was able to get a work unit, he responded right away with the cause and checked in a fix. There are some non-boinc client related issues with mini also. If you are experiencing this problem, manually remove the minirosetta_database directory in the slots directory. This is David Anderson's reply: "Here's what was happening: The minirosetta app in Ralph unzipped an archive into its slot directory. This archive (accidentally) included some .svn directories (Subversion stuff) whose contents were flagged as read-only. When the job finished or was aborted, and the BOINC client tried to clean out the slot directory, the delete of each of these files would fail. The client would wait for 5 seconds and try again (it does this because sometimes files are locked temporarily by virus checkers or other disk-scan apps). This would happen for each file, resulting in a 10-20 minute period during which the client and Manager appear hung. I made two changes that fix this (and hopefully avoid similar problems in the future) as follows: 1) if a file delete fails with error ERROR_ACCESS_DENIED, use SetFileAttributes() to clear the read-only flag, then try again. 2) Don't use the 5-second retry mechanism when clearing out slot directories. These can contain unbounded numbers of files, and this can lead to long periods where the client appears hung. Rom, please back-port this to 5.10 -- DPA " |
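The two changes described in the reply above can be illustrated with a rough Python analogue. This is not the BOINC client code (the client is not written in Python, and the helper names here are invented for the example); it only shows the idea: when a delete is refused, clear the read-only flag and try once more, and do not sleep between attempts while emptying a slot directory.

```python
import os
import stat

def remove_file_clearing_readonly(path: str) -> None:
    """Delete a file; if access is denied, clear read-only and retry once."""
    try:
        os.remove(path)
    except PermissionError:
        # Rough equivalent of clearing the read-only attribute with
        # SetFileAttributes() after an ERROR_ACCESS_DENIED failure.
        os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
        os.remove(path)

def clean_slot_dir(slot_dir: str) -> None:
    """Empty a slot directory without any timed retry loop."""
    # Walk bottom-up so each directory is already empty when it is removed.
    for root, dirs, files in os.walk(slot_dir, topdown=False):
        for name in files:
            remove_file_clearing_readonly(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
```

The point of dropping the timed retry is visible in the numbers from this thread: a 5-second wait repeated for a couple of hundred files adds up to many minutes during which the client looks hung.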
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
dekim wrote: If you are experiencing this problem, manually remove the minirosetta_database directory in the slots directory. Is it safe to say that, as soon as the WU starts, it's enough to reset the read-only flags on all files in the minirosetta_database folder tree? The Boinc client is then able to handle them correctly. Peter |
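The alternative asked about above, resetting the read-only flags instead of deleting the tree, could be scripted roughly like this. It is only a sketch: the function name `clear_readonly_flags` is invented, and whether this is enough for a given client version is exactly the open question in the post.

```python
import os
import stat

def clear_readonly_flags(tree: str) -> int:
    """Make every read-only file under `tree` writable; return how many changed."""
    cleared = 0
    for root, _dirs, files in os.walk(tree):
        for name in files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            # A file counts as read-only here when its owner write bit is unset.
            if not mode & stat.S_IWRITE:
                os.chmod(path, mode | stat.S_IWRITE)
                cleared += 1
    return cleared
```

It would be pointed at the minirosetta_database folder inside the affected slot directory, and it returns the number of files it changed so you can see whether it found anything.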
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
After a complete failure with the minirosetta I deleted the database as advised, and I am now back in control of my computer and haven't lost the Rosetta jobs I was working on. I thought something was going wrong when there were no graphics and it failed after 8%. I hope next time is less exciting!! |
Jack Shaftoe Joined: 8 Aug 06 Posts: 13 Credit: 105,163 RAC: 0 |
if Boinc will hang next time, go to "..\Boinc\slots\*\minirosetta_database" and delete all read-only files in the subdirectories. (Actually just in "..\.svn" subdirectories, but maybe it is the same, if just one Rosetta mini WU was running.) Boinc will immediately continue its work. Worked for me on my 2 machines with this problem. Thanks, |
Luuklag Joined: 5 Jan 08 Posts: 15 Credit: 80 RAC: 0 |
OK :D My BOINC is running again, after manually exiting it with the good old Ctrl+Alt+Del :) and removing the library, which my virus scanner didn't really like, but I got around that and now it's running again :) Ralph ain't a project for newbies, that's for sure. |
BigMike Joined: 23 Feb 06 Posts: 63 Credit: 58,730 RAC: 0 |
Yeow ... talk about ugly. I had to abort the two mini WU's I had (sorry) to keep everything from locking up. But I couldn't have done it without Barraud Denis's advice (thanks!): add in the <project> section of ralph and rosetta project after <send_job_log>0</send_job_log> And I guess some people don't understand that Alpha means Alpha: I have detached all 3 of my machines from the project as a result and will not reattach Too bad. However, it's nice to know we're a fairly competent group of people and can get around the occasional challenge! ==Mike Don't believe everything you think. |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
Those 'mini's stuffed my computer. I have spent over two hours trying to get things going again. I tried restarting Boinc Manager, this failed. I tried to reboot and then start BM, this failed (it rebooted twice before Windows started). I tried to reinstall Boinc 5.10.35 but this failed. I tried to install Boinc 5.8.16 and this worked until I added back my projects, then it failed. I totally removed and reinstalled Boinc (various versions) all failed. Removed again and downloaded 5.10.38, amazingly this worked and all files kept going and are still running. I will check those other suggestions above to get rid of ralph mini. This is possibly the worst Ralph release to date. WU 735674 WU 735537 |
Pepo Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
Removed again and downloaded 5.10.38, amazingly this worked and all files kept going and are still running. With 5.10.38 you are done. Peter |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
...and without 5.10.38, you just have to follow Pepo's tip to remove the read-only attributes. It "stuffed" my computer as well. I run Windows. Symptom was that it seemed to hang BOINC Manager, it was no longer communicating and nothing was running. A restart didn't clear up the situation (because the slots directories were in an invalid state). |
zombie67 [MM] Joined: 8 Aug 06 Posts: 75 Credit: 2,396,363 RAC: 6,299 |
The minirosetta app in Ralph unzipped an archive into its slot directory. Has this been fixed in the mini tasks and/or application yet? Was this ever a problem with the OS X version? Reno, NV Team: SETI.USA |
©2024 University of Washington
http://www.bakerlab.org