Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here
Author | Message |
---|---|
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0 |
I have one: https://ralph.bakerlab.org/workunit.php?wuid=11108 Result: https://ralph.bakerlab.org/result.php?resultid=12738 I got it last night, and it ran for more than an hour at 1%. I opened the graphic to see what was going on, and it seemed to be "alive", with some very small wiggles and almost no movement of the curves. It was still running when I shut down before I went to bed, as I usually do, and I booted up again when I got up and went out. When I came home again about 30 minutes ago, the other projects' WUs had run, so everything was reset to zero, CPU time and percentage. It started again after I manually updated Ralph, and it seems to have started again from scratch, as the CPU time has reset to zero and the percentage is at 1 again. I have set them all to stay in memory, with Target CPU run time set to default (8 hours). My computer is https://ralph.bakerlab.org/show_host_detail.php?hostid=797 But the stdout file looks interesting. David Kim, do you want me to mail it to you? It's very long, so I won't post it here. In the graphic it looks totally dead, with no curves at all and no movement. It seems stopped at step 32509. Shall I leave it running and see what happens? Or should I just put it out of its misery? :-( EDIT: 2/28/2006 4:07:01 PM|ralph@home|Resuming result HOMSdi_homDB003_1di2__228_9_0 using rosetta_beta version 490 And it seems to be "alive" but very slow. It has now moved up to step 32521 "I'm trying to maintain a shred of dignity in this world." - Me |
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0 |
I have one: And it finished without my noticing. Result: https://ralph.bakerlab.org/result.php?resultid=12738 "I'm trying to maintain a shred of dignity in this world." - Me |
[B^S] thierry@home Joined: 15 Feb 06 Posts: 20 Credit: 17,624 RAC: 0 |
I have a WU 4.90 stuck at 1% for 1h05'. The graphics are more or less frozen. The protein shape moves a little bit every +/- 20 seconds. What do I do with this WU? I have suspended it until I know what to do. WU number : HOMSb7_homDB005_1b72_226_2 CPU : P4 3.0 GHz HT OS : XP SP2 |
Stargazer257 Joined: 16 Feb 06 Posts: 6 Credit: 17,492 RAC: 0 |
I have a WU 4.90 stuck at 1% for 1h05'. The graphics are more or less frozen. The protein shape moves a little bit every +/- 20 seconds. What do I do with this WU? Continue to run it. I had two that were like that (got to step ~34,000 real quick and then appeared to stop). One of them has since completed at ~5 hrs, the other is still going at 6+ hrs. Check the graphics/screensaver and see if the steps slowly increment. The one I have that is still running has only done ~500 steps since it appeared to slow down/stop. As long as the steps continue to increment (albeit slowly), it is still running. And BTW, the progress only showed 1% done until it finished. Then it went to 100%. Hope yours are like that. Join Us! - Click the Sig! |
[B^S] thierry@home Joined: 15 Feb 06 Posts: 20 Credit: 17,624 RAC: 0 |
OK, I've restarted it. Will see.... Thanks |
Hickory Explorer [USA] Joined: 15 Feb 06 Posts: 2 Credit: 9,562 RAC: 0 |
I had a WU at 1% this morning. It finished while I was at work, so I didn't see it finish. Doesn't look like it completed much work in the 7.57 hours that it ran. WU ID : 11353 WU name : HOMSdc_homDB008_1dcj__229_7 CPU : P4 3.0 GHz HT OS : XP SP2 <core_client_version>5.2.13</core_client_version> <stderr_txt> # random seed: 3988759 # cpu_run_time_pref: 7200 # DONE :: 1 starting structures built 0 (nstruct) times # This process generated 1 decoys from 1 attempts </stderr_txt> |
Hickory Explorer [USA] Joined: 15 Feb 06 Posts: 2 Credit: 9,562 RAC: 0 |
Have a 4.90 unit on another PC that was stuck at 1%. It was on model 1 at step 34401. It had been running for 4 hours. I stopped and restarted BOINC. When the WU restarted, it started at 0. It has been initializing now for 30+ minutes. Will let it run. WU ID: 11340 Results ID: 12974 Result Name: HOMSdc_homDB025_1dcj__229_6_0 Computer ID: 100 CPU: Pentium M 1.73GHz OS: XP SP2 |
Stargazer257 Joined: 16 Feb 06 Posts: 6 Credit: 17,492 RAC: 0 |
Have a 4.90 unit on another PC that was stuck at 1%. It was on model 1 at step 34401. It had been running for 4 hours. That's what mine did too (reset to 0:00 upon restart). It did this because the work hadn't reached a "checkpoint", as it were. Upon reboot, it didn't have a place to resume from and had to begin anew. You will have to let it run longer (of the two WU's I had like that, one ran ~6 hrs, and the other is still running at 9+ hrs). Look at the screensaver/graphic and see if the steps increment (it may seem like it is stopped, but check the step, then check back later to see if it has changed). My WU's raced up to step 34,000 then seemed to stop. The one still running has actually done 500-600 additional steps over the last 9 hours. Good luck Join Us! - Click the Sig! |
STE\/E Joined: 16 Feb 06 Posts: 27 Credit: 2,226,442 RAC: 783 |
How long should we let these WU's run ... ??? I have one now at over 11 hours & 1 at over 9 hours, both are still at 1% and the computers are 3.4 GHz. They should have been done by now I would think ... ??? PS: The one WU that was @ over 9 hours finally finished @ 9:47 hrs .. The one @ over 11 hrs is still running, now up close to 12 hours ... :0 |
STE\/E Joined: 16 Feb 06 Posts: 27 Credit: 2,226,442 RAC: 783 |
As far as I can determine the WU that was over 11 hours is still running according to the Process Manager. It shows 50% usage of the CPU for that WU, it's still running & at 13:30 hours now. I'll let it continue to run & see what happens to it & will report back on it one way or the other ... |
[B^S] Dr. Bill Skiba Joined: 15 Feb 06 Posts: 4 Credit: 6,496 RAC: 0 |
I just aborted this wu. https://ralph.bakerlab.org/result.php?resultid=12982. It reset itself to "0" time several times (yes, it was left in memory). I shut down BOINC, restarted the system and encountered the same behavior. After 3 more restarts from "0" time I gave up on it. |
Bruno G. Olsen & ESEA @ greenholt Joined: 16 Feb 06 Posts: 4 Credit: 45,078 RAC: 0 |
work unit: https://ralph.bakerlab.org/workunit.php?wuid=11591 result: https://ralph.bakerlab.org/result.php?resultid=13442 host: https://ralph.bakerlab.org/show_host_detail.php?hostid=285 has been running for 1 hour and 44 minutes and reports 6 hours left |
STE\/E Joined: 16 Feb 06 Posts: 27 Credit: 2,226,442 RAC: 783 |
As far as I can determine the WU that was over 11 hours is still running according to the Process Manager. It shows 50% usage of the CPU for that WU, it's still running & at 13:30 hours now. I'll let it continue to run & see what happens to it & will report back on it one way or the other ... PS: This WU just finally did finish successfully @ the 20:41 hour mark, it never did show more than 1% finished the whole time it ran ... :) |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Well, the machine I had set up to test "leave in memory = NO" has restarted a bunch of times, basically every time that the apps switch. I just changed that to "leave in memory = YES". I would guess that we can't do that test anymore while 4.90 WU's are being sent out. [edit] BTW, I'm now running BOINC ver. 5.3.22, since it has the ability to use a "global_prefs_override.xml" file to quickly change preferences like Leave Apps In Memory without worrying what venue a machine belongs to or what other machines the change might affect. FINALLY! [/edit] |
Brotherbard Joined: 16 Feb 06 Posts: 15 Credit: 76,109 RAC: 0 |
The WU # 11525 1vdi_loop_1m5xA__1001_233_5 has been hung at 1% for 13 hours now. In the graphics the model is not changing and the stats show: Stage: Relax, Model: 1, Step: 0. The stderr file is filled with "Could not identify element type from chemical symbol. Setting as undefined". And both the stderr and stdout files have not been modified since about half an hour after the start of the WU. It is still running. --Nathan |
Brotherbard Joined: 16 Feb 06 Posts: 15 Credit: 76,109 RAC: 0 |
If it is a version 4.90 WU, abort it. If it is a 4.91 WU then try restarting it by restarting BOINC. It's on Mac OS X 10.4.5, RALPH v 4.85 --Nathan |
Carlos_Pfitzner Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
stuck at 1.00% https://ralph.bakerlab.org/result.php?resultid=17832 Rosetta_beta 4.84 Linux CPU 98% IDLE *Restarting boinc Click signature for global team stats |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
rosetta_beta 4.93 Core Client is 5.2.15 I have a "Stuck at 1%" in progress right now. I have the app set to leave workunits in memory. It's on a Win2000 SP4 machine. Here's the Result ID: https://ralph.bakerlab.org/result.php?resultid=17137 Workunit ID: https://ralph.bakerlab.org/workunit.php?wuid=11522 It's been running for 16 hours of CPU time. Is there any info I can gather to help with this one while it's in progress? I noticed you include the .pdb file. I can do remote debugging of VS2005 apps on this machine, I just need some clues as to what to look for. |
Rom Walton (BOINC) Volunteer moderator Project developer Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Probably the best thing to do is get this tool: http://www.sysinternals.com/Utilities/ProcessExplorer.html Open up process explorer. Right-Click on the Rosetta process and bring up the properties. Switch to the threads tab. For each thread that is eating CPU time click on the stack button. Click on the copy button. Do that a few times and post the results here. ----- Rom |
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Rom, There are 3 threads: Pass 1 for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550 Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f6c8 for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989 for CSwitchDelta 1 StartAddress WINMM.dll+0x927f Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03 Pass 2 for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550 Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f656 for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989 for CSwitchDelta 1 StartAddress WINMM.dll+0x927f Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03 Pass 3 for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550 Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6 for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989 for CSwitchDelta 1 StartAddress WINMM.dll+0x927f Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03 By suspending a much higher priority project I can get this work unit to run at will... for right now it's in suspended animation and left in virtual memory. It currently has 17hrs 53min 7sec of CPU time on it and is still at 1.00%. Let me know if there is anything else I can do. I suspect you can get my email address if you need more detailed conversations. I would also be willing to call you. Additional info: BOINC is running as a service. Mike |
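
For anyone collecting similar samples, here is a rough sketch of one way to compare the three passes above. It takes the single rosetta_beta frame from the busy thread (the one with CSwitchDelta of about 90) in each pass and reports how far apart the offsets are. The helper names `parse_frame` and `offset_spread` are made up for this illustration and are not part of any BOINC or Sysinternals tool. A small spread may hint that the thread keeps executing inside one small region of code, but without symbols that is only a guess.

```python
# Frames copied from the busy thread in each pass posted above.
SAMPLES = {
    "Pass 1": "rosetta_beta_4.93_windows_intelx86.exe+0x32f6c8",
    "Pass 2": "rosetta_beta_4.93_windows_intelx86.exe+0x32f656",
    "Pass 3": "rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6",
}


def parse_frame(frame):
    """Split 'module+0xOFFSET' into (module, offset as int)."""
    module, _, offset = frame.rpartition("+")
    return module, int(offset, 16)


def offset_spread(samples):
    """Return (modules seen, lowest offset, highest offset, spread in bytes)."""
    parsed = [parse_frame(frame) for frame in samples.values()]
    modules = {module for module, _ in parsed}
    offsets = [offset for _, offset in parsed]
    low, high = min(offsets), max(offsets)
    return modules, low, high, high - low


modules, low, high, spread = offset_spread(SAMPLES)
print("modules:", sorted(modules))
print(f"offset range: {low:#x}..{high:#x} (spread {spread} bytes)")
```

With the three samples above, all frames fall in the same module and the offsets lie within 114 bytes of each other. More passes, ideally from a build with symbols, would be needed before drawing any conclusion.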
©2024 University of Washington
http://www.bakerlab.org