Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here
Author | Message |
---|---|
genes Send message Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Well, the machine I had set up to test "leave in memory = NO" has restarted a bunch of times, basically every time that the apps switch. I just changed that to "leave in memory = YES". I would guess that we can't do that test anymore while 4.90 WU's are being sent out. [edit] BTW, I'm now running BOINC ver. 5.3.22, since it has the ability to use a "global_prefs_override.xml" file to quickly change preferences like Leave Apps In Memory without worrying what venue a machine belongs to or what other machines the change might affect. FINALLY! [/edit] |
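A minimal sketch of the override file mentioned above, written as a small Python helper. It assumes the client reads `global_prefs_override.xml` from its data directory and that the "leave applications in memory" preference is the `<leave_apps_in_memory>` element; the data directory passed to `write_override` is a placeholder, not a path taken from this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_override(leave_in_memory: bool) -> str:
    """Return the text of a global_prefs_override.xml with one preference set."""
    root = ET.Element("global_preferences")
    # Assumed element name for the "leave applications in memory" preference.
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    return ET.tostring(root, encoding="unicode")


def write_override(data_dir: Path, leave_in_memory: bool) -> Path:
    """Write the override file into a (hypothetical) BOINC data directory."""
    target = data_dir / "global_prefs_override.xml"
    target.write_text(build_override(leave_in_memory) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print(build_override(True))
```

The client would presumably need to re-read the file (for example after a restart) before the change takes effect.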
Brotherbard Send message Joined: 16 Feb 06 Posts: 15 Credit: 76,109 RAC: 0 |
The WU # 11525 1vdi_loop_1m5xA__1001_233_5 has been hung at 1% for 13 hours now. In the graphics the model is not changing and the stats show: Stage: Relax, Model: 1, Step: 0. The stderr file is filled with "Could not identify element type from chemical symbol. Setting as undefined". Neither the stderr nor the stdout file has been modified since about half an hour after the start of the WU. It is still running. --Nathan |
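The symptom described above (output files that stopped changing while the task keeps running) can be checked with a short script. This is only a sketch: the file paths and the 30-minute threshold are illustrative assumptions, and a stale file on its own does not prove the task is hung.

```python
import os
import time
from typing import Iterable, List, Optional


def stale_files(paths: Iterable[str], max_age_s: float,
                now: Optional[float] = None) -> List[str]:
    """Return the paths whose last modification is older than max_age_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        age = now - os.path.getmtime(path)  # seconds since last write
        if age > max_age_s:
            stale.append(path)
    return stale


if __name__ == "__main__":
    # Hypothetical slot-directory file names; adjust to the real location.
    candidates = [p for p in ("stderr.txt", "stdout.txt") if os.path.exists(p)]
    print(stale_files(candidates, max_age_s=30 * 60))
```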
Moderator9 Volunteer moderator Send message Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0 |
"The WU # 11525 1vdi_loop_1m5xA__1001_233_5 has been hung at 1% for 13 hours now." If it is a version 4.90 WU, abort it. If it is a 4.91 WU, then try restarting it by restarting BOINC. Moderator9 |
Brotherbard Send message Joined: 16 Feb 06 Posts: 15 Credit: 76,109 RAC: 0 |
"If it is a version 4.90 WU, abort it. If it is a 4.91 WU then try restarting it by restarting BOINC." It's on Mac OS X 10.4.5, RALPH v 4.85. --Nathan |
Carlos_Pfitzner Send message Joined: 16 Feb 06 Posts: 182 Credit: 22,792 RAC: 0 |
Stuck at 1.00%: https://ralph.bakerlab.org/result.php?resultid=17832 Rosetta_beta 4.84, Linux. CPU 98% idle. *Restarting BOINC |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
rosetta_beta 4.93, Core Client is 5.2.15. I have a "Stuck at 1%" in progress right now. I have the app set to leave workunits in memory. It's on a Win2000 SP4 machine. Here's the Result ID: https://ralph.bakerlab.org/result.php?resultid=17137 Workunit ID: https://ralph.bakerlab.org/workunit.php?wuid=11522 It's been running for 16 hours of CPU time. Is there any info I can gather to help with this one while it's in progress? I noticed you include the .pdb file. I can do remote debugging of VS2005 apps on this machine, I just need some clues as to what to look for. |
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Probably the best thing to do is get this tool: http://www.sysinternals.com/Utilities/ProcessExplorer.html Open up process explorer. Right-Click on the Rosetta process and bring up the properties. Switch to the threads tab. For each thread that is eating CPU time click on the stack button. Click on the copy button. Do that a few times and post the results here. ----- Rom |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Rom, There are 3 threads:
Pass 1
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f6c8
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989
for CSwitchDelta 1
StartAddress WINMM.dll+0x927f
Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03
Pass 2
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f656
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989
for CSwitchDelta 1
StartAddress WINMM.dll+0x927f
Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03
Pass 3
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe+0x68efb ntoskrnl.exe+0xe3ad2 rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe+0x68e35 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys+0x75693 ntoskrnl.exe+0x65014 ntoskrnl.exe+0xe3ad2 USER32.DLL+0x31eb3 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll+0x28989
for CSwitchDelta 1
StartAddress WINMM.dll+0x927f
Stack: ntoskrnl.exe+0x68e35 ntoskrnl.exe+0x4fc50 ntoskrnl.exe+0x65014 ntdll.dll+0x8f03
By suspending a much higher priority project I can get this work unit to run at will... for right now it's in suspended animation and left in virtual memory. It currently has 17hrs 53min 7sec of CPU time on it and is still at 1.00%. Let me know if there is anything else I can do. I suspect you can get my email address if you need more detailed conversations. I would also be willing to call you. Additional info: BOINC is running as a service. Mike |
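Stack samples like these are easier to compare across passes once each frame is split into module, symbol, and offset. The sketch below parses the `module+0xOFFSET` and `module!symbol+0xOFFSET` forms and reports how far apart the sampled offsets inside one module are. The helper names are made up for illustration, and a narrow spread is only consistent with the busy thread staying in one small region of code; it does not prove a loop.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches "module+0x1a2b" and "module!symbol+0x1a2b"; the module may be empty.
FRAME_RE = re.compile(
    r"^(?P<module>[^!+\s]*)(?:!(?P<symbol>[^+\s]+))?\+0x(?P<offset>[0-9a-fA-F]+)$"
)


def parse_frame(text: str) -> Tuple[str, Optional[str], int]:
    """Split one stack frame into (module, symbol or None, offset)."""
    match = FRAME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized frame: {text!r}")
    return match.group("module"), match.group("symbol"), int(match.group("offset"), 16)


def offset_spread(frames: Iterable[str], module: str) -> int:
    """Return max - min of the offsets sampled inside the given module."""
    offsets = [off for mod, _sym, off in map(parse_frame, frames) if mod == module]
    if not offsets:
        raise ValueError(f"no frames found in module {module!r}")
    return max(offsets) - min(offsets)


if __name__ == "__main__":
    app = "rosetta_beta_4.93_windows_intelx86.exe"
    # Innermost application frame of the busy thread in passes 1 to 3.
    samples = [f"{app}+0x32f6c8", f"{app}+0x32f656", f"{app}+0x32f6b6"]
    print(hex(offset_spread(samples, app)))
```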
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Oops, forgot to ask you to do one additional thing.... In Process Explorer there is an Options menu... Configure Symbols... Can you set the Dbghelp.dll path to: C:\Program Files\BOINC\DbgHelp.dll After that, could you rerun the tests? When things are working right you'll get something that looks like this: rosetta_beta_4.93_windows_intelx86.exe!pairenergy+0x126 rosetta_beta_4.93_windows_intelx86.exe!fullatom_energy+0x1979 rosetta_beta_4.93_windows_intelx86.exe!scorefxn+0xb4e TIA. ----- Rom |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Rom, Data with Symbols:
Pass 1
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe!KiDispatchInterrupt+0x7b ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys!EngGetCurrentCodePage+0x3654 ntoskrnl.exe!KiReleaseSpinLock+0xae4 !local_unwind2+0x5fe830bb ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a USER32.DLL!DispatchMessageW+0x40 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll!ProcessIdToSessionId+0x17d
for CSwitchDelta 1
StartAddress WINMM.dll!timeSetEvent+0x2b0
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 ntoskrnl.exe!ObSetSecurityDescriptorInfo+0x62c ntoskrnl.exe!KiReleaseSpinLock+0xae4 ntdll.dll!ZwWaitForMultipleObjects+0xb
Pass 2
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe!KiDispatchInterrupt+0x7b !local_unwind2+0x5fe830bb ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a rosetta_beta_4.93_windows_intelx86.exe+0x49aeda rosetta_beta_4.93_windows_intelx86.exe+0x256bb5
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys!EngGetCurrentCodePage+0x3654 ntoskrnl.exe!KiReleaseSpinLock+0xae4 !local_unwind2+0x5fe830bb ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a USER32.DLL!DispatchMessageW+0x40 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll!ProcessIdToSessionId+0x17d
for CSwitchDelta 1
StartAddress WINMM.dll!timeSetEvent+0x2b0
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 ntoskrnl.exe!ZwYieldExecution+0x35f ntoskrnl.exe!KiUnexpectedInterrupt+0x1ba ntdll.dll!ZwWaitForMultipleObjects+0xb
Pass 3
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
Stack: ntoskrnl.exe!KiDispatchInterrupt+0x7b !local_unwind2+0x5fe830bb ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a rosetta_beta_4.93_windows_intelx86.exe+0x256b92
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 win32k.sys+0x19c2 win32k.sys+0xb72 win32k.sys!EngGetCurrentCodePage+0x3654 ntoskrnl.exe!KiReleaseSpinLock+0xae4 !local_unwind2+0x5fe830bb ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a USER32.DLL!DispatchMessageW+0x40 rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb rosetta_beta_4.93_windows_intelx86.exe+0x26c504 KERNEL32.dll!ProcessIdToSessionId+0x17d
for CSwitchDelta 1
StartAddress WINMM.dll!timeSetEvent+0x2b0
Stack: ntoskrnl.exe!KiUnexpectedInterrupt+0x183 ntoskrnl.exe!ZwYieldExecution+0x35f ntoskrnl.exe!KiUnexpectedInterrupt+0x1ba ntdll.dll!ZwWaitForMultipleObjects+0xb
Good luck with this! Mike |
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Mike, Using Process Explorer again, can you look at the thread state for each thread? What is the base priority and dynamic priority for each thread in your list? It should be visible on the Threads tab on the process properties dialog box. TIA. ----- Rom |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Mike, More Info:
for CSwitchDelta approx 90
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550
ThreadID 2716, State Ready
Kernel Time 0:00:01.131 (not moving), User Time 18:34:50.250 (and climbing fast)
Base Priority 1, Dynamic Priority 1
for CSwitchDelta 31
StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf
ThreadID 2680, State Ready
Kernel Time 0:00:00.828 (not moving), User Time 0:00:00.187 (not moving)
Base Priority 4, Dynamic Priority 6
for CSwitchDelta 1
StartAddress WINMM.dll!timeSetEvent+0x2b0
ThreadID 2720, State Wait:UserRequest
Kernel Time 0:00:00.000 (not moving), User Time 0:00:00.000 (not moving)
Base Priority 15, Dynamic Priority 15 |
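The per-thread figures above single out one thread by its accumulated user time. As a rough sketch, the `H:MM:SS.mmm` strings can be converted to seconds and compared; the function names are illustrative, and the sample values are the ones reported in the post.

```python
from typing import Dict


def parse_cpu_time(text: str) -> float:
    """Convert an 'H:MM:SS.mmm' CPU time string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def busiest_thread(user_times: Dict[int, str]) -> int:
    """Return the thread ID with the largest accumulated user time."""
    return max(user_times, key=lambda tid: parse_cpu_time(user_times[tid]))


if __name__ == "__main__":
    # User-time values copied from the post, keyed by thread ID.
    samples = {2716: "18:34:50.250", 2680: "0:00:00.187", 2720: "0:00:00.000"}
    print(busiest_thread(samples))
```

Taking two readings a few seconds apart and comparing the difference would show which thread is consuming CPU right now, rather than over the whole run.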
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Mike, Are you familiar with the Windows debugging tools? The reason I ask is that if I could get a dump of the process, this might go quite a bit quicker. Would you be game for trying to get me a dump? |
BennyRop Send message Joined: 11 Mar 06 Posts: 14 Credit: 674 RAC: 0 |
Or temporarily opening two holes in your firewall/router so that the system could be taken over through RealVNC? (emailing Rom the ip#, RealVNC name and password) Granted, it's something I'd only do with someone I trusted. :) |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Mike, This is why I was suggesting direct contact. I am familiar with VS tools for remote debugging, but I always have the source where I can attach to a remote process and set breakpoints and such. How to debug without source is something I'm not sure about. (Never had to, so I never figured it out.) |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
"Or temporarily opening two holes in your firewall/router so that the system could be taken over through RealVNC? (emailing Rom the ip#, RealVNC name and password) Granted, it's something I'd only do with someone I trusted. :)" I'm sorry, direct access is not possible. I'm stretching the rules just running foreign code. |
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Mike, Sweet. Attach to the process with Visual Studio. Break on all threads. From the Debug menu select Save Dump As. Be sure to change the dump type to dump with heap, and give it some sort of name. With WinZip compression the file should shrink to 20MB or so. Do you have a web server I would be able to download it from? Or should we try email? ----- Rom |
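The compression step can also be scripted. The sketch below uses Python's standard `zipfile` module in place of WinZip; the dump file name in the usage example is hypothetical, and how much a real heap dump shrinks depends on its contents.

```python
import zipfile
from pathlib import Path


def zip_dump(dump_path: Path) -> Path:
    """Compress one file into a .zip next to it and return the archive path."""
    archive = dump_path.with_suffix(dump_path.suffix + ".zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name, not the full directory path.
        zf.write(dump_path, arcname=dump_path.name)
    return archive


if __name__ == "__main__":
    dump = Path("rosetta_stuck.dmp")  # hypothetical dump file name
    if dump.exists():
        out = zip_dump(dump)
        print(out, out.stat().st_size)
```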
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Rom, Ok, the latest. Like I said, I'm unfamiliar with debugging without source code. So I attached to the process and broke all threads. I looked for Save Dump As. It wasn't in the Debug menu, so I did some checking in Help and discovered a passage that essentially said the symbols had to be loaded to allow a dump. So I did a "Continue" and detached from the process to investigate how to load the symbols. After figuring that out, I looked at the run time for the Rosetta Beta process and discovered it had started over at 0 CPU time. Do you know if this represents a true restart? If so, I may no longer be stuck at 0. Anyway, I now have the dump file; it's zipped and its size is under 13 meg, easy enough for me to email. 1) Is it possible this is of no more value because I might no longer be stuck? 2) Should I allow it to keep running and see? (I have it swapped out at the moment with 11 minutes of run time according to Task Manager.) 3) Do you still want the file? 4) Where to? Mike |
Mike Gelvin Send message Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0 |
Looking at the stdout file, it appears that it did indeed restart due to a failed heartbeat. It is, however, using the exact same command line, including the seed. So I am going to let it run and see if it's still stuck at 0. |
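The reason an identical seed matters is that a seeded pseudo-random computation replays the same sequence of choices, so a restarted run can be expected to reach the same state again. The toy example below illustrates this with Python's `random` module; it is not Rosetta's generator or algorithm, only the general principle.

```python
import random
from typing import List


def trajectory(seed: int, steps: int) -> List[int]:
    """Return a pseudo-random walk that depends only on the seed."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    position = 0
    path = []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


if __name__ == "__main__":
    first = trajectory(seed=12345, steps=20)
    second = trajectory(seed=12345, steps=20)
    print(first == second)
```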
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 10 Mar 06 Posts: 21 Credit: 5,515 RAC: 0 |
Ah, okay... Well hopefully it'll do it again... Let me know how it goes... |
©2024 University of Washington
http://www.bakerlab.org