Report "stuck at 1%" bugs here

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 905 - Posted: 18 Mar 2006, 16:14:35 UTC
Last modified: 18 Mar 2006, 16:16:56 UTC

Oops, I forgot to ask you to do one additional thing.

In Process Explorer, open the Options menu and choose Configure Symbols...

Can you set the Dbghelp.dll path to:

C:\Program Files\BOINC\DbgHelp.dll

After that, could you rerun the tests?

When things are working right you'll get something that looks like this:
rosetta_beta_4.93_windows_intelx86.exe!pairenergy+0x126
rosetta_beta_4.93_windows_intelx86.exe!fullatom_energy+0x1979
rosetta_beta_4.93_windows_intelx86.exe!scorefxn+0xb4e
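
For anyone comparing many traces by hand, here is a minimal Python sketch (not part of BOINC or Process Explorer) that splits a frame of the form module!symbol+0xoffset into its parts. The names `FRAME_RE`, `parse_frame` and `has_symbol` are made up for illustration. A frame with no "!symbol" part is one where the symbol lookup did not resolve.

```python
import re

# Matches "module!symbol+0xoffset". The "!symbol" part is optional because
# unresolved frames look like "module+0xoffset".
FRAME_RE = re.compile(
    r"^(?P<module>[^!+]*)(?:!(?P<symbol>[^+]+))?\+0x(?P<offset>[0-9a-fA-F]+)$"
)


def parse_frame(line):
    """Return a dict with module, symbol and offset, or None if no match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "symbol": match.group("symbol"),  # None when the symbol is unresolved
        "offset": int(match.group("offset"), 16),
    }


def has_symbol(line):
    """True when the frame was resolved to a function name."""
    frame = parse_frame(line)
    return frame is not None and frame["symbol"] is not None


if __name__ == "__main__":
    for text in (
        "rosetta_beta_4.93_windows_intelx86.exe!pairenergy+0x126",
        "rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6",
    ):
        print(text, "->", "resolved" if has_symbol(text) else "no symbol")
```

If most rosetta frames come back as "no symbol", the Dbghelp.dll path is probably still not set correctly.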

TIA.

----- Rom

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 906 - Posted: 18 Mar 2006, 17:08:22 UTC - in response to Message 905.  


Rom,

Data with Symbols:

Pass 1


for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550

Stack:
ntoskrnl.exe!KiDispatchInterrupt+0x7b
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
rosetta_beta_4.93_windows_intelx86.exe+0x32f6b6

for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
win32k.sys+0x19c2
win32k.sys+0xb72
win32k.sys!EngGetCurrentCodePage+0x3654
ntoskrnl.exe!KiReleaseSpinLock+0xae4
!local_unwind2+0x5fe830bb
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
USER32.DLL!DispatchMessageW+0x40
rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb
rosetta_beta_4.93_windows_intelx86.exe+0x26c504
KERNEL32.dll!ProcessIdToSessionId+0x17d

for CSwitchDelta 1 StartAddress WINMM.dll!timeSetEvent+0x2b0

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
ntoskrnl.exe!ObSetSecurityDescriptorInfo+0x62c
ntoskrnl.exe!KiReleaseSpinLock+0xae4
ntdll.dll!ZwWaitForMultipleObjects+0xb


Pass 2

for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550

Stack:
ntoskrnl.exe!KiDispatchInterrupt+0x7b
!local_unwind2+0x5fe830bb
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
rosetta_beta_4.93_windows_intelx86.exe+0x49aeda
rosetta_beta_4.93_windows_intelx86.exe+0x256bb5

for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
win32k.sys+0x19c2
win32k.sys+0xb72
win32k.sys!EngGetCurrentCodePage+0x3654
ntoskrnl.exe!KiReleaseSpinLock+0xae4
!local_unwind2+0x5fe830bb
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
USER32.DLL!DispatchMessageW+0x40
rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb
rosetta_beta_4.93_windows_intelx86.exe+0x26c504
KERNEL32.dll!ProcessIdToSessionId+0x17d

for CSwitchDelta 1 StartAddress WINMM.dll!timeSetEvent+0x2b0

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
ntoskrnl.exe!ZwYieldExecution+0x35f
ntoskrnl.exe!KiUnexpectedInterrupt+0x1ba
ntdll.dll!ZwWaitForMultipleObjects+0xb



Pass 3

for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550

Stack:
ntoskrnl.exe!KiDispatchInterrupt+0x7b
!local_unwind2+0x5fe830bb
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
rosetta_beta_4.93_windows_intelx86.exe+0x256b92

for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
win32k.sys+0x19c2
win32k.sys+0xb72
win32k.sys!EngGetCurrentCodePage+0x3654
ntoskrnl.exe!KiReleaseSpinLock+0xae4
!local_unwind2+0x5fe830bb
ntoskrnl.exe!PsSetLegoNotifyRoutine+0x83a
USER32.DLL!DispatchMessageW+0x40
rosetta_beta_4.93_windows_intelx86.exe+0x47b2fb
rosetta_beta_4.93_windows_intelx86.exe+0x26c504
KERNEL32.dll!ProcessIdToSessionId+0x17d

for CSwitchDelta 1 StartAddress WINMM.dll!timeSetEvent+0x2b0

Stack:
ntoskrnl.exe!KiUnexpectedInterrupt+0x183
ntoskrnl.exe!ZwYieldExecution+0x35f
ntoskrnl.exe!KiUnexpectedInterrupt+0x1ba
ntdll.dll!ZwWaitForMultipleObjects+0xb


Good luck with this!
Mike

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 908 - Posted: 19 Mar 2006, 0:35:50 UTC

Mike,

Using Process Explorer again, can you look at the thread state for each thread?

What is the base priority and dynamic priority for each thread in your list?

It should be visible on the Threads tab on the process properties dialog box.

TIA.

----- Rom
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 913 - Posted: 19 Mar 2006, 7:08:07 UTC - in response to Message 908.  


More Info:

for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550

ThreadID: 2716
State: Ready
Kernel Time: 0:00:01.131 (not moving)
User Time: 18:34:50.250 (climbing fast)
Base Priority: 1
Dynamic Priority: 1

for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf

ThreadID: 2680
State: Ready
Kernel Time: 0:00:00.828 (not moving)
User Time: 0:00:00.187 (not moving)
Base Priority: 4
Dynamic Priority: 6

for CSwitchDelta 1 StartAddress WINMM.dll!timeSetEvent+0x2b0

ThreadID: 2720
State: Wait:UserRequest
Kernel Time: 0:00:00.000 (not moving)
User Time: 0:00:00.000 (not moving)
Base Priority: 15
Dynamic Priority: 15
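
In case it helps when reading these numbers, here is a small Python sketch that converts the H:MM:SS.mmm time strings to seconds and picks out the thread that has consumed the most user time. The values are typed in by hand from the readings above; the dict layout and the names `parse_cpu_time` and `busiest_thread` are assumptions for illustration, not anything Process Explorer exports.

```python
def parse_cpu_time(text):
    """Convert an 'H:MM:SS.mmm' string into seconds as a float."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Readings copied by hand from the Threads tab (see the figures above).
threads = [
    {"tid": 2716, "state": "Ready", "user": "18:34:50.250", "base": 1, "dynamic": 1},
    {"tid": 2680, "state": "Ready", "user": "0:00:00.187", "base": 4, "dynamic": 6},
    {"tid": 2720, "state": "Wait:UserRequest", "user": "0:00:00.000", "base": 15, "dynamic": 15},
]


def busiest_thread(rows):
    """Return the row with the largest accumulated user time."""
    return max(rows, key=lambda row: parse_cpu_time(row["user"]))


if __name__ == "__main__":
    worker = busiest_thread(threads)
    print("Busiest thread:", worker["tid"], "state:", worker["state"])
```

On this data it picks thread 2716, the one at base priority 1 whose user time keeps climbing, which is consistent with the worker thread computing without the progress figure advancing.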


Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 915 - Posted: 19 Mar 2006, 7:37:58 UTC

Mike,

Are you familiar with the Windows debugging tools?

The reason I ask is that if I could get a dump of the process, this might go quite a bit quicker.

Would you be game for trying to get me a dump?

BennyRop
Joined: 11 Mar 06
Posts: 14
Credit: 674
RAC: 0
Message 916 - Posted: 19 Mar 2006, 8:50:40 UTC

Or temporarily opening two holes in your firewall/router so that the system could be taken over through RealVNC? (emailing Rom the ip#, RealVNC name and password) Granted, it's something I'd only do with someone I trusted. :)
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 924 - Posted: 19 Mar 2006, 17:30:09 UTC - in response to Message 915.  
Last modified: 19 Mar 2006, 17:34:32 UTC

This is why I was suggesting direct contact. I am familiar with VS tools for remote debugging, but I always have the source, so I can attach to a remote process and set breakpoints and such. How to debug without source is something I'm not sure about. (Never had to, so I never figured it out.)

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 925 - Posted: 19 Mar 2006, 17:32:39 UTC - in response to Message 916.  


I'm sorry, direct access is not possible. I'm stretching the rules just running foreign code.
Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 926 - Posted: 19 Mar 2006, 18:29:27 UTC - in response to Message 924.  

Sweet.

Attach to the process with Visual Studio.
Break on all threads
From the debug menu select Save Dump As.
Be sure to change the dump type to dump with heap.
And give it some sort of name.

With WinZip compression the file should shrink to 20MB or so.

Do you have a web server I would be able to dl it from? Or should we try email?

----- Rom
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 927 - Posted: 19 Mar 2006, 19:21:24 UTC - in response to Message 926.  
Last modified: 19 Mar 2006, 19:22:23 UTC



Rom,

Ok, the latest. Like I said, I'm unfamiliar with debugging without source code. So I attached to the process and broke all threads. I looked for Save Dump As, but it wasn't in the Debug menu, so I did some checking in Help and discovered a passage that essentially said the symbols had to be loaded to allow a dump. So I did a “Continue” and detached from the process to investigate how to load the symbols. After figuring that out, I looked at the run time for the Rosetta Beta process and discovered it had started over at 0 CPU time. Do you know if this represents a true restart? If so, I may no longer be stuck at 0. Anyway, I now have the dump file; it's zipped and its size is under 13 meg, easy enough for me to email.

1) Is it possible this is of no more value because I might no longer be stuck?
2) Should I allow it to keep running and see? (I have it swapped out at the moment with 11 minutes of run time according to Task Manager.)
3) Do you still want the file?
4) Where to?

Mike

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 928 - Posted: 19 Mar 2006, 20:36:20 UTC

Looking at the stdout file, it appears that it indeed did restart due to a failed heartbeat.
It is, however, using the exact same command line, including the seed. So I am going to let it run and see if it's still stuck at 0.

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 929 - Posted: 19 Mar 2006, 21:28:14 UTC

Ah, okay...

Well hopefully it'll do it again...

Let me know how it goes...

Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 931 - Posted: 20 Mar 2006, 6:41:39 UTC - in response to Message 929.  
Last modified: 20 Mar 2006, 6:42:31 UTC

OK, I'm 10+ hours in and still stuck at 1%. I think it will stay stuck. If you concur I will gather the info. In the meantime, I am going to preempt it.
Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 932 - Posted: 20 Mar 2006, 7:09:54 UTC

Well, go ahead and get a dump of it. I'm glad it at least repro'ed for you.

----- Rom
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 933 - Posted: 20 Mar 2006, 7:43:49 UTC - in response to Message 932.  

Got it... where to?
Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 934 - Posted: 20 Mar 2006, 14:47:02 UTC
Last modified: 20 Mar 2006, 14:47:17 UTC

Could you send it to this address:

romw at romwnet.org

It is currently set up with unrestricted sizes for sending and receiving email.

----- Rom
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 935 - Posted: 20 Mar 2006, 17:55:23 UTC - in response to Message 934.  
Last modified: 20 Mar 2006, 18:04:20 UTC

I sent you an email with the following content... did you get it?

"Looks like I’m having trouble getting the 12 meg out of the gate here. My main email ISP has a 5 meg limit, and another has a 10 meg limit (I have direct access to both). Yet another ISP I have an account with is unlimited, but I have no direct connection with them and they don’t allow relaying. So it looks like I am going to have to carve the files up. Do you have a preferred method? I can create segmented Zips, or there is a shareware program I have used in the past called EZSplit. Or I could just write a short program to cut it up."

Mike
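
For reference, the "short program to cut it up" option could look something like the Python sketch below. It is only an illustration, not what was actually used in this thread: the `.partNNN` naming and the function names `split_file` and `join_files` are assumptions, and the chunk size would need to be chosen to fit under the mail server's attachment limit.

```python
def split_file(path, chunk_size):
    """Write path.part000, path.part001, ... and return the part paths in order."""
    part_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of file reached
            part_path = f"{path}.part{index:03d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths


def join_files(part_paths, out_path):
    """Concatenate the parts, in the order given, back into one file."""
    with open(out_path, "wb") as dst:
        for part_path in part_paths:
            with open(part_path, "rb") as src:
                dst.write(src.read())
```

The receiver only needs `join_files` and the parts in the right order, which is why the part names carry a zero-padded index.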


Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 936 - Posted: 20 Mar 2006, 18:07:09 UTC - in response to Message 935.  


I didn't get it. Go ahead and create mini RARs then; WinRAR can break up the dump file and reassemble it without too much grief.

----- Rom
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 937 - Posted: 20 Mar 2006, 18:51:16 UTC - in response to Message 936.  



Elvis has left the building.
UBT - Timbo
Joined: 16 Feb 06
Posts: 3
Credit: 3,924
RAC: 0
Message 951 - Posted: 22 Mar 2006, 14:51:14 UTC

Hi Rom,

As per instructions in the other thread, I have aborted the following RALPH 4.93 WUs, as they were stuck at 1%:

22/03/2006 14:53:43|ralph@home|Unrecoverable error for result HB_BARCODE_30_1a19A_352_138_0 (aborted via GUI RPC)
22/03/2006 14:53:48|ralph@home|Unrecoverable error for result HB_BARCODE_30_1a68__352_138_0 (aborted via GUI RPC)
22/03/2006 14:53:55|ralph@home|Unrecoverable error for result HB_BARCODE_30_1ctf__352_137_0 (aborted via GUI RPC)
22/03/2006 14:54:00|ralph@home|Unrecoverable error for result HB_BARCODE_30_1ctf__352_136_0 (aborted via GUI RPC)
22/03/2006 14:54:11|ralph@home|Unrecoverable error for result HB_BARCODE_30_4ubpA_352_135_0 (aborted via GUI RPC)
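
If anyone wants to pull the aborted result names out of a longer message log, a rough Python sketch follows. It assumes the pipe-separated "date time|project|message" layout seen in the lines above; the regular expressions and the function name `aborted_results` are made up for illustration and are not part of BOINC.

```python
import re

# One log line: "DD/MM/YYYY HH:MM:SS|project|message"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\|"
    r"(?P<project>[^|]*)\|(?P<message>.*)$"
)

# The message text logged when a result is aborted from the manager.
ABORT_RE = re.compile(
    r"^Unrecoverable error for result (?P<result>\S+) \(aborted via GUI RPC\)$"
)


def aborted_results(lines):
    """Return the result names from lines that report a manual abort."""
    names = []
    for line in lines:
        line_match = LINE_RE.match(line.strip())
        if line_match is None:
            continue  # not a log line in the expected layout
        abort_match = ABORT_RE.match(line_match.group("message"))
        if abort_match is not None:
            names.append(abort_match.group("result"))
    return names
```

Lines such as "Pausing result ..." are skipped, so only the work units that were actually aborted end up in the list.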

Have 2 more that are progressing:


22/03/2006 14:54:23|ralph@home|Pausing result HB_BARCODE_30_5croA_352_136_0 (left in memory)
22/03/2006 14:56:02|ralph@home|Pausing result HB_BARCODE_30_1bk2__352_137_0 (left in memory)


and now both are at around 37% at:

Stage: "Ab initio".
Model: 95
Step: 325,000+

I had to change the CPU resource to 2 days (from 4 days), as these 2 WUs are preventing me from crunching for any other project, but I'm happy to help with 48 hours of solid RALPH crunching if it helps figure out the problem.

I now have some 4.94 WUs.

regards,

Tim



©2024 University of Washington
http://www.bakerlab.org