Posts by Mike Gelvin

21) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1476)
Posted 4 May 2006 by Mike Gelvin
Post:
This computer is headless. Remote access only. Hence no screensaver.

Mike, I use VNC to see the graphics on my remote monitorless, keyboardless, and mouseless puter. I click on the WU from the task tab and then view graphics. No screensaver here either. If it's a service install you're hosed.

tony


It is a service install. I forgot about the "View Graphics" button. I do VNC into this computer. OK... 1.041% complete after 40 hours. Stage: Full atom relax, Mode 1, Step 100, Accepted RMSD 50.36, Accepted Energy -19.40622, whatever this all means.
22) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1473)
Posted 4 May 2006 by Mike Gelvin
Post:
36 hours and still stuck at 1.04%... the watchdog is NOT working... is anyone out there?


Hi Mike

Have you checked the graphics to see if the steps or % have changed?

The % should show with 1.04?? and not as on boinc manager with only 1,04.

Anders n


This computer is headless. Remote access only. Hence no screensaver.


23) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1471)
Posted 4 May 2006 by Mike Gelvin
Post:
ROM,
I currently have a rosetta_beta_5.06 that has been running 14 hours+ with 1.04% for progress. I have debug capability on this computer, any suggestions, or just Abort?

It's labeled: WATCHDOG_KILL_VERY_LONG_JOBS_414_3

I notice that 2 others ran this unit and it died at 1.5 hours and 1.8 hours

Running on Win2000 SP4, leave in memory is set.


http://ralph.bakerlab.org/workunit.php?wuid=83793

Now at 24 hours and still stuck at 1.04%.


36 hours and still stuck at 1.04%... the watchdog is NOT working... is anyone out there?
24) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1469)
Posted 4 May 2006 by Mike Gelvin
Post:
ROM,
I currently have a rosetta_beta_5.06 that has been running 14 hours+ with 1.04% for progress. I have debug capability on this computer, any suggestions, or just Abort?

It's labeled: WATCHDOG_KILL_VERY_LONG_JOBS_414_3

I notice that 2 others ran this unit and it died at 1.5 hours and 1.8 hours

Running on Win2000 SP4, leave in memory is set.


http://ralph.bakerlab.org/workunit.php?wuid=83793

Now at 24 hours and still stuck at 1.04%.
25) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1467)
Posted 3 May 2006 by Mike Gelvin
Post:
ROM,
I currently have a rosetta_beta_5.06 that has been running 14 hours+ with 1.04% for progress. I have debug capability on this computer, any suggestions, or just Abort?

It's labeled: WATCHDOG_KILL_VERY_LONG_JOBS_414_3

I notice that 2 others ran this unit and it died at 1.5 hours and 1.8 hours

Running on Win2000 SP4, leave in memory is set.
26) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1422)
Posted 28 Apr 2006 by Mike Gelvin
Post:
4/28/2006 12:53:48 AM||Rescheduling CPU: files downloaded
4/28/2006 3:15:49 AM||Rescheduling CPU: application exited
4/28/2006 3:15:49 AM|ralph@home|Computation for task WATCHDOG_KILL_VERY_LONG_JOBS_424_9_2 finished
4/28/2006 3:15:50 AM|ralph@home|Unrecoverable error for result WATCHDOG_KILL_VERY_LONG_JOBS_424_9_2 (<file_xfer_error> <file_name>WATCHDOG_KILL_VERY_LONG_JOBS_424_9_2_0</file_name> <error_code>-161</error_code></file_xfer_error>)


result: http://ralph.bakerlab.org/result.php?resultid=97709

Win 2000 SP4 Intel Pentium 4 @ 2.4GHz w/ 512Meg RAM


There is an additional message in the result about a nonexistent file:
GZIP SILENT FILE: .xx1enh.out
WARNING! attempt to gzip file .xx1enh.out failed: file does not exist.
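
For anyone sifting through message logs like the one above, here is a minimal Python sketch that splits a log line into its fields and pulls out the `<error_code>` value. It assumes the layout seen in the quoted lines (timestamp, project, message, separated by `|`); the names `LogEntry`, `parse_log_line`, and `transfer_error_code` are made up for illustration and are not part of BOINC.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One message-log line, split on the '|' separators (assumed layout)."""
    timestamp: str
    project: str   # empty for client-wide messages such as "Rescheduling CPU"
    message: str


def parse_log_line(line: str) -> LogEntry:
    # Split at most twice so any '|' inside the message text is preserved.
    timestamp, project, message = line.split("|", 2)
    return LogEntry(timestamp.strip(), project.strip(), message.strip())


# Matches the numeric code inside <error_code>...</error_code>.
_ERROR_CODE = re.compile(r"<error_code>(-?\d+)</error_code>")


def transfer_error_code(message: str) -> Optional[int]:
    """Return the error code embedded in the message, or None if absent."""
    match = _ERROR_CODE.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "4/28/2006 3:15:50 AM|ralph@home|Unrecoverable error for result "
        "WATCHDOG_KILL_VERY_LONG_JOBS_424_9_2 (<file_xfer_error> "
        "<file_name>WATCHDOG_KILL_VERY_LONG_JOBS_424_9_2_0</file_name> "
        "<error_code>-161</error_code></file_xfer_error>)"
    )
    entry = parse_log_line(sample)
    print(entry.project, transfer_error_code(entry.message))
```

Run over a saved log, this makes it easy to list only the results that failed with a transfer error.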
27) Message boards : RALPH@home bug list : Bug reports for Ralph 5.03 (Message 1334)
Posted 24 Apr 2006 by Mike Gelvin
Post:
24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
http://ralph.bakerlab.org/result.php?resultid=94190



I thought credit was supposed to be granted on these?
28) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 1218)
Posted 18 Apr 2006 by Mike Gelvin
Post:
Carlos, have they said they can't use older results to help debug the future versions? Have they said "always delete all results after a new version comes out"? I haven't seen that. Hence, I'm reporting it and waiting for further instructions from Mod9/developers. I'd hate to dump it if it can be useful. Maybe they've already found the problem. If so someone should say something. This is an alpha project. Boinc Alpha wants reports from previous versions. They still have the 4.99 threads listed for use, which says they still want reports or haven't "closed" them yet. Either way, I want someone to tell me if this is useful (see my first and succeeding posts). I will continue to post this until someone says otherwise.

I question my posts qualifying for acceptance to this thread, but it started as a 1% bug. Mod9 can feel free to move or delete it. All I need is some guidance from management as to how I can best help them.

tony



Tony,
I agree with you. You never know what the next version is really testing, might not be anything to do with the 1% bug and hence your observations/questions are very valid. I suspect they (devs) need all the help they can get.
Mike
29) Message boards : RALPH@home bug list : OLD- Bug reports for Windows Ver - 5.00 (and higher) (Message 1138)
Posted 14 Apr 2006 by Mike Gelvin
Post:
Tons of access violation errors. I have a later client installed, so there's lots of info in the return. Follow my name link to my computer.
30) Message boards : RALPH@home bug list : Report "failure when switching projects without keeping applications in memory" bugs here (Message 1001)
Posted 27 Mar 2006 by Mike Gelvin
Post:
Eight work units in a row have completed without a hitch. All were 4.94 running 8 hours of CPU time with swap outs (out of memory) every 2 hours.

Version 4.95 is in the Queue.

Win 2000 SP4 Intel Pent 4 2.40GHz

Looks like this one is put to bed. Thanks!
31) Message boards : Current tests : Switching between projects with applications removed from memory (Message 938)
Posted 22 Mar 2006 by Mike Gelvin
Post:
It happened again.

3/19/2006 5:36:43 PM|rosetta@home|Pausing result HB_BARCODE_30_1bq9A_351_14302_0 (removed from memory)
3/19/2006 5:36:44 PM|rosetta@home|Unrecoverable error for result HB_BARCODE_30_1bq9A_351_14302_0 ( - exit code -164 (0xffffff5c))
3/19/2006 5:36:44 PM||Rescheduling CPU: process exited
3/19/2006 5:36:44 PM|rosetta@home|Computation for result HB_BARCODE_30_1bq9A_351_14302_0 finished

Rosetta WU: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=11537937
Result: http://boinc.bakerlab.org/rosetta/result.php?resultid=14251499

So this is it, I'm changing back to keeping WU's in memory while preempted until you get this bug fixed. Else you devs should say that we can't crunch Rosetta and Ralph WU's on the same computer!




You can crunch Ralph and Rosetta on the same computer. But if you don’t leave the apps in memory, then Rosetta (from the Rosetta project, NOT Ralph) does indeed have a good chance of failing. They have not yet updated Rosetta with the Ralph “leave in memory” fix.
When Rosetta gets fixed, I suspect you will see an app version greater than or equal to 4.93
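
The log quoted above shows the same exit code two ways: `-164` and `0xffffff5c`. These are one 32-bit value read as signed and as unsigned. A small Python sketch of the conversion (the helper names are mine, for illustration only):

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    # If the top bit is set the number is negative in two's complement.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(hex(to_unsigned32(-164)))   # 0xffffff5c
    print(to_signed32(0xFFFFFF5C))    # -164
```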
32) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 937)
Posted 20 Mar 2006 by Mike Gelvin
Post:

I didn't get it. Go ahead and create mini rars then, winrar can break up the dump file and reassemble it without too much grief.

----- Rom


Elvis has left the building.
33) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 935)
Posted 20 Mar 2006 by Mike Gelvin
Post:
Could you send it to this address:

romw at romwnet.org

It is currently setup with unrestricted sizes for sending and receiving email.

----- Rom

I sent you an email with the following content... did you get it?

"Looks like I’m having trouble getting the 12 meg out of the gate here. My main email ISP has a 5 meg limit, another has a 10 meg limit (both I have direct access to). Yet another ISP I have an account with is unlimited, but I have no direct connection with them and they don’t allow relaying. So it looks like I am going to have to carve the files up. Do you have a preferred method? I can create segmented Zips, or there is a shareware program I have used in the past called EZSplit. Or I could just write a short program to cut it up."

Mike
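
The "short program to cut it up" mentioned above could look something like the following. This is only a sketch in Python of one way to do it, not what was actually used: the function names, the `.partNNN` naming scheme, and the chunk size are all assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def split_file(source: Path, chunk_size: int) -> List[Path]:
    """Write source out as numbered parts of at most chunk_size bytes each."""
    parts: List[Path] = []
    with source.open("rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:          # empty read means end of file
                break
            # Hypothetical naming scheme: dump.zip.part000, dump.zip.part001, ...
            part = source.with_name(f"{source.name}.part{index:03d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: Iterable[Path], target: Path) -> None:
    """Concatenate the parts, in the order given, back into one file."""
    with target.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())


if __name__ == "__main__":
    # Example: stay under a 5 MB attachment limit (file name is a placeholder).
    pieces = split_file(Path("dump.zip"), 4 * 1024 * 1024)
    print([p.name for p in pieces])
```

The receiver runs `join_files` on the parts sorted by name; the zero-padded index keeps that sort order correct.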

34) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 933)
Posted 20 Mar 2006 by Mike Gelvin
Post:
well go ahead and get a dump of it. I'm glad it at least repro'ed for you.

----- Rom

Got it... where to?
35) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 931)
Posted 20 Mar 2006 by Mike Gelvin
Post:
Ah, okay...

Well hopefully it'll do it again...

Let me know how it goes...

OK, I'm 10+ hours in and still stuck at 1%. I think it will stay stuck. If you concur I will gather the info. In the meantime, I am going to preempt it.
36) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 928)
Posted 19 Mar 2006 by Mike Gelvin
Post:
Looking at the stdout file, it appears that it indeed did restart due to a failed heartbeat.
It is, however, using the exact same command line, including the seed. So I am going to let it run and see if it's still stuck at 0.
37) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 927)
Posted 19 Mar 2006 by Mike Gelvin
Post:

Sweet.

Attach to the process with Visual Studio.
Break on all threads
From the debug menu select Save Dump As.
Be sure to change the dump type to dump with heap.
And give it some sort of name.

With winzip compression the file should shrink to 20MB or so.

Do you have a web server I would be able to dl it from? Or should we try email?

----- Rom



Rom,

Ok, the latest. Like I said, I'm unfamiliar with debugging without source code. So I attached to the process and broke all threads. I looked for the Save Dump As option. It wasn’t in the debug menu, so I did some checking in Help and discovered a passage that essentially said the symbols had to be loaded to allow a dump. So I did a “Continue” and detached from the process to investigate how to load the symbols. After figuring that out, I looked at the run time for the Rosetta Beta process and discovered it had started over at 0 CPU time. Do you know if this represents a true restart? If so, I may no longer be stuck at 0. Anyway, I now have the dump file. It's zipped and its size is under 13 meg, easy enough for me to email.

1) Is it possible this is of no more value cause I might no longer be stuck?
2) Should I allow it to keep running and see? ( I have it swapped out at the moment with 11 minutes of run time according to task manager)
3) Do you still want the file?
4) Where to?

Mike
38) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 925)
Posted 19 Mar 2006 by Mike Gelvin
Post:
Or temporarily opening two holes in your firewall/router so that the system could be taken over through RealVNC? (emailing Rom the ip#, RealVNC name and password) Granted, it's something I'd only do with someone I trusted. :)


I'm sorry, direct access is not possible. I'm stretching the rules just running foreign code.
39) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 924)
Posted 19 Mar 2006 by Mike Gelvin
Post:
Mike,

Are you familiar with the Windows debugging tools?

The reason I ask, is if I could get a dump of the process this might go quite a bit quicker.

Would you be game for trying to get me a dump?

This is why I was suggesting direct contact. I am familiar with VS tools for remote debugging, but I always have the source, so I can attach to a remote process and set breakpoints and such. How to debug without source is something I'm not sure about. (Never had to, so I never figured it out.)
40) Message boards : RALPH@home bug list : Report "stuck at 1%" bugs here (Message 913)
Posted 19 Mar 2006 by Mike Gelvin
Post:
Mike,

Using Process Explorer again, can you look at the thread state for each thread?

What is the base priority and dynamic priority for each thread in your list?

It should be visible on the Threads tab on the process properties dialog box.

TIA.

----- Rom


More Info:

for CSwitchDelta approx 90 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x1de550

ThreadID 2716
State Ready
Kernel Time 0:00:01.131 not moving
User Time 18:34:50.250 and climbing fast
Base Priority 1
Dynamic Priority 1

for CSwitchDelta 31 StartAddress rosetta_beta_4.93_windows_intelx86.exe+0x49fcf

ThreadID 2680
State Ready
Kernel Time 0:00:00.828 not moving
User Time 0:00:00.187 not moving
Base Priority 4
Dynamic Priority 6

for CSwitchDelta 1 StartAddress WINMM.dll!timeSetEvent+0x2b0

ThreadID 2720
State Wait:UserRequest
Kernel Time 0:00:00.000 not moving
User Time 0:00:00.000 not moving
Base Priority 15
Dynamic Priority 15





