Report "stuck at 1%" bugs here

Nuadormrac
Joined: 22 Feb 06
Posts: 68
Credit: 11,362
RAC: 0
Message 1106 - Posted: 13 Apr 2006, 0:32:00 UTC
Last modified: 13 Apr 2006, 0:35:13 UTC

Actually, in Windows XP (not sure about Linux, as I haven't looked into it there), there is a way to type Japanese characters using a US keyboard... It requires setting up an IME (Input Method Editor) and enabling multilingual support. Once you switch to the language mode you want to type in, the character combinations you type are converted on screen into the characters they represent (in this case hiragana or katakana). It also has a "find kanji" function to convert the appropriate text into kanji...

Many people in Japan (from what I heard in Japanese 201) also use standard English keyboards rather than hiragana-based keyboards, and use an IME to convert the text in software as they type. The main reason, from what I've heard, is that it's faster than a native Japanese keyboard.

That leaves the overriding problem, however: speaking their language well enough to write a message. Sorry to say, I'm not fluent enough to do that yet...

Oh, and on CPDN, make sure your backup is recent before you upgrade, and have networking disabled when CPDN starts. That way, if something does happen, CPDN can't "phone home" and you can restore the WU... My backup is about 3 months (model months, not actual months) out of date... That represents a few hours of crunch time or so on my A64...
ID: 1106
[B^S] Dr. Bill Skiba
Joined: 15 Feb 06
Posts: 4
Credit: 6,496
RAC: 0
Message 1182 - Posted: 15 Apr 2006, 12:01:59 UTC

I aborted this WU, which had been stuck at 1.09% for over 3 hours.

https://ralph.bakerlab.org/result.php?resultid=86049

ID: 1182
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1185 - Posted: 15 Apr 2006, 19:59:04 UTC
Last modified: 15 Apr 2006, 20:02:20 UTC

On another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted BOINC after getting a fatal error message and seeing this WU at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

OH wait a second, the CPU time dropped back to 54 hours and it's working again. Stage ab initio, model 1, step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running over RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] It's now at 1.041%, so it's not stuck anymore. I'll let it run unless otherwise instructed.
Formerly
mmciastro. Name and avatar changed for a change

The New Online Helpsytem help is just a call away.
ID: 1185
MatthewBChambers
Joined: 13 Mar 06
Posts: 4
Credit: 5,367
RAC: 0
Message 1186 - Posted: 15 Apr 2006, 23:42:17 UTC

(I don't know if this should be here, at the v4.99 thread, or both.)


I have a 4.99 for Windows work unit that has been stuck at about 1% progress for many hours (at least 8 hours of its CPU time). It currently says 1.13% after 16:17:07 CPU time, with 30:25:38 to go, supposedly. I just aborted it since a new version is out.



Host ID:
https://ralph.bakerlab.org/show_host_detail.php?hostid=2404

Work unit ID:
https://ralph.bakerlab.org/workunit.php?wuid=76544

Result ID:
https://ralph.bakerlab.org/result.php?resultid=85055

Here is the BOINC startup info:
4/15/2006 4:50:21 PM||Starting BOINC client version 5.2.13 for windows_intelx86
4/15/2006 4:50:21 PM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
4/15/2006 4:50:21 PM||Data directory: C:\Program Files\BOINC
4/15/2006 4:50:21 PM||Processor: 1 GenuineIntel x86 Family 6 Model 8 Stepping 6 863MHz
4/15/2006 4:50:21 PM||Memory: 383.30 MB physical, 922.22 MB virtual
4/15/2006 4:50:21 PM||Disk: 24.41 GB total, 19.34 GB free
4/15/2006 4:50:21 PM|rosetta@home|Computer ID: 197494; location: home; project prefs: default
4/15/2006 4:50:21 PM|boincsimap|Computer ID: 17955; location: home; project prefs: default
4/15/2006 4:50:21 PM|Einstein@Home|Computer ID: 594228; location: home; project prefs: default
4/15/2006 4:50:21 PM|LHC@home|Computer ID: 142531; location: home; project prefs: default
4/15/2006 4:50:21 PM|Predictor @ Home|Computer ID: 237773; location: home; project prefs: default
4/15/2006 4:50:21 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default
4/15/2006 4:50:21 PM|SETI@home|Computer ID: 2330542; location: home; project prefs: default
4/15/2006 4:50:21 PM|SZTAKI Desktop Grid|Computer ID: 17392; location: home; project prefs: default
4/15/2006 4:50:21 PM|World Community Grid|Computer ID: 31989; location: ; project prefs: default
4/15/2006 4:50:21 PM||General prefs: from boincsimap (last modified 2006-04-11 17:42:26)
4/15/2006 4:50:21 PM||General prefs: no separate prefs for home; using your defaults
4/15/2006 4:50:22 PM||Remote control not allowed; using loopback address
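
For anyone who wants to pull the per-project computer IDs out of a startup log like the one above, here is a minimal Python sketch. It assumes each line has the shape `timestamp|project|message` seen in this post; the function names `parse_boinc_line` and `computer_ids` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

def parse_boinc_line(line):
    """Split one 'timestamp|project|message' line into its three fields.

    The project field is empty for client-wide messages, so it is
    returned as None in that case.
    """
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project or None, message

def computer_ids(lines):
    """Map project name -> computer ID for 'Computer ID: N; ...' messages."""
    ids = {}
    for line in lines:
        _, project, message = parse_boinc_line(line)
        if project and message.startswith("Computer ID:"):
            # "Computer ID: 2404; location: home; ..." -> 2404
            ids[project] = int(message.split(";")[0].split(":")[1])
    return ids

sample = [
    "4/15/2006 4:50:21 PM||Starting BOINC client version 5.2.13 for windows_intelx86",
    "4/15/2006 4:50:21 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default",
    "4/15/2006 4:50:21 PM|rosetta@home|Computer ID: 197494; location: home; project prefs: default",
]

print(computer_ids(sample))
```

Matching the `ralph@home` ID against the host page linked in the post is a quick way to confirm the log and the report refer to the same machine.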
ID: 1186
bt1228
Joined: 22 Mar 06
Posts: 7
Credit: 9,385
RAC: 0
Message 1188 - Posted: 16 Apr 2006, 2:42:01 UTC

RALPH wu: FACONTACTS_NOFILTERS_1vie__381_1_0 using rosetta_beta version 499, has been running for 23:47:18 and is 1.041% complete. BOINC Mgr: 5.4.3

wu: https://ralph.bakerlab.org/workunit.php?wuid=78443
result: https://ralph.bakerlab.org/result.php?resultid=86069

I'll kill this WU when it hits 24:00:00.

--- bt
ID: 1188
Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 1201 - Posted: 17 Apr 2006, 11:10:09 UTC

I have a WU: HBLR_1.0_1ogw_377_23_1 using rosetta_beta version 499, which only gets to 1.30 % and then goes back to 1.13 % after switching to another project.

Result
Workunit

What should I do now?
ID: 1201
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1202 - Posted: 17 Apr 2006, 12:03:31 UTC - in response to Message 1185.  

OK, this unit is STILL running. I suspended boincsimap because it would cause an error that forced BOINC to be restarted. It's now at 89:32:49 CPU time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running; it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?
ID: 1202
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1203 - Posted: 17 Apr 2006, 12:17:18 UTC - in response to Message 1201.  
Last modified: 17 Apr 2006, 12:26:01 UTC

IMHO:
1) Install this debugger! Maybe it will help them find the bugs:
https://ralph.bakerlab.org/forum_thread.php?id=166
*Read the whole thread.

2) Abort all the WUs you have for rosetta_beta 4.99 and earlier versions.
We are now testing 5.00!
*It makes no sense to keep testing alpha versions of *OLD/obsolete* stuff!
Click signature for global team stats
ID: 1203
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1205 - Posted: 17 Apr 2006, 19:16:48 UTC - in response to Message 1202.  

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony
ID: 1205
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1210 - Posted: 18 Apr 2006, 2:55:31 UTC - in response to Message 1205.  

It's still chugging along, but from the model and step numbers I think it's going in circles.

Cpu time 104:11:17, 2.795% done, 108:17:52 remaining
graph shows 17 red dots
Stage: full atom relax, Model 1, step 31473.

It was at model 1, step 31878 many hours ago. This is not being switched, paused, or removed from memory.
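
The "going in circles" check can be sketched in a few lines of Python. This is only an illustration: it assumes a step counter should never go down while the stage and model stay the same, and it uses the two readings reported in this thread. The names `hms_to_seconds` and `find_regressions` are hypothetical.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' CPU time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def find_regressions(observations):
    """Return consecutive pairs where the step dropped within one stage and model.

    Each observation is (cpu_time, stage, model, step).
    """
    ordered = sorted(observations, key=lambda obs: hms_to_seconds(obs[0]))
    hits = []
    for prev, cur in zip(ordered, ordered[1:]):
        same_stage = prev[1].lower() == cur[1].lower()
        same_model = prev[2] == cur[2]
        if same_stage and same_model and cur[3] < prev[3]:
            hits.append((prev, cur))
    return hits

observations = [
    ("96:40:29", "full atom relax", 1, 31878),
    ("104:11:17", "full atom relax", 1, 31473),
]

for prev, cur in find_regressions(observations):
    print(f"step fell from {prev[3]} to {cur[3]} between {prev[0]} and {cur[0]}")
```

A drop like this does not prove a loop on its own, since the application could legitimately restart a stage, but it is a cheap signal that a work unit deserves a closer look.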
ID: 1210
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1211 - Posted: 18 Apr 2006, 3:14:34 UTC - in response to Message 1210.  
Last modified: 18 Apr 2006, 3:18:39 UTC

OK, I just watched it switch back to model 1, step 130. Now it's 2.8155% done and there are NO red dots, but plenty of teal ones, and they're in a completely different pattern from before. CPU time 104:30:12. It had gotten up to model 1, step 31535 (that was the last I saw, and only a few minutes had passed, so it could have gone well beyond that).

In the time it took to type this, the steps jumped up to 28000ish. It only stayed in ab initio maybe a minute or two, and is now in full atom relax, and my red dots are back. I think the scale prevents me from seeing them all yet.
ID: 1211
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1212 - Posted: 18 Apr 2006, 3:47:34 UTC

Is switching from ab initio to full atom relax to ab initio to full atom relax, and on and on, within the same model normal?
ID: 1212
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1213 - Posted: 18 Apr 2006, 13:35:43 UTC - in response to Message 1211.  

OK, still running, 22 red dots, model 1, step 32190.
114:42:18 cpu time, 3.306% done, 117:51:40 remaining
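
As a rough sanity check on those numbers, the readings posted in this thread can be turned into an average progress rate. The Python sketch below assumes progress is linear between the first and last reading, which the looping described here makes doubtful, so treat the result as an order-of-magnitude figure only. It is not how BOINC computes its own "remaining" estimate.

```python
def hms_to_hours(text):
    """Convert an 'H:MM:SS' CPU time string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600

# (CPU time, percent done) as reported in this thread
readings = [
    ("89:32:49", 2.291),
    ("96:40:29", 2.539),
    ("104:11:17", 2.795),
    ("114:42:18", 3.306),
]

first_time, first_pct = readings[0]
last_time, last_pct = readings[-1]

elapsed_hours = hms_to_hours(last_time) - hms_to_hours(first_time)
rate = (last_pct - first_pct) / elapsed_hours   # percent per CPU hour
remaining_hours = (100.0 - last_pct) / rate     # naive linear extrapolation

print(f"{rate:.4f} % per hour, about {remaining_hours:.0f} CPU hours to go")
```

At roughly 0.04% per CPU hour, the work unit would need on the order of a couple of thousand more CPU hours to reach 100%, far more than the "remaining" figure shown in the manager.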
ID: 1213
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1214 - Posted: 18 Apr 2006, 14:41:28 UTC
Last modified: 18 Apr 2006, 14:49:41 UTC

mmciastro, the current version is Rosetta_beta 5.00.

Why are you reporting bugs from old versions here?

If you want to finish your old-version jobs, you are free to do so;
however, because version 4.99 had obvious bugs, I believe
this will only increase your credit loss.

Better that you abort all of them and help us test the current version,
which is available for Windows, Mac and Linux.

But do what you think is right -:(

PS: If possible, do not post more bugs from *obsolete* versions.

This only serves to confuse the developers, who may then believe
that the 1% bug is still there ...
and thus delays them in making 5.00 the production version of Rosetta.

Thanks
ID: 1214
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1215 - Posted: 18 Apr 2006, 14:55:30 UTC

Carlos, have they said they can't use older results to help debug future versions? Have they said "always delete all results after a new version comes out"? I haven't seen that. Hence, I'm reporting it and waiting for further instructions from Mod9/developers. I'd hate to dump it if it can be useful. Maybe they've already found the problem. If so, someone should say something. This is an alpha project. BOINC Alpha wants reports from previous versions. They still have the 4.99 threads listed for use, which says they still want reports or haven't "closed" them yet. Either way, I want someone to tell me if this is useful (see my first and succeeding posts). I will continue to post this until someone says otherwise.

I question my posts qualifying for acceptance to this thread, but it started as a 1% bug. Mod9 can feel free to move or delete it. All I need is some guidance from management as to how I can best help them.

tony
ID: 1215
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1216 - Posted: 18 Apr 2006, 15:20:27 UTC
Last modified: 18 Apr 2006, 15:20:39 UTC

Older versions can be helpful. See the following WU (this is very typical as I scan my results page):

Result  Computer  Sent                      Reported                  Server state  Outcome       Client state  CPU time (s)  Claimed  Granted
79398   566       7 Apr 2006 22:24:47 UTC   8 Apr 2006 8:51:57 UTC    Over          Client error  Computing     212.18        0.40     ---
81388   1531      8 Apr 2006 14:53:27 UTC   15 Apr 2006 18:53:05 UTC  Over          Client error  Computing     130.30        0.27     ---
88134   2175      15 Apr 2006 18:53:29 UTC  18 Apr 2006 11:42:31 UTC  Over          Success       Done          15,423.92     24.23    24.23

Notice how the first two users had "client error computing" (these were "unhandled exceptions"), and yet I did it successfully? This tells them that the WU itself can be finished and isn't bad in all cases; it's just that some conditional difference exists between the first two users and myself. The question that can help debugging becomes "what's different between the first two users and the third?"

If I had aborted it, they wouldn't have this info to work with.
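
The comparison described here can be sketched in Python. The three records below are the results listed in this post; the field names (`result_id`, `host_id`, `outcome`, `cpu_seconds`) are hypothetical labels chosen for illustration, and reading the second number in each row as the host ID is an assumption.

```python
from collections import defaultdict

# The three results for this work unit, as listed in the post
results = [
    {"result_id": 79398, "host_id": 566, "outcome": "Client error", "cpu_seconds": 212.18},
    {"result_id": 81388, "host_id": 1531, "outcome": "Client error", "cpu_seconds": 130.30},
    {"result_id": 88134, "host_id": 2175, "outcome": "Success", "cpu_seconds": 15423.92},
]

def hosts_by_outcome(results):
    """Group host IDs by the outcome they reported."""
    groups = defaultdict(list)
    for result in results:
        groups[result["outcome"]].append(result["host_id"])
    return dict(groups)

def has_mixed_outcomes(results):
    """True when the same work unit both failed and succeeded somewhere."""
    return len({result["outcome"] for result in results}) > 1

groups = hosts_by_outcome(results)
print(groups)
if has_mixed_outcomes(results):
    print("Same work unit, different outcomes: compare the hosts in each group.")
```

A work unit with mixed outcomes points at a difference between hosts (OS, CPU, client version) more than at the work unit itself, which is exactly why a completed result is worth reporting.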
ID: 1216
Mike Gelvin
Joined: 17 Feb 06
Posts: 50
Credit: 55,397
RAC: 0
Message 1218 - Posted: 18 Apr 2006, 15:30:10 UTC - in response to Message 1215.  

Tony,
I agree with you. You never know what the next version is really testing; it might not have anything to do with the 1% bug, and hence your observations/questions are very valid. I suspect they (the devs) need all the help they can get.
Mike

ID: 1218
Divide Overflow
Joined: 15 Feb 06
Posts: 12
Credit: 128,027
RAC: 0
Message 1221 - Posted: 18 Apr 2006, 16:41:15 UTC
Last modified: 18 Apr 2006, 16:44:53 UTC

Tony, I agree with you as well. The directive from the devs is *not* to abort work units unless specifically asked to. https://ralph.bakerlab.org/forum_thread.php?id=18

If it's not giving you any problems and progressing properly, let it crunch! If you have a question or problem about what is currently being crunched, post and ask about it. The devs are smart enough not to get confused about fixed vs. ongoing issues.

ID: 1221
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1222 - Posted: 18 Apr 2006, 16:48:07 UTC - in response to Message 1221.  

Tony, I agree with you as well. The directive from the dev's is *not* to abort work units unless specifically asked to. https://ralph.bakerlab.org/forum_thread.php?id=18

If it's not giving you any problems and progressing properly, let it crunch!

David, that's part of the question I need answered. It's a Timex, in that it keeps crunching: the graphics work well, all the bits move, % done advances, CPU time advances, and even the "estimate to completion" moves, but it keeps getting higher. This could be due to the way Win98 counts time.

I see it run "Ab initio", then switch to "full atom relax", then it loops back to "ab initio" and starts all over again. All the while staying on "model 1". Is this how others see it working? I was thinking it did "ab initio", then "full atom relax", and then switched to the next model, but I'm not sure which way is "normal".

tony
ID: 1222
tralala
Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1224 - Posted: 18 Apr 2006, 19:49:56 UTC - in response to Message 1222.  
Last modified: 18 Apr 2006, 19:51:44 UTC

@mmciastro I think you've made your point; now you should abort the WU. You correctly identified the error loop, and that will help the devs; there's no need to observe the loop for another 100 hours. And yes, version 5.00 is already out, so abort all the 4.99 WUs, since the results from 5.00 are what matter now.
ID: 1224

©2024 University of Washington
http://www.bakerlab.org