Report "stuck at 1%" bugs here

Rom Walton (BOINC)
Volunteer moderator
Project developer

Joined: 10 Mar 06
Posts: 21
Credit: 5,515
RAC: 0
Message 958 - Posted: 23 Mar 2006, 2:26:41 UTC
Last modified: 23 Mar 2006, 2:27:16 UTC

For those who are searching for this bug, could you upgrade your BOINC client to 5.3.28 or better?

Apparently the 5.2.x clients don't send the right instruction to the application when it is time to abort, so it never dumps the backtraces for the various threads.

Sorry about that. 5.3.x has been in the oven for quite a while and I forgot that what I was hooking into wasn't supported by the older client.

----- Rom
ID: 958
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 968 - Posted: 24 Mar 2006, 14:53:59 UTC - in response to Message 958.  

For those who are searching for this bug, could you upgrade your BOINC client to 5.3.28 or better?

Apparently the 5.2.x clients don't send the right instruction to the application when it is time to abort, so it never dumps the backtraces for the various threads.

Sorry about that. 5.3.x has been in the oven for quite a while and I forgot that what I was hooking into wasn't supported by the older client.

----- Rom


It seems like 5.3.28 doesn't work with Win 98.

Anders n

ID: 968
UBT - Halifax--lad
Joined: 15 Feb 06
Posts: 29
Credit: 2,723
RAC: 0
Message 975 - Posted: 24 Mar 2006, 22:29:35 UTC - in response to Message 973.  

Before people perform an upgrade to their BOINC software, could you please provide some information as to the impact this may have on other projects they may be running. Many of the users are running multiple projects, and this kind of an upgrade could have serious implications for those other efforts.

In particular people running CPDN and Predictor may have some issues.



There are no implications. I upgrade my BOINC client whenever Rom and the team bring out a new version, to help test it for bugs. In the many months I have been upgrading to various clients I have never had a trashed WU.

Besides, if people wish to help RALPH solve the 1% bug they have no choice: this is the only BOINC client that handles what RALPH needs for the error reporting.
ID: 975
rbpeake
Joined: 16 Feb 06
Posts: 19
Credit: 3,370
RAC: 0
Message 976 - Posted: 25 Mar 2006, 0:45:16 UTC

I do not know if this will help, but one of my units errored out:


Result ID 50692
Name HB_BARCODE_30_1ten__354_40_0
Workunit 46237
ID: 976
Snake Doctor
Joined: 16 Feb 06
Posts: 37
Credit: 998,880
RAC: 0
Message 977 - Posted: 25 Mar 2006, 4:16:27 UTC - in response to Message 975.  

Before people perform an upgrade to their BOINC software, could you please provide some information as to the impact this may have on other projects they may be running. Many of the users are running multiple projects, and this kind of an upgrade could have serious implications for those other efforts.

In particular people running CPDN and Predictor may have some issues.



There are no implications. I upgrade my BOINC client whenever Rom and the team bring out a new version, to help test it for bugs. In the many months I have been upgrading to various clients I have never had a trashed WU.

Besides, if people wish to help RALPH solve the 1% bug they have no choice: this is the only BOINC client that handles what RALPH needs for the error reporting.



Actually there are implications for some of the other projects. It depends on the platform a person is using and the project requirements. As far as I can see the 1% problem is a Windoze problem. Those of us using Macs may not have to upgrade at all. Some of the projects cannot use the newest BOINC versions without upgrading their servers and/or applications. This has been shown to be the case in the past. So while I am happy that this seems to work for you, I for one would prefer to take guidance from Rom or David Kim on this point; that is part of what they are here for.

Regards
Phil

ID: 977
UBT - Halifax--lad
Joined: 15 Feb 06
Posts: 29
Credit: 2,723
RAC: 0
Message 982 - Posted: 25 Mar 2006, 7:37:13 UTC - in response to Message 977.  

Some of the projects cannot use the newest BOINC versions without upgrading their servers and/or applications


This is untrue for all the latest BOINC clients. They work off Server Version 5, so every project will run on 5.3.28 with no problems.

ID: 982
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 983 - Posted: 25 Mar 2006, 8:06:23 UTC
Last modified: 25 Mar 2006, 8:11:53 UTC

I have been running BOINC 5.3.2 on my Windows PC(s) for a long time

*This is because I administer my computers remotely,
and I need a consistent RPC port to connect to

See the line of a Python script I use to start BOINC on a remote PC via telnet over the INTERNET
-> note the -gui_rpc_port and the -detach
options I use. These options do *not* work with 5.2.x clients

tn.write ('at %02d:%02d /next: "S:\\boinc.exe" -redirectio -allow_remote_gui_rpc -gui_rpc_port 31416 -return_results_immediately -detach\r\n' % (hora, minuto))
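
As a rough sketch of how that scheduled command line could be assembled before it is written to the telnet session: the helper name `build_at_command` and its parameters are hypothetical, and only the flags and the port number come from the line above. It builds the string and nothing more; it does not open a connection.

```python
def build_at_command(hour, minute, exe_path="S:\\boinc.exe", rpc_port=31416):
    """Return the Windows `at` command that schedules the BOINC client.

    Hypothetical helper: the flags mirror the ones quoted in the post,
    everything else (names, defaults) is an assumption for illustration.
    """
    flags = [
        "-redirectio",
        "-allow_remote_gui_rpc",
        "-gui_rpc_port", str(rpc_port),
        "-return_results_immediately",
        "-detach",
    ]
    # Zero-pad the time and finish with CR LF, as a telnet session expects.
    return 'at %02d:%02d /next: "%s" %s\r\n' % (
        hour, minute, exe_path, " ".join(flags))


command = build_at_command(9, 5)
print(repr(command))
```

Keeping the flags in a list makes it easier to drop the ones that, according to the post, 5.2.x clients do not accept.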

Though, I have *no* stuck WU(s) to report on my Windows PC(s) -:)
All my "stuck WU(s)" happen on Linux, at any % ... without using CPU

Idea: How about a separate thread for each different % (stuck at)?
*Stuck at 1% already exists ... stuck at 83.31% and all other %(s) are missing.

ID: 983
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 985 - Posted: 25 Mar 2006, 8:32:21 UTC
Last modified: 25 Mar 2006, 8:40:25 UTC

I want to run 3 projects on my PC

*However, each project requires a different BOINC version to run -:(

Actually it seems you can run 3 projects at the same time only if you own 3 PCs

This is what happens when using a BOINC version different from the BOINC version
that the project requires

Date Host Project ID Message
3/23/2006 6:18:37 PM carlos.cp3 http://issofty17.is.noda.tus.ac.jp/ 169 Master file download succeeded
3/23/2006 6:18:37 PM carlos.cp3 http://issofty17.is.noda.tus.ac.jp/ 170 Sending scheduler request to http://issofty17.is.noda.tus.ac.jp/cgi/cgi
3/23/2006 6:18:37 PM carlos.cp3 http://issofty17.is.noda.tus.ac.jp/ 171 Reason: Requested by user
3/23/2006 6:18:37 PM carlos.cp3 http://issofty17.is.noda.tus.ac.jp/ 172 Requesting 43200 seconds of new work
3/23/2006 6:18:44 PM carlos.cp3 http://issofty17.is.noda.tus.ac.jp/ 173 Scheduler request to http://issofty17.is.noda.tus.ac.jp/cgi/cgi succeeded
3/23/2006 6:18:44 PM carlos.cp3 Project TANPAKU 174 Message from server: Need major version 4 of the BOINC core client. You have 5.
3/23/2006 6:18:44 PM carlos.cp3 Project TANPAKU 175 Resetting project
3/23/2006 6:18:44 PM carlos.cp3 --- 176 Rescheduling CPU: exit_tasks
3/23/2006 6:18:44 PM carlos.cp3 Project TANPAKU 177 Detaching from project
3/23/2006 6:53:38 PM carlos.cp3 --- 178 Rescheduling CPU: application exited

It seems that changing the BOINC version only causes
a reset/detach for *all* the projects you are running that do not like
the BOINC version you are using -!
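
A minimal sketch of how the version-mismatch message in the log above could be picked out automatically. The function name `parse_version_mismatch` is hypothetical, and the pattern is based only on the single message shown here, so other wordings would not match.

```python
import re

# Pattern taken from the one server message quoted in the log above.
_MISMATCH = re.compile(
    r"Need major version (\d+) of the BOINC core client\. You have (\d+)\.")


def parse_version_mismatch(message):
    """Return (required_major, installed_major), or None if no match."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("3/23/2006 6:18:44 PM carlos.cp3 Project TANPAKU 174 "
        "Message from server: Need major version 4 of the BOINC core client. "
        "You have 5.")
print(parse_version_mismatch(line))
```

Scanning the message log for this before upgrading would show which attached projects are likely to be reset or detached.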

ID: 985
anders n
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 986 - Posted: 25 Mar 2006, 8:38:00 UTC

Hello Carlos

How about posting on the Project TANPAKU forum asking them to upgrade their server?
:)

Anders n
ID: 986
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 987 - Posted: 25 Mar 2006, 9:03:52 UTC - in response to Message 986.  

Hello Carlos

How about posting on the Project TANPAKU forum asking them to upgrade their server?
:)

Anders n


Somewhat difficult ... my keyboard does not have keys for
special oriental-language characters.

However, I am not discussing the problems of the project の計算結果

*I was only showing what happens when a BOINC version gets changed

Nothing to worry about -:! Only a reset plus a detach for all your projects
*I believe that you will lose only the CPU time on jobs partially crunched
(or finished, but not uploaded yet)

ID: 987
Dotsch
Joined: 4 Mar 06
Posts: 12
Credit: 13,725
RAC: 0
Message 1102 - Posted: 12 Apr 2006, 20:35:40 UTC - in response to Message 1.  

Result : https://ralph.bakerlab.org/result.php?resultid=85041
WU : https://ralph.bakerlab.org/workunit.php?wuid=777
Host : https://ralph.bakerlab.org/results.php?hostid=

Computed about 2 hours, max 1.19 %, switched back to 1.0 % after restart from scheduler (started other project and switched back).
ID: 1102
Nuadormrac
Joined: 22 Feb 06
Posts: 68
Credit: 11,362
RAC: 0
Message 1106 - Posted: 13 Apr 2006, 0:32:00 UTC
Last modified: 13 Apr 2006, 0:35:13 UTC

Actually, in Windows XP (not sure about Linux, as I haven't looked into this there), there is a means to get Japanese characters using a US keyboard... Basically it requires setting up an IME (Input Method Editor), and then setting the computer up for multilingual support. When one sets it to the correct language mode for what one wants to type, typing character combinations will result in the text being converted on the screen from what one typed to the character (in that case hiragana or katakana) that represents it. It also has a "find kanji" function to convert the appropriate things into kanji...

Many people in Japan (from what I heard in Japanese 201) also use standard English keyboards, rather than hiragana-based keyboards, and then use an IME to convert the text in software as they type. Main reason: it's faster than a Japanese native keyboard, from what I've heard.

That leaves the overriding problem, however, and that is speaking their language sufficiently to write a message. Sorry to say, I'm not even fluent enough to do that as of yet...

Oh, and on CPDN, make sure your backup is recent, when going to upgrade, and have networking disabled when it starts CPDN. That way if something does happen, CPDN can't "phone home" and one can restore the WU.... My backup's about 3 months (not actual months, just model months) out of date... That represents a few hours of crunch time or so on my A64...
ID: 1106
[B^S] Dr. Bill Skiba
Joined: 15 Feb 06
Posts: 4
Credit: 6,496
RAC: 0
Message 1182 - Posted: 15 Apr 2006, 12:01:59 UTC

I aborted this wu stuck at 1.09% for over 3 hours.

https://ralph.bakerlab.org/result.php?resultid=86049

ID: 1182
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1185 - Posted: 15 Apr 2006, 19:59:04 UTC
Last modified: 15 Apr 2006, 20:02:20 UTC

On another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted BOINC after getting a fatal error message, and seeing this WU at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

Oh wait a second, the CPU time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.
ID: 1185
MatthewBChambers
Joined: 13 Mar 06
Posts: 4
Credit: 5,367
RAC: 0
Message 1186 - Posted: 15 Apr 2006, 23:42:17 UTC

(I don't know if this should be here, at the v4.99 thread, or both.)


I have a 4.99 for Windows version stuck at about 1% progress for many hours (at least 8 of its time). It currently says 1.13% after 16:17:07 CPU time, with 30:25:38 to go, supposedly. I just aborted it since a new version is out.



Host ID:
https://ralph.bakerlab.org/show_host_detail.php?hostid=2404

Work unit ID:
https://ralph.bakerlab.org/workunit.php?wuid=76544

Result ID:
https://ralph.bakerlab.org/result.php?resultid=85055

Here is the BOINC startup info:
4/15/2006 4:50:21 PM||Starting BOINC client version 5.2.13 for windows_intelx86
4/15/2006 4:50:21 PM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
4/15/2006 4:50:21 PM||Data directory: C:\Program Files\BOINC
4/15/2006 4:50:21 PM||Processor: 1 GenuineIntel x86 Family 6 Model 8 Stepping 6 863MHz
4/15/2006 4:50:21 PM||Memory: 383.30 MB physical, 922.22 MB virtual
4/15/2006 4:50:21 PM||Disk: 24.41 GB total, 19.34 GB free
4/15/2006 4:50:21 PM|rosetta@home|Computer ID: 197494; location: home; project prefs: default
4/15/2006 4:50:21 PM|boincsimap|Computer ID: 17955; location: home; project prefs: default
4/15/2006 4:50:21 PM|Einstein@Home|Computer ID: 594228; location: home; project prefs: default
4/15/2006 4:50:21 PM|LHC@home|Computer ID: 142531; location: home; project prefs: default
4/15/2006 4:50:21 PM|Predictor @ Home|Computer ID: 237773; location: home; project prefs: default
4/15/2006 4:50:21 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default
4/15/2006 4:50:21 PM|SETI@home|Computer ID: 2330542; location: home; project prefs: default
4/15/2006 4:50:21 PM|SZTAKI Desktop Grid|Computer ID: 17392; location: home; project prefs: default
4/15/2006 4:50:21 PM|World Community Grid|Computer ID: 31989; location: ; project prefs: default
4/15/2006 4:50:21 PM||General prefs: from boincsimap (last modified 2006-04-11 17:42:26)
4/15/2006 4:50:21 PM||General prefs: no separate prefs for home; using your defaults
4/15/2006 4:50:22 PM||Remote control not allowed; using loopback address
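
The startup log above reports client version 5.2.13, while earlier in this thread a client of 5.3.28 or better was requested. A minimal sketch of checking that from the log lines follows, assuming the `timestamp|project|message` layout seen here. The names `client_version` and `meets_requested_version` are hypothetical.

```python
import re

# Minimum client version requested earlier in this thread.
REQUESTED_VERSION = (5, 3, 28)

_START = re.compile(r"Starting BOINC client version (\d+)\.(\d+)\.(\d+)")


def client_version(log_lines):
    """Return (major, minor, release) from the startup line, or None."""
    for line in log_lines:
        # Each line is assumed to look like: timestamp|project|message
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        match = _START.search(parts[2])
        if match:
            return tuple(int(number) for number in match.groups())
    return None


def meets_requested_version(version):
    """Tuple comparison: True when version is REQUESTED_VERSION or newer."""
    return version is not None and version >= REQUESTED_VERSION


sample = [
    "4/15/2006 4:50:21 PM||Starting BOINC client version 5.2.13 for windows_intelx86",
    "4/15/2006 4:50:21 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default",
]
found = client_version(sample)
print(found, meets_requested_version(found))
```

Comparing the versions as integer tuples avoids the string-comparison trap where "5.10.0" would sort before "5.3.28".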
ID: 1186
bt1228
Joined: 22 Mar 06
Posts: 7
Credit: 9,385
RAC: 0
Message 1188 - Posted: 16 Apr 2006, 2:42:01 UTC

RALPH wu: FACONTACTS_NOFILTERS_1vie__381_1_0 using rosetta_beta version 499, has been running for 23:47:18 and is 1.041% complete. BOINC Mgr: 5.4.3

wu: https://ralph.bakerlab.org/workunit.php?wuid=78443
result: https://ralph.bakerlab.org/result.php?resultid=86069

I'll kill this WU when it hits 24:00:00.
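
Reports like this one come down to a lot of CPU time with very little progress. A rough sketch of that rule of thumb follows. The function names and both thresholds (`min_hours`, `max_percent`) are assumptions for illustration, not project guidance.

```python
def hms_to_seconds(text):
    """Convert a BOINC-style 'HH:MM:SS' CPU time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stuck(cpu_time, percent_done, min_hours=3.0, max_percent=1.5):
    """Heuristic: many CPU hours spent, yet progress still near 1%.

    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    return (hms_to_seconds(cpu_time) >= min_hours * 3600
            and percent_done <= max_percent)


# Figures taken from the report above: 23:47:18 of CPU time, 1.041% complete.
print(looks_stuck("23:47:18", 1.041))
```

A check like this only flags candidates. As other reports in this thread show, a WU can sit near 1% for hours and then start progressing again.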

--- bt
ID: 1188
Psycodad
Joined: 16 Feb 06
Posts: 14
Credit: 2,157
RAC: 0
Message 1201 - Posted: 17 Apr 2006, 11:10:09 UTC

I have a WU: HBLR_1.0_1ogw_377_23_1 using rosetta_beta version 499, which only gets to 1.30 % and then goes back to 1.13 % after switching to another project.

Result
Workunit

What should I do now?
ID: 1201
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1202 - Posted: 17 Apr 2006, 12:03:31 UTC - in response to Message 1185.  

On another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted BOINC after getting a fatal error message, and seeing this WU at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

Oh wait a second, the CPU time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK, this unit is STILL running. I suspended boincsimap, as it would cause an error that forced BOINC to be restarted. It's now at 89:32:49 CPU time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running; it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?
ID: 1202
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1203 - Posted: 17 Apr 2006, 12:17:18 UTC - in response to Message 1201.  
Last modified: 17 Apr 2006, 12:26:01 UTC

I have a WU: HBLR_1.0_1ogw_377_23_1 using rosetta_beta version 499, which only gets to 1.30 % and then goes back to 1.13 % after switching to another project.

Result
Workunit

What should I do now?


IMHO:
1) Install this debugger! Maybe they will find the bugs more easily
https://ralph.bakerlab.org/forum_thread.php?id=166
*read the whole thread

2) Abort all the WUs you have of rosetta_beta 4.99 and earlier versions
we are now testing 5.00 !
*It is nonsense to continue testing alpha versions of *OLD/obsolete* stuff!
ID: 1203
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1205 - Posted: 17 Apr 2006, 19:16:48 UTC - in response to Message 1202.  

On another machine I have an "hblr" 4.99 stuck at one percent, model 1, step 0, stage "initializing". I'd restarted BOINC after getting a fatal error message, and seeing this WU at 59 hours and change. It is work unit HBLR_1.0_1di2_377_4.

Oh wait a second, the CPU time dropped back to 54 hours and it's working again. Stage Ab initio, model 1, Step 19013 and progressing. Do I let it continue? It's a Celeron 500, Win98SE, 256 MB RAM, running on RealVNC, so obviously I don't watch this one too closely and don't care about credit. Is there some way this may be helpful?

tony

[edit] it's now at 1.041% so it's not stuck anymore, I'll let it run unless otherwise instructed.

OK, this unit is STILL running. I suspended boincsimap, as it would cause an error that forced BOINC to be restarted. It's now at 89:32:49 CPU time, 94:35:57 remaining, and 2.291% done. It's on AB INITIO model 1, step 2691. This is the only thing running; it hasn't switched or been paused. It just seems to keep looping.

Still awaiting further instructions. Can this info be helpful?

It's now at 96:40:29 cpu time, 2.539%, 101:16:46 to completion. There are 14 red dots on graphic, which states:
Stage: full atom relax, Model 1, Step 31878

Still letting her run, waiting on instructions.

tony
ID: 1205