Bug reports for Ralph 5.23

Message boards : RALPH@home bug list : Bug reports for Ralph 5.23

Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1807 - Posted: 10 Jun 2006, 1:25:34 UTC

We fixed something in the interaction of Rosetta with BOINC to trigger more informative debugging messages upon crashes. Please continue to post what goes wrong!
Fuzzy Hollynoodles
Joined: 19 Feb 06
Posts: 37
Credit: 2,089
RAC: 0
Message 1815 - Posted: 11 Jun 2006, 11:18:05 UTC - in response to Message 1807.  

We will, as soon as we're able to get some. :-/



"I'm trying to maintain a shred of dignity in this world." - Me

Neal
Joined: 6 Mar 06
Posts: 4
Credit: 34,698
RAC: 0
Message 1816 - Posted: 11 Jun 2006, 18:11:08 UTC

How long has the site been "Down for maintenance"? Neal
[B^S] sTrey
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1817 - Posted: 11 Jun 2006, 18:32:17 UTC

At least as long as this thread has been going, so at least 24 hours. Going to blow past workunit deadlines soon :(
Pieface
Joined: 16 Feb 06
Posts: 64
Credit: 203,513
RAC: 0
Message 1818 - Posted: 11 Jun 2006, 18:44:09 UTC

Looks like it 'went down' sometime late Friday. Maybe when they restarted the Rosetta server after the BOINC upgrade, they forgot to check on poor old second-cousin Ralphie!
Rhiju
Volunteer moderator
Project developer
Project scientist
Joined: 14 Feb 06
Posts: 161
Credit: 3,725
RAC: 0
Message 1819 - Posted: 11 Jun 2006, 22:37:37 UTC - in response to Message 1818.  
Last modified: 11 Jun 2006, 22:38:03 UTC

It was a database problem, and the database guy on our team was unavailable. Ralph's back!

IceQueen41
Joined: 22 Feb 06
Posts: 6
Credit: 9,473
RAC: 0
Message 1820 - Posted: 12 Jun 2006, 2:33:49 UTC

Not sure if this is a bug or not, but in the graphics for this WU, the protein simply isn't there, though everything else appears to be running properly. Also, I'm sure this has been mentioned (haven't been around much lately), but with most of the WU graphics, when opened a second (and then third, etc) time, the information at the bottom is shifted down so that the bottom line (the project URL & Accepted Energy) is not visible. Other than that, everything looks good so far.
Sadir
Joined: 21 Feb 06
Posts: 6
Credit: 1,419
RAC: 0
Message 1821 - Posted: 12 Jun 2006, 8:32:14 UTC
Last modified: 12 Jun 2006, 8:36:38 UTC

1) I don't know if this has been mentioned already, but the progress counter doesn't work well:
6 min ... 1.020%
24 min ... 1.041% (8:21:56 to completion)
40 min ... 1.042% (8:38:22 to completion)
1:09:30 ... completion
(I saw this also with 5.22)

2) More memory needed?
12/06/2006 09:42:53|ralph@home|Message from server: Your computer has only 402116608 bytes of memory; workunit requires 97883392 more bytes
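The two numbers in that server message add up exactly: 402,116,608 + 97,883,392 = 500,000,000. That suggests the scheduler compared some host memory figure against a fixed per-workunit requirement of 500,000,000 bytes. A quick check, assuming the message reports "requirement minus available" (the function and constant names are illustrative, not taken from BOINC):

```python
# Figures copied from the server message quoted above.
HOST_BYTES = 402_116_608       # "Your computer has only ... bytes of memory"
SHORTFALL_BYTES = 97_883_392   # "workunit requires ... more bytes"


def implied_requirement(host_bytes: int, shortfall_bytes: int) -> int:
    """Workunit memory requirement implied by the message, assuming
    shortfall = requirement - host memory figure."""
    return host_bytes + shortfall_bytes


required = implied_requirement(HOST_BYTES, SHORTFALL_BYTES)
print(f"implied requirement: {required:,} bytes ({required / 1e6:.0f} MB)")
print(f"host figure: {HOST_BYTES / 2**20:.1f} MiB")
```

The message itself does not say whether the host figure is the machine's total RAM or only the share BOINC is allowed to use.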

doc :)
Joined: 16 Feb 06
Posts: 46
Credit: 4,437
RAC: 0
Message 1823 - Posted: 12 Jun 2006, 15:15:34 UTC

Sadir: the percentage-complete behaviour is perfectly normal; the first model just took that long.

No errors with 5.23 here so far, and a couple of WUs have finished successfully.
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1825 - Posted: 13 Jun 2006, 11:24:34 UTC

I have a funny (interesting) one. On my laptop (which has been pretty much flawless at Ralph, unlike my AMD64 3700 San Diego, which experiences the "fatal windows" error) I've seen something happen twice in 24 hours. When I return from some personal task, either the Rosetta 5.22 screensaver or the Ralph 5.23 screensaver is showing on my screen, and the graphic will NOT go away when I move the mouse or press a key. I had another window open but couldn't see it. The mouse still worked on the unseen window: if I clicked all over, I could hear it interacting, but the Rosetta graphic would not release my screen. Both times I ended up pressing the power button. I saw the HD activity light blink and heard the Windows log-off sound, but the Rosetta graphic stayed on the screen all the way through shutdown, until the screen went dead.

Since mine is the only report of this, it happened with both Rosetta and Ralph, and it hasn't happened with the laptop before, I will be running some adware, malware, virus and other scans to see if the problem is on my end.

tony
Aglarond
Joined: 16 Feb 06
Posts: 11
Credit: 1,094
RAC: 0
Message 1826 - Posted: 14 Jun 2006, 10:12:56 UTC
Last modified: 14 Jun 2006, 10:15:00 UTC

Strange error in FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0.

At first it was running normally, but several Simap WUs had errors. Later a strange error message appeared, something I've never seen before:
"Runtime error!
Program: ...alph.bakerlab.orgrosetta_beta_5.23_windows_intelx86.exe
This application has requested the Runtime to terminate it in unusual way. Please contact the application's support team for more information."
Screenshot is here: Ralph_error.gif (7.76KB)

My messages:
14. 6. 2006 10:50:47|ralph@home|Unrecoverable error for result FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
14. 6. 2006 10:50:47|ralph@home|Deferring scheduler requests for 1 minutes and 0 seconds
14. 6. 2006 10:50:47||Rescheduling CPU: application exited
14. 6. 2006 10:50:47|ralph@home|Computation for task FRA_t316_CASP7_hom001_1_IGNORE_THE_RESTt316_1_PROTINFO-AB_TS1.pdb_666_2_0 finished

Full message log: Ralph_error_log_14june2006.txt (40KB)

The computer where this error happened is a PIII 500MHz with 160MB RAM, WinXP Home SP2, running only antivirus and BOINC with Simap and Ralph. (It has 512MB of virtual memory, which is probably not enough for some bigger WUs.)

After this WU finished (with an error), Simap stopped producing errors and finished the next WU successfully.
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1828 - Posted: 15 Jun 2006, 10:55:11 UTC

Rom had mentioned there might be a fix to the fatal windows errors in 5.23. When it was released, I set the box I usually got these errors with to NNW/NNT for all other projects and suspended them, so I'd run nothing but 5.23. I'm not ready to say "it's Fixed", but so far it sure looks good.
Result ID | Work unit ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
177824 | 158140 | 14 Jun 2006 19:37:15 UTC | 15 Jun 2006 10:43:14 UTC | Over | Success | Done | 13,545.38 | 53.30 | 53.30
177438 | 157770 | 14 Jun 2006 15:16:32 UTC | 15 Jun 2006 7:36:07 UTC | Over | Success | Done | 14,102.19 | 55.50 | 55.50
176725 | 156687 | 14 Jun 2006 7:37:56 UTC | 15 Jun 2006 1:10:40 UTC | Over | Success | Done | 13,275.25 | 52.24 | 52.24
176356 | 156712 | 14 Jun 2006 3:52:59 UTC | 14 Jun 2006 20:28:26 UTC | Over | Success | Done | 13,985.28 | 53.74 | 53.74
175612 | 151093 | 13 Jun 2006 20:01:42 UTC | 14 Jun 2006 16:47:14 UTC | Over | Success | Done | 14,084.34 | 54.12 | 54.12
174950 | 155410 | 13 Jun 2006 13:25:22 UTC | 14 Jun 2006 15:16:32 UTC | Over | Success | Done | 14,111.06 | 54.22 | 54.22
174529 | 155065 | 13 Jun 2006 9:17:55 UTC | 14 Jun 2006 7:37:56 UTC | Over | Success | Done | 14,101.56 | 54.18 | 54.18
174341 | 154879 | 13 Jun 2006 6:09:29 UTC | 14 Jun 2006 3:52:59 UTC | Over | Success | Done | 14,346.30 | 55.12 | 55.12
173772 | 154359 | 12 Jun 2006 22:33:06 UTC | 13 Jun 2006 20:01:42 UTC | Over | Success | Done | 14,103.70 | 54.19 | 54.19
173541 | 154155 | 12 Jun 2006 19:11:15 UTC | 13 Jun 2006 10:39:56 UTC | Over | Success | Done | 14,441.13 | 55.49 | 55.49
170677 | 146450 | 11 Jun 2006 22:52:42 UTC | 13 Jun 2006 9:17:55 UTC | Over | Success | Done | 13,161.97 | 50.57 | 50.57
170660 | 146482 | 11 Jun 2006 22:52:42 UTC | 13 Jun 2006 3:03:42 UTC | Over | Success | Done | 14,275.66 | 54.85 | 54.85
170659 | 146481 | 11 Jun 2006 22:52:42 UTC | 12 Jun 2006 8:30:20 UTC | Over | Success | Done | 13,858.98 | 53.25 | 53.25
william
Joined: 3 Jun 06
Posts: 4
Credit: 74
RAC: 0
Message 1829 - Posted: 16 Jun 2006, 14:45:21 UTC

Is this a troubled WU? FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0
I have left it running for 1:34:10 now and it is only at 1.623%, and the estimated finish time is up to 3333:00:01 and still climbing.
william
Joined: 3 Jun 06
Posts: 4
Credit: 74
RAC: 0
Message 1830 - Posted: 16 Jun 2006, 14:49:02 UTC - in response to Message 1829.  

Never mind, it just hit 100% while I was typing this into the forums, so I'm not sure yet what happened. Total time was 1:37:27.
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1831 - Posted: 16 Jun 2006, 14:51:23 UTC - in response to Message 1829.  

The progress indicator isn't linear; what you'll see are jumps in percentage. All WUs start at 1% and slowly creep higher until one model is done. Then it jumps to a higher percentage, and the digits after the decimal point slowly rise again until the next model is done. For example, checking the status every 10 minutes, you might see 1.000, 1.001, 1.002, 1.003, 12.000, 12.001, 12.002, 12.003, 24.000, 24.001 and so on, until the run time is used up, at which point it jumps to 100%, uploads and reports.

The number of models depends (for the most part) on protein size and computer speed. Every WU will run at least ONE model regardless of the run-time setting (except where terminated by the "watchdog timer").

Does this help?

tony
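Tony's description of the progress indicator can be captured in a small model. This is a hypothetical sketch, not Rosetta's actual code: the function name, the constant creep rate and the stopping rule (finish once a model completes at or past the run-time preference, so at least one model always runs) are all assumptions chosen to mimic the behaviour described above.

```python
from typing import Sequence

# Hypothetical creep rate between model completions, in percentage points per
# minute (0.001 per 10 minutes, echoing the example above). Not Rosetta's value.
CREEP_PCT_PER_MIN = 0.0001


def progress_percent(elapsed_s: float, model_done_s: Sequence[float],
                     run_time_pref_s: float) -> float:
    """Percent done after elapsed_s seconds of CPU time.

    model_done_s: ascending CPU times (seconds) at which models finished.
    run_time_pref_s: the user's run-time preference in seconds.
    """
    if model_done_s and model_done_s[-1] >= run_time_pref_s:
        # A model has finished and the preferred run time is used up: done.
        return 100.0
    if model_done_s:
        last = model_done_s[-1]
        # Jump to the share of the run-time preference consumed so far.
        base = max(1.0, 100.0 * last / run_time_pref_s)
        since_last = elapsed_s - last
    else:
        # No finished model yet, so nothing to extrapolate from: start at 1%.
        base = 1.0
        since_last = elapsed_s
    # Creep slowly until the next model completes, never reaching 100% early.
    return min(base + CREEP_PCT_PER_MIN * since_last / 60.0, 99.9)


if __name__ == "__main__":
    pref = 3600  # cpu_run_time_pref from william's stderr output
    for t in (600, 3000, 5650):
        print(t, round(progress_percent(t, [], pref), 4))
    # william's only model finished at about 5847 s, past the 3600 s preference.
    print(5847, progress_percent(5847, [5847], pref))
```

With a single slow model, as in william's result (one decoy, 3600 s preference, about 5847 s of CPU time), this model sits just above 1% for the whole run and then goes straight to 100%.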
william
Joined: 3 Jun 06
Posts: 4
Credit: 74
RAC: 0
Message 1832 - Posted: 16 Jun 2006, 14:56:26 UTC - in response to Message 1831.  

Yes, and the watchdog did terminate that WU on this computer:

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
# random seed: 2998884
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 0 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
That is its result. Here are the computer's details, though I don't know if they will help the project:

CPU type: GenuineIntel Intel(R) Celeron(R) CPU 2.60GHz
Number of CPUs: 1
Operating System: Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00)
Memory: 510.98 MB
Cache: 976.56 KB
Swap space: 1248.2 MB
Total disk space: 37.26 GB
Free Disk Space: 22.39 GB
Measured floating point speed: 1329.23 million ops/sec
Measured integer speed: 2718.87 million ops/sec
Average upload rate: 1.41 KB/sec
Average download rate: 132.77 KB/sec

Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1833 - Posted: 16 Jun 2006, 15:29:22 UTC - in response to Message 1832.  
Last modified: 16 Jun 2006, 15:32:54 UTC

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...


This is normal: it just means the WU completed successfully without needing the watchdog, so the watchdog was shut down.

If you're talking about wuid=160649, then it completed successfully and has been credited. See the result details for that WU below:

Result ID 180410
Name FRA_t301_hom001_1_LOOPRLX_IGNORE_THE_REST__hom001_1_1bwzA__100_701_23_0
Workunit 160649
Created 15 Jun 2006 22:54:54 UTC
Sent 15 Jun 2006 23:51:21 UTC
Received 16 Jun 2006 14:42:46 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 2910
Report deadline 19 Jun 2006 23:51:21 UTC
CPU time 5847.390625
stderr out <core_client_version>5.4.9</core_client_version>
<stderr_txt>
# random seed: 2998884
# cpu_run_time_pref: 3600
# DONE :: 1 starting structures built 0 (nstruct) times
# This process generated 1 decoys from 1 attempts
# 0 starting pdbs were skipped


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>


Validate state Valid
Claimed credit 13.6983915054668
Granted credit 13.6983915054668
application version 5.23
If the protein is huge, your computer is old, or your run time is set low, then this is what you should expect to see with future WUs. It's still helping.

tony
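The claimed credit on that result page can be reproduced from william's host benchmarks, assuming the benchmark-based claim used by BOINC clients of this era: 100 credits per day of CPU time on a reference host averaging 1 billion ops/sec across its floating-point and integer benchmarks, i.e. claimed = 100 × (CPU seconds / 86400) × (fpops + iops) / (2 × 10^9). The function name below is illustrative.

```python
SECONDS_PER_DAY = 86_400


def claimed_credit(cpu_time_s: float, fpops: float, iops: float) -> float:
    """Benchmark-based credit claim (assumed formula, see the note above).

    fpops / iops: the host's measured floating-point and integer speeds,
    in operations per second.
    """
    cpu_days = cpu_time_s / SECONDS_PER_DAY
    mean_gops = (fpops + iops) / 2 / 1e9  # mean benchmark, billions of ops/sec
    return 100.0 * cpu_days * mean_gops


# CPU time from the result page; benchmarks from william's host details.
credit = claimed_credit(5847.390625, 1329.23e6, 2718.87e6)
print(f"{credit:.4f}")  # result page reports a claim of 13.6983915054668
```

The result matches the reported claim to about six significant figures; the tiny residual is consistent with the benchmarks on the host page being rounded to two decimals.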
Astro
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1834 - Posted: 17 Jun 2006, 11:00:00 UTC - in response to Message 1828.  
Last modified: 17 Jun 2006, 11:05:16 UTC

OK, OK, I'm convinced. I haven't had any errors of any kind with 5.23. The following WUs can be added to the list of successes for my error-prone computer:

Result ID | Work unit ID | Sent | Time reported | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
181942 | 162058 | 16 Jun 2006 18:44:02 UTC | 17 Jun 2006 10:47:32 UTC | Over | Success | Done | 14,364.88 | 56.53 | 56.53
181682 | 161805 | 16 Jun 2006 14:44:34 UTC | 17 Jun 2006 4:16:03 UTC | Over | Success | Done | 13,954.83 | 54.92 | 54.92
181002 | 161170 | 16 Jun 2006 8:41:29 UTC | 17 Jun 2006 0:34:09 UTC | Over | Success | Done | 13,478.86 | 53.04 | 53.04
180751 | 160956 | 16 Jun 2006 4:39:34 UTC | 16 Jun 2006 20:44:26 UTC | Over | Success | Done | 13,756.77 | 54.14 | 54.14
180432 | 151974 | 15 Jun 2006 23:29:54 UTC | 16 Jun 2006 15:04:45 UTC | Over | Success | Done | 14,181.53 | 55.81 | 55.81
179254 | 159529 | 15 Jun 2006 11:09:29 UTC | 16 Jun 2006 11:14:50 UTC | Over | Success | Done | 14,375.77 | 56.57 | 56.57
178840 | 159127 | 15 Jun 2006 7:36:07 UTC | 16 Jun 2006 8:41:29 UTC | Over | Success | Done | 14,242.95 | 56.05 | 56.05
178558 | 158860 | 15 Jun 2006 4:15:02 UTC | 15 Jun 2006 23:29:54 UTC | Over | Success | Done | 13,769.56 | 54.19 | 54.19
178127 | 158438 | 14 Jun 2006 23:29:32 UTC | 15 Jun 2006 20:00:12 UTC | Over | Success | Done | 14,343.39 | 56.44 | 56.44

I've set the other projects back to "allow new work" and resumed them. Thanks for fixing this!
Carlos_Pfitzner
Joined: 16 Feb 06
Posts: 182
Credit: 22,792
RAC: 0
Message 1835 - Posted: 17 Jun 2006, 20:10:13 UTC - in response to Message 1821.  
Last modified: 17 Jun 2006, 20:37:25 UTC

I think no more memory is actually needed -:)
Jobs are finishing OK, without any errors, in normal run time!

What needs fixing is this misleading message, on both Ralph and Rosetta.

ps: I have successfully run Rosetta with 64 MB RAM.
*Only with that little RAM do I get about 70 page faults per second.

But I am getting the above message on PCs with 256 MB of physical RAM -:(

Yeah!
This %done indicator really is not working smoothly.

ps: on Rosetta too.
I have already aborted a WU by hand because of this; I believed it was stuck -:(
It had about 3 hours of CPU time, at 1.00n%.

My preferred run time is 2 hours, and BoincView was predicting 190 hours
of CPU time to completion -;

ps: I really don't want to run a single job for 190 hours :!:
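Part of why these estimates balloon: if remaining time is extrapolated linearly from the fraction done, a task sitting at about 1% implies roughly 99 times its elapsed CPU time still to go. Below is a minimal sketch of that naive extrapolation. It is an illustration only; the BOINC client and BoincView may blend in other estimates, so it is not expected to reproduce the exact 190-hour or 3333-hour figures quoted in this thread, and the function name is hypothetical.

```python
def naive_remaining_s(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: total ~= elapsed / fraction_done, minus what has run."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# About 3 hours of CPU time at roughly 1.00% done, as in the post above.
hours = naive_remaining_s(3 * 3600, 0.0100) / 3600
print(f"{hours:.0f} hours to go")
```

This gives about 297 hours. Because the progress value only jumps when a model completes, any estimate built this way stays inflated until the first model finishes.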

Thanks
william
Joined: 3 Jun 06
Posts: 4
Credit: 74
RAC: 0
Message 1836 - Posted: 17 Jun 2006, 22:20:50 UTC - in response to Message 1835.  

I think Ralph needs to work on the progress bar part of the program. I went through with no problems; it was just the first time I'd seen that from this project, so I thought I would say something. But if that is normal, then there's nothing to worry about.


©2024 University of Washington
http://www.bakerlab.org