Message boards : RALPH@home bug list : Bug reports for Ralph 5.03
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0
We've tried to make the watchdog a little less aggressive about aborting, and are having it give us back the reason for aborting. Let us know if you think these jobs are getting killed too soon, or too late. Thanks!
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0
Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0
This WU was aborted by the watchdog on another machine but finished OK on my machine: https://ralph.bakerlab.org/workunit.php?wuid=82603

Do you still receive the finished models if the watchdog kills a WU that gets stuck on model x?
Snake Doctor Joined: 16 Feb 06 Posts: 37 Credit: 998,880 RAC: 0
"Is anyone out there running with a Mac? Are your jobs from 5.02 or 5.03 running?"

I am running on Macs. I have a dual G4 that has a 5.03 job, and a G4 laptop that was running two 5.01 jobs. One of the jobs on the laptop hung at 1.4295% for 12 hours. I restarted BOINC and it errored out, but I can't get it to report (it is still stuck in my Tasks tab). I have a second job on that machine that I aborted, and it is stuck in the Tasks tab too. This may be a BOINC thing. I had upgraded to BOINC 5.4.4 per instructions from Rom for error checking. It looked OK, but it is really not running right.

Anyway, the 5.03 WU on the G4 seems to be running fine. Last night I changed its run-time setting to 4 hours and set the system to "remove apps from memory" to bang on it for a while. It is at about 90% after 3:58 of CPU time. I looked at the graphics last night and they seemed fine.

EDIT/UPDATE: The two WUs stuck in my Tasks tab finally reported. Here is the one that was stuck for 12 hours. Here is the one I aborted manually.

Regards
Phil
Divide Overflow Joined: 15 Feb 06 Posts: 12 Credit: 128,027 RAC: 0
Snake Doctor Joined: 16 Feb 06 Posts: 37 Credit: 998,880 RAC: 0
(Sorry for the second post; darned one-hour edit limit.) Well, my Mac G4 reported this result for the only 5.03 WU I have had. It looks very normal to me. I know the graphics were working (in fact they seemed faster somehow). I had no problems that I am aware of.

Regards
Phil
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0
I had this one: https://ralph.bakerlab.org/workunit.php?wuid=83796
Result: https://ralph.bakerlab.org/result.php?resultid=94327

I looked at the graphics when I saw it running, and it was stuck at about 1% without any movement at all. So I suppose the watchdog did its job by killing it after some time. It ran for about 80 minutes on my computer. I think it could have been killed a little sooner, as it was totally dead when I looked after about 45 minutes. I see that on the other hosts that ran it, it was killed sooner.

"I'm trying to maintain a shred of dignity in this world." - Me
MatthewBChambers Joined: 13 Mar 06 Posts: 4 Credit: 5,367 RAC: 0
Host ID: https://ralph.bakerlab.org/show_host_detail.php?hostid=2404
Result ID: https://ralph.bakerlab.org/result.php?resultid=94285

Here is my 5.03 bug (in Windows XP, full details to follow):

4/23/2006 12:59:18 PM|ralph@home|Unrecoverable error for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 (<file_xfer_error> <file_name>NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)

Here is the context:

4/23/2006 12:36:34 PM|ralph@home|Resuming result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 using rosetta_beta version 503
4/23/2006 12:36:34 PM|boincsimap|Pausing result 60420100.007375_0 (left in memory)
4/23/2006 12:59:15 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
4/23/2006 12:59:15 PM|ralph@home|Reason: To fetch work
4/23/2006 12:59:15 PM|ralph@home|Requesting 43200 seconds of new work
4/23/2006 12:59:17 PM||request_reschedule_cpus: process exited
4/23/2006 12:59:17 PM|ralph@home|Computation for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 finished
4/23/2006 12:59:17 PM|Predictor @ Home|Resuming result abeta_7_135392_2 using mfoldB125 version 428
4/23/2006 12:59:18 PM|ralph@home|Unrecoverable error for result NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2 (<file_xfer_error> <file_name>NO_CHECK_NO_DOG_7486h002_dec123_1.pdb_408_5_2_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
4/23/2006 12:59:20 PM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
4/23/2006 12:59:21 PM|ralph@home|No work from project
4/23/2006 1:03:26 PM|ralph@home|Sending scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi
4/23/2006 1:03:26 PM|ralph@home|Reason: To fetch work
4/23/2006 1:03:26 PM|ralph@home|Requesting 43200 seconds of new work, and reporting 1 results
4/23/2006 1:03:31 PM|ralph@home|Scheduler request to https://ralph.bakerlab.org/ralph_cgi/cgi succeeded
4/23/2006 1:03:31 PM|ralph@home|No work from project

Here is the startup info for the computer:

4/15/2006 8:22:55 PM||Starting BOINC client version 5.2.13 for windows_intelx86
4/15/2006 8:22:55 PM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
4/15/2006 8:22:55 PM||Data directory: C:\Program Files\BOINC
4/15/2006 8:22:56 PM||Processor: 1 GenuineIntel x86 Family 6 Model 8 Stepping 6 863MHz
4/15/2006 8:22:56 PM||Memory: 383.30 MB physical, 1.29 GB virtual
4/15/2006 8:22:56 PM||Disk: 24.41 GB total, 19.33 GB free
4/15/2006 8:22:56 PM|rosetta@home|Computer ID: 197494; location: home; project prefs: default
4/15/2006 8:22:56 PM|boincsimap|Computer ID: 17955; location: home; project prefs: default
4/15/2006 8:22:56 PM|Einstein@Home|Computer ID: 594228; location: home; project prefs: default
4/15/2006 8:22:56 PM|LHC@home|Computer ID: 142531; location: home; project prefs: default
4/15/2006 8:22:56 PM|Predictor @ Home|Computer ID: 237773; location: home; project prefs: default
4/15/2006 8:22:56 PM|ralph@home|Computer ID: 2404; location: home; project prefs: default
4/15/2006 8:22:56 PM|SETI@home|Computer ID: 2330542; location: home; project prefs: default
4/15/2006 8:22:56 PM|SZTAKI Desktop Grid|Computer ID: 17392; location: home; project prefs: default
4/15/2006 8:22:56 PM|World Community Grid|Computer ID: 31989; location: ; project prefs: default
4/15/2006 8:22:56 PM||General prefs: from ralph@home (last modified 2006-04-15 20:06:57)
4/15/2006 8:22:56 PM||General prefs: no separate prefs for home; using your defaults
4/15/2006 8:22:57 PM||Remote control not allowed; using loopback address
Fuzzy Hollynoodles Joined: 19 Feb 06 Posts: 37 Credit: 2,089 RAC: 0
My next one ran without erroring out, so I don't know whether it had a watchdog or was supposed to run normally.
https://ralph.bakerlab.org/workunit.php?wuid=83916
Result: https://ralph.bakerlab.org/result.php?resultid=94405

"I'm trying to maintain a shred of dignity in this world." - Me
casio7131 Joined: 20 Mar 06 Posts: 15 Credit: 12,660 RAC: 0
24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)

https://ralph.bakerlab.org/result.php?resultid=94190
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0
A new error for me: https://ralph.bakerlab.org/result.php?resultid=94349
This was the second error on this WU.

And another one: https://ralph.bakerlab.org/result.php?resultid=94350

Anders n
(Edit no. 2)
Mike Gelvin Joined: 17 Feb 06 Posts: 50 Credit: 55,397 RAC: 0
24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)

I thought credit was supposed to be granted on these?
Moderator9 Volunteer moderator Joined: 16 Feb 06 Posts: 251 Credit: 0 RAC: 0
24/04/2006 3:14:57 AM|ralph@home|Unrecoverable error for result NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0 (<file_xfer_error> <file_name>NOCHECK_DEFAULT_DOG_7486h002_dec184_1.pdb_418_5_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)

The system IS supposed to award the claimed credit for "Watchdog"-terminated Work Units. But the ones I have seen so far have always had some model information reported back. Yours seems to have a -161 error, implying that something file-related is in play. Rhiju will have to explain why it did not get awarded. As you may recall, on RALPH credit will not be awarded after the fact, but we still need to know why it was not awarded in the first place before this deploys to Rosetta.

Moderator9
Yeti Joined: 19 Feb 06 Posts: 32 Credit: 316,371 RAC: 853
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0
Thanks for the post; Moderator9 is right about what happened. Sorry about the annoying file transfer error; I've fixed it. We will be testing the fix in RALPH 5.04 later today.
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0
"I had this one: https://ralph.bakerlab.org/workunit.php?wuid=83796"

GREAT! I actually forced an infinite loop in that one. Very glad it was killed by the watchdog.
©2024 University of Washington
http://www.bakerlab.org