| Author | Message |
|
|
|
This version has a very slight change that will fix a bug in checkpointing that occurs a small fraction of the time in ABRELAX workunits.
____________
|
|
|
|
|
|
Hmmmm, here's an odd one...
WU 204303.
It's been so long since I had to do this that I forgot how to format it!
Keep up the good work!
|
|
|
|
|
|
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR:: Exit at: .\fragments.cc line:689
http://ralph.bakerlab.org/result.php?resultid=207053
SIGSEGV: segmentation violation
http://ralph.bakerlab.org/result.php?resultid=205541
____________
Click signature for global team stats
  |
|
|
|
|
|
Just noticed that on the last one I reported it was sent to three folks and all three got bitten by the watchdog. Here's another from today:
WU 180623
The real error was this one:
WARNING! attempt to gzip file .\aat329.out failed: file does not exist.
Maybe someone will "fix" the WU generator script (server side)?
Thanks
____________
  |
|
|
|
|
|
stuck at 1.044 %
Stage Ab initio + relax
step 341224
http://ralph.bakerlab.org/result.php?resultid=207703 -:(
____________
  |
|
|
|
|
|
I do not know if this is a bug, but all my current WUs seem to be stuck at 1.623% until about 90 minutes in, and then they finish successfully.
Ni
____________
|
|
|
|
|
|
Had 3 out of 4 workunits fail in a very short time with the error "the system failed to find the path specified".
All of the failed workunits had names starting with "FRA_t329_CASP7".
The workunits are:
http://ralph.bakerlab.org/workunit.php?wuid=185176
http://ralph.bakerlab.org/workunit.php?wuid=185177
http://ralph.bakerlab.org/workunit.php?wuid=185178
The other workunit seems to be OK; it is a t347_CASP7.
Hope this is of help, as the correction from 5.24 needs correcting.
>>> Spoke too soon:
workunit type t347_CASP7 http://ralph.bakerlab.org/workunit.php?wuid=177477
also failed, but with the error "unhandled exception detected" (access violation).

____________
 |
|
|
|
|
|
Woke up to a screensaver and a "runtime" error box on my AMD64 3700. It's the first time I've seen this one. It looked like this:

So I hit Print Screen and pasted it into Photoshop; before I could finish editing the photo it happened again, as can be seen below.

Here's what my BOINC Manager looked like:

And the WUs were wuid=185158 and wuid=185157. I noticed one other user had an error with these same WUs before they were issued to me. Here are the result IDs:
Result ID 209014
Name FRA_t329_CASP7_hom001_8_858_3_1
Workunit 185157
Created 5 Jul 2006 4:25:27 UTC
Sent 5 Jul 2006 4:25:38 UTC
Received 5 Jul 2006 11:00:31 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status 3 (0x3)
Computer ID 2172
Report deadline 9 Jul 2006 4:25:38 UTC
CPU time 99.3125
stderr out <core_client_version>5.5.4</core_client_version>
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 2970327
</stderr_txt>
Validate state Invalid
Claimed credit 0.389839542825192
Granted credit 0
application version 5.25
and
Result ID 209015
Name FRA_t329_CASP7_hom001_8_858_4_1
Workunit 185158
Created 5 Jul 2006 4:25:27 UTC
Sent 5 Jul 2006 4:25:38 UTC
Received 5 Jul 2006 11:00:31 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status 3 (0x3)
Computer ID 2172
Report deadline 9 Jul 2006 4:25:38 UTC
CPU time 72.921875
stderr out <core_client_version>5.5.4</core_client_version>
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# random seed: 2970326
</stderr_txt>
Validate state Invalid
Claimed credit 0.286246247068151
Granted credit 0
application version 5.25
|
|
|
|
|
|
Windows Runtime Error.
http://ralph.bakerlab.org/workunit.php?wuid=185165
Result: http://ralph.bakerlab.org/result.php?resultid=209160
____________
"I'm trying to maintain a shred of dignity in this world." - Me
 |
|
|
|
|
|
I just got the third one of these. The three WUs were:
FRA_t329_CASP7_hom001_8_858_3
FRA_t329_CASP7_hom001_8_858_4
FRA_t329_CASP7_hom001_8_858_5
I think I see a pattern. lol
Each WU I did had previously failed for one other user, prior to them failing for me.
tony |
|
|
|
|
|
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
http://ralph.bakerlab.org/result.php?resultid=213243
Along with a Windows popup that left my PC idle for hours!
It was only hours because this PC gets checked on from time to time.
* On a truly unmonitored remote PC, this could leave the machine not crunching anything at all indefinitely,
or at least until its UPS fails and a reboot occurs.
* On a crunching PC at a commercial company,
the popup may lead the boss to ask that crunching be stopped
on all of the company's PCs, because it disturbs the employees' work!
So please, avoid Windows popups triggered by app errors.
Thanks
____________
  |
|
|
|
|
|
WU download error: couldn't get input files:
http://ralph.bakerlab.org/result.php?resultid=212229
Any problem with the Ralph servers?
|
|
|
|
|
http://ralph.bakerlab.org/result.php?resultid=216027 WARNING! attempt to gzip file .\xxt319.out failed: file does not exist.
____________
|
|
|
|
|
|
Same thing happened here:
http://ralph.bakerlab.org/result.php?resultid=216029
http://ralph.bakerlab.org/result.php?resultid=216028
Both contain: WARNING! attempt to gzip file .\xxt319.out failed: file does not exist.
____________
|
|
|
|
|
|
WARNING! attempt to gzip file .\xxt319.out failed: file does not exist.
Same thing here. I've had 11 of them reporting the same message over the last few days, after they seemingly ran their full course of about 1 hour ... O_o
|
|
|
|
|
|
The app seems to be okay now: out of 154 WUs run over the last 2 days, only 2 have errored out, and both of them were at the beginning of the 154 WUs ...
|
|
|
|
|
WU failed with this pop-up

BOINC ran a single thread on a dual-core until I clicked OK, then this message was displayed in BOINC Manager, and another WU began.
7/19/2006 9:23:25 AM|ralph@home|Unrecoverable error for result t353_LOOPRELAX_hom002_S_00001_0004344_0_1030_4_1 (The system cannot find the path specified. (0x3) - exit code 3 (0x3))
WU report shows:
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
____________
|
|
|
|
|
|
Got 2 failures: "WU download error: couldn't get input files:"
http://ralph.bakerlab.org/results.php?hostid=118
Rosetta works fine.
Anders n
____________
|
|
|
|
|
|
Just got this one:
21/07/2006 21:39:02|ralph@home|Giving up on download of hom011_S_00001_0000033_1.obligate_loopfile.gz: file was not found on server
21/07/2006 21:39:02|ralph@home|Checksum or signature error for hom011_S_00001_0000033_1.obligate_loopfile.gz
21/07/2006 21:39:04|ralph@home|Unrecoverable error for result t353_LOOPRELAX_hom011_S_00001_0000033_1_1030_15_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>hom011_S_00001_0000033_1.obligate_loopfile.gz</file_name> <error_code>-163</error_code> <error_message>file was not found on server</error_message></file_xfer_error>)
____________
 |
|
|
|
|
|
What's the difference between Rosetta 5.25 and Ralph's Rosetta_Beta 5.25?
--- bt |
|
|
|
|
|
Isn't it almost the same one, with a small configuration change?
We'll have to wait for an answer until the developers finish their work on CASP.
____________
|
|
|
|
|
|
Ralph 5.25 terminated this WU at 96.64%.
stderr.log:
Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 345600
Graphics are disabled due to configuration...
# cpu_run_time_pref: 345600
Graphics are disabled due to configuration...
# cpu_run_time_pref: 345600
SIGSEGV: segmentation violation
Stack trace (20 frames):
[0x8849d1b]
[0x8861dcc]
[0xffffe420]
[0x88e40a9]
[0x88b2de7]
[0x88b51d1]
[0x809f31c]
[0x86b4609]
[0x86bab32]
[0x84b14f8]
[0x84b343b]
[0x84b6573]
[0x84b8231]
[0x87e6e77]
[0x86c3ac7]
[0x805f7c1]
[0x846e09d]
[0x8470594]
[0x88c12b4]
[0x8048111]
stdout.txt:
[T/F OPT]Default FALSE value for [-minimize_exclude_helix]
[T/F OPT]Default FALSE value for [-minimize_exclude_strand]
CYCLES::number is 1 x total_residue: 86
initializing full atom coordinates
BOINC :: [2006-07-23 23:31:01] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 305 :: num_decoys: 305 :: farlx_stage: 10
dump_fullatom_pdb: farlxcheck
starting score 4429.49316 rms 0
starting full atom minimization
[T/F OPT]Default FALSE value for [-infinite_loop]
score_filter: tag= relax_score_filter1 score= -112.872 rank= 91 max_rank= 95 nscores= 190 filter_score= -112.28
Suse Linux 10.1 on Athlon 2400+
____________
|
|
|
|
|
|
Not exactly sure this is a bug, but I noticed that a couple of the t372 units ran longer than expected on my 1 GHz Windows XP machine:
resid 231586 ran for 19,675 secs and
resid 231585 ran for 22,102 secs.
Both were still on the first structure, so maybe these guys need to be restricted to the faster machines?
[edit:] I guess I should have mentioned: run-time pref is set to 4 hours (14,400 secs)
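
For a rough sense of the overrun, the two run times above can be compared against the 14,400-second preference. This is only a minimal sketch in Python using the figures from this post; the function and variable names are made up for illustration.

```python
# Run-time preference and observed run times, in seconds (figures from the post).
RUN_TIME_PREF = 14_400
observed = {231586: 19_675, 231585: 22_102}


def overrun(seconds, pref=RUN_TIME_PREF):
    """Return (extra seconds beyond the preference, ratio to the preference)."""
    return seconds - pref, seconds / pref


for resid, secs in observed.items():
    extra, ratio = overrun(secs)
    print(f"resid {resid}: {extra} s over the preference ({ratio:.2f}x)")
```

Both results ran well past the preference while still on the first structure, which fits the idea that a single structure takes longer than 4 hours on this machine.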
____________
 |
|
|
|
|
|
As an update to my last post, I have another of those t372 units running on the same machine: CPU time 4:18, 1.044% complete, estimated time to completion 10:12 and climbing. I'm running 50/50 with Rosie, but the machine is in EDF mode running Ralph's exclusively, as I have four other projects set to no-new-work, so the scheduler thinks Ralph/Rosie are only getting 16-2/3 percent of the CPU each and won't be able to finish two days' work in the next week or so... oh well...
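
The 16-2/3 percent figure is what you would get if six attached projects (Ralph, Rosie, and the four set to no-new-work) all had the same resource share. The post does not state the actual shares, so the equal-share setup below is an assumption, and `cpu_share` is a hypothetical helper rather than anything from BOINC itself.

```python
from fractions import Fraction


def cpu_share(own_share, all_shares):
    """Fraction of CPU time a project gets if time is split by resource share."""
    return Fraction(own_share, sum(all_shares))


# Assumption: six attached projects, each with the same resource share of 100.
shares = [100] * 6
ralph = cpu_share(shares[0], shares)
print(f"Ralph's share: {float(ralph * 100):.2f}% of the CPU")
```

With equal shares each project is entitled to 1/6 of the CPU, even though only two of the six are actually fetching work.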
|
|
|
|
|
I just had three workunits with screwed up downloads, all the same error. It looks like everyone else downloading had the same problem.
Units are:
http://ralph.bakerlab.org/workunit.php?wuid=205131
http://ralph.bakerlab.org/workunit.php?wuid=205144
http://ralph.bakerlab.org/workunit.php?wuid=205143
Messages below
************************************************************************
27/07/2006 1:32:14 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
27/07/2006 1:32:14 PM|ralph@home|Reason: To fetch work
27/07/2006 1:32:14 PM|ralph@home|Requesting 43200 seconds of new work
27/07/2006 1:32:34 PM|ralph@home|Scheduler request succeeded
27/07/2006 1:32:36 PM|ralph@home|Started download of file bq_cterm_hom001_t386_.fasta.gz
27/07/2006 1:32:36 PM|ralph@home|Started download of file bq_cterm_hom001_t386_.psipred_ss2.gz
27/07/2006 1:32:41 PM|ralph@home|Finished download of file bq_cterm_hom001_t386_.fasta.gz
27/07/2006 1:32:41 PM|ralph@home|Throughput 37 bytes/sec
27/07/2006 1:32:41 PM|ralph@home|Finished download of file bq_cterm_hom001_t386_.psipred_ss2.gz
27/07/2006 1:32:41 PM|ralph@home|Throughput 286 bytes/sec
27/07/2006 1:32:41 PM|ralph@home|Started download of file boinc_bq_cterm_hom001_aat386_03_05.200_v1_3.gz
27/07/2006 1:32:41 PM|ralph@home|Started download of file boinc_bq_cterm_hom001_aat386_09_05.200_v1_3.gz
27/07/2006 1:33:09 PM|ralph@home|Finished download of file boinc_bq_cterm_hom001_aat386_09_05.200_v1_3.gz
27/07/2006 1:33:09 PM|ralph@home|Throughput 7287 bytes/sec
27/07/2006 1:33:09 PM|ralph@home|Started download of file bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:33:15 PM|ralph@home|Incomplete read of less than 5KB for bq_cterm_hom001_killlocal.bar.gz - truncating
27/07/2006 1:33:15 PM|ralph@home|Finished download of file bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:33:15 PM|ralph@home|Throughput 33 bytes/sec
27/07/2006 1:33:15 PM|ralph@home|Started download of file casp7.description.shorter.txt
27/07/2006 1:33:15 PM|ralph@home|Checksum or signature error for bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:33:16 PM|ralph@home|Unrecoverable error for result t386__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_bq_cterm_hom001__1060_5_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>bq_cterm_hom001_killlocal.bar.gz</file_name> <error_code>-200</error_code></file_xfer_error>)
27/07/2006 1:33:24 PM|ralph@home|Finished download of file casp7.description.shorter.txt
27/07/2006 1:33:24 PM|ralph@home|Throughput 10 bytes/sec
27/07/2006 1:33:24 PM|ralph@home|Started download of file bq_cterm_hom002_t386_.fasta.gz
27/07/2006 1:33:32 PM|ralph@home|Finished download of file bq_cterm_hom002_t386_.fasta.gz
27/07/2006 1:33:32 PM|ralph@home|Throughput 20 bytes/sec
27/07/2006 1:33:32 PM|ralph@home|Started download of file bq_cterm_hom002_t386_.psipred_ss2.gz
27/07/2006 1:33:39 PM|ralph@home|Finished download of file bq_cterm_hom002_t386_.psipred_ss2.gz
27/07/2006 1:33:39 PM|ralph@home|Throughput 155 bytes/sec
27/07/2006 1:33:39 PM|ralph@home|Started download of file boinc_bq_cterm_hom002_aat386_03_05.200_v1_3.gz
27/07/2006 1:33:41 PM|ralph@home|Finished download of file boinc_bq_cterm_hom001_aat386_03_05.200_v1_3.gz
27/07/2006 1:33:41 PM|ralph@home|Throughput 13205 bytes/sec
27/07/2006 1:33:41 PM|ralph@home|Started download of file boinc_bq_cterm_hom002_aat386_09_05.200_v1_3.gz
27/07/2006 1:34:09 PM|ralph@home|Finished download of file boinc_bq_cterm_hom002_aat386_09_05.200_v1_3.gz
27/07/2006 1:34:09 PM|ralph@home|Throughput 7793 bytes/sec
27/07/2006 1:34:09 PM|ralph@home|Started download of file bq_cterm_hom002_killlocal.bar.gz
27/07/2006 1:34:17 PM|ralph@home|Incomplete read of less than 5KB for bq_cterm_hom002_killlocal.bar.gz - truncating
27/07/2006 1:34:17 PM|ralph@home|Finished download of file bq_cterm_hom002_killlocal.bar.gz
27/07/2006 1:34:17 PM|ralph@home|Throughput 27 bytes/sec
27/07/2006 1:34:17 PM|ralph@home|Checksum or signature error for bq_cterm_hom002_killlocal.bar.gz
27/07/2006 1:34:18 PM|ralph@home|Unrecoverable error for result t386__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_bq_cterm_hom002__1060_5_2 (WU download error: couldn't get input files:<file_xfer_error> <file_name>bq_cterm_hom002_killlocal.bar.gz</file_name> <error_code>-200</error_code></file_xfer_error>)
27/07/2006 1:34:38 PM|ralph@home|Finished download of file boinc_bq_cterm_hom002_aat386_03_05.200_v1_3.gz
27/07/2006 1:34:38 PM|ralph@home|Throughput 13514 bytes/sec
27/07/2006 1:35:03 PM|rosetta@home|Sending scheduler request to http://boinc.bakerlab.org/rosetta_cgi/cgi
27/07/2006 1:35:03 PM|rosetta@home|Reason: Requested by user
27/07/2006 1:35:03 PM|rosetta@home|Reporting 1 tasks
27/07/2006 1:35:13 PM|rosetta@home|Scheduler request succeeded
27/07/2006 1:35:18 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
27/07/2006 1:35:18 PM|ralph@home|Reason: Requested by user
27/07/2006 1:35:18 PM|ralph@home|Requesting 43200 seconds of new work, and reporting 2 completed tasks
27/07/2006 1:35:28 PM|ralph@home|Scheduler request succeeded
27/07/2006 1:35:28 PM|ralph@home|Message from server: Not sending work - last request too recent: 181 sec
27/07/2006 1:39:34 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
27/07/2006 1:39:34 PM|ralph@home|Reason: To fetch work
27/07/2006 1:39:34 PM|ralph@home|Requesting 43200 seconds of new work
27/07/2006 1:39:59 PM|ralph@home|Scheduler request succeeded
27/07/2006 1:40:02 PM|ralph@home|Started download of file bq_cterm_hom001_t386_.fasta.gz
27/07/2006 1:40:03 PM|ralph@home|Started download of file bq_cterm_hom001_t386_.psipred_ss2.gz
27/07/2006 1:40:09 PM|ralph@home|Finished download of file bq_cterm_hom001_t386_.fasta.gz
27/07/2006 1:40:09 PM|ralph@home|Throughput 29 bytes/sec
27/07/2006 1:40:09 PM|ralph@home|Finished download of file bq_cterm_hom001_t386_.psipred_ss2.gz
27/07/2006 1:40:09 PM|ralph@home|Throughput 222 bytes/sec
27/07/2006 1:40:09 PM|ralph@home|Started download of file boinc_bq_cterm_hom001_aat386_03_05.200_v1_3.gz
27/07/2006 1:40:09 PM|ralph@home|Started download of file boinc_bq_cterm_hom001_aat386_09_05.200_v1_3.gz
27/07/2006 1:40:37 PM|ralph@home|Finished download of file boinc_bq_cterm_hom001_aat386_09_05.200_v1_3.gz
27/07/2006 1:40:37 PM|ralph@home|Throughput 7517 bytes/sec
27/07/2006 1:40:37 PM|ralph@home|Started download of file bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:40:44 PM|ralph@home|Incomplete read of less than 5KB for bq_cterm_hom001_killlocal.bar.gz - truncating
27/07/2006 1:40:44 PM|ralph@home|Finished download of file bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:40:44 PM|ralph@home|Throughput 32 bytes/sec
27/07/2006 1:40:44 PM|ralph@home|Started download of file casp7.description.shorter.txt
27/07/2006 1:40:44 PM|ralph@home|Checksum or signature error for bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:40:45 PM|ralph@home|Unrecoverable error for result t386__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_bq_cterm_hom001__1060_3_3 (WU download error: couldn't get input files:<file_xfer_error> <file_name>bq_cterm_hom001_killlocal.bar.gz</file_name> <error_code>-200</error_code></file_xfer_error>)
27/07/2006 1:40:50 PM|ralph@home|Finished download of file casp7.description.shorter.txt
27/07/2006 1:40:50 PM|ralph@home|Throughput 16 bytes/sec
27/07/2006 1:40:50 PM|ralph@home|Started download of file nohistag_hom001_t363_.fasta.gz
27/07/2006 1:40:58 PM|ralph@home|Finished download of file nohistag_hom001_t363_.fasta.gz
27/07/2006 1:40:58 PM|ralph@home|Throughput 19 bytes/sec
27/07/2006 1:40:58 PM|ralph@home|Started download of file nohistag_hom001_t363_.psipred_ss2.gz
27/07/2006 1:41:06 PM|ralph@home|Finished download of file nohistag_hom001_t363_.psipred_ss2.gz
27/07/2006 1:41:06 PM|ralph@home|Throughput 136 bytes/sec
27/07/2006 1:41:06 PM|ralph@home|Started download of file boinc_nohistag_hom001_aat363_03_05.200_v1_3.gz
27/07/2006 1:41:10 PM|ralph@home|Finished download of file boinc_bq_cterm_hom001_aat386_03_05.200_v1_3.gz
27/07/2006 1:41:10 PM|ralph@home|Throughput 13346 bytes/sec
27/07/2006 1:41:10 PM|ralph@home|Started download of file boinc_nohistag_hom001_aat363_09_05.200_v1_3.gz
27/07/2006 1:41:12 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
27/07/2006 1:41:12 PM|ralph@home|Reason: Requested by user
27/07/2006 1:41:12 PM|ralph@home|Reporting 1 tasks
27/07/2006 1:41:27 PM|ralph@home|Scheduler request succeeded
27/07/2006 1:41:41 PM|ralph@home|Finished download of file boinc_nohistag_hom001_aat363_09_05.200_v1_3.gz
27/07/2006 1:41:41 PM|ralph@home|Throughput 6360 bytes/sec
27/07/2006 1:41:41 PM|ralph@home|Started download of file hom020_S_00004_0000864_0.pdb.gz
27/07/2006 1:41:53 PM|ralph@home|Finished download of file hom020_S_00004_0000864_0.pdb.gz
27/07/2006 1:41:53 PM|ralph@home|Throughput 972 bytes/sec
27/07/2006 1:41:53 PM|ralph@home|Started download of file hom020_S_00004_0000864_0.loopfile.gz
27/07/2006 1:42:00 PM|ralph@home|Finished download of file hom020_S_00004_0000864_0.loopfile.gz
27/07/2006 1:42:00 PM|ralph@home|Throughput 13 bytes/sec
27/07/2006 1:42:00 PM|ralph@home|Started download of file hom020_S_00004_0000864_0.obligate_loopfile.gz
27/07/2006 1:42:07 PM|ralph@home|Finished download of file boinc_nohistag_hom001_aat363_03_05.200_v1_3.gz
27/07/2006 1:42:07 PM|ralph@home|Throughput 13009 bytes/sec
27/07/2006 1:42:07 PM|ralph@home|Finished download of file hom020_S_00004_0000864_0.obligate_loopfile.gz
27/07/2006 1:42:07 PM|ralph@home|Throughput 9 bytes/sec
27/07/2006 1:42:08 PM||Rescheduling CPU: files downloaded
27/07/2006 1:42:08 PM|QMC@HOME|Pausing task 03B_stdna_nodelete.1749_0 (left in memory)
27/07/2006 1:42:09 PM|ralph@home|Starting task t363_LOOPRELAX_hom020_S_00004_0000864_0_1077_1_0 using rosetta_beta version 525
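
For anyone wading through a message log like the one above, here is a rough Python sketch that pulls out which files hit the "Checksum or signature error". It assumes the `timestamp|project|message` layout seen in these messages; the sample lines are copied from the log, and `failed_downloads` is a made-up helper name, not part of BOINC.

```python
import re
from collections import Counter

# A few lines copied from the log above, in timestamp|project|message form.
SAMPLE_LOG = """\
27/07/2006 1:33:15 PM|ralph@home|Incomplete read of less than 5KB for bq_cterm_hom001_killlocal.bar.gz - truncating
27/07/2006 1:33:15 PM|ralph@home|Checksum or signature error for bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:34:17 PM|ralph@home|Checksum or signature error for bq_cterm_hom002_killlocal.bar.gz
27/07/2006 1:40:44 PM|ralph@home|Checksum or signature error for bq_cterm_hom001_killlocal.bar.gz
27/07/2006 1:42:08 PM||Rescheduling CPU: files downloaded
"""

CHECKSUM_RE = re.compile(r"^Checksum or signature error for (\S+)$")


def failed_downloads(log_text):
    """Count checksum/signature failures per file name."""
    counts = Counter()
    for line in log_text.splitlines():
        # Split into at most three fields so a '|' inside the message survives.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        _timestamp, _project, message = parts
        match = CHECKSUM_RE.match(message.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


for name, count in failed_downloads(SAMPLE_LOG).items():
    print(f"{name}: {count} checksum/signature error(s)")
```

In this log only the `killlocal.bar.gz` files show up, which suggests the problem is with those files rather than with the connection in general.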
____________
|
|
|
|
|
|
There's a Rosetta user getting this:
Incomplete read of less than 5KB for...
error as well. If a resolution is found, please inform them as well.
____________
|
|
|
|
|
|
Looks like all 3 of us that got this WU have had the same error.
Bad WU?
<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\pack.cc line:7839
____________
 |
|
|
|
|
|
Well, at least WatchDog is working.
Two people that got this WU got the Watchdog error. The third got the Incorrect function. (0x1) - exit code 1 (0x1) before a minute had passed.
____________
|
|
|
|
|
|
I am running Linux on an AMD Opteron machine that had 2 lots of file downloads fail within seconds of the work units starting.
All had the error "process exited with code 1 (0x1)
ERROR:: Exit at: pack.cc line:7839"
Had 24 fail on 29/7/06 and 14 fail on 1/8/06. There were no successful units processed; all failed.
All started with "t347_CASP7_ABRELAX_SAVE_ALL_OUT_hom001_1087_"
and the last parts are: 14_2, 15_2, 16_2, 17_2, 20_2, 132_1, 133_1, 21_2, 31_2, 32_2, 134_1, 22_2, 23_2, 24_2, 136_1, 137_1, 138_1, 37_2, 139_1, 141_1, 143_1, 144_1, 145_1, 146_1.
The second lot started the same but ended with: 163_2, 164_2, 165_2, 166_2, 167_2, 168_2, 169_2, 183_2, 184_2, 185_2, 186_2, 187_2, 188_2, 189_2.
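
In case the full result names are useful for looking these up, they can be rebuilt from the common prefix and the endings listed above. A small Python sketch; the prefix and endings are copied from this post, and the helper name `full_names` is just for illustration.

```python
# Common prefix and endings as listed in the post above.
PREFIX = "t347_CASP7_ABRELAX_SAVE_ALL_OUT_hom001_1087_"

FIRST_LOT = (
    "14_2, 15_2, 16_2, 17_2, 20_2, 132_1, 133_1, 21_2, 31_2, 32_2, 134_1, "
    "22_2, 23_2, 24_2, 136_1, 137_1, 138_1, 37_2, 139_1, 141_1, 143_1, "
    "144_1, 145_1, 146_1"
)
SECOND_LOT = (
    "163_2, 164_2, 165_2, 166_2, 167_2, 168_2, 169_2, "
    "183_2, 184_2, 185_2, 186_2, 187_2, 188_2, 189_2"
)


def full_names(endings, prefix=PREFIX):
    """Join the prefix to each comma-separated ending."""
    return [prefix + ending.strip() for ending in endings.split(",")]


first = full_names(FIRST_LOT)
second = full_names(SECOND_LOT)
print(len(first), "names in the first lot, e.g.", first[0])
print(len(second), "names in the second lot, e.g.", second[0])
```

The counts come out to 24 and 14, matching the number of failures reported for the two dates.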
____________
 |
|
|
|
|
|
Got another 2 Segmentation Violation errors
http://ralph.bakerlab.org/workunit.php?wuid=211463
http://ralph.bakerlab.org/workunit.php?wuid=211462
Also had problem with another WU as well
http://ralph.bakerlab.org/workunit.php?wuid=212403
it had Process exit code 1
ERROR:Exit at dock_structure.cc line:401
____________
 |
|
|
|
|
|
Project people: since you updated the Ralph@home project to show the new Credit System totals, a number of other testers and I have been unable to upload WU results. The WU uploads but the results do not. Our time is running out to return these on time, and I have about 46 to return.
Please see the thread about "Internal Server Error".
I myself am getting a "No Schedulers Responded" error, as are a few others, but some are getting an "Internal Server" error.
Please be prompt in repairing this, as the deadline is only a day or so away.
It has been 2 days now since the fault started.
____________
 |
|
|
|
|
|
I've uploaded and reported 10 WUs just this morning ... are you sure it's on Ralph's end and not yours?
|
|
|
|
|
Thanks PoorBoy, the problem was at the Ralph end. It took 2 days to fix. I was not the only one having the problem, and it was fixed between my last post and this one. All my WUs have now uploaded.
____________
 |
|
|
|
|
|
Have another 3 work units with Error SIGSEGV : Segmentation Violation
http://ralph.bakerlab.org/workunit.php?wuid=212912
http://ralph.bakerlab.org/workunit.php?wuid=212913
http://ralph.bakerlab.org/workunit.php?wuid=212914
Also another 1 with "Process exit code 1"
"ERROR:Exit at:dock_structure.cc:line:401"
http://ralph.bakerlab.org/workunit.php?wuid=212903
____________
 |
|
|
|
|
|
I just got a WU that crashes immediately:
8/29/2006 8:44:32 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
8/29/2006 8:44:32 PM|ralph@home|Reason: To fetch work
8/29/2006 8:44:32 PM|ralph@home|Requesting 19008 seconds of new work
8/29/2006 8:44:37 PM|ralph@home|Scheduler request succeeded
8/29/2006 8:44:39 PM|ralph@home|Started download of file aa2int_03_05.200_v1_3.gz
8/29/2006 8:44:39 PM|ralph@home|Started download of file aa2int_09_05.200_v1_3.gz
8/29/2006 8:45:10 PM|ralph@home|Finished download of file aa2int_03_05.200_v1_3.gz
8/29/2006 8:45:10 PM|ralph@home|Throughput 54665 bytes/sec
8/29/2006 8:45:10 PM|ralph@home|Started download of file 2int_.fasta.gz
8/29/2006 8:45:11 PM|ralph@home|Finished download of file 2int_.fasta.gz
8/29/2006 8:45:11 PM|ralph@home|Throughput 475 bytes/sec
8/29/2006 8:45:11 PM|ralph@home|Started download of file 2int.loop_file.gz
8/29/2006 8:45:12 PM|ralph@home|Finished download of file 2int.loop_file.gz
8/29/2006 8:45:12 PM|ralph@home|Throughput 240 bytes/sec
8/29/2006 8:45:12 PM|ralph@home|Started download of file 2int_1_model_12_idl.pdb.gz
8/29/2006 8:45:15 PM|ralph@home|Finished download of file 2int_1_model_12_idl.pdb.gz
8/29/2006 8:45:15 PM|ralph@home|Throughput 32415 bytes/sec
8/29/2006 8:45:15 PM|ralph@home|Started download of file paths_200_2int.txt
8/29/2006 8:45:16 PM|ralph@home|Finished download of file paths_200_2int.txt
8/29/2006 8:45:16 PM|ralph@home|Throughput 3689 bytes/sec
8/29/2006 8:45:20 PM|ralph@home|Finished download of file aa2int_09_05.200_v1_3.gz
8/29/2006 8:45:20 PM|ralph@home|Throughput 96325 bytes/sec
8/29/2006 8:45:21 PM||Rescheduling CPU: files downloaded
8/29/2006 8:45:21 PM|rosetta@home|Pausing task BENCH_ABRELAX_SAVE_ALL_OUT_1iibA_BARCODE_R55_filters_1214_6701_0 (left in memory)
8/29/2006 8:45:22 PM|ralph@home|Starting task NMR_2int_CASPR_1_2int_1_model_12IGNORE_THE_REST_idl_1266_6_0 using rosetta_beta version 525
8/29/2006 8:45:59 PM||Rescheduling CPU: application exited
8/29/2006 8:45:59 PM|ralph@home|Computation for task NMR_2int_CASPR_1_2int_1_model_12IGNORE_THE_REST_idl_1266_6_0 finished
8/29/2006 8:45:59 PM|rosetta@home|Resuming task BENCH_ABRELAX_SAVE_ALL_OUT_1iibA_BARCODE_R55_filters_1214_6701_0 using rosetta version 525
8/29/2006 8:46:00 PM|ralph@home|Unrecoverable error for result NMR_2int_CASPR_1_2int_1_model_12IGNORE_THE_REST_idl_1266_6_0 (<file_xfer_error> <file_name>NMR_2int_CASPR_1_2int_1_model_12IGNORE_THE_REST_idl_1266_6_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)
8/29/2006 8:48:41 PM|ralph@home|Sending scheduler request to http://ralph.bakerlab.org/ralph_cgi/cgi
8/29/2006 8:48:41 PM|ralph@home|Reason: To fetch work
8/29/2006 8:48:41 PM|ralph@home|Requesting 19008 seconds of new work, and reporting 1 completed tasks
8/29/2006 8:48:46 PM|ralph@home|Scheduler request succeeded
8/29/2006 8:48:48 PM|ralph@home|Started download of file 2int_1_model_11_idl.pdb.gz
8/29/2006 8:48:51 PM|ralph@home|Finished download of file 2int_1_model_11_idl.pdb.gz
8/29/2006 8:48:51 PM|ralph@home|Throughput 18677 bytes/sec
8/29/2006 8:48:52 PM||Rescheduling CPU: files downloaded
8/29/2006 8:48:52 PM|rosetta@home|Pausing task BENCH_ABRELAX_SAVE_ALL_OUT_1iibA_BARCODE_R55_filters_1214_6701_0 (left in memory)
8/29/2006 8:48:53 PM|ralph@home|Starting task NMR_2int_CASPR_1_2int_1_model_11IGNORE_THE_REST_idl_1266_7_1 using rosetta_beta version 525
8/29/2006 8:49:30 PM||Rescheduling CPU: application exited
8/29/2006 8:49:30 PM|ralph@home|Computation for task NMR_2int_CASPR_1_2int_1_model_11IGNORE_THE_REST_idl_1266_7_1 finished
Windows XP pro sp2
P4 3.0 HT (on)
1 Gb RAM
BOINC 5.4.11
|
|
|