Task 5458815

Name RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_349_16902_4_0
Workunit 4848700
Created 13 Jun 2024, 5:41:15 UTC
Sent 13 Jun 2024, 8:57:52 UTC
Report deadline 14 Jun 2024, 8:57:52 UTC
Received 14 Jun 2024, 13:54:16 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 43234
Run time
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 3.38 GFLOPS
Application version Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (nvidia_alpha) windows_x86_64

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Traceback (most recent call last):
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}', 
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned,                msa_prev, pair_prev, state_prev = self.model(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\RoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 1106, in forward
    msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair,
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 929, in forward
    xyz, state, alpha = self.str2str(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 508, in forward
    offset[:,is_motif,...] = 0            
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
nnot access the file because it is being used by another process.

 (0x20)
00:59:42 (13132): Can't acquire lockfile (32) - waiting 35s
01:00:17 (13132): Can't acquire lockfile (32) - exiting
01:00:17 (13132): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:10:25 (15308): Can't acquire lockfile (32) - waiting 35s
01:11:00 (15308): Can't acquire lockfile (32) - exiting
01:11:00 (15308): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:21:05 (872): Can't acquire lockfile (32) - waiting 35s
01:21:40 (872): Can't acquire lockfile (32) - exiting
01:21:40 (872): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:31:55 (4552): Can't acquire lockfile (32) - waiting 35s
01:32:30 (4552): Can't acquire lockfile (32) - exiting
01:32:30 (4552): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:43:02 (3812): Can't acquire lockfile (32) - waiting 35s
01:43:37 (3812): Can't acquire lockfile (32) - exiting
01:43:37 (3812): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:54:22 (6180): Can't acquire lockfile (32) - waiting 35s
01:54:57 (6180): Can't acquire lockfile (32) - exiting
01:54:57 (6180): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:05:04 (19564): Can't acquire lockfile (32) - waiting 35s
02:05:39 (19564): Can't acquire lockfile (32) - exiting
02:05:39 (19564): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:16:12 (8412): Can't acquire lockfile (32) - waiting 35s
02:16:47 (8412): Can't acquire lockfile (32) - exiting
02:16:47 (8412): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:08 (18172): Can't acquire lockfile (32) - waiting 35s
02:27:43 (18172): Can't acquire lockfile (32) - exiting
02:27:43 (18172): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:49 (15236): Can't acquire lockfile (32) - waiting 35s
02:28:24 (15236): Can't acquire lockfile (32) - exiting
02:28:24 (15236): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:38:40 (19684): Can't acquire lockfile (32) - waiting 35s
02:39:15 (19684): Can't acquire lockfile (32) - exiting
02:39:15 (19684): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:49:22 (3100): Can't acquire lockfile (32) - waiting 35s
02:49:57 (3100): Can't acquire lockfile (32) - exiting
02:49:57 (3100): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:50:10 (8808): Can't acquire lockfile (32) - waiting 35s
02:50:45 (8808): Can't acquire lockfile (32) - exiting
02:50:45 (8808): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:01:03 (20384): Can't acquire lockfile (32) - waiting 35s
03:01:38 (20384): Can't acquire lockfile (32) - exiting
03:01:38 (20384): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:11:52 (10088): Can't acquire lockfile (32) - waiting 35s
03:12:27 (10088): Can't acquire lockfile (32) - exiting
03:12:27 (10088): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:22:30 (3164): Can't acquire lockfile (32) - waiting 35s
03:23:05 (3164): Can't acquire lockfile (32) - exiting
03:23:05 (3164): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:23:12 (9816): Can't acquire lockfile (32) - waiting 35s
03:23:47 (9816): Can't acquire lockfile (32) - exiting
03:23:47 (9816): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:33:51 (18124): Can't acquire lockfile (32) - waiting 35s
03:34:26 (18124): Can't acquire lockfile (32) - exiting
03:34:26 (18124): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:44:35 (10568): Can't acquire lockfile (32) - waiting 35s
03:45:11 (10568): Can't acquire lockfile (32) - exiting
03:45:11 (10568): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:55:28 (17676): Can't acquire lockfile (32) - waiting 35s
03:56:03 (17676): Can't acquire lockfile (32) - exiting
03:56:03 (17676): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:06:04 (11316): Can't acquire lockfile (32) - waiting 35s
04:06:39 (11316): Can't acquire lockfile (32) - exiting
04:06:39 (11316): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:17:00 (6828): Can't acquire lockfile (32) - waiting 35s
04:17:35 (6828): Can't acquire lockfile (32) - exiting
04:17:35 (6828): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:27:59 (7656): Can't acquire lockfile (32) - waiting 35s
04:28:34 (7656): Can't acquire lockfile (32) - exiting
04:28:34 (7656): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:38:52 (12928): Can't acquire lockfile (32) - waiting 35s
04:39:27 (12928): Can't acquire lockfile (32) - exiting
04:39:27 (12928): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:39:29 (15440): Can't acquire lockfile (32) - waiting 35s
04:40:04 (15440): Can't acquire lockfile (32) - exiting
04:40:04 (15440): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:50:27 (8008): Can't acquire lockfile (32) - waiting 35s
04:51:02 (8008): Can't acquire lockfile (32) - exiting
04:51:02 (8008): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:01:18 (8056): Can't acquire lockfile (32) - waiting 35s
05:01:53 (8056): Can't acquire lockfile (32) - exiting
05:01:53 (8056): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:11:57 (20160): Can't acquire lockfile (32) - waiting 35s
05:12:32 (20160): Can't acquire lockfile (32) - exiting
05:12:32 (20160): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:12:50 (13364): Can't acquire lockfile (32) - waiting 35s
05:13:25 (13364): Can't acquire lockfile (32) - exiting
05:13:25 (13364): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:23:31 (5680): Can't acquire lockfile (32) - waiting 35s
05:24:06 (5680): Can't acquire lockfile (32) - exiting
05:24:06 (5680): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:34:47 (17556): Can't acquire lockfile (32) - waiting 35s
05:35:22 (17556): Can't acquire lockfile (32) - exiting
05:35:22 (17556): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:45:52 (11700): Can't acquire lockfile (32) - waiting 35s
05:46:27 (11700): Can't acquire lockfile (32) - exiting
05:46:27 (11700): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:56:39 (2112): Can't acquire lockfile (32) - waiting 35s
05:57:14 (2112): Can't acquire lockfile (32) - exiting
05:57:14 (2112): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:07:57 (2504): Can't acquire lockfile (32) - waiting 35s
06:08:32 (2504): Can't acquire lockfile (32) - exiting
06:08:32 (2504): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:19:12 (1588): Can't acquire lockfile (32) - waiting 35s
06:19:47 (1588): Can't acquire lockfile (32) - exiting
06:19:47 (1588): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:30:24 (4352): Can't acquire lockfile (32) - waiting 35s
06:30:59 (4352): Can't acquire lockfile (32) - exiting
06:30:59 (4352): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:41:07 (11892): Can't acquire lockfile (32) - waiting 35s
06:41:42 (11892): Can't acquire lockfile (32) - exiting
06:41:42 (11892): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:52:27 (14432): Can't acquire lockfile (32) - waiting 35s
06:53:02 (14432): Can't acquire lockfile (32) - exiting
06:53:02 (14432): Error: The process cannot access the file because it is being used by another process.

 (0x20)
07:42:29 (19936): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_349_16902_4_0_r1261289274_0</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>

©2024 University of Washington
http://www.bakerlab.org