Task 5458816

Name RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_350_16902_3_0
Workunit 4848701
Created 13 Jun 2024, 5:41:15 UTC
Sent 13 Jun 2024, 8:57:52 UTC
Report deadline 14 Jun 2024, 8:57:52 UTC
Received 17 Jun 2024, 22:44:50 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 43234
Run time 1 days 17 hours 21 min 48 sec
CPU time 52 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 3.38 GFLOPS
Application version Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (nvidia_alpha) windows_x86_64
Peak working set size 7,222.87 MB
Peak swap size 12,688.20 MB
Peak disk usage 8.10 MB

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Traceback (most recent call last):
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}', 
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned,                msa_prev, pair_prev, state_prev = self.model(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\RoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 1106, in forward
    msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair,
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 929, in forward
    xyz, state, alpha = self.str2str(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 525, in forward
    v = xyz - xyz[:,:,1:2,:]
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
23:21:16 (9516): Can't acquire lockfile (32) - waiting 35s
23:21:51 (9516): Can't acquire lockfile (32) - exiting
23:21:51 (9516): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:31:57 (2112): Can't acquire lockfile (32) - waiting 35s
23:32:32 (2112): Can't acquire lockfile (32) - exiting
23:32:32 (2112): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:42:43 (10172): Can't acquire lockfile (32) - waiting 35s
23:43:18 (10172): Can't acquire lockfile (32) - exiting
23:43:18 (10172): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:54:08 (12064): Can't acquire lockfile (32) - waiting 35s
23:54:43 (12064): Can't acquire lockfile (32) - exiting
23:54:43 (12064): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:04:53 (9084): Can't acquire lockfile (32) - waiting 35s
00:05:28 (9084): Can't acquire lockfile (32) - exiting
00:05:28 (9084): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:05:44 (18260): Can't acquire lockfile (32) - waiting 35s
00:06:19 (18260): Can't acquire lockfile (32) - exiting
00:06:19 (18260): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:16:22 (5252): Can't acquire lockfile (32) - waiting 35s
00:16:57 (5252): Can't acquire lockfile (32) - exiting
00:16:57 (5252): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:27:16 (9140): Can't acquire lockfile (32) - waiting 35s
00:27:51 (9140): Can't acquire lockfile (32) - exiting
00:27:51 (9140): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:37:52 (6428): Can't acquire lockfile (32) - waiting 35s
00:38:27 (6428): Can't acquire lockfile (32) - exiting
00:38:27 (6428): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:48:33 (19568): Can't acquire lockfile (32) - waiting 35s
00:49:08 (19568): Can't acquire lockfile (32) - exiting
00:49:08 (19568): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:59:42 (944): Can't acquire lockfile (32) - waiting 35s
01:00:17 (944): Can't acquire lockfile (32) - exiting
01:00:17 (944): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:10:25 (14192): Can't acquire lockfile (32) - waiting 35s
01:11:00 (14192): Can't acquire lockfile (32) - exiting
01:11:00 (14192): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:21:05 (5664): Can't acquire lockfile (32) - waiting 35s
01:21:40 (5664): Can't acquire lockfile (32) - exiting
01:21:40 (5664): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:31:55 (1252): Can't acquire lockfile (32) - waiting 35s
01:32:30 (1252): Can't acquire lockfile (32) - exiting
01:32:30 (1252): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:43:02 (9104): Can't acquire lockfile (32) - waiting 35s
01:43:37 (9104): Can't acquire lockfile (32) - exiting
01:43:37 (9104): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:54:22 (14256): Can't acquire lockfile (32) - waiting 35s
01:54:57 (14256): Can't acquire lockfile (32) - exiting
01:54:57 (14256): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:05:04 (11624): Can't acquire lockfile (32) - waiting 35s
02:05:39 (11624): Can't acquire lockfile (32) - exiting
02:05:39 (11624): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:16:12 (14072): Can't acquire lockfile (32) - waiting 35s
02:16:47 (14072): Can't acquire lockfile (32) - exiting
02:16:47 (14072): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:08 (8132): Can't acquire lockfile (32) - waiting 35s
02:27:43 (8132): Can't acquire lockfile (32) - exiting
02:27:43 (8132): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:49 (18236): Can't acquire lockfile (32) - waiting 35s
02:28:24 (18236): Can't acquire lockfile (32) - exiting
02:28:24 (18236): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:38:40 (15048): Can't acquire lockfile (32) - waiting 35s
02:39:15 (15048): Can't acquire lockfile (32) - exiting
02:39:15 (15048): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:49:22 (10132): Can't acquire lockfile (32) - waiting 35s
02:49:57 (10132): Can't acquire lockfile (32) - exiting
02:49:57 (10132): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:50:10 (21328): Can't acquire lockfile (32) - waiting 35s
02:50:45 (21328): Can't acquire lockfile (32) - exiting
02:50:45 (21328): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:01:03 (2340): Can't acquire lockfile (32) - waiting 35s
03:01:38 (2340): Can't acquire lockfile (32) - exiting
03:01:38 (2340): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:11:52 (8700): Can't acquire lockfile (32) - waiting 35s
03:12:27 (8700): Can't acquire lockfile (32) - exiting
03:12:27 (8700): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:22:30 (19520): Can't acquire lockfile (32) - waiting 35s
03:23:05 (19520): Can't acquire lockfile (32) - exiting
03:23:05 (19520): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:23:12 (18576): Can't acquire lockfile (32) - waiting 35s
03:23:47 (18576): Can't acquire lockfile (32) - exiting
03:23:47 (18576): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:33:51 (6272): Can't acquire lockfile (32) - waiting 35s
03:34:26 (6272): Can't acquire lockfile (32) - exiting
03:34:26 (6272): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:44:35 (14308): Can't acquire lockfile (32) - waiting 35s
03:45:10 (14308): Can't acquire lockfile (32) - exiting
03:45:10 (14308): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:55:28 (20372): Can't acquire lockfile (32) - waiting 35s
03:56:03 (20372): Can't acquire lockfile (32) - exiting
03:56:03 (20372): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:06:04 (8016): Can't acquire lockfile (32) - waiting 35s
04:06:39 (8016): Can't acquire lockfile (32) - exiting
04:06:39 (8016): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:17:00 (10752): Can't acquire lockfile (32) - waiting 35s
04:17:35 (10752): Can't acquire lockfile (32) - exiting
04:17:35 (10752): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:27:59 (7916): Can't acquire lockfile (32) - waiting 35s
04:28:34 (7916): Can't acquire lockfile (32) - exiting
04:28:34 (7916): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:38:52 (13132): Can't acquire lockfile (32) - waiting 35s
04:39:27 (13132): Can't acquire lockfile (32) - exiting
04:39:27 (13132): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:39:29 (17096): Can't acquire lockfile (32) - waiting 35s
04:40:04 (17096): Can't acquire lockfile (32) - exiting
04:40:04 (17096): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:50:27 (13580): Can't acquire lockfile (32) - waiting 35s
04:51:02 (13580): Can't acquire lockfile (32) - exiting
04:51:02 (13580): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:01:18 (5208): Can't acquire lockfile (32) - waiting 35s
05:01:53 (5208): Can't acquire lockfile (32) - exiting
05:01:53 (5208): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:11:57 (18184): Can't acquire lockfile (32) - waiting 35s
05:12:32 (18184): Can't acquire lockfile (32) - exiting
05:12:32 (18184): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:12:50 (4396): Can't acquire lockfile (32) - waiting 35s
05:13:25 (4396): Can't acquire lockfile (32) - exiting
05:13:25 (4396): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:23:31 (20264): Can't acquire lockfile (32) - waiting 35s
05:24:06 (20264): Can't acquire lockfile (32) - exiting
05:24:06 (20264): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:34:47 (14100): Can't acquire lockfile (32) - waiting 35s
05:35:22 (14100): Can't acquire lockfile (32) - exiting
05:35:22 (14100): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:45:52 (928): Can't acquire lockfile (32) - waiting 35s
05:46:27 (928): Can't acquire lockfile (32) - exiting
05:46:27 (928): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:56:39 (14216): Can't acquire lockfile (32) - waiting 35s
05:57:14 (14216): Can't acquire lockfile (32) - exiting
05:57:14 (14216): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:07:57 (19976): Can't acquire lockfile (32) - waiting 35s
06:08:32 (19976): Can't acquire lockfile (32) - exiting
06:08:32 (19976): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:19:12 (3436): Can't acquire lockfile (32) - waiting 35s
06:19:47 (3436): Can't acquire lockfile (32) - exiting
06:19:47 (3436): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:30:24 (21164): Can't acquire lockfile (32) - waiting 35s
06:30:59 (21164): Can't acquire lockfile (32) - exiting
06:30:59 (21164): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:41:07 (3240): Can't acquire lockfile (32) - waiting 35s
06:41:42 (3240): Can't acquire lockfile (32) - exiting
06:41:42 (3240): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:52:27 (12132): Can't acquire lockfile (32) - waiting 35s
06:53:02 (12132): Can't acquire lockfile (32) - exiting
06:53:02 (12132): Error: The process cannot access the file because it is being used by another process.

 (0x20)
07:03:06 (14432): Can't acquire lockfile (32) - waiting 35s
07:03:41 (14432): Can't acquire lockfile (32) - exiting
07:03:41 (14432): Error: The process cannot access the file because it is being used by another process.

 (0x20)
17:42:54 (7220): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_350_16902_3_0_r1266607577_0</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>

©2024 University of Washington
http://www.bakerlab.org