Task 5458772

Name: RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_343_16902_4_0
Workunit: 4848663
Created: 13 Jun 2024, 5:27:24 UTC
Sent: 13 Jun 2024, 8:57:52 UTC
Report deadline: 14 Jun 2024, 8:57:52 UTC
Received: 17 Jun 2024, 22:50:54 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 0 (0x00000000)
Computer ID: 43234
Run time: 1 days 17 hours 25 min 3 sec
CPU time: 52 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 3.38 GFLOPS
Application version: Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (nvidia_alpha) windows_x86_64
Peak working set size: 7,277.09 MB
Peak swap size: 13,315.35 MB
Peak disk usage: 8.07 MB

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Traceback (most recent call last):
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}', 
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned,                msa_prev, pair_prev, state_prev = self.model(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\RoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 1106, in forward
    msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair,
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 929, in forward
    xyz, state, alpha = self.str2str(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 476, in forward
    neighbor = get_seqsep_protein_sm(idx, bond_feats, dist_matrix, rotation_mask)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\util_module.py", line 106, in get_seqsep_protein_sm
    res_dist, atom_dist = get_res_atom_dist(idx, bond_feats, dist_matrix, sm_mask)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\util_module.py", line 141, in get_res_atom_dist
    i_sm = i_s[sm_mask[i_s]]
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
23:21:16 (10032): Can't acquire lockfile (32) - waiting 35s
23:21:51 (10032): Can't acquire lockfile (32) - exiting
23:21:51 (10032): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:31:57 (2692): Can't acquire lockfile (32) - waiting 35s
23:32:32 (2692): Can't acquire lockfile (32) - exiting
23:32:32 (2692): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:42:43 (1656): Can't acquire lockfile (32) - waiting 35s
23:43:18 (1656): Can't acquire lockfile (32) - exiting
23:43:18 (1656): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:54:08 (17696): Can't acquire lockfile (32) - waiting 35s
23:54:43 (17696): Can't acquire lockfile (32) - exiting
23:54:43 (17696): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:04:53 (16632): Can't acquire lockfile (32) - waiting 35s
00:05:28 (16632): Can't acquire lockfile (32) - exiting
00:05:28 (16632): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:05:44 (20392): Can't acquire lockfile (32) - waiting 35s
00:06:19 (20392): Can't acquire lockfile (32) - exiting
00:06:19 (20392): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:16:22 (18404): Can't acquire lockfile (32) - waiting 35s
00:16:57 (18404): Can't acquire lockfile (32) - exiting
00:16:57 (18404): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:27:16 (8328): Can't acquire lockfile (32) - waiting 35s
00:27:51 (8328): Can't acquire lockfile (32) - exiting
00:27:51 (8328): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:37:52 (14552): Can't acquire lockfile (32) - waiting 35s
00:38:27 (14552): Can't acquire lockfile (32) - exiting
00:38:27 (14552): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:48:33 (21456): Can't acquire lockfile (32) - waiting 35s
00:49:08 (21456): Can't acquire lockfile (32) - exiting
00:49:08 (21456): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:59:42 (16192): Can't acquire lockfile (32) - waiting 35s
01:00:17 (16192): Can't acquire lockfile (32) - exiting
01:00:17 (16192): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:10:25 (7212): Can't acquire lockfile (32) - waiting 35s
01:11:00 (7212): Can't acquire lockfile (32) - exiting
01:11:00 (7212): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:21:05 (5100): Can't acquire lockfile (32) - waiting 35s
01:21:40 (5100): Can't acquire lockfile (32) - exiting
01:21:40 (5100): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:31:55 (5248): Can't acquire lockfile (32) - waiting 35s
01:32:30 (5248): Can't acquire lockfile (32) - exiting
01:32:30 (5248): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:43:02 (4528): Can't acquire lockfile (32) - waiting 35s
01:43:37 (4528): Can't acquire lockfile (32) - exiting
01:43:37 (4528): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:54:22 (5144): Can't acquire lockfile (32) - waiting 35s
01:54:57 (5144): Can't acquire lockfile (32) - exiting
01:54:57 (5144): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:05:04 (6476): Can't acquire lockfile (32) - waiting 35s
02:05:39 (6476): Can't acquire lockfile (32) - exiting
02:05:39 (6476): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:16:12 (11848): Can't acquire lockfile (32) - waiting 35s
02:16:47 (11848): Can't acquire lockfile (32) - exiting
02:16:47 (11848): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:08 (17892): Can't acquire lockfile (32) - waiting 35s
02:27:43 (17892): Can't acquire lockfile (32) - exiting
02:27:43 (17892): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:27:49 (9912): Can't acquire lockfile (32) - waiting 35s
02:28:24 (9912): Can't acquire lockfile (32) - exiting
02:28:24 (9912): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:38:40 (18388): Can't acquire lockfile (32) - waiting 35s
02:39:15 (18388): Can't acquire lockfile (32) - exiting
02:39:15 (18388): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:49:22 (2516): Can't acquire lockfile (32) - waiting 35s
02:49:57 (2516): Can't acquire lockfile (32) - exiting
02:49:57 (2516): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:50:10 (15296): Can't acquire lockfile (32) - waiting 35s
02:50:45 (15296): Can't acquire lockfile (32) - exiting
02:50:45 (15296): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:01:03 (4016): Can't acquire lockfile (32) - waiting 35s
03:01:38 (4016): Can't acquire lockfile (32) - exiting
03:01:38 (4016): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:11:52 (14080): Can't acquire lockfile (32) - waiting 35s
03:12:27 (14080): Can't acquire lockfile (32) - exiting
03:12:27 (14080): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:22:30 (20372): Can't acquire lockfile (32) - waiting 35s
03:23:05 (20372): Can't acquire lockfile (32) - exiting
03:23:05 (20372): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:23:12 (17620): Can't acquire lockfile (32) - waiting 35s
03:23:47 (17620): Can't acquire lockfile (32) - exiting
03:23:47 (17620): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:33:51 (140): Can't acquire lockfile (32) - waiting 35s
03:34:26 (140): Can't acquire lockfile (32) - exiting
03:34:26 (140): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:44:35 (7288): Can't acquire lockfile (32) - waiting 35s
03:45:10 (7288): Can't acquire lockfile (32) - exiting
03:45:10 (7288): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:55:28 (18252): Can't acquire lockfile (32) - waiting 35s
03:56:03 (18252): Can't acquire lockfile (32) - exiting
03:56:03 (18252): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:06:04 (9352): Can't acquire lockfile (32) - waiting 35s
04:06:39 (9352): Can't acquire lockfile (32) - exiting
04:06:39 (9352): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:17:00 (6988): Can't acquire lockfile (32) - waiting 35s
04:17:35 (6988): Can't acquire lockfile (32) - exiting
04:17:35 (6988): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:27:59 (19380): Can't acquire lockfile (32) - waiting 35s
04:28:34 (19380): Can't acquire lockfile (32) - exiting
04:28:34 (19380): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:38:52 (10904): Can't acquire lockfile (32) - waiting 35s
04:39:27 (10904): Can't acquire lockfile (32) - exiting
04:39:27 (10904): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:39:29 (20648): Can't acquire lockfile (32) - waiting 35s
04:40:04 (20648): Can't acquire lockfile (32) - exiting
04:40:04 (20648): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:50:27 (5984): Can't acquire lockfile (32) - waiting 35s
04:51:02 (5984): Can't acquire lockfile (32) - exiting
04:51:02 (5984): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:01:18 (8112): Can't acquire lockfile (32) - waiting 35s
05:01:53 (8112): Can't acquire lockfile (32) - exiting
05:01:53 (8112): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:11:57 (11848): Can't acquire lockfile (32) - waiting 35s
05:12:32 (11848): Can't acquire lockfile (32) - exiting
05:12:32 (11848): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:12:50 (21164): Can't acquire lockfile (32) - waiting 35s
05:13:25 (21164): Can't acquire lockfile (32) - exiting
05:13:25 (21164): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:23:31 (9068): Can't acquire lockfile (32) - waiting 35s
05:24:06 (9068): Can't acquire lockfile (32) - exiting
05:24:06 (9068): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:34:47 (13848): Can't acquire lockfile (32) - waiting 35s
05:35:22 (13848): Can't acquire lockfile (32) - exiting
05:35:22 (13848): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:45:52 (10432): Can't acquire lockfile (32) - waiting 35s
05:46:27 (10432): Can't acquire lockfile (32) - exiting
05:46:27 (10432): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:56:39 (4588): Can't acquire lockfile (32) - waiting 35s
05:57:14 (4588): Can't acquire lockfile (32) - exiting
05:57:14 (4588): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:07:57 (13060): Can't acquire lockfile (32) - waiting 35s
06:08:32 (13060): Can't acquire lockfile (32) - exiting
06:08:32 (13060): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:19:12 (3780): Can't acquire lockfile (32) - waiting 35s
06:19:47 (3780): Can't acquire lockfile (32) - exiting
06:19:47 (3780): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:30:24 (9788): Can't acquire lockfile (32) - waiting 35s
06:30:59 (9788): Can't acquire lockfile (32) - exiting
06:30:59 (9788): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:41:07 (19272): Can't acquire lockfile (32) - waiting 35s
06:41:42 (19272): Can't acquire lockfile (32) - exiting
06:41:42 (19272): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:52:27 (2660): Can't acquire lockfile (32) - waiting 35s
06:53:02 (2660): Can't acquire lockfile (32) - exiting
06:53:02 (2660): Error: The process cannot access the file because it is being used by another process.

 (0x20)
07:03:06 (2660): Can't acquire lockfile (32) - waiting 35s
07:03:41 (2660): Can't acquire lockfile (32) - exiting
07:03:41 (2660): Error: The process cannot access the file because it is being used by another process.

 (0x20)
20:39:26 (13796): BOINC client no longer exists - exiting
20:39:26 (13796): timer handler: client dead, exiting
17:46:56 (6708): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_343_16902_4_0_r220413113_0</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>




©2024 University of Washington
http://www.bakerlab.org