Name | RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_343_16902_4_0 |
Workunit | 4848663 |
Created | 13 Jun 2024, 5:27:24 UTC |
Sent | 13 Jun 2024, 8:57:52 UTC |
Report deadline | 14 Jun 2024, 8:57:52 UTC |
Received | 17 Jun 2024, 22:50:54 UTC |
Server state | Over |
Outcome | Computation error |
Client state | Compute error |
Exit status | 0 (0x00000000) |
Computer ID | 43234 |
Run time | 1 days 17 hours 25 min 3 sec |
CPU time | 52 sec |
Validate state | Invalid |
Credit | 0.00 |
Device peak FLOPS | 3.38 GFLOPS |
Application version | Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (nvidia_alpha) windows_x86_64 |
Peak working set size | 7,277.09 MB |
Peak swap size | 13,315.35 MB |
Peak disk usage | 8.07 MB |
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Traceback (most recent call last):
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}',
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\RoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 1106, in forward
    msa, pair, xyz, state, alpha, symmsub = self.main_block[i_m](msa, pair,
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 929, in forward
    xyz, state, alpha = self.str2str(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\cuda\amp\autocast_mode.py", line 141, in decorate_autocast
    return func(*args, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 476, in forward
    neighbor = get_seqsep_protein_sm(idx, bond_feats, dist_matrix, rotation_mask)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\util_module.py", line 106, in get_seqsep_protein_sm
    res_dist, atom_dist = get_res_atom_dist(idx, bond_feats, dist_matrix, sm_mask)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\util_module.py", line 141, in get_res_atom_dist
    i_sm = i_s[sm_mask[i_s]]
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
23:21:16 (10032): Can't acquire lockfile (32) - waiting 35s
23:21:51 (10032): Can't acquire lockfile (32) - exiting
23:21:51 (10032): Error: The process cannot access the file because it is being used by another process. (0x20)
23:31:57 (2692): Can't acquire lockfile (32) - waiting 35s
23:32:32 (2692): Can't acquire lockfile (32) - exiting
23:32:32 (2692): Error: The process cannot access the file because it is being used by another process. (0x20)
23:42:43 (1656): Can't acquire lockfile (32) - waiting 35s
23:43:18 (1656): Can't acquire lockfile (32) - exiting
23:43:18 (1656): Error: The process cannot access the file because it is being used by another process. (0x20)
23:54:08 (17696): Can't acquire lockfile (32) - waiting 35s
23:54:43 (17696): Can't acquire lockfile (32) - exiting
23:54:43 (17696): Error: The process cannot access the file because it is being used by another process. (0x20)
00:04:53 (16632): Can't acquire lockfile (32) - waiting 35s
00:05:28 (16632): Can't acquire lockfile (32) - exiting
00:05:28 (16632): Error: The process cannot access the file because it is being used by another process. (0x20)
00:05:44 (20392): Can't acquire lockfile (32) - waiting 35s
00:06:19 (20392): Can't acquire lockfile (32) - exiting
00:06:19 (20392): Error: The process cannot access the file because it is being used by another process. (0x20)
00:16:22 (18404): Can't acquire lockfile (32) - waiting 35s
00:16:57 (18404): Can't acquire lockfile (32) - exiting
00:16:57 (18404): Error: The process cannot access the file because it is being used by another process. (0x20)
00:27:16 (8328): Can't acquire lockfile (32) - waiting 35s
00:27:51 (8328): Can't acquire lockfile (32) - exiting
00:27:51 (8328): Error: The process cannot access the file because it is being used by another process. (0x20)
00:37:52 (14552): Can't acquire lockfile (32) - waiting 35s
00:38:27 (14552): Can't acquire lockfile (32) - exiting
00:38:27 (14552): Error: The process cannot access the file because it is being used by another process. (0x20)
00:48:33 (21456): Can't acquire lockfile (32) - waiting 35s
00:49:08 (21456): Can't acquire lockfile (32) - exiting
00:49:08 (21456): Error: The process cannot access the file because it is being used by another process. (0x20)
00:59:42 (16192): Can't acquire lockfile (32) - waiting 35s
01:00:17 (16192): Can't acquire lockfile (32) - exiting
01:00:17 (16192): Error: The process cannot access the file because it is being used by another process. (0x20)
01:10:25 (7212): Can't acquire lockfile (32) - waiting 35s
01:11:00 (7212): Can't acquire lockfile (32) - exiting
01:11:00 (7212): Error: The process cannot access the file because it is being used by another process. (0x20)
01:21:05 (5100): Can't acquire lockfile (32) - waiting 35s
01:21:40 (5100): Can't acquire lockfile (32) - exiting
01:21:40 (5100): Error: The process cannot access the file because it is being used by another process. (0x20)
01:31:55 (5248): Can't acquire lockfile (32) - waiting 35s
01:32:30 (5248): Can't acquire lockfile (32) - exiting
01:32:30 (5248): Error: The process cannot access the file because it is being used by another process. (0x20)
01:43:02 (4528): Can't acquire lockfile (32) - waiting 35s
01:43:37 (4528): Can't acquire lockfile (32) - exiting
01:43:37 (4528): Error: The process cannot access the file because it is being used by another process. (0x20)
01:54:22 (5144): Can't acquire lockfile (32) - waiting 35s
01:54:57 (5144): Can't acquire lockfile (32) - exiting
01:54:57 (5144): Error: The process cannot access the file because it is being used by another process. (0x20)
02:05:04 (6476): Can't acquire lockfile (32) - waiting 35s
02:05:39 (6476): Can't acquire lockfile (32) - exiting
02:05:39 (6476): Error: The process cannot access the file because it is being used by another process. (0x20)
02:16:12 (11848): Can't acquire lockfile (32) - waiting 35s
02:16:47 (11848): Can't acquire lockfile (32) - exiting
02:16:47 (11848): Error: The process cannot access the file because it is being used by another process. (0x20)
02:27:08 (17892): Can't acquire lockfile (32) - waiting 35s
02:27:43 (17892): Can't acquire lockfile (32) - exiting
02:27:43 (17892): Error: The process cannot access the file because it is being used by another process. (0x20)
02:27:49 (9912): Can't acquire lockfile (32) - waiting 35s
02:28:24 (9912): Can't acquire lockfile (32) - exiting
02:28:24 (9912): Error: The process cannot access the file because it is being used by another process. (0x20)
02:38:40 (18388): Can't acquire lockfile (32) - waiting 35s
02:39:15 (18388): Can't acquire lockfile (32) - exiting
02:39:15 (18388): Error: The process cannot access the file because it is being used by another process. (0x20)
02:49:22 (2516): Can't acquire lockfile (32) - waiting 35s
02:49:57 (2516): Can't acquire lockfile (32) - exiting
02:49:57 (2516): Error: The process cannot access the file because it is being used by another process. (0x20)
02:50:10 (15296): Can't acquire lockfile (32) - waiting 35s
02:50:45 (15296): Can't acquire lockfile (32) - exiting
02:50:45 (15296): Error: The process cannot access the file because it is being used by another process. (0x20)
03:01:03 (4016): Can't acquire lockfile (32) - waiting 35s
03:01:38 (4016): Can't acquire lockfile (32) - exiting
03:01:38 (4016): Error: The process cannot access the file because it is being used by another process. (0x20)
03:11:52 (14080): Can't acquire lockfile (32) - waiting 35s
03:12:27 (14080): Can't acquire lockfile (32) - exiting
03:12:27 (14080): Error: The process cannot access the file because it is being used by another process. (0x20)
03:22:30 (20372): Can't acquire lockfile (32) - waiting 35s
03:23:05 (20372): Can't acquire lockfile (32) - exiting
03:23:05 (20372): Error: The process cannot access the file because it is being used by another process. (0x20)
03:23:12 (17620): Can't acquire lockfile (32) - waiting 35s
03:23:47 (17620): Can't acquire lockfile (32) - exiting
03:23:47 (17620): Error: The process cannot access the file because it is being used by another process. (0x20)
03:33:51 (140): Can't acquire lockfile (32) - waiting 35s
03:34:26 (140): Can't acquire lockfile (32) - exiting
03:34:26 (140): Error: The process cannot access the file because it is being used by another process. (0x20)
03:44:35 (7288): Can't acquire lockfile (32) - waiting 35s
03:45:10 (7288): Can't acquire lockfile (32) - exiting
03:45:10 (7288): Error: The process cannot access the file because it is being used by another process. (0x20)
03:55:28 (18252): Can't acquire lockfile (32) - waiting 35s
03:56:03 (18252): Can't acquire lockfile (32) - exiting
03:56:03 (18252): Error: The process cannot access the file because it is being used by another process. (0x20)
04:06:04 (9352): Can't acquire lockfile (32) - waiting 35s
04:06:39 (9352): Can't acquire lockfile (32) - exiting
04:06:39 (9352): Error: The process cannot access the file because it is being used by another process. (0x20)
04:17:00 (6988): Can't acquire lockfile (32) - waiting 35s
04:17:35 (6988): Can't acquire lockfile (32) - exiting
04:17:35 (6988): Error: The process cannot access the file because it is being used by another process. (0x20)
04:27:59 (19380): Can't acquire lockfile (32) - waiting 35s
04:28:34 (19380): Can't acquire lockfile (32) - exiting
04:28:34 (19380): Error: The process cannot access the file because it is being used by another process. (0x20)
04:38:52 (10904): Can't acquire lockfile (32) - waiting 35s
04:39:27 (10904): Can't acquire lockfile (32) - exiting
04:39:27 (10904): Error: The process cannot access the file because it is being used by another process. (0x20)
04:39:29 (20648): Can't acquire lockfile (32) - waiting 35s
04:40:04 (20648): Can't acquire lockfile (32) - exiting
04:40:04 (20648): Error: The process cannot access the file because it is being used by another process. (0x20)
04:50:27 (5984): Can't acquire lockfile (32) - waiting 35s
04:51:02 (5984): Can't acquire lockfile (32) - exiting
04:51:02 (5984): Error: The process cannot access the file because it is being used by another process. (0x20)
05:01:18 (8112): Can't acquire lockfile (32) - waiting 35s
05:01:53 (8112): Can't acquire lockfile (32) - exiting
05:01:53 (8112): Error: The process cannot access the file because it is being used by another process. (0x20)
05:11:57 (11848): Can't acquire lockfile (32) - waiting 35s
05:12:32 (11848): Can't acquire lockfile (32) - exiting
05:12:32 (11848): Error: The process cannot access the file because it is being used by another process. (0x20)
05:12:50 (21164): Can't acquire lockfile (32) - waiting 35s
05:13:25 (21164): Can't acquire lockfile (32) - exiting
05:13:25 (21164): Error: The process cannot access the file because it is being used by another process. (0x20)
05:23:31 (9068): Can't acquire lockfile (32) - waiting 35s
05:24:06 (9068): Can't acquire lockfile (32) - exiting
05:24:06 (9068): Error: The process cannot access the file because it is being used by another process. (0x20)
05:34:47 (13848): Can't acquire lockfile (32) - waiting 35s
05:35:22 (13848): Can't acquire lockfile (32) - exiting
05:35:22 (13848): Error: The process cannot access the file because it is being used by another process. (0x20)
05:45:52 (10432): Can't acquire lockfile (32) - waiting 35s
05:46:27 (10432): Can't acquire lockfile (32) - exiting
05:46:27 (10432): Error: The process cannot access the file because it is being used by another process. (0x20)
05:56:39 (4588): Can't acquire lockfile (32) - waiting 35s
05:57:14 (4588): Can't acquire lockfile (32) - exiting
05:57:14 (4588): Error: The process cannot access the file because it is being used by another process. (0x20)
06:07:57 (13060): Can't acquire lockfile (32) - waiting 35s
06:08:32 (13060): Can't acquire lockfile (32) - exiting
06:08:32 (13060): Error: The process cannot access the file because it is being used by another process. (0x20)
06:19:12 (3780): Can't acquire lockfile (32) - waiting 35s
06:19:47 (3780): Can't acquire lockfile (32) - exiting
06:19:47 (3780): Error: The process cannot access the file because it is being used by another process. (0x20)
06:30:24 (9788): Can't acquire lockfile (32) - waiting 35s
06:30:59 (9788): Can't acquire lockfile (32) - exiting
06:30:59 (9788): Error: The process cannot access the file because it is being used by another process. (0x20)
06:41:07 (19272): Can't acquire lockfile (32) - waiting 35s
06:41:42 (19272): Can't acquire lockfile (32) - exiting
06:41:42 (19272): Error: The process cannot access the file because it is being used by another process. (0x20)
06:52:27 (2660): Can't acquire lockfile (32) - waiting 35s
06:53:02 (2660): Can't acquire lockfile (32) - exiting
06:53:02 (2660): Error: The process cannot access the file because it is being used by another process. (0x20)
07:03:06 (2660): Can't acquire lockfile (32) - waiting 35s
07:03:41 (2660): Can't acquire lockfile (32) - exiting
07:03:41 (2660): Error: The process cannot access the file because it is being used by another process. (0x20)
20:39:26 (13796): BOINC client no longer exists - exiting
20:39:26 (13796): timer handler: client dead, exiting
17:46:56 (6708): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_343_16902_4_0_r220413113_0</file_name>
  <error_code>-240 (stat() failed)</error_code>
</file_xfer_error>
</message>
]]>
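The RuntimeError in the stderr output above suggests passing CUDA_LAUNCH_BLOCKING=1 when debugging, so that the CUDA error is reported at the call that actually caused it. A minimal sketch of one way to do that from a Python launcher follows. The helper name `blocking_cuda_env` is hypothetical and is not part of BOINC or rf2aa; only the environment variable name comes from the log.

```python
import os


def blocking_cuda_env(base_env=None):
    """Return a copy of an environment mapping with CUDA_LAUNCH_BLOCKING=1 set.

    Hypothetical helper for illustration. The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Makes CUDA kernel launches synchronous, so an error surfaces at the
    # failing call instead of at some later, unrelated API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    env = blocking_cuda_env()
    print(env["CUDA_LAUNCH_BLOCKING"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when starting the process to be debugged. The variable has to be in place before CUDA is initialized in that process, and synchronous launches are slower, so this is best kept to debugging runs.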
©2024 University of Washington
http://www.bakerlab.org