Task 5460155

Name RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_g_pred_23_16903_3_0
Workunit 4849079
Created 13 Jun 2024, 22:49:27 UTC
Sent 14 Jun 2024, 1:00:27 UTC
Report deadline 15 Jun 2024, 1:00:27 UTC
Received 17 Jun 2024, 22:50:54 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 43234
Run time 1 day 17 hours 25 min 28 sec
CPU time 52 sec
Validate state Valid
Credit 590.96
Device peak FLOPS 3.38 GFLOPS
Application version Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (nvidia_alpha) windows_x86_64
Peak working set size 7,286.95 MB
Peak swap size 12,254.88 MB
Peak disk usage 8.13 MB

Stderr output

<core_client_version>7.24.1</core_client_version>
<![CDATA[
<stderr_txt>
Traceback (most recent call last):
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 708, in <module>
    pred.predict(out_name+f'_{n}', 
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\predict.py", line 551, in predict
    logit_s, logit_aa_s, logit_pae, logit_pde, p_bind, pred_crds, alpha, pred_allatom, pred_lddt_binned,                msa_prev, pair_prev, state_prev = self.model(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\RoseTTAFoldModel.py", line 358, in forward
    msa, pair, xyz, alpha_s, xyz_allatom, state, symmsub = self.simulator(
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\Track_module.py", line 1104, in forward
    dchiraldxyz, = calc_chiral_grads(xyz.detach(),chirals)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\loss.py", line 1363, in calc_chiral_grads
    l = calc_chiral_loss(xyz, chirals)
  File "B:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\loss.py", line 1257, in calc_chiral_loss
    chiral_dih = pred[:, chirals[..., :-1].long(), 1]
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
23:21:18 (2196): Can't acquire lockfile (32) - waiting 35s
23:21:53 (2196): Can't acquire lockfile (32) - exiting
23:21:53 (2196): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:32:33 (11320): Can't acquire lockfile (32) - waiting 35s
23:33:08 (11320): Can't acquire lockfile (32) - exiting
23:33:08 (11320): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:43:18 (16192): Can't acquire lockfile (32) - waiting 35s
23:43:53 (16192): Can't acquire lockfile (32) - exiting
23:43:53 (16192): Error: The process cannot access the file because it is being used by another process.

 (0x20)
23:54:44 (6772): Can't acquire lockfile (32) - waiting 35s
23:55:19 (6772): Can't acquire lockfile (32) - exiting
23:55:19 (6772): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:05:44 (5200): Can't acquire lockfile (32) - waiting 35s
00:06:19 (5200): Can't acquire lockfile (32) - exiting
00:06:19 (5200): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:16:57 (17692): Can't acquire lockfile (32) - waiting 35s
00:17:32 (17692): Can't acquire lockfile (32) - exiting
00:17:32 (17692): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:27:51 (6392): Can't acquire lockfile (32) - waiting 35s
00:28:26 (6392): Can't acquire lockfile (32) - exiting
00:28:26 (6392): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:28:38 (17768): Can't acquire lockfile (32) - waiting 35s
00:29:13 (17768): Can't acquire lockfile (32) - exiting
00:29:13 (17768): Error: The process cannot access the file because it is being used by another process.

 (0x20)
00:45:56 (13436): Can't acquire lockfile (32) - waiting 35s
00:46:31 (13436): Can't acquire lockfile (32) - exiting
00:46:31 (13436): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:00:17 (11476): Can't acquire lockfile (32) - waiting 35s
01:00:52 (11476): Can't acquire lockfile (32) - exiting
01:00:52 (11476): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:11:01 (19028): Can't acquire lockfile (32) - waiting 35s
01:11:36 (19028): Can't acquire lockfile (32) - exiting
01:11:36 (19028): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:21:40 (16120): Can't acquire lockfile (32) - waiting 35s
01:22:15 (16120): Can't acquire lockfile (32) - exiting
01:22:15 (16120): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:22:36 (18276): Can't acquire lockfile (32) - waiting 35s
01:23:11 (18276): Can't acquire lockfile (32) - exiting
01:23:11 (18276): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:34:55 (19440): Can't acquire lockfile (32) - waiting 35s
01:35:30 (19440): Can't acquire lockfile (32) - exiting
01:35:30 (19440): Error: The process cannot access the file because it is being used by another process.

 (0x20)
01:54:57 (14308): Can't acquire lockfile (32) - waiting 35s
01:55:32 (14308): Can't acquire lockfile (32) - exiting
01:55:32 (14308): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:05:40 (13852): Can't acquire lockfile (32) - waiting 35s
02:06:15 (13852): Can't acquire lockfile (32) - exiting
02:06:15 (13852): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:06:26 (7140): Can't acquire lockfile (32) - waiting 35s
02:07:01 (7140): Can't acquire lockfile (32) - exiting
02:07:01 (7140): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:18:58 (13288): Can't acquire lockfile (32) - waiting 35s
02:19:33 (13288): Can't acquire lockfile (32) - exiting
02:19:33 (13288): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:31:02 (17796): Can't acquire lockfile (32) - waiting 35s
02:31:37 (17796): Can't acquire lockfile (32) - exiting
02:31:37 (17796): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:41:43 (14552): Can't acquire lockfile (32) - waiting 35s
02:42:18 (14552): Can't acquire lockfile (32) - exiting
02:42:18 (14552): Error: The process cannot access the file because it is being used by another process.

 (0x20)
02:52:37 (20880): Can't acquire lockfile (32) - waiting 35s
02:53:12 (20880): Can't acquire lockfile (32) - exiting
02:53:12 (20880): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:03:14 (324): Can't acquire lockfile (32) - waiting 35s
03:03:49 (324): Can't acquire lockfile (32) - exiting
03:03:49 (324): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:03:56 (20304): Can't acquire lockfile (32) - waiting 35s
03:04:31 (20304): Can't acquire lockfile (32) - exiting
03:04:31 (20304): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:14:40 (17144): Can't acquire lockfile (32) - waiting 35s
03:15:15 (17144): Can't acquire lockfile (32) - exiting
03:15:15 (17144): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:25:31 (3900): Can't acquire lockfile (32) - waiting 35s
03:26:06 (3900): Can't acquire lockfile (32) - exiting
03:26:06 (3900): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:43:24 (7528): Can't acquire lockfile (32) - waiting 35s
03:43:59 (7528): Can't acquire lockfile (32) - exiting
03:43:59 (7528): Error: The process cannot access the file because it is being used by another process.

 (0x20)
03:56:03 (8700): Can't acquire lockfile (32) - waiting 35s
03:56:38 (8700): Can't acquire lockfile (32) - exiting
03:56:38 (8700): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:06:39 (18228): Can't acquire lockfile (32) - waiting 35s
04:07:14 (18228): Can't acquire lockfile (32) - exiting
04:07:14 (18228): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:17:35 (3760): Can't acquire lockfile (32) - waiting 35s
04:18:10 (3760): Can't acquire lockfile (32) - exiting
04:18:10 (3760): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:28:34 (12124): Can't acquire lockfile (32) - waiting 35s
04:29:09 (12124): Can't acquire lockfile (32) - exiting
04:29:09 (12124): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:39:29 (16068): Can't acquire lockfile (32) - waiting 35s
04:40:04 (16068): Can't acquire lockfile (32) - exiting
04:40:04 (16068): Error: The process cannot access the file because it is being used by another process.

 (0x20)
04:51:02 (9672): Can't acquire lockfile (32) - waiting 35s
04:51:37 (9672): Can't acquire lockfile (32) - exiting
04:51:37 (9672): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:01:53 (6808): Can't acquire lockfile (32) - waiting 35s
05:02:28 (6808): Can't acquire lockfile (32) - exiting
05:02:28 (6808): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:12:50 (18076): Can't acquire lockfile (32) - waiting 35s
05:13:25 (18076): Can't acquire lockfile (32) - exiting
05:13:25 (18076): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:24:06 (2416): Can't acquire lockfile (32) - waiting 35s
05:24:41 (2416): Can't acquire lockfile (32) - exiting
05:24:41 (2416): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:35:22 (6092): Can't acquire lockfile (32) - waiting 35s
05:35:57 (6092): Can't acquire lockfile (32) - exiting
05:35:57 (6092): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:46:27 (17308): Can't acquire lockfile (32) - waiting 35s
05:47:02 (17308): Can't acquire lockfile (32) - exiting
05:47:02 (17308): Error: The process cannot access the file because it is being used by another process.

 (0x20)
05:57:15 (5904): Can't acquire lockfile (32) - waiting 35s
05:57:50 (5904): Can't acquire lockfile (32) - exiting
05:57:50 (5904): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:08:32 (21428): Can't acquire lockfile (32) - waiting 35s
06:09:07 (21428): Can't acquire lockfile (32) - exiting
06:09:07 (21428): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:19:47 (13912): Can't acquire lockfile (32) - waiting 35s
06:20:22 (13912): Can't acquire lockfile (32) - exiting
06:20:22 (13912): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:30:59 (17184): Can't acquire lockfile (32) - waiting 35s
06:31:34 (17184): Can't acquire lockfile (32) - exiting
06:31:34 (17184): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:41:43 (6060): Can't acquire lockfile (32) - waiting 35s
06:42:18 (6060): Can't acquire lockfile (32) - exiting
06:42:18 (6060): Error: The process cannot access the file because it is being used by another process.

 (0x20)
06:53:02 (11480): Can't acquire lockfile (32) - waiting 35s
06:53:37 (11480): Can't acquire lockfile (32) - exiting
06:53:37 (11480): Error: The process cannot access the file because it is being used by another process.

 (0x20)
20:39:27 (5996): BOINC client no longer exists - exiting
20:39:27 (5996): timer handler: client dead, exiting
17:48:21 (8016): called boinc_finish(0)

</stderr_txt>
]]>
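
The traceback above notes that CUDA kernel errors may be reported asynchronously and suggests passing CUDA_LAUNCH_BLOCKING=1 for debugging. A minimal sketch of one way to do that from Python is shown here. It assumes the failing script is started as a child process; the helper names `blocking_cuda_env` and `run_with_blocking_cuda` are hypothetical and are not part of rf2aa or BOINC. The variable has to be in the environment before CUDA is initialized, which is why it is set on the child's environment rather than inside already-running code.

```python
import os
import subprocess


def blocking_cuda_env(base_env=None):
    """Return a copy of an environment with synchronous CUDA launches enabled.

    base_env defaults to the current process environment. The original
    mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # With this set, kernel launches block until completion, so an error is
    # reported at the call that caused it rather than at a later API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_cuda(argv):
    """Run a command (hypothetical helper) with CUDA_LAUNCH_BLOCKING=1 set.

    argv is a list such as [sys.executable, "some_script.py"]; the script
    name is a placeholder, not a path taken from the log.
    """
    return subprocess.run(argv, env=blocking_cuda_env(), check=False)
```

Calling `run_with_blocking_cuda([sys.executable, "some_script.py"])` would rerun a script with synchronous launches. Expect it to run more slowly, since kernels no longer overlap with host code, so this is a debugging aid rather than a setting to leave on.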




©2024 University of Washington
http://www.bakerlab.org