Message boards : RALPH@home bug list : RoseTTAFold All-Atom
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
My CPU / GPU system with a lot more resources failed every task and tried to run too many at once and tied itself up so badly that I could not get control of the system short of a hard, ungraceful power down. I have set this box to no new Ralph tasks until the app is better behaved. That's why i went with the max_concurrent option to limit them to 1 Task at a time. If it's run on the GPU, it only needs 1 thread to support it (although there are the odd periods where it uses a bit more than that). If it's just the CPU, then each Task tries to use 8 threads, regardless of how many are actually physically available, let alone being used by other applications. And each Task can use a bit over 2GB of RAM. Grant Darwin NT |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Seems they changed the name of the app: "Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.01 (env)" |
Vester Send message Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
Seems they changed the name of the app: Waiting for some of the 1000 that are ready. We'll see if app_config.xml is still rosetta_beta. Edit: Released. App_config still works without change. Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet. |
Coleslaw Send message Joined: 29 Nov 08 Posts: 1 Credit: 947,069 RAC: 82 |
|
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet. Yes. All errors for me too, after a few seconds. Waiting to report to the server to see the logs. |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
I'm seeing Tasks ready to send, but when i try to contact the Scheduler i get 18/06/2024 15:13:51 | ralph@home | Project is temporarily shut down for maintenance And the Server Status page shows everything on the Ralph server as down. Grant Darwin NT |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
My guess is the VRAM. The 4060 was only 8GB but the 3060 was a 12GB version. My RTX 2060 has only got 6GB of RAM, and was able to process & return work without issues during the last batch (once i upgraded the driver). Grant Darwin NT |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Edit: Released. App_config still works without change. Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet. Looking at some of your past errored Tasks, this is a common thread. OSError: [WinError 1455] The paging file is too small for this operation to complete. If you've fixed your paging file size you might want to make it bigger- i've seen Tasks with a Swap size of over 15GB. Grant Darwin NT |
PMH_UK Send message Joined: 29 Jan 09 Posts: 5 Credit: 297,074 RAC: 0 |
Just able to return but no task available to download. Latest error: <core_client_version>7.24.1</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> usage: predict.py [-h] [-checkpoint CHECKPOINT] [-pdb PDB] [-silent SILENT] [-tmpl_chain TMPL_CHAIN] [-tmpl_conf TMPL_CONF] [-movescale MOVESCALE] [-n_pred N_PRED] [-n_cycle N_CYCLE] [-no_extra_l1] [-no_atom_frames] [-outcsv OUTCSV] predict.py: error: unrecognized arguments: -z .venv_h_pred_alpha_23.zip </stderr_txt> ]]> Paul. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
If i try to view my WUs i get this error: Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 83 |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Well, the Server Status now shows all green, but i still can't get any work. 18/06/2024 19:14:16 | ralph@home | Sending scheduler request: To fetch work. 18/06/2024 19:14:16 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU 18/06/2024 19:14:18 | ralph@home | Scheduler request completed: got 0 new tasks 18/06/2024 19:14:18 | ralph@home | No tasks sent 18/06/2024 19:14:18 | ralph@home | Project requested delay of 31 seconds 18/06/2024 19:19:59 | ralph@home | Sending scheduler request: To fetch work. 18/06/2024 19:19:59 | ralph@home | Requesting new tasks for CPU 18/06/2024 19:20:00 | ralph@home | Scheduler request completed: got 0 new tasks 18/06/2024 19:20:00 | ralph@home | No tasks sent 18/06/2024 19:20:00 | ralph@home | Project requested delay of 31 seconds 18/06/2024 19:21:22 | ralph@home | update requested by user 18/06/2024 19:21:28 | ralph@home | Sending scheduler request: Requested by user. 18/06/2024 19:21:28 | ralph@home | Requesting new tasks for CPU 18/06/2024 19:21:29 | ralph@home | Scheduler request completed: got 0 new tasks 18/06/2024 19:21:29 | ralph@home | No tasks sent 18/06/2024 19:21:29 | ralph@home | Project requested delay of 31 seconds 18/06/2024 19:22:06 | ralph@home | Sending scheduler request: To fetch work. 18/06/2024 19:22:06 | ralph@home | Requesting new tasks for CPU 18/06/2024 19:22:07 | ralph@home | Scheduler request completed: got 0 new tasks 18/06/2024 19:22:07 | ralph@home | No tasks sent 18/06/2024 19:22:07 | ralph@home | Project requested delay of 31 seconds Grant Darwin NT |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Well, the Server Status now shows all green, but i still can't get any work. On the home page there are over 400 WUs, but on the status page there are 0. Seems there are still some problems on the server. |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
If i try to see my wus i have this error But the rest of the info is there? It's looking like the database got a bit scrambled during the outage. Grant Darwin NT |
kotenok2000 Send message Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
I just got 3 |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Well, the Server Status now shows all green, but i still can't get any work. Ok, this time i managed to get the new application, and i am presently downloading the cv1.zip file. Hopefully once it's done i will be able to get more work (if there is any). Edit- i've got 2 Tasks showing as downloading, but holding (waiting for the zip file to download hopefully). Edit- Zip file downloaded, Tasks downloaded, instant error on both. https://ralph.bakerlab.org/result.php?resultid=5466574 https://ralph.bakerlab.org/result.php?resultid=5466538 For both Tasks. <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> usage: predict.py [-h] [-checkpoint CHECKPOINT] [-pdb PDB] [-silent SILENT] [-tmpl_chain TMPL_CHAIN] [-tmpl_conf TMPL_CONF] [-movescale MOVESCALE] [-n_pred N_PRED] [-n_cycle N_CYCLE] [-no_extra_l1] [-no_atom_frames] [-outcsv OUTCSV] predict.py: error: unrecognized arguments: -z .ev0.zip </stderr_txt> ]]> Grant Darwin NT |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
I just got 3 Are they running, or all errors? Grant Darwin NT |
kotenok2000 Send message Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
Downloading cv1.zip and ev0.zip |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
People, check your Task lists. And then check your application details for your host if it has been able to get any of the new batch. Things are extremely messed up. "Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02" is showing as processing the last batch of work. "Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.01" is showing as processing the current batch of work. And when i look at my Application details, it thinks the same thing- the new version shows me as having done 31 Tasks. The old version shows me as having done 2 (the ones that just downloaded and died instantly). Could be the problem is that the old application is trying to process the new Tasks because everything is so scrambled? (Computing, Applications shows the old application has been removed, and the new one shows the Average computing value of the old one). OK, i'm now extremely confused. In my Event log. 18/06/2024 19:26:25 | ralph@home | Sending scheduler request: Requested by user. 
18/06/2024 19:26:25 | ralph@home | Requesting new tasks for CPU 18/06/2024 19:26:26 | ralph@home | Scheduler request completed: got 2 new tasks 18/06/2024 19:26:26 | ralph@home | Project requested delay of 31 seconds 18/06/2024 19:26:28 | ralph@home | Started download of w_0.01_windows_x86_64.exe 18/06/2024 19:26:28 | ralph@home | Started download of cv1.zip 18/06/2024 19:26:28 | ralph@home | Started download of venv_i_pred_alpha_26.txt 18/06/2024 19:26:28 | ralph@home | Started download of venv_i_pred_alpha_15.txt 18/06/2024 19:26:31 | ralph@home | Finished download of w_0.01_windows_x86_64.exe (844288 bytes) 18/06/2024 19:26:31 | ralph@home | Finished download of venv_i_pred_alpha_15.txt (3520348 bytes) 18/06/2024 19:26:32 | ralph@home | Finished download of venv_i_pred_alpha_26.txt (3527288 bytes) 18/06/2024 19:30:08 | ralph@home | Finished download of cv1.zip (1215772557 bytes) 18/06/2024 19:30:11 | ralph@home | Starting task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0 18/06/2024 19:30:15 | ralph@home | Computation for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0 finished 18/06/2024 19:30:15 | ralph@home | Output file RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0_r724705856_0 for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0 absent 18/06/2024 19:30:15 | ralph@home | Starting task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0 18/06/2024 19:30:18 | ralph@home | Computation for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0 finished 18/06/2024 19:30:18 | ralph@home | Output file RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0_r103562235_0 for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0 absent Yet in my Ralph folder, i have a "cv2" and a "ev0" folder. 
and a "w_0.02_windows_x86_64.exe" Which at least matches the .02 part of the new application, but... Searching for any of the files that were listed as being downloaded in my Event log in the Ralph project folders results in "No items match your search" for any of them. ????? Grant Darwin NT |
kotenok2000 Send message Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
I get the error "zipfile.BadZipFile: File is not a zip file". Why do they use base64 encoded files? |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
I get error "zipfile.BadZipFile: File is not a zip file" Because apparently that's the standard for converting binary to text for data transmission (according to a few quick searches). Grant Darwin NT |
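
To illustrate the last two posts: a minimal Python sketch of why opening base64 text directly as an archive raises `zipfile.BadZipFile`, and how decoding first avoids it. This is only an illustration under assumptions, not the project's actual code. It assumes the downloaded payload is a zip archive wrapped in standard base64, and the helper names `make_zip_bytes` and `open_possibly_encoded` are hypothetical.

```python
import base64
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory zip archive to stand in for a real payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello")
    return buffer.getvalue()


def open_possibly_encoded(payload: bytes) -> zipfile.ZipFile:
    """Open payload as a zip, base64-decoding it first if it is not a raw zip.

    Hypothetical helper: assumes anything that is not a raw zip is base64.
    """
    if not zipfile.is_zipfile(io.BytesIO(payload)):
        payload = base64.b64decode(payload)
    return zipfile.ZipFile(io.BytesIO(payload))


raw = make_zip_bytes()
encoded = base64.b64encode(raw)  # text-safe form, as used for transmission

# Opening the base64 text directly fails with the error quoted in the thread.
try:
    zipfile.ZipFile(io.BytesIO(encoded))
    direct_open_failed = False
except zipfile.BadZipFile:
    direct_open_failed = True

# Decoding first recovers the original archive.
with open_possibly_encoded(encoded) as archive:
    names = archive.namelist()
```

Base64 output is larger than the binary input by about a third, which is the cost of making the data safe to carry as plain text.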
©2024 University of Washington
http://www.bakerlab.org