RoseTTAFold All-Atom

Message boards : RALPH@home bug list : RoseTTAFold All-Atom



Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7673 - Posted: 17 Jun 2024, 6:38:49 UTC - in response to Message 7672.  

My CPU / GPU system with a lot more resources failed every task, tried to run too many at once, and tied itself up so badly that I could not get control of the system short of a hard, ungraceful power-down. I have set this box to no new Ralph tasks until the app is better behaved.
That's why I went with the max_concurrent option to limit them to 1 Task at a time.
If it's run on the GPU, it only needs 1 thread to support it (although there are the odd periods where it uses a bit more than that). If it's CPU-only, then each Task tries to use 8 threads, regardless of how many are physically available or already in use by other applications. And each Task can use a bit over 2GB of RAM.
Grant
Darwin NT
ID: 7673
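The max_concurrent option mentioned above goes in an app_config.xml file in the project's folder. As a rough illustration, the Python sketch below picks a limit from the per-Task figures reported in this thread (about 8 threads and a bit over 2GB of RAM for a CPU-only Task) and writes a minimal app_config.xml. Those figures are observations, not documented requirements; the app name rosetta_beta comes from a later post in this thread and may change; the helper names and the free-RAM value are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Per-Task figures observed in this thread for CPU-only Tasks (assumptions, not specs).
THREADS_PER_TASK = 8
RAM_GB_PER_TASK = 2.5  # "a bit over 2GB", rounded up for headroom


def choose_max_concurrent(logical_cpus: int, free_ram_gb: float) -> int:
    """Hypothetical helper: how many CPU-only Tasks fit without oversubscribing."""
    by_cpu = logical_cpus // THREADS_PER_TASK
    by_ram = int(free_ram_gb // RAM_GB_PER_TASK)
    # Always allow at least one Task so the host can still get work.
    return max(1, min(by_cpu, by_ram))


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return the text of a minimal BOINC app_config.xml limiting one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    limit = choose_max_concurrent(cpus, free_ram_gb=16.0)  # placeholder free-RAM value
    print(build_app_config("rosetta_beta", limit))
```

The output would be saved as app_config.xml in the RALPH project folder and picked up with Options > Read config files in the BOINC Manager. Taking the minimum of the CPU and RAM limits is the conservative choice, since either resource running out can lock the machine up as described above.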
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7674 - Posted: 17 Jun 2024, 21:15:47 UTC

Seems they changed the name of the app:
"Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.01 (env)"
ID: 7674
Vester

Joined: 29 Apr 20
Posts: 17
Credit: 1,176
RAC: 0
Message 7675 - Posted: 18 Jun 2024, 0:30:02 UTC - in response to Message 7674.  
Last modified: 18 Jun 2024, 0:42:39 UTC

Seems they changed the name of the app:
"Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.01 (env)"

Waiting for some of the 1000 that are ready. We'll see whether the app name in app_config.xml is still rosetta_beta.

Edit: Released. App_config still works without change. Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet.
ID: 7675
Coleslaw
Joined: 29 Nov 08
Posts: 1
Credit: 947,069
RAC: 82
Message 7676 - Posted: 18 Jun 2024, 2:31:00 UTC - in response to Message 7597.  

My guess is the VRAM. The 4060 was only 8GB but the 3060 was a 12GB version.
ID: 7676
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7677 - Posted: 18 Jun 2024, 5:26:17 UTC - in response to Message 7675.  

Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet.


Yes. All errors for me too, after a few seconds.
Waiting to report to the server so I can see the logs.
ID: 7677
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7678 - Posted: 18 Jun 2024, 5:49:28 UTC

I'm seeing Tasks ready to send, but when I try to contact the Scheduler I get
18/06/2024 15:13:51 | ralph@home | Project is temporarily shut down for maintenance
And the Server Status page shows everything on the Ralph server as down.
Grant
Darwin NT
ID: 7678
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7679 - Posted: 18 Jun 2024, 5:51:04 UTC - in response to Message 7676.  

My guess is the VRAM. The 4060 was only 8GB but the 3060 was a 12GB version.
My RTX 2060 has only got 6GB of VRAM, and was able to process & return work without issues during the last batch (once I upgraded the driver).
Grant
Darwin NT
ID: 7679
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7680 - Posted: 18 Jun 2024, 5:57:42 UTC - in response to Message 7675.  

Edit: Released. App_config still works without change. Forty-eight (48) failures. Project down for maintenance and I cannot see the failed tasks yet.
Looking at some of your past errored Tasks, this is a common thread:
OSError: [WinError 1455] The paging file is too small for this operation to complete.
If you've set a fixed paging file size, you might want to make it bigger; I've seen Tasks with a Swap size of over 15GB.
Grant
Darwin NT
ID: 7680
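WinError 1455 means Windows could not commit any more virtual memory. As a rough rule, the system commit limit is physical RAM plus the paging file, so sizing a fixed paging file is simple arithmetic, sketched in Python below. The 15GB per-Task figure is the largest Swap size mentioned above; treating it as each Task's commit charge is an assumption, and the function name and the other inputs are hypothetical.

```python
def min_pagefile_gb(tasks: int, commit_gb_per_task: float,
                    ram_gb: float, other_commit_gb: float) -> float:
    """Rough lower bound on paging-file size (GB) for `tasks` concurrent Tasks.

    Assumes the Windows commit limit is approximately physical RAM plus the
    paging file, and that every Task may commit `commit_gb_per_task` at once.
    """
    needed = tasks * commit_gb_per_task + other_commit_gb
    return max(0.0, needed - ram_gb)


# Example: 2 Tasks at the ~15GB observed in this thread, 6GB committed by
# everything else, on a 16GB machine (all figures are illustrative).
print(min_pagefile_gb(2, 15.0, 16.0, 6.0))
```

For these example inputs it returns 20.0, i.e. a paging file of at least about 20GB; with max_concurrent set to 1, the same machine would need only about 5GB.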
PMH_UK

Joined: 29 Jan 09
Posts: 5
Credit: 297,074
RAC: 0
Message 7681 - Posted: 18 Jun 2024, 9:24:57 UTC - in response to Message 7677.  
Last modified: 18 Jun 2024, 9:31:24 UTC

Just able to return tasks, but no tasks are available to download.
Latest error:
<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
The access code is invalid.
(0xc) - exit code 12 (0xc)</message>
<stderr_txt>
usage: predict.py [-h] [-checkpoint CHECKPOINT] [-pdb PDB] [-silent SILENT] [-tmpl_chain TMPL_CHAIN]
                  [-tmpl_conf TMPL_CONF] [-movescale MOVESCALE] [-n_pred N_PRED] [-n_cycle N_CYCLE] [-no_extra_l1]
                  [-no_atom_frames] [-outcsv OUTCSV]
predict.py: error: unrecognized arguments: -z .venv_h_pred_alpha_23.zip

</stderr_txt>
]]>
Paul.
ID: 7681
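The "usage: ... error: unrecognized arguments" output has the shape of Python's argparse, which suggests predict.py parses its options with it. If so, the failure is that whatever launches the script passes a -z option that predict.py does not define. The sketch below rebuilds the option list from the usage message and uses parse_known_args to show which arguments are left over; the option types, the -pdb value and the exact argv are assumptions, not the real command line.

```python
import argparse

# Rebuild the options shown in the usage message (types and defaults are guesses).
parser = argparse.ArgumentParser(prog="predict.py")
for opt in ["-checkpoint", "-pdb", "-silent", "-tmpl_chain", "-tmpl_conf",
            "-movescale", "-n_pred", "-n_cycle", "-outcsv"]:
    parser.add_argument(opt)
parser.add_argument("-no_extra_l1", action="store_true")
parser.add_argument("-no_atom_frames", action="store_true")

# Hypothetical command line resembling what the wrapper may have passed.
argv = ["-pdb", "input.pdb", "-z", ".venv_h_pred_alpha_23.zip"]

# parse_known_args returns the parsed namespace plus anything it did not recognise.
known, extra = parser.parse_known_args(argv)
print(extra)
```

Calling parser.parse_args(argv) instead prints argparse's standard "predict.py: error: unrecognized arguments: -z .venv_h_pred_alpha_23.zip" line, matching the stderr above, so the fix presumably belongs on the side that builds the command line (or in predict.py accepting the option) rather than on volunteers' machines.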
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7682 - Posted: 18 Jun 2024, 9:44:08 UTC - in response to Message 7681.  

If I try to view my WUs I get this error:

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 83

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 84

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 85

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 86

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 86

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 87
ID: 7682
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7684 - Posted: 18 Jun 2024, 9:53:12 UTC

Well, the Server Status now shows all green, but I still can't get any work.

18/06/2024 19:14:16 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:14:16 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
18/06/2024 19:14:18 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:14:18 | ralph@home | No tasks sent
18/06/2024 19:14:18 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:19:59 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:19:59 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:20:00 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:20:00 | ralph@home | No tasks sent
18/06/2024 19:20:00 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:21:22 | ralph@home | update requested by user
18/06/2024 19:21:28 | ralph@home | Sending scheduler request: Requested by user.
18/06/2024 19:21:28 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:21:29 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:21:29 | ralph@home | No tasks sent
18/06/2024 19:21:29 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:22:06 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:22:06 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:22:07 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:22:07 | ralph@home | No tasks sent
18/06/2024 19:22:07 | ralph@home | Project requested delay of 31 seconds

Grant
Darwin NT
ID: 7684
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7685 - Posted: 18 Jun 2024, 9:55:06 UTC - in response to Message 7684.  

Well, the Server Status now shows all green, but I still can't get any work.


The home page shows over 400 WUs, but the status page shows 0.
Seems there are still some problems on the server.
ID: 7685
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7686 - Posted: 18 Jun 2024, 9:56:01 UTC - in response to Message 7682.  
Last modified: 18 Jun 2024, 9:58:32 UTC

If I try to view my WUs I get this error:

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 83

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 84

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 85

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 86

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 86

Notice: Trying to get property of non-object in /projects/boinc/ralph/html/inc/result.inc on line 87
But the rest of the info is there?
It's looking like the database got a bit scrambled during the outage.
Grant
Darwin NT
ID: 7686
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7687 - Posted: 18 Jun 2024, 9:56:47 UTC

I just got 3
ID: 7687
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7688 - Posted: 18 Jun 2024, 9:58:01 UTC - in response to Message 7684.  
Last modified: 18 Jun 2024, 10:21:05 UTC

Well, the Server Status now shows all green, but I still can't get any work.

18/06/2024 19:14:16 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:14:16 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
18/06/2024 19:14:18 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:14:18 | ralph@home | No tasks sent
18/06/2024 19:14:18 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:19:59 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:19:59 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:20:00 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:20:00 | ralph@home | No tasks sent
18/06/2024 19:20:00 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:21:22 | ralph@home | update requested by user
18/06/2024 19:21:28 | ralph@home | Sending scheduler request: Requested by user.
18/06/2024 19:21:28 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:21:29 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:21:29 | ralph@home | No tasks sent
18/06/2024 19:21:29 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:22:06 | ralph@home | Sending scheduler request: To fetch work.
18/06/2024 19:22:06 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:22:07 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 19:22:07 | ralph@home | No tasks sent
18/06/2024 19:22:07 | ralph@home | Project requested delay of 31 seconds


OK, this time I managed to get the new application, and I am presently downloading the cv1.zip file.
Hopefully once it's done I will be able to get more work (if there is any).


Edit: I've got 2 Tasks showing as downloading, but on hold (hopefully waiting for the zip file to download).


Edit: Zip file downloaded, Tasks downloaded, instant error on both.

https://ralph.bakerlab.org/result.php?resultid=5466574
https://ralph.bakerlab.org/result.php?resultid=5466538

For both Tasks.
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
usage: predict.py [-h] [-checkpoint CHECKPOINT] [-pdb PDB] [-silent SILENT] [-tmpl_chain TMPL_CHAIN]
                  [-tmpl_conf TMPL_CONF] [-movescale MOVESCALE] [-n_pred N_PRED] [-n_cycle N_CYCLE] [-no_extra_l1]
                  [-no_atom_frames] [-outcsv OUTCSV]
predict.py: error: unrecognized arguments: -z .ev0.zip

</stderr_txt>
]]>

Grant
Darwin NT
ID: 7688
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7689 - Posted: 18 Jun 2024, 9:59:11 UTC - in response to Message 7687.  

I just got 3
Are they running, or all errors?
Grant
Darwin NT
ID: 7689
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7690 - Posted: 18 Jun 2024, 10:00:24 UTC - in response to Message 7689.  

Downloading cv1.zip and ev0.zip
ID: 7690
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7692 - Posted: 18 Jun 2024, 10:13:02 UTC
Last modified: 18 Jun 2024, 10:56:31 UTC

People, check your Task lists.
And then check your host's application details to see whether it has been able to get any of the new batch.

Things are extremely messed up.
"Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02" is showing as processing the last batch of work.
"Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.01" is showing as processing the current batch of work.

And when I look at my Application details, it thinks the same thing: the new version shows me as having done 31 Tasks.
The old version shows me as having done 2 (the ones that just downloaded and died instantly).


Could the problem be that the old application is trying to process the new Tasks because everything is so scrambled?
(Computing > Applications shows the old application as removed, and the new one shows the old one's Average computing value.)



OK, I'm now extremely confused.
In my Event log:
18/06/2024 19:26:25 | ralph@home | Sending scheduler request: Requested by user.
18/06/2024 19:26:25 | ralph@home | Requesting new tasks for CPU
18/06/2024 19:26:26 | ralph@home | Scheduler request completed: got 2 new tasks
18/06/2024 19:26:26 | ralph@home | Project requested delay of 31 seconds
18/06/2024 19:26:28 | ralph@home | Started download of w_0.01_windows_x86_64.exe
18/06/2024 19:26:28 | ralph@home | Started download of cv1.zip
18/06/2024 19:26:28 | ralph@home | Started download of venv_i_pred_alpha_26.txt
18/06/2024 19:26:28 | ralph@home | Started download of venv_i_pred_alpha_15.txt
18/06/2024 19:26:31 | ralph@home | Finished download of w_0.01_windows_x86_64.exe (844288 bytes)
18/06/2024 19:26:31 | ralph@home | Finished download of venv_i_pred_alpha_15.txt (3520348 bytes)
18/06/2024 19:26:32 | ralph@home | Finished download of venv_i_pred_alpha_26.txt (3527288 bytes)
18/06/2024 19:30:08 | ralph@home | Finished download of cv1.zip (1215772557 bytes)
18/06/2024 19:30:11 | ralph@home | Starting task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0
18/06/2024 19:30:15 | ralph@home | Computation for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0 finished
18/06/2024 19:30:15 | ralph@home | Output file RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0_r724705856_0 for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_15_16905_5_0 absent
18/06/2024 19:30:15 | ralph@home | Starting task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0
18/06/2024 19:30:18 | ralph@home | Computation for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0 finished
18/06/2024 19:30:18 | ralph@home | Output file RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0_r103562235_0 for task RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_26_16905_1_0 absent


Yet in my Ralph folder I have a "cv2" and an "ev0" folder, and a "w_0.02_windows_x86_64.exe", which at least match the .02 part of the new application, but...

Searching the Ralph project folders for any of the files my Event log lists as downloaded returns "No items match your search" for every one of them.
?????
Grant
Darwin NT
ID: 7692
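One thing the Event log above does pin down is how long the large file took. The Python sketch below pairs "Started download of" and "Finished download of ... (N bytes)" lines and computes duration and throughput. It assumes the dd/mm/yyyy timestamp format seen in these logs (which follows the host's locale) and handles only those two message shapes; the function name is hypothetical.

```python
import re
from datetime import datetime

LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| [^|]+ \| (.*)$")
STARTED = re.compile(r"Started download of (\S+)$")
FINISHED = re.compile(r"Finished download of (\S+) \((\d+) bytes\)$")


def download_stats(log_text: str) -> dict:
    """Map file name -> (seconds taken, bytes, MB/s) from BOINC event-log lines."""
    starts, stats = {}, {}
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # not an event-log line
        when = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
        msg = m.group(2)
        if s := STARTED.match(msg):
            starts[s.group(1)] = when
        elif f := FINISHED.match(msg):
            name, size = f.group(1), int(f.group(2))
            if name in starts:
                secs = (when - starts[name]).total_seconds()
                rate = size / secs / 1e6 if secs else float("inf")
                stats[name] = (secs, size, rate)
    return stats


log = """\
18/06/2024 19:26:28 | ralph@home | Started download of cv1.zip
18/06/2024 19:30:08 | ralph@home | Finished download of cv1.zip (1215772557 bytes)
"""
for name, (secs, size, rate) in download_stats(log).items():
    print(f"{name}: {size / 2**30:.2f} GiB in {secs:.0f} s, {rate:.1f} MB/s")
```

Applied to the two cv1.zip lines quoted above, it reports about 1.13 GiB in 220 s, a little over 5.5 MB/s, while each Task then ran for only 3 to 4 seconds before finishing with its output file absent.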
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7693 - Posted: 18 Jun 2024, 10:30:57 UTC
Last modified: 18 Jun 2024, 10:33:24 UTC

I get the error "zipfile.BadZipFile: File is not a zip file"

Why do they use base64-encoded files?
ID: 7693
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7694 - Posted: 18 Jun 2024, 10:55:10 UTC - in response to Message 7693.  

I get the error "zipfile.BadZipFile: File is not a zip file"

Why do they use base64-encoded files?
Because apparently that's the standard way of encoding binary data as text for data transmission (according to a few quick searches).
Grant
Darwin NT
ID: 7694
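Base64 encodes binary data, such as a zip archive, as plain ASCII text so it survives text-only channels. A zip that is still base64-encoded fails with exactly "File is not a zip file". The Python sketch below assumes (this is not confirmed in the thread) that some of the downloaded payloads are base64-encoded zips; it reproduces the error and shows the decode step. The helper name is hypothetical.

```python
import base64
import io
import zipfile


def decode_base64_zip(text: str) -> zipfile.ZipFile:
    """Hypothetical helper: decode base64 text into an open ZipFile."""
    raw = base64.b64decode(text)
    if not zipfile.is_zipfile(io.BytesIO(raw)):
        raise zipfile.BadZipFile("decoded data is not a zip archive")
    return zipfile.ZipFile(io.BytesIO(raw))


# Build a tiny zip in memory and base64-encode it, standing in for a downloaded file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
encoded = base64.b64encode(buf.getvalue()).decode("ascii")

# Opening the base64 text directly as a zip fails the same way as in the thread.
try:
    zipfile.ZipFile(io.BytesIO(encoded.encode("ascii")))
except zipfile.BadZipFile as err:
    print("direct open:", err)

# Decoding first recovers the archive.
print(decode_base64_zip(encoded).namelist())
```

If the application's own unpacking step skips the decode (or decodes a file that was never encoded), this is the error it would raise.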




©2024 University of Washington
http://www.bakerlab.org