RoseTTAFold All-Atom

Message boards : RALPH@home bug list : RoseTTAFold All-Atom


AuthorMessage
Henk Haneveld

Joined: 13 Apr 21
Posts: 8
Credit: 88
RAC: 0
Message 7695 - Posted: 18 Jun 2024, 12:35:37 UTC

It looks like they changed the definition for BOINC from a CPU app to a GPU app.

But because I have an older Nvidia card with limited memory, I get:

18/06/2024 14:25:32 | ralph@home | Requesting new tasks for NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:25:33 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Project requested delay of 31 seconds
ID: 7695
[VENETO] boboviz

Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7697 - Posted: 18 Jun 2024, 12:45:47 UTC - in response to Message 7695.  

18/06/2024 14:25:32 | ralph@home | Requesting new tasks for NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:25:33 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Project requested delay of 31 seconds


Not bad, not bad... :-(
ID: 7697
Henk Haneveld

Joined: 13 Apr 21
Posts: 8
Credit: 88
RAC: 0
Message 7698 - Posted: 18 Jun 2024, 12:51:27 UTC - in response to Message 7697.  

18/06/2024 14:25:32 | ralph@home | Requesting new tasks for NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:25:33 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Project requested delay of 31 seconds


Not bad, not bad... :-(

It is bad because the next call was for CPU work and I got this.

18/06/2024 14:44:09 | ralph@home | Requesting new tasks for CPU
18/06/2024 14:44:10 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:44:10 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:44:10 | ralph@home | No tasks sent
18/06/2024 14:44:10 | ralph@home | Project requested delay of 31 seconds

A clear sign that mixing CPU and GPU settings won't work.
ID: 7698
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7704 - Posted: 18 Jun 2024, 18:07:55 UTC - in response to Message 7695.  

It looks like they changed the definition for BOINC from a CPU app to a GPU app.

But because I have an older Nvidia card with limited memory, I get:

18/06/2024 14:25:32 | ralph@home | Requesting new tasks for NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:25:33 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Project requested delay of 31 seconds
I have that much VRAM, and it worked with the previous batch of work, but i am unable to get any work due to the same message on my system.
For either CPU or GPU.
Grant
Darwin NT
ID: 7704
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7705 - Posted: 18 Jun 2024, 18:13:03 UTC - in response to Message 7704.  

You probably have an old BOINC client that doesn't report more than 4 GB of VRAM.

Install it using this.
https://boinc.berkeley.edu/linux_install.php
ID: 7705
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7708 - Posted: 19 Jun 2024, 6:11:44 UTC - in response to Message 7705.  

You probably have an old BOINC client that doesn't report more than 4 GB of VRAM.

Install it using this.
https://boinc.berkeley.edu/linux_install.php
I'm running Windows, and I've got the current Manager, v8.0.2.
Grant
Darwin NT
ID: 7708
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7710 - Posted: 19 Jun 2024, 6:42:50 UTC - in response to Message 7704.  
Last modified: 19 Jun 2024, 7:00:58 UTC

It looks like they changed the definition for BOINC from a CPU app to a GPU app.

But because I have an older Nvidia card with limited memory, I get:

18/06/2024 14:25:32 | ralph@home | Requesting new tasks for NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Scheduler request completed: got 0 new tasks
18/06/2024 14:25:33 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
18/06/2024 14:25:33 | ralph@home | Project requested delay of 31 seconds
I have that much VRAM, and it worked with the previous batch of work, but i am unable to get any work due to the same message on my system.
For either CPU or GPU.
As it worked previously (I'd hope the new application doesn't require that much more VRAM than before), I'd hope they'd lower the requirement significantly, ideally to less than 4GB, so that cards with 4GB or more would be OK.

Looking at many of the Tasks that were processed by the new v0.02 (nvidia_alpha) application, the Peak working set size used is often well under 3GB.

Even just 1MB less (2MB to be sure) and I'd be able to run a task.
Specs reported by BOINC, and what the minimum is set to:
  GeForce RTX 2060 (6143MB)
  minimum of 6144MB
Just one lousy MB...
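
A minimal sketch of the comparison that seems to be happening here, assuming the scheduler does a strict "reported VRAM >= minimum" check. The function names are made up for illustration and this is not the actual BOINC scheduler code; only the two numbers come from the messages above.

```python
# Hypothetical sketch of a strict minimum-VRAM check.
# Assumption: the scheduler compares reported VRAM against the minimum in whole MB.

MIN_VRAM_MB = 6144  # minimum quoted in the scheduler message


def meets_vram_minimum(reported_mb: int, minimum_mb: int = MIN_VRAM_MB) -> bool:
    """Return True if the reported VRAM is at least the required minimum."""
    return reported_mb >= minimum_mb


def shortfall_mb(reported_mb: int, minimum_mb: int = MIN_VRAM_MB) -> int:
    """Return how many MB the card falls short by (0 if it qualifies)."""
    return max(0, minimum_mb - reported_mb)


rtx_2060_reported_mb = 6143  # what BOINC reports for the RTX 2060 above

print(meets_vram_minimum(rtx_2060_reported_mb))        # rejected at 6144 MB
print(shortfall_mb(rtx_2060_reported_mb))              # short by 1 MB
print(meets_vram_minimum(rtx_2060_reported_mb, 6142))  # accepted with a 2 MB lower minimum
```

If the check really is this strict, a nominal "6GB" card that reports 6143MB can never qualify, which would match what the Event log shows.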


I would also hope that they make it so that if you have a video card and the correct driver, but not enough video RAM, you can still download Tasks to run on the CPU. This is one of the problems with having a single application that can run on both the CPU and the GPU: the project has to come up with some way of allowing it to still run on one compute resource when the other isn't suitable. That is easily done with two different applications (even if they are the same application under different names, they still count as different applications), but not so easily done with a single one.
Grant
Darwin NT
ID: 7710
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7711 - Posted: 19 Jun 2024, 6:54:42 UTC
Last modified: 19 Jun 2024, 7:02:01 UTC

Looking at some of the Tasks that were able to run on a GPU on the new Application, there appears to be an issue with the expected runtime.

eg
RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_i_pred_74_16905_2_0
<core_client_version>8.0.0</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 1594.21 (100000000.00G/62727.08G)</message>
<stderr_txt>
Actual runtime: 1,595.45 seconds

It looks like the expected APR (Average Processing Rate) for the GPU used (an RTX 3060) was higher than the (reported) actual rate, so the time limit to complete the Task was way too short, and it was terminated before completion.
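
The numbers in that message can be checked with a bit of arithmetic. This is a sketch under the assumption that the limit is simply the first bracketed figure (the operations bound for the Task) divided by the second (the estimated processing rate), both in the "G" units shown; the function name is made up for illustration.

```python
# Assumption: elapsed time limit = operations bound / estimated processing rate,
# using the two bracketed figures from the error message as-is.

def elapsed_time_limit(fpops_bound_g: float, rate_estimate_g: float) -> float:
    """Return the time limit in seconds implied by the two figures."""
    return fpops_bound_g / rate_estimate_g


limit = elapsed_time_limit(100000000.00, 62727.08)
actual_runtime = 1595.45  # seconds, from the Task page

print(f"limit: {limit:.2f} s")
print(f"exceeded: {actual_runtime > limit}")
```

On that reading, the limit works out to the 1594.21 seconds in the message, and the Task was stopped just after passing it. A lower rate estimate would give a proportionally longer limit.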

This system had that occur with all the new Tasks on the new v0.02 (nvidia_alpha) Application.
Grant
Darwin NT
ID: 7711
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7712 - Posted: 19 Jun 2024, 7:31:32 UTC

OK, in the Ralph@home preferences, there are now a couple of options:
  Use CPU
  Use NVIDIA GPU


With both selected, this is what i was getting when requesting work

19/06/2024 16:37:55 | ralph@home | update requested by user
19/06/2024 16:37:58 | ralph@home | Sending scheduler request: Requested by user.
19/06/2024 16:37:58 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
19/06/2024 16:37:59 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:37:59 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:37:59 | ralph@home | No tasks sent
19/06/2024 16:37:59 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:38:37 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 16:38:37 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
19/06/2024 16:38:38 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:38:38 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:38:38 | ralph@home | No tasks sent
19/06/2024 16:38:38 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:39:26 | ralph@home | update requested by user
19/06/2024 16:39:30 | ralph@home | Sending scheduler request: Requested by user.
19/06/2024 16:39:30 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU


I de-selected the "Use NVIDIA GPU" option, and now this is what i get in response.

19/06/2024 16:40:09 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 16:40:09 | ralph@home | Requesting new tasks for CPU
19/06/2024 16:40:10 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:40:10 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:40:10 | ralph@home | No tasks sent
19/06/2024 16:40:10 | ralph@home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
19/06/2024 16:40:10 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:44:55 | ralph@home | update requested by user
19/06/2024 16:45:00 | ralph@home | Sending scheduler request: Requested by user.
19/06/2024 16:45:00 | ralph@home | Requesting new tasks for CPU
19/06/2024 16:45:01 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:45:01 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:45:01 | ralph@home | No tasks sent
19/06/2024 16:45:01 | ralph@home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
19/06/2024 16:45:01 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:45:38 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 16:45:38 | ralph@home | Requesting new tasks for CPU
19/06/2024 16:45:39 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:45:39 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:45:39 | ralph@home | No tasks sent
19/06/2024 16:45:39 | ralph@home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
19/06/2024 16:45:39 | ralph@home | Project requested delay of 31 seconds
So it's recognising that I want CPU work and no GPU work, but it still isn't sending any, as there is GPU work available but supposedly no CPU work.


So I then set it to no CPU work, but GPU work allowed, and this is now the response:
19/06/2024 16:47:16 | ralph@home | update requested by user
19/06/2024 16:47:22 | ralph@home | Sending scheduler request: Requested by user.
19/06/2024 16:47:22 | ralph@home | Requesting new tasks for CPU
19/06/2024 16:47:23 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:47:23 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:47:23 | ralph@home | No tasks sent
19/06/2024 16:47:23 | ralph@home | Tasks for CPU are available, but your preferences are set to not accept them
19/06/2024 16:47:23 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:48:01 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 16:48:01 | ralph@home | Requesting new tasks for NVIDIA GPU
19/06/2024 16:48:03 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:48:03 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:48:03 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:49:09 | ralph@home | update requested by user
19/06/2024 16:49:16 | ralph@home | Sending scheduler request: Requested by user.
19/06/2024 16:49:16 | ralph@home | Requesting new tasks for NVIDIA GPU
19/06/2024 16:49:17 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:49:17 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:49:17 | ralph@home | Project requested delay of 31 seconds
19/06/2024 16:49:54 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 16:49:54 | ralph@home | Requesting new tasks for NVIDIA GPU
19/06/2024 16:49:55 | ralph@home | Scheduler request completed: got 0 new tasks
19/06/2024 16:49:55 | ralph@home | A minimum of 6144 MB (preferably 6144 MB) of video RAM is needed to process tasks using your computer's NVIDIA GPU
19/06/2024 16:49:55 | ralph@home | Project requested delay of 31 seconds


It mentions that there are CPU tasks available, until it gets the update to not request them, then it doesn't mention the availability of CPU work after that.

And enabling & disabling the use of the CPU & GPU in various other combinations still results in no work- the GPU because it doesn't meet the memory requirements, but i've no idea why it won't send out any work to run on the CPU.
Probably because it uses the same application. And if it were to send out work for the CPU, it would end up on the GPU, which it thinks isn't up to the task. And if it weren't up to the Task, that would lead to all sorts of problems, hence why I suspect it's not sending out CPU work when requested.
Grant
Darwin NT
ID: 7712
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7713 - Posted: 19 Jun 2024, 8:41:05 UTC
Last modified: 19 Jun 2024, 8:58:47 UTC

Just discovered another problem.

Just attached my second system to Ralph, which has an RTX2060 Super, with the latest driver.

It was able to download a bunch of work while the .zip files were still downloading- unfortunately the Estimated completion time for each of the Tasks is 3 seconds.
So i've set No New Tasks for now, and we'll see what happens once it finishes downloading the .zip files.
My guess- all of the Tasks downloaded so far will error out due to "Estimated completion time exceeded" errors.


OK, this is very odd.
New application appears to be reserving 1 thread for each GPU Task (0.997 CPUs + 1 NVIDIA GPU), Excellent (this would make max concurrent unnecessary for systems doing GPU only work, but don't know if CPU thread usage has been limited as well *fingers crossed*).
When the Task starts up, the first 30 sec or so the timer counts up, no movement on the Progress bar. Then it goes to 100%, estimated time 0, but then it starts running on the GPU.
Status bar shows it as running. Elapsed time continues to climb.


VRAM in use is showing as 6.22GB (8192MB available), so 6GB cards may not be good enough. 2.4GB of system RAM is in use.
Would a card with more VRAM use more VRAM? Less VRAM use less VRAM? If so, it would allow more hardware to do work if 6GB and even 4GB video cards could be used.



Edit: the system just completed its first Task in 25 min 44 sec, and it Validated.
Grant
Darwin NT
ID: 7713
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7725 - Posted: 19 Jun 2024, 18:20:01 UTC
Last modified: 19 Jun 2024, 18:32:40 UTC

This morning i checked my system that is able to get work- Estimated completion time is now 1 hour. The Progress bar isn't jumping to 100% almost as soon as it starts and is progressing as the Task is processed as well.


Interestingly, the Event log messages on the system that can get work have changed.

Before
19/06/2024 18:27:20 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 18:27:20 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
19/06/2024 18:27:22 | ralph@home | Scheduler request completed: got 2 new tasks
19/06/2024 18:27:22 | ralph@home | Project requested delay of 31 seconds
19/06/2024 18:27:24 | ralph@home | Started download of venv_i_pred_alpha_160.txt
19/06/2024 18:27:24 | ralph@home | Started download of venv_i_pred_alpha_62.txt
19/06/2024 18:27:27 | ralph@home | Finished download of venv_i_pred_alpha_160.txt (3532464 bytes)
19/06/2024 18:27:27 | ralph@home | Finished download of venv_i_pred_alpha_62.txt (3526876 bytes)
19/06/2024 18:27:59 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 18:27:59 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
19/06/2024 18:28:01 | ralph@home | Scheduler request completed: got 2 new tasks
19/06/2024 18:28:01 | ralph@home | Project requested delay of 31 seconds
19/06/2024 18:28:03 | ralph@home | Started download of venv_i_pred_alpha_150.txt
19/06/2024 18:28:06 | ralph@home | Finished download of venv_i_pred_alpha_150.txt (3509324 bytes)
19/06/2024 18:28:38 | ralph@home | Sending scheduler request: To fetch work.
19/06/2024 18:28:38 | ralph@home | Requesting new tasks for CPU and NVIDIA GPU
19/06/2024 18:28:40 | ralph@home | Scheduler request completed: got 2 new tasks
19/06/2024 18:28:40 | ralph@home | Project requested delay of 31 seconds
19/06/2024 18:28:42 | ralph@home | Started download of venv_i_pred_alpha_29.txt
19/06/2024 18:28:46 | ralph@home | Finished download of venv_i_pred_alpha_29.txt (3523216 bytes)


Now
20/06/2024 3:50:29 | ralph@home | update requested by user
20/06/2024 3:50:30 | ralph@home | Sending scheduler request: Requested by user.
20/06/2024 3:50:30 | ralph@home | Reporting 3 completed tasks
20/06/2024 3:50:30 | ralph@home | Requesting new tasks for CPU
20/06/2024 3:50:32 | ralph@home | Scheduler request completed: got 0 new tasks
20/06/2024 3:50:32 | ralph@home | No tasks sent
20/06/2024 3:50:32 | ralph@home | Project requested delay of 31 seconds
20/06/2024 3:51:09 | ralph@home | Sending scheduler request: To fetch work.
20/06/2024 3:51:09 | ralph@home | Requesting new tasks for CPU
20/06/2024 3:51:11 | ralph@home | Scheduler request completed: got 0 new tasks
20/06/2024 3:51:11 | ralph@home | No tasks sent
20/06/2024 3:51:11 | ralph@home | Project requested delay of 31 seconds



It's now requesting only CPU work. The system that can't get work is still requesting both types, and still gets the message about the GPU RAM limit.
Grant
Darwin NT
ID: 7725
Fardringle

Joined: 22 Feb 06
Posts: 18
Credit: 360,436
RAC: 1,901
Message 7727 - Posted: 19 Jun 2024, 21:45:12 UTC

GPU tasks are still failing immediately on my RTX 3060Ti. It's not a VRAM problem as the 3060Ti has 8GB.

It looks like it's actually a programming problem and there need to be quotation marks around the file path, or whatever method is appropriate in Python to allow spaces in file paths, since it is failing when it tries to access the BOINC directory inside Program Files.

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>
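
For what it's worth, here is a minimal sketch of one way Python can launch a program whose path contains a space without any manual quoting. It assumes the failure comes from the command being built as a single unquoted string and handed to a shell, which is a guess based on the error text and not something confirmed about the project's code. The temporary directory and `hello.py` are stand-ins for the real BOINC paths.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(script: Path) -> str:
    """Run a Python script and return its stdout.

    Passing the command as a list means each element reaches the new process
    as one argument, so a space in the path is not treated as a separator.
    """
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in for a directory such as "Program Files": the name contains a space.
with tempfile.TemporaryDirectory(prefix="Program Files ") as tmp:
    script = Path(tmp) / "hello.py"
    script.write_text("print('ok')")
    output = run_script(script)

print(output)
```

Building the command as one string and splitting on spaces would cut the path at "Program", which is the kind of breakage the stderr above suggests.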

ID: 7727
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7730 - Posted: 20 Jun 2024, 4:30:20 UTC - in response to Message 7727.  

GPU tasks are still failing immediately on my RTX 3060Ti. It's not a VRAM problem as the 3060Ti has 8GB.

It looks like it's actually a programming problem and there need to be quotation marks around the file path, or whatever method is appropriate in Python to allow spaces in file paths, since it is failing when it tries to access the BOINC directory inside Program Files.

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>
Don't know what's going on there as i had that error issue with one of my systems early on, then the next time it tried for work it was able to get it and process it OK (that was with 6GB of VRAM).
When i could no longer get work on that system, i attached to Ralph with my other system, and it downloaded the files & started processing with no problems (other than some initial weirdness with estimated time & progress indication).

As it's unable to do any Ralph work at present, I'd suggest doing a Reset Project on that system- let it download everything from scratch & see if the error still occurs.
Grant
Darwin NT
ID: 7730
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7735 - Posted: 21 Jun 2024, 4:46:46 UTC - in response to Message 7727.  

GPU tasks are still failing immediately on my RTX 3060Ti. It's not a VRAM problem as the 3060Ti has 8GB.

It looks like it's actually a programming problem and there need to be quotation marks around the file path, or whatever method is appropriate in Python to allow spaces in file paths, since it is failing when it tries to access the BOINC directory inside Program Files.

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>
Looks like you were able to get this system to process some work in the end- 54 Valid Tasks.
Grant
Darwin NT
ID: 7735
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7743 - Posted: 21 Jun 2024, 10:21:10 UTC

Just noticed this on the Computing, Applications page
0.03 (nvidia_alpha) 21 Jun 2024, 8:41:54 UTC
Another new version, so with a bit of luck there might be some more work over the weekend.
(It would be nice to know what to expect with the new version...).
Grant
Darwin NT
ID: 7743
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7744 - Posted: 21 Jun 2024, 11:26:24 UTC
Last modified: 21 Jun 2024, 12:21:24 UTC

And new work is flowing.

On my system that was able to process work the last time around,
The Initial Remaining (estimated) time is very close to the past actual Runtimes, within about 30 sec.

This new application isn't working the GPU as hard as the last one. With the last one, GPU load was around 90% most of the time, with occasional large dips to 50%. This one is around 85%, and its large dips are as low as 25%. During those dips, CPU usage goes from a bit over 1 thread to a bit under 2 threads for a few seconds (or less).
GPU power draw is much choppier, showing the lower sustained load on the GPU.
Memory controller load is very close to the previous version levels.



My other system that couldn't get work due to only 6GB of VRAM, has now downloaded the new application and Tasks, and is crunching.
And all Tasks are now erroring out on 6GB VRAM system.
The first Task ran for 1 min 41 sec (I suspect mostly on the CPU) then errored out; the rest ran for 3 sec each, then errored out.
Reset Project, but still broken.

Doesn't appear to be memory related as such, but a configuration issue.

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_j_pred_46_16906_2_0
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
C:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv2\rf2aa\util.py:450: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Cross.cpp:66.)
  Z = torch.cross(Xn,Yn)
usage: predict.py [-h] [-checkpoint CHECKPOINT] [-pdb PDB] [-silent SILENT] [-tmpl_chain TMPL_CHAIN]
                  [-tmpl_conf TMPL_CONF] [-movescale MOVESCALE] [-n_pred N_PRED] [-n_cycle N_CYCLE] [-no_extra_l1]
                  [-no_atom_frames] [-outcsv OUTCSV]
predict.py: error: unrecognized arguments: -z .venv_j_pred_alpha_46.zip

</stderr_txt>
]]>
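
The "unrecognized arguments" line above is what Python's argparse prints when a script is handed an option it doesn't define. A minimal sketch of that behaviour follows, using a cut-down, hypothetical parser with a few of the options from the usage text; it is not the real predict.py, and `input.pdb` is a made-up file name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the predict.py parser (only a few options)."""
    parser = argparse.ArgumentParser(prog="predict.py")
    parser.add_argument("-checkpoint")
    parser.add_argument("-pdb")
    parser.add_argument("-n_pred", type=int)
    return parser


# Command line modelled on the failing Task: "-z" is not a defined option.
argv = ["-pdb", "input.pdb", "-z", ".venv_j_pred_alpha_46.zip"]

# parse_known_args() collects unknown arguments instead of exiting, which makes
# it easy to see exactly what parse_args() would have rejected.
known, extras = build_parser().parse_known_args(argv)

print(known.pdb)
print(extras)
```

With `parse_args()` the same input would stop the script with the usage text and the "unrecognized arguments" error, so the `-z` option looks like it was added by whatever launches the script and not understood by the script itself. That would fit a configuration issue more than a memory one.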


Interestingly the Tasks of this batch that are running OK & Validating on my other system with a RTX 2060 Super (8GB VRAM) give a similar error message, but don't error out.

RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_j_pred_8_16906_5_0

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<stderr_txt>
C:\ProgramData\BOINC\projects\ralph.bakerlab.org\cv1\rf2aa\util.py:450: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\Cross.cpp:66.)
  Z = torch.cross(Xn,Yn)
20:49:39 (15324): called boinc_finish(0)

</stderr_txt>
]]>




My other system that is completing Tasks without issue, the most VRAM i've seen in use has been 4790MB. And system RAM is less than 1GB (much, much less than before).
Grant
Darwin NT
ID: 7744
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7745 - Posted: 21 Jun 2024, 11:56:24 UTC - in response to Message 7744.  
Last modified: 21 Jun 2024, 12:42:31 UTC

My other system that couldn't get work due to only 6GB of VRAM, has now downloaded the new application and Tasks, and is crunching.
And all Tasks are now erroring out.
The first Task ran for 1 min 41 sec (I suspect mostly on the CPU) then errored out; the rest ran for 3 sec each, then errored out.
Reset Project, but still broken.
OK, this is odd.

After resetting the project, my RTX 2060 (6GB VRAM) is now actually processing work, but everything else errored out on the GTX 1070 (8GB VRAM). (before they errored out on both cards).
Damn.
And i can't get any more work to see what's going on till tomorrow. (i've hit my daily quota with all the other errors).
The Task it was processing finished & Validated, with the same stderr_txt as those on the other system.

Monitoring VRAM on the RTX 2060 (6GB VRAM), and it's got up to 6121MB Maximum in use (so far). The other video card has more RAM, but is using less of it... (5314MB max so far)



Note to developers- it would be worth including in the stderr_txt the video card details of what card completed the Task eg Device 0 RTX 2060 6GB, Device 1 GTX 1070 8GB, so that for systems with multiple GPUs, if something goes wrong, we know what it went wrong on.
Also for when there is a CPU application again- include CPU in the stderr_txt to help avoid confusion with GPU processed Tasks.
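
A rough sketch of what such a stderr line could look like. The function names and the "Ran on:" wording are made up for illustration, and the device list is passed in by hand; in the real application it would have to come from whatever GPU library is in use.

```python
import sys


def device_summary(devices):
    """Format (name, vram_gb) pairs as 'Device 0 NAME 6GB, Device 1 ...'."""
    return ", ".join(
        f"Device {index} {name} {vram_gb}GB"
        for index, (name, vram_gb) in enumerate(devices)
    )


def resource_line(devices, used_index=None):
    """Describe which compute resource ran the Task (None means the CPU)."""
    if used_index is None:
        return "Ran on: CPU"
    name, vram_gb = devices[used_index]
    return f"Ran on: Device {used_index} {name} {vram_gb}GB"


# Hand-written device list matching the two-GPU system described above.
devices = [("RTX 2060", 6), ("GTX 1070", 8)]

# Writing to stderr so the lines would end up in the Task's stderr_txt.
print(device_summary(devices), file=sys.stderr)
print(resource_line(devices, used_index=0), file=sys.stderr)
```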
Grant
Darwin NT
ID: 7745
[VENETO] boboviz

Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7746 - Posted: 21 Jun 2024, 12:58:30 UTC - in response to Message 7712.  

It mentions that there are CPU tasks available, until it gets the update to not request them, then it doesn't mention the availability of CPU work after that.
And enabling & disabling the use of the CPU & GPU in various other combinations still results in no work- the GPU because it doesn't meet the memory requirements, but i've no idea why it won't send out any work to run on the CPU.
Probably because it uses the same application. And if it were to send out work for the CPU, it would end up on the GPU, which it thinks isn't up to the task. And if it weren't up to the Task, that would lead to all sorts of problems, hence why I suspect it's not sending out CPU work when requested.



OK. I disabled the GPU option in my profile (because I don't have an Nvidia card).
And the result is:
Tasks for NVIDIA GPU are available, but your preferences are set to not accept them

Waiting for CPU WUs, if there ever will be any.
ID: 7746
zombie67 [MM]
Joined: 8 Aug 06
Posts: 75
Credit: 2,396,363
RAC: 6,299
Message 7748 - Posted: 21 Jun 2024, 14:30:04 UTC

.03 is working for me now.
Reno, NV
Team: SETI.USA
ID: 7748
Grant (SSSF)

Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7752 - Posted: 21 Jun 2024, 17:50:06 UTC
Last modified: 21 Jun 2024, 18:03:57 UTC

Finally able to get more work on the system that errored out Tasks.

However, while the BOINC Manager indicates they are running on different GPUs, they are in fact running on the same GPU (which explains why so many errored out even though many showed as being on the GTX 1070 8GB VRAM GPU). It doesn't explain why that one Task was able to run OK after all the others did error out on the GPU that is now running Tasks OK....
And the Remaining (estimated) time is 3hr 20min, not the actual 26min or so processing time.

Re-instated application limit of 1 Task.
Exited BOINC and restarted; the Task restarted from the beginning (still no checkpointing).

Looking at progress indicator- appears to be based on time done and initial Remaining (estimated) time, not actual processing done.
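
A small sketch of what a purely time-based progress figure would look like, assuming progress is just elapsed time divided by the initial estimate and capped at 100%. This is a guess at the behaviour from watching the Manager, not the actual client code, and the function name is made up.

```python
def time_based_progress(elapsed_s: float, initial_estimate_s: float) -> float:
    """Return progress as a fraction (0.0 to 1.0) from elapsed time alone."""
    if initial_estimate_s <= 0:
        return 1.0
    return min(1.0, elapsed_s / initial_estimate_s)


# Earlier case: a 3 second estimate, about 30 seconds into the Task.
print(time_based_progress(30, 3))

# Current case: estimate of 3 hr 20 min, Task actually done in about 26 min.
print(time_based_progress(26 * 60, (3 * 60 + 20) * 60))
```

With a 3 second estimate the bar would hit 100% almost immediately, and with a 3 hr 20 min estimate a Task that really takes about 26 minutes would finish while showing only about 13%. Both fit what has been seen so far.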


This time around Monitoring VRAM on the RTX 2060 (6GB VRAM), and it's got up to 6097MB Maximum in use (so far).
Grant
Darwin NT
ID: 7752



©2024 University of Washington
http://www.bakerlab.org