RoseTTAFold All-Atom

Message boards : RALPH@home bug list : RoseTTAFold All-Atom

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 890
Credit: 1,889,390
RAC: 1
Message 7652 - Posted: 15 Jun 2024, 13:39:24 UTC - in response to Message 7643.  

In LHC the same application runs ATLAS tasks on anything from 1 to 8 CPU threads. You don't need a different application just to change the requirements!!


They are using a VirtualBox app.
So, no GPU.
ID: 7652
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 890
Credit: 1,889,390
RAC: 1
Message 7653 - Posted: 15 Jun 2024, 13:40:41 UTC - in response to Message 7651.  

It is like GPU grid Python apps for GPU hosts


So why am I running some WUs exclusively on the CPU?
ID: 7653
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 890
Credit: 1,889,390
RAC: 1
Message 7654 - Posted: 15 Jun 2024, 13:44:53 UTC - in response to Message 7647.  

Yet you forgot the most important part of that- the time taken to do a given Task by the slowest of CPUs must be the same as the time taken to do that very same Task by the most powerful of GPUs. If not, all Scheduling is screwed & deadlines will be missed, Resource Share balancing will take forever, if it were to occur at all (oh, and i forgot about the random nature of the amount of Credit being awarded).

Which is why on every other BOINC project where the Runtime of Tasks is not fixed, they have different applications for CPU work and GPU work, for processing the same Tasks, on the same hardware, with the same OS (and even if like here the application is the same, the application names are different for the CPU & that GPU- effectively providing different CPU & GPU applications)


It seems so clear and logical to me
ID: 7654
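The scheduling argument quoted above can be put into rough numbers. The following is a minimal sketch, not BOINC's actual estimator: it assumes runtime is simply work divided by sustained speed, uses the 80,000 GFLOPs estimate that a task in this thread reports, and the two device speeds are made-up values for illustration only.

```python
# Rough illustration of why one runtime estimate cannot fit both a CPU and a GPU.
# ASSUMPTIONS: runtime = work / speed, and both device speeds below are invented.

ESTIMATED_SIZE_GFLOP = 80_000  # "Estimated computation size" reported in this thread


def runtime_seconds(size_gflop, speed_gflops):
    """Idealised runtime in seconds: total work divided by sustained speed."""
    return size_gflop / speed_gflops


# Hypothetical sustained speeds, in GFLOP per second.
ASSUMED_SPEEDS = {
    "slow CPU core": 4.0,
    "fast GPU": 4000.0,
}

runtimes = {
    name: runtime_seconds(ESTIMATED_SIZE_GFLOP, speed)
    for name, speed in ASSUMED_SPEEDS.items()
}

for name, seconds in runtimes.items():
    print(f"{name}: {seconds / 3600:.3f} h")

# If the client only knows one speed for the application, the estimate for the
# other device is off by this factor.
mismatch_factor = runtimes["slow CPU core"] / runtimes["fast GPU"]
print(f"estimate mismatch factor: {mismatch_factor:.0f}x")
```

With separate CPU and GPU application versions, each version can carry its own speed estimate, which is the point the quoted post makes.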
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7655 - Posted: 15 Jun 2024, 13:47:08 UTC

RoseTTAFold doesn't suspend when told to.
When I suspended it, it continued running.
ID: 7655
Henk Haneveld
Joined: 13 Apr 21
Posts: 3
Credit: 88
RAC: 0
Message 7656 - Posted: 15 Jun 2024, 14:27:26 UTC

Why is there a stupid discussion over a combined CPU and GPU app? No such thing exists.

If you look in the properties for Ralph in your client you will see it shows:

Project has no apps for NVIDIA GPU

There is also no GPU app listed in Applications on the site and the app just runs on the CPU and cannot run on GPU.
ID: 7656
mikey
Joined: 28 Nov 20
Posts: 8
Credit: 114,593
RAC: 280
Message 7657 - Posted: 15 Jun 2024, 15:47:45 UTC - in response to Message 7642.  

Look at the Einstein GPU tasks, they use BOTH the cpu and gpu. Just like Ralph, precisely no difference.
That shows just how confused you are. The GPU processes the Task, the CPU supports the GPU by keeping it fed. The CPU doesn't actually do any processing, the GPU does that. Depending on the Task, with a very well written application, CPU support can be next to nothing.
And none of that has any relevance to the post i made & you misread & mis-quoted. Repeatedly, over and over again.


Actually, Peter is right about the Einstein GPU tasks: the newer tasks pause GPU crunching for a bit at two different points, process data on the CPU, then go back to running the rest of the task on the GPU. They said the reason is that the GPU isn't as accurate as the CPU, and they need the more accurate CPU numbers.
ID: 7657
mikey
Joined: 28 Nov 20
Posts: 8
Credit: 114,593
RAC: 280
Message 7658 - Posted: 15 Jun 2024, 15:49:16 UTC - in response to Message 7655.  

RoseTTAFold doesn't suspend when told to.
When I suspended it, it continued running.


Give it time; mine does the same thing but does suspend eventually.
ID: 7658
mikey
Joined: 28 Nov 20
Posts: 8
Credit: 114,593
RAC: 280
Message 7659 - Posted: 15 Jun 2024, 15:52:53 UTC - in response to Message 7623.  

There needs to be a way to limit the number of threads a single Task can use.
And at the moment, the indications are that 1 Task using 8 threads performs no better than when 12 Tasks were trying to use 8 threads each, when there were only 12 threads available.
I'll keep an eye on things to see if they don't slow down later, but the initial signs are that the extra threads are providing not even the slightest improvement in processing time- they're just being wasted.


Try this:

<app_config>
<project_max_concurrent>1</project_max_concurrent>
</app_config>
Already done that, but just using max_concurrent. Even so, that doesn't limit the number of threads per Task, just the number of Tasks.



The other thing I'm seeing is that the Ralph tasks are taking about 10 GB of RAM for EACH task, so I had to limit my running tasks accordingly. I'm running 1 CPU core per task and they are taking over 2 days to finish. I'm still getting some errors but have not yet ruled out a PC problem.
The most i've seen for a CPU processed Task in use is a bit over 1.5GB.
For GPU processed Tasks, they're using up to 2.5GB of system RAM & 6GB of VRAM.
However the peak Swap file size i've seen is as high as 15GB.

Most of your errors appear to be file transfer errors, nothing to do with RAM usage.


Thank you, I will un-suspend the Project then.
ID: 7659
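For reference, the two task limits mentioned in the post above can be written into an app_config.xml file together. This is a minimal sketch that builds the file contents with Python's standard library. The element names `max_concurrent` and `project_max_concurrent` are the ones named in the thread; the application short name is a placeholder, since the real one is not given here (it is usually listed in the client's client_state.xml).

```python
import xml.etree.ElementTree as ET

# PLACEHOLDER: the real application short name is not stated in this thread.
HYPOTHETICAL_APP_NAME = "hypothetical_app_name"


def build_app_config(app_name, max_concurrent, project_max_concurrent=None):
    """Return app_config.xml text limiting how many tasks run at once."""
    root = ET.Element("app_config")

    # Per-application limit on simultaneously running tasks.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # Optional project-wide limit across all of the project's applications.
    if project_max_concurrent is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)

    ET.indent(root)  # pretty-print; available from Python 3.9
    return ET.tostring(root, encoding="unicode")


config_xml = build_app_config(HYPOTHETICAL_APP_NAME, 1, project_max_concurrent=1)
print(config_xml)
```

As the post above points out, both settings cap the number of Tasks only; neither one limits how many threads a single Task starts.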
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7660 - Posted: 15 Jun 2024, 15:54:57 UTC

And I was trying to run it on a 4 GB GPU.
ID: 7660
mikey
Joined: 28 Nov 20
Posts: 8
Credit: 114,593
RAC: 280
Message 7661 - Posted: 15 Jun 2024, 16:13:30 UTC - in response to Message 7623.  
Last modified: 15 Jun 2024, 16:17:14 UTC

mikey said:

The other thing I'm seeing is that the Ralph tasks are taking about 10 GB of RAM for EACH task, so I had to limit my running tasks accordingly. I'm running 1 CPU core per task and they are taking over 2 days to finish. I'm still getting some errors but have not yet ruled out a PC problem.


Grant (SSSF) said:

The most i've seen for a CPU processed Task in use is a bit over 1.5GB.
For GPU processed Tasks, they're using up to 2.5GB of system RAM & 6GB of VRAM.
However the peak Swap file size i've seen is as high as 15GB.


mikey said:

Application: Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.02
Name: RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_g_pred_171_16903_6
State: Running
Received: 6/14/2024 3:05:08 AM
Report deadline: 6/15/2024 3:05:09 AM
Estimated computation size: 80,000 GFLOPs
CPU time: 00:03:30
CPU time since checkpoint: 00:03:30
Elapsed time: 1d 07:17:48
Estimated time remaining: ---
Fraction done: 100.000%
Virtual memory size: 12.77 GB
Working set size: 4.75 GB
Directory: slots/16
Process ID: 8508
Progress rate: 3.240% per hour
Executable: w_0.02_windows_x86_64.exe
ID: 7661
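The task properties above can be cross-checked with a little arithmetic. This is a minimal sketch using only the figures from that post; it assumes the dates are in month/day/year order and that the progress rate is an average over the whole elapsed time.

```python
from datetime import datetime


def to_seconds(days, hours, minutes, seconds):
    """Convert a d hh:mm:ss duration into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Figures copied from the task properties in the post above.
elapsed_s = to_seconds(1, 7, 17, 48)   # Elapsed time: 1d 07:17:48
cpu_s = to_seconds(0, 0, 3, 30)        # CPU time: 00:03:30
progress_rate_pct_per_hour = 3.240     # Progress rate

# Progress implied by rate x elapsed time (should land near "Fraction done").
implied_progress_pct = progress_rate_pct_per_hour * elapsed_s / 3600

# Share of the elapsed time that was reported as CPU time.
cpu_share_pct = 100 * cpu_s / elapsed_s

# Time allowed between receipt and deadline (dates assumed to be M/D/YYYY).
received = datetime(2024, 6, 14, 3, 5, 8)
deadline = datetime(2024, 6, 15, 3, 5, 9)
deadline_window_s = (deadline - received).total_seconds()
overrun_h = (elapsed_s - deadline_window_s) / 3600

print(f"implied progress: {implied_progress_pct:.1f}%")
print(f"reported CPU share of elapsed time: {cpu_share_pct:.2f}%")
print(f"elapsed time beyond the deadline window: {overrun_h:.1f} h")
```

On these numbers, the elapsed time alone is longer than the whole window between receipt and deadline, and the reported CPU time is a tiny fraction of the elapsed time. One possible reading is that the reported CPU time does not cover all the work the task does, but the thread does not confirm that.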
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 890
Credit: 1,889,390
RAC: 1
Message 7662 - Posted: 15 Jun 2024, 17:06:58 UTC

I completed my first "CPU-only" WU: 5465161
ID: 7662
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7663 - Posted: 15 Jun 2024, 18:50:18 UTC - in response to Message 7656.  
Last modified: 15 Jun 2024, 19:01:37 UTC

Why is there a stupid discussion over a combined CPU and GPU app? No such thing exists.

If you look in the properties for Ralph in your client you will see it shows:

Project has no apps for NVIDIA GPU

There is also no GPU app listed in Applications on the site and the app just runs on the CPU and cannot run on GPU.
That is why my earlier long & pointless discussion took place.

There is now an application for both CPU & GPU- we are testing it here, now, but BOINC is completely unaware of that. It asks for CPU work, it gets a Task. If it can't run on the GPU, it runs on the CPU. If it can run on the GPU, then it does.
Hence my attempt to point out the ramifications of this behaviour earlier.

And unfortunately, unless there is an error with the Task, a completed Task doesn't give any indication in the Stderr output of what it was processed on (let alone what work or how much was done).
Grant
Darwin NT
ID: 7663
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7664 - Posted: 15 Jun 2024, 18:53:43 UTC - in response to Message 7658.  

RoseTTAFold doesn't suspend when told to.
When I suspended it, it continued running.


Give it time; mine does the same thing but does suspend eventually.
I had Tasks suspended for over an hour, they were still running.
Yes, the BOINC Manager shows them as suspended, but in Task Manager they are still running and using CPU time, along with the other RoseTTAFold Tasks that show as running.
Grant
Darwin NT
ID: 7664
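The check described above (a Task shown as suspended while its CPU time keeps climbing) can be automated. This is a minimal sketch, assuming you have some way to read a process's accumulated CPU seconds; `read_cpu_seconds` is a hypothetical callable you supply yourself (for example one built on the third-party psutil package), and the demo uses canned numbers instead of a real process.

```python
import time


def cpu_time_still_advancing(read_cpu_seconds, interval_s=5.0,
                             tolerance_s=0.05, sleep=time.sleep):
    """Return True if CPU time grows by more than tolerance_s over interval_s.

    read_cpu_seconds is a caller-supplied function (hypothetical here) that
    returns the accumulated CPU seconds of the process being watched.
    """
    before = read_cpu_seconds()
    sleep(interval_s)
    after = read_cpu_seconds()
    return (after - before) > tolerance_s


# Demo with canned readings (made-up values), so no real process is needed.
busy_readings = iter([100.0, 104.2])   # CPU time keeps growing
idle_readings = iter([100.0, 100.0])   # CPU time is frozen

still_running = cpu_time_still_advancing(lambda: next(busy_readings), interval_s=0)
truly_suspended = not cpu_time_still_advancing(lambda: next(idle_readings), interval_s=0)
print(still_running, truly_suspended)
```

A Task that BOINC Manager lists as suspended but for which this returns True matches the behaviour reported in this thread.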
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7665 - Posted: 15 Jun 2024, 18:58:40 UTC - in response to Message 7653.  

It is like GPU grid Python apps for GPU hosts
So why am I running some WUs exclusively on the CPU?
You don't have an Nvidia GPU with the right driver.
If you did, they would run on the GPU.
Grant
Darwin NT
ID: 7665
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7666 - Posted: 15 Jun 2024, 19:06:11 UTC - in response to Message 7661.  
Last modified: 15 Jun 2024, 19:13:00 UTC

mikey said:
...
OK, and...?
(You do realise the swap file & Virtual Memory are the same? (Virtual Memory makes use of the swap file) It is disk space and not physical RAM that is in use?)
Grant
Darwin NT
ID: 7666
rilian
Joined: 7 Sep 07
Posts: 20
Credit: 77,491
RAC: 58
Message 7667 - Posted: 15 Jun 2024, 20:36:42 UTC

I've got

 <core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:Program' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>


but a few WUs validated fine
--
I crunch for Ukraine

ID: 7667
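One possible reading of the stderr text above, not confirmed anywhere in this thread, is that a program path containing a space was passed to the Windows command interpreter without quotes, so only the part before the space was treated as the command. The sketch below is a deliberately simplified model of that behaviour, not cmd.exe's real parsing rules, and the example path is hypothetical.

```python
def first_command_token(command_line):
    """Simplified model of how a shell picks the program name.

    A leading double-quoted string is taken whole; otherwise everything up to
    the first space is used. Real shells have more rules than this.
    """
    stripped = command_line.lstrip()
    if stripped.startswith('"'):
        end = stripped.find('"', 1)
        return stripped[1:end] if end != -1 else stripped[1:]
    return stripped.split(" ", 1)[0]


# HYPOTHETICAL path, chosen only because it contains a space.
EXAMPLE_PATH = r"C:\Program Files\ExampleApp\python.exe"

unquoted = f"{EXAMPLE_PATH} run.py"
quoted = f'"{EXAMPLE_PATH}" run.py'

unquoted_command = first_command_token(unquoted)
quoted_command = first_command_token(quoted)

print(unquoted_command)  # the path is cut at the first space
print(quoted_command)    # the full path survives
```

If that reading is right, the fix would sit in how the application launches its helper program rather than on the volunteer's side, but the thread itself attributes the failure to Win7 support.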
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7668 - Posted: 15 Jun 2024, 20:55:54 UTC - in response to Message 7667.  
Last modified: 15 Jun 2024, 21:03:13 UTC

I've got

 <core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
The access code is invalid.
 (0xc) - exit code 12 (0xc)</message>
<stderr_txt>
'C:Program' is not recognized as an internal or external command,
operable program or batch file.

</stderr_txt>
]]>
Haven't seen that error message before.


but a few WUs validated fine
On a different system.

When we get some more work, if you still get the same error on your Win7 system, i'd suggest resetting the project.
If the error persists after re-downloading all the files, it could be that the application isn't supported by Win7.


Edit- actually a quick search on "Python for Windows" shows that none of the versions released in the last 12-18 months can be used on Win7 or earlier.
Edit- the version of Python being used here is 3.9.19 from 19/3/2024, and cannot be used on Win7 or earlier.

I suggest you set no new Tasks for Ralph.




So two more things on the Developer's to do list:
- block any attempt to get work from Win7 or older Operating Systems.
- advise us of the minimum video driver version required for GPU processing, and stop it from attempting to run on systems with unsupported drivers.
Grant
Darwin NT
ID: 7668
kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7669 - Posted: 16 Jun 2024, 0:06:16 UTC

Looks like they stopped workunit generation
ID: 7669
Grant (SSSF)
Joined: 13 Jun 24
Posts: 83
Credit: 67,898
RAC: 2,569
Message 7670 - Posted: 16 Jun 2024, 1:56:11 UTC - in response to Message 7669.  

Looks like they stopped workunit generation
Yes, almost 2 days ago.
Hopefully the next batch will have at least a few of the issues of the last batch sorted out.
Grant
Darwin NT
ID: 7670
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 890
Credit: 1,889,390
RAC: 1
Message 7671 - Posted: 16 Jun 2024, 5:40:29 UTC - in response to Message 7670.  

Hopefully the next batch will have at least a few of the issues of the last batch sorted out.


Waiting for version 0.03... and for clarification about the app (OS, GPU, etc.)

P.S.
If you look at my profile, I've been here since... well, I don't remember, and the Ralph admins are rarely clear about the project/app.
ID: 7671



©2024 University of Washington
http://www.bakerlab.org