RoseTTAFold All-Atom

Message boards : RALPH@home bug list : RoseTTAFold All-Atom

kotenok2000
Joined: 26 Feb 21
Posts: 22
Credit: 1,893
RAC: 0
Message 7571 - Posted: 14 Jun 2024, 0:47:37 UTC - in response to Message 7570.  

I am talking about the Intel GPU.
ID: 7571

Vester
Joined: 29 Apr 20
Posts: 17
Credit: 1,176
RAC: 0
Message 7572 - Posted: 14 Jun 2024, 1:34:40 UTC - in response to Message 7571.  

> I am talking about intel GPU.

The Intel graphics are not running a task. Utilization of the GPU is about 3%.
ID: 7572

zombie67 [MM]
Joined: 8 Aug 06
Posts: 75
Credit: 2,396,363
RAC: 6,299
Message 7573 - Posted: 14 Jun 2024, 3:02:05 UTC
Last modified: 14 Jun 2024, 3:03:17 UTC

I had a full set of 16 tasks running just fine, apparently. But once they passed 25 hours, with 24 hours being the maximum, I knew they were broken and aborted them just now.
ID: 7573

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7574 - Posted: 14 Jun 2024, 3:49:00 UTC - in response to Message 7559.  

> It uses GPU, but boinc manager doesn't reflect that.
This is a 100% CPU application.
No GPU work is being done at all.
ID: 7574

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7575 - Posted: 14 Jun 2024, 3:53:33 UTC - in response to Message 7573.  

> I had a full set of 16 tasks running just fine, apparently. But once they passed 25 hours, with 24 hours being the maximum, I knew they were broken and aborted them just now.
Mine have been going for 18 hours.
I will give them until 24hrs, then abort.
ID: 7575

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7576 - Posted: 14 Jun 2024, 4:30:38 UTC - in response to Message 7567.  
Last modified: 14 Jun 2024, 4:34:49 UTC

> Was anyone able to return task successfully?
Yes, a few people have.
In most cases, the Runtime is less than an hour.
(And they certainly make use of system resources, e.g.
Peak working set size  6,323.86 MB
Peak swap size        10,856.63 MB
)



The CPU time counter is definitely broken, and I suspect checkpointing is done based on CPU time (not run time); that's probably why the checkpointing isn't working.
They need to fix the CPU time counter, as well as put a watchdog timer on the tasks as per the Rosetta 4.20 Tasks, but I suspect the watchdog timer also uses CPU time, not run time.
So they really need to fix the CPU time issue.

Along with the no-end-in-sight processing time of these Tasks.
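
To make the CPU time versus run time distinction concrete, here is a minimal Python sketch. It is not the project's code: `WallClockWatchdog` and its limit are hypothetical names used only for illustration. It shows a watchdog keyed to elapsed (wall-clock) time, which would still trip even if the CPU time counter never moved off 0.

```python
import time


class WallClockWatchdog:
    """Hypothetical watchdog that measures elapsed (run) time, not CPU time."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._limit = limit_seconds
        self._start = clock()

    def elapsed(self):
        """Seconds of wall-clock time since the watchdog was created."""
        return self._clock() - self._start

    def expired(self):
        """True once the wall-clock limit has been reached."""
        return self.elapsed() >= self._limit


if __name__ == "__main__":
    watchdog = WallClockWatchdog(limit_seconds=0.2)
    cpu_start = time.process_time()
    time.sleep(0.3)  # sleeping uses wall-clock time but almost no CPU time
    cpu_used = time.process_time() - cpu_start
    print(f"run time: {watchdog.elapsed():.2f}s, CPU time: {cpu_used:.2f}s")
    print("watchdog expired:", watchdog.expired())
```

A timer built on `time.process_time()` alone would never fire in the situation described here, which is consistent with the suspicion above that a CPU-time-based watchdog cannot rescue these tasks.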
ID: 7576

Vester
Joined: 29 Apr 20
Posts: 17
Credit: 1,176
RAC: 0
Message 7577 - Posted: 14 Jun 2024, 4:56:30 UTC

My twelve tasks ran for more than 1 day before failing with a message stating that I did not have enough paging file space, although I have a fixed minimum and maximum of 84470 MB.

I have limited the number of cores in BIOS to five, and I am running 5 tasks. Also, I had updated my Ralph preferences to 4 hours.

Note: One can also limit the number of cores in Windows 11 by setting "number of processors" in Advanced Boot options (run msconfig).
ID: 7577

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7578 - Posted: 14 Jun 2024, 5:20:52 UTC
Last modified: 14 Jun 2024, 5:21:26 UTC

It seems to be a multi-core app (or rather, Windows/BOINC treats it like a multi-core app).
I killed all my WUs except 5. Task Manager says the CPU is still at 100% and all cores are running. But my CPU has 16 cores!!

The remaining WUs are now at 99.950% after 23 hours, and I think they will not finish in time.
ID: 7578

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7579 - Posted: 14 Jun 2024, 5:33:17 UTC - in response to Message 7578.  

> It seems a multi-core app (or, better, windows/boinc consider it like a multi-core app).
> I killed my wus except 5. Task manager said that cpu is still 100% and all cores are running. But my cpu is 16-cores!!
If it's a multiprocessing application, that would explain the odd CPU utilisation showing up in Task Manager, in which case they need to give us the option to set the number of cores available to the application for each Task running, i.e. 1, 2, 4 etc.
That way people won't end up with overcommitted systems (i.e. Runtime being multiple times longer than CPU time, and missing deadlines because it's taking 20 hours to do 2 hours of work; the two should be pretty much equal on a dedicated cruncher).
ID: 7579

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7580 - Posted: 14 Jun 2024, 6:00:44 UTC
Last modified: 14 Jun 2024, 6:22:13 UTC

These TTAFold tasks are very, very broken.
I limited them to only 4 running, but due to the broken nature of suspend etc, the suspended ones still kept running in the background.
And when the Rosetta Beta tasks started up, while their elapsed time ticked away, they received absolutely 0 CPU time.

So I exited BOINC & threw away all 20 hours of processing time on a dozen TTAFold Tasks.
When it restarted, those 4 TTAFold Tasks were using 100% of the CPU, still with none at all available for the Rosetta Beta Tasks.

So I then limited the TTAFold Tasks to only 1 running Task.
That single Task is using 8 threads.


There needs to be a way to limit the number of threads a single Task can use.
And at the moment, the indications are that 1 Task using 8 threads performs no better than when 12 Tasks were trying to use 8 threads each, when there were only 12 threads available.
I'll keep an eye on things to see if they don't slow down later, but the initial signs are that the extra threads are providing not even the slightest improvement in processing time; they're just being wasted.
ID: 7580

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7581 - Posted: 14 Jun 2024, 6:03:03 UTC - in response to Message 7577.  

> Note: One can also limit the number of cores in Windows 11 by setting "number of processors" in Advanced Boot options (run msconfig).
I've opted to use max_concurrent to limit the number of cores/threads available to the TTAFold Tasks, leaving the others available for other processes.
As I have found, they are pigs. 1 Task = 8 threads.
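
For readers unfamiliar with it, `max_concurrent` is set in an `app_config.xml` file placed in the project's folder inside the BOINC data directory. Below is a minimal Python sketch that generates such a file. The application name `rosettafold_app` is a placeholder and an assumption, not the real short name; the actual name has to be looked up on your own client (for example in `client_state.xml`).

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "rosettafold_app" is a placeholder, not the real application name.
    print(build_app_config("rosettafold_app", 1))
```

Note that `max_concurrent` caps the number of Tasks running at once; it does not limit how many threads each Task spawns, which is why one Task can still occupy 8 threads.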
ID: 7581

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7582 - Posted: 14 Jun 2024, 6:35:51 UTC - in response to Message 7580.  

> So i then limited the TTAFold Tasks to only 1 running Task.
> That single Task is using 8 threads.
>
> There needs to be a way to limit the number of threads a single Task can use.

The app is probably NOT multi-threaded.
As I wrote, it seems to be a "misunderstanding" between the app and BOINC/Windows.
ID: 7582

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 913
Credit: 1,892,541
RAC: 294
Message 7583 - Posted: 14 Jun 2024, 6:36:37 UTC - in response to Message 7581.  

> As i have found, they are pigs. 1 Task = 8 threads.


Pigs??
ID: 7583

zioriga
Joined: 16 Feb 06
Posts: 8
Credit: 323,279
RAC: 1,175
Message 7584 - Posted: 14 Jun 2024, 6:48:58 UTC

I'm running only 1 WU and I have the GPU (NVIDIA 3050) running at 99-100% (per GPU-Z, with no other WUs using the GPU).
In the WU Properties:
CPU time 0
Elapsed time 21:20:15
Fraction done 99.334%

In other words: is the WU running only on the CPU or only on the GPU?
ID: 7584

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7585 - Posted: 14 Jun 2024, 7:16:18 UTC - in response to Message 7584.  

> In other words: is the WU running only on CPU or only on GPU ????
CPU only.
ID: 7585

Mr P Hucker
Joined: 3 Mar 23
Posts: 31
Credit: 9,510
RAC: 3
Message 7586 - Posted: 14 Jun 2024, 7:17:43 UTC

They use 10 threads each, but BOINC isn't told this, so it runs far too many. I only noticed this on my main machine because everything seemed so sluggish. It was trying to run 7 of these 10-thread tasks on a 24-thread CPU. Hence the GPU running Asteroids slowed right down, and the interface was terribly sluggish.
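
The arithmetic behind that sluggishness can be sketched in a few lines of Python. The function names are hypothetical and the figures are the ones quoted in this post (7 tasks, 10 threads each, 24 logical CPUs); it assumes every thread wants a full core, which is a simplification.

```python
def oversubscription(tasks, threads_per_task, logical_cpus):
    """Return (threads demanded, ratio of demand to available logical CPUs)."""
    demanded = tasks * threads_per_task
    return demanded, demanded / logical_cpus


def max_tasks_without_oversubscription(threads_per_task, logical_cpus):
    """Largest number of tasks whose threads all fit on the logical CPUs."""
    return logical_cpus // threads_per_task


if __name__ == "__main__":
    demanded, ratio = oversubscription(tasks=7, threads_per_task=10, logical_cpus=24)
    print(f"{demanded} threads on 24 logical CPUs: {ratio:.2f}x oversubscribed")
    print("tasks that fit:", max_tasks_without_oversubscription(10, 24))
```

Under those assumptions only 2 such tasks fit on a 24-thread CPU without contention, whereas a client that counts each task as 1 CPU would happily start many more.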
ID: 7586

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7587 - Posted: 14 Jun 2024, 7:18:01 UTC - in response to Message 7583.  

>> As i have found, they are pigs. 1 Task = 8 threads.
> Pigs??
They'll take everything you give them, even if they don't need it.

Resource Hog definition: A process which consumes a large amount of system resources compared to its importance or function.
ID: 7587

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7588 - Posted: 14 Jun 2024, 7:34:13 UTC - in response to Message 7586.  

> They use 10 threads each
On my system I have limited them to only one TTAFold Task at a time.
It's using a maximum of 62% of my CPU time, which works out at just under 8 threads. So it's effectively using 8.
Yet the processing rate is the same as when 12 of them were fighting over 12 threads in total, so they really only need 1.
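
As a quick check of the "just under 8 threads" figure, here is a small Python sketch. It assumes the 12 available threads mentioned earlier in the thread for this system; `busy_threads` is a hypothetical helper name.

```python
import math


def busy_threads(cpu_utilisation_percent, logical_cpus):
    """Convert whole-system CPU utilisation into an equivalent number of busy threads."""
    return cpu_utilisation_percent / 100 * logical_cpus


if __name__ == "__main__":
    busy = busy_threads(62, 12)
    print(f"{busy:.2f} threads busy, so at least {math.ceil(busy)} threads in use")
```

Because utilisation is an average across all logical CPUs, this gives the minimum number of threads that must be active; a task with 10 threads that are each partly idle could show a similar figure.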





A word to the developers: limit these TTAFold tasks to 1 thread per Task until such time as using more threads results in improved processing rates.
Even then, the default should still remain 1, with the option in the Ralph@home preferences (and eventually the Rosetta@home preferences) for people to select a higher value for the TTAFold application if they choose to.
ID: 7588

Mr P Hucker
Joined: 3 Mar 23
Posts: 31
Credit: 9,510
RAC: 3
Message 7589 - Posted: 14 Jun 2024, 7:56:25 UTC - in response to Message 7588.  

>> They use 10 threads each
> On my system i have limited them to only one TTAFold Task at a time.
> It's using a maximum of 62% of my CPU time which works out at just under 8 threads. So it's effectively using 8.
> Yet the processing rate is the same as when 12 of them were fighting over 12 threads in total, so they really only need 1.
How do you know the rate is the same? We don't know how the task is counting progress. It could be timed like the standard Rosetta 4.2 tasks. Those take 8 hours on any speed of CPU.
ID: 7589

Grant (SSSF)
Joined: 13 Jun 24
Posts: 126
Credit: 193,939
RAC: 2,635
Message 7590 - Posted: 14 Jun 2024, 8:04:56 UTC - in response to Message 7589.  
Last modified: 14 Jun 2024, 8:14:28 UTC

> How do you know the rate is the same?
Because, as I said in my previous post, I had already processed 12 of those Tasks. The progress rate for the currently running single Task is the same as it was for those 12 other Tasks: it starts off fast, and continues to drop as the Task just keeps on going, well after the initial 4 hour estimate.

True, if a Task were ever to complete, then hopefully we could see whether it actually did do any more work in that time. But at present I've got a single Task that is on the very same course as the previous ones, with the same processing rate and the same slowing rate of fraction done, and it's heading for missing the deadline because it's going to take over 24 hours to process (if it ever does manage to process it).

Given that others have run into memory issues after letting it run for 24hrs+, and I've now got 8 threads for the one Task, I'd have expected to start running into similar issues 8 times sooner, but that's yet to happen.

So I think it's a pretty reasonable assumption that it's not doing 8 times as much work as it was, which is what it would have to do to make using that many threads worthwhile.
ID: 7590



©2024 University of Washington
http://www.bakerlab.org