Message boards : RALPH@home bug list : RoseTTAFold All-Atom
Author | Message |
---|---|
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Actually I have a different experience. I have a Ryzen 9 3900XT and a Ryzen 9 3900X (pretty much the same CPU). On one of them I told BOINC to allocate 9.6 threads (as it appeared to use about 9.6 on average when only one task was running). On the other I didn't get round to it, and didn't care so much, as it only runs BOINC and there's no GPU to slow down. The 9.6-thread one takes 2 hours to complete a task; the 1-thread one takes 23 hours. So from what I'm seeing, it does use the threads it's given. Quote: "They use 10 threads each. On my system I have limited them to only one TTAFold Task at a time." Anyway, we can't trust the progress, as we don't know it's reliable; it could just be a timer. |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "The 9.6 thread one takes 2 hours to complete a task, the 1 thread one takes 23 hours. So from what I'm seeing it does use the threads it's given." All of your Tasks have errored out. Some of them may have finished but then failed when it came to returning the result; if that is the case, it's yet another thing they need to fix. And the times between the 2 systems are all over the place. The one with the 1 day 4 hour Tasks also has 5 min, 30 min and 7 hour Tasks. The one that is consistently around 5 hrs 30 min also has some 5 min Tasks. If the extra threads are doing more work, then it's not 8 times better, maybe 3 times. That's a waste of resources, if that is the case. They need to fix all the other errors, then either optimise the multiprocessing or do away with it. Quote: "Anyway we can't trust the progress as we don't know it's reliable, it could just be a timer." The Runtime is the elapsed time, which is a timer. There should also be a CPU time, but that's broken. The progress done does indicate how far along it is; hence it's slowing down as the Task's run time drags on. |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
The progress done simply indicates when it wants to finish. Rosetta 4.2 will make this about 8 hours, no matter how much actual work was done. So when you run 8 threads, you might actually be doing 8 times the work in the same task. It depends whether they want the best throughput or fast returns. Using 8 threads to do 3 threads' worth of work is good if each task is returned 3 times earlier. Folding@Home, for example, loves quick returns so they can calculate the next task from the results. Faster at the expense of lower efficiency is therefore OK. Anyway, there are so many bugs that we can't draw much in the way of conclusions. I'm just telling my computers to allocate the number of threads the tasks want to use, so everything runs smoothly. It would be good if an admin replied. Are they even reading our posts and using the information? |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
It seems that some computers can run the new app correctly, for example: AuthenticAMD AMD Ryzen 9 5950X 16-Core Processor. So, do we need a little monster PC to crunch? |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Quote: "Seems that some computers can run correctly the new app, for example:" Is this app only supposed to be given to PCs with an Nvidia GPU? |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
NVIDIA GeForce RTX 3090 (24575MB), driver: 546.01. Perhaps (and not a little Nvidia). |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Quote: "(and not a little Nvidia)" For example, this guy has several PCs. The WUs run correctly on a 3060 but not on a 4060 (e.g. WU 5460613). This is strange. |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "Is this app only supposed to be given to PCs with a Nvidia?" Not according to the home page: "We are initially targeting only Windows machines with or without Nvidia GPUs for this test." That's really odd wording. It's the same as saying "We are initially targeting only Windows machines with or without AMD GPUs for this test.", or "We are initially targeting only Windows machines with or without Intel GPUs for this test.", or "We are initially targeting only Windows machines with or without Apple GPUs for this test." I.e., what they are saying is "We are initially targeting only Windows machines for this test." |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "(and not a little Nvidia)" Different video drivers. |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
Upload failure. Task 5456545 Stderr output <core_client_version>8.0.2</core_client_version> <![CDATA[ <stderr_txt> 00:01:02 (16016): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_88_16902_2_0_r1516067865_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "How do you know the rate is the same? We don't know how the task is counting progress. It could be timed like the standard Rosetta 4.2 tasks. Those take 8 hours on any speed of CPU." Looks like those extra threads do have an impact: a Task finished in just under 4 hours (previously 20+ hrs and no sign of ending). But then it failed, like yours, with an upload issue. It also shows signs of trying to use my video card but not being able to find CUDA support (it does have CUDA support, or did have). RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_340_16902_6_0 <core_client_version>8.0.2</core_client_version> <![CDATA[ <stderr_txt> C:\ProgramData\BOINC\projects\ralph.bakerlab.org\ev0\lib\site-packages\torch\cuda\__init__.py:52: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:115.) return torch._C._cuda_getDeviceCount() > 0 19:19:49 (9548): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_340_16902_6_0_r1143327185_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Quote: "Different video drivers." Uh, are we really in such a state that a different driver can make a WU fail? This would be a nightmare! |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "Upload failure." Report deadline 14 Jun 2024, 1:08:41 UTC. I'm wondering if all these file transfer issues are due to the Ralph server not cancelling Tasks on the computer once they've been cancelled on the server. You complete the Task, but it can't be returned because it's been cancelled on the server for missing the deadline? |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "Different video drivers." Quote: "Uh, are we in such shape that a different driver can cause the failure of a wu?" If there is a significant difference in the CUDA version between driver versions, and the TTAFold application makes use of calls that are only available in the most recent version, it's fail time. |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "Report deadline 14 Jun 2024, 1:08:41 UTC I'm wondering if all these file transfer issues are due to the Ralph server not cancelling Tasks on the computer once they've been cancelled on the server?" My failed upload Task details: Report deadline 14 Jun 2024, 8:49:48 UTC; Received 14 Jun 2024, 9:51:17 UTC. P Hucker's: Report deadline 14 Jun 2024, 8:25:06 UTC; Received 14 Jun 2024, 9:46:38 UTC. Anyone noticing a pattern here with the failed file transfers? And every single Ralph Task on my system is past the deadline, and even the ones not yet started aren't being removed by the server. If I can't return the result, there's not much point keeping them... Grant Darwin NT |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Quote: "Is this app only supposed to be given to PCs with a Nvidia?" Quote: "Not according to the home page" Seems plain to me. You must have Windows. An Nvidia GPU will be used if you have one. Other GPUs will not be used. |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Quote: "Different video drivers." This is often the case with many projects. PrimeGrid has a list of which versions to avoid; Nvidia are well known for buggy drivers. |
Mr P Hucker Joined: 3 Mar 23 Posts: 31 Credit: 9,510 RAC: 3 |
Quote: "Report deadline 14 Jun 2024, 1:08:41 UTC I'm wondering if all these file transfer issues are due to the Ralph server not cancelling Tasks on the computer once they've been cancelled on the server?" The normal way to do things is to let you report (and get credit for) a late task. But at the deadline it's resent, in case your computer is never going to do it (e.g. it broke). If yours is returned before the other guy starts it, his PC is told not to bother. If he's started it, he finishes it too, and it's used for comparison (and so as not to upset silly people who think they've wasted their time, when actually they're wasting more time by completing a task you've already done). |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Quote: "It also shows signs of trying to use my video card, but not being able to find CUDA support (it does have CUDA support, or did have)." GPU-Z says I have CUDA. Grant Darwin NT |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
Quote: "Note: One can also limit the number of cores in Windows 11 by setting 'number of processors' in Advanced Boot options (run msconfig)." Quote: "I've opted to use max_concurrent to limit the number of cores/threads available to the TTAFold Tasks, leaving the others available for other processes." I am an idiot! All we have to do is set the percentage of CPUs in our account preferences. I set my 10-core hyperthreaded CPU (20 threads) to 10% and run 2 tasks. |
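Mr P Hucker's "told BOINC to allocate 9.6 threads" and the "only one TTAFold Task at a time" limit are usually set with an app_config.xml file. That file lives in the project's folder, presumably C:\ProgramData\BOINC\projects\ralph.bakerlab.org\ here. Below is a minimal sketch that writes one using the standard BOINC `<max_concurrent>` and `<avg_ncpus>` elements. The app name `rosettafold_aa` is hypothetical: substitute the short app name the project really uses, which is listed in client_state.xml.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, max_concurrent: int, avg_ncpus: float) -> str:
    """Return app_config.xml text for one BOINC app.

    app_name is a hypothetical placeholder; use the project's real short name.
    """
    root = ET.Element("app_config")

    # <app>: cap how many tasks of this app may run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version>: tell the client how many CPUs each task really uses,
    # so it reserves them instead of assuming a single thread.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)

    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    print(build_app_config("rosettafold_aa", max_concurrent=1, avg_ncpus=9.6))
```

Save the output as app_config.xml in the project folder. Then use Options > Read config files in BOINC Manager, or restart the client.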
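The thread-scaling argument can be put in numbers. The inputs are Mr P Hucker's 2 hours on 9.6 threads versus 23 hours on 1 thread, and Grant's "not 8 times better, maybe 3 times". The sketch below computes speedup and parallel efficiency, using only those figures as inputs. An efficiency above 1.0, which the first pair gives, usually means the two runs did not do comparable work, or the "1-thread" host was not really limited to one thread. That fits the caveat that the task times are all over the place.

```python
def speedup(t_serial_h: float, t_parallel_h: float) -> float:
    """How many times faster the multi-threaded run finished."""
    return t_serial_h / t_parallel_h

def efficiency(t_serial_h: float, t_parallel_h: float, threads: float) -> float:
    """Speedup per thread: 1.0 is perfect scaling; below 1.0 means
    some of the extra threads are doing little useful work."""
    return speedup(t_serial_h, t_parallel_h) / threads

# Single-task figures from the thread, not averages.
print(round(speedup(23, 2), 2))          # 11.5
print(round(efficiency(23, 2, 9.6), 2))  # 1.2
# Grant's case: 8 threads giving roughly a 3x speedup.
print(round(efficiency(3, 1, 8), 3))     # 0.375
```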
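Mr P Hucker suggests that the progress figure may only track a runtime target. A toy model illustrates the idea. It assumes the 8-hour target he describes for Rosetta 4.2, with progress computed as elapsed time divided by the target. Under that assumption, progress reads the same at any thread count, while the work finished in that time scales with threads. The per-thread model rate is a made-up parameter, and perfect scaling is assumed. This is not how the RoseTTAFold app is confirmed to report progress.

```python
TARGET_RUNTIME_H = 8.0  # assumed runtime target, as described for Rosetta 4.2

def timer_progress(elapsed_h: float) -> float:
    """Progress if the task simply counts towards a fixed runtime target."""
    return min(elapsed_h / TARGET_RUNTIME_H, 1.0)

def work_done(elapsed_h: float, models_per_thread_hour: float, threads: int) -> float:
    """Models produced, optimistically assuming perfect scaling across threads."""
    return elapsed_h * models_per_thread_hour * threads

for threads in (1, 8):
    print(threads, timer_progress(4.0), work_done(8.0, 2.0, threads))
# 1 0.5 16.0
# 8 0.5 128.0
```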
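Grant's explanation of driver-dependent failures boils down to a version comparison. A simplified sketch follows. The rule is that the highest CUDA version the driver supports must be at least the CUDA version the application was built against. The driver's version is the "CUDA Version" that nvidia-smi prints in its header. The version strings below are hypothetical. The rule also ignores CUDA's minor-version compatibility, which can let some builds targeting a newer minor version run on older drivers of the same major version.

```python
def parse_version(text: str) -> tuple:
    """'12.3' -> (12, 3), so versions compare numerically, not as text."""
    return tuple(int(part) for part in text.split("."))

def driver_supports(driver_max_cuda: str, app_built_cuda: str) -> bool:
    """Simplified rule: the driver must support at least the app's CUDA version."""
    return parse_version(driver_max_cuda) >= parse_version(app_built_cuda)

print(driver_supports("12.3", "11.8"))   # True
print(driver_supports("11.4", "12.1"))   # False
print(driver_supports("12.10", "12.9"))  # True: 10 > 9 numerically
```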
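Grant's "pattern" can be checked directly. Subtract the report deadline from the received time for the two failed uploads he lists. The small sketch below does this with Python's datetime. The format string is an assumption, matched to the way the task pages print dates; both times are UTC.

```python
from datetime import datetime

FMT = "%d %b %Y, %H:%M:%S"  # matches e.g. "14 Jun 2024, 8:49:48"

def lateness(deadline: str, received: str):
    """Time between the report deadline and when the server received the result."""
    return datetime.strptime(received, FMT) - datetime.strptime(deadline, FMT)

tasks = {
    "Grant": ("14 Jun 2024, 8:49:48", "14 Jun 2024, 9:51:17"),
    "P Hucker": ("14 Jun 2024, 8:25:06", "14 Jun 2024, 9:46:38"),
}
for owner, (deadline, received) in tasks.items():
    print(f"{owner}: {lateness(deadline, received)}")
# Grant: 1:01:29
# P Hucker: 1:21:32
```

Both results reached the server more than an hour after their deadlines. This is consistent with the idea that the failures are tied to deadline handling, though it does not prove it.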
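Mr P Hucker describes how a late result is normally handled. A task that passes its deadline is reissued to a second host. What happens to that copy when the original result finally arrives depends on how far the second host has got. The sketch below models those cases. The state names and messages are hypothetical, and this is a simplified model of the post, not BOINC's actual server code.

```python
from enum import Enum

class ResendState(Enum):
    """Where the reissued copy of a timed-out task is (hypothetical names)."""
    NOT_SENT = "not yet sent to a second host"
    QUEUED = "on the second host, not started"
    RUNNING = "second host has started it"

def handle_late_result(resend: ResendState) -> str:
    """What happens to the reissued copy when the original, late result arrives.

    In this model the late result itself is accepted and credited in every case.
    """
    if resend is ResendState.NOT_SENT:
        return "cancel the reissued copy on the server"
    if resend is ResendState.QUEUED:
        return "tell the second host not to bother"
    # Already running: it is allowed to finish and is used for comparison.
    return "let the second host finish; compare both results"

for state in ResendState:
    print(f"{state.value}: {handle_late_result(state)}")
```

If the Ralph server were skipping the first two branches, hosts would keep Tasks the server no longer wants. This matches Grant's suspicion in the posts, but it is unconfirmed.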
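Vester's percentage setting works out as follows under two assumptions. First, BOINC schedules each TTAFold task as a single CPU, which the earlier discussion of allocating 9.6 threads suggests is the default here. Second, the percentage is applied to the logical CPU count and rounded down, with a minimum of one. Under these assumptions, 10% of 20 logical CPUs leaves 2, hence 2 tasks, each of which may still spawn its own threads. This is a sketch of the arithmetic, not the client's actual code.

```python
import math

def usable_cpus(logical_cpus: int, percent: float) -> int:
    """Logical CPUs the client may use under 'Use at most N% of the CPUs'.

    Assumption: percent of the logical (hyperthreaded) count, rounded down,
    never less than one.
    """
    return max(1, math.floor(logical_cpus * percent / 100))

print(usable_cpus(20, 10))  # 2
print(usable_cpus(20, 4))   # 1 (0.8 rounds down to 0, raised to the minimum)
```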
©2024 University of Washington
http://www.bakerlab.org