RoseTTAFold All-Atom 0.02 (env)

Message boards : RALPH@home bug list : RoseTTAFold All-Atom 0.02 (env)

[VENETO] boboviz
Joined: 9 Apr 08
Posts: 910
Credit: 1,892,541
RAC: 294
Message 7683 - Posted: 18 Jun 2024, 9:48:00 UTC

Today we have this new app, 0.02 (env).
But, as usual, there is no explanation about it.
ID: 7683
Grant (SSSF)
Joined: 13 Jun 24
Posts: 118
Credit: 193,939
RAC: 2,635
Message 7691 - Posted: 18 Jun 2024, 10:06:08 UTC
Last modified: 18 Jun 2024, 10:15:44 UTC

Things are very, very, very broken.
I am able to view my Tasks.

And apparently "Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (env)" was used to process the last batch of work
"Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.01" is doing the current batch (which so far died straight away).

And when I look at my Application details, it thinks the same thing: the new version shows me as having done 31 Tasks.
The old version shows me as having done 2 (the ones that just downloaded and died instantly).

On the Computing, Application page, the old version has gone from the list & the new version is the only one there, showing the Average Computing number of the old application.



Could be the problem is that the old application is trying to process the new Tasks because everything is so scrambled?
Grant
Darwin NT
ID: 7691
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 910
Credit: 1,892,541
RAC: 294
Message 7696 - Posted: 18 Jun 2024, 12:43:45 UTC - in response to Message 7691.  

Could be the problem is that the old application is trying to process the new Tasks because everything is so scrambled?


Maybe.
Now it's "0.02 (nvidia_alpha)" and I'm not downloading WUs (now over 1000 in the queue), probably because I don't have an Nvidia GPU.
ID: 7696
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7699 - Posted: 18 Jun 2024, 14:36:46 UTC - in response to Message 7696.  
Last modified: 18 Jun 2024, 14:49:22 UTC

Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.02 (nvidia_alpha)

I've got a few of these.

Currently crunching the first task. It always stays at 100%, and a few have already failed after half an hour.

I updated BOINC to 8.0.2 to see if this helps the tasks process.

<core_client_version>8.0.0</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 1594.21 (100000000.00G/62727.08G)</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFED3C5AFA2

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 06/18/24 10:16:53
Install Directory : C:\Program Files\BOINC
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\13;C:\ProgramData\BOINC\projects\ralph.bakerlab.org;srv*C:\ProgramData\BOINC\projects\ralph.bakerlab.org\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\ralph.bakerlab.org\symbols*https://boinc.bakerlab.org/rosetta/symstore


ModLoad: 0000000040000000 000000000013f000 C:\ProgramData\BOINC\projects\ralph.bakerlab.org\w_0.02_windows_x86_64.exe (-nosymbols- Symbols Loaded)
    Linked PDB Filename   : C:\Users\User\source\repos\ConsoleApplication1\x64\Release\ConsoleApplication1.pdb

ModLoad: 00000000d5e30000 00000000001f8000 C:\windows\SYSTEM32\ntdll.dll (6.2.19041.4522) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d3e90000 00000000000bd000 C:\windows\System32\KERNEL32.DLL (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d3b70000 00000000002f6000 C:\windows\System32\KERNELBASE.dll (6.2.19041.4522) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d4330000 000000000019f000 C:\windows\System32\USER32.dll (6.2.19041.4474) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d35a0000 0000000000022000 C:\windows\System32\win32u.dll (6.2.19041.4529) (-exported- Symbols Loaded)
    Linked PDB Filename   : win32u.pdb
    File Version          : 10.0.19041.4529 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.4529

ModLoad: 00000000d5800000 000000000002b000 C:\windows\System32\GDI32.dll (6.2.19041.4474) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 10.0.19041.4474 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.4474

ModLoad: 00000000d3660000 0000000000117000 C:\windows\System32\gdi32full.dll (6.2.19041.4474) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32full.pdb
    File Version          : 10.0.19041.4474 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.4474

ModLoad: 00000000d3820000 000000000009d000 C:\windows\System32\msvcp_win.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d3a70000 0000000000100000 C:\windows\System32\ucrtbase.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d4160000 00000000000b0000 C:\windows\System32\ADVAPI32.dll (6.2.19041.4522) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d4290000 000000000009e000 C:\windows\System32\msvcrt.dll (7.0.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.19041.3636

ModLoad: 00000000d4a20000 00000000000a0000 C:\windows\System32\sechost.dll (6.2.19041.4522) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d56d0000 0000000000123000 C:\windows\System32\RPCRT4.dll (6.2.19041.4355) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d37f0000 0000000000027000 C:\windows\System32\bcrypt.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcrypt.pdb
    File Version          : 10.0.19041.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.1

ModLoad: 00000000d5830000 000000000002f000 C:\windows\System32\IMM32.DLL (6.2.19041.4474) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 10.0.19041.4474 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.4474

ModLoad: 00000000d1390000 0000000000012000 C:\windows\SYSTEM32\kernel.appcore.dll (6.2.19041.3758) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.19041.3758 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3758

ModLoad: 00000000c8310000 00000000001e4000 C:\windows\SYSTEM32\dbghelp.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000cd1e0000 000000000000a000 C:\windows\SYSTEM32\version.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636

ModLoad: 00000000d35d0000 0000000000082000 C:\windows\System32\bcryptPrimitives.dll (6.2.19041.3636) (-exported- Symbols Loaded)
    Linked PDB Filename   : bcryptprimitives.pdb
    File Version          : 10.0.19041.3636 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.19041.3636



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 6, Write: 185, Other 91

- I/O Transfers Counters -
Read: 18454, Write: 194, Other 9532

- Paged Pool Usage -
QuotaPagedPoolUsage: 90264, QuotaPeakPagedPoolUsage: 90440
QuotaNonPagedPoolUsage: 6520, QuotaPeakNonPagedPoolUsage: 7472

- Virtual Memory Usage -
VirtualSize: 2031616, PeakVirtualSize: 83140608

- Pagefile Usage -
PagefileUsage: 2031616, PeakPagefileUsage: 2031616

- Working Set Size -
WorkingSetSize: 5619712, PeakWorkingSetSize: 5623808, PageFaultCount: 1499

*** Dump of thread ID 3956 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 156250.000000, User Time: 0.000000, Wait Time: 4363216.000000

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFED3C5AFA2

- Registers -
rax=0000000000000000 rbx=0000000000000001 rcx=0000000040099c58 rdx=000000000231eb80 rsi=0000000000000000 rdi=0000000000000000
r8=000000000231eb80 r9=0000000040099c48 r10=0000000000000fff r11=0000000000000ff0 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=00000000d3c5afa2 rsp=000000000231eb58 rbp=0000000000000000
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246

- Callstack -
ChildEBP RetAddr  Args to Child
0231eb50 40041010 00000001 0231eb80 0231eb80 40099c48 KERNELBASE!DebugBreak+0x0 
0231ef90 40041b74 32000000 0231f2f3 00000000 00000154 w_0.02_windows_x86_64!+0x0 
0231fef0 40041f14 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 
0231ff20 d3ea7344 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 
0231ff50 d5e7cc91 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
0231ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 

*** Dump of thread ID 4796 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 4261127.000000

- Registers -
rax=0000000000000004 rbx=0000000000000000 rcx=0000000000000130 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000130
r8=0000000000564ba0 r9=00000000000000ab r10=000000000000002d r11=00000000000000ab r12=0000000000000000 r13=0000000000000000
r14=0000000000000130 r15=0000000000000000 rip=00000000d5ecd5e4 rsp=000000000013b518 rbp=000000000013b679
cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

- Callstack -
ChildEBP RetAddr  Args to Child
0013b510 d3b91c4e 00000000 d3bcb900 00000000 00000000 ntdll!ZwWaitForSingleObject+0x0 
0013b5b0 4006a010 00000130 ffffffff 00000000 00000130 KERNELBASE!WaitForSingleObjectEx+0x0 
0013b6d0 40069ddd 00000000 00000000 ffffffff 00000000 w_0.02_windows_x86_64!+0x0 
0013b730 4004be8b 00000000 00000000 0013b7a0 00567700 w_0.02_windows_x86_64!+0x0 
0013b7a0 40003e04 0056dde0 00000031 00567c00 0013b8b0 w_0.02_windows_x86_64!+0x0 
0014fee0 400447b0 00567570 00000000 00559300 00000000 w_0.02_windows_x86_64!+0x0 
0014ff20 d3ea7344 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 
0014ff50 d5e7cc91 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 
0014ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 


*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
]]>

--
I crunch for Ukraine

ID: 7699
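
A note on reading the dump in message 7699: the faulting address can be matched to a module by checking which ModLoad range contains it. The sketch below is illustrative only. It copies four base/size pairs from the dump (the dump prints them truncated to 32 bits, so the address is masked the same way), and the helper name `find_module` is made up for this example.

```python
# Module ranges copied from the ModLoad lines of the dump: name -> (base, size).
# The dump prints addresses truncated to their low 32 bits, so we mask too.
MODULES = {
    "w_0.02_windows_x86_64.exe": (0x40000000, 0x13F000),
    "ntdll.dll": (0xD5E30000, 0x1F8000),
    "KERNEL32.DLL": (0xD3E90000, 0x0BD000),
    "KERNELBASE.dll": (0xD3B70000, 0x2F6000),
}


def find_module(address):
    """Return (module name, offset) for an address, or None if unmapped."""
    low = address & 0xFFFFFFFF  # match the dump's 32-bit truncation
    for name, (base, size) in MODULES.items():
        if base <= low < base + size:
            return name, low - base
    return None


# Faulting address from the "Unhandled Exception Record".
print(find_module(0x00007FFED3C5AFA2))
# Return address of the first frame in the crashing thread's callstack.
print(find_module(0x40041010))
```

The first lookup lands in KERNELBASE.dll and the second in the application itself, which fits the callstack line `KERNELBASE!DebugBreak`: the breakpoint appears to have been raised by a call from the app, not by a fault inside a system library. Separately, the `GetLastError = 126` lines are Windows error ERROR_MOD_NOT_FOUND (the library was not at that path); the later ModLoad entries for dbghelp.dll and version.dll show the system copies were loaded instead.
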
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7700 - Posted: 18 Jun 2024, 15:16:52 UTC
Last modified: 18 Jun 2024, 15:17:07 UTC

Updating BOINC to 8.0.2 did not help

The task fails with Exit status 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED after 25 mins

I have a setting to run tasks 1 day in preferences ...
--
I crunch for Ukraine

ID: 7700
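
The roughly 25-minute failure lines up with the abort message in message 7699, `exceeded elapsed time limit 1594.21 (100000000.00G/62727.08G)`. A rough sketch of that arithmetic follows, assuming the two numbers in parentheses are the task's operation bound and the speed estimated for this app version on the host (both in billions). The variable names are illustrative.

```python
# Numbers copied from the abort message; their interpretation is an assumption.
fpops_bound_g = 100000000.00   # assumed: the task's operation bound, in G
est_speed_g = 62727.08         # assumed: estimated speed, in G per second

limit_seconds = fpops_bound_g / est_speed_g
limit_minutes = limit_seconds / 60

print(f"limit: {limit_seconds:.2f} s (about {limit_minutes:.1f} min)")

# Exit status 197 and 0x000000C5 are the same number.
print(197 == 0xC5)
```

If that reading is right, the limit comes from the speed estimate rather than from the target run time in the preferences, which may be why changing the preference made no difference. The estimate itself (roughly 62,700 G per second) may simply be too optimistic for the hardware.
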
[VENETO] boboviz
Joined: 9 Apr 08
Posts: 910
Credit: 1,892,541
RAC: 294
Message 7701 - Posted: 18 Jun 2024, 15:43:03 UTC - in response to Message 7700.  

Updating BOINC to 8.0.2 did not help
The task fails with Exit status 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED after 25 mins


Did you see the message in the other thread about GPU RAM (min. 6 GB)?
Do you have the latest GPU driver for your Nvidia card?
ID: 7701
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7702 - Posted: 18 Jun 2024, 15:44:02 UTC - in response to Message 7701.  

Updating BOINC to 8.0.2 did not help
The task fails with Exit status 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED after 25 mins


Did you see the message in the other thread about GPU RAM (min. 6 GB)?
Do you have the latest GPU driver for your Nvidia card?


Yes, I have an RTX 3060 with 12 GB of RAM, so I don't get that error message.

The task crunches for ~25 minutes and then fails.
--
I crunch for Ukraine

ID: 7702
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7703 - Posted: 18 Jun 2024, 15:47:03 UTC

I only switched to the RTX 3060 yesterday (for Folding) and noticed by accident that it crunches Ralph tasks.

I just noticed that the 7 tasks I successfully completed on Jun 15-16 were also from the nvidia app. At that time I had an Nvidia NVS 310 (a very old graphics card) in this computer and it worked fine, without the memory requirement.

I don't want to waste GPU time on tasks that all fail, so I will try to run one more task tomorrow if any are available.
--
I crunch for Ukraine

ID: 7703
Fardringle
Joined: 22 Feb 06
Posts: 18
Credit: 360,436
RAC: 1,901
Message 7706 - Posted: 19 Jun 2024, 3:12:41 UTC

One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes.

After about 15 hours, the estimated remaining time was finally down to 1 second.

At 17+ hours, the remaining time is zero (empty) but the tasks are still showing as running, and are using a significant amount of RAM and CPU power.
ID: 7706
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7707 - Posted: 19 Jun 2024, 3:29:21 UTC - in response to Message 7706.  

One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes.

After about 15 hours, the estimated remaining time was finally down to 1 second.

At 17+ hours, the remaining time is zero (empty) but the tasks are still showing as running, and are using a significant amount of RAM and CPU power.


What target computation time do you have in the preferences?

I have set 30 mins and it still failed after 25 mins for me…
--
I crunch for Ukraine

ID: 7707
Grant (SSSF)
Joined: 13 Jun 24
Posts: 118
Credit: 193,939
RAC: 2,635
Message 7709 - Posted: 19 Jun 2024, 6:21:30 UTC - in response to Message 7706.  

One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes.

After about 15 hours, the estimated remaining time was finally down to 1 second.

At 17+ hours, the remaining time is zero (empty) but the tasks are still showing as running, and are using a significant amount of RAM and CPU power.

If you don't limit the threads available to the application being tested, every single running Task will try to use 8 threads, no matter how many are actually available.
Hence the Tasks take longer & longer to eventually complete as they are continually fighting each other for CPU time.
Grant
Darwin NT
ID: 7709
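
To put numbers on the point made in message 7709, here is a small sketch of how many threads the running tasks ask for compared with what the machine has. The figure of 8 threads per task is taken from that post; the function name is illustrative.

```python
THREADS_PER_TASK = 8  # per the post: each task tries to use 8 threads


def oversubscription(running_tasks, host_threads,
                     threads_per_task=THREADS_PER_TASK):
    """Ratio of threads requested to threads available (1.0 = exactly full)."""
    return running_tasks * threads_per_task / host_threads


# Four tasks on an 8-thread host, as reported elsewhere in this thread.
print(oversubscription(4, 8))   # 4.0
# One task on the same host.
print(oversubscription(1, 8))   # 1.0
```

Anything above 1.0 means the tasks are competing with each other (and with other projects) for CPU time.
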
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7717 - Posted: 19 Jun 2024, 14:45:56 UTC - in response to Message 7709.  

Did anyone finish a new task successfully?
--
I crunch for Ukraine

ID: 7717
fzs600
Joined: 4 Nov 10
Posts: 6
Credit: 1,175,574
RAC: 0
Message 7718 - Posted: 19 Jun 2024, 16:02:35 UTC - in response to Message 7717.  

Hello,
When will a Linux GPU application be available?
Thanks
ID: 7718
Fardringle
Joined: 22 Feb 06
Posts: 18
Credit: 360,436
RAC: 1,901
Message 7719 - Posted: 19 Jun 2024, 17:18:45 UTC - in response to Message 7707.  

Target computation time is set at 1 hour.

The 4 running tasks are still "running" with zero estimated time left after 32 hours. I want to just kill the tasks, but also have a bit of morbid curiosity to see if they will actually finish...
ID: 7719
rilian
Joined: 7 Sep 07
Posts: 35
Credit: 107,666
RAC: 725
Message 7720 - Posted: 19 Jun 2024, 17:21:08 UTC - in response to Message 7719.  

Target computation time is set at 1 hour.

The 4 running tasks are still "running" with zero estimated time left after 32 hours. I want to just kill the tasks, but also have a bit of morbid curiosity to see if they will actually finish...

I'd suggest using HWiNFO64 or MSI Afterburner to check whether it really is using the GPU.

Btw, why is it running 4 tasks? That is only possible if your machine has 4 GPUs; otherwise it would need a custom app config to run them in parallel...
--
I crunch for Ukraine

ID: 7720
Fardringle
Joined: 22 Feb 06
Posts: 18
Credit: 360,436
RAC: 1,901
Message 7721 - Posted: 19 Jun 2024, 17:39:25 UTC - in response to Message 7720.  

Target computation time is set at 1 hour.

The 4 running tasks are still "running" with zero estimated time left after 32 hours. I want to just kill the tasks, but also have a bit of morbid curiosity to see if they will actually finish...

I'd suggest using HWiNFO64 or MSI Afterburner to check whether it really is using the GPU.

Btw, why is it running 4 tasks? That is only possible if your machine has 4 GPUs; otherwise it would need a custom app config to run them in parallel...


This computer is not using a GPU. They are CPU tasks only. (I did get a few GPU tasks on another computer that has an RTX 3060ti, but they failed immediately, the same as other people have reported.)

I set up the app_config.xml file to allow the computer to run only a single Ralph task at a time, then rebooted. All tasks reset to zero progress, and it is now running the one task by itself, using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can actually complete a task with this configuration.
ID: 7721
Grant (SSSF)
Joined: 13 Jun 24
Posts: 118
Credit: 193,939
RAC: 2,635
Message 7723 - Posted: 19 Jun 2024, 18:13:01 UTC - in response to Message 7717.  
Last modified: 19 Jun 2024, 18:14:04 UTC

Did anyone finish a new task successfully?

21 so far.
Grant
Darwin NT
ID: 7723
Grant (SSSF)
Joined: 13 Jun 24
Posts: 118
Credit: 193,939
RAC: 2,635
Message 7724 - Posted: 19 Jun 2024, 18:14:54 UTC - in response to Message 7721.  

I set up the app_config.xml file to allow the computer to run only a single Ralph task at a time, then rebooted. All tasks reset to zero progress

So still no checkpointing.
Grant
Darwin NT
ID: 7724
Fardringle
Joined: 22 Feb 06
Posts: 18
Credit: 360,436
RAC: 1,901
Message 7728 - Posted: 20 Jun 2024, 3:58:18 UTC - in response to Message 7721.  

I set up the app_config.xml file to allow the computer to run only a single Ralph task at a time, then rebooted. All tasks reset to zero progress, and it is now running the one task by itself, using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can actually complete a task with this configuration.


11 hours into running a single task, using between 30% and 70% of the CPU (3-6 of the 8 CPU threads), that task says it is 99.961% complete with 14 seconds left. But it was at 75% with 20 minutes left more than 9 hours ago, so it seems to be behaving much the same as when several tasks were running at once on this computer...
ID: 7728
Grant (SSSF)
Joined: 13 Jun 24
Posts: 118
Credit: 193,939
RAC: 2,635
Message 7731 - Posted: 20 Jun 2024, 4:44:31 UTC - in response to Message 7728.  
Last modified: 20 Jun 2024, 4:57:13 UTC

I set up the app_config.xml file to allow the computer to run only a single Ralph task at a time, then rebooted. All tasks reset to zero progress, and it is now running the one task by itself, using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can actually complete a task with this configuration.
11 hours into running a single task, using between 30% and 70% of the CPU (3-6 of the 8 CPU threads), that task says it is 99.961% complete with 14 seconds left. But it was at 75% with 20 minutes left more than 9 hours ago, so it seems to be behaving much the same as when several tasks were running at once on this computer...


How are you limiting its thread use?

Max-concurrent (which is what I used) limits the number of Tasks that will run, not the number of threads they will use.

I had to limit Rosetta (my other project) to just 4 threads (I've got 12 in total), otherwise it would try to use the others being used by Ralph, slowing both Rosetta & Ralph down. When I had both projects limited, the Ralph Tasks processed close to the 4 hr Estimated time (although limiting the number resulted in the initial Estimated time being reduced).
But it did stop them from taking over 24 hrs to process.
Since you've only got 8 threads on that system, you'd have to suspend all other work for them to process as quickly as they can (they completed in just under 4 hrs).
Grant
Darwin NT
ID: 7731
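
For reference, here is a sketch of the kind of app_config.xml discussed in this thread, written out from Python. It uses the `project_max_concurrent` element so that no application short name has to be guessed. As message 7731 points out, a setting like this caps how many tasks run at once, not how many threads each task uses. The function name is illustrative, and the real file belongs in the project's folder under the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks):
    """Build an app_config.xml tree that caps concurrent tasks for a project."""
    root = ET.Element("app_config")
    # Limits running tasks for the whole project, not threads per task.
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    return ET.ElementTree(root)


tree = build_app_config(1)
ET.indent(tree)  # pretty-print; needs Python 3.9+
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
```

Since these tasks do not appear to checkpoint, applying the file by restarting the client or rebooting is likely to cost the progress made so far, as happened in message 7721.
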



©2024 University of Washington
http://www.bakerlab.org