Message boards : RALPH@home bug list : RoseTTAFold All-Atom 0.02 (env)
Author | Message |
---|---|
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 910 Credit: 1,892,541 RAC: 294 |
Today we have this new app, 0.02 (env). But, as usual, no explanation about |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 118 Credit: 193,939 RAC: 2,635 |
Things are very, very, very broken. I am able to view my Tasks, and apparently "Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 (env)" was used to process the last batch of work, while "Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.01" is doing the current batch (which so far has died straight away). When I look at my Application details, it thinks the same thing: the new version shows me as having done 31 Tasks, and the old version shows me as having done 2 (the ones that just downloaded and died instantly). On the Computing, Application page, the old version has gone from the list and the new version is the only one there, showing the Average Computing number of the old application. Could be the problem is that the old application is trying to process the new Tasks because everything is so scrambled? Grant Darwin NT |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 910 Credit: 1,892,541 RAC: 294 |
"Could be the problem is that the old application is trying to process the new Tasks because everything is so scrambled?" Maybe. Now it's "0.02 (nvidia_alpha)" and I'm not downloading WUs (now over 1000 in queue), probably because I don't have an Nvidia GPU. |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
Generalized biomolecular modeling and design with RoseTTAFold All-Atom 0.02 (nvidia_alpha): I've got a few of these. Currently crunching the 1st task; it always stays at 100%. A few have already failed after half an hour. I updated BOINC to 8.0.2 to see if this helps the tasks to process. <core_client_version>8.0.0</core_client_version> <![CDATA[ <message> exceeded elapsed time limit 1594.21 (100000000.00G/62727.08G)</message> <stderr_txt> Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFED3C5AFA2 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 7.9.0 Dump Timestamp : 06/18/24 10:16:53 Install Directory : C:Program FilesBOINC Data Directory : C:ProgramDataBOINC Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:ProgramDataBOINCdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:ProgramDataBOINCsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:ProgramDataBOINCversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:ProgramDataBOINCslots13;C:ProgramDataBOINCprojectsralph.bakerlab.org;srv*C:ProgramDataBOINCprojectsralph.bakerlab.orgsymbols*http://msdl.microsoft.com/download/symbols;srv*C:ProgramDataBOINCprojectsralph.bakerlab.orgsymbols*https://boinc.bakerlab.org/rosetta/symstore ModLoad: 0000000040000000 000000000013f000 C:ProgramDataBOINCprojectsralph.bakerlab.orgw_0.02_windows_x86_64.exe (-nosymbols- Symbols Loaded) Linked PDB Filename : C:UsersUsersourcereposConsoleApplication1x64ReleaseConsoleApplication1.pdb ModLoad: 00000000d5e30000 00000000001f8000 C:windowsSYSTEM32ntdll.dll (6.2.19041.4522) (-exported- Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 10.0.19041.3636
(WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d3e90000 00000000000bd000 C:windowsSystem32KERNEL32.DLL (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d3b70000 00000000002f6000 C:windowsSystem32KERNELBASE.dll (6.2.19041.4522) (-exported- Symbols Loaded) Linked PDB Filename : kernelbase.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d4330000 000000000019f000 C:windowsSystem32USER32.dll (6.2.19041.4474) (-exported- Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d35a0000 0000000000022000 C:windowsSystem32win32u.dll (6.2.19041.4529) (-exported- Symbols Loaded) Linked PDB Filename : win32u.pdb File Version : 10.0.19041.4529 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.4529 ModLoad: 00000000d5800000 000000000002b000 C:windowsSystem32GDI32.dll (6.2.19041.4474) (-exported- Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 10.0.19041.4474 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.4474 ModLoad: 00000000d3660000 0000000000117000 C:windowsSystem32gdi32full.dll (6.2.19041.4474) (-exported- Symbols Loaded) Linked PDB Filename : gdi32full.pdb File Version : 10.0.19041.4474 (WinBuild.160101.0800) Company 
Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.4474 ModLoad: 00000000d3820000 000000000009d000 C:windowsSystem32msvcp_win.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : msvcp_win.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d3a70000 0000000000100000 C:windowsSystem32ucrtbase.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : ucrtbase.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d4160000 00000000000b0000 C:windowsSystem32ADVAPI32.dll (6.2.19041.4522) (-exported- Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d4290000 000000000009e000 C:windowsSystem32msvcrt.dll (7.0.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.19041.3636 ModLoad: 00000000d4a20000 00000000000a0000 C:windowsSystem32sechost.dll (6.2.19041.4522) (-exported- Symbols Loaded) Linked PDB Filename : sechost.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d56d0000 0000000000123000 C:windowsSystem32RPCRT4.dll (6.2.19041.4355) (-exported- Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name 
: Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d37f0000 0000000000027000 C:windowsSystem32bcrypt.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : bcrypt.pdb File Version : 10.0.19041.1 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.1 ModLoad: 00000000d5830000 000000000002f000 C:windowsSystem32IMM32.DLL (6.2.19041.4474) (-exported- Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 10.0.19041.4474 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.4474 ModLoad: 00000000d1390000 0000000000012000 C:windowsSYSTEM32kernel.appcore.dll (6.2.19041.3758) (-exported- Symbols Loaded) Linked PDB Filename : Kernel.Appcore.pdb File Version : 10.0.19041.3758 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3758 ModLoad: 00000000c8310000 00000000001e4000 C:windowsSYSTEM32dbghelp.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000cd1e0000 000000000000a000 C:windowsSYSTEM32version.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : version.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 ModLoad: 00000000d35d0000 0000000000082000 C:windowsSystem32bcryptPrimitives.dll (6.2.19041.3636) (-exported- Symbols Loaded) Linked PDB Filename : bcryptprimitives.pdb File Version : 10.0.19041.3636 (WinBuild.160101.0800) Company Name : Microsoft Corporation Product Name : 
Microsoft® Windows® Operating System Product Version : 10.0.19041.3636 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 6, Write: 185, Other 91 - I/O Transfers Counters - Read: 18454, Write: 194, Other 9532 - Paged Pool Usage - QuotaPagedPoolUsage: 90264, QuotaPeakPagedPoolUsage: 90440 QuotaNonPagedPoolUsage: 6520, QuotaPeakNonPagedPoolUsage: 7472 - Virtual Memory Usage - VirtualSize: 2031616, PeakVirtualSize: 83140608 - Pagefile Usage - PagefileUsage: 2031616, PeakPagefileUsage: 2031616 - Working Set Size - WorkingSetSize: 5619712, PeakWorkingSetSize: 5623808, PageFaultCount: 1499 *** Dump of thread ID 3956 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 156250.000000, User Time: 0.000000, Wait Time: 4363216.000000 - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x00007FFED3C5AFA2 - Registers - rax=0000000000000000 rbx=0000000000000001 rcx=0000000040099c58 rdx=000000000231eb80 rsi=0000000000000000 rdi=0000000000000000 r8=000000000231eb80 r9=0000000040099c48 r10=0000000000000fff r11=0000000000000ff0 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000000000 rip=00000000d3c5afa2 rsp=000000000231eb58 rbp=0000000000000000 cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 0231eb50 40041010 00000001 0231eb80 0231eb80 40099c48 KERNELBASE!DebugBreak+0x0 0231ef90 40041b74 32000000 0231f2f3 00000000 00000154 w_0.02_windows_x86_64!+0x0 0231fef0 40041f14 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 0231ff20 d3ea7344 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 0231ff50 d5e7cc91 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 0231ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 *** Dump of thread ID 4796 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 0.000000, User Time: 
0.000000, Wait Time: 4261127.000000 - Registers - rax=0000000000000004 rbx=0000000000000000 rcx=0000000000000130 rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000130 r8=0000000000564ba0 r9=00000000000000ab r10=000000000000002d r11=00000000000000ab r12=0000000000000000 r13=0000000000000000 r14=0000000000000130 r15=0000000000000000 rip=00000000d5ecd5e4 rsp=000000000013b518 rbp=000000000013b679 cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 0013b510 d3b91c4e 00000000 d3bcb900 00000000 00000000 ntdll!ZwWaitForSingleObject+0x0 0013b5b0 4006a010 00000130 ffffffff 00000000 00000130 KERNELBASE!WaitForSingleObjectEx+0x0 0013b6d0 40069ddd 00000000 00000000 ffffffff 00000000 w_0.02_windows_x86_64!+0x0 0013b730 4004be8b 00000000 00000000 0013b7a0 00567700 w_0.02_windows_x86_64!+0x0 0013b7a0 40003e04 0056dde0 00000031 00567c00 0013b8b0 w_0.02_windows_x86_64!+0x0 0014fee0 400447b0 00567570 00000000 00559300 00000000 w_0.02_windows_x86_64!+0x0 0014ff20 d3ea7344 00000000 00000000 00000000 00000000 w_0.02_windows_x86_64!+0x0 0014ff50 d5e7cc91 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0 0014ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> -- I crunch for Ukraine |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
Updating BOINC to 8.0.2 did not help. The task fails with exit status 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED after 25 mins. I have a setting to run tasks for 1 day in preferences ... -- I crunch for Ukraine |
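The roughly 25-minute cutoff appears to line up with the stderr message quoted earlier in the thread, `exceeded elapsed time limit 1594.21 (100000000.00G/62727.08G)`. A minimal sketch of that arithmetic, assuming the two figures in parentheses are the task's operation bound and the client's speed estimate for the app (this is a reading of the log line, not something confirmed in this thread):

```python
# Figures copied from the stderr message; what they mean is an assumption.
fpops_bound_g = 100000000.00   # assumed: upper bound on work, in giga-operations
est_speed_g = 62727.08         # assumed: estimated app speed, in giga-operations per second


def elapsed_limit_seconds(bound_g: float, speed_g: float) -> float:
    """Return the elapsed-time limit implied by dividing the bound by the speed."""
    return bound_g / speed_g


limit_s = elapsed_limit_seconds(fpops_bound_g, est_speed_g)
print(f"limit: {limit_s:.2f} s (~{limit_s / 60:.1f} min)")
```

If that reading is right, the limit comes from the task's own parameters and the speed estimate rather than from the run-time preference, which might be why changing the preference makes no difference.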
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 910 Credit: 1,892,541 RAC: 294 |
"Updating BOINC to 8.0.2 did not help" Did you see, in the other thread, the message about GPU RAM (min. 6 GB)? Do you have the latest GPU driver for your Nvidia card? |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
"Updating BOINC to 8.0.2 did not help" Yes, I have an RTX 3060 with 12 GB of RAM, so I don't receive that error message. The task crunches for ~25 mins and then fails. -- I crunch for Ukraine |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
I only switched to the RTX 3060 yesterday (for Folding) and accidentally noticed it crunches Ralph tasks. I just noticed that the 7 tasks I successfully completed on Jun 15-16 were also the nvidia app. At that time I had an Nvidia NVS 310 (a very old graphics card) in this computer, and it worked fine without memory requirements. I don't want to waste GPU time on tasks that all fail, so I will try to run one more task tomorrow if there are any. -- I crunch for Ukraine |
Fardringle Send message Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes. After about 15 hours, the estimated remaining time was finally down to 1 second. At 17+ hours, the remaining time is zero (empty) but the tasks are still showing as running, and are using a significant amount of RAM and CPU power. |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
"One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes." What target computation time do you have in the preferences? I have set 30 mins and it still failed after 25 mins for me… -- I crunch for Ukraine |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 118 Credit: 193,939 RAC: 2,635 |
"One of my computers was able to download several of these tasks, and after about 3 hours of CPU time, the estimated remaining time was 15-20 minutes." If you don't limit the threads available to the application being tested, every single running Task will try to use 8 threads, no matter how many are actually available. Hence the Tasks take longer and longer to complete, as they are continually fighting each other for CPU time. Grant Darwin NT |
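The contention described above can be put in rough numbers. This is only an illustrative sketch, assuming each task starts 8 worker threads regardless of the hardware; the task counts are examples, not measurements from anyone's machine:

```python
def oversubscription(running_tasks: int, threads_per_task: int, hw_threads: int) -> float:
    """Ratio of worker threads wanted to hardware threads available (1.0 = no contention)."""
    if hw_threads <= 0:
        raise ValueError("hw_threads must be positive")
    return running_tasks * threads_per_task / hw_threads


# Illustrative: 1, 2 and 4 tasks of 8 threads each on an 8-thread CPU.
for tasks in (1, 2, 4):
    ratio = oversubscription(tasks, threads_per_task=8, hw_threads=8)
    print(f"{tasks} task(s): {ratio:.1f}x demand on the CPU")
```

Anything above 1.0 means the tasks are competing for the same cores, so each one makes slower progress than its estimate suggests.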
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
|
fzs600 Send message Joined: 4 Nov 10 Posts: 6 Credit: 1,175,574 RAC: 0 |
Hello, when will a Linux GPU application be available? Thanks. |
Fardringle Send message Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
Target computation time is set at 1 hour. The 4 running tasks are still "running" with zero estimated time left after 32 hours. I want to just kill the tasks, but also have a bit of morbid curiosity to see if they will actually finish... |
rilian Send message Joined: 7 Sep 07 Posts: 35 Credit: 107,666 RAC: 725 |
"Target computation time is set at 1 hour." I'd suggest using HWiNFO64 or MSI Afterburner to check whether it really is using the GPU. By the way, why is it running 4 tasks? That is only possible if your machine has 4 GPUs; otherwise it would require a custom app config to make them run in parallel ... -- I crunch for Ukraine |
Fardringle Send message Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
"Target computation time is set at 1 hour." This computer is not using a GPU. They are CPU tasks only. (I did get a few GPU tasks on another computer that has an RTX 3060 Ti, but they failed immediately, the same as other people have reported.) I set up the app_config.xml file to only allow the computer to run a single Ralph task at a time and rebooted and all tasks reset to zero progress and it is now running the one task by itself and using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can manage to actually complete a task with this configuration. |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 118 Credit: 193,939 RAC: 2,635 |
"Did anyone finish a new task successfully?" 21 so far. Grant Darwin NT |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 118 Credit: 193,939 RAC: 2,635 |
"I set up the app_config.xml file to only allow the computer to run a single Ralph task at a time and rebooted and all tasks reset to zero progress" So still no checkpointing. Grant Darwin NT |
Fardringle Send message Joined: 22 Feb 06 Posts: 18 Credit: 360,436 RAC: 1,901 |
"I set up the app_config.xml file to only allow the computer to run a single Ralph task at a time and rebooted and all tasks reset to zero progress and it is now running the one task by itself and using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can manage to actually complete a task with this configuration." 11 hours so far running a single task, and using between 30% and 70% of the CPU (using 3-6 of the 8 CPU threads), that single task says it is at 99.961% complete with 14 seconds left. But it was at 75% and 20 minutes left more than 9 hours ago, so it seems to be showing similar results as when several tasks were running at the same time on this computer... |
Grant (SSSF) Send message Joined: 13 Jun 24 Posts: 118 Credit: 193,939 RAC: 2,635 |
How are you limiting its thread use? "I set up the app_config.xml file to only allow the computer to run a single Ralph task at a time and rebooted and all tasks reset to zero progress and it is now running the one task by itself and using around 3-4 of the 8 CPU threads. I'll watch it for a while to see if it can manage to actually complete a task with this configuration." "11 hours so far running a single task, and using between 30% and 70% of the CPU (using 3-6 of the 8 CPU threads), that single task says it is at 99.961% complete with 14 seconds left. But it was at 75% and 20 minutes left more than 9 hours ago, so it seems to be showing similar results as when several tasks were running at the same time on this computer..." Max-concurrent (which is what I used) limits the number of Tasks that will run, not the number of threads they will use. I had to limit Rosetta (my other project) to just 4 threads (I've got 12 in total), otherwise it would try to use the others being used by Ralph, slowing both Rosetta and Ralph down. When I had both projects limited, the Ralph Tasks processed close to the 4 hr Estimated time (although limiting the number resulted in the initial Estimated time being reduced). But it did stop them from taking over 24 hrs to process. Since you've only got 8 threads on that system, you'd have to suspend all other work for them to process as quickly as they can (they completed in just under 4 hrs). Grant Darwin NT |
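For anyone wanting to try the same max-concurrent setting, here is a minimal sketch that writes out an `app_config.xml` body with Python's standard library. The app name `example_ralph_app` is a placeholder, not the real RALPH app name; the real short name would have to be looked up on your own client (for example in `client_state.xml`).

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: replace with the app's real short name from your client.
APP_NAME = "example_ralph_app"


def build_app_config(app_name: str, max_concurrent: int) -> ET.Element:
    """Build an <app_config> tree that caps how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return root


root = build_app_config(APP_NAME, max_concurrent=1)
ET.indent(root)  # pretty-print; needs Python 3.9 or newer
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The printed text would go into `app_config.xml` in the project's folder under the BOINC data directory. As noted above, this only caps how many tasks run at once; it does not change how many threads each task starts.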
©2024 University of Washington
http://www.bakerlab.org