Message boards : RALPH@home bug list : RoseTTAFold All-Atom
Author | Message |
---|---|
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
Task 5455911 Name RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_e_pred_41_16901_2_1 Stderr output <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 699, in <module> with zipfile.ZipFile(b) as z: File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libzipfile.py", line 1268, in __init__ self._RealGetContents() File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libzipfile.py", line 1335, in _RealGetContents raise BadZipFile("File is not a zip file") zipfile.BadZipFile: File is not a zip file </stderr_txt> ]]> |
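The traceback above ends in `zipfile.BadZipFile`, which Python raises when the file passed to `zipfile.ZipFile` is not a readable zip archive (a truncated or incomplete download is a common cause). A minimal sketch of checking a file before opening it follows. The helper name `list_zip_members` is hypothetical and nothing here is taken from the project's predict.py; only `zipfile.is_zipfile` and `zipfile.ZipFile` are standard library calls.

```python
import io
import zipfile


def list_zip_members(source):
    """Return the member names of a zip archive, or None if it is not a zip.

    `source` may be a file path or a seekable binary file object.
    Hypothetical helper for illustration only.
    """
    # is_zipfile looks for the zip end-of-central-directory record
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as z:
        return z.namelist()


# Usage: a real archive built in memory, and some bytes that are not a zip
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as z:
    z.writestr("model.txt", "weights")

print(list_zip_members(good))
print(list_zip_members(io.BytesIO(b"this is not a zip file")))
```

Returning `None` instead of raising lets the caller decide whether to re-download the file or fail the task with a clearer message.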
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
I have ten tasks in progress that have run more than twenty-five minutes without failing. At first there were failures, such as the one shown below, indicating that my System Managed paging file was inadequate. I quickly suspended tasks, restarted, and entered the BIOS to disable hyperthreading on my ten-core Intel i9-10850K CPU. After continued running without task failures, I restarted and changed the paging file from the System Managed 42235 MB to a fixed 84470 MB. I have enabled hyperthreading again but have only 12 tasks available. Task 5456469 Name RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_62_16902_5_0 Workunit 4847018 Created 12 Jun 2024, 23:56:01 UTC Sent 13 Jun 2024, 1:08:41 UTC Report deadline 14 Jun 2024, 1:08:41 UTC Received 13 Jun 2024, 1:13:38 UTC Server state Over Outcome Computation error Client state Compute error Exit status 12 (0x0000000C) Unknown error code Computer ID 49920 Run time 1 sec CPU time Validate state Invalid Credit 0.00 Device peak FLOPS 5.70 GFLOPS Application version Generalized biomolecular modeling and design with RoseTTAFold All-Atom v0.02 windows_x86_64 Peak working set size 33.86 MB Peak swap size 2,778.58 MB Peak disk usage 2.10 MB Stderr output <core_client_version>8.0.2</core_client_version> <![CDATA[ <message> The access code is invalid. (0xc) - exit code 12 (0xc)</message> <stderr_txt> Traceback (most recent call last): File "C:ProgramDataBOINCprojectsralph.bakerlab.orgcv2rf2aapredict.py", line 8, in <module> import torch File "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorch__init__.py", line 124, in <module> raise err OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:ProgramDataBOINCprojectsralph.bakerlab.orgev0libsite-packagestorchlibcudnn_cnn_infer64_8.dll" or one of its dependencies. </stderr_txt> ]]> |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
Are there no check points? I restarted after 30 minutes of crunching (over 20% complete) and the tasks restarted at zero progress. |
mikey Joined: 28 Nov 20 Posts: 9 Credit: 114,771 RAC: 17 |
Mine are using almost 6 GB of RAM for EACH task, so that could be why some are failing. I had to limit the 8 that were running to only 3 because of the RAM usage, but those 3 ARE running, at 3+ minutes so far!! |
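The RAM limit described above comes down to simple division. A rough sketch, assuming a hypothetical helper `max_concurrent_tasks` and an assumed 2 GB reserve for the operating system; the 24 GB machine in the usage line is an example figure, not a number from the post.

```python
import math


def max_concurrent_tasks(total_ram_gb, per_task_gb, reserve_gb=2.0):
    """Estimate how many tasks fit in RAM at once.

    reserve_gb is an assumed allowance for the OS and other programs.
    """
    usable = total_ram_gb - reserve_gb
    if usable <= 0 or per_task_gb <= 0:
        return 0
    return math.floor(usable / per_task_gb)


# Example: a hypothetical 24 GB machine with ~6 GB per task
print(max_concurrent_tasks(24, 6))  # 3
```

Memory use per task varies a lot between the reports in this thread (roughly 1 GB to 6 GB), so the per-task figure should come from watching Task Manager on the machine in question.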
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
My default runtime is 4 hrs, but after 6 hrs the WUs are at 75%. If you look at the Windows Task Manager, you can see these are Python apps (on my PC they use 1 GB of RAM per WU). The deadline is only 1 day :-( |
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
I was getting Tasks erroring out, looked in my Event log & saw complaints about not enough disk space. Went from 20GB for BOINC to 30GB. 13/06/2024 18:29:33 | ralph@home | Message from server: Generalized biomolecular modeling and design with RoseTTAFold All-Atom needs 3216.72MB more disk space. You currently have 13167.28 MB available and it needs 16384.00 MB. Don't know why it wasn't happy, but after giving it more the errors stopped (or I got lucky with the Tasks). Or it was trying to run Tasks before the last of the downloads had completed? But then why complain about disk space? Behaviour of these Tasks is very odd looking at Task Manager- the CPU usage varies between 2% and 12%. Suspending all but 6 Tasks (12 thread CPU) results in no improvement. In fact it makes it worse, as suspending doesn't actually suspend them. In the BOINC Manager they show as suspended- but in Task Manager they are still there using CPU time. After several suspends and resumes, the end result in BOINC Manager: 12 are running, 6 are waiting to run, 7 are ready to start. Yet in Task Manager there are 18 Python processes running, each using 700 MB to 1.6 GB of RAM (the amount varies second by second). Developers- please fix the suspend function so that Tasks do suspend when told to. Fix checkpointing so work can continue on, not start from scratch after restarting BOINC (or un-suspending when suspending is fixed). System is now sluggish, barely responsive at times. May need to kill some of those processes to keep the system up, but will try exiting & restarting BOINC first. Edit- exited BOINC, restarted, and back to just 12 running Tasks in Task Manager. |
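The figures in the server message quoted above are self-consistent: the application asks for 16384.00 MB (16 × 1024 MB) and the client reported 13167.28 MB available, which leaves exactly the 3216.72 MB shortfall the message states. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def disk_shortfall_mb(required_mb, available_mb):
    """Return how many more MB are needed, or 0.0 if there is already enough."""
    return max(0.0, required_mb - available_mb)


# Numbers taken from the server message quoted in the post
shortfall = disk_shortfall_mb(16384.00, 13167.28)
print(f"{shortfall:.2f} MB short")  # 3216.72 MB short
print(16384.00 / 1024)              # 16.0
```

The "available" figure is presumably what the BOINC disk preferences allow rather than the free space on the whole drive, which would fit the errors stopping once the allowance was raised from 20GB to 30GB.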
Grant (SSSF) Joined: 13 Jun 24 Posts: 126 Credit: 193,939 RAC: 2,635 |
Are there no check points? I restarted after 30 minutes of crunching (over 20% complete) and the tasks restarted at zero progress. Looking at the properties of my running Tasks, and it shows as no CPU time elapsed, and no elapsed CPU time since last checkpoint. Edit- exited & restarted BOINC- all work done lost, started from scratch. |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
Looking at the properties of my running Tasks, and it shows as no CPU time elapsed, and no elapsed CPU time since last checkpoint. Yep, I had to restart my PC after 11 hrs (93% of calculation) and it restarted from 0% :-( |
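For readers wondering what checkpointing would involve, here is a generic sketch in plain Python. It is not the BOINC API and not how the RoseTTAFold All-Atom app works; the function names, the JSON file format, and the `every` interval are all assumptions made for illustration. The idea is to save progress periodically and atomically, then resume from the saved step on restart.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # replaces the old checkpoint in one step
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default when no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default


def run(total_steps, path, every=10):
    """Resume from the last checkpoint and save again every `every` steps."""
    state = load_checkpoint(path, {"step": 0})
    for step in range(state["step"], total_steps):
        # ... one unit of real work would go here ...
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["step"]
```

With something like this in place, a restart at 93% would lose at most the work done since the last save instead of the whole run.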
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
It uses GPU, but boinc manager doesn't reflect that. Also stderr.txt is empty. I would like to see some output during computation. |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
There is no entry in stderr because there are not any errors. You may find what interests you in file:///C:/ProgramData/BOINC/client_state.xml, kotenok2000. |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
It uses GPU, but boinc manager doesn't reflect that. Nvidia, i suppose... |
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
My task crashed with <core_client_version>8.0.2</core_client_version> <![CDATA[ <stderr_txt> 16:47:05 (15252): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>RF_SAVE_ALL_OUT_NOJRAN_IGNORE_THE_REST_validation_env_f_pred_342_16902_3_1_r301773838_0</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> |
[VENETO] boboviz Joined: 9 Apr 08 Posts: 913 Credit: 1,892,541 RAC: 294 |
The behaviour on my computer is this: the WUs start "fast", 1% of calculation in approximately 2.5 minutes (so completing a WU should take about 4 hrs), and then slow down. Now, to crunch the 1% between 51% and 52%, it takes over 5 minutes, and the speed continues to drop. This version of the app is better than the previous ones, but there is still a lot of work to do.... |
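The runtime estimate in the post above can be reproduced with a couple of lines. This is only a sketch with hypothetical function names, and it assumes a constant rate, which the post itself says does not hold because the tasks keep slowing down.

```python
def minutes_for_full_run(minutes_per_percent):
    """Total runtime if every 1% took the same time."""
    return 100 * minutes_per_percent


def minutes_remaining(percent_done, minutes_per_percent):
    """Time left at the current rate; a lower bound if the rate keeps dropping."""
    return (100 - percent_done) * minutes_per_percent


print(minutes_for_full_run(2.5) / 60)  # about 4.17 hours at the initial rate
print(minutes_remaining(52, 5) / 60)   # 4.0 hours still to go at 5 min per 1%
```

So at the halfway point the remaining work alone takes at least as long as the whole task was expected to take at the starting speed.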
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
kotenok2000, that is not good news. I have 12 CPU tasks still running for over 12 hours. A Brave AI search found: Error_code -240 (stat() failed) /error_code |
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
I was getting Tasks erroring out, looking in my Event log & saw complaints about not enough disk space. You shouldn't run 12 tasks unless you have 12 GPUs. |
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
All tasks fail on my system. https://ralph.bakerlab.org/results.php?hostid=48013 And there is nothing useful in stderr.txt |
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
Was anyone able to return a task successfully? |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
I have twelve CPU tasks that are over 99% completed, but have not yet completed any tasks. I will know in the next few hours. |
kotenok2000 Joined: 26 Feb 21 Posts: 22 Credit: 1,893 RAC: 0 |
does your task manager show any gpu activity? |
Vester Joined: 29 Apr 20 Posts: 17 Credit: 1,176 RAC: 0 |
does your task manager show any gpu activity? I do not have an Nvidia GPU. |
©2024 University of Washington
http://www.bakerlab.org