Message boards : RALPH@home bug list : minirosetta 1.58
Author | Message |
---|---|
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
https://ralph.bakerlab.org/workunit.php?wuid=1156395
Now that workunit has ended with a Computation error after a lot of these messages (not visible until the workunit ends):
[2009- 2-14 11:55:25:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:57:29:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:58:11:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:58:52:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:59:33:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 12: 0:14:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 12: 0:56:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
Hope these results are at least useful in tracking down the lockfile problem. |
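For anyone who wants to measure how often the task is being restarted, here is a minimal Python sketch that pulls the timestamps out of stderr text like the above and reports the gap between consecutive starts. The function name `restart_gaps` and the regular expression are illustrative assumptions, not part of BOINC or minirosetta; the sample text is copied from the first three entries of the post.

```python
import re
from datetime import datetime

# Timestamp shape copied from the stderr above, e.g. "[2009- 2-14 11:56: 7:]".
# Fields can be space-padded, so optional whitespace is allowed before each one.
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")

def restart_gaps(stderr_text):
    """Return the seconds between consecutive timestamps found in stderr_text."""
    times = [datetime(*map(int, m.groups())) for m in STAMP.finditer(stderr_text)]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]

sample = """[2009- 2-14 11:55:25:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 2-14 11:56:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting"""

gaps = restart_gaps(sample)
failures = sample.count("Can't acquire lockfile")
print(gaps, failures)
```

On the entries quoted in the post the gaps come out at roughly 41 to 42 seconds each, which suggests a steady restart loop rather than random failures.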
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:
2/14/2009 2:56:12 PM|ralph@home|Resetting project
2/14/2009 2:56:18 PM|ralph@home|[error] Couldn't delete file projects/ralph.bakerlab.org/minirosetta_1.58_windows_intelx86.exe
2/14/2009 2:56:33 PM|ralph@home|Sending scheduler request: To fetch work. Requesting 3853 seconds of work, reporting 0 completed tasks
2/14/2009 2:56:38 PM|ralph@home|Scheduler request succeeded: got 0 new tasks
Looks like there's also a problem in your reset procedure. |
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
"I don't know if it's significant that I only saw this problem after enabling the graphics for a few minutes, even though I normally keep it disabled. The graphics looked reasonable, though."
Ha! There may be the clue I was missing! I don't recall whether I had been looking at the graphics or not, but it is likely that I was when I saw a failure, until I gave up in disgust. Though there are no tasks, perhaps the test would be to run a few tasks without looking and some where you look at the graphics. I am not sure why launching the graphics application would cause this issue, but this could be the missing clue ... and why I never saw the issue in Einstein even when I had the setting that caused this issue in Rosetta ... MOST interesting is that you can launch graphics at 100% and have no issue, but that the switch to pause the application would cause it. Oh, and if you don't want to lose that much CPU you can use 99% like I did and get the same effect ... |
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
"Oh, and if you don't want to lose that much CPU you can use 99% like I did and get the same effect ..."
I tried 99% for a while but had two problems with this setting:
1. Problems making this setting reduce the CPU percentage at all - now fixed.
2. Problems getting Task Manager to show me such small gaps in CPU usage.
I may try again soon at 95%, though.
Mike, in order to save time in testing, you may want to consider these ideas:
1. Try to send a larger share of any workunits aimed at the lockfile problem to machines known to have had these problems recently.
2. If these same machines get a workunit aimed at testing anything else, immediately put a copy of that workunit back on the queue to be sent to machines not in this group. |
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
"I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:"
Better check just what that Ralph@home reset procedure does. Since the reset, I haven't been able to connect to Rosetta@home, either through BOINC or through its website. I have a Rosetta@home result I haven't been able to send, or I'd try resetting the Rosetta@home project also. Has Rosetta@home been offline for several hours, or is this part of the result of the Ralph@home reset attempt? |
I _ quit Joined: 13 Jan 09 Posts: 44 Credit: 88,562 RAC: 0 |
"I tried resetting the Ralph@home project as the messages suggested, and got these messages as a result:"
It's been nearly 24 hrs and Rosetta is still down. It's almost like a system failure happened there. The problems there have nothing to do with Ralph or with the problem you are having deleting the file. |
AdeB Joined: 22 Dec 07 Posts: 61 Credit: 161,367 RAC: 0 |
This looks like a long-running model:
resultid=1309799
Name: loopbuild_chunk_2_7_B_hb_t286__IGNORE_THE_REST_1YZFA_5_7846_1_0
Outcome: Validate error
stderr out:
. . .
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
Hbond tripped !!!
BOINC:: CPU time: 28889.1s, 14400s + 14400s[2009- 2-14 15:24: 2:] :: BOINC
======================================================
DONE :: 2 starting structures 28889.1 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
called boinc_finish
AdeB |
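The numbers in that stderr can be checked with a little arithmetic. The sketch below assumes that "14400s + 14400s" means the run-time preference plus an equal allowance before the task is ended; the log does not say so outright, and the variable names are illustrative.

```python
# Values copied from the stderr above.
cpu_run_time_pref = 14400   # seconds, from "# cpu_run_time_pref: 14400"
allowance = 14400           # seconds, second term of "14400s + 14400s" (assumed meaning)
cpu_time = 28889.1          # seconds, from "CPU time: 28889.1s"
decoys = 3                  # from "generated 3 decoys from 3 attempts"

limit = cpu_run_time_pref + allowance      # total CPU seconds allowed under the assumption
overrun = round(cpu_time - limit, 1)       # seconds past that limit
per_decoy_hours = cpu_time / decoys / 3600 # average CPU hours per decoy

print(limit, overrun, round(per_decoy_hours, 2))
```

Under that reading the task ran about 89 seconds past a 28800 second limit, and each decoy took a bit under 2.7 CPU hours on average, which fits the "long-running model" description.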
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
At least we got a little work again ... |
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
A failed workunit:
https://ralph.bakerlab.org/workunit.php?wuid=1160172
Some typical messages from it:
2/19/2009 12:19:42 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
2/19/2009 12:20:22 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
2/19/2009 12:20:22 AM|ralph@home|If this happens repeatedly you may need to reset the project.
2/19/2009 12:20:22 AM|ralph@home|Restarting task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 using minirosetta version 158
2/19/2009 12:21:04 AM|ralph@home|Task 1ig5A_BOINC_ABRELAX_IGNORE_THE_REST-ENV10000--1ig5A-_7875_1_0 exited with zero status but no 'finished' file
2/19/2009 12:21:04 AM|ralph@home|If this happens repeatedly you may need to reset the project.
These messages are repeated many times. I'm now running at 95% CPU, in order to help pin down the cause of this problem. |
Ian_D Joined: 16 Feb 06 Posts: 16 Credit: 39,518 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=1311870
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009- 2-17 6:40:25:] :: BOINC:: Initializing ... ok.
[2009- 2-17 6:40:25:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
ERROR: ERROR: FragmentIO: could not open file cs_aa_1ji8A09_05.200_v1_3.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 245
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>
https://ralph.bakerlab.org/result.php?resultid=1311551
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009- 2-15 22:23:26:] :: BOINC:: Initializing ... ok.
[2009- 2-15 22:23:26:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_chunk_cheat_3_5.loopbuild_chunk.t326_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
ERROR: [ERROR] Error opening RBSeg file 'core_2GHRA_10_noloop_loops.txt'
ERROR:: Exit from: ..\..\src\protocols\loops\LoopClass.cc line: 443
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]> |
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
"These messages are repeated many times."
The one hint is that POSSIBLY only those tasks where you have the screen saver activate or use the graphics application ALONG with CPU throttling may be linked ... can you make note of that? |
Ian_D Joined: 16 Feb 06 Posts: 16 Credit: 39,518 RAC: 0 |
Invalid, Huh ?
https://ralph.bakerlab.org/result.php?resultid=1316610
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
[2009- 2-20 5:36:36:] :: BOINC:: Initializing ... ok.
[2009- 2-20 5:36:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/loopbuild_mamaln_ideal.loopbuild.t312_.mtyka.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 1 starting structures 3034.98 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]> |
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
"These messages are repeated many times."
Since then, I've had two 1.58 workunits complete successfully with no graphics application activation. Still running at 95% CPU. Doing the same for 1.54 over on Rosetta@home doesn't trigger the problem. In other words, the combination of all of the following triggers the problem for me: 1.58, less than 100% CPU, activating graphics after the workunit starts without them, shutting down the graphics window. Running 1.58 at less than 100% CPU, but with no graphics, doesn't trigger it for me. The problem doesn't trigger for 1.54. I haven't tested the other possibilities yet. I use an all-black screen saver these days. |
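To keep track of which combinations have been tried, the report above can be laid out as a small lookup table. This is only a sketch: the three-factor key is a simplification, and the 1.54 row assumes that "doing the same" refers to the graphics test, which the post does not state outright.

```python
from itertools import product

# Key: (app version, CPU limit below 100%, graphics window opened then closed).
# Outcomes come from the post above; the 1.54 entry rests on the assumption
# described in the lead-in.
observed = {
    ("1.58", True, True): "lockfile errors",
    ("1.58", True, False): "ok",
    ("1.54", True, True): "ok",
}

def outcome(version, throttled, graphics):
    """Return the reported outcome, or 'untested' if none has been reported."""
    return observed.get((version, throttled, graphics), "untested")

# Every combination of the three factors that has no report yet.
untested = [combo
            for combo in product(["1.54", "1.58"], [True, False], [True, False])
            if combo not in observed]

for combo in untested:
    print(combo, outcome(*combo))
```

Five of the eight combinations are still open, including 1.58 at 100% CPU with graphics, which would help separate the throttling factor from the graphics factor.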
I _ quit Joined: 13 Jan 09 Posts: 44 Credit: 88,562 RAC: 0 |
Just FYI, ALL tasks assigned to me completed ok. NO compute errors! |
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
The last few times I looked at the System Status, the File Deleter was not running. Does it need to be running more often? |
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
It looks like three tasks with the same error... and it is not one I have seen before:
ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
BOINC:: Error reading and gzipping output datafile: default.out
1325665
1325692
1325691 |
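The condition printed in that error reads like a check that a volume string is a Windows drive specifier such as "C:". A rough Python equivalent is sketched below; the function name `is_drive_volume` is made up for illustration, and the logs do not show which value of `vol_a` actually failed the check.

```python
def is_drive_volume(vol):
    """Mirror the quoted condition: exactly one ASCII letter followed by ':'."""
    return (len(vol) == 2
            and vol[0].isascii() and vol[0].isalpha()  # stand-in for std::isalpha
            and vol[1] == ":")

# A few hypothetical inputs to show what passes and what does not.
candidates = ["C:", "c:", "C", "", "C:\\", "1:", "/Users"]
results = {vol: is_drive_volume(vol) for vol in candidates}
print(results)
```

Only the two-character drive forms pass. An empty string, a bare letter, or a Unix-style path would all fail, so a workunit that hands the application a path in an unexpected form is one possible cause worth checking.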
svincent Joined: 4 Apr 08 Posts: 34 Credit: 51,768 RAC: 0 |
I've had 4 workunits on Mac OS X 10.4.11 that all failed after apparently successful completion:
</stderr_txt>
<message>
<file_xfer_error>
<file_name>homobench_natrelax_t312__8094_1_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
Workunit IDs:
1170292
1170291
1170290
1170289
It appears that in each case they had previously been sent to a Windows machine, where they failed (as noted by Paul Buck) in the manner shown below, but at the start, not at the end:
ERROR: ( vol_a.length() == 2 ) && ( std::isalpha( vol_a[ 0 ] ) ) && ( vol_a[ 1 ] == ':' )
ERROR:: Exit from: ..\..\src\utility\file\FileName.cc line: 41
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This one had a result file over 1MB. Its name is cc_2_2_mamcstmix_cen_bounded_0.1_hb_t311__IGNORE_THE_REST_1B0NA_6_8133_1_0. It also ended in under 15 hrs on my 24hr preference. Looks like it hit the 99 model limit. |
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
|
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
|
©2025 University of Washington
http://www.bakerlab.org