Message boards : RALPH@home bug list : minirosetta v1.48-1.51 bug thread
Author | Message |
---|---|
Ian_D Joined: 16 Feb 06 Posts: 16 Credit: 39,518 RAC: 0 |
Version 1.51
https://ralph.bakerlab.org/result.php?resultid=1258153
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-22 9:15:49:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Initializing options.... ok
Initializing random generators... ok
Initialization complete.
Watchdog active.
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0081E942 read attempt to address 0x00000000
Engaging BOINC Windows Runtime Debugger...
I _ quit Joined: 13 Jan 09 Posts: 44 Credit: 88,562 RAC: 0 |
I don't see that in my BOINC Manager. Thanks for the info, Evan. I found the file: in 3+ hours of run time, one task has accumulated 561 so far and the other 305. That's a pretty healthy number for 3 hours. feet1st, that looks easy enough; I may try it later. Thanks, mtyka. I'm now into the 1.50 tasks, and so far there have been no errors on any of the tasks sent to my system.
HA-SOFT, s.r.o. Joined: 19 Jan 09 Posts: 6 Credit: 19,644 RAC: 0 |
Tasks cannot be suspended; BOINC can do nothing with the process. After a few days I have about 10-15 dead Rosetta tasks with 3M of RAM allocated. If I don't kill the app, it runs until the PC restarts (on a server that is usually about 30 days, until the next MS security update and restart). Could having DEP turned on be the problem?
Ver 1.51:
BOINC:: Initializing ... ok.
[2009- 1-22 11: 0:26:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
Unhandled Exception Detected...
- Unhandled Exception Record -
In the graphics application I got:
Starting graphics application...
Setting window title to 'minirosetta version 1.51 [workunit: lr6_D_score12_rlbd_2ccv_IGNORE_THE_REST_DECOY_6840_2]'.
OpenSemaphore failure
Successfully loaded '../../projects/ralph.bakerlab.org/Helvetica.txf'...
Close event (shmem not updated) detected, shutting down.
Shutting down graphics application...
But I'm running BOINC as a service application (and connecting over RDP), and this may be the problem for the graphics.
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
This one, 1256978, has failed for the second time. Maybe it's the same problem as reported by Zdenek Vasku:
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-22 10:30:57:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Initializing options.... ok
Initializing random generators... ok
Initialization complete.
Watchdog active.
ERROR: unknown model name: 1DK8A_1
ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrcprotocols/abinitio/PairingStatistics.hh line: 170
called boinc_finish
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This task is running v1.51. It seems a bit on the large side in all dimensions. It's been running for 6 hours but is only on model 2, step 900,000! Its peak memory usage so far was 430 MB. That's probably what you were expecting for such a large protein, but it definitely needs a high-memory flag. I believe the memory usage of tasks is reported back with scheduler requests. Does the project process this data and query it for anomalies in memory usage? Or do you need us to report such things?
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This task ran to my 24-hour runtime preference on v1.48. But it reported 99 starting structures (not the usual "1") and 98 decoys. So what happened to the last one?
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
|
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
Should the name of this thread be edited to include 1.51, or should a new thread be started for 1.51?
robertmiles Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
Both copies of this test_cc 1.50 workunit failed quickly: https://ralph.bakerlab.org/workunit.php?wuid=1105459 |
Evan Joined: 23 Dec 07 Posts: 75 Credit: 69,584 RAC: 0 |
The mammoths have not been having a happy time. I have had a run of 23 failures (1st time), including:
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0081E942 read attempt to address 0x00000000
and
ERROR: unknown model name: 1B0NA_10
ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrcprotocols/abinitio/PairingStatistics.hh line: 170
called boinc_finish
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
That was the step shown in the graphic (to the right of the model number). I've just never seen such a large step number.
Snagletooth Joined: 4 May 07 Posts: 67 Credit: 134,427 RAC: 0 |
One more with v1.50: test_cc_1_8_nocst4_hb_t367__IGNORE_THE_REST_1UFBA_5_6830_3
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Watchdog active.
# cpu_run_time_pref: 14400
ERROR: target_strands.size()
ERROR:: Exit from: src/protocols/abinitio/TemplateJumpSetup.cc line: 94
called boinc_finish
And one with v1.51: test_cc2_1_8_mammoth_mix_cen_cst_hb_t311__IGNORE_THE_REST_1PERL_6_6852_1
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-22 11:56:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Initializing options.... ok
Initializing random generators... ok
Initialization complete.
Watchdog active.
ERROR: unknown model name: 1B0NA_10
ERROR:: Exit from: src/protocols/abinitio/PairingStatistics.hh line: 170
called boinc_finish
Snags
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I tried loading up the docking task I got; it is v1.51. I displayed the graphic just as the task was starting. I waited and waited... and finally realized it was using more CPU than the thread working on the protein! I double-checked the Ralph settings for % of CPU for the graphic: it is set to the default, which is 10%. Here's my screenshot showing the graphic monopolizing one core while the two running tasks compete for the other. Net result: nothing shown in the graphic after several minutes, and the graphic thread consuming much more than 10% of CPU.
[edit] I gave up on it, captured the screen, uploaded the screenshot and reported it here... then when I opened the graphic a second time it was better behaved. It was not overusing CPU... but it was essentially unusable. Whenever I tried to resize or rotate the images, it wouldn't respond for about 30 seconds.
Ian_D Joined: 16 Feb 06 Posts: 16 Credit: 39,518 RAC: 0 |
application version 1.51
https://ralph.bakerlab.org/result.php?resultid=1258676
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-22 19: 2:51:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Initializing options.... ok
Initializing random generators... ok
Initialization complete.
Watchdog active.
ERROR: unknown model name: 1DK8A_1
ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrcprotocols/abinitio/PairingStatistics.hh line: 170
called boinc_finish
</stderr_txt>
]]>
Paul D. Buck Joined: 14 Jan 09 Posts: 62 Credit: 33,293 RAC: 0 |
Quote: Otherwise this release has added debug information to let me figure out where all this stuff is failing. Believe me guys, we're now in the land where I cannot reproduce these errors here whatsoever. Not on the Linux boxes, Mac boxes or Windows boxes we have. Nowhere. Why these remaining segfaults occur is a total mystery to me, so please bear with me. This is going to be incredibly difficult to track down.
Questions I ask myself: Are they happening in the same module? Are they happening because of the same type of activity? What is common to all the events? Could it be something external? I know BOINC is supposed to keep things isolated, and that application A is not supposed to affect application B... but I have seen enough odd things that I am not convinced this is completely so...
As to the tasks that fail... you say they don't fail for you... Why not grab the set and issue them to us with high replication counts... then if they fail for everyone, that tells us one thing... if they only fail for some and not others... that tells us something else...
To my mind, the cases that fail are the ones you should be saving and using as your issue tasks for each round of testing. Not to keep making new tasks in all cases, but to use the tasks that have proven their ability to cause a problem.
Just some things to consider... oh, and you are out of work again...
HA-SOFT, s.r.o. Joined: 19 Jan 09 Posts: 6 Credit: 19,644 RAC: 0 |
DEP: Data Execution Prevention, the execution protection on newer Intel and AMD cores. How can I turn the graphics off?
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
v1.51, on task test_cc2_1_8_mammoth_mix_cen_cst_hb_t327__IGNORE_THE_REST_2F2EA_7_6860_1_1:
ERROR: unknown model name: 2FRHA_10
ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrcprotocols/abinitio/PairingStatistics.hh line: 170
called boinc_finish
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I've seen a number of other reports of the screen saver/graphic just displaying as a black window. I had always assumed people just weren't waiting long enough for the display to refresh. It often takes a long time, and I always assumed this was due to the allowed % of CPU time for the graphic. On the other hand, this wasn't an issue until about 2 or 3 months ago.
I meant to point out that the screenshot shows the graphic thread has used 4:02 of CPU time, while the corresponding Ralph thread has received only 2:11 of CPU so far. The % CPU shown in the screenshot covers only the last interval, but you can see from the totals that it is roughly what it has been during the entire 4 minutes the task has been running.
I was wondering whether the graphic could always display at least its grid lines and text immediately, perhaps with a "protein information being retrieved... please wait" message in the frames. That way it would never just be "blank".
I _ quit Joined: 13 Jan 09 Posts: 44 Credit: 88,562 RAC: 0 |
Cool idea...I would love to see that as well. |
Snagletooth Joined: 4 May 07 Posts: 67 Credit: 134,427 RAC: 0 |
test_cc_1_8_nocst4_hb_t327__IGNORE_THE_REST_2FSWA_6_6888_1 and test_cc_1_8_nocst4_hb_t327__IGNORE_THE_REST_2F2EA_10_6888_1_1 both ended with:
ERROR: unknown model name: 2FRHA_10
ERROR:: Exit from: src/protocols/abinitio/PairingStatistics.hh line: 170
This time they ran for a few minutes instead of a few seconds. They claimed to be done instead of declaring a compute error, and they received a validate error instead of a client error.
Snags
©2024 University of Washington
http://www.bakerlab.org