Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83
Author | Message |
---|---|
Snagletooth Send message Joined: 4 May 07 Posts: 67 Credit: 134,427 RAC: 0 |
I'm getting quick client/computer errors for the backrub_design tasks. From the stderr out: minirosetta_3.71_x86_64-apple-darwin(50310,0x7fff732a2300) malloc: *** error for object 0x4b4fc3ef02e87d9a: pointer being freed was not allocated. Also, gaurav_rsmn_0161_65_daa2_2_SAVE_ALL_OUT_20296_50_0 is claiming a file transfer error: # cpu_run_time_pref: 14400. Are those results truly lost? |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
I'm getting quick client/computer errors for the backrub_design tasks. From the stderr out: I'm not sure what is causing the backrub error, but the gaurav jobs have a filter that may sometimes remove all models, so the result is as expected for that test. I think the filter has been updated so that at least 1 model is generated in the next test batch, but I'm not sure. Vikram, who is submitting those jobs, is testing this. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 905 Credit: 1,892,541 RAC: 294 |
This first kind of Android WUs ("simple_cycpep_predict_") seems to be OK on my smartphone. Now I'm downloading a new type: "db_design5_". |
Trotador Send message Joined: 7 May 10 Posts: 33 Credit: 14,751,677 RAC: 0 |
The current Ralph WUs use huge amounts of RAM, I've seen up to 4 GB per unit. Is it on purpose? Any new kind of simulation? I've crunched a lot of these backrub units; they are tough due to the large memory requirements. It is necessary to limit the number of units being crunched simultaneously, and it takes a lot of babysitting, but it is also fun :). Most of them don't usually go over 4 GB, but I got half a dozen reaching almost 7 GB on the same host. It has 32 GB but also 72 threads :). In short, it stalled for lack of memory, so I finally had to abort them, plus a few more because they were nearly over the deadline. |
siunik Send message Joined: 16 Mar 16 Posts: 1 Credit: 0 RAC: 0 |
Yeah me too.. Don't understand. |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
I just updated the minirosetta_beta application to 3.72. The 32 bit linux version has not been updated yet due to some memory issues while compiling. I hope to have it available soon. |
Dr. Merkwürdigliebe Send message Joined: 12 Jun 15 Posts: 16 Credit: 23,473 RAC: 0 |
Just a short question: Why does ralph@home also download minirosetta_3.71? |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 905 Credit: 1,892,541 RAC: 294 |
Some memory errors on my Win10 machine: 3752038 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x015FC9A4 read attempt to address 0x2F551088; 3752039 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x015FCA02 read attempt to address 0x30A68058; 3752805 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x015FCA02 read attempt to address 0x2194F048 |
Dr. Merkwürdigliebe Send message Joined: 12 Jun 15 Posts: 16 Credit: 23,473 RAC: 0 |
|
Trotador Send message Joined: 7 May 10 Posts: 33 Credit: 14,751,677 RAC: 0 |
On one of my hosts, all "des5ralph_design5" units are failing after finishing crunching OK, with: </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>des5ralph_design5_hydrophobic32_test1_buriedtrp_S_0095_SAVE_ALL_OUT_20313_229_0_0</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]> This host has a processing time above the default; all units have been crunched for 9-12 hours and generated a lot of decoys, but end with this error. Wingmen crunching for just an hour and generating few decoys are uploading OK. |
Trotador Send message Joined: 7 May 10 Posts: 33 Credit: 14,751,677 RAC: 0 |
All units are erroring on all my Linux hosts. Some of the WUs fail after finishing crunching OK with this error (these WUs were downloaded yesterday): </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>des5ralph_design5_hydrophobic32_test1_buriedtrp_S_0095_SAVE_ALL_OUT_20313_229_0_0</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error> </message> ]]> Others fail after several hours, or after restarting BOINC, reporting 0 seconds of computed time with this error (these were downloaded today): ERROR: ERROR: Option matching -cyclic_peptide:user_set_alph_dihedral_perturbation not found in command line top-level context I'm seeing that most of the Windows hosts seem to finish the WU OK and report success, but that is not conclusive. I'm stopping crunching until I know more. |
BlisteringSheep Send message Joined: 3 Nov 15 Posts: 4 Credit: 2,231,667 RAC: 8 |
With 3.72, no successful work units on any Linux hosts. Tested across multiple distributions (all 64-bit). They are running to completion, but then reporting output file missing. |
robertmiles Send message Joined: 13 Jan 09 Posts: 103 Credit: 331,865 RAC: 0 |
These workunits gave a computation error at about the same time that a workunit from another BOINC project reached a point with a rather high memory demand - over a gigabyte. So they might be due to running out of memory, rather than anything else. https://ralph.bakerlab.org/result.php?resultid=3762275 https://ralph.bakerlab.org/result.php?resultid=3761810 https://ralph.bakerlab.org/result.php?resultid=3761801 https://ralph.bakerlab.org/result.php?resultid=3757576 https://ralph.bakerlab.org/result.php?resultid=3756003 However, my other computer running BOINC rarely runs out of memory, and gave a different error for some recent workunits. https://ralph.bakerlab.org/result.php?resultid=3757706 https://ralph.bakerlab.org/result.php?resultid=3753036 https://ralph.bakerlab.org/result.php?resultid=3752853 The application was shown as Rosetta Mini Beta, with no version number I could find after the workunits finished. The second computer shows three workunits that may be this type, still marked as version 3.72 while still on the computer. https://ralph.bakerlab.org/result.php?resultid=3763701 https://ralph.bakerlab.org/result.php?resultid=3762417 https://ralph.bakerlab.org/result.php?resultid=3763972 I've already looked into adding more memory for each of my computers that run BOINC. Their motherboards are not compatible with adding more. |
keputnam Send message Joined: 17 Feb 06 Posts: 2 Credit: 48,278 RAC: 0 |
Add me to the "no more till it's fixed" list: four WUs, 0 successes, and 14 more stacked up that I will abort. |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 905 Credit: 1,892,541 RAC: 294 |
I've already looked into adding more memory for each of my computers that run BOINC. Their motherboards are not compatible with adding more. My 6-core machine has 16 GB of RAM and I also have WU failures. I think it's not a question of "how much" memory; it seems to be an allocation problem. A 3.73 version would be welcome! P.S. 3.72 uses from 40 to 90 MB of RAM on my machines.... |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 905 Credit: 1,892,541 RAC: 294 |
Strange behaviour. Some WUs fail after a few minutes, others after 2 hours.... |
Mad_Max Send message Joined: 15 Nov 12 Posts: 15 Credit: 404,700 RAC: 0 |
Same here. A LOT of random WU crashes on v3.72. Different hosts, different CPUs (4/6/8 cores), different OSes (Win 7 x64 and WinXP x32) - all getting a lot of failed WUs with "Unhandled Exception Detected..." in the logs. |
Snagletooth Send message Joined: 4 May 07 Posts: 67 Credit: 134,427 RAC: 0 |
So far all "des5ralph_design5" tasks have failed and two of the three currently processing are exhibiting some curious behavior. Those that failed ended with: std::cerr: Exception was thrown: My target runtime is four hours. All of the tasks currently processing have exceeded that by two, eight and twenty-seven hours. According to the properties tab no checkpoints have been taken. I have confirmed via the computers' Activity Managers that all tasks are currently using the cpu. In the stderr out of the tasks that failed the lines "Starting watchdog...Watchdog active." do appear so presumably the watchdog is set but not working in the tasks I'm running now. Even more curious, two of the tasks on two different machines, with different versions of the Mac OS and different versions of BOINC, are recording elapsed times of less than the cpu times. Even my usually creative imagination is stumped by this. It seems fairly obvious that these tasks will have to be aborted but I'll hold off a bit in case anyone has any questions or DEK wants to try and retrieve a file for closer examination. Best, Snags |
Conan Send message Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
With 3.72, no successful work units on any Linux hosts. Tested across multiple distributions (all 64-bit). They are running to completion, but then reporting output file missing. I am seeing the same thing, NO successful work units at all. Most run to completion (for me that is a 6 hour run time) but a number are also failing in less than an hour. This is on a 64 bit Linux host. Conan |
[VENETO] boboviz Send message Joined: 9 Apr 08 Posts: 905 Credit: 1,892,541 RAC: 294 |
An error also with the T0599_ batch, wu 3322377 - Unhandled Exception Record - |
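
A rough illustration of the memory limit Trotador describes earlier in the thread (a 32 GB host with 72 threads running backrub tasks that peak at roughly 4 to 7 GB each): the sketch below estimates how many such tasks fit in RAM at once. The function name and the `reserve_gb` headroom for the OS and other processes are assumptions made for illustration; they do not come from BOINC or the project. If the estimate is far below the thread count, as it is here, limiting concurrency by hand (for example with the `max_concurrent` setting in BOINC's `app_config.xml`) is probably worth checking.

```python
def max_concurrent_tasks(total_ram_gb, peak_task_gb, threads, reserve_gb=2.0):
    """Estimate how many tasks fit in RAM at once, capped by the thread count.

    reserve_gb is a hypothetical allowance for the OS and other processes.
    """
    if peak_task_gb <= 0:
        raise ValueError("peak_task_gb must be positive")
    usable_gb = max(total_ram_gb - reserve_gb, 0.0)
    # Whole tasks only: a task that does not fully fit would push the host into swap.
    fits_in_ram = int(usable_gb // peak_task_gb)
    return min(threads, fits_in_ram)


if __name__ == "__main__":
    # Figures reported in the thread: 32 GB host, 72 threads, 4-7 GB peak per task.
    for peak_gb in (4.0, 7.0):
        limit = max_concurrent_tasks(32, peak_gb, 72)
        print(f"peak {peak_gb} GB per task -> about {limit} tasks at once")
```

With these inputs the estimate is in the single digits, far below 72 threads, which is consistent with the stall reported on that host.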
©2024 University of Washington
http://www.bakerlab.org