Posts by Trotador

11) Message boards : Number crunching : High memory usage by v4.07 task rb_03_24_16_25__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_20741 (Message 6519)
Posted 3 Apr 2018 by Trotador
Post:
I'm seeing many units using over 1 GB of RAM, many up to 1.2 GB, and one now up to 1.6 GB.
12) Message boards : RALPH@home bug list : Rosetta_beta 4.0+ (Message 6513)
Posted 28 Mar 2018 by Trotador
Post:
4281026

File: C:\cygwin\home\boinc\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)


Several of those ones as well.


A lot of them actually
13) Message boards : RALPH@home bug list : Rosetta_beta 4.0+ (Message 6512)
Posted 27 Mar 2018 by Trotador
Post:
4281026

File: C:\cygwin\home\boinc\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)


Several of those ones as well.
14) Message boards : RALPH@home bug list : Rosetta_beta 4.0+ (Message 6239)
Posted 21 Nov 2017 by Trotador
Post:
It seems that there are lots of units to download, but I can't download any on my hosts (Linux). Is it only for Windows?
15) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6074)
Posted 23 Mar 2016 by Trotador
Post:
All WUs continue to error out on Linux. W7 seems OK; the rest of Windows is a mix of failure and success.
16) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6061)
Posted 19 Mar 2016 by Trotador
Post:
All units erroring in all my Linux hosts:

Some of the WUs fail after finishing crunching OK, with this error (these WUs were downloaded yesterday):

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>des5ralph_design5_hydrophobic32_test1_buriedtrp_S_0095_SAVE_ALL_OUT_20313_229_0_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>

Others fail after several hours, or after restarting BOINC, reporting 0 seconds of computed time, with this error (these ones were downloaded today):

ERROR: ERROR: Option matching -cyclic_peptide:user_set_alph_dihedral_perturbation not found in command line top-level context

I'm seeing that most of the Windows hosts seem to finish the WU OK and report success, but that is not conclusive.

Stopping crunching until I know more.


17) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6060)
Posted 18 Mar 2016 by Trotador
Post:
On one of my hosts, all "des5ralph_design5" units are failing after finishing crunching OK, with:

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>des5ralph_design5_hydrophobic32_test1_buriedtrp_S_0095_SAVE_ALL_OUT_20313_229_0_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>

This host has a processing time above the default; all units crunched for 9-12 hours and generated a lot of decoys, but ended with this error.

Wingmen crunching for just an hour and generating few decoys are uploading OK.
18) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6051)
Posted 13 Feb 2016 by Trotador
Post:
The current Ralph WUs use huge amounts of RAM; I've seen up to 4 GB per unit. Is it on purpose? Is it a new kind of simulation?

thanks for the info




Yes, I'm running a test of a new type of job that runs small perturbations of the protein backbone and then does a round of design. The design protocol can use a lot of memory. I realize that this will be problematic and will see if we can distribute these jobs to high memory machines. We may just not be able to run these on R@h.



I've crunched a lot of these backrub units; they are tough due to the large memory requirements. It is necessary to limit the number of units crunched simultaneously, and they need a lot of babysitting, but it is also fun :).

Most of them don't usually go over 4 GB, but I got half a dozen reaching almost 7 GB on the same host. It has 32 GB but also 72 threads :). In short, it stalled for lack of memory, so I finally had to abort them, plus a few more because they were close to missing the deadline.
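
As a rough illustration of the arithmetic behind limiting simultaneous units, here is a minimal sketch in Python. The function name and the 2 GB set aside for the operating system are assumptions made for this example; they are not BOINC or Ralph settings.

```python
def max_concurrent_units(total_ram_gb, per_unit_gb, threads, reserved_gb=2.0):
    """Estimate how many units can run at once without exhausting RAM.

    reserved_gb is a hypothetical allowance for the OS and other
    processes; it is an assumption, not a value taken from BOINC.
    """
    if per_unit_gb <= 0:
        raise ValueError("per_unit_gb must be positive")
    usable_gb = max(total_ram_gb - reserved_gb, 0.0)
    by_memory = int(usable_gb // per_unit_gb)
    # Never run more units than there are threads.
    return min(threads, by_memory)


# Figures from the post: 32 GB of RAM, 72 threads,
# units typically around 4 GB and occasionally close to 7 GB.
typical = max_concurrent_units(32, 4.0, 72)
worst_case = max_concurrent_units(32, 7.0, 72)
print(typical, worst_case)
```

Under these assumptions the host could run about 7 units at 4 GB each, or 4 units at 7 GB each, so most of the 72 threads would have to stay idle.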



19) Message boards : RALPH@home bug list : Win10 3.71 Unhandled Exception: Reason: Out Of Memory (backrub) (Message 6047)
Posted 7 Feb 2016 by Trotador
Post:
Quote from thread http://ralph.bakerlab.org/forum_thread.php?id=567

Trotador
The current Ralph WUs use huge amounts of RAM; I've seen up to 4 GB per unit. Is it on purpose? Is it a new kind of simulation?

thanks for the info


Dekim

Yes, I'm running a test of a new type of job that runs small perturbations of the protein backbone and then does a round of design. The design protocol can use a lot of memory. I realize that this will be problematic and will see if we can distribute these jobs to high memory machines. We may just not be able to run these on R@h.
20) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6038)
Posted 4 Feb 2016 by Trotador
Post:
The current Ralph WUs use huge amounts of RAM; I've seen up to 4 GB per unit. Is it on purpose? Is it a new kind of simulation?

thanks for the info




Yes, I'm running a test of a new type of job that runs small perturbations of the protein backbone and then does a round of design. The design protocol can use a lot of memory. I realize that this will be problematic and will see if we can distribute these jobs to high memory machines. We may just not be able to run these on R@h.


R@h means Rosetta, doesn't it? It is good for the investigation that you could look for ways of distributing these units. The only effective way I can think of is limiting the number of units downloaded, either by the user, as in the CEP project in WCG, or by the project. Distributing them only to hosts with a lot of memory might not be enough if the hosts also have a lot of available threads (like mine).

A good thing I'm seeing with these units is that BOINC/Ralph seems to take into account the amount of available system memory and limits the memory used, even limiting the number of units in execution to below the number of available threads, so the system does not stall and hang as used to happen in these cases. Is that correct?

21) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6035)
Posted 4 Feb 2016 by Trotador
Post:
The current Ralph WUs use huge amounts of RAM; I've seen up to 4 GB per unit. Is it on purpose? Is it a new kind of simulation?

thanks for the info

22) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 5905)
Posted 10 Oct 2015 by Trotador
Post:
I just updated the 64bit linux app. Please let me know how it goes as there might still be compatibility issues with the glut, GL, GLU libraries etc...


3.65 is working, great!
23) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 5897)
Posted 9 Oct 2015 by Trotador
Post:
In Linux libglut.so.3 errors all over the place, perhaps looking in wrong directory?


me too :(
24) Message boards : RALPH@home bug list : minirosetta beta 3.50-3.52 apps (Message 5824)
Posted 5 Mar 2015 by Trotador
Post:
Validate error for all tasks returned today.

It looks like the problem is on the server side; units crunch OK.

Suspended until solved.
25) Message boards : RALPH@home bug list : RosettaMini Beta 3.24 (Message 5491)
Posted 31 Mar 2012 by Trotador
Post:
Same here, Windows 7 and Linux, both 64-bit. All crashed with that same error.

regards
26) Message boards : RALPH@home bug list : Rosetta@Home Version 3.23 (Message 5473)
Posted 9 Mar 2012 by Trotador
Post:
Had this error on a 6 hour preference, it was completed by another volunteer with a much shorter runtime preference

See 2607010

ERROR: Fatal SOGFunc_Impl error.
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\SOGFunc_Impl.cc line: 181
BOINC:: Error reading and gzipping output datafile: default.out

Conan


Around twenty of mine have errored with this same message; all the wingmen but one have also failed so far.

It is a system with Windows 7 64-bit, BOINC Manager 6.10.58.
27) Message boards : Number crunching : Milestones (Message 5454)
Posted 3 Feb 2012 by Trotador
Post:
Congratulations Conan! I take off my hat

regards
28) Message boards : RALPH@home bug list : Rosetta mini 3.18 (Message 5417)
Posted 8 Nov 2011 by Trotador
Post:
2KZU_... units are erroring just at the start

ERROR: in::file::boinc_wu_zip 4-boinc-submit/2KZU_chromodomain.zip does not exist!
ERROR:: Exit from: src/apps/public/boinc/minirosetta.cc line: 168
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


A problem with the names of the files.
29) Message boards : RALPH@home bug list : Rosetta mini 3.18 (Message 5414)
Posted 4 Nov 2011 by Trotador
Post:
Hi

Many validation errors today, around 90 out of 320 units. Most of them finish in less than 100-200 seconds, with a few of them reaching 600 or 1000 seconds. So far all the wingmen have also failed on these workunits, both on Linux and on W7.

Regarding the extra-long units with very low scores, I think they are all TO538... It happened first with the beta 3.17 (TBC), and the subsequent releases behave the same way. Lately I tend to abort them.

regards

Edit: I've noticed that many units with validation errors are over 200 seconds
30) Message boards : Number crunching : Can't report work, Server Error : Can't attach shared memory (Message 5249)
Posted 31 Mar 2011 by Trotador
Post:
Hi

I'm now in this situation: I haven't been able to upload finished WUs for two days, and I receive this same message.

regards


©2024 University of Washington
http://www.bakerlab.org