RALPH@home

Ralph v1.42


Message boards : RALPH@home bug list : Ralph v1.42

James
Forum moderator
Project developer
Project scientist

Joined: Jun 22 06
Posts: 19
ID: 1548
Credit: 278
RAC: 0
Message 4357 - Posted 25 Nov 2008 9:28:29 UTC

    I've just updated Ralph to minirosetta version 1.42. Please post about bugs/issues here. Cheers,

    James


    ____________

    Profile feet1st

    Joined: Mar 7 06
    Posts: 312
    ID: 1028
    Credit: 110,522
    RAC: 0
    Message 4359 - Posted 25 Nov 2008 13:56:42 UTC

      What bugs/issues do you feel you have resolved?

      Are there steps we can take to test any of the frequently encountered problems seen on Rosetta?
      ____________

      Profile Conan

      Joined: Feb 16 06
      Posts: 344
      ID: 145
      Credit: 1,309,534
      RAC: 0
      Message 4361 - Posted 26 Nov 2008 10:33:35 UTC

        Another Validate error on this WU

        I also have one that has run 2 hours past its preference time and seems to be stuck at 9.57 minutes to go. Progress goes up by 0.001 percent about every 10 seconds, but the time remaining has not decreased in the last 2 hours.

        This is similar to a problem from about 2 application versions ago. I doubt it will get to 100% by the time the Watchdog terminates the WU in about 1 hour's time.
        ____________

        Profile Conan

        Joined: Feb 16 06
        Posts: 344
        ID: 145
        Credit: 1,309,534
        RAC: 0
        Message 4363 - Posted 27 Nov 2008 1:20:28 UTC

          More "Validate Errors" on 1185665
          1185675
          1186564
          1186565
          1186810
          1186813
          1186824

          All ran to completion but got no credit; lots of wasted time on this lot.
          ____________

          Profile Conan
          Avatar

          Joined: Feb 16 06
          Posts: 344
          ID: 145
          Credit: 1,309,534
          RAC: 0
          Message 4364 - Posted 27 Nov 2008 1:23:05 UTC

            "Compute Errors" on these two 1187124 and 1187126

            <message>
            process exited with code 1 (0x1, -255)
            </message>
            <stderr_txt>
            recovering checkpoint of tag S_1VL7A_12_00000001 with id abrelax_rg_state

            ERROR: Loops::add_loop error -- overlapping loop regions
            existing loop begin/end: 31/40
            new loop begin/end: 40/54
            ERROR:: Exit from: src/protocols/loops/LoopClass.cc line: 232
            called boinc_finish
            ____________

            James
            Forum moderator
            Project developer
            Project scientist

            Joined: Jun 22 06
            Posts: 19
            ID: 1548
            Credit: 278
            RAC: 0
            Message 4365 - Posted 27 Nov 2008 3:54:25 UTC

              Last modified: 27 Nov 2008 6:03:12 UTC

              feet1st: I've solicited information from the developers in our lab about what went into our update of Ralph v1.42, and I'll let some of the other project developers speak for themselves.

              Here are two update messages from our protein interface design team:
              - "We've revised the way that filters for protein-interface design runs are executed and reported in minirosetta 1.42. This will mean shorter run times for the Rosetta @ Home participants with more meaningful output for us."

              - "On the design front, we've made an effort to significantly reduce the memory overhead associated with design, so users with less RAM should be able to run these tasks without it bringing their system to a halt. However, design calculations are by nature very memory-intensive. Therefore, we have also restricted all design WUs to machines that allocate no less than 512MB for Rosetta @ Home."

              Here's a list of updates from our structure prediction team:
              "Constraints: Can now specify multiple constraints.
              Can now specify separate constraints for centroid/fullatom

              Silent files now in AbrelaxApplication (FoldCST)

              Relax: A bunch of previously hardcoded parameters in Relax are now parameters.

              Native constraints to keep natives from drifting away.
              Start structure constraints with Loop Selection to restrain homology modelling cores from drifting too far.

              BugFix in Gaussian Constraints.

              Added a Pose Recombiner Mode that allows proteins to be spliced together.

              Job Distribution: Added a shuffle mode which allows us to run the large scale relax benchmark without destroying the BOINC database by flooding it with millions of command lines."



              Conan - those loop boundary errors were input errors by the person who submitted those workunits. The validate errors are the result of a newly added format that's not yet supported by the BOINC server, and we'll have to update our server code to deal with it over the weekend. That slow workunit bug looks like something that we fixed several months ago; we've alerted the person who submitted those jobs and he's looking into them.

              Thanks for crunching, and from those of us in America - Happy Thanksgiving!

              Cheers,

              James
              ____________

              olange

              Joined: Nov 27 08
              Posts: 2
              ID: 4981
              Credit: 0
              RAC: 0
              Message 4366 - Posted 27 Nov 2008 6:45:07 UTC - in response to Message 4364.

                Hi Conan,

                the loop errors are indeed an error in the input data. This shows why the ralph project is so useful to us. Without ralph I wouldn't have been able to spot these errors before running the jobs on a bigger scale on boinc.

                For the current project - development of a general and automatable comparative modelling machinery - we have ca. 40 target proteins, each coming with 200 alignments to homologous proteins. These homologous proteins are somewhat similar to the target and hence provide valuable structural clues; however, some parts are wrong and other parts are missing. A typical strategy is to rebuild everything that is missing plus a couple of residues around that region.
                We are curious, however, whether we can improve on that by also rebuilding other parts of the aligned regions, since these can be quite far from the target structure.
                Right now we are trying to find out where exactly we should strike the balance between rebuilding and copying the homologous structure. This requires scanning a range of cutoffs. This is done by a script that creates loop files, which encode exactly what has to be rebuilt and what shall be kept rigid.
                The script generated 40*200*10 = 80,000 files. A lot of files. Due to a bug in the script, however, some contained subtle errors. I checked a handful of input conditions on our local machines and they were fine. So I went ahead and checked a larger number of input conditions on ralph. This revealed errors in some cases and thus valuable information to revise the script.
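
                As an illustration only, here is a minimal sketch of the kind of pre-submission check that could catch overlapping loop regions in generated loop definitions. It is not the actual script or the Rosetta check; the function name `find_overlaps` and the (begin, end) tuple representation are assumptions made for this example. It treats loop ends as inclusive, so that 31/40 and 40/54 from the error reported earlier in this thread count as overlapping.

                ```python
                from typing import List, Tuple

                # Hypothetical representation: a loop is (begin, end), both inclusive.
                Loop = Tuple[int, int]


                def find_overlaps(loops: List[Loop]) -> List[Tuple[Loop, Loop]]:
                    """Return (earlier, later) pairs for each loop that overlaps an earlier one.

                    Loops are sorted by begin residue. Each loop is compared against the
                    earlier loop that reaches furthest, so nested loops are caught too.
                    """
                    clashes = []
                    widest = None  # earlier loop with the largest end residue so far
                    for loop in sorted(loops):
                        if widest is not None and loop[0] <= widest[1]:
                            clashes.append((widest, loop))
                        if widest is None or loop[1] > widest[1]:
                            widest = loop
                    return clashes


                if __name__ == "__main__":
                    # The pair from the reported error shares residue 40.
                    for earlier, later in find_overlaps([(31, 40), (40, 54)]):
                        print(f"overlap: {earlier[0]}/{earlier[1]} and {later[0]}/{later[1]}")
                ```

                Running a check like this over every generated file before submission would flag such cases locally, although it only covers this one class of input error.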

                Thank you for your interest and your help crunching for our science,
                Oliver

                Profile feet1st

                Joined: Mar 7 06
                Posts: 312
                ID: 1028
                Credit: 110,522
                RAC: 0
                Message 4367 - Posted 27 Nov 2008 16:23:17 UTC

                  These details are what will be needed to release on Rosetta, right? So, if we're testing here, let's be testing the descriptions as well, and if they don't make sense to lay-people, or whatever, we can work through those kinks as well.

                  Those descriptions are great, but perhaps the summarized version that will appear in the Rosetta news box would be good too: "Brought memory usage down for tasks that were previously using more than normal, reduced per-model runtimes for most of the previously long-running models resulting in more consistent runtimes, additional refinements to the modeling logic...more details here"

                  What about the problem where suspended tasks keep running? I've not seen it occur here, but is that because it has been addressed? Or, luck of the task draw?
                  ____________

                  Profile Conan

                  Joined: Feb 16 06
                  Posts: 344
                  ID: 145
                  Credit: 1,309,534
                  RAC: 0
                  Message 4368 - Posted 29 Nov 2008 7:41:06 UTC - in response to Message 4367.

                    *** G'Day feet1st,
                    I have often had the problem where both Ralph and Rosetta keep running even though Boinc Manager has switched the jobs and shows different tasks running.

                    I have a 4 core computer and the other night had 7 tasks running. This has happened a number of times.

                    If I stop and restart Boinc all is back to normal. If I let them run then it keeps happening till all the started jobs finish.

                    Can even happen when say Ralph and Cosmology are running together as well.

                    It is usually Ralph that has been doing this but I noticed Rosetta do it with Ralph only two nights ago.


                    *** G'Day to you James
                    and thanks for the follow up information, we then at least know we are helping. So if you know about the Validate errors then I will just say I did have another 7 of these (WU's 1186686, 1186687, 1186763, 1186909, 1186910 and 1186916).

                    *** G'Day olange,
                    Thanks for the feedback. Not everyone processing work for this project reports problems, so I like to report what I find. The other testers who report not only help you but also help me know when problems are occurring.



                    This WU did not do anything when it started. Unknown how long it was running before I realised it was not doing much as Boinc Manager said it was running but no cpu time showed and no percent done had happened. I aborted the WU.


                    I also have had three work units over the past week request access to my trusted zone and also internet access.
                    I have allowed these 3 requests but after allowing them the Work Units then error out anyway.
                    See 1187543
                    1187544
                    1187570

                    Trust this all helps.
                    ____________

                    mtyka
                    Forum moderator
                    Project developer
                    Project scientist

                    Joined: Mar 19 08
                    Posts: 79
                    ID: 4144
                    Credit: 0
                    RAC: 0
                    Message 4369 - Posted 29 Nov 2008 22:53:55 UTC

                      Ok, I've just checked in a fix for

                      a) the validator errors.
                      b) the checkpoint errors.
                      c) NANs in hbonding

                      This version will go out tomorrow onto ralph.
                      In total this version should address the following bugs (as far as I'm aware):

                      - Excessive memory usage (design team)
                      - Long running jobs (design team)
                      - Validator errors
                      - Checkpoint errors
                      - NANs in hbonding
                      - Restarting jobs (there's finer checkpointing now in relaxmode)

                      Mike


                      Copyright © 2017 University of Washington

                      Last Modified: 20 Nov 2008 19:41:56 UTC