RALPH@home

minirosetta v1.43 bug thread

James
Forum moderator
Project developer
Project scientist

Joined: Jun 22 06
Posts: 19
ID: 1548
Credit: 278
RAC: 0
Message 4370 - Posted 1 Dec 2008 6:08:47 UTC

    This is a minor update to v1.42 which was posted earlier last week, and contains the following fixes:

    - Excessive memory usage and long running jobs - jobs submitted with minirosetta v1.43 shouldn't have the same problems with memory and runtimes as earlier versions.
    - Validator errors - there was a small bug in v1.42 that resulted in results being called invalid by the BOINC server. This is now fixed.
    - Checkpoint errors and restarting jobs - we have finer-grained checkpointing in our full-atom refinement mode, which means that there should be fewer errors and less wasted time.
    - NaNs in hbonding - we have a more aggressive fix that tests for the NaN condition and continues more gracefully. This has been a tricky bug to track down, but we think that this is a big step forward.
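
For readers wondering what "tests for the NaN condition and continues" can look like in practice, here is a minimal Python sketch. It is not the minirosetta code (which is not shown in this thread); the function name `safe_total_energy` and the idea of summing a flat list of energy terms are assumptions made purely for illustration.

```python
import math

def safe_total_energy(terms):
    """Sum energy terms, skipping any that are NaN.

    Hypothetical helper for illustration only.
    Returns (total, number_of_skipped_terms).
    """
    total = 0.0
    skipped = 0
    for value in terms:
        if math.isnan(value):
            # Do not let a single NaN poison the whole sum; count it and move on.
            skipped += 1
            continue
        total += value
    return total, skipped

total, skipped = safe_total_energy([-1.5, float("nan"), -0.5])
print(total, skipped)  # -2.0 1
```

Counting the skipped terms, rather than silently dropping them, keeps the problem visible in logs while the job carries on.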

    Please post bugs to this thread, and thank you very much for your patience. Cheers,

    James


    ____________

    mtyka
    Forum moderator
    Project developer
    Project scientist

    Joined: Mar 19 08
    Posts: 79
    ID: 4144
    Credit: 0
    RAC: 0
    Message 4372 - Posted 3 Dec 2008 0:29:11 UTC

      We've loaded a whole bunch of stuff onto the queue and things are looking good from our point of view. Most of the errors we're seeing are download errors. This is typical when lots of clients try to get the new apps at once, and it should ease up shortly.

      Anything from your end?

      feet1st

      Joined: Mar 7 06
      Posts: 312
      ID: 1028
      Credit: 110,522
      RAC: 1
      Message 4373 - Posted 3 Dec 2008 5:19:02 UTC

        Last modified: 3 Dec 2008 5:22:07 UTC

        Really liking the stats on the homepage!! (down to about a 9% failure rate)

        I'm 22 hours into a 24-hour run on this guy:
        fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0

        First off... wow! 64 models so far, and this is a large protein!

        I brought up the graphic... all I see graphed is the RMSD on the right. No energy, and no cross hair of the two. Also... haven't seen any of the red dots of any of the prior 64 models, which seems unlikely to be correct.

        She's running with 177MB of memory. I presume it's already classified as a high memory task... but I really wish you could find room in those short little task names to provide some indication of a task's minimum memory. I mean, I've got 1GB for an HT processor and can't say I ever have memory problems... and I can end up crunching most anything you put out... but it would be nice if we could see in the WU name that this one only goes to machines with 512MB or whatever, so we know NOT to report "high memory problems" on the task.
        ____________

        feet1st

        Joined: Mar 7 06
        Posts: 312
        ID: 1028
        Credit: 110,522
        RAC: 1
        Message 4374 - Posted 3 Dec 2008 5:29:28 UTC

          P.S. I'm the first to admit that if the graphic is the ONLY problem I can report, then things are looking great!

          I'm on WinXP. I tried suspending/resuming tasks and projects, they seem to stop using CPU on command. As they should.
          ____________

          mtyka
          Forum moderator
          Project developer
          Project scientist

          Joined: Mar 19 08
          Posts: 79
          ID: 4144
          Credit: 0
          RAC: 0
          Message 4375 - Posted 3 Dec 2008 18:26:38 UTC - in response to Message 4374.

            What's wrong with the graphics?

            mtyka
            Forum moderator
            Project developer
            Project scientist

            Joined: Mar 19 08
            Posts: 79
            ID: 4144
            Credit: 0
            RAC: 0
            Message 4376 - Posted 3 Dec 2008 18:31:03 UTC - in response to Message 4373.


              I brought up the graphic... all I see graphed is the RMSD on the right. No energy, and no cross hair of the two. Also... haven't seen any of the red dots of any of the prior 64 models, which seems unlikely to be correct.

              Ahh, I see. Sorry, I didn't see your post when I posted the above message.
              Hmm, I'll see if I can work out what's going on here.


              She's running with 177MB of memory. I presume it's already classified as a high memory task...


              Really? Ahem - I'd say 177MB of memory is actually fairly *low*. Over 250MB I consider a high memory job. I thought the minimum requirement for R@H WUs is 250MB? David?


              Path7

              Joined: Feb 11 08
              Posts: 56
              ID: 4036
              Credit: 4,974
              RAC: 0
              Message 4378 - Posted 3 Dec 2008 19:07:23 UTC

                This WU:
                cc_0_6_nocst4_homo_bench_foldcst_chunk_general_t303__olange_IGNORE_THE_REST _2AH5A_3_5823_3_0
                attracted my attention because it ran continuously for 11250 seconds (no other project ran in the meantime) and generated 2 decoys.
                Runtime preference: 2 hours (7200 seconds)
                Switch between application (setting): 60 minutes (3600 seconds).

                I wonder: did this WU make any checkpoints?

                Have a nice day,
                Path7.

                olange

                Joined: Nov 27 08
                Posts: 2
                ID: 4981
                Credit: 0
                RAC: 0
                Message 4379 - Posted 3 Dec 2008 20:31:11 UTC - in response to Message 4378.

                  Hi Path7, thanks for the report.

                  The application should make checkpoints pretty regularly. Also, the runtime looks really long. I will run this WU locally and check things out.

                  What CPU was this WU running on?
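
As a rough illustration of what regular checkpointing buys you, here is a minimal Python sketch of a job that saves its progress every few steps and resumes from the last save after an interruption. It is not how minirosetta or BOINC implement checkpoints; the JSON file format, the function names, and the `stop_after` parameter (used only to simulate the client switching away mid-run) are all assumptions for the example.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write the state to a temp file, then swap it in atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_step": 0}
    with open(path) as fh:
        return json.load(fh)

def run(path, total_steps, checkpoint_every, stop_after=None):
    """Run from the last checkpoint; optionally stop early to mimic an interruption."""
    state = load_checkpoint(path)
    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done >= stop_after:
            return state  # interrupted: work since the last checkpoint is lost
        state["next_step"] += 1  # stand-in for one unit of real work
        steps_done += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    run(ckpt, total_steps=10, checkpoint_every=3, stop_after=7)
    print(load_checkpoint(ckpt))                           # {'next_step': 6}
    print(run(ckpt, total_steps=10, checkpoint_every=3))   # {'next_step': 10}
```

In the demo the job is interrupted at step 7 but only loses one step, because the last checkpoint was written at step 6. A job that never checkpoints would have to start again from zero.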

                  -Oliver

                  mtyka
                  Forum moderator
                  Project developer
                  Project scientist

                  Joined: Mar 19 08
                  Posts: 79
                  ID: 4144
                  Credit: 0
                  RAC: 0
                  Message 4380 - Posted 3 Dec 2008 20:52:01 UTC

                    I've looked into the graphics problem. I can reproduce it here; it seems to have to do with the way the graph is scaled, and is merely cosmetic. I'll try to get this fixed with the next graphics update (which is separate from the main application update).

                    Mike

                    feet1st

                    Joined: Mar 7 06
                    Posts: 312
                    ID: 1028
                    Credit: 110,522
                    RAC: 1
                    Message 4381 - Posted 3 Dec 2008 20:55:11 UTC - in response to Message 4376.

                      Really? Ahem - I'd say 177MB of memory is actually fairly *low*. Over 250MB I consider a high memory job. I thought the minimum requirement for R@H WUs is 250MB? David?


                      256MB is the minimum for the SYSTEM! Not the max for a task. Leave some room for an operating system and a browser window or two there.

                      177MB is above average, which tends to be closer to 120MB. So, my point is that we're sitting here observing the ~60MB increase, and not knowing if you are already aware of it, or if the tasks have been properly set up to only run on machines with more than the minimum requirement. ...I guess if I had a machine with the minimum memory and saw it, then I would know it needs to be pointed out. But, as it stands, I have no way to tell.
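
To put numbers on the point above, here is a small Python sketch of the arithmetic using only the figures quoted in this thread (256MB system minimum, roughly 120MB for a typical task, 177MB observed). The constant and function names are made up for the example.

```python
SYSTEM_MINIMUM_MB = 256   # minimum memory for the whole machine
TYPICAL_TASK_MB = 120     # rough average task size mentioned above
OBSERVED_TASK_MB = 177    # the task reported in this thread

def headroom_mb(system_mb, task_mb):
    """Memory left for the operating system and everything else."""
    return system_mb - task_mb

print(OBSERVED_TASK_MB - TYPICAL_TASK_MB)                # 57
print(headroom_mb(SYSTEM_MINIMUM_MB, TYPICAL_TASK_MB))   # 136
print(headroom_mb(SYSTEM_MINIMUM_MB, OBSERVED_TASK_MB))  # 79
```

So on a minimum-spec machine the increase of about 57MB cuts the room left for the operating system and a browser from 136MB to 79MB.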
                      ____________

                      Conan

                      Joined: Feb 16 06
                      Posts: 344
                      ID: 145
                      Credit: 1,309,534
                      RAC: 0
                      Message 4382 - Posted 3 Dec 2008 21:03:56 UTC

                        I have been getting the following error on just one of my hosts (This Host):

                        CPU time 0
                        stderr out

                        <core_client_version>5.10.45</core_client_version>
                        <![CDATA[
                        <message>
                        Maximum CPU time exceeded
                        </message>

                        The run time is zero, with of course no decoys generated.

                        This has now been happening on this host since Version 1.42 with 1188054
                        Now on Version 1.43 with 1188071
                        1188409
                        1188547
                        1193391

                        Also had another validate error on 1190103
                        It ran for over 12,000 seconds yet generated no decoys.

                        Hope this helps and hope you can help my computer as well, Conan.
                        ____________

                        mtyka
                        Forum moderator
                        Project developer
                        Project scientist

                        Joined: Mar 19 08
                        Posts: 79
                        ID: 4144
                        Credit: 0
                        RAC: 0
                        Message 4383 - Posted 3 Dec 2008 21:22:33 UTC - in response to Message 4381.

                          feet1st,
                          Oh wow, I see. Hmm, I will discuss this with DK in a minute. For my WUs the memory requirements (per JOB!) are around:

                          relax_benchmark (rlb_**): around 100-200MB
                          homology_benchmark (*_homo_bench_*): around 150-330MB (the proteins here are much, much bigger)
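
Until memory hints appear in the WU names themselves, the two ranges above can be looked up from the name patterns. The Python sketch below does that with the standard `fnmatch` module. The table and the function `estimate_memory_mb` are hypothetical, and the leading wildcard on the `rlb_` pattern is an assumption based on the task names posted earlier in this thread.

```python
import fnmatch

# (pattern, (low_mb, high_mb)) taken from the two ranges quoted above.
MEMORY_RANGES_MB = [
    ("*_homo_bench_*", (150, 330)),  # homology benchmark
    ("*rlb_*", (100, 200)),          # relax benchmark
]

def estimate_memory_mb(wu_name):
    """Return the (low, high) memory range for a WU name, or None if unknown."""
    for pattern, mem_range in MEMORY_RANGES_MB:
        if fnmatch.fnmatchcase(wu_name, pattern):
            return mem_range
    return None

print(estimate_memory_mb(
    "fast_ramp_0.01_rep_16_rlb_1o4w_IGNORE_THE_REST_DECOY_5787_1_0"
))  # (100, 200)
```

By this lookup, the 177MB reported for the rlb task earlier in the thread sits inside the expected 100-200MB range.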

                          Thanks for this info!

                          Mike

                          mtyka
                          Forum moderator
                          Project developer
                          Project scientist

                          Joined: Mar 19 08
                          Posts: 79
                          ID: 4144
                          Credit: 0
                          RAC: 0
                          Message 4384 - Posted 3 Dec 2008 21:26:36 UTC


                            Also had another validate error on 1190103
                            It ran for over 12,000 seconds yet generated no decoys.


                            Ah yes, we saw that and we have a fix. A flag that prevents this error was missing from that WU, so our future WUs should not produce this particular validate error anymore.


                            Your other errors are stranger. We'll look into it.

                            You say it's only a single machine that has them? Have you tried restarting/reinstalling BOINC on it? Is there anything particular about that box?

                            Path7

                            Joined: Feb 11 08
                            Posts: 56
                            ID: 4036
                            Credit: 4,974
                            RAC: 0
                            Message 4385 - Posted 3 Dec 2008 23:31:36 UTC - in response to Message 4379.

                              Last modified: 3 Dec 2008 23:36:46 UTC

                              Hi Path7, thanks for the report.
                              ................ I will run this WU locally and check things out.

                              What CPU was this WU running on?

                              -Oliver

                              Hi olange/Oliver,

                              Thanks for your reaction.
                              The “olange” WU ran on a single-core AMD Sempron 3000+ at 1.8 GHz, running Ubuntu 8.04.

                              I hope this helps,
                              Path7.


                              Copyright © 2017 University of Washington

                              Last Modified: 20 Nov 2008 19:41:56 UTC