RALPH@home

minirosetta beta 3.50-3.52 apps

dekim
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 20 06
Posts: 210
ID: 1
Credit: 477,266
RAC: 167
Message 5718 - Posted 28 Apr 2014 5:43:27 UTC

    It's been a while since our last update and there has been some significant code refactoring and updates. Please post any issues or bugs in this thread. We'll be submitting test jobs on and off. Thanks, David K
    ____________

    sergei.a.mochalov

    Joined: May 22 13
    Posts: 4
    ID: 18693
    Credit: 369,440
    RAC: 0
    Message 5719 - Posted 30 Apr 2014 9:36:06 UTC

      Last modified: 30 Apr 2014 9:37:18 UTC

      Got 62 new WUs; 11 are running at the moment, and after 3+ hours of runtime I have no errors. Runtime is set to 8 hours. OS is Win7 x64.
      P.S. Is it possible to make some kind of "notification" via BOINC about new jobs? I receive some news from other projects in my BoincTasks, but not from RALPH.
      P.P.S. Also would be nice to update "Server status" at project's frontpage more frequently.

      sergei.a.mochalov

      Joined: May 22 13
      Posts: 4
      ID: 18693
      Credit: 369,440
      RAC: 0
      Message 5720 - Posted 30 Apr 2014 14:16:26 UTC

        So far I have finished 2 tasks; both ran smoothly and finished well.
        Link to my host - http://ralph.bakerlab.org/show_host_detail.php?hostid=30398

        Profile [VENETO] boboviz

        Joined: Apr 9 08
        Posts: 499
        ID: 4205
        Credit: 707,031
        RAC: 66
        Message 5721 - Posted 30 Apr 2014 19:28:11 UTC - in response to Message 5719.

          P.S. Is it possible to make some kind of "notification" via BOINC about new jobs? I receive some news from other projects in my BoincTasks, but not from RALPH.

          The "notification" feature in the BOINC client needs a recent version of the BOINC server... Rosetta/Ralph has a (very) old BOINC server.

          P.P.S. Also would be nice to update "Server status" at project's frontpage more frequently.

          Not only that:
          1) Script of "User of the day" needs to be restarted
          2) RAC updates have problems

          Conan

          Joined: Feb 16 06
          Posts: 344
          ID: 145
          Credit: 1,309,534
          RAC: 0
          Message 5722 - Posted 1 May 2014 23:23:10 UTC

            Have had one WU play up by stopping at 35% but keep running the time up to over 14 Hours (my preferences are set to 6 hours).
            I suspended the WU then resumed it, and now it has reset itself back to just 2 Hours run time and progressing normally.

            Will see how it pans out as it is still running.

            This Result

            Conan
            ____________

            Conan

            Joined: Feb 16 06
            Posts: 344
            ID: 145
            Credit: 1,309,534
            RAC: 0
            Message 5723 - Posted 2 May 2014 11:36:43 UTC - in response to Message 5722.

              Have had one WU play up by stopping at 35% but keep running the time up to over 14 Hours (my preferences are set to 6 hours).
              I suspended the WU then resumed it, and now it has reset itself back to just 2 Hours run time and progressing normally.

              Will see how it pans out as it is still running.

              This Result

              Conan


              Completed successfully, as have all other tasks.

              Conan
              ____________

              Conan

              Joined: Feb 16 06
              Posts: 344
              ID: 145
              Credit: 1,309,534
              RAC: 0
              Message 5724 - Posted 3 May 2014 11:32:02 UTC

                Export of Stats not working.

                Conan
                ____________

                Conan

                Joined: Feb 16 06
                Posts: 344
                ID: 145
                Credit: 1,309,534
                RAC: 0
                Message 5726 - Posted 9 May 2014 7:11:58 UTC

                  Stat exports not working still.

                  Conan
                  ____________

                  Conan

                  Joined: Feb 16 06
                  Posts: 344
                  ID: 145
                  Credit: 1,309,534
                  RAC: 0
                  Message 5727 - Posted 15 May 2014 11:24:58 UTC

                    Last modified: 15 May 2014 11:28:46 UTC

                    Your Server Status on the front page is at least 353 DAYS behind time.

                    It shows the 17th May 2013 and today is the 15th May 2014. (Shown in top right corner)

                    The actual Server time is correct on the Server Status page.
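For reference, the gap between the two dates quoted above can be checked with a few lines of Python. This is only a quick arithmetic sketch; the variable names are illustrative and nothing here comes from the project's server code.

```python
from datetime import date

# Date shown on the front-page Server Status box, as reported in the post.
shown_on_front_page = date(2013, 5, 17)
# Date the post was written.
posting_date = date(2014, 5, 15)

# Subtracting two dates gives a timedelta; .days is the whole-day gap.
lag_days = (posting_date - shown_on_front_page).days
print(lag_days)
```

The result is 363 days, which is consistent with the "at least 353 days" estimate in the post.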

                    Conan
                    ____________

                    [VENETO] boboviz

                    Joined: Apr 9 08
                    Posts: 499
                    ID: 4205
                    Credit: 707,031
                    RAC: 66
                    Message 5730 - Posted 16 May 2014 12:39:44 UTC

                      Error after 2h
                      2915391

                      Exit status 194 (0xc2)
                      ======================================================
                      DONE :: 1 starting structures 7182.52 cpu seconds
                      This process generated 19 decoys from 19 attempts
                      ======================================================
                      BOINC :: WS_max 4.15756e+008

                      BOINC :: Watchdog shutting down...
                      BOINC :: BOINC support services shutting down cleanly ...
                      called boinc_finish

                      </stderr_txt>
                      <message>
                      finish file present too long
                      </message>

                      Conan

                      Joined: Feb 16 06
                      Posts: 344
                      ID: 145
                      Credit: 1,309,534
                      RAC: 0
                      Message 5732 - Posted 19 May 2014 0:17:49 UTC

                        Had this TASK 2939795 that got to just over 17% and then stayed there and ran up time to over 12 hours run time (my preferences are set to 6 hours).

                        I suspended the task and resumed it, and it has now dropped back to normal processing: the percent done is moving forward again, and the time has dropped back as well, showing 1 1/2 hours so far and 3 1/5 to go.

                        Will see if it processes successfully.

                        Conan
                        ____________

                        [VENETO] boboviz

                        Joined: Apr 9 08
                        Posts: 499
                        ID: 4205
                        Credit: 707,031
                        RAC: 66
                        Message 5736 - Posted 26 May 2014 9:04:43 UTC

                          After some 3.51 WUs that ran OK, one error.
                          2980172

                          # cpu_run_time_pref: 7200
                          ERROR: Could not find disulfide partner for residue 49
                          ERROR:: Exit from: ..\..\..\src\core\scoring\disulfides\FullatomDisulfideEnergyContainer.cc line: 586

                          ERROR: Energies:: operation NOT permitted during scoring.
                          ERROR:: Exit from: ..\..\..\src\core\scoring\Energies.cc line: 372
                          std::cerr: Exception was thrown:


                          [ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\core\scoring\Energies.cc line: 372
                          ERROR: Energies:: operation NOT permitted during scoring.

                          dekim
                          Forum moderator
                          Project administrator
                          Project developer
                          Project scientist

                          Joined: Jan 20 06
                          Posts: 210
                          ID: 1
                          Credit: 477,266
                          RAC: 167
                          Message 5737 - Posted 27 May 2014 18:50:39 UTC

                            Thanks for the info everyone. The cron jobs weren't running so I started them up again so the various web pages and related info mentioned below should be updated now. I just posted another app update to hopefully take care of the disulfide partner bug.

                            thanks!
                            ____________

                            [VENETO] boboviz

                            Joined: Apr 9 08
                            Posts: 499
                            ID: 4205
                            Credit: 707,031
                            RAC: 66
                            Message 5738 - Posted 28 May 2014 8:02:29 UTC

                              Up to now, 12 WUs on 3.52 without problems!
                              Seems to be good

                              [VENETO] boboviz

                              Joined: Apr 9 08
                              Posts: 499
                              ID: 4205
                              Credit: 707,031
                              RAC: 66
                              Message 5739 - Posted 16 Jun 2014 7:52:39 UTC

                                we added the android arm platform minirosetta_beta version


                                Will you create a "specific counter" in the server status for Android WUs?
                                How many wus have you released up to now?

                                [VENETO] boboviz

                                Joined: Apr 9 08
                                Posts: 499
                                ID: 4205
                                Credit: 707,031
                                RAC: 66
                                Message 5740 - Posted 11 Jul 2014 8:18:05 UTC - in response to Message 5739.

                                  Will you create a "specific counter" in the server status for Android WUs? How many WUs have you released up to now?


                                  Questions without answers...

                                  [VENETO] boboviz

                                  Joined: Apr 9 08
                                  Posts: 499
                                  ID: 4205
                                  Credit: 707,031
                                  RAC: 66
                                  Message 5744 - Posted 17 Jul 2014 6:24:00 UTC

                                    My default run time is 2h. This WU ran for 6h and restarted (3 times). In the end, I killed it.

                                    Unhandled Exception Detected...

                                    - Unhandled Exception Record -
                                    Reason: Breakpoint Encountered (0x80000003) at address 0x7455491E

                                    Engaging BOINC Windows Runtime Debugger...

                                    ********************

                                    BOINC Windows Runtime Debugger Version 6.5.0

                                    Dump Timestamp : 07/17/14 07:12:22
                                    Install Directory : C:\Program Files\BOINC\
                                    Data Directory : C:\ProgramData\BOINC
                                    Project Symstore : http://boinc.bakerlab.org/rosetta/symstore
                                    Loaded Library : C:\Program Files\BOINC\\dbghelp.dll
                                    Loaded Library : C:\Program Files\BOINC\\symsrv.dll
                                    Loaded Library : C:\Program Files\BOINC\\srcsrv.dll
                                    LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126
                                    Loaded Library : version.dll

                                    sgaboinc

                                    Joined: Jul 8 14
                                    Posts: 16
                                    ID: 33667
                                    Credit: 2,855
                                    RAC: 0
                                    Message 5747 - Posted 19 Jul 2014 1:31:55 UTC

                                      Last modified: 19 Jul 2014 1:39:41 UTC

                                      recent 8 tasks on 3.52 completed without errors.
                                      i did a project reset and aborted the next set of tasks, note it's not due to any errors.

                                      should my client download 3.53 beta rather than 3.52 as it seemed both versions appear in apps?
                                      http://ralph.bakerlab.org/apps.php

                                      it seemed that the current set of tasks downloads the associated 3.52 binary rather than 3.53

                                      Dr Who Fan

                                      Joined: Sep 2 06
                                      Posts: 63
                                      ID: 1787
                                      Credit: 46,809
                                      RAC: 16
                                      Message 5749 - Posted 19 Jul 2014 2:01:45 UTC

                                        Unable to get work for my Nexus 7 Tablet WHY?

                                        ralph@home 18-07-2014 20:52 Rosetta Mini is not available for your type of computer.

                                          CPU type ARM
                                          ARMv7 Processor rev 9 (v7l) @1300MHz
                                          Number of CPUs 4
                                          Operating System Android
                                          3.1.10-g1e42d16




                                        ____________

                                        Conan

                                        Joined: Feb 16 06
                                        Posts: 344
                                        ID: 145
                                        Credit: 1,309,534
                                        RAC: 0
                                        Message 5751 - Posted 19 Jul 2014 13:46:12 UTC

                                          Last modified: 19 Jul 2014 13:53:40 UTC

                                          Every work unit on my Windows computers of the Tc804_Symm_hybrid_20174_xxxx_0 type only grants from 46 to 48 points per work unit no matter what the run time or the claimed credit.

                                          I have processed 8 of these so far and they all show the same behaviour.

                                          Times range from 17,895 seconds to 29,299 seconds but the credit granted is always 46-48 points.

                                          This only affects my Windows 32 bit computers as my 64 bit Linux machines don't have this issue and have normal points granted for the same work unit type.

                                          The Windows work units also only process 1 Decoy, yet the Linux work units process 3-4 Decoys.

                                          All other work units grant normal credit; only the Tc804_Symm_hybrid_20174_xxxx_0 types on Windows grant the very low credit.

                                          Example Result 3000573
                                          Result 3004229
                                          Result 2989819

                                          Conan
                                          ____________

                                          Mad_Max

                                          Joined: Nov 15 12
                                          Posts: 9
                                          ID: 18322
                                          Credit: 294,797
                                          RAC: 1
                                          Message 5754 - Posted 20 Jul 2014 11:48:44 UTC

                                            WUs with names
                                            Tc794_hybrid...
                                            Tc804_symm_hybrid...

                                            have problems with checkpointing (usually it does not work at all: progress resets to 0% on restart). They also usually run much longer than the target time (my target time is set to 2 hours, but they usually run 5-6 hours).
                                            Also, some of these WUs grant only 20 Cr and have "InternalDecoyCount: 0 (GZ)" in the logs.
                                            AFAIK this means that no useful work was done and 5-6 hours of CPU time were wasted on each such WU.

                                            [VENETO] boboviz

                                            Joined: Apr 9 08
                                            Posts: 499
                                            ID: 4205
                                            Credit: 707,031
                                            RAC: 66
                                            Message 5755 - Posted 20 Jul 2014 11:59:41 UTC

                                              As I said, continuous restarts

                                              20/07/2014 12:50:54 | ralph@home | Task Tc804_symm_hybrid_20174_15261_0 exited with zero status but no 'finished' file

                                              I reset the project, without result

                                              Conan

                                              Joined: Feb 16 06
                                              Posts: 344
                                              ID: 145
                                              Credit: 1,309,534
                                              RAC: 0
                                              Message 5756 - Posted 20 Jul 2014 13:39:24 UTC - in response to Message 5754.

                                                WUs with names
                                                Tc794_hybrid...
                                                Tc804_symm_hybrid...

                                                have problems with checkpointing (usually it does not work at all: progress resets to 0% on restart). They also usually run much longer than the target time (my target time is set to 2 hours, but they usually run 5-6 hours).
                                                Also, some of these WUs grant only 20 Cr and have "InternalDecoyCount: 0 (GZ)" in the logs.
                                                AFAIK this means that no useful work was done and 5-6 hours of CPU time were wasted on each such WU.


                                                Each work unit will run until at least 1 Decoy has been run.
                                                If it takes longer than your set preferences then it keeps going till at least 1 Decoy finishes.
                                                Setting to 2 hours will work most of the time.
                                                The 5-6 hour run times happen to suit my 6 hour preference setting, but even then I have had some go over 7 hours.
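The behaviour described above (the run-time preference is only honoured between decoys, and at least one decoy is always completed) can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the actual minirosetta scheduling code: the function name `simulate_run`, its parameters, and the decoy durations are all hypothetical.

```python
def simulate_run(decoy_durations, target_secs):
    """Simulate a work unit that checks its target run time only between decoys.

    decoy_durations: CPU seconds each successive decoy would take (hypothetical).
    target_secs: the user's run-time preference in seconds.
    Returns (elapsed_secs, decoys_completed).
    """
    elapsed = 0
    decoys = 0
    for duration in decoy_durations:
        # Stop only once at least one decoy is done AND the target has been reached.
        if decoys >= 1 and elapsed >= target_secs:
            break
        # A decoy cannot be interrupted part-way in this model.
        elapsed += duration
        decoys += 1
    return elapsed, decoys


# One very slow decoy overruns a 2-hour (7200 s) preference by a wide margin.
slow = simulate_run([20000, 20000], target_secs=7200)
# Faster decoys fit several into roughly the same preference.
fast = simulate_run([2000] * 10, target_secs=7200)
print(slow, fast)
```

Under this model, a host where a single decoy takes 5-6 hours finishes with one decoy regardless of a 2-hour preference, while a host with quicker decoys completes several.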

                                                I have not tested the checkpointing, so I can't say whether the work units restart from zero like you are seeing.


                                                Conan





                                                ____________

                                                sgaboinc

                                                Joined: Jul 8 14
                                                Posts: 16
                                                ID: 33667
                                                Credit: 2,855
                                                RAC: 0
                                                Message 5757 - Posted 20 Jul 2014 15:24:06 UTC

                                                  Last modified: 20 Jul 2014 15:48:19 UTC

                                                  checkpoint fail
                                                  TC804_symm_hybrid, Rosetta mini 3.52 - linux x64 - boinc client 7.0.36
                                                  7 concurrent (identical) tasks started and ran for an hour, reaching 25% completion
                                                  suspended project, shut down boinc-client (note, done via boinc-gui)

                                                  restarted project; out of the 7 tasks only 1 restarted from 25% completion, the other 6 tasks restarted from 0. in effect lost 6 x 1 hours of work.

                                                  aborted 3 tasks; the resumed run continues at about half load with 4 tasks

                                                  checkpoint preferences set for 60 secs. not sure where the root cause is (boinc-client, rosetta app, or the parameters used to run the app)

                                                  e.g. if there are no structures and the session is interrupted, would that effectively mean that if the task/job is restarted it'd start from zero all over?
                                                  hmm, perhaps something to be considered and improved on

                                                  i'd think rosetta need to save the state even if no structures are generated esp for such large?/complex? jobs where for that matter there may be no structures (i.e. the run did not find a root/solution/model)

                                                  the other thing would be if some jobs hit a 'dead end' (run for hours without finding solutions, perhaps going into endless chaotic loops); would there then be no structures, and so no credits awarded/claimed?

                                                  i'd think participants need to have influence on the max default run time per task, i.e. some participants would not be too happy to crunch jobs that run for, say, 5-6 hours and do not find a solution, and hence earn no credits. if this is not possible, then participants may simply need to abort long jobs that go beyond the 'normal' duration (say, compared to the average of all other jobs)

                                                  another way i'd guess is the necessity to award credits/allow claimed credits to the 'no solution' (no models) runs where after the 'reasonable' timeframe no solutions are found. i guess the max default run time is 6 hours, hence, app developer should consider terminating with credits for such cases.

                                                  however, for many participants with a fairly recent cpu that runs somewhat 'fast', after 3-4 hours where there are no solution, the participant may not want to continue the run. hence, participants need to have a 'computing preference' to state that the max default run time preferred is hence say 4 hours.

                                                  sgaboinc

                                                  Joined: Jul 8 14
                                                  Posts: 16
                                                  ID: 33667
                                                  Credit: 2,855
                                                  RAC: 0
                                                  Message 5758 - Posted 20 Jul 2014 16:42:52 UTC

                                                    Last modified: 20 Jul 2014 17:08:29 UTC

                                                    remaining 4 Tc804_symm_hybrid work units completed successfully no errors
                                                    http://ralph.bakerlab.org/result.php?resultid=3014838, 4,763.77 cpu secs
                                                    http://ralph.bakerlab.org/result.php?resultid=3014871, 4,601.52 cpu secs
                                                    http://ralph.bakerlab.org/result.php?resultid=3014874, 4,476.31 cpu secs
                                                    http://ralph.bakerlab.org/result.php?resultid=3014875, 8,502.676 cpu secs
                                                    cpu time varies, it seems; after all, pseudorandom numbers are involved in the simulations/solution search

                                                    *caveat*
                                                    those that clocked 4k cpu secs could be jobs that showed up as 0% in boinc-client / gui after the shutdown interruption. that may suggest a bug (not sure where: boinc-client or rosetta) in updating the state xml statistics files. i.e. boinc-client/gui restarted showing 0%, however rosetta probably did save the state, and hence 3 jobs apparently ended in half the time. i.e. the cpu secs statistic is incorrect; those 3 jobs actually ran for 8k cpu secs. 4000 cpu secs were 'lost' for each of the 3 jobs before the suspend project / boinc-client shutdown interruption; this is more like a missing statistics update.

                                                    what could be postulated is that rosetta did checkpoint and resumed from the interruption, hence the 3 jobs apparently completed in 4k cpu secs because the first half of the cpu secs statistic was 'lost'. i.e. if rosetta had not checkpointed, what would have shown up would be 8k cpu secs, and the actual total would be 12k cpu secs for those jobs

                                                    i guess i'd upgrade my boinc-client to see if that'd resolve the issue

                                                    ---
                                                    note that this has a major impact on credits claimed / granted, as the 'lost' cpu secs would suggest that the job can be done in 1/2 the cpu secs (i.e. half of the 8k actual), which is *incorrect*
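The accounting argument in this post can be written out as a small worked example. It is only a sketch of the reasoning, assuming a job that needs 8,000 CPU seconds in total and is interrupted after 4,000; the function `cpu_accounting` and its parameters are hypothetical and do not correspond to anything in the BOINC client or Rosetta.

```python
def cpu_accounting(work_needed, done_before_shutdown,
                   checkpoint_restored, stats_restored):
    """Return (actual_cpu_secs, reported_cpu_secs) for one interrupted task.

    checkpoint_restored: the app resumed from its saved state after the restart.
    stats_restored: the client kept the CPU time accumulated before the shutdown.
    """
    # If the checkpoint is lost, the whole job has to be redone after the restart.
    after_restart = (work_needed - done_before_shutdown
                     if checkpoint_restored else work_needed)
    actual = done_before_shutdown + after_restart
    # If the statistics are lost, only the post-restart time is reported.
    reported = after_restart + (done_before_shutdown if stats_restored else 0)
    return actual, reported


# Case postulated in the post: checkpoint worked, but the earlier CPU time was lost.
print(cpu_accounting(8000, 4000, checkpoint_restored=True, stats_restored=False))
# Alternative: no checkpoint and no statistics, so the job is redone from zero.
print(cpu_accounting(8000, 4000, checkpoint_restored=False, stats_restored=False))
```

In the first case the task reports about 4k CPU seconds for 8k of real work, which is the pattern observed for three of the four results; in the second it would have reported 8k for 12k of real work.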

                                                    TPCBF

                                                    Joined: Jun 20 11
                                                    Posts: 28
                                                    ID: 16929
                                                    Credit: 15,842
                                                    RAC: 0
                                                    Message 5759 - Posted 20 Jul 2014 21:31:25 UTC - in response to Message 5758.

                                                      Same here, the 4 WUs I picked up on the 17th just keep restarting from 0% over and over again, and each time, at least during the initial period, they thrash the hard drive like crazy...

                                                      Is anyone from the project actually around to monitor any responses? Or is Mr.Baker & Cie only available when there's a chance to bask in the limelight?

                                                      Ralf

                                                      [VENETO] boboviz

                                                      Joined: Apr 9 08
                                                      Posts: 499
                                                      ID: 4205
                                                      Credit: 707,031
                                                      RAC: 66
                                                      Message 5760 - Posted 21 Jul 2014 6:54:34 UTC - in response to Message 5759.

                                                        Is anyone from the project actually around to monitor any responses? Or is Mr.Baker & Cie only available when there's a chance to bask in the limelight?


                                                        You're too mocking... but you have your reasons. There is a big "lack of communication" with the R@h team.

                                                        [VENETO] boboviz

                                                        Joined: Apr 9 08
                                                        Posts: 499
                                                        ID: 4205
                                                        Credit: 707,031
                                                        RAC: 66
                                                        Message 5761 - Posted 21 Jul 2014 20:47:19 UTC

                                                          Wow, 20 points for over 6h of run :-P

                                                          # cpu_run_time_pref: 7200
                                                          BOINC:: CPU time: 22084.9s, 14400s + 7200s[2014- 7-21 22:38:48:] :: BOINC
                                                          WARNING! cannot get file size for default.out.gz: could not open file.
                                                          Output exists: default.out.gz Size: -1
                                                          InternalDecoyCount: 0 (GZ)
                                                          -----
                                                          0
                                                          -----
                                                          Stream information inconsistent.
                                                          Writing W_0000001
                                                          ======================================================
                                                          DONE :: 1 starting structures 22084.9 cpu seconds
                                                          This process generated 1 decoys from 1 attempts
                                                          ======================================================
                                                          called boinc_finish

                                                          </stderr_txt>
                                                          ]]>

                                                          Validate state Valid
                                                          Claimed credit 78.0462089549509
                                                          Granted credit 20

                                                          sgaboinc

                                                          Joined: Jul 8 14
                                                          Posts: 16
                                                          ID: 33667
                                                          Credit: 2,855
                                                          RAC: 0
                                                          Message 5762 - Posted 22 Jul 2014 1:33:10 UTC - in response to Message 5761.

                                                            Last modified: 22 Jul 2014 1:55:10 UTC

                                                            Wow, 20 points for over 6h of run :-P

                                                            Validate state Valid
                                                            Claimed credit 78.0462089549509
                                                            Granted credit 20


                                                            based on what i understand from the rosetta@home message boards, the granted credit, which tends to differ from the claimed credit (not necessarily lower), is apparently due to averages being used. i've observed cases where granted credit > claimed credit

                                                            i.e. every participant's PC claims a certain number of computed credits (this is the actual cpu work done); if there is no fraud, the claimed credit is actually *accurate*. however, what's granted is the average

                                                            apparently, as i've posted earlier in this thread, there are bugs in the boinc client: in my case, if i suspend the jobs, shut down the client and restart it later, the statistics for the initial run can be lost. however, rosetta apparently did checkpoint successfully and resumed from the point where it was restarted. hence, if the task is worth say 100 credits and the shutdown occurs at 99 credits' worth of cpu time, then when i restart the tasks affected by the boinc client bug, the task completes and claims 1 credit. that would *wrongly* imply that a 100-credit job can be done with 1 credit of effort (this is completely inaccurate)

                                                            rosetta should build into the formula a way to reject out-of-band claimed credits for jobs. this can be done by taking the standard deviation and rejecting claims that fall more than one or 1.96 (95% confidence interval, http://en.wikipedia.org/wiki/1.96) standard deviations below the statistical average when computing the granted credits. that should result in the higher claimed credits being averaged to reflect the true effort needed to complete the tasks
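
The rejection rule proposed above can be sketched roughly as follows. This is only an illustration of the suggestion in this post, not the actual server code: the function name `robust_granted_credit`, the parameter `k` and the sample numbers are all made up for the example.

```python
from statistics import mean, stdev


def robust_granted_credit(claims, k=1.0):
    """Average the claimed credits after dropping unusually low claims.

    claims: claimed credits reported for comparable tasks (hypothetical input).
    k: number of sample standard deviations below the mean at which a claim
       is rejected (1.0 or 1.96, as suggested in the post).
    """
    if not claims:
        raise ValueError("need at least one claimed credit")
    if len(claims) < 2:
        # A single claim has no spread to compare against; use it as is.
        return float(claims[0])
    mu = mean(claims)
    sigma = stdev(claims)
    cutoff = mu - k * sigma
    # The largest claim is never below the mean, so `kept` is never empty.
    kept = [c for c in claims if c >= cutoff]
    return float(mean(kept))


# Three normal claims and one bogus 1-credit claim after a lost restart.
claims = [100, 98, 102, 1]
print(robust_granted_credit(claims, k=1.0))
print(robust_granted_credit(claims, k=1.96))
```

With only a handful of claims, a single very low value inflates the standard deviation so much that the 1.96 cutoff rejects nothing, while a cutoff of one standard deviation does drop it. A rule of this kind would probably need a reasonable number of reports per batch before it behaves as intended.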

                                                            hope admins consider that and enhance the server codes.
                                                            rosetta@home/ralph@home should not be 'stingy' with credits, as those are *true work done* and the project as a whole is competing with other boinc projects to show that it is a popular project that's getting participants' attention (it is a very good form of free advertising for the project)

                                                            Profile [VENETO] boboviz

                                                            Joined: Apr 9 08
                                                            Posts: 499
                                                            ID: 4205
                                                            Credit: 707,031
                                                            RAC: 66
                                                            Message 5763 - Posted 22 Jul 2014 8:52:51 UTC - in response to Message 5762.

                                                              i've observed cases where granted credit > claimed credit

                                                              I know, i know the "problem" granted/claimed. I hope you realize i don't participate for credits :-)

                                                              hope admins consider that and enhance the server codes.

                                                              This request is repeated frequently on the Rosetta forum.
                                                              Other volunteers have suggested that it is a problem with the customization of the current server. But the admins have not said anything.

                                                              sgaboinc

                                                              Joined: Jul 8 14
                                                              Posts: 16
                                                              ID: 33667
                                                              Credit: 2,855
                                                              RAC: 0
                                                              Message 5764 - Posted 22 Jul 2014 15:57:54 UTC - in response to Message 5763.

                                                                Last modified: 22 Jul 2014 16:19:14 UTC

                                                                i've observed cases where granted credit > claimed credit

                                                                I know, i know the "problem" granted/claimed. I hope you realize i don't participate for credits :-)

                                                                hope admins consider that and enhance the server codes.

                                                                This request is repeated frequently on the Rosetta forum.
                                                                Other volunteers have suggested that it is a problem with the customization of the current server. But the admins have not said anything.


                                                                strictly speaking, i'm speculating that a possible factor for the low credits is mainly an *instance* of (most likely) faulty boinc-client s/w. as statistics is 'lost' on a shutdown/restart, it 'mis-reports' credits to the server. as the claimed credit is much lower after the restart, it affects the average credit that's awarded for the task to any later participants. while i normally ignore this (as for you, credits aren't really my purpose in crunching rosetta), it may make some participants unhappy about the low granted credits, esp. those who pick up subsequent jobs of the same kind. the solution of course is to fix my (an instance of a) faulty boinc client, but i'm just putting in my 2 cents of reasoning on the 'collateral damage' that others may observe. my guess is that this issue may be partially alleviated on the server side if the server ignores exceptionally low claimed credits when computing the granted credits.

                                                                i'm not too sure if there may be a better way to award 'credits'; however, taking an average of the reported claimed credits is after all a good way to measure the work done, statistically averaged across different systems. it's just that this 'simple' points (credit) system is prone to being affected by 'mis-behaving' clients. i guess there really aren't perfect solutions

                                                                i'll soon upgrade my client; hopefully that'll 'fix' some of the statistical issues from my little leaf node

                                                                Mad_Max

                                                                Joined: Nov 15 12
                                                                Posts: 9
                                                                ID: 18322
                                                                Credit: 294,797
                                                                RAC: 1
                                                                Message 5765 - Posted 22 Jul 2014 21:49:14 UTC

                                                                  Last modified: 22 Jul 2014 22:02:32 UTC

                                                                  It is NOT "faulty boinc-client s/w." OR " statistics is 'lost' on a shutdown/restart"

                                                                  It is faulty rosetta software (or a particular batch of WUs): it simply does not write checkpoints at all (I already checked this: intermediate checkpoints in the latest WU batches are not working; it seems only fully finished models are saved to disk). So at each restart ALL the work done before the restart goes to the trash can, and the work starts from scratch after the restart.
                                                                  So the BOINC software does the right thing when it resets the statistics and credits to zero too, because: 0 useful work done = 0 Cr

                                                                  Also, some WUs run so long (possibly the algorithm loops infinitely, or the model is just very difficult to calculate) that even after 5-7 hours of running (without interruptions / restarts) on a modern CPU they cannot finish the very first model (decoy).

                                                                  In these situations the claimed credit (calculated by the BOINC client) will be normal. But the granted credit is actually 0, because R@H uses this formula for granted credit:
                                                                  average claimed credit per decoy (collected and calculated from previous users who reported WUs from the same batch) multiplied by the number of decoys reported in the particular task of a specific user.
                                                                  So if decoy count = 0, granted credit = 0 too.

                                                                  But later the programmers added an exception: if a user reports a task with decoy count = 0, the general formula (which gives 0 Cr) is not used, and the WU is rewarded with a fixed 20 Cr as a sort of consolation/booby prize.
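
A minimal sketch of the granting rule as described in this post. It reflects only the poster's description, not confirmed R@H server code; the names `granted_credit` and `CONSOLATION_CREDIT` are invented for the example.

```python
# Fixed reward for tasks that report no finished decoy, per the post above.
CONSOLATION_CREDIT = 20.0


def granted_credit(avg_claimed_per_decoy, decoy_count):
    """Granted credit = batch average claimed credit per decoy * decoys reported.

    avg_claimed_per_decoy: average taken over earlier reports from the same
        batch (an assumed input; how it is collected is not shown here).
    decoy_count: number of decoys (models) the task actually finished.
    """
    if decoy_count < 0:
        raise ValueError("decoy_count cannot be negative")
    if decoy_count == 0:
        # The general formula would give 0 Cr; the exception grants a flat 20.
        return CONSOLATION_CREDIT
    return avg_claimed_per_decoy * decoy_count


print(granted_credit(12.5, 4))  # task that finished four decoys
print(granted_credit(12.5, 0))  # task that ran for hours but finished none
```

Under this rule the CPU time a task used never enters the result, which would explain a task claiming about 78 credits and being granted exactly 20.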

                                                                  Mad_Max

                                                                  Joined: Nov 15 12
                                                                  Posts: 9
                                                                  ID: 18322
                                                                  Credit: 294,797
                                                                  RAC: 1
                                                                  Message 5766 - Posted 22 Jul 2014 21:59:23 UTC - in response to Message 5756.

                                                                    WUs with names
                                                                    Tc794_hybrid...
                                                                    Tc804_symm_hybrid...

                                                                    have problems with checkpointing (it usually does not work at all: progress resets to 0% on restart). They also usually run much longer than the target time (my target time is set to 2 hours, but they usually run 5-6 hours).
                                                                    Also, some of these WUs grant only 20 Cr and have "InternalDecoyCount: 0 (GZ)" in the logs.
                                                                    AFAIK this means that no useful work was done and 5-6 hours of CPU time were wasted on each such WU


                                                                    I have not tested the check-pointing so I can't test if the work units restart from zero like you are seeing.

                                                                    Conan

                                                                    To roughly check whether checkpoints work, it is not necessary to restart.
                                                                    You can click "properties" on any currently executing task and check the line "CPU time at last checkpoint".
                                                                    If checkpoint saving is working normally, this line shows the time (counted from the start of the task) of the last saved checkpoint. If checkpointing does not work, the line shows "-- --".
                                                                    Or it shows a time from a few hours ago, i.e. much less than the total CPU time: if the client could finish at least one model completely and record it on disk, that also counts as a checkpoint, and this part usually works normally.
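
The manual check above can be written down as a small rule of thumb. This sketch does not talk to the BOINC client at all; the two times are assumed to be read by hand from the task's "properties" dialog, and the function name and the 30-minute threshold are arbitrary choices for illustration.

```python
from typing import Optional


def checkpoint_status(cpu_time_s: float,
                      last_checkpoint_s: Optional[float],
                      max_gap_s: float = 1800.0) -> str:
    """Classify a running task from two values shown in its properties.

    cpu_time_s: total CPU time used so far, in seconds.
    last_checkpoint_s: "CPU time at last checkpoint" in seconds, or None
        when the dialog shows "-- --".
    max_gap_s: hypothetical limit on how far the last checkpoint may lag.
    """
    if last_checkpoint_s is None:
        return "no checkpoint"
    if cpu_time_s - last_checkpoint_s > max_gap_s:
        # Probably only the save after a finished model, hours ago.
        return "stale checkpoint"
    return "checkpointing"


print(checkpoint_status(4500, None))   # 1 h 15 min of CPU time, "-- --" shown
print(checkpoint_status(20000, 3600))  # last save far behind the CPU time
print(checkpoint_status(4500, 4400))   # recent checkpoint
```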

                                                                    Profile [VENETO] boboviz

                                                                    Joined: Apr 9 08
                                                                    Posts: 499
                                                                    ID: 4205
                                                                    Credit: 707,031
                                                                    RAC: 66
                                                                    Message 5767 - Posted 23 Jul 2014 6:40:03 UTC - in response to Message 5765.

                                                                      Last modified: 23 Jul 2014 6:41:11 UTC

                                                                      It is faulty rosetta software (or a particular batch of WUs): it simply does not write checkpoints at all (I already checked this: intermediate checkpoints in the latest WU batches are not working; it seems only fully finished models are saved to disk). So at each restart ALL the work done before the restart goes to the trash can, and the work starts from scratch after the restart.


                                                                      Yep, i know the decoy/checkpoint "situation".
                                                                      My problem is that some (not a few) WUs restart during crunching, without a restart of the pc/boinc manager and, i repeat, with this message:
                                                                      Task Tc804_symm_hybrid_20174_15261_0 exited with zero status but no 'finished' file

                                                                      cmt.explorer

                                                                      Joined: Jul 24 14
                                                                      Posts: 2
                                                                      ID: 39022
                                                                      Credit: 95
                                                                      RAC: 0
                                                                      Message 5768 - Posted 24 Jul 2014 13:18:15 UTC

                                                                        Greetings,

                                                                        what is meant in the news post to this thread by "If you have an android arm device/phone that supports android-9, ..."? What should "android-9" be?

                                                                        I tried to start a workunit on Android 4.4.4 with the NativeBoinc client but it didn't work - additionally I got the message "Rosetta Mini is not available for your type of computer."

                                                                        Any ideas?

                                                                        Thanks!

                                                                        TPCBF

                                                                        Joined: Jun 20 11
                                                                        Posts: 28
                                                                        ID: 16929
                                                                        Credit: 15,842
                                                                        RAC: 0
                                                                        Message 5769 - Posted 25 Jul 2014 2:41:26 UTC - in response to Message 5766.

                                                                          To roughly check whether checkpoints work, it is not necessary to restart.
                                                                          You can click "properties" on any currently executing task and check the line "CPU time at last checkpoint".
                                                                          If checkpoint saving is working normally, this line shows the time (counted from the start of the task) of the last saved checkpoint. If checkpointing does not work, the line shows "-- --".
                                                                          Or it shows a time from a few hours ago, i.e. much less than the total CPU time: if the client could finish at least one model completely and record it on disk, that also counts as a checkpoint, and this part usually works normally.
                                                                          The problem with the current checkpoint setting in the WUs is that the recent batch of WUs seems to reset itself a lot, always starting from scratch instead of continuing from the last checkpoint, which is the whole purpose of checkpoints.
                                                                          As it is currently, a lot of processing power gets wasted this way...

                                                                          Ralf

                                                                          Profile [VENETO] boboviz

                                                                          Joined: Apr 9 08
                                                                          Posts: 499
                                                                          ID: 4205
                                                                          Credit: 707,031
                                                                          RAC: 66
                                                                          Message 5770 - Posted 25 Jul 2014 6:26:36 UTC - in response to Message 5768.

                                                                            What should "android-9" be?

                                                                            I'm not sure, but i think it's the version of api. ApiLevels

                                                                            I tried to start a workunit on Android 4.4.4 with the NativeBoinc client but it didn't work - additionally I got the message "Rosetta Mini is not available for your type of computer."
                                                                            Any ideas?

                                                                            The first version running on android is 3.53. Now there is a batch of 3.52, so this message is normal.

                                                                            For Admins: i know 3.53 is the first version, but can you optimize it, please? This version uses a lot of ram, a lot of disc space, continuous restarts...

                                                                            Profile [VENETO] boboviz

                                                                            Joined: Apr 9 08
                                                                            Posts: 499
                                                                            ID: 4205
                                                                            Credit: 707,031
                                                                            RAC: 66
                                                                            Message 5771 - Posted 25 Jul 2014 6:27:04 UTC - in response to Message 5769.

                                                                              As it is currently, a lot of processing power gets wasted this way...


                                                                              +1

                                                                              sgaboinc

                                                                              Joined: Jul 8 14
                                                                              Posts: 16
                                                                              ID: 33667
                                                                              Credit: 2,855
                                                                              RAC: 0
                                                                              Message 5772 - Posted 27 Jul 2014 15:25:56 UTC - in response to Message 5765.

                                                                                Last modified: 27 Jul 2014 16:11:12 UTC

                                                                                It is NOT "faulty boinc-client s/w." OR " statistics is 'lost' on a shutdown/restart"

                                                                                It is faulty rosetta software (or a particular batch of WUs): it simply does not write checkpoints at all (I already checked this: intermediate checkpoints in the latest WU batches are not working; it seems only fully finished models are saved to disk). So at each restart ALL the work done before the restart goes to the trash can, and the work starts from scratch after the restart.
                                                                                So the BOINC software does the right thing when it resets the statistics and credits to zero too, because: 0 useful work done = 0 Cr



                                                                                hi Max,
                                                                                Thanks much for your post, I think i can confirm your observation:
                                                                                There is no checkpoint !


                                                                                all 6 concurrent ralph@home tasks did not checkpoint after running for more than an hour.
                                                                                this is a screen print, time of last checkpoint is --
                                                                                and elapsed time is some 1 hour 15 minutes

                                                                                compared to a concurrently running task from rosetta@home


                                                                                the rosetta@home task is checkpointing well as indicated by the time of last checkpoint

                                                                                note that apparently the minirosetta 3.52, 3.53 (beta) binaries running on ralph@home and rosetta@home are the same
                                                                                http://ralph.bakerlab.org/forum_thread.php?id=557&nowrap=true#5753

                                                                                note that all these ralph@home and rosetta@home sessions are running concurrently in the same boinc-client (7.0.36) !


                                                                                is there some error in the job run parameters, or is it necessary to improve minirosetta so that such complex jobs/tasks checkpoint?

                                                                                Profile [VENETO] boboviz

                                                                                Joined: Apr 9 08
                                                                                Posts: 499
                                                                                ID: 4205
                                                                                Credit: 707,031
                                                                                RAC: 66
                                                                                Message 5773 - Posted 28 Jul 2014 18:29:23 UTC

                                                                                  Validate errors:
                                                                                  3052012
                                                                                  3052013

                                                                                  cmt.explorer

                                                                                  Joined: Jul 24 14
                                                                                  Posts: 2
                                                                                  ID: 39022
                                                                                  Credit: 95
                                                                                  RAC: 0
                                                                                  Message 5783 - Posted 15 Aug 2014 8:59:20 UTC - in response to Message 5768.

                                                                                    Last modified: 15 Aug 2014 9:00:18 UTC

                                                                                    With 3.53 I can now crunch WUs - thank you. See public results at wuprop.boinc-af.org/results/arm.py (ralph@home).

                                                                                    Profile [VENETO] boboviz

                                                                                    Joined: Apr 9 08
                                                                                    Posts: 499
                                                                                    ID: 4205
                                                                                    Credit: 707,031
                                                                                    RAC: 66
                                                                                    Message 5804 - Posted 21 Dec 2014 10:53:10 UTC

                                                                                      Any news? No new application??

                                                                                      Profile [VENETO] boboviz

                                                                                      Joined: Apr 9 08
                                                                                      Posts: 499
                                                                                      ID: 4205
                                                                                      Credit: 707,031
                                                                                      RAC: 66
                                                                                      Message 5806 - Posted 28 Dec 2014 8:10:46 UTC

                                                                                        All errors

                                                                                        Options::initialize()
                                                                                        Options::adding_options()
                                                                                        Options::initialize() Check specs.
                                                                                        Options::initialize() End reached
                                                                                        ERROR: Option matching -corrections:score:hb_sp2_peak_heigh_above_trough not found in command line top-level context

                                                                                        Werinbert

                                                                                        Joined: Nov 7 13
                                                                                        Posts: 2
                                                                                        ID: 18920
                                                                                        Credit: 100,810
                                                                                        RAC: 0
                                                                                        Message 5807 - Posted 4 Jan 2015 11:36:22 UTC - in response to Message 5806.

                                                                                          All errors
                                                                                          Options::initialize()
                                                                                          Options::adding_options()
                                                                                          Options::initialize() Check specs.
                                                                                          Options::initialize() End reached
                                                                                          ERROR: Option matching -corrections:score:hb_sp2_peak_heigh_above_trough not found in command line top-level context



                                                                                          The same problem is still around. I got a handful of tasks and they all errored.

                                                                                          Profile [VENETO] boboviz

                                                                                          Joined: Apr 9 08
                                                                                          Posts: 499
                                                                                          ID: 4205
                                                                                          Credit: 707,031
                                                                                          RAC: 66
                                                                                          Message 5810 - Posted 25 Feb 2015 12:39:56 UTC

                                                                                            All errors:

                                                                                            ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
                                                                                            ERROR:: Exit from: ..\..\..\src\protocols\abinitio\AbrelaxApplication.cc line: 488
                                                                                            std::cerr: Exception was thrown:

                                                                                            [ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\protocols\abinitio\AbrelaxApplication.cc line: 488
                                                                                            ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!

                                                                                            zioriga

                                                                                            Joined: Feb 16 06
                                                                                            Posts: 7
                                                                                            ID: 260
                                                                                            Credit: 52,529
                                                                                            RAC: 1
                                                                                            Message 5811 - Posted 25 Feb 2015 18:22:42 UTC - in response to Message 5810.

                                                                                              This error:

                                                                                              dock_placestub_0_input_0181_0001_ss1_2_ss2_2_ss3_1_ss4_2_ss5_3_0001_0001_vegf_ProteinInterfaceDesign_25Feb2015_20229_18_0 finished
                                                                                              2/25/2015 7:13:27 PM | ralph@home | Output file dock_placestub_0_input_0181_0001_ss1_2_ss2_2_ss3_1_ss4_2_ss5_3_0001_0001_vegf_ProteinInterfaceDesign_25Feb2015_20229_18_0_0 for task dock_placestub_0_input_0181_0001_ss1_2_ss2_2_ss3_1_ss4_2_ss5_3_0001_0001_vegf_ProteinInterfaceDesign_25Feb2015_20229_18_0 absent


                                                                                              after a bit more than a minute
                                                                                              ____________

                                                                                              Profile Conan

                                                                                              Joined: Feb 16 06
                                                                                              Posts: 344
                                                                                              ID: 145
                                                                                              Credit: 1,309,534
                                                                                              RAC: 0
                                                                                              Message 5812 - Posted 25 Feb 2015 20:07:59 UTC

                                                                                                Received this error on Task 3335547


                                                                                                ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
                                                                                                ERROR:: Exit from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                std::cerr: Exception was thrown:


                                                                                                [ERROR] EXCN_utility_exit has been thrown from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!

                                                                                                Conan

                                                                                                ____________

                                                                                                TPCBF

                                                                                                Joined: Jun 20 11
                                                                                                Posts: 28
                                                                                                ID: 16929
                                                                                                Credit: 15,842
                                                                                                RAC: 0
                                                                                                Message 5813 - Posted 25 Feb 2015 21:59:31 UTC - in response to Message 5812.

                                                                                                  Received this error on Task 3335547


                                                                                                  ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
                                                                                                  ERROR:: Exit from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                  std::cerr: Exception was thrown:


                                                                                                  [ERROR] EXCN_utility_exit has been thrown from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                  ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!

                                                                                                  Conan
                                                                                                  Same here. Got about a dozen or so WUs and they crap out faster than you can shake a stick at... :-(

                                                                                                  Ralf

                                                                                                  Profile [VENETO] boboviz

                                                                                                  Joined: Apr 9 08
                                                                                                  Posts: 499
                                                                                                  ID: 4205
                                                                                                  Credit: 707,031
                                                                                                  RAC: 66
                                                                                                  Message 5814 - Posted 25 Feb 2015 22:09:13 UTC

                                                                                                    Same errors as above (and as Conan) on all my PCs.
                                                                                                    Please stop this batch.

                                                                                                    Profile robertmiles

                                                                                                    Joined: Jan 13 09
                                                                                                    Posts: 79
                                                                                                    ID: 5137
                                                                                                    Credit: 233,290
                                                                                                    RAC: 0
                                                                                                    Message 5815 - Posted 26 Feb 2015 0:15:15 UTC - in response to Message 5813.

                                                                                                      Received this error on Task 3335547


                                                                                                      ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
                                                                                                      ERROR:: Exit from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                      std::cerr: Exception was thrown:


                                                                                                      [ERROR] EXCN_utility_exit has been thrown from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
                                                                                                      ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!

                                                                                                      Conan
                                                                                                      Same here. Got about a dozen or so WUs and they crap out faster than you can shake a stick at... :-(

                                                                                                      Ralf


                                                                                                      Similar problem here, but about 25 failed workunits so far, spread over two computers. The error message suggests that either an input file used by most of today's workunits is corrupted, or the command line that tells the application to use that file is defective.
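A rough way to tell those two cases apart is sketched below. It assumes you can get hold of the command line a task was started with and the task's working directory; the helper names `find_sequence_inputs` and `diagnose` are made up for illustration and are not part of minirosetta or BOINC. The only flags it knows about are the two named in the error message itself.

```python
import shlex
from pathlib import Path

# The two flags named in the error message; nothing else is assumed about the app.
SEQUENCE_FLAGS = ("-in::file::fasta", "-in::file::native")


def find_sequence_inputs(command_line):
    """Return (flag, filename) pairs for each sequence-input flag on the command line."""
    # Note: shlex uses POSIX quoting rules, so backslash-separated Windows
    # paths would need to be quoted or converted first.
    tokens = shlex.split(command_line)
    pairs = []
    for i, token in enumerate(tokens):
        if token in SEQUENCE_FLAGS and i + 1 < len(tokens):
            pairs.append((token, tokens[i + 1]))
    return pairs


def diagnose(command_line, workdir="."):
    """Say whether the flag is missing, or the file it names is missing or empty."""
    pairs = find_sequence_inputs(command_line)
    if not pairs:
        return "no sequence flag on command line"
    for flag, name in pairs:
        path = Path(workdir) / name
        if not path.is_file():
            return f"{flag} points at missing file: {name}"
        if path.stat().st_size == 0:
            return f"{flag} points at empty file: {name}"
    return "sequence input looks present"
```

If this reports a missing flag, the fault is in how the batch was submitted; a missing or empty file points at the input files shipped with the workunit. Either way it is something only the project side can fix.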

                                                                                                      Profile [VENETO] boboviz

                                                                                                      Joined: Apr 9 08
                                                                                                      Posts: 499
                                                                                                      ID: 4205
                                                                                                      Credit: 707,031
                                                                                                      RAC: 66
                                                                                                      Message 5816 - Posted 26 Feb 2015 9:39:04 UTC

                                                                                                        Last modified: 26 Feb 2015 9:39:35 UTC

                                                                                                        Today, again
                                                                                                        3340497

                                                                                                        Please, stop this batch!!

                                                                                                        Profile Conan

                                                                                                        Joined: Feb 16 06
                                                                                                        Posts: 344
                                                                                                        ID: 145
                                                                                                        Credit: 1,309,534
                                                                                                        RAC: 0
                                                                                                        Message 5817 - Posted 26 Feb 2015 11:08:01 UTC

                                                                                                          New record for me I think, 100% failures with no successes after many dozens of tasks.

                                                                                                          This batch is fundamentally flawed and the parameters need resetting.

                                                                                                          Conan
                                                                                                          ____________

                                                                                                          candido

                                                                                                          Joined: Aug 14 14
                                                                                                          Posts: 3
                                                                                                          ID: 47062
                                                                                                          Credit: 126,773
                                                                                                          RAC: 0
                                                                                                          Message 5818 - Posted 26 Feb 2015 19:51:46 UTC

                                                                                                            Computation error after 2 to 4 minutes in each and every one of approx. 500 workunits.
                                                                                                            ____________

                                                                                                            Profile [VENETO] boboviz

                                                                                                            Joined: Apr 9 08
                                                                                                            Posts: 499
                                                                                                            ID: 4205
                                                                                                            Credit: 707,031
                                                                                                            RAC: 66
                                                                                                            Message 5819 - Posted 26 Feb 2015 20:51:03 UTC - in response to Message 5818.

                                                                                                              Computation error after 2 to 4 minutes in each and every one of approx. 500 workunits.


                                                                                                              Waste of time. I've stopped the download on all my cpus....

                                                                                                              candido

                                                                                                              Joined: Aug 14 14
                                                                                                              Posts: 3
                                                                                                              ID: 47062
                                                                                                              Credit: 126,773
                                                                                                              RAC: 0
                                                                                                              Message 5820 - Posted 26 Feb 2015 22:21:39 UTC - in response to Message 5819.

                                                                                                                Yes, at the moment I can't even download new work units to check whether the problem has been solved. I get something like this: "You have reached a daily limit of x work units", where x varies from time to time (!?)

                                                                                                                Roadranner

                                                                                                                Joined: Oct 15 13
                                                                                                                Posts: 3
                                                                                                                ID: 18898
                                                                                                                Credit: 105,255
                                                                                                                RAC: 41
                                                                                                                Message 5821 - Posted 27 Feb 2015 1:29:00 UTC

                                                                                                                  Two kinds of WUs:

                                                                                                                  - dock_placestub: all ending with computation error
                                                                                                                  - zibochen: running normally at the moment

                                                                                                                  candido

                                                                                                                  Joined: Aug 14 14
                                                                                                                  Posts: 3
                                                                                                                  ID: 47062
                                                                                                                  Credit: 126,773
                                                                                                                  RAC: 0
                                                                                                                  Message 5822 - Posted 27 Feb 2015 3:26:22 UTC

                                                                                                                    I want zibochen!!!

                                                                                                                    Profile [VENETO] boboviz

                                                                                                                    Joined: Apr 9 08
                                                                                                                    Posts: 499
                                                                                                                    ID: 4205
                                                                                                                    Credit: 707,031
                                                                                                                    RAC: 66
                                                                                                                    Message 5823 - Posted 1 Mar 2015 14:02:10 UTC

                                                                                                                      Same error, again....

                                                                                                                      Trotador

                                                                                                                      Joined: May 7 10
                                                                                                                      Posts: 19
                                                                                                                      ID: 15474
                                                                                                                      Credit: 6,204,086
                                                                                                                      RAC: 42
                                                                                                                      Message 5824 - Posted 5 Mar 2015 6:10:56 UTC

                                                                                                                        Validate error for all tasks returned today.

                                                                                                                        It looks like the problem is on the server side; the units crunch OK.

                                                                                                                        Suspended until solved.

                                                                                                                        Profile Sysadm@Nbg

                                                                                                                        Joined: Dec 9 09
                                                                                                                        Posts: 7
                                                                                                                        ID: 12983
                                                                                                                        Credit: 208,060
                                                                                                                        RAC: 0
                                                                                                                        Message 5825 - Posted 5 Mar 2015 7:18:54 UTC - in response to Message 5824.

                                                                                                                          Validate error for all tasks returned today.

                                                                                                                          It looks like the problem is on the server side; the units crunch OK.

                                                                                                                          confirmed --> e.g. http://ralph.bakerlab.org/workunit.php?wuid=2963714
                                                                                                                          ____________

                                                                                                                          Profile [VENETO] boboviz

                                                                                                                          Joined: Apr 9 08
                                                                                                                          Posts: 499
                                                                                                                          ID: 4205
                                                                                                                          Credit: 707,031
                                                                                                                          RAC: 66
                                                                                                                          Message 5826 - Posted 5 Mar 2015 7:27:32 UTC - in response to Message 5824.

                                                                                                                            Validate error for all tasks returned today.


                                                                                                                            +1

                                                                                                                            Dirk Broer

                                                                                                                            Joined: Aug 7 14
                                                                                                                            Posts: 4
                                                                                                                            ID: 44489
                                                                                                                            Credit: 46,664
                                                                                                                            RAC: 4
                                                                                                                            Message 5827 - Posted 5 Mar 2015 18:55:54 UTC - in response to Message 5826.

                                                                                                                              Validate error for all tasks returned today.


                                                                                                                              +1


                                                                                                                              +1

                                                                                                                              zioriga

                                                                                                                              Joined: Feb 16 06
                                                                                                                              Posts: 7
                                                                                                                              ID: 260
                                                                                                                              Credit: 52,529
                                                                                                                              RAC: 1
                                                                                                                              Message 5828 - Posted 6 Mar 2015 8:52:23 UTC

                                                                                                                                +1
                                                                                                                                ____________

                                                                                                                                Roadranner

                                                                                                                                Joined: Oct 15 13
                                                                                                                                Posts: 3
                                                                                                                                ID: 18898
                                                                                                                                Credit: 105,255
                                                                                                                                RAC: 41
                                                                                                                                Message 5829 - Posted 10 Mar 2015 0:42:43 UTC

                                                                                                                                  I'm still getting validate errors. :-(


                                                                                                                                  Copyright © 2017 University of Washington

                                                                                                                                  Last Modified: 20 Nov 2008 19:41:56 UTC