New crediting system

Profile feet1st

Send message
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 1934 - Posted: 8 Aug 2006, 18:58:45 UTC - in response to Message 1933.  
Last modified: 8 Aug 2006, 18:59:55 UTC

. . . would it be better to put a cpu time capture in the Rosetta code itself. Since Rosetta@home code is compiled in the lab, results shouldn't be fake-able.


Not sure how the "time capture" works... wish I could use that for myself :) But once Rosetta determines the time, it should "encrypt" (not just "encode") the number of CPU seconds in a way that only the Rosetta host can authenticate. And it should be encrypted along with something fairly unique, such as the complete WU name and the expiry date/time. That way you can't just replace the time data of a 3000 second WU with the time data from a 10000 second WU and have it pass authentication.
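As a rough illustration of the idea above, here is a minimal Python sketch. It uses a keyed hash (HMAC) rather than encryption, since the goal is authentication and not secrecy. The key, the function names and the field layout are all made up for illustration; nothing here reflects how Rosetta actually works, and a key shipped inside a client program can be extracted, so this would only raise the bar.

```python
import hashlib
import hmac

# Hypothetical shared secret (an assumption for this sketch). A key embedded
# in a client binary can be extracted, so this is not tamper-proof.
SECRET_KEY = b"example-key-known-to-app-and-server"


def sign_cpu_time(cpu_seconds: int, wu_name: str, expiry_utc: str) -> str:
    """Return a hex tag that binds the CPU time to one specific work unit."""
    message = f"{wu_name}|{expiry_utc}|{cpu_seconds}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify_cpu_time(cpu_seconds: int, wu_name: str, expiry_utc: str, tag: str) -> bool:
    """Server-side check: recompute the tag and compare in constant time."""
    expected = sign_cpu_time(cpu_seconds, wu_name, expiry_utc)
    return hmac.compare_digest(expected, tag)
```

Because the WU name and expiry are part of the signed message, a tag produced for a 10000 second WU does not verify when pasted onto a 3000 second WU, and changing the number of seconds also invalidates the tag.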

ID: 1934 · Report as offensive    Reply Quote
suguruhirahara

Send message
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1935 - Posted: 8 Aug 2006, 19:13:44 UTC
Last modified: 8 Aug 2006, 19:14:46 UTC

Does granting credit based on how long each computer ran the application distinguish how much work each one did? I mean, does the server give the same amount of credit to a machine with a Pentium and to one with a Conroe if they run the application for the same length of time, even though the Conroe can crunch more?
ID: 1935 · Report as offensive    Reply Quote
Ethan

Send message
Joined: 11 Feb 06
Posts: 18
Credit: 25,579
RAC: 0
Message 1936 - Posted: 8 Aug 2006, 19:32:26 UTC - in response to Message 1935.  

Does granting credit based on how long each computer ran the application distinguish how much work each one did? I mean, does the server give the same amount of credit to a machine with a Pentium and to one with a Conroe if they run the application for the same length of time, even though the Conroe can crunch more?


Once Ralph determines how many credits to grant for each simulation produced, the credit you get depends only on how fast your computer is. A brand new 3 GHz Core processor will get many more credits per day than a 1 GHz P3, since it will be able to process more simulations in the same amount of time.

ID: 1936 · Report as offensive    Reply Quote
tralala

Send message
Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 1937 - Posted: 8 Aug 2006, 20:20:58 UTC

One thing to keep in mind is that nothing reported from the client side is trustworthy: not the benchmarks, not the CPU type and speed, nothing! Anything can be edited and faked. So every scenario that relies on the accuracy of data from (single) clients should be discarded. However, on average there is more accurate and valid information than faked information. So any mechanism that takes averages over large numbers should in fact work quite well, even with the claimed credits, which depend on the reported benchmarks. If the sample is big enough (>100), discarding odd results might not be necessary, but in any case I like the idea of throwing out 10% of the highest standard deviation.
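To make the averaging idea concrete, here is a minimal Python sketch of one possible reading of it: drop the 10% of claims that lie furthest from the median, then average the rest. Using distance from the median (instead of standard deviation) is my own simplification, and the function name and the 10% default are illustrative only.

```python
from statistics import mean, median


def trimmed_mean_credit(claims: list[float], discard_fraction: float = 0.10) -> float:
    """Average the claims after dropping the ones furthest from the median."""
    if not claims:
        raise ValueError("no claims to average")
    centre = median(claims)
    # Rank claims by distance from the median; the most deviant come last.
    by_deviation = sorted(claims, key=lambda c: abs(c - centre))
    n_discard = int(len(claims) * discard_fraction)
    kept = by_deviation[: len(claims) - n_discard]
    return mean(kept)
```

With nine claims of 20.0 and one faked claim of 500.0, the faked claim is the single value discarded and the result is 20.0. With fewer than ten claims nothing is discarded at the 10% setting, which is one reason the sample size matters.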

Ethan

Your idea of measuring the average time per model and comparing it with a golden computer assumes that the computer pools on Rosetta and Ralph are similar, which is probably not true. I assume that on Ralph there are, on average, faster computers than on Rosetta. This makes the idea impractical, imho.
ID: 1937 · Report as offensive    Reply Quote
suguruhirahara

Send message
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1938 - Posted: 9 Aug 2006, 6:19:38 UTC - in response to Message 1937.  
Last modified: 9 Aug 2006, 6:19:49 UTC

but in any case I like the idea of throwing out 10% of the highest standard deviation.

Could you tell me why you think the deviation cut-off should be a constant 10%?
ID: 1938 · Report as offensive    Reply Quote
Profile JKeck {pirate}
Avatar

Send message
Joined: 16 Feb 06
Posts: 14
Credit: 153,095
RAC: 0
Message 1939 - Posted: 9 Aug 2006, 6:24:41 UTC - in response to Message 1938.  

but in any case I like the idea of throwing out 10% of the highest standard deviation.

Could you tell me why you think the deviation cut-off should be a constant 10%?


Just a rough number thrown out. The actual cut-off would depend on how many copies are sent out and what percentage of misclaiming hosts we have on RALPH.
BOINC WIKI

BOINCing since 2002/12/8
ID: 1939 · Report as offensive    Reply Quote
suguruhirahara

Send message
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1940 - Posted: 9 Aug 2006, 6:34:29 UTC - in response to Message 1939.  
Last modified: 9 Aug 2006, 7:23:57 UTC

but in any case I like the idea of throwing out 10% of the highest standard deviation.

Could you tell me why you think the deviation cut-off should be a constant 10%?


Just a rough number thrown out. The actual cut-off would depend on how many copies are sent out and what percentage of misclaiming hosts we have on RALPH.

Thanks.
Has the formula that determines the value been submitted already, or not yet?

Edit: the issue is also being discussed in the Rosetta thread. I think it should be combined with this thread.

ID: 1940 · Report as offensive    Reply Quote
[B^S] sTrey
Avatar

Send message
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1941 - Posted: 9 Aug 2006, 11:22:13 UTC
Last modified: 9 Aug 2006, 12:29:08 UTC

If inter-project parity is one of the objectives, I'm not seeing it yet.

I started tracking credit when Seti devs asked for help with that, during the flame wars around their newest application. At that time I observed that my machine was generally collecting 9-10 credits per cpu hour on most projects. (Windows XP, standard BOINC client, no optimized apps except Einstein's "official" beta app). This # was not significantly affected by considering or ignoring the larger quotas on most projects I run.

My time setting for Ralph is 2 hours, and I was getting 20 credits or so per wu. Based on only the 3 wus credited so far (sorry not more, I've been gone for 3 weeks & Ralph's workstream is fairly thin), 2 have wildly higher amounts (58 & 60 credits) and 1 much lower (6 credits). Actually I see the "claimed credit" is showing the older calculation so you can see the difference. Here's my results list.

Edit: a 4th wu came in with credit (24) just a bit higher than previous-normal.

I know you're just starting to experiment with this, but it seemed worth a comment. (Though I'm cynically guessing that "extra" credit wouldn't cause the uproar Seti's perceived "lesser" credit caused...)
ID: 1941 · Report as offensive    Reply Quote
Spare_Cycles

Send message
Joined: 16 Feb 06
Posts: 17
Credit: 12,942
RAC: 0
Message 1942 - Posted: 9 Aug 2006, 12:54:42 UTC - in response to Message 1941.  

If inter-project parity is one of the objectives, I'm not seeing it yet.

Inter-project parity is not an objective on RALPH and thus it may never happen on RALPH.

A credit of 2 per model is being given no matter how slow or fast the WU in order to test the awarding of credit.

As far as I can tell, the test is working fine and credit is being awarded as intended.
ID: 1942 · Report as offensive    Reply Quote
[B^S] sTrey
Avatar

Send message
Joined: 15 Feb 06
Posts: 58
Credit: 15,430
RAC: 0
Message 1943 - Posted: 9 Aug 2006, 14:55:53 UTC - in response to Message 1942.  

If inter-project parity is one of the objectives, I'm not seeing it yet.

Inter-project parity is not an objective on RALPH and thus it may never happen on RALPH.

A credit of 2 per model is being given no matter how slow or fast the WU in order to test the awarding of credit.

As far as I can tell, the test is working fine and credit is being awarded as intended.


I'm not asking about Ralph, I'm asking about this scheme which is intended to be used, eventually, on Rosetta, where Dekim's post earlier in this thread implies that parity remains an objective, at least "somewhat". I understand that the current per-model value is for test purposes only, and I see that credit is being awarded as described. I'm just wondering about the path from here to project parity in production mode.

If I understood his description, it sounds like a lot of work for the lab members to have to run each wu, and come up with and record a credit-awarding value into it before it can be released to production.

The project team has a great track record so the above must not be a problem, but it will be interesting to see this work out over time.
ID: 1943 · Report as offensive    Reply Quote
Honza

Send message
Joined: 16 Feb 06
Posts: 9
Credit: 1,962
RAC: 0
Message 1944 - Posted: 9 Aug 2006, 15:00:36 UTC - in response to Message 1937.  

Ethan
Your idea of measuring the average time per model and comparing it with a golden computer assumes that the computer pools on Rosetta and Ralph are similar, which is probably not true. I assume that on Ralph there are, on average, faster computers than on Rosetta. This makes the idea impractical, imho.
Not being Ethan, but anyway...
No, it doesn't. The damn credit award can be postponed until each model gets calibrated (in terms of credit) on Ralph.

Or, taking up your idea, the credit award can be estimated once 100 hosts (or so) have returned each result type (no need for a Ralph estimate).

Each model, as I understand it, is not constant in terms of computing demands.
If it were (or better, if we make it so), we could use calibration units.

The downside of this is that credit can't be granted immediately (not a bad thing) and there will be more results pending.
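A minimal Python sketch of the "wait for about 100 hosts" variant, assuming results stay pending until enough distinct hosts have reported a result type, after which a per-model credit is taken as the median of claimed credit per model. The class, its method names and the use of the median are assumptions for illustration, not anything the project has described.

```python
from collections import defaultdict
from statistics import median

MIN_HOSTS = 100  # "100 hosts (or so)"; the exact threshold is a guess


class PendingCredit:
    """Hold results as pending until enough hosts have reported a result type."""

    def __init__(self, min_hosts: int = MIN_HOSTS) -> None:
        self.min_hosts = min_hosts
        # result type -> list of (host_id, models, claimed_credit)
        self.pending = defaultdict(list)

    def report(self, result_type: str, host_id: int, models: int, claimed_credit: float) -> None:
        self.pending[result_type].append((host_id, models, claimed_credit))

    def credit_per_model(self, result_type: str):
        """Return the estimated credit per model, or None while still pending."""
        results = self.pending.get(result_type, [])
        hosts = {host_id for host_id, _, _ in results}
        if len(hosts) < self.min_hosts:
            return None
        ratios = [claimed / models for _, models, claimed in results if models > 0]
        return median(ratios) if ratios else None
```

Until the threshold is reached `credit_per_model` returns None, which corresponds to the "more results pending" downside mentioned above.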
ID: 1944 · Report as offensive    Reply Quote
Divide Overflow

Send message
Joined: 15 Feb 06
Posts: 12
Credit: 128,027
RAC: 0
Message 1945 - Posted: 9 Aug 2006, 17:43:02 UTC
Last modified: 9 Aug 2006, 18:03:46 UTC

I think it's dangerous to have a fixed global credit award for models produced. The number of models produced per unit of time varies significantly from one WU to the next. This would encourage cherry-picking high-model WUs and favor machines that are handed more of that type of work. (Or at least run the risk of seeing an increase in aborted work on WUs that are found to produce only a few models.) Examples from the same machine:

WU: 1qysA_BOINC_ABRELAX_SAVE_ALL_OUT__1108_42_0
Ran for 7021 seconds
Produced 5 models
Awarded 10 credits

WU: 1l2yA_BOINC_ABRELAX_SAVE_ALL_OUT__1108_42_0
Ran for 7195 seconds
Produced 62 models
Awarded 124 credits

It would be far better to take some scaling into account for each WU. Perhaps Ralph could assist in providing this figure to carry over into Rosetta?
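To put numbers on the two examples, here is a small Python sketch. It computes the credit per hour under the flat 2 credits per model, and then the per-model value each WU would need in order to pay the same hourly rate on this one machine. The target of 10 credits per hour is purely hypothetical, and scaling from a single machine is a simplification.

```python
FLAT_CREDIT_PER_MODEL = 2.0  # the test value used in the examples above


def credits_per_hour(models: int, cpu_seconds: float, credit_per_model: float) -> float:
    """Hourly credit rate for a result with the given model count and run time."""
    return models * credit_per_model * 3600.0 / cpu_seconds


def scaled_credit_per_model(models: int, cpu_seconds: float, target_per_hour: float) -> float:
    """Per-model credit that would pay target_per_hour on the measured machine."""
    seconds_per_model = cpu_seconds / models
    return target_per_hour * seconds_per_model / 3600.0


# (models produced, CPU seconds) taken from the two results listed above
work_units = {
    "1qysA_BOINC_ABRELAX_SAVE_ALL_OUT__1108_42_0": (5, 7021),
    "1l2yA_BOINC_ABRELAX_SAVE_ALL_OUT__1108_42_0": (62, 7195),
}

for name, (models, seconds) in work_units.items():
    flat = credits_per_hour(models, seconds, FLAT_CREDIT_PER_MODEL)
    scaled = scaled_credit_per_model(models, seconds, target_per_hour=10.0)
    print(f"{name}: flat {flat:.1f} credits/h, scaled {scaled:.2f} credits/model")
```

Under the flat value the first WU pays roughly 5 credits per hour and the second roughly 62, which is the gap that invites cherry picking. A per-WU value removes that gap by construction.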

EDIT: Just saw the post that this is indeed the plan, which should provide a very equitable credit system for the project. I'm eager to see some of this testing take place.
ID: 1945 · Report as offensive    Reply Quote
Profile [B^S] thierry@home
Avatar

Send message
Joined: 15 Feb 06
Posts: 20
Credit: 17,624
RAC: 0
Message 1946 - Posted: 9 Aug 2006, 18:11:07 UTC

I have finished a WU, apparently with the new credit system. Here are the results:

Win XP Pro sp2
Pentium 4 3.0 HT
RAM 1Gb
BOINC 5.5.0 ;-)

WU: 1qysA_BOINC_ABRELAX_SAVE_ALL_OUT_BARCODE__1109_99
CPU Time: 21127 seconds
Models: 17
Claimed credit: 98.1
Granted credit: 34.00
Average credit: 5.80 per hour

ID: 1946 · Report as offensive    Reply Quote
suguruhirahara

Send message
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1947 - Posted: 9 Aug 2006, 18:32:06 UTC

https://ralph.bakerlab.org/result.php?resultid=238822
One of mine:

BOINC Alpha 5.5.9
Result ID: 238822
Name: 1enh__BOINC_ABRELAX_SAVE_ALL_OUT_flatss__1111_596_0

CPU Time: 3301 sec (around 55 min)
Claimed credit: 4.88809081678063
Granted credit: 7 (models) x 2 = 14
Average credit: 15.27/h


ID: 1947 · Report as offensive    Reply Quote
Profile Astro

Send message
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1948 - Posted: 9 Aug 2006, 20:05:00 UTC

Here's a project comparison using what LITTLE Ralph data I have. The other projects haven't been brought up to date yet and I have many more data points to add to them, so I wouldn't use this as anything more than a general idea of where you're at. I've set other projects to NNW (No New Work) until I can get a decent sample quantity. I highlighted the "ralph-new" values in red. Since Rosetta and OLD Ralph used the same credit formula, you can compare the new Ralph to the old Ralph and even to Rosetta on each computer. Note: all the numbers here are from stock BOINC core clients.

ID: 1948 · Report as offensive    Reply Quote
Profile Astro

Send message
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 1951 - Posted: 11 Aug 2006, 2:04:41 UTC

Is Ralph planning on sending more work? My results so far have been all over the map, and I don't have enough data to draw any conclusions one way or the other.

tony
ID: 1951 · Report as offensive    Reply Quote
Profile feet1st

Send message
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 1954 - Posted: 11 Aug 2006, 3:45:45 UTC

There seem to be a lot of misunderstandings floating around in this thread. I am no more "in the know" than any other participant, but I wanted to try to spell out my understanding of the new system and why it is reasonable, and address some of the concerns people have expressed below. I hope this helps set a stake in the sand and at least gives a frame of reference. If I'm off base, then the project team just has to identify where, rather than write up the whole of the details, so hopefully it leads to a clearer understanding.

Firstly, David Kim's comment:
The credits should remain somewhat consistent with other projects since the average values will be based on the standard boinc crediting scheme.


1) It was stated in passing. Kind of a reassurance that we're not trying to change the whole credit VALUE here. That the new system should remain in line with the old for the most part. It wasn't stated as an objective or a target to achieve. Since BOINC credits are SUPPOSED to represent the TFLOPS of the project, ya kinda have ta stick to that. It doesn't endorse any flawed system (read more below). It simply falls in line with the 100,000 credits = 1 TFLOPS system that BOINC uses.

2) YES, the fact that some machines are faster than others is fully accounted for. The slower machine will take longer to complete each "model". Models are not "work units". The number of models generated for a work unit varies with the user's time preference and computer resources. This is why a credit system based on the number of models crunched makes the most sense.

3) Yes, the time to crunch a single model of a 400 amino acid long protein is much higher than the time to crunch a single model of a 55 amino acid long protein. This is fully recognized. No cherry picking is possible, because the credit granted per model for the first will be significantly higher than the credit granted per model for the second.

4) The idea (as I understand it) is to send a few WUs for a given protein out to Ralph and get a feel for how long each model takes to crunch for THAT WU. This credit value will then be used on Rosetta to award credits. So, from the moment the WU is released on Rosetta, the credit value is defined, fair, and requires no quorum or average or waiting for credit granting. All that work of weeding, standard deviation and reversion to the mean was done ahead of time on Ralph.

5) What's the deal with the 2 credits?? As I read it, David Kim found that for the WU he's released on Ralph, the old system would have awarded about 2 credits per model. The actual number doesn't matter. The point is that they are testing the idea of having a fixed value per model for THAT specific WU determined ahead of time, because this is how Rosetta will be running. It will be handed a set of WUs to send out, along with the credit to award per model crunched.

6) You can't trust the client PCs... correct. Even with the new system it will be possible to falsify your results. The only value we're trusting from the client is the number of models they crunched on that protein... and if they report 50 models crunched, then obviously they've got to report the results of the 50 crunched models... this COULD be falsified. No doubt! (unless you encrypt all of the output data so that only the Rosetta servers can authenticate it) So, you see, we're not counting on the client accurately reporting the number of seconds spent crunching, or the GHz of its CPU, or the FLOPS measured. It levels the playing field by basing credit on how much actual work your machine did to help the project.

7) Why use Ralph to assess the credit value per model? Why not just use a "golden computer" in the lab?? ...because that golden computer has a fixed architecture with no variation. Technology is changing all the time. Say a new system comes out with dual math co-processors, and it's capable of doing roughly twice the actual Rosetta work of its predecessor, at only a 30% higher clock speed on the base CPU... how do you determine credit for that new machine? You can't just take clock speed, because it's only 30% faster. You can't just take floating point speed, because there are many other factors in performance. By using Ralph, a wide variety of machines and operating systems can be tested for each given work unit. This gives the fairest perspective on how difficult it is to crunch this new protein, and how much harder it is than the last protein that was studied. It's kinda the opposite of the golden machine. And that is better, because the field is always going to be changing. This approach automatically calibrates itself as system capabilities evolve.
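Points 4) to 6) can be sketched in a few lines of Python, again only as my understanding and not the project's actual code. Each work unit carries a per-model credit that is fixed before release, and the server grants credit from the model count alone, paying for no more models than were actually uploaded. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    name: str
    credit_per_model: float  # fixed before release, e.g. calibrated on Ralph


@dataclass
class Result:
    wu: WorkUnit
    models_reported: int
    model_outputs: list  # one entry per model actually uploaded


def grant_credit(result: Result) -> float:
    """Grant credit from the model count only; ignore client time and benchmarks."""
    # Pay for no more models than were actually uploaded (point 6).
    models = min(result.models_reported, len(result.model_outputs))
    return max(models, 0) * result.wu.credit_per_model
```

For example, a result reporting 17 models with 17 uploaded outputs on a WU worth 2 credits per model is granted 34 credits, while a result claiming 50 models with only 3 outputs is granted 6. This does not prove the uploaded outputs are genuine, which is the remaining hole mentioned in point 6.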

I hope I'm helping here. I've been hoping someone from the project team would have spelled all of this out for us, but I think they are probably still zonked from CASP7 :)

Again, the above is just my understanding of things. Not to be taken as gospel by any means. But if you have specific concerns about the above approach, I hope this gives us all a frame of reference for discussion. And I've numbered paragraphs in hopes of NOT seeing all of this text quoted in responses.
ID: 1954 · Report as offensive    Reply Quote
suguruhirahara

Send message
Joined: 5 Mar 06
Posts: 40
Credit: 11,320
RAC: 0
Message 1955 - Posted: 11 Aug 2006, 4:01:36 UTC

Thanks for your brief explanation :) Now that I understand the system, I'm for it.
ID: 1955 · Report as offensive    Reply Quote
Crack

Send message
Joined: 18 Feb 06
Posts: 1
Credit: 257,595
RAC: 0
Message 1957 - Posted: 11 Aug 2006, 5:58:26 UTC
Last modified: 11 Aug 2006, 6:05:47 UTC

That seems a good solution.
However, the credit compared to other projects will still be too high. Right now it's impossible to compare Rosetta with other projects, because the claimed credit is always granted.
Changing the "quorum" from 1 to 3 on RALPH would solve that too.
ID: 1957 · Report as offensive    Reply Quote
Profile anders n

Send message
Joined: 16 Feb 06
Posts: 166
Credit: 131,419
RAC: 0
Message 1959 - Posted: 11 Aug 2006, 10:10:56 UTC

The 2 points / model on this WU is not even close.

https://ralph.bakerlab.org/result.php?resultid=240301

Let's see if it evens out :)

Anders n
ID: 1959 · Report as offensive    Reply Quote


©2024 University of Washington
http://www.bakerlab.org