New crediting system

Author	Message
Ethan Send message Joined: 11 Feb 06 Posts: 18 Credit: 25,579 RAC: 0	Message 2132 - Posted: 16 Aug 2006, 19:40:40 UTC - in response to Message 2130. Last modified: 16 Aug 2006, 19:45:02 UTC It's pretty easy to pick out extremes. The majority of wu's will be in the same ballpark, and if grossly off, they could probably be recalculated. The differences pointed out below are still only half as large as the difference between standard and optimized credits (if not more, I just checked rosetta, an identical computer to mine gets 3000 credits a day, I get 600 using the standard client). . so that's an improvement rather than a step back. ID: 2132 · Reply Quote

tralala Send message Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0	Message 2133 - Posted: 16 Aug 2006, 19:45:27 UTC - in response to Message 2131. these are very small numbers. it should even out. there will always be differences even with the same work unit because of the random nature of the runs. You call 70% small numbers? Those are definitely not small numbers! 10-20% I would call acceptable (not worth the cherry picking). If you can easily prevent cherry picking why not do it? 4 results - that is a small number. Well, you've been warned. ;-) If you are not convinced it's a problem try it out. :-) I'm not sure what you meant by "we can easily prevent cherry picking". How would you do that? The only way to prevent cherry picking is to grant very similar credits/hour for each WU and if this is indeed easily achieved why not do it? ID: 2133 · Reply Quote

dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0	Message 2134 - Posted: 16 Aug 2006, 19:52:54 UTC Last modified: 16 Aug 2006, 19:53:22 UTC We can limit the number of wu you can abort. We can change the work unit distribution to be more homogenous. We can update the credit/model values. We can penalize people trying to cherry pick. It seems easy enough to me if it becomes a problem. just some ideas off the top of my head. ID: 2134 · Reply Quote

Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 2135 - Posted: 16 Aug 2006, 20:07:25 UTC Last modified: 16 Aug 2006, 20:14:45 UTC Dekim is right, our quantity of individual samples is small. Given that this system is supposed to be moved to Rosetta in a day, we are left to bring up possibilities of what might happen. Things that the project might not have thought about. They possess the "real numbers", in terms of the big picture as they hold the entire DB. I say "cherry picking" might be possible, but my limited amount of data doesn't support that this is possible. Now, if I collected more data, would a pattern appear which might indicate I'd get more credit per hour running one wu type over another??? If you look at the results in bold, and IF it became apparent that that type of WU consistently returned more per hour, then I could delete all that weren't that type. We have to try to find things to improve your final product before you release it. We are left supposing, guessing, speculating in an effort to help you. Note: this pic is the same as my earlier one. It's just had the WU names appended to it. ID: 2135 · Reply Quote

Ethan Send message Joined: 11 Feb 06 Posts: 18 Credit: 25,579 RAC: 0	Message 2136 - Posted: 16 Aug 2006, 20:34:20 UTC - in response to Message 2135. mmciastro, maybe I'm reading your graph wrong (thanks for posting it btw). . . but it looks very consistent on a credit/hour basis for a given computer. Is this the case? And if so, how do you feel about the credits/hour for the various machines (is machine 2 really about 2.5 times faster than machine 1)? ID: 2136 · Reply Quote

FluffyChicken Send message Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0	Message 2137 - Posted: 16 Aug 2006, 20:35:20 UTC - in response to Message 2130. With what I've seen so far in one day with up to a 168% difference from lowest to highest credits for one single computer it's nowhere near ready to roll out on Rosetta. You're moving to a cherry picking heaven at the moment I would guess. Wouldn't be hard for some of the larger teams (or a bored individual) to create a program to grab the stats, see what the initial credits claimed are for that type and tell the team. Fluffy, dcdc, tralala, I think you are mistaken: Cherry-picking is NOT possible ! The variability you are seeing in the credits is not between different WU types but because of the different completion times of the models _within_ the WUs. Even if, say, the first model of a WU takes a long time to complete this doesn't tell you anything about how long the following models will take. This is a completely random process. So terminating WUs that start with a 'slow' model won't help you either. Fluffy, instead of 168% difference you could also say +/-45% difference with respect to the average. Example: average=10, 10-45%=5.5, 10+45%=14.5, 164% difference between lowest and highest. I think this is acceptable, considering that most values will be much closer to the average. I know I could say that and I certainly wouldn't say a 45% difference was acceptable, yes it's better than the x3 that optimised clients usually give over the standard. But the way I currently see it is that you may as well just count the number of models you create instead of assigning a credit value to it. It'll save a lot of bother ;-) All in all if you are going to go down this route I would have thought that you put everything into pending credit, then hold off awarding credit till you have a statistically sound credit award per job type. Then adjust for time taken (using an internal timing procedure, not the boinc core client... 
This alters for actual work done not assumed work done) Pending credit happens on any project that uses a quorum anyway so that's not really a problem and it would only happen for the first 'however many you think necessary x00's of returned jobs before it could start granting instantly again. Preferably from trusted clients you know of on Rosetta. Not here on Ralph. That frees up ralph to actually improve the client without getting in the way of credit. That should make it quicker as you'll get through far more tasks in a day than you would in a week in rosetta (I should hope). P.S How do you actually intend to get the 'credit per model' across all the platforms? The only way I can see (if you really still want to use Ralph for the benchmarking) is to send out the ralph-client with an internal benchmark to the testers; that would bypass having to worry about the boinc client used. ID: 2137 · Reply Quote
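Editor's note: the "+/-45% versus 164%" arithmetic quoted in the post above can be checked with a few lines of Python. This is only a sketch of that one worked example; the function name `spread_from_deviation` is made up for illustration and is not part of any BOINC or Rosetta tool.

```python
def spread_from_deviation(average, deviation):
    """Return (low, high, pct) for values spread +/- `deviation` around `average`.

    `deviation` is a fraction (0.45 means +/-45%). `pct` is how much larger
    the highest value is than the lowest, in percent.
    """
    low = average * (1 - deviation)
    high = average * (1 + deviation)
    pct = (high - low) / low * 100
    return low, high, pct


# The example from the thread: average = 10, deviation = 45%.
low, high, pct = spread_from_deviation(10, 0.45)
print(f"low={low:.1f} high={high:.1f} high is {pct:.0f}% above low")
```

The same spread looks small when stated relative to the average and large when stated as highest over lowest, which is why the two posters quote such different percentages for the same data.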

FluffyChicken Send message Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0	Message 2138 - Posted: 16 Aug 2006, 20:39:40 UTC - in response to Message 2136. Last modified: 16 Aug 2006, 20:46:21 UTC mmciastro, maybe I'm reading your graph wrong (thanks for posting it btw). . . but it looks very consistent on a credit/hour basis for a given computer. Is this the case? And if so, how do you feel about the credits/hour for the various machines (is machine 2 really about 2.5 times faster than machine 1)? I think you're reading it wrong, the credit/hr (C/H) should be consistent as it's the boinc benchmarks credit per hour. The G/H is the new method EDIT: Machine one is a Celeron 500, two is a Pentium 4 1.8GHz so probably not far off. http://i65.photobucket.com/albums/h228/mmciastro/Ralphnewprojectcomparison5.jpg mmciastro, could you set them to run at 3hr units, which I believe is the default here (and so I guess at Rosetta, I cannot check as it's under maintenance there)? Since that's what the majority will be using, err, by default ;-) ID: 2138 · Reply Quote

Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 2139 - Posted: 16 Aug 2006, 20:52:15 UTC - in response to Message 2136. mmciastro, maybe I'm reading your graph wrong (thanks for posting it btw). . . but it looks very consistent on a credit/hour basis for a given computer. Is this the case? And if so, how do you feel about the credits/hour for the various machines (is machine 2 really about 2.5 times faster than machine 1)? The first four columns are the same as you would see if you looked at "your results" page, the last four are my own composition: "C/H" is claimed credit/hour, G/H is granted credit/hour, Model is the number of models done, and of course the last column is the name of the wu done. Claimed credit, and C/H are consistent as they are based on the benchmark credit system. Granted credit and G/H are what I'm actually getting. It's Granted Credit or G/H that is all over the map, but as long as they average out to the same as other projects, then they have done a good job in "cross project parity". Currently my G/H is higher (on average) than I get from other projects (see earlier posted cross project comparison chart). does this help? tony ID: 2139 · Reply Quote
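Editor's note: the C/H and G/H columns described above are simply credit divided by CPU hours. A minimal sketch follows, assuming CPU time is reported in seconds as on a results page; the sample rows and the field names (`cpu_seconds`, `claimed`, `granted`) are invented for illustration and are not data from the thread.

```python
def credit_per_hour(credit, cpu_seconds):
    """Convert a credit amount and a CPU time in seconds to credit per hour."""
    return credit / (cpu_seconds / 3600.0)


# Hypothetical result rows (NOT real data from the chart discussed above).
results = [
    {"cpu_seconds": 7200, "claimed": 10.0, "granted": 14.0},
    {"cpu_seconds": 3600, "claimed": 5.0, "granted": 4.0},
]

for row in results:
    c_h = credit_per_hour(row["claimed"], row["cpu_seconds"])  # C/H column
    g_h = credit_per_hour(row["granted"], row["cpu_seconds"])  # G/H column
    print(f"C/H={c_h:.2f}  G/H={g_h:.2f}")
```

With benchmark-based claims, C/H stays flat for a given machine because the claim is benchmark speed times CPU time; G/H moves around because the granted amount depends on how many models the work unit produced.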

Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0	Message 2140 - Posted: 16 Aug 2006, 20:56:42 UTC - in response to Message 2138. mmciastro, could you set them to run at 3hr units, which I believe is the default here (and so I guess at Rosetta, I cannot check as it's under maintenance there)? Since that's what the majority will be using, err, by default ;-) Done, though I don't know what it will do if they release this new system tomorrow. LOL ID: 2140 · Reply Quote

Aaron Finney Send message Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0	Message 2141 - Posted: 16 Aug 2006, 21:21:43 UTC - in response to Message 2134. Last modified: 16 Aug 2006, 21:35:22 UTC First, a question - I must say, DeKim (david?) I really, honestly think that you are going the wrong direction here. I understand that you want to change the existing credit system, and because of that it is safe to infer that you felt the existing system wasn't working. Why? Back in the beginning (pre-boinc), SETI@Home used a system that was not too unlike your proposed method. There also was no quorum for results, and the credit system was fairly basic - 1 credit per workunit. Everyone thought this was fine and fair, but in reality this system was more bugged and UNFAIR than you could imagine. Some workunits would take much much longer, and there really wasn't any way that you could determine an average run time for every workunit without completely processing all of them (which defeats the purpose of having the DC project) ALTHOUGH, with enough care and thought, you could predict runtimes for 95%+ of the work. When BOINC was created, it was a harsh change to move to a crediting system that was based more on the actual work done closer to the process thread level than to assign arbitrary values to each workunit. If you want to change the existing system - Work with David Anderson and Rom Walton, and see if you can iron out the wrinkles. The -ONLY- fair way of granting credit is to calculate actual work done using total FLOPS or some other completely scientific method. NOT the only seemingly accurate (yet still arbitrary) averaging system you have implemented here. Please understand, I have been with BOINC since version 0.07, and SETI for years before that. The problems that exist now with the current credit system FAR OUTWEIGH the problems we had beforehand. 
I hope that you take this message to heart, and understand that what I feel you are doing is taking a step backwards in your attempt to be revolutionary. I also support you and this project in whatever changes you make; However, this doesn't mean that I am not more than mildly upset at the change. I'm sorry that I haven't spoken on the subject earlier, but work has ahold of me as of late. As we speak, I'm at the world AIDS conference in Toronto, and could not wait to comment here until I returned home as I feel that strongly that you could be making a big mistake. CERTAINLY, (and in the very least) - I would convey that MUCH more testing is needed. Credit is something that lives at the core of many of your constituents. Monopoly just isn't Monopoly without Boardwalk and Free Parking. Tread extremely carefully here if you do nothing else! We can limit the number of wu you can abort. We can change the work unit distribution to be more homogenous. We can update the credit/model values. We can penalize people trying to cherry pick. It seems easy enough to me if it becomes a problem. just some ideas off the top of my head. ID: 2141 · Reply Quote

feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0	Message 2142 - Posted: 16 Aug 2006, 21:36:45 UTC - in response to Message 2126. these are very small numbers. it should even out. there will always be differences even with the same work unit because of the random nature of the runs. David, is there any way to further refine credit? I mean for example to measure the value of a complete relax run as compared to a model that aborts it? Do the WU results return that level of detail as to how many full atom relaxes were aborted and at what point? I've mentioned the idea of encrypting the .out data previously, but later it dawned on me that this would tend to get in the way of fun things like displaying your completed predictions... I suppose in that case it still runs the .out file through Rosetta, so it could do the needed decryption. But anyway, if you could encrypt portions of the results file to enable simple authentication of the results being claimed, and yet leave as much of the human readable stuff as you can that would be a good compromise. I'm hoping over time you can describe more of the contents of the .out file, so participants can better see the mechanics of how it all works and what results our clients are reporting back. ID: 2142 · Reply Quote

Ethan Send message Joined: 11 Feb 06 Posts: 18 Credit: 25,579 RAC: 0	Message 2143 - Posted: 16 Aug 2006, 21:45:08 UTC - in response to Message 2141. Last modified: 16 Aug 2006, 21:46:11 UTC First, a question - I must say, DeKim (david?) I really, honestly think that you are going the wrong direction here. I understand that you want to change the existing credit system, and because of that it is safe to infer that you felt the existing system wasn't working. Why? The current system places credits in the hands of individual participants. . you essentially keep your own score. Want more credits? Just claim more credits by making your benchmarks higher than possible (some computers claim 15+ Gflops per cpu. . even at 4ghz and running two floating point calculations a cycle, that's 8 Gflops. . and cpu's don't get near theoretical). Don't get too greedy or you'll get zeroed out. It's like speeding, if you go the same speed as everyone else in the left lane, even if 10 over, you're not likely to get in trouble. . . if you're going 40 over in the median, you'll get busted. Not a very good system imo. The new system is taking the law of averages to make the playing field more fair. It's taking the average credit claim from hundreds of machines and applying the same score to each. ID: 2143 · Reply Quote
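Editor's note: the peak-throughput argument in the post above (4 GHz times two floating point operations per cycle gives 8 Gflops, so a 15+ Gflops claim cannot be real) can be written as a simple sanity check. This is a sketch of that reasoning only; the function names and the default of 2 flops per cycle come from the post's example, not from BOINC's benchmark code.

```python
def theoretical_peak_gflops(clock_ghz, flops_per_cycle):
    """Upper bound on floating point throughput for one CPU core."""
    return clock_ghz * flops_per_cycle


def claim_is_plausible(claimed_gflops, clock_ghz, flops_per_cycle=2):
    """True if the benchmark claim does not exceed the theoretical peak.

    flops_per_cycle=2 is the assumption used in the post; real CPUs vary.
    """
    return claimed_gflops <= theoretical_peak_gflops(clock_ghz, flops_per_cycle)


print(theoretical_peak_gflops(4.0, 2))   # peak for the 4 GHz example
print(claim_is_plausible(15.0, 4.0))     # the 15+ Gflops claim
print(claim_is_plausible(3.0, 4.0))      # a claim well under peak
```

A check like this only catches claims above the hard ceiling; as the post notes, real CPUs run well below theoretical peak, so a claim can pass it and still be inflated.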

feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0	Message 2144 - Posted: 16 Aug 2006, 21:48:39 UTC - in response to Message 2141. ...I understand that you want to change the existing credit system, and because of that it is safe to infer that you felt the existing system wasn't working. Why? The reasons why are numerous and sprawled throughout the Rosetta boards, including the infamous (and deleted) cheating thread. If you were not aware of it... the current BOINC implementation allows a user to basically modify a simple file with notepad and claim their machine is 10x (or 1000x) faster than it really is. That's the basic premise of the need for change. And that's why many of the BOINC projects are changing in ways appropriate for each project's work. In the end, the new system will still equate back to the FLOPS that BOINC purports to measure. But it will be much more difficult to modify your results and try to claim more credits. ID: 2144 · Reply Quote

Aaron Finney Send message Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0	Message 2145 - Posted: 16 Aug 2006, 22:08:12 UTC - in response to Message 2143. The current system places credits in the hands of individual participants. . you essentially keep your own score. Want more credits? Just claim more credits by making your benchmarks higher than possible (some computers claim 15+ Gflops per cpu. . even at 4ghz and running two floating point calculations a cycle, that's 8 Gflops. . and cpu's don't get near theoretical). Don't get too greedy or you'll get zeroed out. It's like speeding, if you go the same speed as everyone else in the left lane, even if 10 over, you're not likely to get in trouble. . . if you're going 40 over in the median, you'll get busted. Not a very good system imo. The new system is taking the law of averages to make the playing field more fair. It's taking the average credit claim from hundreds of machines and applying the same score to each. No offense, but the logic for this change astounds me. The current system itself is not flawed in its fairness to all - on the contrary, it is the most fair. The problem is that the current system places too much responsibility for one's credit in the hands of the unknown. The only fair and appropriate way to fix it is to remove the power to control one's own credit from the CURRENT system. What has been proposed here is to add back in the inaccuracies of the past at the cost of losing fairness to all. Credit manipulating should never have been allowed by the public. These values should be encrypted and no access to them should be provided to the GP. ID: 2145 · Reply Quote

Aaron Finney Send message Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0	Message 2146 - Posted: 16 Aug 2006, 22:11:46 UTC - in response to Message 2144. Last modified: 16 Aug 2006, 22:15:50 UTC ...I understand that you want to change the existing credit system, and because of that it is safe to infer that you felt the existing system wasn't working. Why? The reasons why are numerous and sprawled throughout the Rosetta boards, including the infamous (and deleted) cheating thread. I asked for an answer, not a generalization. Even if they are numerous, then there is no better place to index them than here. If you were not aware of it... the current BOINC implementation allows a user to basically modify a simple file with notepad and claim their machine is 10x (or 1000x) faster than it really is. That's the basic premise of the need for change. And that's why many of the BOINC projects are changing in ways appropriate for each project's work. Then -THAT- would be the problem to fix. As I said, encrypt these values and do not allow the GP access to the tools needed to change them. In the end, the new system will still equate back to the FLOPS that BOINC purports to measure. But it will be much more difficult to modify your results and try to claim more credits. The new system will be a system of averages. Nothing more. Averages are hardly accurate. Where are my Swiss and German friends of yesteryear? Where is Jens? Where is Riedel? The founders of BOINC would be uproarious about this change. ID: 2146 · Reply Quote

dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0	Message 2147 - Posted: 16 Aug 2006, 22:14:43 UTC - in response to Message 2141. First, a question - I must say, DeKim (david?) I really, honestly think that you are going the wrong direction here. I am just adding more info for users for now. If you want to know how much actual work you've done compared to others, look at the new info once it's up. If not, just ignore it. My goal in response to users and just plain old logic is to offer a more fair credit system. -- David K ID: 2147 · Reply Quote

Aaron Finney Send message Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0	Message 2148 - Posted: 16 Aug 2006, 22:18:01 UTC - in response to Message 2147. Last modified: 16 Aug 2006, 22:19:08 UTC First, a question - I must say, DeKim (david?) I really, honestly think that you are going the wrong direction here. I am just adding more info for users for now. If you want to know how much actual work you've done compared to others, look at the new info once it's up. If not, just ignore it. My goal in response to users and just plain old logic is to offer a more fair credit system. -- David K Fair is the wrong word, but I can see why you use it. It's much more political than saying 'The credit system will now be harder to manipulate by malicious users, at the expense of accuracy.' I understand the problem, but I think the solution is improper. ID: 2148 · Reply Quote

Ethan Send message Joined: 11 Feb 06 Posts: 18 Credit: 25,579 RAC: 0	Message 2149 - Posted: 16 Aug 2006, 22:18:22 UTC - in response to Message 2145. Last modified: 16 Aug 2006, 22:22:31 UTC No offense, but the logic for this change astounds me. The current system itself is not flawed in its fairness to all - on the contrary, it is the most fair. We astound each other :) So if a PGA golfer finished a round and reported his score as 17 (you keep your own score in golf), the rest of the players have to accept it even though obviously false? Everyone who uses optimized clients is raising their credit claims to levels not based on the properties of their hardware. It's not a fair system when you have to use modified software (or subtract 60 from your golf score) to stay competitive. There is not less accuracy with the new system, there is more. . by definition of averaging out work unit claims. It's like a quorum of 100 (or 1000, however many are used to create the average). I agree that the uber solution would be to have the Boinc folks release a scoring system that's hidden in compiled code. . but that's not feasible when the code is open source and out of Rosetta's hands. Fair is the wrong word, but I can see why you use it. It's much more political than saying 'The credit system will now be harder to manipulate by malicious users, at the expense of accuracy.' I still fail to see how the new system is less accurate. Scenario 1, everyone has optimized clients and claims whatever they feel like. Result = no accuracy whatsoever. Scenario 2, half the people use optimized clients, the other half use standard ones. Result = a more accurate scoring system and level playing field. ID: 2149 · Reply Quote
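Editor's note: the "quorum of 100" idea in the post above, averaging many hosts' claims and granting everyone the same per-model value, can be sketched as follows. The thread does not give the project's actual formula, so this is an assumption-laden illustration: a plain arithmetic mean of per-model claims, with invented numbers and invented function names.

```python
from statistics import mean


def credit_per_model(claims_per_model):
    """Average the per-model credit claims reported by many hosts.

    ASSUMPTION: a plain mean; the real system may weight or filter claims.
    """
    return mean(claims_per_model)


def granted_credit(models_completed, claims_per_model):
    """Grant the same averaged per-model value to every host."""
    return models_completed * credit_per_model(claims_per_model)


# Hypothetical pool: 99 hosts claim 2.0 credits per model, one claims 20.0.
claims = [2.0] * 99 + [20.0]

print(f"credit per model: {credit_per_model(claims):.2f}")
print(f"granted for 5 models: {granted_credit(5, claims):.2f}")
```

In this made-up pool the inflated host shifts the per-model value from 2.00 to 2.18 for everyone, rather than multiplying its own credit by ten, which is the sense in which averaging dilutes a single inflated claim. A median or trimmed mean would dilute it further; whether the project does that is not stated in the thread.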

Aaron Finney Send message Joined: 16 Feb 06 Posts: 56 Credit: 1,457 RAC: 0	Message 2150 - Posted: 16 Aug 2006, 22:21:32 UTC - in response to Message 2149. We astound each other :) So if a PGA golfer finished a round and reported his score as 17 (you keep your own score in golf), the rest of the players have to accept it even though obviously false? Everyone who uses optimized clients is raising their credit claims to levels not based on the properties of their hardware. It's not a fair system when you have to use modified software (or subtract 60 from your golf score) to stay competitive. Just because some people have found an open door allowing them to sidestep the system -DOES NOT MEAN- that the system in its design is not fair. You do not reinvent the system simply because somebody found a back door. You close the door and put a lock on it. ID: 2150 · Reply Quote

dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0	Message 2151 - Posted: 16 Aug 2006, 22:21:59 UTC - in response to Message 2148. First, a question - I must say, DeKim (david?) I really, honestly think that you are going the wrong direction here. I am just adding more info for users for now. If you want to know how much actual work you've done compared to others, look at the new info once it's up. If not, just ignore it. My goal in response to users and just plain old logic is to offer a more fair credit system. -- David K Fair is the wrong word, but I can see why you use it. It's much more political than saying 'The credit system will now be harder to manipulate by malicious users, at the expense of accuracy.' I understand the problem, but I think the solution is improper. Fair is completely the right word as I stated it as a goal as in "the goal is to be fair." ID: 2151 · Reply Quote