New crediting system

Message boards : Current tests : New crediting system

Profile [AF>Le_Pommier] ninicool
Avatar

Send message
Joined: 28 Feb 06
Posts: 4
Credit: 4,699
RAC: 0
Message 2020 - Posted: 12 Aug 2006, 20:15:50 UTC

hello dekim,
I can confirm that the bugs with the "team info" and "Resource share" pages are fixed.
I prefer the new points system, which seems fairer than granting a flat 2 credits per model.
(Sorry for my bad English)
ID: 2020 · Report as offensive    Reply Quote
tralala

Send message
Joined: 12 Apr 06
Posts: 52
Credit: 15,257
RAC: 0
Message 2022 - Posted: 12 Aug 2006, 20:29:37 UTC
Last modified: 12 Aug 2006, 20:30:51 UTC

So far only one result from my host:

https://ralph.bakerlab.org/result.php?resultid=242756

In this case the granted credit seems a bit high. I would expect a host with my specs (Athlon 64 @ 2.4 GHz = Athlon 64 3800+) to get around 14-24 credits, but I got more than 25. That would even top Einstein, which currently grants the most credit per hour.
ID: 2022 · Report as offensive    Reply Quote
Profile Astro

Send message
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 2024 - Posted: 12 Aug 2006, 21:15:03 UTC
Last modified: 12 Aug 2006, 21:20:15 UTC

Looking only at the new credit, not the 2 cr/model stats.
I have set my puters to run Ralph with priority so I can get more samples. With one result in on four of my five puters, they were all granted from 33% to 300% more than the claimed credit (based on the stock BOINC app/client). I'll have more later.
ID: 2024 · Report as offensive    Reply Quote
kevint

Send message
Joined: 24 Feb 06
Posts: 8
Credit: 1,568,696
RAC: 0
Message 2025 - Posted: 12 Aug 2006, 21:27:50 UTC - in response to Message 2022.  

So far only one result from my host:

https://ralph.bakerlab.org/result.php?resultid=242756

In this case the granted credit seems a bit high. I would expect a host with my specs (Athlon 64 @ 2.4 GHz = Athlon 64 3800+) to get around 14-24 credits, but I got more than 25. That would even top Einstein, which currently grants the most credit per hour.



Not so bad - as long as credit is granted equally across the board. Should Rosetta grant more credit per hour than other projects, it would tend to draw in more crunchers.
Anything from 19-25 credits per hour per core (based on a Pentium D 920) is about right for cross-project parity.
I have several Pentium D 920s, and running non-optimized I get an average of 23 credits per hour per core on SETI, QMC and SIMAP. Currently Rosetta is a bit higher than that, averaging about 28; E@H is around 30, but I have not crunched that for a couple of weeks to test.

Currently there is talk at E@H of project-optimized apps that would increase the current credit per hour by about 20%-30% (faster WUs, more credit per hour). And SETI 5.17 is supposed to incorporate a higher multiplier to match E@H, and further optimization of SETI would allow for even greater credit per hour.
I know that in preliminary testing of highly optimized SETI 5.15 apps I was seeing 1700-1800 credits a day on a Pentium D 930. On Rosetta that same machine gets around 1200-1300.

My fear is that should Rosetta not at least be equal to these other projects, there will be an exodus to those higher-yielding projects, and the Rosetta project as a whole will suffer as "credit whores" gravitate to the projects that yield the most bang for the buck.

ID: 2025 · Report as offensive    Reply Quote
Profile dekim
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 2028 - Posted: 13 Aug 2006, 1:21:49 UTC

The claimed credit is based on the standard BOINC method, so for this example the credit/model method actually gave a lower amount of credit (25 vs. the claimed 35). It would be helpful if users indicated whether they are using optimized clients.
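
(For illustration only: a rough sketch of how a benchmark-based claimed credit is computed, i.e. CPU time multiplied by the average of the host's Whetstone and Dhrystone benchmarks and a fixed scale. The scale constant and the host figures below are assumptions for the sketch, not values taken from this result or from the BOINC source.)

```python
# Rough sketch of a benchmark-based ("claimed") credit calculation:
# CPU time in days, times the average of the Whetstone and Dhrystone
# benchmarks in G-ops/sec, times a fixed scale. The scale below is an
# assumed value for illustration, not a constant quoted from BOINC.
CREDITS_PER_GOPS_DAY = 100.0   # assumption

def claimed_credit(cpu_seconds, whetstone_ops_per_sec, dhrystone_ops_per_sec):
    avg_gops = (whetstone_ops_per_sec + dhrystone_ops_per_sec) / 2.0 / 1e9
    return cpu_seconds / 86400.0 * avg_gops * CREDITS_PER_GOPS_DAY

# Illustrative host: ~2.2 GFLOPS Whetstone, ~4.0 GIPS Dhrystone, 3 h of CPU time
print(round(claimed_credit(3 * 3600, 2.2e9, 4.0e9), 1))   # ~38.8 with the assumed scale
```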
ID: 2028 · Report as offensive    Reply Quote
Profile Astro

Send message
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 2029 - Posted: 13 Aug 2006, 1:35:56 UTC
Last modified: 13 Aug 2006, 1:36:43 UTC

The result tralala listed, https://ralph.bakerlab.org/result.php?resultid=242756, reports BOINC 5.5.0 for Windows. Since BOINC never released a 5.5.0 for Windows, it must be an optimized/third-party client.
ID: 2029 · Report as offensive    Reply Quote
Profile Trog Dog
Avatar

Send message
Joined: 8 Aug 06
Posts: 38
Credit: 41,996
RAC: 0
Message 2032 - Posted: 13 Aug 2006, 2:36:36 UTC - in response to Message 1991.  

Why keep the old system in place?

In part so everyone can see they're still being credited fairly. Once folks get used to the new system and it is calibrated to BOINC flops and credit values, I presume the "claimed credits" information would eventually be phased out.



G'day feet1st

You say you presume that the old system will eventually be abandoned, but so far there is nothing from the project devs to confirm this.

Maybe I'm reading things wrong, but as I read it the official message is that the old system will be kept in place, both will be reported, and the old system will continue to be exported.
ID: 2032 · Report as offensive    Reply Quote
Profile Trog Dog
Avatar

Send message
Joined: 8 Aug 06
Posts: 38
Credit: 41,996
RAC: 0
Message 2033 - Posted: 13 Aug 2006, 2:47:20 UTC - in response to Message 2025.  





Currently there is talk at E@H of project-optimized apps that would increase the current credit per hour by about 20%-30% (faster WUs, more credit per hour). And SETI 5.17 is supposed to incorporate a higher multiplier to match E@H, and further optimization of SETI would allow for even greater credit per hour.



G'day Kevint

The new optimised Einstein apps are in beta testing at the moment and lead to significant speedups; once they are released as the official app, the credits will be adjusted.

The adjustment will bring Einstein back into line with the other projects.
ID: 2033 · Report as offensive    Reply Quote
Ringold

Send message
Joined: 13 Aug 06
Posts: 2
Credit: 26,104
RAC: 0
Message 2034 - Posted: 13 Aug 2006, 4:46:29 UTC - in response to Message 2033.  
Last modified: 13 Aug 2006, 4:59:47 UTC

That's good to hear; otherwise it would be a silly arms race between scientists who should be more mature than that.

Anyway, this whole issue is astounding. I think the new method is about the closest R@H can get to Folding@Home's method of granting credit, and F@H is as fair as you can get.

What I see is a knee-jerk reaction to no longer being able to cheat (tralala: anyone using an 'optimized' client is trying to get higher points than the masses, and pointing at lame benchmarks is merely an excuse, since everyone has lame benchmarks -- except the cheaters with optimized clients; a dictionary could tell anyone as much!) and rack up sky-high points. Everything else is a misunderstanding of the credit system. What else could be held against a system that promises fair credit distribution?

The only fault with the plan seems to be the assumption that the RALPH pool will be more 'pure' than the R@H pool of users; if tralala is an example, maybe that's not the case? If F@H's system could somehow be used directly here, I think that would be the best possible solution instead of trusting the masses. As long as BOINC is open source or can be reverse engineered there will be corrupted clients, and as long as there are corrupted clients there will be people complaining (and rightfully so).

And regarding what seemed to be a misconception about the 'golden machine' rule in general: why would you need 60 machines to establish a baseline? Yes, a P4 3.0 GHz will behave differently from an A64 X2 4800 or a Core 2 Duo 6700. The latter two will probably get more work done in a given amount of time even at the same clock speed (depending on optimizations), so they'll turn out more WUs, or at least WUs worth more points. That's the point: being credited on scientific output, not on CPU time or, preferably, even on artificial measurements of theoretical FLOPS. Some CPUs get more work done than others, obviously! That type of system can't be tampered with in terms of rewarding points for science. I don't visit the main F@H forums often, but I do visit those of the top teams (HardOCP and those Aussies), and there has virtually never been dissent over credit claims. I frequent the [H] board daily and actually can't recall it ever being an issue. There really can't be: the credit is set server-side from the baseline, and that is that. If one wants to 'cheat' and get more points, one can overclock to the limits of stability, but beyond that, no real cheating.

Anyway, I'm looking forward to some RALPH WUs. This looks like it will be an important project, and someone indicated RALPH gets light WU streams. I'm finishing up 2 CPDN models on an X2 but want to help R@H in the meantime, since I hadn't in a while, so this will be perfect.
ID: 2034 · Report as offensive    Reply Quote
Profile Astro

Send message
Joined: 16 Feb 06
Posts: 141
Credit: 32,977
RAC: 0
Message 2036 - Posted: 13 Aug 2006, 9:49:35 UTC

Here's the latest. Hope it's visible, as it was shrunk a great deal. All results are from BOINC 5.5.11 (standard). All results in green are from the 2 cr/model run; results in black are from this latest run.

ID: 2036 · Report as offensive    Reply Quote
NJMHoffmann

Send message
Joined: 17 Feb 06
Posts: 8
Credit: 1,270
RAC: 0
Message 2037 - Posted: 13 Aug 2006, 12:18:49 UTC - in response to Message 2036.  
Last modified: 13 Aug 2006, 12:19:07 UTC

Here's the latest. Hope it's visible, as it was shrunk a great deal. All results are from BOINC 5.5.11 (standard).

Why so many statistics with this 2 cr/model? Ralph is testing the software; the credits are nonsense right now. They tell us nothing, because Rosetta will have different fixed credits/model for each WU type. What might be interesting is whether the times per model for a given WU type are consistent enough to move to a fixed-credit model.

Norbert
ID: 2037 · Report as offensive    Reply Quote
Hoelder1in

Send message
Joined: 17 Feb 06
Posts: 11
Credit: 46,359
RAC: 0
Message 2038 - Posted: 13 Aug 2006, 13:41:33 UTC - in response to Message 2016.  
Last modified: 13 Aug 2006, 14:07:29 UTC

FOR TESTING: I also made the credit/model value for each work unit determined from the most recent average so the work credits should be more accurate for different work units (particularly as more results are returned) rather than the 2 credit/model value. This way, we can get an idea of how the credit granting will be for different sized work units. This is just for testing though. For R@h, we will use the average value from the Ralph runs so everyone will get the same credit/model for a given work unit rather than a value that may change a bit initially.
I clicked through a number of recent results pages and it seems that the current credit/model averages are significantly higher than what the old crediting system would have assigned for Windows/standard-client computers. I wonder why this is the case. Are you perhaps using the mean instead of the median for the credits/model averaging, and are thus affected by outliers on the high side? I really think you should use the median, which makes your averages largely independent of any low and high outliers and automatically 'selects' the (Windows/standard client) majority population. Another option would be a weighted mean (weighting factors 1/value, i.e. mean = n / Sum_{i=1..n}(1/v_i)), which de-emphasizes the high values. Here is an example to demonstrate the effect of the different averaging methods: for the 10 values 2, 3, 5, 5, 5, 5, 5, 10, 15, 20 the mean is 7.5, the median is exactly 5, and the weighted average would be 4.88. It would be interesting to hear your thinking on that.
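
(For concreteness, a small Python sketch, illustrative only, of the three averaging methods applied to those ten values; the '1/value-weighted mean' is just the harmonic mean.)

```python
from statistics import mean, median

values = [2, 3, 5, 5, 5, 5, 5, 10, 15, 20]

arithmetic = mean(values)                             # 7.5  -- pulled up by the high outliers
med = median(values)                                  # 5    -- ignores outliers on either side
weighted = len(values) / sum(1 / v for v in values)   # ~4.88 -- the 1/value-weighted (harmonic) mean

print(arithmetic, med, weighted)
```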
ID: 2038 · Report as offensive    Reply Quote
Profile feet1st

Send message
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2040 - Posted: 14 Aug 2006, 17:12:32 UTC - in response to Message 2032.  

G'day feet1st

You say you presume that the old system will eventually be abandoned, but so far there is nothing from the project devs to confirm this.

Maybe I'm reading things wrong, but as I read it the official message is that the old system will be kept in place, both will be reported, and the old system will continue to be exported.


You are correct; I'm reading into their statement that they are "changing the credit system". From that I infer that eventually the new system, whether in its initial form or after some revisions, will be used for credit reporting, and that the dual credit system is a temporary measure while we all become familiar with the new system.
ID: 2040 · Report as offensive    Reply Quote
Profile feet1st

Send message
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2041 - Posted: 14 Aug 2006, 17:26:15 UTC - in response to Message 2005.  
Last modified: 14 Aug 2006, 17:29:39 UTC

Hoelder1in
1) Calibration:

...

2) Release a new WU to study, on Ralph:

...

Feet1st, I don't think what you say under 1) and 2) is correct...

I absolutely love the 'apples to apples comparison'. ;-)


I had a fairly lengthy reply all written up when my satellite internet connection dropped :(

Basically, I agree that all of #1 as I described it is speculation on my part... suffice it to say there is a black box #1, which is some means of calibration (i.e. defining how large the "bushels" are), and once that is in place, the process described in #2 and #3 can proceed. I see your point about using the BOINC-reported credits and the averages David Kim described... I'd still like to see a detailed explanation of the plan, and of how a slow and a fast machine on Ralph will affect the credit awarded per model.

The apples-to-apples comparison is based on personal experience :) My first gainful employment, at age 12 I believe, was as an apple picker. Some days you'd be assigned a row of trees that grow larger apples (i.e. fewer hand motions to fill a bushel), and other days trees that grow smaller apples (if your hands are large enough, sometimes you can grab more than one apple per grab). For some people the small apples take longer to fill a bushel; others can fill it faster. It depends on your hands, your experience, how far you can reach from the ladder without climbing down and moving it, etc. ...but the farmer knew when he was assigning you tougher trees, and made a point of rotating around so everyone had to share the task.
ID: 2041 · Report as offensive    Reply Quote
Profile dekim
Volunteer moderator
Project administrator
Project developer
Project scientist

Send message
Joined: 20 Jan 06
Posts: 250
Credit: 543,579
RAC: 0
Message 2042 - Posted: 14 Aug 2006, 18:25:33 UTC - in response to Message 2038.  

FOR TESTING: I also made the credit/model value for each work unit determined from the most recent average so the work credits should be more accurate for different work units (particularly as more results are returned) rather than the 2 credit/model value. This way, we can get an idea of how the credit granting will be for different sized work units. This is just for testing though. For R@h, we will use the average value from the Ralph runs so everyone will get the same credit/model for a given work unit rather than a value that may change a bit initially.
I clicked through a number of recent results pages and it seems that the current credit/model averages are significantly higher than what the old crediting system would have assigned for Windows/standard-client computers. I wonder why this is the case. Are you perhaps using the mean instead of the median for the credits/model averaging, and are thus affected by outliers on the high side? I really think you should use the median, which makes your averages largely independent of any low and high outliers and automatically 'selects' the (Windows/standard client) majority population. Another option would be a weighted mean (weighting factors 1/value, i.e. mean = n / Sum_{i=1..n}(1/v_i)), which de-emphasizes the high values. Here is an example to demonstrate the effect of the different averaging methods: for the 10 values 2, 3, 5, 5, 5, 5, 5, 10, 15, 20 the mean is 7.5, the median is exactly 5, and the weighted average would be 4.88. It would be interesting to hear your thinking on that.


Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about?
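
(As a rough sketch of the bookkeeping described above: the credit/model value is the ratio of two per-batch running totals, optionally scaled by a correction factor. The class, field and batch names below are made up for illustration, not the actual Ralph schema.)

```python
# Hypothetical sketch of the per-batch bookkeeping: accumulate claimed credit
# and model counts per work-unit batch; credit/model is their quotient,
# optionally scaled by a correction factor. Names are illustrative only.
class BatchCreditTracker:
    def __init__(self, correction_factor=1.0):
        self.totals = {}                  # batch name -> [claimed credit sum, model count sum]
        self.correction_factor = correction_factor

    def add_result(self, batch, claimed_credit, models):
        credit_sum, model_sum = self.totals.get(batch, [0.0, 0])
        self.totals[batch] = [credit_sum + claimed_credit, model_sum + models]

    def credit_per_model(self, batch):
        credit_sum, model_sum = self.totals[batch]
        return self.correction_factor * credit_sum / model_sum if model_sum else 0.0

tracker = BatchCreditTracker()
tracker.add_result("example_batch", claimed_credit=35.2, models=18)
tracker.add_result("example_batch", claimed_credit=22.9, models=12)
print(round(tracker.credit_per_model("example_batch"), 2))   # ~1.94 credits/model
```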

ID: 2042 · Report as offensive    Reply Quote
Hoelder1in

Send message
Joined: 17 Feb 06
Posts: 11
Credit: 46,359
RAC: 0
Message 2043 - Posted: 14 Aug 2006, 18:51:50 UTC - in response to Message 2042.  
Last modified: 14 Aug 2006, 19:16:40 UTC

Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about?
OK, I understand; if you only have those two numbers available, that's of course the only thing you can do. I simply looked at the granted-to-claimed credit ratios on some of the recent results pages that looked like they were using the standard client. Here are a few examples: 1, 2, 3 (first six).

ID: 2043 · Report as offensive    Reply Quote
dcdc

Send message
Joined: 15 Aug 06
Posts: 26
Credit: 89,834
RAC: 0
Message 2044 - Posted: 15 Aug 2006, 8:34:02 UTC - in response to Message 2007.  
Last modified: 15 Aug 2006, 8:36:32 UTC

Hi everyone - I've just registered for Ralph, although I've not really had any problems with Rosetta in the past, so I've never connected any computers up to it. I could drop a 12 hr/day P3 1 GHz on it in a few weeks if that's any use?

I read the entire thread last night(!) and I think the new credit system is definitely along the right lines. I'm not sure about using Ralph to calculate a mean WU score, though, if the Ralph app differs from the Rosetta one (doesn't it produce debug info etc.?). What about when you release new versions onto Ralph that take more or less time than the current Rosetta release? It might be wise to hand-pick some known dependable machines for this task, whether on Ralph or Rosetta, and use those to calibrate the credits per decoy, rather than using the whole of Ralph as a test bed. They would just need a consistent hardware config (and preferably be on 24/7 for a fast turnaround time), and you'd need to monitor, or be informed, if this changed. Would that take a lot of work? I guess you'd need a table of 'golden machines' that would be sent the jobs first, so the credit per decoy could be calculated before general release. Of course the 'golden' machines would be granted whatever credit they request, so they'd have to be from trusted sources, but there are plenty of those available.

I don't mind submitting a machine that will have a constant hardware config - I'm going to be putting a backup server in the loft at some point, so it'd be fine to run that.


Is there any chance of calculating the historical 'work credit' based on average claimed credit per work unit? (i.e., for previous models crunched over the years).

If Rosetta publishes the user.xml stats etc., I think it should be the work credit stats that are published, but if this is to be done, it would be best to grant work credit for all decoys that the participant has processed. My guess is that Rosetta does have the data to do this (although perhaps not the time).


I've requested this over on the Rosetta forums as I'd like to have a look at this too (although this is probably a much better place to ask!).

It seems to me it would probably be a fairly simple task to backdate the new credit model fairly accurately for all previous jobs: use computers that are known to have had the same hardware/software config from the start, calculate a credits/hour figure for each computer, and then use the number of decoys and the time taken to calculate the number of credits due for each type of WU decoy.
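
(A rough sketch of that arithmetic, with invented host figures just to show the shape of the calculation: each stable host yields an estimate of credits per decoy for a WU type, and the per-type value is the average of those estimates.)

```python
# Invented figures, purely to show the shape of the backdating calculation:
# for one WU type, each stable host contributes (its credits/hour) * (hours
# spent) / (decoys produced), and the per-type value is the average of those.
def credit_per_decoy(results):
    """results: list of (credits_per_hour, cpu_hours, decoys) for one WU type."""
    estimates = [cph * hours / decoys for cph, hours, decoys in results if decoys]
    return sum(estimates) / len(estimates)

wu_type_results = [
    (22.0, 3.0, 40),   # host A: 22 cr/h, 3 h, 40 decoys
    (25.0, 2.5, 35),   # host B
    (18.0, 4.0, 48),   # host C
]
print(round(credit_per_decoy(wu_type_results), 2))   # ~1.65 credits per decoy
```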

I know there is a risk that some people will use this as fuel for more flaming, but I think we'll be able to get a good idea of whether the credits can be backdated reasonably accurately from this. Could a table showing the WU name, number of decoys produced, time taken and the benchmark score be released? I know it's gonna be a big file!

Danny
ID: 2044 · Report as offensive    Reply Quote
Hoelder1in

Send message
Joined: 17 Feb 06
Posts: 11
Credit: 46,359
RAC: 0
Message 2045 - Posted: 15 Aug 2006, 8:49:56 UTC - in response to Message 2043.  
Last modified: 15 Aug 2006, 9:08:40 UTC

Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about?
OK, I understand; if you only have those two numbers available, that's of course the only thing you can do. I simply looked at the granted-to-claimed credit ratios on some of the recent results pages that looked like they were using the standard client. Here are a few examples: 1, 2, 3 (first six).
So you did already apply a correction factor, didn't you? :-) I just randomly picked 13 standard-client results that were sent out after midnight (UTC) and calculated the granted-to-claimed credit ratios. I get a mean of 1.16 with a standard deviation of 0.35 and an error of the mean of 0.10. So, within the sampling error of this small sample, your correction factor seems to be dead on. Presumably you calculated it from a larger sample, so it is probably even more accurate. I guess the correction factor would have to be reviewed occasionally to correct for changes in the composition of the Ralph participants (Linux/Windows, standard/optimized clients).
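
(For reference, a minimal sketch of that calculation: mean, sample standard deviation, and standard error of the mean of the granted-to-claimed ratios. The ratios below are made up; the figures quoted above came from 13 real results.)

```python
from math import sqrt
from statistics import mean, stdev

# Made-up granted/claimed ratios for a handful of standard-client results.
ratios = [0.9, 1.4, 1.1, 0.8, 1.6, 1.2, 1.0, 1.3, 0.95, 1.25, 1.05, 1.5, 1.1]

m = mean(ratios)
sd = stdev(ratios)                # sample standard deviation
sem = sd / sqrt(len(ratios))      # standard error of the mean
print(round(m, 2), round(sd, 2), round(sem, 2))
```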

ID: 2045 · Report as offensive    Reply Quote
Profile Trog Dog
Avatar

Send message
Joined: 8 Aug 06
Posts: 38
Credit: 41,996
RAC: 0
Message 2046 - Posted: 15 Aug 2006, 10:20:46 UTC - in response to Message 2040.  

G'day feet1st

You say you presume that the old system will eventually be abandoned, but so far there is nothing from the project devs to confirm this.

Maybe I'm reading things wrong, but as I read it the official message is that the old system will be kept in place, both will be reported, and the old system will continue to be exported.


You are correct; I'm reading into their statement that they are "changing the credit system". From that I infer that eventually the new system, whether in its initial form or after some revisions, will be used for credit reporting, and that the dual credit system is a temporary measure while we all become familiar with the new system.


Cheers feet1st

It would be nice to get some official confirmation that this is the case.
ID: 2046 · Report as offensive    Reply Quote
Profile feet1st

Send message
Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 2047 - Posted: 15 Aug 2006, 14:35:27 UTC - in response to Message 2046.  

It would be nice to get some official confirmation that this is the case.

Agreed. David Kim mentioned preparing a more detailed description, and from the discussion here he's got a pretty good idea of what people on Rosetta will need clearly described in order to understand the new system. So I'll wait patiently.

ID: 2047 · Report as offensive    Reply Quote

