Message boards : Current tests : New crediting system
Author | Message |
---|---|
[AF>Le_Pommier] ninicool Send message Joined: 28 Feb 06 Posts: 4 Credit: 4,699 RAC: 0 |
hello dekim, I can confirm that the bugs with "team info" and "Resource share" are fixed. I prefer the new points system, which seems fairer than granting a flat 2 credits/model. (Sorry for my bad English) |
tralala Send message Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
So far only one result from my host: https://ralph.bakerlab.org/result.php?resultid=242756 In this case the granted credit seems a bit high. I would expect a host with my specs (Athlon 64 @ 2.4 GHz = Athlon 64 3800+) to get around 14-24 credits, but I got more than 25. That would even top Einstein, which currently grants the most credit/hour. |
Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
Looking only at the new credit, not the 2 cr/model stats. I have set my computers to run Ralph with priority so I can get more samples. With one result in on four of my five machines, they are all granted from 33% to 300% more than the claimed credit (based on stock BOINC apps/clients). I'll have more later. |
kevint Send message Joined: 24 Feb 06 Posts: 8 Credit: 1,568,696 RAC: 0 |
So far only one result from my host: Not so bad, as long as credit is granted equally across the board. Should Rosetta grant more credit per hour than other projects, it would tend to draw in more crunchers. Anything from 19-25 credits per hour per core (based on a Pentium D 920) is about right for cross-project parity. I have several Pentium D 920s, and running non-optimized I get an average of 23 credits per hour per core on SETI, QMC, and SIMAP. Currently Rosetta is a bit higher than that, averaging about 28; E@H is around 30, but I have not crunched that for a couple of weeks to test. There is currently talk at E@H of project-optimized apps that would increase credit per hour by about 20%-30% (faster WUs, more credit per hour). And SETI 5.17 is supposed to incorporate a higher multiplier to match E@H, and further optimization of SETI would allow even greater credit per hour. I know that in preliminary testing of highly optimized SETI 5.15 apps I was seeing 1700-1800 credits a day on a Pentium D 930. On Rosetta that same machine gets around 1200-1300. My fear is that should Rosetta not at least be equal to these other projects, there will be an exodus to those higher-yielding projects and the Rosetta project as a whole would suffer as "credit whores" gravitate to the projects that yield the most bang for the buck. |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
The claimed credit is based on the standard BOINC method, so for this example the credit/model method actually gave a lower amount of credit, 35 vs 25. It would be helpful if users indicated whether they are using optimized clients. |
Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
The result tralala called out (https://ralph.bakerlab.org/result.php?resultid=242756) reports BOINC 5.5.0 for Windows, and since BOINC never released a 5.5.0 for Windows, it must be an optimized/third-party client. |
Trog Dog Send message Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
Why keep the old system in place? G'day feet1st. You say you presume that the old system will eventually be abandoned, but so far there is nothing from the project devs to confirm this. Maybe I'm reading things wrong, but the official message as I read it is that the old system will be kept in place, both will be reported, and the old system will continue to be exported. |
Trog Dog Send message Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
G'day Kevint. The new optimised Einstein apps are in beta testing at the moment and lead to significant speedups; once released as the official app, the credits will be adjusted to bring them back into line with the other projects. |
Ringold Send message Joined: 13 Aug 06 Posts: 2 Credit: 26,104 RAC: 0 |
That's good to hear; otherwise it'd be a silly arms race between scientists who should be more mature than that. Anyway, this whole issue is astounding. I think the new method is the closest R@H can get to Folding@Home's method of granting credit, and F@H is as fair as you can get. What I see is a knee-jerk reaction to no longer being able to cheat and get sky-high points (tralala: anyone using an 'optimized' client is attempting to get higher points than the masses, and justifying it with lame benchmarks is merely an excuse, since everyone has lame benchmarks -- except the cheaters with optimized clients. A dictionary could tell anyone as much!). Everything else is a misunderstanding of the credit system. What else could be held against a system that promises fair credit distribution? The only fault with the plan seems to be thinking that the RALPH pool will be more 'pure' than the R@H pool of users... if tralala is an example, maybe that is not the case? If F@H's system can be used directly here somehow, I think it'd be the best possible solution instead of trusting the masses. As long as BOINC is open source or can be reverse engineered, there will be corrupted clients, and as long as there are corrupted clients, there will be people complaining (and rightfully so). And regarding what seemed to be a misconception about the 'golden machine' rule in general: why would you need 60 machines to establish a baseline? Yes, a P4 3.0 GHz will behave differently than an A64 X2 4800 or a Core 2 Duo 6700. The other two will probably get more work done in a unit of time even at the same clock speed (depending on optimizations). Therefore, they'll pound out more WUs, or at least WUs worth more points. That's the point: being credited based on scientific output, not CPU time or artificial measurements of theoretical FLOPS. Some CPUs get more work done than others, obviously! That type of system can't be tampered with in terms of rewarding points for science. 
I don't visit the main F@H forums often, but I do visit the top teams' boards (HardOCP and those Aussies), and there has virtually never been dissent over credit claims. I frequent the [H] board daily and actually can't recall it ever being an issue. There really can't be: it's set server-side based on the baseline, and that is that. If one wants to 'cheat' and get more points, one can overclock to the limits of stability, but beyond that, no real cheating. Anyway, looking forward to some RALPH WUs. Looks like this will be an important project, and someone indicated RALPH gets light WU streams. Finishing up two CPDN models on an X2 but want to help R@H in the meantime since I hadn't in a while, so this will be perfect. |
Astro Send message Joined: 16 Feb 06 Posts: 141 Credit: 32,977 RAC: 0 |
Here's the latest. Hope it's visible as it was shrunk a great deal. All results are from boinc 5.5.11 (standard). All results in green are from the 2cr/model run, results in black are from this latest run. |
NJMHoffmann Send message Joined: 17 Feb 06 Posts: 8 Credit: 1,270 RAC: 0 |
Here's the latest. Hope it's visible as it was shrunk a great deal. All results are from boinc 5.5.11 (standard). Why so many statistics on this 2 cr/model run? Ralph is testing the software; the credits are meaningless now. They tell us nothing, because Rosetta will have a different fixed credit/model for each WU type. What might be interesting is whether the times/model for a given WU type are consistent enough to move to a fixed-credit model. Norbert |
Hoelder1in Send message Joined: 17 Feb 06 Posts: 11 Credit: 46,359 RAC: 0 |
FOR TESTING: I also made the credit/model value for each work unit determined from the most recent average so the work credits should be more accurate for different work units (particularly as more results are returned) rather than the 2 credit/model value. This way, we can get an idea of how the credit granting will be for different sized work units. This is just for testing though. For R@h, we will use the average value from the Ralph runs so everyone will get the same credit/model for a given work unit rather than a value that may change a bit initially.

I clicked through a number of recent results pages, and it seems that the current credit/model averages are significantly higher than what the old crediting system would have assigned for Windows/standard-client computers. I wonder why this is the case. Are you perhaps using the mean instead of the median to do the credit/model averaging, and are thus affected by outliers on the high side? I really think you should use the median, which will make your averages largely independent of any low and high outliers and automatically 'select' the (Windows/standard client) majority population as the average. Another option would be a weighted mean (weighting factors 1/value, mean = n/Sum_1..n(1/v_i)), which de-emphasizes the high values. Here is an example to demonstrate the effect of the different averaging methods: for the 10 values 2 3 5 5 5 5 5 10 15 20, the mean is 7.5, the median is exactly 5, and the weighted average would be 4.88. Would be interesting to hear your thinking on that. |
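Hoelder1in's three averaging methods are easy to check directly; a minimal Python sketch using his ten example values (his "weighted mean" with 1/value weights is the n/Sum(1/v_i) form, i.e. the harmonic mean):

```python
from statistics import mean, median

# Hoelder1in's ten example credit/model values
values = [2, 3, 5, 5, 5, 5, 5, 10, 15, 20]

arithmetic = mean(values)                            # pulled up by high outliers
robust = median(values)                              # largely ignores outliers
weighted = len(values) / sum(1 / v for v in values)  # n/Sum(1/v_i), de-emphasizes high values

print(arithmetic)          # 7.5
print(robust)              # 5.0
print(round(weighted, 2))  # 4.88
```

This reproduces the figures in the post: the mean (7.5) is dragged well above the majority value of 5 by the few high outliers, while the median and weighted mean stay near it.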
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
G'day feet1st You are correct, I'm reading into their statement that they are "changing the credit system". From that I infer that eventually the new system, whether in its initial form or after some revisions, will be used for credit reporting, and that the dual credit system is a temporary measure while we all become familiar with the new system. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Hoelder1in: 1) Calibration: Feet1st, I don't think what you say under 1) and 2) is correct... I had a fairly lengthy reply all written up when my satellite internet connection dropped :( Basically, I agree that all of #1 I had described is speculation on my part... suffice it to say there is a black box #1, which is some means of calibration (i.e. defining how large "bushels" are), and once that is in place, the process described in #2 and #3 can proceed. I see your point about using the BOINC-reported credits and the averages David Kim described... I'd still like to see a detailed explanation of the plan, and how a slow and a fast machine on Ralph will affect the credit award per model that is determined. The apples-to-apples comparison is based on personal experience :) My first gainful employment, at age 12 I believe, was as an apple picker. Some days you'd be assigned a row of trees which grow larger apples (i.e. fewer hand motions to fill a bushel), and other days trees that grow smaller apples (if your hands are large enough, sometimes you can grab more than one apple per grab). For some people the small apples take longer to fill a bushel; others could fill faster. Depends on your hands, experience, how far you can reach from the ladder without climbing down and moving it, etc. ...but the farmer knew when he was assigning you tougher trees, and made a point of rotating around so everyone had to share the task. |
dekim Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 20 Jan 06 Posts: 250 Credit: 543,579 RAC: 0 |
FOR TESTING: I also made the credit/model value for each work unit determined from the most recent average so the work credits should be more accurate for different work units (particularly as more results are returned) rather than the 2 credit/model value. This way, we can get an idea of how the credit granting will be for different sized work units. This is just for testing though. For R@h, we will use the average value from the Ralph runs so everyone will get the same credit/model for a given work unit rather than a value that may change a bit initially. I clicked through a number of recent results pages and it seems that the current credit/model averages are significantly higher than what the old crediting system would have assigned for Windows/standard client computers. I wonder why this is the case. Are you perhaps using the mean instead of the median to do the credits/model averaging and are thus affected by outliers on the high side? I really think you should use the median which will make your averages largely independent of any low and high outliers and automatically 'select' the (Windows/standard client) majority population as averages. Another option would be a weighted mean (weighting factors 1/value, mean = n/Sum_1..n(1/v_i)) which de-emphasizes the high values. Here is an example to demonstrate the effect of the different averaging methods: For the 10 values 2 3 5 5 5 5 5 10 15 20 the mean is 7.5, the median is exactly 5 and the weighted average would be 4.88. Would be interesting to hear your thinking on that.

Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. 
I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about? |
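The batch-quotient scheme dekim describes can be sketched in a few lines. This is a hypothetical outline, not RALPH's actual code; the function and field names are made up, and only the arithmetic (credit/model = total claimed credit / total models per batch) comes from the post:

```python
# Hypothetical sketch of the per-batch quotient scheme; names are invented.

def update_batch(batch, claimed_credit, models):
    """Fold one returned result into the batch's running totals
    (in RALPH these totals live in a project-specific table)."""
    batch["claimed_total"] += claimed_credit
    batch["model_total"] += models

def credit_per_model(batch):
    """credit/model = total claimed credit / total models for the batch."""
    return batch["claimed_total"] / batch["model_total"]

def granted_credit(batch, models):
    """A result that produced `models` decoys is granted models * credit/model."""
    return credit_per_model(batch) * models

# Illustrative numbers only:
batch = {"claimed_total": 0.0, "model_total": 0}
update_batch(batch, 35.0, 14)
update_batch(batch, 20.0, 8)
print(credit_per_model(batch))    # 2.5
print(granted_credit(batch, 10))  # 25.0
```

Because only two running totals are kept per batch, no per-result query is needed, which is exactly why a median (requiring all individual values) would be more expensive here.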
Hoelder1in Send message Joined: 17 Feb 06 Posts: 11 Credit: 46,359 RAC: 0 |
Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about?

OK, I understand; if you only have these two numbers available, that's of course the only thing you can do. I simply looked at the granted-to-claimed credit ratios on some of the recent results pages that looked like they were using the standard client. Here are a few examples: 1, 2, 3 (first six). |
dcdc Send message Joined: 15 Aug 06 Posts: 27 Credit: 90,652 RAC: 0 |
Hi everyone - I've just registered for Ralph; I've not really had any problems with Rosetta in the past, so I've never connected any computers up to it. I could drop a 12 hr/day P3 1 GHz on it in a few weeks if that's any use? I read the entire thread last night(!) and I think the new credit system is definitely along the right lines. I'm not sure about using Ralph to calculate a WU mean score, though, if the Ralph app differs from the Rosetta one (doesn't it produce debug info etc.?). What about when you release new versions onto Ralph that take more or less time than the current Rosetta release? It might be wise to hand-pick some known dependable machines for this task, whether on Ralph or Rosetta, and use those to calibrate the credits per decoy, rather than using the whole of Ralph as a test bed. They would just have to have a consistent hardware config (and preferably be on 24/7 for a fast turn-around time), and you'd need to monitor/be informed if this changed. Would that take a lot of work? I guess you'd need a table of 'golden machines' that would be sent the jobs first, so the credit per decoy could be calculated before general release. Of course the 'golden' machines would be given whatever credit they request, so they'd have to be from trusted sources, but there are plenty of those available. I don't mind submitting a machine that will have a constant hardware config - I'm going to be putting a backup server in the loft at some point, so it'd be fine to run that.
I've requested this over on the Rosetta forums, as I'd like to have a look at this too (although this is probably a much better place to ask!). It seems to me it would be a fairly simple task to backdate the new credit model fairly accurately for all previous jobs: take computers known to have had the same hardware/software config from the start, calculate a credits/hour figure for each computer, and then use the number of decoys and time taken to calculate the number of credits due for each WU type. I know there is a risk that some people will use this as fuel for more flaming, but I think we'll be able to get a good idea of whether the credits can be backdated reasonably accurately from this. Could a table showing the WU name, number of decoys produced, time taken and the benchmark score be released? I know it's gonna be a big file! Danny |
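dcdc's backdating idea reduces to two small calculations. The sketch below is a hypothetical outline under his assumptions (trusted machines with a stable config), not project code, and the numbers are made up:

```python
# Hypothetical sketch of backdating credit per decoy from trusted machines.

def credits_per_hour(total_credit, total_cpu_hours):
    """Per-machine rate derived from a trusted host's credit history."""
    return total_credit / total_cpu_hours

def credit_per_decoy(rate, cpu_hours, decoys):
    """Estimate the per-decoy credit for one WU type on that machine."""
    return rate * cpu_hours / decoys

# Made-up example: a machine that earned 240 credits over 24 CPU hours,
# then spent 3 CPU hours producing 6 decoys of some WU type.
rate = credits_per_hour(240.0, 24.0)   # 10.0 credits/hour
print(credit_per_decoy(rate, 3.0, 6))  # 5.0 credits per decoy
```

In practice one would average the per-decoy figure over several trusted machines per WU type, which is exactly what the requested table (WU name, decoys, time, benchmark) would make possible.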
Hoelder1in Send message Joined: 17 Feb 06 Posts: 11 Credit: 46,359 RAC: 0 |
So you already did apply a correction factor, didn't you? :-) I just randomly picked 13 standard-client results that were sent out after midnight (UTC) and calculated the granted-to-claimed credit ratios. I get a mean of 1.16 with standard deviation 0.35 and an error of the mean of 0.10. So within the sampling error of this small sample, your correction factor seems to be dead on. Presumably you calculated it from a larger sample, so it is probably even more accurate. I guess the correction factor would have to be reviewed occasionally to correct for changes in the composition of the Ralph participants (Linux/Windows, standard/optimized client).

Currently, I am just using the quotient of the claimed_credit and model totals for each work unit batch, which get saved in a project-specific table that we created. I wanted to avoid having to query the result table, which would be necessary to get the median. I'd also have to add a project-specific column to the result table to hold the model count, and I'm trying to avoid modifying the BOINC tables. I could use a correction factor if the discrepancies are significant enough. Can you point me to the results that you are talking about? OK, I understand; if you only have these two numbers available, that's of course the only thing you can do. I simply looked at the granted-to-claimed credit ratios on some of the recent results pages that looked like they were using the standard client. Here are a few examples: 1, 2, 3 (first six). |
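Hoelder1in's three statistics (mean, standard deviation, and error of the mean) can be reproduced for any set of granted/claimed ratios. The ratios below are placeholders, since his actual 13 values are not listed in the post; only the formulas are taken from it:

```python
from math import sqrt
from statistics import mean, stdev

# Placeholder granted/claimed ratios (the actual 13 values are not given).
ratios = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.5]

m = mean(ratios)
s = stdev(ratios)            # sample standard deviation (n-1 denominator)
sem = s / sqrt(len(ratios))  # error of the mean; for the post: 0.35/sqrt(13) ~ 0.10
```

With his reported s = 0.35 and n = 13, the error of the mean is 0.35/sqrt(13) ≈ 0.097, matching the 0.10 he quotes.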
Trog Dog Send message Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
G'day feet1st Cheers, feet1st. It would be nice to get some official confirmation that this is the case. |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
It would be nice to get some official confirmation that this is the case. Agreed. David Kim mentioned preparing a more detailed description, and from the discussion here, he's got a pretty good idea what people on Rosetta will need clearly described in order to understand the new system. So, I'll wait patiently. |
©2024 University of Washington
http://www.bakerlab.org