Message boards : Feedback : Limit Ralph WUs per day
Author | Message |
---|---|
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
I don't know how quickly the WUs are getting consumed on Ralph after they are made available. I suspect that in just an hour or two they're all gone. I would think it would be better to have more hosts test fewer WUs each, rather than some hosts getting many and other hosts getting zero. Wouldn't it make sense to set the maximum WU per day on Ralph down very low, to like 5? That would give some WUs, but only 5, and that would tend to leave some available for other hosts to test as well. I think the problem is that by leaving Ralph active, it builds up a lot of debt, so by default the client tries to grab a full cache of WUs for Ralph. So, people aren't TRYING to "hoard" WUs, but BOINC ends up doing it for them. Another reason to limit to 5 is the way you vary the Ralph WUs. BOINC might think the next batch of WUs will take the same time as the 10 model WUs, and yet the new ones might run for my 16hr preference, and thus I'd end up with too much work downloaded due to the bad estimate as well. Limit WUs per day to 5, in hopes that Ralph can keep WUs available for at least 8-12hrs before they are all assigned. If WUs are still exhausted in <4hrs, then reduce to 3 per day. |
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
I read the linked thread and, if I understand it correctly, the decision was already taken to limit maximum WU per host to something like 12 (I have 20 at the moment). Rhiju uses the term quota for that, right? I think there is an even better way to ensure minimum turnaround and most diverse testing: shorten the deadlines! I think a deadline of 2 days is ok given the frequent updates of the app. Actually the devs need the results within a day. A shorter deadline will also prevent hosts from building up big caches, and there is no need to limit the quota to such harsh figures as feet1st proposes. Also I would restrict the user option for run-time to 12 hours to ensure fast turnaround times (or deactivate it and send out WUs with a diverse set of runtimes as needed and as done for the latest 5.04 WUs). With short deadlines there is no need to limit the number of WUs server-side, since clients can't build large caches and you can cancel the obsolete WUs quickly after you release a new app. Any disadvantages to shortening the deadlines? |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
tralala wrote: "I read the linked thread and if I understand it the decision was already taken to limit maximum WU per host to something like 12 (I have 20 at the moment). Rhiju uses the term quota for that, right?" The quota is stated in WUs "per CPU". In my case, it counts my dual-core as two, and I have a 2.5 day cache, so that might explain why I got 14 WUs. tralala wrote: "shorten the deadlines! I think a deadline of 2 days is ok given the frequent updates of the app. Actually the devs need the results within a day. A shorter deadline will also prevent hosts from building up big caches and there is no need to limit the quota to such harsh figures feet1st proposes. Also I would restrict the user option for run-time to 12 hours to ensure fast turnaround times (or deactivate it and send out WUs with a diverse set of runtimes as needed and as done for the latest 5.04 WUs)." Well, I don't consider 5 per day "harsh". With 1000 active hosts, and only 3,300 WUs in progress, even now with many releases in just a short time... I was trying to be "fair", and get test WUs onto more hosts. I'm not positive. I agree shorter deadlines will mean work returned quickly, because you'll basically force EDF mode (earliest deadline first) on the clients if need be. But I don't think the client understands the deadlines in terms of scheduling what it will pull down. In other words, it may pull down more work than it can complete before the deadline, because your cache size told it to (if your cache size were greater than the 48 hr. deadline). Also, everyone seems very focused on getting fast turnaround time back to the project. This is great. I don't want to rain on the parade. But I am a computer programmer. As such, I want to see all typical user scenarios tested fully. And so the flaw in this rapid feedback approach (including Ralph making them all 10 model tests) is that it prevents fully testing more typical user scenarios, such as clients that run a 24hr crunch preference. 
Hitting EDF or running only Ralph due to large debt build up means your BOINC client won't be switching to other projects and back the way it will on Rosetta... so you've really NOT tested your checkpointing as well as if you planned on your Ralph test for each version taking a week, rather than 12hrs (not slamming the team here, I understand, we want to kill these bugs and make R@H release ASAP, and putting a new Ralph version out several times a day when you have better code available makes perfect sense. I'm just saying future 1 week tests would afford much more variety in the user base to express itself in your results. It takes a week to crunch a single 24hr WU with other projects running, and how else could you test a 4day WU?). Because of this, I recommend no more than 5% resource share to Ralph (that's 1 day in 20). It will even out the WU flow to more hosts, avoid piling up large debt, and still afford thorough testing. I also recommend that code not be released on R@H while many of the Ralph WUs remain unreported. They may be STUCK for example, thus proving more code changes are needed before rollout to R@H. And, while I'm on my soapbox, I would also recommend that Ralph WUs be included in the Rosetta science. Now obviously if a Ralph version turns up bad results, you throw them out. But if the version proves valid, and the results were useful, then we wouldn't all be so hesitant to create more Ralph WUs. Ralph WUs are GOOD. We need MORE of them. We actually need a STREAM of them to work down the debt so you aren't skewing your tests. If the Ralph results were treated the way R@H results are, then you may see points in time where the Ralph version is the same as Rosetta, and there are Ralph WUs available... and that's... O. K. It will actually give you a much more realistic client test environment. If a steady stream of Ralph WUs were implemented, then perhaps the 5% resource share recommendation could be amended. |
tralala Joined: 12 Apr 06 Posts: 52 Credit: 15,257 RAC: 0 |
I agree that as many scenarios as possible should be tested on Ralph. However I still think fast turnaround times are critical. With two week deadlines you now get 4.99-WUs reported back. They are clearly of less value at the moment than fresh 5.05-WUs. Short deadlines will not only force the client into EDF-Mode (and thus prioritize Ralph-Units) but also prevent BOINC from downloading more WUs. So with shorter deadlines you achieve the same effect as with limiting the quota: You prevent hosts building up large caches. But at the same time you avoid penalizing hosts which return a lot of errors quickly and you reduce average turnaround time. The only disadvantage is that due to EDF-Mode less switching will be done. But not all hosts will go into EDF, just those who still request large caches. The runtime can always be set up server side (as they already do now). So to summarize: Low quota: (+) no large caches, (-) penalizing error-reporters. Short deadlines: (+) no large caches, (+) faster turnaround times, (-) less switch-testing. I see a 1 point-advantage of shorter deadlines. ;-) Best would probably be a mix of short and long deadlines to guarantee the maximum variety of testing. |
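
The cache-size, quota, and deadline arguments in this thread come down to simple arithmetic, which can be sketched as below. This is a toy model only, not the BOINC client or scheduler: the function name `wus_fetched` and all of its parameters are hypothetical, and whether the real client caps its request at the deadline is exactly the point the two posters disagree on, so the deadline cap is optional here.

```python
import math


def wus_fetched(cache_days, ncpus, est_hours_per_wu,
                quota_per_cpu=None, deadline_days=None):
    """Toy estimate of how many WUs one host pulls in a single fetch.

    Hypothetical model, not BOINC's real work-fetch logic:
    - the host asks for enough work to fill its cache on every CPU,
      using the *estimated* run time per WU;
    - if deadline_days is given, the request window is capped at the
      deadline (the behaviour tralala assumes, feet1st doubts);
    - if quota_per_cpu is given, the server caps the total at
      quota_per_cpu * ncpus (the quota is stated "per CPU").
    """
    window_days = cache_days
    if deadline_days is not None:
        window_days = min(cache_days, deadline_days)

    wanted = math.ceil(window_days * 24 * ncpus / est_hours_per_wu)

    if quota_per_cpu is not None:
        wanted = min(wanted, quota_per_cpu * ncpus)
    return wanted


# Dual-core host with a 2.5 day cache, WUs estimated at 4 hours each.
no_limits = wus_fetched(2.5, 2, 4)
with_quota = wus_fetched(2.5, 2, 4, quota_per_cpu=5)
with_deadline = wus_fetched(2.5, 2, 4, deadline_days=2)

# The "bad estimate" case: WUs estimated at 1 hour that really run 16 hours.
fetched = wus_fetched(2.5, 2, 1)
hours_of_real_work_per_cpu = fetched * 16 / 2

print(no_limits, with_quota, with_deadline, fetched, hours_of_real_work_per_cpu)
```

Under these assumptions a quota of 5 per CPU cuts the fetch far more sharply than a 2 day deadline does, and a run-time estimate that is 16 times too low leaves each CPU with many times more work than the cache size intended, which is the over-fetch risk described in the first post.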
©2024 University of Washington
http://www.bakerlab.org