Posts by FluffyChicken

1) Message boards : RALPH@home bug list : Ralph (& Rosetta) bring down BOINC Client (Message 3311)
Posted 8 Aug 2007 by FluffyChicken
Post:
Hi, unfortunately the error code just says you aborted the work unit.
But given most people are having no problems with Rosetta, it must be your setup.
(With Ralph I would at least expect some problems, as it is a testing project, though since no one else is reporting any I doubt that is the cause.)

Given it's across multiple computers, what is common between them? Some AV software, firewall software, etc.?
Or, as you've mentioned, some project incompatibility, since QMC is running and that's also in testing (RC at the moment?).
You would need to see whether Rosetta as the only running project (suspend the rest) causes the problem, then whether QMC as the only running project causes the problem,
then both together.
You could probably make the problem show up sooner by altering the 'switch between projects' setting to a smaller value, say 15 mins.

P.S. What are the Microsoft error messages saying caused the problem? You may be able to find out more easily in the Event Log.
2) Message boards : Feedback : Suggestion: More Users per work unit (Message 3074)
Posted 5 May 2007 by FluffyChicken
Post:
I'm not sure if the project is set up to fully mimic the Rosetta environment, but why not release the RALPH work units to many users? It seems the people here willing to beta test are always begging for work. I joined the day the 64-bit client was released and got no work.

Why not give work units to multiple users to collaborate and verify? Giving 1 WU to 5 different users quintuples the number of testers.



I think they usually put a few out to see if there are any bugs.
But it depends on the client and what they are testing.
Send a few tasks out, see what happens, then send more out if OK and/or fix the bug, test a few again, etc.


Also they used to test new job categories/targets out before putting them on to Rosetta; not sure if they still do.

There is no point in sending out work they already know is not going to be useful anymore. Better that members do some work elsewhere in the meantime.
3) Message boards : RALPH@home bug list : 64bit app just added. (Message 3041)
Posted 2 May 2007 by FluffyChicken
Post:
OK, I just checked the download page and it's x86_64 :)

both linux and windows I see.
4) Message boards : RALPH@home bug list : 64bit app just added. (Message 3040)
Posted 2 May 2007 by FluffyChicken
Post:
I don't see a post about this, but how are you implementing it?

I know it's a copy of the 32-bit app, but what identifier is it using? The official one (or soon to be) is x86_64; I know some projects do not use that and will need to change with the newer server code.
5) Message boards : RALPH@home bug list : Bug Reports for Ralph Server Update to BOINC version 5.9.2 (Message 2776)
Posted 6 Feb 2007 by FluffyChicken
Post:
Would this be an opportune time to remove the Q&A message boards and add a new board or two to the message boards?



oh, please, please, please do ....
It has to be really bad publicity when someone posts in the Q&A and only by chance someone (like me) happens to feel like checking the Q&A and is able to answer the question and/or point them to the main message board to ask it there, where it will get answered....
6) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2657)
Posted 22 Dec 2006 by FluffyChicken
Post:

- The more common usage of dual core processors today
- The increased level of complexity in the graphics (the sidechains); this part of the 'theory of graphics crashes' coincides quite happily with the release of the docking program and the increased graphics display.
- More active people reporting things on the forum (due to more R@H members overall)


K, but is it only dual core processors that are having this problem? I have noticed a low rate of work unit failure on another computer, which is a single cored 64-bit AMD processor, but I'm never watching closely enough to see if it is this same graphics failure, or some other problem (though I'll admit the problems did seem to stop with 5.43, but I also don't have any work unit history for this computer, it has been concentrating almost exclusively on CPDN for the last week or so, I'll go see if I can crash work units on it later).

Also, previous versions of rosetta were stable on my Athlon X2, or at least the crash rate was low enough that I didn't notice it. I believe the last stable release was 5.37 or something like that. If memory serves you could rotate and zoom on a molecule in 5.37, with no problems. Essentially, you've now reduced the level of the graphics complexity to below that of 5.37, and my computer is still crashing almost all of its work units. Right now I have one which hasn't done anything for about 40 minutes, but the time counter continues to increment, it is like the science code has stalled. I'll leave it to see if the watchdog kicks in. The important point is that I don't think (I'm not certain on this point) the graphics had been displayed at all. I'd noticed that the time remaining estimate was going up, not down, and decided to check on it.

My point is that I didn't start noticing work unit failures until release 5.41, and these failures occur on computers other than my dual cored X2.

In case anyone is curious I have stress tested my computer using Prime95, both cores (separately) with no problems. I've also tested my RAM using Memtestx86, the most recent version, again, no problems.



See option number 2,
It started (or was noticed a lot more) when the docking code came into it.

The part about dual/HT is that it is just more susceptible to the desynchronisation happening. I have also had a rare few fail on my P-M and Athlon64, and without graphics open, but it is nothing like what HT/dual people that play with graphics are reporting.

If they were really smart about it (they being Rosetta@home) they would put a tick box in the preferences to say 'I do not want graphics', and then they could send the person a version with all the graphics ripped out of it. This often speeds up processing a touch (it does slightly at SETI) and decreases the size of the program along with the running memory requirements.

Personally I would love that option.
7) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2654)
Posted 21 Dec 2006 by FluffyChicken
Post:
Another thought... One of the posts by Chu was mentioning that there is no locking mechanism in place to prevent the science thread and the graphics thread from trying to access the same memory at the same time, which can cause a problem if it occurs (or at least that is what I understand the post to mean).
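
To make the locking idea concrete, here is a minimal sketch of two threads sharing one structure through a mutex. It is not the Rosetta code: the class, the function names and the three-value "coordinates" are all made up for illustration, and Python's `threading.Lock` simply stands in for whatever primitive the real application would use.

```python
import threading


class SharedModel:
    """Hypothetical shared state: written by 'science', read by 'graphics'."""

    def __init__(self):
        self._lock = threading.Lock()
        self._coords = [0.0, 0.0, 0.0]

    def update(self, step):
        # The writer changes the values one at a time, so without the lock
        # a reader could see a half-updated set of coordinates.
        with self._lock:
            for i in range(len(self._coords)):
                self._coords[i] = float(step)

    def snapshot(self):
        # The reader copies the data while holding the same lock.
        with self._lock:
            return list(self._coords)


def science_loop(model, steps):
    for step in range(1, steps + 1):
        model.update(step)


def graphics_loop(model, frames, seen):
    for _ in range(frames):
        seen.append(model.snapshot())


model = SharedModel()
seen = []
science = threading.Thread(target=science_loop, args=(model, 1000))
graphics = threading.Thread(target=graphics_loop, args=(model, 1000, seen))
science.start()
graphics.start()
science.join()
graphics.join()

# Every frame the "graphics" thread captured is internally consistent.
print(all(len(set(frame)) == 1 for frame in seen))
```

The design choice is that the reader takes a copy under the lock and draws from the copy, so the lock is only held briefly and the writer is not blocked while a frame is rendered.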

If it is the case that the current problem is caused by the graphics and science threads conflicting in this manner, wouldn't we have started seeing this problem a long time ago, like when graphics were first introduced? Why has the problem only started cropping up now?

Just a thought...


- The more common usage of dual core processors today
- The increased level of complexity in the graphics (the sidechains); this part of the 'theory of graphics crashes' coincides quite happily with the release of the docking program and the increased graphics display.
- More active people reporting things on the forum (due to more R@H members overall)
8) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2651)
Posted 20 Dec 2006 by FluffyChicken
Post:
Chu,


Could you put that problem summary in the 'technical news' at the Rosetta@home site.

It would give people a definite place that explains what the problem is; it would also mean forum helpers could post a link to the news when the errors are happening.
9) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2645)
Posted 19 Dec 2006 by FluffyChicken
Post:
Andrew, it sounds like you have a task that has slipped away from BOINC's control. We see this happen once in a while on Windows. It is almost like BOINC doesn't realize it is still running. And this may be why it's running during non-scheduled hours, cuz it doesn't respond when BOINC says "whoa!".

But you say it has been running for 157hrs. And THAT sounds like a task that has slipped away from the Rosetta (or Ralph) watchdog. Because it appears your run time preference is only an hour. The watchdog should have kicked in some time ago.

Ralph shows you've only got one task that is not yet completed and it was sent on Thursday. And by my math, 157hrs have not elapsed since Thursday, so now I'm really confused. In fact, with a Ralph 4 day deadline, times 24hrs... the deadline is 96hrs. I did see a case on Windows where a task managed to tally up more than one second per second of wall-clock time. This was due to the screensaver problems, and a hyperthreaded CPU.

Suggest you take note of the process ID at a minimum and that way you can tell for certain if it does end and another starts up. If the task really has run for that long, I'd abort it. Or at least suspend it and resume it again. But, since BOINC doesn't seem in control of that task... I guess I'd end BOINC for 5 minutes, the task should then end. If it doesn't then reboot or end that process. Then restart BOINC.


Since 5.8.x is planned to be released shortly (shouldn't be too far away, but then they have said that before ;) the code is not going to change much, probably just some Simple GUI fixes.

side / anyways, why are you running 24hr tasks on Ralph? I thought they wanted them short over here; it would create more application swapping as well.
/side

anyways, it was the communication problem (0x40010004), with the task still running and BOINC getting confused, that I was wondering about, along with the switching between the screensaver graphics. There were quite a few fixes going on with screensaver/graphics and starting/stopping/stalling of tasks.
I thought a quick day's testing on 3hr tasks should show if it is more stable, seeing the rate you report them and that you seem to be able to cause it to happen ;-)

The other error (0xc0000005) has been in many projects (Rosetta before, Einstein about a year ago, CPDN as well) and was certainly always related to the graphics.


Maybe a true test would be to replace the graphics with the default boinc one, see if it still happens .


10) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2640)
Posted 17 Dec 2006 by FluffyChicken
Post:
My two docking WUs both ended prematurely (watchdog...20 credits). When I awoke this morning the screen saver was not hung, then I moved the mouse to take the ss down and it froze. Task manager shows one task getting 65% of my hyperthreaded CPU, and the other getting 35%. Odd thing is the WU that was getting the 35% was the one the graphic was being displayed for (you could tell by the elapsed time on it as shown in the graphic and in the CPU time shown in task manager).

Now that I ended the application that was not responding, I crashed a WU (- exit code 1073807364 ...a positive number?), but it was the one that was getting the 65% of CPU. The threads seem confused about who is doing what.


Have you tested it with 5.8.0 yet to see if there is a difference ?

11) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2637)
Posted 17 Dec 2006 by FluffyChicken
Post:
it reports error code as - exit code 1073807364 (0x40010004).



I believe 0x40010004 means 'task is/was running' which would correspond to it being forced to close (via task manager or the hung program do you wish to kill it question.)
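
For anyone wanting to check the decimal and hex forms of these exit codes against each other, a small sketch follows. The helper names are made up for illustration, and it only converts the number; it says nothing about what the code means.

```python
def to_hex32(code: int) -> str:
    """Format an exit code as an unsigned 32-bit hex string."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def to_signed32(code: int) -> int:
    """Interpret the low 32 bits of a code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code


print(to_hex32(1073807364))     # 0x40010004
print(to_signed32(0x40010004))  # 1073807364
print(to_signed32(0xC0000005))  # -1073741819
```

The top bit of 0x40010004 is not set, which is why it shows up as a positive number when printed as a signed value, whereas the other code mentioned in this thread, 0xc0000005, comes out negative.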



Maybe you should give out the graphics code (and calling messages) see if anyone in that field can debug and help you.


12) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2636)
Posted 17 Dec 2006 by FluffyChicken
Post:
Have you given Jack Schonbrun a ring ? If I remember correctly he did the initial graphics setup, he may know an error or two ...
(though David Kim did the rotation if memory serves me right)
13) Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43 (Message 2590)
Posted 12 Dec 2006 by FluffyChicken
Post:
Maybe you should also note in the front page news say in bold that it is graphics you are testing and please play or turn on the graphics/screensaver.

Hint big time.



Also put a request in the Rosetta news that anyone who uses the screensaver and has seen problems with crashing/failed tasks then please attach to Ralph to help in the testing (link to ralph as well)

14) Message boards : Number crunching : Source Code - Who is looking at it ? (Message 2537)
Posted 16 Nov 2006 by FluffyChicken
Post:
I know some people outside of bakerlabs are looking at the Rosetta@Home source code.

But who is looking at what and how are things coming along ?

15) Message boards : Current tests : Are we testing anything over here at the moment ? (Message 2240)
Posted 26 Aug 2006 by FluffyChicken
Post:
Since the credit testing has been done, what's to do ?
16) Message boards : Current tests : New crediting system (Message 2239)
Posted 26 Aug 2006 by FluffyChicken
Post:
I hope someone rectifies this soon, you should certainly not have an account deleted for posting public information. Yes hide the post by all means (if the mod was unsure and wanted to protect something until he could get hold of the Administrator/David Baker)
Bad, bad, bad management.
Especially as they're new mods, they certainly shouldn't be doing that!

David Kim deleted the account of Aaron Finney, which was absolutely necessary. He first deleted a few of his flamebaits, then Aaron started to post the email and phone number of Dr. Baker together with a call to complain to David Baker by phone. This post was also removed, and after he repeated it, his account was deleted. It was long after David Kim wanted to go to bed, and in order to avoid more damage while he was asleep the account was deleted. Had the ban feature been available over on Rosetta he might have decided to just ban him, but the deletion was certainly justified.

Guys, please don't trust all the accusations which are thrown towards the project management at the moment.


Repeatedly posting is bad, especially when you've been told not to; it is Rosetta's board after all, not ours. Though I still don't get the point of the ban, as you can just sign back up.
I had also assumed they had implemented the ban feature (since they updated the boards, or was that only here?).
17) Message boards : Current tests : New crediting system (Message 2237)
Posted 26 Aug 2006 by FluffyChicken
Post:
I left rosetta too. 28k credits a month I was giving to Rosetta.. now 0.

I tried vocally to argue against the credit system on the rosetta boards - BOY WAS THAT A MISTAKE.

The german mod deleted my account because I posted David's email and the # to wash U (both of which are on his webpage.) SORRY - DIDN'T KNOW POSTING PUBLIC INFO YOU ARE GIVING FREELY AWAY ANYWAY WAS AGAINST YOUR RULES.

Don't expect me to reattach.

I have also removed my endorsement for this project from the International AIDS Conference. It's clear that your moderators have lost control of the project for you.



I hope someone rectifies this soon, you should certainly not have an account deleted for posting public information. Yes hide the post by all means (if the mod was unsure and wanted to protect something until he could get hold of the Administrator/David Baker)

Bad, bad, bad management.

Especially as they're new mods, they certainly shouldn't be doing that!
18) Message boards : Feedback : RSS feed logo - wrong one (Message 2226)
Posted 23 Aug 2006 by FluffyChicken
Post:
bug
The RSS feed for Ralph currently carries the Rosetta@Home logo, not the Ralph@home logo.


Way to fix it
It needs changing.
19) Message boards : Current tests : New crediting system (Message 2138)
Posted 16 Aug 2006 by FluffyChicken
Post:
mmciastro, maybe I'm reading your graph wrong (thanks for posting it btw). . . but it looks very consistent on a credit/hour basis for a given computer.

Is this the case? And if so, how do you feel about the credits/hour for the various machines (is machine 2 really about 2.5 faster than machine 1)?



I think you're reading it wrong; the credit/hr (C/H) should be consistent, as it's the BOINC benchmark's credit per hour.
The G/H is the new method


EDIT:
Machine one is a Celeron 500, two is a Pentium 4 1.8GHz, so probably not far off.

http://i65.photobucket.com/albums/h228/mmciastro/Ralphnewprojectcomparison5.jpg



mmciastro, could you set them to run at 3hr units, which I believe is the default here (and so I guess at Rosetta; I cannot check as it's under maintenance there)? Since that's what the majority will be using, err, by default ;-)
20) Message boards : Current tests : New crediting system (Message 2137)
Posted 16 Aug 2006 by FluffyChicken
Post:
With what I've seen so far in one day with up to a 168% difference from lowest to highest credits for one single computer it's nowhere near ready to roll out on Rosetta.

You're moving to a cherry picking heaven at the moment I would guess.
Wouldn't be hard for some of the larger teams (or a bored individual) to create a program to grab the stats, see what the initial credits claimed are for that type and tell the team.
Fluffy, dcdc, tralala, I think you are mistaken: Cherry-picking is NOT possible !
The variability you are seeing in the credits is not between different WU types but because of the different completion times of the models _within_ the WUs. Even if, say, the first model of a WU takes a long time to complete this doesn't tell you anything about how long the following models will take. This is a completely random process. So terminating WUs that start with a 'slow' model won't help you either.

Fluffy, instead of 168% difference you could also say +/-45% difference with respect to the average. Example: average=10, 10-45%=5.5, 10+45%=14.5, 164% difference between lowest and highest. I think this is acceptable, considering that most values will be much closer to the average.
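
The arithmetic in that example can be checked with a short sketch. The function names are made up for illustration; the only inputs are the figures quoted above (an average of 10 and a spread of 45% either side, and the 168% lowest-to-highest figure).

```python
def low_high(average, pct):
    """Lowest and highest values for a spread of +/- pct around the average."""
    return average * (1 - pct / 100), average * (1 + pct / 100)


def low_to_high_pct(low, high):
    """How much higher the highest value is than the lowest, in percent."""
    return (high - low) / low * 100


def half_width_pct(diff_pct):
    """Inverse: the +/- percentage around the average that produces a given
    lowest-to-highest difference. From (1 + p) / (1 - p) = 1 + d we get
    p = d / (2 + d)."""
    d = diff_pct / 100
    return d / (2 + d) * 100


low, high = low_high(10, 45)
print(round(low, 2), round(high, 2))        # 5.5 14.5
print(round(low_to_high_pct(low, high)))    # 164
print(round(half_width_pct(168), 1))        # 45.7
```

So a 168% gap between the lowest and highest result is the same thing as roughly +/-46% around the average, which is where the "+/-45%" figure comes from.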



I know I could say that, and I certainly wouldn't say a 45% difference was acceptable; yes, it's better than the x3 that optimised clients usually give over the standard.
But the way I currently see it, you may as well just count the number of models you create instead of assigning a credit value to it.
It'll save a lot of bother ;-)

All in all if you are going to go down this route I would have thought that you put everything into pending credit, then apply the awarding of credit till you have a statistically sound credit awarding per job type. Then adjust for time taken (using an internal timing procedure, not the boinc core client... This alters for actual work done not assumed work done)
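
A rough sketch of that pending-credit idea, purely as an illustration: nothing here is the project's actual scheme. The class name, the threshold of three results and the use of a plain average of claimed credit per model are all assumptions, and the adjustment for time taken is left out.

```python
from collections import defaultdict
from statistics import mean


class PendingCredit:
    """Hold credit as pending until a job type has enough returned results,
    then fix a credit-per-model rate for that job type and grant from it."""

    def __init__(self, min_results=3):
        self.min_results = min_results      # hypothetical threshold
        self.pending = defaultdict(list)    # job_type -> [(user, models, claimed)]
        self.rate = {}                      # job_type -> credit per model
        self.granted = defaultdict(float)   # user -> granted credit

    def report(self, job_type, user, models, claimed):
        """Record a returned result with a positive model count."""
        if job_type in self.rate:
            # Rate already established: grant instantly.
            self.granted[user] += models * self.rate[job_type]
            return
        queue = self.pending[job_type]
        queue.append((user, models, claimed))
        if len(queue) >= self.min_results:
            # Enough samples: average the claimed credit per model and
            # pay out everything that was waiting.
            self.rate[job_type] = mean(c / m for _, m, c in queue)
            for u, m, _ in queue:
                self.granted[u] += m * self.rate[job_type]
            queue.clear()


book = PendingCredit(min_results=3)
book.report("docking", "a", 2, 20.0)   # stays pending
book.report("docking", "b", 4, 48.0)   # stays pending
book.report("docking", "c", 1, 8.0)    # third result fixes the rate
book.report("docking", "d", 3, 99.0)   # granted instantly at the fixed rate
print(dict(book.granted))
```

Note that user "d" gets the established rate regardless of the inflated claim, which is the point of fixing the rate per job type first.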

Pending credit happens on any project that uses a quorum anyway, so that's not really a problem, and it would only happen for the first however many hundreds of returned jobs you think necessary before it could start granting instantly again. Preferably from trusted clients you know of on Rosetta, not here on Ralph. That frees up Ralph to actually improve the client without credit getting in the way. That should make it quicker, as you'll get through far more tasks in a day than you would in a week on Rosetta (I should hope).



P.S
How do you actually intend to get the 'credit per model' across all the platforms? The only way I can see (if you really still want to use Ralph for the benchmarking) is to send out the Ralph client with an internal benchmark to the testers; that would bypass having to worry about the BOINC client used.


©2021 University of Washington
http://www.bakerlab.org