Posts by Moderator9

21) Message boards : Feedback : User of the day not changed since new BOINC release (Message 1576)
Posted 11 May 2006 by Moderator9
Post:
...I'm pretty sure I got the tone right.
But that's inconsequential.
This last answer is the one that should have come first.
...

You have nothing to apologize for. It was my fault. I was in a hurry and gave a shorter answer than I normally would. A new version of the Rosetta application had just been released when I found your message, and I was hurrying to get a lot of things done but did not want you to think you were being ignored.
22) Message boards : Feedback : User of the day not changed since new BOINC release (Message 1573)
Posted 11 May 2006 by Moderator9
Post:
...Nice. I must confess I was not expecting this tired old answer. Or attitude!

Soooo.... We volunteers are supposed to crunch and shut-up... This is rather a strong departure from the tone of the Rosetta Newsletter "Thank you for your past participation, and we hope you can provide additional support to Rosetta@home in the future."

With this kind of attitude, my reaction is to tell you that both Rosetta and Ralph can kiss my support goodbye. There are plenty of projects out there that do show appreciation for the support.


Philippe


It is clear that you have misunderstood the content or tone of my post. May I direct your attention to this thread and this page of RALPH information.

RALPH is NOT a production environment. As such many things that are considered aesthetics are not priority issues for repair or change. The User of the day function is just such an item.

When you are attached to RALPH there will not be a steady stream of work. There will be times that your system may crash. You will not always get credits for any processing that is done. Team scores may be inaccurate, statistics feeds may not work or may be turned off at any time, and the entire database, credits and all, may be purged from time to time. The application and the work units run on RALPH may impact other projects you may be running.

None of these things should (or do) occur on Rosetta, as Rosetta is a production environment. Had the User of the day function been damaged on Rosetta, it would be fixed immediately, because that is where the project focuses its support for such functions. I am sorry if this offends you but ALPHA testing is not for everyone.

The User of the day problem has been reported to the project administrators (more than once). I am sorry if providing you with an honest answer to your question does not satisfy your needs, but I will not provide any user with false hope for a change, only to have to explain later why nothing happened to fix their problem.
23) Message boards : Feedback : User of the day not changed since new BOINC release (Message 1566)
Posted 10 May 2006 by Moderator9
Post:
We will also notice how long it takes to get fixed.
It's been a week already.


Actually it has already been a month. But it is not likely to get fixed any time soon as it has nothing to do with the purpose of the RALPH ALPHA testing. The focus here is on testing new Work Unit types, applications versions, and approaches to the science. None of the competitive aspects of the BOINC world are important for RALPH and no energy is spent in addressing problems in those areas.

I can assure you that the system administrator currently is working on more important issues. Should he become particularly bored at some point he may try to fix the User of the Day display, but it will be some time before that happens.

It is my understanding the UOTD function is not turned off but that something is wrong with the implementation of the function in the version of the server software running on RALPH. So it is not as simple as flipping a switch. In any case it has no impact on the functional testing conducted on RALPH.
24) Message boards : RALPH@home bug list : Bug reports for Ralph 5.11 and 5.12 (Message 1542)
Posted 8 May 2006 by Moderator9
Post:
I've suggested that if a given application version proves valid, then go ahead and USE the results for the science, and then they could keep a small but steady stream of WUs on Ralph and allow debts to be worked down and make things run more normally...

Makes sense to me! Thanks for your reply, I will rein in my expectations (and my enthusiasm to some degree) and keep plugging away on Rosetta at the same time.

You can set the share for RALPH to a lower value. This works especially well if you are connected all the time (not through a modem). The system will seek work according to your connection frequency settings, but it will not run up a high debt when there is no work because of the lower share value. When there is work it will have a high enough debt to always get at least a few work units.
25) Message boards : RALPH@home bug list : Bug reports for Ralph 5.09 and 5.10 (Message 1520)
Posted 6 May 2006 by Moderator9
Post:
... and an error box asking if I should send the report to MS. WU 90436
...

"MS"!! Well there is your problem right there. Send it to the attention of "Bill". ;>)
26) Message boards : RALPH@home bug list : Bug reports for Ralph 5.09 and 5.10 (Message 1518)
Posted 6 May 2006 by Moderator9
Post:
notice the graphics thread on rosetta production v5.07
uses only 10% cpu despite on web interface is set to 20%
and uses *all* available graphics space
and is very fast too , cause not less than 20 frames by second are allowed

*And runs at thread priority 4 too!

*Not switched yet to 640x480 16 bbp -:( I am busy now. Sorry

Those are MAXIMUM settings. If it does not need the whole amount it will not use it all.
27) Message boards : RALPH@home bug list : Bug reports for Ralph 5.09 and 5.10 (Message 1510)
Posted 6 May 2006 by Moderator9
Post:
Rhiju,

There is no doubt about it. The display of the "JUMPING/STRAND Break Protocol" is a real cat pleaser. The thing looks like a well baited fish hook. But there does seem to be a problem with the new text display repeating for at least the first three models. With each repeat the overall graphic seems to stretch vertically, distorting the display slightly each time the lines are repeated. It is as if a file or variable that should be overwritten is not being cleared correctly. But you have solved the problem of the graphics not fitting in the boxes. That part looks real good. One thing you might consider for the graphic is to keep the count of steps for the first model and display that as the project step count for subsequent models. Also, if the RMSD and ENERGY of the target are known, that would be good to include so people can see if they are getting close as the models are running. Obviously this would only work for known structures.

So far the only errors I have seen for Mac or Windows on any of these machines have been on the HOMO Work Units, which seem to just be recycles from other systems.
28) Message boards : Cafe RALPH : SOMEone PLEASE vote SOMEONE for user of the day!! (Message 1507)
Posted 6 May 2006 by Moderator9
Post:
As you may be aware RALPH is a test project. The purpose of the User of the Day on RALPH is to test the tolerance of the user community for consistency in this area of the homepage....

A real stress test for the community. ;-)

I fear we've failed this test :(

...please try another! :)

I have brought this to the attention of the folks who can change it. But there are higher priorities right now.

So if I understand you correctly the User of the Month idea is not popular?
29) Message boards : RALPH@home bug list : Bug reports for Ralph 5.09 and 5.10 (Message 1506)
Posted 6 May 2006 by Moderator9
Post:
not sure if this is a bug, more a cosmetic thing probably.
got a 5.09 running here right now, opened the graphics (not using the BOINC screensaver), maximized the window and i got a area on the right side, wider than the rmsd area, thats completely black without anything in it. (my screen resolution is 1024*768)
the description of what its doing right now looks like a nice new feature, it just looks a little dark on my screen if you ask me. i can not verify what dimitris is seeing though, i only got the description once, and i am in model 3 right now.
edit: now at model 4 i got the description twice.
pic:


The large blank area is the result of the resizing to fit the proteins inside the right box. That blank area looks bigger on 16:9 aspect ratio than it does on normal monitors.
30) Message boards : RALPH@home bug list : Bug reports for Ralph 5.08 (Message 1491)
Posted 5 May 2006 by Moderator9
Post:
v5.08 has generated mixed results on my Linux box. Although several WU’s completed successfully, I’ve also had several result in computational errors:

http://ralph.bakerlab.org/result.php?resultid=102100
http://ralph.bakerlab.org/result.php?resultid=102101
http://ralph.bakerlab.org/result.php?resultid=102102
http://ralph.bakerlab.org/result.php?resultid=102103
http://ralph.bakerlab.org/result.php?resultid=102372
http://ralph.bakerlab.org/result.php?resultid=102373
http://ralph.bakerlab.org/result.php?resultid=102374
http://ralph.bakerlab.org/result.php?resultid=102375

These results came from a Work Unit type that had a problem. See this post suggesting they be aborted.
31) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1490)
Posted 5 May 2006 by Moderator9
Post:
So the short of this is, if the workunit is simply running uninterrupted, it could run forever, or until it hits the Max time setting. This is the risk of running a single project setup. If you don't see movement in the graphic, try suspending the Work unit and letting the system run a different one for 5 min. Then restart the first Work unit again for 5 min. Repeat this process 4 -5 times and it should abort the workunit if it was stuck. If it is not stuck it should let it keep running. Either that or we have a watchdog bug.



Does the "Max time" get checked even if the app is not swapped out? That could be it, as my computer was running in EDF mode, hence it NEVER got swapped.

May I suggest that these items, (flavors of the watchdogs) get checked whenever BOINC requests a checkpoint? I understand this is every hour or so. I realize that Rosetta doesn’t perform the checkpoint, but it could process watchdog duties.

Well, it is really two separate functions that are fallbacks to one another. If the watchdog never has the opportunity to work (i.e. the work unit is never stopped and started for the check to occur) then the Work Unit will hit a wall for maximum time to process. The Max time function is independent of the watchdog and works on a different set of criteria and variables. The Max time is hard coded by the project before the Work Unit is sent out.

Right now that max time on Rosetta is 24 hours. I think it is the same for Ralph, but Rhiju would have to verify that, because it could be different for each set of Work Units.

In any case you are correct. If your system was in EDF mode, the watchdog would not likely have kicked in. Perhaps that is a good reason to revisit how the checking is done.
32) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1482)
Posted 5 May 2006 by Moderator9
Post:
Version 5.09 has been released. If you have errors in Version 5.09 please report them in the 5.09 Bug thread.
33) Message boards : RALPH@home bug list : Bug reports for Ralph 5.05 and higher (Message 1478)
Posted 5 May 2006 by Moderator9
Post:
This computer is headless. Remote access only. Hence no screensaver.

Mike, I use VNC to see the graphics on my remote monitorless, keyboardless, and mouseless puter. I click on the WU from the task tab and then view graphics. No screensaver here either. If it's a service install you're hosed.

tony


It is a service install. I forgot about the "View Graphics" button. I do VNC into this computer. OK... 1.041% complete after 40 hours. Stage Full atom relax, Mode 1, Step 100, Accepted RMSD 50.36, Accepted Energy -19.40622 whatever this all means.


If it is a BIG protein you may have to wait for some time to see the steps advance, but you may be able to detect the slightest motion in the searching window image. If you see either the steps counting up or the movement in the searching window, it is still processing. On some of the large Work Units, it is possible for them to run very long times past your time setting. I would note however that yours is running way too long over the time setting. I have had a few lately that went 14 hours with a time setting of 2 hours.

The point is this: unless the Work Unit is either swapped out for project switching, or BOINC is turned on and off four times, the watchdog will never wake up and abort the Work Unit. Failing that, the Work Unit will be aborted when it hits a limit preset by the project, which SHOULD be 24 hours of CPU time.

My understanding is that it is designed to look at the Work Unit each time it starts to process and determine if progress has been made since the last time it started up. This presupposes that the process was stopped for some reason. It does not just sit there checking the Work Unit all the time. If it never stops processing the Work Unit it will not check it. With luck Rhiju will chime in here and correct me if I am wrong about this, but I am going on the last explanation I had for all this.

Now let me add a caution here. If you restart BOINC before the workunit reaches a percent complete of greater than 2%, the Work unit WILL START OVER FROM THE BEGINNING AND THE CPU TIME WILL RESET TO ZERO!

So if you are going to play with starting and stopping, you should have keep in memory set to yes, and then suspend the Work Unit or start another project long enough for another process to run for a while.

The watch dog is supposed to do 4 of these checks which show no progress before it will abort the workunit. That is part of how they worked out the "four times your time setting" concept for manual aborts.

So the short of this is, if the workunit is simply running uninterrupted, it could run forever, or until it hits the Max time setting. This is the risk of running a single project setup. If you don't see movement in the graphic, try suspending the Work unit and letting the system run a different one for 5 min. Then restart the first Work unit again for 5 min. Repeat this process 4 -5 times and it should abort the workunit if it was stuck. If it is not stuck it should let it keep running. Either that or we have a watchdog bug.
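
For readers who find the two fallbacks easier to follow as code, here is a minimal sketch of the behaviour as described above. It is a toy model, not the Rosetta implementation: the class, the field names, and the choice to reset the stall counter whenever progress is seen are all assumptions made for illustration. Only the "four checks with no progress" rule and the 24 hour CPU limit come from the explanation in this post.

```python
from dataclasses import dataclass

MAX_CPU_SECONDS = 24 * 3600    # limit preset by the project ("SHOULD be 24 hours")
STALLED_CHECKS_TO_ABORT = 4    # four checks showing no progress before an abort


@dataclass
class WorkUnit:
    progress: float = 0.0                  # fraction complete, 0.0 to 1.0
    cpu_seconds: float = 0.0
    progress_at_last_start: float = -1.0   # hypothetical bookkeeping field
    stalled_checks: int = 0
    aborted_by: str = ""                   # "", "watchdog" or "max_time"


def on_start(wu):
    """Watchdog check. It runs only when the work unit is (re)started."""
    if wu.progress_at_last_start >= 0 and wu.progress <= wu.progress_at_last_start:
        wu.stalled_checks += 1
    else:
        wu.stalled_checks = 0              # assumption: any progress resets the count
    wu.progress_at_last_start = wu.progress
    if wu.stalled_checks >= STALLED_CHECKS_TO_ABORT:
        wu.aborted_by = "watchdog"


def run(wu, seconds, progress_gain=0.0):
    """Uninterrupted processing. Only the max time limit applies here."""
    if wu.aborted_by:
        return
    wu.cpu_seconds += seconds
    wu.progress = min(1.0, wu.progress + progress_gain)
    if wu.cpu_seconds >= MAX_CPU_SECONDS:
        wu.aborted_by = "max_time"
```

In this model a stuck Work Unit that is suspended and restarted four times ends up aborted by the watchdog, while one that is never interrupted is only stopped by the max time limit.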
34) Message boards : Feedback : User of the day not changed since new BOINC release (Message 1465)
Posted 3 May 2006 by Moderator9
Post:
Seems ever since they took the Ralph servers down and installed the new BOINC server version, the user of the day has not changed.

Is there a process that needs to be started?

This has already been reported as a result of posts in a separate thread.
35) Message boards : Cafe RALPH : SOMEone PLEASE vote SOMEONE for user of the day!! (Message 1462)
Posted 3 May 2006 by Moderator9
Post:
I was under the impression that UOTD was picked at random each day,


I'm just going by what it says when you view a profile. It says something like "recommend this profile for user of the day". I also saw an old post that said something about "why am I seeing the same few people over and over?" and the reply was that only a few people had created profiles, so the list to pull a random one from was very limited.

This function is most definitely disabled, or perhaps there's a bug in the new server code version they recently installed on Ralph.

As you may be aware RALPH is a test project. The purpose of the User of the Day on RALPH is to test the tolerance of the user community for consistency in this area of the homepage....

Ok, Ok, I'll tell someone about it. But don't be surprised if it takes a while for the thing to change. There are not many profiles to choose from.
36) Message boards : Number crunching : Checkpointing, more credits? Or more models? (Message 1460)
Posted 2 May 2006 by Moderator9
Post:


IF the Work Unit is removed from memory, it will always roll back to the last checkpoint. When it starts on my systems this will usually result in lost time as well. The clock does not keep rolling forward if the percent resets. This is why it is still a good idea to set keep in memory to yes.

All projects lose some time because of this. CPDN and Rosetta are two of the more lossy in this regard, but all projects lose some time this way.


...so, on average, with the enhanced checkpointing, we should expect to see a credit increase throughout the project, along with increased project TFLOPS (which as you've pointed out elsewhere appear directly calculated from credits issued).

And well, not surprisingly that is precisely what has happened. If you look at the graphs on BOINCStats for Teraflops, and you have been watching the homepage of Rosetta, you can see the effect.

You have to ignore Friday because there is a spike caused by failed credit awards on Friday, but the project is showing about 27TF and there is a general trend upward. It rises and falls a little but still the trend is up.

The important thing is that only a week ago the project was stalled at about 24TF. That 3 TF gain is all about fixing the errors, and reductions in time lost from checkpointing issues. By my estimates there is about another 1TF that will come from additional error fixing. There could be another 2-4 TF still being lost due to long checkpointing. There is also about 2-3TF available if the Mac version of the application is fixed and optimized using Altivec coding. So there is still about 5 TF that could be squeezed out of the existing attach base of the project. This is all without adding a single system. Now to be fair there have been systems joining and returning every day so some part of the improvements comes from that as well.
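
The arithmetic behind those estimates can be checked in a few lines. This is only a tally of the figures quoted in this post; the dictionary labels are shorthand invented for the example.

```python
# Figures quoted in the post, in teraflops (TF). Ranges are (low, high).
stalled_tf = 24
current_tf = 27

remaining = {
    "additional error fixing": (1, 1),
    "long checkpointing": (2, 4),
    "Mac version fixed and optimized": (2, 3),
}

gain = current_tf - stalled_tf
low = sum(lo for lo, _ in remaining.values())
high = sum(hi for _, hi in remaining.values())

print(f"gain so far: {gain} TF")
print(f"remaining headroom: {low}-{high} TF")
```

The "about 5 TF" figure corresponds to the low end of each range; taking the high ends instead gives 8 TF.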

37) Message boards : Number crunching : Checkpointing, more credits? Or more models? (Message 1459)
Posted 2 May 2006 by Moderator9
Post:


IF the Work Unit is removed from memory, it will always roll back to the last checkpoint. When it starts on my systems this will usually result in lost time as well. The clock does not keep rolling forward if the percent resets. This is why it is still a good idea to set keep in memory to yes.

All projects lose some time because of this. CPDN and Rosetta are two of the more lossy in this regard, but all projects lose some time this way.


...so, on average, with the enhanced checkpointing, we should expect to see a credit increase throughout the project, along with increased project TFLOPS (which as you've pointed out elsewhere appear directly calculated from credits issued).

And well, not surprisingly that is precisely what has happened. If you look at the graphs on BOINCStats for Teraflops, and you have been watching the homepage of Rosetta, you can see the effect.

You have to ignore Friday because there is a spike caused by failed credit awards on Friday, but the project is showing about 27TF and there is a general trend upward. It rises and falls a little but still the trend is up.

The important thing is that only a week ago the project was stalled at about 24TF. That 3 TF gain is all about fixing the errors, and reductions in time lost from checkpointing issues. By my estimates there is about another 1TF that will come from additional error fixing. There could be another 2-4 TF still being lost due to long checkpointing. There is also about 2-3TF available if the Mac version of the application is fixed and optimized using Altivec coding. So there is still about 5 TF that could be squeezed out of the existing attach base of the project. This is all without adding a single system. Now to be fair there have been systems joining and returning every day so some part of the improvements comes from that as well.

38) Message boards : Number crunching : Checkpointing, more credits? Or more models? (Message 1456)
Posted 2 May 2006 by Moderator9
Post:
At one point it was mentioned that we were seeing 3x productivity on clients with the new checkpointing. I haven't tracked things closely enough... when I lose work due to preemption, does the time spent reset back to the checkpoint? And the credits is based on time spent, right?

Or if time spent always rolls forward, then we'd just see more model completions per hour of time? (because less time is spent retracing the steps we had made prior to preemption).


IF the Work Unit is removed from memory, it will always roll back to the last checkpoint. When it starts on my systems this will usually result in lost time as well. The clock does not keep rolling forward if the percent resets. This is why it is still a good idea to set keep in memory to yes.

All projects lose some time because of this. CPDN and Rosetta are two of the more lossy in this regard, but all projects lose some time this way.
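
A small sketch of the rollback described above, for anyone who wants to see where the time goes. The class and method names are made up for this example and are not BOINC or Rosetta code. It models only what is stated here: removal from memory rolls both the work and the clock back to the last checkpoint, and keep in memory avoids that.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cpu_seconds: float = 0.0          # clock shown for the task
    checkpoint_seconds: float = 0.0   # clock value saved at the last checkpoint
    lost_seconds: float = 0.0         # work that has to be redone

    def run(self, seconds):
        self.cpu_seconds += seconds

    def checkpoint(self):
        self.checkpoint_seconds = self.cpu_seconds

    def preempt(self, keep_in_memory):
        # Removed from memory: everything since the last checkpoint is lost
        # and the clock goes back with it. Kept in memory: nothing is lost.
        if not keep_in_memory:
            self.lost_seconds += self.cpu_seconds - self.checkpoint_seconds
            self.cpu_seconds = self.checkpoint_seconds
```

For example, a task that checkpoints at 10 minutes and is removed from memory at 25 minutes resumes with the clock at 10 minutes and 15 minutes to redo. The same task with keep in memory set to yes resumes at 25 minutes.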
39) Message boards : Feedback : need extra target cpu time options (Message 1453)
Posted 2 May 2006 by Moderator9
Post:
the home page says: Please set your ralph settings to match your rosetta@home settings so that we can truly simulate the rosetta@home environment.

the target cpu time options in ralph now need to be updated to match the new ones recently added to rosetta.

You are correct. I will bring this to the attention of the project team.
40) Message boards : RALPH@home bug list : Bug reports for Ralph 5.08 (Message 1446)
Posted 1 May 2006 by Moderator9
Post:
Hello again, i have this problem :

30/04/2006 19:19:51|ralph@home|Computation for result JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0 finished
30/04/2006 19:19:53|ralph@home|Starting result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 using rosetta_beta version 508
30/04/2006 19:19:53|ralph@home|Unrecoverable error for result JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> <file_name>JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)

What could i do ?

An error again :
30/04/2006 22:27:32|ralph@home|Computation for result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 finished
30/04/2006 22:27:34|ralph@home|Unrecoverable error for result JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> <file_name>JUMP_RELAX_ALLBARCODE04_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)


Rhiju commented on a similar error in this post, but I am not certain yours is a Watchdog stop. The file errors are the same, and I know Rhiju had a post about that somewhere too, but I can't find it right now. He will pick this thread up again tomorrow if he does not catch it tonight sometime. My recollection is that this will self correct.
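
If you want to pull these errors out of a saved message log to include in a report, something like the following may help. It is a rough sketch that assumes the pipe-separated layout of the lines quoted above (timestamp, project, message); the function name is made up, and what error code -161 means is not covered here.

```python
import re

LINE = (
    "30/04/2006 19:19:53|ralph@home|Unrecoverable error for result "
    "JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0 (<file_xfer_error> "
    "<file_name>JUMP_RELAX_ALLBARCODE03_1tul_SAVE_ALL_OUT_463_16_0_0</file_name> "
    "<error_code>-161</error_code> <error_message></error_message></file_xfer_error>)"
)


def parse_xfer_error(line):
    """Return (timestamp, project, file_name, error_code), or None if no match."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    name = re.search(r"<file_name>(.*?)</file_name>", message)
    code = re.search(r"<error_code>(-?\d+)</error_code>", message)
    if not (name and code):
        return None
    return timestamp, project, name.group(1), int(code.group(1))


print(parse_xfer_error(LINE))
```

Lines that are not file transfer errors, such as the "Computation for result ... finished" entries, return None and can be skipped.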





©2024 University of Washington
http://www.bakerlab.org