Posts by Moderator9

1) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1691)
Posted 23 May 2006 by Moderator9
Post:
I have "NO searching..." and "Accepted" graphic at all!!!
The "Lowest" is broken into many pieces....

And the step counter is awfully slow!!!

Running for 8 minutes now, step is only at about 2600!
The rosetta WU I am crunching (CASP7, too) reached far
beyond step 100.000 within 8 minutes of crunching...

...

The relax phase is always slower, but the graphic should not look like that. Rhiju is aware there are problems with the graphic. I am not sure, but I think they are fixing it.
2) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1685)
Posted 21 May 2006 by Moderator9
Post:
http://ralph.bakerlab.org/result.php?resultid=133961


Jose,

Your result posted a fountain of very valuable error data. I have sent a message to Rhiju with a link and asked him to review it. Thanks for attaching it here; this should be VERY helpful. We should hear something back soon.
3) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1681)
Posted 20 May 2006 by Moderator9
Post:
...
Anyway, has a cause of errors been identified already?

In part yes, and with the help of the people running RALPH they will eliminate it.
4) Message boards : Current tests : How to tell if crunching 1GB WU? (Message 1679)
Posted 20 May 2006 by Moderator9
Post:
You now have a class of WUs that are flagged for 1GB and higher systems. And you've asked us to monitor memory usage... presumably these 1GB units are going to need more memory than the norm... but how do we know which is which?

I've got one here called T0287_HOMOLOG_ABRELAX_hom019_539_85 that looks like I could wrap it around my HOUSE! An ya, it's using more memory, as one would expect.

Could you give us a feel for what you want us to be observing and which WUs are known to need more memory than the new and improved lower norm of about 100MB??

[edit] ...and is it 1GB per CPU? Or per system?

Actually they seem to be abandoning the idea. They have decided to run the very largest memory hogs in another area of the project. It seems the research was different from the main thrust of Rosetta to begin with. But more importantly, those work units had some very bad memory leak problems they are tracking down, and they can't put them out on the production system. So the focus now is to kill the last major bug, reduce the memory footprint of all the work units, incorporate the new techniques they are trying in order to produce more efficient code, and optimize what they can in the applications.

To that end they are all very busy right now.
5) Message boards : Cafe RALPH : SOMEone PLEASE vote SOMEONE for user of the day!! (Message 1670)
Posted 18 May 2006 by Moderator9
Post:
Woooot a new user of the day :)

Anders n

I miss him already
6) Message boards : Cafe RALPH : Anyone out there? (Message 1662)
Posted 18 May 2006 by Moderator9
Post:
Hi, I'm wondering how many people are really reading these Ralph fora. Stop by post a note, say "HI". Technically, you could post a blank message and that'd be OK with me too.

thanks

tony

Well I read all the messages on all the Rosetta boards all the time. So yes I'm here too, and for reasons not dissimilar from your own.
7) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1654)
Posted 16 May 2006 by Moderator9
Post:
d287__CASP7_ABRELAX_521_7

has been running for 6 hours and shows only 1.044% progress. This is running on a Mac.


Let it run. It is a test Work Unit for CASP7. It is probably just a large Work Unit. Do not be surprised if it suddenly jumps to 100% at the end of the first model. Do not stop BOINC or Rosetta, or it will start over at 0%.

If it gets to the point where it has run longer than about 5 times the setting for "Time" in your preferences, it will either be stopped by the "Watchdog" or you might want to consider aborting it manually at that time.
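
As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name and the 2-hour preference are hypothetical, and the factor of 5 is only the "about 5 times" figure from this post, not the watchdog's actual threshold.

```python
def should_consider_abort(cpu_hours_elapsed: float,
                          target_run_hours: float,
                          factor: float = 5.0) -> bool:
    """Return True once elapsed CPU time exceeds `factor` times the "Time" preference.

    `factor` is an assumption taken from the "about 5 times" rule of thumb.
    """
    if target_run_hours <= 0:
        raise ValueError("target_run_hours must be positive")
    return cpu_hours_elapsed > factor * target_run_hours


# Hypothetical example: a 2-hour "Time" preference gives a 10-hour threshold.
print(should_consider_abort(6.0, 2.0))   # 6 hours elapsed: keep running
print(should_consider_abort(11.0, 2.0))  # 11 hours elapsed: past the threshold
```

With those assumed numbers, the 6-hour run reported above would still be well inside the limit.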

Keep us posted.
8) Message boards : Number crunching : Can you tell when the next checkpoint is for a given work unit? (Message 1652)
Posted 16 May 2006 by Moderator9
Post:
CPDN is quite nice this way, allowing me to minimize work loss when I must restart boinc or the OS. Recent memory issues with ralph have increased my interest in being able to do restarts with minimal loss.

I realize not all applications are predictable in the way CPDN units are, but I'm just wondering, is there any way to tell from Ralph graphics how close the next checkpoint is? E.g. my understanding is that rosetta checkpoints once per model; if that's correct then is there a known # of steps per model?
Or do you just wait for the model # or progress % to change and therefore know after the fact that a checkpoint just occurred?

They checkpoint about every 20 minutes of CPU time, but as far as I am aware there is no way to tell when intermediate checkpoints occur. When the whole number in the percent complete jumps (say from 1.04% to 10.00%), that is a checkpoint, but it is also a model end point.
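
If you log the percent complete yourself, the after-the-fact detection described above could be sketched like this in Python. The function name and the sample readings are made up for illustration; nothing here reads real BOINC or Rosetta data.

```python
import math


def model_end_indices(progress_samples):
    """Return the indices where the whole-number part of percent complete increased.

    Per the post above, such a jump marks a model end point (and a checkpoint).
    """
    ends = []
    for i in range(1, len(progress_samples)):
        if math.floor(progress_samples[i]) > math.floor(progress_samples[i - 1]):
            ends.append(i)
    return ends


# Hypothetical readings noted by hand from the BOINC manager.
samples = [1.01, 1.02, 1.04, 10.00, 10.02, 20.00]
print(model_end_indices(samples))  # [3, 5]
```

This only tells you that a model just ended; it cannot predict the intermediate checkpoints.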
9) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1643)
Posted 15 May 2006 by Moderator9
Post:
ERROR:: Unable to obtain sequence information.
fasta file must be provided.
ERROR:: Exit at: .initialize.cc line:236

http://ralph.bakerlab.org/result.php?resultid=125164
http://ralph.bakerlab.org/result.php?resultid=125315

I had a few of these as well. I feel certain they are bad Work Units.
10) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1640)
Posted 15 May 2006 by Moderator9
Post:
Thanks for answer, Moderator9.
I've done 5 WUs with no issue, D 820 with 1GB RAM.
...still a bit memory demanding for average machine.

While I see many reports of the memory issue, I have not seen this on my machines. Both the Windows and Mac machines look normal and even approximate the memory footprint for Einstein. In any case the application always moves out of the way if I use the machines for any other tasks.
11) Message boards : RALPH@home bug list : Bug reports for Ralph 5.16 (Message 1632)
Posted 15 May 2006 by Moderator9
Post:
Got one finished in ~ 45 min, memory usage unknown (finished sooner than I was able to check)
http://ralph.bakerlab.org/result.php?resultid=125948

Another one finished after 55 min, 230MB usage
http://ralph.bakerlab.org/result.php?resultid=125947

3 more to go...

Dumb question: apart from lower traffic of higher "Target CPU run time" - does having it set to let's say 12 hours brings "better" (i.e. more precise) results?


In a word, No. At least not yet. It simply makes more models. Rosetta is not like a lot of other projects where a particular refined result is sought by a particular machine within a specific work unit. However, the project is starting to experiment with approaches that do sort of "Learn" as they go, and refine the work based on each model produced.

The goal here is to provide each work unit with a different starting point, and have each system adjust the work looking for the lowest energy level it can find among the models it can create in the time allotted. So the more models made the better. However, ultimately all of the systems are actually searching for the same result. If we ever get to the point where the project can send out 10,000 work units and they all come back with the same result, and it correctly reflects the actual shape of the protein, then we will have reached part of the goal. If we ever get to the point where a single system can reliably predict a protein shape given the amino acid sequence, we are done.

Since what is being sought is a reliable way to predict the protein shape, at this point the more models returned the better. The hope is that among those returned will be the correct one. If that happens then it is presumed that the technique that was used could repeat the performance.
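
To make the "more models, not a more refined model" point concrete, here is a toy Python sketch. It is not Rosetta code: the energy is just a seeded random number, and `run_work_unit`, the 20-minute model cost, and the energy range are all assumptions for illustration.

```python
import random


def run_work_unit(seed, budget_minutes, minutes_per_model=20):
    """Toy stand-in for one work unit.

    Builds models until the time budget is spent (always at least one)
    and returns (number_of_models, lowest_energy_seen).
    """
    rng = random.Random(seed)  # each work unit gets its own starting point
    elapsed = 0
    energies = []
    while elapsed + minutes_per_model <= budget_minutes or not energies:
        # Placeholder value, not a real energy function.
        energies.append(rng.uniform(-250.0, 0.0))
        elapsed += minutes_per_model
    return len(energies), min(energies)


short_run = run_work_unit(seed=1, budget_minutes=60)    # 1-hour setting
long_run = run_work_unit(seed=1, budget_minutes=720)    # 12-hour setting
print(short_run[0], long_run[0])  # 3 36
```

In this toy version the 12-hour run simply samples 36 models instead of 3, so its lowest energy can only be equal to or lower than the 1-hour run's. No individual model is any more precise.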
12) Message boards : RALPH@home bug list : Bug reports for Ralph 5.15 (Message 1624)
Posted 15 May 2006 by Moderator9
Post:
I'm crunching WU MAPRELAX_TEST_hom021_1fna__514_125_0 using rosetta_beta version 515 right now!

The "phantom chain" and "broken chain" phenomenon is still NOT fixed, and I experienced another problem: in Model 2, exactly at Step 38000, RMSD jumped suddenly to 0 (!), while Accepted Energy was still changing...

IMHO, if RMSD of zero is reached (no difference to native) there should be no change at Accepted energy... Is this right?


I believe that the MAPRELAX workunits are running a CASP target. Since the RMSD is unknown there is no graph for it so it will always appear to be zero. The accepted energy will always rise and fall depending on the value for the current accepted shape.

I am not seeing the broken chain issue on my Windows machine (not to imply you aren't), but I will watch for it. Sorry I did not check before responding, but are you running Linux? That may matter. If anyone else is seeing the broken chain, please let us know.

You should see an improvement in the graphic text overrun in the description field soon, there is a fix for that.
13) Message boards : RALPH@home bug list : Bug reports for Ralph 5.15 (Message 1617)
Posted 14 May 2006 by Moderator9
Post:
I am seeing a very high error rate on Mac systems for the "MAPRELAX_TEST..." work unit type. The failure rate is at or near 100% for Macs. I suspect that there is another batch of bad work units for version 5.15 passing through the system at this time. If I am correct, the storm should pass very quickly, as these seem to error out before any significant processing is done. If you see this on your system, be assured that Rhiju has been notified, and I am certain he will take steps to minimize the problem.

All of you should remember that you are saving the larger Rosetta community from a lot of discomfort by helping in the Ralph test program. I know the project staff is very appreciative of your contributions. Thank you for your help.
14) Message boards : RALPH@home bug list : Bug reports for Ralph 5.14 (Message 1613)
Posted 13 May 2006 by Moderator9
Post:
OS : WindowsXP Professional x64 Edition
CPU : Intel PentiumD 920 (2.80GHz)
Used RAM : approx. 115MB x2 at max. / 1GB
Graphic card: nVidia GeForce6600GT 128MB
BOINC version : the newest, 5.4.9

Work tasks - OK before closed
Graphic - OK

They worked fine without error at first. However, once the BOINC client had been closed and restarted, the tasks which were more than half done started from the beginning. Is it an error, or due to my RALPH preferences?

If the work units start, and then you stop BOINC before about 25-40 min of processing, or in any case before the percent complete is more than 1.4%, when you restart BOINC they will start from zero.
15) Message boards : RALPH@home bug list : Bug reports for Ralph 5.14 (Message 1608)
Posted 13 May 2006 by Moderator9
Post:
Fetched a new WU, this time it started w/o error.
RMSD is missing, I assume that it should be like that, because the native graphic is missing, too...

This causes the RMSD/Lowest Energy graphic to vanish, only a single red spot at the left edge is displayed.
And the description text is a bit too long.
(display end at "has very close seque")

So nothing serious, everything else works fine, even the graphics!

Accepted Energy is now below -216 for the second model.
Seems like the stranding algorithm improvements work fine.

Virtual memory load is about 132 MB, no clue what it was before...


All of the CASP7 target Work Units will have this display type. All that you describe is normal (except the long text overrun). Since they do not know the structure, they do not have the RMSD value, the Natural structure, or any other comparative information so it cannot be displayed. Because the RMSD is unknown, this forces the value to zero and the red dots all display at what would be the zero point of the RMSD graph (to the left of the box). As close as they can get to the graphic we all are familiar with is to show the accepted and lowest energy shapes as they occur. Rhiju has said they will work on the text overrun.
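
The effect on the plot can be pictured with a small Python sketch. This is only an assumed model of the behavior described above, not the actual graphics code; `plot_point` and the sample energies are hypothetical.

```python
def plot_point(energy, rmsd=None):
    """Return the (x, y) position of a model on an RMSD vs. energy plot.

    When no native structure is known, rmsd is None and x falls back to 0.0,
    so every dot lands on the left edge of the box.
    """
    x = 0.0 if rmsd is None else rmsd
    return (x, energy)


# Work unit with a known native structure: dots spread along the RMSD axis.
known = [plot_point(-180.5, 4.2), plot_point(-201.0, 2.9)]

# CASP7 target: no RMSD available, so all dots share x = 0.0.
casp = [plot_point(-216.3), plot_point(-210.1)]
print(casp)  # [(0.0, -216.3), (0.0, -210.1)]
```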
16) Message boards : RALPH@home bug list : Bug reports for Ralph 5.14 (Message 1605)
Posted 13 May 2006 by Moderator9
Post:
Had the same error using BOINC V5.4.9, WU aborted immediately, reporting this error:

Unrecoverable error for result MAPRELAX_TEST_hom007_1fna__510_3_0 (Unzulässige Funktion. (0x1) - exit code 1 (0x1))

ALL of these groups of errors look like a bad batch of Work Units. I have over 20 on each of my machines as well. I will bring this to Rhiju's attention.

EDIT: Rhiju is "commuting" at the moment, but I am advised that, as I expected, this is a bad batch of Work Units. Rhiju says to let you know:
"A new batch has been queued up. These should pass through very quickly. Sorry for the inconvenience."
17) Message boards : Number crunching : 5.14 option? (Message 1604)
Posted 13 May 2006 by Moderator9
Post:
Debugging is optional... okay... how? Is it on, is it off? How do we test it?


See this post Item 2.
18) Message boards : Feedback : Upgrade server to support AMS (Message 1593)
Posted 11 May 2006 by Moderator9
Post:
Glad there is a vote of support, the AMS is going to be big, I only want this project to be up there with the rest,

Maybe upgrade this server?? Test it make sure it works then port it over to ROSETTA?

More info here for people who aren't sure what a AMS is

There are a number of server-based issues that will all require attention. What is required is a critical mass of things that are broken now and would be fixed by upgrading. Currently the list has only one item that is affecting the science in an adverse way. All the rest of the issues are of insufficient priority to disturb the stability of the project right at the start of CASP.

I have been collecting issues that are server based and providing them to the project as justification for upgrading, but we have not yet reached a "tipping" point. Part of the problem is that some of the things that are broken are not yet repaired in newer releases.

The good news is that it will happen. It just may not be for a few weeks yet.
19) Message boards : RALPH@home bug list : Bug reports for Ralph 5.13 (Message 1589)
Posted 11 May 2006 by Moderator9
Post:
Those three pics are taken within a few seconds and belong to the same model and same WU! And I noticed another problem: The pictures do not really show it, but the graphic sometimes displays a huge gap where the protein chain should obviously be connected!!! Don't know if this is a display bug only, or if it affects the correct setup of the chain!


In some cases the chains will appear to be broken because this is part of a new approach to building the model. In some of your images most of the model is not displayed at all and that is not correct, but if the entire amino acid chain is displayed but it is broken in a number of places that would be normal.
20) Message boards : RALPH@home bug list : Bug reports for Ralph 5.13 (Message 1588)
Posted 11 May 2006 by Moderator9
Post:
Hello again,
These WUs are stuck at 1.04% for 42 or 50 minutes, then jump to 100%. I do not believe that is normal!
TEST_HOMOLOG_ABRELAX_hom002_t283__507_30_0_0
TEST_HOMOLOG_ABRELAX_hom003_t283__507_30_0_0
TEST_HOMOLOG_ABRELAX_hom003_t283__507_27_0
TEST_HOMOLOG_ABRELAX_hom005_t283__507_43_0_0


This is normal behavior if you have your time setting at one hour. It is also possible, if you have your time setting at the default, that the project has set these test Work Units for a 1 hour turnaround time to speed up the testing results.

If either of the above is the case, the Work Unit will only process 1 model. It will only show 1.xx% until that model finishes, and then it will jump to 100% and report. So what you are seeing is normal.

See this thread for more information.





©2024 University of Washington
http://www.bakerlab.org