21)
Message boards :
RALPH@home bug list :
Bug Reports for 5.44
(Message 2687)
Posted 21 Jan 2007 by Chu Post: This update adds some new Rosetta applications, such as a preliminary version of the Rosetta protein design protocol and a special Rosetta docking protocol that handles symmetric oligomers. The primary developers of those protocols will post more details about their applications. Please note that we are still working on adding thread synchronization features to the Rosetta graphics, and we are sorry that this update DOES NOT fix the graphics-related problem.
22)
Bug reports for Ralph 5.42 and 5.43
(Message 2671)
Posted 11 Jan 2007 by Chu Post: A bad batch, I think, maybe with bad memory management... The same on my hosts, like
23)
Bug reports for Ralph 5.42 and 5.43
(Message 2670)
Posted 11 Jan 2007 by Chu Post: I just posted it here. Sorry for the delay. Chu,
24)
Bug reports for Ralph 5.42 and 5.43
(Message 2646)
Posted 19 Dec 2006 by Chu Post: We suspect it is a thread synchronization problem. Basically, the Rosetta working thread runs the simulation, which changes all the atom coordinates (which are saved in shared memory), while the graphics thread tries to read data from the same place to draw the graphics or screensaver. Currently there is no locking mechanism to ensure that the shared memory is accessed by only one thread at a time, and this could cause conflicts or memory corruption and then trigger an error. On one of our local computers, with the screensaver or graphics turned on, errors occurred at a rate of at least one per day on average, while without any graphics it ran flawlessly. The errors observed include crashing (0xc0000005), hanging (0x40010004) and being stuck (ended by the watchdog). None of the errors were reproducible with the same random number seeds, and we think that is due to the randomness in the graphics process. Further supporting evidence is that showing sidechains requires accessing shared memory more often and more intensively, and after turning off sidechains and rotation the graphics error rate drops, although the problem is not solved completely. There seems to be a correlation between the two. Anyway, our plan is to add a thread locking mechanism in the next release to see if this helps. This will probably happen after the holiday season. I believe the new BOINC 5.8.x should also help to reduce the error rate. Thanks to everyone for helping test this issue.
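The locking idea described in this post can be illustrated with a minimal Python sketch. It is not Rosetta's code: the class `SharedCoords`, its methods, and `run_demo` are hypothetical names, and a flat list of numbers stands in for the shared atom coordinates. The worker overwrites every value in place while a second thread copies the buffer; because both take the same lock, a copied frame never mixes values from two different steps.

```python
import threading

class SharedCoords:
    """Hypothetical stand-in for the shared-memory coordinate buffer."""

    def __init__(self, n_atoms):
        self._lock = threading.Lock()
        self._coords = [0.0] * n_atoms

    def write_step(self, value):
        # Worker thread: update every entry in place while holding the lock.
        with self._lock:
            for i in range(len(self._coords)):
                self._coords[i] = value

    def snapshot(self):
        # Graphics thread: copy the buffer while holding the same lock.
        with self._lock:
            return list(self._coords)

def run_demo(n_atoms=200, n_steps=300):
    """Return the number of 'torn' frames (frames mixing two steps) seen."""
    shared = SharedCoords(n_atoms)
    torn = []

    def worker():
        for step in range(1, n_steps + 1):
            shared.write_step(float(step))

    def graphics():
        for _ in range(n_steps):
            frame = shared.snapshot()
            if len(set(frame)) != 1:
                torn.append(frame)

    threads = [threading.Thread(target=worker),
               threading.Thread(target=graphics)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(torn)
```

Without the `with self._lock:` lines, the reader could in principle copy the list halfway through an update, which is the kind of conflict the post suspects.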
25)
Bug reports for Ralph 5.42 and 5.43
(Message 2630)
Posted 15 Dec 2006 by Chu Post: Hi gene, that job just crashed and did not freeze your computer, right? From users' reports and my local tests, it looks like a frozen WU that is forcibly terminated reports exit code 1073807364 (0x40010004), while a WU that crashes on its own without freezing the host computer reports exit code -1073741819 (0xc0000005). I had a WU fail today, this message was in the log:
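The decimal and hexadecimal forms quoted in this post are the same 32-bit value read as signed or unsigned. A small Python sketch of the conversion follows; the function names and the short descriptions in `KNOWN` are illustrative assumptions that only restate what the post says about each code.

```python
def to_hex32(code):
    """Format a (possibly negative) exit code as an unsigned 32-bit hex string."""
    return format(code & 0xFFFFFFFF, "#010x")

# Descriptions paraphrase the post; they are not taken from any BOINC table.
KNOWN = {
    0xC0000005: "WU crashed on its own",
    0x40010004: "frozen WU was forcibly terminated",
}

def describe(code):
    """Look up an exit code in either its signed or unsigned form."""
    return KNOWN.get(code & 0xFFFFFFFF, "unknown")

print(to_hex32(-1073741819), describe(-1073741819))
print(to_hex32(1073807364), describe(1073807364))
```

Masking with `0xFFFFFFFF` is what turns the negative number into the familiar `0xc0000005` form.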
26)
Bug reports for Ralph 5.42 and 5.43
(Message 2629)
Posted 15 Dec 2006 by Chu Post: Sidechains, zooming and rotating have been disabled in the current application to help us narrow down the cause of the graphics crash. So it is normal that you cannot do anything on the screen, and since Rosetta spends most of its time in high-resolution refinement (moving the backbone a little and refining sidechains), it is also normal to hardly observe any changes on the screen. However, the step number and CPU time should change frequently to reflect that the WU is still alive. If the WU is still working on generating its first model, the progress stays at 1% for a while. Not sure about the slow graphics updating. Do you have other Windows applications running at the same time that also share CPU, memory and other resources? I've got a live one!
27)
Bug reports for Ralph 5.42 and 5.43
(Message 2617)
Posted 14 Dec 2006 by Chu Post: Did it freeze (so that you had to manually kill it) or just crash on its own? Thanks. We are trying to increase stability in this release... We have turned off mouse rotation and sidechains temporarily. Please let us know if you can force a crash by playing with the "show graphics" option from the boinc manager, or with your screensaver!
28)
Bug reports for Ralph 5.42 and 5.43
(Message 2607)
Posted 13 Dec 2006 by Chu Post: I just put some docking WUs on ralph for a graphics stability test; let's see what comes out of that. Well, I'm now back to my problem PC, and was about to blow the train whistle --Whoo hooo!-- when I saw my screensaver MOVING, and it was on model 94. But then I noticed it only crunched the first WU for almost exactly 3hrs. Now that I've updated to project it shows watchdog ended it. And I commonly saw that same symptom on Rosetta once I activated the screensaver on the same host.
29)
Bug Report for Ralph 5.41
(Message 2574)
Posted 3 Dec 2006 by Chu Post: I think those errors (exit 161) are from a batch of problematic WUs, not general for the application as we have seen before. Old error 161 is still with us
30)
Bug Report for Ralph 5.41
(Message 2569)
Posted 1 Dec 2006 by Chu Post: Hi Krzychu P., thanks for reporting all these errors. If I remember correctly, you have also reported similar errors for previous versions of the ralph applications and for other types of WUs. From the stderr output, it looks like the Rosetta simulations entered some bad conditions that triggered premature exits. Although we are not 100% sure what went wrong, it is most likely due to a corrupted database file. The WUs you reported problems with seem to run fine on other clients' computers (on both ralph and boinc) with a fairly good success rate. In other words, this type of error seems to happen on your computer much more frequently than average, which leads me to suspect that your computer may have an issue handling those input database files. I am not an expert on computer hardware or boinc setup; I just want to bring this to your attention. Do you also use this computer to run Rosetta@Home? Have you seen similar errors for the WUs from Rosetta@Home? Have you noticed any other signs of a potential hardware or software problem? From the stderr file, I can see your computer is running a non-English version of the operating system. Could that be the reason that some of the files are not read correctly? Maybe some other experts here have a better idea. Again, thank you for your contribution and support for our project. After about 40 minutes of computing:
31)
Bug Report for Ralph 5.41
(Message 2566)
Posted 30 Nov 2006 by Chu Post: The command line file was added for the project team. To test a lot of Rosetta parameters without changing the executable, we made them input arguments on the command line. One consequence is that the Rosetta command line becomes longer and longer, difficult to remember and difficult to set up (so more errors can slip through). The file is meant to help with that. In my personal opinion, this is a positive step, though there is still a long way to go, toward a friendlier control interface for Rosetta, such as a graphical interface with pull-down menus in the future. Sorry for not being clearer on the watchdog issue. It did stop WUs when a run was found to be stuck or running too long, and it did preserve the models that had been completed. However, the old behavior would throw an error if no model had been generated before the run was caught by the watchdog; with the fix, this should no longer happen. The empty result file (not really empty, as it says it is from a watchdog error) will be returned and recognized by the validator, and credit will be assigned. Regarding the new feature to run Rosetta from a command file and a more flexible interface for setting up runs on BOINC...
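The idea of moving a long command line into a file can be sketched as follows. The file format here (one or more flags per line, `#` comments) and the flag names `-flag_a` and `-flag_b` are assumptions made for illustration; the post does not describe the actual format Rosetta reads.

```python
import shlex

def parse_command_file(text):
    """Turn the contents of a hypothetical command file into an argument list."""
    args = []
    for line in text.splitlines():
        # shlex handles quoting; comments=True drops anything after '#'.
        args.extend(shlex.split(line, comments=True))
    return args

example = """
# hypothetical flags, for illustration only
-flag_a 10      # a numeric parameter
-flag_b
-label "two words"
"""
print(parse_command_file(example))
```

The resulting list can be handed to the same argument parser that would otherwise read the command line, so the executable itself does not need to change.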
32)
Bug Report for Ralph 5.41
(Message 2560)
Posted 30 Nov 2006 by Chu Post: Ralph has been updated to 5.41. In this update, several previously found bugs were fixed: 1. checkpointing continued even after Rosetta had finished. 2. an "error" when deleting some intermediate files after they are gzipped. 3. watchdog failure -- when a run is stuck and caught by the watchdog, the results, if there are any, will be returned and validated, and credit will be assigned accordingly. 4. some other bugs related to Rosetta science. A new feature, reading the Rosetta command from an input file, has been added; this gives a more flexible interface for setting up runs on BOINC. Thanks for everyone's support and please report bugs here!
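The watchdog behavior in item 3 can be sketched in Python under stated assumptions: the class and function names, the time limits, and the `# ended by watchdog` marker line are all hypothetical, and times are passed in explicitly so the logic is easy to follow. The point is only that a stopped run still returns whatever models were completed, plus a marker, instead of raising an error.

```python
class Watchdog:
    """Hypothetical watchdog: stops a run that stalls or runs too long."""

    def __init__(self, start, stall_limit, max_runtime):
        self.start = start
        self.last_progress = start
        self.stall_limit = stall_limit
        self.max_runtime = max_runtime

    def note_progress(self, now):
        # Called by the simulation whenever it advances.
        self.last_progress = now

    def should_stop(self, now):
        stalled = now - self.last_progress > self.stall_limit
        too_long = now - self.start > self.max_runtime
        return stalled or too_long

def build_result(models, stopped_by_watchdog):
    """Return the result lines; never empty when the watchdog ended the run."""
    lines = list(models)
    if stopped_by_watchdog:
        lines.append("# ended by watchdog")  # hypothetical marker text
    return lines
```

With this shape, a run that produced no model before being stopped still yields a one-line result that a validator could recognize.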
33)
Bug reports for Ralph 5.37 through 5.40
(Message 2554)
Posted 23 Nov 2006 by Chu Post: Thanks. That WU is problematic. One more:
34)
Bug reports for Ralph 5.37 through 5.40
(Message 2553)
Posted 23 Nov 2006 by Chu Post: That has been fixed in the current Rosetta code repository and will be included in the next ralph release. Pepo wrote: Also the version 5.36 is checkpointing after reaching 100% (instead of reporting the result) and then being preempted by other apps afterwards (possibly for a longer time, because of negative STD).
35)
Bug reports for Ralph 5.37 through 5.40
(Message 2539)
Posted 16 Nov 2006 by Chu Post: We finally found time to track down this harmless but confusing "warning" output. It is caused by a file stream not being closed properly after it is opened. The fix will be included in the next update. FRA_2rio_RIO2_hom002_6_2rio_6_1a06__IGNORE_THE_REST_10_1499_12_0
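As a generic illustration of the kind of fix described here (not the actual Rosetta change, which is not shown in the post), the Python sketch below opens a file inside a `with` block so the stream is always closed, even if reading fails. The function name `read_lines` is hypothetical.

```python
def read_lines(path):
    """Read a text file and report whether the stream was closed afterwards."""
    with open(path) as stream:
        lines = stream.read().splitlines()
    # Leaving the 'with' block closes the stream automatically.
    return lines, stream.closed
```

Tying the close to the end of a scope, rather than to an explicit call that can be forgotten, is the usual way to avoid this class of bug.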
36)
Bug reports for Ralph 5.37 through 5.40
(Message 2503)
Posted 8 Nov 2006 by Chu Post: Thank you all for the help. We are sorry that the file transfer bug was not completely fixed in 5.39 and that we have had several updates in the last several days. The bug is sneaky: it is only triggered by a special combination of command line flags, and only for some protein targets under certain conditions, which makes local debugging difficult. Anyway, we believe it is completely fixed in 5.40, as shown by our preliminary local tests. We will put more tests on RALPH soon to confirm the fix. Thanks again for the patience and the generous support!
37)
Bug reports for Ralph 5.37 through 5.40
(Message 2493)
Posted 7 Nov 2006 by Chu Post: Sure, I am happy to answer that. A normal output file for a Rosetta model contains, for each atom, the xyz position coordinates and some other necessary information. This file is just too large for a BOINC application, as there will be thousands and thousands of such files to handle. Therefore, Rosetta uses a clever trick to compress the output, which is called "silent_output". In this mode, only the variables (or degrees of freedom) in the simulation are output, and these are normally the backbone and sidechain torsion angles (phi, psi and chi). In this way, the size of the output file can be reduced at least 30-fold. However, this requires us to reconstruct the normal Rosetta output model files (with xyz positions in them) from these silent output files, and doing so relies on a critical assumption: a covalent bond connecting any two atoms has an "ideal" value for its length. Similarly, the angle formed by any three connected atoms has its own ideal value too. So by taking these ideal values together with the phi/psi/chi angles, we are able to restore the positions of all the atoms in the protein model, and we often refer to a structure with "ideal" bond lengths and angles as an "idealized structure". For ab initio prediction, the output model is always "idealized", as it is folded with "ideal" geometry and an optimal set of phi/psi/chi angles. However, there are also some other important tests which require starting from experimentally solved protein structures (native structures). Normally the bond geometries in these structures have slightly different values from the ideal ones (the ideal values are computed as an average over a large distribution of these values from experimental structures). So in order to run these tests on BOINC, we need to add new functions that allow us to reconstruct protein models from non-idealized bond geometries and phi/psi/chi angles.
On the client side, almost nothing has changed except that the silent output file has one number for each residue which indicates whether it has ideal or non-ideal bonds. The file size increase with this new feature is trivial, but it opens the door for us to do large-scale tests on experimentally solved structures, to understand better what the features of these structures are and how we can make Rosetta models more like those native structures. Hopefully this answers your question. May I ask? I hope a good description can be written for when 5.38 comes to Rosetta... WHY is "...outputting structures with non-ideal backbone and sidechain geometries" an improvement? I know, useful to the science... please explain more, on the surface it sounds to a layperson like a step backwards. Also, what impact will this have on the user experience? Will it mean we'll see larger upload sizes on results?
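The reconstruction step described in this post, going from a bond length, a bond angle and a torsion angle back to xyz coordinates, can be sketched with a generic internal-to-Cartesian placement routine. This is a textbook construction written in Python, not Rosetta's implementation; the function names are hypothetical, and the bond length 1.5 and angle 110 degrees used below are illustrative numbers, not Rosetta's ideal values.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Place atom d so that |c-d| = bond, angle b-c-d = angle_deg,
    and torsion a-b-c-d = torsion_deg."""
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = unit(sub(c, b))                 # axis along the b -> c bond
    n = unit(cross(sub(b, a), bc))       # normal to the a-b-c plane
    m = cross(n, bc)                     # completes the local frame
    dx = -bond * math.cos(theta)
    dy = bond * math.sin(theta) * math.cos(phi)
    dz = bond * math.sin(theta) * math.sin(phi)
    return tuple(c[i] + dx * bc[i] + dy * m[i] + dz * n[i] for i in range(3))

def dihedral_deg(a, b, c, d):
    """Measure the torsion a-b-c-d in degrees (used here as a check)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Three already-placed atoms, then a fourth from (length, angle, torsion).
a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d = place_atom(a, b, c, bond=1.5, angle_deg=110.0, torsion_deg=-60.0)
```

If every bond length and angle is a fixed ideal value, only the torsion needs to be stored per placement, which is why the silent format is so compact; supporting non-ideal geometry means `bond` and `angle_deg` must also be available for the residues flagged as non-ideal.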
38)
Bug reports for Ralph 5.37 through 5.40
(Message 2478)
Posted 6 Nov 2006 by Chu Post: Hi feet1st. That is really helpful. We have seen that error message (in the debugging output) quite a few times but had no luck finding a clue to its cause. Thanks to your report, we at least know it is somehow related to the graphics, and I think that will help us a lot in investigating the real cause. V5.38 WU 315376 just crashed on my other machine. I just HAPPENED to be enlarging the native structure shown at the time of failure, so that was the first I'd brought up the graphic for this WU, rotated the lowest energy, then enlarged native and then crash and burn.
39)
Bug reports for Ralph 5.37 through 5.40
(Message 2476)
Posted 6 Nov 2006 by Chu Post: Thank you all for the help. We have already noticed that there are a lot of failures with error code -161 for the newly updated application 5.38. We are investigating the cause for it now...
40)
Bug reports for Ralph 5.37 through 5.40
(Message 2452)
Posted 4 Nov 2006 by Chu Post: If all goes well with this update, we'll probably update the main application on Monday. Look for some interesting new workunits with multiple copies of a protein -- these are an attempt to simulate fibril formation.
©2024 University of Washington
http://www.bakerlab.org