Message boards : RALPH@home bug list : Bug reports for Ralph 5.42 and 5.43
Author | Message |
---|---|
darknightcl Send message Joined: 21 Dec 06 Posts: 3 Credit: 36 RAC: 0 |
OK, so I've been running Rosetta for over a year now, and only recently started having problems with the project. Roughly 75% of the work units I download crash out with the funny graphics errors. One thing I've noticed that I haven't seen anyone else mention: sometimes the model displayed seems to "disconnect" at some point along the backbone. It will be a nice continuous C-alpha trace (I assume this is a C-alpha trace of the protein), then it will suddenly have a break in it, as if the protein had been cleaved by a peptidase. The two ends will sometimes wave around in a manner that is just not consistent with them still being connected. It almost looks like there is a problem with the science code, as if it were taking liberties with the sequence (such as introducing breaks where it is convenient to have them), which doesn't seem probable, as that would undermine the science being done. The other possibility is a problem in the communication between the graphics code and the science code, but I'm not a programmer and cannot make any suggestions in this regard. I'm running version 5.43 of the rosetta application, with CC 5.4.11 (the official release version). I have an AMD64 X2 4200+ with 1GB DDR 400 RAM, GeForce 6150 integrated graphics, the latest DirectX (9c), and Windows XP Pro SP2 (32-bit) with all the necessary patches. I'm not sure if this helps, but I figured I'd mention it. |
darknightcl Send message Joined: 21 Dec 06 Posts: 3 Credit: 36 RAC: 0 |
Another thought... One of the posts by Chu was mentioning that there is no locking mechanism in place to prevent the science thread and the graphics thread from trying to access the same memory at the same time, which can cause a problem if it occurs (or at least that is what I understand the post to mean). If it is the case that the current problem is caused by the graphics and science threads conflicting in this manner, wouldn't we have started seeing this problem a long time ago, like when graphics were first introduced? Why has the problem only started cropping up now? Just a thought... |
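For illustration only, here is a minimal Python sketch of the kind of locking the post above describes as missing. It is not Rosetta's code: `SharedStructure`, `run_demo`, and the list of integers standing in for atom coordinates are all hypothetical. The "science" thread rewrites every value while holding a lock, and the "graphics" thread copies the values under the same lock, so it should never see a half-updated structure.

```python
import threading


class SharedStructure:
    """Hypothetical stand-in for data shared by the science and graphics threads."""

    def __init__(self, n_atoms):
        self._lock = threading.Lock()
        self._coords = [0] * n_atoms

    def update(self, value):
        # Science thread: rewrite every entry while holding the lock,
        # so a reader can never observe a partially written structure.
        with self._lock:
            for i in range(len(self._coords)):
                self._coords[i] = value

    def snapshot(self):
        # Graphics thread: copy the data under the same lock, then draw
        # from the copy without blocking the science thread any longer.
        with self._lock:
            return list(self._coords)


def run_demo(n_atoms=1000, n_steps=200):
    shared = SharedStructure(n_atoms)
    torn_frames = []          # frames that mix values from two different steps
    done = threading.Event()

    def science():
        for step in range(1, n_steps + 1):
            shared.update(step)
        done.set()

    def graphics():
        while not done.is_set():
            frame = shared.snapshot()
            if len(set(frame)) != 1:
                torn_frames.append(frame)

    threads = [threading.Thread(target=science), threading.Thread(target=graphics)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.snapshot(), torn_frames
```

Calling `run_demo()` returns the final state and any inconsistent frames the reader saw; with the lock in place the second list should stay empty. Without the lock, whether a reader ever catches a half-written structure depends on timing, which fits the thread's impression that failures are intermittent and more frequent on dual-core and HT machines.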
FluffyChicken Send message Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0 |
darknightcl wrote: "Another thought... One of the posts by Chu was mentioning that there is no locking mechanism in place to prevent the science thread and the graphics thread from trying to access the same memory at the same time, which can cause a problem if it occurs (or at least that is what I understand the post to mean)." - The more common usage of dual-core processors today. - The increased level of complexity in the graphics (the sidechains); this part of the 'theory of graphics crashes' coincides quite happily with the release of the docking program and the increased graphics display. - More active people reporting things on the forum (due to more R@H members overall). |
feet1st Send message Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
darknightcl wrote: "...so it almost looks like there is a problem with the science code, like it is taking liberties (such as introducing breaks where it is convenient to have them) with the sequence, which doesn't seem probable, as that would undermine the science being done." Yes! They have a "jumping" algorithm that you will see used on some tasks. It does just that: it breaks the chain at points that may prove to be pivotal and then searches for the correct way to reconnect the two ends. See the discussion of Dr. Baker's journal and his original journal entry, which started the discussion. I don't understand it all, but basically it is not a symptom of a graphics problem; it is a visual cue that Rosetta's efficiency in finding the structure is improving. |
darknightcl Send message Joined: 21 Dec 06 Posts: 3 Credit: 36 RAC: 0 |
OK, but is it only dual-core processors that are having this problem? I have noticed a low rate of work unit failure on another computer, which has a single-core 64-bit AMD processor, but I'm never watching closely enough to see whether it is this same graphics failure or some other problem. (I'll admit the problems did seem to stop with 5.43, but I also don't have any work unit history for this computer; it has been concentrating almost exclusively on CPDN for the last week or so. I'll go see if I can crash work units on it later.) Also, previous versions of rosetta were stable on my Athlon X2, or at least the crash rate was low enough that I didn't notice it. I believe the last stable release was 5.37 or something like that. If memory serves, you could rotate and zoom on a molecule in 5.37 with no problems. Essentially, you've now reduced the graphics complexity to below that of 5.37, and my computer is still crashing almost all of its work units. Right now I have one which hasn't done anything for about 40 minutes, but the time counter continues to increment; it is as if the science code has stalled. I'll leave it to see if the watchdog kicks in. The important point is that I don't think (I'm not certain on this point) the graphics had been displayed at all. I'd noticed that the time remaining estimate was going up, not down, and decided to check on it. My point is that I didn't start noticing work unit failures until release 5.41, and these failures occur on computers other than my dual-core X2. In case anyone is curious, I have stress tested my computer using Prime95 on both cores (separately) with no problems. I've also tested my RAM using the most recent version of Memtest86, again with no problems. |
FluffyChicken Send message Joined: 17 Feb 06 Posts: 54 Credit: 710 RAC: 0 |
See option number 2: it started (or was noticed a lot more) when the docking code came into it. The part about dual-core/HT is just that those machines are more susceptible to the desynchronisation happening. I have also had a rare few fail on my P-M and Athlon64, without graphics open, but it is nothing like what HT/dual-core people who play with the graphics are reporting. If they were really smart about it (they being Rosetta@home), they would put a tick box in the preferences to say 'I do not want graphics', and then they could send that person a version with all the graphics ripped out of it. This often speeds up processing a touch (it does slightly at SETI) and decreases the size of the program along with the running memory requirements. Personally, I would love that option. |
Trog Dog Send message Joined: 8 Aug 06 Posts: 38 Credit: 41,996 RAC: 0 |
Problem wu here |
Silver Streak Send message Joined: 11 Dec 06 Posts: 5 Credit: 216,369 RAC: 0 |
I had over 40 WU's err out in the last 2 hrs! |
Pieface Send message Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
Likewise, tons of them, but at least they are going quickly, like in 90 secs or so. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
All seem to fail with this error message " - exit code -1073741819 (0xc0000005) " Anders n |
Papagiorgio Send message Joined: 2 Nov 06 Posts: 3 Credit: 26,100 RAC: 0 |
|
Pepo Send message Joined: 8 Sep 06 Posts: 104 Credit: 36,890 RAC: 0 |
The same on my hosts, like Windows here: exit code -1073741819 (0xc0000005), Reason: Access Violation (0xc0000005) at address 0x0066C28D read attempt to address 0x0405FF98 (with full BOINC Windows Runtime Debugger symbolic output), or Linux here: Maximum disk usage exceeded, segmentation violation, with numeric Stack trace (12 frames). Peter |
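The two numbers quoted in these reports are the same value written two ways: -1073741819 is 0xc0000005 read as a signed 32-bit integer, and 0xc0000005 is the Windows status code for an access violation, as the debugger output above says. A small Python sketch of the conversion (the function names are made up for this example):

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as its unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def to_signed32(status):
    """Reinterpret an unsigned 32-bit status code as a signed 32-bit integer."""
    return status - 0x100000000 if status & 0x80000000 else status


if __name__ == "__main__":
    print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
    print(to_signed32(0xC0000005))          # -1073741819
```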
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
I just posted it here. Sorry for the delay. Chu, |
Chu Volunteer moderator Project developer Project scientist Send message Joined: 26 Sep 06 Posts: 61 Credit: 12,545 RAC: 0 |
A bad batch, I think, maybe with bad memory management... The same on my hosts, like |
darkpella Send message Joined: 30 Mar 06 Posts: 4 Credit: 15,691 RAC: 0 |
Hi, when ralph is suspended by the BOINC core client to let another task run: 11/01/2007 20.38.31|ralph@home|Pausing task 1mkyA_TREEJUMP_ABRELAX__NEWRELAXFLAGS_LARS_TOP2_BARCODE__1607_25_0 (removed from memory) it is not preempted and does not stop (i.e. it still runs at full power), absorbing lots of CPU cycles from the task that should be running. I tried forcing the BOINC core client to make ralph run (suspending every other task) and then letting it switch again to another task (simply resuming all other tasks, since it switched to EDF), but that didn't stop "rosetta_beta_5." from crunching at about 70% of the CPU time. I also tried suspending ralph as a project, but that didn't work either. I will try rebooting in a while to see what happens and let you know. Should you need any information before I reboot, let me know ASAP. I'm running Win 2000 SP4 on a PIV at 2.53 GHz. The BOINC version is 5.4.11; the rosetta_beta version running is 5.43. Bye, darkpella |
Conan Send message Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
Had 21 WUs fail for various reasons; none should be screensaver related, as I no longer run it. - Maximum disk usage exceeded, WU stuck Incorrect fragment size requested for Phi alignment: https://ralph.bakerlab.org/workunit.php?wu=338885 - Maximum disk usage exceeded, WU stuck, SIGSEGV:Segmentation Violation: https://ralph.bakerlab.org/workunit.php?wu=384950 - Exited with code 1 ERROR:Exit at:loop_relax.cc line:1798: https://ralph.bakerlab.org/workunit.php?wu=382149 https://ralph.bakerlab.org/workunit.php?wu=382150 - Exited with code 1 Incorrect Function, ERROR:Exit at:.read_aa_ss.cc line:559: https://ralph.bakerlab.org/workunit.php?wu=382209 - Exit Code -1073741819 Access Violation: https://ralph.bakerlab.org/workunit.php?wu=382799, 382800, 382872, 382875, 383015, 383016, 383146, 383148, 383298, 383352, 383405, 383406, 383459, 383460, 384728. Computers are Opteron 275 (Linux), Opteron 285 (Linux) and 4800+ (Windows). Only the Windows machine has a screensaver running, but it is not the BOINC screensaver, so this does not appear to be related to the graphics problem. A faulty batch? Testing what exactly? |
[B^S] Dr. Bill Skiba Send message Joined: 15 Feb 06 Posts: 4 Credit: 6,496 RAC: 0 |
https://ralph.bakerlab.org/result.php?resultid=382713 A new one, at least for me. Work unit would not preempt. Kept on running even though it said it was preempted and boinc manager said another wu from another project was running. Ralph unit kept counting up both on cpu time and time to finish while saying it was preempted. BM said a uFluids wu was running, but cpu time stayed at zero and task manager showed Ralph using the cycles. I aborted it. |
anders n Send message Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
Errors to report https://ralph.bakerlab.org/result.php?resultid=387680 https://ralph.bakerlab.org/result.php?resultid=387679 ERROR:: Exit at: .rotamer_functions.cc line:1441 And https://ralph.bakerlab.org/result.php?resultid=387637 <file_name>H4H6_1lis_PAIRWISE_DOCK_MCM_1619_4_0_0</file_name> <error_code>-161</error_code> Anders n |
Silver Streak Send message Joined: 11 Dec 06 Posts: 5 Credit: 216,369 RAC: 0 |
I had a few of these also; they occurred overnight. They seem to have run a normal length of time before the error occurred. </stderr_txt> <message> <file_xfer_error> <file_name>H1H7_1lis_PAIRWISE_DOCK_MCM_1619_1_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> |
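When comparing many failed results, it can help to pull the file name and error code out of a `<message>` fragment like the one above programmatically. A minimal sketch, assuming the fragment has already been copied out of the result page as well-formed XML; the helper name `file_xfer_errors` is made up for this example and is not part of BOINC:

```python
import xml.etree.ElementTree as ET

# The <message> fragment from the report above, copied as-is.
FRAGMENT = """<message>
<file_xfer_error>
  <file_name>H1H7_1lis_PAIRWISE_DOCK_MCM_1619_1_0_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>"""


def file_xfer_errors(xml_text):
    """Return a list of (file_name, error_code) pairs found in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        (err.findtext("file_name"), int(err.findtext("error_code")))
        for err in root.iter("file_xfer_error")
    ]


if __name__ == "__main__":
    for name, code in file_xfer_errors(FRAGMENT):
        print(name, code)
```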
Rhiju Volunteer moderator Project developer Project scientist Send message Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Great, I think I know what the problem is, and I'm sending them back out with the potential fix. Silver Streak wrote: "I had a few of these also; they occurred overnight. They seem to have run a normal length of time before the error occurred." |
©2024 University of Washington
http://www.bakerlab.org