Message boards : RALPH@home bug list : Bug reports for 5.55
Author | Message |
---|---|
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Ralph 5.55 -- there's quite a bit of new stuff packed into this update. We'll be paying careful attention to the timer (reports of "percentage complete") as well as a new mode that folds and docks at the same time. |
ashriel Joined: 3 Mar 07 Posts: 11 Credit: 648 RAC: 0 |
|
[B^S] thierry@home Joined: 15 Feb 06 Posts: 20 Credit: 17,624 RAC: 0 |
Hi, I just got a 5.55 WU: 1l2x__BOINC_INCREASE_CYCLES10_RNA_ABINITIO-1l2x_-_1868_11_0. It started crunching with Progress already at 100%, but it keeps crunching. The screensaver looks normal, except that the percentage is displayed as a 1 followed by 00000000.... across the entire screen. |
UBT - Mikeejones Joined: 22 Mar 06 Posts: 2 Credit: 3,174 RAC: 0 |
I don't mess about if a WU says 100% complete and CPU time increases. Sorry but as soon as I saw that I aborted both WUs - been caught by this sort of thing before and wasted a lot of cycles! It may have carried on to completion but I wasn't going to try to find out just in case! https://ralph.bakerlab.org/workunit.php?wuid=416831 https://ralph.bakerlab.org/workunit.php?wuid=416907 refers |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
Another usability issue, which may be simple to improve, is at step 340,000, which apparently is a magic number in the processing. This is where you clear out the histogram of energy and RMSD. It then "hangs" for 15 seconds or so (more like a minute, I suppose, on a slower machine) and then takes another 10 seconds or so to do the first step or two after that. Any program that suddenly has portions of the screen blank out, and then shows no activity (unless of course you notice the CPU seconds counting up) for more than the attention span of the caffeine-loaded viewer, is immediately diagnosed as being "hung" and requiring manual intervention... (as if the 5 seconds you've waited already weren't enough for the program to trash your computer if it was going to). ...anyway, if you could just NOT blank out those graphs until you complete the initialization or whatever is happening there as step 340,000 chugs along, then it would be a sizable smidge less alarming in appearance. It would be even better if you could insert a few more "steps" into that long processing of step 340,000. |
Bober [B@P] Joined: 18 Jun 06 Posts: 6 Credit: 15,427 RAC: 0 |
Hello, I've got the same problem, but I'm not aborting them yet. |
idahofisherman Joined: 7 Nov 06 Posts: 1 Credit: 9,435 RAC: 0 |
I am having the same thing happen. I will let them run for a couple of hours and then abort them if they have not completed. Hopefully this will not be a waste of CPU time, just a simple programming error. Please post a message when this is fixed, as I have stopped this project from receiving any more tasks. |
Bober [B@P] Joined: 18 Jun 06 Posts: 6 Credit: 15,427 RAC: 0 |
My 5.55 WU has just finished...no error...points granted - I think there is no need to abort them. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Wow, good eye! I haven't seen that hang, but your explanation makes sense. Let me see what we can do. On the other issues -- my Mac screensaver says that the percentage complete is "inf%". This sounds like the issues reported below, too, with large percentage complete values. Dang! I haven't been able to reproduce the Mac issue (process not found) noted on the R@H message boards yet. But I'm hoping to find a fix for the next update. Finally, one of our old style protein WUs is consistently failing, so I need to ask the other developer about that. Weird. Thanks for all the posts so far! This kind of quick feedback helps tremendously! (Quoted: "Another usability issue, which may be simple to improve is at step 340,000, which apparently is a magic number in the processing. This is where you clear out the histogram of energy and RMSD. It then "hangs" for 15 seconds or so, (more like a minute I suppose on a slower machine) and then takes another 10 seconds or so to do the first step or two after that.") |
ashriel Joined: 3 Mar 07 Posts: 11 Credit: 648 RAC: 0 |
The WU mentioned above finished normally. CPU time: 3,347.68 sec; claimed credit: 9.89; granted credit: 7.60 |
Pieface Joined: 16 Feb 06 Posts: 64 Credit: 203,513 RAC: 0 |
This one errored out on 5.55: Resid 472582 1wrpA_BOINC_SYMM_FOLD_AND_DOCK-1wrpA-truncate__1873_21_1 ERROR:: Exit at: .fold_tree.cc line:809 |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Yup, looking at it. Hopefully will be fixed in the next update (tonight or tomorrow). (Quoted: "This one errored out on 5.55:") |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoted: "I haven't been able to reproduce the Mac issue (process not found) noted on the R@H message boards yet. But I'm hoping to find a fix for the next update") How about the other Mac issue, where Ralph/Rosetta hangs after being preempted and then resumed? I just checked my Mac and 1 WU on each project was hanging. Anders n |
feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0 |
This task is Rosetta, but was wondering, I've got 24hr run time preference... this bad boy has been crunching for 14hrs and isn't complete with model 3 yet. The % complete shows 42.1%. Still seems to be crunching just fine, but was wondering, does this mean it's only taken 1 checkpoint during this third model? Or, is there any way from the graphic to tell when a checkpoint has been actually taken? It's on step 395,000, so it must have been crunching for several hours. |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Hi feet1st -- sorry that workunit is taking a while. You're right that the WU isn't checkpointing until the end of the model, and that could cause a problem for some users that preempt often. We're working on a general checkpointing scheme for all modes, but it won't be ready for another week or two... (Quoted: "This task is Rosetta, but was wondering, I've got 24hr run time preference... this bad boy has been crunching for 14hrs and isn't complete with model 3 yet. The % complete shows 42.1%.") |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
Anders n, actually, wait, when did this start happening for you? Is there a discussion thread on this? (Quoted: "I haven't been able to reproduce the Mac issue (process not found) noted on the R@H message boards yet. But I'm hoping to find a fix for the next update") |
anders n Joined: 16 Feb 06 Posts: 166 Credit: 131,419 RAC: 0 |
(Quoted: "Anders n, actually, wait, when did this start happening for you? Is there a discussion thread on this?") See Bug reports 5.52-5.54. It started 18/3. Anders n |
Rhiju Volunteer moderator Project developer Project scientist Joined: 14 Feb 06 Posts: 161 Credit: 3,725 RAC: 0 |
I see ... actually I thought this was a graphics bug, and thought it might be fixed in the latest update, but that's not the case. I wonder if I can reproduce it on my machine, switching between ralph and some other app. (Quoted: "Anders n, actually, wait, when did this start happening for you? Is there a discussion thread on this?") |
genes Joined: 16 Feb 06 Posts: 45 Credit: 43,706 RAC: 20 |
Had these errors overnight on machines at work, so I didn't see what they did: resultid=471512 resultid=472465 One's a -161, other's an "incorrect function". I've got one running here right now that has the 100000000000000000000.... problem, resultid=471927, but it looks like it otherwise is operating normally, so I'll let it finish. |
Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0 |
Had this WU fail with MAXIMUM DISK SPACE EXCEEDED. I have many gigabytes, so this should not be the problem: https://ralph.bakerlab.org/result.php?resultid=471223 Also had these two fail with the old ERROR -161: https://ralph.bakerlab.org/result.php?resultid=471479 https://ralph.bakerlab.org/result.php?resultid=471480 I currently have one running that may be a 5.55 or a 5.56, not sure, but it has jumped straight to 100% as some others have reported, with the time to complete still going up but only 1 hour 40 minutes done on a 6 hour preference. Windows machine. Strangely, I have two others that have switched and are 'Waiting to run', but the time to completion is still ticking over and the percentage done is still moving up, yet the CPU time is not moving. I have a dual-CPU dual-core machine, so 4 cores are running and they are all accounted for, so why is BOINC saying I have 6 cores doing something? Very strange. Linux machine. |
©2024 University of Washington
http://www.bakerlab.org