| Author | Message |
|
|
|
All right, while things over on BOINC are looking marvelous, here's another update on RALPH.
1.55 has three new things:
a) Fixes for validator rejections when the watchdog kicks in and when it reports "too many restarts with no progress"
b) A very detailed debug information header, which will hopefully help trace the problem in the options system
c) ANOTHER bug fix in the BOINC API, this time in the user preferences. This bug led directly to the phenomenon that Brotherbard managed to point out by running his app in GDB. Awesome! Read about it here
Ramostol, this is relevant for you too; I think that's the same bug.
You two, could you set your settings back to restrict to specific days and see if it works now? It did here :)
The lock file issue is the last remaining one that we don't have even the faintest handle on; apparently it has to do with setting the client to not allocate 100% of CPU.
Anyway, please post reports here. |
|
|
|
|
|
Hmm, I think the graphics don't work with this one. Not to worry. |
|
|
|
|
|
In fact in this WU
http://ralph.bakerlab.org/result.php?resultid=1278720
the graphics don't work.
|
|
|
|
|
In fact in this WU
http://ralph.bakerlab.org/result.php?resultid=1278720
the graphics don't work.
Yes, I noticed as soon as I fired up my client. It's OK, I'll track this down in the next version. I think it's just to do with the fact that I only updated the app and not the graphics_app.
|
|
|
|
|
|
Wow! Can confirm that my W2008 X64 server works ok for version 1.55.
Good to know it before preparing my new blade with 20 cores.
|
|
|
|
|
|
I have 4 1.55 workunits that all start great even with weekday time limits set. Still waiting for them to finish.
Looking at the graphics app it looks like it has the same error as the main app. Here is the gdb output:
Initializing options.... ok
Loaded options.... ok
Processed options.... ok
core.init: command: /Library/Application Support/BOINC Data/projects/ralph.bakerlab.org/minirosetta_graphics_1.54_i686-apple-darwin
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-1656248255 seed_offset=0 real_seed=-1656248255
Initializing random generators... ok
core.init.random: RandomGenerator:init: Normal mode, seed=-1656248255 RG_type=mt19937
Initialization complete.
Opened semaphore
Breakpoint 1, 0x9603b4a9 in malloc_error_break ()
(gdb) bt
#0 0x9603b4a9 in malloc_error_break ()
#1 0x96036497 in szone_error ()
#2 0x95f60463 in szone_free ()
#3 0x95f602cd in free ()
#4 0x000a7cb6 in WEEK_PREFS::~WEEK_PREFS ()
#5 0x007d24a9 in GLOBAL_PREFS::~GLOBAL_PREFS ()
#6 0x001ad5a8 in get_shmem_name ()
#7 0x001ad634 in boinc_graphics_get_shmem ()
#8 0x00085dd8 in protocols::boinc::Boinc::attach_shared_memory ()
#9 0x000074b9 in app_graphics_init ()
#10 0x0000ca75 in boinc_graphics_loop ()
#11 0x000087f6 in main ()
And stderrgfx.txt has a "Non-aligned pointer being freed (2)" error for each of the weekday settings, just like the science app did.
The graphics start up fine if the weekday prefs are not set when BOINC first starts up. But if the prefs were set when I started BOINC, then the graphics app still won't start, even if I reset the preferences.
--Nathan
|
|
|
|
|
|
Yeah, the graphics do work after all. It's just the settings thing, as you rightly point out, Nathan. Thanks so much, by the way, for identifying this bug. It's a major bugfix in the BOINC API, and I would never have found it (or even suspected it) if I hadn't seen your trace!
OK, so the graphics app needs to be recompiled too. That's no problem. :)
|
|
|
|
|
|
The 4 workunits have run to completion successfully. http://ralph.bakerlab.org/results.php?userid=82
--Nathan |
|
|
|
|
|
I've got some new 1.56 WUs with really nice graphics! |
|
|
|
|
|
A little treat for you guys ;)
|
|
|
|
|
|
Task: 1283328
Workunit: 1117100
Name: testD_cc_1_8_nocst4_hb_t360__IGNORE_THE_REST_2DO9A_7_7074_1_1
OS: Mac OS X 10.4.11
failed at initialization
<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
BOINC:: Initializing ... ok.
[2009- 1-30 7: 3:49:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
ERROR: Unique best command line context option match not found for -user_tag
</stderr_txt>
]
(edited for readability) |
|
|
|
|
|
Since we are on 1.56 now, should there be a new thread for that?
BTW, very interesting color choice for the accepted energy line and the other line above the RMSD box. What do the blue and yellow mean, and isn't there a purplish color in there as well?
Just completed 2 of the 1.56 tasks with no problems on a 4-hour run time. |
|
|
|
|
|
This task was valid and it got credit, but there are some error messages in stderr out:
...
Starting work on structure: S_shuffle_00012 <--- F_00007_0000861_0
Fullatom mode ..
Hbond tripped.
ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
called boinc_finish
...
AdeB |
|
|
|
|
Since we are on 1.56 now, should there be a new thread for that?
BTW, very interesting color choice for the accepted energy line and the other line above the RMSD box. What do the blue and yellow mean, and isn't there a purplish color in there as well?
Nope, it's just prettiness. No extra meaning, I'm afraid. |
|
|
|
|
|
Well, I am so disappointed ... no failure from any system ... sigh ... some other folks have all the fun ...
I suppose that is good news ... but seriously, running on 3-4 systems and not a failure in sight...
Almost as sad is that I have been running Rosetta on all systems but the linux box (too slow) and no failures there either ... |
|
|
|
|
|
HOLD THE PRESSES!
I got one failure
Windows XP Pro (32-bit), i7 ...
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000
App version 1.56 ... no idea what the gobbledygook in the rest of the error is about ...
AT LAST ...
Of course it will be no fun if this is another of those .. "cannot reproduce" errors ...
{edit}
You know how we can attract testers ... crashed tasks get paid for by the project ... especially if the data finds a bug ... just a thought ... :) |
|
|
|
|
|
Version 1.55, Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0.
Repeated "exited with zero status but no 'finished' file" problem.
BOINC logs:
2009-01-31 00:48:10|ralph@home|Starting _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0
2009-01-31 00:48:22|ralph@home|Starting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:04:19|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:04:19|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:05:12|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:14:32|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:14:32|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:14:49|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 01:31:09|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:31:09|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:32:25|ralph@home|Sending scheduler request: To fetch work. Requesting 1173 seconds of work, reporting 0 completed tasks
(...)
2009-01-31 01:48:42|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 01:48:42|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 01:49:46|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:06:39|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:06:40|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:07:45|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:15:11|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:15:11|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:15:28|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...)
2009-01-31 02:31:56|ralph@home|Task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 exited with zero status but no 'finished' file
2009-01-31 02:31:56|ralph@home|If this happens repeatedly you may need to reset the project.
2009-01-31 02:31:56|ralph@home|Temporarily failed upload of _CAPRI17_T39_2_.sjf_br_one_docking.protocol__7228_256_0_0: connect() failed
2009-01-31 02:33:18|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
(...).
As you can see, quite a waste of computing time. Before that, other CAPRI17 tasks finished without any visible problems.
Eventually the scheduler closed the time window for RALPH and started another project.
The task had run for over 24 minutes and was over 17% complete (though this last number is not really meaningful). Interestingly, boinccmd.exe --get_results reported a current CPU time of 1455 s, matching the final CPU time, but a checkpoint CPU time of only 1252 s.
Finally, I turned RALPH back on in the morning just to see what would happen:
2009-01-31 11:34:25|ralph@home|Restarting task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 using minirosetta version 155
2009-01-31 11:34:39|ralph@home|Computation for task _CAPRI17_T39_1_.sjf_br_both_docking.protocol__7228_269_0 finished.
It finished with a so-called success, but with only 2 decoys, low credit and a "Too many restarts with no progress. Keep application in memory while preempted." notice.
I hope it helps.
Best Regards and have a nice weekend!
a.m. |
|
|
|
|
Since we are on 1.56 now, should there be a new thread for that?
BTW, very interesting color choice for the accepted energy line and the other line above the RMSD box. What do the blue and yellow mean, and isn't there a purplish color in there as well?
Nope, it's just prettiness. No extra meaning, I'm afraid.
Prompted by this message, I tried to look at the graphics of task 1285083, but the graphics window didn't open.
I'm running Gentoo Linux on an AMD Athlon XP with 512 MB of memory.
In version 1.54 the graphics did work, the only problem there (and in previous versions) was the presentation of Total credit and RAC.
 |
|
|
|
|
|
On OS X, no graphics with 1.56 ... so I too have the graphics problem that AdeB reported below for Linux... |
|
|
|
|
|
Compute error
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
[2009- 1-31 18: 7:40:] :: BOINC:: Initializing ... ok.
[2009- 1-31 18: 7:40:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/ralph.bakerlab.org/minirosetta_database_rev26003.zip
Unpacking WU data ...
Unpacking data: ../../projects/ralph.bakerlab.org/mtyka_lr5_D_score12_normal.zip
Setting database description ...
Setting up checkpointing ...
Initializing score function:
Initializing relax mover:
Starting protocol...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on PDB structure: S_rb1_2d4f_native_0000_withcon_00001 <--- S_rb1_2d4f_native_0000.pdb
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000
Engaging BOINC Windows Runtime Debugger...
|
|
|
|
|
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000
Yea! Two of us with the same address!
Maybe it is not random after all .... :) |
|
|
|
|
|
The 1.57 version doesn't show the protein in the searching box. |
|
|
|
|
The 1.57 version doesn't show the protein in the searching box.
Same here. |
|
|
|
|
|
In this workunit running under 1.56:
http://ralph.bakerlab.org/workunit.php?wuid=1138646
The graphics works, but whenever it switches to the stick figure display, the Low Energy and Native sections of the graphics window get rather dim.
Also, it's a candidate for long-running models - at 4 hours 52 minutes into the requested 6 hours CPU time, it's still on model 2, step 81002, stage MoverBase-Minimization.
I already have some 1.57 workunits in the queue, so when they start, I'll try to check if they give similar results.
|
|
|
|
|
|
The searching box shows the image in OS X, but not in Windows XP, in version 1.57 ... but at least the window comes up ... |
|
|
|
|
In this workunit running under 1.56:
http://ralph.bakerlab.org/workunit.php?wuid=1138646
The graphics works, but whenever it switches to the stick figure display, the Low Energy and Native sections of the graphics window get rather dim.
Also, it's a candidate for long-running models - at 4 hours 52 minutes into the requested 6 hours CPU time, it's still on model 2, step 81002, stage MoverBase-Minimization.
I already have some 1.57 workunits in the queue, so when they start, I'll try to check if they give similar results.
This workunit is now finished, and successful. However, when shutting down the graphics, I thought I noticed a significant discrepancy between the progress reported by the graphics window and the progress reported by the BOINC manager for this workunit. The above progress figures are those from the graphics window. |
|
|
|
|
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000
Yea! Two of us with the same address!
Maybe it is not random after all .... :)
The same here - lr6_D_score12_rlbn_1bm8_IGNORE_THE_REST_NATIVE_7059_5_1.
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005124B3 write attempt to address 0x3FF00000
Manuel Lupotto (above) had exactly the same error.
Both of us had MiniRosetta ver. 1.56. |
|
|
|
|
The 1.57 version doesn't show the protein in the searching box.
In addition the Accepted Energy and RMSD graphs take a long time to develop each time you turn on the graphics. |
|
|
|
|
I already have some 1.57 workunits in the queue, so when they start, I'll try to check if they give similar results.
In this workunit, 1.57 under Vista SP1:
http://ralph.bakerlab.org/result.php?resultid=1292982
The graphics works, but whenever it switches to the stick figure display, the Low Energy and Native sections of the graphics window get rather dim. Also, the Searching section is solid black. Already up to model 5, though.
75% complete in the graphics, and in the BOINC manager.
I've switched from 100% CPU time to 98% CPU time to see if that helps test the lock file problem.
HMM... It's still running at 100%. Looks like I'd better check if any of the following are true:
1. I have to leave it set below 100% longer for 1.57 to notice.
2. There's another setting that overrides this.
3. I have to set it even lower, such as 90%, before it will stop rounding the value to 100%.
In case it makes a difference, I tried switching the Rosetta@home setting to 98% first, but it made no difference in the starting value for the Ralph@home setting, so I set that one to 98% also. |
|
|
|
|
|
I'm seeing the same quirk in progress times that robertmiles and others have already reported. I've got a bunch of tasks with names of the form csttest_1_8_nativecst_harm*, all of which, under Mac OS X 10.4.11, are supposed to take 1 hour approximately to complete. What I'm seeing is that, after say 45 minutes, progress is apparently only 15% complete and there is 1:20:00 left. It's stepping very slowly at this point in a stage called MoverBase+Minimization. Nevertheless the tasks complete in about one hour as they're supposed to.
|
|
|
|
|
I'm seeing the same quirk in progress times that robertmiles and others have already reported. I've got a bunch of tasks with names of the form csttest_1_8_nativecst_harm*, all of which, under Mac OS X 10.4.11, are supposed to take 1 hour approximately to complete. What I'm seeing is that, after say 45 minutes, progress is apparently only 15% complete and there is 1:20:00 left. It's stepping very slowly at this point in a stage called MoverBase+Minimization. Nevertheless the tasks complete in about one hour as they're supposed to.
You have to remember that the percentage bar is merely a cosmetic feature; it's not a "real" percentage bar. It's just a really crude estimate of the time left. There is NO way of knowing how long the job will take before you've finished the first decoy.
I've recently changed this estimate to be much more conservative. We estimate it as percentage = 100 * time_spent / max_time,
where max_time is USERTIME + 4 hrs, not USERTIME, because the WU cannot run more than 4 hrs past the user time (due to the watchdog).
After the first decoy is completed the program can make a slightly more educated guess about how long it's going to actually take, so the percentages get more accurate as more decoys are produced.
So if you're 45 minutes into the first decoy, 15% is about correct (since your max runtime is 300 minutes: 45/300 = 0.15).
I'm sorry there's no better way to do this, but Rosetta goes through many different stages in making a decoy, and it's simply impossible to know how long it's going to take.
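A minimal sketch of the conservative estimate described above, in Python. The function and constant names and the minute-based units are illustrative assumptions, not the actual minirosetta code; the sketch only encodes percentage = 100 * time_spent / max_time with max_time = USERTIME + 4 hrs, and reproduces the 45-minute example.

```python
# Hypothetical helper illustrating the progress estimate used before the
# first decoy finishes; not taken from the minirosetta source.
WATCHDOG_MARGIN_MIN = 4 * 60  # watchdog allows at most 4 hrs over the user's run time


def progress_percent(time_spent_min: float, user_time_min: float) -> float:
    """Return the crude progress estimate, in percent."""
    max_time_min = user_time_min + WATCHDOG_MARGIN_MIN
    # Cap at 100% in case a work unit runs right up to the watchdog limit.
    return min(100.0, 100.0 * time_spent_min / max_time_min)


if __name__ == "__main__":
    # 45 minutes into a 1-hour task: 45 / (60 + 240) = 0.15
    print(progress_percent(45, 60))  # 15.0
```

Because max_time includes the full watchdog margin, the bar deliberately under-reports early on; a 1-hour task that finishes on time will jump from roughly 20% to done.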
Mike
|
|
|
|
|
|
I found the World Community Grid Device Profile setting and changed it from 100% CPU time to 98% also; still running at 100%, though. I suspended BOINC for the weekly backup and disk cleanup session:
Back Up Files
Disk Cleanup
Disk Defragmenter
Vista update
reboot
Restarted running BOINC; still running at 100% CPU time.
Waited a few minutes; then got these messages:
2/1/2009 5:05:01 PM|World Community Grid|Task mf189_00038_13 exited with zero status but no 'finished' file
2/1/2009 5:05:01 PM|World Community Grid|If this happens repeatedly you may need to reset the project.
2/1/2009 5:05:01 PM|ralph@home|Task csttest_1_8_nativecst_harm_cenfa_0.1_hb_t373__IGNORE_THE_REST_1S3JA_4_7372_1_0 exited with zero status but no 'finished' file
2/1/2009 5:05:01 PM|ralph@home|If this happens repeatedly you may need to reset the project.
2/1/2009 5:05:01 PM|World Community Grid|Restarting task mf189_00038_13 using hpf2 version 603
2/1/2009 5:05:01 PM|ralph@home|Restarting task csttest_1_8_nativecst_harm_cenfa_0.1_hb_t373__IGNORE_THE_REST_1S3JA_4_7372_1_0 using minirosetta version 157
Waited a few minutes; then told BOINC Manager to read the config file and the local prefs file; no change.
2/1/2009 5:19:11 PM||General prefs: from World Community Grid (last modified 01-Feb-2009 16:35:52)
2/1/2009 5:19:11 PM||Computer location: home
2/1/2009 5:19:11 PM||General prefs: using separate prefs for home
2/1/2009 5:19:11 PM||Reading preferences override file
2/1/2009 5:19:11 PM||Preferences limit memory usage when active to 1438.32MB
2/1/2009 5:19:11 PM||Preferences limit memory usage when idle to 1725.99MB
2/1/2009 5:19:11 PM||Preferences limit disk usage to 27.94GB
2/1/2009 5:19:21 PM||General prefs: from World Community Grid (last modified 01-Feb-2009 16:35:52)
2/1/2009 5:19:21 PM||Computer location: home
2/1/2009 5:19:21 PM||General prefs: using separate prefs for home
2/1/2009 5:19:21 PM||Reading preferences override file
2/1/2009 5:19:21 PM||Preferences limit memory usage when active to 1438.32MB
2/1/2009 5:19:21 PM||Preferences limit memory usage when idle to 1725.99MB
2/1/2009 5:19:21 PM||Preferences limit disk usage to 27.94GB
I haven't checked if the POEM@home project or the boincsimap project have any similar options for lowering the percentage of CPU time, but will do that next.
No other projects expected to provide any workunits for the next day or so. Any other ideas about how to get such a lowering?
|
|
|
|
|
|
boincsimap recognized one of the previous CPU percentage changes from 100% to 98%, so I made no changes there.
POEM@home didn't, so I made the change there also.
Still running at 100% CPU time, though. Could the changes affect only workunits downloaded after the changes?
Looks like time to try even lower values.
Can 1.58 read the current CPU percentage value, and return it along with the results? Can it do the same with the Leave In Memory option, and perhaps a few other values affecting whether the lock file needs to be used? |
|
|
|
|
|
Robert,
I had the lock file problem with a setting of 99% ... but it is INTERMITTENT ... I only had it on about 40% of the tasks ... which of course failed.
As for the setting propagating: you set it here at Ralph, update Ralph on your computer ... get the setting downloaded (you can check in the preferences pane), then do updates to the other projects on the target system, and it should propagate up ...
Note that if you hit Ok on the preference pane you are now running locally, and I am not sure if the changed Ralph settings will then propagate up ...
I only had the problem with Rosetta, primarily on my i7 computer ...
@Mtyka,
The graphics in 1.58 seems to work on XP Pro again ... time for 1.59 to break them again ... :) |
|
|
|
|
|
I did that step to download the new setting before expecting it to have any effect. However, I've only had the end of one workunit from RALPH@home and the first part of another run since the change - not enough to test very well for an intermittent problem. |
|
|
|
|
|
Got this 1.56 result, with "Process exited with error code 193".
Other than that one, the others seem to work OK on this run.
 |
|
|
|
|
I did that step to download the new setting before expecting it to have any effect. However, I've only had the end of one workunit from RALPH@home and the first part of another run since the change - not enough to test very well for an intermittent problem.
I think that out of 20 tasks I ran, 8 or 9 had that problem, and others had other failures.
I think it is a combination problem... with something else being involved. Not sure what it is exactly as I usually use a switch interval of 720 min so that most tasks are run end to end with no switching. Leave in memory is checked and I only turn off the systems when power goes out ...
All I can positively report is that since I went back to 100% usage and the later application I now have tasks running on that same system with no failures ... |
|
|
|
|
Ramostol, this is relevant for you too; I think that's the same bug.
You two, could you set your settings back to restrict to specific days and see if it works now? It did here :)
At last I managed to grab some 1.57 wus. I had time to observe that at least the first two succeeded, my first miniRosetta successes for a month. In a few hours we shall see if the rest survived the night and the network settings. Then, no news = good news.
Cheers! |
|
|
|
|
|
Well, almost 80 tasks run on all my systems and no failures since the access failure reported below... |
|
|
|
|
|
We seem to have broken the server ... work generation is stopped ...
Hard to test with no test cases ... :) |
|
|
|
|
|
Nothing today either, and I've got room on my system now after playing catch-up again. |
|
|
|
|
Nothing today either, and I've got room on my system now after playing catch-up again.
Cheep, Breek, Nereek ...
I can hear the crickets ...
The calm before the storm?
The storm before the hurricane?
Or was it something I said?
:) |
|
|
|
|
|
Got a couple more tasks, ran fine ...
A post over at Rosetta NC from another user on the lock file problem and the indication there is that they also cured it with using CPU at 100% and not less ...
The more cores you have running the worse the problem was my experience ... for example I did not see it that much on my 4 cores but it was a real killer on my 8 CPU i7 ... YMMV |
|
|
|
|
|
Then maybe that's why I'm not seeing it even though I've tried to set my CPU to 90% to catch the problem. For some reason, something seems to be making my CPU stay at 100% even though I've told it to change to 90%. I have a dual CPU core machine. |
|
|
|
|
Then maybe that's why I'm not seeing it even though I've tried to set my CPU to 90% to catch the problem. For some reason, something seems to be making my CPU stay at 100% even though I've told it to change to 90%. I have a dual CPU core machine.
Did you try to set it locally?
If you have local preferences already set, then changing the web ones will not do anything. Sadly, there is no clear indication on the preference pane that you are using local vs. remote preferences. Try setting them locally as a test, and then use Clear to go back to the web settings... |
|
|
|
|
|
I've attempted to set them both locally and remotely, both at less than 100%. I'm not sure if I've found the correct way to set them locally, though, and I'll want to know the difference when I get the new computer I'm planning to order. |
|
|
|
|
I've attempted to set them both locally and remotely, both at less than 100%. I'm not sure if I've found the correct way to set them locally, though, and I'll want to know the difference when I get the new computer I'm planning to order.
Advanced menu/preferences ... make the setting ... click "Ok" button ...
To go back to the web settings:
Advanced menu/preferences ... click "Clear" button ...
You should be able to tell if the setting has taken, assuming you set it to less than 100% by watching the CPU trend-line and there will be periodic dips in the level ... the way BOINC does "throttling" is to halt operation on a periodic basis. If you set it lower the dips should be more dramatic. |
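To picture why throttling shows up as periodic dips in the CPU graph, here is a rough Python model of duty-cycle throttling: in each fixed period the work runs for the allowed fraction of the time and is suspended for the rest. The 10-second period and the function names are assumptions for illustration only; this is not BOINC's actual scheduling code.

```python
# Rough model of duty-cycle CPU throttling (hypothetical, not BOINC's code).


def throttle_schedule(cpu_percent: float, period_s: float = 10.0) -> tuple[float, float]:
    """Split one throttling period into (run_seconds, suspend_seconds)."""
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in (0, 100]")
    run_s = period_s * cpu_percent / 100.0
    return run_s, period_s - run_s


def cpu_trace(cpu_percent: float, total_s: int, period_s: int = 10) -> list[int]:
    """Per-second activity (1 = running, 0 = suspended) over total_s seconds."""
    run_s, _ = throttle_schedule(cpu_percent, period_s)
    return [1 if (t % period_s) < run_s else 0 for t in range(total_s)]


if __name__ == "__main__":
    print(throttle_schedule(90))   # (9.0, 1.0)
    print(cpu_trace(90, 20))       # one suspended second in every ten
    print(sum(cpu_trace(50, 20)))  # 10
```

Lowering cpu_percent lengthens the suspended slice of each period, which is why the dips in the CPU trend-line get more pronounced at lower settings; at 100% there is no suspended slice at all.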
|
|
|
|
|
Every task is failing at line 330 ... halt the madness ...
ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
ERROR:: Exit from: ..\..\src\core\scoring\constraints\ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
|
|
|
|
|
Every task is failing at line 330 ... halt the madness ...
ERROR: [ERROR] Unable to open constraints file: .pdb.distances.csts.bounded_1.0
ERROR:: Exit from: ..\..\src\core\scoring\constraints\ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Another workunit that failed at that line both for me and my wingman:
http://ralph.bakerlab.org/result.php?resultid=1302770
In 1.58 at least for me. |
|
|
|
|
I've attempted to set them both locally and remotely, both at less than 100%. I'm not sure if I've found the correct way to set them locally, though, and I'll want to know the difference when I get the new computer I'm planning to order.
Advanced menu/preferences ... make the setting ... click "Ok" button ...
To go back to the web settings:
Advanced menu/preferences ... click "Clear" button ...
You should be able to tell if the setting has taken, assuming you set it to less than 100% by watching the CPU trend-line and there will be periodic dips in the level ... the way BOINC does "throttling" is to halt operation on a periodic basis. If you set it lower the dips should be more dramatic.
Thank you - finally, a method that works. Now set at 90%, so when I get another workunit, I can look for any effects that causes, or maybe faster if Rosetta@home workunits are affected also.
I've never used that function before, so I suspect that one of the other BOINC projects I participate in set it at 100% without providing any way to reverse that action. |
|
|
|
|
I've attempted to set them both locally and remotely, both at less than 100%. I'm not sure if I've found the correct way to set them locally, though, and I'll want to know the difference when I get the new computer I'm planning to order.
Advanced menu/preferences ... make the setting ... click "Ok" button ...
To go back to the web settings:
Advanced menu/preferences ... click "Clear" button ...
You should be able to tell if the setting has taken, assuming you set it to less than 100% by watching the CPU trend-line and there will be periodic dips in the level ... the way BOINC does "throttling" is to halt operation on a periodic basis. If you set it lower the dips should be more dramatic.
Thank you - finally, a method that works. Now set at 90%, so when I get another workunit, I can look for any effects that causes, or maybe faster if Rosetta@home workunits are affected also.
I've never used that function before, so I suspect that one of the other BOINC projects I participate in set it at 100% without providing any way to reverse that action.
The only project I know of that "poisoned" the well was QCN, where a debug action set the date to a very bad value. So you had to do lots of stuff to try to get the settings right. The problem is that the setting with the latest date is the one that prevails ... and in the case of QCN the server got the date 2030 or something like that ... so any computer with the settings from that set would keep "updating" all the old projects with the settings you no longer wanted.
They have directions on the site to fix the issue, though you have to touch all affected projects for some reason ...
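The precedence rules described in this thread can be sketched in Python, assuming they work exactly as stated: a local preferences override (set via Advanced/Preferences and "Ok") wins outright, and otherwise the web preference set with the latest modification date wins, whichever project it came from. PrefsSet, effective_prefs and the single cpu_percent field are hypothetical names for illustration, not BOINC's real data structures, and the 2030 date simply mirrors the "2030 or something like that" above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PrefsSet:
    source: str         # project (or "local") that supplied this preference set
    modified: datetime  # "last modified" timestamp carried with the set
    cpu_percent: float  # stand-in for the full set of preferences


def effective_prefs(web_sets: list[PrefsSet],
                    local_override: Optional[PrefsSet] = None) -> PrefsSet:
    """Pick the preferences that would apply under the rules described above."""
    if local_override is not None:
        return local_override  # local settings mask any change made on the web
    # Latest timestamp wins, so a set dated far in the future keeps winning.
    return max(web_sets, key=lambda p: p.modified)


if __name__ == "__main__":
    ralph = PrefsSet("RALPH@home", datetime(2009, 2, 1), 90.0)
    bogus = PrefsSet("QCN", datetime(2030, 1, 1), 100.0)
    print(effective_prefs([ralph, bogus]).source)  # QCN
    local = PrefsSet("local", datetime(2009, 2, 2), 90.0)
    print(effective_prefs([ralph, bogus], local).source)  # local
```

Under this model, a fresh web edit dated 2009 can never beat a set stamped 2030, which matches the "poisoned well" behavior; only a local override (or fixing the bad date) gets the intended value applied.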
Anyway,
*MY* experience is that it is an intermittent problem, that is to say, that it will not affect all tasks run. It caused me problems on only about 40% of the tasks run ... I have no idea what is the other issue in the mix that gives rise to the problem. And, to a large extent I am too production oriented to spend much time and energy I don't have to poke and prod the systems looking for these problems.
I posted a note on the BOINC Dev list where it vanished without a comment or even a gurgle as it sank out of sight about the fact that the preferences pane does not really show that you are on local settings vice global settings ... |
|
|
|
|
|
Hi all,
Sorry I've been quiet on the boards recently. I was away last week and have been focusing on some other projects.
Sorry about the bunch of work units that went out with a faulty file reference; my bad. More today that should actually work.
The error rate has been excellent recently, and there are a couple of minor fixes coming soon. Then we'll start focusing on scientific improvements again. Woo!
Mike
|
|
|
|
|
I've attempted to set them both locally and remotely, both at less than 100%. I'm not sure if I've found the correct way to set them locally, though, and I'll want to know the difference when I get the new computer I'm planning to order.
Advanced menu/preferences ... make the setting ... click "Ok" button ...
To go back to the web settings:
Advanced menu/preferences ... click "Clear" button ...
You should be able to tell if the setting has taken, assuming you set it to less than 100% by watching the CPU trend-line and there will be periodic dips in the level ... the way BOINC does "throttling" is to halt operation on a periodic basis. If you set it lower the dips should be more dramatic.
Thank you - finally, a method that works. Now set at 90%, so when I get another workunit, I can look for any effects that causes, or maybe faster if Rosetta@home workunits are affected also.
I've never used that function before, so I suspect that one of the other BOINC projects I participate in set it at 100% without providing any way to reverse that action.
The only project I know of that "poisoned" the well was QCN, where a debug action set the date to a very bad value. So you had to do lots of stuff to try to get the settings right. The problem is that the setting with the latest date is the one that prevails ... and in the case of QCN the server got the date 2030 or something like that ... so any computer with the settings from that set would keep "updating" all the old projects with the settings you no longer wanted.
They have directions on the site to fix the issue, though you have to touch all affected projects for some reason ...
Anyway,
*MY* experience is that it is an intermittent problem, that is to say, that it will not affect all tasks run. It caused me problems on only about 40% of the tasks run ... I have no idea what is the other issue in the mix that gives rise to the problem. And, to a large extent I am too production oriented to spend much time and energy I don't have to poke and prod the systems looking for these problems.
I posted a note on the BOINC Dev list where it vanished without a comment or even a gurgle as it sank out of sight about the fact that the preferences pane does not really show that you are on local settings vice global settings ...
I've never tried to run QCN. The only project I know of that even offers the chance of setting device-specific parameters is WCG, and their method of changing them to 90% CPU time didn't work.
I didn't try to change these settings on a few projects I try to participate in that look unlikely to offer any workunits soon. One of them, Cels@home, is no longer even reachable online.
I've never seen it when I was using 100% CPU time. However, I'm now using 90% CPU time, and it may be related to the problem I just posted over in the hard-to-find thread for 1.58 problems. |
|
|
|
|
I've never tried to run QCN. The only project I know of that even offers the chance of setting device-specific parameters is WCG, and their method of changing them to 90% CPU time didn't work.
I didn't try to change these settings on a few projects I try to participate in that look unlikely to offer any workunits soon. One of them, Cels@home, is no longer even reachable online.
I've never seen it when I was using 100% CPU time. However, I'm now using 90% CPU time, and it may be related to the problem I just posted over in the hard-to-find thread for 1.58 problems.
I have sensors on the way so that I can try QCN as another non-cpu intense project. Living in California, though away from where most of the fault lines are ... still ... it is a topic of interest ...
Cels@Home has changed their URL so that there is now a Cels@Home (old) and a Cels@Home that is in Alpha state ...
Now to the other thread ... |
|
|
|
|
Hi all,
Sorry I've been quiet on the boards recently. I was away last week and have been focusing on some other projects.
Sorry about the bunch of work units that went out with a faulty file reference; my bad. More today that should actually work.
The error rate has been excellent recently, and there are a couple of minor fixes coming soon. Then we'll start focusing on scientific improvements again. Woo!
Mike,
You should change the link on the front page to point to the latest bug thread ...
|
|
|