Posts by feet1st

61) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.35 (Message 4259)
Posted 10 Oct 2008 by Profile feet1st
Post:
I'm also adding a reporter column to our data files which should report individual decoy times. That way we should be able to catch "stray" or spurious run-time outliers.


Are you enhancing the watchdog as well? To "think" at the model level, rather than the task level, about when things have been running too long? In fact, perhaps the watchdog could abort the current model, and continue running the rest of the task?
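
To make the idea concrete, here is a rough sketch of a watchdog that works per model instead of per task. It is purely hypothetical: `run_task`, `model_limit_s`, and the step-based timing are invented for illustration and are not how the actual minirosetta watchdog is written.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    completed: list = field(default_factory=list)  # models that finished
    aborted: list = field(default_factory=list)    # models the watchdog gave up on


def run_task(model_steps, model_limit_s, step_s=1.0):
    """Run every model in a task, aborting only the models that run too long.

    model_steps   -- hypothetical dict: model id -> number of steps it needs
    model_limit_s -- assumed per-model time limit, in seconds
    step_s        -- assumed CPU seconds consumed by one step
    """
    result = TaskResult()
    for model_id, steps in model_steps.items():
        elapsed = 0.0
        for _ in range(steps):
            elapsed += step_s
            if elapsed > model_limit_s:
                # Abort this model only; the task moves on to the next one.
                result.aborted.append(model_id)
                break
        else:
            result.completed.append(model_id)
    return result


result = run_task({1: 5, 2: 100, 3: 4}, model_limit_s=10)
print(result.completed, result.aborted)
```

In this toy run, model 2 is abandoned once it passes the limit, and models 1 and 3 still get done, so the task reports useful work instead of being ended as a whole.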
62) Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36 (Message 4258)
Posted 10 Oct 2008 by Profile feet1st
Post:
hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t322__IGNORE_THE_REST_2F3XA_12_5091_1_0 has been going 16hrs and still on model 1.
63) Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36 (Message 4253)
Posted 9 Oct 2008 by Profile feet1st
Post:
Looking at Ed's results, the last two reported were both ended by the watchdog because the 1hr runtime preference was exceeded by 3 times.

The two tasks:
hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t303__IGNORE_THE_REST_1FEZA_4_5083_1_0
hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t303__IGNORE_THE_REST_1FEZA_3_5083_1_0
64) Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36 (Message 4248)
Posted 8 Oct 2008 by Profile feet1st
Post:
The end of the task seems to take forever, though. ...1 hr later it was only advanced .5% at 95%. ...the first 85%... in about 2 hrs.


Now you understand why they are working hard to focus on, and eliminate, the long-running models! A checkpoint is made at the end of each model, and sometimes more frequently than that, depending on the type of work. And you are describing symptoms of a task that runs for 3 hours and still has not completed its first model.
65) Message boards : RALPH@home bug list : Bug Reports for Minirosetta v1.36 (Message 4243)
Posted 7 Oct 2008 by Profile feet1st
Post:
I lost 1.5 hours of work when BOINC switched applications. The task is

hombench_mtyka_looprelax_ccd_moves_looprelax_ccd_moves_t303__IGNORE_THE_REST_1YS9A_13_5083_1_0

Applications were not left in memory while others (including mass production minirosetta 1.34's) were running


...that is normal when you do not leave applications in memory. They continue to work on increasing the frequency of checkpoints, but it is a process, not an event.

Newer versions of BOINC try to wait until a checkpoint is reached before switching applications. This preserves more work.
66) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.35 (Message 4234)
Posted 3 Oct 2008 by Profile feet1st
Post:
So if you encounter such a job, please post to this thread so we can properly adjust the parameters of the job to reduce the run times.


...any progress on getting the tasks to automagically report back to you with the runtime for each model? Seems to be a key piece of data that has been neglected in the past. We've been seeing occasional 4-6 hour models for quite some time and have always assumed they were within the normal range.
67) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.34 (Message 4229)
Posted 30 Sep 2008 by Profile feet1st
Post:
...another one, 26 hrs, still on model 1. I aborted it.
hombench_mtyka_foldcst_loopbuild_boinctest3_foldcst_loopbuild_t297__IGNORE_THE_REST_1ESC__5_5028_1_0
68) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.34 (Message 4224)
Posted 28 Sep 2008 by Profile feet1st
Post:
26hrs and still on model 1, and now it has passed its deadline. I've canceled it.
It was called hombench_mtyka_foldcst_loopbuild_test1_foldcst_loopbuild_t286__IGNORE_THE_REST_1BWP__4_4982_1_0
69) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.34 (Message 4219)
Posted 26 Sep 2008 by Profile feet1st
Post:
Credit for all these recent tasks seems to exactly match the claimed credit, rather than using the average claims for the task.

Look at BigMike's report below. Note how his 2546.2 CPU seconds claimed and credit granted only reflect the second go at the WU, where 2 models were done, rather than the first time, when 19 were done.

Interesting too, the task doesn't seem to show his runtime preference.
70) Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.34 (Message 4208)
Posted 24 Sep 2008 by Profile feet1st
Post:
hombench_mtyka_looprelax_test_full_2_looprelax_t326__IGNORE_THE_REST_1Q7RA_4_4951_1_0 using minirosetta version 134
has been running 16hrs, and still only on model 2. I have a 24hr runtime preference, but this is way too long for 2 models.
71) Message boards : RALPH@home bug list : Bug Reports for Rosetta version 5.98 (Message 4181)
Posted 15 Aug 2008 by Profile feet1st
Post:
Do you know how to search for this particular protein/app combo as crunched by others, here or on Rosetta, to find out if this is the case?


I believe the project team would have to run a query to answer a question like that.
72) Message boards : RALPH@home bug list : Bug Reports for Rosetta version 5.98 (Message 4178)
Posted 14 Aug 2008 by Profile feet1st
Post:
Well, I assume that one way to cause this to happen would be to abort the transfer from the transfers tab in the advanced view. But I doubt you did that, especially since both had the same problem.

Have there been any BOINC problems? I understand there is some handshaking and authorization that takes place to permit your upload while thwarting spammers. Perhaps there is a problem with that process? Or, one of the things I think they exchange is the file size. Perhaps the out file for this task grew unexpectedly large? Hopefully the Ralph team can shed some light.
73) Message boards : RALPH@home bug list : Bug Reports for Rosetta version 5.98 (Message 4176)
Posted 13 Aug 2008 by Profile feet1st
Post:
Snagletooth, a file transfer error, but only after you completed a model. And given the file name, it looks like your machine finally gave up on trying to upload the result file. Have you had any other upload problems?
74) Message boards : RALPH@home bug list : Bug reports for 5.69-5.70 (Message 4150)
Posted 26 Jun 2008 by Profile feet1st
Post:
t405 got hung up on Win/XP. BOINC showed about 90 min of runtime, task manager showed no CPU going to the task. Exited and restarted BOINC. Now the task shows more than 6hrs in to a 24hr runtime preference. So, it would seem BOINC lost track of it several hours before it stopped getting actual CPU time.

Seems to be running normally at the moment.
75) Message boards : RALPH@home bug list : minirosetta 1.19 bug thread (Message 3992)
Posted 5 May 2008 by Profile feet1st
Post:
Project status display shows the feeder is down.
BOINC Client shows :

Message from server: Project encountered internal error: shared memory

This has been occurring at least since 03:19 Pacific time.
Can't complete a scheduler update to report my completed task.
76) Message boards : Feedback : Run time defaults (Message 3961)
Posted 23 Apr 2008 by Profile feet1st
Post:
I've found, more than once, bugs that only turned up when you ran WUs for the maximum time. Outfiles growing beyond their max size, etc. Not that we should all do it, but someone should test that upper limit on runtime.
77) Message boards : RALPH@home bug list : minirosetta 1.13 bug thread (Message 3942)
Posted 21 Apr 2008 by Profile feet1st
Post:
Had to install security patches and reboot WinXP. Ended BOINC normally, installed the fixes, rebooted, restarted BOINC. Both mini 1.13 WUs that were running fine ended upon the restart.

WU1
WU2

Both say one decoy was completed. No signs of watchdog causing them to end. I could see that perhaps they ended upon restart due to insufficient time to complete another model... but then why didn't it reach the same conclusion when model 1 completed back before I rebooted??

If you are curious, here are the messages from BOINC Manager upon restart:

4/21/2008 1:29:56 PM|ralph@home|Restarting task mini_abinitio-1bk2_-test_2008-2-6_3310_73_0 using minirosetta version 113
4/21/2008 1:29:56 PM|ralph@home|Restarting task mini_abinitio-1bkrA-test_2008-2-6_3310_73_0 using minirosetta version 113
4/21/2008 1:30:51 PM|ralph@home|Computation for task mini_abinitio-1bk2_-test_2008-2-6_3310_73_0 finished
4/21/2008 1:30:51 PM|ralph@home|Starting mini_abinitio-1tif_-test_2008-2-6_3310_72_0
4/21/2008 1:30:51 PM|ralph@home|Starting task mini_abinitio-1tif_-test_2008-2-6_3310_72_0 using minirosetta version 113
4/21/2008 1:30:58 PM|ralph@home|Computation for task mini_abinitio-1bkrA-test_2008-2-6_3310_73_0 finished
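
For anyone who wants to check how long each task ran after the restart, the messages above can be parsed with a few lines of Python. This is only a sketch: it assumes the pipe-separated `timestamp|project|message` layout seen in these six lines, and the function names are made up for illustration.

```python
from datetime import datetime

# The six messages quoted above, copied as-is.
LOG = """\
4/21/2008 1:29:56 PM|ralph@home|Restarting task mini_abinitio-1bk2_-test_2008-2-6_3310_73_0 using minirosetta version 113
4/21/2008 1:29:56 PM|ralph@home|Restarting task mini_abinitio-1bkrA-test_2008-2-6_3310_73_0 using minirosetta version 113
4/21/2008 1:30:51 PM|ralph@home|Computation for task mini_abinitio-1bk2_-test_2008-2-6_3310_73_0 finished
4/21/2008 1:30:51 PM|ralph@home|Starting mini_abinitio-1tif_-test_2008-2-6_3310_72_0
4/21/2008 1:30:51 PM|ralph@home|Starting task mini_abinitio-1tif_-test_2008-2-6_3310_72_0 using minirosetta version 113
4/21/2008 1:30:58 PM|ralph@home|Computation for task mini_abinitio-1bkrA-test_2008-2-6_3310_73_0 finished
"""


def parse_line(line):
    """Split one message into (timestamp, project, text)."""
    stamp, project, text = line.split("|", 2)
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, text


def seconds_after_restart(log_text):
    """Map task name -> seconds between 'Restarting task' and 'finished'."""
    restarted = {}
    durations = {}
    for line in log_text.strip().splitlines():
        when, _project, text = parse_line(line)
        words = text.split()
        if text.startswith("Restarting task "):
            restarted[words[2]] = when
        elif text.startswith("Computation for task "):
            name = words[3]
            if name in restarted:
                durations[name] = (when - restarted[name]).total_seconds()
    return durations


for task, secs in seconds_after_restart(LOG).items():
    print(f"{task}: {secs:.0f} s after restart")
```

Run against these messages, it shows both tasks finishing roughly a minute after being restarted (55 s and 62 s).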
78) Message boards : RALPH@home bug list : minirosetta 1.13 bug thread (Message 3940)
Posted 21 Apr 2008 by Profile feet1st
Post:
This one mini_abinitio-1bk2_-test_2008-2-6_3310_73_0 shows peak memory of 867MB! It's 17hrs in to a 24hr runtime preference on WinXP.

Memory does not seem to be growing without bounds (i.e. no memory leak), just seems to use a lot, then free it again at various times as it runs.
79) Message boards : RALPH@home bug list : minirosetta 1.13 bug thread (Message 3926)
Posted 18 Apr 2008 by Profile feet1st
Post:
FYI this 1a19A_BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1a19A-_3508_214_0
has a peak memory usage of 558MB on WinXP so far (10hrs in to 24hr run time).

...after 17hrs, peak is 786MB, 26million page faults on a machine with 2.5GB for 2 processes.
80) Message boards : RALPH@home bug list : minirosetta 1.13 bug thread (Message 3924)
Posted 18 Apr 2008 by Profile feet1st
Post:
FYI this 1a19A_BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1a19A-_3508_214_0
has a peak memory usage of 558MB on WinXP so far (10hrs in to 24hr run time).





©2024 University of Washington
http://www.bakerlab.org