Posts by TPCBF

1) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 6220)
Posted 19 Oct 2017 by TPCBF
Post:
One task (#4084226) stalled at 83.144%.
Got about a dozen or so in the last couple of days. They all would end up in a compute error or stall out at various percentages in the 70-90% range, blocking any other useful WUs from running.
I noticed that pretty much all WUs checkpoint after about 13 secs, then won't show another checkpoint for hours until they crap out.
Also, I had to manually remove a dozen or so dead tasks in the task manager to get my machine responsive again...
I have the last two running right now, and they show slightly different behaviour (at least while I can watch them). They started about 15 min ago and show 12% done, 12 min of CPU time vs. 15 min of wall-clock time, and the checkpoint time increases by about 10 secs each time I check, but it is still only a fraction of the indicated CPU time used.
I am using BOINC agent 7.8.3 on an 8GB/i3/Windows 8.1 host...
2) Message boards : Current tests : Is this thing on? (Message 6088)
Posted 1 Nov 2016 by TPCBF
Post:
Looks like there are more WUs that just get stuck like this. A total of 6 WUs finished and reported, but the remaining 8 (or at least the 3 other ones that were started overnight) are stuck right now. Will likely abort those shortly so as not to block my laptop for other stuff to crunch...

Ralf
3) Message boards : Current tests : Is this thing on? (Message 6087)
Posted 1 Nov 2016 by TPCBF
Post:
Well, strange dates on all the posts, months if not years old...

Anyway, a new batch of WUs made it to this machine this afternoon, and while the first one finished without a hitch and reported, the second one, crunching right now for 55+ min, shows 1.994% done, with the last checkpoint at 5:30 min, 6 min of CPU time, and an estimated remaining runtime of 45 min that keeps decreasing without any obvious progress.

Something's rotten in the state of Berkeley... :?

Ralf
4) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 5912)
Posted 12 Oct 2015 by TPCBF
Post:
Got a bunch of WUs today (Beta 3.63, on Windows 8.1/64), and while the first one finished fine, with the rest it seems the "same old same old" starts:
They will run for a while, then CPU time will stop increasing; at some point the job still shows "running" but no ETA (just "-----") until they crap out with a "Computation error" after blocking anything else on the host for hours, and no credit given either.

Again, these are Beta 3.63 WUs, on Windows 8.1/64, 8GB of RAM, BOINC agent v7.6.9...

Ralf


Does resetting the project help?
I did not get a chance to do that, at least this time around. I have mentioned a few times in the past that I had the same issue, but with each batch of WUs, it's the same spiel. Out of a dozen or so beta WUs, taking up days of compute time and blocking other projects, only two WUs finished and got credited.
I really don't know if any of the developers are actually paying attention; at least none of them seems to be posting in here... :-(

Ralf
5) Message boards : RALPH@home bug list : Rosetta mini beta and/or android 3.61-3.83 (Message 5904)
Posted 10 Oct 2015 by TPCBF
Post:
Got a bunch of WUs today (Beta 3.63, on Windows 8.1/64), and while the first one finished fine, with the rest it seems the "same old same old" starts:
They will run for a while, then CPU time will stop increasing; at some point the job still shows "running" but no ETA (just "-----") until they crap out with a "Computation error" after blocking anything else on the host for hours, and no credit given either.

Again, these are Beta 3.63 WUs, on Windows 8.1/64, 8GB of RAM, BOINC agent v7.6.9...

Ralf
6) Message boards : Current tests : New test batch-Anybody out there????? (Message 5832)
Posted 14 Mar 2015 by TPCBF
Post:
Hi TPCBF,

Yeah, we are checking the Posts for errors.

Thanks for the feedback! Can you post the titles of the WUs?
Well, it didn't look like it; that's why I posted in the Rosetta forum in the meantime.

WUs
cb_mar11_dock_placestub_EEEH_1035_vegf_ProteinInterfaceDesign_20241_91_0_0 and
cb_mar11_dock_placestub_EEEH_1038_vegf_ProteinInterfaceDesign_20241_91_0_0

finally finished earlier, after I had suspended them last night and resumed them once some other projects' WUs had finished before their deadlines

WU
cb_mar11_dock_placestub_EEEH_1037_vegf_ProteinInterfaceDesign_20241_91_0_0

finished just a couple of minutes ago (will report it in another)

while
cb_mar11_dock_placestub_EEEH_1036_vegf_ProteinInterfaceDesign_20241_91_0_0

has now been sitting for a short while at 76.478% with no estimated time remaining...

Ralf



7) Message boards : Current tests : New test batch-Anybody out there????? (Message 5830)
Posted 13 Mar 2015 by TPCBF
Post:
Well, earlier today I again got 4 WUs of what appears to be a new series of tests.
The problem, however, is that all 4 WUs only run to a certain percentage and then simply seem to get stuck (the lowest at about 12%, the highest at 8x%), blocking any other work on that host.
Is anyone on the Baker team actually keeping an eye on what is happening and looking for feedback?
Or is this just all one large waste of time, on both ends?
8) Message boards : RALPH@home bug list : minirosetta beta 3.50-3.52 apps (Message 5813)
Posted 25 Feb 2015 by TPCBF
Post:
Received this error on Task 3335547


ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!
ERROR:: Exit from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
std::cerr: Exception was thrown:


[ERROR] EXCN_utility_exit has been thrown from: src/protocols/abinitio/AbrelaxApplication.cc line: 488
ERROR: Error: can't read sequence! Use -in::file::fasta sequence.fasta or -in::file::native native.pdb!

Conan
Same here. Got about a dozen or so WUs and they crap out faster than you can shake a stick at... :-(

Ralf
9) Message boards : RALPH@home bug list : minirosetta beta 3.50-3.52 apps (Message 5769)
Posted 25 Jul 2014 by TPCBF
Post:
To roughly check whether checkpoints work, you don't necessarily need to restart.
You can click "Properties" of any currently executing task and check the line "CPU time at last checkpoint".
If checkpoint saving is working normally, this line shows the time (counted from the start of the task) of the last saved checkpoint. If checkpointing does not work, the line shows "-- --".
Or it may show a time a few hours (or less) behind the total CPU time: if the client managed to finish at least one model completely and write it to disk, that also counts as a checkpoint, and this part usually works normally.
The problem with the current checkpoint setting in the WUs is that the recent batch of WUs seems to reset itself a lot, always starting from scratch instead of continuing from the last checkpoint, which is the whole purpose of checkpoints.
As it is currently, a lot of processing power gets wasted this way...

Ralf
10) Message boards : RALPH@home bug list : minirosetta beta 3.50-3.52 apps (Message 5759)
Posted 20 Jul 2014 by TPCBF
Post:
Same here, the 4 WUs I picked up on the 17th just keep restarting from 0% over and over again, and each time, at least during the initial phase, they are thrashing the hard drive like crazy...

Is anyone from the project actually around to monitor responses? Or are Mr. Baker & Co. only available when there's a chance to bask in the limelight?

Ralf
11) Message boards : RALPH@home bug list : Rosetta Mini Beta 3.53 (Message 5750)
Posted 19 Jul 2014 by TPCBF
Post:
OK, the 4 WUs that I got have by now repeatedly restarted from scratch, even though at one point they showed almost 30% done.
And I noticed that those WUs are thrashing the hard drive like crazy...

Ralf
12) Message boards : RALPH@home bug list : Rosetta Mini Beta 3.53 (Message 5746)
Posted 19 Jul 2014 by TPCBF
Post:
Strange behaviour of the last set of (4) WUs. They are running now for a bit more than 2h, showing around 24% done but only a few odd minutes of elapsed runtime, and only a few (23) seconds of CPU time and supposedly no checkpoint reached yet... :?
Running on Windows 8.1/64, with 6GB of RAM and not much else going on on the machine right now...

Anyone there who cares to explain?

Ralf
13) Message boards : RALPH@home bug list : Rosetta Mini Beta 3.53 (Message 5743)
Posted 16 Jul 2014 by TPCBF
Post:
Yeah, those WUs seem to have a fairly large (low) restart threshold. I had two jobs on my laptop that had been running for about an hour, and when I restarted the laptop today, they both started at 0% again. Theoretically they should finish before I have to leave and hit the road again; I hope they don't start all over again when I get back.

Ralf
14) Message boards : RALPH@home bug list : MiniRosetta Beta 3.41 (Message 5566)
Posted 25 Aug 2012 by TPCBF
Post:
And the crap continues, nothing but compute and validate errors... :(

Ralf
15) Message boards : RALPH@home bug list : MiniRosetta Beta 3.41 (Message 5565)
Posted 24 Aug 2012 by TPCBF
Post:
Yup, now the last 4 of my 3.41 tasks bombed out with the same error, though with runtimes between 388 and 18100 secs...

Hope we get some response from the techs this time around (oh well, one can dream)

Ralf
16) Message boards : Number crunching : WARNING, Tasks Cancelled. (Message 5560)
Posted 12 Jul 2012 by TPCBF
Post:
RALPH and Rosetta don't display the server version, but there has been discussion in the past on Rosetta about old server code.

In Nov 2008 http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4496&nowrap=true#57030 rev # 14349.

http://ralph.bakerlab.org/forum_thread.php?id=289#2768


Uuuh, information from 4 years ago
I hope they upgrade their servers!!
And I don't think this applies to the current issue at all. I checked a couple dozen WUs that validated just fine before this nonsense started, and they all showed "cancelled" in the WU info...

But looks like the admins just don't f'ing care either way... :-(
17) Message boards : RALPH@home bug list : MiniRosetta Beta 3.26 (Message 5557)
Posted 11 Jul 2012 by TPCBF
Post:
The silence of the project admins is deafening... :-(

Ralf
18) Message boards : RALPH@home bug list : MiniRosetta Beta 3.26 (Message 5552)
Posted 10 Jul 2012 by TPCBF
Post:
Just checked and every Work Unit still in progress has been cancelled, so if I process them they will get Validate errors.

Usually, if the project cancels a work unit, it will be cancelled on the volunteer's computer as well, and then it won't get processed and waste CPU time.

Not the case here.
Even the work units that have been completed and marked as successful have been cancelled as well.
Well, I just checked, as a second WU of the latest batch(es) just finished, and it did the same thing:
- processing along just fine, uploading, reporting
- results in a "validate error"

The "canceled" under "errors" in the WU info goes back at least a few days, and all but one of those WUs reported and validated just fine, so I am not sure it is of any relevance in this case.

Looks like something is askew here, and it would really be nice if one of the admins would respond, at least to let us know they are looking into this...

My WUs with validate errors are
2465839 and
2464581, while

2461750, from the same batch, was sent, returned and validated just fine...

Haven't aborted those WUs just yet, but I suspended the project in the hope of hearing from the project admins about this first... :?

Ralf
19) Message boards : RALPH@home bug list : MiniRosetta Beta 3.26 (Message 5546)
Posted 29 Jun 2012 by TPCBF
Post:
Since the latest batch started the other day, I get roughly one compute error for each dozen or so WUs that go through just fine...

Ralf
20) Message boards : RALPH@home bug list : MiniRosetta Beta 3.26 (Message 5516)
Posted 5 Apr 2012 by TPCBF
Post:
The 6 Work Unit Limit is a bit of a pain.

If the project sends out faulty work then I can't get any more for the day to test if some work units actually work or not.
This will spread the work around I suppose but slow down getting the work returned.

Conan
What 6 WU limit?
In the last couple of days I had up to 20 (if I counted right) of those quickly failing 3.24 ones, and right now I have 9 of the 3.26 Beta WUs in the queue (one currently running)...
Ok, make that 8 in queue and one just finished successfully...

Ralf





©2019 University of Washington
http://www.bakerlab.org