minirosetta v1.47 bug thread

Author	Message
Reeltime Joined: 1 Nov 08 Posts: 1 Credit: 6,349 RAC: 0	Message 4413 - Posted: 14 Dec 2008, 15:16:31 UTC Last modified: 14 Dec 2008, 15:17:37 UTC Not sure if this counts as a bug or not, but my runtime is set to 1 hr, and most tasks take just over that, c. 65-70 mins. The 1.47 tasks are taking considerably longer; the current one is at 1 hr 33 min. They run normally up to about 78-80%, then slow down dramatically, then finish somewhere around 90-91%. I don't know if this is worth mentioning, so I thought I would :-) Host: 16239. If there is anything I need to check file-wise, let me know; I'm still fairly new to alpha testing. Quick edit: I mentioned this because it is unusual for this project.

ramostol Joined: 29 Mar 07 Posts: 24 Credit: 31,121 RAC: 0	Message 4419 - Posted: 16 Dec 2008, 10:26:15 UTC This start is none too good I'm afraid. All cc2_1_8_mammoth-tasks are crashing after about 1 minute of computing. An example: cc2_1_8_mammoth_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_7_6585_1_0 <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> minirosetta_1.47_i686-apple-darwin(90916,0xa0538fa0) malloc: * error for object 0x1747d40: Non-aligned pointer being freed (2) * set a breakpoint in malloc_error_break to debug SIGBUS: bus error

Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0	Message 4420 - Posted: 16 Dec 2008, 11:39:20 UTC Last modified: 16 Dec 2008, 11:41:21 UTC This WU and this one, which I have also finished, seem to take an unusual amount of time. Both took over 13 hours for just 1 decoy. My runtime preference is set to 6 hours. Completing that single decoy took the whole time, which is the reason for the long running time. No wonder they are called Mammoth work units. Both completed OK (credit very low for the effort put in, but that is normal for both Ralph and Rosetta).

Phil Joined: 28 Jan 07 Posts: 5 Credit: 1,206 RAC: 0	Message 4421 - Posted: 16 Dec 2008, 17:55:15 UTC The Graphics in this one show the following: Total Credit: -5.6988E-05 RAC 5.3133E-315

Phil Joined: 28 Jan 07 Posts: 5 Credit: 1,206 RAC: 0	Message 4422 - Posted: 16 Dec 2008, 22:26:32 UTC - in response to Message 4421. Last modified: 16 Dec 2008, 23:01:29 UTC Quote: "The Graphics in this one show the following: Total Credit: -5.6988E-05 RAC 5.3133E-315" Interesting: I now have a bunch of mammoths on the same machine, but running XP rather than Linux, and the display is correct.

Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0	Message 4423 - Posted: 17 Dec 2008, 9:52:18 UTC - in response to Message 4420. I have now had this task run for 58,307.80 seconds, or over 16 hours, generating just 1 decoy. They are getting longer.

AdeB Joined: 22 Dec 07 Posts: 61 Credit: 161,367 RAC: 0	Message 4424 - Posted: 17 Dec 2008, 19:04:41 UTC Another long task: over 10 hours for 1 decoy. What surprises me is that during those 10 hours BOINC never switched to another project. There was work for other projects, and [Switch between applications every] is set to 120 minutes. It looks like this task 'hijacked' my PC until it was finished. Should it behave like this? I also saw the strange values for Total Credit and RAC that Phil describes, also on a Linux PC. AdeB

feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0	Message 4425 - Posted: 17 Dec 2008, 20:30:47 UTC - in response to Message 4424. Last modified: 17 Dec 2008, 20:35:28 UTC Quote: "It looks like this task 'hijacked' my PC until it was finished. Should it behave like this?" Sometimes it can seem that way. Ralph has short (3 day) deadlines, and so can easily find itself running "at high priority" on the BOINC list. The other way this can happen is that BOINC tries to switch projects at checkpoints to preserve all the work possible (even for those not keeping tasks in memory), and some of these long-running models do not take checkpoints. So BOINC was sitting there thinking the task was just 10 min. from being done and, seeing no checkpoint to cut in on, just kept running it. Another way this can happen is if you rack up debt to Ralph when no work is available. BOINC knows it "owes" time to Ralph and so keeps running it.
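
The checkpoint explanation above can be sketched as a toy decision rule. This is only an illustration of the behaviour described in the post, not BOINC's actual scheduler code; the `RunningTask` type, its fields, and `should_switch` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    """Hypothetical, simplified view of a running task (not BOINC's data model)."""
    seconds_in_slice: float      # CPU time since the task was last scheduled
    checkpointed_in_slice: bool  # has the app written a checkpoint in this slice?
    high_priority: bool = False  # e.g. deadline pressure from a short deadline


def should_switch(task: RunningTask, switch_interval_s: float) -> bool:
    """Return True if this toy scheduler would hand the CPU to another project."""
    if task.high_priority:
        # A task running "at high priority" is not preempted in this sketch.
        return False
    if task.seconds_in_slice < switch_interval_s:
        # The "switch between applications" interval has not elapsed yet.
        return False
    # Assumption taken from the post: switch only where a checkpoint
    # exists, so that no work is lost.
    return task.checkpointed_in_slice


# A 10-hour task with a 120-minute switch interval and no checkpoints
# is never preempted under this rule.
long_task = RunningTask(seconds_in_slice=10 * 3600, checkpointed_in_slice=False)
print(should_switch(long_task, switch_interval_s=120 * 60))
```

Under these assumptions a model that never checkpoints keeps the CPU for its whole run, which matches the 'hijacked' behaviour reported above.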

Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0	Message 4426 - Posted: 18 Dec 2008, 1:18:58 UTC I have been doing both Ralph and Rosetta for quite some time now (was even number 1 in Ralph at one time), and I have noticed on Ralph over the last several batches that the Granted Credit equals the Claimed Credit and seems to be based on the BOINC benchmark system. Why is the credit system that Rosetta changed to, and that Ralph was also changed to 6 months to a year ago, now reverting to benchmarks? Because of this I am no longer getting due value for the time I spend crunching a work unit. I have seen other systems here on Ralph with huge benchmarks compared to mine getting well over a hundred credits for 3 hours of work (114 was one example I saw, for 13,400 seconds of work), while I do 6 or more hours of work and don't get anywhere near as much (from 55 to 90 for 4 to 7 hours). As a result, a number of users don't understand what I am complaining about when I say credit is low at Ralph and Rosetta: for me it is 10 to 12 cr/h at the moment, down from 14 to 15 a few months ago (which is still low compared to SETI and others), while they are getting up to 30 cr/h. Can this be looked at, please? If I do a 16 hour WU (like the current ones) I get 204 credits; others do a 3 hour WU and get 114. I don't see the fairness in that. My computers and results are easy to access and open to view.
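
For reference, the two rates compared in this post can be worked out from the figures it quotes (204 credits for a 16 hour WU, and 114 credits for 13,400 seconds). The helper name `credits_per_hour` is just an illustrative choice.

```python
def credits_per_hour(credits: float, cpu_seconds: float) -> float:
    """Convert granted credit and CPU time in seconds to credits per hour."""
    return credits / (cpu_seconds / 3600.0)


# Figures taken from the post above.
own_rate = credits_per_hour(204, 16 * 3600)  # the 16 hour mammoth WU
other_rate = credits_per_hour(114, 13_400)   # the example from another host

print(f"own host:   {own_rate:.2f} cr/h")
print(f"other host: {other_rate:.2f} cr/h")
```

These come out at roughly 12.75 and 30.6 credits per hour, in line with the "10 to 12" and "up to 30" cr/h figures mentioned in the post.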

feet1st Joined: 7 Mar 06 Posts: 313 Credit: 116,623 RAC: 0	Message 4427 - Posted: 18 Dec 2008, 2:03:05 UTC Last modified: 18 Dec 2008, 2:13:00 UTC My t328 mammoth is still on model 1 step 931,000 after nearly 17hrs ... and, of course, it's time to reboot to install MS fixes! ...wish me luck! [update] Interesting... it restarted on model 1, step 0 (yes, I waited for it to initialize and start incrementing steps) but with 2hr15min of CPU time on it. So, it's like it did take a checkpoint... only it didn't. Should be an interesting output file!

Path7 Joined: 11 Feb 08 Posts: 56 Credit: 4,974 RAC: 0	Message 4428 - Posted: 18 Dec 2008, 19:19:28 UTC Hello all, The following WU had run for about 4 hours when I had to reboot my PC due to an IE7 update: cc2_1_8_mammoth_mix_cen_cst_hb_t342__IGNORE_THE_REST_2G0QA_1_6636_1_0 The WU restarted from 0:00 hours runtime, finished within 4656 seconds (1.29 hours), and generated 1 decoy; valid. That is also nicely within my runtime preference of 7200 seconds. Why did this WU run for more than 4 hours on its first run? Have a nice day, Path7.

Stephen Joined: 17 Dec 08 Posts: 3 Credit: 6,566 RAC: 0	Message 4430 - Posted: 19 Dec 2008, 0:31:16 UTC I'm getting some odd behavior. * The CPU timer is sometimes getting reset. * I suspended all work units, then unsuspended them, and they all completed immediately.

Stephen Joined: 17 Dec 08 Posts: 3 Credit: 6,566 RAC: 0	Message 4431 - Posted: 19 Dec 2008, 1:48:18 UTC - in response to Message 4430. To elaborate on the problem: a WU will get to around 85% complete, and then progress stays the same while time to completion stays at around 10 minutes. If I suspend all tasks and resume, the "stuck" WUs complete.

zombie67 [MM] Joined: 8 Aug 06 Posts: 75 Credit: 2,396,363 RAC: 6,299	Message 4432 - Posted: 19 Dec 2008, 4:04:32 UTC - in response to Message 4426. Quote: "I have been doing both Ralph and Rosetta for quite some time now (was even number 1 in Ralph at one time), and I have noticed on Ralph over the last number of batch jobs that the Granted Credit equals the Claimed Credit and seems based on the Boinc Benchmark system. Why has the Credit system that Rosetta changed to and Ralph was also changed to 6 months to a year ago now reverting back to Benchmark ??? Based on this I am no longer getting due value for the time I spend crunching a work unit." How so? Your machines claim based on benchmarks. If your benchmarks are not tampered with, then you are getting exactly what you are due. You can't just look at run time. Some machines are faster than others. So a fast machine running 4 hours will have done more work than a slower machine running 4 hours. So the faster machine should be awarded more credits, even though the crunch time is equal. Reno, NV Team: SETI.USA
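
The point about fast and slow machines can be illustrated with a deliberately simplified proportional model. It is not BOINC's real credit formula: the function `claimed_credit`, the scaling constant `credits_per_unit`, and the benchmark scores below are all hypothetical values chosen only to show the proportionality.

```python
def claimed_credit(benchmark_score: float, cpu_hours: float,
                   credits_per_unit: float = 0.01) -> float:
    """Toy model: claimed credit scales with benchmark score times CPU time.

    credits_per_unit is an arbitrary, hypothetical scaling constant.
    """
    return credits_per_unit * benchmark_score * cpu_hours


# Two hypothetical hosts crunching for the same 4 hours.
slow_host = claimed_credit(benchmark_score=1000, cpu_hours=4)
fast_host = claimed_credit(benchmark_score=2500, cpu_hours=4)

print(slow_host, fast_host)
```

With equal run time, the host with the higher benchmark claims proportionally more in this model, which is the argument made in the reply above.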

Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0	Message 4434 - Posted: 19 Dec 2008, 13:36:48 UTC - in response to Message 4432. Quoting Message 4426: "I have been doing both Ralph and Rosetta for quite some time now (was even number 1 in Ralph at one time), and I have noticed on Ralph over the last number of batch jobs that the Granted Credit equals the Claimed Credit and seems based on the Boinc Benchmark system. Why has the Credit system that Rosetta changed to and Ralph was also changed to 6 months to a year ago now reverting back to Benchmark ??? Based on this I am no longer getting due value for the time I spend crunching a work unit." Quoting Message 4432: "How so? Your machines claim based on benchmarks. If your benchmarks are not tampered with, then you are getting exactly what you are due. You can't just look at run time. Some machines are faster than others. So a fast machine running 4 hours will have done more work than a slower machine running 4 hours. So the faster machine should be awarded more credits, even though the crunch time is equal." What I am referring to is not the fact that I am being granted a benchmark score (and no, my benchmarks are not tampered with, as you can tell from the low figures on my computers). It is the fact that the crediting system on Ralph and Rosetta was no longer based on the BOINC benchmark value, and therefore I should not be granted exactly what I claimed. The crediting system is supposed to be based on the number of decoys generated, as well as on when the result is returned and the length of processing: the first result returned in a batch gets what it claims, and each one after that gets some form of averaging to arrive at the final amount. At the moment it appears that all results are getting what they claim, which is not how the Rosetta/Ralph fixed-type crediting system was meant to work, unless of course I am somehow returning all my work before anyone else in my batch. I don't believe that, given my 6-hour runtime preference.

Conan Joined: 16 Feb 06 Posts: 364 Credit: 1,368,421 RAC: 0	Message 4435 - Posted: 19 Dec 2008, 13:41:15 UTC - in response to Message 4431. Quote: "to elaborate on the problem: a WU will get to around 85% complete , progress will stay the same. time to completion stays around 10 minutes. i suspend all tasks, resume then the "stuck" WUs will complete" With these current 'mammoth' work units I too have noticed that they get to a point with around 10 minutes to go and sit there for quite some time. The work units appear to be compiling all the data generated before finishing the task. I have had them run for over 16 hours for just 1 decoy, and they have finished OK with a valid result.

zombie67 [MM] Joined: 8 Aug 06 Posts: 75 Credit: 2,396,363 RAC: 6,299	Message 4436 - Posted: 19 Dec 2008, 15:59:31 UTC - in response to Message 4434. Last modified: 19 Dec 2008, 16:02:07 UTC Yes, I understand that the credit system changed back to pure benchmark. I noticed that too. But the unique method that used to be used here (and is still used on Rosetta) is also benchmark-based. It just averages with all the previous claims for that particular test. So in theory, as long as we don't mess with the benchmarks, the awarded credits should be about the same either way. Edit: I'm guessing the method changed back to the default when the server upgrade happened. Reno, NV Team: SETI.USA
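
A minimal sketch of the averaging idea described in these posts, assuming the grant is the running average of claimed credit per decoy multiplied by the number of decoys returned. The class name `RunningCreditAverage`, the `grant` method, and the exact rule are assumptions made for illustration; the project's real server-side logic is not shown anywhere in this thread.

```python
class RunningCreditAverage:
    """Toy model: grant credit from a running average of per-decoy claims."""

    def __init__(self) -> None:
        self.total_claimed = 0.0  # sum of all claims seen so far
        self.total_decoys = 0     # sum of all decoys returned so far

    def grant(self, claimed_credit: float, decoys: int) -> float:
        """Record a returned result and return the credit it would be granted."""
        self.total_claimed += claimed_credit
        self.total_decoys += decoys
        average_per_decoy = self.total_claimed / self.total_decoys
        return average_per_decoy * decoys


batch = RunningCreditAverage()
print(batch.grant(claimed_credit=100, decoys=1))  # first result gets its own claim
print(batch.grant(claimed_credit=50, decoys=1))   # later results are averaged
print(batch.grant(claimed_credit=120, decoys=2))
```

In this model the first result in a batch is granted exactly what it claims and later grants drift toward the batch average, so granted credit normally differs from claimed credit. If every grant equals its claim, as reported above, plain benchmark-based granting is the simpler explanation.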

Klimax Joined: 7 Nov 07 Posts: 9 Credit: 11,583 RAC: 0	Message 4440 - Posted: 27 Dec 2008, 6:13:54 UTC Hello, I have had failures of three lr6_score12_... WUs: https://ralph.bakerlab.org/result.php?resultid=1241954 https://ralph.bakerlab.org/result.php?resultid=1241953 https://ralph.bakerlab.org/result.php?resultid=1241939 Apparently some sort of crash (maybe a bug?).

Klimax Joined: 7 Nov 07 Posts: 9 Credit: 11,583 RAC: 0	Message 4441 - Posted: 27 Dec 2008, 12:18:19 UTC - in response to Message 4440. Quote: "Hello, I have failure of three lr6_score12_... WU https://ralph.bakerlab.org/result.php?resultid=1241954 https://ralph.bakerlab.org/result.php?resultid=1241953 https://ralph.bakerlab.org/result.php?resultid=1241939 apparently some sort of crash (maybe bug?)" Another three (all crashing in the same function): https://ralph.bakerlab.org/result.php?resultid=1241948 https://ralph.bakerlab.org/result.php?resultid=1241947 https://ralph.bakerlab.org/result.php?resultid=1241936

sslickerson Joined: 15 Feb 06 Posts: 17 Credit: 4,006 RAC: 0	Message 4447 - Posted: 14 Jan 2009, 17:14:07 UTC Hi there, I am reattaching to RALPH to try to figure out why my Windows Vista 64-bit laptop errors out on most minirosetta WUs. Will there be any more WUs coming up? Thanks, Timothy