Bug Reports for Minirosetta version 1.35

Message boards : RALPH@home bug list : Bug Reports for Minirosetta version 1.35

James Thompson
Volunteer moderator
Project developer
Project scientist

Joined: 7 Jun 06
Posts: 16
Credit: 268
RAC: 0
Message 4221 - Posted: 27 Sep 2008, 0:29:09 UTC

Please post issues/bugs relating to minirosetta version 1.35 here. Version 1.35 is an attempt to fix several issues from the 1.34 release, including bugs in hydrogen bonding and long run times for certain workunits.

The super-long run times are occasionally a problem with the jobs that we send out to you. So if you encounter such a job, please post to this thread so we can properly adjust the parameters of the job to reduce the run times.
Path7

Joined: 11 Feb 08
Posts: 56
Credit: 4,974
RAC: 0
Message 4222 - Posted: 28 Sep 2008, 11:28:45 UTC

Hello all,

The following WU ended with an error: -1073741819 (0xffffffffc0000005) Unhandled Exception Detected...
hombench_mtyka_foldcst_boinc_test2_foldcst_simple_t327___5001_2_1

Worth mentioning: after about 5 minutes of runtime I started boinc.scr, which started OK but then would not close.
Since I could not start Task Manager, I hit the "sleep" button on my keyboard. After waking the computer, Minirosetta seemed to carry on crunching. However, some 10 minutes later my firewall asked for permission for Minirosetta 1.35 to contact the Internet.

Have a nice weekend,
Path7.
Conan

Joined: 16 Feb 06
Posts: 364
Credit: 1,368,421
RAC: 0
Message 4225 - Posted: 29 Sep 2008, 10:41:49 UTC
Last modified: 29 Sep 2008, 10:45:08 UTC

1.35 still has the same problem that 1.34 had with the Granted credit being the same as the Claimed credit.

Also, RAC decay has still not been implemented for either participants or participants' computers, so the project stats currently have no merit.
It used to work some months ago, then stopped during one of the updates and has not worked since.
Pepo

Joined: 8 Sep 06
Posts: 104
Credit: 36,890
RAC: 0
Message 4232 - Posted: 30 Sep 2008, 15:16:45 UTC

One more -1073741819 (0xffffffffc0000005) error for hombench_mtyka_foldcst_boinc_test2_foldcst_simple_t317___4997_3_2, with large BOINC Windows Runtime Debugger output.

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x006AA016 read attempt to address 0x403E8A08

The call stack also looks odd: the access violation happened at the same address as in Path7's case, but the call stacks are definitely different.

Peter
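
For anyone puzzled by the two forms of the error code in these reports, here is a minimal Python sketch of how they relate. The exit code -1073741819 is a signed 32-bit integer; read as unsigned it is 0xc0000005 (the Windows access violation status shown in the debugger output above), and sign-extended to 64 bits it is 0xffffffffc0000005. The helper name `to_unsigned` is made up for this illustration and is not part of BOINC or Rosetta.

```python
def to_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed integer as an unsigned value of the given width.

    Hypothetical helper for illustration only.
    """
    return value & ((1 << bits) - 1)


exit_code = -1073741819  # as reported for the failed workunits

# 32-bit view: the Windows status code for an access violation
print(hex(to_unsigned(exit_code, 32)))

# 64-bit view: the sign-extended form shown in parentheses in the reports
print(hex(to_unsigned(exit_code, 64)))
```

So both numbers in the error line describe the same exception, just printed at different widths.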
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4234 - Posted: 3 Oct 2008, 19:11:11 UTC - in response to Message 4221.  
Last modified: 3 Oct 2008, 19:11:27 UTC

"So if you encounter such a job, please post to this thread so we can properly adjust the parameters of the job to reduce the run times."


...any progress on getting the tasks to automagically report back to you with the runtime for each model? It seems to be a key piece of data that has been neglected in the past. We've been seeing occasional 4-6 hour models for quite some time and have always assumed they were within the normal range.
James Thompson
Volunteer moderator
Project developer
Project scientist

Joined: 7 Jun 06
Posts: 16
Credit: 268
RAC: 0
Message 4236 - Posted: 5 Oct 2008, 23:33:50 UTC - in response to Message 4234.  

We do get that information from our workunits, but it can take a while for that data to come back, especially if the workunits take a long time to make a single decoy. To get a good estimate on the actual amount of time required to make a single decoy, we also need a lot of data because different people have very different computational power available for R@H.

We're doing an update today with more finely tuned parameters for job runtimes, so hopefully this won't be an issue soon. We're also reverting to an older version of the BOINC API, which should eliminate at least some of the access violation errors; the BOINC API used for v1.35 had some errors.

More soon. Thanks for your input!
mtyka
Volunteer moderator
Project developer
Project scientist

Joined: 19 Mar 08
Posts: 79
Credit: 0
RAC: 0
Message 4244 - Posted: 7 Oct 2008, 16:42:09 UTC - in response to Message 4236.  


Thanks for posting these issues!

The latest batch that has "looprelax_ccd_moves" was using our classic fullatom "relax" protocol, which can take a very long time on the larger proteins. I've now changed this to use our brand-new "fastrelax", which takes about 5-10 times less time, with comparable performance and better scaling. Good that we caught this before it went out to R@H!

I'm also adding a reporter column to our data files which should report individual decoy times. That way we should be able to catch "stray" or spurious run-time outliers.

Mike


http://beautifulproteins.blogspot.com/
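
As a rough illustration of how a column of per-decoy times could be screened for outliers, here is a minimal Python sketch. It is not the project's actual analysis: the function name, the threshold of 3.5, and the sample numbers are all assumptions made up for this example. It uses the median and the median absolute deviation (MAD) rather than the mean, so that one very slow decoy does not distort the baseline it is being compared against. Since hosts differ a lot in speed, a real check would presumably compare times per host or normalize them first.

```python
from statistics import median


def flag_slow_decoys(times_s, threshold=3.5):
    """Return indices of decoys whose run time is far above the median.

    Hypothetical helper: uses a modified z-score based on the median
    absolute deviation (MAD). Only slow outliers are flagged.
    """
    times = list(times_s)
    med = median(times)
    mad = median([abs(t - med) for t in times])
    if mad == 0:
        # All times (nearly) identical: nothing can be called an outlier.
        return []
    return [
        i for i, t in enumerate(times)
        if 0.6745 * (t - med) / mad > threshold
    ]


# Made-up per-decoy times in seconds; the last one is a 5-hour model.
sample_times = [610, 640, 655, 700, 720, 18000]
print(flag_slow_decoys(sample_times))
```

With the made-up sample above, only the last decoy is flagged.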
feet1st

Joined: 7 Mar 06
Posts: 313
Credit: 116,623
RAC: 0
Message 4259 - Posted: 10 Oct 2008, 20:56:42 UTC - in response to Message 4244.  

"I'm also adding a reporter column to our data files which should report individual decoy times. That way we should be able to catch "stray" or spurious run-time outliers."


Are you enhancing the watchdog as well, so that it "thinks" at the model level, rather than the task level, about when things have been running too long? In fact, perhaps the watchdog could abort the current model and continue running the rest of the task?
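
To make the idea concrete, here is a minimal Python sketch of a per-model time budget. It is only an illustration of the suggestion above, not how the Minirosetta or BOINC watchdog actually works; `run_task`, `ModelTimeout`, and the notion of a model as a list of step functions are all assumptions invented for this example. Each model gets its own deadline, and a model that overruns is abandoned while the remaining models in the task still run.

```python
import time


class ModelTimeout(Exception):
    """Raised when a single model exceeds its time budget (hypothetical)."""


def run_task(models, per_model_budget_s, clock=time.monotonic):
    """Run each model with its own deadline instead of one per-task limit.

    `models` is a list of (name, steps) pairs, where `steps` is a list of
    callables. The deadline is checked cooperatively between steps, so a
    single step that never returns would not be interrupted by this sketch.
    """
    completed, aborted = [], []
    for name, steps in models:
        deadline = clock() + per_model_budget_s
        try:
            for step in steps:
                if clock() > deadline:
                    raise ModelTimeout(name)
                step()
            completed.append(name)
        except ModelTimeout:
            # Give up on this model only; carry on with the rest of the task.
            aborted.append(name)
    return completed, aborted
```

The `clock` parameter is there only so the behaviour can be checked with a fake clock instead of waiting for real time to pass.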




©2024 University of Washington
http://www.bakerlab.org