Scheduler update for more accurate job cache

dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 542,407
RAC: 0
Message 6806 - Posted: 2 May 2020, 20:04:35 UTC

We updated the scheduler to use the optional user-defined CPU run time preference as the estimated run time. This should produce a more accurate job cache. Please post any issues with this update in the discussion.
ID: 6806
xotwod
Joined: 30 Mar 20
Posts: 3
Credit: 6,084
RAC: 0
Message 6807 - Posted: 2 May 2020, 22:25:36 UTC - in response to Message 6806.  

I just got 159 Ralph@home tasks, each estimated to take ~1 hour, which does not seem possible. I'm almost certain I won't be able to meet the deadline. Also, the deadline shown on the RALPH website is different from what I see in my BOINC Manager.
ID: 6807
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 542,407
RAC: 0
Message 6808 - Posted: 2 May 2020, 23:02:27 UTC - in response to Message 6807.  

I just got 159 Ralph@home tasks, each estimated to take ~1 hour, which does not seem possible. I'm almost certain I won't be able to meet the deadline. Also, the deadline shown on the RALPH website is different from what I see in my BOINC Manager.


That doesn't sound good. How many CPUs? What do the website and the manager show for deadlines? Maybe I need to adjust the duration calculation a bit to prevent this from happening.
ID: 6808
xotwod
Joined: 30 Mar 20
Posts: 3
Credit: 6,084
RAC: 0
Message 6810 - Posted: 2 May 2020, 23:14:55 UTC - in response to Message 6808.  
Last modified: 2 May 2020, 23:18:23 UTC

The computer is https://ralph.bakerlab.org/show_host_detail.php?hostid=45448, so literally just an AMD Ryzen 5 3600X 6-Core Processor (12 processors).


Application: Rosetta 4.20
Name: rb_04_24_22842_22255__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_39_11723
State: Task suspended by user
Received: 2020-05-02 6:10:39 PM
Report deadline: 2020-05-05 6:10:40 PM
Estimated computation size: 40,000 GFLOPs
Executable: rosetta_4.20_windows_x86_64.exe

That is what BOINC Manager shows for that task. On the website, all of my tasks show a deadline of 11 May 2020, 22:10:40 UTC: https://ralph.bakerlab.org/results.php?hostid=45448&offset=100&show_names=0&state=1&appid=

This behaviour is consistent across all 159 tasks currently in progress on my account.
ID: 6810
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 542,407
RAC: 0
Message 6811 - Posted: 2 May 2020, 23:51:44 UTC - in response to Message 6810.  
Last modified: 2 May 2020, 23:52:49 UTC

I figured out the issue causing the discrepancy. It should be fixed now for new work units. A while back I added a "report_grace_period" param to the server config. It added 6 days to the normal 3 day deadline, and that longer deadline is what the website shows. I took that config param out, so there is no longer a grace period. Our R@h project does not have a grace period set in its config.
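
To make the arithmetic concrete, here is a minimal sketch of how the two deadlines diverge. The send time is an assumption inferred from the deadlines reported earlier in this thread, and the variable names are illustrative only; they are not actual BOINC server settings or code.

```python
from datetime import datetime, timedelta, timezone

# Assumed send time, inferred from the deadlines reported in this thread.
sent = datetime(2020, 5, 2, 22, 10, 40, tzinfo=timezone.utc)

normal_deadline = timedelta(days=3)  # the normal 3 day deadline
grace_period = timedelta(days=6)     # extra time added by the removed grace period param

# Deadline as the client sees it (no grace period).
client_deadline = sent + normal_deadline
# Deadline as the website showed it (deadline plus grace period).
website_deadline = sent + normal_deadline + grace_period

print(client_deadline.isoformat())   # 2020-05-05T22:10:40+00:00
print(website_deadline.isoformat())  # 2020-05-11T22:10:40+00:00
```

Under these assumptions the two values are 6 days apart, which matches the 5 May versus 11 May deadlines reported above.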

How many concurrent jobs are running on your host? 6? Assuming all jobs run for 1 hour, how many will not make the 3 day deadline? If necessary, I can add a factor that increases the estimated job duration, which will produce a smaller cache and help satisfy the deadlines.
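
The question above can be worked through with a rough calculation. This is only a sketch: it assumes the host runs nothing but these tasks around the clock and that every task takes exactly the same time. The function names are hypothetical.

```python
import math

def hours_to_clear(tasks, concurrent, hours_per_task):
    """Wall-clock hours needed to finish all tasks, run in full batches."""
    return math.ceil(tasks / concurrent) * hours_per_task

def tasks_missing_deadline(tasks, concurrent, hours_per_task, deadline_hours):
    """Number of tasks that cannot finish before the deadline."""
    batches_in_time = math.floor(deadline_hours / hours_per_task)
    return max(0, tasks - batches_in_time * concurrent)

DEADLINE_HOURS = 3 * 24  # the 3 day deadline

# 159 one-hour tasks with 6 or 12 running at once
print(hours_to_clear(159, 6, 1))    # 27
print(hours_to_clear(159, 12, 1))   # 14
print(tasks_missing_deadline(159, 12, 1, DEADLINE_HOURS))  # 0

# Same cache, but with 8 hour tasks (the R@h default run time)
print(tasks_missing_deadline(159, 12, 8, DEADLINE_HOURS))  # 51
```

With 1 hour tasks the whole cache fits comfortably within 72 hours. The same cache size with 8 hour tasks would not, which is why the estimated duration matters for how many jobs get cached.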
ID: 6811
xotwod
Joined: 30 Mar 20
Posts: 3
Credit: 6,084
RAC: 0
Message 6812 - Posted: 2 May 2020, 23:56:35 UTC - in response to Message 6811.  

12 jobs can run concurrently. After running for half an hour a task says it is around 3% complete, but it submits after an hour, so I guess they will all meet the deadline. My question now is: why would tasks submit at around 6% complete?
ID: 6812
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 542,407
RAC: 0
Message 6813 - Posted: 3 May 2020, 0:13:08 UTC - in response to Message 6812.  
Last modified: 3 May 2020, 2:07:43 UTC

The BOINC client's progress and remaining time estimates can be off and irregular. You can ignore those estimates and assume the jobs will run close to the CPU run time preference, which defaults to 1 hour on Ralph and 8 hours on R@h.

I updated the title of this news thread since it was a bit misleading initially. A run time estimate (job duration) is calculated within the scheduler code to determine how many jobs to send to a host (and thus how many jobs to cache). This estimate is now based on the user run time preference. This change does not affect the progress and remaining time estimates displayed by the client. These values can be off and irregular due to the random nature of the modeling algorithms and checkpoint frequency, and can be ignored.
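
As a rough illustration of the idea described above, the sketch below shows how a duration estimate based on the run time preference could translate into a job count. It is not the actual scheduler code: the function, its parameters, and the notion of a work request expressed in seconds are assumptions made for the example.

```python
import math

def jobs_to_send(requested_seconds, run_time_pref_hours, duration_factor=1.0):
    """Hypothetical: how many jobs fill a work request of the given size.

    requested_seconds   -- amount of work the host asks for (assumed input)
    run_time_pref_hours -- the user's CPU run time preference
    duration_factor     -- optional multiplier on the estimated duration
    """
    estimated_duration = run_time_pref_hours * 3600 * duration_factor
    return max(0, math.floor(requested_seconds / estimated_duration))

one_day = 24 * 3600
print(jobs_to_send(one_day, 1))                       # 24 (Ralph default, 1 hour)
print(jobs_to_send(one_day, 8))                       # 3  (R@h default, 8 hours)
print(jobs_to_send(one_day, 1, duration_factor=2.0))  # 12 (larger estimate, smaller cache)
```

A longer estimated duration means fewer jobs are sent for the same request, so the cache shrinks.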
ID: 6813
Michael E.
Joined: 29 Apr 20
Posts: 2
Credit: 14,831
RAC: 3
Message 6815 - Posted: 3 May 2020, 2:33:49 UTC

I am seeing a number of tasks sent 1 May that have a deadline of 4 May in BOINC Manager but show 10 May when viewed in the web-based account information.
The web interface for me is: https://ralph.bakerlab.org/results.php?userid=59270&offset=0&show_names=0&state=0&appid=

A screen cap of the tasks in the BOINC Advanced view is available at: http://www.wingnaprayer.golf/ralph@home-snap3.png

I do not understand the inconsistency.

I understand from working with Rosetta folks that the software takes some measurement once your device has processed a dozen or so tasks. Until then, my feedback is to limit the number of tasks and/or double the estimated time. See my feedback starting at https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=95513#95513 and the helpful replies.

I will likely need to run only ralph@home so that most of the tasks can complete if the actual deadline is May 4.

Mike
ID: 6815
dekim
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 20 Jan 06
Posts: 250
Credit: 542,407
RAC: 0
Message 6816 - Posted: 3 May 2020, 2:50:35 UTC - in response to Message 6815.  

Michael, please see my earlier post (Message 6811). This discrepancy was due to a server configuration setting that I recently removed, which should fix the discrepancy for new jobs. For new jobs, the deadlines should be 3 days and should match what is on the website.
ID: 6816
Michael E.
Joined: 29 Apr 20
Posts: 2
Credit: 14,831
RAC: 3
Message 6819 - Posted: 3 May 2020, 4:06:25 UTC - in response to Message 6816.  
Last modified: 3 May 2020, 4:12:07 UTC

Thank you dekim!

I did have to abort some 3 tasks because they would not finish in time.
ID: 6819
Dotsch
Joined: 4 Mar 06
Posts: 12
Credit: 11,931
RAC: 0
Message 6823 - Posted: 5 May 2020, 10:10:14 UTC
Last modified: 5 May 2020, 10:10:25 UTC

The 14 tasks on my system with a 1 hour estimated run time ran from 0.75 to 1.4 hours, with one exception: result ID 5075666, which ran 4 1/2 hours.
ID: 6823
Brian Nixon
Joined: 14 Apr 20
Posts: 4
Credit: 2,940
RAC: 0
Message 6848 - Posted: 18 Jul 2020, 17:30:11 UTC - in response to Message 6806.  

Does this rely on BOINC being able to determine a task duration correction factor to get the estimates right for non-default preferences? If so, I suspect you need to remove
<dont_use_dcf/>
from the project settings for that to be calculated and applied.
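
For readers unfamiliar with the duration correction factor (DCF), here is a simplified picture of how such a factor would scale a run time estimate on the client side. This is a sketch under stated assumptions, not BOINC's actual code: the function and parameter names are hypothetical, and the 10 GFLOPS host speed is a made-up figure. The 40,000 GFLOPs task size is the value reported earlier in this thread.

```python
def estimated_seconds(fpops_est, host_flops, dcf=1.0, dont_use_dcf=False):
    """Hypothetical estimate: task size divided by host speed, scaled by DCF.

    When dont_use_dcf is set, the correction factor is ignored entirely.
    """
    base = fpops_est / host_flops
    return base if dont_use_dcf else base * dcf

TASK_FPOPS = 40_000e9  # 40,000 GFLOPs, as shown in the task properties above
HOST_FLOPS = 10e9      # assumed host speed of 10 GFLOPS (illustrative only)

print(estimated_seconds(TASK_FPOPS, HOST_FLOPS))                              # 4000.0
print(estimated_seconds(TASK_FPOPS, HOST_FLOPS, dcf=2.0))                     # 8000.0
print(estimated_seconds(TASK_FPOPS, HOST_FLOPS, dcf=2.0, dont_use_dcf=True))  # 4000.0
```

In this picture, a host whose tasks consistently run longer than estimated would accumulate a DCF above 1, but that correction has no effect while the factor is disabled.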
ID: 6848
Brian Nixon
Joined: 14 Apr 20
Posts: 4
Credit: 2,940
RAC: 0
Message 6849 - Posted: 19 Jul 2020, 8:57:41 UTC - in response to Message 6848.  

(Reports of estimates being off, plus armchair analysis, over on the Rosetta forums: Why are my 'Remaining' time estimates so far off?)
ID: 6849



©2020 University of Washington
http://www.bakerlab.org