Overview
SQL Server keep statistics for where it waits. This statistics can be find using various DMVs, the most famous of them is probably sys.dm_os_wait_stats. This DMV has accumulated wait stats for each type since you started SQL Server. Analyzing wait stats can give you an insight for your SQL Server that can be beneficial when doing performance tuning and such work.
What is not kept automatically is how the wait stats changes over time. This utility does just that.
We have a trail of captured values for (by default) 3 days with a denser capture rate. This will be how often you schedule the capture job, for instance once per minute.
We also have a trail of captured values for (by default) one year with a more course capture rate. Byt default we keep only one capture per day for this part.
The code
Versions
SQL Server 2012 and later. The capturing part should work with 2008, some of the queries using LAG with the OVER clause which requires 2012.
Details
We create a few tables and a couple of Agent jobs:
- A schema named ws.
- Table ws.ignores, which has the wait stats that we aren't interested in.
- Table ws.types with the wait stats that exists. We don't want to log the name of the wait stats each time we capture the wait stats. This table has the name and a smallint column with the id for the wait stats.
- Table ws.the_log, which has the actual captured wait stats. We turn on row compression for this table if your SQL Server version/edition supportes it.
- Table ws.config, which has how many days back in time we want the dense and the course capture information. By default 3 and 365 days. Change in the scripr or later in the table if you want.
- We populate the ignore table, based on Glenn Berry's excellent SQL Server diagnostic queries.
- Then you created an Agent job which does the INSERT of the wait stats into the ws.the_log table. Schedule this as frequently you want, for instance once per minute.
- You also create an Agent job which deletes old data, schedule for instance once per day.
Then it is just a matter of query your logged data. I have included a couple of queries, but this part is currently work in progress.
Limitations and further work to be done
There are just a couple of queries in the script for analyzing the captured data.
Things that I'll probably do include:
A view with the top 10 wait stats, whcih can be used from other queries so we focus on the most interesting wait stats.
Perhaps also a table with further ignores. I.e., for these, the values are indeed captured, but we can use the table in a WHERE clause to ignore when querying the data. You might want to ignore some stuff for some analysis but still have the values in the log, for instance.
I need to do some thinking regarding how to display the difference since last capture. Is there a standard here? Per minute? Perhaps too course? Per day? How do we handle the dense and the course trail of captured information in this regard?
There are of course plenty more in this section. We also have visualizing of the information. Perhaps some queries that are tailored for visualizing, to be used in Excel etc?
Suggestions are appreciated here. Could be ready-made queries or just thoughts on the subject. You find my mail address here.