Analyze SQL Server and Agent errorlogs

Reading the SQL Server and SQL Server Agent errorlog files from time to time is a good idea. You can find errors that shouldn't be there. You can find messages that indicates misconfiguration. Or security issues. Also, to know the frequency for some messages can be valuable. The problem is that nobody wants to open a text file with thousands and thousands of messages and read through it.

There are of course tools and software out there to help with this, but sometimes nothing beats actually looking at the errorlog file. Every time I have look in an error file on a production server, I find surprising messages. This is regardless of whatever monitoring software is in place. Every time.

When going through an errorlog file, we want to make sure we catch the serious errors and other unexpected messages. We also quickly want to discard messages that we aren't interested in. And you don't want to spend more than about 30 minutes per SQL Server the first time you go through its errorlog files. Over time, you probably trim things so this process is just a few minutes. Here you find how I handle these things.

The code
You first need to install the PrepareLogTables procedure.
Then you can use the code in here to analyze your SQL Server logs. 

This article applies to SQL Server 2008 and later, where not noted otherwise.
Article written: 2012-07-05.
Updated 2015-04-08. I moved delete of un-interesting messages from the prepare proc to the following analysis script. Makes it more flexible and transparent.

Intended audience
You are probably a SQL Server DBA. I assume that you understand how to execute SQL statements and can handle a simple SELECT statement.

This solution first creates a stored procedure that imports the SQL Server and SQL Server Agent errorlog files information into two tables (SqlLogs and AgentLogs). You then run a SELECT statement several times over this table, modifying the WHERE clause. The working process is something like:

  1. Run a SELECT without a WHERE clause
  2. Browse the result, and pick some message that happens very frequently
  3. Add this as a LIKE to the WHERE clause, with appropriate wildcards (%). See my commented examples.
  4. You now see only those messages.
  5. Decide what to do with them. This is of course the important part!!!
  6. Change the LIKE to NOT LIKE.
  7. You now see all messages except above, reducing the number of messages you see with each iteration.
  8. Repeat step 2-7 for as long as you want.
  9. It won't take long until you are down to a handful of messages (10-50), which you just go over from top to bottom.
Above process might seems like a lot of work, but you will quickly see that it won't take you long to go through an errorlog file, even if it has many many thousands of rows. In the code section, you will see examples of things I want to look for, or disregard (as commented LIKEs for the SELECT statement's WHERE clause).
Note: do not have more than one LIKE since ANDing two different LIKE will always produce a result for 0 rows! In general, you work with one message at a time, first LIKE to see all occurrences and then NOT LIKE to hide all occurrences. But you can of course modify the WHERE clause to your liking.

Keeping the errorlog file as small as possible
The fewer messages you have in the errorlog file, the better. There are several ways to keep the errorlog files as small as possible:
  • Make sure this doesn't happen again. I hope this is obvious. Say you have a bunch of failed login messages. Find out why these happen and correct the configuration for that client, or whatever the reason might be. The more serious the error is, the more rewarding this type of job is, especially if you can hunt down the reason, fix the problem and make sure it doesn't happen again.
  • Configure SQL Server to not write "successful backup" messages. You do this using trace flag 3226, more info here. I only do this where appropriate, typically when we don't use some 3:rd party backup software for SQL Server.
  • Configure SQL Server to not write certain messages to the eventlog. You do this using sp_altermessage. I hope it is obvious that you only what to do this if this message is indeed something you have no value at all for in the eventlog!
Where to you install the procedure and the tables?
Anywhere you like. I usually have a database named "sqlmaint" for these type of things, and this is what you find for the USE commands in my scripts. Change to your liking.

How about a centralized solution?
Sure, that is possible. The simple way is to use a multi-server query windows in SSMS. Or you can have some routines to import all logs into some central server and do the analysis there. I haven't had the need for a centralized solution so far, so I haven't spent time on that.

The PrepareLogTables procedure
The first parameter is how many SQL Server logs to import (1 means only the current one, 2 means the current and the one before, etc). The second parameter is how many agent log files to import.
Note that there is a DELETE statement for each log. This remove messages that I tend not to be interested in (like startup messages). These messages can have very valuable information per se, but from the perspective of going over the errorlog file and look for out-of-the-ordinary, I prefer to remove these. Modify this WHERE clause to your liking.