SQL Server writes messages to the EventLog when various things happen in the database engine. You can have SQL Server Agent to monitor the EventLog for some specific SQL Server errors, and let Agent can notify an operator (send e-mail) and/or start a job. The problem has always been to know what alerts to define - this article is an attempt to rectify that.
You might already have some monitoring application that has support for SQL Server, like OP5 ,HP Operations Manager or Microsoft Operations Manager. If you do and are happy with that solution, then stick with it. But if you don't have such a solution, or if you feel that your current software's support for SQL Server EventLog monitoring can be improved, then read on.
Stefan K., message 833
This article applies to SQL Server 2005 and later, where not noted otherwise.
Article written: 2009-05-29
2011-09-15: Cleaned up T-SQL code so you can execute all in one go
2012-07-03: Added message 833
I assume a reasonable knowledge level of SQL Server. You should be able to find your way around the tools and understand the most common terms and acronyms. If you don't meet those criteria, then I suggest you buy some tool which has an interface which is easier to use, hand this over to a senior DBA or perhaps hire a consultant to do the planning and implementation. If you want to use the T-SQL script at the end of this article, then you should be able to read and have a grasp of what it does.
The e-mail part
You probably want to send an email when the event occurs. I suggest you use the Database Mail support which was introduced in SQL Server 2005 - a pure SMTP implementation. I discourage you from using the old "SQL Mail" support since it uses MAPI. SQL Server Books Online describes how to configure Database Mail, so I won't talk about it here. Check out Dan Guzman's script to configure Database Mail using T-SQL instead of running the Wizard on each SQL Server Instance.
Frequency of EventLog polling
You can control this using a registry setting. Do this at own risk, it isn't supported or documented. The name of the registry key is EventLogPeekInterval. Where to find it depends on things like whether it is a default or a named instance, the instance name and also version of SQL Server. To put it bluntly, if you can't find it, then you shouldn't mess with it. The default value is every 20 seconds.
Defining the alert
You define an alert using the procedure msdb.dbo.sp_add_alert or SSMS (SQL Server Agent, Alerts folder). If you are about to use the T-SQL script and sp_add_alert, then please do read about the procedure in Books Online, and decide what values you want for each parameter. In the script below you will find what values I use, but that might not be the right values for you. I won't discuss the parameters here since they are documented in Books Online.
You also want to say what action to be performed. I will assume that you want to send an email (msdb.dbo.sp_add_notification). This requires that you configured the Database Mail support, configured Agent to use the Profile you created, re-started Agent, and finally configured an Operator. You will see that in the script below, I hard-wired the operator name to 'SQLAdmins'. Change this to your liking.
For the alert, you say whether you want to trigger on a certain error number or all errors with a certain severity level (I'm ignoring Perfmon and WMI alerts here). You can't have both. The difficult part is deciding on what to trigger on. You don't want to go too wide so you will be mail-bombed - but you don't want to miss important messages either.
One thing you will notice that if you trigger for a certain severity level that there are messages with that level that you may want to exclude. We used to be able to enter such "Non Alertable Errors" in the registry key NonAlertableErrors (as described here). This functionality seems to have been removed, I'm afraid. Let me know if your tests show that the reg key actually does anything in 2005 or later.
How will this evolve and how can you help?
In a blog post I wrote 2009-02-23, I had hopes for some community project to get started for this. That didn't happen, so I decided to write this article. I don't expect that I managed to find the "right" messages. If you have some messages you find obvious should be there or vice versa, then let me know. For instance, there might be some error which always follow some particular other error - meaning it isn't necessary to define alerts for both. Just keep in mind that we cannot exclude certain errors if we included the whole severity level for that error. I expect that many of us will end up with some differences in their alert definitions - one size doesn't fit all. But I hope that the suggested definitions on this page can serve as a good starting point. Please send me an email if you find obvious things to add or remove from the list. If I use your recommendation, then I'll add your name (and URL if you provide) to the Acknowledgements section, unless you ask otherwise. You find my mail address here.