Thursday, January 2, 2014

TempDB Parasites!

Unless you happen to be Chuck Norris, each of your SQL Server instances has only one tempdb, which makes tempdb a single point of contention for the instance. What’s worse, there is a lot going on in tempdb that you may not be aware of. Let’s look at some things that can use tempdb while we’re not looking.
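Just to whet your appetite, here’s a minimal sketch (assuming you have VIEW SERVER STATE permission) that shows who is allocating tempdb pages right now. The internal object pages are the sneaky part: sorts, hash spills, spools and worktables all land there without ever looking like objects you created.

-- Sessions currently holding tempdb page allocations. User object pages
-- come from temp tables and table variables; internal object pages come
-- from sorts, hash spills, spools and worktables.
SELECT su.session_id,
       su.user_objects_alloc_page_count     AS user_pages,
       su.internal_objects_alloc_page_count AS internal_pages,
       s.login_name,
       s.program_name
FROM tempdb.sys.dm_db_session_space_usage AS su
JOIN sys.dm_exec_sessions AS s ON s.session_id = su.session_id
WHERE su.user_objects_alloc_page_count > 0
   OR su.internal_objects_alloc_page_count > 0
ORDER BY su.internal_objects_alloc_page_count DESC;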

Tuesday, November 5, 2013

Dos and Don’ts of Providing Exceptional Customer Service to Exceptional Customers

What is “exceptional” customer service? It’s difficult to define, but you know when you have received it. It is an experience in which a service representative helped change a stressful situation into a relaxing one. If you get on the phone with a service rep, or start an email chain to a support department about something that is driving you crazy, and a few hours later your entire day is better because you did it, you have experienced “exceptional” customer service.

What are “exceptional” customers? When I say exceptional customers here, I’m really thinking in terms of skill, training and experience. In my case, I’m generally dealing with DBAs and developers. Now, let’s take that a step further and say that I’m generally dealing with very experienced SQL Server DBAs and developers. Given this context, it’s much easier to understand what I mean by “exceptional.” These aren’t folks who just bought a new TV and are upset because it didn’t come with an HDMI cable. These are people who have spent years honing their craft and building their own library of skills and experience. They are indeed “exceptional.”

Following are some Dos and Don’ts to consider when attempting to provide exceptional service to exceptional customers:

Training

Do spend as much time as possible learning everything you can about whatever it is that you support, including subjects that are part of the same domain.

For example, if you support a tool like SQL Sentry, you will want to know as much as you can about SQL Sentry, Windows, Virtualization, SQL Server, SSAS, SSIS, SSRS, SharePoint and Oracle, and this is just the short list. You can break all of these down into the specific subsystems you would want to know deeply. Providing support to exceptional customers requires a deep understanding of what they do. You don’t have to know everything they know (which would be quite difficult considering the broad range of subjects), but you do need to understand their language.

Don’t always rely on your management or organization to decide where, when and how to train you.

A wise manager once told me that I should never count on anyone but myself to drive my career. The same applies to training. Continuously review yourself to know what training you need, and either seek it directly or ask for it.

The Audience

Do be aware of, and prepared for, the fact that you may be talking to someone who knows more than you do about several subjects.

Exceptional customers know their stuff. Don’t ever try to fake your way through a conversation with them. Be prepared to take a subject offline and do the research, so that you can communicate with them effectively about whatever it is.

Don’t make any assumptions about a customer’s background, experience, expertise or role in their organization.

This really speaks for itself. Making assumptions in general is something that should be avoided.

Acting Ability

Do learn to put any life and/or professional issues aside when working with a customer.

In any support role you have to have some acting ability, but even more so with exceptional customers. They will expect consistent and rational behavior, so you have to be able to set aside whatever else is going on and focus on the moment. This is not to say that you shouldn’t be yourself, but you may have to put on the face of the best “you” you can be, even if, on a personal level, you are not at 100%.

Don’t allow a bad mood to be reflected in your customer interactions.

It’s surprising how easily a negative tone can come across. For example, in an email, if you are normally very wordy, and suddenly you send a short response, it can be seen as carrying a negative tone. Don’t let the fact that you’re having a tough day affect your consistency.

Responsiveness

Do let customers know the status of your effort.

Your customers will appreciate a simple note from a real person saying that their issue is being reviewed or worked on. You may not have an answer right away, but get an initial non-automated response to them as soon as possible.

Don’t avoid communication simply because there is nothing new to report.

If you’re working on something long term, provide updates at reasonable intervals. Every day is too often, but every 2 weeks is too long. I tend to target the middle of the week on a weekly basis. That way you aren’t hitting them with it when their week is getting started, and you also aren’t sending it when they are starting to wind down. Even if you haven’t made any new progress, let them know you’re still there.

Measuring Performance

Do track customer satisfaction, accuracy of information, and efficiency.

It is important to know how you and your team are doing, but measure your efforts by the results being produced.

Don’t measure performance by numbers.

Measuring performance based on the number of “tickets” you close sets you up to create angry customers. It will ultimately motivate your team to cut corners and provide only the bare minimum level of service needed to close incidents. If you measure numbers at all, use those measurements for capacity planning.

Escalation

Do escalate by adding a person rather than handing off to a person.

If you need to get another person involved, do it by adding a person. This way at every escalation level you have folks that are familiar with the entire case. The customer will not feel like they’ve been transferred, and every support engineer involved continues to learn from the effort.

Don’t transfer.

Transferring between people and departments is very frustrating for customers. When is the last time you were happy about being transferred?

Service Limitation

Do know where your support ends, and where someone else’s should begin.

Unfortunately, we can’t always know everything or solve every problem. Some issues turn out to be bugs in a hosting platform, or otherwise completely outside of any help you can offer. It is important to recognize when you have reached this point, and to be prepared to help the customer with the next step, which may even be a referral to another service. If you have to make that referral, do everything you can to help them transition to that next step, and continue to follow up until everything is resolved, or you know they are in good hands elsewhere.

Don’t continue to spend your customer’s valuable time on issues far outside the scope of your offering.

When you have reached the end of your ability to provide help, do not keep hanging on, trying to solve something that you are not equipped to solve. Imagine how you would feel if your water heater stopped working, so you called in a plumber. The plumber discovered there was something wrong with your electricity and decided to make it his personal mission to fix your electricity. Three weeks later, when he finally gives up, you have to call an electrician and spend five times as much as you would have if the plumber had just referred you to an electrician in the beginning. For your customer’s sake and your own, go ahead and let them know when you are at the end of what you can offer.

Empathy

Do learn to see all sides of a situation, and put yourself on each side.

This can really make all the difference. If you can get into the customer’s shoes and truly understand what they are going through to the point that, even for just a moment, you feel it yourself, you will be in a position to provide exceptional support.

Don’t approach incidents from an angle of selfishness.

Never rush an incident because you have to get to a meeting, or because it is late on Friday, or for any other reason! If you are not prepared to be completely available to the customer for as long as you are needed, then you shouldn’t start on the incident yet. It is better for the customer to wait a bit longer than to have a bad support experience that sticks in their mind forever. If the work does have to wait, just make sure you respond to let the customer know their issue is in the hands of a real person.

Conclusion

If you want to provide exceptional customer service (which you should want to!), and you are working with experienced, exceptional customers, just remember that you need to be on your game as much as they are. They expect it, and they have earned it!

Until next time,
-JRH

Monday, July 22, 2013

My Perspective: The Top 5 Most Common Windows Problems That Might Prevent Monitoring SQL Server

Introduction

It’s been quite some time since I promised that I would write this list, so I’ll get to it without a lot of needless chat.

We all know that it is very important to monitor events and performance on SQL Server, but many times our well-meaning efforts are blocked by the host operating system itself. This is a list of the most common issues I run into here at SQL Sentry when troubleshooting the problems that keep folks from being able to fully monitor their database servers.

Note that everything on this list has something to do with a native piece of Windows, or with drivers written for Windows. There are many things that can reduce your ability to monitor a server, but this is a Windows-subsystem-only list. The basic idea is that these are things outside of SQL Server that you may want to look at if you’re having trouble collecting counters or other valuable system information.

Number 5: Cross Domain Security

In a lot of cases, we’re trying to access servers in a different domain than the one our user account belongs to. I run across this frequently when a DMZ-type configuration is involved. Typically systems engineers (SEs) will not set up any sort of trust between the corporate domain and the internal domain of a DMZ. I won’t go into the design principles of different network topologies or security models here, but suffice it to say that this is done for a very good reason.

If you can’t access performance counters or WMI because of this, the answer is pretty clear. You need to gain access to the other domain in some way.

The two most common recommendations I usually provide are to use a “pass-through” account, or to set up a leg of your monitoring environment inside the other domain.

By “pass-through” account I am referring to the age-old practice of creating a user account, with the same login name and password as yours, that lives on the server you’re trying to access. Windows has allowed this for as long as I can remember, but I wouldn’t trust that it will always be there. Newer concepts like UAC also complicate things when using this method, so I usually recommend moving into the other domain over pass-through.

Number 4: WMI Repository Corruption

In my opinion, the whole WMI stack is a bit of a train wreck for something that is so vital when it comes to system management and monitoring. Something as simple as a provider delivered by a new NIC driver or disk controller can corrupt the repository to the point that it can’t be used. I don’t see this as often since Windows Server 2008, but it does still happen.

It is also difficult to determine that you have corruption, because you will generally just get strange errors that don’t mean much on their own. If you suspect issues with the WMI repository, there are three tools you can use to help ferret it out:

  1. WBEMTest
  2. WMIDiag/winmgmt
  3. WMI Tracing

A lot of times the repository is not actually corrupt; the problem may just be a bad provider, which can be fixed by updating to the latest driver for one device or another. If you have indeed got a corrupt repository, though, you may need to rebuild it. You can find instructions for that here.

Number 3: Strict Firewall Rules

The Remote Registry service is needed to access performance counters on a remote Windows server. Remote registry access works over SMB, and SMB uses RPC. Therefore, besides having the Remote Registry service running, you need the ports required for RPC open in order to read performance counters remotely. This can get “interesting” when firewalls are involved, because RPC uses dynamic ports. No firewall admin is going to let you open the default dynamic port range for MSRPC through the firewall; the range differs depending on which version of Windows you’re dealing with, but it is quite large regardless.

If you’re interested in this topic, here are a couple of good reads on the subject:

  1. How RPC Works
  2. Setting Windows Dynamic Port Range

So, what needs to happen here is that all three parts (firewall, host, and target server) need to be in alignment on what range of dynamic ports can be used. If they are not, you can easily end up with RPC that does not work at all, or that works sometimes and fails when you least expect it.

If you know a firewall is involved, you really need to do some planning and get your SEs involved in your monitoring effort to ensure that everything in that pipeline gets configured correctly for you.

Number 2: Name Resolution Issues

This happens all the time. You can’t connect to WMI or Perfmon, but you know everything is configured correctly, and there are no network issues. You have all the rights you need, but something is still not working.

Open up a command prompt and run a ping on your host name or perform an nslookup. Better yet, run a tracert to ensure that you are not only getting to a server, but to the right server.

Countless times I’ve beaten my head on my desk wondering what the heck was wrong, only to find that when I finally tried using the IP address alone, everything was fine.

Sometimes DNS entries are changed without everyone involved being told. Sometimes the IP address is changed and not everyone is told. Sometimes there are even duplicate entries. Regardless of what caused it, the issue can be resolved by fixing DNS, using the correct server name, using the IP address directly, or sometimes by just running “ipconfig /flushdns” from a command prompt to refresh your local DNS cache.

Number 1: PerfLib Corruption

Without a doubt, the number one problem I see with monitoring Windows performance counters is when the counter repository is corrupted.

The performance counter system in Windows can be very touchy. Some of the things I have seen corrupt PerfLib counters are Windows Updates, SQL Server service packs and/or CUs, software installations, cluster failovers and power outages. There seems to be little rhyme or reason to what can bring down the performance counters we rely on so heavily to tell us which resources are in short supply on our servers.

This happens so often, in fact, that we have a frequently visited Q&A thread providing instructions on rebuilding your counter library, in whole or in part, when the need arises. You can find that here.

Honorable Mention: TCP Offloading

I wanted to include this one because it has really bitten me a few times. There are settings that allow certain network functions to be offloaded to your NIC to save CPU cycles, and there have been more than a few cases where having them enabled caused some really crazy behavior. For example, in one case the user was getting timeouts on the client, but there were never any queries running long enough on the server to cause the client to time out. After an embarrassingly long time, I came across this blog post. It wasn’t exactly my problem, but it was similar, and after some experimenting we discovered that disabling TCP Large Send Offloading solved the timeout problem.

If you are having dropped connections, strange timeouts, or slow responses that make absolutely no sense, this information might be for you.

Conclusion

So there we have my list of 5, or 6, very common issues related to Windows itself (or drivers written for Windows) that can really ruin your day if you need to monitor SQL Server performance or events from outside of SQL Server. I hope that someone can learn from my experiences and really save some time if they run into any of these.

Until next time,
-JRH

Thursday, March 14, 2013

SQL Saturday Richmond: Downloads Available

I want to thank everyone involved in the Richmond, VA SQL Saturday this past weekend. It was a great event, and I enjoyed both speaking and meeting with everyone.

For those who attended one of my sessions, I wanted to let you know that I’ve uploaded the slide decks and demo materials. Note that any small applications I built were compiled using .NET 4.5 and Visual Studio 2012. If you drop me a note on here or in an email, I’ll send you the source code. It really isn’t anything special, except for the one I built for wrecking tempdb. That one is actually sorta cool. ;)

Below are the links to where you can download the .zip files:

Thanks again everyone, and I hope to see some of you out at other upcoming events!

Monday, January 21, 2013

3 Reasons to Upgrade SQL Sentry Now

Something that always amazes me, no matter how long I have been in technology, is the rather large pool of folks who never get around to upgrading their tools. When it comes to SQL Sentry, we would love for you to upgrade, for your benefit and ours, and we work hard to make it easy to do. So here are some great reasons to upgrade your SQL Sentry environment just as soon as you are able.

3 – Help Us, Help You

This sounds cheesy, but it is 100% true. When the majority of customers are on the latest version and you are one or two major releases behind, supporting you will almost always be more difficult. Things change, sometimes major things, and having to virtually travel back in time to see how something used to work takes more time and effort. If you want help in the fastest possible time, please upgrade to the latest version by logging onto www.sqlsentry.net/portal.

2 – That Bug You Found Might be Fixed, or That Feature You Wanted Might be Complete

Unfortunately, no software is perfect. If it were, we might be running from Terminators right now! We take software defects seriously, and we consistently release incremental builds to fix bugs and provide minor improvements. We also take feedback from our customers very seriously, and when you ask for something, we will tell you when it is ready. Every new feature or defect fix is listed in the change list. If you have a question about something on the change list, just email us at support at sqlsentry.net. We are happy to explain in detail any bug fix or feature listed.

1 – You Already Paid for It

New version releases are rolled up under your annual SQL Sentry software support agreement. Besides giving you full access to our exceptional (and good looking!) support services, this covers all newly released SQL Sentry builds, from minor updates to full-blown major releases. Even if your maintenance contract expires tomorrow, you can upgrade today!

Honorable Mention – Do it for Kevin!

Every time someone needs support for something that has already been fixed, Kevin Kline (b|t) loses another hair! Think about Kevin’s head the next time you ignore the new version in the Version Checker!

Friday, August 17, 2012

My Perspective: The Top 5 Most Common SQL Server Performance Problems

What Prompted This Post?

As lots of you know already, Kevin Kline (b|t) has joined us here at SQL Sentry. I’ve followed Kevin myself for a good while, so I’m really excited to have him here with us.

One of the first things Kevin asked of me was to list the top 5 technical issues we see while providing customer service for SQL Sentry. For our Client Services team, technical support is far more than just supporting our own software products or managing licenses. Everyone on the team is deeply trained in SQL Server, Windows and Microsoft .NET. In fact, any trained member of our Client Services team should generally be able to walk into the role of a Windows sysadmin, SQL Server DBA or .NET application developer with very little adjustment effort.

When presented with a technical issue or question, we try to see past the surface, and read between the lines. Many technical issues or questions are not rooted in what seems apparent on the surface, and the Client Services team is expected to work to resolve the root problem rather than simply addressing the immediate or obvious symptom.

As a result of this expectation, we do come across a variety of technical issues with both Windows and SQL Server, and I thought that filling Kevin’s request for a "Top 5" would make for an interesting blog post. I’m starting with 5 SQL Server Performance issues, and I’ll have another post later on 5 general technical issues that may or may not have anything to do with SQL Server.

I’ll make this a countdown from 5 to 1, with 1 being the most common technical problem we see.

Number 5: Tempdb PAGELATCH Contention

Jonathan Kehayias (b|t) covers this topic very well in a great article on Optimizing tempdb configuration with SQL Server 2012 Extended Events, so I’m not going to attempt to go into it very deeply here, but I will talk about my experience with it.

I’m starting to see this more and more. It is usually with a system making heavy use of tempdb for some type of ETL process. This is especially common if it is an ongoing "real-time" style ETL process. I’m not saying data loads are the cause of this by the way; I’m just relaying my observations.

The symptoms of this can vary, but some things are always the same: high PAGELATCH waits in tempdb, and poor performance recorded for processes using tempdb. I’ll typically follow the waits to Top SQL in Performance Advisor and see lots of queries that use temp tables listed there. These queries usually run in milliseconds and should never be counted among the “Top SQL” for the server. This can leave people feeling like those queries are a large part of the problem, but that is not necessarily the case at all; the queries are the victims of the real problem.

Once I suspect this to be the case, I will usually jump to the Disk Activity tab in Performance Advisor to see how tempdb is configured. Most times I see the same thing: a busy tempdb with a single data file defined. From there I’ll usually recommend reconfiguring tempdb, and direct them to a resource like Jonathan’s article for more information.
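If you don’t have a dashboard handy, here’s a rough sketch of the same checks straight from the DMVs (assuming the standard tempdb database ID of 2):

-- Any tasks currently waiting on latches for tempdb pages? A steady stream
-- of waits on resources like 2:1:1 (PFS) or 2:1:3 (SGAM) points to
-- allocation page contention.
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH%'
  AND resource_description LIKE N'2:%';

-- And how many data files does tempdb actually have?
SELECT name, physical_name
FROM tempdb.sys.database_files
WHERE type_desc = N'ROWS';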

Number 4: Expecting Auto Update Statistics to Keep Statistics Updated

The problem here is that the threshold for triggering an auto statistics update ends up being the same in most cases, even for a very large table. Without going into a very deep explanation, the threshold is ~20% of the rows in the table, so on a really big table it takes a lot of data change to trigger an update. Kevin Kline has a nice, easy-to-follow explanation of this here as well.

The reason this makes the list is that DBAs often seem really surprised to find out that the auto update isn’t taking care of things the way the name implies. There are also many DBAs who believe statistics should be handled by their maintenance job, but after looking at that maintenance, it turns out they are doing index reorgs most of the time, and a reorg won’t update the statistics either (though a rebuild will). I also want to note here that if you are using the Fragmentation Manager feature in SQL Sentry 7.0 and higher, you can keep a running history of when your indexes were reorganized rather than rebuilt. This can help you decide whether the problem you’re seeing could be related to the auto update not happening.

The lesson here is really to keep an eye on statistics, and make sure they’re updated regularly, especially on large tables, which are becoming more and more common as time goes on. Another option here can be to use trace flag 2371 to actually change the formula used to trigger the update. The nicest explanation of this option I have found is at this blog post by Juergen Thomas.
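To see how stale your statistics actually are, here’s a hedged sketch. Note that sys.dm_db_stats_properties requires a fairly recent build (SQL Server 2008 R2 SP2 or 2012 SP1); on older versions, STATS_DATE alone will still give you the last update time.

-- Last update time, and rows modified since, for each statistic on the
-- user tables in the current database.
SELECT OBJECT_NAME(s.[object_id]) AS table_name,
       s.name                     AS stats_name,
       sp.last_updated,
       sp.[rows],
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.[object_id], 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;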

Number 3: The CXPACKET Wait Type

I would say that this is hands down the single most common wait type I see on larger SQL Server systems when someone asks me to look into query performance with them.

There is a lot of information out there on how to deal with this, but sadly I still see a lot of people make the initial assumption that the problem should be solved by setting MAXDOP to 1 for either the query or the entire server. More often than not, the problem can be handled by proper indexing or statistics maintenance. It could also be that the plan cached for the query is simply not optimal, in which case you can mark it for recompile using sp_recompile, add a recompile hint at the query level, or just evict the plan using DBCC FREEPROCCACHE with a plan handle. It is best to exhaust these options before deciding to set MAXDOP to 1, because you could be throwing away a lot of processing power without realizing it.
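For the plan eviction route, here’s a quick sketch; the search string is just a hypothetical placeholder for text from your problem query:

-- Find the cached plan for a suspect query...
SELECT cp.plan_handle, cp.usecounts, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE N'%YourSuspectQuery%';  -- hypothetical search string

-- ...then evict just that plan, leaving the rest of the cache alone
-- (substitute the actual plan_handle value returned above):
-- DBCC FREEPROCCACHE (0x06000500...);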

Paul Randal (b|t) has a great survey on his blog here that seems to support what I’m used to seeing as well. In fact, he’s the one who first taught me that MAXDOP 1 is not necessarily the answer to this.

Number 2: Misunderstood “Timeout Expired Prior to the Completion of…”

This one is huge. Outside of some very edge-case behavior, there are two basic types of timeouts you *might* deal with for SQL Server: connection timeouts and operation (or query) timeouts. In both cases, these are values set by the client connecting to SQL Server. On the server side there is a remote query timeout setting, but that is the edge case I mentioned, and I will leave it for another discussion.

I’m going to focus on operation timeouts, since they are the most common. Operation timeout errors from various software tools might be the most misunderstood situation I come across. The cause really boils down to one simple thing, though: the client executing the command has set a maximum amount of time that it will wait for the command to complete. If that maximum is reached before the command completes, the command is aborted and an error is raised by the client.

Many times a timeout error will induce panic mode, because the error can look a bit intimidating. The reality, though, is that this is not much different from hitting the stop button in SSMS because a query was taking too long. In fact, it will show up exactly the same way in a Profiler trace, with Error = 2 (Aborted).

So, what does a timeout like this really tell us? It tells us that queries are taking longer than expected. We should go into “performance tuning” mode rather than “something is broken” mode. The error information from the client is really just some good information on where you might start to focus your tuning efforts.
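As a first step in that tuning mode, even something as simple as this sketch will show you what is running right now and how long it has been at it:

-- What is executing right now, longest-running first? A good first look
-- when operation timeouts start showing up.
SELECT r.session_id,
       r.start_time,
       r.total_elapsed_time AS elapsed_ms,
       r.wait_type,
       st.[text]
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id <> @@SPID
ORDER BY r.total_elapsed_time DESC;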

If you receive timeout errors from the SQL Sentry monitoring service, and one of the servers you are monitoring is the source, this is not telling you that SQL Sentry is broken. This is SQL Sentry telling you that this server is experiencing performance issues. Again, it is time for “performance tuning” mode. These errors could be easily consumed internally, and retried later, but this would be doing our customers a huge disservice. We believe that you should know about *any* potential problem on your monitored server, even if it is SQL Sentry encountering the problem.

Incidentally, this is true for SQL Sentry just as it is for any other system that uses an RDBMS for a repository: your SQL Sentry database needs some TLC now and again. Without it you may indeed experience some timeouts from your SQL Sentry client. We spend a lot of time tuning our queries for performance before they ever make it out the door, but proper maintenance will ensure they continue to run as intended.

Number 1: Memory Pressure

This is the big one. As soon as Kevin mentioned wanting this list it’s the first thing that popped into my head. Not only because I see it so often, but also because it is so often mistaken for poor disk performance.

There are lots of caches in SQL Server, but the most well known is the data cache (aka the buffer pool). The easiest way to describe the data cache is that it is data stored in memory rather than persisted to disk. Being able to keep lots of data in memory long term is desirable, because working with data in memory is generally much quicker than having to perform physical I/Os.
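If you’re curious what is actually living in the data cache, here’s a minimal sketch (it scans the buffer descriptors, so it can take a moment on servers with a lot of memory):

-- How much of the buffer pool each database is currently holding.
-- Pages are 8 KB, so page count * 8 / 1024 gives MB.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024  AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;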

I could turn this post into a very long discussion on memory pressure in SQL Server at this point, but I promise I will try to avoid that. There is already a ton of information available on this subject, and that is not really the intent of this post. What I will say is that, usually, memory pressure manifests as a few different symptoms. When viewed individually, some of these symptoms can lead you to incorrect, and sometimes costly, conclusions.

The two misleading symptoms are that you may start to see higher than normal latency across the disk subsystem, and you may start to see abnormally high waits related to disk activity. If you look at nothing but these two symptoms, you may come to the conclusion that you need to work on your disk system.

This is why being presented with all relevant metrics on one dashboard is so important. You have to look at the bigger picture, and having the memory-related data available along with the disk activity and waits helps to paint a clearer picture of what is really going on.

Typically what I’ll see (along with the disk waits and disk latency) is a PLE (Page Life Expectancy) that is fairly low for the server in question. I describe it that way because what is good or bad for this value really “depends”: the larger your buffer cache is, the higher your “critical” threshold will be for PLE, because the more data there is to churn in and out of the buffer, the worse off you will be when the “churn” actually happens. Another consideration is NUMA. The way the PLE counter is calculated can cause this value alone to be very misleading when multiple NUMA nodes are involved, as described by Paul Randal in his blog post “Page Life Expectancy isn't what you think...”. Luckily, in SQL Sentry 7.0 and higher you can actually see PLE for the individual NUMA nodes in history mode, which makes this a bit less of a problem.
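Outside of Performance Advisor, you can also read PLE straight from the counters SQL Server exposes; the Buffer Manager row is the server-wide value, and each Buffer Node row is a single NUMA node:

-- Page life expectancy, server-wide (Buffer Manager) and per NUMA node
-- (Buffer Node). On Buffer Node rows, instance_name is the node number.
SELECT RTRIM([object_name]) AS counter_object,
       RTRIM(instance_name) AS node,
       cntr_value           AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy';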

I’ll usually also see consistently higher lazy writer activity and SQL Server page faults (SQL Server going to disk). Sometimes I’ll see what I call “buffer tearing”: the data buffer rising and falling frequently, creating a jagged (or torn) edge on the history chart in Performance Advisor. Finally, I may also see an abnormally large plan cache reducing the memory available for the data cache.

All of these things together spell memory pressure, and there are various ways to deal with them, but the important thing to note is that this is not a disk issue. That’s not to say your disk system is necessarily wonderful, but I wouldn’t call up your SAN guy and order a bunch of new hardware based on this situation. Once you get the memory pressure under control, SQL Server will not need to go to disk as much, and the few symptoms related to disk may disappear entirely!

The moral here is really to always consider the full picture of performance, because looking at one thing out of context could severely limit your options for a solution.

Honorable Mention: SQL Server Agent History Retention Settings Unlimited

We see this enough to include it in this list, and I think anyone that uses SQL Server Agent should be aware of it.

In SQL Server Agent Properties, under History, you can adjust retention settings.

[Screenshot: SQL Server Agent history retention settings]

For some reason, I’ve seen quite a few people set this to unlimited by unchecking both checkboxes. If you do this and you use Agent jobs frequently, eventually you’re going to run into problems with job history in msdb, because those tables aren’t really indexed very well. The settings shown above are generally fine for most cases, and if you’re using SQL Sentry Event Manager, you’re keeping this information in the SQL Sentry database anyway, so retaining it in msdb is just redundant.
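If the history tables have already ballooned, msdb ships with a purge procedure; here’s a hedged sketch that trims anything older than 30 days (adjust the cutoff to taste):

-- Purge Agent job history older than 30 days.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(DAY, -30, GETDATE());
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff;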

Conclusion

So there are my (current) top 5 most common SQL Server performance issues/topics. For #4 and #5, I actually had to run some numbers to find out what they were, but for the top three, I knew without having to consider it much at all.

Thanks for reading!

Tuesday, June 26, 2012

SQL Agent Alerts & User Defined Errors

Not too long ago I was working with a customer who was having trouble getting a SQL Server Agent alert to fire. At first they thought that Event Manager was misbehaving, but we discovered fairly quickly that the alert itself was never firing at all.

Why Wasn’t The Alert Firing?

With a little more digging I discovered that this was due to the configuration of the alert. The “General” settings were similar to the screenshot below:

[Screenshot: the alert’s General settings page]

The key thing to notice here is the “Severity” setting. Severities 0-10 are considered informational, or non-serious, non-system errors (http://msdn.microsoft.com/en-us/library/ms164086.aspx), and they are not written to the error log by default. Since alerts of type “SQL Server event alert” fire only on events that are logged, this is why we never see the alert fire.

So, How DO We Fire The Alert?

Here we were trying to fire the alert based on an error from a RAISERROR statement. The question from this point is: how exactly *can* we get this alert to fire? The answer is straightforward: we need to get the error logged.

This is easily done using the “WITH LOG” option of the RAISERROR statement. The only thing to keep in mind is that for Event Manager to be able to process the alert, we need to raise the error using a valid error number rather than just the generic user error of 50000 that is used by default. For this reason, we need to create a message prior to raising the error, and in my case I’ll remove it afterward. There is probably no need for the removal, but I’m just a little picky about keeping my system data clean. I’m also not building this for a real system, so away it goes as soon as I’m done with it.

Show Me!

A simple script like the following should do the trick:

-- Add my message so that I can have an error number
EXEC sp_addmessage @msgnum   = 50025,
                   @severity = 9,
                   @msgtext  = N'This is a user defined error!';

-- Use the message number that I just created
-- Sev 9 (http://msdn.microsoft.com/en-us/library/ms164086.aspx)
-- State 0 since we're not getting too fancy here.
RAISERROR (50025, 9, 0) WITH LOG;

EXEC sp_dropmessage @msgnum = 50025;

Above we use “sp_addmessage” to add a message for error number 50025. Next I raise the error with severity 9, specifying that it is to be logged. Finally I remove the message using “sp_dropmessage”, again because I just want to keep my system data clean.
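For completeness, the alert itself can also be created in T-SQL rather than through the GUI. Here’s a sketch, with a hypothetical alert name, keyed to the message number instead of a severity:

-- Create the alert for error 50025 (alert name is hypothetical).
-- @severity must be 0 when an explicit @message_id is supplied.
EXEC msdb.dbo.sp_add_alert
     @name       = N'User defined error 50025',
     @message_id = 50025,
     @severity   = 0,
     @enabled    = 1;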

Note: In SQL Server 2012 you can use the THROW statement to create an error with a custom number without creating a message record as follows:

          THROW 50025, N'This is a user defined error!', 0;

You should remember, though, that THROW always raises the error with severity 16 and provides no option for logging to the error log. Since only severities 19 through 25 are logged by default, as outlined at http://msdn.microsoft.com/en-us/library/aa937483(v=sql.80).aspx, an error raised with THROW cannot be used to fire these alerts.

Now I’ll just run this script a few times, and in the SQL Sentry Event Manager calendar I see exactly what we were hoping for:

[Screenshot: the user defined error events on the SQL Sentry Event Manager calendar]

What Now?

From here I can use the condition and action system in Event Manager to do all sorts of things when this alert fires. I can run a job, execute a process, email a person or group, execute a T-SQL script, or just log the event to the database.

This is basically how you can get any user defined message to fire an alert that can be monitored by SQL Sentry Event Manager.

Until next time,
-JRH