Thursday, June 12, 2014

Gap Analysis

Today I want to take a minute to talk about gaps. Not the kind where you find trendy clothes for the young (and young at heart) folks, but the kind that you might see in a performance chart on the SQL Sentry Performance Advisor Dashboard.

I’ve found myself explaining this to a few customers in recent months, and I wanted to get this down someplace so that others may be able to find it on their own, or if they do ask I have reference material for them.

The general feeling I get from the customer base is that seeing gaps on these charts is very bad, and when you see them something must be going terribly wrong with your monitoring environment, or you are losing valuable information. After reading this post, I hope that you can take one idea away from it that will stick with you. That one idea is that gaps are good!

What are the gaps?

On the Performance Advisor history view, the area and line charts will have a hole or “gap” any time there is more than 90 seconds between 2 data points. This is explained in greater depth in one of our Q&A threads.

This could go on for quite some time, or it could be just one “slot” depending on exactly why the data points are going missing. Either way, it will appear similar to the shot below:

image

Gaps are good!

As I mentioned once already, the knee-jerk reaction is to think that you’ve lost some valuable information during that time. The reality is just the opposite though! These gaps are a well planned feature, not a problem.

There are a finite number of reasons that this can happen, and they generally involve something that needs either immediate attention, or will need attention soon.

  1. SQL Sentry could have been restarted, or went offline for some reason
    • This will manifest as a gap in ALL the charts, for every monitored server in a given site
  2. You’ve experienced transient network or network interface problems
    • Network communications are important to the monitoring topology. If the network or NIC is down, collections can and will be missed, and gaps will appear if it stays down long enough
  3. The monitored server experienced severe resource contention
    • SQL Sentry uses less than 1% of system resources on a monitored server in general for performance monitoring. If the server is under so much stress that it is not even able to answer a request for performance data, there is something very wrong. You should begin troubleshooting by using SQL Sentry to determine what was happening on the server just before, and just after the gap. In this case, the gap is quite literally telling you something important, by displaying nothing
  4. Writing to the SQL Sentry repository could be taking too long
    • SQL Sentry’s collection services uses a series of write queues for sending batches to the repository. These write queues are capped at a maximum depth. If the maximum depth is exceeded, new values will be dropped, ultimately resulting in gaps. This is a clear indicator that your current settings are too aggressive, maintenance and or tuning settings on the repository instance needs to be done, or you need to scale or redistribute your monitoring environment to allow for greater throughput to the repository database
  5. You experienced a cluster or AG failover
    • Depending on how long the failover took, you may see a gap in charts for the Windows Server Failover Cluster (WSFC), SQL Server Failover Cluster Instance (FCI) or both

These are all the reasons. If you experience chart gaps, and you contact SQL Sentry support, these are the things the support engineer is going to be looking for to start. Not every single case is this simple, as there are some edge cases, but the vast majority of reports concerning chart gaps turn out to be one of these things.

Why are gaps good?

In most reports, area and line charts will simply continue on from data point to data point. Performance Advisor is designed with the idea that it is supposed to be able to collect performance metrics at specified intervals. In the event that enough of those metrics are not collected, Performance Advisor will make it overly obvious that something is wrong, by creating a gap in the history charts. If we simply connected the dots, you would actually never even be aware, in most cases, that something caused data points to go missing. You would see a full chart, and you would have no reason to look any further. We believe that would be deceptive as to the true performance history of the server you are monitoring. The gaps also provide you a way to be more effective as a DBA since you will generally see them before you start getting calls about outages from customers and/or users.

Conclusion

If you are a SQL Sentry user, and you have experienced these chart gaps before, I hope that you will see them in a new light after having read this. We created the gaps as a way to provide you with actionable information, even though we have no real data for that time period.

Until next time,
-JRH

Wednesday, May 14, 2014

Testing RESTful WEB APIs with Fiddler

I wear a lot of hats these days, and unfortunately I don’t have my web developer hat on every day. Thus, I am posting this as much for myself as for anyone else, so that I will always have an easy way to find it the next time I need it, which will surely happen again.

These days it is very common to be working on an API that you just don’t have an interface for yet. There are plenty of ways to test APIs, but a tool that I like to use when unit testing is Fiddler. This tool has been around for quite a while (10ish years?), was created by Eric Lawrence, and is currently owned by Telerik (whom I have no affiliation with). It’s really great for watching all your web traffic, among other things.

Today I’m going to demonstrate how to test a RESTful API on HTTP using Fiddler.

For the record, I’m making some fairly common assumptions. I’m assuming everything is in JSON. It doesn’t have to be, but in almost every case these days it is, so it is a safe assumption. I’m also assuming that if you’re reading this you have a basic understanding of how an HTTP server works and what JSON is, so I don’t plan on explaining any of that. Remember this is a selfish post for my own reference later. :)

So, first thing you need is Fiddler. Go download it, and install it. It’s free, and I linked it in the first paragraph. In case you missed that though, it’s here.

Now you need an API to test. Normally you will be writing one, but I’m going to use a free one here so that everyone can try it. This is from the folks at World Weather Online, where I registered to use their free API that will get local weather data for the postal code I send in.

First the easy way, doing a get.

Here is the resource:
http://api.worldweatheronline.com/free/v1/weather.ashx

The Postal code I want is: 28078 (Huntersville, NC)

My APP Key for the API (5 calls per second, 500 call per hour allowed): baey8kmn59dqtesenfvy24y6

And so the final URL that I use to get my data is:
http://api.worldweatheronline.com/free/v1/weather.ashx?q=28078&format=json&num_of_days=1&key=baey8kmn59dqtesenfvy24y6

Notice I’m also telling them I want JSON format, and I want one days worth of data.

In fiddler, go to the “Composer” tab. Make sure “GET” is selected in the dropdown on the left, and paste the URL in the wide textbox in the center.

image

Then press execute! What you want is the 200 response code (aka thumbs up) on the left panel:

image

Now double click on that row, and you’ll get a world of information to review in Fiddler:

image

But, what if the API accepts POST or PUT and I have to send parameters. Well this is what I always have to look up myself, so you would follow a slightly different procedure.

Now, this API doesn’t support this, so you can’t really try it. generally POST is making a change to something, or adding something, so keep that in mind.

You would take the base URL, something like: http://api.worldweatheronline.com/free/v1/weather.ashx

Then you would change from “GET” to “POST” in Fiddler:
image

Then you need to enter your some request headers:
image

Finally, format your parameters as a JSON object in “Request Body”:
image

From here, it’s the same. Click execute and hope for the 200 response code. If you get something else (I think for this one you will get a 403, which is access denied). Even that tells you what you want to know though, whether your post worked, and why not if it didn’t.

So here it is, any time I want I can come back and look at what the headers are supposed to be (that’s the part I always forget…)

Until next time,
-JRH

Thursday, January 2, 2014

TempDB Parasites!

Unless you happen to be Chuck Norris, your SQL Server instances only have one tempdb. That makes tempdb a single point of contention for an instance. What’s worse is that there are a lot of things going on in tempdb that you may not be aware of. Let’s look at some things that can use tempdb while we’re not looking.

Tuesday, November 5, 2013

Dos and Don’ts of Providing Exceptional Customer Service to Exceptional Customers

What is “exceptional” customer service? It’s difficult to define, but you know when you have received it. It is an experience in which a service representative helped change a stressful situation into a relaxing one. If you get on the phone with a service rep, or start an email chain to a support department about something that is driving you crazy, and a few hours later your entire day is better because you did it, you have experienced “exceptional” customer service.

What are “exceptional” customers? When I say exceptional customers here I’m really thinking in the context of skill, training and experience. In my case, I’m generally dealing with DBAs and developers. Now, let’s take that a step further and say that I’m generally dealing with very experienced SQL Server DBAs and developers. Given this context, it’s much easier to understand what I mean by “exceptional.” These aren’t folks who just bought a new TV, and are upset because it didn’t come with an HDMI cable. These are people who have spent years shaping their craft, and building their own library of skills and experience. They are indeed “exceptional.”

Following are some Dos and Don’ts that can be considered when attempting to provide exceptional service to exceptional customers:

Training

Do spend as much time as possible learning everything you can about whatever it is that you support, including subjects that are part of the same domain.

For example, if you support a tool like SQL Sentry, you will want to know as much as you can about SQL Sentry, Windows, Virtualization, SQL Server, SSAS, SSIS, SSRS, SharePoint and Oracle, and this is just the short list. You can break all of these down into the specific subsystems you would want to know deeply. Providing support to exceptional customers requires a deep understanding of what they do. You don’t have to know everything they know (which would be quite difficult considering the broad range of subjects), but you do need to understand their language.

Don’t always rely on your management or organization to decide where, when and how to train you.

A wise manager once told me that I should never count on anyone but myself to drive my career. The same applies to training. Continuously review yourself to know what training you need, and either seek it directly or ask for it.

The Audience

Do be aware of, and prepared for, the fact that you may be talking to someone who knows more than you do about several subjects.

Exceptional customers know their stuff. Do not ever try and fake your way through a conversation with them. Be prepared to take a subject offline, and do the research in order to be able to effectively communicate with them about whatever it is.

Don’t make any assumptions about a customer’s background, experience, expertise or role in their organization.

This really speaks for itself. Making assumptions in general is something that should be avoided.

Acting Ability

Do learn to put any life and/or professional issues aside when working with a customer.

In any support role you have to have some acting ability, but even more so with exceptional customers. They will expect consistent and rational behavior, so you have to be able to forget whatever else is going on, and focus on the moment. This is not to say that you shouldn’t be yourself, but you may have to put on the face of the best “you” you can be, even if on a personal level, you are not at 100%.

Don’t allow a bad mood to be reflected in your customer interactions.

It’s surprising how easily a negative tone can come across. For example, in an email, if you are normally very wordy, and suddenly you send a short response, it can be seen as carrying a negative tone. Don’t let the fact that you’re having a tough day affect your consistency.

Responsiveness

Do let customers know the status of your effort.

Your customers will appreciate a simple note from a real person saying that their issue is being reviewed or worked on. You may not have an answer right away, but get an initial non-automated response to them as soon as possible.

Don’t avoid communication simply because there is nothing new to report.

If you’re working on something long term, provide updates at reasonable intervals. Every day is too often, but every 2 weeks is too long. I tend to target the middle of the week on a weekly basis. That way you aren’t hitting them with it when their week is getting started, and you also aren’t sending it when they are starting to wind down. Even if you haven’t made any new progress, let them know you’re still there.

Measuring Performance

Do track customer satisfaction, accuracy of information, and efficiency.

It is important to know how you and your team are doing, but measure your efforts by the results being produced.

Don’t measure performance by numbers.

Measuring performance based on the number of “tickets” you close is setting you up to create angry customers. It will ultimately motivate your team to cut corners, and only provide the bare minimum level of service to close incidents. If you measure numbers at all, you should use these measurements for planning capacity.

Escalation

Do escalate by adding a person rather than handing off to a person.

If you need to get another person involved, do it by adding a person. This way at every escalation level you have folks that are familiar with the entire case. The customer will not feel like they’ve been transferred, and every support engineer involved continues to learn from the effort.

Don’t transfer.

Transferring between people and departments is very frustrating for customers. When is the last time you were happy about being transferred?

Service Limitation

Do know where your support ends, and someone else’s should begin.

Unfortunately, we can’t always know everything or solve every problem. Some issues turn out to be bugs in a hosting platform, or otherwise completely outside of any help you can offer. It is important to recognize when you have reached this point, and to be prepared to help the customer with the next step, which may even be a referral to another service. If you have to make that referral, do everything you can to help them transition to that next step, and continue to follow up until everything is resolved, or you know they are in good hands elsewhere.

Don’t continue to spend your customer’s valuable time on issues far outside the scope of your offering.

When you have reached the end of your ability to provide help, do not keep hanging on trying to solve something that you are not equipped to solve. Imagine how you would feel if your water heater stopped working, so you called in a plumber. The plumber discovered there was something wrong with your electricity and decided to make it his personal mission to fix your electricity. Three weeks later, when he finally gives up, you have to call an electrician and spend 5x as much as if the plumber would have just referred you to an electrician in the beginning. For your customer’s sake and your own, go ahead and let them know when you are at the end of what you can offer.

Empathy

Do learn to see all sides of a situation, and put yourself on each side.

This can really make all the difference. If you can get into the customer’s shoes and truly understand what they are going through to the point that, even for just a moment, you feel it yourself, you will be in a position to provide exceptional support.

Don’t approach incidents from an angle of selfishness.

Never rush an incident because you have to get to a meeting, or because it is late on Friday, or for any reason! If you are not prepared to be completely available for the customer, for as long as you are needed, then you shouldn’t start on the incident yet. It is better to have to wait a bit longer than to have a bad support experience that sticks in your mind forever. If you have to wait, just make sure you respond to let the customer know their issue is in the hands of a real person.

Conclusion

If you want to provide exceptional customer service (which you should want to!), and you are working with experienced, exceptional customers, just remember that you need to be on your game as much as they are. They expect it, and they have earned it!

Until next time,
-JRH

Monday, July 22, 2013

My Perspective: The Top 5 Most Common Windows Problems That Might Prevent Monitoring SQL Server

Introduction

It’s been quite some time since I promised that I would write this list, so I’ll get to it without a lot of needless chat.

We all know that it is very important to monitor events and performance on SQL Server, but many times our well meaning paths are blocked by the host operating system itself. This is a list of the most common issues I run into here at SQL Sentry when troubleshooting issues that keep folks from being able to fully monitor their database server.

Note that everything on this list has something to do with a native piece in Windows, or drivers written for Windows. There are many things that can reduce your ability to monitor a server, but this is a Windows subsystem only list. The basic idea is that these are things outside of SQL Server that you may want to look at if you’re having trouble collecting counters or other valuable system information.

Number 5: Cross Domain Security

In a lot of cases, we’re trying to access servers in a different domain than our user account belongs to. I run across this frequently when a DMZ type configuration is involved. Typically SEs will not set up any sort of trust between the corporate domain and the internal domain of a DMZ. I won’t go into design principles of different network topologies or security models here, but suffice to say that it is for a very good reason.

If you can’t access performance counters or WMI because of this, the answer is pretty clear. You need to gain access to the other domain in some way.

The two most common recommendations I usually provide are to use a “pass-through” account, or to set up a leg of your monitoring environment inside the other domain.

By “pass-through” account I am referring to the age old act of having a user account with the same login name and password as yours that lives on the server you’re trying to access. Windows has allowed this for as long as I can remember, but I also wouldn’t trust that it will always be there. Newer concepts like UAC also complicate things when using this method, so I usually recommend moving into the other domain over pass-through.

Number 4: WMI Repository Corruption

In my opinion, the whole WMI stack is a bit of a train wreck for something that is so vital when it comes to system management and monitoring. Something as simple as a provider delivered by a new NIC driver or disk controller can corrupt the repository to the point that it can’t be used. I don’t see this as often since Windows Server 2008, but it does still happen.

It is also difficult to determine that you have corruption because you will generally just get strange errors that don’t mean a lot just by reading them. If you suspect issues with the WMI repository, there are three tools you can use to help ferret it out.

  1. WBEMTest
  2. WMIDiag/winmgmt
  3. WMI Tracing

A lot of times, if the repository is not actually corrupt, it may be just a bad provider that can be fixed by updating to the latest driver for one device or another, but if you have indeed got a corrupt repository you may need to rebuild it. You can find instructions for that here.

Number 3: Strict Firewall Rules

Remote registry is needed to access performance counters on a remote Windows server. Remote registry works over SMB. SMB uses RPC. Therefore, besides having the Remote Registry Service running, you need to have the ports needed for RPC open to be able to read performance counters remotely. This can get “interesting” when firewalls are involved, because RPC uses dynamic ports. No firewall admin is going to let you open the default dynamic port range for MSRPC through the firewall, which is different depending on what version of Windows you’re dealing with, but is quite large regardless.

If you’re interested in this topic, here are a couple of good reads on the subject:

  1. How RPC Works
  2. Setting Windows Dynamic Port Range

So, what needs to happen here is that all three parts (firewall, host and target server) need to be in alignment with what range of dynamic ports can be used. If they are not in alignment you can easily have RPC that just does not work at all, or that works sometimes, and fails when you least expect it.

If you know a firewall is involved, you really need to do some planning and get your SEs involved in your monitoring effort to ensure that everything in that pipeline gets configured correctly for you.

Number 2: Name Resolution Issues

This happens all the time. You can’t connect to WMI or Perfmon, but you know everything is configured correctly, and there are no network issues. You have all the rights you need, but something is still not working.

Open up a command prompt and run a ping on your host name or perform an nslookup. Better yet, run a tracert to ensure that you are not only getting to a server, but to the right server.

Countless times I’ve beat my head on my desk wondering what the heck is wrong, and when I finally tried using only the IP address everything was fine.

Sometimes DNS entries are changed without everyone involved being told. Sometimes the IP address is changed and not everyone is told. Sometimes there are even duplicate entries. Regardless of what caused it, the issue can be resolved by just fixing DNS, using the correct server name, using the IP directly, or sometimes you can just run “ipconfig /flushdns” from a command prompt to refresh your local DNS cache.

Number 1: PerfLib Corruption

Without a doubt, the number one problem I see with monitoring Windows performance counters is when the counter repository is corrupted.

The performance counter system in Windows can be very touchy. Some of the things that I have seen cause corruption in PerfLib counters are Windows Updates, SQL Server Service Packs and/or CUs, Software Installations, Cluster Failovers and Power Outages. There really seems to be not much rhyme or reason to what can bring down the performance counters we rely on so much to be able to tell what resources we’re in short supply of on our servers.

This happens so much in fact, that we have a frequently visited Q&A thread that provides instructions on rebuilding your counter library in whole or in part when the need arises. You can find that here.

Honorable Mention: TCP Offloading

I wanted to include this one, because it has really bitten me a few times. There are some settings that allow certain functions to be offloaded to your NIC to save CPU cycles. There have been more than a few cases where having these enabled has caused some really crazy behavior. For example, in one case the user was getting timeouts from the client, but there were never any queries running long enough on the server to cause the client to timeout. After an embarrassingly long time, I came across this blog post. It wasn’t exactly my problem, but it was sort of similar. After some experimenting we discovered that disabling TCP Large Send Offloading solved the timeout problem.

If you are having dropped connections, strange timeouts, or slow responses that make absolutely no sense, this information might be for you.

Conclusion

So there we have my list of 5, or 6, very common issues related to Windows itself (or drivers written for Windows) that can really ruin your day if you need to monitor SQL Server performance or events from outside of SQL Server. I hope that someone can learn from my experiences and really save some time if they run into any of these.

Until next time,
-JRH

Thursday, March 14, 2013

SQL Saturday Richmond: Downloads Available

I want to thank everyone involved in the Richmond, VA SQL Saturday this past weekend. It was a great event, and I enjoyed both speaking and meeting with everyone.

For those that may have attended one of my sessions, I wanted to let you know that I’ve uploaded the slide decks and demo materials. Note that any small applications I built were compiled using .NET 4.5 and Visual Studio 2012. If you drop me a note on here or in an email I’ll send you the source code. It really isn’t anything special, except for the one I built for wrecking tempDB. That one is actually sorta cool.Winking smile

Below are the links to where you can download the .zip files:

Thanks again everyone, and I hope to see some of you out at other upcoming events!

Monday, January 21, 2013

3 Reasons to Upgrade SQL Sentry Now

Something that always amazes me no matter how long I have been in technology is the rather large pool of folks that never make it around to upgrading their tools. When it comes to SQL Sentry, we love for you to upgrade for your benefit and ours, and we really work hard to make it easy to do so. So here are some great reasons to upgrade your SQL Sentry environment just as soon as you are able to.

3 – Help Us, Help You

This sounds cheesy, but it is 100% true. When the majority of customers are on the latest version, and you are 1 or 2 major releases behind it will almost always be more difficult to support. Things change, sometimes major things, and having to travel virtually back in time to see how it used to work takes more time and effort. If you want help in the fastest possible time, please upgrade to the latest version by logging onto www.sqlsentry.net/portal.

2 – That Bug You Found Might be Fixed, or That Feature You Wanted Might be Complete

Unfortunately no software is perfect. If it were we might be running from Terminators right now! We take software defects seriously, and we consistently release incremental builds to fix bugs and provide minor improvements. We also take feedback from our customers very seriously, and when you ask for something we will tell you when it is ready. Any new feature or defect is listed in the change list. If you have a question about something on the change list, just email us at support at sqlsentry.net. We are happy to explain in detail any bug fix or feature listed.

1 – You Already Paid for It

New version releases are rolled up under your annual SQL Sentry software support agreement. Besides having full access to our exceptional (and good looking!) support services, this covers all newly released SQL Sentry builds from minor updates to full blown major releases. Even if your maintenance contract expires tomorrow, you can upgrade today!

Honorable Mention – Do it for Kevin!

Every time someone needs support for something that has already been fixed, Kevin Kline (b|t) loses another hair! Think about Kevin’s head the next time you ignore the new version in the Version Checker!