Thursday, January 2, 2014

TempDB Parasites!

Unless you happen to be Chuck Norris, each of your SQL Server instances has only one tempdb. That makes tempdb a single point of contention for the instance. What’s worse, there is a lot going on in tempdb that you may not be aware of. Let’s look at some things that can use tempdb while we’re not looking.

Table Variables
There are lots of blog posts out there that will tell you that table variables are treated differently from #temp tables, most importantly that they live in some special memory buffer even in cases where #temp tables go to disk. On the contrary: table variables and #temp tables are stored the same way, and both are stored in tempdb.
For some great reference, there is an incredibly detailed answer on Stack Exchange discussing this, and there really isn’t anything I can come up with here that will add to it: http://dba.stackexchange.com/a/16386/1186
The next time you’re trying to remember this though, just think of this little lyric:
“Table variables: convenient though they may be, they are still stored in tempdb.”
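If you want to see this for yourself, here’s a minimal sketch of my own (not from that Stack Exchange answer), using the same task-level DMV I lean on throughout this post. The table variable and the 1000-row fill are purely illustrative:
-- User-object pages this task has allocated in tempdb so far
SELECT user_objects_alloc_page_count
FROM sys.dm_db_task_space_usage
WHERE session_id = @@SPID AND database_id = 2

-- Fill a table variable with 1000 fat rows (roughly one 8KB page each)
DECLARE @t TABLE (id int, filler char(8000))
INSERT @t SELECT TOP (1000) 0, 'x' FROM sys.all_columns

-- Same query, same batch: the count climbs, because the table
-- variable's pages were allocated in tempdb
SELECT user_objects_alloc_page_count
FROM sys.dm_db_task_space_usage
WHERE session_id = @@SPID AND database_id = 2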
Query Plan Operations
I grabbed these explanations pretty much directly from the “Working With TempDB” whitepaper.
  • Sorts (including distinct)
    • When a sort can’t fit in the query’s memory grant, the full row set is stored in tempdb to perform the sort operation. This is typically referred to as a spill, and in SQL Server 2012 you will see a warning about it in the execution plan.
  • Hash (aggregates and joins)
    • In order to perform hash operations, you need a hash table. If the hash table outgrows the query’s memory grant, it is stored in tempdb (there’s a sketch of this right after the list).
  • Spools (table and index)
    • In this case, the entire input for the operator is stored in tempdb.
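As a quick illustration of the hash case, here’s a sketch of my own (not from the whitepaper) that forces a hash join between two catalog views. Whether the hash table actually spills depends on the memory grant your server hands the query, so treat it as a shape to experiment with rather than a guaranteed repro:
-- Force a hash join; if the hash table outgrows the memory grant,
-- it spills to tempdb (SQL Server 2012 flags the Hash Match
-- operator with a warning when that happens)
SELECT c.name, o.name
FROM sys.all_columns c
JOIN sys.all_objects o ON c.[object_id] = o.[object_id]
OPTION (HASH JOIN)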
Let’s take a moment to show the effect a sort spill can have on tempdb.
I have to say that I was having a tough time getting a sort spill to happen, and Aaron Bertrand stopped by with an idea for a nasty query. I tweaked it a little to keep it from running all day, and in the end it gave me what I was looking for. (Thanks, Aaron!)
SET NOCOUNT ON
USE tempdb
GO
 
-- Internal-object allocations for this task before the big query
SELECT internal_objects_alloc_page_count * 8 / 1024 [Before MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2
 
-- 5000 x 5000 cross join (25 million rows), sorted by NEWID():
-- far too big for the memory grant, so the sort spills to tempdb
SELECT TOP 1000 * 
FROM (SELECT TOP 5000 * FROM sys.all_columns) col 
CROSS JOIN (SELECT TOP 5000 * FROM sys.all_objects) obj 
ORDER BY NEWID()
 
SELECT internal_objects_alloc_page_count * 8 / 1024 [After MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2
Note the DMV sys.dm_db_task_space_usage. I’m using it to report the space allocated for internal objects in tempdb by the current task. It’s documented in Books Online.
And the results…
Before MBs
--------------------
0
…<useless data>…
After MBs
--------------------
2221
See how I have no internal space allocated in tempdb before the query, but afterward, I have allocated 2221 MB? That is where my sort spilled to tempdb.
In the plan, since I’m on SQL Server 2012, we can see a warning about it:
[Screenshot: the execution plan’s Sort operator, flagged with a spill warning]
How do I know for sure it was all the sort? Well, I commented the ORDER BY out and ran it again.
More results…
Before MBs
--------------------
0
…<more useless data>…
After MBs
--------------------
0

Cursors
Keyset cursors store their keyset in a tempdb table, and static cursors store their entire result set in a tempdb table. I’m not going to be one of those “never use cursors” people, but I will suggest keeping this in mind when declaring one.
It’s worth noting a post Aaron wrote on the different cursor options: http://www.sqlperformance.com/2012/09/t-sql-queries/cursor-options
In it, Aaron covers the LOCAL and FAST_FORWARD options, which can and should be applied to the majority of the cursors you write.
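For reference, here’s roughly what that looks like; a sketch iterating an arbitrary catalog view, with the per-row work left as a stub:
DECLARE @name sysname

-- LOCAL scopes the cursor to this batch; FAST_FORWARD makes it
-- forward-only and read-only, the cheapest combination available
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.objects

OPEN c
FETCH NEXT FROM c INTO @name
WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work goes here
    FETCH NEXT FROM c INTO @name
END
CLOSE c
DEALLOCATE c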
Let’s take a moment to show the effect a static cursor can have on tempdb.
I have a database named “tests” with a table defined like this:
CREATE TABLE test
(
    id int IDENTITY(1,1) NOT NULL,
    [data] nvarchar(max) NULL
) 
Now, I’ll add some rows. This should give me close to 5MB per row worth of nvarchar data. Run 100 times, I should get somewhere close to a 500MB table.
-- CONVERT to nvarchar(max) first so REPLICATE can exceed 4000 characters
INSERT dbo.test ([data])
    SELECT REPLICATE(CONVERT(nvarchar(max), 'A'), 2500000)
GO 100

And just to check my questionable math, let’s check the table size:
SELECT SUM(stat.used_page_count) * 8 / 1024 MBs
FROM sys.dm_db_partition_stats stat
WHERE stat.[object_id] = object_id('dbo.test')
Results…
MBs
--------------------
487
Not quite 500, but close enough for this. ;)

This query will check tempdb size, declare a static cursor, then check tempdb size again to see what we caused:
SET NOCOUNT ON
USE tempdb
GO
 
SELECT internal_objects_alloc_page_count * 8 / 1024 [Before MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2
 
-- A STATIC cursor materializes its entire result set in a tempdb work table
DECLARE c CURSOR STATIC FOR SELECT [data] FROM tests.dbo.test
OPEN c
 
SELECT internal_objects_alloc_page_count * 8 / 1024 [After MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2
 
CLOSE c
DEALLOCATE c
 
-- After DEALLOCATE, the same pages show up in the dealloc counter
SELECT internal_objects_dealloc_page_count * 8 / 1024 [Deallocate MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2
Here are my results:
Before MBs
--------------------
0
After MBs
--------------------
487
Deallocate MBs
--------------------
487
As you can see, before the cursor, I have 0 allocated internal object pages, then the entire thing is stuffed into tempdb. Finally, when I deallocate, we can see those pages are marked for deallocation as well.
One thing I wondered while doing this: does any of this happen if I never even open the cursor? I tried it out, and it doesn’t. It’s not until you open the cursor that the tempdb objects are allocated.
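Here’s the shape of that test if you want to try it yourself; it’s the same pattern as above, just with an extra check between DECLARE and OPEN:
SELECT internal_objects_alloc_page_count [Pages before DECLARE] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2

DECLARE c CURSOR STATIC FOR SELECT [data] FROM tests.dbo.test

-- Still zero here: declaring the cursor allocates nothing
SELECT internal_objects_alloc_page_count [Pages after DECLARE] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2

-- OPEN is the step that materializes the result set in tempdb
OPEN c

SELECT internal_objects_alloc_page_count [Pages after OPEN] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID AND database_id = 2

CLOSE c
DEALLOCATE c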
Triggers
Since SQL Server 2005, triggers use the version store, which is kept in tempdb; that’s how the inserted and deleted tables are built. So, if you’re using triggers, they are implicitly using tempdb. Remember that triggers fire once per statement, not per row, and you’ll get version data for every affected row. For really big batches, this can turn out to be quite a bit of version data.
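If you’d rather watch the version store itself instead of per-task allocations, tempdb’s file space DMV breaks it out separately. A quick sketch:
-- Pages currently reserved for the version store across tempdb's files
SELECT SUM(version_store_reserved_page_count) * 8 / 1024 [Version Store MBs]
FROM tempdb.sys.dm_db_file_space_usage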
Let’s see an example. This is using my test table from the cursors section above. I’m going to define an update trigger:
USE tests
GO
CREATE TRIGGER tUpdateTest ON dbo.test FOR UPDATE AS
 
    DECLARE @var nvarchar(max)
    DECLARE @pages int
 
    -- Check this task's internal-object allocations from inside the trigger
    SELECT @pages = internal_objects_alloc_page_count * 8 / 1024
    FROM sys.dm_db_task_space_usage 
    WHERE session_id = @@SPID AND database_id = 2
 
    PRINT '[During MBs] ' + CAST(@pages AS varchar)
 
    -- Touch the deleted table so the version store actually gets used
    SELECT @var = deleted.[data] FROM deleted
This really doesn’t do anything, but I wanted to make sure we’re accessing the deleted table, and that we’re checking tempdb internal objects allocation inside the trigger.
Now, I’ll just run a batch that checks tempdb allocation before I run an update against every row. Our results will have the starting tempdb allocation size, and the print statement from the trigger. Let’s see what happens.
SELECT internal_objects_alloc_page_count * 8 / 1024 [Before MBs] 
FROM sys.dm_db_task_space_usage 
WHERE session_id = @@SPID and database_id = 2
 
update tests.dbo.test set [data] = [data] + 'x'
And the results…
Before MBs
--------------------
0
[During MBs] 493
See how all that version data got dumped into tempdb?
LOB Variables
If you have a variable holding a big XML document, or maybe a parameter holding a large nvarchar(max), it can get stored as an internal object in tempdb. Let’s see what impact this might have.
This time I don’t even need a table, just a variable and a loop. Each iteration, we’ll add another 100,000 characters to the string and check this task’s tempdb usage along with the length of the variable. That will let us see at what point the variable moves to tempdb.
DECLARE @bigVar nvarchar(max) = '';
 
WHILE(LEN(@bigVar) < 1000000)
BEGIN
    SELECT @bigVar = @bigVar + REPLICATE(CONVERT(nvarchar(max), 'x'), 100000);
 
    SELECT LEN(@bigVar) [Variable Length], internal_objects_alloc_page_count * 8 / 1024 [TempDB MBs] 
    FROM sys.dm_db_task_space_usage 
    WHERE session_id = @@SPID and database_id = 2
END
I’m adding 100k to the string variable each loop, so I get 10 separate result sets; consolidated, they look like this:
Variable Length      TempDB MBs
-------------------- --------------------
100000               0
200000               0
300000               1
400000               2
500000               3
600000               4
700000               6
800000               7
900000               9
1000000              11
You can see that for my first two loops, I’m still storing my string in memory, but somewhere between 200k and 300k characters it moved to tempdb storage. By the time I get to one million characters, I’m using over 10MB in tempdb. Now, one variable like this may not have a profound effect on tempdb usage or performance, but imagine an application with 5,000+ concurrent connections all performing operations like this. It adds up.
Conclusion
The purpose of this post wasn’t to catalog everything that goes into tempdb, but to point out some specific scenarios that cause tempdb usage even when tempdb isn’t at the top of our minds. These are parasites, leeching tempdb throughput little by little, and it’s worth being mindful of them when designing and developing a system on SQL Server.
Until next time,
-JRH

1 comment:

  1. I just wanted to note that the simple calculations I'm using above can get more precise if you want. I'm aware that if we're dealing with less than 1MB of data, but more than 0 pages, we'll still get 0 for an answer because of integer rounding. I've run the same tests with decimals and come up with the same answers where we have 0. Even if I hadn't though, the main concepts behind this post are still demonstrated. The tempdb "parasites" cause far more tempdb space to be used during these tasks than anything else being done in the same batch.
