The Beauty of TABLESAMPLE in SQL Server 2005

If you find yourself needing to “browse” a large table to get a feel for what kind of data is actually in there, there are two common methods for doing it. The first query grabs the top 1 percent, ordered by the clustered index (since there is no ORDER BY clause), while the second query just grabs a pseudo random 1 percent sample of the table.

It turns out that the first query costs 99 times as much as the second query and generates several orders of magnitude more logical IO. TABLESAMPLE is a new T-SQL clause in SQL SERVER 2005.  Just remember that you cannot count on what rows will come back with TABLESAMPLE.

 

Here are the results from a comparision of the two queries on a test table with a few hundred thousand rows in it.

SET STATISTICS IO ON

— Query 1

SELECT TOP (1) PERCENT *

FROM dbo.Account WITH (NOLOCK)

— Query 2

SELECT *

FROM dbo.Account

TABLESAMPLE (1 PERCENT) WITH (NOLOCK)

— Query 1

(3180 row(s) affected)

Table ‘Worktable’. Scan count 1, logical reads 1004462, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘Account’. Scan count 1, logical reads 8251, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

— Query 2

(3501 row(s) affected)

Table ‘Account’. Scan count 1, logical reads 91, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

This entry was posted in SQL Server 2005. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s