The Magic of Data Compression

As Moore’s Law has stayed in effect over the last 40 years, processor performance improvements have long outstripped gains in main memory and disk I/O performance. Hence, the popularity of larger, multi-layer caches, (at the CPU, disk controller, and SAN level), and the increasing utility of various forms of data compression.

Newer computer systems with modern processors typically have enough excess CPU capacity available to make lossless data compression a very effective technique for improving overall performance. At a previous employer, we compressed all communication between the Presentation Tier and the Application Tier, paying the CPU cost of compression/decompression on both sides. This was originally done to improve performance and reduce band-width requirements over low-speed WAN links, but we soon discovered that it also improved responsiveness on fast Ethernet and gigabit Ethernet LANs, for a very minor increase in CPU utilization.

At NewsGator, we have had great results using middle-tier data compression of certain types of data (such as LOB data that cannot be indexed anyway). We use our middle-tier servers to compress the data before we store it in SQL Server 2005, and then to decompress the data after we retrieve it from the database. This has significantly reduced the size of our databases, reduced memory pressure on our SQL Server machines, and reduced our I/O requirements on our main SAN.

This works fine for LOB data fields, but it would not work very well for other data types (such as int, varchar, etc.) that you would want to include in SQL Server indexes. It you compressed the contents of a varchar column in the middle-tier, you would not be able to have a SQL Server index on that column.

Luckily, SQL Server 2008 has a good answer for that problem, which is native data compression, whether you use page level or row level compression. Of course, there is some CPU cost for this functionality, but in cases where your workload is I/O bound, you will come out way ahead in most situations.

Technorati Tags:
This entry was posted in SQL Server 2008. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s