It is pretty common for large, busy SQL Server implementations to run into I/O pressure or bottlenecks, especially since many people buy "big" servers with lots of CPU capacity and lots of RAM, but treat the I/O subsystem as an afterthought. I have run into this situation often enough lately that I thought I would write a little about it.
The situation is not improved when the SQL Server DBAs and the SAN engineers work in different parts of a large organization and may have different goals. If you suspect that your SQL Server instance is having I/O issues, these queries can help furnish proof that you are under I/O pressure, and they can help pinpoint what is causing the problem. As with most DMV queries, you will need the VIEW SERVER STATE permission to run most of them.
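If the account you are using lacks that permission, a member of the sysadmin role can grant it. A minimal sketch; [DiagnosticsUser] is a placeholder login name:

```sql
-- Grant the server-level permission needed to query most DMVs
-- ([DiagnosticsUser] is a placeholder; substitute a real login)
USE [master];
GO
GRANT VIEW SERVER STATE TO [DiagnosticsUser];
GO
```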
I always like to run the Top Wait Stats rollup query first. If you see I/O-related wait types at the top of the list, you should start looking at the problem in more detail with the rest of the queries below:
-- Isolate top waits for server instance since last restart or statistics clear
WITH Waits AS
(SELECT wait_type, wait_time_ms / 1000. AS wait_time_s,
    100. * wait_time_ms / SUM(wait_time_ms) OVER() AS pct,
    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS rn
 FROM sys.dm_os_wait_stats
 WHERE wait_type NOT IN ('CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
    'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR',
    'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH',
    'XE_TIMER_EVENT', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT',
    'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
    'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN'))
SELECT W1.wait_type,
    CAST(W1.wait_time_s AS DECIMAL(12, 2)) AS wait_time_s,
    CAST(W1.pct AS DECIMAL(12, 2)) AS pct,
    CAST(SUM(W2.pct) AS DECIMAL(12, 2)) AS running_pct
FROM Waits AS W1
INNER JOIN Waits AS W2
ON W2.rn <= W1.rn
GROUP BY W1.rn, W1.wait_type, W1.wait_time_s, W1.pct
HAVING SUM(W2.pct) - W1.pct < 95; -- percentage threshold

-- Common significant I/O wait types, with BOL explanations
-- *** I/O Related Waits ***
-- ASYNC_IO_COMPLETION  Occurs when a task is waiting for I/Os to finish
-- IO_COMPLETION        Occurs while waiting for I/O operations to complete.
--                      This wait type generally represents non-data page I/Os. Data page I/O
--                      completion waits appear as PAGEIOLATCH_* waits
-- PAGEIOLATCH_SH       Occurs when a task is waiting on a latch for a buffer that is in an I/O request.
--                      The latch request is in Shared mode. Long waits may indicate problems with the disk subsystem.
-- PAGEIOLATCH_EX       Occurs when a task is waiting on a latch for a buffer that is in an I/O request.
--                      The latch request is in Exclusive mode. Long waits may indicate problems with the disk subsystem.
-- WRITELOG             Occurs while waiting for a log flush to complete.
--                      Common operations that cause log flushes are checkpoints and transaction commits.
-- PAGELATCH_EX         Occurs when a task is waiting on a latch for a buffer that is not in an I/O request.
--                      The latch request is in Exclusive mode.
-- BACKUPIO             Occurs when a backup task is waiting for data, or is waiting for a buffer in which to store data
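Since sys.dm_os_wait_stats accumulates from the last restart (or the last manual clear), it can help to reset the counters before measuring a specific workload window, so the rollup above reflects only recent activity. One way to do this, noting that it clears the counters for the entire instance:

```sql
-- Clear the cumulative wait statistics, instance-wide
DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
```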
-- Some I/O Specific DMV Queries
-- Glenn Berry
-- December 2009
-- http://glennberrysqlperformance.spaces.live.com/
-- Twitter: GlennAlanBerry

-- Always look at Avg Disk Sec/Read and Avg Disk Sec/Write in PerfMon for each Physical Disk

-- Check for I/O bottlenecks (run multiple times, look for values above zero)
SELECT cpu_id, pending_disk_io_count
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE';

-- Look at the average for all schedulers (run multiple times, look for values above zero)
SELECT AVG(pending_disk_io_count) AS [AvgPendingDiskIOCount]
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE';

-- High latch waits (SH and EX) indicate the I/O subsystem is too busy
-- The wait time indicates time spent waiting for disk
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS [io_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('PAGEIOLATCH_EX', 'PAGEIOLATCH_SH', 'PAGEIOLATCH_UP')
ORDER BY wait_type;

-- Analyze Database I/O, ranked by IO Stall %
WITH DBIO AS
(SELECT DB_NAME(IVFS.database_id) AS db,
    CASE WHEN MF.type = 1 THEN 'log' ELSE 'data' END AS file_type,
    SUM(IVFS.num_of_bytes_read + IVFS.num_of_bytes_written) AS io,
    SUM(IVFS.io_stall) AS io_stall
 FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS IVFS
 INNER JOIN sys.master_files AS MF
 ON IVFS.database_id = MF.database_id
 AND IVFS.file_id = MF.file_id
 GROUP BY DB_NAME(IVFS.database_id), MF.[type])
SELECT db, file_type,
    CAST(1. * io / (1024 * 1024) AS DECIMAL(12, 2)) AS io_mb,
    CAST(io_stall / 1000. AS DECIMAL(12, 2)) AS io_stall_s,
    CAST(100. * io_stall / SUM(io_stall) OVER() AS DECIMAL(10, 2)) AS io_stall_pct,
    ROW_NUMBER() OVER(ORDER BY io_stall DESC) AS rn
FROM DBIO
ORDER BY io_stall DESC;

-- Calculates average stalls per read, per write, and per total input/output for each database file
SELECT DB_NAME(database_id) AS [Database Name], file_id,
    io_stall_read_ms, num_of_reads,
    CAST(io_stall_read_ms/(1.0 + num_of_reads) AS NUMERIC(10,1)) AS [avg_read_stall_ms],
    io_stall_write_ms, num_of_writes,
    CAST(io_stall_write_ms/(1.0 + num_of_writes) AS NUMERIC(10,1)) AS [avg_write_stall_ms],
    io_stall_read_ms + io_stall_write_ms AS [io_stalls],
    num_of_reads + num_of_writes AS [total_io],
    CAST((io_stall_read_ms + io_stall_write_ms)/(1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [avg_io_stall_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
ORDER BY avg_io_stall_ms DESC;

-- Which queries are causing the most I/O operations (can take a few seconds)
SELECT TOP (20)
    total_logical_reads/execution_count AS [avg_logical_reads],
    total_logical_writes/execution_count AS [avg_logical_writes],
    total_worker_time/execution_count AS [avg_cpu_cost], execution_count,
    total_worker_time, total_logical_reads, total_logical_writes,
    (SELECT DB_NAME(dbid) + ISNULL('..' + OBJECT_NAME(objectid), '')
     FROM sys.dm_exec_sql_text([sql_handle])) AS query_database,
    (SELECT SUBSTRING(est.[text], statement_start_offset/2 + 1,
        (CASE WHEN statement_end_offset = -1
              THEN LEN(CONVERT(nvarchar(max), est.[text])) * 2
              ELSE statement_end_offset
         END - statement_start_offset) / 2)
     FROM sys.dm_exec_sql_text([sql_handle]) AS est) AS query_text,
    last_logical_reads, min_logical_reads, max_logical_reads,
    last_logical_writes, min_logical_writes, max_logical_writes,
    total_physical_reads, last_physical_reads, min_physical_reads, max_physical_reads,
    (total_logical_reads + (total_logical_writes * 5))/execution_count AS io_weighting,
    plan_generation_num, qp.query_plan
FROM sys.dm_exec_query_stats
OUTER APPLY sys.dm_exec_query_plan([plan_handle]) AS qp
WHERE [dbid] >= 5 -- skips the system databases
AND (total_worker_time/execution_count) > 100
ORDER BY io_weighting DESC;

-- Top Cached SPs By Total Physical Reads (SQL 2008). Physical reads relate to disk I/O pressure
SELECT TOP (25) p.name AS [SP Name],
    qs.total_physical_reads AS [TotalPhysicalReads],
    qs.total_physical_reads/qs.execution_count AS [AvgPhysicalReads],
    ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0) AS [Calls/Second],
    qs.total_logical_reads AS [TotalLogicalReads],
    qs.total_logical_reads/qs.execution_count AS [AvgLogicalReads],
    qs.total_worker_time AS [TotalWorkerTime],
    qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
    qs.execution_count, qs.total_elapsed_time,
    qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time],
    qs.last_elapsed_time, qs.cached_time
FROM sys.procedures AS p
INNER JOIN sys.dm_exec_procedure_stats AS qs
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.total_physical_reads DESC;

-- Top Cached SPs By Total Logical Writes (SQL 2008). Logical writes relate to both memory and disk I/O pressure
SELECT TOP (25) p.name AS [SP Name],
    qs.total_logical_writes AS [TotalLogicalWrites],
    qs.total_logical_writes/qs.execution_count AS [AvgLogicalWrites],
    ISNULL(qs.execution_count/DATEDIFF(Second, qs.cached_time, GETDATE()), 0) AS [Calls/Second],
    qs.total_logical_reads AS [TotalLogicalReads],
    qs.total_logical_reads/qs.execution_count AS [AvgLogicalReads],
    qs.total_worker_time AS [TotalWorkerTime],
    qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
    qs.execution_count, qs.total_elapsed_time,
    qs.total_elapsed_time/qs.execution_count AS [avg_elapsed_time],
    qs.last_elapsed_time, qs.cached_time
FROM sys.procedures AS p
INNER JOIN sys.dm_exec_procedure_stats AS qs
ON p.[object_id] = qs.[object_id]
WHERE qs.database_id = DB_ID()
ORDER BY qs.total_logical_writes DESC;
Hi Glenn, I’m a huge fan of using DMVs to troubleshoot SQL Server, and you have kindly provided some really great queries here. I will certainly be adding them to my script bank. Cheers,
Glad you enjoyed them, John
Glenn, great utils, thanks. I thought tempdb was my bottleneck until I ran the above, but it turns out it’s a user database with a 69% stall pct. I need to dig further.
Hi, I ran your queries above and found one of my databases has an io_stall_pct of 97%. Any tips on what to focus on next? I found your page because I have been researching how best to lay out my database files (number of files, number of files per filegroup) and also playing with the MAXDOP setting on a 24-core server. Just this week I reduced MAXDOP to 1 and was planning on incrementing it one by one to find the best setting for us; previously it was set to 0, so SQL Server managed it. The database showing the 97% stall pct has one filegroup with much heavier usage than the others, and I currently have 3 files in this filegroup spread over 3 mount points on our SAN. I thought we only benefited from having as many files in a filegroup as there were different disks, so that is how I came to the 3 files. Do you think having MAXDOP unrestricted, in combination with only 3 files in this heavily used filegroup, could have contributed to the amazingly high stall percentage? I have a nearly identical system on identical hardware with the same file structure, and this database has a 46% stall pct there; still high, but not shockingly high. Thanks in advance for any input you can give.
Ron,
Is it the data file(s) or the log file that has a 97% io_stall_pct? What kind of workload do you have? What kind of storage is it? What RAID level, how many spindles behind the logical drive? You could run my complete set of diagnostic queries to get a better overall picture of what is causing the pain.
Glenn, thanks so much for responding so quickly. To answer your questions: 1) It’s the data files with the 97% io_stall. 2) We are still in development and are running ETL batches almost exclusively in this environment. The batches run overnight between work days; they update tens of millions of rows per batch, and the underlying sprocs can run for 45 minutes to 1.5 hours, which is what I am working on tuning. 3) We have a CLARiiON SAN at RAID 5; I have an email out on the number of spindles, as I don’t know that off the top of my head. I have run pretty much all of your diagnostic queries (thanks for posting these), and I did find something curious. As I mentioned, I use filegroups and know which one of my filegroups is having these problems. However, when I run the query that calculates average stalls per read, per write, and per total input/output for each database file, I see 2 of the 3 files in this filegroup listed as numbers 1 and 2 in avg_io_stall_ms, at 121 and 113.7, but the 3rd file in this filegroup is all the way down in 33rd place with only 13.1 avg_io_stall_ms. I would think that SQL Server is distributing the data evenly across these files, so this is puzzling to me.
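One thing worth noting on the uneven distribution question above: SQL Server's proportional fill algorithm spreads writes across the files in a filegroup in proportion to the free space in each file, so files with different sizes or different amounts of free space will see uneven I/O. A quick check, assuming it is run in the context of the affected database:

```sql
-- Compare size vs. used space for each file in the current database;
-- size and SpaceUsed are reported in 8 KB pages, so divide by 128 for MB
SELECT name AS [file_name],
       size / 128 AS [size_mb],
       FILEPROPERTY(name, 'SpaceUsed') / 128 AS [used_mb]
FROM sys.database_files;
```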
Hi Glenn,

A quick question on the isolate top waits query. You have two columns with a percentage in them, the first as pct and the other as running_pct. What is the difference between the two? Should I be looking at just one value, or the two in combination?
Thanks!
Neil
Hi Glenn,

Sorry, my bad; I just looked it over much more carefully, and I get it now. Just looking at the percentage is the way. I have a top wait type of CXPACKET, with over 66% on that. We are running a major ETL system, basically extracting from SAP into SQL Server, applying fixes, and then exporting again. Think once a week: truncate tables, reimport data, do validations, redo. Now, from my understanding, CXPACKET is not bad on this type of batch processing system (OLAP), but could be bad on a transactional system (OLTP).

I have noticed severe CPU usage when CXPACKET is the most common wait type while reports are running. These reports are on big tables, with the majority over 500 million and up to 1.3 billion rows. I suspect the parallelism in the queries is causing issues, as the same tables are used in the reports, and I can see queued tasks on the CPUs reaching as high as 10. My feeling is that the queries are issuing a lot of locks in parallel, which is slowing the other queries down. Since I don’t have a development environment, I can’t predict what dropping the MAXDOP setting would do. Your thoughts would be appreciated.
Regards
Neil
Neil,
How many processor cores do you have, and what kind of processors do you have? Quite often, CXPACKET waits are caused by missing indexes, but that is more common with OLTP workloads.
If you are under severe CPU pressure, lowering MAXDOP some might help relieve the CPU pressure at the cost of making some of the parallelized queries take longer to run.
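For reference, the instance-level MAXDOP setting can be changed without a restart. A minimal sketch; the value 4 is only an example to experiment with, not a recommendation:

```sql
-- Change the instance-wide max degree of parallelism
-- (4 is just an example value; test the effect before relying on it)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```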
Hi Glenn,

I checked for missing indexes. The server is hosted by a rather large international firm whose main business is business machines. They are not exactly forthcoming with the server specifications. It has 8 cores, but I am not sure if it is 2 x quad-core or 4 x dual-core. My access to the server is quite restricted, so I am having a hard time getting all the information.

I suspect that there might be bad or missing statistics on the tables/views being used. This would lead to SQL Server not creating an optimal plan. Is this assumption correct? I will investigate a little more.
Regards
Neil
Hi Glenn,

Found the problem: enforced join hints. It seems some bright spark decided that HASH joins should be enforced in the queries. We have had massive issues with space in tempdb as well, so I started the process of elimination. I knew CXPACKET should be benign on an OLAP/DW-only database, so I was not happy just dropping the MAXDOP setting. I checked for missing indexes and old statistics, and then finally went into the queries themselves.

Alarm bells started ringing when I saw SQL Server saying "Warning: Local Join Types Enforced."

Here are the results (once again proving that SQL Server is better off making the choices). I had a query with the hash join enforced running more than 45 minutes. I removed the HASH join and ran the query again after freeing the buffers, making sure there were no cached plans, etc. The query now runs in 15 seconds! That is roughly a 180x improvement.
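For readers who have not run into this, a join hint can be forced either inline on the join or with a query-level OPTION clause. A minimal sketch with hypothetical table names; note that an inline join hint also forces the written join order, which is often the real source of the slowdown:

```sql
-- Hypothetical tables; the inline hint forces a hash join
-- AND forces the join order as written
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER HASH JOIN dbo.Customers AS c
    ON o.CustomerID = c.CustomerID;

-- Query-level hint; removing it lets the optimizer choose the join type
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON o.CustomerID = c.CustomerID
OPTION (HASH JOIN);
```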
Anyway, thanks for the queries above; though they did not help me solve the problem, it was an interesting learning experience.
Great Blog!
Regards
Neil
Hi, in my case the tempdb data file is getting full quickly. We are using this server for reporting purposes. I checked these DMVs:

Database Name: tempdb
io_stall_read_ms: 824092      num_of_reads: 221536     avg_read_stall_ms: 3.7
io_stall_write_ms: 16531884   num_of_writes: 257911    avg_write_stall_ms: 64.1
io_stalls: 17355976           total_io: 479447         avg_io_stall_ms: 36.2

Every 3-4 days it gets full (127 GB). The initial size of tempdb is 88 GB, and the report database size is only 8 GB.
Regards
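When tempdb keeps filling up, it helps to see where the space is actually going: user objects (temp tables), internal objects (sorts, hash joins, spools from reporting queries), or the version store. A sketch using the tempdb-specific space DMV:

```sql
-- Break down tempdb space by consumer; pages are 8 KB, so * 8 / 1024 gives MB
SELECT SUM(user_object_reserved_page_count) * 8 / 1024 AS [user_objects_mb],
       SUM(internal_object_reserved_page_count) * 8 / 1024 AS [internal_objects_mb],
       SUM(version_store_reserved_page_count) * 8 / 1024 AS [version_store_mb],
       SUM(unallocated_extent_page_count) * 8 / 1024 AS [free_space_mb]
FROM tempdb.sys.dm_db_file_space_usage;
```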
Pingback: Was beeinflusst die SharePoint Performance (Teil2) | Tobias Wolter IT- Solutions & Consulting
Pingback: What impact can different cursor options have?
Thanks for these descriptive scripts – really useful.
Can I just ask: when I ran "Analyze Database I/O, ranked by IO Stall %" and the subsequent average I/O script, I kept getting one database as the culprit, with over 93%. However, when I run "Which queries are causing the most I/O operations," this database does not show up at all in the results. Why?
Pingback: A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core | Paul S. Randal
Pingback: Benchmarking: 1-TB table population (part 2: optimizing log block IO size and how log IO works) | Paul S. Randal