Most of the Customers using sql server 2000,2005 and 2008 have this doubt in their mind when using parallelism in sql server.
Why Do I see more threads per SPID in sysprocesses than MAXDOP?
The above question is being addressed in the following blog by Bob Dorr from SQL Server Escalation Services,
But we will try to provide some more clarification on this issue, in this blog by explaining the behaviour of query processor when dealing with parallelism with the help of following example.
We have used the following script from the book "Inside Sql server 2005 Query Tuning and Optimization" by Kalen Delaney, to induce parallelism in the execution plan,
createtable [HugeTable1]
(
[Key] int,
[Data] int,
[Pad] char(200),
Constraint [PK1] PRIMARYKEY ([Key])
)
SETNOCOUNTON
DECLARE @i int
BEGINTRAN
set @i = 0
WHILE @i < 250000
BEGIN
INSERT [HugeTable1] Values (@i,@i,NULL)
SET @i = @i + 1
if @i % 1000 = 0
BEGIN
COMMITTRAN
BEGINTRAN
END
END
COMMITTRAN
SELECT [KEY],[DATA],[PAD] INTO [HugeTable2] FROM HugeTable1
ALTERTABLE [HugeTable2] ADDCONSTRAINT [PK2] PRIMARYKEY ([Key])
We use the following query which displays parallelism in it’s execution plan
setstatisticsprofileon
select T1.[Key],T1.[Data],T2.[Data] From HugeTable1 T1 Join [HugeTable2] T2 ON T1.[Key] =T2.[Key] where T1.Data < 100 OPTION (MAXDOP 2)
Execution Plan
|--Parallelism(Gather Streams)
|--Nested Loops(Inner Join, OUTER REFERENCES:([T1].[Key]))
|--Parallelism(Repartition Streams, RoundRobin Partitioning)
| |--Clustered Index Scan(OBJECT:([AdventureWorks2008].[dbo].[HugeTable1].[PK1] AS [T1])
WHERE:([AdventureWorks2008].[dbo].[HugeTable1].[Data] as [T1].[Data]<(100)))
|--Clustered Index Seek(OBJECT:([AdventureWorks2008].[dbo].[HugeTable2].[PK2] AS [T2]),
SEEK:([T2].[KEY]=[AdventureWorks2008].[dbo].[HugeTable1].[Key] as [T1].[Key]) ORDERED FORWARD)
We ran the above query 100 times in a loop,identified the SPID while running the above query and queried the sysprocesses for that spid from another query window at the same time
select*fromsys.sysprocesseswhere spid=56
spid kpid blocked waittype waittime lastwaittype cpu physical_io ecid status
56 5640 0 0x00BB 3 CXPACKET 66653 20605 0 suspended
56 5936 0 0x00BB 3 CXPACKET 2147483647 0 1 suspended
56 1252 0 0x00BB 1 CXPACKET 2147483647 0 2 suspended
56 3508 56 0x0024 0 LATCH_EX 2147483647 0 3 suspended
56 3580 0 0x0000 0 LATCH_EX 2147483647 0 4 runnable
As we can see here we have 5 threads which are spawned to execute the query, however while executing the query we limited the DOP to 2 by specifying the MAXDOP hint to 2.
The reason is “MAXDOP is always specified per operator in the execution plan and not per execution plan”.
So in the above execution plan we have 3 operators in the execution plan viz Clustered Index Scan, Clustered Index Seek and Nested Loops.
So we have
- 2 threads (as per the MAXDOP hint) which are used to perform Clustered Index Scan,
- 2 threads (as per the MAXDOP hint) which are used for Nested Loop Join which also performs Clustered Index Seek to join the rows from outer Clustered Index Scan. In other words we do not have separate threads allocated for Clustered Index Seek opearor,
- 1 thread is used for parallel gather streams operator to collect and converge the parallel thread output from the Nested Loop join operator. This thread is basically a synchronizing thread (also referred as Thread 0 IN xml plan) which is used when there is parallelism in the execution plan of the query.
To justify the above explanation, we can observe the XML execution plan which also show the runtime behaviour of the plan
2 threads used to perform Clustered Index Scan
<RelOpNodeId="3"PhysicalOp="Clustered Index Scan"LogicalOp="Clustered Index Scan"….>
<RunTimeInformation>
<RunTimeCountersPerThreadThread="2"ActualRows="100"ActualEndOfScans="1"ActualExecutions="1" />
<RunTimeCountersPerThreadThread="1"ActualRows="0"ActualEndOfScans="1"ActualExecutions="1" />
<RunTimeCountersPerThreadThread="0"ActualRows="0"ActualEndOfScans="0"ActualExecutions="0" />
</RunTimeInformation>
2 threads used to perform Nested Loop as well the clustered index seek
<RelOpNodeId="1"PhysicalOp="Nested Loops"LogicalOp="Inner Join"….>
<RunTimeInformation>
<RunTimeCountersPerThreadThread="2"ActualRows="50"ActualEndOfScans="1"ActualExecutions="1" />
<RunTimeCountersPerThreadThread="1"ActualRows="50"ActualEndOfScans="1"ActualExecutions="1" />
<RunTimeCountersPerThreadThread="0"ActualRows="0"ActualEndOfScans="0"ActualExecutions="0" />
</RunTimeInformation>
Same threads used to perform Clustered Index Seek on inner query
<RelOpNodeId="4"PhysicalOp="Clustered Index Seek"LogicalOp="Clustered Index Seek"…>
<RunTimeInformation>
<RunTimeCountersPerThreadThread="2"ActualRows="50"ActualEndOfScans="0"ActualExecutions="50" />
<RunTimeCountersPerThreadThread="1"ActualRows="50"ActualEndOfScans="0"ActualExecutions="50" />
<RunTimeCountersPerThreadThread="0"ActualRows="0"ActualEndOfScans="0"ActualExecutions="0" />
</RunTimeInformation>
Finally Thread 0 which is shown in all the operators is used to gather streams from all the parallel threads and to provide the final output to the client
<RelOpNodeId="0"PhysicalOp="Parallelism"LogicalOp="Gather Streams"….>
<RunTimeInformation>
<RunTimeCountersPerThreadThread="0"ActualRows="100"ActualEndOfScans="1"ActualExecutions="1" />
</RunTimeInformation>
To conclude the MAXDOP setting is used to limit the number of threads per operation in the execution plan and does not limit the number of threads used to execute the query. So it is normal to see threads per SPID in sysprocesses greater than MAXDOP setting.
However if you see lot of waits for CX Packets in sys.dm_os_wait_stats and High CPU being consumed by sql server, you can consider reducing the MAXDOP setting or identify the missing indexes which can expedite the processing of the query.
Parikshit Savjani,
SE, Microsoft SQL Server,