Wednesday, March 21, 2012

Odd query plan for view

I have a SQL 2000 table containing 2 million rows of Trade data. Here
are some of the columns:

[TradeId] INT IDENTITY(1,1) -- PK, non-clustered
[LoadDate] DATETIME -- clustered index
[TradeDate] DATETIME -- non-clustered index
[Symbol] VARCHAR(10)
[Account] VARCHAR(10)
[Position] INT
etc..

I have a view which performs a join against a security master table (to
gather more security data). The purpose of the view is to return all
the rows where [TradeDate] is within the last trading days.

The query against the view takes over around 30 minutes. When I view
the query plan, it is not using the index on the [TradeDate] column but
is instead using the clustered index on the [LoadDate] column... The
odd thing is, the [LoadDate] column is not used anywhere in the view!

For testing purposes, I decided to do a straight SELECT against the
table (minus the joins) and that one ALSO uses the clustered index scan
against a column not referenced anywhere in the query.

There is a reason why I have not posted my WHERE clause until now. The
reason is that I am doing what I think is a very inefficient clause:

WHERE [TradeDate] >= fGetTradeDateFromThreeDaysAgo(GetDate())

The function calculates the proper trade date based on the specified
date (in this case, the current date). It is my understanding that the
function will be called for all rows. (Which COULD explain the
performance issue...)

However, this view has been around for ages and never before caused any
sort of problems. The issue actually started the day after I had to
recreate the table. (I had to recreate the table because some columns
where added and others where renamed.)

On a side note, if I replace the WHERE clause with a hard-coded date
(as in 'WHERE [TradeDate] >= '20060324'), the query performs fine but
STILL uses the clustered index on the [LoadDate] column.(JayCallas@.hotmail.com) writes:
> The query against the view takes over around 30 minutes. When I view
> the query plan, it is not using the index on the [TradeDate] column but
> is instead using the clustered index on the [LoadDate] column... The
> odd thing is, the [LoadDate] column is not used anywhere in the view!

But "Clustered index scan" is just the same as "Table Scan". So it is
not very strange. No non-clustered index was good, so it scans the
index.

> There is a reason why I have not posted my WHERE clause until now. The
> reason is that I am doing what I think is a very inefficient clause:
> WHERE [TradeDate] >= fGetTradeDateFromThreeDaysAgo(GetDate())
> However, this view has been around for ages and never before caused any
> sort of problems. The issue actually started the day after I had to
> recreate the table. (I had to recreate the table because some columns
> where added and others where renamed.)

Statistics change, and old plan was not good any more. Yes, the above
is a problematic condition. Don't you read this newsgroup? :-) I
answered a very similar question last night.

You know something about the data that the optimizer does not. It
sees:

WHERE TradeDate > <UnknownValue
It estimates that it will hit 30% of the rows, a standard assumption.
And for 30% hit-rate a non-clustered index will be more expensive
than scanning the table.

This may be the place for an index hint See also the thread
"ranged datetime predicates & cardinality estimates" from yeaterday.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Thanks Erland for responding. As usual your posts are very helpful.

So, a Clustered Index Scan is as bad as a Table Scan? In this case,
what is it actually checking? The LoadDate is not even used in the
query so I am not sure what it is scanning for. Is it just using the
scan to look up the rest of the row?

As I already knew that the particular WHERE clause was bad, was I just
lucky all this time that the response time was good? Maybe because
there had not been any changes to the view or table in ages? Or maybe
because the plan was determined when there was a lot less data in the
table? Do query plans survive server reboots or restarting SQL?

Will take a look at the index hint any see how it goes.|||(JayCallas@.hotmail.com) writes:
> So, a Clustered Index Scan is as bad as a Table Scan? In this case,
> what is it actually checking? The LoadDate is not even used in the
> query so I am not sure what it is scanning for. Is it just using the
> scan to look up the rest of the row?

Let's say that you need to look up Michael Richardson in the telephone
book. Of course you open the book on R and quickly find him. You are seeking
the clustered index.

But say now that you are looking for someone whose first name is Jake,
and that he lives on Smallstreet, and you really need to find him. What
do you do? You read the phone book from start to end, that is you scan
the clustred index. The LastName, which is the key in the index is not
part of the search, but that is irrelevant.

> As I already knew that the particular WHERE clause was bad, was I just
> lucky all this time that the response time was good?

Bad is a little too strong a word. Problematic is more accurate.

The problem with a non-clustered index, is that if you get many hits,
and you for every hit you need to access the data page, you will do more
reads that you do, if you just scan the table from left to right.

>Do query plans survive server reboots or restarting SQL?

No. The plan is in cache only, and could also disappear during run-time,
if the plan is aged out.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||The hint worked. Thanks.

No comments:

Post a Comment