The Axis Of Evil
Views, table variables and local variables are great in terms of programming principles.
But In terms of performance, they can be a honey trap: They don’t scream “DANGER!” like cursors or user defined functions, but they can cause serious problems very easily.
That’s why I call them “The Axis Of Evil”.
I know – maybe it’s a little extreme, but let me explain.
Why Views Are Evil
The truth is they are not. It’s the way people sometimes use them that is evil.
Views are great for a few causes. They can be used to:
Expose an interface while keeping the option to change the underlying schema without affecting the client application
Expose only a subset of the rows or columns to clients for security reasons
Leverage code reuse
Those attributes attract many developers to use views because they relate to the good programming principles of reuse and encapsulation.
The problem starts when people over-reuse views. It’s not uncommon to see a view that joins a huge table to another view, which in turn joins other two super complex views, and so on. The optimizer is very smart and generally does an excellent job in building a good execution plan for our queries, but it also has its limits, and a query like the one described here is very hard to optimize. Once we figure out we have a performance problem with such a view it can be very hard to optimize it, and that’s for two main reasons:
It can be hard to identify where the performance problem lies in that kind of hierarchy. It can be in one place or in many places.
A-lot of business rules can be encapsulated in the view and the other views it uses. Even if you find the cause for the performance problem, It’s not always easy change things while keeping those business rules unharmed when you don’t remember the initial need the view was written for.
Use views, but wisely: Don’t over-reuse. If you see a view that starts to remind you of the one described above, don’t use it. In that case It’s easier to write things the right way from the start than trying to fix things later.
Why Table Variables Are Evil
One of the most discussed topics regarding SQL Server is temporary tables vs. table variables.
The general rule of thumb is that table variables are better for small result sets, and temporary tables are better for large result sets.
This rule of thumb and other characteristics of temp tables and table variables cause a-lot of people to believe a table variable is the right choice for their code. Table variables have advantages over temp tables, but the fact that they lack statistics can cause the optimizer to choose very bad plans. I’ve been working with SQL Server for almost 10 years and never saw a situation where a table variable was significantly better than a temp table (I’m sure such situations exist, but they are not many). On the other hand, I’ve seen a bunch of slow running processes that changing a table variable to a temp table improved their runtime dramatically.
As default, use temp tables. If you’re not happy with the performance, check if a table variable can help.
Why Local Variables Are Evil
Whether it’s a batch, a stored procedure, a user defined function or a trigger, the optimizer doesn’t know the values of local variables during query optimization.
In order to decide which plan to use, it uses the statistics density vector. This is the total number of rows divided by the number distinct values, or in other words, the average number of rows per distinct value. If the number is selective enough, an index seek can be performed. Otherwise, we can get an index scan, a clustered index scan and so on.
This behavior leads to the fact that using local variables in the where clause often causes the optimizer to generate bad execution plans.
On the other hand, constants and the values of parameters passed from outside are known to the optimizer at optimization time. On first execution of a stored procedure or a function, SQL Server “sniffs” the parameter values and optimization is done according to these values. This behavior is called parameter sniffing (which has its own problems).
The fact that constants and parameter values are known at optimization time leads to the possible ways for solving the problem:
Using a stored procedure for executing the query which uses the local variable, while passing the variable as a parameter. If the query in question is already a part of an existing procedure, put this query in another inner procedure and call it from the existing one.
Using Dynamic-SQL, either with sp_executesql (which allows reuse) or with exec().
Using option (recompile), which will generate a new plan on each execution, taking into account the variable values.
Using option (optimize for (@param = value)), which will generate a plan that is optimized for the value specified for @param.
Each option has pros and cons, but they are usually better than just using local variables.
In most cases, a stored procedure is the best option. If you think it’s not the right option for you, try the other options. Also check out my presentation, “Things You Can Find In The Plan Cache“, for better understanding the pros and cons of each option.
Views, table variables and local variables are fundamental parts of T-SQL development. Knowing their characteristics, how they are used and the way the optimizer can work with them can get you ahead in delivering high performance solutions.