Contact us

Madeira Data Solutions

Your Data, Our Solutions

The 10 Basic Concepts of T-SQL

Written By: Matan Yungman 29/08/2017

For the end of the latest Basic Querying and Programming in SQL Server course, I prepared a list of the basic concepts I recommend the students to remember and master in order to become proficient at writing T-SQL (and SQL in general). There’s no reason to keep those concepts inside the course, so here they are:

  1. Think in sets, not in rows
  2. Every part of your query is a table result, and can be referenced as such by later parts of the query
  3. Know the logical processing order of queries: From -> Join -> Where -> Group By -> Having -> Select -> Distinct -> Order By -> Offset/Fetch
  4. The more you prepare in advance, and the less calculations you perform on the fly, the better the query will run. Don’t take it to the extreme, of course
  5. Avoid user-defined functions as much as possible. Take the function logic out and use a set-based solution, or use an inline table-valued function if you want to keep the reuse and encapsulation a function gives you.
  6. Views can be evil (or to be more accurate, the way people use them). If you see a view that already queries from many tables and other views, consider whether you really want to use it, because in many cases, such views generate poor performing queries
  7. Keep queries simple. Don’t write “the mother of all queries”. If it’s complicated, break it down to smaller ones and use temp tables for temporary results
  8. In 99% of cases, temp tables are better than table variables
  9. Indexes will help your queries (but make sure there aren’t too many of them)
  10. Statistics will help them too
  11. Beware of things that prevent SQL Server from using an index, like wrapping a column with a function, using Like with % at the start of the predicate, or performing a manipulation of a column you filter on.

 

 

8 responses to “The 10 Basic Concepts of T-SQL”

  1. Alexander Suprun says:

    Jeff, there is one more side effect from using Scalar UDFs. A query will be forced to a serial plan, therefore cannot use more than one CPU. If let’s say you have a 8 core server then such a query would virtually run 56 (7*8) times slower.
    I’ve seen an ETL solution heavily depended on a lot of UDFs. When performance became terrible the client asked if adding more CPU/RAM would help… and nobody was going to re-write this monster, because they didn’t have enough manpower. So I would rather agree with the author – “Avoid user-defined functions as much as possible”.

  2. Jeff Moden says:

    You wrote “Avoid user-defined functions as much as possible”. That’s actually not correct advice. There are 3 basic types of user defined functions and a couple of implied “sub-types”. In most cases, I’ll agree that you should avoid Scalar UDFs and mTVFs (Multi-statement Table Value Functions), but iTVFs (Inline Table Value Functions) and the closely relate iSFs (Inline Scalar Functions, which are a special sub-type of iTVFs) can be the miracle answer to many performance problems. Further, iTVFs can be very effectively used to replace certain views as a form of “parameterized inline view”.

    How can you instantly know if you have a high performance iTVF (or iSF) instead of an mTVF or Scalar function? It’s easy. If the function contains the word BEGIN, this it cannot be an iTVF.

    While I do agree that Scalar User Defined Functions should generally be avoided, they’ve actually gotten more of a bad rap than they deserved. People that use SET STATISTICS are frequently unaware of the fact that measuring performance with SET STATISTICS will make Scalar UDFs look 200 times worse than they actually are, which is “only” 7 times slower than doing the same thing directly in code or using an “iSF”. Please see the following article on that subject.

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    Thanks for listening,
    –Jeff Moden

    • Guy Glantser says:

      Hi Jeff,

      You are absolutely right, and your article explains this topic in a wonderful way.
      Thank you very much!

  3. Stephen J Newman says:

    7. Keep queries simple. Don’t write “the mother of all queries”. If it’s complicated, break it down to smaller ones and use temp tables for temporary results

    I like using less SELECT statements /sets/temp tables, to get to my answer. To me it is easier to view the logic in larger units rather than scrolling through line after line of temp table after temp table and then scrolling back up to remember what logic was used in the prior set’s (temp table’s) logic to keep track of the larger unit’s logic, that drives me nuts.

    I feel if we know how write ‘Mother Queries’ why not write them as long as performance is not heavily impacted.

  4. Andrej says:

    Hello, I have question about 8. Temp table vs table variable. I read, that using disk based table variables is better till 10k rows, after is better to use temp tables. Talking about performance. Using in-memory table variable should be different story, right?

    • Guy Glantser says:

      Hi Andrej,

      I disagree with the 10k threshold. It really depends on what you do with the table. For example, if you join the table to another large table without a proper index, then even for a very small number of rows, a temporary table will perform much better than a table variable. In fact, in one of my demos, I show this for only two rows.
      Using in-memory table variable will improve performance to some extent, but it won’t change the main problem, which is the lack of statistics. So if the problem is a join to a very large table without a proper index, even an in-memory table variable won’t help you.

  5. Oleg says:

    Thank you, it is useful.

    Only question about: “In 99% of cases, temp tables are better than table variables”
    Is it? Why?
    In most cases we even do not know is table variable in the tempdb and vs versa: where is temp table in the executing query.

    • Guy Glantser says:

      Hi Oleg,

      Both types of tables (temp tables and table variables) are stored in tempdb. There are several differences between them, but the most important one is that tables variables don’t have statistics. This means that the optimizer doesn’t have a clue about the number of rows in the table variable, and always assumes a single row (even if there are a million rows in the table). This leads to poor execution plans in many cases, and this is why it is almost always better to use temp tables.

Leave a Reply

Your email address will not be published. Required fields are marked *