Should You Split Queries?

Split Queries

Recently, while working with a customer and tuning some queries, we spotted a query that seemed odd. Something about it wasn't right. After some more investigation, the developer recognized the query as a one generated by Entity-Framework and using the SplitQuery feature. This was new to me. It's the first time I encountered this feature, so I went to learn about it.

Now that I know what it is and how it works, I can tell you that it's a terrible feature in most cases. Developers should avoid using it, unless there is a good reason to use it (which I doubt).

So what is the SplitQuery feature in Entity-Framework?

You can read about it here. According to that article, if you have a one-to-many relationship between two or more entities (tables), and you want to query both entities, then EF will generate a query with a join between the tables.

Here is an example query from the article:

SELECT [b].[BlogId], [b].[OwnerId], [b].[Rating], [b].[Url], [p].[PostId], [p].[AuthorId], [p].[BlogId], [p].[Content], [p].[Rating], [p].[Title]
FROM [Blogs] AS [b]
LEFT JOIN [Post] AS [p] ON [b].[BlogId] = [p].[BlogId]
ORDER BY [b].[BlogId], [p].[PostId]

The article then describes a problem with this approach, called "cartesian explosion" problem:

If a typical blog has multiple related posts, rows for these posts will duplicate the blog's information. This duplication leads to the so-called "cartesian explosion" problem. As more one-to-many relationships are loaded, the amount of duplicated data may grow and adversely affect the performance of your application.

So as a developer, what you can do to avoid this problem is to use the SplitQuery feature. If you instruct EF to split the above query, it will result in two queries executed one after the other:

SELECT [b].[BlogId], [b].[OwnerId], [b].[Rating], [b].[Url]
FROM [Blogs] AS [b]
ORDER BY [b].[BlogId]

SELECT [p].[PostId], [p].[AuthorId], [p].[BlogId], [p].[Content], [p].[Rating], [p].[Title], [b].[BlogId]
FROM [Blogs] AS [b]
INNER JOIN [Post] AS [p] ON [b].[BlogId] = [p].[BlogId]
ORDER BY [b].[BlogId]

The idea is to reduce the amount of data that the database sends to the client. All the columns from the [Blogs] table will be returned only once per blog instead of being duplicated across posts. This results in less network traffic and less memory footprint on the client.