A Useful Script to Delete Data from Multiple Tables


Multiple Tables

Sometimes you need to delete data from multiple tables in a database. For example, you might have a multi-tenant database, and you need to delete all the data that belong to some tenant.


The problem is that there are many tables in the database that contain data, which you need to delete. If you have a column like "TenantId" in all tables, then your life is easier, because you have a simple predicate to apply to your DELETE statements against all tables. But even then, if there are foreign keys between tables to enforce referential integrity, then things get more complicated.



The Problem


As an example, consider the following diagram:


Hierarchy of Tables with Referential Integrity
Hierarchy of Tables with Referential Integrity

In this example, the Tenants table contains a row for each tenant, and it includes the TenantId column with a primary key. There is a one-to-many relationship between that table and the Customers table. The Customers table has a TenantId column with a foreign key referencing the primary key of the Tenants table.


In the same way, there is a primary key in the Customers table (CustomerId) and a foreign key in the Orders table (CustomerId) referencing that primary key.


Now, let's say you need to delete all the data that belong to tenant 28. If you try to delete the row with TenantId = 28 in the Tenants table first, the statement will fail, because there are customers that belong to that tenant. In other words: there are rows in the Customers table that reference the row that you are trying to delete from the Tenants table.



Possible Solutions


One way to solve this problem is to add the ON DELETE CASCADE option to all foreign keys. With this option, when you delete a row that has referencing rows in another table, instead of failing the operation, the database engine will delete the row and all the referencing rows as well.


If you have multiple levels in the hierarchy, as in our example, and all foreign keys use this option, then all the relevant rows will be deleted from all tables recursively and transparently. All you need to do is delete the single row from the Tenants table, and the database engine will do all the rest behind the scenes.


This is nice, but it can be dangerous. You might not realize exactly how many rows you are going to delete in total, and this number might surprise you. If you are going to end up deleting a large number of rows from multiple tables in a single operation, then you might run into other problems related to transaction log throughput, disk space, and recovery time. In general, I don't like things that happen in the background, just like with triggers or plan guides. It can be quite confusing and difficult when troubleshooting stuff.


Another option is to delete the data from the tables in the correct order, from the lowest level to the highest. In our simple example, you can delete all the rows from the Orders table first, then from the Customers table, and only then delete the single row from the Tenants table.


When you have many tables with many references, it can become quite complicated to sort the tables in the correct order. Sometimes, you may not even be able to do it, because there is a closed loop of references, so there isn't really a "lowest" table.