[Updated on 2024-06-30 based on feedback from comments]
A while ago, I blogged about how to efficiently update millions of records in a SQL table. Today I'd like to show you a simple trick for doing the same when deleting millions of records from a SQL table.
The idea is the same: you don't want to do it all at once. Instead, you delete a fixed number of records at a time. Why do we need to do this? When you delete data from a table, a lot happens behind the scenes (locks are taken, the deletes are written to the transaction log, and every index on the table is updated), so you need to be careful when removing data, especially large amounts of it.
Because SQL Server does so much work when removing data, it is important to know that factors like the ones listed below can affect the locking and performance behavior of DELETE statements:
- Number of indexes in the table
- Poorly indexed foreign keys
- Lock escalation mode
- Isolation levels
- Batch sizes of the delete statement
- Trigger(s)
- Temporal tables
You also need to consider cascade deletes: when you delete data from one table, SQL Server might also remove data from other tables that reference the rows you are deleting. This is a good thing because it prevents orphan records, but it can also affect the performance of your delete statements.
The list above shows some of the main areas to consider when removing large amounts of data. In this post, I will assume you already have the right indexes in place, so that they help when deleting data rather than causing table blocking or, worse, deadlocks.
I'd like to show you how to do batch deletes, which remove only a specific number of records at a time. Performing the delete in batches helps avoid or reduce locking on the tables the data is being removed from. Below is an example of how to do this in code.
Please note that this is a simple example for demonstration purposes only and it is recommended that you add additional error handling and logging in your actual implementation.
Batch deletes with SQL
Here is an example, written in C# with Dapper, that deletes data from a Microsoft SQL Server table in batches of 1,000 records.
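// Assumes the Dapper package (for Execute), Microsoft.Data.SqlClient,
// and a connectionString variable defined elsewhere.
using System;
using System.Data;
using Dapper;
using Microsoft.Data.SqlClient;

using (IDbConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    int batchSize = 1000;
    // Declared outside the loop so the while condition can read it
    int rowsAffected = 0;

    // The statement and its parameters don't change, so build them once
    string deleteSql = @"
WITH CTE AS (
    SELECT TOP (@batch) Id
    FROM myTable
    ORDER BY Id
)
DELETE FROM myTable
WHERE Id IN (SELECT Id FROM CTE)";
    var parameters = new { batch = batchSize };

    do
    {
        try
        {
            // Execute returns the number of rows deleted by this batch
            rowsAffected = connection.Execute(deleteSql, parameters);
        }
        catch (SqlException ex)
        {
            // Handle SQL Exception here
            Console.WriteLine($"Error during delete: {ex.Message}");
            break;
        }
    } while (rowsAffected > 0);
}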
- A common table expression (CTE) named CTE selects the first batchSize rows (passed in as @batch), ordered by Id.
- The DELETE statement uses the CTE to delete those rows.
- The loop continues until a batch deletes no rows, indicating that the table is empty, or until a SQL error stops it.
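The loop above can also be pulled into a small reusable helper, which makes it easy to test without a database. This is a minimal sketch, assuming the actual delete is passed in as a delegate that receives the batch size and returns the number of rows it deleted. `DeleteInBatches` and the in-memory stand-in for `myTable` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: calls deleteBatch repeatedly until a batch deletes zero rows.
// deleteBatch receives the batch size and must return the number of rows it deleted.
static long DeleteInBatches(Func<int, int> deleteBatch, int batchSize)
{
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize), "Batch size must be positive.");

    long totalDeleted = 0;
    int rowsAffected;
    do
    {
        rowsAffected = deleteBatch(batchSize);
        totalDeleted += rowsAffected;
    } while (rowsAffected > 0);

    return totalDeleted;
}

// Hypothetical in-memory stand-in for myTable, so the loop runs without SQL Server.
var fakeTable = new List<int>();
for (int id = 1; id <= 2500; id++) fakeTable.Add(id);

int calls = 0;
long deleted = DeleteInBatches(batch =>
{
    calls++;
    // Mimics the CTE-based delete: removes the `batch` lowest Ids.
    int count = Math.Min(batch, fakeTable.Count);
    fakeTable.RemoveRange(0, count);
    return count;
}, batchSize: 1000);

Console.WriteLine($"Deleted {deleted} rows in {calls} calls");
```

Against a real database, the delegate would be something like `b => connection.Execute(deleteSql, new { batch = b })`. With 2,500 rows and a batch size of 1,000, the helper makes four calls: 1,000, 1,000 and 500 rows, then a final call that deletes nothing and ends the loop.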
Remember, the optimal batch size depends on your specific database and table size. It’s recommended to test with different batch sizes to find the most efficient option. This kind of query can be resource-intensive if the table being deleted from is very large, so it’s important to test the performance on a test environment before running it on production. However, if you have the need to delete millions of records, this is still much better than attempting to delete all records at once.
It's also worth noting that this query is written for SQL Server; TOP, for example, is SQL Server-specific. If you are using another database such as MySQL, PostgreSQL or Oracle, the syntax will be slightly different.
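As a rough illustration of how the syntax differs, here is a sketch of equivalent batch-delete statements for PostgreSQL and MySQL, kept as C# strings so they could be passed to the same kind of loop. They assume the same `myTable` with an `Id` column, the constant names are hypothetical, and they have not been tuned for either engine, so check each database's documentation before relying on them.

```csharp
using System;

// PostgreSQL: DELETE has no LIMIT clause, so the rows are picked in a subquery.
const string PostgreSqlBatchDelete = @"
DELETE FROM myTable
WHERE Id IN (
    SELECT Id
    FROM myTable
    ORDER BY Id
    LIMIT @batch
);";

// MySQL: a single-table DELETE supports ORDER BY and LIMIT directly
// (MySQL does not allow LIMIT inside an IN subquery).
const string MySqlBatchDelete = @"
DELETE FROM myTable
ORDER BY Id
LIMIT @batch;";

Console.WriteLine(PostgreSqlBatchDelete);
Console.WriteLine(MySqlBatchDelete);
```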
The optimal batch size for deleting records in SQL will depend on various factors such as the size of the table, the amount of available memory, and the overall performance of the database. In general, larger batch sizes can be more efficient as they reduce the number of times the database needs to perform the delete operation, but they also require more memory to hold the set of records being deleted.
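A batch size of 1000 records is a common starting point, but it may not be the best option for your specific use case. You may want to experiment with other batch sizes, such as 5000, and observe how your database performs to find the best value for your needs. Keep in mind that when a single statement acquires around 5,000 locks on one table or index, SQL Server may try to escalate them to a table lock, which blocks other users of that table.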
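To make the trade-off concrete: the number of DELETE statements the loop issues is the row count divided by the batch size, rounded up, plus one final statement that deletes nothing and ends the loop. A quick sketch of that arithmetic, assuming a hypothetical five million rows and no new rows being inserted while the loop runs; `RoundTrips` is a hypothetical name.

```csharp
using System;

// Number of DELETE statements the batch loop issues for a given row count:
// ceil(totalRows / batchSize) batches that delete rows, plus one final
// batch that deletes nothing and ends the loop.
static long RoundTrips(long totalRows, int batchSize)
{
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize));
    return (totalRows + batchSize - 1) / batchSize + 1;
}

foreach (int batchSize in new[] { 1000, 5000 })
{
    Console.WriteLine($"batch {batchSize}: {RoundTrips(5_000_000, batchSize)} statements");
}
```

Going from 1,000 to 5,000 rows per batch cuts the statements from 5,001 to 1,001, but each of those statements holds more locks for longer.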
It's important to keep in mind that larger batch sizes can consume more memory, especially if your table is large and contains a lot of data. Also, if the table has indexes, every index must be updated for each deleted row, so deleting a large batch of records at once can have a more significant impact on the performance of the database.
In addition, if you are working with a production database, you should also consider the possible impact on other queries or transactions that might be executing concurrently. It’s always a good practice to test your code on a test environment before deploying to a production environment.
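One common way to reduce the impact on concurrent work, which is not part of the original example, is to pause briefly between batches. If each DELETE runs in its own autocommit transaction (no outer transaction), its locks are released when the statement finishes, so the pause gives other sessions a window to get their work done. A minimal sketch, assuming the delete is passed in as a delegate that takes the batch size and returns the rows deleted; `DeleteWithPause` is a hypothetical name, and the delay is a placeholder to tune.

```csharp
using System;
using System.Threading;

// Hypothetical variant of the batch loop that sleeps between batches so other
// sessions get a chance to work on the table in the meantime.
// deleteBatch receives the batch size and returns the number of rows it deleted.
static long DeleteWithPause(Func<int, int> deleteBatch, int batchSize, TimeSpan pause)
{
    long totalDeleted = 0;
    while (true)
    {
        int rowsAffected = deleteBatch(batchSize);
        totalDeleted += rowsAffected;
        if (rowsAffected == 0)
            break;            // nothing left to delete

        Thread.Sleep(pause);  // leave room for concurrent queries between batches
    }
    return totalDeleted;
}

// Example with a fake table of 1,500 rows and a short pause.
int remaining = 1500;
long total = DeleteWithPause(batch =>
{
    int count = Math.Min(batch, remaining);
    remaining -= count;
    return count;
}, batchSize: 1000, pause: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"Deleted {total} rows");
```

In async code, `await Task.Delay(pause)` would be the non-blocking equivalent of `Thread.Sleep`.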
Lastly, if you have tables where data needs to be deleted often, consider creating a scheduled SQL job (for example, a SQL Server Agent job) that runs weekly, or as often as you need, and deletes the data that is no longer needed. This avoids having to delete large amounts of data all at once, and it also keeps your tables from growing unnecessarily.
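For a scheduled cleanup, the main change to the batch delete is usually a filter, so each run removes only expired rows instead of emptying the table. This is a sketch under stated assumptions: the `CreatedAt` column, the 90-day retention period and the names `RetentionDays`, `DeleteExpiredSql` and `RetentionCutoff` are all hypothetical.

```csharp
using System;

// Hypothetical retention rule: rows older than this many days are no longer needed.
const int RetentionDays = 90;

// Same batch-delete pattern as the main example, restricted to old rows via a
// hypothetical CreatedAt column, so each scheduled run removes only expired data.
const string DeleteExpiredSql = @"
WITH CTE AS (
    SELECT TOP (@batch) Id
    FROM myTable
    WHERE CreatedAt < @cutoff
    ORDER BY Id
)
DELETE FROM myTable
WHERE Id IN (SELECT Id FROM CTE);";

// Truncate to midnight so every run on the same day uses the same boundary.
static DateTime RetentionCutoff(DateTime utcNow, int retentionDays) =>
    utcNow.Date.AddDays(-retentionDays);

DateTime cutoff = RetentionCutoff(DateTime.UtcNow, RetentionDays);
Console.WriteLine($"Would delete rows with CreatedAt < {cutoff:yyyy-MM-dd}");
Console.WriteLine(DeleteExpiredSql);
// In the job itself, DeleteExpiredSql would run inside the batch loop with
// parameters such as new { batch = 1000, cutoff } (e.g. via Dapper's Execute).
```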
Happy coding!