How to archive data from my production databases?
Last updated: May 6th '02 | Best viewed with: All popular browsers | Best viewed at: 1024x768 | Links to external sites will open in a new window

About myself
My technical skills
My favorites
My picture album

Shortcut keys
My code library

VB resources
SQLServer resources
SQLServer books
Replication FAQ
Scripting resources
ASP resources

Search my site
Sign my guestbook
Contact information

SQL Server Articles New

 
NEW!!! Subscribe to my newsletter:
Want to keep in touch with the latest in SQL Server world? Email vyaskn@hotmail.com with 'subscribe' in the subject line
See also:  
Database coding conventions SQL Server interview questions
Evaluation of federated database servers SQL Server FAQ on programming, administration, replication and tools
SQL Server security best practices SQL Server administration best practices
Related books:
Inside SQL Server 2000 Microsoft SQL Server 2000 Unleashed SQL Server Backup and Recovery: Tools and Techniques

How to archive data from my production database? Is there a tool or command built-into SQL Server for this?


I often see this type of questions in the SQL Server newsgroups and mailing lists. So, decided to come up with a simple article, that's just good enough to get you started.

Archiving is the process of getting rid of old data (or data that is not valid anymore) from the main production databases. There is no built-in command or tool for archiving databases. In this article I'm going to show you how to implement this process.

Why archive in the first place? By archiving old data, you can keep your databases smaller and more manageable. With smaller tables, your index rebuild times and backup/restore times will be under control. You will also see performance gains on certain queries that either scan a larger portion of the table or clustered index.

Depending on the business needs, one might choose to either delete the unwanted data, or copy the data to a history table and delete that data from the production database.

Implementing an archival process is not that complicated. The three logical steps involved in this process are:

  • Identifying the data to be archived.

  • Optionally copying the identified data to a history table.

  • Deleting the identified data from the production database.

You may not always be archiving just a single table. You need to understand how different tables in the database are related to each other through foreign key relationships and then, sequence your delete/archive process, so that you are not violating any of the constraints.

Here's an example scenario of archiving all the orders taken more than 6 months ago, from the Orders and OrderDetails tables. The following stored procedure will copy orders older than 6 months into similar tables of a different database called Archive, and delete those rows from the main database:


CREATE PROC dbo.ArchiveData
(
	@CutOffDate datetime = NULL
)
AS
BEGIN
	SET NOCOUNT ON

	IF @CutOffDate IS NULL 
	BEGIN
		SET @CutOffDate = DATEADD(mm, -6, CURRENT_TIMESTAMP)

	END
	ELSE
	BEGIN
		IF @CutOffDate > DATEADD(mm, -3, CURRENT_TIMESTAMP)
		BEGIN
			RAISERROR ('Cannot delete orders from last three months', 16, 1)
			RETURN -1
		END
	END

	BEGIN TRAN

		INSERT INTO Archive.dbo.Orders
		SELECT * 
		FROM dbo.Orders
		WHERE OrderDate < @CutOffDate

		IF @@ERROR <> 0
		BEGIN
			ROLLBACK TRAN
			RAISERROR ('Error occured while copying data to Archive.dbo.Orders', 16, 1)
			RETURN -1
		END

		INSERT INTO Archive.dbo.OrderDetails
		SELECT *
		FROM dbo.OrderDetails
		WHERE OrderID IN
		(
			SELECT OrderID
			FROM dbo.Orders
			WHERE OrderDate < @CutOffDate
		)

		IF @@ERROR <> 0
		BEGIN
			ROLLBACK TRAN
			RAISERROR ('Error occured while copying data to Archive.dbo.OrderDetails', 16, 1)
			RETURN -1
		END

		DELETE dbo.OrderDetails
		WHERE OrderID IN
		(
			SELECT OrderID
			FROM dbo.Orders
			WHERE OrderDate < @CutOffDate
		)

		IF @@ERROR <> 0
		BEGIN
			ROLLBACK TRAN
			RAISERROR ('Error occured while deleting data from dbo.OrderDetails', 16, 1)
			RETURN -1
		END

		DELETE dbo.Orders
		WHERE OrderDate < @CutOffDate

		IF @@ERROR <> 0
		BEGIN
			ROLLBACK TRAN
			RAISERROR ('Error occured while deleting data from dbo.Orders', 16, 1)
			RETURN -1
		END

	IF @@TRANCOUNT > 0
	BEGIN
		COMMIT TRAN
		RETURN 0
	END

END


You just need to call this stored procedure without any parameters, to get rid of all orders older than 6 months. As you can see, you can pass an optional date parameter, to customize the archiving process. Care is taken not to delete orders placed in the last three months. Depending on your requirements, you can schedule this stored procedure as a job, using SQL Server Agent, so that it runs every week or fortnight.

This type of jobs tend to block the table while they are running. So, it's better to schedule these jobs for off-peak hours. Also, keep an eye on the transaction log space utilization. In case of any issues with log space, consider archiving in smaller batches.

Here's one thing you should consider: If you are using transactional replication to maintain a reporting database, data will be deleted from the reporting databses also, once you run the archival stored procedure. If that's not what you wanted (that is, maintain complete data in the reporting database), then you might want to consider publishing stored procedures, instead of tables, and customize the archiving stored procedure on the subscriber, so that it does nothing.