How Many Versions Do You Have?


Seems like a simple question however for SharePoint 2007 and 2010 it may be harder than you think to get a solid answer. What is really behind the question is how much space is my versions taking up or to put it another way if I trim my versions how much space will I get back. Unfortunately we have hit upon two shortcomings with SharePoint, 1) there is no out of the box reporting mechanism to give a SharePoint administrator any insight into number of versions or storage and 2) there is no out of the box trimming policy or mechanism which can be applied. For the most part we are left with implementing storage quota and either increasing quota when requested or asking users to do a self trim of documents and/or versions manually.

Later in this post I have included two TSQL scripts you can run against each of your Content DBs to get a better handle on document versions. The scripts use the NOLOCK construct but I would encourage not running them against a live production system and backups of your content DBs work nicely here – oh you don’t have backups of your content DBs, well then your version issue is only temporary. 🙂

The scripts focus on two aspects of version policy which I have seen incorporated – keep only x versions of a document and keep all versions for x years. After running the scripts below and through the magic of Excel 2010 I produced the following two reports. The first is document versions based upon age where age is by year. The way you read this thus, for YearOfAge of 0 means versions that are less than a year old, YearOfAge of 1 is versions with an age of 1 year or more but less than 2, etc. The SizeByYear and DocCountByYear represent the size and number of documents for that particular year.  The TotalDocCount and TotalSize is a sum of the current and prior year values. For example, there are 265,748 documents which are more than 1 year old with a total size of 102,763,586,583 bytes. So if someone was to make an argument to delete all versions 2 years of age or older then we can see that will clean up about 33,615 documents and about 16 GB of space.

 

The second report is based on number and size of documents by a specific version. Same data points as above but this time we are focusing on versions rather than a time span with SizeByVersion and DocCountByVersion being specific to a single version. TotalSize and TotalDocCount are a sum of prior versions. So its correct to say there are 304,260 documents with 7 or more versions with a combined size of 85,978,585,323 bytes.

So using these two TSQL scripts you can answer the version question. So how do I clean up those versions based upon some time aspect or number of versions or a combination of both? This will be in a later blog post.

TSQL scripts

As with anything on this blog these are not supported but supplied as sample code. I am not a SQL guy so those that sleep TSQL will undoubtedly point out my deficiencies and I welcome any improvements you may offer.

SharePoint 2010

WITH VersionAgeTable
AS
(
SELECT  DATEDIFF(year, [TimeCreated], GETUTCDATE()) AS [YearsOfAge], 
		SUM(CAST([Size] AS BIGINT)) AS [SizeByYear], 
		COUNT(*) AS [DocCountByYear]
FROM [dbo].[AllDocVersions] WITH (NOLOCK)
GROUP BY DATEDIFF(year, [TimeCreated], GETUTCDATE())
)
SELECT  d.[YearsOfAge],
		d.[SizeByYear] AS [SizeByYear],
		(SELECT SUM([SizeByYear]) FROM VersionAgeTable WHERE [YearsOfAge] >= d.[YearsOfAge]) AS [TotalSize],
		d.[DocCountByYear] AS [DocCountByYear],
		(SELECT SUM([DocCountByYear]) FROM VersionAgeTable WHERE [YearsOfAge] >= d.[YearsOfAge]) AS [TotalDocCount]
FROM VersionAgeTable AS d
ORDER BY d.[YearsOfAge];


WITH VersionTable
AS
(
	SELECT  [InternalVersion]/512 AS [Version], 
			SUM(CAST([Size] AS BIGINT)) AS [SizeByVersion], 
		    COUNT(*) AS [DocCountByVersion]
	FROM [dbo].[AllDocVersions] WITH (NOLOCK)
	GROUP BY [InternalVersion]/512
)
SELECT  d.[Version],
		d.[SizeByVersion] AS [SizeByVersion],
		(SELECT SUM([SizeByVersion]) FROM VersionTable WHERE [Version] >= d.[Version]) AS [TotalSize],
		d.[DocCountByVersion] AS [DocCountByVersion],
		(SELECT SUM([DocCountByVersion]) FROM VersionTable WHERE [Version] >= d.[Version]) AS [TotalDocCount]
FROM VersionTable AS d
ORDER BY d.[Version];

SharePoint 2007

WITH VersionAgeTable
AS
(
SELECT  DATEDIFF(year, [TimeCreated], GETUTCDATE()) AS [YearsOfAge], 
		SUM(CAST([Size] AS BIGINT)) AS [SizeByYear], 
		COUNT(*) AS [DocCountByYear]
FROM [dbo].[AllDocVersions] WITH (NOLOCK)
GROUP BY DATEDIFF(year, [TimeCreated], GETUTCDATE())
)
SELECT  d.[YearsOfAge],
		d.[SizeByYear] AS [SizeByYear],
		(SELECT SUM([SizeByYear]) FROM VersionAgeTable WHERE [YearsOfAge] >= d.[YearsOfAge]) AS [TotalSize],
		d.[DocCountByYear] AS [DocCountByYear],
		(SELECT SUM([DocCountByYear]) FROM VersionAgeTable WHERE [YearsOfAge] >= d.[YearsOfAge]) AS [TotalDocCount]
FROM VersionAgeTable AS d
ORDER BY d.[YearsOfAge];


WITH VersionTable
AS
(
	SELECT  [Version]/512 AS [Version], 
			SUM(CAST([Size] AS BIGINT)) AS [SizeByVersion], 
		    COUNT(*) AS [DocCountByVersion]
	FROM [dbo].[AllDocVersions] WITH (NOLOCK)
	GROUP BY [Version]/512
)
SELECT  d.[Version],
		d.[SizeByVersion] AS [SizeByVersion],
		(SELECT SUM([SizeByVersion]) FROM VersionTable WHERE [Version] >= d.[Version]) AS [TotalSize],
		d.[DocCountByVersion] AS [DocCountByVersion],
		(SELECT SUM([DocCountByVersion]) FROM VersionTable WHERE [Version] >= d.[Version]) AS [TotalDocCount]
FROM VersionTable AS d
ORDER BY d.[Version];

4 thoughts on “How Many Versions Do You Have?

  1. Hi Todd,

    Great post, very useful 🙂
    I have a concern though, maybe you can help me verify.
    After playing around with some queries regarding the documents, I believe that the “newest” version of a documents isn’t actually stored in the AllDocVersions table, but in the AllDocs table, and when this is updated, the previous version is moved to AllDocVersions (kind of like a history table) and latest version remains in AllDocs.

    If the above is correct, that means that for your space calculation queries, they also need to sum up the space used in the AllDocs table (or the AllDocStreams table where the content is stored) in order to show the space used.

    Hope to hear your thoughts on this!

    Best regards,
    -Bjorn

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s