Friday, September 25, 2009

Filegroup Woes

Today I spent far too much time trying to figure out a problem that had a very simple solution.

Some background: the databases I administer utilize partitioning for the purposes of archiving old data. I created a job that at the end of the month creates a new file group and file for the next month set to the size of the file for the previous month. Then the job adds the new filegroup to the partition, so when midnight rolls around on the first of the month, all new data gets put into the new file. This is all very standard stuff; however we recently decided that some of our files were getting a bit big, so we decided to add a second file each month (both in the same filegroup). This created some additional coding to my job to allow for the fact that if we told it to create a "b" file, it shouldn't try to create the filegroup as well.

Long story short, something went wrong last night, and the files weren't created and the job died on the Database with two files. While trying to figure it out I tried manually running the code to create the whole shebang:

USE dbname
ALTER DATABASE dbname ADD FILEGROUP [Filegroupname_200910]
ALTER DATABASE Leadman ADD FILE (NAME = 'filename_200910a'
,FILENAME = 'location' ,SIZE = 22438 ,FILEGROWTH = 500) TO FILEGROUP [Filegroupname_200910]

ALTER PARTITION SCHEME PartitionSchemeName NEXT USED FileGroupName_200910
ALTER PARTITION FUNCTION PartitionFunction_F01() SPLIT RANGE ('10/01/2009')



Except something went wrong, I got a message that the filegroup already existed
So I ran the last two lines again...no file error. I decided to kill the file and kick the job off again, let it do it's thing (plus it'd get rid of the error icon next to the job name). So I deleted the file, no problem, then I tried to delete the filegroup, but got this error:

The filegroup 'XXX' cannot be removed because it is not empty ...


I've seen this before, usually it means the file is on the partion still, or there is a file associated with it. Checked; nope not on the partition; no files associated with it, hmm this is weird.

While doing this, I was working with our new DBA, training him in our what jobs run, and how our system runs; so we both started looking for other objects that might be connected to the filegroup that we weren't thinking about. He found an article that suggested running this query:
SELECT *
FROM sys.destination_data_spaces DDS
JOIN sys.partition_schemes ps ON DDS.partition_scheme_id = ps.data_space_id
JOIN sys.partitions p ON DDS.destination_id = p.partition_number
JOIN sys.data_spaces AS ds ON ds.data_space_id = DDS.data_space_id
WHERE DDS.data_space_id = 7
-- I verified that the data space id was 7

This showed us that filegroup was associated with with a table and had two indexes associated with it, each with 18 million rows in them. This just didn't sound right as we couldn't figure out how that table could get on this filegroup, when there is no file associated with the filegroup...it served to be a major distraction that kept our attention for around an hour.

We broke down what the query was doing, looking at the individual parts and learned that the DMV was probably looking across the entire server, not just that one database. We learned to use syspartitions instead of sys.partitions...but that just confirmed what we already knew, no files associated with it. Argh, this was getting frustrating, what could be holding on to the filegroup.

After another hour of searching and messing around with different things, we came up with a thought, what if it was associated with the partition? It clicked for us, when I ran:
ALTER PARTITION SCHEME PartitionSchemeName NEXT USED FileGroupName_200910
I connected the partition to the filegroup. So I ran the query again, but changed the date to 200909 and then was able to remove the file group.

My job is setup to run again tonight and with luck, we should have our new filegroup/file by tomorrow.