Build directory structures using SQL Server 2005

If you ever work with directory structures on the filing system, you know how challenging it can be to traverse through folders to find specific file(s). If you store this type of information structure in the database, you are even more aware of what it takes to retrieve the data. Writing queries to pull this information is sometimes difficult to achieve and inefficient. You can use the recursion and XML features in SQL Server 2005 to build a file location on the fly.

Example

This example searches for a document and builds the path to the document based upon a parent-folder to child-folder relationship in the database. One file will belong to one folder, which may be a child folder in a long lineage of parent folders. The ultimate goal is to provide a file to be searched for, and the process will build the location to the file.

I've seen file path locations stored several ways in the database, usually with the purpose of storing the location of a file to be pulled for a Web site. Most of the time, the full path to a file is stored in one database field, but I have also seen the location of a file "normalized" so that the past must be built when needed. My goal for this article is to solve the issue of building the path from the hierarchical structure.

The script below creates a Documents table and a Folders table. The Documents table stores the filenames and the folder that the document resides in. The Folders table stores the directory structure of one or more local or network drives. Most of the work in this example will involve traversing through this folder structure to build the path to the file.

IF OBJECT_ID('Documents','U') IS NOT NULL
DROP TABLE Documents

IF OBJECT_ID('Folders','U') IS NOT NULL
DROP TABLE Folders

IF OBJECT_ID('udf_BuildDocumentPath','FN') IS NOT NULL
DROP FUNCTION udf_BuildDocumentPath

CREATE TABLE Documents
(
        DocumentID SMALLINT,
        FolderID SMALLINT,
        DocumentName VARCHAR(255)
)

CREATE TABLE Folders
(
        FolderID SMALLINT,
        ParentFolderID SMALLINT,
        FolderName VARCHAR(255)
)

The code below adds data to our newly created tables. I am adding data for three documents, all of which are located in the same folder.

INSERT INTO Documents(DocumentID, FolderID, DocumentName)
VALUES(1,5,'SalesForecast2008.xls')
INSERT INTO Documents(DocumentID, FolderID, DocumentName)
VALUES(2,5,'SalesProjection.doc')
INSERT INTO Documents(DocumentID, FolderID, DocumentName)
VALUES(3,5,'SalesForecastPresentation.ppt')

INSERT INTO Folders(FolderID, ParentFolderID, FolderName)
VALUES(1,null, 'D:')
INSERT INTO Folders(FolderID, ParentFolderID, FolderName)
VALUES(2,1, 'Sales')
INSERT INTO Folders(FolderID, ParentFolderID, FolderName)
VALUES(3,2, 'Forecasts')
INSERT INTO Folders(FolderID, ParentFolderID, FolderName)
VALUES(4,3, 'Data')
INSERT INTO Folders(FolderID, ParentFolderID, FolderName)
VALUES(5,4, '2008')
GO

The script below creates the function that will build the full path to the file based upon the DocumentID in the Documents table. This function uses a recursive common table expression (CTE) to traverse through the directory structure, linking the child folder ID to the parent folder ID in the table. Once the set of records are found that comprise the full path to the document, the FOR XML PATH('') construct is used to "pivot" these values from values in different rows to values concatenated in the same row. From there, it is just a matter of returning the build path to the caller.

FOR XML Path() is one of my favorite features in SQL Server 2005 because it makes it so easy to take a list of column values from different rows and concatenate them together so that they are on the same row. It is an ideal tool for dynamically building SQL statements that require list of different values for use in an IN() statement.

CREATE FUNCTION udf_BuildDocumentPath
(
        @DocumentID SMALLINT
)
RETURNS VARCHAR(400)
AS
BEGIN
        DECLARE @ReturnPath VARCHAR(400)

;WITH DirectoryPathCTE(DocumentID, FolderID, ParentFolderID, DocumentName, FolderName, LevelNumber)
        AS
        (
        SELECT
               DocumentID, f.FolderID, ParentFolderID, DocumentName, f.FolderName, 0
        FROM
               Documents d
               INNER JOIN folders f on d.FolderID = f.FolderID
        WHERE
               DocumentID = @DocumentID
        UNION ALL
        SELECT
               DocumentID, f.FolderID, f.ParentFolderID, DocumentName, f.FolderName, p.LevelNumber + 1
        FROM
               Folders f
               INNER JOIN DirectoryPathCTE p on p.ParentFolderID = f.FolderID
        )
        SELECT @ReturnPath =
        (
               SELECT
                  FolderName + '' + CASE WHEN LevelNumber = 0 THEN DocumentName ELSE '' END
               FROM
                       DirectoryPathCTE p
               ORDER BY LevelNumber DESC
               FOR XML PATH('')
        )

        RETURN(@ReturnPath)

END
GO

Now that my function is built, I can call it for every document I have in my Documents table, and the path to the file will be built based upon the DocumentID in the Documents table.

SELECT dbo.udf_BuildDocumentPath(d.DocumentID)
FROM Documents d

Conclusion

Even if you never need to implement the example in this article, I hope you'll take away some ideas from the recursion and the FOR XML PATH clause to solve some tricky problems you may encounter in the future.


  • Date: March 17th, 2008
  • Author: Tim Chapman

Comments

Popular posts from this blog

DevOps - Key Concepts - Roles & Responsibilities - Tools & Technologies

Rewriting the DotNetNuke Url Rewriter Module

Trouble In the House of Google