(Print this page)

Microsoft SQL Server 2008 – The FileStream feature
Published date: Sunday, September 13, 2009
On: Moer and Éric Moreau's web site

Almost every application has to store files to be used at some time. For example, those files could be (but not limited to) help files, reports, templates, images. Every developer faced the great debate of having those files maintained into a database or kept on a file system with a long list of pros and cons for each side.

I had to explore and test that feature for an upcoming project and decided to expose the results here.

What you need

The FileStream feature of SQL Server appeared in Microsoft SQL Server 2008. So this is the base version your need. To ease the development of your application, you should be using Visual Studio 2008 SP1 and the .Net Framework 3.5 SP1 because the SP1 has added features specific to the FileStream (more specifically the System.Data.SqlTypes.SqlFileStream class).

You might be happy that the Express version of SQL Server 2008 also supports this feature.

The demo application

I have worked hard for you this month again. You will find a VB and a C# version of the demo application. Again, the solution requires Microsoft Visual Studio 2008 SP1 and Microsoft SQL Server 2008.

Before the FileStream feature

As you have read into the introduction, before the FileStream feature was added to SQL Server, developers had 2 ways of handling files for an application.

The first method is to use the file system. Some of the problems are that the backup of the files is not always synchronized with the backup of the database, transactions were difficult to achieve, text search is hard, synchronization of a test environment is harder, DRP (Disaster Recovery Plan) is also harder because paths are often stored into the database.

The second method is to store the file into a BLOB, now called a VarBinary(max) where you have the benefits of the backups, the full text search but you are limited to 2GB per file. Operations on the database (backups and other maintenance tasks) take more time. Add to those constraints that many DBAs don’t want to hear anything about storing files into a database!

Now that we have the FileStream feature

This feature offers the best of both worlds. You get the all the benefits of the database like backups, security, full text search, transactions, replication but the files are not stored into the database (only a pointer to the file is stored) and you can exceed the 2GB limits (you are now limited by the OS and the available space on disk).

There are some other limitations like the column holding the document (or the link to the document) cannot be encrypted and the database mirroring is not supported.

What about the speed?

That’s an interesting question. After considering all the other pros and cons of the file system and the new FileStream feature, you might have this question that is coming to your mind. Is the file system faster or slower than the database?

I don’t have benchmarks for you but I have a link to provide you: “To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem”. This article was written in 2006 by people (including the late Jim Gray) of Microsoft Research. In short, the conclusion can read as: “(...) objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.”.

Setting-up your server

This can be a real pain if the server administrator is not you or your best friend! To enable the FileStream feature on your SQL Server instance, you must have admin privileges at the OS level. By default, the FileStream is turned off (as almost all features) but your sysadmin might have turned it on while installing SQL Server 2008.

If the feature is not already enabled, the first thing you need to do is to enable the FileStream feature. This is done by opening the properties for your SQL Server instance from the SQL Server Configuration Manager (see figure 1).

Figure 1: Enabling the FileStream feature for SQL Server

In this dialog, you will need to select the first checkbox (Enable FILESTREAM for Transact-SQL access) for sure. If you want to read and write FILESTREAM data from Windows, you need to select the second checkbox (Enable FILESTREAM for file I/O streaming access) and enter the name of the share. If you have remote clients you need to serve, you must also select the third checkbox (Allow remote clients to have streaming access to FILESTREAM data).

Once you have enabled the FileStream feature, you need to stop you SQL Server instance and restart it in order to start the new feature. You won’t be able to do any other commands related to FileStream as long the instance has not been restarted.

You now need to open SSMS (SQL Server Management Studio) and execute these statements:

EXEC sp_Configure FileStream_Access_Level, 2
RECONFIGURE

The last parameter is the access level which needs to be set to 0, 1, or 2. The value you write varies according to the level you have selected into the configuration dialog. Those values are 0-based. I strongly suggest to set it to 2 otherwise some queries of this article won’t work properly.

Read the Messages section to ensure that the statements ran correctly.

The server is now properly configured.

Configuring the database

We are now ready to create a database to test this feature (notice that you can also modify an existing database). This is a lot easier than configuring the server itself.

This statement will create a new database and the associated folder that will contain the files. Execute statements like these:

CREATE DATABASE TestFileStream 
  ON
    PRIMARY 
      ( NAME = Main, FILENAME = 'C:\Temp\TestFS\TestFS.mdf'),
      FILEGROUP FSGroup1 
      CONTAINS FILESTREAM
      ( NAME = FSFiles, FILENAME = 'C:\Temp\TestFS\Files')
  LOG ON  
    ( NAME = Archlog1, FILENAME = 'C:\Temp\TestFS\TestFS.ldf')
GO

USE TestFileStream
GO 

Ensure that you root folder (c:\Temp\TestFS) exists and that the files folder is not (c:\Temp\TestFS\Files) otherwise you will receive errors. You don’t have to store the files in a sub-folder of your database. It can be on a completely different disk (which would make sense if you have multiple hard drives on your server to optimize your performance).

Your database is now ready to handle FileStream.

Creating a table

New attributes have been added to the creation of tables in SQL Server. I haven’t found the most important attributes in the create table dialog so you won’t have other options than to script your table creation.

This script creates a three-column table to hold pointers to the files:

CREATE TABLE Files (
  [ID] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
  [Name] VARCHAR(20) NOT NULL,
  [Data] VARBINARY(max) FILESTREAM DEFAULT(0x)
)

As a requirement, you need to have a ROWGUID column like my ID column. If you try to create a plain old Identity column, the table won’t be created and you will get an error (Msg 5505, Level 16, State 1, Line 1 - A table with FILESTREAM column(s) must have a non-NULL unique ROWGUID column).

The column that will be used as a pointer to the real file on the disk needs to be created as a VarBinary(max) with the FileStream attribute (this is the attribute I cannot find in the dialog). You will also find that I set a default value to (0x) to create an empty file on the disk as soon as a row is inserted into the table.

In addition to an ID, you normally need a name (and there is no special attributes here). You could also have anything else your application may require (latest update, file type, creator, ...).

First tests from SSMS

You are surely anxious to try what we have just setup.

I have good news and bad news for you! Yes we are able to test it from SSMS but we won’t be able to store real binary data into our files yet because there is nothing in SSMS to transform a file into its binary representation. So our test will be done with simple strings (and the application we will create next will handle real files).

So if you execute a query like this in SSMS:

INSERT INTO Files(ID, [Name], Data)
VALUES(NEWID(), 'Test 1', CAST('my very first test' AS VARBINARY(max)))

This query will insert a new row into our table and it will also create a file into our folder (somewhere in an obscure path) and the filename is also obscure. It does not seem to relate to our GUID!

If you run this query:

SELECT * 
FROM Files 
WHERE [Name] = 'Test 1' 

This query will return all the data (3 fields) you have inserted in which you will recognize the Name but not the Data field (we can ignore the ID field which was a new GUID).

Because the data we just created is plain text, we can safely cast the Data field into a VARCHAR field to see its real content (don’t expect to see an image if you have inserted a GIF) with a query like this:

SELECT *, CAST(Data AS VARCHAR(MAX)) AS RealData
FROM Files
WHERE [Name] = 'Test 1' 

Also, you can run a query to get the PathName of the file that was created by SQL Server on your disk:

SELECT *, CAST(Data AS VARCHAR(MAX)) AS RealData, [Data].PathName() AS PathName
FROM Files
WHERE [Name] = 'Test 1'

The PathName function was added specifically for the FileStream fields. It will give you a path but don’t try navigating to this folder as it will return an error (access denied). But if you navigate to the folder you have set when you created the database (c:\Temp\TestFS\Files in my case), you will find some sub-folders and one of those (if you dig deeply enough) will contain a file. Open this file with the famous Notepad and you will see the content.

Updating the table content is not more difficult. A simple query like this one will do the job:

UPDATE Files
SET Data = CAST('the second test' AS VARBINARY(max))
WHERE [Name] = 'Test 1'

If you look at the sub-folder containing your file right now, you may see 2 files. Opening them into Notepad will show you the first content and the second content. At some point the first file will disappear by itself.

The same behaviour can be observed when deleting a row, it takes some times to physically delete the files from disks:

DELETE FROM Files
WHERE [Name] = 'Test 1'

If you really want to see the file disappear, you may want to run this query:

CHECKPOINT

The real test: the demo application

The demo application does 3 things:

  • List the files available from the table into a grid.
  • Insert a new row and upload a document into the FileStream field.
  • Download a FileStream field and recreate a file with its content.

These 3 operations are clearly identified into the figure 2. Each operation has its own GroupBox control.

Figure 2: The demo application in action

The Refresh button simply retrieves a list of Ids and Names from the Files table in SQL Server and displays them into a DataGridView control. So far there is nothing about the FileStream feature.

After having provided a full path into the textbox of the “Upload files” section, clicking the Upload button inserts all the data except the content of the file into the Files table. Here are the important lines of code (download the demo application for all the lines):

Dim cmd As New SqlCommand("INSERT INTO Files ([ID], [Name]) VALUES (@ID, @Name)", cn)
cmd.Parameters.Add("@ID", SqlDbType.UniqueIdentifier).Value = ID
cmd.Parameters.Add("@Name", SqlDbType.VarChar, 20).Value = strFileName
cmd.ExecuteNonQuery()

Once the header has been inserted, you can now retrieve the path name and the transaction context from SQL Server for this specific file like this:

Dim trx As SqlTransaction = cn.BeginTransaction
cmd = New SqlCommand("SELECT [Data].PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " + _
                     "FROM Files " + _
                     "WHERE ID = @ID", cn)
cmd.Transaction = trx
cmd.Parameters.Add("@ID", SqlDbType.UniqueIdentifier).Value = ID
Dim rdr As SqlDataReader = cmd.ExecuteReader(CommandBehavior.SingleRow)
rdr.Read()
Dim strfilePath As String = rdr.GetString(0)
Dim trxID As Byte() = DirectCast(rdr(1), Byte())
rdr.Close()

Finally, you can copy your file content using a SQLFileStream object like this:

Using fs As IO.FileStream = IO.File.OpenRead(pFile)
    Using sqlFS As New SqlTypes.SqlFileStream(strfilePath, trxID, IO.FileAccess.Write)
        Dim buffer As Byte() = New Byte(512 * 1024) {}
        Dim intPos As Integer = fs.Read(buffer, 0, buffer.Length)
        Do While intPos > 0
            sqlFS.Write(buffer, 0, intPos)
            intPos = fs.Read(buffer, 0, buffer.Length)
        Loop
    End Using
End Using

Downloading a file content from SQL Server is almost the same as the last snippet of code but instead of reading the local stream and writing to the SQLFileStream, you do the inverse (write local and read from SQL):

Using fs As IO.FileStream = IO.File.OpenWrite(pFileName)
    Using sqlFS As New SqlTypes.SqlFileStream(strFilePath, trxID, IO.FileAccess.Read)
        Dim buffer As Byte() = New Byte(512 * 1024) {}
        Dim intPos As Integer = sqlFS.Read(buffer, 0, buffer.Length)
        Do While intPos > 0
            fs.Write(buffer, 0, intPos)
            intPos = sqlFS.Read(buffer, 0, buffer.Length)
        Loop
    End Using
End Using

Those are the kind of operations you will wrap into a class to hide the details of this feature.

The Connection String

There is a limitation to the FileStream feature regarding the Connection String. You will need to use Integrated Security because single accounts are not allowed to go into the FileStream folder. If you are not using SQL integrated security, you will need to run stored procedures and impersonate another account. You will find articles on the Internet showing you how to do this.

Conclusion

It requires more than a single line of code to activate this feature but I really think it worth it if you have large files (remember that if you only have small files, you may get better performance storing the files directly into the database).

Hopefully, the next versions will have more options to manage FileStream more easily directly from SSMS.


(Print this page)