Do you know the TextFieldParser?
Published date: Saturday, May 22, 2010
On: Moer and Éric Moreau's web site

When the time comes to read a delimited text file, everybody has their own little set of functions to handle it. It is sometimes a simple task, but it can also be tricky when the file contains delimiters and quotes and delimiters between quotes and ... you know what I mean.

Let me introduce you to another method that is built into the Framework to easily find the data without worrying too much about the delimiters, whether the fields are enclosed in quotes, and so on. Say you simply want to read the fields into a string collection.

This is exactly what the TextFieldParser class is all about. It provides methods and properties for parsing structured text files. This class is hidden in the Microsoft.VisualBasic.FileIO namespace.

Available source code

Both VB and C# versions are provided this month.

Now if you are doing C#, you can use this class the very same way. You will only be required to add a reference to Microsoft.VisualBasic.dll, which doesn’t hurt your application.

Although the solution provided here was created with Visual Studio 2008, the main feature of this article was released with the .NET Framework 2.0 (VS 2005).

Supported in C#

A very unusual situation occurs when you look at the MSDN help for the methods of this class. It shows no C# code. Worse, it says that the class is not available to C#. This is totally wrong. A C# solution just has to add a reference to Microsoft.VisualBasic.dll and the class can then be used like any other.

Introduction

Understanding the TextFieldParser object is very easy. It provides methods and properties to iterate over a text file and to split each line into fields.

Two types of files can be processed using the TextFieldParser: delimited or fixed-width.

When you process a delimited file, properties such as Delimiters and HasFieldsEnclosedInQuotes are meaningful. When you process a fixed-width file, the FieldWidths property is meaningful.

Figure 1: The demo application in action

Processing a delimited text file

First, here is my code called when you click the “Read delimited file” button:

Private Sub ProcessDelimitedFile()
    'Provide a test file
    Dim strFile As String = IO.Path.Combine(Application.StartupPath, "TestDelimited.txt")

    'Stop here if the test file cannot be found
    If Not IO.File.Exists(strFile) Then
        MessageBox.Show("File does not exist!", _
                        "File not found", _
                        MessageBoxButtons.OK, _
                        MessageBoxIcon.Error)
        Return
    End If

    'Instantiate a reader with the file to process
    Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
            My.Computer.FileSystem.OpenTextFieldParser(strFile)

    'Set the reader's TextFieldType to delimited 
    reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
    'Set the reader's Delimiters to a comma (,) and a tab
    reader.Delimiters = New String() {"," , vbTab}
    'ignore the lines starting with this token
    reader.CommentTokens = New String() {"--"}

    ' Ready to read the file....
    Do While Not reader.EndOfData
        Try
            'Parse the line into fields using the ReadFields method
            Dim arrFields As String() = reader.ReadFields()

            'Process the data just read
            Dim strCurrentLine As String = String.Empty
            For Each strField As String In arrFields
                strCurrentLine += "   -" + strField + Environment.NewLine
            Next
            MessageBox.Show("The current line contains:" + _
                            Environment.NewLine + _
                            strCurrentLine, _
                            "Data read", _
                            MessageBoxButtons.OK, _
                            MessageBoxIcon.Information)

        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            'LineNumber identifies the line that could not be parsed
            MessageBox.Show("Line " & ex.LineNumber & _
            " is not valid and will be skipped.")
        End Try
    Loop

    'Close the reader
    reader.Close()
End Sub

Now, let’s go through it to provide some explanation.

The first line I will explain is where the reader is instantiated.

    Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
            My.Computer.FileSystem.OpenTextFieldParser(strFile)

This line simply declares the reader variable and calls the OpenTextFieldParser method, passing the file name (with its full path). If the file exists and there is no problem with it, the reader variable now references a TextFieldParser opened on that file.

The next three lines configure the parser. The first sets the TextFieldType property to Delimited (the other choice is FixedWidth, as we will see later in this article). The second provides the delimiters used to split each line into fields; because it is an array, you can provide more than one delimiter. The last provides another array, this time of tokens: any line that starts with one of those tokens is ignored by the parser.

Those lines are:

    reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
    reader.Delimiters = New String() {"," , vbTab}
    reader.CommentTokens = New String() {"--"}

Now that we have provided some instructions to the parser, we can start looping through the file.

The EndOfData property indicates whether we have reached the end of the file. That’s why the loop checks that property first: an empty file is never read, and the loop stops as soon as there is no more data:

Do While Not reader.EndOfData
   ...
Loop

To read an actual line of data, parse it and split it into fields, we can use the ReadFields method, which returns an array of strings:

Dim arrFields As String() = reader.ReadFields()

Once you have the array of strings, you can do whatever you want with it. In my simple demo, I concatenate the fields and display the line in a message box.
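
If you would rather collect the fields than show a message box for every line, the same loop can fill a list. Here is a minimal sketch of that idea, not the code of the demo project: the ReadDelimitedRows helper and the temporary test file are names made up for the illustration, and the reader is wrapped in a Using block so that it is closed even if something goes wrong.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module DelimitedSketch
    ' Hypothetical helper (not part of the demo project):
    ' returns every data row of a delimited file as an array of fields.
    Function ReadDelimitedRows(ByVal strFile As String) As List(Of String())
        Dim rows As New List(Of String())
        ' Using closes the parser (and the file) even when an error occurs
        Using reader As New TextFieldParser(strFile)
            reader.TextFieldType = FieldType.Delimited
            reader.Delimiters = New String() {",", vbTab}
            reader.CommentTokens = New String() {"--"}
            reader.HasFieldsEnclosedInQuotes = True ' the default, shown for clarity
            Do While Not reader.EndOfData
                Try
                    rows.Add(reader.ReadFields())
                Catch ex As MalformedLineException
                    ' Skip the lines the parser cannot split
                End Try
            Loop
        End Using
        Return rows
    End Function

    Sub Main()
        ' Assumed sample data, written to a temporary file for the test
        Dim strFile As String = Path.Combine(Path.GetTempPath(), "TestDelimited.txt")
        File.WriteAllLines(strFile, New String() { _
            "ID,Name,Height", _
            "--skip this line", _
            "1,""Dalton, Joe"",1.23", _
            "5" & vbTab & "test" & vbTab & "t2"})

        For Each arrFields As String() In ReadDelimitedRows(strFile)
            Console.WriteLine(String.Join(" | ", arrFields))
        Next
    End Sub
End Module
```

Notice the comma inside the quoted name: because HasFieldsEnclosedInQuotes is True, that comma stays inside the Name field instead of creating an extra field.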

Creating a test file for the delimited process

To test this small project, you will need to provide a test file. The process expects to find a file named TestDelimited.txt in the same folder as the .exe.

You can easily create one and have it copied automatically when needed by adding a text file to your project (Project -> Add New Item...) and selecting the “Text file” template as shown in figure 2.

Figure 2: Adding a text file to the project

To automatically copy the file whenever it is required (whenever it changes) during the compilation of the project, open the properties of the file and set its “Copy to Output Directory” to “Copy if newer” as shown in figure 3.

Figure 3: setting the test file properties

My sample text file contains this data:

ID,Name,Height
--skip this line
1,"Joe Dalton",1.23
2,"Jack Dalton",2.34
3,"William Dalton",3.45
4,"Averell Dalton",4.56
5	test	t2

Notice that on the last line, I used the Tab key (and not spaces) between my values.

Now if you run your application with this demo file, you will discover that:

  • all the lines are displayed one-by-one into a message box
  • the line that starts with -- is never displayed because we have set the CommentTokens collection
  • all the fields are properly interpreted because we have added 2 delimiters (the comma and the tab) to the Delimiters collection

Processing a fixed-width file

The process of reading a fixed-width file is the same as the one for delimited files, except that instead of providing delimiters, we need to provide the width (in characters) of each field. So from my previous method, only a few lines of code need to change.

The first line to modify contains the file name:

Dim strFile As String = IO.Path.Combine(Application.StartupPath, "TestFixedWidth.txt")

The second modification is to set the TextFieldType property to FixedWidth instead of Delimited:

reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.FixedWidth

The third modification is to delete the line that sets the Delimiters collection.

The last modification is to add a new line that specifies the width of each field. It can go at the same place as the line that set the delimiters in the previous example (the line you just deleted):

reader.SetFieldWidths(5, 21, 7, -1)

The test file I have created to test this process has this content:

ID   Name                 Height Comments
---- -------------------- ------ ---------------------------
--skip this line
1    Joe Dalton           1.23   The short guy
2    Jack Dalton          2.34      
3    William Dalton       3.45     
4    Averell Dalton       4.56   The dummy guy
[T]5 test                 t2     112233.44           This is just another comments!

This file and the SetFieldWidths call deserve a couple of explanations.

First, I have set the width of the first field to 5 even though I am expecting only 4 characters, because there is a space between the 2 columns. I could also have set the widths array to (4, 1, 20, 1, 6, 1, -1), but that would have created a lot of dummy fields.

Also, the last value is set to -1. This means that the last field has a variable length. Instead of setting a big number, you can just set it to -1.

You also need to ensure that your file really contains spaces and not tabs, otherwise your fields will all be messy.

Last but not least, and I really don’t like this behaviour: if you press the Enter key right after the Height field without leaving a couple of spaces, so that at least one sits under the Comments column, you will get a MalformedLineException.
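
Put together, the fixed-width version of the loop could look like the following. This is only a sketch under a few assumptions: the ReadFixedWidthRows helper is a name invented for the illustration, the sample rows are written to a temporary file rather than next to the .exe, and PadRight is used to build them so that each column has exactly the expected width.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module FixedWidthSketch
    ' Hypothetical helper (not part of the demo project):
    ' reads every row of a fixed-width file into a list.
    Function ReadFixedWidthRows(ByVal strFile As String) As List(Of String())
        Dim rows As New List(Of String())
        Using reader As New TextFieldParser(strFile)
            ' Fixed-width parsing has to be requested explicitly
            reader.TextFieldType = FieldType.FixedWidth
            ' 3 columns of 5, 21 and 7 characters, then a variable-length one
            reader.SetFieldWidths(5, 21, 7, -1)
            reader.CommentTokens = New String() {"--"}
            Do While Not reader.EndOfData
                Try
                    rows.Add(reader.ReadFields())
                Catch ex As MalformedLineException
                    Console.WriteLine("Line " & ex.LineNumber & _
                                      " is not valid and will be skipped.")
                End Try
            Loop
        End Using
        Return rows
    End Function

    Sub Main()
        ' Assumed sample data, padded so every column has the expected width
        Dim strFile As String = Path.Combine(Path.GetTempPath(), "TestFixedWidth.txt")
        File.WriteAllLines(strFile, New String() { _
            "ID".PadRight(5) & "Name".PadRight(21) & "Height".PadRight(7) & "Comments", _
            "--skip this line", _
            "1".PadRight(5) & "Joe Dalton".PadRight(21) & "1.23".PadRight(7) & "The short guy"})

        For Each arrFields As String() In ReadFixedWidthRows(strFile)
            Console.WriteLine(String.Join("|", arrFields))
        Next
    End Sub
End Module
```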

Handling multiple column widths in a same file

When you process a fixed-width file, it sometimes happens that the lines do not all share the same format. The TextFieldParser class supports that scenario.

If you check the last line of my last file, there is a number under the Comments column and the comment itself is shifted to the right.

Just before using the ReadFields method, you can read the first couple of characters (using the PeekChars method) without moving the cursor in the file and adjust the field widths accordingly before calling the ReadFields method normally.

Here is an example. I read the first 3 characters from the line. If these 3 characters are exactly what I am looking for ([T] in my scenario), I set the field widths to handle 5 fields; otherwise, they are set to handle 4 fields.

Dim strRowType = reader.PeekChars(3)
If strRowType.Trim.ToUpper = "[T]" Then
    reader.SetFieldWidths(5, 21, 7, 20, -1)
Else
    reader.SetFieldWidths(5, 21, 7, -1)
End If
Dim arrFields As String() = reader.ReadFields()

Error handling

If a line is corrupt, an exception of type MalformedLineException is thrown. You need to trap it and decide what to do with it. In my demo, I simply report the error line and continue parsing.

Try
   ...
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
   ...
End Try
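
To report a bit more than "something went wrong", the exception exposes a LineNumber property and the parser keeps the rejected text in its ErrorLine property. The sketch below shows one way to use them. It is an illustration only: the FindMalformedLines helper and the sample data are invented, and the data is fed through a StringReader just to keep the example self-contained.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module ErrorHandlingSketch
    ' Hypothetical helper: returns the numbers of the lines the parser rejected.
    Function FindMalformedLines(ByVal strData As String) As List(Of Long)
        Dim badLines As New List(Of Long)
        Using reader As New TextFieldParser(New StringReader(strData))
            reader.TextFieldType = FieldType.Delimited
            reader.SetDelimiters(",")
            Do While Not reader.EndOfData
                Try
                    reader.ReadFields()
                Catch ex As MalformedLineException
                    ' ErrorLine holds the raw text of the line that failed
                    Console.WriteLine("Line " & ex.LineNumber & _
                                      " skipped: " & reader.ErrorLine)
                    badLines.Add(ex.LineNumber)
                End Try
            Loop
        End Using
        Return badLines
    End Function

    Sub Main()
        ' The second line has text right after a closing quote,
        ' which the parser is expected to reject
        Dim strData As String = _
            "1,""Joe Dalton"",1.23" & Environment.NewLine & _
            "2,""Jack"" Dalton,2.34" & Environment.NewLine & _
            "3,""William Dalton"",3.45"
        Console.WriteLine(FindMalformedLines(strData).Count & " line(s) rejected")
    End Sub
End Module
```

The loop keeps going after the exception, so the lines that follow a bad one are still parsed.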

Using a MemoryStream instead of a file

You may have situations where what you have to parse is not a file but a string you already have in memory. You will surely be happy to learn that you do not have to save your string to a file first.

Everything you have read so far still applies to a MemoryStream. The only thing that needs to change is the initialization of the reader variable.

Here is a short example of what it would look like:

Dim strStringToParse As String = String.Empty
strStringToParse += "1,Joe Dalton,1.23" + Environment.NewLine
strStringToParse += "2,Jack Dalton,2.34" + Environment.NewLine

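'UTF-8 keeps accented characters that an ASCII encoding would lose
Dim objEnc As System.Text.Encoding = System.Text.Encoding.UTF8
Dim objMS As New IO.MemoryStream(objEnc.GetBytes(strStringToParse))
'Instantiate a reader with the memory stream to process
Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
       New Microsoft.VisualBasic.FileIO.TextFieldParser(objMS)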

As you can see, I create a regular string and initialize a memory stream object with it. I finally use the memory stream object to initialize the TextFieldParser object.
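
The TextFieldParser class also has a constructor that accepts a TextReader, so a StringReader is another way to parse a string without going through bytes and encodings. The following is a minimal sketch of that alternative, not the approach used in the demo project; the ParseString helper is a name made up for the example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module StringReaderSketch
    ' Hypothetical helper: parses comma-delimited text held in a string.
    Function ParseString(ByVal strStringToParse As String) As List(Of String())
        Dim rows As New List(Of String())
        ' The parser reads straight from the StringReader, no file or stream needed
        Using reader As New TextFieldParser(New StringReader(strStringToParse))
            reader.TextFieldType = FieldType.Delimited
            reader.SetDelimiters(",")
            Do While Not reader.EndOfData
                rows.Add(reader.ReadFields())
            Loop
        End Using
        Return rows
    End Function

    Sub Main()
        Dim strStringToParse As String = _
            "1,Joe Dalton,1.23" & Environment.NewLine & _
            "2,Jack Dalton,2.34" & Environment.NewLine

        For Each arrFields As String() In ParseString(strStringToParse)
            Console.WriteLine(String.Join(" | ", arrFields))
        Next
    End Sub
End Module
```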

Conclusion

It won’t do miracles, but if your text files or strings are well formed, this easy-to-use yet little-known class will save you tons of lines of code.

The next time you have fixed-width or delimited text to parse, come back to this article.

I hope you appreciated the topic and see you next month.
