When the time comes to read a delimited text file, everybody has their own little set of functions to handle it. It is sometimes a simple task, but it can also be tricky when the file contains delimiters and quotes and delimiters between quotes and ... you know what I mean.
Let me introduce you to another method, built into the Framework, that easily finds the data without worrying too much about the delimiters or whether the fields are enclosed in quotes. Say you simply want to read the fields into a string collection.
This is exactly what the TextFieldParser class is all about. It provides methods and properties for parsing structured text files. This class is hidden in the Microsoft.VisualBasic.FileIO namespace.
Available source code
Both VB and C# versions are provided this month.
Now if you are doing C#, you can use this class in the very same way. You are only required to add a reference to Microsoft.VisualBasic.dll, which doesn’t hurt your application.
Even though the solution provided here was created with Visual Studio 2008, the main feature of this article was released with the .NET Framework 2.0 (VS 2005).
Supported in C#
Something unusual happens when you look at the MSDN help for the methods of this class: it shows no C# code. It even says that the class is not available to C#. This is wrong. A C# solution only has to add a reference to Microsoft.VisualBasic.dll and the class can then be used.
Introduction
Understanding the TextFieldParser object is very easy. It provides methods and properties to iterate over a text file and to split each line into fields.
Two types of files can be processed using the TextFieldParser: delimited or fixed-width.
When you process a delimited file, properties such as Delimiters and HasFieldsEnclosedInQuotes are meaningful. When you process a fixed-width file, the FieldWidths property is meaningful.
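To make the role of Delimiters and HasFieldsEnclosedInQuotes more concrete, here is a minimal sketch. It assumes the TextFieldParser constructor that accepts a TextReader, so that a string can be parsed without a file. The module name and the SplitLine helper are hypothetical and are not part of the demo project.

```vb
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module QuotedFieldsSketch
    ' Hypothetical helper: splits a single delimited line into its fields.
    Function SplitLine(ByVal line As String) As String()
        ' A StringReader stands in for a file in this sketch.
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {","}
            ' True is already the default; it is set here only to make the intent visible.
            parser.HasFieldsEnclosedInQuotes = True
            Return parser.ReadFields()
        End Using
    End Function
End Module
```

Calling SplitLine("1,""Dalton, Joe"",1.23") returns three fields: 1, Dalton, Joe and 1.23. The comma inside the quotes is kept as part of the name and the quotes themselves are removed.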
Figure 1: The demo application in action
Processing a delimited text file
First, here is the code that runs when you click the “Read delimited file” button:
Private Sub ProcessDelimitedFile()
    'Provide a test file
    Dim strFile As String = IO.Path.Combine(Application.StartupPath, "TestDelimited.txt")

    'Stop if the test file cannot be found
    If Not IO.File.Exists(strFile) Then
        MessageBox.Show("File does not exist!", _
                        "File not found", _
                        MessageBoxButtons.OK, _
                        MessageBoxIcon.Error)
        Return
    End If

    'Instantiate a reader with the file to process
    Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
        My.Computer.FileSystem.OpenTextFieldParser(strFile)
    'Set the reader's TextFieldType to delimited
    reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
    'Set the reader's Delimiters to a comma (,) and a tab
    reader.Delimiters = New String() {",", vbTab}
    'Ignore the lines starting with this token
    reader.CommentTokens = New String() {"--"}

    'Ready to read the file....
    Do While Not reader.EndOfData
        Try
            'Parse the line into fields using the ReadFields method
            Dim arrFields As String() = reader.ReadFields()

            'Process the data just read
            Dim strCurrentLine As String = String.Empty
            For Each strField As String In arrFields
                strCurrentLine += " -" + strField + Environment.NewLine
            Next
            MessageBox.Show("The current line contains:" + _
                            Environment.NewLine + _
                            strCurrentLine, _
                            "Data read", _
                            MessageBoxButtons.OK, _
                            MessageBoxIcon.Information)
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            'Report the line number and continue with the next line
            MessageBox.Show("Line " & reader.ErrorLineNumber & _
                            " is not valid and will be skipped.")
        End Try
    Loop

    'Close the reader
    reader.Close()
End Sub
Now, let’s go through it to provide some explanation.
The first line I will explain is where the reader is instantiated.
Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
    My.Computer.FileSystem.OpenTextFieldParser(strFile)
This line simply declares the reader variable and calls the OpenTextFieldParser method, passing the filename (with its full path). If the file exists and there is no problem with it, the reader variable now refers to a parser opened on that file.
The following 3 lines configure the parser. The first sets the TextFieldType property to Delimited (the other choice is fixed width, as we will see later in this article). The second provides the delimiters used to split each line into fields; because it is an array, you can provide more than one delimiter. The last provides an array of comment tokens: any line that starts with one of those tokens is ignored by the parser.
Those lines are:
reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
reader.Delimiters = New String() {",", vbTab}
reader.CommentTokens = New String() {"--"}
Now that we have provided some instructions to the parser, we can start looping through the file.
The EndOfData property returns True when there is no more data to read. That’s why the loop checks that property before each read, which also covers the case of an empty file:
Do While Not reader.EndOfData
    ...
Loop
To read an actual line of data, parse it and split it into fields, we can use the ReadFields method which returns an array of strings:
Dim arrFields As String() = reader.ReadFields()
Once you have the array of strings, you can do whatever you want with it. In my simple demo, I concatenate the fields and display the line in a message box.
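If you would rather collect the rows than display them, the same loop can be wrapped in a function. The following is only a sketch: the ReadRows helper and its module are hypothetical, and it reads from a string through a StringReader (using the constructor that accepts a TextReader) so that it does not depend on a test file.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module DelimitedLoopSketch
    ' Hypothetical helper: returns every data row of the text as an array of fields.
    Function ReadRows(ByVal text As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.Delimited
            ' Same settings as the demo: comma or tab delimiters, "--" comment lines.
            parser.Delimiters = New String() {",", vbTab}
            parser.CommentTokens = New String() {"--"}
            Do While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            Loop
        End Using
        Return rows
    End Function
End Module
```

Lines starting with the comment token never reach the list, and a header line, if present, is returned like any other row, so the caller has to decide whether to skip it.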
Creating a test file for the delimited process
To test this small project, you will need to provide a test file. The process expects to find a file named TestDelimited.txt in the same folder as the .exe.
You can easily create one and have it copied automatically when needed by adding a text file to your project (Project -> Add New Item...) and selecting the “Text file” template as shown in figure 2.
Figure 2: Adding a text file to the project
To automatically copy the file whenever it is required (whenever it changes) during the compilation of the project, open the properties of the file and set its “Copy to Output Directory” to “Copy if newer” as shown in figure 3.
Figure 3: setting the test file properties
My sample text file contains this data:
ID,Name,Height
--skip this line
1,"Joe Dalton",1.23
2,"Jack Dalton",2.34
3,"William Dalton",3.45
4,"Averell Dalton",4.56
5	test	t2
Notice that on the last line, I used the Tab key (and not spaces) between my values.
Now if you run your application with this demo file, you will discover that:
Processing a fixed-width file
The process of reading a fixed-width file is the same as the one for delimited files except that instead of providing delimiters, we need to provide the width (in characters) of each field. So from my previous method, I only need to modify 3 lines of code.
The first line to modify contains the file name:
Dim strFile As String = IO.Path.Combine(Application.StartupPath, "TestFixedWidth.txt")
The second modification is to delete the line that sets the Delimiters collection.
The last modification is to add a new line that specifies the width of each field. It can go at the same place where you set the delimiters in the previous example (the line you just deleted):
reader.SetFieldWidths(5, 21, 7, -1)
The test file I have created to test this process has this content:
ID   Name                 Height Comments
---- -------------------- ------ ---------------------------
--skip this line
1    Joe Dalton           1.23   The short guy
2    Jack Dalton          2.34    
3    William Dalton       3.45    
4    Averell Dalton       4.56   The dummy guy
[T]5 test                 t2     112233.44           This is just another comments!
This file and the SetFieldWidths call deserve a couple of explanations.
First, I have set the width of the first field to 5 even though I am expecting only 4 characters, because there is a space between the 2 columns. I could also have set the widths to (4, 1, 20, 1, 6, 1, -1) but that would have created a lot of dummy fields.
Also, the last value is set to -1. This means that the last field has a variable length. Instead of setting a big number, you can just set it to -1.
You also need to ensure that your file really contains spaces and not tabs, otherwise your fields will be misaligned.
Last but not least, and I really don’t like this behaviour, if you press the Enter key right after the Height field and you don’t leave a couple of spaces so that there is at least one under the Comments column, you will get a MalformedLineException.
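The widths discussed above can be tried without a file. This is a minimal sketch under the same assumption as before (the constructor that accepts a TextReader); the ReadFixedWidthRows helper and its module are hypothetical names.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module FixedWidthSketch
    ' Hypothetical helper: parses fixed-width text using the widths from this article.
    Function ReadFixedWidthRows(ByVal text As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.FixedWidth
            ' ID (4 + 1 space), Name (20 + 1 space), Height (6 + 1 space), rest of the line.
            parser.SetFieldWidths(5, 21, 7, -1)
            parser.CommentTokens = New String() {"--"}
            Do While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            Loop
        End Using
        Return rows
    End Function
End Module
```

A line for this helper can be built with PadRight to avoid counting spaces by hand, for example "1".PadRight(5) & "Joe Dalton".PadRight(21) & "1.23".PadRight(7) & "The short guy". The padding does not show up in the returned fields because the TrimWhiteSpace property is True by default.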
Handling multiple column widths in a same file
When you process a fixed-width file, it sometimes happens that the lines do not all have the same format. The TextFieldParser class supports that.
If you check the last line of my test file, there is a number under the Comments column and the comment itself is offset to the right.
Just before calling the ReadFields method, you can read the first few characters of the line (using the PeekChars method) without moving the cursor in the file, adjust the field widths accordingly, and then call ReadFields normally.
Here is an example. I read the first 3 characters of the line. If these 3 characters are exactly what I am looking for ([T] in my scenario), I set the field widths to handle 5 fields; otherwise, they are set for 4 fields.
Dim strRowType = reader.PeekChars(3)
If strRowType.Trim.ToUpper = "[T]" Then
    reader.SetFieldWidths(5, 21, 7, 20, -1)
Else
    reader.SetFieldWidths(5, 21, 7, -1)
End If
Dim arrFields As String() = reader.ReadFields()
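For completeness, here is how that fragment might sit inside a full reading loop. This is a sketch only: the ReadMixedRows helper and its module are hypothetical, and the text is read from a string through a StringReader instead of the test file.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module MixedWidthSketch
    ' Hypothetical helper: picks the field widths for each line from its first 3 characters.
    Function ReadMixedRows(ByVal text As String) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.FixedWidth
            Do While Not parser.EndOfData
                ' PeekChars looks at the next data line without consuming it.
                Dim rowType As String = parser.PeekChars(3)
                If rowType IsNot Nothing AndAlso rowType.ToUpper() = "[T]" Then
                    parser.SetFieldWidths(5, 21, 7, 20, -1)
                Else
                    parser.SetFieldWidths(5, 21, 7, -1)
                End If
                rows.Add(parser.ReadFields())
            Loop
        End Using
        Return rows
    End Function
End Module
```

With this approach each returned array can have a different length (4 or 5 fields here), so the calling code should check the length, or the first field, before using a given index.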
Error handling
If a line is corrupt, an exception of type MalformedLineException is thrown. You need to trap it and decide what to do with it. In my demo, I simply report the error line and continue parsing.
Try
    ...
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
    ...
End Try
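When you want to log the bad lines rather than show a message box, the parser's ErrorLineNumber property can be collected inside the Catch block. The sketch below is an illustration under assumptions: the FindMalformedLines helper and its module are hypothetical, and the input comes from a string through a StringReader.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module ErrorHandlingSketch
    ' Hypothetical helper: returns the line numbers the parser could not split.
    Function FindMalformedLines(ByVal text As String) As List(Of Long)
        Dim badLines As New List(Of Long)
        Using parser As New TextFieldParser(New StringReader(text))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {","}
            Do While Not parser.EndOfData
                Try
                    parser.ReadFields()
                Catch ex As MalformedLineException
                    ' The offending line has already been consumed, so the loop simply moves on.
                    badLines.Add(parser.ErrorLineNumber)
                End Try
            Loop
        End Using
        Return badLines
    End Function
End Module
```

A typical malformed line for a delimited file is one where text follows a closing quote, such as 2,"Jack"x,2.34. The parser's ErrorLine property holds the text of that line if you also want to keep it.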
Using a MemoryStream instead of a file
You may have situations where what you have to parse is not a file but a string you already have in memory. You will surely be happy to learn that you do not have to save your string to a file.
Everything you have read so far still applies to a MemoryStream. The only thing that needs to change is the initialization of the reader variable.
Here is a short example of what it would look like:
Dim strStringToParse As String = String.Empty
strStringToParse += "1,Joe Dalton,1.23" + Environment.NewLine
strStringToParse += "2,Jack Dalton,2.34" + Environment.NewLine

Dim objEnc As New System.Text.ASCIIEncoding
Dim objMS As New IO.MemoryStream(objEnc.GetBytes(strStringToParse))

'Instantiate a reader with the stream to process
Dim reader As Microsoft.VisualBasic.FileIO.TextFieldParser = _
    New Microsoft.VisualBasic.FileIO.TextFieldParser(objMS)
As you can see, I create a regular string and initialize a memory stream object with it. I finally use the memory stream object to initialize the TextFieldParser object.
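One caveat with ASCIIEncoding is that characters outside the ASCII range (accented letters, for example) are lost when the string is converted to bytes. If your data may contain such characters, a variation is to encode the string as UTF-8 and pass the same encoding to the TextFieldParser constructor that accepts a stream and an encoding. This is a sketch, not part of the demo project; the ParseFirstRow helper and its module are hypothetical.

```vb
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic.FileIO

Module Utf8StreamSketch
    ' Hypothetical helper: parses the first row of a comma-delimited string via a MemoryStream.
    Function ParseFirstRow(ByVal text As String) As String()
        ' Encode as UTF-8 so that non-ASCII characters survive the round trip.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Using stream As New MemoryStream(bytes)
            Using parser As New TextFieldParser(stream, Encoding.UTF8)
                parser.TextFieldType = FieldType.Delimited
                parser.Delimiters = New String() {","}
                Return parser.ReadFields()
            End Using
        End Using
    End Function
End Module
```

The rest of the parsing code stays the same as in the MemoryStream example; only the encoding used on both sides changes.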
Conclusion
It won’t do miracles, but if your text files or strings are well formed, this easy-to-use yet little-known class will save you tons of lines of code.
The next time you have fixed-width or delimited text to parse, come back to this article.
I hope you appreciated the topic and see you next month.