(Print this page)

A free LINQ to Excel (and CSV) provider
Published date: Tuesday, July 23, 2013
On: Moer and Éric Moreau's web site

I am not sure if it is still the case, but for a long time, companies had more data in Excel spreadsheets all over the place than properly stored in databases. I am not saying that this is the best thing you can do. I am a big fan of letting my application exporting data to Excel to let the users further manipulate it.

Or even if XML is a very useful formatting for data exchange between systems, again Excel (or their CSV counterpart) files are more popular (surely because many data is stored in this format already!).

None the less, chances are that you will end up sooner or later having to process one of these in your applications.

This month, I will introduce you to my latest found: a free LINQ to Excel provider. This is a nice solution when the format of the files you are receiving is both known and static.

Demo code

The code this month is provided in both VB and C#. The solution was created using Visual Studio 2012 but can surely be used with older version as well.

What you need

You will need a free library (MIT license) which can be downloaded from http://code.google.com/p/linqtoexcel/downloads/list. A NuGet package also exist if you prefer.

Adding references

Before being able to start coding using this library, you will need to reference it. As you can see in figure 1, you need to reference 2 files. When you download the library, be very cautious at the platform you are downloading. There is an x64 version and x86 version (the one without suffix). The platform you are targeting in Visual Studio must match this version.

Figure 1: Adding references

The demo application

As shown in figure 2, the UI for the test application is really simple. There are 4 buttons at the top of the form (one for each test) and a grid to quickly display the output.

The solution you can download also includes a sample Excel file with 2 worksheets (name DemoFile.xls) and a CSV file (DemoFile.CSV) containing the first worksheet.

Figure 2: The demo application in action

A first example: raw data

The first snippet of code I will show you is this one:

'Open an Excel file from the same folder as the executable
Dim fileName = IO.Path.Combine(IO.Directory.GetCurrentDirectory(), "DemoFile.xls")
Dim book = New LinqToExcel.ExcelQueryFactory(fileName)
 
'Extract the data from Sheet1 usinq LINQ
Dim data = From x In book.Worksheet()
           Select x
'Try to display the content of the Excel file in a datagrid
DataGridView1.DataSource = data.ToList()

This code sets a variable (fileName) to the Excel demo file found in the same folder as the executing assembly (because the “Copy to Output Directory” property of the file is set to Always). Notice that you can also open CSV file exactly the same way.

Then an object (book) of the LinqToExcel provider is instantiated with that file.

Finally, a LINQ query is prepared (data) to retrieve all the data from the Sheet1 worksheet (because no parameter is provided).

If you now try to output the result of the LINQ query to the grid, you will see something in the grid but it won’t be what you expect! It looks like the ColumnNames collection contained in the data collection is outputted which is not very interesting.

You might want to inspect the values without knowing what’s in the file by looping the result of the LINQ query like this:

'output all the data to Output window without knowing the contents
Dim intRowNumber As Integer
For Each r In data
    intRowNumber += 1
    'output the column names
    If intRowNumber = 1 Then
        For i As Integer = 0 To r.Count() - 1
            Debug.Write(r.ColumnNames(i).ToString() + " - ")
        Next
        Debug.WriteLine("")
    End If

    'output the data
    Debug.Write("Row " + intRowNumber.ToString() + " - ")
    For i As Integer = 0 To r.Count() - 1
        Debug.Write(r.Item(i).ToString() + " - ")
    Next
    Debug.WriteLine("")
Next

The results are currently outputted to the Output window.

The second test: Using a typed class

If you know that your Excel file is rather static (never changes columns), you might want, and this is where you will get the real benefits from LINQ to Excel, to create a type class.

This class must define the columns found in the worksheet you are working with.

For example, this class could be used with the demo file:

Public Class DemoFile
 
    Private _FirstName As String
    Public Property FirstName As String
        Get
            Return _FirstName
        End Get
        Set(value As String)
            _FirstName = value
        End Set
    End Property
 
    Private _LastName As String
    Public Property LastName() As String
        Get
            Return _LastName
        End Get
        Set(value As String)
            _LastName = value
        End Set
    End Property
 
    Private _Role As String
    Public Property Role() As String
        Get
            Return _Role
        End Get
        Set(value As String)
            _Role = value
        End Set
    End Property
 
    Private _Sex As String
    Public Property Sex() As String
        Get
            Return _Sex
        End Get
        Set(value As String)
            _Sex = value
        End Set
    End Property
 
    Private _DOB As DateTime
    Public Property DOB As DateTime
        Get
            Return _DOB
        End Get
        Set(value As DateTime)
            _DOB = value
        End Set
    End Property
 
    Private _dummy As String
    Public Property Dummy As String
        Get
            Return _dummy
        End Get
        Set(value As String)
            _dummy = value
        End Set
    End Property
 
End Class

Nothing really complex in there! The order of the columns in the file and the sequence of the properties is not important. The name of the property will be used to match the columns in the Excel file. Also, you might notice a property named Dummy in my class. This column does not exist in the Excel file and this is not a problem. This class does not need to be a dump of the columns, it can contains other properties you might need. On the opposite, you can also have columns in Excel that are not in your class. These columns are simply ignored.

With this class in you project, you can now make a better use of LINQ. Just change the LINQ query to specify what type of data you are processing (by giving a type when using the Worksheet method) and all of a sudden, you will see the columns properly shown in the grid.

Dim data = From x In book.Worksheet(Of DemoFile)()
           Select x

And because LINQ can also use the definition of the type, queries like these are also possible:

Dim data = From x In book.Worksheet(Of DemoFile)()
           Where x.Sex = "F"
           Select x

Dim data = From x In book.Worksheet(Of DemoFile)()
           Where x.DOB < New DateTime(1960, 1, 1)
           Select x

This is surely giving ideas where you could use it!

What if you have more than one worksheet?

No problem. As shown in the next snippet of code, the Worksheet method takes an argument to which you can pass the name of your worksheet. By default, it is looking for Sheet1.

Dim data = From x In book.Worksheet("Cities")
           Select x

And there is more!

The library offers more than what I have exposed here. The list includes:

  • Support of range in Excel
  • Support of Mapping (for column names)
  • Support files without a header row
  • Support transformations
  • Can report worksheet names of a workbook

Limitations

I don’t know if I would really consider it as a limitation but I prefer to warn you that this LINQ provider is read-only. That means that it doesn’t provide a save method! If you need to fully manipulate Excel files, you will need to use other mechanism (like my all-time favorite Aspose.Cells).

The documentation tells that there is a GetColumnNames method available. I didn’t find it. But there is a ColumnNames collection you can use instead.

Another limitation and this one is not really a limitation of the component but about LINQ, is the handling of incorrect data. For example, if you have an invalid date in your Excel file, trying to process the row containing the invalid date (in my case calling the .ToList method) will generate the exception shown in figure 3. As a workaround to this fact, better create a class only containing strings and do your validation only once the data has been retrieved.

Figure 3: Invalid data

Conclusion

If your Excel files have a minimum formatting and are straight forward data files, I seriously encourage you to give a serious try to the free LINQ provider.


(Print this page)