LINQ-to-Objects Part 1
Published date: Tuesday, July 31, 2012
On: Moer and Éric Moreau's web site

I am aware that LINQ is not a new thing (I even wrote about it myself a couple of times) but I decided to create a short series of articles (probably in 3 parts) to cover the syntax of LINQ-to-Objects. One of the objectives I have is to provide the same examples for both VB and C# (where most other articles fall short!).

This first part of the series lays the foundation by explaining how to select, filter, order, group, and join data using LINQ.

Downloadable demo

This month’s download is a solution containing 2 projects (1 in VB, 1 in C#). It was created using Visual Studio 2010, but you can surely reuse this code if you use Visual Studio 2008 or later.

Figure 1: The demo application in action

Test data

Before being able to test LINQ, we need some data. I thought of using WMI, folders and files, a database … but decided that these were too complex for a demo and would make the results hard to reproduce, so I decided to build my own custom source of data to be used for all the demos of this article.

Here is part of the method that loads the data. You will find more data if you download this month’s demo code.

'VB
Private Sub LoadData()
    _customers.Add(New Customer With {
            .Name = "Dalton, Joe",
            .City = "Toronto",
            .Country = Countries.Canada,
            .Orders = New Order() {
                New Order With {.IdOrder = 1, .Quantity = 3, .IdProduct = 1, .Shipped = False, .Month = "January"},
                New Order With {.IdOrder = 2, .Quantity = 5, .IdProduct = 2, .Shipped = True, .Month = "May"}}})

    _products.Add(New Product With {.IdProduct = 1, .Description = "Product 1", .Price = 10})
    _products.Add(New Product With {.IdProduct = 2, .Description = "Product 2", .Price = 20})
End Sub

//C#
void LoadData()
{
    _customers.Add(new Customer {
            Name = "Dalton, Joe", 
            City = "Toronto", 
            Country = Countries.Canada, 
            Orders = new Order[] {
                new Order {IdOrder = 1, Quantity = 3, IdProduct = 1, Shipped = false, Month = "January"},
                new Order {IdOrder = 2, Quantity = 5, IdProduct = 2, Shipped = true, Month = "May"}}});

    _products.Add(new Product { IdProduct = 1, Description = "Product 1", Price = 10 });
    _products.Add(new Product { IdProduct = 2, Description = "Product 2", Price = 20 });
}

Selecting

As you probably already know, LINQ query expressions start with the From clause. The reason is simple: to enable IntelliSense, the compiler must first know the object it is working with.

One of the simplest LINQ queries we can write is the equivalent of the famous SQL “Select * From X”. In LINQ, if we want to query our object (list of customers), we would write:

'VB
Dim result = From c In _customers
             Select c

//C#
var result = from c in _customers
             select c;

Apart from the semi-colon and the Dim/var, there aren’t many differences in this simple query.

My demo application displays the number of returned rows in a label and the rows themselves in a datagrid control, so that we can see the effect of the query, with these 2 lines of code:

'VB
lblResults.Text = "Rows in results = " + result.Count.ToString
grdResults.DataSource = result.ToList

//C#
lblResults.Text = "Rows in results = " + result.Count().ToString();
grdResults.DataSource = result.ToList();

If you create an application with these lines, you will see the correct number of rows displayed in the label, but your grid may remain empty. This is not a LINQ problem but rather a data-binding problem, related to the way the Customer class is declared. The grid builds its columns from public properties, so it cannot display objects that expose only public fields (which is what this query returns when Customer is declared with fields).

A universal workaround to that problem is to use a variation of the Select clause. Using this variation, we project the fields we want into a new anonymous type, whose members are exposed as properties:

'VB
Dim result = From c In _customers
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             select new { CustomerName = c.Name, c.City, c.Country };

Now, if you try the code, it will properly show the 3 columns returned by the query. Notice that we have even changed the caption of the Name property to CustomerName. We could also have changed the order of the columns, much like we do in the Select clause of a SQL query.

The other way to fix the problem is to expose the members of your class as properties (with getters and setters) instead of public fields. This will work if you have access to the source code but will fall short if you didn’t create the code.

So instead of writing a class like this one:

'VB
Public Class Customer
    Public Name As String
    Public City As String
    Public Country As Countries
    Public Orders() As Order
End Class

//C#
public class Customer
{
    public string Name; 
    public string City; 
    public Countries Country; 
    public Order[] Orders; 
}

You are better off creating one like this:

'VB
Public Class Customer
    Public Property Name As String
    Public Property City As String
    Public Property Country As Countries
    Public Property Orders As Order()
End Class

//C#
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public Countries Country { get; set; }
    public Order[] Orders { get; set; }
}

The difference is small in VB: only the Property keyword is added.
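One way to see why the field-based class gives the grid nothing to show: data-bound controls typically discover their columns through property descriptors, and public fields do not produce any. The sketch below is only an illustration; CustomerWithFields, CustomerWithProperties and BindingCheck are hypothetical names that are not part of the demo project.

```csharp
//C#
using System;
using System.ComponentModel;

// Hypothetical stand-ins for the two Customer declarations shown above.
public class CustomerWithFields
{
    public string Name;
    public string City;
}

public class CustomerWithProperties
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class BindingCheck
{
    // Counts the property descriptors a data-bound control could turn into columns.
    public static int BindableCount(Type type)
    {
        return TypeDescriptor.GetProperties(type).Count;
    }

    public static void Main()
    {
        Console.WriteLine("Fields only: " + BindableCount(typeof(CustomerWithFields)));
        Console.WriteLine("Properties:  " + BindableCount(typeof(CustomerWithProperties)));
    }
}
```

The field-based class reports no bindable members, while the property-based class reports one per property.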

If you are doing C#, a query expression must end with the select clause (or with a group clause). VB developers have more flexibility: the Select clause can be anywhere but the first line. I recommend that even VB developers put it on the last line so that they don’t have problems converting LINQ queries to/from C#.

Now that we made sure we were able to query our object (the list of customers), let’s refine the query.

Filtering

Surely the most frequent operation you will do with a LINQ query is to filter the returned data. The Where clause is very similar to its SQL equivalent, as you can see here:

'VB
Dim result = From c In _customers
             Where c.City = "Montréal" 
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             where c.City == "Montréal" 
             select new { CustomerName = c.Name, c.City, c.Country };

The interesting part is that you can use .NET methods to filter even more. Consider this example that filters for customers living in Montréal or having a name containing JOE:

'VB
Dim result = From c In _customers
             Where c.City = "Montréal" OrElse c.Name.ToUpper.Contains("JOE")
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             where c.City == "Montréal" || c.Name.ToUpper().Contains("JOE")
             select new { CustomerName = c.Name, c.City, c.Country };

The previous syntax is called query syntax (or query expressions). When searching the web or reading books/articles, you may also find another syntax, often called method syntax, that chains method calls taking lambda expressions. For example, the last query can be rewritten using lambdas like this:

'VB
Dim result = _customers.
        Where(Function(c) c.City = "Montréal" OrElse c.Name.ToUpper.Contains("JOE")).
        Select(Function(c) New With {.CustomerName = c.Name, c.City, c.Country})

//C#
var result = _customers
             .Where((c) => c.City == "Montréal" || c.Name.ToUpper().Contains("JOE"))
             .Select(c => new { CustomerName = c.Name, c.City, c.Country });

There are 2 things I don’t like about the VB syntax here. First, the dot (.) must be at the end of the line. Second, why do we have to add Function in the predicate?

The most used LINQ operators are available as keywords of the query syntax. Some others (as we will see in this series of articles) are only available as methods called with lambda expressions. VB developers are lucky: the VB development team put in more effort and exposed more operators as query keywords.
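As an illustration of that gap, Skip and Take have no query keyword in C# (VB offers Skip and Take clauses), so a C# developer wraps the query in parentheses and chains the method calls. This is a minimal sketch; PagingDemo and its list of names are made up for the example and are not part of the demo project.

```csharp
//C#
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Made-up names for this sketch; not the article's demo data.
    static readonly string[] Names =
    {
        "Dalton, Joe", "Dalton, Jack", "Dalton, William", "Dalton, Averell"
    };

    // Query syntax for the sort, method syntax for the operators without a keyword.
    public static List<string> SecondPage(int pageSize)
    {
        return (from n in Names
                orderby n
                select n)
               .Skip(pageSize)
               .Take(pageSize)
               .ToList();
    }

    public static void Main()
    {
        foreach (var name in SecondPage(2))
            Console.WriteLine(name);
    }
}
```

Both syntaxes compile to the same method calls, so mixing them in one statement is mostly a matter of readability.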

Ordering

Another simple but useful clause of the query is to order the results.

'VB
Dim result = From c In _customers
             Where c.Country = Countries.Canada
             Order By c.Country, c.City, c.Name
             Select c.Country, c.City, c.Name

//C#
var result = from c in _customers
             where c.Country == Countries.Canada
             orderby c.Country, c.City, c.Name
             select new { c.Country, c.City, c.Name };

By default, and as expected, the results are sorted ascending. You can also sort any field in descending order by adding the Descending (VB) or descending (C#) keyword after it.
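Here is a small sketch of a mixed sort: City descending, then Name ascending. The VB equivalent of the clause would be Order By c.City Descending, c.Name. The Customer class is trimmed down and the sample rows are made up for the example.

```csharp
//C#
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class OrderingDemo
{
    // Made-up sample rows, not the article's full demo data.
    public static List<Customer> Sample()
    {
        return new List<Customer>
        {
            new Customer { Name = "Dalton, Joe", City = "Toronto" },
            new Customer { Name = "Luke, Lucky", City = "Montréal" },
            new Customer { Name = "Dalton, Jack", City = "Toronto" }
        };
    }

    // City is sorted descending; Name keeps the default ascending order.
    public static List<string> CityDescendingThenName(IEnumerable<Customer> customers)
    {
        var result = from c in customers
                     orderby c.City descending, c.Name
                     select c.City + " / " + c.Name;
        return result.ToList();
    }

    public static void Main()
    {
        foreach (var line in CityDescendingThenName(Sample()))
            Console.WriteLine(line);
    }
}
```

The keyword applies only to the field it follows, so each field in the list can have its own direction.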

Joining

The next functionality we will explore is to join 2 data sources together. This is useful to retrieve data from another source in order to complete a first set of data.

Depending on your data structure, there are 2 kinds of data joining.

If your data structure is already hierarchical, as in my example of customers containing orders, you just need to include 2 From clauses in your LINQ expression. The second From clause won’t query a root object. Instead, it will join on a relation owned by the object. In my demo data, each customer has a collection of orders. To be able to list the orders for each customer, you can use a query like this:

'VB
Dim result = From c In _customers
             From o In c.Orders
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity

//C#
var result = from c in _customers
             from o in c.Orders
             select new { c.Country, c.City, c.Name, o.IdProduct, o.Quantity };

VB developers can replace the second From with a comma (leaving the rest of the query as is) like this:

'VB
Dim result = From c In _customers, o In c.Orders
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity

There are other scenarios where your data contains a key to another data structure. In my example, the orders have an IdProduct field which relates to the _products list. In this scenario, we need to use the Join clause like this:

'VB
Dim result = From c In _customers
             From o In c.Orders
             Join p In _products On o.IdProduct Equals p.IdProduct
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity, 
                    p.Description, p.Price, Total = o.Quantity * p.Price

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products on o.IdProduct equals p.IdProduct
             select new { c.Country, c.City, c.Name, o.IdProduct, o.Quantity, 
                          p.Description, p.Price, Total = o.Quantity * p.Price };

If you are a C# developer, be very careful with the order of the operands in the on clause. VB is not as strict, but I strongly encourage VB developers to follow the same restriction. The object you join from needs to appear first (on the left side of equals); otherwise your query won’t even compile (you will get an error saying something like “The name 'p' is not in scope on the left side of 'equals'. Consider swapping the expressions on either side of 'equals'.”).

You can also see that I have included a calculated column in my returned fields (o.Quantity * p.Price).

The previous query returns the equivalent of an “inner join” (all orders for which the product is found in the products data structure). What if you would like to return the equivalent of a “left join” (all orders whether or not the related product is found)?

Let’s see the following query:

'VB
Dim result = From c In _customers
             From o In c.Orders
             Group Join p In _products 
                On o.IdProduct Equals p.IdProduct 
                Into ProductsByOrder = Group
             From po In ProductsByOrder.DefaultIfEmpty(
                New Product With {
                      .IdProduct = 0,
                      .Description = "Unknown Product",
                      .Price = 0
                     })
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity,
                    po.Description, po.Price, Total = o.Quantity * po.Price

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products on o.IdProduct equals p.IdProduct into ProductsByOrder
             from po in ProductsByOrder.DefaultIfEmpty(new Product
                                          {
                                              IdProduct = 0, 
                                              Description = "Unknown Product", 
                                              Price = 0
                                          })
             select new { c.Country, c.City, c.Name, o.IdProduct, o.Quantity, 
                          po.Description, po.Price, Total = o.Quantity * po.Price };

The first thing to notice is that the join (Group Join in VB) is sent “into” (Into x = Group in VB) an intermediate result called ProductsByOrder. Then you can use a From clause to query this intermediate result, to which you provide default values for products that won’t be found. Finally, your Select clause can no longer use fields from the joined table (my p alias); fields have to come from the new From clause (my po alias).
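A variation on this pattern is to call DefaultIfEmpty() without an argument. Unmatched orders then get a null product, and the Select clause has to test for it. The sketch below assumes trimmed-down Order and Product classes with a decimal Price; OrderLine and LeftJoinDemo are hypothetical names added for the example.

```csharp
//C#
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int IdOrder { get; set; }
    public int IdProduct { get; set; }
    public int Quantity { get; set; }
}

public class Product
{
    public int IdProduct { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; } // decimal is an assumption of this sketch
}

// Hypothetical result type so the rows can be returned from a method.
public class OrderLine
{
    public int IdOrder { get; set; }
    public string Description { get; set; }
    public decimal Total { get; set; }
}

public static class LeftJoinDemo
{
    public static List<OrderLine> Build()
    {
        var orders = new List<Order>
        {
            new Order { IdOrder = 1, IdProduct = 1, Quantity = 3 },
            new Order { IdOrder = 2, IdProduct = 99, Quantity = 5 } // 99 matches no product
        };
        var products = new List<Product>
        {
            new Product { IdProduct = 1, Description = "Product 1", Price = 10 }
        };

        var result = from o in orders
                     join p in products on o.IdProduct equals p.IdProduct into productsByOrder
                     from po in productsByOrder.DefaultIfEmpty() // null when nothing matched
                     select new OrderLine
                     {
                         IdOrder = o.IdOrder,
                         Description = po == null ? "Unknown Product" : po.Description,
                         Total = po == null ? 0m : o.Quantity * po.Price
                     };
        return result.ToList();
    }

    public static void Main()
    {
        foreach (var line in Build())
            Console.WriteLine(line.IdOrder + ": " + line.Description + " = " + line.Total);
    }
}
```

Passing a default Product, as in the query above, keeps the Select clause simpler; the null test makes the missing match explicit instead.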

Grouping

Grouping is another very useful feature of LINQ. After you have grouped some data, you get a new value: Key. This value is a property of each group that holds the value the group was built on (the country in the next example). You can then iterate through each group to get its details.

Check this expression which groups the customers by country. It returns the name of the country, the number of customers in the group and the full details:

'VB
Dim result = From c In _customers
             Group c By c.Country Into GroupedCustomers = Group
             Select Country, Count = GroupedCustomers.Count, GroupedCustomers

//C#
var result = from c in _customers
             group c by c.Country into GroupedCustomers
             select new { Country = GroupedCustomers.Key, 
                          Count = GroupedCustomers.Count(), 
                          GroupedCustomers };

The Key property is not exposed by the VB query syntax. The grouping value is available under its own name instead (Country in this query).

Because the grid is not really good at representing the hierarchy returned by the group syntax, I have written 2 loops to output the contents of the groups to the Output window of Visual Studio. Here are the loops:

'VB
For Each group In result
    Console.WriteLine(group.Country.ToString() + " - " + group.Count.ToString())
    For Each detail In group.GroupedCustomers
        Console.WriteLine("   " + detail.City + " - " + detail.Name)
    Next
Next

//C#
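foreach (var grp in result)
{
    // Each element is the anonymous type built by the select clause above
    Console.WriteLine(grp.Country.ToString() + " - " + grp.Count.ToString());
    foreach (var detail in grp.GroupedCustomers)
    {
        Console.WriteLine("   " + detail.City + " - " + detail.Name);
    }
}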
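When a C# query ends with the group clause itself instead of a projection, each element of the result is an IGrouping that exposes Key and can be enumerated directly. Here is a minimal sketch of that form; the trimmed-down Customer class, the USA member of Countries and the sample rows are assumptions made for the example.

```csharp
//C#
using System;
using System.Collections.Generic;
using System.Linq;

public enum Countries { Canada, USA } // USA is assumed for this sketch

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public Countries Country { get; set; }
}

public static class GroupingDemo
{
    // Made-up sample rows, not the article's full demo data.
    public static List<Customer> Sample()
    {
        return new List<Customer>
        {
            new Customer { Name = "Dalton, Joe", City = "Toronto", Country = Countries.Canada },
            new Customer { Name = "Luke, Lucky", City = "Daisy Town", Country = Countries.USA },
            new Customer { Name = "Dalton, Jack", City = "Montréal", Country = Countries.Canada }
        };
    }

    // Ending the query with the group clause returns the groups themselves.
    public static IEnumerable<IGrouping<Countries, Customer>> ByCountry(IEnumerable<Customer> customers)
    {
        return from c in customers
               group c by c.Country;
    }

    public static void Main()
    {
        foreach (var country in ByCountry(Sample()))
        {
            Console.WriteLine(country.Key + " - " + country.Count());
            foreach (var detail in country)
            {
                Console.WriteLine("   " + detail.City + " - " + detail.Name);
            }
        }
    }
}
```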

Distinct

Before finishing this first part of the series, I wanted to discuss a weird one!

Let’s say you want to get the list of distinct countries where your customers are from. By quickly looking at the help, you would find the Distinct operator, which you can use like this in C#:

var result = (from c in _customers
              select new { c.Country }
             ).Distinct();

If you are doing VB, you might be tempted to write this query instead:

Dim result = (From c In _customers 
              Select New With {c.Country}
             ).Distinct()

But it is very possible that it won’t return the list that you expect. Instead, you need to write this query:

Dim result = (From c In _customers
              Select New With {Key c.Country}
             ).Distinct()

What is the difference, you will ask? Look at the Select clause: a new keyword (not required in C#) has been added: Key.

In VB.NET, only the properties of an anonymous type that are declared as key properties take part in equality comparisons. Without any Key property, two instances are equal only if they are the very same instance, so Distinct finds no duplicates and returns every row. C# anonymous types always compare the values of all their properties, which is why the keyword is not needed there. For more details, check http://msdn.microsoft.com/en-us/library/bb384767.aspx.
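The C# side of this behaviour can be checked with a small sketch. It also shows a second option in C#, which is to project the bare value so that no anonymous type is involved at all. The trimmed-down Customer class, the USA member of Countries and the sample rows are assumptions made for the example.

```csharp
//C#
using System;
using System.Collections.Generic;
using System.Linq;

public enum Countries { Canada, USA } // USA is assumed for this sketch

public class Customer
{
    public string Name { get; set; }
    public Countries Country { get; set; }
}

public static class DistinctDemo
{
    // Made-up sample rows: two customers in Canada, one in the USA.
    public static List<Customer> Sample()
    {
        return new List<Customer>
        {
            new Customer { Name = "Dalton, Joe", Country = Countries.Canada },
            new Customer { Name = "Luke, Lucky", Country = Countries.USA },
            new Customer { Name = "Dalton, Jack", Country = Countries.Canada }
        };
    }

    // C# anonymous types compare by value, so the duplicate Canada row collapses.
    public static int DistinctAnonymousCount(IEnumerable<Customer> customers)
    {
        return (from c in customers
                select new { c.Country }
               ).Distinct().Count();
    }

    // Projecting the bare value avoids the anonymous type altogether.
    public static List<Countries> DistinctCountries(IEnumerable<Customer> customers)
    {
        return (from c in customers
                select c.Country
               ).Distinct().ToList();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctAnonymousCount(Sample()));
        foreach (var country in DistinctCountries(Sample()))
            Console.WriteLine(country);
    }
}
```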

Conclusion

This article covered the fundamentals of LINQ to objects with equivalent examples in both VB and C#.

So far, there are some little nuances between both languages but in the end we can achieve exactly the same thing in both languages.

Follow this blog to see at least 2 more articles on LINQ to objects.

