I am aware that LINQ is not a new thing (I even wrote about it myself a couple of times) but I decided to create a short series of articles (probably in 3 parts) to cover the syntax of LINQ-to-Objects. One of the objectives I have is to provide the same examples for both VB and C# (where most other articles fall short!).
This first part of the series lays the foundation by explaining how to select, filter, order, join, and group data using LINQ.
Downloadable demo
This month's download is a solution containing 2 projects (1 in VB, 1 in C#). It was created using Visual Studio 2010 but you can surely reuse this code if you use Visual Studio 2008 or better.
Figure 1: The demo application in action
Test data
Before being able to test LINQ, we need some data. I thought of using WMI, folders and files, or a database, but decided these were too complex for a simple demo and would make the results hard to reproduce. So I decided to build my own custom data source, which is used for all the demos in this article.
Here is part of the method that loads the data. You will find more data if you download this month's demo code.
'VB
Private Sub LoadData()
    _customers.Add(New Customer With {
        .Name = "Dalton, Joe",
        .City = "Toronto",
        .Country = Countries.Canada,
        .Orders = New Order() {
            New Order With {.IdOrder = 1, .Quantity = 3, .IdProduct = 1, .Shipped = False, .Month = "January"},
            New Order With {.IdOrder = 2, .Quantity = 5, .IdProduct = 2, .Shipped = True, .Month = "May"}}})
    _products.Add(New Product With {.IdProduct = 1, .Description = "Product 1", .Price = 10})
    _products.Add(New Product With {.IdProduct = 2, .Description = "Product 2", .Price = 20})
End Sub

//C#
void LoadData()
{
    _customers.Add(new Customer
    {
        Name = "Dalton, Joe",
        City = "Toronto",
        Country = Countries.Canada,
        Orders = new Order[]
        {
            new Order {IdOrder = 1, Quantity = 3, IdProduct = 1, Shipped = false, Month = "January"},
            new Order {IdOrder = 2, Quantity = 5, IdProduct = 2, Shipped = true, Month = "May"}
        }
    });
    _products.Add(new Product { IdProduct = 1, Description = "Product 1", Price = 10 });
    _products.Add(new Product { IdProduct = 2, Description = "Product 2", Price = 20 });
}
Selecting
As you probably already know, LINQ query expressions start with the From clause (VB also accepts Aggregate as an opening clause). The reason is simple. In order to enable IntelliSense, the editor must first know the type of the object it is working with.
One of the simplest LINQ queries we can write is the equivalent of the famous SQL “Select * From X”. In LINQ, if we want to query our object (list of customers), we would write:
'VB
Dim result = From c In _customers
             Select c

//C#
var result = from c in _customers
             select c;
Apart from the semicolon and the Dim/var, there aren't many differences in this simple query.
My demo application displays the number of returned rows in a label and the rows themselves in a datagrid control, so that we can see the effect of the query, with these 2 lines of code:
'VB
lblResults.Text = "Rows in results = " + result.Count.ToString
grdResults.DataSource = result.ToList

//C#
lblResults.Text = "Rows in results = " + result.Count().ToString();
grdResults.DataSource = result.ToList();
If you create an application with these lines, you will see the correct number of rows displayed in the label but your grid may remain empty. This is not a LINQ problem but rather a data-binding problem, and it is related to the way the Customer class is declared. The grid builds its columns from public properties, so a class that exposes only public fields gives it nothing to display.
One workaround is to use a variation of the Select clause. With this variation, we list the members we want in the output and the query returns an anonymous type, whose members are properties the grid can bind to:
'VB
Dim result = From c In _customers
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             select new { CustomerName = c.Name, c.City, c.Country };
Now, if you try the code, it will properly show the 3 columns returned by the query. Notice that we have even changed the caption of the Name property to CustomerName. We could also have changed the order of the columns, much like we do in the Select clause of a SQL query.
The other way to fix the problem is to expose the members of your class as properties (with getters and setters) rather than as public fields. This will work if you have access to the source code but will fall short if you didn’t create the code.
So instead of writing a class like this one:
'VB
Public Class Customer
    Public Name As String
    Public City As String
    Public Country As Countries
    Public Orders() As Order
End Class

//C#
public class Customer
{
    public string Name;
    public string City;
    public Countries Country;
    public Order[] Orders;
}
You would do better to create one like this:
'VB
Public Class Customer
    Public Property Name As String
    Public Property City As String
    Public Property Country As Countries
    Public Property Orders As Order()
End Class

//C#
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public Countries Country { get; set; }
    public Order[] Orders { get; set; }
}
The difference is little in VB, just the Property keyword is added.
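To see why the property-based class binds while the field-based one does not, here is a minimal C# sketch. It assumes the grid discovers its columns through property descriptors, which is how Windows Forms data binding generally works. FieldCustomer, PropertyCustomer and BindableCount are hypothetical names used only for this illustration; they are not part of the demo solution.

```csharp
using System;
using System.ComponentModel;

// Hypothetical class exposing public fields only.
public class FieldCustomer
{
    public string Name;
    public string City;
}

// Hypothetical class exposing the same members as properties.
public class PropertyCustomer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class BindingDemo
{
    // Returns how many members a type exposes as property descriptors.
    // Public fields are not reported here; only properties are.
    public static int BindableCount(Type type)
    {
        return TypeDescriptor.GetProperties(type).Count;
    }

    public static void Main()
    {
        Console.WriteLine("Fields only: " + BindableCount(typeof(FieldCustomer)));
        Console.WriteLine("Properties:  " + BindableCount(typeof(PropertyCustomer)));
    }
}
```

The field-based class should report no bindable members while the property-based class reports two, which matches the empty grid described earlier.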
If you are doing C#, your query must always end with the select clause (or with a group clause). VB developers have more flexibility. The Select clause can be anywhere but the first line. I recommend that even VB developers put it on the last line so that they don’t have problems converting LINQ queries to/from C#.
Now that we have made sure we are able to query our object (the list of customers), let’s refine the query.
Filtering
Surely the most frequent operation you will do with a LINQ query is to filter the returned data. The Where clause is very similar to its SQL equivalent, as you can see here:
'VB
Dim result = From c In _customers
             Where c.City = "Montréal"
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             where c.City == "Montréal"
             select new { CustomerName = c.Name, c.City, c.Country };
The interesting part is that you can use .NET methods to filter even more. Consider this example that filters for customers living in Montréal or whose name contains JOE:
'VB
Dim result = From c In _customers
             Where c.City = "Montréal" OrElse c.Name.ToUpper.Contains("JOE")
             Select CustomerName = c.Name, c.City, c.Country

//C#
var result = from c in _customers
             where c.City == "Montréal" || c.Name.ToUpper().Contains("JOE")
             select new { CustomerName = c.Name, c.City, c.Country };
The previous syntax is called query syntax (the queries themselves are query expressions). When searching the web or reading books/articles, you may also find another syntax, called method syntax, which chains extension methods and passes them lambda expressions. For example, the last query can be rewritten using lambdas like this:
'VB
Dim result = _customers.
    Where(Function(c) c.City = "Montréal" OrElse c.Name.ToUpper.Contains("JOE")).
    Select(Function(c) New With {.CustomerName = c.Name, c.City, c.Country})

//C#
var result = _customers
    .Where((c) => c.City == "Montréal" || c.Name.ToUpper().Contains("JOE"))
    .Select(c => new { CustomerName = c.Name, c.City, c.Country });
There are 2 things I don’t like about the VB syntax here. First the dot (.) must be at the end of the line. Second, why do we have to add Function in the predicate?
The most used LINQ operators are available as keywords in the query syntax. Some others (as we will see in this series of articles) are only available as methods taking lambda expressions. VB developers are lucky. The VB development team put in more effort and exposed more operators as query keywords.
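As an illustration of mixing the two forms, here is a small C# sketch. C# has no query keyword for Skip or Take (VB does), so the query is wrapped and the methods are chained onto it. MixedSyntaxDemo, SecondAndThird and the city names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntaxDemo
{
    // Sorts the cities with query syntax, then pages the result with
    // Skip and Take, which C# only offers as methods.
    public static List<string> SecondAndThird(IEnumerable<string> cities)
    {
        var ordered = from city in cities
                      orderby city
                      select city;

        return ordered.Skip(1).Take(2).ToList();
    }

    public static void Main()
    {
        var cities = new[] { "Toronto", "Boston", "Quebec", "Paris" };

        foreach (var city in SecondAndThird(cities))
        {
            Console.WriteLine(city);
        }
    }
}
```

With the sample data above, the sorted list is Boston, Paris, Quebec, Toronto, so the method returns Paris and Quebec.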
Ordering
Another simple but useful clause is the one that orders the results.
'VB
Dim result = From c In _customers
             Where c.Country = Countries.Canada
             Order By c.Country, c.City, c.Name
             Select c.Country, c.City, c.Name

//C#
var result = from c in _customers
             where c.Country == Countries.Canada
             orderby c.Country, c.City, c.Name
             select new { c.Country, c.City, c.Name };
By default, and as expected, the results are sorted in ascending order. But you can also have any field sorted in descending order by adding the Descending keyword (descending in C#) after that field.
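Here is a minimal C# sketch of a mixed sort: ascending on the first field and descending on the second. The Person class, the OrderingDemo class and the sample names are hypothetical and exist only for this example. The VB equivalent of the ordering clause would be Order By p.Country, p.Name Descending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for this example only.
public class Person
{
    public string Country { get; set; }
    public string Name { get; set; }
}

public static class OrderingDemo
{
    // Sorts by Country ascending (the default), then by Name descending.
    public static List<string> SortedNames(IEnumerable<Person> people)
    {
        var query = from p in people
                    orderby p.Country, p.Name descending
                    select p.Name;

        return query.ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Country = "Canada", Name = "Adams" },
            new Person { Country = "USA", Name = "Brown" },
            new Person { Country = "Canada", Name = "Young" },
            new Person { Country = "Canada", Name = "Dalton" }
        };

        foreach (var name in SortedNames(people))
        {
            Console.WriteLine(name);
        }
    }
}
```

The descending keyword applies only to the field it follows, so Country stays in ascending order while the names inside each country are reversed.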
Joining
The next feature we will explore is joining 2 data sources together. This is useful to retrieve data from another source in order to complete a first set of data.
Depending on your data structure, there are 2 kinds of joins.
If your data structure is already hierarchical, as in my example of customers containing orders, you just need to include 2 From clauses in your LINQ expression. The second From clause won’t query a root object. Instead, it will join on a relation owned by the object. In my demo data, each customer has a collection of orders. To be able to list the orders for each customer, you can use a query like this:
'VB
Dim result = From c In _customers
             From o In c.Orders
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity

//C#
var result = from c in _customers
             from o in c.Orders
             select new { c.Country, c.City, c.Name, o.IdProduct, o.Quantity };
VB developers can replace the second From with a comma (leaving the rest of the query as is) like this:
'VB
Dim result = From c In _customers, o In c.Orders
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity
There are other scenarios where your data contains a key to another data structure. In my example, the orders have an IdProduct field which relates to the _products list. In this scenario, we need to use the Join clause like this:
'VB
Dim result = From c In _customers
             From o In c.Orders
             Join p In _products On o.IdProduct Equals p.IdProduct
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity,
                    p.Description, p.Price, Total = o.Quantity * p.Price

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products on o.IdProduct equals p.IdProduct
             select new
             {
                 c.Country, c.City, c.Name, o.IdProduct, o.Quantity,
                 p.Description, p.Price, Total = o.Quantity * p.Price
             };
If you are a C# developer, be very careful about the order of the operands in the On clause. VB is not as strict but I strongly encourage VB developers to follow the same restriction. The object you join from needs to appear first, otherwise your query won’t even compile (you will get an error saying something like “The name 'p' is not in scope on the left side of 'equals'. Consider swapping the expressions on either side of 'equals'.”).
You can also see that I have included a calculated column in my returned fields (o.Quantity * p.Price).
The previous query returns the equivalent of an “inner join” (all orders for which the product is found in the products data structure). What if you would like to return the equivalent of a “left join” (all orders whether or not the related product is found)?
Let’s see the following query:
'VB
Dim result = From c In _customers
             From o In c.Orders
             Group Join p In _products On o.IdProduct Equals p.IdProduct
                 Into ProductsByOrder = Group
             From po In ProductsByOrder.DefaultIfEmpty(
                 New Product With {
                     .IdProduct = 0,
                     .Description = "Unknown Product",
                     .Price = 0})
             Select c.Country, c.City, c.Name, o.IdProduct, o.Quantity,
                    po.Description, po.Price, Total = o.Quantity * po.Price

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products on o.IdProduct equals p.IdProduct
                 into ProductsByOrder
             from po in ProductsByOrder.DefaultIfEmpty(new Product
             {
                 IdProduct = 0,
                 Description = "Unknown Product",
                 Price = 0
             })
             select new
             {
                 c.Country, c.City, c.Name, o.IdProduct, o.Quantity,
                 po.Description, po.Price, Total = o.Quantity * po.Price
             };
The first thing to notice is that the join (Group Join in VB) is sent “into” (Into x = Group in VB) an intermediate result called ProductsByOrder. Then you can use a From clause to query this intermediate result, to which you provide default values for products that won’t be found. Finally, your Select clause can no longer use fields from the joined table (my p alias); fields have to come from the new From clause (my po alias).
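To make the difference between the two joins concrete, here is a small self-contained C# sketch. It uses simplified, hypothetical Order and Product classes (not the ones from the demo solution) and one order that points to a product that does not exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical types for this example only.
public class Order
{
    public int IdOrder { get; set; }
    public int IdProduct { get; set; }
}

public class Product
{
    public int IdProduct { get; set; }
    public string Description { get; set; }
}

public static class JoinDemo
{
    // Inner join: orders without a matching product are dropped.
    public static List<string> InnerJoin(IEnumerable<Order> orders, IEnumerable<Product> products)
    {
        var query = from o in orders
                    join p in products on o.IdProduct equals p.IdProduct
                    select o.IdOrder + ": " + p.Description;
        return query.ToList();
    }

    // Left join: orders without a match get the placeholder product.
    public static List<string> LeftJoin(IEnumerable<Order> orders, IEnumerable<Product> products)
    {
        var placeholder = new Product { IdProduct = 0, Description = "Unknown Product" };

        var query = from o in orders
                    join p in products on o.IdProduct equals p.IdProduct into matches
                    from m in matches.DefaultIfEmpty(placeholder)
                    select o.IdOrder + ": " + m.Description;
        return query.ToList();
    }

    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { IdOrder = 1, IdProduct = 1 },
            new Order { IdOrder = 2, IdProduct = 99 } // no such product
        };
        var products = new List<Product>
        {
            new Product { IdProduct = 1, Description = "Product 1" }
        };

        Console.WriteLine("Inner join rows: " + InnerJoin(orders, products).Count);
        Console.WriteLine("Left join rows:  " + LeftJoin(orders, products).Count);
    }
}
```

With this sample data, the inner join keeps only order 1, while the left join keeps both orders and labels the second one "Unknown Product".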
Grouping
Grouping is another very useful feature of LINQ. Grouping produces one item per distinct value of the grouping expression. In C#, each group exposes that value through a new property: Key. You can then iterate through each group to get its details.
Check this expression which groups the customers by country. It returns the name of the country, the number of customers in the group and the full details:
'VB
Dim result = From c In _customers
             Group c By c.Country Into GroupedCustomers = Group
             Select Country, Count = GroupedCustomers.Count, GroupedCustomers

//C#
var result = from c in _customers
             group c by c.Country into GroupedCustomers
             select new
             {
                 Country = GroupedCustomers.Key,
                 Count = GroupedCustomers.Count(),
                 GroupedCustomers
             };
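The VB query syntax does not expose a Key property. Instead, the grouping value is available under the name of the grouped field (Country in this query).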
Because the grid is not really good at representing the hierarchy returned by the group syntax, I have written 2 loops to output the contents of the groups to the Output window of Visual Studio. Here are the loops:
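'VB
For Each group In result
    Console.WriteLine(group.Country.ToString() + " - " + group.Count.ToString())
    For Each detail In group.GroupedCustomers
        Console.WriteLine(" " + detail.City + " - " + detail.Name)
    Next
Next

//C#
// Each item is the anonymous type built by the select clause of the query,
// so the members are Country, Count and GroupedCustomers (not Key).
foreach (var countryGroup in result)
{
    Console.WriteLine(countryGroup.Country.ToString() + " - " + countryGroup.Count.ToString());
    foreach (var detail in countryGroup.GroupedCustomers)
    {
        Console.WriteLine(" " + detail.City + " - " + detail.Name);
    }
}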
Distinct
Before finishing this first part of the series, I want to discuss a weird one!
Let’s say you want to get the list of distinct countries your customers are from. By quickly looking at the help, you would find the Distinct operator, which you can use like this in C#:
var result = (from c in _customers select new { c.Country } ).Distinct();
If you are doing VB, you might be tempted to write this query instead:
Dim result = (From c In _customers Select New With {c.Country} ).Distinct()
But it is very possible that it won’t return the list that you expect. Instead, you need to write this query:
Dim result = (From c In _customers Select New With {Key c.Country} ).Distinct()
What is the difference, you may ask? Look at the Select clause: a new keyword (not required in C#) has been added: Key.
In VB.NET, two instances of an anonymous type are compared using only the properties declared as key properties. Without the Key keyword, no property takes part in the comparison, so an instance is equal only to itself and Distinct removes nothing. For more details, check http://msdn.microsoft.com/en-us/library/bb384767.aspx.
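For comparison, here is a short C# sketch showing why the C# version works without any extra keyword: C# anonymous types compare all their properties by value. DistinctDemo and its two methods are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;

public static class DistinctDemo
{
    // Two C# anonymous instances with the same property values are equal.
    public static bool SameValuesAreEqual()
    {
        var first = new { Country = "Canada" };
        var second = new { Country = "Canada" };
        return first.Equals(second);
    }

    // Distinct relies on that equality to remove duplicates.
    public static int CountDistinct(string[] countries)
    {
        return (from c in countries
                select new { Country = c }).Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(SameValuesAreEqual());
        Console.WriteLine(CountDistinct(new[] { "Canada", "USA", "Canada" }));
    }
}
```

In VB, the same behaviour is obtained only for the properties marked with Key, which is why the keyword is needed in the VB query.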
Conclusion
This article covered the fundamentals of LINQ to Objects with equivalent examples in both VB and C#.
So far, there are a few small differences between the two languages, but in the end we can achieve exactly the same thing in both.
Follow this blog to see at least 2 more articles on LINQ to objects.