
LINQ-to-Objects Part 2
Published date: Sunday, August 26, 2012
On: Moer and Éric Moreau's web site

Last month, I wrote the first article of a series that will probably have three parts.

This month, I will continue showing the syntax of LINQ-to-Objects, which is often the same as any other LINQ-to-whatever. I received good comments about showing the same query side by side in both VB and C#, so I will continue to do so.

This month's article continues the exploration with other categories of operators: Set, Aggregate, Generation, and Quantifiers.

Downloadable demo

This month's download is a solution containing 2 projects (1 in VB, 1 in C#). It was created using Visual Studio 2010, but you can surely reuse this code if you use Visual Studio 2008 or later.

For your convenience, the solution also contains all the code from last month's article in case you want to retry or review some topics of the previous demo.

Figure 1: The demo app

Test Data

We will mostly continue the demo using the same test data. Sometimes, I will have to provide new sets of test data (like a new list of Product named _products2):

'VB
_products2.Add(New Product With {.IdProduct = 1, .Description = "Product 1", .Price = 10})
_products2.Add(New Product With {.IdProduct = 3, .Description = "Product 3", .Price = 30})
_products2.Add(New Product With {.IdProduct = 5, .Description = "Series 2 - Product 5", .Price = 230})
_products2.Add(New Product With {.IdProduct = 7, .Description = "Series 2 - Product 7", .Price = 240})
_products2.Add(New Product With {.IdProduct = 9, .Description = "Series 2 - Product 9", .Price = 250})
_products2.Add(New Product With {.IdProduct = 11, .Description = "Series 2 - Product 11", .Price = 260})

//C#
_products2.Add(new Product { IdProduct = 1, Description = "Product 1", Price = 10 });
_products2.Add(new Product { IdProduct = 3, Description = "Product 3", Price = 30 });
_products2.Add(new Product { IdProduct = 5, Description = "Series 2 - Product 5", Price = 230 });
_products2.Add(new Product { IdProduct = 7, Description = "Series 2 - Product 7", Price = 240 });
_products2.Add(new Product { IdProduct = 9, Description = "Series 2 - Product 9", Price = 250 });
_products2.Add(new Product { IdProduct = 11, Description = "Series 2 - Product 11", Price = 260 });

Set Operators

If you remember your math classes, you surely remember sets and the terminology that goes with them. There are 3 operations you were able to do with them, and the same operations apply to LINQ sets. These operations are Union, Intersect, and Except.

These operators are only required when you are dealing with 2 or more sets to find all elements (Union), elements belonging to both sets (Intersect), and elements of the first set that do not belong to the second set (Except).

The Union operator merges 2 sets together, keeping the elements in their original order (first set, then second set) and returning each distinct element only once.

'VB
Dim result = _products.Union(_products2)

//C#
var result = _products.Union(_products2);
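
To see how Union treats ordering and duplicates, here is a minimal sketch on plain integers rather than on the demo's Product list. The UnionSketch class and its Merge helper are names I made up for this illustration; they are not part of the downloadable solution.

```csharp
using System;
using System.Linq;

static class UnionSketch
{
    // Hypothetical helper, not part of the demo solution.
    public static int[] Merge(int[] first, int[] second)
    {
        // Union yields the elements of first, then those of second,
        // skipping any value that was already returned.
        return first.Union(second).ToArray();
    }

    static void Main()
    {
        int[] merged = Merge(new[] { 1, 3, 5, 3 }, new[] { 5, 7, 1 });
        Console.WriteLine(string.Join(", ", merged)); // 1, 3, 5, 7
    }
}
```

Integers are value types, so they compare by value without any extra work.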

If you ever happen to have two Product objects with the same ID, Description and Price (the same product in appearance), one in each set, the product would be shown twice in the results. The reason is simple: Product is a class, so by default two separate instances are never considered equal and they normally have different hash codes, and both the hash code and the Equals method are used for the comparison. The same problem will happen with the other Set operators. To fix this problem, you need to override both the Equals and the GetHashCode methods of the Product class like this:

'VB
Public Overrides Function Equals(obj As Object) As Boolean
    If obj Is Nothing Then Return False
    If Not (TypeOf obj Is Product) Then Return False

    Dim p As Product = DirectCast(obj, Product)
    Return (p.IdProduct = IdProduct AndAlso 
            p.Description = Description AndAlso 
            p.Price = Price)
End Function

Public Overrides Function GetHashCode() As Integer
    Return String.Format("{0}|{1}|{2}", IdProduct, Description, Price).GetHashCode
End Function

//C#
public override bool Equals(object obj)
{
    if (!(obj is Product))
        return false;
    else
    {
        Product p = (Product)obj;
        return (p.IdProduct == IdProduct && p.Description == Description && p.Price == Price);
    }
}

public override int GetHashCode()
{
    return String.Format("{0}|{1}|{2}", IdProduct, Description, Price).GetHashCode();
}
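
If you cannot (or prefer not to) modify the Product class, the Set operators also have an overload that accepts an IEqualityComparer(Of T). What follows is only a sketch under assumptions: the Product class here is a simplified stand-in for the one in the demo, and ProductComparer and ComparerSketch are hypothetical names of my own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the demo's Product class (Equals is NOT overridden).
class Product
{
    public int IdProduct { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

// Hypothetical comparer: products are equal when all 3 properties match.
class ProductComparer : IEqualityComparer<Product>
{
    public bool Equals(Product x, Product y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.IdProduct == y.IdProduct
            && x.Description == y.Description
            && x.Price == y.Price;
    }

    public int GetHashCode(Product p)
    {
        if (p == null) return 0;
        unchecked // let the arithmetic wrap around instead of overflowing
        {
            int hash = p.IdProduct;
            hash = hash * 31 + (p.Description == null ? 0 : p.Description.GetHashCode());
            hash = hash * 31 + p.Price.GetHashCode();
            return hash;
        }
    }
}

static class ComparerSketch
{
    static Product MakeProduct1()
    {
        return new Product { IdProduct = 1, Description = "Product 1", Price = 10 };
    }

    // Union with the default comparison: the 2 look-alike instances are both kept.
    public static int CountDefault()
    {
        var first = new List<Product> { MakeProduct1() };
        var second = new List<Product> { MakeProduct1() };
        return first.Union(second).Count();
    }

    // Union with the custom comparer: the look-alike is recognized as a duplicate.
    public static int CountWithComparer()
    {
        var first = new List<Product> { MakeProduct1() };
        var second = new List<Product> { MakeProduct1() };
        return first.Union(second, new ProductComparer()).Count();
    }

    static void Main()
    {
        Console.WriteLine(CountDefault());      // 2
        Console.WriteLine(CountWithComparer()); // 1
    }
}
```

The comparer keeps the equality rule outside of the class, which is handy when different queries need different definitions of "the same product".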

The Intersect operator returns the elements that appear in both sets. Remember that overriding Equals and GetHashCode is required if you are working with classes (reference types). It is not required if you have value types (like an array of integers).

'VB
Dim result = _products.Intersect(_products2)

//C#
var result = _products.Intersect(_products2);

The Except operator will return all elements from the first set not appearing in the second set.

'VB
Dim result = _products.Except(_products2)

//C#
var result = _products.Except(_products2);

If you need a list of all the elements that are not in the intersection (the elements belonging to only one of the two sets), you can union 2 Excepts! Here is the code:

'VB
Dim result = (_products.Except(_products2)).
    Union(_products2.Except(_products))

//C#
var result = (_products.Except(_products2))
                .Union(_products2.Except(_products));
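
The same combination is easier to verify on small integer arrays. This is a minimal sketch; SymmetricDifferenceSketch and OnlyInOne are hypothetical names chosen for the illustration.

```csharp
using System;
using System.Linq;

static class SymmetricDifferenceSketch
{
    // Returns the values found in exactly one of the 2 arrays.
    public static int[] OnlyInOne(int[] first, int[] second)
    {
        return first.Except(second)
                    .Union(second.Except(first))
                    .ToArray();
    }

    static void Main()
    {
        int[] result = OnlyInOne(new[] { 1, 2, 3, 4 }, new[] { 3, 4, 5, 6 });
        Console.WriteLine(string.Join(", ", result)); // 1, 2, 5, 6
    }
}
```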

Aggregate Operators

Aggregate operators are useful to make calculations on sets. For example, we have a set of products. What if we need to find the average price? We don’t need to loop through it and sum each price and divide by the number of products at the end. We have an operator that does exactly that.

So what are the available operators? This code snippet uses them all in their simplest form by storing the result of each one into a list of string:

'VB
Dim agg = New List(Of String)

agg.Add("Count = " + _products.Count().ToString())
agg.Add("LongCount = " + _products.LongCount().ToString)
agg.Add("Sum Price = " + _products.Sum(Function(p) p.Price).ToString)
agg.Add("Average Price = " + _products.Average(Function(p) p.Price).ToString)
agg.Add("Min Price = " + _products.Min(Function(p) p.Price).ToString)
agg.Add("Max Price = " + _products.Max(Function(p) p.Price).ToString)

//C#
var agg = new List<string>
    {
        "Count = " + _products.Count(),
        "LongCount = " + _products.LongCount(),
        "Sum Price = " + _products.Sum(p => p.Price),
        "Average Price = " + _products.Average(p => p.Price),
        "Min Price = " + _products.Min(p => p.Price),
        "Max Price = " + _products.Max(p => p.Price)
    };
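
One thing to keep in mind with these operators is what happens when the set is empty. On a list of decimal values, Count and Sum return 0, but Average (like Min and Max) throws an InvalidOperationException. The sketch below shows one possible way around it with DefaultIfEmpty; the class and method names are assumptions of mine, not code from the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EmptyAggregateSketch
{
    // Hypothetical helper: an average that falls back to 0 for an empty sequence.
    public static decimal SafeAverage(IEnumerable<decimal> prices)
    {
        // DefaultIfEmpty supplies a single 0 when there is nothing to average.
        return prices.DefaultIfEmpty(0m).Average();
    }

    // Returns true when Average throws on an empty list of decimals.
    public static bool AverageThrowsOnEmpty()
    {
        try
        {
            new List<decimal>().Average();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    static void Main()
    {
        var noPrices = new List<decimal>();
        Console.WriteLine(noPrices.Count());        // 0
        Console.WriteLine(noPrices.Sum());          // 0
        Console.WriteLine(AverageThrowsOnEmpty());  // True
        Console.WriteLine(SafeAverage(noPrices));   // 0
    }
}
```

Whether 0 is a sensible fallback depends on your data; a nullable result may be a better fit in some reports.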

But normally, you will want to use these operators a bit differently, in the middle of more complex queries. Consider the following code snippet.

Here, we are reusing many operators from the previous article to join each customer to his orders and also to the products to get the total of each item. The result of the join is stored into result.

In a second operation, we start from the customers again but this time we are joining the previous result and use the Count and Sum operators to report the number of orders a client made and the total of all his orders.

'VB
Dim result = From c In _customers
     From o In c.Orders
     Join p In _products
            On o.IdProduct Equals p.IdProduct
     Select c.Name, o.IdOrder, TotalItem = o.Quantity * p.Price

Dim result2 = From c In _customers
              Group Join o In result
                  On c.Name Equals o.Name
              Into CustomersOrders = Group
              Select c.Name,
                  NbOrders = CustomersOrders.Count(),
                  TotalOrders = CustomersOrders.Sum(Function(o) o.TotalItem)

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products
                    on o.IdProduct equals p.IdProduct
             select new { c.Name, o.IdOrder, TotalItem = o.Quantity * p.Price };

var result2 = from c in _customers
              join o in result
                  on c.Name equals o.Name
              into CustomersOrders
              select new { c.Name, 
                  NbOrders = CustomersOrders.Count(),
                  TotalOrders = CustomersOrders.Sum(o => o.TotalItem) };

Of course, it could all be mixed into a single query, but it would be much less readable.

There is another operator that fits into this section. It is named Aggregate. This operator applies the function you pass to it to each element in turn, carrying an accumulated value from one call to the next, so you decide what the calculation is. For example, we can use it to concatenate strings or add values.

Here my sample will find the value of the most expensive order of each customer:

'VB
Dim result = From c In _customers
     From o In c.Orders
     Join p In _products
            On o.IdProduct Equals p.IdProduct
     Select c.Name, o.IdOrder, TotalItem = o.Quantity * p.Price

Dim result2 = From c In _customers
              Group Join o In result
                  On c.Name Equals o.Name
              Into CustomersOrders = Group
              Select c.Name,
                      MaxOrders = CustomersOrders.Aggregate(0D, 
                                Function(currentValue, currentItem) _
                                    If(currentValue > currentItem.TotalItem, 
                                       currentValue, 
                                       currentItem.TotalItem))

//C#
var result = from c in _customers
             from o in c.Orders
             join p in _products
                    on o.IdProduct equals p.IdProduct
             select new { c.Name, o.IdOrder, TotalItem = o.Quantity * p.Price };

var result2 = from c in _customers
              join o in result
                  on c.Name equals o.Name
              into CustomersOrders
              select new
              {
                  c.Name,
                  MaxOrders = CustomersOrders.Aggregate(0m, (accumulator, currentItem) => 
                                              accumulator > currentItem.TotalItem ? 
                                              accumulator : 
                                              currentItem.TotalItem )
              };

The 0m passed as the first argument of the Aggregate operator is the seed: it makes the accumulator a decimal and initializes it to 0 (0D plays the same role in the VB version). If you want to find the minimum value, you will surely replace 0m by something like decimal.MaxValue. Then we tell the function that 2 values are passed to it: the accumulator holding the result so far and the current item. Then we do the operation. In this case, we check if the accumulator contains a value larger than the TotalItem value of the current row and we keep the larger of the two.

The Aggregate operator, just like other LINQ operators, is not limited to manipulating numbers. One thing we often need to do is to concatenate strings. Here is an example of a LINQ query using the Aggregate operator to concatenate the names of all the files in the StartupPath folder:
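
For the minimum-value variation mentioned above, a minimal sketch could look like this. It works on a plain array of decimal totals instead of the joined customer data, and AggregateMinSketch and Smallest are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AggregateMinSketch
{
    // Keeps the smaller of the accumulator and the current value at each step.
    public static decimal Smallest(IEnumerable<decimal> totals)
    {
        return totals.Aggregate(decimal.MaxValue,
            (accumulator, current) => accumulator < current ? accumulator : current);
    }

    static void Main()
    {
        Console.WriteLine(Smallest(new[] { 30m, 10m, 20m })); // 10
    }
}
```

Note that with this seed, an empty set returns decimal.MaxValue, so you may want to test for that case before displaying the result.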

'VB
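' String.Empty is the seed, so every file name (including the first one) gets its ";" suffix
Dim strFiles As String = IO.Directory.GetFiles(Application.StartupPath).
    Aggregate(String.Empty, Function(current, strFile) current + (strFile + ";"))

//C#
// string.Empty is the seed, so every file name (including the first one) gets its ";" suffix
var strFiles = System.IO.Directory.GetFiles(Application.StartupPath)
    .Aggregate(string.Empty, (current, strFile) => current + (strFile + ";"));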
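
To make the result easier to check, here is a minimal sketch that concatenates a fixed array of names instead of reading a folder, next to String.Join, which is a common alternative when all you need is a separated list. ConcatSketch and its two methods are names I made up for the comparison.

```csharp
using System;
using System.Linq;

static class ConcatSketch
{
    // Aggregate with an empty-string seed: each name is followed by ";".
    public static string WithAggregate(string[] names)
    {
        return names.Aggregate(string.Empty, (current, name) => current + name + ";");
    }

    // String.Join puts the separator between the names only.
    public static string WithJoin(string[] names)
    {
        return string.Join(";", names);
    }

    static void Main()
    {
        var names = new[] { "a.txt", "b.txt", "c.txt" };
        Console.WriteLine(WithAggregate(names)); // a.txt;b.txt;c.txt;
        Console.WriteLine(WithJoin(names));      // a.txt;b.txt;c.txt
    }
}
```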

Generation Operators

It is sometimes useful to generate data in our application, often to initialize structures or to generate test data. LINQ offers some operators to help you achieve that.

The first operator in this category is Range. It takes 2 arguments: the starting value and the number of values you want generated. For example, this query will generate 100 new products starting with the ID 1000:

'VB
Dim result = Enumerable.Range(1000, 100).
    Select(Function(x) (New Product With {
                        .IdProduct = x,
                        .Description = "Product " + x.ToString(),
                        .Price = x * 2
                    }))

//C#
var result = Enumerable.Range(1000, 100)
    .Select(x => (new Product
    {
        IdProduct = x,
        Description = "Product " + x,
        Price = x * 2
    }));
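
Because the second argument is a count and not an end value, the last ID generated above is 1099, not 1100. A minimal sketch to confirm it (RangeSketch and Ids are hypothetical names):

```csharp
using System;
using System.Linq;

static class RangeSketch
{
    // Materializes the generated sequence so it can be inspected.
    public static int[] Ids(int start, int count)
    {
        return Enumerable.Range(start, count).ToArray();
    }

    static void Main()
    {
        int[] ids = Ids(1000, 100);
        Console.WriteLine(ids.Length);  // 100
        Console.WriteLine(ids.First()); // 1000
        Console.WriteLine(ids.Last());  // 1099
    }
}
```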

Another operator falling in this category is Repeat. This operator is useful when you want to repeat the same result multiple times, like this query that returns product 1 seven times:

'VB
Dim result = Enumerable.Repeat((From p In _products
                                Where p.IdProduct = 1
                                Select p), 7).
                    SelectMany(Function(x) x)

//C#
var result = Enumerable.Repeat((from p in _products 
                                where p.IdProduct == 1
                                select p), 7)
                               .SelectMany(x => x);
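
Be aware that Repeat does not copy anything: when the repeated value is an object, every entry refers to the same instance. Here is a minimal sketch showing it, using a hypothetical Item class rather than the demo's Product.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal class, just for this illustration.
class Item
{
    public int Id { get; set; }
}

static class RepeatSketch
{
    public static List<Item> RepeatItem(Item item, int count)
    {
        return Enumerable.Repeat(item, count).ToList();
    }

    static void Main()
    {
        List<Item> items = RepeatItem(new Item { Id = 1 }, 3);
        Console.WriteLine(items.Count);   // 3

        // All 3 entries point to the same object, so a change made
        // through one entry is visible through the others.
        items[0].Id = 99;
        Console.WriteLine(items[2].Id);   // 99
    }
}
```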

Quantifier Operators

Quantifier operators are used to check whether some or all of the elements in a set satisfy a condition.

The first quantifier operator is Any. It returns a Boolean value (true or false) indicating if an element satisfying the condition exists in the set. It is much like doing a count and checking if the result is greater than 0, except that Any stops as soon as one matching item is found: the result is set to true and the remaining elements are not evaluated. This is an example of a query checking if at least one customer has the city Montréal:

'VB
Dim result As Boolean = (From c In _customers Select c).
                        Any(Function(c) c.City = "Montréal")

//C#
bool result = (from c in _customers select c)
    .Any(c => c.City == "Montréal");
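
You can observe the early exit by counting how many times the condition is evaluated. This is a minimal sketch on an array of city names; AnySketch and EvaluationsUntilMatch are hypothetical names.

```csharp
using System;
using System.Linq;

static class AnySketch
{
    // Returns how many elements Any had to test before it could answer.
    public static int EvaluationsUntilMatch(string[] cities, string wanted)
    {
        int evaluations = 0;
        cities.Any(c =>
        {
            evaluations++;
            return c == wanted;
        });
        return evaluations;
    }

    static void Main()
    {
        var cities = new[] { "Québec", "Montréal", "Laval", "Sherbrooke" };
        Console.WriteLine(EvaluationsUntilMatch(cities, "Montréal")); // 2
        Console.WriteLine(EvaluationsUntilMatch(cities, "Gaspé"));    // 4
    }
}
```

A Count-based test would have evaluated all 4 cities in both cases.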

At other times, instead of checking for the existence of at least one element in the set, you need to check that all the elements satisfy a condition. The All operator is designed for that. The following example checks if all the customers have Montréal as their city:

'VB
Dim result As Boolean = (From c In _customers Select c).
                        All(Function(c) c.City = "Montréal")

//C#
bool result = (from c in _customers select c)
    .All(c => c.City == "Montréal");
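
A detail that can surprise: All returns true on an empty set, because no element contradicts the condition. If an empty list should not count as a match, combine it with Any. A minimal sketch with hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AllSketch
{
    // Plain All: true when no element fails the test (including when there are none).
    public static bool AllInCity(IEnumerable<string> cities, string city)
    {
        return cities.All(c => c == city);
    }

    // Stricter variation: requires at least one element.
    public static bool AllInCityAndNotEmpty(IEnumerable<string> cities, string city)
    {
        return cities.Any() && cities.All(c => c == city);
    }

    static void Main()
    {
        var none = new string[0];
        Console.WriteLine(AllInCity(none, "Montréal"));            // True
        Console.WriteLine(AllInCityAndNotEmpty(none, "Montréal")); // False
    }
}
```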

The last operator of this category is Contains. You are probably used to this operator, for example when you want to check if a letter is contained in a string. It works much the same way in LINQ. Here is an example checking if a particular element exists in the set. Notice that your class (Product in my example) needs to have the Equals and the GetHashCode methods overridden, otherwise only the very same instance would be found:

'VB
Dim productToTest = New Product With
{
    .IdProduct = 1,
    .Description = "Product 1",
    .Price = 10
}
Dim result As Boolean = _products.Contains(productToTest)

//C#
var productToTest = new Product
{
    IdProduct = 1,
    Description = "Product 1",
    Price = 10
};
bool result = _products.Contains(productToTest);
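
When the class does not override Equals, Contains will not find a look-alike object. In that situation, one possible alternative is to use Any with an explicit condition. The sketch below assumes a PlainProduct class of my own, a stand-in for Product without the overrides; ContainsSketch and its methods are hypothetical as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for Product, WITHOUT the Equals/GetHashCode overrides.
class PlainProduct
{
    public int IdProduct { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

static class ContainsSketch
{
    static List<PlainProduct> BuildList()
    {
        return new List<PlainProduct>
        {
            new PlainProduct { IdProduct = 1, Description = "Product 1", Price = 10 },
            new PlainProduct { IdProduct = 2, Description = "Product 2", Price = 20 }
        };
    }

    // Looks for a new instance holding the same values as the first element.
    public static bool ContainsLookAlike()
    {
        var lookAlike = new PlainProduct { IdProduct = 1, Description = "Product 1", Price = 10 };
        return BuildList().Contains(lookAlike);
    }

    // Looks for any element with the given ID instead.
    public static bool AnyWithId(int id)
    {
        return BuildList().Any(p => p.IdProduct == id);
    }

    static void Main()
    {
        Console.WriteLine(ContainsLookAlike()); // False
        Console.WriteLine(AnyWithId(1));        // True
    }
}
```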

Conclusion

Your understanding of LINQ operators should be a bit better now. Next month, I still have some more to demonstrate.

