LINQ Distinct With Anonymous Types Vs. Explicit Types

When are the same two shapes different? When one of those shapes has a name and one doesn’t.

The Enumerable.Distinct() method can produce distinctly different results (in both the actual SQL generated and the final result set) if the query results are shaped with an explicitly defined type rather than an anonymous type.

In other words, if you write your query with an anonymous type like this:

var distinctStuff = (from thing in DataContext.Things
                     select new
                          AnonValue1 = thing.Value1,
                          AnonValue2 = thing.Value2,
                          AnonValue3 = thing.Value3

You could get different results than if you project your query results into an explicitly defined type, like this:

var distinctStuff = (from thing in DataContext.Things
                     select new SomeObjectIDefined
                         ObjectValue1 = thing.Value1,
                         ObjectValue2 = thing.Value2,
                         ObjectValue3 = thing.Value3

The difference will depend upon whether or not you explicity define your equality operator for SomeObjectIDefined. And that is because, when you shape the query results with an explicitly defined type, the distinct operation is not part of the generated SQL (and not performed within the database). And, when you shape the query results with an anonymous type, the distinct operation is part of the SQL generated and performed at the database level.

So, if you don’t explicity define an equality operator (in this case, on SomeObjectIDefined, such that it regards all 3 members), every result instance in the result set is different. And, thereby, you’ll get duplicates in the end. Whereas, the anonymous type selection will not return duplicates.

Check the results of your queries in LINQPad for yourself to see the difference.

2 Responses to “LINQ Distinct With Anonymous Types Vs. Explicit Types”

  1. Craig, this is a great tip/idea/insight.

    So would you suggest using the first approach or would you suggest imlementing an equality operator?

    BTW, thanks for sharing this, keep these coming.

  2. Thank you, Yogesh.

    This depends.

    I like when I can have the database perform functions that it is good at performing, like a distinct operation. It also means that I don’t need to send a bunch of results back over the wire because the server will narrow those down for me.

    The explicitly defined type is scary because a) it won’t perform the distinct operation on the server and b) it could easily return unexpected duplicates.

    However, I have to weigh whether or not an anonymous type makes sense for the return object. Is that object going to be reusable elsewhere, selectable by other queries, etc.? And, does the distinct operation that is needed really define equality for the object?

    There are other options for solving the problem. Among them are defining your own IEqualityComparer to pass to the Distinct function, and writing a stored procedure to select the specific results required. Most frequently, on my current project, I’d be using the anonymous type.

Rss Feed