join in LINQ to SQL and LINQ to Entities Considered Messy, Redundant

Posted by on in Blogs
In this post I will demonstrate that use of the join keyword in LINQ to SQL and LINQ to Entities is nearly always wrong. LINQ queries which you write with the join keyword are harder to read and write than queries you write using associations, and they require knowledge of database metadata which is not required otherwise. This introduces the potential for errors and makes maintenance harder.

Many people ask how to do a "left join" in LINQ to SQL, and unfortunately, the answer they nearly always get — "Use DefaultIfEmpty!" — is, in my opinion, terrible advice. Let's implement the same query with and without the join keyword, and then compare the readability of the queries, the functionality, the knowledge required to write the query, and the maintainability of the code. I think you will find that using associations wins on every single criterion I examine.

I'm going to use the Northwind demo database for this example, since many people are familiar with its structure. I created a LINQ to SQL model for Northwind by simply dragging all of the tables in the database onto the LINQ to SQL designer. The only change I made to the model generated by the designer is to rename the "Employee1" property on the generated "Employee" type to the more descriptive name "Supervisor." I'm using LINQ to SQL for this demo, but everything I'm saying here applies equally to LINQ to Entities.

"Left" Joins


Let's imagine that I am asked to produce a web page listing all employees in a company, along with their supervisor, if any. This requires a "left join," in SQL terms, because not all employees have a supervisor. I'll project onto a presentation model, just like I do in LINQ to Entities. Using the association properties generated by LINQ to SQL, this is quite simple:



This is fairly readable. The one thing that you need to know is that both LINQ to SQL and LINQ to Entities coalesce nulls. This means that on a row where e.Supervisor is null, you will not get a NullReferenceException in the assignment to SupervisorName and SupervisorBirthDate, as you would with LINQ to Objects. Instead, null will be assigned. Therefore, it is important that EmployeeListItem.SupervisorBirthDate is of type DateTime? (a.k.a. Nullable<DateTime>) instead of the non-nullable DateTime.

Let's compare that to the equivalent query using the join syntax, using the mysteriously popular DefaultIfEmpty trick:



Yuck! This is far less readable than the query above. Yet these two queries produce exactly the same results. If you don't believe me, download the sample project attached to this post and try it yourself. Perhaps even worse than the general unreadability of the "join version" is the fact that this query requires knowledge of the structure of the database which is already present in the DBML (or EDMX, in the case of the Entity Framework) model. This is a problem for two reasons. First, it's an opportunity for programmers to make a mistake, which the first query eliminates. Second, it's a potential maintenance issue if the foreign key definition ever changes in the database.

"Inner" Joins


Now let's compare an example of an "inner join" using both my recommended method of associations and the LINQ join keyword. Here's the association version. It's so readable that there is very little to say about it:



Here's the join version:



Again, these two queries do exactly the same thing, as you can confirm for yourself by running the demo project. The join version here shares all of the faults of the "left join" version above.

API Consistency


Thus far, I've been discussing LINQ to SQL. But what if I have an Employee instance and that like to examine the employee's supervisor? I might write code like this:



Now compare that code with the "association" and "join" query forms. You will see that using the associations makes the LINQ to SQL queries much more closely resemble how you work with the materialized entity objects in "regular" code. Again, I think this makes your code easier to read.

What About Performance?


Not surprisingly, the SQL generated by the equivalent association and join versions is close to identical. I would not expect to ever see a performance difference between the two syntaxes in terms of query execution time.

Is It Ever Correct to Use join?


The advantages of using associations are so strong that you may wonder why join exists in LINQ at all. Associations, however, are only helpful when they actually exist. There may be times when you need to "join" based on values which are not actually foreign keys. Or you may need to join between LINQ to SQL and LINQ to Objects.

Running the Demo Project


Here's a the demo project. To build and run it, you'll need to Visual Studio 2010 Beta, SQL Server (Express is fine), and the Northwind demo database I linked above. Open the project, find the Web.config file in Solution Explorer and open it. Change the connectionString to point to your SQL Server. Now you should be able to run the application.
Comments are not available for public users. Please login first to view / add comments.