Entity Framework – Storing a complex entity as JSON in a single DB column

During the development of TicketDesk 2.5, I came across an unusual case. I wanted to store a large chunk of my entity model as a simple JSON string, and put it into a single column in the database.

Here’s the setup:

I have an entity that encapsulates all of a user’s display preferences and similar settings. One of those settings is a complex set of objects that represents the user’s custom settings for list view UI pages. There can be many lists, each with separate settings. Some of the settings for a list contain collections of other objects, resulting in a hierarchy of settings that goes three levels deep.

I didn’t want to represent these settings as a relational data model in the database though. Using EF’s standard persistence mapping conventions, this collection of settings ends up being spread across six tables. The T-SQL queries to access that data would be rather slow and cumbersome, and the relational model doesn’t add any value at all.

Instead, I just wanted to serialize out the entire collection of settings as a single JSON string, and store it in one column in the user settings table. At the same time though, I wanted the code to behave as if this were just a natural part of my regular EF entity model.

The solution:

The solution was to use a complex type, with some fluent model builder magic, to flatten the hierarchy into a single column. The hierarchy itself is represented as a custom collection, with a bit of manual JSON serialization/deserialization built in.

I got a pointer in the right general direction from this SO post, which saved me a bunch of time when approaching this more advanced scenario.

First, let’s take a look at the root entity here:

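using System.ComponentModel.DataAnnotations;

public class UserSetting
{
    [Key]
    public string UserId { get; set; }

    // The whole settings hierarchy; persisted as JSON in a single column.
    public virtual UserTicketListSettingsCollection ListSettings { get; set; }
}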

This is the only entity which will map to its own table in the DB. The ListSettings collection is the property I want persisted as JSON in a single column.

Here is the custom collection that will be stored:

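using System.Collections.Generic;
using System.Collections.ObjectModel;
using Newtonsoft.Json;

public class UserTicketListSettingsCollection : Collection<UserTicketListSetting>
{
    // Convenience overload for adding a batch of settings at once.
    public void Add(ICollection<UserTicketListSetting> settings)
    {
        foreach (var listSetting in settings)
        {
            this.Add(listSetting);
        }
    }

    // The whole collection as one JSON string; this is what gets persisted.
    [JsonIgnore]
    public string Serialized
    {
        get { return JsonConvert.SerializeObject(this); }
        set
        {
            // Nothing stored yet; leave the collection as it is.
            if (string.IsNullOrEmpty(value))
            {
                return;
            }

            // Replace the current items with the deserialized ones.
            var jData = JsonConvert.DeserializeObject<List<UserTicketListSetting>>(value);
            this.Items.Clear();
            this.Add(jData);
        }
    }
}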

This is a collection type that inherits from the generic Collection<T>. In this case, T is the UserTicketListSetting type, a standard POCO wrapping up all of the settings for all of the various list views in one place.

Some of the properties inside UserTicketListSetting contain collections of other POCOs. The specific details of what’s going on inside those classes don’t matter to this discussion; just understand that it results in a hierarchy of related objects. None of the properties in that hierarchy are marked up with EF attributes or anything.

The only magic here is that we have a Serialized property, which manually handles converting from/to JSON. This is the only property that we want persisted to the database.
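To see what a property like Serialized does on its own, outside of EF, here is a minimal round-trip sketch. It is not the TicketDesk code: ListSetting and its two properties are hypothetical stand-ins for UserTicketListSetting, and I've assumed System.Text.Json (.NET Core 3.0 or later) in place of Json.NET so the sample has no package dependencies.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text.Json;

// Hypothetical stand-in for UserTicketListSetting.
public class ListSetting
{
    public string ListName { get; set; }
    public int ItemsPerPage { get; set; }
}

// Simplified sketch of the collection-with-a-JSON-property idea.
public class ListSettingsCollection : Collection<ListSetting>
{
    public string Serialized
    {
        // Write the current items out as a JSON array.
        get { return JsonSerializer.Serialize(new List<ListSetting>(this)); }
        set
        {
            // Nothing to load; keep whatever is already in the collection.
            if (string.IsNullOrEmpty(value))
            {
                return;
            }

            // Replace the current items with the ones read from the JSON string.
            var items = JsonSerializer.Deserialize<List<ListSetting>>(value);
            Clear();
            foreach (var item in items)
            {
                Add(item);
            }
        }
    }
}
```

Reading Serialized from one instance and assigning it to a fresh instance should give you a second collection with the same items, which is roughly what happens when the column is saved and later loaded again.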

To make that persistence happen, we will make UserTicketListSettingsCollection an EF complex type, though not by using the [ComplexType] attribute. Instead, we’ll manually register this complex type via the fluent model builder API.

In the DB Context this looks like this:

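protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Register the collection as a complex type and store its
    // Serialized property in the ListSettingsJson column.
    modelBuilder.ComplexType<UserTicketListSettingsCollection>()
        .Property(p => p.Serialized)
        .HasColumnName("ListSettingsJson");
}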

This just tells EF that UserTicketListSettingsCollection is a complex type, and the only property we care about is the Serialized property. If there were other properties in UserTicketListSettingsCollection, you would need to exclude them with something like:

modelBuilder.ComplexType<UserTicketListSettingsCollection>().Ignore(p => p.PropertyToIgnore);

And that’s all I needed to get EF to store this entire hierarchy as a single JSON column.

Using this model in code is just like using any other EF entity. I can query it with LINQ expressions, and SaveChanges on the DbContext updates the JSON data in the DB just like any other entity. Even the generation of code-based migrations works as expected.

It took a LOT of experimentation and digging to figure out how to make this work, but the implementation is rather simple once you know how to approach the problem.

This also reflects the amazing power and flexibility of Entity Framework. EF can be extended to fit very advanced scenarios even when the designers didn’t anticipate them directly.

You can see the full implementation of this in TicketDesk 2.5. Currently, TD 2.5 is in alpha, so look to the develop branch in source control on CodePlex. You will find this example, as well as several variations in the TicketDesk.Domain assembly.

Entity Framework: It’s not a stack of pancakes!

I’ve been talking to a lot of line-of-business developers lately. Most have adopted newer technologies like Entity Framework, but many are still working with strictly layered application designs. They’ll have POCO entities, DbContexts, DbSets, and all the modern goodness EF brings. Then they smash it all down into a data access layer, hiding EF behind several layers of abstraction. As a result, these applications can’t leverage EF’s best features, and the entire design becomes very cumbersome.

I blame n-tier architectural thinking!

Pancakes:

Most senior .NET developers cut their teeth on .NET during a time when n-tier was a pervasive discussion across the industry. Every book, article and classroom lecture about web application design included a long discussion on n-tier fundamentals. The n-tier hype faded years ago, replaced by more realistic lego-block approaches, but n-tier inspired conceptualizations still exert a strong influence over modern designs.

I call it pancake thinking: conceptualizing the logical arrangement of an application as a series of horizontal layers, each with distinct, non-overlapping areas of responsibility. Pancake thinking pins Entity Framework as a DAL technology – just another means to get data in and out of a database – which is a fundamental misunderstanding of EF’s intended role.

Here is a diagram of a full blown n-tier approach applied to an ASP.NET MVC application using Entity Framework:

[Diagram: EF N-Tier Diagram]

Note that all of Entity Framework is buried at the bottom. An object mapper is probably transforming EF’s entities into business objects. So, by the time information leaves the DAL, EF has been beaten completely out of it. EF acts only as a modernized version of traditional ADO.NET objects. EF is a better experience for DAL developers, but it doesn’t add value for those writing code further up the stack.

I don’t see as many strictly layered designs like this in the wild, though I have come across a few recently. Most of the current literature around EF advocates a somewhat hybrid approach, like this:

[Diagram: EF n-Tier Alternate Design]

In this design, EF’s POCOs roam around the business and presentation layers like free-range chickens, but DbContext and DbSets are still being tortured in the basement.

What I dislike about this design is that EF has been turned on its head. The DbContext should be at the top of the stack, acting as the entry point through which you interact with entities.

So, how should a modern EF driven application be designed?

EF’s POCO entities, along with DbContext and DbSets, work best with a behavior driven approach to application design. You design around the business-behaviors of your application, largely without concern for what the database looks like. Instead of a horizontally segmented architecture, EF becomes a pervasive framework used throughout the entire application at all levels. If you are worried about mixing “data-logic” with “business-logic”, then you are still thinking about pancakes. Stop it!

Here is another diagram, this time letting Entity Framework do its thing, without any unnecessary abstractions:

[Diagram: EF Domain Model]

Notice first, how much simpler the design becomes. We’ve elevated EF to a true business framework, and we’ve eliminated tons of abstractions in the process. We still have n-tier going on, so relax! We just didn’t logically split data and business code like you might be used to.

This is a domain-model architecture, and borrows some of the general ideas of “domain driven design” (DDD). If you are a DDD purist, please do not write me hate mail. What I’m proposing here is not intended to be a true DDD design. If you are interested in DDD in depth, check out Vaughn Vernon’s site. For discussions on practical DDD with .NET and EF, Vaughn’s TechEd presentations are a must watch.

To understand how this design works, let’s explore the three main concepts behind Entity Framework.

EF Entity Model:

The heart of EF is the entity model. At its simplest, the model is just a bunch of data transfer objects (DTOs) – together they form an in-memory model mirroring the database’s structure. If you generate a model from an existing database, you get this shape. This is also what happens if you design a code-first model using the same thinking you’d use to design a relational database. This “in-memory database” viewpoint is how most n-tier applications tend to use EF models.

That simplistic approach doesn’t leverage EF’s power and flexibility. It should be a true business domain model, or something close. Your entities contain whatever code is appropriate for their business function, and they are organized to best fit the business requirements.

The nice thing about a business-oriented model is that code operating against the entities feels very natural to object oriented programmers. You don’t concern yourself with how the actual persistence is done; EF takes care of it for you. This is exactly the level of abstraction n-tier designs strive for, but EF gives you the same result without the rigid, horizontal layering.
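As a rough illustration of an entity that carries its own business behavior, here is a small sketch. The Ticket class, its properties, and the close/reopen rule are all hypothetical and not taken from TicketDesk; whether private setters fit your mapping is something to check against the EF version you use.

```csharp
using System;

// Hypothetical entity; the rule it enforces is made up for illustration.
public class Ticket
{
    public int TicketId { get; set; }
    public string Title { get; set; }
    public bool IsClosed { get; private set; }
    public DateTime? ClosedDate { get; private set; }

    // The business rule lives on the entity: a closed ticket cannot be closed again.
    public void Close(DateTime closedAt)
    {
        if (IsClosed)
        {
            throw new InvalidOperationException("Ticket is already closed.");
        }

        IsClosed = true;
        ClosedDate = closedAt;
    }

    // Reopening clears the closed state and the closed date together.
    public void Reopen()
    {
        IsClosed = false;
        ClosedDate = null;
    }
}
```

Calling code just works with the object (ticket.Close(DateTime.UtcNow)) and never touches persistence details directly.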

Persistence mapping:

The logical entity design should be based on business behavior, but the actual code implementation does require you to understand how EF handles persistence.

Persistence details do place some constraints on the kinds of OOP acrobatics you can employ in your model, so you need to be aware of how it works. Overall though, the constraints are mild, and shouldn’t keep you from an implementation that remains true to the intent of the business-centric design.

Many properties on your entities will need to be persisted to the database. Others may exist only to support runtime behaviors, but aren’t persisted. To figure out how, or if, properties map to the database, EF uses a combination of conventions and attribute annotations. EF doesn’t care about your entity’s other methods, fields, events, delegates, etc. so you are free to implement whatever business code you need.

EF does a good job of automatically inferring much of a model’s mapping from code conventions alone. If you use the conventions appropriately you can get a head-start on your persistence mappings – no code needed. For properties that need more explicit definitions, you use attributes to tell EF how to interpret them.
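Here is a small sketch of how conventions and attributes can sit side by side on one entity. The Customer class and all of its members are hypothetical; the attributes are the standard ones from System.ComponentModel.DataAnnotations and its Schema namespace.

```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical entity used only to illustrate the mapping styles.
public class Customer
{
    // Convention: a property named Id or <ClassName>Id is treated as the key.
    public int CustomerId { get; set; }

    // Attributes describe what conventions cannot infer.
    [Required]
    [MaxLength(100)]
    public string Name { get; set; }

    // Stored under a different column name than the property name.
    [Column("EmailAddress")]
    public string Email { get; set; }

    // Runtime-only value; excluded from the database mapping.
    [NotMapped]
    public bool IsSelectedInUi { get; set; }

    // Methods are not part of the mapping, so business code is free to live here.
    public string DisplayLabel()
    {
        return Name + " <" + Email + ">";
    }
}
```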

For really advanced cases, you can hook into the model builder’s fluent API. This powerful tool lets you define tricky mappings that attributes and conventions alone can’t describe fully. If your model is significantly dissimilar from your database’s structure, you may spend a lot of time getting to know the model builder – but it’s easy to use, and amazingly powerful.

While you will need to understand EF persistence issues, you only need to concern yourself with them when you implement the entity model. For code using that model, these details are highly transparent – as they should be.

Repositories and Business Services:

The final piece of EF is the part so many people insist on hiding – the DbContext and DbSets. If you are thinking in pancakes, the DbContext seems like a hub for accessing the database. N-tier principles have trained you to hide data access from other code, at all costs!

Typically, n-tier type abstractions take the form of custom repositories layered on top of Entity Framework’s objects. Only the repositories may instantiate and use a DbContext, while everything at higher layers must go through the repositories.

A service or unit-of-work pattern is usually layered on top of the custom repositories too. The service manages the repositories, while the repositories manage EF’s DbContext and DbSets.

If you’ve ever tried to layer an n-tier application on EF like this, you probably found yourself fighting EF all over the place. This abstraction is the source of your pain.

An EF DbContext is already an implementation of a unit-of-work design pattern. The DbSet is a generic repository pattern. So you’ve just been layering custom unit-of-work and repositories on top of EF’s unit-of-work and repositories. That extra abstraction doesn’t add much value, but it sure adds a lot of complexity.

Ideally, the DbContext should be a root business service. The most important thing to understand is that this belongs at the top of the business layer, not buried under it.

Your entities directly contain the internal business logic appropriate to enforce their behaviors. Similarly, a DbSet is where you put business logic that operates against the set of an entity type. Anything that you’d normally put in custom repositories can be added to the real DbSet instead through extension methods.

Extension methods let you extend a DbSet on the fly. They are fantastic for dealing with business context specific concerns, and you can have a different set of extension methods for each of your business contexts. The extension methods can be arranged by namespace, and can also be defined in assemblies higher in the stack – in the latter case, the extension may have access to dependencies within the higher layer assembly that would not be appropriate to couple directly to your business layer. For example, an extension method in an ASP.NET web application can depend on HttpContext, but you would never want to create a dependency like that directly in the business domain. Calling code can just choose which extensions are appropriate, and import/use those namespaces, while ignoring extension methods from other contexts.
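A minimal sketch of the idea, under a few assumptions: SupportTicket, the namespaces, and the OpenFor query are all hypothetical. I've written the extension against IQueryable<T> rather than DbSet<T> so the sample needs no EF reference; since DbSet<T> implements IQueryable<T>, the same method can be called on a DbSet.

```csharp
using System.Linq;

namespace Sample.Domain
{
    // Hypothetical entity; property names are illustrative only.
    public class SupportTicket
    {
        public int Id { get; set; }
        public string AssignedTo { get; set; }
        public bool IsClosed { get; set; }
    }
}

// One namespace per business context; callers import only the ones they need.
namespace Sample.Domain.HelpDeskQueries
{
    public static class SupportTicketQueries
    {
        // Business query that would otherwise live in a custom repository.
        public static IQueryable<SupportTicket> OpenFor(
            this IQueryable<SupportTicket> tickets, string userName)
        {
            return tickets.Where(t => !t.IsClosed && t.AssignedTo == userName);
        }
    }
}
```

With the Sample.Domain.HelpDeskQueries namespace imported, a call could look like context.Tickets.OpenFor("ann").ToList(), where context.Tickets is an assumed DbSet<SupportTicket> property on your own context. Because the method returns IQueryable<T>, callers can keep composing LINQ on top of it.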

For cross-cutting concerns that span multiple entity types, you can extend the DbContext itself. A common approach is to have multiple concrete roots, one for each of your business contexts. The business-specific roots inherit a common DbContext base class. The base class contains the EF-specific stuff. Factory and adapter patterns often appear in relation to these roots as well, but the key concept is that each top-level service derives from a real DbContext… your calling code has all the LINQ and EF goodness at its disposal.

If you embrace EF’s DbContext as a top-level business service, either directly or through inheritance, then you will find EF can be a very pleasant experience. You are able to leverage its full power at all layers of your application, and friction with EF’s internals disappears. Using custom abstractions of your own, it is hard to reach this level of fluidity.

The Domain Model you’ve read about online:

If you go online and read recent articles about ASP.NET application designs, you’ll find many advocates of domain model designs. This would be great, except that most of them still argue for custom unit-of-work and repository patterns.

The only difference between these designs, and the hybrid n-tier layering design I described before, is that the abstractions here are a true part of the business layer, and are placed at, or near, the top of the stack.

[Diagram: EF Testable Domain Model]

While these designs are superior to pancake models, I find the additional custom abstraction is largely unnecessary, adds little value, and usually creates the same kinds of friction you see in the n-tier approaches.

The reason for the extra layer of abstraction seems to have two sources. Partially it comes from that legacy n-tier thinking being applied, incorrectly, to a domain model. Even though it avoids full layering, the desire for more abstraction still comes from the designer’s inner-pancake.

The bigger force advocating for extra abstractions comes from the Test Driven Development crowd. Early versions of EF were completely hostile to testing. It took insane OOP acrobatics and deep abstractions even to get something vaguely testable.

In EF 4.1, code-first was introduced. It brought us the first versions of DbContext and DbSets, which were fairly friendly towards unit testing. Still though, dependency injection issues usually made an extra layer of abstraction appealing. These middle-versions of EF are where the design I’ve just diagrammed came from.

In current EF versions (6 or higher), DbContext and DbSets are now pervasive throughout all of EF. You can use them with model-first, database-first, and code-first approaches. You can also use them with POCOs, or with diagram generated entities (which are still POCOs in the end). On the testability front, EF has added numerous features to make native EF objects easily testable without requiring these layers of custom abstraction.

You can, through a bit of experimentation, learn how to write great unit tests directly against concrete instances of DbContext and DbSet – without any runtime dependency on the physical database.

How to achieve that level of testability in your model is a topic for another post, but trust me… you don’t need custom repositories for testing EF anymore. All you need is to be smart about how you compose your code, and maybe a little help from a mocking framework here and there.

I’ve kept most of this discussion pretty high-level. Hopefully it will help expand how you view EF based application designs. With some additional research, you should be able to take these ideas and turn them into real code that’s relevant to your own business domain.

 

TicketDesk 3 Dev Diary – MEF, IoC, and Architectural Design

TicketDesk 2 and TicketDesk 3 have some key architectural differences. Both enforce a strict separation of concerns between business and presentation layers, but there are major architectural differences within each layer. In this installment, I’d like to talk about how the back-end architecture will evolve and change.

TicketDesk 2 – Decoupled design:

The most significant technology that shaped TicketDesk 2’s class library design was the use of the Managed Extensibility Framework (MEF). The use of MEF in TicketDesk 2 was not about modularity, at least not in a way that is relevant to business requirements. TicketDesk 2 was never intended to support plug-ins or dynamic external module loading. I used MEF for two reasons: I was giving test driven development (TDD) another shot, and I had planned to write a Silverlight client for TicketDesk 2.

MEF was originally built by the Silverlight team. It had a lot of potential for other environments, but didn’t play well with MVC back then. It took some dark magic and hacking to just make it work there. MEF is an extensibility framework first, but an IoC container only by accident. While MEF can do the job of an IoC container, it wasn’t particularly good in that role.

As an extensibility framework, MEF actually has more in common with require.js than traditional server-side IoC frameworks. As a Silverlight technology, the primary purpose was to enable clients to download executable modules from the server on demand when needed. This is exactly what require.js does for JavaScript in HTML applications. The truly interesting thing is that TicketDesk 2 did not use MEF in this way at all. ASP.NET MVC is a server-side environment following a request-response-done type execution flow. Deferred module loading isn’t relevant in that kind of environment. TicketDesk used MEF only for its secondary IoC features: runtime composition and dependency injection.

Considering the difficulty in getting MEF working, and the fact that there are better IoC frameworks for MVC, I should have scrapped MEF in favor of Ninject – which has made me very happy in dozens of other projects. I stuck with MEF partly because it would pay off when I got to the Silverlight client, and partly because I liked the challenge that MEF presented.

Sadly, I was only three weeks into development on TicketDesk Silver, the Silverlight client, when Microsoft released Silverlight’s obituary. I had two other projects under development with Silverlight at the time, so that was a very bad summer for me.

The modular design of TicketDesk’s business layer is mostly about testability. EF 4 was quite hostile to unit testing, so I did what everyone else was doing… I wrapped the business logic in unit-of-work and repository patterns, and made sure the dependencies targeted abstract classes and interfaces. If you want to get all gang-of-four about it, the service classes in TD2 are more transaction script than unit-of-work, but it gets the same job done either way. This gave me the level of testability I needed to follow a (mostly) TDD workflow.

One thing I have never liked about heavy unit testing, and TDD in particular, is having to implement complex architectures purely for the sake of making the code testable. I’ll make some design concessions for testability, but I have a very low tolerance for design acrobatics that have nothing to do with an application’s real business requirements.

TicketDesk 2 walks all over that line. I dislike that there are a dozen or more interfaces that would only ever have one (real) concrete implementation. Why have an interface that only gets implemented by one thing? I also dislike having attributes scattered all over the code just to describe things to an IoC container. Neither of those things makes TicketDesk work better. They just make it more complex, harder to understand, and harder to maintain.

On the flip-side, I was able to achieve decent testability without going too far towards an extreme architecture. The unit tests did add value, especially early in the development process – they caught a few bugs, helped validate the design, and gave me some extra confidence.

If you noticed that the current source lacks unit tests, bonus points to you! My TDD experiment was never added to the public repository. I was pretty new to TDD, and my tests were amateurish (to be polite). They worked pretty well, and let me experience real TDD, but I didn’t feel that the tests themselves made a good public example of TDD in action.

TicketDesk 3 – Modularity where it matters:

A lot has changed for the better since I worked on TicketDesk 2.

Some developers still write their biz code in a custom unit-of-work and repository layer that abstracts away all the Entity Framework stuff, which is fine. But when EF code-first introduced the DbContext, it became much friendlier towards unit testing. The DbContext itself follows a unit-of-work pattern, while its DbSets are a generic repository pattern. You don’t necessarily need to wrap an additional layer of custom repository and unit-of-work on top of EF just to do unit testing anymore.

I plan to move most of the business logic directly into the code-first (POCO) model classes. Extension methods allow me to add functionality to any DbSet<T> without having to write a custom implementation of the IDbSet interface for each one. And the unit-of-work nature of the DbContext allows me to put cross cutting business logic in the context itself. Basically, TD 3 will use something close to a true domain model pattern.

As for dependency injection, the need to target only interfaces and abstract types has been reduced. An instance of a real DbContext type can be stubbed, shimmed, or otherwise mocked most of the time. In theory, I should be able to target stubbed/shimmed instances of my concrete types. If I find the need to target abstracts, I can still refactor the DbSets and/or DbContext to inherit custom interfaces. There still isn’t a compelling need to wrap the business logic in higher layers of abstraction.

In TicketDesk 3, I will not be using a TDD workflow. I love unit testing, but am traditionally very selective about what code I choose to test. I write tests for code that will significantly benefit from them – complex and tricky code. I don’t try to test everything. Using TDD as a design tool is a neat thought process, but I find that design-by-test runs counter to my personal style of design. I can easily see how TDD helps people improve their designs, but I personally tend to achieve better designs when I’m coding first and testing later.

When I do get to the need for dependency injection, I plan to run an experimental branch in TicketDesk 3 to explore MEF 2 a bit further. I think they have fixed the major issues that made MEF 1 hard to use in web environments, but it is almost impossible to find good information online about MEF 2. The documentation, when you can find it, is outdated, contradictory, and just plain confusing. What I have found suggests that MEF 2 does work with MVC 4, but still requires some custom infrastructure. What I don’t know is how well it works.

With the need for dependency injection reduced, few compelling extensibility requirements on the back-end, and no plans to do heavy unit testing, I am more inclined to go with Ninject. They care enough to write top-notch documentation, and it was designed explicitly for the purpose of acting as an IoC container… which is the feature set TicketDesk actually needs.