Defining Aggregates for an Expressive Domain Model
When I started to apply the concepts of Domain-Driven Design to my projects, the decision of what entity to choose as the root of an aggregate seemed relatively straightforward. On smaller projects with simple domain models the choice can be fairly obvious, but on large projects or those with more complex domain models it is a lot less clear. Most likely, your aggregate roots will change as you refactor your model and your understanding of the domain improves.
Aggregate roots act as the entry points into an aggregate. As such, choosing an entity to be a root can have a significant impact on the performance and usability of the domain model. If you find yourself compensating for an awkward domain model, it may be worthwhile to investigate whether your choice of aggregate roots truly reflects the domain.
Choosing an aggregate root
The boundaries of an aggregate aren't always clear. An aggregate can be subdivided into smaller aggregates, meaning an aggregate root can also be a member of a larger aggregate. Complicating things further is the fact that an aggregate only exists for a given bounded context. Different bounded contexts may take entities from two or more aggregates and consider them as a single aggregate. With all of this in mind, how do you define an aggregate and choose an appropriate aggregate root?
I began defining aggregates by deferring to my database schema. Understanding that aggregate roots are those entities that don't depend on the existence of any other entity, I chose my entities based on a rule of cascading deletes. That is to say, if a table was subject to cascading deletes from another table, the entities stored in it were members of an aggregate. Following the chain of cascading deletes backwards, you eventually reach a table that is not subject to any cascading deletes. The entities in this table are then the aggregate roots. The rule of cascading deletes is a good method of choosing initial aggregates. Unfortunately, it doesn't take into consideration the more subtle aspects of the domain. Using it as the only method of defining aggregates may lead to issues later in development.
As my domain models became larger and the associations between entities became more complex, I found myself developing around deficiencies in the model. In particular, I was traversing large aggregates to gain access to entities buried deep within them. My initial thoughts went to refactoring the domain model to break up the aggregate into smaller parts. After a lot of reading and thought, it was clear that compensating for an awkward aggregate was indeed an indication that I needed to refactor my domain model.
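To make the rule of cascading deletes concrete, here is a minimal sketch in Python. The table names and the CASCADES mapping are hypothetical, and the sketch assumes each table is subject to cascading deletes from at most one parent table, which a real schema may not guarantee.

```python
from collections import defaultdict

# Hypothetical schema: child table -> parent table whose deletes cascade to it.
CASCADES = {
    "report": "student",
    "report_section": "report",
    "section_comment": "report_section",
    "address": "student",
}


def find_root(table, cascades):
    """Follow cascading deletes backwards until a table with no parent is reached."""
    seen = set()
    while table in cascades:
        if table in seen:
            raise ValueError(f"cycle in cascade rules at {table!r}")
        seen.add(table)
        table = cascades[table]
    return table


def group_aggregates(tables, cascades):
    """Group every table under the root that the rule of cascading deletes selects."""
    aggregates = defaultdict(list)
    for table in tables:
        root = find_root(table, cascades)
        if table == root:
            aggregates[root]  # touch the key so roots without members still appear
        else:
            aggregates[root].append(table)
    return dict(aggregates)


TABLES = ["student", "report", "report_section", "section_comment", "address", "school"]
aggregates = group_aggregates(TABLES, CASCADES)
```

With this made-up schema, everything except school ends up under student, which is exactly the kind of large, schema-driven aggregate that can become awkward to work with.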
"Refactoring towards deeper insight"
In his book, Domain-Driven Design, Eric Evans explains that refactoring is an opportunity to learn more about your domain. Simply refactoring to make it easier to access entities would be a missed opportunity to achieve deeper insight. Aggregates are collections of entities that should be loaded together and persisted together. By accessing the aggregate only through a root, this consistency is maintained. In this sense, a database doesn't reflect how an aggregate is used within the application. These are insights that can only be gained by using the domain model.
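As a rough illustration of access through the root, the sketch below uses hypothetical Report, Section and InMemoryReportRepository classes. The unique-title rule is an invented invariant, included only to show where such a rule would live; it is not taken from Evans or from any real project.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    body: str = ""


@dataclass
class Report:
    """Aggregate root: outside code changes the aggregate only through it."""
    report_id: int
    _sections: list = field(default_factory=list)

    def add_section(self, title, body=""):
        # Invariant (assumed for illustration): section titles are unique per report.
        if any(s.title == title for s in self._sections):
            raise ValueError(f"duplicate section title: {title!r}")
        self._sections.append(Section(title, body))

    @property
    def sections(self):
        # Return a tuple so callers cannot append members behind the root's back.
        return tuple(self._sections)


class InMemoryReportRepository:
    """Stores and returns the whole aggregate as one unit, keyed by the root's id."""

    def __init__(self):
        self._store = {}

    def save(self, report):
        self._store[report.report_id] = report

    def get(self, report_id):
        return self._store[report_id]
```

Because sections are only added through add_section, the root gets a chance to check its rules on every change, and the repository never hands out a Section on its own.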
One project I was working on had a domain model where a collection of reports could be generated for a student. The reports were explicitly dependent on the student, so I made the student the aggregate root. Over time I found that working with reports in the application was difficult, because traversing the aggregate from the root to reach all the associations within a report became increasingly complex.
After some thought, I decided to refactor by making a new aggregate where the report was the root. While this was a very simple change, I immediately found that it provided a much more natural way of working with the domain model. There are many cases where you need to access a student without needing that student's reports, so I didn't need an aggregate that loaded the reports as well. A report, however, needs all of its associated entities in order to be useful within the domain, so it makes a lot of sense to model it as an aggregate.
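The shape of that refactoring might look something like the following sketch. The class and field names are hypothetical rather than taken from the project; the point is only that the report refers to its student by identity instead of living inside the student aggregate.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    """Root of its own small aggregate; it no longer carries its reports."""
    student_id: int
    name: str


@dataclass
class Section:
    title: str


@dataclass
class Report:
    """Root of the report aggregate; refers to the student by id only."""
    report_id: int
    student_id: int
    sections: list = field(default_factory=list)


class ReportRepository:
    """In-memory stand-in for whatever persistence the application uses."""

    def __init__(self):
        self._reports = {}

    def add(self, report):
        self._reports[report.report_id] = report

    def get(self, report_id):
        # Returns the report together with everything inside its boundary.
        return self._reports[report_id]

    def for_student(self, student_id):
        # Reports are looked up by student id, not reached by walking the student.
        return [r for r in self._reports.values() if r.student_id == student_id]


student = Student(student_id=7, name="Ada")
reports = ReportRepository()
reports.add(Report(1, student.student_id, [Section("Summary")]))
reports.add(Report(2, student.student_id))
reports.add(Report(3, student_id=8))
```

Loading a Student now pulls in nothing about reports, while reports.get(1) returns a report with its sections, and reports.for_student(7) covers the cases where both are needed.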
Modelling the domain, not the data
This single refactoring was an epiphany for me. It didn't really remove the need to traverse a large aggregate, but at least these traversals were now expressive of the domain. I felt it was perhaps the first time I had thought of the domain as a concept separate from the application. I was now modelling the rules and logic of the domain instead of the data the model was loading from the database.
An important part of modelling a domain is defining the boundaries of aggregates. These boundaries will affect how you use the domain model. Looking to the database for hints about where these boundaries lie is fine, but the emphasis should be on understanding the domain so that you can create a model that expresses it accurately. If you find a domain model is awkward, it may be a sign that you haven't yet identified the real boundaries of your aggregates.