Tony Marston's Blog About software development, PHP and OOP

How to Produce a Rich Domain Model with Active Record

Posted on 12th October 2023 by Tony Marston
Introduction
What is the Active Record (AR) pattern?
My implementation of Active Record
Issues with Kevin Smith's article
It's not an ORM
Each table column must be a separate property
Cannot make a property private
Complex validation rules
The AR pattern determines the system architecture
Summary
References
Comments

Introduction

I recently came across an article written by Kevin Smith called How to Produce a Rich Domain Model with Active Record in which his opening statement was:

You can't. It's not possible.

Active Record provides complete access to a database row's fields through an object's public properties (or "attributes" of an "Active Record model", as they're called in both Ruby on Rails' and Laravel's ORMs). Any part of the codebase can access those attributes to read or change any of the data in the database row, making it easy to start working with a real database in your code.

The trouble starts when the complexity of the business inevitably reveals itself and you need to restrict the inherent permissiveness of Active Record to ensure that information is only retrieved or modified according to the rules of the business. In other words, you need a rich domain model: a layer of the codebase where the expressiveness of the business is encoded into a community of collaborating objects, each properly bounded and equipped with the information and behavior they need to model the relevant interactions and responsibilities of the business.

I completely disagree with this statement as I have been using my version of this pattern for the past 20 years, and I have never encountered any of the problems which he identifies.

This is almost identical to the rebuttal I gave in Active Record: Getting it Right to Shawn McCool's article called Active Record: How We Got Persistence Perfectly Wrong.

The fact that both of these people had problems with their implementations of this pattern speaks to the failure in those implementations rather than the pattern itself. A Design Pattern fits the following description:

In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. Rather, it is a description or template for how to solve a problem that can be used in many different situations.

This makes it quite clear that a pattern describes WHAT needs to be done but not HOW it should be done. Different programmers using the same language could produce different implementations of a pattern, but that is completely irrelevant. Provided that the code follows the description of the pattern and produces the expected results then it is a valid implementation. There is no such thing as a single definitive implementation that every programmer should follow, a valid implementation that makes every other implementation automatically invalid. The success or failure of a pattern's implementation is down to the skill of the implementor, so if there are problems with an implementation then they must be due to the lack of skill shown by the implementor.

What is the Active Record (AR) pattern?

The Active Record pattern is described by Martin Fowler, the author of Patterns of Enterprise Application Architecture (PoEAA), as follows:

An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.



An object carries both data and behavior. Much of this data is persistent and needs to be stored in a database. Active Record uses the most obvious approach, putting data access logic in the domain object. This way all people know how to read and write their data to and from the database.

In Wikipedia this pattern has the following description:

The active record pattern is an approach to accessing data in a database. A database table or view is wrapped into a class. Thus, an object instance is tied to a single row in the table. After creation of an object, a new row is added to the table upon save. Any object loaded gets its information from the database. When an object is updated, the corresponding row in the table is also updated. The wrapper class implements accessor methods or properties for each column in the table or view.

This pattern is commonly used by object persistence tools and in object-relational mapping (ORM). Typically, foreign key relationships will be exposed as an object instance of the appropriate type via a property.

My implementation of Active Record

Before I switched to using PHP in 2002 I had no knowledge of design patterns, OO "best practices", or design principles such as SOLID. I had spent the previous 20 years writing enterprise applications using COBOL and UNIFACE, and neither of these languages had "best practices". Each team I had worked in had its own set of programming standards which were completely different from those used by every other team. The idea of a "one size fits all" set of standards simply did not exist.

With COBOL I was used to developing monolithic single-tier programs, while UNIFACE v5 introduced me to the 2-tier architecture as all database access was handled by a built-in database driver. When UNIFACE v7.2 was released it introduced support for the 3-Tier Architecture with its Presentation, Business and Data Access layers. I liked this so much I used it as my starting point for my PHP development. This was relatively easy as programming with objects is naturally 2-tier by default - after creating a class with methods and properties you must have a separate script which instantiates this class into an object so that it can then call its methods. Because I also started off by generating all HTML using XSL stylesheets, for which I wrote a single reusable object, I effectively split my Presentation layer into two giving a Controller and a View as described in the MVC Design Pattern as shown in Figure 1:

Figure 1 - 3-Tier Architecture combined with Model-View-Controller


Working on many different database applications taught me several lessons which I follow to this day:

I taught myself PHP by reading the online manual and looking at sample code in the books which were suggested in newsgroup forums. I learned about encapsulation and inheritance, but did not see any examples of polymorphism. I saw some ideas which I liked, some which I did not, and experimented with some ideas of my own. My goal was to produce as much reusable code as possible as that was the best way to become more productive. An enterprise application is made up of a variable number of subsystems, each with its own database and user transactions, so I looked for ways to deal with a growing number of database tables and the user transactions which operate on those tables. Despite being totally unaware of the Active Record pattern I developed a solution which had some similarities and some differences:

After constructing my first table class, the Model in the MVC design pattern, I then built a separate Controller for each of the user transactions in this family of forms. Once I had got these working I created a duplicate set for the second database table. This resulted in a lot of duplicated or similar code, so using what I had read about inheritance I decided that the best way to share this duplicated code was to move it to an abstract table class which could then be inherited by every concrete table class. At the end of this exercise I had a lot of code in the abstract class and nothing inside each concrete class except for its constructor. Note that I do not have separate validate() and store() methods which the Controller can call as they are called internally from within the abstract class, as shown in Common Table Methods:

Common Table Methods (columns: methods called externally; methods called internally; UML diagram)

Called externally: $object->insertRecord($_POST)
Called internally:
  $fieldarray = $this->pre_insertRecord($fieldarray);
  if (empty($this->errors)) {
    $fieldarray = $this->validateInsert($fieldarray);
  }
  if (empty($this->errors)) {
    $fieldarray = $this->commonValidation($fieldarray);
  }
  if (empty($this->errors)) {
    $fieldarray = $this->dml_insertRecord($fieldarray);
    $fieldarray = $this->post_insertRecord($fieldarray);
  }
UML diagram: ADD1 Pattern

Called externally: $object->updateRecord($_POST)
Called internally:
  $fieldarray = $this->pre_updateRecord($fieldarray);
  if (empty($this->errors)) {
    $fieldarray = $this->validateUpdate($fieldarray);
  }
  if (empty($this->errors)) {
    $fieldarray = $this->commonValidation($fieldarray);
  }
  if (empty($this->errors)) {
    $fieldarray = $this->dml_updateRecord($fieldarray);
    $fieldarray = $this->post_updateRecord($fieldarray);
  }
UML diagram: UPDATE1 Pattern

Called externally: $object->deleteRecord($_POST)
Called internally:
  $fieldarray = $this->pre_deleteRecord($fieldarray);
  if (empty($this->errors)) {
    $fieldarray = $this->validateDelete($fieldarray);
  }
  if (empty($this->errors)) {
    $fieldarray = $this->dml_deleteRecord($fieldarray);
    $fieldarray = $this->post_deleteRecord($fieldarray);
  }
UML diagram: DELETE1 Pattern

Called externally: $object->getData($where)
Called internally:
  $where = $this->pre_getData($where);
  $fieldarray = $this->dml_getData($where);
  $fieldarray = $this->post_getData($fieldarray);
UML diagram: ENQUIRE1 Pattern

Alongside these common methods are a set of common table properties:

Common Table Properties
$this->dbname This value is defined in the class constructor. This allows the application to access tables in more than one database. It is standard practice in the RADICORE framework to have a separate database for each subsystem.
$this->tablename This value is defined in the class constructor.
$this->fieldspec This identifies the columns (fields) which exist in this table and their specifications (type, size, etc).
$this->primary_key This identifies the column(s) which form the primary key. Note that this may be a compound key with more than one column. Although some modern databases allow it, it is standard practice within the RADICORE framework to disallow changes to the primary key. This is why surrogate or technical keys were invented.
$this->unique_keys A table may have zero or more additional unique keys. These are also known as candidate keys as they could be considered as candidates for the role of primary key. Unlike the primary key these candidate keys may contain nullable columns and their values may be changed at runtime.
$this->parent_relations This has a separate entry for each table which is the parent in a parent-child relationship with this table. This also maps foreign keys on this table to the primary key of the parent table. This array can have zero or more entries.
$this->child_relations This has a separate entry for each table which is the child in a parent-child relationship with this table. This also maps the primary key on this table to the foreign key of the child table. This array can have zero or more entries.
$this->fieldarray This holds all application data, usually the contents of the $_POST array. It can either be an associative array for a single row or an indexed array of associative arrays for multiple rows. This removes the restriction of only being able to deal with one row at a time, and only being able to deal with the columns for a single table. This also avoids the need to have separate getters and setters for each individual column as this would promote tight coupling which is supposed to be a Bad Thing ™.

A more complete description can be found in Abstract Table Class.
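
As a rough illustration of how these properties might be populated, the following sketch shows a hypothetical concrete class for a person table. The class names, the demo database name and the layout of each $fieldspec entry are assumptions made for this example; they are not copied from the RADICORE framework.

```php
<?php
// Hypothetical stand-in for the abstract table class; only the shared
// properties listed above are declared here.
abstract class AbstractTable
{
    public $dbname;                  // database (subsystem) name
    public $tablename;               // table name
    public $fieldspec = [];          // column specifications
    public $primary_key = [];        // column(s) forming the primary key
    public $unique_keys = [];        // zero or more candidate keys
    public $parent_relations = [];   // tables which are parents of this one
    public $child_relations = [];    // tables which are children of this one
    public $fieldarray = [];         // application data (one row or many)
    public $errors = [];             // validation errors, keyed by column
}

// Hypothetical concrete class: it contains nothing but a constructor.
class Person extends AbstractTable
{
    public function __construct()
    {
        $this->dbname    = 'demo';
        $this->tablename = 'person';
        $this->fieldspec = [
            'person_id'  => ['type' => 'integer', 'required' => true],
            'email'      => ['type' => 'string', 'size' => 255, 'required' => true],
            'first_name' => ['type' => 'string', 'size' => 40],
            'last_name'  => ['type' => 'string', 'size' => 40, 'required' => true],
        ];
        $this->primary_key = ['person_id'];
        $this->unique_keys = [['email']];
    }
}

$person = new Person();
```

Because every table is described by the same set of properties, code in the abstract class can work with any concrete class without knowing which table it represents.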

As stated previously I was able to create a standard validation object to take care of all primary validation, but what about any additional business rules known as secondary validation? Because I was using an abstract table class I was able to implement the Template Method Pattern which meant that I could add custom code into any concrete table class simply by adding code into one of the available customisable "hook" methods.
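
A minimal sketch of how the Template Method Pattern might be applied here is shown next. The method names follow Common Table Methods, but the method bodies, the choice of commonValidation() as the overridable hook, the Location class and the shape of $fieldspec are all assumptions made for illustration. The custom rule is one that requires latitude and longitude to be supplied together.

```php
<?php
// Simplified stand-in for the abstract table class (not the RADICORE code).
abstract class AbstractTable
{
    public $errors = [];
    protected $fieldspec = [];

    // Template method: the sequence of steps is fixed in the abstract class.
    public function insertRecord(array $fieldarray)
    {
        $this->errors = [];
        $fieldarray = $this->validateInsert($fieldarray);        // primary validation
        if (empty($this->errors)) {
            $fieldarray = $this->commonValidation($fieldarray);  // secondary validation (hook)
        }
        if (empty($this->errors)) {
            $fieldarray = $this->dml_insertRecord($fieldarray);
        }
        return $fieldarray;
    }

    // Primary validation driven by $fieldspec: required columns must be present.
    protected function validateInsert(array $fieldarray)
    {
        foreach ($this->fieldspec as $column => $spec) {
            $missing = !isset($fieldarray[$column]) || $fieldarray[$column] === '';
            if (!empty($spec['required']) && $missing) {
                $this->errors[$column] = 'This field is required';
            }
        }
        return $fieldarray;
    }

    // Hook method: does nothing unless a concrete class overrides it.
    protected function commonValidation(array $fieldarray)
    {
        return $fieldarray;
    }

    // Placeholder for the call to the data access object.
    protected function dml_insertRecord(array $fieldarray)
    {
        return $fieldarray;
    }
}

// Hypothetical concrete class which adds one custom business rule.
class Location extends AbstractTable
{
    protected $fieldspec = [
        'location_id' => ['required' => true],
        'latitude'    => [],
        'longitude'   => [],
    ];

    protected function commonValidation(array $fieldarray)
    {
        $hasLatitude  = isset($fieldarray['latitude'])  && $fieldarray['latitude']  !== '';
        $hasLongitude = isset($fieldarray['longitude']) && $fieldarray['longitude'] !== '';
        if ($hasLatitude !== $hasLongitude) {
            $this->errors['latitude'] = 'Latitude and longitude must be supplied together';
        }
        return $fieldarray;
    }
}

$location = new Location();

$location->insertRecord(['location_id' => 1, 'latitude' => '51.5']);
$rejected = $location->errors;   // has an entry for 'latitude'

$location->insertRecord(['location_id' => 1, 'latitude' => '51.5', 'longitude' => '-0.12']);
$accepted = $location->errors;   // empty
```

The Controller only ever calls insertRecord(), so the custom rule cannot be bypassed by the calling code.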

For my set of six Controllers the only difference was the name of the class from which the object was instantiated (remember that I did not need any setters and getters for individual columns), so I decided to extract the definition of the class name into a separate component script which then passed that name down to the Controller. This meant that while each user transaction had its own component script they could each share one of the standard Page Controllers.
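
The idea can be sketched as follows. Everything here is hypothetical: the function addController() stands in for a shared Page Controller, the last two statements stand in for a component script, and the two Model classes are reduced to the bare minimum so that the example fits in a single file.

```php
<?php
// Hypothetical Models; in a real application each would inherit its
// insertRecord() method from the abstract table class.
class Person
{
    public $errors = [];
    public function insertRecord(array $fieldarray)
    {
        return $fieldarray + ['table' => 'person'];
    }
}

class Product
{
    public $errors = [];
    public function insertRecord(array $fieldarray)
    {
        return $fieldarray + ['table' => 'product'];
    }
}

// Stand-in for a shared Page Controller: it receives the class name as a
// string and never refers to a specific table or column.
function addController($table_id, array $post)
{
    $dbobject   = new $table_id;
    $fieldarray = $dbobject->insertRecord($post);
    return ['errors' => $dbobject->errors, 'data' => $fieldarray];
}

// Stand-in for a component script: it only identifies which Model to use.
$table_id = 'Person';
$result   = addController($table_id, ['first_name' => 'Ann']);
```

Changing $table_id to 'Product' would reuse the same controller code against a different table.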

I do not insert any methods into any concrete table class to deal with object associations. My previous experience taught me that regardless of what type of relationship you have to deal with they are variations of the standard one-to-many or parent-child relationship which can be dealt with in a uniform way by using a special Transaction Pattern which deals with the handling of the foreign keys between those two entities. Referential Integrity can also be dealt with by standard code which is built into the framework. Object Aggregation also becomes a non-issue.

Originally I constructed the contents of the $fieldspec variable by hand, but as this became tedious due to the rising number of database tables in my application I decided to automate by building a Data Dictionary into which I could import each table's details directly from the database schema, then export it to create a separate table structure file as well as the table class file. Originally I used to construct each component script by hand, but then I added extra functionality into my Data Dictionary so that I could select a table, link it to a Transaction Pattern, press a button and have the script(s) created for me. This amount of code generation greatly increased my productivity.

As you can see I have taken the bare bones description of the Active Record pattern and refactored it to produce a sophisticated and comprehensive implementation which is far superior to anything else I have seen.

Issues with Kevin Smith's article

I shall now comment on some of the statements made in Kevin Smith's article which I find questionable, and explain how I have circumvented some of the issues which he has raised.

Active Record is not an ORM
Active Record provides complete access to a database row's fields through an object's public properties (or "attributes" of an "Active Record model", as they're called in both Ruby on Rails' and Laravel's ORMs). Any part of the codebase can access those attributes to read or change any of the data in the database row, making it easy to start working with a real database in your code.

The idea that the AR pattern is used to produce an Object Relational Mapper (ORM) is just an erroneous opinion, not a fact that is cast in stone. In Martin Fowler's Catalog of Patterns it is not categorised as being any sort of mapper, a term which is reserved for those cases where the object structures and the relational structures do not match. Besides, an ORM is a service, not an entity. The difference between the two is that a service is stateless and does not contain any business rules while an entity has state and does contain business rules. This is confirmed in Active Record Basics where it says:

Active Record is the M in MVC - the model - which is the layer of the system responsible for representing business data and logic.

This is also confirmed in Martin Fowler's description which clearly shows several methods which retrieve current state. This proves that the object is stateful and is therefore an entity, and not stateless, in which case it would be a service.

Each Model, one for each entity in the application, sits in the business/domain layer. An ORM or equivalent Data Access Object (DAO) sits in the Data Access layer. In my implementation there is no need for any mapping between two different structures simply because the structures are never different.
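
The following sketch illustrates that separation under some simplifying assumptions. The DataAccessObject class, its buildInsert() method and the reduced Person class are hypothetical, and the DAO only builds a parameterised SQL statement rather than executing it, so that the example runs without a database.

```php
<?php
// Hypothetical data access object (DAO): it builds SQL but holds no business rules.
class DataAccessObject
{
    // Build a parameterised INSERT from a table name and a column => value array.
    public function buildInsert($tablename, array $fieldarray)
    {
        $columns = array_keys($fieldarray);
        $placeholders = array_map(function ($column) {
            return ':' . $column;
        }, $columns);

        $sql = 'INSERT INTO ' . $tablename
             . ' (' . implode(', ', $columns) . ')'
             . ' VALUES (' . implode(', ', $placeholders) . ')';

        return ['sql' => $sql, 'params' => $fieldarray];
    }
}

// Hypothetical Model in the business layer: it owns the table definition and
// hands its array to the DAO without any mapping step.
class Person
{
    public $tablename = 'person';
    public $fieldspec = ['first_name' => [], 'last_name' => []];

    public function insertRecord(array $fieldarray)
    {
        // Keep only entries which are columns on this table, so that column
        // names in the SQL come from $fieldspec and not from user input.
        $fieldarray = array_intersect_key($fieldarray, $this->fieldspec);

        $dao = new DataAccessObject();
        return $dao->buildInsert($this->tablename, $fieldarray);
    }
}

$person = new Person();
$query  = $person->insertRecord([
    'first_name' => 'Ann',
    'last_name'  => 'Lee',
    'submit'     => 'Save',   // not a column, so it is dropped
]);
```

The array which leaves the Model has the same keys as the columns in the table, which is why no translation between two different structures is required.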

Each table column must be a separate property
Active Record provides complete access to a database row's fields through an object's public properties (or "attributes")

Having separate class properties for each table column, each with its own setter (mutator) and getter (accessor), has never struck me as being a good idea for database applications, although it may be appropriate for other types of application. If you think about it there are two possibilities:

  1. An object only has one possible set of values (properties or attributes) and those values appear individually and separately. In an aircraft control system, for example, the software is only dealing with a single aircraft, and each of its sensors (air temperature, air pressure, air speed, altitude, et cetera) sends in its current reading on its own without waiting for it to be assembled with the readings from other sensors. The software processes all these readings in real time and sends the results to the instrument panel in the pilot's cockpit.
  2. An object such as a database table has separate values for each row. A database deals with datasets which may contain values for any number of columns and any number of rows, and for each row there can be any number of columns. The User Interface (UI) or Presentation layer, which in a web application is an HTML document, also deals with datasets and not one column at a time. When an HTML document is constructed before being sent back to the client device it contains all the values it needs - it is not fed them one variable at a time. When the user presses a SUBMIT button the entire contents of that HTML document are posted to the server in a single $_POST array - they do not appear one variable at a time.

Can you see the difference between the two? One deals with a single object (aircraft) which has a single set of values which are fed in from different sensors at different times. This means that separate properties with their own getters and setters are appropriate. The other deals with the contents of a database table which deals with datasets which can have any number of columns from any numbers of rows, and some of those columns may be for/from other database tables. Each dataset is handled in an input or output operation as a single unit (an array) and not individual pieces. This means that separate properties with their own getters and setters are NOT appropriate.

If the data coming into and going out of an object appears in the form of sets which may contain values for any number of columns from any number of rows from any number of tables then how is it possible to justify building software which restricts that data to a fixed set of columns from a single row of a single table? This sounds like an artificial restriction just to obey an arbitrary rule which was devised for a different set of circumstances. Deconstructing each dataset into separate pieces of data so that you can use a separate method to read or write each piece of data using code similar to the following would, to me, be a prime example of tight coupling (which is considered to be bad):

<?php 
$dbobject = new Person(); 
$dbobject->setUserID    ( $_POST['userID']    );
$dbobject->setEmail     ( $_POST['email']     );
$dbobject->setFirstname ( $_POST['firstname'] );
$dbobject->setLastname  ( $_POST['lastname']  );
$dbobject->setAddress1  ( $_POST['address1']  );
$dbobject->setAddress2  ( $_POST['address2']  );
$dbobject->setCity      ( $_POST['city']      );
$dbobject->setProvince  ( $_POST['province']  );
$dbobject->setCountry   ( $_POST['country']   );

if ($dbobject->updatePerson($db) !== true) { 
    // do error handling 
} 
?> 

This also means that if you change the number of columns then you have to change some method signatures as well as all the places which call those signatures.

Contrast this with the following code which demonstrates loose coupling (which is considered to be good):

<?php 
require_once "classes/{$table_id}.class.inc";  // $table_id is provided by the previous script
$dbobject = new $table_id;
$result = $dbobject->updateRecord($_POST);
if ($dbobject->errors) {
    // do error handling 
}
?> 

If you look at Common Table Methods you will notice that I load all the data into an object as a single array argument on the method call without having to name any of its component parts. This means that I can vary the contents of this input array at will without any adverse consequences as the validation which is automatically performed within the method will reject any input which is not completely valid. Note also that the same array is used as both an input and output argument on every internal method call so that the contents of the array can be amended, but only by a method which belongs inside that particular object.

Accessing a piece of data using $fieldarray['column'] is no different from $this->column, so there is no loss of functionality but there is an increase in the advantages.

Extracting the data from an object, such as when transferring it to the View object, does not require a collection of getters as it can be done with one simple command:

$fieldarray = $dbobject->getFieldArray();

Another reason which caused me to reject the idea of having a separate class property for each column, each with its own setter and getter, is that it restricts each object to only being able to deal with columns on that particular table.

Cannot make a property private
Need to make a property private so that only the object has access to it, a fundamental principle of software design called information hiding? It's not possible.

Firstly, I do not consider that implementation hiding, which is totally different from information hiding, requires the use of any visibility declarations. Simply hiding the implementation behind an interface (which is what ALL software does, not just OO software) has always been good enough for me. I built my framework using PHP 4 which did not have any of these visibility options, and my code worked without any errors, thank you very much. When I upgraded to PHP 5 and became aware of these options I simply could not justify the effort of changing my code as there would be no measurable benefits.

As explained above in Each table column must be a separate property I do not have a separate property for each column, so I simply do not have this problem. Once you have called an operation such as insertRecord() or updateRecord() with its input array you cannot change the contents of the array from outside the object, only from the inside.
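
This behaviour follows from the way PHP handles arrays: an array passed as an argument is passed by value unless the parameter is explicitly declared as a reference, so the method works on its own copy. The sketch below demonstrates this with a hypothetical Person class; the method name follows this article, but the body is invented for the illustration.

```php
<?php
// Hypothetical Model with a single method.
class Person
{
    public function insertRecord(array $fieldarray)
    {
        // $fieldarray is a copy of the caller's array, so these changes
        // stay inside the object until the array is returned.
        $fieldarray['first_name'] = trim($fieldarray['first_name']);
        $fieldarray['validated']  = true;
        return $fieldarray;
    }
}

$input  = ['first_name' => '  Ann  '];
$person = new Person();
$output = $person->insertRecord($input);

// Changing the caller's array afterwards has no effect on the returned data.
$input['first_name'] = 'changed later';
```

The caller's $input array never receives the validated entry, and the later change to $input does not alter $output.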

Complex validation rules
Need to require multiple pieces of information for an update and validate them before accepting the changes, like making sure you have both longitude and latitude before updating geo-coordinates? Sorry, you can't enforce that.

Every experienced programmer knows that all input data must be validated before it is written to the database otherwise the DBMS could reject the data. This requires code to be executed within the object, but the questions that need to be answered are What code do you write? and Where do you put it?

As stated earlier Active Record is the M (Model) in MVC so it is responsible for all data validation and business logic. This means that all validation rules need to be accessible from within the Model. If you look at Common Table Methods you will see that the insertRecord() and updateRecord() methods each call their own internal validation methods before attempting to update the database. Two separate types of validation are performed:

The AR pattern determines the system architecture
By its very nature, Active Record's impact on a domain cannot be contained and thus will largely determine the resulting system architecture, despite whatever plans its developers may have at the outset.

If you had designed your system architecture to be multi-layered, as I did, then built each component to conform to its position in that architecture, then you would not have a problem. My Controllers, Views and DAOs are pre-built and supplied within the framework, and all Models are initially generated by the framework to follow a standard pattern. All the sharable protocols, which includes all standard validation, are inherited from an abstract class, and the only code the developer has to add is the custom logic which can be inserted into any of the available customisable hook methods.

Given enough time under real-world dynamics, it will lead to a big ball of mud that becomes increasingly expensive to maintain.

I could not disagree more. I built my current framework 20 years ago and constructed an abstract table class after noticing the common protocols and common properties after building just two concrete table classes. I used this framework to build my first ERP application in 2007 using the designs for the PARTY, PRODUCT, ORDER, INVOICE, INVENTORY and SHIPMENT databases from Len Silverston's Data Model Resource Book. The prototype was finished in 6 man-months, which works out at an average of just one man-month for each database. I later upgraded this to a much larger ERP application which now contains 20 subsystems, 450+ database tables, 1,200+ relationships and 4,000+ user transactions, and I have never encountered any of the issues which Kevin Smith mentioned.

As for being difficult and expensive to maintain, that is not the fault of any particular design pattern but of the developer's inability to implement it correctly. In recent years I have upgraded my ERP application to include the following:

Does that sound like a Big Ball of Mud to you?

Now you could try to contain Active Record to the edge of your application, only using it as a conduit between your database and domain objects that properly enforce the necessary constraints. This would keep it out of your domain and go a long way toward avoiding architectural distortion.

How you use the AR pattern is up to you, but if you use it incorrectly or in the wrong place then how can you say the pattern itself is at fault? I use it, as do others, as the conduit between the Presentation and Data Access layers, which means that it is the Model in MVC. As such it is the only layer in the entire system which contains and enforces all business logic, and once a method has been called by a Controller on a Model it is physically impossible for the input data to be corrupted by an external object. If I can do it then anyone can.

Summary

The final statement in his article goes as follows:

To put it plainly: Active Record fundamentally cannot allow the enforcement of constraints on its objects, leaving you with no option but to apply all your business rules in services and therefore "completely miss the point of what object-oriented design is all about" and "incur all of the costs of a [rich] domain model, without yielding any of the benefits".

If your interpretation of the "rules" of OOP causes you to write software and then complain about the results which it produces there are only two possibilities as far as I can see:

If your implementation of a design pattern is not producing satisfactory results then perhaps you have chosen the wrong pattern, or perhaps your implementation of that pattern is incomplete or flawed. You should also be aware that it is not good practice to pick a design pattern and then write the code to implement it, you should first write code that works and then refactor it as necessary to remove any duplications and see if any patterns emerge. This is not just my opinion, it is also the opinion of Erich Gamma, one of the Gang of Four, who in How to Use Design Patterns said:

Do not start immediately throwing patterns into a design, but use them as you go and understand more of the problem. Because of this I really like to use patterns after the fact, refactoring to patterns.

Some people say that the AR pattern produces an ORM, which makes it a stateless service, while others say that it is the M in MVC, which makes it a stateful entity. It can't be both, so which is it? In my implementation of this pattern it is used as the Model, not an ORM. If you have used it as an ORM then you should have put your business logic into an object which exists in the business/domain layer, and this object should only pass control to the ORM/DAO once all the business rules have been processed. If your business rules are not being processed then that can only be an omission on the part of the developer and not the fault of any pattern.

Here endeth the lesson. Don't applaud, just throw money.


References

The following articles describe aspects of my framework:

The following articles express my heretical views on the topic of OOP:

These are reasons why I consider some ideas to be complete rubbish:

Here are my views on changes to the PHP language and Backwards Compatibility:

The following are responses to criticisms of my methods:

Here are some miscellaneous articles:

