Tony Marston's Blog About software development, PHP and OOP

Getters and Setters are EVIL

Posted on 2nd December 2023 by Tony Marston
Introduction
Using arrays to pass data around
Data validation made easy
Reading from multiple tables
Writing to multiple tables
References
Comments

Introduction

I write nothing but enterprise applications. For the first 20 years of my career I used compiled languages which used bit-mapped displays, but for the last 20 years I have been developing nothing but web-based applications using PHP, which is an interpreted language. PHP is the first (and only) programming language I have used which has object oriented capabilities, and as this has often been advertised as the greatest thing since sliced bread I wanted to know what it meant. The best description I found went as follows:

Object Oriented Programming is programming which is oriented around objects, thus taking advantage of Encapsulation, Inheritance and Polymorphism to increase code reuse and decrease code maintenance.

When I was learning PHP I had 3 sources of information - the PHP manual, books and online tutorials. I learned about encapsulation and inheritance, but saw no examples of polymorphism. I loaded some of the sample code onto my home PC and stepped through it with my debugger which was built into the IDE which I chose to use instead of a plain vanilla text editor. I did this so that I could examine in great detail how each line of code worked. As I became more and more familiar with PHP I strived to utilise its OO features in order to create as much reusable code as was humanly possible. I implemented these features in the following ways:

While I saw a few samples of "advice" on how things could be done I never accepted this as an instruction on how things should be done. How a programmer solves a problem is down to the skills of that programmer, and I refuse to be limited by the lesser skills of others. I played with some of the sample code which I found and did a bit of experimentation using ideas of my own to see what worked best for me and the type of application which I was writing. When I later became aware of things called "best practices" I could see that the results which they produced were inferior to mine, so I chose to ignore them. They appeared to be nothing more than personal preferences from people with limited experience rather than universal rules compiled by genuine experts, so I dismissed any idea which stood in the way of my aim to create as much reusable code as possible. One of these so-called "best practices" is the subject of this article.


Using arrays to pass data around

I noticed that PHP's handling of data arrays was far superior to that which was available in my previous languages. It meant that I could pass around collections of data whose contents were completely flexible and not tied to a particular pre-defined record structure. The data passed into objects from both the Presentation layer (via the $_POST array) and the Data Access layer (via the result of an SQL SELECT query) appears as an array, and this can contain a value for any number of fields/columns. The foreach construct in PHP makes it easy to step through an array and identify what values it contains for what fields.
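As a minimal sketch of that idea, the loop below steps through an array shaped like $_POST and reports each field name with its value. The describeFields() helper and the sample field values are assumptions made up for this illustration.

```php
<?php
// Hypothetical helper: list every field name and value found in an array.
function describeFields(array $fieldarray): array
{
    $lines = [];
    foreach ($fieldarray as $fieldname => $fieldvalue) {
        $lines[] = "$fieldname = $fieldvalue";
    }
    return $lines;
}

// Sample data standing in for $_POST or a single row from a SELECT query.
$row = ['userID' => 'AJM', 'email' => 'ajm@example.com', 'city' => 'London'];

foreach (describeFields($row) as $line) {
    echo $line, PHP_EOL;
}
```

The helper never names a field, so adding or removing an entry in the array requires no change to the code which reads it.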

However, in all of the OOP samples I saw in books or within internet tutorials I noticed that the same convention was followed:

When I saw this I asked myself some simple questions: If the data outside of an object exists in an array, why is the array split into its component parts before they are passed to the object one component at a time? Can I access the data from an array inside the object, or am I forced to use a separate class variable for each field/column? The answer turns out to be a choice between:

$this->column            // each column has its own class property
and
$fieldarray['column']    // all columns are held in a single class property

Guess what? To PHP there is no discernible difference as either option is possible. The only difference is in how much code the developer has to write to put that data in and to get that data out. I then asked myself another question: Under what circumstances would a separate class property for each piece of data, forcing each to have its own setter (mutator) and getter (accessor), be the preferable choice? The answer is as follows:

This scenario would fit something like an aircraft control system which relies on discrete pieces of data which are supplied by numerous sensors all over the aircraft. When changes in the data are processed the system may alter the aircraft's configuration or it may update the pilot's display in the cockpit.

This scenario does NOT fit a web-based database application for the following reasons:

Having built enterprise applications which have hundreds of database tables and thousands of user transactions I realised straight away that having separate class properties for each table column, each with its own setter and getter, would be entirely the wrong approach as it produces tight coupling which in turn greatly restricts the opportunity for reusable software. As the aim of OOP is supposed to be to increase the amount of reusable software I decided that any practice which did not support this aim was something to be avoided.

Consider the following sample code which is required when using a separate property for each table's column:

<?php
require_once 'classes/person.class.inc';
$dbobject = new Person();
$dbobject->setUserID    ( $_POST['userID']    );
$dbobject->setEmail     ( $_POST['email']     );
$dbobject->setFirstname ( $_POST['firstname'] );
$dbobject->setLastname  ( $_POST['lastname']  );
$dbobject->setAddress1  ( $_POST['address1']  );
$dbobject->setAddress2  ( $_POST['address2']  );
$dbobject->setCity      ( $_POST['city']      );
$dbobject->setProvince  ( $_POST['province']  );
$dbobject->setCountry   ( $_POST['country']   );

if ($dbobject->insertPerson($db) !== true) {
    // do error handling
}
?>

This suffers from the following deficiencies:

Contrast this with the following code which can be used when the data array is not split into its component parts:

<?php
// $table_id is provided by the previous script.
// Double quotes are needed so that PHP substitutes the variable into the file name.
require_once "classes/$table_id.class.inc";
$dbobject = new $table_id;
$result = $dbobject->insertRecord($_POST);
if ($dbobject->errors) {
    // do error handling
}
?>

This is loosely coupled and offers the following advantages:

This means that I can use the following methods to handle the communication between a Controller and its Model:

Common Table Methods

Method called externally: $object->insertRecord($_POST)
Methods called internally:
    $fieldarray = $this->pre_insertRecord($fieldarray);
    if (empty($this->errors)) {
      $fieldarray = $this->validateInsert($fieldarray);
    }
    if (empty($this->errors)) {
      $fieldarray = $this->commonValidation($fieldarray);
    }
    if (empty($this->errors)) {
      $fieldarray = $this->dml_insertRecord($fieldarray);
      $fieldarray = $this->post_insertRecord($fieldarray);
    }
UML diagram: ADD1 Pattern

Method called externally: $object->updateRecord($_POST)
Methods called internally:
    $fieldarray = $this->pre_updateRecord($fieldarray);
    if (empty($this->errors)) {
      $fieldarray = $this->validateUpdate($fieldarray);
    }
    if (empty($this->errors)) {
      $fieldarray = $this->commonValidation($fieldarray);
    }
    if (empty($this->errors)) {
      $fieldarray = $this->dml_updateRecord($fieldarray);
      $fieldarray = $this->post_updateRecord($fieldarray);
    }
UML diagram: UPDATE1 Pattern

Method called externally: $object->deleteRecord($_POST)
Methods called internally:
    $fieldarray = $this->pre_deleteRecord($fieldarray);
    if (empty($this->errors)) {
      $fieldarray = $this->validateDelete($fieldarray);
    }
    if (empty($this->errors)) {
      $fieldarray = $this->dml_deleteRecord($fieldarray);
      $fieldarray = $this->post_deleteRecord($fieldarray);
    }
UML diagram: DELETE1 Pattern

Method called externally: $object->getData($where)
Methods called internally:
    $where = $this->pre_getData($where);
    $fieldarray = $this->dml_getData($where);
    $fieldarray = $this->post_getData($fieldarray);
UML diagram: ENQUIRE1 Pattern

Please note the following:

Because these methods are common to every table class it would be foolish to duplicate them, so following my own interpretation of "best practices" I decided to move them to an abstract class so that they could be shared using that simple mechanism called inheritance. Because the same methods produce different results depending on what object they are called on this satisfies the definition of polymorphism which is "same interface, different implementation". This means, for example, that when a Controller calls a series of operations on a Model it is not tightly coupled to a particular Model as it can use any Model which it is given. This is implemented using a technique known as Dependency Injection - Injecting the Model into the Controller.
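The arrangement described above can be sketched roughly as follows. This is an illustration only, assuming PHP 7.4 or later for typed properties. The names AbstractTable, Product and addController(), and the validation rules, are invented for the example and are not taken from the framework's actual abstract class.

```php
<?php
// Sketch only: class names, method bodies and rules are illustrative assumptions.
abstract class AbstractTable
{
    public array $errors = [];
    protected array $fieldarray = [];

    // Shared by every table class through inheritance.
    public function insertRecord(array $fieldarray): array
    {
        $this->errors = [];
        $fieldarray = $this->validateInsert($fieldarray);
        if (empty($this->errors)) {
            // A real implementation would pass the data to the Data Access layer here.
            $this->fieldarray = $fieldarray;
        }
        return $fieldarray;
    }

    public function getFieldArray(): array
    {
        return $this->fieldarray;
    }

    // Each concrete table class supplies its own rules.
    abstract protected function validateInsert(array $fieldarray): array;
}

class Person extends AbstractTable
{
    protected function validateInsert(array $fieldarray): array
    {
        if (empty($fieldarray['email'])) {
            $this->errors['email'] = 'Email is required';
        }
        return $fieldarray;
    }
}

class Product extends AbstractTable
{
    protected function validateInsert(array $fieldarray): array
    {
        if (!isset($fieldarray['price']) || !is_numeric($fieldarray['price'])) {
            $this->errors['price'] = 'Price must be numeric';
        }
        return $fieldarray;
    }
}

// The Controller never names a particular Model; it uses whichever one it is given.
function addController(AbstractTable $model, array $input): array
{
    $model->insertRecord($input);
    return $model->errors;
}

print_r(addController(new Person(), ['email' => 'ajm@example.com'])); // empty array, no errors
print_r(addController(new Product(), ['price' => 'abc']));            // one error, keyed on 'price'
```

The same insertRecord() call produces different results for Person and Product, which is the polymorphism, and addController() works with either because the Model is injected instead of being hard-coded.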

Extracting the data from an object, such as when transferring it to the View object, does not require a collection of getters as it can be done with one simple command:

$fieldarray = $dbobject->getFieldArray();

This array can contain any number of columns from any number of rows and from any number of tables, which means that it does not require different variations in the code to deal with different combinations. PHP's foreach construct provides the ability to iterate through an array and identify both column names and their values.
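To illustrate, here is a rough sketch of a routine which turns a multi-row array into lines of text without knowing which columns or how many rows it holds. The renderRows() function and the sample column names are hypothetical and are not part of any View class mentioned in this article.

```php
<?php
// Hypothetical helper: formats any result set, whatever its columns and row count.
function renderRows(array $rows): array
{
    $output = [];
    foreach ($rows as $rownum => $rowdata) {
        $cells = [];
        foreach ($rowdata as $column => $value) {
            $cells[] = "$column=$value";
        }
        $output[] = "row $rownum: " . implode(', ', $cells);
    }
    return $output;
}

// Sample data: columns from a person table joined with an order table.
$fieldarray = [
    ['person_id' => 1, 'last_name' => 'Smith', 'order_id' => 17],
    ['person_id' => 2, 'last_name' => 'Jones', 'order_id' => 23],
];

echo implode(PHP_EOL, renderRows($fieldarray)), PHP_EOL;
```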

Another reason which caused me to reject the idea of having a separate class property for each column, each with its own setter and getter, is that it restricts each object to only being able to deal with columns on that particular table.


Data validation made easy

Anybody who has ever written a database application should know never to trust input submitted by a user. Each item of data in a database has a particular data type, and if your code tries to insert an incompatible value into a column, such as inserting the string "four" into a column expecting a number, or "30th February" into a column expecting a date, it will cause the query to abort. After receiving input from a user it is essential that you validate it before attempting to store it in the database.

The first question to ask is "Where should this validation be performed?" The answer is "In the Model", as shown by the calls to the validateInsert() and validateUpdate() methods in the common table methods above. There are those who think that this validation should be called outside of the Model as it is wrong to insert unvalidated data into the Model, but it is they who are wrong! Data validation is part of the business logic, and business logic belongs nowhere but in the Business/Domain layer. The principle of encapsulation is defined as "the act of placing data and the operations that perform on that data in the same class", and as validation is one of those operations it follows that each entity (table) in the business layer should have its own class, and that class should therefore contain all the necessary business logic for that entity, and this logic includes all validation.

I have also been told by some OO "experts" that I should be using setters as that is the correct place to validate a column's value, but this is wrong for two reasons:

  1. I don't use setters.
  2. That only works when I can validate a single column without reference to another column as that other column may not have been set yet. With the RADICORE framework I can compare the values in several columns very easily with code similar to the following:
    function _cm_commonValidation ($fieldarray, $originaldata)
    // perform validation that is common to INSERT and UPDATE.
    {
        if ($fieldarray['start_date'] > $fieldarray['end_date']) {
            // 'Start Date cannot be later than End Date'
            $this->errors['start_date'] = getLanguageText('e0001');
            // 'End Date cannot be earlier than Start Date'
            $this->errors['end_date']   = getLanguageText('e0002');
        } // if
        
        return $fieldarray;
    }
    

The second question to ask is "How should this validation be performed?" The answer is "With as much reusable code as possible". In all the code samples which I came across I noticed that everybody was writing code manually to validate each column, but I had already worked out a method of calling a standard routine to perform this validation automatically. This is because the range of possible data types for a column in a database table comes from a fixed list, and the validation for each data type can therefore be fixed in the code. This means that it should be possible to write code along the lines of "If the data type for column X is Y, then validate X's value according to the rules for Y". As the database schema already contains the specifications for each column that exists in every table, and that includes its name, size and data type, it is a straightforward process to extract that information and make it available in each table's class file. This is why I created my Data Dictionary which has one process to import a table's specifications from the INFORMATION_SCHEMA in the application database into an intermediate database, and a second process to export that information to a table structure file so that it can be used to populate the common table properties in that table's object. This then enabled me to create a standard validation class which has two arguments - $fieldarray containing column data and $fieldspec containing each column's specifications. It then becomes a straightforward process to compare each column's value with its specifications.
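A much simplified sketch of such a routine is shown below. It is not the framework's validation class: the validateFields() function, the $fieldspec keys ('type', 'size', 'required') and the error messages are all assumptions chosen for the illustration, and only three data types are covered.

```php
<?php
// Sketch only: the layout of $fieldspec is assumed, not the framework's real structure.
function validateFields(array $fieldarray, array $fieldspec): array
{
    $errors = [];
    foreach ($fieldspec as $name => $spec) {
        $value = $fieldarray[$name] ?? null;
        if ($value === null || $value === '') {
            if (!empty($spec['required'])) {
                $errors[$name] = 'Value is required';
            }
            continue;
        }
        // "If the data type for column X is Y, validate X according to the rules for Y."
        switch ($spec['type']) {
            case 'integer':
                if (filter_var($value, FILTER_VALIDATE_INT) === false) {
                    $errors[$name] = 'Value must be an integer';
                }
                break;
            case 'date':
                // Round-trip the value so that impossible dates are rejected.
                $date = DateTime::createFromFormat('!Y-m-d', (string) $value);
                if ($date === false || $date->format('Y-m-d') !== (string) $value) {
                    $errors[$name] = 'Value must be a valid date (YYYY-MM-DD)';
                }
                break;
            case 'string':
                if (isset($spec['size']) && strlen((string) $value) > $spec['size']) {
                    $errors[$name] = "Value cannot exceed {$spec['size']} characters";
                }
                break;
        }
    }
    return $errors;
}

// Hypothetical specifications, as they might be exported for one table.
$fieldspec = [
    'quantity'   => ['type' => 'integer', 'required' => true],
    'start_date' => ['type' => 'date'],
    'name'       => ['type' => 'string', 'size' => 10],
];

$input = ['quantity' => 'four', 'start_date' => '2023-02-30', 'name' => 'Tony'];
print_r(validateFields($input, $fieldspec)); // reports quantity and start_date
```

Because the rules are driven entirely by $fieldspec, the same function can serve every table. Only the specifications differ from one table class to the next.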

Note that secondary validation can be performed in any of the following "hook" methods:


Reading from multiple tables

I have been informed that in "proper" OO programming each table class is only supposed to retrieve those columns which actually belong in that table. This rule is enforced by each table class having a separate getter for each column which belongs to that table. This also makes it impossible to retrieve data from multiple rows as each call to a getter can only obtain a single value for a single row.

This to me is an artificial restriction which does not support the way that databases work. They deal in data sets which can contain data for any number of columns from any number of tables from any number of rows. This cannot be duplicated with a fixed set of class properties each with their own getters and setters, but it can be duplicated with a single $fieldarray property.

With the RADICORE framework it is possible for an HTML screen to contain data from more than one table in the following ways:


Writing to multiple tables

It is quite possible that data from a single HTML screen needs to be spread across more than one database table, so how can this be achieved? The RADICORE framework provides the following ways:

Here endeth the lesson. Don't applaud, just throw money.


References

These are reasons why I consider some ideas on how to do OOP "properly" to be complete rubbish:

