This question will probably confuse a lot of novice programmers as they do not realise that Object Oriented Programming (OOP) had any aims in the first place. It's just a style of programming, right? Surely the only aim of programming is the writing of programs, the creation of software, right? While those answers are partially correct they miss an important point - the production of software on its own is not a measure of success; what counts is how effective, and particularly how cost-effective, it is judged to be by the end customer, the user. In the culinary world there is a well-known saying:
The proof of the pudding is in the eating.
This means that it does not matter what ingredients were used or what recipe was followed, if the end result looks like crap or tastes like crap then the entire effort has been a failure. Instead of producing a gourmet meal the cook has actually created a dog's dinner, something which is unfit for human consumption and will only be appreciated by a canine companion who will eat practically anything put in front of him.
In the software world the same rules apply - it does not matter what ingredients (programming language, software libraries, DBMS) you use, or what recipe (programming rules or "best practices") you follow, it is the end result as viewed by the paying customer that really counts. Different programmers have different ideas of what the "rules" of OOP are, but to my way of thinking most of them are not rules at all but simply guidelines or personal preferences. In my long experience of designing and building enterprise applications using a mixture of non-OOP and OOP languages, as well as a mixture of non-relational and relational databases, there is only a small number of rules which can genuinely be described as universally applicable, and these are identified below.
Note that these rules need not apply if you are a lone programmer who writes code only for personal consumption. However, if you are writing code which will be shared and maintained by other people then it is vitally important that you write code which those other people can read and understand. If you are the only person who can understand and maintain the code that you write then you are a bad programmer.
Programs must be written for people to read, and only incidentally for machines to execute.
Martin Fowler, the author of Patterns of Enterprise Application Architecture (PoEAA), put it another way:
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
Here is another variation of that saying:
Any idiot can write code that only a genius can understand. A true genius can write code that any idiot can understand.
Try to achieve complex tasks using simple code, not simple tasks using complex code.
Smart data structures and dumb code works a lot better than the other way around.
This is the opposite view to that of Robert C. Martin (Uncle Bob) who, in his article NO DB, wrote the following:
The database is just a detail that you don't need to figure out right away.
I prefer to ignore Uncle Bob and follow the principles identified in Jackson Structured Programming, where it says:
start with the data structures of the files that a program must read as input and produce as output, and then produce a program design based on those data structures, so that the program control structure handles those data structures in a natural and intuitive way.
All the other "rules" or "best practices" which I have encountered are mainly based on one of the above, but merely expressed in a greater level of detail. You try and get a bunch of programmers to define what "readable" code means and all you will do is start a never-ending debate on when to use uppercase or lowercase, when to use camel case, snake case or even Studly caps.
I worked with several non-OO languages for over 20 years writing enterprise applications before I switched to using PHP in 2002 with its OO capabilities, and the first thing I needed to do was to find out what OOP actually meant and why it was supposed to be better than previous programming paradigms. The initial definition that I found at that time was roughly as follows:
Object Oriented Programming is programming which is oriented around objects, thus taking advantage of Encapsulation, Polymorphism, and Inheritance to increase code reuse and decrease code maintenance.
These three characteristics can be described as follows:
| Encapsulation | The act of placing data and the operations that perform on that data in the same class. The class then becomes the 'capsule' or container for the data and operations. This binds together the data and functions that manipulate the data. More details can be found in What is OOP? |
| Inheritance | The reuse of base classes (superclasses) to form derived classes (subclasses). Methods and properties defined in the superclass are automatically shared by any subclass. A subclass may override any of the methods in the superclass, or may introduce new methods of its own. More details can be found in What is OOP? |
| Polymorphism | Same interface, different implementation. The ability to substitute one class for another. This means that different classes may contain the same method signature, but the result which is returned by calling that method on a different object will be different as the code behind that method (the implementation) is different in each object. More details can be found in What is OOP? |
I noticed in the PHP manual that these capabilities were added to the language without removing the ability to write procedural code, so it was possible to have a mixture of procedural and OO code in the same program, thus leaving it up to the individual programmer to decide which style was best in a particular set of circumstances. This led me to the conclusion, as documented in What is the difference between Procedural and OO programming?, that:
OO programming is exactly the same as procedural programming except for the addition of encapsulation, inheritance and polymorphism. They are both designed around the idea of writing imperative commands which are executed in a linear fashion. The commands are the same, it is only the way they are packaged which is different.
It was quite clear to me that the objective of using OOP was to take advantage of the additional features in such a way as to increase the amount of reusable or sharable code within your application. An application consists of a number of different components (my ERP application currently has over 3,700) so it would not be unusual to find identical or similar pieces of code being used by more than one component.
Why would increasing the amount of reusable/sharable code be of any benefit? If you have a piece of logic which is common to several components the novice's method of implementing this would be to duplicate that logic, using the copy-and-paste method, in each component. This produces multiple copies of the same piece of logic in multiple places, so if a bug is ever found in that logic, or it needs to be updated, you have to find every copy of that logic in order to update it. All you have to do is miss one copy and the result could be at best unexpected and at worst unpleasant. The correct way, as practiced by experienced programmers, is to follow the DRY principle and define that logic in a single place, such as a function or a method, and then reference that function or method whenever you wish to execute that logic. This provides two enormous benefits:
It should therefore be obvious that the more reusable code you have then the less code you have to write. The less code you have to write then the less time it takes to complete a component. The less code you have to write then the less code you have to test as the reusable code will (should?) have already been tested. The less code you have to write in order to produce a component also means that you save time by not having to write as much code, and as time is money this also leads to lower costs. The ability to produce software in less time and at less cost than your rivals will always be a major factor in a competitive market.
When I came to start building my first PHP components I already had an architecture in mind which I had encountered in my previous language and which I saw could provide a solid basis for all future development. This was the 3-Tier Architecture with its separate Presentation, Business and Data Access layers. I had also encountered Extensible Markup Language (XML) and The Extensible Stylesheet Language Family (XSL), and I decided that I would create all my HTML pages using XSL Transformations. This meant that I had split my Presentation layer into two separate components thus producing an architecture that included the popular Model-View-Controller (MVC) Design Pattern. This combined architecture is shown in Figure 1:
Figure 1 - The MVC and 3-Tier architectures combined
In the following paragraphs I shall refer to various types of component using the names in the above diagram - Models, Views, Controllers and Data Access Objects (DAO). An application will contain a number of each of these component types which can be used in a variety of combinations in order to achieve different results. For example, a single Model may be referenced by any number of Controllers, and the DAO may be referenced by any number of Models. The most important layer is the Business layer, which is also known as the Domain layer, as this contains all the entities/objects and their individual business rules which are relevant to the application. The other components - the Controllers, Views and DAOs - are there as services which support the execution of the business rules.
Note here that I have introduced two categories of object. In his article How to write testable code the author identifies the following categories:
The PHP language does not have value objects, so I shall ignore them.
The components shown in Figure 1 above have been implemented as follows:
All business rules for an application exist in and only in the Business/Model layer, with a different Model component for each entity which needs to be referenced by the application. There could be hundreds of different Models in a large application.
All the service components are application-agnostic which means that they do not contain any logic or other knowledge which is specific to any application. This means that they can be used with the Model components of any application without any modification. The framework contains sets of pre-written and reusable components for these services as follows:
You must first create classes before you can make use of inheritance and polymorphism, so what you do here has a direct bearing on how much potential for reusability you will eventually create. Get it wrong and you will have limited potential. Get it right and that potential could be enormous.
Encapsulation is the act of creating a class for something which has data (state) as well as procedures (behaviour) which can operate on that data. The class then acts as the "capsule" for that data and those procedures. In OOP the data is implemented as "properties" and the procedures are implemented as "methods". Please try to avoid falling into the trap of creating anemic objects which contain state but very little behaviour. This is contrary to the basic idea of object-oriented design which is to combine data and process together.
The biggest challenge for the novice programmer is to identify which parts of an application, the "things", need to be represented as classes, and what methods to build into each of those classes.
RULE #1: what you do *NOT* do is create a single class called "application" with a single method called "run". Instead you identify the different "things" which are of interest to those areas of the business which are to be handled by the application, and for each of these "things" you create a Model class which will exist in the Business layer.
RULE #2: do not waste time by trying to design your software components first and your database last, otherwise you will hit the problem known as Object-Relational Impedance Mismatch where the structure of the software components is out of sync with the table structure in the relational database. One solution to this problem is to allow the mismatch but deal with it using an additional software component known as an Object-Relational Mapper (ORM), but to my mind this is just papering over the cracks instead of tackling the problem which is causing the cracks. A much better solution would be NOT to allow the mismatch in the first place, and this can be achieved quite simply by designing your database first, then designing your software around each database component.
Those new to OOP are so dazzled by the idea that OOP "lets you model the real world" that they try to model those objects which they perceive as existing within the real world. When designing something like an e-commerce application which deals with things called PRODUCTS, CUSTOMERS and SALES ORDERS they think it would be a good idea to design a class for each of those three objects. They are told to leave the database design till later as it is less important, a mere "implementation detail".
It is a well known fact to every experienced database designer that sometimes the data for a single "thing" in the outside world will actually need to be split across more than one table in the database. This is a result of the process called Data Normalisation. For example, a sales order may initially be regarded as a single entity, but in a database it could require multiple tables such as ORDER_HEADER, ORDER_ITEM, ORDER_ADJUSTMENT and ORDER_ITEM_ADJUSTMENT. An experienced programmer would create a separate class for each of these tables whereas a novice would create an aggregate/compound class called ORDER which would handle every table associated with an order. Having a single class which is responsible for more than one database table surely breaks the Single Responsibility Principle (SRP), which is one of the reasons why I avoid that idea like the plague.
Inheritance is the ability to reuse the contents of a base class (superclass) to create a derived class (subclass). This leads to three important questions regarding the amount of code which can be reused through inheritance:
The method taught to novice OO programmers to identify places where inheritance may be used is to carry out the IS-A test. Unfortunately the poor dears get this completely wrong. They look at an entity called CUSTOMER and say "A customer is-a person, so I must create a PERSON class and then extend this to create a CUSTOMER class". Likewise they say "We sell widgets, and as a widget is-a product I must first create a PRODUCT class and then extend this to create a WIDGET class".
If you follow the same path you will end up with a number of superclasses (PERSON and PRODUCT in the above example) and a number of subclasses (CUSTOMER and WIDGET in the above example). This to me is wrong on so many levels:
The end result of this approach would be a limited amount of inheritance, which in my opinion is a sign of failure. Not only that, problems can be caused by mis-using inheritance by extending one concrete class to create another concrete class, or by creating deep inheritance hierarchies. The solution to these problems was provided in the book Design Patterns - Elements of Reusable Object-Oriented Software which was first published in October 1994 where it says:
One cure for this is to inherit only from abstract classes since they usually provide little or no implementation.
Any experienced OO programmer should know that when looking to create an abstract class you first look for a group of business/domain entities which share a common set of characteristics. A programmer experienced with SQL will be able to immediately point to the following list of characteristics which are common to all database tables:
Note that a table may be related to itself, in which case it is both the parent and the child.
Only a novice programmer would fail to see the benefit of placing the code to deal with all those common characteristics in an abstract class which can then be inherited by any number of concrete classes. This also passes the IS-A test as it is quite plain to see that every object in the domain/business layer is-a database table. The abstract class deals with the common characteristics while the concrete class provides the details which are unique to a particular table. But how much code can I put into the abstract class? Quite a lot, actually. This answers a criticism of my approach which was given as long ago as 2003 where someone known as Jochen Daum said the following:
This means you write the same code for each table - select, insert, update, delete again and again. But basically its always the same.
This person was obviously a novice to OO as he failed to understand that instead of repeating the code to deal with the SQL queries for select, insert, update and delete in each concrete table class you can define it just once in an abstract table class and then share it with every concrete table class by using inheritance. In my ERP application I currently have over 400 database tables, so if each of those 400 table classes inherits from the same abstract class then that is a lot of code sharing.
I disagree with the notion, as contained in the above quote from the Gang of Four book, that abstract classes usually provide little or no implementation. If you have been writing database applications for as long as I have then you may actually find that the code used to communicate with a database table could be quite large. You may wish to intersperse the standard logic which constructs SQL queries with custom logic to handle the business rules, in which case a solution for this is given in the same Gang of Four book in the form of the Template Method Pattern which is described as follows:
Defines the skeleton of an algorithm in an operation, deferring some steps to subclasses. It lets subclasses redefine certain steps of an algorithm without changing the algorithm's structure.
This is where an algorithm/operation requires a series of steps comprised of a number of invariant methods which have concrete implementations defined in the superclass, and variant/variable/customisable methods which do not have implementations unless they are defined in the subclass. Every subclass then shares the same invariant methods but has its own set of variant/variable/customisable methods. By following the guidelines in the Gang of Four book I have been able to create an abstract table class which contains 163 invariant methods and 95 variant methods. That is a LOT of code which is being shared.
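The split between invariant and variant methods can be sketched roughly as follows. This is only a minimal illustration and not the code from my framework: the class names, the preInsert() and postInsert() hook names and the in-memory $rows array (which stands in for a real database) are all assumptions made for the example.

```php
<?php
// Hypothetical sketch of the Template Method Pattern applied to table classes.
abstract class AbstractTable
{
    /** @var array<int, array<string, mixed>> stands in for a real database table */
    protected array $rows = [];

    // Invariant method: the fixed skeleton shared by every concrete table class.
    final public function insertRecord(array $fieldArray): array
    {
        $fieldArray = $this->preInsert($fieldArray);  // variant step
        $this->rows[] = $fieldArray;                  // standard step
        return $this->postInsert($fieldArray);        // variant step
    }

    // Variant methods: they do nothing unless a subclass overrides them.
    protected function preInsert(array $fieldArray): array
    {
        return $fieldArray;
    }

    protected function postInsert(array $fieldArray): array
    {
        return $fieldArray;
    }
}

// A concrete class with no custom rules inherits everything unchanged.
class Product extends AbstractTable
{
}

// A concrete class with a business rule overrides only the relevant hook.
class OrderHeader extends AbstractTable
{
    protected function preInsert(array $fieldArray): array
    {
        if (!isset($fieldArray['order_status'])) {
            $fieldArray['order_status'] = 'PENDING';
        }
        return $fieldArray;
    }
}

$product = (new Product())->insertRecord(['product_id' => 'W1']);
$order   = (new OrderHeader())->insertRecord(['order_id' => 1]);
```

The algorithm in insertRecord() is declared final so that no subclass can alter its structure, only the steps which it has been invited to customise.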
It is not possible to take advantage of polymorphism unless you have the same method signature appearing in more than one class. These duplicate method signatures may appear by inheriting from the same superclass, but inheritance is not a requirement - it is possible to create several classes where the same method signatures are hard-coded instead of being inherited. How the duplicate methods get there is irrelevant, it is only the fact that they exist which matters. If you have a piece of code in object 'A' which calls a method in object 'B', but the same method is available in objects 'B1' to 'B99', you are then able to call that method in any of the 99 alternative objects. Although the method call is the same the object on which the call is made is different, so the results will be different as each of those 99 objects provides a different implementation of that method.
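As a small illustration of those mechanics, here is a sketch in which two unrelated classes hard-code the same method signature without inheriting it from anywhere. The class names and the getTableName() method are hypothetical and exist only for this example.

```php
<?php
// Hypothetical classes: no shared superclass, only a shared method signature.
class CustomerTable
{
    public function getTableName(): string
    {
        return 'customer';
    }
}

class InvoiceTable
{
    public function getTableName(): string
    {
        return 'invoice';
    }
}

// The caller knows the method name but not which class it will be given.
function describe(object $table): string
{
    return 'Working with ' . $table->getTableName();
}

$messages = array_map('describe', [new CustomerTable(), new InvoiceTable()]);
```

The call inside describe() is identical each time, yet the result differs because each object supplies its own implementation.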
That explains the mechanics, but where can it be employed? I have already stated several important facts:
In addition my decades of experience with database applications has taught me the following:
Given the above it should be possible to create an object which encapsulates that behaviour and makes the method call on an object whose identity is not known until runtime. If you look at Figure 1 you should see that the methods in each Model (business/domain object) are accessed from a Controller, but unlike most novice programmers who create a separate Controller for each Model where the Model name is hard-coded, in my framework I have Controllers which call a known set of methods on an unknown object using a technique known as Dependency Injection. Each of the tasks in my application has its own component script which is very small as all it does is identify which Model and View are to be used before it hands control over to the Controller. Because the Controller accesses methods which are defined in the abstract table class, and because that same abstract class is inherited by every concrete table class (Model), you should see that the same Controller can be made to work with any Model.
If my framework contains 45 Controllers, and each of these can be used with any of the 400 Models in my ERP application, this means that I have 45 x 400 = 18,000 (yes, EIGHTEEN THOUSAND) opportunities for polymorphism. Is that a lot of reusability or what?
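The arrangement described above, where a small component script names the Model and a generic Controller does the rest, might look something like the following sketch. The function names addController() and runTask(), the class names and the in-memory storage are assumptions for illustration only, and the View has been left out to keep it short.

```php
<?php
// Hypothetical sketch: one Controller per pattern of behaviour, not one per Model.
abstract class AbstractTable
{
    protected array $rows = [];

    public function insertRecord(array $fieldArray): array
    {
        $this->rows[] = $fieldArray;   // stands in for a real database insert
        return $fieldArray;
    }

    public function getData(): array
    {
        return $this->rows;
    }
}

class Customer extends AbstractTable
{
}

class Product extends AbstractTable
{
}

// The Controller calls a known method on an object whose class it never names.
function addController(AbstractTable $model, array $input): array
{
    return $model->insertRecord($input);
}

// The "component script": identifies the Model, then hands control to the Controller.
function runTask(string $modelClass, array $input): AbstractTable
{
    $model = new $modelClass();
    addController($model, $input);
    return $model;
}

$customer = runTask('Customer', ['name' => 'Acme']);
$product  = runTask('Product', ['sku' => 'W-1']);
```

Adding a new Model to this sketch requires a new class and a new component script, but no change whatsoever to the Controller.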
Using encapsulation, inheritance and polymorphism to create entity objects in the business/domain layer is one thing, but is it possible to create reusable service objects in the other layers? Unlike an entity which has state, data which can be accessed via multiple method calls, a service performs a single operation but does not have state, which means that it has to be provided with the necessary data on each method call. It could therefore be possible to create a single service object which can perform its function using any data instead of creating a different object for different sets of data. Below are examples of some of the reusable objects which exist in my framework.
I have seen more than one example on the internet where the novice programmer thinks that it is a good idea to create a different DAO for each table, but that is not how the DAO in the 3-Tier Architecture is supposed to work. In the first language that I used which incorporated a DAO this was an object which could deal with any table in a particular DBMS, which enabled the entire application to be switched from one DBMS to another simply by changing a single component. This behaviour is what I have duplicated in my PHP framework as I have available a separate DAO for each DBMS which I support. I started with MySQL, but later I added PostgreSQL, then Oracle and eventually SQL Server. This is made possible by the fact that each of the methods in the DAO which constructs and then executes the relevant query string includes in its arguments the database name, the table name, and the relevant column names with their values.
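To show how a single DAO can serve every table when the database name, table name and column values arrive as arguments, here is a minimal sketch with one class per DBMS. The interface, the class names and the buildInsert() method are hypothetical. The sketch only builds the query string and its parameter list, it does not connect to a database, and it assumes that the identifiers it is given are trusted.

```php
<?php
// Hypothetical sketch: one DAO class per DBMS, none of them tied to any table.
interface DataAccessObject
{
    /** @return array{0: string, 1: array} the query string and its parameter values */
    public function buildInsert(string $dbname, string $tablename, array $fieldArray): array;
}

class MysqlDao implements DataAccessObject
{
    public function buildInsert(string $dbname, string $tablename, array $fieldArray): array
    {
        // MySQL quotes identifiers with backticks.
        $columns = array_map(fn($c) => "`$c`", array_keys($fieldArray));
        $marks   = array_fill(0, count($fieldArray), '?');
        $sql = "INSERT INTO `$dbname`.`$tablename` (" . implode(', ', $columns)
             . ') VALUES (' . implode(', ', $marks) . ')';
        return [$sql, array_values($fieldArray)];
    }
}

class PgsqlDao implements DataAccessObject
{
    public function buildInsert(string $dbname, string $tablename, array $fieldArray): array
    {
        // PostgreSQL quotes identifiers with double quotes; the first argument
        // is treated here as a schema-style qualifier, which is an assumption.
        $columns = array_map(fn($c) => '"' . $c . '"', array_keys($fieldArray));
        $marks   = array_fill(0, count($fieldArray), '?');
        $sql = 'INSERT INTO "' . $dbname . '"."' . $tablename . '" ('
             . implode(', ', $columns) . ') VALUES (' . implode(', ', $marks) . ')';
        return [$sql, array_values($fieldArray)];
    }
}

$fields = ['product_id' => 'W1', 'price' => 9.99];
[$mysqlSql, $mysqlValues] = (new MysqlDao())->buildInsert('shop', 'product', $fields);
[$pgsqlSql, $pgsqlValues] = (new PgsqlDao())->buildInsert('shop', 'product', $fields);
```

Switching the DBMS in this sketch means instantiating a different DAO class, while the code which supplies the table name and field values stays exactly the same.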
Some novice programmers question the idea of having a separate DAO as they think that once an application has been installed with a particular DBMS then it is highly unlikely that the DBMS will be changed. I would largely agree with that sentiment, but what about being able to choose the DBMS before the application is installed? I have developed an open source framework which can be downloaded and used by any team of developers, but I do not restrict it to be used with a single DBMS, I give the developer a choice. I have used the same framework to build a large ERP application as a package which can be used by multiple organisations, and this allows customers to choose the DBMS that they prefer before they install it.
In a web application all the screens which are shown to the user are constructed in the Presentation layer as HTML documents which conform to a standard which is (or is supposed to be) supported by every web browser. Each HTML document is simply a large string of text with pieces of data enclosed in HTML tags. As I had become familiar with the use of XML and XSL in my previous language, where an XML document contains nothing but data while an XSL stylesheet contains a number of templates which can transform that data into HTML, I immediately saw the benefit of employing the same process in my own framework. I thus created a single View component which could extract the data from the Model(s), put that data into an XML document, load in the designated XSL stylesheet, then perform an XSL Transformation to create the HTML output which is then returned to the client's browser.
The View object does not contain any Model names as it functions using Dependency Injection where it is given an array of one or more objects and it calls standard methods on each of those objects to extract whatever data that it/they contain. These standard methods are inherited from the abstract table class, so they do not have to be duplicated in any Model class.
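The first half of that process, extracting data from whatever objects are injected and writing it to an XML document, can be sketched as below. The method names getTableName() and getFieldArray(), the buildXml() function and the <root> element are assumptions for this example and not the framework's actual API. The final transformation step, which in PHP would normally be performed with the XSLTProcessor class, has been omitted.

```php
<?php
// Hypothetical Model exposing two standard methods which the View relies upon.
class CustomerModel
{
    public function getTableName(): string
    {
        return 'customer';
    }

    public function getFieldArray(): array
    {
        return [['id' => '1', 'name' => 'Acme']];
    }
}

// The View is handed an array of objects; it never mentions a Model by name.
function buildXml(array $models): string
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('root'));
    foreach ($models as $model) {
        foreach ($model->getFieldArray() as $row) {
            // One element per database row, named after the table.
            $node = $root->appendChild($doc->createElement($model->getTableName()));
            foreach ($row as $name => $value) {
                $child = $doc->createElement($name);
                $child->appendChild($doc->createTextNode((string) $value));
                $node->appendChild($child);
            }
        }
    }
    return $doc->saveXML($root);   // serialise from the root element down
}

$xml = buildXml([new CustomerModel()]);
```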
In my original implementation I had a separate XSL stylesheet for each different web page as they each required different columns to appear in different places with different controls. After producing a number of these stylesheets I could see a large amount of code which was similar while the only difference was the list of column names and their HTML controls. After a bit of experimentation I managed to remove the need for a multitude of customised stylesheets and replaced them with a small library of reusable XSL stylesheets. I did this by adding a <structure> element to the XML document which contains enough information to allow the XSL stylesheet to construct the variable application area within each HTML page. The contents of each <structure> element are obtained from a small screen structure script which also identifies the XSL file which is to be used.
By examining a large number of different web pages I could see a series of patterns emerging, and I managed to create a single reusable stylesheet for each pattern. This means that when creating a new task with a web page the developer does not need to spend any time on the standard parts of each page as this is already handled by the library of XSL stylesheets provided by the framework.
Not all tasks produce output which is displayed in a web page. Some use a PDF document, some use a CSV file, while some do not have any output at all. They simply do something without the aid of any dialog with the user before returning control to the component from which they were called. Just as I have a single class which produces all HTML output, I also have single classes for all PDF and CSV output.
The idea of having a small number of reusable Controllers is an impossible dream for novice programmers as the way they are taught to design their applications precludes this possibility. They are taught that when they have identified a task (use case or user transaction) they should create a class with a method whose name corresponds with that particular task. Using this methodology they end up with method names such as createShipment(). This is a totally bad idea as it means that in a large application with 3,000 tasks you end up with 3,000 unique method names, and this totally eliminates any possibility for code reuse via polymorphism.
As explained in Step 3 above you need to have identical method signatures appearing in multiple objects in order to provide the opportunity for polymorphism. Once you have done this you can produce components which call one or more of these methods on any of these interchangeable objects using the principle of Dependency Injection. As each of the above methods has the end result of inserting a record into a database table you can replace all those Controllers which call those unique methods with a single Controller which calls the generic insertRecord() method on whatever table object is provided at runtime.
By recognising that each of the tasks (user transactions) in my application conforms to a pattern which may be repeated I have been able to create a library of reusable Transaction Patterns each of which has its own Controller script which calls a predetermined set of generic methods on one or more unknown Models (table classes). This means that instead of having a separate Controller for each Model which can only call methods which are unique to that Model I can have a separate Controller for each pattern of behaviour which can be applied to any Model. In this way I can apply the same pattern of behaviour to any table in my database by reusing the pattern instead of writing code to duplicate the behaviour.
I have so far identified and created 45 such Transaction Patterns which between them are used in over 3,700 tasks within my ERP application. Some of these patterns are used hundreds of times, some are used dozens of times while others are used only in rare circumstances.
Having code which you can reuse means that when you are writing a new component you can call this reusable code instead of having to duplicate it. In some cases this can make the creation of a new component so simple that it can be automated, which means that instead of writing the code yourself you can have it generated for you by pressing a button. Below are some of the code generation facilities which are included in my framework.
As pointed out previously my business/domain layer has a separate class for each database table. As a significant portion of the code which deals with the throughput of data from the Presentation layer to the Data Access layer and back again is essentially the same this has allowed me to identify a great deal of code which can be written once and then shared through inheritance from an abstract table class. Experienced database developers will immediately point out that each table has its own business rules, but these can be added in later as variant methods courtesy of my implementation of the Template Method Pattern.
Each table also has its own structure which provides the details for all those common characteristics, and it is these missing details which help to turn an abstract class into a concrete class. My previous language used an internal Data Model which was used to generate the necessary CREATE TABLE scripts, but in my PHP implementation I decided to do the reverse. Instead of maintaining the Data Model and then exporting to the database I import from the database into my version of the Data Model which I now call a Data Dictionary, then I export each table's data from the Data Dictionary into a series of PHP scripts which are then accessed at runtime. I originally created these PHP scripts by hand, but later I wrote a program to do it for me. In my Data Dictionary I built a process to export each table's data to two separate files:
The reason why I create two separate files is quite simple - after the class file has initially been created it may be amended later on to provide implementations for any of the variant/customisable methods, so this class file is never overwritten during the export process. It is also possible for the table's structure to change after it was first created, so all that is necessary is to re-import the changed structure and re-export the updated details. This will replace the contents of the structure file but not touch the class file.
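The rule that the structure file is always replaced while the class file is never overwritten can be sketched as follows. The exportTable() function, the file names and the generated file contents are all hypothetical, chosen only to demonstrate the rule.

```php
<?php
// Hypothetical export routine: the file names and contents are assumptions.
function exportTable(string $dir, string $table, array $fieldspec): void
{
    // The structure file is regenerated on every export.
    $structure = "<?php\n\$fieldspec = " . var_export($fieldspec, true) . ";\n";
    file_put_contents("$dir/$table.dict.inc", $structure);

    // The class file is created once and never touched again, so any
    // customisations added by the developer survive a re-export.
    $classFile = "$dir/$table.class.inc";
    if (!file_exists($classFile)) {
        $code = "<?php\nclass $table extends AbstractTable\n{\n}\n";
        file_put_contents($classFile, $code);
    }
}

$dir = sys_get_temp_dir() . '/dict_export_' . uniqid();
mkdir($dir);

// First export creates both files.
exportTable($dir, 'product', ['product_id' => ['type' => 'string']]);

// The developer then customises the class file by hand.
file_put_contents("$dir/product.class.inc", "<?php // customised by hand\n");

// The table gains a column, so the structure is exported again.
exportTable($dir, 'product', [
    'product_id' => ['type' => 'string'],
    'price'      => ['type' => 'numeric'],
]);

$classAfter = file_get_contents("$dir/product.class.inc");
$dictAfter  = file_get_contents("$dir/product.dict.inc");
```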
As stated previously each task (use case or user transaction), regardless of its complexity, performs one or more operations on one or more database tables. This is a mixture of standard code which is provided by the framework and custom code to handle the business rules for that table and/or task. By using the Template Method Pattern the standard code is implemented within invariant methods which are defined in the abstract table class while the business rules are implemented within the variant/customisable methods which are defined within each concrete table class. The actual behaviour of each task has been implemented as a series of Transaction Patterns which have been built into the framework.
After generating a class file it is then necessary to create the tasks which allow the user to put data into and get data out of that table. In my framework each task has an entry in the MNU_TASK table in the MENU database which points to a small component script in the file system. I do not have a single task which performs all the possible operations on a table, instead I create a family of forms where each operation is carried out by a separate task. This is for reasons discussed in Component Design - Large and Complex vs. Small and Simple. Again I started off by doing this all by hand, but later on I wrote another program to do it all for me. Within my Data Dictionary I now have a process which allows me to select a database table, select a Transaction Pattern, then press a button to create all the necessary database records and scripts.
At this point each task is in a basic but runnable state. Although the table class does not yet contain any variant/customisable methods for any specific business rules, it has enough logic to put data into and get data out of the database. Even the validation of user input is taken care of by the built-in validation object which checks all data against the contents of the table's $fieldspec array.
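The idea of validating user input against a table's specifications can be sketched as below. The keys used inside $fieldspec ('type', 'size', 'required') and the function name validateAgainstFieldspec() are assumptions made for this example; the framework's own validation object is more extensive.

```php
<?php
// Hypothetical sketch: the keys inside $fieldspec and the function name are assumptions.

/**
 * Check each field's value against its specification.
 *
 * @return array<string,string> error messages keyed by field name (empty if all valid)
 */
function validateAgainstFieldspec(array $fieldspec, array $data): array
{
    $errors = [];
    foreach ($fieldspec as $field => $spec) {
        $value = $data[$field] ?? null;

        // Empty values are only an error when the field is marked as required.
        if ($value === null || $value === '') {
            if (!empty($spec['required'])) {
                $errors[$field] = 'This field is required';
            }
            continue;
        }

        if ($spec['type'] === 'integer' && filter_var($value, FILTER_VALIDATE_INT) === false) {
            $errors[$field] = 'Value must be an integer';
        } elseif ($spec['type'] === 'string' && isset($spec['size']) && strlen((string)$value) > $spec['size']) {
            $errors[$field] = "Value exceeds {$spec['size']} characters";
        }
    }
    return $errors;
}

$fieldspec = [
    'person_id' => ['type' => 'integer', 'required' => true],
    'surname'   => ['type' => 'string', 'size' => 10, 'required' => true],
    'nickname'  => ['type' => 'string', 'size' => 5],
];

$errors = validateAgainstFieldspec($fieldspec, [
    'person_id' => 'abc',
    'surname'   => '',
    'nickname'  => 'Tony',
]);
```

Because the rules are held as data in the array, the same validation code can be reused for every table without modification.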
If you have been paying attention you should have noticed that once I have created a database table I can import that table's details into my Data Dictionary, press a button to create the corresponding class file, then press another button to generate the tasks to maintain that table which are then immediately runnable. This entire process can be achieved in 5 minutes without having to write a single line of code - no PHP, no SQL and no HTML. If you cannot achieve the same level of productivity with your methodology then I would suggest that you need to examine your approach and learn how to shift it into a higher gear.
Apart from creating reusable code by utilising encapsulation, inheritance and polymorphism, it is also possible to create it by building a library of functions and subroutines. Some libraries can be quite small while others can be quite large, but did you know that there is something which exists at a higher level above a library? In case you are a novice programmer who is still groping around in the dark I shall enlighten you - it is a thing called a framework. If you don't understand the difference between a library and a framework please read What is a Framework?
Instead of requiring you to write your own code to call library functions, a framework will create code with basic functionality which you can then alter by adding implementations to the variant/customisable methods which are available through my use of the Template Method Pattern. This is the mechanism by which the flow of control is dictated by the framework instead of the caller. The invariant methods in the abstract class are always called, and the empty variant/customisable methods can be overridden in each concrete class to supply additional behaviour.
A proper framework will also contain built-in components to handle that functionality which is common to all the domains/subsystems within the application, for example:
The more reusable code you have at your disposal then the less code you have to write in order to get the job done. Being able to write less code makes you more productive and more cost-effective. That's supposed to be a good thing, right? In my framework I provide the following categories of reusable code which, because they are pre-written, do not need to be written again:
If writing less code is a laudable aim, then how about the ability to write no code at all? As mentioned above in Generating Tasks I can create a new table in my database, then simply by pressing some buttons I can create and then run the basic tasks to maintain that table in just 5 minutes without writing a single line of code - no PHP, no SQL, no HTML. How much code do you have to write?
Here endeth the lesson. Don't applaud, just throw money.
Here are some articles I have written on my framework:
Here are some heretical articles I have written on the topic of OOP: