Joel Rosi-Schwartz once advised me that if I ever found myself solving the same problem for a third time I should stop and take the opportunity to improve my efficiency by thinking of a better way to do it. Joel was very disciplined about this and over the years had developed a big bag of tricks for solving common problems encountered during systems design and development. In many ways Joel was an inspiration for me and this is one of his traits I have tried to emulate.

In my career I have designed more than a few systems and reviewed the designs for many more. There were times when I reviewed two or three in a week and others when I went six months between reviews, but they have never stopped coming. Following Joel's advice I have tried to incrementally improve my approach both to designing systems and to reviewing others' designs by developing a set of aphorisms that remind me what should be present in a good system design.

1 Everything in moderation and nothing to excess

OK, so this is a platitude, not an aphorism, but it is amazing how often system designers, myself included, carry otherwise good ideas to extremes. Knowing when to stop is sometimes the hardest skill of all to master. Too many designers solve problems that don't exist, reinterpret requirements to fit their preconceived idea of an appropriate solution, or refuse to take advantage of new techniques in the name of risk reduction. Every solution and technique has an appropriate range of applicability; going beyond this range generally causes more problems than it solves. Practicality and reasonableness, coupled with a willingness to use new techniques when they are better than old ones, are characteristics that all systems designers would do well to cultivate.

2 A good system design is based on a sound conceptual model (Architecture)

Many of the system designs I have been asked to review were little more than arbitrary collections of sub-systems and components. It was obvious that little or no thought had been given to the overall interaction of these parts. The Shorter Oxford English Dictionary defines the word Architecture as "The conceptual structure and logical organization of a computer or computer-based system." I take it as self-evident that a system design which has no conceptual structure and little logic to its organization is ultimately going to be unsuccessful. If you don't agree you should probably stop reading now. This raises the question: what are the attributes of a sound conceptual model that allow it to support the development of a good system design?

3 A sound conceptual model accounts for all system requirements at a reasonable level of abstraction

What constitutes a reasonable level of abstraction will obviously vary from system to system and will necessarily depend on the requirements, but in general I believe a reasonable level of abstraction is reached when a balance is achieved between specificity and generality.

3.1 A conceptual model is sufficiently generalized when it can account for all significant use cases in a concise way that reduces complexity by consolidating similar features.

There are two important concepts here. The first is the concept of a significant use case. For any given system there are a vast number of potential use cases, only some of which are significant. Determining the relative significance of use cases is an art, not a science. Ultimately only the user of the system can determine the relative significance of use cases. However, most users concentrate on the normal operation of the system being designed and fail to consider more exceptional scenarios. It is up to the system designer to include use cases that go beyond the normal operation of the system. System designers commonly fail to do this and often neglect system validation (testing), installation, commissioning or recovery from failure when defining use cases.

The second important concept is that of reducing complexity by consolidating similar features. This is where Joel's rule of threes can be usefully applied: if there are three or more features of a system that display significant similarity, it is worth considering consolidating them into a single, more generalized feature. However, a reduction in complexity that leads to a corresponding reduction in fidelity could be a misapplication of Occam's razor and should be avoided.

There is, of course, a counterbalance to generalization, namely specificity.

3.2 A conceptual model is sufficiently specific when it is possible to demonstrate how a system design based on the model will achieve measurable targets for required system attributes

A clear understanding of the difference between functional requirements and system attributes is needed to appreciate this statement. A functional requirement can only be present or absent: a system is capable of performing a function or it is not. Functional requirements can be derived from use cases; they describe the behaviors the system is capable of displaying.

By contrast, system attributes, or qualities as they are sometimes called, define the way in which the system performs its functions. Attributes commonly include qualities like reliability, recoverability and availability, but can include more exotic qualities if needed, such as traceability, augmentability, and autonomy. In his book Principles of Software Engineering Management, Tom Gilb does a good job of describing how to define measurable system attributes as part of a requirements specification process. The relevant point here is that a sound conceptual model will reflect the system attributes it is required to deliver, and it will also make allowance for measuring target values for these attributes.

I have seen many designs containing valid abstractions that failed to support required system attributes. The abstractions were not grounded, they lacked specificity, they were too abstract. The following examples illustrate the point:

  • A system for running an energy trading market, with very strict performance requirements, had a highly generic design that reduced efficiency. There was no clear demonstration of how the generic design, which had limited benefits, could support the required performance. (processing efficiency)
  • A system with no intrinsic support for testing was to be implemented in a life critical environment. The system lacked any details around how it would support such an obvious requirement as the ability to prove correct operation before it was implemented in a hospital. (verifiability)
  • A massively complex pipeline management system that required multiple levels of abstraction had no internal diagnostics mechanism. As a result it would not be possible to isolate defects and remove them. (diagnosability)
  • The user interface for a scientific data warehouse exposed all the advanced configuration features to novice users without setting reasonable defaults. Most users were administrative staff with little to no understanding of these features and no reason to learn them. The conceptual model emphasized flexibility and did little to support usability even though the limited knowledge of the target audience was well understood. (usability)

Some might say these are all missed requirements. They are, but they are not missed functional requirements. They each represent a failure of the designers to imbue the conceptual model (the Architecture) with support for the required system attributes. They represent a lack of specificity in how the conceptual model supports the system requirements.

4 A good conceptual model is easy to communicate

A sound conceptual model that accounts for all the system requirements at a reasonable level of abstraction is only half the battle. It must also be understandable. All too often a great architecture leads to a mediocre design, a poor implementation and an unsatisfactory solution. At each stage something gets lost. Extreme programming techniques can help, but even then solutions don't always live up to the original concept. The most commonly required system attribute is also one of the least often considered: ease of communication. A conceptual model that cannot be communicated easily rarely leads to a successful solution.

CORBA failed to gain wide acceptance because it was too complicated and led to more problems than it solved when placed in the hands of the average programmer. And that is the key point: even the best architectures in the world are usually implemented by practitioners with average skills. We can't all be James Gosling, which is why he did not put multiple inheritance or operator overloading into Java.

I believe there are at least three ways to improve the ease with which a conceptual model can be communicated.

4.1 A conceptual model is easier to understand and communicate if it is coherent: logical in the relationship of its parts and aesthetically consistent.

4.2 A conceptual model is easier to understand and communicate if it is analogous to a commonly experienced, tangible, real world system.

4.3 A conceptual model is easier to understand and communicate if it is anthropomorphized: made to mimic human behavior, characteristics and modes of interaction.

The last point draws directly from modern studies in evolutionary psychology. In his excellent book The Origins of Virtue, Matt Ridley describes an experiment in which two groups of students were presented with the same problem. For one group the problem was described mathematically, as a set of simultaneous equations. For the other group it was presented as a problem in social dynamics, in which the group had to identify who was lying in a complex situation by analyzing individual motivations. The problems were computationally identical; all that differed was the presentation. The experiment clearly showed that people are far better at evaluating social situations than they are at solving mathematical problems. This is interesting and not surprising, as people have evolved over millions of years to be good at analyzing and solving social problems.

If the conceptual model for a system is deliberately made analogous to a real world, commonly experienced system, and that system is one that involves social interaction (i.e. it is anthropomorphized), then communicating the design becomes a great deal easier. Another interesting feature of such conceptual models is that the analogy itself often suggests possible enhancements and steers the designer or implementer away from problem areas. In defining use cases it is common practice to identify actors. I have found it useful to assign motives and personalities to these actors. This may seem bizarre, but it helps directly with clearly identifying an appropriate separation of concerns between system components. When faced with design decisions it is often helpful to ask "what would this actor want to do in this situation?" or "is this responsibility in keeping with the personality of this actor?" This line of reasoning would have prevented many of the poor design decisions I have made in the past.

Summary

  1. Everything in moderation and nothing to excess
  2. A good system design is based on a sound conceptual model (Architecture)
  3. A sound conceptual model accounts for all system requirements at a reasonable level of abstraction

    3.1 A conceptual model is sufficiently generalized when it can account for all significant use cases in a concise way that reduces complexity by consolidating similar features

    3.2 A conceptual model is sufficiently specific when it is possible to demonstrate how a system design based on the model will achieve measurable targets for required system attributes

  4. A good conceptual model is easy to communicate

    4.1 A conceptual model is easier to understand and communicate if it is coherent: logical in the relationship of its parts and aesthetically consistent.

    4.2 A conceptual model is easier to understand and communicate if it is analogous to a commonly experienced, tangible, real world system

    4.3 A conceptual model is easier to understand and communicate if it is anthropomorphized: made to mimic human behavior, characteristics and modes of interaction.

In the Beginning there was one.

The first node of the ARPANET at the University of California, Los Angeles (UCLA) on the 2nd September 1969.

This is the first map of the Internet. It shows the first node on the ARPANET at the University of California, Los Angeles (UCLA) on the 2nd September 1969. The diagram is taken from Casting the Net: From ARPANET to INTERNET and beyond by Peter H. Salus and was drawn by Alex McKenzie, who worked for BBN. Any travelog needs maps. For a good catalog of Internet cartography check out The Atlas of Cyberspaces.

In July 1851 two mathematics teachers, Prof. Adolf Anderssen (1818-1879) from Breslau, and Lionel Kieseritzky (1806-1853) from what is now Estonia, played a game of chess at Simpson's on the Strand, a London chess salon. The game was so startling in its brilliance that in 1855 it was named The Immortal Game by the Austrian player Ernst Falkbeer. The chess canon contains very few named games. This game is considered by some to be the greatest ever played. It has been studied and replayed for over 150 years.

The applet used to display this game is called PGN Viewer and can be used to display any chess game or fragment of a game. It is freely available from Chess Tempo. A Portable Game Notation (PGN) file with my annotation of this game is here. PGN is a nicely specified notation that allows the storage and transfer of annotated chess games. Most serious chess software allows for import and export of PGN games.

The measurement and management of time is something most people give little thought to. But when designing Internet based systems, time presentation, manipulation and management can rapidly become a major headache. Just when you think you've got it nailed, some other unanticipated problem arises. The regular failure of systems designers to get this right is a classic example of the principle of inappropriate parsimony.

The world is a sphere that rotates on its axis once every 24 hours and takes 365 days to travel around the sun, more or less. It's very simple. But the "more or less" part is vitally important. The world is not actually a sphere, it is an oblate spheroid. It does not rotate every 24 hours, it takes ever so slightly longer than that, and it's getting slower; it also wobbles on its axis, which, by the way, is inclined to the plane of its orbit around the Sun. And, as we all know, it takes about a quarter of a day longer than 365 days to travel around the Sun, more or less, and this period is also increasing. Add to this the vagaries of international politics, national pride and public safety and you have a pretty complex system. All these factors affect the measurement and management of time. Given this complexity it is not difficult to see why system designers try to simplify the conceptual model of the real world on which they base their solutions. However, this is the wrong thing to simplify. The solution can and should be simplified, but the conceptual model on which it is based must be high fidelity, not an approximation.

Time measurement in the real world

International Time Zones

Imagine the earth as a giant orange with 24 equally sized segments. Each segment takes up 1/24th of the earth's circumference at any latitude. If two observers are standing on the same line of latitude, for example the equator, and they are exactly one segment (15 degrees) apart, then there will be one hour difference between the observers' local times. One of the observers will experience midday exactly one hour before the other. If the observers are exactly two segments apart then there will be two hours difference, and so on.
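
To make the arithmetic concrete, here is a minimal sketch in Python of the idealized relationship described above (the function name is illustrative; it ignores the political boundaries discussed later and any wrap-around at the 180 degree meridian):

```python
def hour_difference(longitude_a: float, longitude_b: float) -> float:
    """Difference in local solar time, in hours, between two longitudes.

    Assumes the idealized model above: 15 degrees of longitude = 1 hour.
    Ignores wrap-around at the 180 degree meridian and all political boundaries.
    """
    return abs(longitude_a - longitude_b) / 15.0

# Two observers 30 degrees apart on the equator experience midday 2 hours apart.
print(hour_difference(0.0, 30.0))  # 2.0
```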

By international treaty these segments are called international time zones. The first segment is called UTC-0 and is centered on the Greenwich Meridian, which passes through both poles and through Greenwich in London, England. This is the 0 degree line of longitude. International Time Zones are numbered: east of London the offsets are positive and west of London they are negative, as follows.

(… UTC-3, UTC-2, UTC-1, UTC-0, UTC+1, UTC+2, UTC+3 …)

In case you were wondering, UTC-12 does not exist; it is called UTC+12. The numbers refer to the offset in hours from the Greenwich Meridian, so UTC-8 is 8 hours behind UTC-0. The time in the UTC-0 time zone is considered to be the "base" time; all other times are relative to UTC. UTC stands for Coordinated Universal Time, or just Universal Time. (Yes, the acronym has the letters the wrong way round; the French insisted on having it that way!) "Greenwich Mean Time" is no longer the international base time; it has been replaced by UTC.

The main point to understand is that each International Time Zone defines a precise geographical area of the earth's surface and an offset in hours from Universal Time. Each area is exactly 15 degrees wide, and they cut across countries without regard to national borders or the decisions of local governments. But we all know that time often changes as we cross state borders, not as we cross some arbitrary line defined by geometry. So what is going on? Thinking of International Time Zones as geometrically rigid is not common practice, as this map shows. Most people, and even organizations, confuse International Time Zones with Local Civil Time Zones. Even the name of this map is oxymoronic: there is nothing "standard" about the areas it defines; they change regularly, forcing the map to be redrawn on a yearly basis.

Map of local civil time zones overlaid on International Time Zones

Local Civil Time

Each country in the world, and sometimes each administrative subdivision within a country, aligns itself with an International Time Zone and bases its own definition of Civil Time on this alignment. Choosing which International Time Zone to align with involves the evaluation of several factors: maximizing useful daylight hours, the time zones chosen by trading partners, national pride, and so on.

For example, the west coast of North America, including administrative subdivisions in Mexico, the USA (but not Alaska) and Canada, has aligned itself with the UTC-8 International Time Zone even though the west coast of Canada lies mostly in the UTC-9 International Time Zone. The result is a region with the same offset in hours from UTC running from the Yukon Territory of northern Canada to halfway down Baja California, Mexico. Each country and subdivision has named its Local Civil Time: in Canada it is called Pacific Time, as it is in the USA, but in Mexico it is called America/Tijuana Time.

Countries cannot arbitrarily redefine or rename International Time Zones; all they can change is their own definition of Local Civil Time. So Local Civil Time in India is defined as being 5½ hours ahead of Universal Time. This is convenient for India because of its geographical location (straddling the boundary between UTC+5 and UTC+6), but there is no International Time Zone called UTC+5.5.

Much confusion results from the naming of International Time Zones. UTC+3 is the name of a geographical area that is 3 hours ahead of Universal Time. However, the name is also a convenient shorthand for the offset from Universal Time. This leads many people to conclude that UTC+5.5 is the name of an International Time Zone. While it is plausible, it is not accurate. There are only 24 International Time Zones.

Daylight Saving

Daylight Saving is the process of changing Local Civil Time by adding or subtracting a fixed period, usually an hour. This adjustment is carried out seasonally and has the effect of extending the amount of daylight available during the standard 9-5 working day. It has been shown that this simple adjustment can save lives by allowing people to travel to and from work during daylight hours. Continuing the example above, there are two versions of US Pacific Time: Pacific Standard Time and Pacific Daylight Time. Pacific Standard Time is UTC -8 hours, but Pacific Daylight Time is UTC -8 hours + 1 hour (effectively UTC -7 hours).

In the Northern Hemisphere the switch to daylight saving usually happens on the first Sunday of April at 2 am, by adding one hour. The switch back usually occurs on the last Sunday of October, again at 2 am, by subtracting one hour. In the Southern Hemisphere everything is reversed: daylight saving starts in September or October and ends in March or April.
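
As a minimal sketch, the Northern Hemisphere rule as stated above (first Sunday in April, last Sunday in October, both at 2 am) can be computed like this in Python; the function names are illustrative, and the rule is the one described in the text rather than a universal law:

```python
import datetime

def first_sunday(year: int, month: int) -> datetime.date:
    """First Sunday of the given month (weekday(): Monday=0 ... Sunday=6)."""
    d = datetime.date(year, month, 1)
    return d + datetime.timedelta(days=(6 - d.weekday()) % 7)

def last_sunday(year: int, month: int) -> datetime.date:
    """Last Sunday of the given month, found by stepping back from its last day."""
    if month == 12:
        d = datetime.date(year, 12, 31)
    else:
        d = datetime.date(year, month + 1, 1) - datetime.timedelta(days=1)
    return d - datetime.timedelta(days=(d.weekday() - 6) % 7)

year = 2002
dst_start = datetime.datetime.combine(first_sunday(year, 4), datetime.time(2, 0))
dst_end = datetime.datetime.combine(last_sunday(year, 10), datetime.time(2, 0))
print(dst_start)  # 2002-04-07 02:00:00 (local civil time)
print(dst_end)    # 2002-10-27 02:00:00 (local civil time)
```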

From a systems design viewpoint, handling International Time Zones and Daylight Saving is tricky. The end of daylight saving is the most disruptive time, since the same hour is repeated. This can lead to severe problems if datestamps are recorded using Local Civil Time. Many designers do not realize that the system works differently in the northern and southern hemispheres, and designers often assume that International Time Zones are the same thing as Local Civil Time.

Leap Years and Leap Seconds

Everyone knows that there are 365 days in a year, but every 4 years the Gregorian calendar makes a correction by adding an extra day because there are actually about 365.25 days in a year. Even this is merely an approximation. The rule for leap years is: a year is a leap year if it is divisible by 4, unless it is also divisible by 100, unless it is also divisible by 400, in which case it is a leap year after all (so the year 2000 was a leap year!).
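
The rule translates directly into a few lines of Python; a minimal sketch:

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries, except centuries divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_year(2000))  # True  (divisible by 400)
print(is_leap_year(1900))  # False (a century not divisible by 400)
print(is_leap_year(2004))  # True  (divisible by 4, not a century)
```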

Every now and then leap seconds are added to, or subtracted from, UTC. This is achieved by adding, or subtracting, a second from the last minute of the last day of June or December. The last minute of 1998, for example, was 61 seconds long. The reason for this adjustment is complex; if you want to understand it, look up the difference between International Atomic Time and Coordinated Universal Time. So far all leap seconds have been positive additions.

Recording Dates and Times for Internet Systems

One clean solution to the above problems is as follows:

  • All times must include the date, since, as you will see, the date can change depending on where we choose to display the time.
  • All times are recorded in UTC.
  • All times are recorded with the name of the Local Civil Time used to record them and a few denormalized attributes (offset and daylight saving) for efficiency.

In this example dates are formatted as YYYY-MM-DD and all times as 24HR:MM:SS. This is how you would record 10 pm on May 18th 2002 in San Francisco:

UTC                   Civil Time              Offset from UTC   Daylight Saving
2002-05-19 05:00:00   Pacific Daylight Time   -8                +1

Notice that the days are different: in San Francisco it is still the 18th, but at Greenwich it is the 19th. The benefit of this way of recording dates is that all dates recorded anywhere in the world can be compared. Since they are all recorded in UTC, we can order by the UTC value and get everything in chronological order. Furthermore, we can easily recreate the originally entered date and time, and render it in any other time zone without too much trouble.

2002-05-19 05:00:00 UTC, subtract 8 hours and then add 1 hour = 2002-05-18 22:00:00 (Pacific Daylight Time)
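
Here is a minimal sketch in Python of the recording scheme described above. The class and field names are illustrative only; they simply mirror the columns of the table:

```python
from dataclasses import dataclass
import datetime

@dataclass
class Timestamp:
    utc: datetime.datetime    # always recorded in UTC
    civil_time_name: str      # e.g. "Pacific Daylight Time"
    offset_from_utc: int      # hours, e.g. -8
    daylight_saving: int      # hours, e.g. +1

    def local(self) -> datetime.datetime:
        """Recreate the originally entered local date and time."""
        return self.utc + datetime.timedelta(
            hours=self.offset_from_utc + self.daylight_saving)

# 10 pm on May 18th 2002 in San Francisco, recorded as in the table above.
t = Timestamp(datetime.datetime(2002, 5, 19, 5, 0, 0),
              "Pacific Daylight Time", -8, +1)
print(t.utc)      # 2002-05-19 05:00:00 -- sortable and comparable worldwide
print(t.local())  # 2002-05-18 22:00:00 -- the time as originally entered
```

Because the UTC field is what gets stored and compared, records sort into true chronological order regardless of where they were entered.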

This even gives us a way to untangle switches from daylight saving back to normal time. Consider the following (remember the switch from Daylight Saving back to Pacific Standard Time happened at 2 am on October 27th).

UTC                   Civil Time              Offset from UTC   Daylight Saving   Time in San Francisco
2002-10-27 09:30:00   Pacific Daylight Time   -8                +1                2002-10-27 02:30:00 (Pacific Daylight Time)
2002-10-27 10:30:00   Pacific Standard Time   -8                +0                2002-10-27 02:30:00 (Pacific Standard Time)

This demonstrates why recording UTC is essential to avoid repeating the same timestamps when switching from daylight saving back to normal time.

What this model lacks is a way to look up when daylight saving comes into effect. For this a lookup table is required, as follows:

Civil Time Lookup Table

Column Name                    Example Value
Year                           2002
Local Civil Time Name          Pacific Time
Country                        USA
Administrative Subdivisions    Oregon, Washington, California
Daylight Saving Time Name      Pacific Daylight Time
Daylight Saving Time Offset    +1
Daylight Saving Time Start     2002-04-07 10:00:00 UTC-0
Daylight Saving Time Stop      2002-10-27 09:00:00 UTC-0
Standard Time Name             Pacific Standard Time
Standard Time Offset           0
International Time Zone Name   UTC-8
Offset from UTC                -8

When recording a time with this model you need to know the date and time and the Local Civil Time that is being used. A check against the lookup table can then be used to set the required parameters, such as the offset and the daylight saving value. This is not particularly efficient, since it requires a lookup every time a date-time is recorded.
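
A minimal sketch in Python of that recording step, assuming a rule object shaped like the lookup table above (all names and fields are illustrative, not taken from any particular library):

```python
from dataclasses import dataclass
import datetime

@dataclass
class CivilTimeRule:
    """One row of the Civil Time Lookup Table (illustrative field names)."""
    year: int
    civil_time_name: str                   # "Pacific Time"
    daylight_name: str                     # "Pacific Daylight Time"
    daylight_offset: int                   # +1
    daylight_start_utc: datetime.datetime  # 2002-04-07 10:00:00 UTC
    daylight_stop_utc: datetime.datetime   # 2002-10-27 09:00:00 UTC
    standard_name: str                     # "Pacific Standard Time"
    standard_offset: int                   # 0
    offset_from_utc: int                   # -8

def record(local_time: datetime.datetime, rule: CivilTimeRule):
    """Turn a local civil date-time into a (UTC, name, offset, daylight) record.

    First assume standard time, then check whether the resulting UTC value falls
    inside the daylight saving window and adjust if it does. Wall-clock times in
    the repeated hour at the end of daylight saving are inherently ambiguous;
    this sketch resolves them to standard time.
    """
    utc = local_time - datetime.timedelta(
        hours=rule.offset_from_utc + rule.standard_offset)
    if rule.daylight_start_utc <= utc < rule.daylight_stop_utc:
        utc -= datetime.timedelta(hours=rule.daylight_offset)
        return utc, rule.daylight_name, rule.offset_from_utc, rule.daylight_offset
    return utc, rule.standard_name, rule.offset_from_utc, rule.standard_offset

rule = CivilTimeRule(2002, "Pacific Time", "Pacific Daylight Time", 1,
                     datetime.datetime(2002, 4, 7, 10, 0),
                     datetime.datetime(2002, 10, 27, 9, 0),
                     "Pacific Standard Time", 0, -8)
print(record(datetime.datetime(2002, 5, 18, 22, 0), rule))
# (datetime.datetime(2002, 5, 19, 5, 0), 'Pacific Daylight Time', -8, 1)
```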

Finally, if you are going to all this trouble you really need access to a reliable system clock for accurate timestamps. This can be done by configuring the Network Time Protocol (NTP).

There is one thing this model does not handle: if a negative leap second is ever required, the model will not work, because the second in question will be repeated. This seems unlikely, since the earth's rotation is slowing down, not speeding up.

There is a class of design problem that can mislead the unwary systems designer, myself included, although I am getting better at identifying the warning signs. These design problems require the designer to base the solution on a conceptual model of a real world system or process. It often appears that a simple conceptual model that approximates the real world system will suffice. In practice, solutions based on these simple, low fidelity models fail to handle the problem completely and often cause a whole series of new problems. Redesigning the solution to handle these exceptions only produces more problems that require more redesign, and so on. If the hapless designer persists, s/he will often go through several complete redesigns before arriving at a solution that finally solves the problem by modeling the real world system with a high degree of fidelity. This iterative redesign process is typical of this class of design problem. Some might say that only poor designers are ever trapped in this way; others will say these solutions are anti-patterns. But I think there is more to it.

The type of design problem under consideration here is particularly intractable because there exists a series of approximate conceptual models of increasing fidelity and complexity. Faced with situations like these, many systems designers will claim to be applying Occam's razor when they opt to base their solution on the simplest conceptual model. But a solution based on an approximation is only as good as the approximation. The only way to improve such a solution, if it is insufficient, is to replace the low fidelity conceptual model with one of higher fidelity. The worst type of problem is one that has many plausible conceptual models, each of slightly higher fidelity and complexity than the last. The slavish misapplication of the principle of parsimony will condemn our systems designer to step through each successively more complex model until they finally reach one with the required fidelity.

Occam's Razor: when faced with several explanations of a phenomenon, one should always choose the simplest, the one that requires the fewest leaps of logic.

This leads to a tentative conclusion: Occam’s Razor is no good for selecting between alternative models if the alternatives are approximations with differing fidelity. A simple low fidelity model cannot be compared with a complex high fidelity one.

An old joke comes to mind. An engineer, a physicist, a mathematician and a biologist were asked to define Pi. The engineer said "about 3", the physicist said "3.14159 ± 0.00001", the mathematician said "the circumference of a circle divided by its diameter" and the biologist said "what is Pi?" Choosing the least complex, lowest fidelity model is not always the right answer! Too many systems designers think there is virtue in always assuming Pi should be about three.