Managing Complexity Part 2

Introduction

In Part 1 I introduced the idea of program complexity. In this article, I take a deeper look at the problem.

What is Complexity?

First though, let me define complexity in relation to programming. A complex program is a program where a large number of program components and/or data are interacting with each other resulting in a number of different possible outcomes. In the paper, Software Complexity - An Alternative View [1], the authors put it very well:

According to Morowitz the complex systems share certain features like having a large number of elements, possessing high dimensionality and representing an extended space of possibilities. Such systems are hierarchies consisting of different levels each having its own principles, laws and structures.

From the very beginning, complexity has been a problem in developing software as Nancy G. Leveson has stated in her paper Software Engineering: A Look Back and A Path to the Future:

Thus the first fifty years may be characterized as our learning about the limits of our field, which are intimately bound up with the limits of complexity with which humans can cope. Our tools and techniques are used to assist us in dealing with this complexity, that is, to help make our systems intellectually manageable. We do this by imposing on the software development process the discipline that nature imposes on the hardware engineering process. We have been learning what types of discipline are necessary and how to best enforce them [2].

The reason that complexity is such an issue is because it can directly affect the reliability of a program [3]. A reliable program correctly accomplishes the task it was designed to do. An unreliable program has errors or bugs. It may be aggravating to buy a buggy game, but it is lethal to have a program glitch in an airliner. As the world becomes more automated, the need to manage complexity becomes more important.

However, it is a hard fact that complexity has been, as still is, the biggest problem in software development. We see the influence of complexity every time we must update a program because bugs have been discovered and (hopefully) corrected. The need to manage complexity has given rise to innumerable techniques, tools and even programming languages and yet it is a problem that has yet to be solved and, in my opinion, probably never will be solved. Not only is it a product of our limitation as human beings to comprehend the complete scope of possibilities in large, complex software systems, there is simply no mechanism that exists that can determine with absolute certainty that a complex program will function correctly [4].

This does not mean however, that we can’t mitigate the negative influence of complexity. If we can understand the source of complexity, we can then take measures to manage that complexity. While not a complete list, this series of articles will address two areas of software complexity.

1. Program size and the number of interacting components. The complexity of a program increases proportionally (and possibly geometrically) as the number of interacting components increases [3].

2. The amount of interacting information, or data, within a program. The problem here isn’t the data [4], a number is a number, rather it is how that number is represented and manipulated. If function B requires a number from function A, and function A passes an incorrect number or a number in the wrong format, function B will either fail or pass along the incorrect number to other components, creating a chain of failure within the program.

There are many other factors that relate to program complexity, but these two areas pose the greatest risk to the software developer. The rest of this article will discuss a strategy to address the first point above, program size.

It’s a Problem

Software programs are solutions to existing problems. A computer chess game is designed to solve the problem of having a competent player when there isn’t a human to play. A data entry program is designed to solve the problem of getting customer information into a database in the most efficient manner possible. An operating system is designed to solve the problem of interacting with a computer. The problem may be vital or recreational, but all software is designed to solve a particular problem.

Generally speaking, it is the problem or more specifically, the scope of the problem that defines the size of a program. There are of course exceptions to this rule, but that is why they are exceptions. In most cases, the size of a program should be proportional to the scope of a problem that needs to be solved. By scope, I mean all the steps that are required to implement a solution to the problem.

Every problem can be dissected into smaller, intermediate problems that when solved, will solve the larger problem. Some problems are small, and have little scope. For example, programmatically displaying an ascii chart is a small problem. Basically you can dissect this problem into generation of the ascii characters and displaying the ascii characters. A simple two or three-subroutine program will accomplish this. A data entry program has greater scope because it has more to do, and has a bigger list of intermediate problems that need to be solved.

To keep program size manageable, it is necessary to understand all (as best we can) the intermediate problems that need to be solved in order to solve the bigger problem. By identifying the intermediate problems, we can efficiently write code to produce a concise solution, reducing the amount code in the program and reducing its complexity. This means that the first step in writing a program is defining the problem. I like the general to specific methodology when analyzing a problem.

The first step is to write a general statement of the problem that needs to be solved. For example, let’s say that we want to write a dungeon-crawler. A general statement might be, “Write a program that simulates exploration of a random dungeon in the spirit of the classic game Rogue.”

Here we have the problem, “exploration of a random dungeon” and a limit on the solution “in the spirit of the classic game Rogue”. This defines the problem that needs to be solved, and sets the boundaries on the solution of the problem. Virtually every program designed to solve a problem, will have to do so within set boundaries. By including the boundaries within the problem definition, the refinement of the action steps becomes easier.

Although it may not appear so at first glance, the above general statement contains a lot of valuable information. First, the problem is one of simulation. A simulation is a model of a real or imaginary world or process. In this case we are simulating the exploration of a dungeon and the implied assumption is that within our simulation, the dungeon is a real place. This means we can add elements to our dungeon such as water, traps, secret rooms, ambient sounds, darkness and light. A dungeon is usually a scary place, filled with unknown monsters, hidden treasures and possibly an artifact that needs to be recovered. The dungeon also needs to have a random layout, as defined in our general statement, so we need to be able to change the layout each time the player plays the game. We also need some sort of mechanism to display the layout of the dungeon to the player.

Of course, to explore a dungeon we need one or more characters in the game. If the dungeon has monsters, and what dungeon doesn’t, then our character will have to have a way to deal with these monsters, which means weapons and ammo. If the character gets into a fight, there has to be some way to determine if the character wins or loses, so there has to be some sort of combat system. Of course, losing a fight means death, so the player has to have a stat system, some way to measure the vitality, and possibly skills of the player. The character may find treasure that will need to be managed so we will have to create an inventory system. We will also need a mechanism to display this information the player.

If there are monsters, then we need to ask ourselves how many different kinds of monsters exist in the dungeon, how smart they need to be and how tough they are to fight. The monsters will need to have stats, enough intelligence to roam about the dungeon and to attack the player, or interact with the player if they are friend rather than foe. If the monster uses weapons, it will need to have an inventory system so that the player can loot the carcass when the monster is dead.

The constraints also give an opportunity to add to our preliminary list of intermediate problems. Since this game is going to be in the spirit of the classic game Rogue, we will use ascii characters for the game display elements. This means will need to display both the dungeon and any vital information within the confines of an 80 by 25 or 80 by 50 screen. Since the ascii character set is limited to 255 characters, and not all are printable, we will need to decide what ascii characters are used for what objects. Color is also an option that can be used to help display information the player.

As you can see, the general statement is a springboard for defining the set of intermediate problems that need to be solved. By examining the general statement we have been able to create the start of a list of intermediate problems that will need to be solved. This is an iterative process that will ultimately define the scope of the problem and the scope of our solution. Once we have a starting list, then we must revisit each intermediate problem and see if we can dissect them into smaller pieces. Once that task is done, each new intermediate problem is then dissected again, until finally we reach a point where an intermediate problem cannot be subdivided further. This is the point where we can apply code to solve all of our little problems. However, don’t break out the editor just yet. We can simplify the coding process by looking at our list one more time.

Overlapping Problems

At this point we should have a list that has various groupings and within those groupings, problems that need to be solved. Now, look through those groupings and identify those problems that overlap groups. That is, we probably have problems that are the same or very similar in more than one group. These overlapping problems are prime candidates for generalized routines, and offer us a way to reduce code size by creating one routine for both problem groups, rather than writing code for two problems. The rule here is: if you do it twice (or more), write it once.

For example, we have decided in our dungeon crawler that both the player character and monsters will have inventory items. The player character can pick up any item dropped by a monster. Here we have three different entities that can “hold” an inventory item, the character, a monster and the floor of the dungeon.

Let’s assume that an inventory item is a type-def and that each entity has an array of type-defs that describe the inventory list for that entity. The character has an array that describes the inventory in his backpack, the monster has an array that describes what it is carrying in each hand and on his body, and the floor has an array that describes an inventory item at each location. We have three different problems, but we can write one routine to address all three problems. The following pseudo-code illustrates this concept.

SUB SwapInventoryItem(Inv1 as InvType,  Inv2 as InvType)
        ‘Write the swap code here
    END SUB

    ‘Monster drops an item 
    SwapInventoryItem MonsterInv(RightHand), FloorInv(2, 5)

    ‘Player wants to pick up item
    SwapInventoryItem PlayerBackpack(1), FloorInv(2, 5)

With one routine, we can handle moving inventory from one location to another, no matter the source and destination entities. Now, we look at the list again and ask ourselves, can we extend this even farther?

In order for the character to use an item, the character must equip an item. This means we are moving an inventory item from the backpack to the right-hand slot, for example. If an item is already in the right-hand slot, it must be moved to the backpack. Another case of swapping an inventory item, and our generalized swap routine will work just fine for this case. The player will need to drop an item to the floor if it becomes useless and so we have another case where we can use our general swap routine. By looking at the overlapping problems, this single routine has addressed several problems, and has reduced the code we need to write by a considerable amount.

Oops, I Forgot

Even the best program outlines will not cover every aspect of the code needed to implement a program solution. In all my years of professional programming, I have yet to see an initial design document completely describe a project. This is just another aspect of complexity.

When unanticipated problems arise, the first thing to do is to step away from the code. Do not write one line of additional code. Take this new problem and begin the analysis process, dissecting it into smaller problems, look for code overlap and fit it into the overall scheme of the program outline. By doing this immediately as the problem arises, it may be possible to fit the problem into existing code modules. Even if the problem requires new code, the analysis will enable us to mentally fit the problem into the overall strategy of our program solution. Knowing where to put a problem is as important as understanding the problem.

Code Organization

Complexity is a function of our inability to grasp the totality of a program as it grows in size. There are three simple, yet effective techniques to enable us to better visualize our program structure. The first is to use meaningful names for our program data and structure. Most modern programming languages allow more than 8 bytes of name space so using a subroutine name such as SwapInventoryItem is a lot more informative than SWI. You may remember what SWI is when you are coding it today, but will you remember a month from now?

The second technique is to break up the source code into meaningful sections. Most modern languages allow the use of Includes, different source files that when combined create one program. In our dungeon crawler we could break up the program into an inventory include, combat include, data definition include, and so on. By arranging code according to problem groups, we will know where to look when we must update or fix a section of code. It enables us to mentally get a handle on the code. Even if the programming language doesn’t have an Include command, you can still group code into sections within a single source file, and get many of the same benefits.

The third technique is as important as the others and yet is the most overlooked. Commented code. A simple one line comment on what a subroutine is doing will go along way toward reminding us what we were thinking when we wrote the code three months ago. It takes so little time to write a comment, yet will offer immense benefits when we must revisit code at a later date when our memory has been filled by other programs and other problems.

A Final Word

From the very beginning, complexity has been the bane of the programmer and for the near future at least, will continue to be so. However, we can manage that complexity, at least to a certain degree, and make our lives a bit easier. Having a clear picture of what we are trying to accomplish, reducing the size of a program and organizing our code will enable us to get a better grasp on our programming projects and make our programs much more reliable.

Reference

1. Peter Kokol, Janez Brest, Viljem Žumer, Software Complexity - An Alternative View,
http://lsd.uni-mb.si/metrics/articles/smc96/SMC96.html.

2. Nancy G. Leveson, Software Engineering: A Look Back and A Path to the Future,
http://sunnyday.mit.edu/cacm.html.

3. Edsger W. Dijkstra, The Humble Programmer,
http://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD340.html.

4. J.P. Lewis, Mathematical Limits to Software Estimation Supplementary Material,
http://www.idiom.com/~zilla/Work/Softestim/softestim.html.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.