What They Don't Teach You In AP CSP - Lesson Two - Program Design Principles and Techniques

Sunday, August 27, 2023

An important topic I believe should have been mentioned in my AP CSP class is the proper design of programs. Although one learns to write basic programs, one does not learn how to write programs with solid foundations that are easy to build upon. To aid in understanding and ease of modification, build your programs with a sense of proper design and structure.

Poor design leads to programs that are hard to extend, hard to understand, and hard to modify. It may not be apparent or seem to make much of a difference with small projects that are not modified later, but it's best to follow good design practices so you get the benefits for free, which are useful in small projects but truly indispensable in larger ones.

Common Design Problems

Among projects I've seen from people who were in AP CSP, these were some of the most common design problems:

Inappropriate data structure or representation
Repeated or similar code
Bad variable/function names
Poor representation of logic
Too much mutable state
Inappropriate scope

I will explain each of these and how I usually rectify them in the following sections.

Inappropriate Data Structure or Representation

Explanation

This design problem happens when a program requires some data but does not store or represent it in an optimal fashion. The most common form of this problem in student projects was using multiple lists in parallel for the storage of data.

For example, you may have statistics on the popularity of songs. Each entry includes the name of the song, the artist, and the number of plays in the last week. A poor (but unfortunately common) representation for this is three lists - one for names, one for artists, and one for plays in the past week - that all have the same order for one statistic, that is, index i in a list with characteristic c represents the ith statistics' characteristic c. To get one statistic including all three characteristics, you index each list at the same position, and do something with each characteristic.

How to Rectify

In this case, and whenever you want to store some custom data that must be represented as a collection of different values representing different characteristics of the data, you should use an object (or dictionary) to store multiple pieces of related data as one larger unit. If you define a class, you could be sure about the fields you can access as well. You can use both inside other objects and data structures, pass them to functions or receive them from them, and manipulate them much more conveniently.

Example

A project I once redesigned for a friend was a quiz program. It was a history quiz where each question about when something happened had one of these three answers: the '70s, the '80s, and the '90s. For representing questions and answers, he used two lists - one that held the questions and one that held the answers (it was a more internal representation for answers, but the idea still holds). Both of the lists were always "lined up", so an element in one list would correspond to another element in the other list at the same index (it's not so hard to keep this alignment when the lists aren't changed, but there are much better methods of access that don't require this extra thought).

This is the original code - sic and verbatim, except formatting.

//The list of questions for the app
var listQuestions = [
    "The first microprocessor, the intel 4004, came out when?",
    "When was the first mobile phone call made?",
    "When was the first digital camera made?",
    " When was the 3 ½ inch floppy disk made? ",
    "When was the first Mac computer released? ",
    "When was the first Game Boy released?",
    "When was the world wide web released to the general public?",
    "When was the first text message sent? ",
    "When was Linux released?"
];

//List of identifiers for correct answers (matches up to the dacade)
var listCorrectawnsers = [0, 0, 0, 1, 1, 1, 2, 2, 2];

This is an inappropriate data structure. What we should have is a custom object that holds a question and an answer. This is what I did in my redesign, shown below.

var questionsAndAnswers = [
    new TrackedQuestionAnswerPair("The first microprocessor, the intel 4004, came out when?", 0),
    new TrackedQuestionAnswerPair("When was the first mobile phone call made?", 0),
    new TrackedQuestionAnswerPair("When was the first digital camera made?", 0),
    new TrackedQuestionAnswerPair("When was the 3 ½ inch floppy disk made?", 1),
    new TrackedQuestionAnswerPair("When was the first Mac computer released?", 1),
    new TrackedQuestionAnswerPair("When was the first Game Boy released?", 1),
    new TrackedQuestionAnswerPair("When was the world wide web released to the general public?", 2),
    new TrackedQuestionAnswerPair("When was the first text message sent?", 2),
    new TrackedQuestionAnswerPair("When was Linux released?", 2),
];

// --snip--

function TrackedQuestionAnswerPair(question, answer) {
    this.question = question;
    this.answer = answer;
    this.shown = false;

    this.reset = function () { this.shown = false; };
}

This is more appropriate. Admittedly, this could be better - removing the "magic numbers" 0, 1, and 2, and keeping the responsibilities of TrackedQuestionAnswerPair inside of itself, for a start - but this is already much better. We now no longer have the mental load of remembering we have two arrays representing what should be one array of data. We can add more responsibilities to TrackedQuestionAnswerPair painlessly, while there was previously no concept of "internal responsibility" before.

Especially if you're going to modify it later, don't store data in an inappropriate structure!

Repeated or Similar Code

Explanation

This idea is often called "Don't Repeat Yourself" (or DRY for short).

Although functions were taught as a way to reduce code repetition, students were not taught the more advanced and helpful features of functions (such as lambda functions, higher-order functions, partial function application, etc.), and so they did not use functions to their full potential.

Times with pure and obvious repetition were pretty rare, which is good, but not all repetition happens verbatim, as you'll see in the example.

How to Rectify

In general, think about introducing some abstraction not only when you are writing the same code, but when you are writing code with the same idea as some previous code. Consider using methods like the ones I've described here.

Example

This project is a custom music maker program. You can click on a number of different buttons, which each play a different sound. You can also make a recording (only records notes, not time) or a random song. I will focus only on the buttons you can press to make different sounds. The creator chose to implement this like so:

var noteA = "piano-a_A_major.mp3";
var noteB = "piano-b_B_major.mp3";
var noteC = "piano-c_C_major.mp3";
// And so on, for every sound. 21 lines.
// --snip--

onEvent('keyA', 'click', function () {
    playSound(noteA, false);
    appendItem(recording, noteA);
    updateSoundsUsedValue();
});
onEvent('keyB', 'click', function () {
    playSound(noteB, false);
    appendItem(recording, noteB);
    updateSoundsUsedValue();
});
onEvent('keyC', 'click', function () {
    playSound(noteC, false);
    appendItem(recording, noteC);
    updateSoundsUsedValue();
});
// And so on, for every button. 105 lines.

This is a bad design. Think about how it would handle change. If you change the ID of a button, you need to change one place in one line (good). If you change the sound that plays with one button, you need to change two places in two lines (bad). If you change the method of storing a recording, you need to change one place in one line times the number of buttons, that's 1 * 1 * 21 places to change, the very same way (awful)!

Obviously, this would benefit from a better design, one that abstracts this complexity into something more singular and easier to change later. Here's how I approached this problem:

var noteA = 'piano-a_A_major.mp3';
var noteB = 'piano-b_B_major.mp3';
var noteC = 'piano-c_C_major.mp3';
// And so on, for every sound. Same as original. 21 lines.
// --snip--

(function () {
    var IDsToSounds = {
        keyA: noteA,
        keyB: noteB,
        keyC: noteC,
        keyD: noteD,
        keyE: noteE,
        keyF: noteF,
        keyG: noteG,
        drumAMinor: drumAMinor,
        drumCMinor: drumCMinor,
        drumSnare: drumSnare,
        drumClap: drumClap,
        guitarStringAMajor: guitarStringAMajor,
        guitarStringBMajor: guitarStringBMajor,
        guitarStringCMajor: guitarStringCMajor,
        trumpetDMinor: trumpetDMinor,
        trumpetC: trumpetC,
        trumpetLoud: trumpetLoud,
        trumpetForte: trumpetForte,
        wooAdlib: wooAdlib,
        yeAdlib: yeAdlib,
        heyAdlib: heyAdlib,
    };

    Object.keys(IDsToSounds).forEach(function (id) {
        registerEventHandler(id, IDsToSounds[id]);
    });

    function registerEventHandler(id, sound) {
        onEvent(id, 'click', makeSoundPlayer(sound));
    }
})();

function makeSoundPlayer(sound) {
    return function () {
        playSound(sound);
        recording.push(sound);
        updateSoundsUsedValue();
    };
}
// That's all, and it's mostly data. 41 lines.

Obviously, this could be better - we could have all IDs match the name of a sound, so we wouldn't need this mostly repetitive dictionary, and we could introduce a class like SoundPlayingButton that takes care of playing and storing its own sound - but this is an improvement, and fixes the problem this section mentions. To prove that it has, think about how this design would handle change.

If you change the ID of a button, you need to change one place in one line (good). If you change the sound that plays with one button, you need to change one place in one line (good). If you change the method of storing a recording, you need to change one place in one line (good).

Indeed, writing programs that are easy to change later should be your goal. If you have to make changes in multiple places to change one thing, your design is likely poor, and you should consider a way to abstract away the repetition.

Bad Variable or Function Names

Explanation

Variable and function names should be self-explanatory. Although there are exceptions, for example, it's fine for very abstract functions you define like map, flatMap, or compose that declare a function parameter or two to refer to them as f or g, or for iteration mechanisms to use a variable like i, but in more concrete code, you benefit greatly from descriptive names.

How to Rectify

If the name of something does not make its purpose clear to an outsider, consider renaming the offending item. Modern IDEs have tools to make this quite painless and less error-prone, so this is probably the easiest problem to fix in this list.

Example

This project is a clicker game. You can buy clickers, which will add one money times the number of clickers you have per second. Every time you buy a clicker, you must pay 5 money more to buy another. You can also pay to receive more money per click, which also works for the clickers. Every time you pay to get more money per click, you must pay 50 money more to do it again.

Now, take a look at the code. It's mostly the same, although I formatted it, removed some useless parentheses, and removed useless commented-out code.

var cp = 100;
var clicks = 0;
var clickers = 0;
var cm = 1;
var cmp = 1000;

onEvent('button1', 'click', function () {
    clicks = clicks + 1 * cm;
    setText('label1', 'Money: ' + clicks);
});
onEvent('button2', 'click', function () {
    if (clicks >= cp) {
        clicks = clicks - cp;
        clickers = clickers + 1;
        cp = cp + 5;
        setText('label1', 'Money: ' + clicks);
        setText('button2', 'Buy clicker (' + cp + ' money)');
    }
});
onEvent('button3', 'click', function () {
    if (clicks >= cmp) {
        clicks = clicks - cmp;
        cm = cm + 1;
        cmp = cmp + 50;
        setText('label1', 'Money: ' + clicks);
        setText('button3', '+1 money per click (' + cmp + ' money)');
    }
});
timedLoop(1000, function () {
    clicks = clicks + clickers * cm;
    setText('label1', 'Money: ' + clicks);
});

What is cp? What is cm, or cmp? Why is the amount of money called clicks, which is misleading? Try and figure it out, if you want to. Without changing anything else, I will rename these variables to be more descriptive, and see how much easier it is to understand.

var clickerPrice = 100;
var money = 0;
var clickers = 0;
var numberOfClickers = 1;
var moreMoneyPerClickPrice = 1000;

onEvent('button1', 'click', function () {
    money = money + 1 * numberOfClickers;
    setText('label1', 'Money: ' + money);
});
onEvent('button2', 'click', function () {
    if (money >= clickerPrice) {
        money = money - clickerPrice;
        clickers = clickers + 1;
        clickerPrice = clickerPrice + 5;
        setText('label1', 'Money: ' + money);
        setText('button2', 'Buy clicker (' + clickerPrice + ' money)');
    }
});
onEvent('button3', 'click', function () {
    if (money >= moreMoneyPerClickPrice) {
        money = money - moreMoneyPerClickPrice;
        numberOfClickers = numberOfClickers + 1;
        moreMoneyPerClickPrice = moreMoneyPerClickPrice + 50;
        setText('label1', 'Money: ' + money);
        setText('button3', '+1 money per click (' + moreMoneyPerClickPrice + ' money)');
    }
});
timedLoop(1000, function () {
    money = money + clickers * numberOfClickers;
    setText('label1', 'Money: ' + money);
});

Obviously, this would be much easier to read even with small changes like integrated operator assignment, guard clauses, and basic expression simplification (multiplying by one, anyone?), but I didn't put those in here to show how much difference names alone can make.

If you want others to be able to understand your code with relative ease, make sure your names make sense. Your names won't be verbose if you apply this idea properly, and the names should describe themselves without need for a comment or guessing.

Poor representation of logic

Explanation

Although branching with if statements is the main way taught in AP CSP to express logic and the flow of control, there are different methods that are often better suited to the job. When there are, using if statements is probably a poor representation of logic.

How To Rectify

I call the application of logical representation design improvement from if statements to something else "abstracting branching". If you're using a contiguous sequence of numbers (preferably starting with 0) to choose a path some code can take, extract all paths into their own function, and store each function in a list. To use this, index the list (maybe with an offset if the number doesn't start at 0) and run the function at that index. If it's not paths of code you want to represent, just fill the list with data instead. You could also perform a similar technique with more varying data to interpret with a dictionary.

Example

This project is themed around cars and has two different features: A random car name generator and a mock price calculator. We will be focusing on the mock price calculator, which takes a brand and a condition and determines a price. The creator implemented this with branching via if statements, including nesting.

Here is the the original code for this, just formatted:

brand = getText('brandDropdown');
condition = getText('conditionDropdown');

// Check the value of variables to decide the price to set
if (condition == 'Totaled') {
    price = 0;
} else if (brand == 'Ford' && condition == 'New') {
    price = 30000;
} else if (brand == 'Ford' && condition == 'Used') {
    price = 11000;
} else if (brand == 'Tesla') {
    if (condition == 'New') {
        price = 60000;
    } else {
        price = 28000;
    }
} else if (brand == 'BMW') {
    if (condition == 'New') {
        price = 56000;
    } else {
        price = 30000;
    }
} else if (brand == 'Honda') {
    if (condition == 'New') {
        price = 30000;
    } else {
        price = 21000;
    }
} else if (brand == 'Toyota') {
    if (condition == 'New') {
        price = 30000;
    } else {
        price = 18000;
    }
}

Even if it didn't mix nesting and boolean arithmetic, it would still be difficult to read, simply because this is logic being represented poorly. I abstract this branching into a dictionary of conditions to a dictionary of brands to prices, then index it to get the value I want. I also extracted it into a pure function.

function calculateCarPrice(condition, brand) {
    var PRICES = {
        New: {
            Ford: 30000,
            Tesla: 60000,
            BMW: 56000,
            Honda: 30000,
            Toyota: 30000,
        },
        Used: {
            Ford: 11000,
            Tesla: 28000,
            BMW: 30000,
            Honda: 21000,
            Toyota: 18000,
        },
        Totaled: {
            Ford: 0,
            Tesla: 0,
            BMW: 0,
            Honda: 0,
            Toyota: 0,
        },
    };

    return PRICES[condition][brand];
}

This spares us the ceremony of specifically choosing paths of code to take. We don't even need the PRICES constant, we could just replace its one use with the same literal we assign to it in a one-statement solution, although I find that giving the dictionary a name assists with understanding. One thing this solution doesn't do currently is return 0 instantly if the condition is "totaled", although we could easily check for that in a separate if statement and reserve the dictionary for new and used prices only. The trade-off: we wouldn't be able to have any different price for totaled vehicles. Both solutions are valid and well-designed, what you choose should depend on your needs.

Too much mutable state

Explanation

Mutable (modifiable) state is useful, but it tends to make your programs more complicated. Sometimes, mutable state is used (read from and written to) when a more appropriate method to alter the behavior of the program exists. It's not uncommon for function calls to offer a better representation with which to solve your problem.

How To Rectify

Consider alternative representations. Many cases are better represented with calls to functions. You may have to change the interface of some functions to make this work.

Example

This project, the same one from the previous example, also has this problem. The car naming page chooses a random adjective and noun then joins them together into one name. This was implemented with 2 constants - one for adjectives and one for nouns - and 2 variables - one for the index in the list of adjectives for the random adjective and the same thing for the list of nouns.

var adjectiveList = [ "Electric", "Ford", "Hybrid" /* --snip-- */ ];
var nounList = [ "2,000", "Tundra", "3.0" /* --snip-- */ ];

var adjectiveIndex = 0;
var nounIndex = 0;

// --snip--

onEvent('randomizeButton', 'click', function () {
    nounIndex = randomNumber(0, nounList.length - 1);
    adjectiveIndex = randomNumber(0, adjectiveList.length - 1);
    updateScreen();
});

function updateScreen() {
    var adjective = adjectiveList[adjectiveIndex];
    var noun = nounList[nounIndex];
    var carName = adjective + '\n' + noun;
    setProperty('carName', 'text', carName);
}

We set nounIndex and adjectiveIndex to new values and call updateScreen when the randomize button is clicked. updateScreen reads the indexes, indexes the appropriate list to get an item, then joins them into a name. It would be better to do one of two things:

Put the computation of the list indices into updateScreen (which should have its name changed to something like updateName) and have that be the callback for the click event on randomizeButton. This stops us from having to worry about what computation is actually happening, but that naturally means we have no control over generation and determination.
Modify the interface of updateScreen to receive the appropriate indices (or values) as arguments. This lets us have control over what we send to the function and choose different determination rules under different circumstances, but we have to take care of the determination of argument values manually.

I choose to take option one because there is only ever one method for determining values (generate a random number within the bounds of the list for both lists, then index the lists). I made an abstraction for this idea, then implemented option one.

(function initExtensions() {
    Object.defineProperties(Array.prototype, {
        randomIndex: {
            value: function () {
                return randomNumber(0, this.length - 1);
            },
        },
        randomItem: {
            value: function () {
                return this[this.randomIndex()];
            },
        },
    });
})();

var CAR_NAME_PROPERTIES = new MultiPropertyItem({
    adjectives: ['Electric', 'Ford', 'Hybrid' /* --snip-- */ ],
    nouns: ['2,000', 'Tundra', '3.0' /* --snip-- */ ],
});

// --snip--

onEvent("randomizeButton", "click", updateCarName);

function updateCarName() {
    setProperty("carName", "text", generateCarName(CAR_NAME_PROPERTIES));
}

function generateCarName(carProperties) {
    return carProperties.pickProperties().join('\n');
}

function MultiPropertyItem(namesToProperties) {
    var _this = this;

    this._properties = namesToProperties;
    this.pickProperties = function () {
        return Object.keys(_this._properties)
            .map(function (i) {
                return _this._properties[i];
            })
            .map(function (i) {
                return i.randomItem();
            });
    };
}

There is a bit more baggage with this method, but the initExtensions IIFE is intended to be in a library, and this is generally much easier to change, as we only need to add a new key-value pair to the object we create a new MultiPropertyItem from.

In general, representing what you can with data makes things your behavior easier to change, because data can be switched, changed, and extended at runtime, while behavioral statements can only be changed when the program is written. When using data this way, though, try to keep as much data constant and immutable as possible. Your program will be easier to reason about when you do not have to worry as much about how functions change the state of the program, and instead think about how you can explicitly pass the data into functions and receive data from them (that is, favor pure functions).

Inappropriate scope

Explanation

In a number of projects I've seen written by students, variables are used with an unnecessarily large scope. When variables are given a larger scope than is required for the program to work properly, it's much more difficult to reason about where those variables are used.

In general, prefer smaller scopes for your variables. It's not so bad to have constants with a large scope, but still try to keep the scope small to avoid confusion. It's much harder to reason about mutable variables with larger scopes because you must consider where they are changed as well as read, and you also need to think about the order in which things happen so the variable has a valid value. You have to do that for every variable, so try to make it easy on yourself and limit the scope. It's usually easy to extend scope, but more difficult to limit it, so try to do it early!

How To Rectify

First, you need to recognize if the scope for a certain variable is too large. My rule of thumb is that if not everything (functions, classes, other variables) in the scope uses the variable, the scope may be too large. That doesn't automatically make a problem, but when many variables break this rule, especially mutable ones, it is likely a problem.

If you found that the scope for a variable was too large, find all the uses of the variable (modern IDEs make this task trivial) and choose the smallest scope possible among all the uses. If there's any number of uses, but they are only in one function, declare the variable inside of the local scope of that function. If there are uses in multiple functions, consider grouping those functions into methods of a class (this may easily violate another principle, so be careful). There is usually a proper way to shrink the scope of a variable.

Example

This project is a "bird finder" program, made to show all birds that have a certain color and conservation status. Unfortunately, it suffers from this design problem. Here is a snippet of the heart of the program - the part that determines all the birds that fit the chosen criteria (having some chosen color, or a random one, and some chosen conservation status). The function that does this is called sort - this is a bad name. I would have called it, in its current form, populateQualifyingBirds or something similar (because finalBird is also a bad name).

var finalBird = [];

// --snip--

var color;
var bStatus;
var birdName;
function sort(choice1, choice2) {
    finalBird = [];
    for (var i = 0; i < nameList.length; i++) {
        color = birdColor[i];
        bStatus = birdStatus[i];
        birdName = nameList[i];
        if (choice1 == 'random' && bStatus == choice2) {
            // --snip--, random color decision here.
        }
        if (color == choice1 && bStatus == choice2) {
            appendItem(finalBird, birdName);
        }
    }
}

function update() {
    setText("text_area5", finalBird + " ");
}

As you can see, color, bStatus, and birdName are scoped globally, but they are never used outside of sort. Therefore, they are inappropriately scoped, and should be moved into sort. It's a little different with finalBird - that variable is used in both sort and update.

Before I talk about my solution, I have to show my different representation for the bird records. Here is how I define the bird records - a custom object for every bird, easy to access and work with.

var allBirds = readRecordsSync('100 Birds of the World').map(function (i) {
    return {
        name: i.Name,
        color: i['Primary Color'],
        status: i['Conservation Status'],
        toString: function () { return this.name; },
    };
});

Now, the way I solved this problem was, in essence, write a function that takes a color (which could be 'random', in which case, a random color is chosen) and conservation status, and returns the list of qualifying birds, determined from those inputs.

function createQualifyingBirdsList(colorChoice, conservationStatusChoice) {
    if (colorChoice === 'random')
        colorChoice = [
            'Black',
            'Brown',
            'Yellow',
            'Blue',
            'White',
            'Grey',
            'Green',
            'Red',
            'Pink',
        ][randomNumber(0, 8)];

    return allBirds.filter(function (bird) {
        return colorChoice === bird.color && conservationStatusChoice === bird.status;
    });
}

Then, I have a function makeUpdater that returns a function that finds the appropriate birds and updates the bird text area when invoked.

function makeUpdater(isUpdateRandom) {
    return function () {
        var colorOrRandom = !isUpdateRandom ? getText('dropdown1') : 'random';
        var qualifyingBirds = createQualifyingBirdsList(colorOrRandom, getText('dropdown2'));
        setText('text_area5', qualifyingBirds.map(function (i) { return i.toString(); }).join(', '));
    };
}

Now that we aren't worrying about storing a variable somewhere and using it in two separate places. We are instead leveraging return values as opposed to the side effects of state modification, and this makes programs more robust, easier to change, and easier to reason about. This also solves our problem with inappropriate scope. In fact, the only global variable is allBirds, which holds all the records on birds, and is also a constant. There are no more problems with inappropriate scope in this program, and now it is much easier to change later.

Conclusion

By designing your programs properly, you can save yourself a lot of work later, maybe even at that time. Programming was made so people could take it easy while computers do the repetitive work, so why make things harder on yourself when writing programs?

It may seem unnecessary at first, but after you get used to writing programs that are properly designed, you'll see that it pays off in ease-of-change, understandability, and maintainability, and you'll be glad that you did.