If you want to change something, hold it in place.
This advice goes far beyond software. If you've ever tried to cut food on a slippery surface you know that it is good to hold it with a fork.
In software we've learned this lesson well. We write tests to hold down behavior. If behavior changes counter to our expectations, they fail.
Tests provide fixity. They allow us to change only what we want to change. It's no surprise that we take them very seriously. We use them as documentation and we run them before deployment to make sure everything is ok. Our tests also describe our intentions. In most development organizations, they are like load-bearing structure in a building. We depend upon them. Deleting a test is nearly unthinkable.
This appears to be where we are right now in software development. We place incredible value on our tests. Sometimes they seem more important than the code. Unfortunately, I think that this stance obscures an insight that can be very helpful.
I'll explain, but before I do let’s examine an under-appreciated aspect of software systems.
Intentions Fade
The tests we write and the names we use in our code describe our intentions. They tell the story of what we intended when we were programming. How long is this information useful?
Our hope is that it is useful for the system's entire lifetime. And, in the best cases, it is. Unfortunately, team churn and business changes can decrease the value of that information over time. Intentions Fade. Reading code that was written more than 5 years ago by a different team involves a lot of guessing. We ask ourselves questions like: "Why did they want to do that?" or "What was the business context that led to this code?" Often it is very hard or impossible to get answers.
Intention Fade is another form of technical debt, a way that systems can become less understandable. We can refactor to keep it at bay, but we have more tools than we may think.
Enter Fixity
Remember fixity, that quality that tests give us? It turns out that code has fixity also. If we hold our code constant, we are free to change our tests however we like.
fixity - the quality or condition of being fixed and immovable.
Here is the way we often think about tests and code:
We think that tests cover code, but in actuality tests cover behavior. They cover it through code. This means that code "covers" behavior as well. In fact, if we are lucky, code determines most of our system's behavior. It also fixes it; it holds it in place.
The nice thing about this is that we can fix behavior through our tests or we can fix it through our code. Both sides work.
If we hold our code constant, behavior is fixed and we can have our tests in flux. If we hold our tests constant, we can make any change to code that we like as long as it doesn't change behavior.
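A minimal sketch of the second case, in Python. The function names and the shipping rule are hypothetical, invented only for illustration. The checks are held constant while the code beneath them is restructured.

```python
# Hypothetical example: the names and the pricing rule are invented.

def shipping_cost_v1(weight_kg):
    # Original structure: nested conditionals.
    if weight_kg <= 1:
        return 5
    else:
        if weight_kg <= 10:
            return 5 + (weight_kg - 1) * 2
        else:
            return 5 + 9 * 2 + (weight_kg - 10) * 1


def shipping_cost_v2(weight_kg):
    # Refactored structure: the same behavior expressed as banded rates.
    base = 5
    mid_band = min(max(weight_kg - 1, 0), 9) * 2
    top_band = max(weight_kg - 10, 0) * 1
    return base + mid_band + top_band


def check_behavior(shipping_cost):
    # These checks stay fixed; they pin behavior, not structure.
    assert shipping_cost(1) == 5
    assert shipping_cost(4) == 11
    assert shipping_cost(10) == 23
    assert shipping_cost(12) == 25


check_behavior(shipping_cost_v1)
check_behavior(shipping_cost_v2)
```

The reverse direction works the same way: leave `shipping_cost_v1` untouched and the checks themselves can be rewritten, renamed, or moved to a different level without risk to behavior.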
This relationship holds, but it can go further. I think that code is really the ground truth of a system. It describes what the system actually does rather than how it was intended to behave.
If intentions fade, we can hold behavior in place by not changing the code while we rewrite tests. We can also hold behavior in place with tests as we change names in the code to make today's intention clear.
A Different View of Tests
Many teams I've visited feel locked down by their tests. They know that they need them, but they realize that their tests don’t just fix behavior in place, they also fix structure in place. It's hard to refactor the boundary that you are testing through.
If we understand fixity and take it seriously, we realize that we can interrogate the behavior of the system any time we want to and record it as a description of ground truth.
Ground truth - information that is known to be real or true, provided by direct observation and measurement (i.e. empirical evidence) as opposed to information provided by inference. Wikipedia (9/7/2023)
Fortunately, writing tests for existing code is easy. We just ask questions of the code while we aren't changing it. That period of time, often on a branch, holds behavior constant. The questions and their answers become tests. We can keep them as long as they are useful and we can delete them and rewrite them at a different level if they get in the way.
We can look at tests as descriptions with varying lifetimes. Some may last for years and others may last for hours. They are useful as long as they facilitate change and understanding.
Coda
Sometimes I'm asked what I think of approval testing frameworks. In general, I like them. They are very much like something I prototyped a long time ago named Vise. The idea was to run the code once with embedded calls that save values. Every subsequent run checked against those values. It's a great way of holding behavior in place during refactoring.
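The record-then-check idea can be sketched roughly as follows. This is not the actual Vise API; the `Vise` class, its `grip` method, the JSON file format, and `total_price` are all assumptions made for illustration.

```python
import json
import os
import tempfile


class Vise:
    """Hypothetical recorder: the first run saves values, later runs check them."""

    def __init__(self, path):
        self.path = path
        # If no recording exists yet, this run records; otherwise it checks.
        self.recording = not os.path.exists(path)
        self.saved = {}
        if not self.recording:
            with open(path) as f:
                self.saved = json.load(f)

    def grip(self, label, value):
        # Embedded call: record on the first run, compare on later runs.
        if self.recording:
            self.saved[label] = value
        elif self.saved.get(label) != value:
            raise AssertionError(
                f"{label}: expected {self.saved.get(label)!r}, got {value!r}"
            )
        return value

    def close(self):
        # Persist the recording so subsequent runs can check against it.
        if self.recording:
            with open(self.path, "w") as f:
                json.dump(self.saved, f)


def total_price(quantities, unit_price, vise):
    # Code being refactored (invented), with grip() calls at points of interest.
    subtotal = vise.grip("subtotal", sum(quantities) * unit_price)
    discount = vise.grip("discount", subtotal // 10 if subtotal > 100 else 0)
    return vise.grip("total", subtotal - discount)


def run_once(path):
    vise = Vise(path)
    result = total_price([3, 4, 5], 10, vise)
    vise.close()
    return result


snapshot_path = os.path.join(tempfile.mkdtemp(), "vise.json")
first = run_once(snapshot_path)   # first run records the values
second = run_once(snapshot_path)  # second run checks against them
```

If a refactoring of `total_price` changes any gripped value, the next run raises an `AssertionError` naming the label that moved.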
The only issue I have with these tools is that there's often the chance to do a bit more.
When I characterize existing code, I look at it as a process of discovery. I start with an empty test and name it 'x'. Then I type a scenario using existing methods that I am curious about, and compare its result against some dummy value like 0 or the empty string. The test usually fails. When it does, I take the actual value that the test discovers and make it the expected value so that it passes. Then I change the name of the test from 'x' to a phrase that describes the behavior I've learned. It's often right at that moment that I can discern an intention and run it past others for validation.
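Those steps might look like this with Python's built-in unittest module. The `format_name` function is a hypothetical stand-in for existing code; the test shows the end state, with the starting point preserved in a comment.

```python
import unittest


def format_name(first, last):
    # Stand-in for existing code we want to understand (hypothetical).
    return f"{last.upper()}, {first[:1]}."


class FormatNameCharacterization(unittest.TestCase):
    # Step 1 (not kept): the test started life as
    #     def test_x(self):
    #         self.assertEqual("", format_name("Ada", "Lovelace"))
    # It failed, and the failure message revealed the actual value.

    # Step 2: the observed value becomes the expectation, and the test is
    # renamed from 'x' to a phrase describing what was learned.
    def test_uppercases_last_name_and_abbreviates_first(self):
        self.assertEqual("LOVELACE, A.", format_name("Ada", "Lovelace"))


# Run the suite explicitly so the sketch works when pasted into any script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatNameCharacterization)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The new name is the part worth showing to others: it is a claim about intention that someone with more context can confirm or correct.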
In short, I like approval tests to fix behavior in place for quick refactoring and framework-less characterization testing when I want to discover and document. They are two avenues toward the same goodness.