Test-induced design damage in Clojure

June 22, 2022

Writing tests is the software equivalent of doing the dishes: unloved, but necessary. Unfortunately, because language designers are almost never interested in testing, it's usually a second-class concept, assuming the language even bothers to treat it as anything different from the rest of code in the first place. This is unfortunate, because it can have detrimental consequences to your code design.

Test-Induced Design Damage (TIDD) is not a new concept. DHH of Rails wrote about TIDD back in 2014. This is not even a new concept for Clojure, as Eric Normand wrote a newsletter about it in the past. Unfortunately, Normand's post didn't have the impact I'd hoped for, nor did it go into enough detail for me, so I'm going to try and give examples that will help people understand the issue and the trade-offs a bit better.

What is test-induced design damage?

I suggest you read DHH's post above, but in short, it's altering code to better support tests at the expense of other aspects of the system. Communities like Test-Driven Development (TDD) have taken it as a priori gospel that better testing is a primary goal, while downplaying the consequences. As they say, software developers know the value of everything and the cost of nothing.

For example, extracting hidden/private/closed-over code so it can be mocked for testing also carries detriments like requiring names (it usually can't be anonymous any more), cluttering docs, expanding argument lists (since support objects must be passed-in or injected), potential misuse (if users can now directly access/create/use an object when they shouldn't), and indirection overhead (both mental and code-based).

To be clear: alterations that support testing may have other benefits that justify their use, but this needs to be evaluated on a case-by-case basis, rather than assuming improved testing is a sufficient justification itself.

It's even possible that complicating your code to support testing actually increases the number of bugs, despite more testing. This is because, all other things being equal, larger codebases have more bugs¹.

The rest of this post will look at changes made solely for testing. I'll show you some examples of TIDD, in order of increasing complexity.

Examples

Imagine you have some function that takes too long to use in local tests. Maybe it makes a network call that takes a while.

Let's say you made a plain function:

(defn my-fn 
  []
  (call-prod-endpoint))

;; usage
(my-fn)

You decide you need to mock it for testing, so what are your options?

Redefining via `with-redefs` or `with-redefs-fn`

Using with-redefs, you can temporarily replace the root definition for testing without touching the original code at all (alter-var-root can work, too, though it's more cumbersome to use). This sounds like the perfect way to leave non-testing code clean, right? Eric Normand suggested this in his original newsletter.

(defn my-fn 
  []
  (call-prod-endpoint))

;; test
(deftest redef-ing-my-fn
  (with-redefs [my-fn #(call-mock-endpoint)]
    (is (= (my-fn) some-expected-result))))

Unfortunately, with-redefs requires a lot of care with multi-threaded / lazy code, since the var root definition is changed for all threads for a limited time. Code in other threads that run after the with-redefs ends can easily use an unintended value. Tim Baldridge wrote a long post on how vars work under the hood and why redefs can be tricky, and it's worth reading before using functions like binding/with-redefs in any context.

You could safely use with-redefs if you can guarantee all of the following:

Don't run multiple tests simultaneously - slower unit testing is the price
Don't rely on background threads - these are fragile anyway, and create timing concerns even if you don't redef anything
Wrap the entire body of the test in with-redefs - you don't need to worry about lazy evaluation happening after with-redefs ends if you've already forced the values you need
Ensure you always join with other threads before exiting the with-redefs if those threads do anything an assertion relies on - this may require code distortion itself

These are a lot of constraints. #1 is undesirable if easy, but ensuring #2 and #4 may range from annoying to infeasible without altering our main code, which violates the goal of avoiding TIDD.

(There's also a long discussion about with-redefs in the Reddit comments on Eric Normand's original newsletter. Unfortunately, Eric's example involved a database connection, which inherently has state and was thus a stronger candidate for protocols/components, and many people latched onto that aspect instead of considering the bigger picture.)

Rebinding via `binding` or `with-bindings`

This is similar to the above, and works, but it requires you declare my-fn as ^:dynamic.

(defn ^:dynamic my-fn 
  []
  (call-prod-endpoint))

;; test usage
(deftest rebinding-my-fn
  (binding [my-fn #(call-mock-endpoint)]
    (is (= (my-fn) some-expected-result))))

On the upside, it only changes the local thread and its children's definition, so tests can run in parallel. Care must still be taken with background threads, but you should avoid those in tests anyway. For all other started threads, they'll carry the binding frame with them, even if the top thread ended, so it's much safer for multi-threaded code.

However, this is still a slight alteration of the code. Declaring it as ^:dynamic means it's slightly slower to execute in production code. Worse, it sends a false signal to users that they may need/want to rebind it. Plus, it suffers a variant of the expression problem, since you cannot mark outside vars as ^:dynamic without forking the code. (One can argue you shouldn't test outside code, but creating wrapper fns just to mock is TIDD again.) Still, this is almost the ideal solution, if not for ^:dynamic.²

Branch inside the function on a testing flag

This might be an option if you already use feature flags heavily. For testing, it would look something like:

(defn my-fn 
  []
  (if-not global.flags/is-testing?
    (call-prod-endpoint))
    (call-mock-endpoint))

Then you need to set global.flags/is-testing? only when testing. This keeps the function signature clean, but clutters the global namespace, complicates the function body, makes multiple mock behaviors difficult, and adds branching overhead.

You could also use compile-time constants or macros to make this pattern more efficient, but it would still be less flexible and cluttered.

Multimethods

What about polymorphism? You could make my-fn polymorphic with multimethods by dispatching based on whether you're running normally or for testing:


(defmulti my-fn (fn [type] type))

(defmethod my-fn :normal [_]
  (call-prod-endpoint))

(defmethod my-fn :test [_]
  (call-mock-endpoint))

;; usage
(my-fn :normal)

;; test usage
(deftest polymorphic-multimethod-test
  (is (= (my-fn :test) some-expected-result)))

The problem is you now have more code, and you have to weave the right dispatch value into all calls to my-fn (and possibly their parents), which alters the param signatures. You could set the dispatch value as a global var, but that has many of the same problems as internal branching does.

Which leaves protocols...

Protocols

The pattern I've seen the most in real Clojure code, and unfortunately, the most complicated option, is to replace plain functions with protocols and records.

(defprotocol MyProtocol
  (my-fn [_]))

(defrecord MyFunctionner []
  MyProtocol
  (my-fn [_]
    (call-prod-endpoint)))

(defrecord MyTestFunctionner []
  MyProtocol
  (my-fn [_]
    (call-mock-endpoint)))

;; non-default constructors are commonly added
(defn my-functionner []
  (->MyFunctionner))

(defn my-test-functionner []
  (->MyTestFunctionner))

;; usage 

(let [my-fn-er (my-functionner)]
  (my-fn my-fn-er))


;; add component deps for bonus points
(def system
 ...
 :my-functioner (my-functionner)
 :something-else (component/using
                   (something-else)
                   [:my-functionner ...]))

Protocols have the inherent problem of requiring state, since they can only be used with an object. Even if the type/record defines no state internally, lifecycle state itself must be taken into consideration. Unlike a function or multimethod, which is effectively available once its namespace is required, protocol functions cannot be used before an object is created or after it's destroyed. Plus, the object must be passed around everywhere it's used, cluttering up argument lists and adding to naming overhead everywhere.

For bonus complexity, non-default constructors are extremely common additions, and once people have a type/record with a lifecycle, they add it to their initialization system, so they end up writing a bunch of extra Component/Integrant/etc code to support it, too.

Is all this worth it? How many protocols have you seen that exist just to support testing and nothing else?

Solution: dynamic redef

The solution I've settled on is one created by Mourjo Sen and I think it deserves to be more widely known. It's encapsulated in a mini-library called dynamic-redef.

The basic idea is to mimic the propagated thread-local behavior of binding without having to declare anything ^:dynamic or mess with our main code. It uses alter-var-root to permanently replace the root definition of a function with one that looks up its current definition in a ^:dynamic map but falls back to the original definition if no overrides are found. Then "dynamically redefining" a function involves adding a new binding frame under the hood with updated fn definitions for the dynamic function lookup map.

Here's his original gist of the technique:

Advantages:

Allows you to leave your main code completely unaltered
Incurs no performance penalty in production code
Replaces definitions in a more thread-safe manner than raw with-redefs

Disadvantages:

Does not play well with background threads (though you should avoid those in tests when possible)
Like binding, does not work with plain Java threading, which doesn't use Clojure thread frames

Summary

This is not meant to eliminate testing-specific protocols/records, but to offer an option that's more suitable in some use cases. My personal "middle way" of testing is, examine the thing to be mocked and determine if it has inherent state. If so, it's probably a better fit for protocols. But if not, don't complicate your code just to test it. Give dynamic redef a try. It may be unfamiliar, but it's simpler than the alternatives when it fits.

Footnotes

Code Complete has some industry-generated estimates on bugs/LOC, but the much-discussed study, A Large-Scale Study of Programming Languages and Code Quality in Github, actually computed the overall effect of code size (independent of language) as a control variable. If you look at the discussion of the control variables in Table 6, "...they are all positive and significant". All else being equal, less code means fewer bugs.↩
Technically, you don't have to declare a var ^:dynamic to use binding on it. There's an undocumented .setDynamic method on vars, but to use this dark art successfully, you'd have to invoke it before the compiler gets to any call sites with the var. Otherwise, it'll compile a call to the root definition, and never check for binding frames. I've seen some code that claims to do this reliably via macros, but it doesn't seem to work for me.↩

Tags: clojure tidd redef tdd dynamic redef test

Undertale in EXAPUNKS

Arrhythmic Beats - IDM