TDD and Behavioral Testing in Go: Mocking Considered Harmful

Kyodo Tech
Sep 27, 2023
Art via https://quii.gitbook.io/learn-go-with-tests/

Test-Driven Development (TDD) promises robust, error-free code, but its core principles are often misunderstood or ignored, especially in the Go ecosystem. TDD gained prominence with Kent Beck’s 2002 book of the same name and provides a framework for crafting resilient software systems. Yet its essence, behavioral testing, is frequently sidelined by an industry focus on rapid development and superficial performance indicators. Observing TDD in practice over the years reveals persistent misunderstandings and reservations; long tenure at a single company offers a rare opportunity to witness the long-term impact of our approaches, including the problems around testing. This article explores the nuances of Test-Driven Development in Go, emphasizing the importance of behavioral testing and cautioning against the pitfalls of mocks, for a more resilient and maintainable codebase.

The Interlace of Testing and Design

“A software system can best be designed if the testing is interlaced with the designing instead of being used after the design.” — Alan Perlis, NATO conference on Software Engineering, 1968.

This insight, articulated over half a century ago, anticipates the core tenets of Test-Driven Development (TDD), effectively prefiguring the methodology three decades before its formalization.

Test-Driven Development (TDD) is grounded in three simple rules:

  1. Write production code only to pass a failing unit test.
  2. Write no more of a unit test than sufficient to fail.
  3. Write no more production code than necessary to pass the one failing unit test.

In TDD, tests serve dual functions: they shape the design and act as contracts that the code must meet. One key point to note is that adding a TDD test must be triggered by a new requirement, i.e. a new behavior, not merely the addition of new functions. Through iterative cycles, a just-in-time design takes form, progressively refining the software’s architecture and public API. TDD thereby introduces an element of emergence in software design.
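
As a minimal sketch of one such cycle (the package, import path, and function names here are hypothetical), a new requirement such as “greet a user by name” begins with a failing test, followed by just enough production code to make it pass:

// greet_test.go: written first; it fails because Hello does not exist yet.
package greet_test

import (
    "testing"

    "example.com/greet" // hypothetical import path
)

func TestHello(t *testing.T) {
    got := greet.Hello("Alice")
    if want := "Hello, Alice"; got != want {
        t.Errorf("got %q, want %q", got, want)
    }
}

// greet.go: the minimum production code that makes the test pass.
package greet

func Hello(name string) string {
    return "Hello, " + name
}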

We should also note that TDD is not monolithic; it has multiple stylistic schools. The style described in this article aligns with the “Classical” or “Detroit School” of TDD, which emphasizes real implementations and minimizes the use of mocks and stubs. This is in contrast to the “Mockist” or “London School,” which heavily employs mocks to isolate the unit of work from its dependencies. Given that the Classical approach allows for emergent design without extensive mocking, it is well-suited for those who prefer testing against real objects and wish to follow Alan Perlis’ principle of interwoven testing and design.

Skipping Tests for Speed?

Fred George introduced Programmer Anarchy in 2012, an enhanced form of Extreme Programming (XP) sometimes called Developer Driven Development. Around the same time, the Lean Startup became popular. One could argue that these developments have preoccupied the industry with speed and fast delivery, which often overshadows a focus on testing. While speed is critical, this mindset encourages the ‘duct tape programmer,’ an individual who prioritizes rapid delivery over code quality, leaving a maintenance burden for the rest of the team. This archetype can ship products, but often leaves behind a codebase that’s difficult to maintain.

One might assume that TDD, which requires an upfront investment in test writing, would slow down development. However, TDD acts as a safety net, catching regressions and errors early, which in the long run speeds up development by reducing the debugging time.

The cost of backtracking is enormous when you make a wrong turn in your architecture or code structure, especially if the mistake is discovered late in the development process. The focus on testing and iterative design aims to minimize this cost by catching issues early.

Behavioral Testing in Go

Testing behavior translates to focusing on what the software is supposed to achieve — its contract with the external world. This contract is usually manifested through the public API, making it the focal point of TDD.

Go packages serve as modular units of code, each focusing on a single responsibility. This design is reminiscent of the UNIX philosophy, where each tool performs one task and performs it well. Because each package accomplishes one specific task, packages can be composed in various ways to achieve complex functionality.

Analogously, the hexagonal or ports-and-adapters architecture allows for easy swapping of components, making the application more adaptable for tests, different user interfaces, or varying business rules. Given this philosophy, “utilities” contain no business logic and should not become a catch-all package; place them in a package whose name clearly denotes its supportive role to other packages, avoiding generic names like util, common or server. Such packages should still be behavior-oriented, or more precisely, purpose-oriented. A package named ‘utils’ lacks a specific behavior, contradicting the principle of behavioral focus.

  • test what the package does, not how it does it
  • test the behavior, not the implementation
  • test the public API

Public API, the Stable Contract to Test

The public API — those functions, methods, and types that are exported from a package — serves as the stable contract. Tests should be written against these APIs to ensure that they behave as expected. By doing so, you’re not just validating that your package works correctly, but you’re also providing implicit documentation for how to use it.

Process-wise, this approach to testing also aligns well with the definition of done: if the requirements are fulfilled, the package is complete. We can refactor the underlying implementation details without breaking the API or the corresponding tests!
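
A brief sketch of what this looks like in Go (the price package and its helpers are illustrative, not from a real codebase): the test lives in an external _test package and exercises only the exported function, so the unexported helpers behind it can be rewritten freely without touching the test.

// price_test.go: compiled as a separate package, so it sees only the public API.
package price_test

import (
    "testing"

    "example.com/price" // hypothetical import path
)

func TestTotal(t *testing.T) {
    got := price.Total([]int{100, 250}, 0.1)
    if want := 385; got != want {
        t.Errorf("Total = %d, want %d", got, want)
    }
}

// price.go: Total is the stable contract; sum and applyTax are internals
// that can change during refactoring without breaking the test above.
package price

func Total(cents []int, taxRate float64) int {
    return applyTax(sum(cents), taxRate)
}

func sum(cents []int) int {
    total := 0
    for _, c := range cents {
        total += c
    }
    return total
}

func applyTax(subtotal int, rate float64) int {
    return subtotal + int(float64(subtotal)*rate)
}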

Coupling Kills Software

Tight coupling between components makes software difficult to understand, test, and maintain. The more interconnected the components are, the harder it becomes to change one part without affecting others. This is why principles like the Single Responsibility Principle (SRP) and Interface Segregation Principle (ISP) are emphasized in software design — to reduce coupling.

Inversion of Control (IoC) shifts the responsibility of object creation and dependency management from the application to a container or framework. While IoC has its merits, its introduction in tests often signals design complications. If your tests require complex IoC setups akin to your application, you’re likely complicating the test environment. Such complexity dilutes the benefits of TDD by making the tests more challenging to write, understand, and maintain.

Example of IoC in Tests

Dependency Injection (DI) is a common form of Inversion of Control (IoC). It involves passing dependencies into an object rather than creating them within the object. Here’s a simple Go example illustrating IoC in tests:

type Database interface {
    Query(string) (string, error)
}

type RealDatabase struct{}

func (rd *RealDatabase) Query(q string) (string, error) {
    // Real query logic here
    return "real data", nil
}

type FakeDatabase struct{}

func (fd *FakeDatabase) Query(q string) (string, error) {
    return "fake data", nil
}

type Service struct {
    db Database
}

func NewService(db Database) *Service {
    return &Service{db: db}
}

func (s *Service) GetData(query string) (string, error) {
    return s.db.Query(query)
}

// In your tests
func TestGetData(t *testing.T) {
    fakeDB := &FakeDatabase{}
    service := NewService(fakeDB)

    data, err := service.GetData("some query")
    if err != nil || data != "fake data" {
        t.Fatalf("Expected fake data, got %s", data)
    }
}

Database here is an interface with a Query method. RealDatabase and FakeDatabase are implementations of this interface. The Service struct depends on Database, but rather than creating a database connection itself, it accepts an implementation of Database as a parameter. In the test TestGetData, a FakeDatabase is passed to NewService, demonstrating IoC in the test setup. This makes the test easier to understand and maintain.

Mocks Considered Harmful

Mocks in TDD can create a misleading parallel universe, deviating from real-world behavior and becoming a maintenance burden. This misleading nature extends to the mental models that developers build. Specifically, mocks can create false perceptions of how different parts of the system interact, leading to misunderstandings that manifest as design flaws or bugs later in the development process.

For example, using a mock database that returns hard-coded query results can lead to tests that pass in isolation but fail when integrated with real database logic. In tests that employ mocks, the System Under Test (SUT) often becomes tightly coupled with the mock objects, leading to brittle tests. Mocks often require us to specify the interactions between the SUT and its dependencies in fine detail. This means your tests become sensitive to how the SUT achieves its functionality, not just what it achieves. If you change the internal workings of your SUT — while keeping its behavior the same — your mocks can start to fail, even though from a functional perspective, everything is fine.

Instead, use real implementations or fakes that fulfill the same contracts as your real objects, thereby adhering to the principles of behavioral testing. The line between fake clients and mocks can sometimes be blurry, but a key distinction is that fake clients are full implementations of an interface that mimic real-world behavior, whereas mocks typically simulate responses for specific method calls without encompassing full behavior. Mocks are often generated dynamically during tests, creating tight coupling, while fake clients are usually static and reusable.

Fakes don’t require setting expectations and are not designed to verify interactions. Fakes let you test the behavior, not the interactions. The focus moves away from “how the SUT interacts with its dependency” to “what the SUT does”, making your tests less brittle and easier to understand.
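
To make the distinction concrete, here is a hand-rolled sketch (the Store types are hypothetical and not taken from any mocking library): the mock records interactions for the test to assert on, while the fake provides a working in-memory substitute so the test can assert on the outcome instead.

// Store is the contract both test doubles satisfy.
type Store interface {
    Save(id string) error
}

// MockStore records how it was called; tests asserting on SaveCalls become
// coupled to the SUT's interaction with its dependency.
type MockStore struct {
    SaveCalls []string
}

func (m *MockStore) Save(id string) error {
    m.SaveCalls = append(m.SaveCalls, id)
    return nil
}

// FakeStore fulfills the same contract with a real, in-memory implementation;
// tests assert on the resulting state rather than on the calls made.
type FakeStore struct {
    items map[string]bool
}

func NewFakeStore() *FakeStore {
    return &FakeStore{items: make(map[string]bool)}
}

func (f *FakeStore) Save(id string) error {
    f.items[id] = true
    return nil
}

func (f *FakeStore) Has(id string) bool {
    return f.items[id]
}

A mock-based test typically asserts on the interaction, e.g. `len(mock.SaveCalls) == 1`, which breaks as soon as the SUT changes how it talks to the store; a fake-based test asserts on the outcome, e.g. `fake.Has("order-1")`, which only breaks when the observable behavior changes.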

Example: Third-Party API Package

A more complex example may highlight better how to approach TDD. When dealing with API clients that use both HTTP and WebSockets, treat them as separate components with distinct responsibilities. The HTTP client handles RESTful API calls, while the WebSocket client manages real-time data streams. Both should be abstracted behind interfaces to facilitate testing.

Pitfalls to Avoid:

  • Coupling Logic: One common mistake is to couple the logic for handling HTTP and WebSocket communication tightly within the same component. This makes it difficult to test and maintain.
  • Ignoring Behavior: Don’t focus on testing implementation details, which leads to brittle tests. Instead, focus on the behavior of the system.
  • Mocks: Reliance on mocks for dependencies can result in fragile tests that break due to changes in implementation details.

Recommendations:

  • Fake Implementations: Use fake clients that adhere to the same interface as your real clients. The fake HTTP client could use a simple in-memory map to simulate CRUD operations, while the fake WebSocket client could use Go channels to simulate real-time data.
  • Isolation: We place the fakes in the internal/ directory to ensure that they can’t be imported and misused by external code. This aligns with the principle of encapsulation. The real implementations against the API reside in the root package or a dedicated sub-package, depending on the complexity. The root package then exposes the primary interface and types.
  • Black-box tests: To ensure that our tests only use the package’s public API, thereby emulating how a third party would use the package, employ _test packages. This isolates the tests from the package’s internal details and ensures that we don’t inadvertently access unexported functions, variables, or types.

my-api-client/
├── examples/
│   └── main.go
├── internal/
│   └── fakes/
│       ├── http_fake.go
│       └── ws_fake.go
├── client.go           # Primary interface and types
├── client_test.go      # Tests for the primary interface
├── httpclient.go       # HTTP client implementation
├── httpclient_test.go  # Tests for HTTP client
├── wsclient.go         # WebSocket client implementation
├── wsclient_test.go    # Tests for WebSocket client
├── README.md
├── go.mod
└── go.sum

client_test.go

// TestSyncClient_Connect exercises Connect and Subscribe through the public API.
func TestSyncClient_Connect(t *testing.T) {
    fakeHTTP := &fakes.FakeHTTPClient{}
    fakeWS := fakes.NewFakeWSClient([]string{"Initial Fake Data", "Updated Fake Data"})

    ctx := context.Background()

    client := mypackage.NewSyncClient(fakeHTTP)
    ws, err := client.Connect(ctx, fakeWS)
    if err != nil {
        t.Fatalf("Failed to initialize: %v", err)
    }

    dataCh := make(chan string)

    go ws.Subscribe(ctx, dataCh)

    // ... asserts
}

Testing the public interfaces only and testing private functions implicitly is generally sufficient. The internal methods get covered indirectly when testing the public methods. This is known as “black-box” testing and is perfectly acceptable.

In tests involving third-party APIs, emphasize the expected state changes and behaviors within your package. If calling a WebSocket API is supposed to change the state of an object within your package, your tests should focus on verifying that state change.

If we can’t interact with a staging environment, capturing real payloads and using them in tests is a good approach. This ensures that our code can handle the kinds of data it will encounter in production.
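
One way to do this (the file name, import path, and test below are illustrative) is to commit captured payloads under testdata/, which the Go toolchain ignores when building, and replay them through the fake WebSocket client:

package mypackage_test

import (
    "os"
    "testing"

    "my-api-client/internal/fakes" // hypothetical module path; the fakes package is shown below
)

func TestHandlesCapturedPayload(t *testing.T) {
    // A payload captured once from the real API and committed to the repository.
    payload, err := os.ReadFile("testdata/ticker_update.json")
    if err != nil {
        t.Fatalf("reading captured payload: %v", err)
    }

    fakeWS := fakes.NewFakeWSClient([]string{string(payload)})

    // ... construct the client with fakeWS and assert on the resulting state.
    _ = fakeWS
}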

client.go

// HTTPClient defines the contract for fetching data over HTTP.
type HTTPClient interface {
    Get() (string, error)
    // AuthToken returns the token used to initialize the WebSocket subscription.
    AuthToken() string
}

// WSClient defines the contract for WebSocket subscriptions.
type WSClient interface {
    Init(authToken string) error
    Subscribe(ctx context.Context, messages chan<- string)
}

type SyncClient struct {
    client HTTPClient
}

func NewSyncClient(client HTTPClient) *SyncClient {
    return &SyncClient{
        client: client,
    }
}

type AsyncClient struct {
    ws WSClient
}

func (c *SyncClient) Connect(ctx context.Context, ws WSClient) (*AsyncClient, error) {
    ac := &AsyncClient{
        ws: ws,
    }

    if err := ws.Init(c.client.AuthToken()); err != nil {
        return nil, fmt.Errorf("error initializing subscriber: %w", err)
    }

    return ac, nil
}

func (c *AsyncClient) Subscribe(ctx context.Context, messages chan<- string) {
    // ...
}

Aligned with the Go philosophy of composition over inheritance, the `AsyncClient` composes behavior from the HTTP and WebSocket clients: the `SyncClient` uses its `HTTPClient` to initialize the `WSClient` that the `AsyncClient` then wraps.

Further, to respect separation of concerns, ensure that the async client, i.e. the `WSClient`, doesn’t handle message decoding. Create a Decoder component and interface for this; it makes the system easier to reason about and test.
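
A minimal sketch of such a Decoder (the interface shape and the Message type are assumptions for illustration, not part of the package above):

import (
    "encoding/json"
    "fmt"
)

// Message is a placeholder for whatever domain type the package exposes.
type Message struct {
    Symbol string
    Price  float64
}

// Decoder turns raw WebSocket payloads into domain messages. Keeping it
// separate from WSClient lets each component be tested in isolation.
type Decoder interface {
    Decode(raw string) (Message, error)
}

// JSONDecoder is one possible implementation.
type JSONDecoder struct{}

func (JSONDecoder) Decode(raw string) (Message, error) {
    var m Message
    if err := json.Unmarshal([]byte(raw), &m); err != nil {
        return Message{}, fmt.Errorf("decoding message: %w", err)
    }
    return m, nil
}

With this split, the `WSClient` only moves raw strings over the channel, while the `Decoder` owns the parsing rules and can be tested with captured payloads alone.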

internal/fakes/http_fake.go

// FakeHTTPClient is a fake HTTPClient for testing.
type FakeHTTPClient struct{}

func (f *FakeHTTPClient) Get() (string, error) {
    return "Initial Fake Data", nil
}

func (f *FakeHTTPClient) AuthToken() string {
    return "fake-token"
}

internal/fakes/ws_fake.go

// FakeWSClient is a fake WSClient for testing.
type FakeWSClient struct {
    autoMessages []string
}

func NewFakeWSClient(autoMessages []string) *FakeWSClient {
    return &FakeWSClient{
        autoMessages: autoMessages,
    }
}

func (f *FakeWSClient) Init(authToken string) error {
    return nil
}

func (f *FakeWSClient) Subscribe(ctx context.Context, messages chan<- string) {
    go func() {
        for _, msg := range f.autoMessages {
            messages <- msg
        }
    }()
}

The `FakeWSClient` design decouples the fake’s behavior (the messages it emits) from its implementation. This allows injecting data in tests and keeps them focused and clear.

In summary, avoid tightly coupling HTTP and WebSocket logic, ignore implementation details in your tests, and be cautious when considering the use of mocks. Notice that the `HTTPClient` and `WSClient` interfaces are minimal, containing only the methods that the respective clients use, so no extra methods are thrust upon implementers. This abides by the Interface Segregation Principle (ISP). In the code layout, the HTTP client and WebSocket client live in separate files, each focused on its own responsibility (HTTP and WebSocket communication, respectively). This aligns with the Single Responsibility Principle (SRP).

Real API and Networking Code

A question you may ask yourself is when to implement the real connectors. Once the interfaces are fairly stable and we have a good set of behavioral tests exercising them through our Client, it is time to implement a real client.

While unit tests may not be feasible for these components, we can still write integration tests that run against real services. Another option is to wrap `http.Client` and `websocket.Conn` in our own interfaces and provide fakes for them. We prefer an example client that runs against the real environment. Even when we have tested against real data, networking remains a component that needs fine-tuning in production.
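
As a sketch of that second option (Doer and realHTTPClient are illustrative names, not part of the package above), the real client can depend on the one method of `*http.Client` it actually needs, leaving only a thin seam untested by unit tests:

import (
    "io"
    "net/http"
)

// Doer captures the single method of *http.Client that the real client relies on,
// so tests can substitute a fake without any real networking.
type Doer interface {
    Do(req *http.Request) (*http.Response, error)
}

// realHTTPClient implements the package's HTTPClient interface on top of a Doer.
type realHTTPClient struct {
    doer    Doer   // *http.Client in production, a fake in tests
    baseURL string
}

func (c *realHTTPClient) Get() (string, error) {
    req, err := http.NewRequest(http.MethodGet, c.baseURL+"/data", nil)
    if err != nil {
        return "", err
    }

    resp, err := c.doer.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    return string(body), nil
}

// AuthToken and the WebSocket counterpart are omitted for brevity.

Because `*http.Client` already provides `Do(req *http.Request) (*http.Response, error)`, production code can pass it in directly, while tests supply a fake `Doer` that returns canned responses.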

Returning Structs vs. Interfaces

Structs as return types make sense when the implementation is not expected to vary. Interfaces are useful when you want to define a contract that multiple types can satisfy. `SyncClient` and `AsyncClient` are the primary entry points, and unless you foresee multiple implementations for these, returning structs is appropriate.

Testing Behaviors and the Difference from BDD

BDD (Behavior-Driven Development) is a methodology that extends TDD (Test-Driven Development) to include considerations about the behavior of a system from the user’s standpoint. BDD often employs natural language constructs to describe the expected behavior, commonly seen in tools like Cucumber. BDD is essentially a specialized variant of TDD that emphasizes collaboration among developers, QA, and non-technical participants in a software project. It encourages teams to use a shared language and constructs to discuss and document how the application should behave. While TDD focuses on what the individual units of code should do, BDD focuses on how different parts of the application should behave in terms of business logic or user interaction. The methods used to test these behaviors are still rooted in TDD; BDD simply refines the focus and context of those tests. In that sense, you could consider BDD a mental model or a specific approach within the broader TDD methodology.

Conclusion

TDD has come a long way, and its essence often gets lost in the noise of speed and quick deliveries. When applied with a focus on behavioral testing, it yields robust, maintainable code. The methodology serves as a safety net, enhancing long-term development speed, and prepares you for future challenges by aiding the creation of software that can withstand change until the next inevitable refactoring.
