Discovery Writing: Design as an Emergent Property of Code

Brandon Sanderson often says: "be a chef, not a cook". Throughout his lecture series, he explains the concept of 'discovery writing', a concept that reflects the foundation of my software development philosophy.

Discovery writers typically embark on their creative process with naught but a few ideas, vague concepts, a pen and a piece of paper. From there, the story grows: messy, imperfect and endogenous. 'Outliners', on the other hand, rigorously plan, review, and subsequently execute their story in a controlled fashion. Sanderson emphasises that this is not a binary, and indeed encourages authors to adopt any mix of approaches that suits them. Still, there is one key distinction: cooks follow a recipe, while chefs invent. When provided a range of tools, adopt the ones that suit you, and blend them creatively. Own the process you use to create.

Background

Software for me, initially, was a means, not an end: scrappy scripts and patchy codebases. The goal was to analyze or display data, not to think about how it was structured through code. As I was confronted with harder challenges over the years, I was forced into creating (or copying) more effective patterns. Eventually, I started to consider code and infrastructure on a deeper and more thoughtful level. Since those earlier years, my approach has always been to 'learn by doing'. It is only after those initial forays that I have the confidence to 'put pen to paper' and start outlining software designs.

Friction

A staple of software engineering is clear organisation. A simplified view of the development lifecycle is often represented linearly:

I tend to prefer the following:

Engineers are not a monolith. That being said, engineers tend to prefer familiarity and predictability. Strong opinions, strongly held. The premise of 'Discovery Writing' can irritate collaborators (especially when it is poorly communicated or disguised). Frequent refactors, pivots, and experimental components of the stack can (understandably) be frustrating to work with. This is particularly true in an environment that desperately attempts to predict productivity with OKRs, points, evaluations, roadmaps, Jira boards, and whatever the next shiny methodology is...

None of this is to say that planning-forward approaches are wrong – just that mine is different. I want to explain how I approach code and why I approach it this way.

Why

Scientific software, like science, suffers from a high failure rate. Scientific research is specialised and fragmented. Each scientist uses a very specific lexicon, for a very specific domain, which they investigate with very specific methods. I've always experienced this through the flat-file formats I'm forced to work with for a given modality of data. NIfTI or DICOM files for radiology, H5AD files for single-cell transcriptomics, Zarr for spatial data, TIFF for microscopy, CSVs for bulk transcriptomics... the list goes on. It is natural for any engineer to look at this fragmented space and want to consolidate and re-organise it, particularly given the prevalence of multi-modal (cross-domain) analyses nowadays. Data scientists are forced to spend hundreds of hours navigating and learning diverse data representations, rather than focusing on analysis.

Approaching this goal of consolidation with an outline or plan upfront makes it seem either overwhelmingly complex (a 50-page technical specification) or damagingly reductive (a 2-page project brief). However, by taking a less structured approach and developing intuition, the task becomes achievable.

If you can't speak the lexicon of computational biologists, they won't adopt your tools. If you don't understand why both DICOM and NIfTI formats exist, clinical data scientists won't trust your work. Most tools, pipelines, and data repositories are short-lived and infrequently accessed. Discovery work in each domain, done in tight feedback loops with experts, helps develop long-lived infrastructure.

I've found that by embedding myself with the user, learning their domain, and authoring helpful tools, effective system design starts to emerge.

How

Embedding directly alongside the user:
- Offset the cost of inclusion.
  - Automation has compounding returns. However, in the early stages of development it often costs the user more (in time, money, or energy) than it is worth. Building relationships and directly contributing to a given analysis, data wrangling task, or process can offset that.
  - Running a PerturbSeq analysis motivated me to set up centralised repositories to simplify accessing transcriptomics data.
- Test small code patterns.
  - It is usually these small code patterns that end up forming the cornerstone of a given design.
  - For example, a simple pandas-like API design that better represented fragmented and complicated data.
- Learn domain-specific language.
  - Understanding the lexicon of a given domain translates directly into elegantly representing and exposing information. A lexicon provides the vocabulary for modeling a system.
  - obs and var are familiar concepts to a single-cell computational biologist, whether they are working with radiology data or transcriptomics.
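
The obs/var idea in the list above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the API of any existing library and not the design of the tools described in this post: the LabeledMatrix class, its select_obs method, and every identifier and value in the sample data are assumptions made up for the example. It only shows how one vocabulary (observations as rows, variables as columns) might cover two modalities.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabeledMatrix:
    """Hypothetical container: rows are observations, columns are variables."""

    values: list[list[float]]  # shape: n_obs x n_var
    obs: dict[str, dict]       # per-observation metadata, keyed by observation id
    var: dict[str, dict]       # per-variable metadata, keyed by variable id

    def __post_init__(self) -> None:
        # Fail early if the metadata does not line up with the matrix shape.
        if len(self.values) != len(self.obs):
            raise ValueError("one row of values is required per observation")
        if any(len(row) != len(self.var) for row in self.values):
            raise ValueError("one column of values is required per variable")

    def select_obs(self, **criteria) -> LabeledMatrix:
        """Keep only observations whose metadata matches every criterion."""
        ids = list(self.obs)
        keep = [
            i for i, obs_id in enumerate(ids)
            if all(self.obs[obs_id].get(k) == v for k, v in criteria.items())
        ]
        return LabeledMatrix(
            values=[self.values[i] for i in keep],
            obs={ids[i]: self.obs[ids[i]] for i in keep},
            var=self.var,
        )


# Transcriptomics (made-up data): observations are cells, variables are genes.
cells = LabeledMatrix(
    values=[[5.0, 0.0], [1.0, 3.0], [0.0, 7.0]],
    obs={
        "cell_1": {"condition": "control"},
        "cell_2": {"condition": "perturbed"},
        "cell_3": {"condition": "perturbed"},
    },
    var={"GENE_A": {}, "GENE_B": {}},
)

# Radiology (made-up data): observations are scans, variables are measured regions.
scans = LabeledMatrix(
    values=[[0.8, 1.2], [0.9, 1.1]],
    obs={"scan_1": {"modality": "MRI"}, "scan_2": {"modality": "CT"}},
    var={"region_a": {"unit": "cm3"}, "region_b": {"unit": "cm3"}},
)

# The same call works for both modalities.
perturbed = cells.select_obs(condition="perturbed")
mri = scans.select_obs(modality="MRI")
print(list(perturbed.obs), list(mri.obs))
```

The point of the sketch is the shared vocabulary rather than the implementation: a real design would likely sit on top of array and dataframe libraries instead of nested lists and dicts.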

Start from scratch a few times (build once to throw away, once to keep):
- The first design is rarely the right one. It's good mental discipline to throw away bad solutions (even though sometimes it feels like abandoning a child).
- Try-fail-catch (as opposed to try-catch-fail): try, fail, and then 'catch' (identify) the failure.

Design as an Emergent Property of Software

My weakly held opinion is that this 'discovery' approach trains intuition, and that intuition yields compounding returns. Emergent Design (S. L. Bain) challenges the traditional outlining approach to software engineering (referred to there as 'Big Design Up Front') more articulately than I can. Instead of trying to predict every requirement and architectural need before writing code, Bain argues for an evolutionary (though still test-driven) approach. Software design should emerge naturally, driven by actual needs rather than speculative planning.

Summary

Discovery:

- Embed: Learn the lexicon. Work alongside users until you can speak their language and understand why their workflows exist (not just what they are).
- Prototype: Build small, disposable tools that solve immediate problems. These aren't MVPs; they're probes to test whether your mental model is correct.
- Discard: Throw away what doesn't resonate. The willingness to abandon code is the cost of learning.
- Extract: Identify the patterns that survived. These become the foundation of your actual system design.

When to use Discovery vs. Outlining:

- Discovery when: the domain is unfamiliar, users are specialized, the failure rate is high, or requirements are fuzzy.
- Outlining when: the domain is well-understood, patterns are established, or the cost of iteration is high.

Conclusion

Discovery writing demands a high tolerance for failure. I've been fortunate in that virtually every manager I've had since 2021 has fostered a supportive environment for this mode of development, and has given me tools to navigate friction with coworkers who dislike it. Despite its drawbacks, this way of authoring code has led to some of my most impactful projects and products. It is also how I learned to build systems without a formal education in programming.

Hopefully this introspection serves as a valuable exploration of scientific software engineering, and an explanation of why I try to be a chef – not a cook.