An honest account of building a production macOS app with Claude Code—what works, what doesn't, and what it actually feels like.


Five months ago, I started building Yakki, an amazing macOS dictation app. Today it has 148 Swift files, approximately 40,000 lines of code, 15 major subsystems, and ships with Apple notarization through an automated 7-step distribution pipeline. It handles real-time audio capture, on-device transcription, text injection, LLM-powered meeting intelligence, speaker diarization, multi-model ML management, licensing, auto-updates, telemetry, and onboarding, among other things.

I am one person.

This is not a story about how AI is going to replace developers. It's a story about what happens when a solo developer uses AI coding tools seriously, every day, for months, on a real product that real people use. The answer is... nuanced.


Setting up my projects

My primary tool is Claude Code, Anthropic's CLI agent, running in the terminal. It has access to my filesystem, can read and edit code, run shell commands, and search the codebase. I interact with it through natural language, and it responds with code changes, explanations, and tool use.

The project has a CLAUDE.md file at its root—a document that gives Claude context about the project. Think of it as onboarding documentation that the AI reads before every session. It contains:

  • Build commands
  • Project structure (where views, services, managers, and models live)
  • Architectural patterns (SwiftUI + AppKit, state management, navigation)
  • Frameworks and dependencies
  • Specific rules about development philosophy—things like do not add fallbacks (I might implement fallbacks, but only when I am fully aware they exist and what their scope is)
  • And something maybe less usual: a set of UI/UX design laws

Those design laws aren't a joke. They look like this:

The Law of Visual Economy ("Boxitis" is the Enemy). Every line and box adds "cognitive weight." Start with just text and whitespace. Only add a container if the content absolutely floats away without it.

The Law of Direct Manipulation (Frictionless Flow). Every click, modal, or confirmation dialog is a speed bump. Great apps feel like they read your mind.

Stop designing "Screens" and start designing "States". Don't design the perfect static view; simulate what happens when data changes or the window resizes.

These rules exist because models tend to over-engineer UI. Left unconstrained, Claude will add confirmation dialogs, wrap everything in bordered containers, and create beautiful static views that fall apart when data changes and also look very cookie-cutter. The design laws are guardrails. They encode my taste into something the AI can reference.
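
To make the "states, not screens" law concrete, here is a hypothetical sketch of how it tends to show up in code. The type and helper names are invented for this example, not Yakki's actual code:

```swift
import Foundation

// Model every condition a view can be in, then render from the state.
enum TranscriptViewState: Equatable {
    case empty
    case loading
    case loaded([String])
    case failed(String)
}

// The view renders from whatever state it is handed; a data change or a
// window resize just produces a new state instead of breaking a static layout.
func statusLine(for state: TranscriptViewState) -> String {
    switch state {
    case .empty: return "No transcript yet"
    case .loading: return "Transcribing…"
    case .loaded(let lines): return "\(lines.count) lines"
    case .failed(let message): return "Error: \(message)"
    }
}
```

A view built this way cannot forget its empty, loading, or error conditions, because the compiler forces the switch to be exhaustive.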

Beyond CLAUDE.md, I maintain a private memory directory where I store patterns confirmed across multiple sessions, key architectural decisions, and debugging insights. When Claude encounters a problem, it can check its memory for notes from previous sessions. This continuity matters—without it, every conversation starts from scratch.


What the Collaboration Actually Looks Like

The romantic version: I describe features in natural language and Claude writes them perfectly. The realistic version: it's a conversation with a collaborator who is extremely intelligent at some moments and far less graceful at others. A long, iterative, sometimes frustrating conversation where both sides contribute different things.

The Research Phase

Before implementing anything significant, I have Claude explore the codebase. "Read the audio pipeline. Understand how AudioManager, TranscriptionEngine, and TextInjector interact. Then tell me how you'd add speaker diarization without breaking existing dictation."

It can read 15 files in parallel, cross-reference imports, trace data flows, and synthesise a coherent picture in minutes. The same task takes me an hour of jumping between files. Its research is thorough in a way mine rarely is: it checks every import, every protocol conformance, every initialisation path.

The output is a plan. Not code yet. A structured proposal: "Here's what exists, here's where the new feature fits, here's what needs to change, here's what might break." I review the plan, push back on parts I disagree with, and we iterate. Only then does code get written. This is by far the most important phase. The love, care, and time you invest here is inversely proportional to the number of issues you'll face later (imho).

The Implementation Phase

Most new code follows this pattern:

  1. I describe the feature and constraints in 2-3 paragraphs
  2. Claude proposes an architecture
  3. I approve, adjust, or reject
  4. Claude writes the code
  5. I review the diff and adjust or polish details
  6. We iterate on problems

The code quality varies. For straightforward features (add a settings toggle, create a new view, wire up a notification), Claude produces clean, working code on the first try. For complex systems (the audio capture pipeline, the text injection mechanism, the LLM streaming parser), it usually gets the architecture right but struggles with edge cases.

Edge cases are where human experience matters. Claude doesn't know that CGEvent keystroke simulation needs different delays for spaces versus regular characters because some apps do word-boundary processing on spaces. It doesn't know that AVAudioEngine has hidden background callbacks that fire after deallocation. It doesn't know that NSApp.setActivationPolicy(.accessory) will fight with SwiftUI's default WindowGroup. These are things I learned from debugging, from user reports or crash logs.
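
The space-versus-character gotcha, sketched as hypothetical code. The function names and delay values are invented for this example; the actual CGEvent posting (macOS-only) is reduced to a comment so the delay logic stands alone:

```swift
import Foundation

// Some apps run word-boundary processing when they receive a space, so a
// simulated space keystroke needs a longer pause than a regular character.
// The delay values here are made up for illustration.
func interKeystrokeDelay(after character: Character) -> TimeInterval {
    character == " " ? 0.012 : 0.002
}

func inject(_ text: String) {
    for character in text {
        // The real implementation builds and posts a CGEvent keyDown/keyUp
        // pair here, omitted so the delay logic is the focus.
        Thread.sleep(forTimeInterval: interKeystrokeDelay(after: character))
    }
}
```

Nothing about this is hard to write; the hard part is knowing the rule exists at all.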

But here's the thing: I often don't know them in advance either. The difference is that when we hit the edge case together, I recognise it faster (so far—we will see in a few months). I've seen that class of problem before. I can say "this is a race condition in audio tear-down" and Claude can generate the fix. The collaboration works because we bring complementary knowledge: Claude knows the API surface broadly, I know the gotchas deeply.

The Debugging Phase

Debugging is where the dynamic shifts most. Claude is genuinely good at analysing crash logs, reading stack traces, and proposing hypotheses. I had a production crash: EXC_BAD_ACCESS (SIGSEGV) in AVAudioIOUnit::IOUnitPropertyListener, with the main thread showing -[AVAudioEngine dealloc]. Claude read the crash report and immediately identified it as an async/deallocation race condition: a background audio callback firing into a deallocated engine. That was amazing—it would have taken me far longer to notice.

The fix required cleaning up resources in a specific order: stop the engine, remove the audio tap, nil the references, then remove the device change listener. Claude proposed the correct cleanup order on the first try. This is a pattern it has seen in training data. The specific crash was unique to our app, but the category of bug (async callback + deallocation) is well-documented, and in those cases the level of efficacy is terrific.
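
An illustrative model of that fix, not Yakki's actual code: the real implementation calls AVAudioEngine APIs, but here each step just records itself so the ordering—the thing that actually prevents the crash—is explicit:

```swift
import Foundation

// The four teardown steps, in the order that keeps a late audio callback
// from firing into a deallocated engine.
enum TeardownStep: String {
    case stopEngine, removeTap, releaseEngine, removeDeviceListener
}

func teardownOrder() -> [TeardownStep] {
    [
        .stopEngine,           // 1. no new render callbacks get scheduled
        .removeTap,            // 2. in-flight buffers have nowhere to land
        .releaseEngine,        // 3. only now drop the strong reference
        .removeDeviceListener, // 4. last: the device-change listener
    ]
}
```

Reverse any two of those steps and there is a window in which a background callback can reach an object that no longer exists.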

For novel bugs, the kind that don't match known patterns, Claude is less useful. It tends to propose solutions to the wrong problem, or suggest changes that would mask the symptom without fixing the cause. In those cases, I debug the old-fashioned way (print statements, Instruments, hypothesis testing) and then describe what I found so it can implement the fix.


The Numbers

Some concrete data about what this collaboration has produced:

Codebase size: ~148 Swift files, ~40,000 lines of code in the main app target. An additional 18 shell scripts (including a 782-line distribution pipeline), 10 test files, and 100+ markdown documentation files.

File authorship: Claude writes the first drafts most of the time; I refine.

Major subsystems: Audio capture pipeline, transcription engine, text injection, telemetry, and so on.

Documentation: A lot. This documentation is both a product of working with AI (you generate a lot of documentation when your co-author can produce it instantly) and a tool for working with AI (more context means better outputs). I am obsessed with documenting processes to leave breadcrumbs and insights for the future me who has to come back to a project. I have suffered in the past from returning to my own code months after writing it, only to realise I had become a stranger to it.

Distribution: The app ships through a fully automated pipeline. This pipeline handles both stable and nightly release channels.

Timeline: end of September 2025 to February 2026. Five months from first commit to a notarized app with licensing, auto-updates, a go-to-market plan, and real users. I have a day job in consulting and collaborate on other projects.


What Works Well

Research and exploration. AI is spectacularly good at reading a codebase and explaining what it does. When I'm working on a feature that touches 5 different files, having Claude read all of them, understand the relationships, and propose where changes should go saves enormous time. This is the highest-ROI use case.

Boilerplate and patterns. Anything that follows a well-known pattern, Claude produces quickly and correctly. I'd estimate 60% of the code in the app is this kind of structural work.

Documentation. Claude generates thorough documentation naturally. Every feature gets a spec document, every bug gets a root-cause analysis, every architecture decision gets a writeup. The (extremely complete) documentation folder exists because the marginal cost of documentation dropped to near zero.

Refactoring. "Move this logic from the view into a separate manager class" is a task Claude handles well. It understands the target architecture, moves the code, updates all references, and adjusts the call sites. This is exactly the kind of tedious, error-prone work that I'd procrastinate on without AI.

Script writing. The 782-line distribution script was largely assisted. Shell scripts are a perfect fit: well-defined inputs and outputs, no hidden state, extensive documentation online. Previously, my experience with bash was marginal. Claude writes better bash than I do.

Learning. Models do an amazing job of helping you learn as you go. They can explain details of things you haven't encountered before, or patterns that are new to you. They give you the confidence to climb walls that once seemed insurmountable.

What Doesn't Work Well

Novel architecture. When building something without obvious precedent (our custom text injection system, the hotkey detection), Claude needs heavy guidance. It proposes plausible-looking architectures that don't account for the real-world constraints I've learned through experience. I end up describing the architecture myself and having Claude implement it.

Aesthetic judgment. Claude's default UI is over-designed. Too many borders, too many colours, too much visual complexity. The design laws in CLAUDE.md help, but I still regularly strip out unnecessary containers, reduce opacity values, and simplify layouts after Claude's first pass. Taste is hard to encode in instructions.

Integration testing. Claude can write unit tests, but it can't test that the audio pipeline actually works with a real microphone, that text injection behaves correctly in Safari versus VS Code, or that the indicator follows screen changes in a multi-monitor setup. Hardware-dependent testing is still entirely manual.

Long-running context. Sessions that go beyond 30-40 exchanges start to degrade. Claude loses track of earlier decisions, revisits solved problems, or contradicts previous approaches. I've learned to keep sessions focused: one feature, one bug, one architectural decision per conversation. The memory system helps bridge sessions, but it's not perfect.

Security sensitivity. I review every change that touches authentication, licensing, or API key handling manually. Claude occasionally proposes patterns that would work functionally but have subtle security implications—e.g. storing a token in UserDefaults instead of Keychain, or logging request bodies that might contain sensitive data. These are the kinds of errors where "it works" is not the same as "it's correct."
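
The token-storage mistake is a one-line review catch (a Keychain item instead of UserDefaults). The logging one is easier to miss, and a guard can be as small as this hedged sketch, where the key names are invented for the example:

```swift
import Foundation

// Redact known-sensitive fields before a request body ever reaches a log
// line. "It works" either way; only one version is correct.
func redactedForLogging(
    _ body: [String: String],
    sensitiveKeys: Set<String> = ["apiKey", "licenseToken"]
) -> [String: String] {
    body.reduce(into: [:]) { result, pair in
        result[pair.key] = sensitiveKeys.contains(pair.key) ? "<redacted>" : pair.value
    }
}
```

The point is not this particular helper; it's that these checks have to be a habit of the reviewer, because the AI won't flag them on its own.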


The Emotional Reality

Here's something nobody talks about: working with AI every day changes how you think about your own work. The way we code has changed forever, and for some people that will be very hard to accept.

There are moments of genuine wonder. Claude reads a 1,400-line service class, understands the threading model, identifies a race condition I missed, and proposes a fix that I verify is correct. A real contribution that made the product better. On the other hand, you also have it proposing a solution to the wrong problem. I explain why it's wrong. It apologises and proposes another wrong solution. I explain again. On the third try, it gets it right, but I've spent 20 minutes that could have been 5 minutes of just writing the code myself.

And there are moments of something harder to name. What does that mean about your authorship? About your craft? I chose the architecture. I defined the constraints. I reviewed every line. I debugged the production crashes at midnight. But I type less and less of the code. Is that different from a tech lead who designs systems and reviews PRs but doesn't write the implementation? (Respect to all tech leads out there.)

I've decided it is and it isn't. The creative work—deciding what to build, how it should feel, what trade-offs to make—is mine. The implementation work is shared. And that's fine.


Things I wish I had been told

If you're considering this approach for a real project:

Write a good CLAUDE.md. This is your highest-leverage investment. Include build commands, project structure, architectural patterns, and your design principles. The AI will follow rules it can reference. Without them, you'll spend half your time correcting the same mistakes.
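
For a sense of scale, a skeleton of the kind of file I mean. The commands, paths, and rules below are illustrative, not Yakki's actual CLAUDE.md:

```markdown
# Project guide for Claude

## Build
- Build: `xcodebuild -scheme MyApp -configuration Debug build`
- Tests: `xcodebuild test -scheme MyApp`

## Structure
- `Views/` — SwiftUI views (no business logic)
- `Services/` — long-lived managers (audio, transcription, licensing)
- `Models/` — plain value types

## Rules
- Do not add fallbacks unless explicitly asked.
- Prefer plain text and whitespace over boxes and borders.
- Design states, not screens: every view handles empty, loading, and error.
```

Short and concrete beats long and aspirational; the AI follows rules it can quote back.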

Research before implementing. Always have Claude read the relevant code before writing new code. "Read AudioManager.swift, then implement feature X" produces dramatically better results than "implement feature X."

Keep sessions focused. One topic per conversation. Use the memory system to bridge sessions. Don't try to build an entire feature in a single marathon session.

Review diffs, not descriptions. Claude will describe its changes accurately but sometimes omit important details. Always read the actual code diff. I've caught subtle issues (retained self in closures, missing weak references, incorrect thread dispatch) that the description glossed over.
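
The "retained self in closures" catch, sketched with invented types: a stored closure that captures self strongly creates a reference cycle, and the diff description will cheerfully say "wires up the level callback" without mentioning it.

```swift
import Foundation

final class Recorder {
    static var liveInstances = 0
    var onLevelChange: (() -> Void)?

    init() { Recorder.liveInstances += 1 }
    deinit { Recorder.liveInstances -= 1 }

    func startLeaky() {
        // Cycle: self owns the closure, the closure owns self.
        // The instance can never deallocate.
        onLevelChange = { self.handleLevel() }
    }

    func startFixed() {
        // [weak self] breaks the cycle; the instance deallocates normally.
        onLevelChange = { [weak self] in self?.handleLevel() }
    }

    private func handleLevel() {}
}
```

Both versions compile, both versions work in a demo; only the diff tells you which one shipped.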

Maintain your own expertise. AI amplifies what you know. If you understand audio pipelines, AI helps you build better audio pipelines faster. If you don't understand audio pipelines, AI helps you build audio pipelines that work until they don't, and you won't know why. The developer's knowledge is the multiplier, not the AI.

Document obsessively. Every bug fix, every architecture decision, every failed approach—write it down. Future sessions will be better because of it. And future you will be better because of it, whether or not AI is involved.


The Bigger Picture

I've shipped a production macOS app with 15+ major subsystems in months, working part-time, as a solo developer. A macOS app of this complexity has a terrifying amount of boilerplate, configuration, and plumbing. SwiftUI views that need to observe state. Service classes that need to manage lifecycle. Notification wiring, UserDefaults persistence, error handling, logging. None of this is intellectually challenging. It certainly helps to have the experience and know what you are doing, but that complexity still takes time. AI compresses that time dramatically.

The hard parts—the 200-millisecond grace period that prevents the last word from being clipped, the click-through window that never steals focus, the cleanup order that prevents the audio deallocation crash—those came from human experience. From debugging. From watching users struggle and figuring out why. The bottleneck in software development was never typing speed. It was understanding: understanding what to build, understanding why it breaks, understanding what users actually need.


Yakki is a macOS dictation app built by one developer and one AI. It turns your voice into text, wherever your cursor is, with on-device privacy. Learn more at yakki.ai.