My Claude Code Workflow for Refactoring a Legacy Codebase

Last month I inherited a Node.js backend that hadn’t been meaningfully updated since 2019. Express 4, callbacks everywhere, no TypeScript, test coverage around 12%. The kind of codebase where you open a file and say “okay” in a tone that means something’s wrong.

The goal was to modernize it without a full rewrite. Keep the API surface identical, upgrade the internals, and get test coverage above 70%. The constraint was time — two weeks.

This is how I broke it down using Claude Code, and what I learned about structuring large refactors with AI assistance.

Start with a map, not a plan

The first thing I did was ask Claude to analyze the codebase. Not to fix anything — just to describe what was there.

I had it walk the directory structure, identify the routing patterns, trace the data flow from request to database, and flag the most concerning patterns. It found five major problem areas: raw SQL string concatenation, callback-based async, no input validation, hardcoded configuration, and circular dependencies between modules.

This took about 30 minutes and gave me something I didn’t have before: a written map of the codebase from someone who wasn’t emotionally attached to it. That map became the basis for planning the refactor.

Break the refactor into parallel tracks

A refactor this size can’t be one big branch. It needs to be broken into tracks that can proceed independently and merge cleanly.

I identified four tracks:

  1. Async migration — Convert callback-based code to async/await, module by module.
  2. Input validation — Add Zod schemas to every route handler.
  3. SQL injection cleanup — Replace string concatenation with parameterized queries.
  4. Test coverage — Write tests for existing behavior before changing it.

The key insight: tracks 1-3 touch different parts of the code if you sequence them by module. The async migration for the user module doesn’t conflict with adding validation to the billing module. Track 4 (tests) needs to run against the current code first, then be updated as tracks 1-3 land.

One shard per track

This is where the workflow clicked.

I opened Crystl, set up a gem for the project, and created four shards — one for each refactor track. Each shard was an isolated session running on its own git branch via a worktree.

  • Shard 1: refactor/async-migration — Claude converting callbacks to async/await, starting with the lowest-dependency modules.
  • Shard 2: refactor/input-validation — Claude adding Zod schemas and validation middleware.
  • Shard 3: refactor/parameterized-sql — Claude replacing raw SQL with parameterized queries.
  • Shard 4: refactor/test-baseline — Claude writing tests against the current behavior.
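
The parameterized-SQL shard's edits were the most mechanical of the four: move user input out of the query string and into a values array the driver binds separately. A sketch of the before/after shape, assuming a node-postgres-style driver where `db.query(text, values)` takes placeholders — `buildUserByEmail` is a hypothetical helper, not code from the project:

```javascript
// Before (the pattern shard 3 was hunting): injectable concatenation.
// const rows = await db.query(
//   "SELECT id, email FROM users WHERE email = '" + email + "'"
// );

// After: placeholders in the text, input in a separate values array.
function buildUserByEmail(email) {
  return {
    text: 'SELECT id, email FROM users WHERE email = $1',
    values: [email],
  };
}

// const q = buildUserByEmail(email);
// const { rows } = await db.query(q.text, q.values);
```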

All four running simultaneously. I could glance at the Crystal Rail and see which shards were active, which were waiting for approval, and which had finished their current task.

The merge cadence

Running parallel tracks only works if you merge frequently. The cadence I settled on:

Days 1-2: Let the test baseline shard run first. It writes tests against unmodified code, so there are no conflicts. Merge it into main once coverage looks solid.

Days 3-5: The three refactor shards work in parallel. Each one is scoped to specific modules to minimize overlap. At the end of each day, I’d merge whichever track had a clean, passing set of changes.

Days 6-8: Conflict resolution. As the refactored modules start overlapping (the async-migrated user module now needs its validation updated too), I’d rebase the remaining branches and let Claude handle the merge conflicts within each shard.

Days 9-10: Final integration and cleanup. Merge remaining branches, run the full test suite, fix what broke.

What went right

Parallel progress was real. While one session was grinding through the async migration on its branch, another was writing validation schemas on a different one. I wasn’t blocked on either.

Incremental merges kept things stable. Instead of a terrifying mega-merge at the end, each merge was small and reviewable. When something broke, the blast radius was one track’s worth of changes.

Conversation history saved me. More than once, I needed to go back and check why Claude chose a particular approach in a shard I hadn’t looked at for two days. Being able to scroll through the conversation history and see the full reasoning was critical.

Notifications kept me in the loop. Crystl’s notifications pinged me when a shard needed an approval decision or had finished its task. I didn’t need to babysit four terminal windows.

What I’d do differently

Scope tracks more narrowly. My “async migration” track was too broad. It would have been better as three shards — one per module group — since the later modules depended on patterns established in the earlier ones.

Write integration tests earlier. Unit tests are great, but the failures I hit during merges were all integration-level. Should have had Claude write integration tests as part of the baseline track.

Use formations. Crystl has a feature called formations that lets you save and restore shard layouts. I set up my four-shard arrangement manually each morning. Should have saved it as a formation on day one.

The result

Two weeks, four parallel tracks, 47 PRs merged. Test coverage went from 12% to 74%. The codebase is async/await throughout, all SQL is parameterized, every route validates its input, and the API contract didn’t change.

Could I have done this without parallel shards? Technically yes. But it would have been sequential — finish one track, start the next — and I’d still be working on it.

The refactoring workflow itself is tool-agnostic. Break the work into independent tracks, merge incrementally, run tests continuously. But having a terminal that actually supports running multiple Claude sessions in parallel, on isolated branches, without losing context — that turned a theoretical workflow into a practical one.

Crystl is free — sign up at crystl.dev/login.