Software development has fundamentally shifted with sophisticated AI coding assistants like Claude. What started as a meme has evolved into a legitimate development methodology that can deliver genuine 10x productivity improvements when applied correctly. But as with any powerful tool, the difference between success and disaster lies in how you wield it.
This post is heavily inspired by Diwank Singh’s excellent field notes from his team at Julep.
From Internet Joke to Production Reality
The concept of “vibe-coding” emerged from Andrej Karpathy’s tweet about letting AI write code while developers “just vibe.” Initially dismissed as developer humor, this fantasy began materializing with Claude 3.5 Sonnet and advanced coding interfaces. The joke stopped being funny because it started being genuinely possible.
Steve Yegge’s concept of CHOP (Chat-Oriented Programming) provides a more grounded description of this paradigm. Rather than traditional line-by-line coding, developers now operate like orchestra conductors, directing and shaping rather than playing every instrument themselves. You’re not crafting every line anymore; you’re reviewing, refining, and directing. But you remain the architect while Claude serves as an incredibly knowledgeable intern with zero context about your specific system.
Think of traditional coding like sculpting marble. You start with a blank block and carefully chisel away, line by line, function by function. Every stroke is deliberate, every decision yours. It’s satisfying but slow. Vibe-coding is more like conducting an orchestra. You’re not playing every instrument but providing vision and direction. The AI provides raw musical talent, but without your guidance, it’s just noise.
The Three Modes That Actually Work
After extensive experimentation and more than a few production incidents, three distinct modes of AI-assisted development have emerged. Each has its own rhythm, guardrails, and appropriate use cases.
Playground Mode is where you embrace the chaos. Claude writes 80-90% of the code while you provide just enough steering to keep things on track. This is perfect for weekend hacks, personal scripts, proof-of-concepts, and those “I wonder if…” moments that make programming fun. You can go from idea to working prototype in minutes, but this cowboy coding style is absolutely inappropriate for anything that matters. Use it for experiments, never for production.
Pair Programming Mode represents the sweet spot for most development. This is where vibe-coding starts to shine for projects under 5,000 lines of code, side projects with real users, or well-scoped services within larger systems. The key innovation here is the CLAUDE.md file, which serves as a constitution for your codebase. Instead of repeatedly explaining your project’s conventions, you document them once, and Claude maintains consistency across all interactions.
This file should include your architecture decisions and the reasoning behind them, code style guidelines for formatting and naming conventions, testing requirements that specify how tests should be written, a domain glossary with project-specific terminology, and explicit boundaries about what AI should never modify. Beyond documentation, this mode relies heavily on “anchor comments” throughout the codebase using prefixes like AIDEV-NOTE, AIDEV-TODO, or AIDEV-QUESTION to provide context and guidance directly in the code.
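Anchor comments are just ordinary comments with a greppable prefix. A minimal sketch of how they might read in practice, using a hypothetical module and function names that are not from the original post:

```python
# payments/refunds.py -- hypothetical module, used only for illustration

# AIDEV-NOTE: Amounts are stored as integer cents; never convert to floats
# here, because downstream reconciliation assumes exact integer math.
def issue_refund(order_id: str, amount_cents: int) -> None:
    # AIDEV-QUESTION: should refunds larger than the original charge be
    # rejected here or at the API layer? See the boundaries in CLAUDE.md.
    if amount_cents <= 0:
        raise ValueError("refund amount must be positive")
    # AIDEV-TODO: add an idempotency key so retried webhooks can't double-refund.
    _post_to_payment_provider(order_id, amount_cents)


def _post_to_payment_provider(order_id: str, amount_cents: int) -> None:
    ...  # provider call omitted; not the point of the example
```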
Production Scale Mode requires the most sophisticated approach, as the potential for both benefit and harm increases dramatically. This is for large codebases, systems with real users, and applications where bugs have financial or reputational costs. Current AI systems don’t scale seamlessly to massive codebases, so the most effective approach involves sectioning systems into well-defined modules that AI can understand safely.
At this scale, boundary comments become critical, explicitly marking integration points and warning AI about areas where changes could break production systems or API contracts. Without these explicit boundaries, AI will happily “improve” your APIs and break every client in production.
The One Sacred Rule
Here’s the most important principle in AI-assisted development, and I’m going to repeat it until it’s burned into memory: never let AI write your tests. This isn’t about AI limitations; it’s about the fundamental nature of what tests represent.
Tests aren’t just code that verifies other code works. Tests are executable specifications that encode your understanding of the problem domain, your edge cases, and your actual intentions. They represent the accumulated wisdom of every bug you’ve fixed and every edge case you’ve discovered. When AI writes tests, it’s just verifying that the code does what the code does, not what it should do.
For example, when Claude implements a rate limiter, it might generate basic tests verifying functionality, but miss critical production concerns like memory leaks from users who hit the API once and never return. Only human domain knowledge reveals such issues. Your tests are your specification, your safety net, and the encoded wisdom of your experience. Guard them zealously.
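To make that concrete, here is a minimal sketch of the kind of test only the human would think to write. The limiter class and its interface are invented for this example (the post does not prescribe an implementation); the point is the encoded production concern:

```python
import time


class SlidingWindowRateLimiter:
    """Tiny in-memory limiter, included only so the test below runs."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, list[float]] = {}

    def allow(self, user_id: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        self._prune(now)
        hits = self._hits.setdefault(user_id, [])
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

    def _prune(self, now: float) -> None:
        # AIDEV-NOTE: without this sweep, users who hit the API once and
        # never return keep an entry in memory forever.
        cutoff = now - self.window_seconds
        for user_id in list(self._hits):
            recent = [t for t in self._hits[user_id] if t > cutoff]
            if recent:
                self._hits[user_id] = recent
            else:
                del self._hits[user_id]


def test_one_off_users_do_not_leak_memory() -> None:
    # The production concern from the prose, written down as a specification:
    # a drive-by user's state must not outlive the rate-limit window.
    limiter = SlidingWindowRateLimiter(max_requests=5, window_seconds=60)
    assert limiter.allow("drive-by-user", now=0.0)
    assert limiter.allow("regular-user", now=3600.0)  # an hour later
    assert "drive-by-user" not in limiter._hits
```

An AI-generated suite would almost certainly cover “six requests in a minute get blocked”; it is far less likely to assert anything about memory held for users who never come back.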
Building Sustainable Infrastructure
The CLAUDE.md file isn’t optional documentation; it’s the foundation that enables consistent, reliable AI assistance. Think of it as establishing the fundamental laws that govern how code should be written and how systems should interact. A production-tested structure includes the golden rule (always ask when unsure), project context and purpose, critical architecture decisions with their reasoning, code style and anchor comment guidelines, and explicit boundaries around what AI must never touch, such as test files, API contracts, migrations, or secrets.
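The details vary by project, but a stripped-down sketch of such a file might look like the following. The section names and contents are illustrative assumptions, not a copy of the Julep template linked at the end of this post:

```markdown
# CLAUDE.md

## The golden rule
When unsure about implementation details, ask the human first. Never guess.

## Project context
Internal billing service. Correctness matters more than latency.

## Architecture decisions
- Postgres is the source of truth; Redis is a cache only and may be wiped.
- All money amounts are integer cents, never floats.

## Code style
- Black and ruff defaults; type hints on all public functions.
- Use anchor comments: AIDEV-NOTE, AIDEV-TODO, AIDEV-QUESTION.

## Testing
- Humans write the tests. Never create or modify files under tests/.

## Never touch
- tests/, migrations/, api/contracts/, .env files, anything holding secrets.
```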
AI-assisted development also changes your git workflow significantly due to the increased pace of code generation. Git worktrees provide isolated environments where AI can experiment freely without polluting your main branch history. You can create an AI playground, let Claude experiment in isolation, cherry-pick successful changes back to main, and clean up when done. This gives you the best of both worlds: Claude can experiment freely while your main branch history stays clean and meaningful.
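That workflow maps onto a handful of standard git commands. The branch and path names below are placeholders rather than a prescribed convention:

```bash
# Create an isolated worktree on its own branch for Claude to play in
git worktree add ../ai-playground -b ai/feed-caching-experiment

# ...let Claude experiment and commit inside ../ai-playground...

# Back in the main checkout, keep only the commits worth keeping
git cherry-pick <sha-of-the-good-commit>

# Tear the playground down when you're done
git worktree remove ../ai-playground
git branch -D ai/feed-caching-experiment
```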
Transparency in commit messages becomes crucial too. Tag AI-assisted commits clearly so reviewers know to pay extra attention. Something like “feat: implement user feed caching [AI]” with notes about what AI generated versus what you designed helps during code review and future debugging.
Context Is Everything
One counterintuitive lesson is that being conservative with context to save tokens actually costs more long-term. It’s like trying to save money by buying cheap tools that break quickly. Compare giving AI a starved prompt like “Add caching to the user endpoint” versus providing rich context about request volume, server architecture, existing infrastructure, and specific requirements. The front-loaded context investment pays for itself by avoiding iteration cycles.
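As a rough illustration (the traffic numbers and infrastructure details here are invented for the example), the two prompts might look like this:

```text
Starved:      "Add caching to the user endpoint."

Front-loaded: "Add caching to GET /users/{id}. Context: roughly 50 req/s at
peak across three app servers behind a load balancer; Redis is already
available to every server. Requirements: 5-minute TTL, invalidate on profile
update, never cache admin users, and keep the existing response shape
unchanged."
```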
The lesson is clear: front-load context to avoid iteration cycles. Think of tokens like investing in good tools; the upfront cost pays for itself many times over. Fresh sessions for distinct tasks also help. It’s tempting to keep one long-running conversation, but this leads to context pollution. Stick to one task per session; when the task is done, start fresh.
Real-World Application
Refactoring error handling across 500+ endpoints shows how to divide responsibilities effectively between human architectural decisions and AI mechanical execution. The human work involves deciding on the error philosophy, the exception hierarchy, and the response formats. These are pure architectural decisions that Claude can’t make because they involve understanding business requirements, user needs, and operational constraints.
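What that human specification might look like before the mechanical work is handed over, sketched here in Python with hypothetical exception names and response shape:

```python
# Human-designed error hierarchy and response contract. Claude's job is to
# migrate existing endpoints onto it, not to redesign it.

class AppError(Exception):
    """Base class: every user-facing error must derive from this."""
    status_code = 500
    error_code = "internal_error"


class NotFoundError(AppError):
    status_code = 404
    error_code = "not_found"


class ValidationError(AppError):
    status_code = 422
    error_code = "validation_failed"


def to_response(err: AppError) -> dict:
    # AIDEV-NOTE: this shape is an API contract; never rename the keys.
    return {"error": {"code": err.error_code, "message": str(err)}}
```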
With clear specifications, Claude can handle the mechanical refactoring of finding and updating hundreds of error sites consistently. The result was completing in 4 hours what would have taken 2 days manually, with all 500+ error sites updated consistently. This demonstrates the power of clear human direction combined with AI execution capability.
Making It Work in Practice
Not all AI mistakes carry equal weight, and understanding this hierarchy helps you allocate attention appropriately. Some mistakes are merely annoying but harmless, like wrong formatting that your linter will catch or verbose code you can refactor later. Others are expensive to fix, including breaking internal APIs that require coordination or changing established patterns that confuse the team. But the really dangerous ones are career-limiting events: modifying tests to make them pass, breaking API contracts, leaking secrets, or corrupting data migrations. Your guardrails should be proportional to the mistake level.
Successful AI adoption also requires cultural changes that normalize disclosure of AI assistance. This isn’t about hiding AI use but about using it responsibly. Tag commits transparently so reviewers know which code requires extra scrutiny and future debuggers understand the code’s provenance. Research from “Accelerate: The Science of Lean Software and DevOps” shows that high-performing teams deploy 46 times more frequently than low performers and go from commit to deployment 440 times faster. This performance gap only widens with AI assistance, because AI amplifies your existing capabilities and processes. If your practices are chaotic, AI amplifies the chaos. If your practices are disciplined, AI amplifies the discipline.
As AI capabilities advance, the fundamentals remain constant: humans set direction, AI provides leverage. What’s coming includes proactive AI that suggests improvements without prompting, AI that learns team patterns and preferences, and persistent memory across sessions and projects. But even as capabilities expand, the core principle remains: we’re tool users, and these are simply the most powerful tools ever created for software development.
The developers who master this workflow aren’t necessarily smarter or more talented; they’re simply the ones who started earlier and learned from more mistakes. In an industry where speed and quality determine success, AI assistance isn’t a nice-to-have; it’s a competitive necessity. But only if you do it right.
If you’re ready to adopt AI-assisted development, start by creating a CLAUDE.md file for your current project, add three anchor comments to your most complex code, and try one AI-assisted feature with proper boundaries. The most important thing is to start. Start small, start careful, but start. The tools exist, the patterns are proven, and the competitive advantages are real.
Vibe-coding represents a fundamental shift that amplifies human capabilities rather than replacing them. Teams mastering this approach ship better software faster, while those ignoring it watch competitors advance. Perfect is the enemy of shipped. Start with one project, establish boundaries, and iterate. The future of development is here; it’s just not evenly distributed yet.
Templates: Julep’s CLAUDE.md example
“Perfect is the enemy of shipped. Start with one project, establish boundaries, iterate. Be part of the distribution.”