Vibe Coding vs Hard Engineering: Maintaining Quality When AI Writes Your Code
AI now generates 80% of production code. Learn how to maintain security and quality standards in the vibe coding era with proven strategies.
I shipped a feature last Tuesday that I couldn't fully explain. The code worked, tests passed, edge cases handled, performance solid. But if you'd asked me to walk through the implementation line by line, I would've stumbled. An AI wrote most of it. I just vibed it into existence.
This is vibe coding. You describe what you want, the AI generates it, you tweak until it feels right, and you ship. It's fast, it's powerful, and it's fundamentally different from the engineering discipline we spent years learning. The question isn't whether this is the future; it already is. The question is how we maintain code quality and security when 80% of our boilerplate comes from a black box.
What vibe coding actually means
Vibe coding isn't just "using AI tools." It's a shift in how we build software. Traditional engineering meant understanding every abstraction, debugging from first principles, and building mental models of your entire stack. Vibe coding means prompt refinement, pattern recognition, and trusting outputs you didn't write.
Here's what it looks like in practice. I needed a rate limiter for an API. The old way: research algorithms, implement a token bucket, handle edge cases, write tests. The new way:
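# Prompt: "Create a Redis-backed rate limiter for FastAPI
# that allows 100 requests per minute per user, with burst handling"
import time
import uuid

from fastapi import FastAPI, HTTPException, Request
from redis import Redis

app = FastAPI()
redis_client = Redis(host='localhost', port=6379, decode_responses=True)

RATE_LIMIT = 100     # max requests per window
WINDOW_SECONDS = 60  # sliding window length in seconds

# Plain `def`: FastAPI runs sync dependencies in a threadpool, so the blocking
# Redis client doesn't stall the event loop. Limits live in constants because
# extra defaulted parameters would become client-settable query params under Depends().
def rate_limit(request: Request) -> bool:
    # Assumes X-User-ID is set by a trusted upstream (e.g. an auth gateway);
    # a raw client header could be spoofed to dodge the limit.
    user_id = request.headers.get("X-User-ID", "anonymous")
    key = f"rate_limit:{user_id}"
    now = time.time()
    pipe = redis_client.pipeline()  # transactional (MULTI/EXEC) by default
    # Remove old entries outside the window
    pipe.zremrangebyscore(key, 0, now - WINDOW_SECONDS)
    # Add current request; a unique member keeps same-second requests from overwriting each other
    pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
    # Count requests in window
    pipe.zcard(key)
    # Set expiry so idle keys don't leak memory
    pipe.expire(key, WINDOW_SECONDS)
    _, _, request_count, _ = pipe.execute()
    if request_count > RATE_LIMIT:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    return True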
The AI generated this in three seconds. It works. It handles sorted-set cleanup, uses a pipeline for atomicity, and sets an expiry to prevent memory leaks. I tested it, tweaked the error message, and shipped it.
But I didn't write it. I don't fully understand why it uses zremrangebyscore instead of a simpler approach. I trust it works because the tests pass, but my mental model is fuzzy. That's vibe coding.
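If you're in the same spot, it helps to see what the "simpler approach" would be and where it breaks. The in-memory sketch below is not the Redis code above. It compares a fixed-window counter (what you'd build with Redis INCR and EXPIRE) against a sliding-window log (what the ZREMRANGEBYSCORE/ZADD/ZCARD pipeline implements). The class names are hypothetical, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class FixedWindowLimiter:
    """The "simpler approach": one counter per user per window (INCR + EXPIRE in Redis)."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        # Old buckets are never cleaned up in this sketch; Redis EXPIRE would handle that
        self.counts = defaultdict(int)

    def allow(self, user: str, now: float) -> bool:
        bucket = int(now // self.window)  # which window `now` falls in
        self.counts[(user, bucket)] += 1
        return self.counts[(user, bucket)] <= self.limit


class SlidingLogLimiter:
    """One timestamp per request, mirroring the sorted-set pipeline."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit = limit
        self.window = window
        self.logs = defaultdict(deque)

    def allow(self, user: str, now: float) -> bool:
        # Assumes timestamps arrive in non-decreasing order
        log = self.logs[user]
        while log and log[0] <= now - self.window:  # ZREMRANGEBYSCORE key 0 now-window
            log.popleft()
        log.append(now)                # ZADD (rejected requests are recorded too)
        return len(log) <= self.limit  # ZCARD


# 100 requests just before a minute boundary, 100 just after
fixed = FixedWindowLimiter(limit=100, window=60)
sliding = SlidingLogLimiter(limit=100, window=60)
times = [59.0] * 100 + [61.0] * 100
print(sum(fixed.allow("u1", t) for t in times))    # 200: the boundary doubles the burst
print(sum(sliding.allow("u1", t) for t in times))  # 100
```

The trade-off is memory. The log stores one entry per request in the window, while the counter stores one integer per window.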
The security problem nobody's talking about
Here's what happened to a team I advise. They used AI to generate authentication middleware. The code looked solid: JWT validation, proper headers, clean error handling. It went through code review. Three engineers approved it. It shipped to production.
Two weeks later, they discovered the vulnerability:
// What the AI generated (looks fine)
const jwt = require('jsonwebtoken');
const authenticateToken = (req, res, next) => {
  const authHeader = req.headers['authorization'];
  const token = authHeader && authHeader.split(' ')[1];
  if (token == null) return res.sendStatus(401);
  jwt.verify(token, process.env.ACCESS_TOKEN_SECRET, (err, user) => {
    if (err) return res.sendStatus(403);
    req.user = user;
    next();
  });
};
// The problem: what if process.env.ACCESS_TOKEN_SECRET is still undefined?
// With no `algorithms` option, jsonwebtoken <= 8.5.1 (CVE-2022-23540) falls
// back to the "none" algorithm when the secret is falsy, so an unsigned token
// verifies with err === null and next() runs.
// Nothing here checks startup configuration or pins the algorithm.
This wasn't a hallucination. The code is textbook JWT validation. The problem was context: the AI didn't know their environment variables were loaded asynchronously. In development, it worked. In production, during deployment, there was a race condition where the secret wasn't loaded yet.
The middleware silently failed open for the first few requests after each deploy. Full authentication bypass. For about 200ms per deployment. Enough for an attacker who was watching.
The vibe coder's mistake: they verified the code did what they asked. They didn't verify it handled what they didn't think to ask about.
Common mistakes in the vibe coding era
Mistake 1: Trusting tests you didn't write
I asked an AI to write tests for a payment processing function. The tests were comprehensive: happy path, error cases, edge conditions. They all passed. I shipped it.
The production bug: the tests mocked the payment gateway's response format incorrectly. The real API returns status: "succeeded", but the mock returned status: "success", the same value the code checked for. The code worked in tests and failed in production.
from types import SimpleNamespace
import stripe
# What the AI generated for tests
# (mock_stripe is a fixture, not shown, that patches the stripe module)
def test_process_payment_success(mock_stripe):
    mock_stripe.Charge.create.return_value = SimpleNamespace(
        id="ch_123",
        status="success",  # Wrong - Stripe uses "succeeded"
        amount=1000,
    )
    result = process_payment(amount=1000, token="tok_123")
    assert result["success"] is True
# What actually happens in production
def process_payment(amount, token):
    charge = stripe.Charge.create(amount=amount, source=token)
    # Matches the mock, but a real successful charge has status "succeeded", so this is False
    if charge.status == "success":
        return {"success": True, "charge_id": charge.id}
    return {"success": False, "error": "Payment failed"}
The fix isn't to stop using AI-generated tests. It's to verify mocks against real API responses. Run contract tests. Use tools like VCR.py to record actual interactions.
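A lightweight version of that check: save one real response (from a sandbox call or a VCR.py cassette), then fail the suite whenever a hand-written mock disagrees with it. The helper name and the trimmed `RECORDED_CHARGE` dict are illustrative assumptions, not Stripe's full schema. The allowed statuses are the values Stripe documents for a charge's `status` field.

```python
def mock_contract_violations(mock: dict, recorded: dict, enums: dict) -> list:
    """Return human-readable mismatches between a mock and a recorded real response."""
    problems = []
    for field, value in mock.items():
        if field not in recorded:
            problems.append(f"{field!r}: not present in the real response")
        elif type(value) is not type(recorded[field]):
            problems.append(
                f"{field!r}: mock is {type(value).__name__}, "
                f"real is {type(recorded[field]).__name__}"
            )
    # Enum-like fields must use a value the real API can actually return
    for field, allowed in enums.items():
        if field in mock and mock[field] not in allowed:
            problems.append(f"{field!r}: {mock[field]!r} not in {sorted(allowed)}")
    return problems


# Trimmed, illustrative copy of a recorded successful charge
RECORDED_CHARGE = {"id": "ch_123", "object": "charge", "status": "succeeded", "amount": 1000}
CHARGE_ENUMS = {"status": {"succeeded", "pending", "failed"}}

# The AI's mock from the test above, as a dict
ai_mock = {"id": "ch_123", "status": "success", "amount": 1000}
for problem in mock_contract_violations(ai_mock, RECORDED_CHARGE, CHARGE_ENUMS):
    print(problem)
```

Run it as its own test alongside the suite. It would have flagged the bad `status` value before the mock ever reached a test that depended on it.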
Mistake 2: The "works on my machine" cascade
Vibe coding amplifies environment-specific bugs because the AI doesn't know your deployment context. I've seen this pattern repeatedly:
// AI generates this for local file processing
const fs = require('fs');
const path = require('path');
function processUpload(filePath) {
  const absolutePath = path.resolve(filePath);
  const data = fs.readFileSync(absolutePath, 'utf8');
  // Process data... (the real transformation is elided)
  const processed = data;
  return processed;
}
Looks fine. Works locally. Breaks in production because:
Your serverless function has a read-only filesystem (except /tmp)
The file path is actually an S3 URL, not a local path
The synchronous read blocks the event loop, and nothing handles a missing or unreadable file
The AI optimized for your prompt, not your infrastructure. The solution is better prompts that include deployment context: "Create a file processor for AWS Lambda that handles S3 URLs and uses streams for memory efficiency."
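The processor above is JavaScript, but the shape carries over. Here's a rough Python sketch of what that better prompt asks for: parse an `s3://bucket/key` URL and read the object in fixed-size chunks instead of loading it whole. The S3 client is injected as a callable so the sketch runs without AWS. In a Lambda you might pass something like `lambda b, k: boto3.client("s3").get_object(Bucket=b, Key=k)["Body"]`, whose streaming body also supports `read(n)` and `close()`. Function names are hypothetical, and byte and line counting stand in for the elided processing step.

```python
from typing import BinaryIO, Callable
from urllib.parse import urlparse

CHUNK_SIZE = 64 * 1024  # bounded memory use no matter how large the upload is


def parse_s3_url(url: str) -> tuple:
    parsed = urlparse(url)
    # Note: keys containing '?' or '#' would need extra handling with urlparse
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"expected s3://bucket/key, got {url!r}")
    return parsed.netloc, key


def process_upload(url: str, open_object: Callable[[str, str], BinaryIO]) -> dict:
    bucket, key = parse_s3_url(url)
    body = open_object(bucket, key)
    total_bytes = newlines = 0
    try:
        # Stream chunk by chunk; nothing is written to the local filesystem
        while chunk := body.read(CHUNK_SIZE):
            total_bytes += len(chunk)
            newlines += chunk.count(b"\n")  # placeholder for real processing
    finally:
        body.close()
    return {"bucket": bucket, "key": key, "bytes": total_bytes, "newlines": newlines}
```

Injecting the opener also makes the function testable locally with an in-memory stream, which is exactly the environment gap that bit the original version.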
Mistake 3: Accumulating technical debt invisibly
Traditional tech debt is obvious: you see the nested conditionals, the God class, the missing abstractions. Vibe coding creates invisible debt because the code looks clean but has no underlying architecture.
I inherited a codebase where every feature was vibed into existence. Each route handler was perfect in isolation. Together, they were unmaintainable:
import hashlib
import bcrypt
# app, the Pydantic models (UserCreate, AdminCreate, APIKeyCreate) and SALT are defined elsewhere
# Route 1 - AI generated
@app.post("/users")
async def create_user(user: UserCreate):
    hashed = bcrypt.hashpw(user.password.encode(), bcrypt.gensalt())
    # Direct database insert...
# Route 2 - AI generated (different session)
@app.post("/admins")
async def create_admin(admin: AdminCreate):
    hashed = hashlib.sha256(admin.password.encode()).hexdigest()
    # Direct database insert...
# Route 3 - AI generated (different session)
@app.post("/api-keys")
async def create_api_key(key_data: APIKeyCreate):
    salted = f"{key_data.secret}{SALT}"
    hashed = hashlib.md5(salted.encode()).hexdigest()
    # Direct database insert...
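Three different hashing schemes for credentials. Two of them (unsalted SHA-256 and MD5 with a static salt) are fast hashes that are unsuitable for storing passwords. None of the code was shared. All generated independently. Each one "correct" in isolation. Collectively, a security nightmare.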
The fix requires human-driven architecture. Define your abstractions first, then use AI to implement within those constraints:
import bcrypt
# Human-designed interface
class PasswordHasher:
    def hash(self, password: str) -> str:
        raise NotImplementedError
    def verify(self, password: str, hashed: str) -> bool:
        raise NotImplementedError
# Then prompt: "Implement PasswordHasher using bcrypt with work factor 12"
class BcryptHasher(PasswordHasher):
    def hash(self, password: str) -> str:
        return bcrypt.hashpw(
            password.encode(),
            bcrypt.gensalt(rounds=12)
        ).decode()
    def verify(self, password: str, hashed: str) -> bool:
        return bcrypt.checkpw(password.encode(), hashed.encode())
# Use everywhere
hasher = BcryptHasher()
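The payoff of the interface is that the implementation can change without touching a single route. As an illustration, here's a second implementation that swaps bcrypt for PBKDF2-HMAC-SHA256 from Python's standard library, so the sketch runs without third-party packages. The interface is restated with `abc` so the block stands alone. The storage format string and the default iteration count are assumptions. Check the count against current guidance such as OWASP's Password Storage Cheat Sheet before using anything like this.

```python
import hashlib
import hmac
import os
from abc import ABC, abstractmethod


class PasswordHasher(ABC):
    @abstractmethod
    def hash(self, password: str) -> str: ...

    @abstractmethod
    def verify(self, password: str, hashed: str) -> bool: ...


class Pbkdf2Hasher(PasswordHasher):
    def __init__(self, iterations: int = 600_000) -> None:
        self.iterations = iterations

    @staticmethod
    def _derive(password: str, salt: bytes, iterations: int) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

    def hash(self, password: str) -> str:
        salt = os.urandom(16)  # fresh random salt per password
        digest = self._derive(password, salt, self.iterations)
        # Store the parameters with the hash so they can be raised later
        return f"pbkdf2_sha256${self.iterations}${salt.hex()}${digest.hex()}"

    def verify(self, password: str, hashed: str) -> bool:
        try:
            scheme, iterations, salt_hex, digest_hex = hashed.split("$")
            salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(digest_hex)
            rounds = int(iterations)
        except ValueError:  # malformed stored hash
            return False
        if scheme != "pbkdf2_sha256" or rounds < 1:
            return False
        # Constant-time comparison of the derived key
        return hmac.compare_digest(self._derive(password, salt, rounds), expected)


# Callers depend only on the interface
hasher: PasswordHasher = Pbkdf2Hasher()
```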
How to maintain quality when you're mostly prompting
Strategy 1: Prompt for the pit of success
Bad prompt: "Create a login endpoint"
Good prompt: "Create a login endpoint for FastAPI that uses bcrypt for password hashing, returns JWT tokens with 15-minute expiry, implements rate limiting at 5 attempts per minute per IP, logs failed attempts to Datadog, and includes input validation using Pydantic models."
The second prompt gets you closer to production-ready code because you've encoded your security and operational requirements upfront. The AI can't satisfy requirements you never stated.
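If your team has standing requirements, it helps to keep them in one place and append them to every prompt instead of retyping them from memory. Here's a minimal sketch. The requirement strings are taken from the good prompt above, and `build_prompt` and `LOGIN_REQUIREMENTS` are hypothetical names.

```python
# Team-wide requirements, reviewed like code and versioned with it
LOGIN_REQUIREMENTS = [
    "Use FastAPI.",
    "Hash passwords with bcrypt.",
    "Return JWT tokens with a 15-minute expiry.",
    "Rate limit to 5 attempts per minute per IP.",
    "Log failed attempts to Datadog.",
    "Validate input with Pydantic models.",
]


def build_prompt(task: str, requirements: list) -> str:
    """Combine a short task description with explicit, numbered constraints."""
    lines = [task.strip(), "", "Requirements:"]
    lines += [f"{i}. {req}" for i, req in enumerate(requirements, start=1)]
    return "\n".join(lines)


print(build_prompt("Create a login endpoint.", LOGIN_REQUIREMENTS))
```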
Strategy 2: Review for what's missing, not what's wrong
Traditional code review: "Does this logic work correctly?" Vibe code review: "What edge cases didn't we think to handle?"
I train my team to ask:
What happens if this runs out of memory?
What happens if the database is down?
What happens if this gets called 10,000 times per second?
What happens if the input is malicious?
What happens if the dependency is compromised?
The AI generated working code. Your job is to think about the contexts it couldn't.
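Those questions translate directly into tests. For "what happens if the database is down?", you can inject the failure with `unittest.mock` and assert that the code degrades the way you intended. `get_user_profile`, `fetch_user`, and `ServiceUnavailable` are hypothetical names for this sketch.

```python
from unittest.mock import Mock


class ServiceUnavailable(Exception):
    """Raised when a dependency is down; an API layer might map this to HTTP 503."""


def get_user_profile(db, user_id: int) -> dict:
    try:
        row = db.fetch_user(user_id)
    except ConnectionError as exc:
        # Translate the low-level failure instead of leaking it to the caller
        raise ServiceUnavailable("user store unreachable") from exc
    if row is None:
        raise LookupError(f"no user {user_id}")
    return {"id": row["id"], "name": row["name"]}


def test_database_down_is_reported_as_unavailable() -> None:
    db = Mock()
    db.fetch_user.side_effect = ConnectionError("connection refused")
    try:
        get_user_profile(db, 42)
    except ServiceUnavailable as exc:
        assert isinstance(exc.__cause__, ConnectionError)
    else:
        raise AssertionError("expected ServiceUnavailable")


# Runs directly here; pytest would also collect it by name
test_database_down_is_reported_as_unavailable()
```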
Strategy 3: Test the boundaries, not the happy path
AI-generated tests are suspiciously focused on making things work. Write additional tests for making things break:
from datetime import datetime
import pytest
# parse_date is the function under test, imported from your own module
# AI generates this
def test_parse_date_valid():
    assert parse_date("2024-01-15") == datetime(2024, 1, 15)
# You should add these
def test_parse_date_invalid_format():
    with pytest.raises(ValueError):
        parse_date("not-a-date")
def test_parse_date_sql_injection():
    with pytest.raises(ValueError):
        parse_date("2024-01-15'; DROP TABLE users; --")
def test_parse_date_year_overflow():
    with pytest.raises(ValueError):
        parse_date("999999-01-15")
def test_parse_date_null_byte():
    with pytest.raises(ValueError):
        parse_date("2024-01-15\x00")
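For reference, here's one `parse_date` that passes all five tests above. The original function isn't shown, so this is an assumption about what it should accept: strict `YYYY-MM-DD` and nothing else. The regex rejects anything with extra characters. `strptime` then rejects dates that don't exist on the calendar.

```python
import re
from datetime import datetime

# Exactly four ASCII digits, dash, two digits, dash, two digits; nothing else.
# re.ASCII stops \d from matching non-ASCII digits such as full-width numerals.
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}", re.ASCII)


def parse_date(value: str) -> datetime:
    # fullmatch (not match or $) so trailing newlines or null bytes are rejected
    if not isinstance(value, str) or not _ISO_DATE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # strptime also rejects impossible dates such as 2024-02-30
    return datetime.strptime(value, "%Y-%m-%d")
```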
Strategy 4: Maintain a human-written core
Not everything should be vibed. Keep a human-written core of critical abstractions:
Authentication and authorization logic
Data validation and sanitization
Error handling patterns
Database migration strategy
API contract definitions
Security middleware
Think of these as load-bearing walls. You can vibe code the interior decorating, but the structure needs human thought.
Strategy 5: Understand it or delete it
This is harsh but necessary: if you ship code you don't understand, you own the consequences. When AI generates something clever that you can't explain, you have three choices:
Study it until you understand it
Simplify it until you understand it
Delete it and write it yourself
I learned this the hard way with a caching implementation. The AI generated a sophisticated LRU cache with WeakRefs for memory management. It worked beautifully. Until it caused a production memory leak because the WeakRef behavior interacted poorly with our serialization layer.
I didn't understand WeakRefs well enough to debug it. If I had studied it first or asked for a simpler approach, I would have caught the issue.
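For comparison, the simpler approach can be very small. Here's a sketch of a bounded LRU cache built on `collections.OrderedDict` with plain strong references. Memory is capped by `maxsize` rather than by garbage-collector timing, which keeps its behavior easy to reason about. The class name is hypothetical. If you only need to cache function results, `functools.lru_cache` already does this.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Bounded least-recently-used cache. Not thread-safe."""

    def __init__(self, maxsize: int = 128) -> None:
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self.maxsize = maxsize
        self._data: OrderedDict = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._data)
```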
The meta-problem: vibe architecture
The hardest part isn't maintaining individual vibe-coded features. It's maintaining system coherence when every component was independently vibed into existence.
You end up with systems that work but have no conceptual integrity. Each service uses different logging formats because they were generated in different sessions. Each module has different error handling because the prompts varied. The codebase becomes a Frankenstack.
The solution is treating AI as an implementation assistant, not an architect. You still design the system. You still define the interfaces. You still establish the patterns. Then you use AI to fill in the implementation details faster than you could manually.
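Logging is a concrete place to start. Rather than letting each session invent a format, define one formatter in a shared module and have every service import it. Here's a sketch using only the standard `logging` module. The field names and the `configure_logging` helper are assumptions, not any standard.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """One log line = one JSON object with the same keys in every service."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any ad-hoc handlers
    root.setLevel(level)


configure_logging()
logging.getLogger("payments").info("charge created for %s", "ch_123")
```

Once this module exists, "use our logging setup" becomes part of every prompt instead of something each generated service decides for itself.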
What actually works
After a year of heavy vibe coding in production, here's what keeps quality high:
Architecture reviews before implementation. Decide on your patterns before prompting. What's your error handling strategy? How do you handle configuration? What's your testing approach? Lock these in, then generate within those constraints.
Mandatory explanation sessions. If you can't explain how the generated code works to a junior engineer, you don't ship it. This forces you to understand what you're deploying.
Defense in depth. Assume AI-generated code has vulnerabilities you didn't catch. Add monitoring, add rate limiting, add circuit breakers, add input validation at multiple layers.
Continuous security scanning. Use tools like Semgrep, CodeQL, and dependency scanners. AI might generate code with known vulnerable patterns. Automation catches what review misses.
Keep a decision log. When you choose to use AI-generated code over writing it yourself, document why. This helps future you understand the tradeoffs.
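On continuous security scanning: Semgrep and CodeQL do this far more thoroughly, but the core idea is easy to see in a toy version. Walk the syntax tree and flag calls that match a known-bad pattern. This sketch uses Python's `ast` module to flag `hashlib.md5` and `hashlib.sha1` calls, which would have caught the API-key route earlier. It would not flag the unsalted SHA-256 password hash, and it misses aliases such as `from hashlib import md5`, which is exactly why you want maintained rulesets. The function name and the deny-list are illustrative.

```python
import ast

# Illustrative deny-list: (module, attribute) calls considered weak for credentials
WEAK_CALLS = {("hashlib", "md5"), ("hashlib", "sha1")}


def find_weak_hash_calls(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, call) pairs for calls like hashlib.md5(...)."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and (node.func.value.id, node.func.attr) in WEAK_CALLS
        ):
            findings.append((node.lineno, f"{node.func.value.id}.{node.func.attr}"))
    return sorted(findings)


SAMPLE = '''
import hashlib
hashed = hashlib.md5(salted.encode()).hexdigest()
other = hashlib.sha256(b"x").hexdigest()
'''
print(find_weak_hash_calls(SAMPLE))  # [(3, 'hashlib.md5')]
```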
Looking forward
Vibe coding isn't going away. It's going to accelerate. The developers who thrive won't be the ones who resist it or blindly embrace it; they'll be the ones who figure out how to maintain engineering discipline while working at AI speed.
That means stronger architecture skills, better testing instincts, and deeper security awareness. The boilerplate is automated. The thinking isn't.
I still can't fully explain that rate limiter I deployed last Tuesday. But I can explain why I chose that approach, what tradeoffs it makes, what could go wrong, and how I'd debug it. That's the new baseline.
The code writes itself now. Our job is making sure it writes the right thing.