DB-First Architecture: Why PostgreSQL Beats YAML for Developer Tools
How we migrated from 200+ YAML files to PostgreSQL — and why every developer tool should consider doing the same.
🎯 TL;DR
- Replaced 200+ YAML config files with 40+ PostgreSQL tables — query times dropped 98%
- Idempotent upsert migrations (ON CONFLICT DO UPDATE) made every migration safely re-runnable
- JSONB columns handle schema evolution without breaking changes
- YAML files kept as read-only exports — developers can still browse them, but DB is the source of truth
😤 The Problem: YAML at Scale
If you’re building a developer tool that manages projects, sprints, tasks, or configuration, you’ve probably started with YAML or JSON files. We did too. It works great at first — human-readable, version-controlled, easy to edit.
Then your tool grows. Ours looked like this:
sprints/
project-a/
sprint-01.yaml
sprint-02.yaml
...
sprint-55.yaml
project-b/
sprint-01.yaml
...
config/
learnings/
2026-01-15-cache-pattern.yaml
2026-01-16-build-fix.yaml
...213 files...
events.yaml # 4000+ lines, growing daily
The breaking point? Two automated processes tried to update the same YAML file simultaneously. One process’s changes silently overwrote the other’s. We lost data that had to be manually re-entered.
The lesson: The moment you have more than one writer — whether that’s a developer, a CI pipeline, or an AI agent — file-based storage becomes a race condition waiting to happen.
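The lost-update failure mode is easy to reproduce. Here is a minimal sketch of it (filenames and event strings are invented, and JSON from the standard library stands in for YAML so the snippet is self-contained): two writers each read the file, modify their own copy, and write the whole file back, so whichever write lands last silently discards the other's change.

```python
import json
import os
import tempfile

# Simulate two processes doing read-modify-write on one shared file.
# Neither holds a lock, so the last write clobbers the other.

path = os.path.join(tempfile.mkdtemp(), "events.json")
with open(path, "w") as f:
    json.dump({"events": []}, f)

# Process A reads its snapshot...
with open(path) as f:
    snapshot_a = json.load(f)
# ...and so does process B, before A has written anything back.
with open(path) as f:
    snapshot_b = json.load(f)

snapshot_a["events"].append("ci: build passed")
snapshot_b["events"].append("agent: sprint updated")

# A writes first; then B overwrites the file with its own stale snapshot.
with open(path, "w") as f:
    json.dump(snapshot_a, f)
with open(path, "w") as f:
    json.dump(snapshot_b, f)

with open(path) as f:
    final = json.load(f)
print(final["events"])  # A's event is gone: ['agent: sprint updated']
```

A database resolves this by serializing the two writes inside transactions; with files, correctness depends on every writer cooperating with a locking scheme that none of them enforce.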
🐘 Why PostgreSQL, Not Just “Any Database”
We chose PostgreSQL specifically for three features that YAML files can’t replicate:
- ACID transactions — concurrent writes are safe by default (MVCC handles isolation)
- JSONB columns — schema flexibility without losing query power. New fields go in JSONB first, graduate to real columns when stable
- pgvector extension — semantic search over knowledge entries using embedding vectors, something no file-based approach can match
🔄 The Migration Strategy: Idempotent Upserts
The key insight that made migration painless: every SQL statement must be safely re-runnable.
-- Every table uses IF NOT EXISTS
CREATE TABLE IF NOT EXISTS sprints (
id SERIAL PRIMARY KEY,
project_id INTEGER REFERENCES projects(id),
name VARCHAR(200) NOT NULL,
status VARCHAR(50) DEFAULT 'active',
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(project_id, name)
);
-- Every INSERT uses ON CONFLICT
INSERT INTO sprints (project_id, name, status, metadata)
VALUES (1, 'sprint-alpha', 'active', '{"source": "import"}')
ON CONFLICT (project_id, name) DO UPDATE
SET metadata = sprints.metadata || EXCLUDED.metadata,
    updated_at = NOW();
This means you can run any migration file twice (or ten times) and get the same result. No “migration already applied” errors. No manual rollbacks. This single pattern eliminated an entire class of deployment bugs.
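The property the ON CONFLICT pattern buys can be stated simply: applying the same migration to the same database a second time changes nothing. A small Python model of the merge-upsert semantics, with the table as a dict keyed on the unique constraint (all names here are illustrative):

```python
def upsert(table, key, row):
    # Mirrors INSERT ... ON CONFLICT DO UPDATE with a JSONB merge:
    # on conflict, merge the incoming metadata over the existing metadata.
    if key in table:
        merged = {**table[key].get("metadata", {}), **row.get("metadata", {})}
        table[key] = {**table[key], **row, "metadata": merged}
    else:
        table[key] = dict(row)
    return table

sprints = {}
migration = [
    (("proj-a", "sprint-alpha"),
     {"status": "active", "metadata": {"source": "import"}}),
]

# Apply the same "migration" twice; the second pass is a no-op.
for _ in range(2):
    for key, row in migration:
        upsert(sprints, key, row)

print(sprints[("proj-a", "sprint-alpha")])
```

Running the loop once or ten times leaves the table in the identical state, which is exactly the guarantee that makes a migration file safe to re-run blindly in every deploy.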
🧬 Schema Evolution: JSONB as an Escape Hatch
Traditional schema changes in developer tools are painful — you add a column, every config file needs updating, backward compatibility breaks. With JSONB:
-- Need a new field? Just add it to metadata
UPDATE sprints
SET metadata = metadata || '{"priority_score": 85}'
WHERE id = 42;
-- Query it immediately
SELECT name, metadata->>'priority_score' as priority
FROM sprints
WHERE (metadata->>'priority_score')::int > 80;
Our rule: new fields start in metadata JSONB. If they’re used consistently across 3+ release cycles, promote to a real column:
ALTER TABLE sprints ADD COLUMN IF NOT EXISTS priority_score INTEGER;
UPDATE sprints SET priority_score = (metadata->>'priority_score')::int
WHERE metadata ? 'priority_score';
This gives you the flexibility of YAML (add any key anytime) with the queryability of SQL.
⚡ Performance: The Numbers
| Operation | YAML (file-based) | PostgreSQL | Improvement |
|---|---|---|---|
| List all items | 2.3s (glob + parse 55 files) | 45ms (indexed query) | 51x faster |
| Full-text search | 1.8s (grep across 213 files) | 62ms (tsvector index) | 29x faster |
| Update one record | 340ms (read + parse + write + fsync) | 38ms (single UPDATE) | 9x faster |
| Concurrent writes | Race condition / data loss | ACID guaranteed | Safe |
| Cross-project query | 4.1s (scan all directories) | 78ms (JOIN) | 53x faster |
The 51x improvement on listing isn’t just about speed — it’s about what becomes possible. Sub-50ms queries mean you can build real-time dashboards, autocomplete, and interactive CLIs that feel instant.
📁 Keeping YAML Around (Read-Only Exports)
The biggest resistance to DB-first comes from developers who say: “But I can read YAML in my editor!” Fair point. The solution: keep YAML as read-only exports.
# Generate YAML snapshots from DB (for browsing, not editing)
dp export --format yaml --output sprints/
# The files are gitignored or auto-regenerated
# Manual edits get overwritten on next export
Developers still get human-readable files. But the source of truth is always the database. Any manual YAML edit gets overwritten on the next export cycle.
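An export step like this can be as simple as serializing query results to disk. A minimal hand-rolled sketch of the idea (a real exporter would use a YAML library and pull rows from the database; the row keys, values, and header comment below are invented for illustration):

```python
def to_yaml(row):
    # Hand-rolled scalar-only YAML emitter, enough for flat records.
    # The header marks the file as generated so nobody edits it by hand.
    lines = ["# GENERATED - do not edit; regenerated from the database"]
    for key, value in row.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"

row = {"name": "sprint-alpha", "status": "active", "priority_score": 85}
print(to_yaml(row))
```

Writing one such file per record reproduces the old directory layout for browsing, while the generated-file header makes the editing contract explicit.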
📏 Five Rules We Live By
After the migration, we codified these rules:
- DB is the only source of truth — if it’s not in PostgreSQL, it doesn’t exist
- Every INSERT has ON CONFLICT — no migration should fail on a second run
- YAML is export-only — generated from DB, never edited directly
- New fields start in JSONB — graduate to columns when stable
- IF NOT EXISTS on everything — CREATE TABLE, ADD COLUMN, CREATE INDEX
📊 Results
After migrating:
- Query times dropped 98% across the board
- Zero data-loss incidents (versus three in the YAML era, all caused by concurrent writes)
- Semantic search over knowledge entries via pgvector embeddings
- Real-time dashboards became possible (sub-50ms queries)
- Schema changes became non-breaking (JSONB escape hatch)
🤔 When to Make the Switch
You should consider DB-first when:
- You have more than one process writing to config/state files
- You need to query across projects or aggregate data
- Your YAML/JSON files exceed 50 files or 1000 lines
- You want full-text search or semantic search over your data
- You’re building a CLI or dashboard that needs fast lookups
You can stay with YAML when:
- Single-user, single-project tool
- Static configuration that rarely changes
- No need for cross-file queries
- Version control diff visibility is the top priority
The migration is worth it earlier than you think. The concurrent write bug that finally pushed us over was predictable in hindsight — we just didn’t hit it until our tool scaled.
Written by Kishore Rajendra, AI-assisted. DevPilot is a development velocity engine for solo developers and small teams.