Three hard rules for AI development:
1. Don't train models to lie - synthetic data and RLHF can accidentally reward deception when models learn to game reward functions instead of solving problems honestly
2. Don't train on internet sewage - garbage in = garbage out. Reddit threads, scraped social media, and low-quality forums poison the training distribution
3. Constitutional AI won't save you - adding a rulebook on top of a fundamentally broken base model is like putting guardrails on a car with no brakes. Fix the foundation first
The real issue: most labs are optimizing for benchmark scores and user engagement metrics, not for models that actually reason correctly. You can't patch your way out of training on bad data with fancy alignment techniques.
1. Don't train models to lie - synthetic data and RLHF can accidentally reward deception when models learn to game reward functions instead of solving problems honestly
2. Don't train on internet sewage - garbage in = garbage out. Reddit threads, scraped social media, and low-quality forums poison the training distribution
3. Constitutional AI won't save you - adding a rulebook on top of a fundamentally broken base model is like putting guardrails on a car with no brakes. Fix the foundation first
The real issue: most labs are optimizing for benchmark scores and user engagement metrics, not for models that actually reason correctly. You can't patch your way out of training on bad data with fancy alignment techniques.