Researchers from Anthropic and collaborators published a paper in Nature in April 2026 that should make anyone who works with data uneasy.

They gave a "teacher" language model a hidden behavioral trait, then had it generate data that looked unrelated to that trait: number sequences, code, math reasoning traces. The outputs were then filtered aggressively to remove explicit and detectable references to the trait.

A fresh "student" model trained on this filtered data still inherited the trait.

In one setup, a model prompted to prefer owls generated nothing but numbers. After filtering, a student model trained on those numbers went from naming "owl" as its favorite animal 12% of the time to over 60%. The authors call this subliminal learning.

They reported similar effects across number sequences, code, and reasoning traces. The effect was strongest when teacher and student shared the same or closely matched base model; transfer across different model families was much weaker.

The trader-relevant principle is not that markets work like neural networks.

It is simpler than that: filtering data does not guarantee you removed the fingerprint of the process that generated it.

When you exclude outlier days from a backtest, the remaining sample still reflects the logic that decided what counts as an "outlier."
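A minimal sketch of that point, on simulated (not real) fat-tailed daily returns and a hypothetical outlier rule that drops any day more than k standard deviations from the mean. The functions `fat_tailed_returns` and `drop_outliers` are illustrative names, not from the paper. The "clean" sample's volatility and worst day are set by k, which is a property of the filter, not the market.

```python
import math
import random
import statistics


def fat_tailed_returns(n, df=3.0, scale=0.01, seed=7):
    """Simulated daily returns from a scaled Student-t distribution (fat tails).

    Uses only the standard library: t = z / sqrt(chi2 / df), with
    chi2(df) drawn as Gamma(shape=df/2, scale=2).
    """
    rng = random.Random(seed)
    returns = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        returns.append(scale * z / math.sqrt(chi2 / df))
    return returns


def drop_outliers(returns, k):
    """Hypothetical rule: drop days more than k population std devs from the mean."""
    mu = statistics.fmean(returns)
    sigma = statistics.pstdev(returns)
    return [r for r in returns if abs(r - mu) <= k * sigma]


if __name__ == "__main__":
    rets = fat_tailed_returns(2500)
    print(f"all days  n={len(rets)}  vol={statistics.pstdev(rets):.3%}  worst={min(rets):.2%}")
    for k in (5.0, 4.0, 3.0, 2.0):
        kept = drop_outliers(rets, k)
        # Same simulated market, different "clean" sample: vol and worst day now depend on k.
        print(f"k={k:<4}    n={len(kept)}  vol={statistics.pstdev(kept):.3%}  worst={min(kept):.2%}")
```

Because the threshold is computed from the full sample, every "clean" volatility figure inherits the choice of k. Reporting k next to the result is the minimum record of what the filter assumed.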

When you filter setups by win rate and then study the survivors for common features, some of what you find reflects the filter itself, not just the market.
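A minimal simulation of this, assuming hypothetical setups with no edge at all (every trade is a coin flip) and a hypothetical 60% in-sample win-rate filter. The names `simulate_setups` and `filter_by_win_rate` are illustrative. The survivors share a "common feature": low trade counts, because small samples swing further from 50%. That feature is produced by the filter, and out of sample the survivors' win rate sits back near 50%.

```python
import random
import statistics


def simulate_setups(n_setups=2000, p_win=0.5, seed=11):
    """Hypothetical setups with NO edge: every trade wins with probability p_win.

    The only thing that differs between setups is how many trades they have.
    """
    rng = random.Random(seed)
    setups = []
    for _ in range(n_setups):
        n_trades = rng.randint(10, 200)
        wins_is = sum(rng.random() < p_win for _ in range(n_trades))   # in-sample
        wins_oos = sum(rng.random() < p_win for _ in range(n_trades))  # out-of-sample
        setups.append({
            "n_trades": n_trades,
            "is_win_rate": wins_is / n_trades,
            "oos_win_rate": wins_oos / n_trades,
        })
    return setups


def filter_by_win_rate(setups, threshold=0.6):
    """Keep setups whose in-sample win rate clears a (hypothetical) threshold."""
    return [s for s in setups if s["is_win_rate"] >= threshold]


if __name__ == "__main__":
    setups = simulate_setups()
    survivors = filter_by_win_rate(setups)
    print("survivors:", len(survivors), "of", len(setups))
    # The survivors' "common feature" comes from the filter: small samples pass more often.
    print("median trades, all:      ", statistics.median(s["n_trades"] for s in setups))
    print("median trades, survivors:", statistics.median(s["n_trades"] for s in survivors))
    print("survivor win rate, in-sample:    ", round(statistics.fmean(s["is_win_rate"] for s in survivors), 3))
    print("survivor win rate, out-of-sample:", round(statistics.fmean(s["oos_win_rate"] for s in survivors), 3))
```

A researcher studying only the survivors might conclude that "selective" setups with few trades perform better. In this simulation nothing has an edge; the pattern is the fingerprint of the threshold.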

When you clean a dataset by removing "messy" periods, your definition of messy already embeds assumptions about what normal looks like.

One practical implication the authors highlight is provenance: tracking where data and models come from, not just what outputs look like.
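One way to apply provenance to a trading dataset, sketched under our own assumptions (the `FilterProvenance` record and `apply_with_provenance` helper are hypothetical, not from the paper or any library): store a fingerprint of the unfiltered input and the rule that did the removing next to the filtered sample.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FilterProvenance:
    """Hypothetical record of how a filtered sample was produced."""
    source_sha256: str  # fingerprint of the unfiltered input
    rule: str           # plain-language definition of what was removed
    params: dict        # thresholds the rule depended on
    n_before: int
    n_after: int


def apply_with_provenance(rows, keep, rule, params):
    """Filter rows with the predicate `keep`; return kept rows plus a provenance record."""
    source_bytes = json.dumps(rows, sort_keys=True).encode("utf-8")
    kept = [row for row in rows if keep(row)]
    record = FilterProvenance(
        source_sha256=hashlib.sha256(source_bytes).hexdigest(),
        rule=rule,
        params=dict(params),
        n_before=len(rows),
        n_after=len(kept),
    )
    return kept, record


if __name__ == "__main__":
    # Toy rows for illustration only, not real market data.
    rows = [
        {"date": "2024-01-02", "ret": 0.012},
        {"date": "2024-01-03", "ret": -0.081},
        {"date": "2024-01-04", "ret": 0.004},
    ]
    max_abs_ret = 0.05
    kept, prov = apply_with_provenance(
        rows,
        keep=lambda row: abs(row["ret"]) <= max_abs_ret,
        rule="drop days with |ret| > max_abs_ret",
        params={"max_abs_ret": max_abs_ret},
    )
    print(json.dumps(asdict(prov), indent=2))
```

Hashing the input rather than the output is the design choice here: two samples that look identical after filtering can have different parents, and the record keeps that lineage visible.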

Takeaway:
Next time you clean a dataset or filter a sample, ask not only what you removed, but what assumptions defined the removal. That filter has a point of view. And it is still in your data.

Part 1 of 3. Next: Your Backtest Has a Family Tree.

This is not trading advice. No entries, exits, or price targets. Research note on data integrity.

Building structure tools, not signals

$BTC
