A new study on large language models (LLMs), conducted by researchers from several American universities, supports the hypothesis that sustained exposure to low-quality web content causes lasting and significant declines in the models' capabilities: reasoning, long-context understanding, and safety all degrade, and "dark traits" such as psychopathy, narcissism, and Machiavellianism begin to emerge.
In the experiment, four open LLMs underwent extended fine-tuning on "garbage" data: short, highly popular posts and tweets with trivial, low-quality, yet heavily engaged-with content. The data was split along two dimensions: M1, the degree of engagement (how popular a short post is), and M2, semantic quality (how substantive the material is). The control data was of comparable size but consisted of low-engagement or more substantive posts.
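For intuition only, here is a minimal sketch of how a corpus might be partitioned along the M1 engagement axis. The Post structure, field names, and thresholds are assumptions for illustration, not the authors' actual pipeline:

```python
# Illustrative sketch only: the Post fields and thresholds are hypothetical,
# not the study's real selection criteria.
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    retweets: int

def engagement(post: Post) -> int:
    """Popularity proxy for M1: total interactions on the post."""
    return post.likes + post.retweets

def split_by_m1(posts: list[Post], threshold: int = 500) -> tuple[list[Post], list[Post]]:
    """Garbage bucket: short, highly engaged posts; control: everything else."""
    garbage, control = [], []
    for p in posts:
        if engagement(p) >= threshold and len(p.text) < 280:
            garbage.append(p)
        else:
            control.append(p)
    return garbage, control
```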
The results were striking. Under the M1 intervention (highly engaged garbage posts), the reasoning benchmark score (ARC-Challenge with chain-of-thought) dropped from ~74.9 to ~57.2, and the long-context score (RULER-CWE) fell from ~84.4 to ~52.3, as the share of garbage data rose from 0% to 100%. The main driver of the degradation was so-called "thought-skipping": models increasingly omitted or truncated their reasoning and planning chains.
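As a rough illustration of what spotting "thought-skipping" could look like (a simplified heuristic, not the paper's evaluation code), one might count explicit reasoning steps in a model's chain-of-thought output and flag answers that jump straight to the conclusion:

```python
import re

# Crude heuristic for illustration: count enumerated reasoning steps in a
# chain-of-thought answer; a real analysis would parse the output more carefully.
def count_reasoning_steps(answer: str) -> int:
    steps = re.findall(r"^\s*(?:step\s*\d+|\d+[.)])", answer,
                       flags=re.IGNORECASE | re.MULTILINE)
    return len(steps)

def looks_like_thought_skipping(answer: str, min_steps: int = 2) -> bool:
    """Flag answers that reach a conclusion with too few explicit steps."""
    return count_reasoning_steps(answer) < min_steps

# Example: a fully reasoned answer vs. a "thought-skipping" one.
full = "Step 1: list the options.\nStep 2: eliminate two of them.\nAnswer: B"
short = "Answer: B"
print(looks_like_thought_skipping(full), looks_like_thought_skipping(short))  # False True
```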
At the same time, attempts to "heal" this condition, whether by tuning instructions or fine-tuning on clean data, yielded only partial recovery. The research shows that even the most extensive adjustments cannot fully restore a model's original capabilities: a persistent representational drift occurs, rather than a mere mismatch in training format. Notably, the popularity metric (post engagement) turned out to be a stronger predictor of degradation than text length or content.
Ultimately, the authors emphasize that data quality is not just a technical detail of training but a matter of training-time safety. They call for regular "cognitive health checks" of deployed LLMs and a rethink of the practice of continual learning on unfiltered web data. The research sets a new bar: it is not enough to scale models; what they are fed as input must also be controlled.
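A minimal sketch of what such a periodic "cognitive health check" could look like in practice, assuming a generic ask_model callable and an arbitrary baseline (none of this comes from the paper):

```python
from typing import Callable

# Hypothetical monitoring sketch: `ask_model`, the benchmark items, and the
# baseline threshold are placeholders, not part of the study's protocol.
BENCHMARK = [
    {"question": "If all bloops are razzies and all razzies are lazzies, "
                 "are all bloops lazzies? Answer yes or no.",
     "expected": "yes"},
    # ... more held-out reasoning items ...
]

def health_check(ask_model: Callable[[str], str], baseline: float = 0.75) -> bool:
    """Score the deployed model on a fixed reasoning set and compare to baseline."""
    correct = sum(
        item["expected"] in ask_model(item["question"]).strip().lower()
        for item in BENCHMARK
    )
    score = correct / len(BENCHMARK)
    if score < baseline:
        print(f"ALERT: reasoning score {score:.2f} is below baseline {baseline:.2f}")
    return score >= baseline
```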
Remember this when you trade on GPT's signals and advice!)


