22 December, 2025 – Tether Data’s AI research division, QVAC, today announced the release of QVAC Genesis II, a major expansion of the world’s largest publicly available synthetic educational dataset for artificial intelligence pre-training. With the addition of 107 billion new tokens, the combined QVAC Genesis dataset now totals 148 billion tokens across 19 educational domains, significantly extending the scale, depth, and reasoning quality of open AI training data.
QVAC Genesis II builds directly on the foundation laid by QVAC Genesis I, which introduced a rigorously validated, education-focused synthetic dataset spanning core STEM disciplines. This second release expands coverage to 10 new domains, including chemistry, computer science, statistics, machine learning, astronomy, geography, econometrics, and electrical engineering, while also regenerating college-level physics using an improved methodology. Together, Genesis I and II form the most comprehensive synthetic educational dataset ever released to the public.
At the core of this release is a new data generation approach called Option-Level Reasoning, designed to extract structured reasoning not only from model failures, but also from correct answers. Rather than treating correct responses as finished outputs, this method systematically analyzes every answer option in a multiple-choice question, reinforcing correct reasoning while explicitly addressing common misconceptions. The result is training data that emphasizes clarity, causality, and decision-making, not just surface-level correctness. This new approach complements the original Failure Analysis method introduced in Genesis I, forming a dual-method pipeline that ensures every generated question contributes educational value.
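To make the idea of analyzing every answer option concrete, here is a minimal Python sketch of how per-option analysis requests could be assembled for a single multiple-choice question. It is an illustration only, not QVAC's actual pipeline: the class `MultipleChoiceQuestion`, the function `build_option_level_prompts`, the prompt wording, and the sample question are all hypothetical names and content introduced here.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceQuestion:
    """Hypothetical container for one multiple-choice item."""
    stem: str
    options: dict[str, str]  # option label -> option text
    correct_label: str


def build_option_level_prompts(question: MultipleChoiceQuestion) -> list[str]:
    """Build one analysis prompt per answer option (illustrative wording)."""
    prompts = []
    for label, text in question.options.items():
        if label == question.correct_label:
            # Correct option: ask for the reasoning that supports it.
            task = "Explain step by step why this option is correct."
        else:
            # Distractor: ask for the misconception behind it.
            task = ("Explain which misconception makes this option "
                    "tempting and why it is wrong.")
        prompts.append(
            f"Question: {question.stem}\nOption {label}: {text}\n{task}"
        )
    return prompts


# Made-up sample question, used only to exercise the sketch.
example = MultipleChoiceQuestion(
    stem=("Which quantity is conserved in an elastic collision "
          "but not in an inelastic one?"),
    options={
        "A": "Kinetic energy",
        "B": "Momentum",
        "C": "Mass",
        "D": "Total energy",
    },
    correct_label="A",
)

prompts = build_option_level_prompts(example)
for prompt in prompts:
    print(prompt, end="\n\n")
```

In this sketch every option yields its own prompt, so a correctly answered question still produces reasoning text for both the right answer and each distractor, which is the property the release attributes to Option-Level Reasoning.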
Independent evaluations show that models trained on Genesis II data demonstrate substantially higher reasoning accuracy and produce clear, unambiguous answers far more consistently than models trained on prior synthetic datasets. More than a scale increase, this release reflects a deliberate shift in how educational AI data should be built. While much of the industry focuses on scraping and aggregating ever-larger volumes of text, QVAC’s approach is designed to teach models how to think, reason, and explain, grounding intelligence in understanding rather than imitation.
“Most AI training today optimizes for fluency, not understanding,” said Paolo Ardoino, CEO of Tether. “With this release, we’re pushing beyond volume toward structure, reasoning, and clarity. Intelligence should be built on understanding why something is true, not just predicting what sounds right. By making this dataset open, we’re giving researchers and builders the tools to develop AI that is more reliable, more explainable, and ultimately more useful to society.”
As with Genesis I, the expanded dataset is released openly to support researchers, academic institutions, and independent developers working outside of closed, proprietary systems. It is made available under a Creative Commons Attribution–NonCommercial (CC-BY-NC 4.0) license, reinforcing QVAC’s commitment to open, community-driven AI research.
The release continues QVAC’s broader mission to advance local, decentralized intelligence, where AI models can be trained, refined, and deployed without dependence on centralized cloud platforms. By strengthening the open foundations of AI training data, Tether Data aims to reduce structural barriers to innovation and ensure that high-quality intelligence remains accessible to the global research community.
The full technical breakdown of the dataset, titled “QVAC Genesis II: Expanding the Largest and Highest-Quality Multi-domain Educational Synthetic Dataset for Pre-training,” is available now via the QVAC research blog, alongside access to the dataset and models on Hugging Face. Further information, including a detailed FAQ section, is available on the QVAC website.
This article was originally published as “Tether Releases QVAC Genesis II” on Crypto Breaking News.
