Constellation Network and Common Crawl Provide Secure Validation of AI Training Data

San Francisco, California, December 19th, 2024, Chainwire

Constellation Network, a Web3 ecosystem validated by the U.S. Department of Defense, today announced the launch of a customized blockchain developed in partnership with the Common Crawl Foundation to create the industry’s first cryptographically secure, immutable archive of internet data for AI training and development.

The collaboration introduces a new approach to validating and securely accessing 17 years of internet crawl data through an immutable, cryptographically secured blockchain network built on Constellation. The data spans nearly 9 petabytes, and 80% of Large Language Models (LLMs) use it for training. This application-specific network, or Metagraph, addresses pressing concerns in AI development, namely data provenance, privacy, and ethical sourcing, while exploring new use cases for blockchain technology in emerging industries. Furthermore, the network will use Constellation’s DAG utility asset to secure the archived internet crawls. This represents a significant advancement in using cryptocurrency as a mechanism for businesses to notarize data: the cost shifts from the consumer-facing gas fees typical of many other layer-one networks to a business operational expense.

Key Technological Innovations

Comprehensive Data Archiving: A fully immutable copy of internet history, providing unprecedented transparency and traceability for AI training datasets
End-to-End Encryption: Cryptographic security that ensures data integrity throughout the AI development lifecycle
Ethical AI Framework: A robust solution for addressing concerns around data collection, storage, and usage in large language models
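
For developers, the data-integrity idea above can be illustrated with a minimal sketch. It assumes that a notarized record for a crawl file takes the form of a SHA-256 hex digest. This announcement does not specify the hashing scheme or the Metagraph interface, so the function names and the digest format here are hypothetical and not Constellation's or Common Crawl's actual API.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large archive files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_notarized_digest(path, notarized_hex):
    """Compare a local file against a previously recorded digest.

    `notarized_hex` is an assumed input: a hex digest obtained from
    whatever immutable record the archive publishes.
    """
    return hmac.compare_digest(sha256_file(path), notarized_hex.lower())
```

If even one byte of the local file differs from the version that was recorded, the digests no longer match and the check returns False.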

“This integration is a critical step forward in securing the future of AI development,” said Alex Brandes, CTO of Constellation Network. “By ensuring cryptographic integrity and immutability of training data, we are addressing one of the most pressing challenges in the field today: trustworthiness and provenance of datasets. We believe our platform will grow to become a cornerstone in the field of responsible AI development, setting new standards for data integrity and trust.” 

Industry Applications

The blockchain-enabled data archive is already attracting attention from advanced AI research initiatives. TraceAI, a project developed through the National Science Foundation (NSF) and SBIR program, is testing its own application-specific network, built on Constellation, to add immutability, auditability, and proof of authorship to its training models and to develop advanced watermarking technologies. TraceAI will also leverage Common Crawl’s Constellation-built solution to extend its work in blockchain-encrypted AI to include tracking the origin of data.

Kevin Jackson, Vice President of Space Domain Communications & Commercialization for Forward Edge-AI, emphasizes the significance of this breakthrough: “This represents the natural evolution of AI and machine learning model development, transforming data management from a technical challenge to a trusted business tool that drives global standardization and verification.”

Looking Forward

Over the coming months, Constellation Network and the Common Crawl Foundation will work together to expand the solutions available to AI developers and to make cryptographically validated access to the crawl part of the standard release process.

“For users of the Crawl who are concerned about the provenance of the data, especially those using it for AI models, Constellation and their hypergraph blockchain provides an elegant solution,” said Rich Skrenta, Executive Director of the Common Crawl. “We are looking forward to adding the ability to securely validate the crawl as part of our standard distribution by partnering with Constellation.”

Evidence of this integration can be found on Constellation’s transaction viewer, called the “DAG explorer,” and developers can get started using verified historical crawls for AI applications. Please follow along for further solutions to be developed by Constellation, Forward Edge-AI, and Common Crawl. 

About Constellation Network

Constellation is a leading blockchain network advancing innovation through on-chain data security, partnering with critical global stakeholders, including the U.S. Department of Defense, to deliver transformative, next-generation technologies.

About Common Crawl Foundation

The Common Crawl Foundation is a 501(c)(3) non-profit organization dedicated to providing a copy of the internet to the public, free of charge. Their web archive consists of petabytes of data collected over years of web crawling, serving as a critical resource for researchers, businesses, and developers worldwide.

About Forward Edge-AI

Forward Edge-AI is at the forefront of a revolution in responsible and inclusive Artificial Intelligence (AI) for the betterment of humanity. Since its founding in 2019, its goal has been to become the dominant player in Artificial Intelligence and to lead the revolution in augmenting edge technology with human intelligence.

Contact

Email: [email protected] 

Website: https://constellationnetwork.io/ 

Twitter: https://x.com/conste11ation 

GitHub: https://github.com/Constellation-Labs/tessellation

DAG Explorer: https://mainnet.dagexplorer.io/

Contact

Dagnum PI
[email protected]
