AI is powered by three foundational pillars: data, compute, and algorithms. Training feeds datasets (the training samples) into learning algorithms to produce AI models. A model's performance is then evaluated by testing its predictions on additional, previously unseen data.
For example, AlphaGo’s initial model was trained on 30 million recorded Go moves from professional human players, allowing it to quickly grasp the game and improve its skill through subsequent iterations. Large Language Models (LLMs) such as ChatGPT, by contrast, require far larger general training corpora comprising both labelled and unlabelled data, such as images, text segments, and audio clips, and the labelled portion can only be produced through manual annotation.
Alaya is a distributed AI data collection and labelling platform offering intelligent optimisation, targeted sampling, custom preprocessing, and strong privacy protections. We aim to address data quality challenges by combining sustainable decentralised incentives with engaging gamification, providing a more rewarding experience for contributors.
Inspired by “Swarm Intelligence” (the emergence of collective intelligence from the decentralised, self-organising behaviour of interconnected individuals), Alaya is the first native Web3 AI data platform to integrate distributed communities with social commerce in data collection, labelling and annotation.
Alaya is designed to provide an integrated AI training and deployment toolset featuring data quality management, intelligent optimisation, and adaptive sampling. Automated data preprocessing is supported by a built-in labelling and user-ranking system that distributes sampling tasks accurately and assures quality. Additional customisation for specific datasets is available on request.
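One plausible reading of the user-ranking mechanism above is reputation-weighted task routing with redundancy for quality assurance. The sketch below is entirely hypothetical (Alaya's internal design is not published): all class names, fields, and the ranking rule are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Labeller:
    """Hypothetical contributor record; quality_score could reflect
    historical agreement with consensus labels."""
    name: str
    quality_score: float
    assigned: list = field(default_factory=list)

def distribute(tasks, labellers, redundancy=2):
    """Route each task to the `redundancy` highest-ranked labellers,
    so independent answers can later be cross-checked for QA."""
    ranked = sorted(labellers, key=lambda l: l.quality_score, reverse=True)
    for task in tasks:
        for labeller in ranked[:redundancy]:
            labeller.assigned.append(task)
    return labellers

pool = [Labeller("a", 0.95), Labeller("b", 0.70), Labeller("c", 0.88)]
distribute(["img-001", "img-002"], pool)
# the two top-ranked labellers ("a" and "c") receive both tasks
```

Assigning each task to multiple high-ranked labellers trades throughput for reliability: disagreements between redundant answers flag low-quality labels without a centralised reviewer.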
Our mission is to provide superior-quality AI data through decentralised community solutions, supporting a vibrant future for the wider Web3 AI ecosystem. We believe the future of decentralised AI must rely on native Web3 data solutions to address rising challenges in the traditional corporate AI industry, such as data monopoly and censorship. A decentralised AI data solution offers stronger individual privacy and data-ownership rights. It is also essential for building a self-regulating free market for data, one that can support decentralised AI networks closely aligned with human values.