The Language Network - ICO Review

The Language Network is a decentralized, open ecosystem for Language AI apps and services.

Language AI is about getting machines to understand humans, in our natural language, rather than us conforming to the interfaces of our machines. In many ways, it’s the most human-centered form of AI, and it’s projected to be one of the fastest growing as well: $126 billion by 2023.

But it’s also potentially a bit scary. Do we want a future where a few large tech companies collect and use all of our language and communication data to train their AI?

We think there’s a better way, so we’re using blockchain and crypto-economics to create an open ecosystem, one that’s not owned by any single company and that shares value across all stakeholders.


Our Mission

1. Create open infrastructure that accelerates innovation in Language AI
2. Put users in control of their data
3. Share value equitably across the ecosystem

Changing the economic model of language data

LangNet turns language data into a revenue-generating asset owned by the individual. Right now, tech companies either process data they’ve collected from you or buy data from data brokers. These middlemen hire people to generate language data on an hourly basis and keep the enormous profits for themselves.

We take a different approach and give ownership back to the users that produced the data. Through LangNet, users’ voice data becomes their intellectual property, stored on an immutable blockchain ledger and protected by our Voice ID system that can identify if someone else is using their voice data. Instead of contributing labor and being paid hourly, users can build an asset base of language data that can be “rented” out to others, like a piece of property.

This is a profound change in the way data is priced, similar to the shift from high-priced, one-time software licenses to usage-based SaaS pricing. And like SaaS businesses, the new model of licensing data incentivizes the network to optimize for network usage rather than profit maximization, which should bring down the price of voice data, increase innovation, and grow a bigger pie for everyone as more innovative language-based apps are developed.

About our ICO

The team behind LangNet has been building speech recognition products and services for over two years, cooperating and partnering with leading enterprises including a top 10 global bank.

Our experience working with enterprises gave us two important insights:

  1. there is an enormous amount of latent demand for Language AI applications in every industry;
  2. each industry, enterprise and application requires its own language dataset, so the lack of good data greatly limits innovation and adoption of Language AI.

With the realization that this is a system-level problem rather than just a business problem, we started LangNet to build a new kind of ecosystem, one that is incentivized correctly for broad participation, resource-sharing and innovation.

We are using this ICO to kickstart the community of data contributors, developers, and large enterprises. Over these next few weeks, we hope you’ll join us as we roll out several bounty campaigns, partnership programs, and data crowdsourcing campaigns to build a vibrant ecosystem for Language AI.

Background

Building effective Language AI applications requires large amounts of high-quality language data. At its most basic, the extent to which Language AI models can process and interpret natural language commands is limited by the availability of data used to train them. With limited AI models, developers are constrained in the types of applications they can build.

We aim to enable greater innovation in Language AI by addressing the structural problems around language data:


1. Limited quality and types of public language data

Much of the publicly available language data was created many years ago for research purposes. Quite often, the language is outdated and stilted, which is unacceptable for enterprises building client-facing applications. The datasets are also limited in language variety and too small to train production-grade systems.


2. High cost of purchasing language data

Producing language data sets of sufficient size and quality is a costly and laborious undertaking. Transcribed speech audio routinely sells for $500 per hour or more from dataset brokers, and thousands of hours of speech audio are needed per language model.
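
To put these figures in perspective, here is a rough back-of-the-envelope calculation in Python. The $500 per hour price comes from the paragraph above; the dataset sizes (1,000, 5,000 and 10,000 hours) are hypothetical values chosen only to illustrate "thousands of hours", and the function name is ours.

```python
# Illustrative lower-bound cost estimate for buying transcribed speech audio.
PRICE_PER_HOUR_USD = 500  # quoted above as "$500 per hour or more"


def dataset_cost(hours, price_per_hour=PRICE_PER_HOUR_USD):
    """Return the purchase cost in USD for `hours` of transcribed audio."""
    return hours * price_per_hour


if __name__ == "__main__":
    # Hypothetical dataset sizes; the text only says "thousands of hours".
    for hours in (1_000, 5_000, 10_000):
        print(f"{hours:>6,} hours -> ${dataset_cost(hours):,.0f}")
```

Even at the low end of these assumed sizes, a single language model would cost hundreds of thousands of dollars in data alone.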


3. Need for specific data

For most enterprise applications, one would be hard pressed to find existing language data that corresponds precisely with the task at hand. For example, one of our solutions required “food and grocery related ecommerce voice data in Korean.”

If the dataset cannot be bought, it must be custom assembled at even higher expense. Given the time and labor costs involved in such an undertaking, it is rarely cost-effective. As a result, more often than not, these applications are not even developed.

The LangNet Solution

To address these problems, LangNet will leverage its diverse, multilingual community to crowdsource language data required for specific purposes. Developers and enterprises will be able to quickly and inexpensively acquire required language data by working directly with data providers, rather than going through middlemen. Users will be able to retain ownership of the language data they create, and license it to earn more over an extended period of time.

“Mining” the initial datasets
To begin, we will set aside a relatively large share of the total token pool to crowdsource initial voice datasets for the following 17 languages:

  • English
  • Chinese
  • Spanish
  • Russian
  • Japanese
  • German
  • French
  • Arabic
  • Hindi
  • Portuguese
  • Italian
  • Indonesian / Malay
  • Korean
  • Vietnamese
  • Persian
  • Turkish
  • Hebrew

Each language has a target number of hours to be aggregated, ranging from 300,000 to 500,000. The target number of hours per language is divided into a predetermined number of periods, and the rewards are halved over the years.

Tokens are earned per minute of audio at a rate that is reduced according to a fixed schedule as more audio is collected for that language, rewarding early adopters for each language who bring the most initial value.
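
One way to picture such a schedule is the minimal Python sketch below. It is not the actual LangNet schedule: it assumes equal-sized periods measured in collected hours, a reward rate that halves at each period boundary, and no mining reward once the target is reached. The function names and the sample values (five periods, 8 tokens per minute to start) are hypothetical.

```python
def reward_rate(hours_collected, target_hours, periods, initial_rate):
    """Tokens per minute for new audio, given hours already collected.

    Assumptions (not from the source): periods are equal in size, the rate
    halves at each period boundary, and rewards stop at the target.
    """
    if hours_collected < 0:
        raise ValueError("hours_collected must be non-negative")
    if hours_collected >= target_hours:
        return 0.0
    # Index of the current period: 0 for the first, 1 for the second, ...
    period_index = int(hours_collected * periods // target_hours)
    return initial_rate / (2 ** period_index)


def tokens_earned(minutes, hours_collected, target_hours, periods, initial_rate):
    """Tokens for one contribution, priced at the rate in force when it arrives."""
    return minutes * reward_rate(hours_collected, target_hours, periods, initial_rate)


if __name__ == "__main__":
    # Hypothetical language with a 300,000-hour target split into 5 periods.
    for collected in (0, 60_000, 120_000, 299_999):
        rate = reward_rate(collected, 300_000, 5, 8.0)
        print(f"{collected:>7,} hours collected -> {rate} tokens/minute")
```

Under these assumptions, a contributor who records ten minutes in the first period earns twice as much as one who records the same ten minutes in the second period.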


Staking

Users can stake their data as an indication of quality for potential data licensees.

If potential licensees reject user data because of poor quality, users lose a portion of their stake. Staking is a well-known mechanism for fighting spam, fraud, and abuse.

While there is no requirement to stake, users can create and add to their stake to attract more licensees. Data licensees can filter data based on the amount staked in order to find higher quality data. Increasing their stake allows users to potentially monetize their data at a higher rate.
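
The staking rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 10% penalty per rejection is a hypothetical figure (the text says only that users lose "a portion" of their stake), and the `Listing` class and `filter_by_stake` helper are names invented for this example.

```python
from dataclasses import dataclass

SLASH_FRACTION = 0.10  # hypothetical: share of the stake lost per rejection


@dataclass
class Listing:
    """A user's data offered for licensing, with an optional stake."""
    owner: str
    stake: float = 0.0  # staking is optional, so the default is zero

    def add_stake(self, amount):
        """Create or increase the stake behind this data."""
        if amount <= 0:
            raise ValueError("stake amount must be positive")
        self.stake += amount

    def reject_for_quality(self):
        """A licensee rejects the data; return the amount of stake lost."""
        penalty = self.stake * SLASH_FRACTION
        self.stake -= penalty
        return penalty


def filter_by_stake(listings, min_stake):
    """Keep only listings whose stake is at least `min_stake`."""
    return [item for item in listings if item.stake >= min_stake]


if __name__ == "__main__":
    alice, bob = Listing("alice"), Listing("bob")
    alice.add_stake(100)
    print([item.owner for item in filter_by_stake([alice, bob], 50)])
    print(alice.reject_for_quality(), alice.stake)
```

A licensee searching with a minimum stake never sees unstaked data, which is the incentive for users to put tokens behind data they believe is good.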


Reputation

Whereas staking is the user’s representation of their own data, reputation scores are a product of the user’s interactions with the ecosystem and cannot be controlled or unilaterally adjusted by the user. Successful transactions such as licensing out data or correctly validating data can increase reputation, while getting data or a validation rejected can decrease user reputation.
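
As a rough illustration, a reputation score of this kind can be computed purely from a log of ecosystem events, so the user has no way to set it directly. The Python sketch below assumes hypothetical point values for each event and a floor of zero; neither is specified in the text, and the `Event` names are ours.

```python
from enum import Enum


class Event(Enum):
    DATA_LICENSED = "data_licensed"              # user's data was licensed out
    VALIDATION_CORRECT = "validation_correct"    # user validated data correctly
    DATA_REJECTED = "data_rejected"              # user's data was rejected
    VALIDATION_REJECTED = "validation_rejected"  # user's validation was rejected


# Hypothetical point values; the source does not specify any weights.
DELTAS = {
    Event.DATA_LICENSED: 2,
    Event.VALIDATION_CORRECT: 1,
    Event.DATA_REJECTED: -3,
    Event.VALIDATION_REJECTED: -2,
}


def reputation(events, start=0):
    """Replay a user's event history and return the resulting score."""
    score = start
    for event in events:
        # Assumption: the score never drops below zero.
        score = max(0, score + DELTAS[event])
    return score


if __name__ == "__main__":
    history = [Event.DATA_LICENSED, Event.DATA_LICENSED, Event.VALIDATION_REJECTED]
    print(reputation(history))
```

Because the score is derived only from recorded interactions, it complements staking: the stake is what users claim about their data, while reputation is what the ecosystem has observed.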


Crowdsourcing Method: “MoreCoin App”

Through our MoreCoin mobile app, users can contribute data, earn LANG tokens, and generate their own private and secure Voice ID.
Developers and enterprises can use the MC Business app to crowdsource and purchase datasets.

As the first dApp for collecting language data, our mobile app will be open sourced and can be further developed for collecting other types of language data. Potential tasks can include generating more diverse intent sentences for a given command (e.g. “turn on the lights,” “turn on lights,” “lights on”, etc.) or talking to a bot in order to generate sample conversational data. Third party developers can also build their own data collection dApps to quickly crowdsource and create their own datasets to license out on LangNet.


Curating datasets

In addition to creating and supplying their own voice data, any user can curate existing data and create custom datasets that can be licensed out to buyers. By licensing curated datasets, users can benefit from additional revenue streams.


LANG Token

LangNet is built on the Ethereum blockchain, and LANG is a standard ERC20 token. LANG tokens will be used for buying and selling language datasets on LangNet.

Conclusion

Virtual assistants like Siri and smart speakers like Alexa are just the tip of the iceberg, just the beginning of what’s possible.

What’s so exciting for us is talking to customers like auto manufacturers and smart home device companies and new startups and developers who are working on amazing new voice-based services such as voice-based health monitoring, voice passwords, and biometrics.

By making our language data secure and more broadly sharable, we can empower these startups and entrepreneurs. We hope you’ll join us and contribute your language data so that together we can build this amazing new future.

There are endless opportunities:

  • Speech patterns used to spot Parkinson’s and Alzheimer’s
  • Analyzing voice to diagnose heart disease
  • Voice analysis tech could diagnose disease
  • Nearly 1 in 5 EU5 physicians use Siri, Alexa


Visit the following links to learn more:

Website
Video
Telegram
FB group
Twitter
LinkedIn
Medium
YouTube
Bitcointalk