Last December, Google began using BERT (Bidirectional Encoder Representations from Transformers), a new algorithm, in its search engine. Released by the company in three distributions (trained on English, on Chinese, and multilingual), BERT is a Natural Language Processing model that lets Google better understand what users are searching for and deliver more accurate results.
Aiming to democratize use of the algorithm in Portuguese, NeuralMind, a startup focused on artificial-intelligence products for text and image analysis, trained the model on BrWaC (Brazilian Web as Corpus) and published it on GitHub. The company is the first to provide the solution in Portuguese.
“We trained BERT to understand the Portuguese language. It was a massive effort: several days of training on Google Cloud machines, plus several weeks of data preparation,” says NeuralMind’s CTO, Professor Roberto Lotufo, who coordinated the work with the company’s researchers.
It is estimated that 15% of the searches made on Google each day are formulated in ways never seen before. Understanding the real meaning of each sentence is therefore essential to delivering the right results. “With BERT, practically every task involving natural language can be solved better, in some cases even exceeding human performance,” says Lotufo.
The solution is a breakthrough for companies, developers, and the technology community at large. If you are interested, visit NeuralMind’s GitHub or the Hugging Face page to find the repository. More information: www.github.com/neuralmind-ai/portuguese-bert.
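For readers who want to try the model, a minimal sketch of loading it through the Hugging Face `transformers` library might look like the following. Note that the model identifier used here is an assumption based on the repository name; check NeuralMind's GitHub or Hugging Face page for the exact ID, and that `transformers` must be installed (`pip install transformers`).

```python
# Hypothetical usage sketch: loading NeuralMind's Portuguese BERT
# via the Hugging Face transformers library.
from transformers import AutoTokenizer

# Assumed model ID -- verify against the official repository.
tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-base-portuguese-cased")

# Tokenize a Portuguese sentence into subword units using the
# Portuguese vocabulary the model was trained with.
tokens = tokenizer.tokenize("Tinha uma pedra no meio do caminho.")
print(tokens)
```

From there, `AutoModel.from_pretrained` with the same identifier would load the pretrained weights for fine-tuning on a downstream Portuguese NLP task.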