Daulet Nurmanbetov

258 Followers


Published in Towards Data Science

·May 31, 2020

BERT Model Embeddings aren’t as good as you think

Toward multilingual sentence embeddings. I wrote in the past about Facebook’s LASER embedding system, in which sentences of similar meaning have similar vector embeddings. What’s more, LASER embeddings are multilingual, meaning that sentences with the same meaning in different languages map to roughly the same vector! It is advantageous to have an embedding system that maps sentences in different…
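As a rough illustration of what "similar vector embeddings" means in practice, the sketch below compares vectors with cosine similarity. The three vectors are made-up toy values, not real LASER or BERT outputs, and `cosine_similarity` is a helper written for this example rather than part of any embedding library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (assumed values for illustration only).
english_sentence = [0.90, 0.10, 0.30]    # e.g. "The cat sleeps."
spanish_sentence = [0.85, 0.15, 0.35]    # e.g. "El gato duerme."
unrelated_sentence = [-0.20, 0.90, -0.40]

same_meaning = cosine_similarity(english_sentence, spanish_sentence)
different_meaning = cosine_similarity(english_sentence, unrelated_sentence)

print(f"same meaning:      {same_meaning:.3f}")
print(f"different meaning: {different_meaning:.3f}")
```

With a multilingual embedding model, a translation pair would be expected to score close to 1, while unrelated sentences would score noticeably lower.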

Machine Learning

6 min read


Published in Towards Data Science

·May 14, 2020

SQL-like window functions in Pandas

A single place for all Pandas window functions. Window functions are very powerful in the SQL world. However, there isn’t a well-written, consolidated reference for their Pandas equivalents. The basics of writing SQL-like code in Pandas are covered in excellent detail on the Pandas site. However, the Pandas guide lacks good comparisons of analytical applications of SQL and their…
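To make the SQL-to-Pandas mapping concrete, here is a minimal sketch of three common window functions expressed with `groupby`. The table and column names (`dept`, `salary`) are invented for the example, and this is one possible translation, not necessarily the one used in the article.

```python
import pandas as pd

# Hypothetical data: one row per employee.
df = pd.DataFrame({
    "dept": ["a", "a", "b", "b", "b"],
    "salary": [100, 200, 50, 70, 60],
})

# SQL: ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC)
df["row_number"] = (
    df.groupby("dept")["salary"]
    .rank(method="first", ascending=False)
    .astype(int)
)

# SQL: SUM(salary) OVER (PARTITION BY dept)
df["dept_total"] = df.groupby("dept")["salary"].transform("sum")

# SQL: LAG(salary) OVER (PARTITION BY dept ORDER BY salary)
# Sorting first sets the window order; the result aligns back on the index.
df["prev_salary"] = df.sort_values("salary").groupby("dept")["salary"].shift(1)

print(df)
```

The key idea is that `groupby(...)` plays the role of `PARTITION BY`, while `transform`, `rank`, and `shift` return one value per input row instead of collapsing each group.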

Sql

3 min read


Published in Towards Data Science

·May 4, 2020

Cutting edge semantic search and sentence similarity

Semantic search is a hard problem worth solving in NLP. We commonly spend a lot of time looking for a specific piece of information in a large document, and we commonly find it using CTRL + F. The proverbial Google-fu, the art of effectively searching for information on Google, is a valuable skill in a 21st-century workplace. …

NLP

9 min read


Published in Towards Data Science

·Mar 12, 2020

Summarization has gotten commoditized thanks to BERT

Have you ever had to summarize a lengthy document into key points? Or provide an executive summary of a document? …

Summarization

6 min read


Published in Towards Data Science

·Mar 12, 2020

Crowd-Sourced Data Labeling

How to obtain high-quality labeled datasets with crowd-sourced workers. As data scientists, we spend an ungodly amount of time handling data: cleaning, normalizing, labeling. These days, thankfully, many solutions off-load the labeling to third parties, freeing up data scientists’ valuable time and lessening the burden of manually labeling text, pictures, or video. However, as a data scientist…

Data Labeling

5 min read


Published in Towards Data Science

·Feb 20, 2020

Bootstrapping cutting-edge NLP models

How to get up and running with XLNet and PyTorch in 5 minutes. What is XLNet? XLNet is a modern NLP language model that, like BERT, RoBERTa, and TinyBERT, is based on Transformers. XLNet’s results on various natural language understanding tasks are approaching human performance. XLNet can generate text at the level of a high-schooler, and it can answer simple questions. It can comprehend that…

Artificial Intelligence

4 min read


Published in The Startup

·Feb 9, 2020

Weak Supervision, Future of Data Labeling

Overview of data labelling for AI, new paradigms, and the size of the growing data labelling market. Data Labeling Landscape: With the emergence of AI, many firms are coming to realize that the real bottleneck to bootstrapping Machine Learning is a lack of labelled datasets. In response to this need, many companies have emerged that offer labelling services, labelling platforms, or other labelling solutions, such as providing domain experts to label data…

Machine Learning

4 min read


Published in Towards Data Science

·Nov 23, 2019

Extracting Data from Financial PDFs

How to quickly extract text and data from Municipal Bond CAFR reports. What is Spreading? A large portion of finance is dedicated to writing and reading financial statements. In the US, for a financial statement to be considered official, it must be in PDF format. …

Machine Learning

5 min read


Published in Towards Data Science

·Nov 2, 2019

Guide on AWS Textract set-up

How to accurately process PDF files with OCR-as-a-service from AWS. Asynchronous API responses: Recently, a new paradigm of async API responses has become prominent. It works by returning a Job-ID rather than the API response itself. Then, to check the status, the user needs to submit a second call to the API with the…
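The Job-ID pattern described above can be sketched as a small polling loop. Everything here is hypothetical: `FakeOcrService`, `start_job`, `get_status`, and the status strings are stand-ins invented for this example and are not the Textract or boto3 API.

```python
import time
from typing import Callable, Dict


def wait_for_job(get_status: Callable[[str], Dict], job_id: str,
                 poll_seconds: float = 1.0, max_attempts: int = 30) -> Dict:
    """Poll a status endpoint until the job leaves the IN_PROGRESS state."""
    for _ in range(max_attempts):
        response = get_status(job_id)
        if response["status"] != "IN_PROGRESS":
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


class FakeOcrService:
    """In-memory stand-in for an async OCR API (hypothetical, for illustration)."""

    def __init__(self, polls_until_done: int = 2):
        self._polls_until_done = polls_until_done
        self._jobs: Dict[str, int] = {}

    def start_job(self, document: str) -> str:
        # First call: returns only a Job-ID, not the result.
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = self._polls_until_done
        return job_id

    def get_status(self, job_id: str) -> Dict:
        # Second call: report progress, then the finished result.
        remaining = self._jobs[job_id]
        if remaining > 0:
            self._jobs[job_id] = remaining - 1
            return {"status": "IN_PROGRESS"}
        return {"status": "SUCCEEDED", "text": "extracted text goes here"}


service = FakeOcrService(polls_until_done=2)
job_id = service.start_job("report.pdf")
result = wait_for_job(service.get_status, job_id, poll_seconds=0.0)
print(job_id, result["status"])
```

A real client would replace `FakeOcrService` with calls to the actual service and would typically add backoff between polls and handling for failed jobs.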

AWS

3 min read


Published in Towards Data Science

·Oct 22, 2019

Multilingual Sentence Models in NLP

Overview of two major multilingual sentence embedding models. Why Multilingual Models? Multilingual models are a type of Machine Learning model that can understand different languages. One example would be classifying whether a piece of text is a toxic comment. Using a regular Machine Learning model, we would be able to detect only…

Machine Learning

4 min read

