How to Find Duplicates in a Corpus with NLP, Understanding Your Neural Network’s Predictions, and a Black Friday Deal

ODSC - Open Data Science
Published in ODSCJournal
Nov 25, 2022

How to Find Duplicates (and Near-Duplicates) in a Corpus with NLP

In this article, we’ll call the profileText action, pull down output tables, and perform duplicate identification in Python.
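As a rough illustration of duplicate and near-duplicate identification in Python, here is a minimal sketch that compares documents by word shingles and Jaccard similarity. It does not use the profileText action or its output tables; the function names, the sample documents, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.5, k=3):
    """Return (i, j, similarity) for each document pair at or above the threshold."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs


# Hypothetical mini-corpus: one exact duplicate and one near-duplicate.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",
    "an entirely different sentence about data science",
    "the quick brown fox jumps over the lazy dog",
]
pairs = find_near_duplicates(docs)
for i, j, sim in pairs:
    print(f"doc {i} ~ doc {j}: similarity {sim:.2f}")
```

Comparing every pair is quadratic in the number of documents, so a sketch like this only suits small corpora; larger ones usually need a hashing or indexing step first.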

Understanding Your Neural Network’s Predictions

Use this concise guide to assess your neural network’s feature importance.
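One common, model-agnostic way to estimate feature importance is permutation importance: shuffle one feature at a time and measure how much the error grows. The sketch below is an assumption about what such an assessment could look like, not the method from the linked guide; `toy_model` is a hypothetical stand-in for a trained network, and the data is synthetic.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained network: uses feature 0, ignores feature 1."""
    return 3.0 * row[0] + 0.0 * row[1]


# Synthetic data in which only the first feature drives the target.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * a for a, _ in X]

scores = permutation_importance(toy_model, X, y)
print(scores)  # feature 0 should score well above feature 1
```

A feature the model ignores gets an importance of zero here, because shuffling it leaves every prediction unchanged.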

5 Easy SQL Tricks to Clean Dirty Data

These ready-to-go queries can help you find missing values in your data and check whether the data follows any patterns.
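To give a flavor of this kind of query, here is a small sketch that runs two SQL statements from Python against an in-memory SQLite database. The `customers` table, its columns, and its rows are hypothetical, and the queries are illustrative rather than taken from the linked article.

```python
import sqlite3

# Hypothetical table with some missing emails and cities.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "Boston"),
        (2, None, "Boston"),
        (3, "c@example.com", None),
        (4, None, None),
    ],
)

# COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of missing values.
missing_email, missing_city = conn.execute(
    "SELECT COUNT(*) - COUNT(email), COUNT(*) - COUNT(city) FROM customers"
).fetchone()
print(missing_email, missing_city)

# Check whether missing emails cluster in particular cities.
by_city = conn.execute(
    "SELECT city, SUM(email IS NULL) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(by_city)
```

The second query relies on SQLite evaluating `email IS NULL` to 0 or 1; other databases may need a `CASE WHEN email IS NULL THEN 1 ELSE 0 END` expression instead.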

How to Build an Automated Development Pipeline

In this article, we will describe the development pipeline in simple terms and walk through the most important steps in building an automated one.

5 Practical Implementations of AI and ML in SaaS Product Management

AI and ML are changing the world, including software product management, as SaaS product teams adopt these technologies.

ODSC’s Black Friday deal starts now! Register for ODSC East today to save 75% on any in-person or virtual pass. The conference will feature training sessions, workshops, and talks across 10+ tracks covering data science fundamentals and cutting-edge developments.

Can We Wage War Against Opioids with Data and Analytics?

By using data analytics and public health data, one trooper was able to solve problems and save lives.

How AI Can Help Improve Employee Satisfaction Through Lean Thinking

AI tools can be helpful and practical for implementing lean thinking in the workplace. There are several ways managers and employers can apply AI to help boost employee satisfaction through lean principles.

ODSC East 2023 Call for Speakers & Save the Date

Do you have exciting data science use cases, research, or projects you want to share with the world? Learn more about how you can speak at ODSC East 2023 this May.

Video of the Week: Responsible AI: From Principles to Practice

In this talk, Dr. Tempest van Schaik will share her Responsible AI (RAI) journey, from ethical concerns in AI projects, to turning high-level RAI principles into code, and the foundation of an RAI review board that oversees projects for the team. She will share some of the practical RAI tools and techniques that can be used throughout the AI lifecycle, special RAI considerations for healthcare, and the experts she looks to as she continues in this journey.

Upcoming Webinars:

CI/CD for Machine Learning

Wed, Nov 30, 2022, 12:00 PM — 1:00 PM EST

In this talk, you’ll learn how to automatically allocate cloud instances (AWS, Azure, GCP) to train ML models, automatically shut the instance down when training is over, automatically generate reports with graphs and tables in pull/merge requests to summarize your model’s performance using any visualization library, and more.

Form Recognizer 3.0: Document Understanding

Tue, Dec 6, 2022, 12:00 PM — 1:00 PM EST

Form Recognizer 3.0 is now generally available! Join this session to learn more about the new features and services available through Form Recognizer to make document processing and automation even easier and more streamlined.

Using Open Source for Failure Prediction

Thu, Dec 8, 2022, 12:00 PM — 1:00 PM EST

Real-time failure prediction on time series data can be achieved with open source tools. We will give an overview of the process, starting with the generation of new raw sensor data (typically captured by an edge device) and ending with a real-time graph that shows alerts warning that a failure is imminent. A number of processes must be put in place before that goal can be fully realized.
