Ian Ormesher in Towards Data Science · Duplicate Detection with GenAI · How using LLMs and GenAI techniques can improve de-duplication · Jul 12
Andy Bryant · Processing guarantees in Kafka · Each of the projects I’ve worked on in the last few years has involved a distributed message system such as AWS SQS, AWS Kinesis and more… · Nov 16, 2019
Adrian Evensen · Entity Resolution — An Introduction · Finding records that refer to the same real-world object · Jan 24
Rogger Valverde · How to deduplicate jobs using BullMQ? · There are some cases where you have a lot of jobs with the same name and payload in a single queue. It could be because you have different… · Jun 30
Mitchell Gray · Harnessing Deduplication in Apache Flink · Continuing on my Apache Flink journey, it’s time for some real-world use cases. Now that I have discussed my initial Flink experiences (See… · Mar 5
Rohan Khanna · Tackling Duplicates: Data Deduplication Strategy · In this blog post, we’ll dive into the troubles caused by confusing data copies (duplicates) that can wreak havoc on our analysis! · Jun 4
Wenjing Zhan · Data Preprocessing — Deduplication with MinHash and LSH · When dealing with text preprocessing, one headache a data scientist has to deal with is duplicate or near-duplicate documents. · Nov 2, 2020
Maryam Bahrami in Artificial Intelligence in Plain English · How to use fuzzy matching for deduplication · Introducing a method for labeling duplicate records in the data. · May 19