How would you think about data quality for a $6B big data project?
Data science in fraud, talks at The Fifth Elephant, Anthill Inside workshops, and more this week on The HasGeek Newsletter 💌
Data quality remains one of the fundamental issues in scaling a big data project, especially when the data collected is private information, such as identity numbers and biometrics.
One of the largest data collection projects in India, the Aadhaar enrolment process has seen much controversy surrounding the collection, storage and processing of the data. We previously covered the architecture of Aadhaar in this talk by Pramod Varma and Regunath Balasubramanian at The Fifth Elephant 2012. More recently, Rachna Khaira spoke about the on-ground realities of Aadhaar at Rootconf this year.
With about 60,000 private agents on board for enrolment and updates, the scale makes fraud inevitable. Fraudsters have even managed to disable security features and defeat back-end data analytics, which compromises the quality of the data collected.
Anand V, in his talk at The Fifth Elephant, will touch upon the various processes used by UIDAI and how scaling deployed data acquisition systems creates novel challenges that can fully compromise data quality.
How do you manage offline data acquisition systems and scale the operation? Catch Anand V at The Fifth Elephant to understand how to scale systems, while keeping data quality in mind.
Date: 26 and 27 July
Venue: NIMHANS Convention Centre, Bangalore
Using data science to detect frauds
On 23 June, HasGeek, in association with WalmartLabs, organised a meetup on the role of data science in detecting fraud. The aim of the panel was to map out the landscape of fraud in transactions and to identify the known and unknown types of fraud and the data science techniques used to detect them.
The discussion was moderated by Vinayak Hegde, CTO of Zoomcar, who spoke about his experience in helping build systems that flag users conducting fraudulent activities. The panelists, Vivek Mehta, Nirmal J M and Vamsi Varanasi, added inputs based on their experience in dealing with fraud.
Confirmed talks from ReactFoo Delhi
Check out the schedule and book your tickets today!
Date: 18 August
Venue: India International Centre (IIC), Lodhi Road, New Delhi
- Mumbai — Head of Engineering / CTO UnFound
- Bangalore — Senior Software Engineer (Big Data) dataxu
- Bangalore — Data Scientist (Artificial Intelligence) Atkins
- Bangalore, Hyderabad — Lead computer vision engineer skilld.in
- Detroit, MI — Data scientist iba infotech llc
On our radar
PySangamam is Tamil Nadu’s first conference on Python, targeted at users and developers of Python. The first edition will be a 2-day single track conference organised primarily by ChennaiPy.
Date: 7–8 September 2018
Venue: IC & SR — IIT Madras
Calendar of events
- Deep Learning Bootcamp — 23–24 July, Bangalore
- Machine Learning with Amazon SageMaker — 26 July, Bangalore
- Deep learning based hybrid recommendation systems in TensorFlow — 27 July, Bangalore
- Math for Data Science — 28 July, Bangalore
- Bootcamp: Learning representations of text for NLP — 28–29 July, Bangalore
- Make your own DL framework — 29 July, Bangalore
- DL and ML for computer vision — 5 August, Bangalore