How would you think about data quality for a $6B big data project?

Abhishek Balaji
Published in Hasgeek
Jul 4, 2018

Data science in fraud, talks at The Fifth Elephant, Anthill Inside workshops, and more this week on The HasGeek Newsletter 💌

Data quality remains one of the fundamental issues in scaling a big data project, especially when the data collected is private information, such as identity numbers and biometrics.

The Aadhaar enrolment process, one of the largest data collection projects in India, has seen much controversy over how its data is collected, stored and processed. We previously covered the architecture of Aadhaar in a talk by Pramod Varma and Regunath Balasubramanian at The Fifth Elephant 2012. More recently, Rachna Khaira spoke about the on-ground realities of Aadhaar at Rootconf this year.

With about 60,000 private agents on board for enrolment and updates, the scale makes fraud inevitable. Fraudsters have even managed to disable security features and defeat back-end data analytics, which compromises the quality of the data collected.

In his talk at The Fifth Elephant, Anand V will touch upon the various processes used by UIDAI and how scaling deployed data acquisition systems creates novel challenges that can fully compromise data quality.

How do you manage offline data acquisition systems and scale the operation? Catch Anand V at The Fifth Elephant to understand how to scale systems while keeping data quality in mind.

Date: 26 and 27 July
Venue: NIMHANS Convention Centre, Bangalore

Using data science to detect fraud

On 23 June, HasGeek, in association with WalmartLabs, organised a meetup on the role of data science in detecting fraud. The aim of the panel was to map out the landscape of fraud in transactions, identify the known and unknown types of fraud, and discuss the data science techniques used to detect them.

The discussion was moderated by Vinayak Hegde, CTO of Zoomcar, who spoke about his experience helping build systems that flag users conducting fraudulent activities. The panelists, Vivek Mehta, Nirmal J M and Vamsi Varanasi, added inputs based on their own experience in dealing with fraud.

Confirmed talks from ReactFoo Delhi

The schedule for ReactFoo Delhi is shaping up well, with talks on building micro front-ends, improving page loading performance, WebVR/AR experiences, React Canvas and more.

Check out the schedule and book your tickets today!

Date: 18 August
Venue: India International Centre (IIC), Lodhi Road, New Delhi

Related jobs

View more jobs on Hasjob

Hasjob is a service of HasGeek. Write to us at support@hasgeek.com if you have suggestions or questions on this service.

On our radar

PySangamam 2018

PySangamam is Tamil Nadu's first conference on Python, targeted at users and developers of Python. The first edition will be a 2-day single track conference organised primarily by ChennaiPy.

Date: 7–8 September 2018
Venue: IC & SR, IIT Madras
