Recommender systems deliver intensive personalisation, adapting an app or a shopping experience to the person and even the device accessing it. Personalisation is a central goal for many tech companies and is widely seen as crucial to delivering better user experiences. It reduces users’ cognitive load and surfaces more relevant and valuable options.

The extent to which recommender systems shape our lives is illustrated by the following examples:-

  1. Amazon, the quintessential name in recommender systems, has achieved a 29% increase in global sales and reached into the buying habits of more than 200 million people.
  2. Zomato’s profiling and restaurant-user recommendations have boosted its outreach.
  3. Spotify’s leading personalisation has earned it a forecast yearly increase of 23.77%.
  4. Even the people we meet are a function of recommender systems today, with LinkedIn and Tinder deploying different strategies to index and connect users.

According to a McKinsey report, 75% of Netflix viewing selections come from product recommendations. With influence on this scale, the long-term impact of recommender systems and the ethics thereof are an emerging grand challenge. While these systems benefit both users and content providers, the recommendation process requires user profiling. Companies hold multiple data records that could disclose our identity or reveal important aspects of our private lives. Because recommendations are driven by this input, the system raises ethical and privacy concerns.

Privacy Concern: Personal Data Collection

Personalised recommendations typically require the collection of personal data, which puts users at risk. The more data a recommender collects, the more accurate its recommendations tend to be. The collected data includes users’ identity, demographic profile, behavioural data, purchase history, rating history, and more. According to a 2018 research paper, this data exposes personal aspects of the users to the system. It may also be sold to third parties, and platforms can be hacked, which exposes the stored data further. It is hence vital for platforms to develop privacy frameworks that address these risks.

The collected data can be categorised along two dimensions:-

  1. Demographic vs Product Information:-
    Also called static and dynamic information respectively, these represent two different aspects of a user’s experience. A user’s demographics, data that seldom changes, describe their personal qualities. Their context, behavioural data, and responses to various interactions or stimuli form their product information. These two kinds of information pave the way for content-based and collaborative filtering respectively.
  2. Explicit vs Implicit Inputs:-
    Direct actions by users, such as rating a product or leaving a review, are explicit inputs. The subtler interactions that indicate preferences without the user consciously stating them are called implicit inputs; these include browsing history, clicks, bounce rate, replays, rewinds, etc.
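
The explicit/implicit distinction can be sketched in a few lines of Python. This is only an illustration: the event names, the `Interaction` record, and the `input_kind` helper are hypothetical and not taken from any particular platform.

```python
from dataclasses import dataclass

# Hypothetical event names, chosen only for illustration.
EXPLICIT_EVENTS = {"rating", "review"}
IMPLICIT_EVENTS = {"click", "browse", "bounce", "replay", "rewind"}


@dataclass(frozen=True)
class Interaction:
    user_id: str
    item_id: str
    event: str


def input_kind(interaction: Interaction) -> str:
    """Return 'explicit', 'implicit', or 'unknown' for an interaction."""
    if interaction.event in EXPLICIT_EVENTS:
        return "explicit"
    if interaction.event in IMPLICIT_EVENTS:
        return "implicit"
    return "unknown"


events = [
    Interaction("u1", "i9", "rating"),  # the user deliberately rated an item
    Interaction("u1", "i9", "replay"),  # the user replayed it without saying why
    Interaction("u2", "i3", "click"),
]
kinds = [input_kind(e) for e in events]
```

Tagging each event this way lets a platform apply different consent and retention rules to the two kinds of input.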

Towards Secure & Private Recommender Systems

Corrective actions to protect users’ interests are currently under way at both the technological and the policy level. The recent GDPR legislation has inspired further approaches. Some of the most widely adopted practices among companies that have made progress are as follows:-

  1. Data minimisation:- limiting the data collected to what is needed for good results
  2. Privacy by Design (PbD):- integrating privacy practices at all steps of the process
  3. Data breaches:- developing protocols to address violations
  4. Data Protection Officers:- a single point of contact (SPOC) for any queries about breaches
  5. Transparency:- the legal basis for processing and the rights of the users are to be made publicly available
  6. Consent:- the user has to be informed appropriately about their data being collected
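
As a rough illustration of how data minimisation and consent might look in code, the sketch below stores an event only when consent has been given, and keeps only the fields on an allow-list. The field names, `REQUIRED_FIELDS`, and the `minimise` function are assumptions made for this example, not a prescribed GDPR mechanism.

```python
from typing import Optional

# Hypothetical allow-list: the only fields this illustrative recommender needs.
REQUIRED_FIELDS = {"item_id", "rating", "timestamp"}


def minimise(raw_event: dict, consent_given: bool) -> Optional[dict]:
    """Keep only allow-listed fields; store nothing without consent."""
    if not consent_given:
        return None
    return {k: v for k, v in raw_event.items() if k in REQUIRED_FIELDS}


raw = {
    "item_id": "i9",
    "rating": 4,
    "timestamp": 1700000000,
    "gps": "51.5,-0.1",   # not needed for recommendations, so it is dropped
    "contacts": ["bob"],  # likewise dropped
}
stored = minimise(raw, consent_given=True)
```

An allow-list is used rather than a block-list so that any new field a client starts sending is dropped by default.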

Technological Practices

Two of the most impactful technological design approaches for improving recommender systems and reducing their vulnerability are knowledge separation and anonymization.

Knowledge separation isolates the most sensitive data, users’ personal data, from their ratings or behaviours. This essentially separates the static information from the dynamic information. The recommender system hence learns from behaviour and dynamic information without knowing who is using the service. Furthermore, in case of an intrusion, the possibility of linking this activity to individual users is very low, which makes the system more secure.

Figure: Knowledge separation
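
A minimal sketch of knowledge separation, assuming three in-memory stores, is given here. The class `SeparatedStores`, its method names, and the use of a random surrogate key are illustrative assumptions; a real system would keep each store in a separate service with its own access controls.

```python
import secrets


class SeparatedStores:
    """Keeps personal data and behavioural data in separate stores."""

    def __init__(self):
        self.identity_store = {}   # user_id -> personal (static) data
        self.key_vault = {}        # user_id -> surrogate key (restricted access)
        self.behaviour_store = {}  # surrogate key -> list of (item_id, rating)

    def register(self, user_id, personal_data):
        self.identity_store[user_id] = dict(personal_data)
        # Random surrogate key: it cannot be derived from the user_id.
        self.key_vault[user_id] = secrets.token_hex(16)

    def record_rating(self, user_id, item_id, rating):
        key = self.key_vault[user_id]
        self.behaviour_store.setdefault(key, []).append((item_id, rating))

    def training_data(self):
        """What the recommender sees: surrogate keys and behaviour only."""
        return {key: list(events) for key, events in self.behaviour_store.items()}


stores = SeparatedStores()
stores.register("alice@example.com", {"name": "Alice", "age": 34})
stores.record_rating("alice@example.com", "item-42", 5)
data = stores.training_data()
```

An intruder who obtains only `behaviour_store` sees ratings attached to random keys. Linking them back to a person would also require the key vault.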

Anonymization is applied wherever personal data is stored or used, at any stage, provided it does not affect the system’s final purpose. In the case of pseudonymization, access to the stored keys should be especially restricted.

The main idea behind anonymization is that it minimises personal data requirements by default. Behavioural information is also anonymized before being used to train the system.

Figure: De-identification of data secures personal information while preserving its useful meaning
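
One common way to pseudonymize behavioural records before training is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library. This is an assumed technique chosen for illustration, not necessarily what any given platform does, and the field names and `deidentify` helper are hypothetical. Note that it is pseudonymization rather than full anonymization: whoever holds the key can recompute the mapping, which is why key storage must be tightly restricted.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Map a user id to a stable pseudonym using a keyed hash."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace user_id with a pseudonym."""
    cleaned = {
        k: v
        for k, v in record.items()
        if k not in DIRECT_IDENTIFIERS and k != "user_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["user_id"], secret_key)
    return cleaned


key = b"demo-key-keep-real-keys-in-a-vault"  # placeholder, not a real secret
record = {
    "user_id": "u1",
    "name": "Alice",
    "email": "alice@example.com",
    "item_id": "i9",
    "rating": 4,
}
safe = deidentify(record, key)
```

The same user always maps to the same pseudonym under a given key, so the training data keeps its behavioural meaning while the direct identifiers are gone.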

