Solving the Movie Cold-Start Problem Using BERT Embeddings: Recommendation System

@Omkarade
2 min read · Jan 3, 2023


Part 1: Recommendation System Using Deep Learning

The item cold-start problem arises when items newly added to the catalogue have few or no interactions. This is mainly a problem for collaborative filtering algorithms, because they rely on an item's interactions to make recommendations. To address it, I use BERT embeddings: I embed the genres of every movie and save the results, then convert a new movie's genres the same way and use nearest-neighbour search to find the k movies nearest to the new one and recommend them to the user.

I'm not using the pretrained movie embedding layer, because there each movie has a 1 × 50 representation. At query time I need to calculate the distance from the query to every point, and that becomes very time-consuming when every point has 50 or more dimensions. So I use BERT embeddings to get a 1 × 1 representation (a single number) for each movie, which suits low-latency applications. In the code below, that number is the sum of the BERT tokenizer's token IDs for the movie's genre text. The data used for this task is the MovieLens data set.

import pandas as pd
# PATH is the folder holding the MovieLens CSV files
movies = pd.read_csv(PATH + 'movies.csv')
movies.head()

Preprocessing:

# Spell out 'Sci-Fi' and turn the '|' separators into spaces
movies['genres'] = movies.genres.str.replace('Sci-Fi', 'science fiction', regex=False)
movies['genres'] = movies.genres.str.replace('|', ' ', regex=False)
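To make the effect of the two replacements above concrete, here is a minimal sketch on a tiny hand-made frame. The titles and genre strings are hypothetical stand-ins for MovieLens rows, not real data.

```python
import pandas as pd

# Hypothetical rows in the MovieLens genres format (pipe-separated)
sample = pd.DataFrame({
    'title': ['Movie A', 'Movie B'],
    'genres': ['Action|Sci-Fi', 'Comedy|Romance'],
})

# Same two replacements; regex=False treats '|' as a literal character
sample['genres'] = sample.genres.str.replace('Sci-Fi', 'science fiction', regex=False)
sample['genres'] = sample.genres.str.replace('|', ' ', regex=False)

print(sample.genres.tolist())
# ['Action science fiction', 'Comedy Romance']
```

Passing regex=False is a defensive choice: `|` is a regex metacharacter, and the default for the regex argument has differed between pandas versions.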

Converting the genres into BERT token IDs:

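from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

genre_tokens = []
for genre_text in movies['genres'].values:
    # Tokenize without the [CLS] and [SEP] special tokens
    input_ids = tokenizer(genre_text, add_special_tokens=False)['input_ids']
    # Collapse the token IDs into one number per movie
    genre_tokens.append(sum(input_ids))
movies['genres_token'] = genre_tokens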
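The idea of collapsing a genre string into one number can be sketched without downloading a model. The vocabulary below is an assumption made up for illustration: its IDs are not the real bert-base-uncased IDs, and it splits on whitespace only, whereas BERT's WordPiece tokenizer may break a word into several sub-tokens.

```python
# Hypothetical vocabulary: these IDs are made up for illustration and are
# NOT the real bert-base-uncased token IDs.
toy_vocab = {'action': 101, 'comedy': 205, 'science': 310, 'fiction': 420}

def genre_to_number(genre_text, vocab):
    """Lowercase the text, split on whitespace, and sum each word's ID."""
    return sum(vocab[word] for word in genre_text.lower().split())

print(genre_to_number('Action science fiction', toy_vocab))  # 101 + 310 + 420 = 831
print(genre_to_number('Comedy', toy_vocab))                  # 205
```

Because the result is a plain sum, the order of the genres in the string does not change the number.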

Here is a sample of the converted genre data:

This function returns the K nearest movies based on the tokenized genre:

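def Recommendation_Movie(myNumber, myList, j):
    """Collect titles whose genres_token is closest to myNumber."""
    titles = set()
    # Keep going until more than j titles are collected or no movies remain
    while len(titles) <= j and not myList.empty:
        # Token value closest to the query among the remaining movies
        nearest = min(myList.genres_token.values, key=lambda x: abs(x - myNumber))
        matches = myList.genres_token == nearest
        titles.update(myList.title[matches].tolist())
        # Drop the matched movies so the next pass finds the next-closest value
        myList = myList[~matches]
    return titles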
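As a usage sketch, the function can be tried on a tiny hand-made catalogue. The titles and genres_token values below are hypothetical, and the function is restated (with a guard for an exhausted catalogue) only so that the snippet runs on its own.

```python
import pandas as pd

def Recommendation_Movie(myNumber, myList, j):
    # Restated here (with an empty-frame guard) so this snippet runs on its own
    titles = set()
    while len(titles) <= j and not myList.empty:
        nearest = min(myList.genres_token.values, key=lambda x: abs(x - myNumber))
        matches = myList.genres_token == nearest
        titles.update(myList.title[matches].tolist())
        myList = myList[~matches]
    return titles

# Hypothetical catalogue with made-up genres_token values
catalogue = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C', 'Movie D'],
    'genres_token': [100, 105, 105, 900],
})

# A new movie whose genre text summed to 104 (also made up)
recommended = Recommendation_Movie(104, catalogue, 2)
print(recommended)
```

Movies B and C share the closest value (105) and are added together, then Movie A (100) is added on the next pass. Movie D (900) is never reached, because the loop stops once more than j titles have been collected.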

This approach surfaces movies similar to a new movie, but it tends to overfit, since the match depends only on the genre tokens. Its main objective is low latency.
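On the latency point: the function above scans every token value on each pass. Since each movie is a single number, one possible variation (an assumption of mine, not part of the original code) is to sort the values once and find the nearest one by binary search with NumPy's `searchsorted`. The arrays and the helper name `nearest_title` below are hypothetical.

```python
import numpy as np

# Hypothetical genres_token values and titles, made up for illustration
tokens = np.array([900, 100, 105, 400])
titles = np.array(['Movie D', 'Movie A', 'Movie B', 'Movie C'])

# Sort once, ahead of query time
order = np.argsort(tokens)
sorted_tokens = tokens[order]
sorted_titles = titles[order]

def nearest_title(query):
    """Binary-search the sorted keys and return the title with the closest key."""
    pos = np.searchsorted(sorted_tokens, query)
    # The closest key sits either just before or at the insertion point
    candidates = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_tokens)]
    best = min(candidates, key=lambda p: abs(sorted_tokens[p] - query))
    return sorted_titles[best]

print(nearest_title(104))   # Movie B (key 105)
print(nearest_title(1000))  # Movie D (key 900)
```

Each lookup inspects at most two candidates after the binary search, instead of comparing the query against every movie.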

Some outputs:

Demo:

Connect with me on LinkedIn and GitHub.
