Recommendation System with Python: Content-Based Filtering (Part 2)

Novindra Prasetio
Data Folks Indonesia
Jul 19, 2019
Photo by Wahid Hasyim K on Unsplash

Hello! In this post I build a recommendation system using content-based filtering. My previous post covered what a recommendation system is and the methods that can be applied.

Data: Hotels in Bandung

The data for this content-based filtering example is a dataset I compiled myself from booking.com, chosen because the information it provides is complete.


The dataset contains 100 hotels, each with a name, an address, and a description. It can be downloaded here: hotel_bandung_english

First, import the libraries that will be used.
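The import cell from the original post is not reproduced here, so the following is only a minimal sketch of what it might contain, assuming pandas and scikit-learn. The three rows are hypothetical stand-ins for the real 100-hotel dataset, and the file name in the commented-out line is an assumption.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in rows with the same three columns as the real dataset.
df = pd.DataFrame(
    {
        "name": ["Hotel A", "Hotel B", "Hotel C"],
        "address": ["Jl. Contoh 1, Bandung", "Jl. Contoh 2, Bandung", "Jl. Contoh 3, Bandung"],
        "description": [
            "Outdoor pool, spa and a restaurant with mountain views.",
            "Outdoor pool and restaurant. Free Wi-Fi in all areas.",
            "Budget hostel with shared dormitory in the city center.",
        ],
    }
)

# With the downloaded file, loading would look something like this
# (the file name is an assumption):
# df = pd.read_csv("hotel_bandung_english.csv")

print(df.shape)
```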

After importing the libraries, start by exploring the data with describe and info.

df.describe()
df.info()

Hotel Descriptions

A hotel's description shows what information is available for each hotel in the dataframe we loaded. To display it, we write a function.
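The function itself is not shown in the text, so here is a minimal sketch of how such a helper might look. The column names follow the dataset description (name, address, description), the "Nama" and "Alamat" labels mirror the printed output in this post, and the helper format_description and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataframe has 100 hotels.
df = pd.DataFrame(
    {
        "name": ["Hotel A", "Hotel B"],
        "address": ["Jl. Contoh 1, Bandung", "Jl. Contoh 2, Bandung"],
        "description": ["Outdoor pool and spa.", "Free Wi-Fi in all areas."],
    }
)


def format_description(index):
    """Build the text shown for the hotel at the given row position."""
    row = df.iloc[index]
    return f"{row['description']}\nNama: {row['name']}\nAlamat: {row['address']}"


def print_description(index):
    """Print the description, name, and address of one hotel."""
    print(format_description(index))


print_description(1)
```

Splitting the formatting from the printing is a design choice made here only so that the text can be inspected without capturing standard output.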

Then call print_description() with the index of the hotel you want to see.

print_description(1)
Sheraton Hotel & Towers offers 5-star accommodation in the middle of a green landscape in Bandung. All spacious rooms come with a flat-screen cable TV. The hotel offers an outdoor pool, spa center and restaurant with mountain views. Wi-Fi access is available free in all areas of the hotel. Elegant rooms have modern interiors, light wood furnishings and large windows. Each provides a comfortable seating area, DVD player and private bathroom with shower. You can work out in the gym or enjoy body treatments at the spa. Reception staff are ready to serve your needs for 24 hours. International and Asian dishes are offered at Feast Restaurant, while soft drinks are served at Samsara Lounge. A variety of cocktails and snacks are also available at Poolside Terrace. Sheraton Bandung Hotel & Towers is a 10-minute drive from Juanda Culture Park and Dago area, where various factory outlets are located. Husein Sastranegara Airport is a 30-minute drive away.
Nama: Sheraton Bandung Hotel & Towers
Alamat: Jl. Ir H Juanda 390, 40135 Bandung, Indonesia
print_description(50)
Featuring an outdoor pool and a restaurant, House-Sangkuriang is conveniently located just a 5-minute walk from Dago’s factory outlets. It has a 24-hour front desk and provides free Wi-Fi access in all areas. Elegant and warmly lit, the air-conditioned rooms in House-Sangkuriang include hardwood floors. A flat-screen satellite TV, an electric kettle and a free one-time minibar are among the in-room comforts, and a shower, slippers and a hairdryer are included in the private bathrooms. The hotel also serves daily afternoon tea in the lobby and on the pool terrace. Cihampelas Walk Mall is a 10-minute drive from the property, and Husein Sastranegara Airport is a 20-minute drive away. Airport transportation can be arranged upon request. The staff at the front desk can assist with valet parking and luggage storage. Housing a business center, the hotel also provides laundry service for a fee. International dishes are served at Dining Room. Guests can also dine in the comfort of their rooms.
Nama: House Sangkuriang
Alamat: Jl. Sangkuriang no.1 Dago, Kecamatan Coblong, 40135 Bandung, Indonesia

Text Preprocessing

After exploring the data and viewing the hotel descriptions, the next step is text preprocessing, which cleans the text so that it can later be converted into numbers with TF-IDF and compared with cosine similarity. Only the ‘description’ column is used, because it is the text on which the similarity between hotels will be computed.
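The cleaning function is not shown in the text. Judging from the cleaned output printed in this post (lowercase, no punctuation, no stop words), it could look roughly like the sketch below. The function name clean_text, the regular expression, and the short stop-word list are assumptions made for illustration.

```python
import re

import pandas as pd

# Tiny illustrative stop-word list (an assumption); in practice a fuller
# English list, such as the one shipped with NLTK, would be used.
STOPWORDS = {"a", "all", "an", "and", "are", "at", "in", "is", "of", "the", "with"}


def clean_text(text):
    """Lowercase, strip punctuation, and drop stop words."""
    text = text.lower()
    # Keep only lowercase letters, digits, and whitespace ("5-star" -> "5star").
    text = re.sub(r"[^0-9a-z\s]", "", text)
    return " ".join(word for word in text.split() if word not in STOPWORDS)


df = pd.DataFrame(
    {"description": ["All spacious rooms come with a flat-screen cable TV."]}
)
# Store the cleaned text in a new column next to the original description.
df["desc_clean"] = df["description"].apply(clean_text)
print(df["desc_clean"][0])  # spacious rooms come flatscreen cable tv
```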

The dataframe after adding the ‘desc_clean’ column

Next, write the same kind of function as for the description, but reading from the desc_clean column.

print_description_clean(1)
sheraton hotel towers offers 5star accommodation middle green landscape bandungall spacious rooms come flatscreen cable tvthe hotel offers outdoor pool spa center restaurant mountain viewswifi access available free areas hotelelegant rooms modern interiors light wood furnishings large windowseach provides comfortable seating area dvd player private bathroom showeryou work gym enjoy body treatments spareception staff ready serve needs 24 hoursinternational asian dishes offered feast restaurant soft drinks served samsara loungea variety cocktails snacks also available poolside terracesheraton bandung hotel towers 10minute drive juanda culture park dago area various factory outlets locatedhusein sastranegara airport 30minute drive away
Nama: Sheraton Bandung Hotel & Towers
Alamat: Jl. Ir H Juanda 390, 40135 Bandung, Indonesia
print_description_clean(50)
featuring outdoor pool restaurant housesangkuriang conveniently located 5minute walk dagos factory outlets 24hour front desk provides free wifi access areas elegant warmly lit airconditioned rooms housesangkuriang include hardwood floors flatscreen satellite tv electric kettle free onetime minibar among inroom comforts shower slippers hairdryer included private bathrooms hotel also serves daily afternoon tea lobby pool terrace cihampelas walk mall 10minute drive property husein sastranegara airport 20minute drive away airport transportation arranged upon request staff front desk assist valet parking luggage storage housing business center hotel also provides laundry service fee international dishes served dining room guests also dine comfort rooms
Nama: House Sangkuriang
Alamat: Jl. Sangkuriang no.1 Dago, Kecamatan Coblong, 40135 Bandung, Indonesia

TF-IDF and Cosine Similarity

Once the data has been preprocessed, use TF-IDF to turn the cleaned text into a numeric matrix and cosine similarity to compare the rows of that matrix. (To learn more, follow the underlined terms.)
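As a rough illustration of this step, the sketch below uses scikit-learn's TfidfVectorizer and cosine_similarity on three made-up cleaned descriptions. The post does not state which library or vectorizer settings were used, so scikit-learn and its default settings are assumptions here. Each row of the TF-IDF matrix is one document, and cosine similarity is the dot product of two rows divided by the product of their lengths.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up cleaned descriptions standing in for the desc_clean column.
docs = [
    "outdoor pool spa restaurant mountain views",
    "outdoor pool restaurant free wifi",
    "budget hostel shared dormitory city center",
]

# One row per document, one column per distinct word.
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(docs)

# Square matrix: entry [i, j] is the similarity between documents i and j.
cosine_sim = cosine_similarity(tfidf_matrix)

print(tfidf_matrix.shape)
print(cosine_sim.round(2))
```

The first two descriptions share several words, so their similarity is above zero, while the third shares none with the others and scores zero against both.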

Output of TF-IDF and cosine similarity

Then, so that we can look up hotel recommendations, create a variable indicies to serve as the main index. Print indicies to inspect it.

0 Capital O 253 Topas Galeria Hotel
1 Sheraton Bandung Hotel & Towers
2 OYO 794 Ln 9 Bandung Residence
3 OYO 226 LJ hotel
4 OYO 230 Maleo Residence
5 OYO 167 Dago's Hill Hotel
6 OYO 794 Ln 9 Bandung Residence
7 OYO 196 Horizone Residence
8 OYO 483 Flagship Tamansari Panoramic Bandung
9 OYO 295 Grha Ciumbuleuit Residence
10 OYO 193 SM Residence
11 Capital O 874 Hotel Nyland Pasteur
12 OYO 352 Sabang Hotel
13 Hilton Bandung
14 InterContinental Bandung Dago Pakar
15 Aryaduta Bandung
16 Art Deco Luxury Hotel & Residence
17 Crowne Plaza Bandung
18 Best Western Premier La Grande Bandung
19 éL Royale Hotel Bandung
20 Courtyard by Marriott Bandung Dago
21 Four Points by Sheraton Bandung
22 Mercure Bandung City Center
23 Swiss-Belresort Dago Heritage
24 OYO 228 Hotel Lodaya
25 Prama Grand Preanger Bandung
26 P Hostel
27 The Trans Luxury Hotel Bandung
28 Grand Tjokro Bandung
29 Grand Mercure Bandung Setiabudi
30 Aston Tropicana Hotel Bandung
31 De Paviljoen Bandung by HIM
32 Sensa Hotel Bandung
33 Ibis Bandung Trans Studio
34 Aston Pasteur
35 The Luxton Bandung
36 Holiday Inn Bandung Pasteur
37 Savoy Homann Hotel
38 The Jayakarta Suites Bandung, Hotel & Spa
39 Arion Swiss-Belhotel Bandung
40 MOXY Bandung
41 Ibis Styles Bandung Braga
42 Favehotel Premier Cihampelas
43 De JAVA Hotel Bandung
44 El Cavana Bandung
45 Ibis Budget Bandung Asia Africa
46 Ibis Bandung Pasteur
47 Favehotel Braga
48 Ivory Hotel Bandung
49 The Papandayan
Name: name, dtype: object
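The listing above ends with "Name: name", which suggests a pandas Series built from the hotel names. One way to obtain such a series, shown here only as a sketch with hypothetical rows, is to make the name column the dataframe index and wrap that index in a Series.

```python
import pandas as pd

# Hypothetical stand-in rows for the hotel dataframe.
df = pd.DataFrame(
    {
        "name": ["Hotel A", "Hotel B", "Hotel C"],
        "desc_clean": ["outdoor pool spa", "outdoor pool wifi", "budget hostel"],
    }
)

# Use the hotel name as the index, then keep a position -> name lookup.
df.set_index("name", inplace=True)
indicies = pd.Series(df.index)

print(indicies[:50])

# Row position of a given hotel, as needed to index the similarity matrix.
position = indicies[indicies == "Hotel B"].index[0]
print(position)
```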

Modelling

In the modelling stage, I write a function that recommends similar hotels based on the TF-IDF and cosine similarity results computed earlier. It returns the 10 hotels closest to the hotel name we pass in.
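The body of the function is not shown in the text. A minimal sketch, assuming scikit-learn for the similarity matrix and a position-to-name series called indicies, might look like this. The four hotels and the top_n parameter are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in data; the real dataframe has 100 hotels.
df = pd.DataFrame(
    {
        "name": ["Hotel A", "Hotel B", "Hotel C", "Hotel D"],
        "desc_clean": [
            "outdoor pool spa restaurant mountain views",
            "outdoor pool restaurant free wifi",
            "budget hostel shared dormitory city center",
            "spa restaurant mountain views gym",
        ],
    }
)
df.set_index("name", inplace=True)

cosine_sim = cosine_similarity(TfidfVectorizer().fit_transform(df["desc_clean"]))
indicies = pd.Series(df.index)


def recommendations(name, cosine_sim=cosine_sim, top_n=10):
    """Return up to top_n hotel names most similar to the given hotel."""
    # Row position of the requested hotel in the similarity matrix.
    idx = indicies[indicies == name].index[0]
    # Similarity of that hotel to every hotel, highest first.
    scores = pd.Series(cosine_sim[idx]).sort_values(ascending=False)
    # Skip the hotel itself by position rather than assuming it sorts first.
    top = [i for i in scores.index if i != idx][:top_n]
    return [indicies[i] for i in top]


print(recommendations("Hotel A"))
```

Excluding the queried hotel by its position, instead of dropping the first sorted entry, keeps the result correct when another row has an identical description, which can happen when a hotel is listed twice.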

Once the recommendation function is written, try it with a hotel name from the dataset.

recommendations('Benua Hotel')
['FOX Lite Hotel Metro Indah Bandung',
'InterContinental Bandung Dago Pakar',
'Zest Sukajadi Hotel Bandung',
'M Premiere Hotel Dago Bandung',
'Ibis Bandung Pasteur',
'Serela Cihampelas Hotel',
'Grand Cordela Hotel Bandung ',
'Favehotel Hyper Square',
'HARRIS Hotel & Conventions Ciumbuleuit - Bandung',
'Hemangini Hotel Bandung']
recommendations("Serela Cihampelas Hotel")
['Vio Cihampelas',
'Grand Sovia Hotel',
'Neo Dipatiukur Bandung',
'Grand Tjokro Bandung',
'HARRIS Hotel & Conventions Ciumbuleuit - Bandung',
'InterContinental Bandung Dago Pakar',
'Ibis Bandung Pasteur',
'Tebu Hotel Bandung',
'Aryaduta Bandung',
'Benua Hotel']

Conclusion

A recommendation system built with content-based filtering can return the names of hotels whose descriptions in the dataset are similar to that of the chosen hotel.

Thank you for reading this article. In part 3, I will build a recommender using collaborative filtering.

References

  1. https://towardsdatascience.com/building-a-content-based-recommender-system-for-hotels-in-seattle-d724f0a32070
  2. https://towardsdatascience.com/how-to-build-from-scratch-a-content-based-movie-recommender-with-natural-language-processing-25ad400eb243
