Week 2 — UniMajor Helper?

Velican Özkaya
AIN311 Fall 2022 Projects
3 min read · Nov 21, 2022

In the second week of our project, while translating and proofreading the dataset, we realized that it did not contain the data we intended to use.

The Problem with the OLD Dataset

While translating and analyzing the data, we noticed that the dataset we featured in week one did not reveal the university major that the candidates had earned the right to attend; instead, it showed the type of school from which they graduated. On top of that, we had trouble with the Persian language and with displaying the data in our local environment, which slowed down our progress. We therefore decided to find a new dataset.

This time, we examined the dataset we found in much greater detail before committing to it.

About Our NEW Dataset

The dataset we found contains the net scores obtained by students in the “Seleksi Bersama Masuk Perguruan Tinggi Negeri”, or “SBMPTN” for short (Joint Entrance Selection of State Universities).

Basic information about SBMPTN

The “SBMPTN” consists of 2 examinations. One is for science-related majors and the other is for humanities-related majors. Students can take both exams. The first part of both examinations is a set of general knowledge and reasoning tests covering:

  • KPU — Kemampuan Penalaran Umum — General Reasoning
  • KUA — Kemampuan Kuantitatif — Quantitative Skills
  • PPU — Pengetahuan & Pemahaman Umum — General Knowledge & Understanding
  • KMB — Kemampuan Memahami Bacaan & Menulis — Reading Comprehension & Writing

The second part differs according to the examination types.

Science-type examinations

  • Mathematics (mat)
  • Physics (fis)
  • Chemistry (kim)
  • Biology (bio)

Humanities-type examinations

  • Mathematics (mat)
  • Geography (geo)
  • History (sej)
  • Sociology (sos)
  • Economy (eko)

For more detailed information on the SBMPTN, you can check this link (2).

As a reminder, the placement process depends on the results the students obtain in this exam. Unlike in our country, however, the formula used to calculate the score is not publicly known.

Our aim in this project is to suggest suitable university departments to students based on their results in the individual subjects. We plan to do this by deriving a formula for each major that gives different weights to different subjects. In addition, we have the average, minimum and maximum scores of people who were placed in the first 500 universities in science departments. Using these, we aim to obtain a formula that is close to the one currently used to calculate the score.
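To make the idea of per-major weights concrete, here is a minimal sketch. The major names and weight values are made up for illustration; they are not the real SBMPTN formula (which is not public) and not the weights we will end up with. Only the subject abbreviations come from the exam description above.

```python
# Hypothetical weights: each major emphasizes different subjects.
# These numbers are placeholders, not fitted or official values.
MAJOR_WEIGHTS = {
    "physics_like": {"mat": 2, "fis": 3, "kua": 1},
    "biology_like": {"bio": 3, "kim": 2, "kmb": 1},
}


def weighted_score(scores, weights):
    """Weighted average of the subject scores named in `weights`."""
    total_weight = sum(weights.values())
    return sum(w * scores[subject] for subject, w in weights.items()) / total_weight


def rank_majors(scores, major_weights):
    """Return (major, score) pairs, best-scoring major first."""
    scored = [(major, weighted_score(scores, w)) for major, w in major_weights.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up scores for one science-exam student.
student = {
    "kpu": 600, "kua": 650, "ppu": 550, "kmb": 500,
    "mat": 700, "fis": 720, "kim": 480, "bio": 450,
}

for major, score in rank_majors(student, MAJOR_WEIGHTS):
    print(f"{major}: {score:.1f}")
```

In the real project the weights would have to be estimated from the data rather than written by hand; this sketch only shows how a weight table turns one student's subject scores into a ranking of majors.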

Overview of the dataset

The dataset covers 147 thousand of the 1.1 million people who took the exam. It is split across 4 CSV files:

  • The departments that can be chosen in the exams, with their capacities.
  • The universities that can be chosen.
  • The preferences (maximum 2) of the students who took the humanities exam, together with each student’s ID and their score in each subject.
  • The preferences (maximum 2) of the students who took the science exam, together with each student’s ID and their score in each subject.

Related Work

As an example of another project that uses this dataset, we can cite this project (4).

The project above does not aim to find appropriate departments for all students, but rather to find out how hard it is for a student to enter a top 3 university.

What Have We Done This Week?

We spent about half of our week finding a new dataset. The newly found data was divided into several CSV files. The universities, their IDs, and their departments were in different CSV files, so we combined them into one CSV file. We also combined the science and humanities students into a single CSV file after confirming that the two groups did not share student IDs. Finally, we learned how other people process this data by examining projects that use the same dataset.
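The combining steps described above can be sketched roughly as follows. The column names (`id_university`, `id_major`, `id_user`, `exam_type` and so on) and the tiny inline tables are assumptions made for illustration, not the real file contents; only the standard-library `csv` module is used.

```python
import csv
import io

# Tiny made-up stand-ins for the real CSV files (column names are assumed).
UNIVERSITIES_CSV = """id_university,university_name
1,University A
2,University B
"""
MAJORS_CSV = """id_major,id_university,major_name,capacity
11,1,Physics,40
12,1,History,30
21,2,Biology,50
"""
SCIENCE_CSV = """id_user,score_mat,score_fis
100,700,720
101,500,480
"""
HUMANITIES_CSV = """id_user,score_mat,score_geo
200,650,600
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_majors_with_universities(majors, universities):
    """Attach the university name to every department row via the shared ID."""
    names = {u["id_university"]: u["university_name"] for u in universities}
    return [{**m, "university_name": names[m["id_university"]]} for m in majors]


def combine_students(science, humanities):
    """Stack both student tables, tagging each row with its exam type."""
    overlap = {r["id_user"] for r in science} & {r["id_user"] for r in humanities}
    if overlap:
        # Stacking is only safe when no ID appears in both tables.
        raise ValueError(f"student IDs appear in both files: {sorted(overlap)}")
    combined = [{**r, "exam_type": "science"} for r in science]
    combined += [{**r, "exam_type": "humanities"} for r in humanities]
    return combined


majors = join_majors_with_universities(read_rows(MAJORS_CSV), read_rows(UNIVERSITIES_CSV))
students = combine_students(read_rows(SCIENCE_CSV), read_rows(HUMANITIES_CSV))
print(len(majors), "department rows,", len(students), "student rows")
```

The explicit overlap check mirrors the observation that the two student files do not share IDs: if that ever stopped being true, simply stacking the rows would make it impossible to tell which exam a given ID refers to.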

Thanks for reading!

Velican Özkaya & Mehmet Ertaş
