Lab book post #6
This has been the last week of working on weekly exercises. The activity covered things we have done before: fixing bad code and optimising how the program runs. We had to write some code that would return information about a triangle. I haven’t been able to finish all the work, but thankfully the marking algorithm lets me go back and fix up anything I have missed.
More interestingly, this week I have been able to start properly on the next major project. The ‘Open Data Project’ involves finding data that is available on the internet and analysing it. Ideally the data comes in .csv (comma-delimited) format. From here I must learn how to use Jupyter and some extra Python packages such as pandas, which is used for data analysis.
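As a first step, reading comma-delimited data into pandas might look something like the sketch below. The sample rows and the column names (title, budget, gross) are made up for illustration and are not taken from any real data set; in practice a file path would be passed instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded .csv file; rows and columns are invented.
SAMPLE_CSV = """title,budget,gross
Film A,1000000,2500000
Film B,5000000,4000000
Film C,,750000
"""


def load_movies(source):
    """Read comma-delimited data (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


movies = load_movies(io.StringIO(SAMPLE_CSV))

# Quick checks that are handy in a Jupyter cell: table size and missing values.
print(movies.shape)
print(movies.isna().sum())
```

Checking for missing values straight away is useful because scraped data often has gaps, as with the empty budget in the third sample row.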
After quite a bit of searching for good options, I have decided that there is a lot of potential for data analysis in movies. IMDb provides a lot of information about its films for the public to access, and there are lots of people who have already scraped the data.
I have found someone who has put together a list of over 5000 movies with 28 informational categories on each. There are some quite obscure statistics, like the number of Facebook likes each actor has and the number of faces on movie posters. This will allow me to make quite an interesting analysis of the success of a movie. Not only will I be able to determine which factors have any relevance (e.g. budget, Facebook likes, production country), but I will then be able to take the analysis further and see to what extent each one contributes and how influential certain aspects are.
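One simple way to get a first impression of which numeric factors relate to success is to correlate each of them with a chosen success measure. The sketch below assumes that gross takings stand in for "success", and the column names and values are invented for illustration rather than taken from the real data set. Correlation only captures straight-line relationships and says nothing about cause, so it would be a starting point rather than the whole analysis.

```python
import pandas as pd

# Invented example values; the real data set has many more rows and columns.
movies = pd.DataFrame(
    {
        "budget": [1, 2, 3, 4, 5],
        "actor_facebook_likes": [10, 8, 6, 4, 2],
        "faces_on_poster": [1, 3, 2, 3, 1],
        "gross": [2, 4, 6, 8, 10],
    }
)


def correlation_with(df, target):
    """Pearson correlation of every numeric column with the target column."""
    numeric = df.select_dtypes(include="number")
    # Drop the target itself, since it always correlates perfectly with itself.
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


ranking = correlation_with(movies, "gross")
print(ranking)
```

Non-numeric categories such as production country would need a different treatment, for example grouping the films by country and comparing averages.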
I also intend to find some other data and compare the two data sets. One category that is not included in my current data set is the location of filming, which I believe would be very interesting to study in relation to other factors.
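Bringing a second data set alongside the first could be sketched roughly as below. This assumes both tables share a title column to match on; the column names and the filming locations are made up for illustration. Matching on title alone would probably need some care in practice, since remakes share names and spellings differ between sources.

```python
import pandas as pd

# Invented example tables standing in for the two data sets.
movies = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film B", "Film C"],
        "gross": [2.5, 4.0, 0.75],
    }
)
locations = pd.DataFrame(
    {
        "movie_title": ["Film A", "Film C", "Film D"],
        "filming_location": ["New Zealand", "Canada", "Iceland"],
    }
)


def combine(left, right, key):
    """Keep every row of `left` and attach matching columns from `right`."""
    return left.merge(right, on=key, how="left")


combined = combine(movies, locations, "movie_title")
print(combined)
```

A left join keeps every film from the main data set, so films with no known location show up with a missing value instead of silently disappearing.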