Can I learn Python in one month? Day 28

Adam Jones
2 min read · Mar 29, 2018

Having successfully created a random country generator, I’ve spent the past two evenings attempting to tackle my third goal: create a list of the most commonly used Korean words. I started by researching how to scrape data from websites, and concluded that a Python library called BeautifulSoup was the best tool for the job. Basically, it parses the HTML of a web page so that its contents can be manipulated in Python. After some trouble installing it correctly (at first I could only run it from the terminal, not from Sublime Text), I finally had it working.
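
The post doesn't say what the installation problem was. A common cause (assumed here, not confirmed by the post) is that the terminal and Sublime Text's build system run different Python interpreters, so a package installed with pip for one is invisible to the other. A small stdlib-only check like the following, run once from each place, would show which interpreter is in use and whether it can find `bs4`, the import name of BeautifulSoup 4. `report_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def report_environment(package: str = "bs4") -> dict:
    """Describe the running interpreter and whether it can import `package`.

    Hypothetical diagnostic: it assumes the problem is two different
    interpreters, which the post does not confirm.
    """
    return {
        "interpreter": sys.executable,       # path of the Python running this script
        "version": sys.version.split()[0],   # e.g. "3.11.4"
        "has_package": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If the terminal run and the Sublime Text run print different interpreter paths, the usual fixes are installing with that exact interpreter (`<interpreter path> -m pip install beautifulsoup4`) or pointing Sublime Text's build system at the interpreter that already has it.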

I decided to work on an English-language website before moving on to Korean, as I assumed Korean would throw up some additional complications. So I found a web page with a large amount of prose to serve as the subject of my experiment: specifically, this NY Times article on ‘How to Spot a Nuclear Bomb Program’. After watching some Python tutorials, I discovered that to pull out the specific text you want from a web page, you need to identify the HTML tag that contains it and reference that tag in your Python code. This was quite a time-sink, as every tutorial I looked at seemed to use a different method, but eventually I isolated a fairly clean block of text to work with.
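
In BeautifulSoup this step usually looks like `soup.find_all("p")` followed by `.get_text()` on each match. The sketch below does the same tag-based extraction with the standard library's `html.parser` instead, so it runs without installing anything. Treating `<p>` as the tag that holds the article text is an assumption, since the right tag depends on each site's markup, and `TagTextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one HTML tag."""

    def __init__(self, tag: str = "p") -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0                  # > 0 while inside the target tag
        self.chunks: list[str] = []     # raw text pieces found inside it

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside the target tag (menus, footers, whitespace) is skipped.
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html: str, tag: str = "p") -> str:
    """Return the text of all `tag` elements in `html`, joined by single spaces."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return " ".join(chunk.strip() for chunk in parser.chunks if chunk.strip())


page = "<nav>Menu</nav><p>The first paragraph.</p><p>The second <b>bold</b> paragraph.</p>"
print(extract_text(page))  # The first paragraph. The second bold paragraph.
```

One limitation: the sketch assumes every target tag is explicitly closed, so a `<p>` without a matching `</p>` would keep collecting text until the end of the page.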

Next came the more interesting part. Using the words in that block of text, I created a ‘dictionary’ that mapped each unique word in the article to the number of times it appeared. I then ranked the words from most to least frequent and printed the results:

List of the most common words (occurring ≥5 times in the article)

And here’s the code:
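
A minimal sketch of the same counting and ranking steps follows; it is not the author's original code. It assumes the article text is already one string, treats a word as a run of ASCII letters with an optional apostrophe part (as in "don't"), and by default keeps words that occur at least five times, matching the ≥5 cutoff above. `word_counts` and `most_common` are hypothetical names.

```python
import re


def word_counts(text: str) -> dict[str, int]:
    """Map each lowercase word in `text` to the number of times it appears."""
    counts: dict[str, int] = {}
    # Assumed word pattern: letters, optionally followed by 'letters ("don't").
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts: dict[str, int], min_count: int = 5) -> list[tuple[str, int]]:
    """Words seen at least `min_count` times, most frequent first, ties alphabetical."""
    frequent = [(word, n) for word, n in counts.items() if n >= min_count]
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


sample = "The bomb and the program: the inspectors said the program was the key."
print(most_common(word_counts(sample), min_count=2))  # [('the', 5), ('program', 2)]
```

The standard library's `collections.Counter` does the same counting, and `Counter(words).most_common()` ranks the result, though it orders ties by first appearance rather than alphabetically.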

I’m actually fairly pleased that I was able to figure this out, and it opens the door to some potentially interesting projects. It’s probably not the optimal way to write the code, but at least it works. Now that I’ve got it working for an English website, tomorrow I’ll see if I can use it to produce a list of common Korean words.
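
One complication worth expecting with Korean (an assumption about where things might trip up, not something tested in this post): Korean attaches grammatical particles directly to words, so splitting on spaces yields forms like 학교에 ("to school") and 학교는 ("school" as the topic) rather than a single entry for 학교. The minimal sketch below counts such space-delimited Hangul tokens using the character range 가-힣; `korean_word_counts` is a hypothetical name.

```python
import re
from collections import Counter


def korean_word_counts(text: str) -> Counter:
    """Count runs of Hangul syllables in `text` (particles stay attached)."""
    # 가-힣 covers the precomposed Hangul syllables (U+AC00 to U+D7A3).
    return Counter(re.findall(r"[가-힣]+", text))


counts = korean_word_counts("나는 학교에 간다. 학교는 크다.")
print(counts["학교에"], counts["학교는"], counts["학교"])  # 1 1 0
```

Grouping 학교에 and 학교는 under one word would need a morphological analyzer, which this sketch does not attempt.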

Read the previous post here and the next post here
