Day 1: MVP and the Wikipedia API

>>> Daily summary: brief intro

It’s project season at Hackbright, which means everyone has four weeks to define, build, test, and document a minimum viable product. But I’ve heard it doesn’t happen in that order.

The primary purpose of our Capstone Project is to learn. In programming, this means building something using what you’ve learned, and learning infinitely more while you’re building it. How gorgeous is that?

I’ll be building a web app that leverages an SQL database, Python, JavaScript, and HTML/CSS. And I’m willing to bet I’ll use 10 libraries I haven’t even heard of to solve problems I can’t yet imagine having, plus a bunch of JavaScripty things that sound like they’re named after fringe characters from the DC/Vertigo universe.


My MVP statement of purpose: To create a web app that uses the Wikipedia API to store information about all countries in the world and generate a pseudo-random* single-question, multiple-choice quiz on the capitals of all countries.

Core Features**:

  1. Query the MediaWiki Web API, sort response into capitals & countries, store in a database.
  2. Generate the multiple-choice quiz question with one right answer and three wrong ones from other countries.
  3. Handle login/logout.
  4. Put the quiz question in a form and serve it up pretty to the client.
  5. Record a user’s response to the question and pass it back to the server. Store it in the database.
  6. Format a quiz score and serve it to the client.
  7. Provide navigation between Wikipedia and the quiz.
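
Feature #2 is small enough to sketch already. The version below is only an illustration under a few assumptions: the capitals sit in a plain dictionary instead of the database, and the names make_question and SAMPLE_CAPITALS are made up for this example.

```python
import random


def make_question(capitals, rng=random):
    """Build one multiple-choice question from a {country: capital} dict."""
    # Pick the country being asked about, plus three other countries
    # whose capitals serve as the wrong answers.
    country, *others = rng.sample(sorted(capitals), 4)
    choices = [capitals[country]] + [capitals[c] for c in others]
    rng.shuffle(choices)  # so the right answer isn't always listed first
    return {
        "prompt": f"What is the capital of {country}?",
        "choices": choices,
        "answer": capitals[country],
    }


# Hypothetical sample data; the real app would read this from the database.
SAMPLE_CAPITALS = {
    "France": "Paris",
    "Japan": "Tokyo",
    "Kenya": "Nairobi",
    "Peru": "Lima",
    "Norway": "Oslo",
}

question = make_question(SAMPLE_CAPITALS)
print(question["prompt"])
for choice in question["choices"]:
    print(" -", choice)
```

Passing in a seeded random.Random as rng makes the question repeatable, which should help when it comes time to test.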


Today’s mission was to start the R&D on core feature #1, emphasis on the R. There is a scary amount of documentation out there about interacting with Wikipedia. I focused my energy on getting friendly with MediaWiki, the Python library wikipedia, and the illustrious mwparserfromhell. Oh, and Python’s requests, because I’d never made a request before. Because I’ve never made anything before.

>>> Where I struggled

Dealing with the format of what comes back from the requests.get() call.

MediaWiki spends a lot of time talking about how to write a bot for updating Wikipedia, but it doesn’t provide much direct support for parsing the actual text that comes from querying a page, because the API doesn’t know much about the way Wikipedia’s content is organized. What it passes you is a giant template with lots of {{names}} in it, each pointing to a part of the page a reader would recognize. At first I thought it was, as our TA Katie so eloquently put it, a big, stupid string.
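
For context, here is a rough sketch of the kind of request that brings back that big string. It assumes the English Wikipedia endpoint and the older revisions-style query parameters; build_query_params is a made-up helper name, and the requests call is left as a comment so the snippet runs without touching the network.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia instance of the MediaWiki Web API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_params(title):
    """Parameters asking for the current wikitext of a single page."""
    return {
        "action": "query",     # read data rather than edit
        "prop": "revisions",   # we want a revision of the page...
        "rvprop": "content",   # ...and specifically its text
        "format": "json",
        "titles": title,
    }


params = build_query_params("France")
print(API_URL + "?" + urlencode(params))

# With requests installed, the call itself would look something like:
#   response = requests.get(API_URL, params=params)
#   data = response.json()
```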

Luckily, hundreds of people have asked this question before me and seem to have worked out a number of ways to parse the text coming back from the API, and I was mired for a while in all the options I could have chosen. With the few I mentioned above, I’ve got a function that makes an API query for a single country’s page, text included, pythonifies the response into some seriously nested dictionaries, and strips away some of the layers until it gets to the Wiki template level I want. But I still can’t isolate the template that starts with “Infobox”, which is where the capital information lives. So I know what I’m doing tomorrow.
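
One way the layer-stripping and the Infobox hunt could be sketched is below. Loud assumptions: the response has the nested shape of the older JSON format (query, pages, a page id, revisions, then a "*" key holding the text), SAMPLE_RESPONSE is invented data standing in for a real reply, and the brace counting is a hand-rolled stand-in for a real parser like mwparserfromhell, not how that library works.

```python
def extract_wikitext(data):
    """Peel the nested dictionaries down to the page's raw wikitext."""
    pages = data["query"]["pages"]
    page = next(iter(pages.values()))  # one title requested, so one page
    return page["revisions"][0]["*"]   # "*" holds the text in this format


def find_infobox(wikitext):
    """Return the first {{Infobox ...}} template, braces included, or None."""
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":      # a template opens (possibly a nested one)
            depth += 1
            i += 2
        elif pair == "}}":    # a template closes
            depth -= 1
            i += 2
            if depth == 0:    # that was the Infobox's own closing braces
                return wikitext[start:i]
        else:
            i += 1
    return None  # the braces never balanced


# Invented response for illustration; a real page is far longer and messier.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "title": "Exampleland",
                "revisions": [
                    {"*": "{{Short description|Made-up country}}\n"
                          "{{Infobox country\n"
                          "| name = Exampleland\n"
                          "| capital = [[Sample City]]\n"
                          "| area = {{convert|100|km2}}\n"
                          "}}\n"
                          "'''Exampleland''' is a made-up country."}
                ],
            }
        }
    }
}

infobox = find_infobox(extract_wikitext(SAMPLE_RESPONSE))
print(infobox)
```

Counting braces is enough to get past nested templates like the convert one above, but it will trip on triple-brace parameters and other wikitext oddities, which is a good argument for leaning on a proper parser in the real thing.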

>>> Thoughtful takeaway:

As a beginner, I’m fighting for every single word in my tiny program like it’s my college application essay. I’m deleting and re-typing or slowly tapping out each word and bracket and testing each iteration. As I go, deleting things and trying slight changes feels good to me for two reasons:

  1. I don’t get attached to any of it. Deleting has become easy. I can always write more code.
  2. I’m never afraid to run the .py file, because I know exactly what I just changed. And when I get a new error message, I rejoice!

Progress.


*The word ‘random’ is up there with ‘literally’ in terms of lexical abuse in our daily lives. Curious? Try this enlightening read on randomness from cryptographer Mark Green for starters.

**Bear with me on the use of the word ‘feature’ here. I know these are more like teeny stories.