Civis R&D Bookshelf: Open source, Python integers, and applied predictive modeling

by Liz Sander

Figuring out how to contribute to open source

Julia Evans writes a great blog about programming and machine learning, focusing on whatever topic interests her at the time. As Civis kicks off its participation in Hacktoberfest, this post caught my eye. She goes through what is easy and difficult about contributing to open source, along with some tips for getting started, with all of the enthusiasm that makes her blog so enjoyable to read. If you like this post, she has also written some great zines.

Weird Python Integers

This is a short post about a weird behavior in Python, relating to how integers are stored under the hood. Basically, references to small integers all point to the same object in memory, while larger integers get a new object each time they are created. This means that you can overwrite the value stored at a small integer's memory address and break math! I love learning about the strange edges of programming languages like this, because it’s fun, and also because it gives a lot of insight into how the language works at a deeper level.
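Here is a minimal sketch of the sharing behavior, assuming CPython (the standard interpreter), which currently caches the integers from -5 to 256. That range is an implementation detail rather than a language guarantee, so other interpreters or future versions may behave differently. The helper name `same_object` is made up for this illustration, and the integers are parsed from strings at runtime so that the compiler cannot merge identical literals.

```python
def same_object(text):
    """Parse `text` into an integer twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    # `is` compares object identity (the same memory address in CPython), not value.
    return first is second


# Assumption: CPython caches small integers, currently -5 through 256.
small_shared = same_object("256")  # inside the cached range
large_shared = same_object("257")  # just outside the cached range

print("256 shared:", small_shared)
print("257 shared:", large_shared)
```

The practical takeaway is to compare integers with `==`, which checks the value, and to keep `is` for identity checks such as `x is None`.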

The prior can generally only be understood in the context of the likelihood

This is a recent paper by Gelman, Simpson, and Betancourt, available on arXiv. The article focuses on the importance of considering the prior in context when building a Bayesian model, and in particular, the dangers of using a uniform or “uninformative” prior as a default. Articles on Bayesian analysis can be dense, but I found this paper easy to follow, and the conclusions felt obvious as I encountered them (even though they weren’t obvious to me beforehand!). It’s a testament both to the strength of the arguments and the quality of the writing that this paper is so clear. If you do Bayesian analysis, or are interested in the topic, give it a read!

Applied Predictive Modeling, by Kuhn and Johnson

If you’re looking for a book to fill in the gaps of your ML background, this is a great place to start. It focuses on the intuition behind algorithms and how to use them in practice, rather than on the mathematical details. It’s much more readable than most of the ML reference books I’ve encountered, and it even has R examples to get you started in applying your newfound knowledge.

This post is part of our Bookshelf series organized by the Data Science R&D department at Civis Analytics. In this series, Civis data scientists share links to interesting software tools, blog posts, scientific articles, and other things that they have read about recently, along with a little commentary about why these things are worth checking out. Are you reading anything interesting? We’d love to hear from you on Twitter.