How Learning Python Changed My Life

Part II: Extensibility

hamid
Jul 15, 2014

Another attractive aspect of Python is its extensibility model. Most of the Python libraries I came across were written in Python itself. This promotes portability, makes libraries easy to install and uninstall, and lets developers peek into other libraries’ source code whenever necessary. Python’s philosophy of extensibility is that you should always try to write your libraries in Python, unless the code is performance-critical, in which case you can write C/C++ extensions for Python.

Libraries like Boost.Python make writing such extensions especially pleasant; it really is as simple as the examples on Boost.Python’s documentation homepage suggest. For instance, a simple function like the one below:

char const* greet()
{
    return "hello, world";
}

(source: http://www.boost.org/doc/libs/1_55_0/libs/python/doc/tutorial/doc/html/index.html)

can be called from Python through the following C++ wrapper, which defines an extension module named hello_ext:

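#include <boost/python.hpp>

char const* greet();  // declaration; the definition is the snippet above

// Defines a Python extension module named hello_ext
BOOST_PYTHON_MODULE(hello_ext)
{
    using namespace boost::python;
    def("greet", greet);  // expose greet() to Python as hello_ext.greet
}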

(source: http://www.boost.org/doc/libs/1_55_0/libs/python/doc/tutorial/doc/html/index.html)

Of course, things become a little more involved when, for instance, you introduce new types, but Boost.Python still maintains the elegance and taste for which Boost is famous.
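A common way to put the "Python first, C/C++ only where it pays off" philosophy into practice is to try importing the compiled extension and fall back to a pure-Python equivalent when it has not been built. The sketch below assumes the hello_ext module produced by the Boost.Python wrapper above; the fallback function and the USING_C_EXTENSION flag are illustrative names of my own, not part of Boost.Python.

```python
# Try the compiled Boost.Python extension first; fall back to pure Python.
# Assumes hello_ext was built from the C++ wrapper above (it may not be).
try:
    from hello_ext import greet  # fast path: the C++ greet() exposed by Boost.Python
    USING_C_EXTENSION = True
except ImportError:
    USING_C_EXTENSION = False

    def greet():
        """Pure-Python fallback that behaves like the C++ greet()."""
        return "hello, world"


print(greet(), "(C++ extension)" if USING_C_EXTENSION else "(pure Python)")
```

Callers never need to know which implementation they got. CPython’s own standard library uses a similar pattern: the pure-Python pickle module picks up the C accelerator _pickle when it is available.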

Another popular option for producing C/C++ extensions for Python is SWIG (Simplified Wrapper and Interface Generator). Several open-source C/C++ projects use SWIG to generate Python wrappers that expose their libraries to Python scripts.

An excellent showcase of SWIG is the brilliant cross-platform GUI library wxWidgets and its Python extension module, wxPython. wxWidgets and wxPython deserve a post of their own, but you can get a brief introduction to wxWidgets by glancing at some of the beautiful apps built with it.

Here is a sneak peek of a data-annotation GUI utility I created with wxPython a long while ago.

Sample app created using wxPython

wxWidgets and wxPython use the native GUI controls of every OS they support, which makes it easy to produce cross-platform apps with a native look and feel on each platform.

Thanks to Python’s extensibility model, developers can tinker with libraries like wxWidgets through Pythonic APIs that are often much friendlier than the native C++ APIs of those libraries.

Abundance of libraries/frameworks

The extensibility model we just discussed makes a number of serious software projects easily accessible from Python. But even setting that aside, there is an overwhelming number of pure-Python libraries for almost anything you may ever need.

To see for yourself, browse PyPI (the Python Package Index), which hosts a massive collection of libraries that do pretty much everything.

If you’re interested in text processing, natural language processing, or machine learning, here is a list of software packages I personally used and found useful:

  • Natural Language Toolkit (NLTK): together with its companion book, NLTK is a perfect resource for anyone who wants a gentle and entertaining introduction to the fundamentals of natural language processing. It introduces concepts such as word-breaking (tokenization), parsing, part-of-speech tagging, sentiment analysis, supervised classification, and many more.
  • PyParsing: a very powerful library for creating advanced parsers. Generally speaking, you know you need a parser when regular expressions alone become too complex for checking that a piece of text conforms to some pattern. A very nice introduction to the library is this presentation by its author, Paul McGuire.
  • Whoosh: a pure-Python full-text indexing and search library. You know you need it when you want to search a collection of documents for several words at once, even if the words are far apart, out of order, or need stemming. Whoosh is also straightforward to use: I tried it with a little over a million documents, and it was pretty fast and stable. Speaking of full-text indexing and search, I cannot fail to mention Lucene and Solr. If you ever need an industry-grade full-text search engine, make sure to give Solr a try.
  • scikit-learn: a library with a decent collection of machine learning algorithms for various tasks. I remember using its preprocessing, clustering, and classification algorithms quite a few times.
  • Scrapemark: web-page scraping made super easy. You know you need it when you want to scrape a piece of data off a web page without writing elaborate regexes or state machines. Yes, the project is no longer actively maintained, but it is still one of the easiest ways to do scraping, and the whole library comes in a single Python file.
  • BeautifulSoup: a highly versatile library for processing HTML, even when it is malformed. Works out of the box.
  • gensim: a library with a couple of algorithms based on Singular Value Decomposition for latent analysis of large text collections. Works as advertised and is pretty useful. You know you need this one when you have a large collection of documents that you would like to query for the most semantically similar documents.
  • NumPy/SciPy: two of the most fundamental libraries for numerical and scientific computing. You may or may not use them directly, but many serious machine learning, statistics, and NLP libraries depend on them. gensim, for instance, relies on them heavily for its matrix factorization algorithms.
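To make the PyParsing point concrete: Python’s built-in re module cannot check that arbitrarily nested parentheses are balanced, but a small recursive-descent parser can. The sketch below is hand-written with only the standard library and is not PyParsing’s API; the grammar (integers, + - * /, parentheses, unary minus), the names tokenize_expression and Parser, and the use of the built-in SyntaxError for parse errors are all illustrative assumptions.

```python
import re

# One token: an integer or a single operator/parenthesis, after optional whitespace.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize_expression(text):
    """Split text into tokens, rejecting any character the grammar does not know."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent evaluator for the grammar:

    expr   := term (("+" | "-") term)*
    term   := factor (("*" | "/") factor)*
    factor := INTEGER | "(" expr ")" | "-" factor
    """

    def __init__(self, text):
        self.tokens = tokenize_expression(text)
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        """Consume the next token, optionally requiring it to equal `expected`."""
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def parse(self):
        """Evaluate the whole input; leftover tokens are an error."""
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        token = self.peek()
        if token == "(":
            self.take("(")
            value = self.expr()  # recursion handles any nesting depth
            self.take(")")  # a missing ")" raises SyntaxError here
            return value
        if token == "-":
            self.take("-")
            return -self.factor()
        if token is not None and token.isdigit():
            return int(self.take())
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser("2 * (3 + 4)").parse())  # prints 14
```

Each grammar rule becomes one method, and the recursion through factor() is exactly what a regular expression lacks. PyParsing lets you declare grammars like this directly in Python code instead of hand-writing the parsing loop.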
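The core idea behind a full-text search library like Whoosh is an inverted index: a map from each word to the set of documents that contain it. The minimal standard-library sketch below is not Whoosh’s API (tokenize_words and InvertedIndex are hypothetical names); it answers multi-word queries by intersecting those sets, so the query words can appear anywhere in a document and in any order.

```python
import re
from collections import defaultdict


def tokenize_words(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each word to the set of ids of the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in tokenize_words(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query word, in any order."""
        words = tokenize_words(query)
        if not words:
            return []
        # .get() avoids inserting empty entries for unknown words.
        matches = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            matches &= self.postings.get(word, set())
        return sorted(matches)


index = InvertedIndex()
index.add(1, "Python makes text processing pleasant")
index.add(2, "Processing large text collections in Python")
index.add(3, "A guide to wxPython")
print(index.search("text python"))  # prints [1, 2]
```

Real engines such as Whoosh and Solr build on this idea with relevance ranking, stemming, phrase queries, and persistent on-disk indexes.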
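For a feel of what "processing HTML even when it is malformed" involves, the standard library’s html.parser is also lenient about broken markup. The sketch below is a stdlib alternative, not BeautifulSoup’s API (LinkExtractor and extract_links are hypothetical names): it collects href values from a fragment with unclosed and mismatched tags. BeautifulSoup can use this same parser as one of its backends and adds a navigable document tree on top.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered input at the end of the document
    return parser.links


broken = '<p>Unclosed <a href="/one">first<p><A HREF="/two">second</b>'
print(extract_links(broken))  # prints ['/one', '/two']
```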

Learning Python made it easy for me to explore these libraries and put their capabilities to work. This was yet another situation where Python was a great enabler.

[Did you like it? Read on: Part III: Web Development]
