Concurrency, Parallelism, and Fishing

James Farner
YipitData Engineering
Mar 28, 2019

Have you ever wanted to learn about the differences between concurrency and parallelism via a long-winded, unnecessary, and very stretched analogy with fishing? Great!

This talk was inspired by some extensive (historical) usage of gevent at YipitData, general confusion about the two terms, and the state of our current architecture.

In the past, YipitData used gevent to manage concurrency in our applications. We would have top-level jobs devoted to solving a problem (generally gathering web data) and would handle pieces of each job with different coroutines. One job might be finding all items in a marketplace given a category URL. Within the job, one coroutine might pull URLs from a queue, the next might request each URL, the next parse the response, and the last save that information to a database. This approach had the advantage of a single Python process running many coroutines while staying very CPU efficient: most of the time the CPU could be parsing data while it waited on various web requests or DB inserts.
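
To make that concrete, here's a minimal sketch of such a pipeline using gevent. The example URL, the "parsing," and the "saving" are hypothetical stand-ins, not our actual job code:

```python
# A minimal sketch of the coroutine pipeline described above.
# The URL, parsing, and saving steps are illustrative placeholders.
import gevent
from gevent import monkey

monkey.patch_all()  # make blocking I/O (sockets, etc.) yield to other coroutines

from gevent.queue import Queue
import requests

url_queue, page_queue, item_queue = Queue(), Queue(), Queue()

def fetch():
    # Pull URLs and request them; while this coroutine waits on the
    # network, gevent schedules the others.
    while True:
        url = url_queue.get()
        page_queue.put(requests.get(url).text)

def parse():
    # CPU-bound step: runs whenever fetch/save are blocked on I/O.
    while True:
        html = page_queue.get()
        item_queue.put(html.split())  # stand-in for real parsing

def save():
    # Persist results; a real job would insert into a database here.
    while True:
        items = item_queue.get()
        print(f"saved {len(items)} items")

url_queue.put("https://example.com/marketplace?category=1")
gevent.joinall([gevent.spawn(fetch), gevent.spawn(parse), gevent.spawn(save)], timeout=5)
```

The key point: while `fetch` blocks on the network, gevent hands the CPU to `parse` or `save`, so one process stays busy the whole time.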

Today, we write simpler applications: they're composed of small, independent functions that we scale across multiple machines. This was enabled by standardizing our queueing framework (check it out at readypipe.io) and a lot of investment in automating our server provisioning. We moved the complexity out of our applications and into our infrastructure; we still get the benefits of a concurrent application model, but without necessarily writing concurrent code ourselves.
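
Here's a hedged sketch of what one of those small functions might look like. The `queue_client` interface below is a hypothetical stand-in for an external queue service (it is not the readypipe API); the point is that each stage of the old gevent pipeline becomes its own single-purpose worker, and concurrency comes from running many copies of it across machines rather than from coroutines inside one process:

```python
# Hypothetical worker: one small, independent function per process.
# "queue_client" and the queue names are illustrative stand-ins for
# an external queue service, not a real API.
import json
import requests

def fetch_worker(queue_client):
    # Each worker does exactly one step; scaling means launching more
    # copies of this loop on more machines.
    while True:
        message = queue_client.get("urls")  # blocking receive from the queue
        url = json.loads(message)["url"]
        page = requests.get(url).text
        queue_client.put("pages", json.dumps({"html": page}))
```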

Caveats: I use the term “process” very frequently in this talk and mean a few different things at different times (a Python process, a CPU, an OS process, etc.) — sorry for the confusion and the cases where I’m imprecise!
