Recapping SciPy 2017

Andrew Therriault
Jul 15, 2017

I’m waiting out a thunderstorm on my way back from Austin, where I just wrapped up my first SciPy, so figured I’d write a quick recap. It was an awesome event, with a ridiculous number of super-talented and enthusiastic people (presenters and attendees alike — Sebastian Raschka, for example, was there just to hang out and see panels), and my only regret is that I didn’t have time to catch more presentations.

(Update: Sebastian objected to this characterization of his attendance, noting that he engaged in discussions, gave a lightning talk, and did a mean rendition of Summer of ’69 at karaoke on Friday. My sincerest apologies for this error.)

Fortunately, all of the presentations were recorded, so I (and you) can catch up on what was missed. Particularly looking forward to watching Roy Keyes’ talk on using machine learning for cancer treatments and Matt Rocklin’s tutorial on using Dask for parallel data analysis.

Of the talks I saw, it’s hard to pick favorites, but a few that are definitely worth checking out:

Will also be recommending Andreas Mueller’s tutorial on machine learning for my own team when I get back. I’m really grateful that SciPy is making such a valuable resource available to everyone for free; normally you have to pay Columbia University a lot of money to get that kind of instruction!

I also gave a talk of my own, in one of the last sessions of the machine learning & AI track that Sarah Guido organized. My talk was a practical introduction to what I call “sustainable machine learning models”: models that are rebuilt repeatedly, with the outputs of one iteration driving the data collection for future updates (for example, fundraising models which help you find potential donors to ask for contributions, then use the results of those solicitations to update the model in the future).
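
To make that loop a bit more concrete, here is a minimal sketch of the fundraising example. It is not the code from the talk. Everything in it is an assumption made up for illustration: the function name `run_rounds`, the three donor segments and their response rates, the smoothed-rate "model", and the choice to reserve a share of each round's asks for randomly chosen people so that the next update has data on segments the model would otherwise skip.

```python
import random


def run_rounds(n_rounds=5, pool_size=1000, asks_per_round=100,
               explore_frac=0.2, seed=0):
    """Simulate a model whose own targeting decides what data it sees next."""
    rng = random.Random(seed)

    # Hypothetical donor pool: each person has one observable segment and a
    # hidden response rate tied to that segment (numbers are invented).
    true_rate = {"A": 0.30, "B": 0.10, "C": 0.02}
    pool = [rng.choice(list(true_rate)) for _ in range(pool_size)]

    # The "model" is just a smoothed response rate per segment.
    asks = {seg: 0 for seg in true_rate}
    gifts = {seg: 0 for seg in true_rate}

    def score(i):
        seg = pool[i]
        return (gifts[seg] + 1) / (asks[seg] + 2)

    asked = set()
    history = []
    for _ in range(n_rounds):
        candidates = [i for i in range(pool_size) if i not in asked]
        rng.shuffle(candidates)  # random order, so ties are broken randomly

        # Part of each round goes to random people; the rest goes to
        # whoever the current model scores highest.
        n_explore = int(asks_per_round * explore_frac)
        explore = candidates[:n_explore]
        ranked = sorted(candidates[n_explore:], key=score, reverse=True)
        targets = explore + ranked[:asks_per_round - n_explore]

        # Solicit, record the outcomes, and fold them back into the model.
        responses = 0
        for i in targets:
            seg = pool[i]
            gave = rng.random() < true_rate[seg]
            asks[seg] += 1
            gifts[seg] += gave
            responses += gave
            asked.add(i)
        history.append(responses / len(targets))

    estimates = {seg: (gifts[seg] + 1) / (asks[seg] + 2) for seg in true_rate}
    return history, estimates


if __name__ == "__main__":
    history, estimates = run_rounds()
    print("response rate by round:", history)
    print("estimated rate by segment:", estimates)
```

The thing to notice is that each round's training data is whatever the previous round chose to collect. Setting `explore_frac=0` in this toy version makes every ask after the first round go to the top-scoring segment until it runs out, which is one simple way to see why the data collection has to be planned along with the model.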

You can find the video here, and I’ve posted the slides on GitHub as well. (That’s probably going to be especially useful for the last few slides, which I had to rush through to stay on schedule; there’s a lot more on the slides than I could mention.) I only scratched the surface of what I had hoped to cover in this brief presentation, so I’ll probably come back to this topic again in a future talk.

In the meantime, take a look at the video and let me know what you think. I’m particularly interested in hearing from people who found it useful. Do you want to hear more about this from me in the future? What parts of the topic would you most want to hear more on? Are there other media (blog posts, code examples, etc.) that would help to make things clearer?

Finally, if you have suggestions for things worth referring to on this topic, please send them my way! As I said in the talk, this type of model isn’t something I invented myself, just something I’ve done a lot over the past few years, so I’m certain there’s other research and guidance out there I don’t know about. Some of the concepts I introduced might already be known by different names, and they may be better developed in papers and books I haven’t read yet. It’s also totally possible that some lesson I think I’ve learned through experience has actually been shown to be wrong by somebody with more expertise. So if there’s something I should look at that could make my work on this topic better, please let me know!

Andrew Therriault

Data science consultant and educator. Formerly Chief Data Officer @CityofBoston, Director of Data Science @TheDemocrats, and Data Science Manager @Facebook.