Advanced Philosophy

What does it mean to be a conscious being?

Photo by Simon Wijers on Unsplash

A philosophical zombie is a hypothetical human being who walks, talks, and behaves exactly like a normal person; we could subject it to brain scans or to any imaginable examination, and we would find nothing unusual. However, what makes it a zombie is that, whatever it may claim to the contrary, it lacks consciousness (or, as philosophers say, qualia): a subjective, inner world of experiences.

Our personal introspection strongly convinces us that we are not zombies ourselves; we know first hand “what it is like” to be us. So could we spot a philosophical zombie if it crossed our path? Or, to put it the other way around: as our technology perfects the art of building ever more deceptive humanoids, how sure are we that something resembling consciousness won’t emerge along the way? …

Some of our insights from developing a PyTorch framework for training and running deep learning models …

My wonderful colleagues at Atomwise and I have written a production-level PyTorch framework for training and running deep learning models. Our application centers on drug discovery — predicting whether a molecule inhibits the activity of a protein by binding to a pocket. The goal was to have a stable yet flexible platform that supports both machine learning scientists in training and experimenting with models, and medicinal chemists in applying trained models as part of their production workflow.

Iterative improvement and “continuous refactoring” are ubiquitous in software development, and we were no exception. The first version was far from perfect; guided by use cases, feedback, and extended functionality, we have gone through several rounds of refinement. Our design goals were to provide a tool that allows easy experimentation without the need to write code, and that can also be used within an automation pipeline. This meant we had to strike a balance between exposing enough knobs to conduct meaningful research and limiting the potential for unintentional misconfiguration and confusion. We also optimized for performance and cost-effectiveness within a cloud environment. …

Simple utilities for statistics collection

A typical PyTorch training loop contains code to keep track of training and testing metrics; they help us monitor progress and draw learning curves.

Let’s say that for a classification task, we are using binary cross entropy as the training loss, and we are also interested in accuracy. After each epoch, we measure the same metrics on a held-out validation set. We want to write periodic progress information to the console as well as to TensorBoard (or your favorite dashboard tool).
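A minimal loop like the following illustrates the kind of manual bookkeeping this entails. The model, loader, and helper names here are hypothetical, not our framework’s actual code:

```python
import torch
import torch.nn as nn

def run_epoch(model, loader, optimizer, device="cpu"):
    """One training epoch with hand-rolled metric bookkeeping."""
    criterion = nn.BCEWithLogitsLoss()
    losses, correct, total = [], 0, 0
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device).float()
        optimizer.zero_grad()
        logits = model(x).squeeze(-1)
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        # bookkeeping: accumulate loss and accuracy by hand
        losses.append(loss.item())
        preds = (logits.detach() > 0).float()
        correct += (preds == y).sum().item()
        total += y.numel()
    mean_loss = sum(losses) / len(losses)
    accuracy = correct / total
    print(f"train loss={mean_loss:.4f} acc={accuracy:.4f}")
    return mean_loss, accuracy
```

In a real setting, the same accumulation code would be repeated for the validation pass, and again for every extra metric; that duplication is what motivates the utilities below.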

Note the pattern of recording a number of successive values, then computing statistics over them. While the pattern is simple enough, the repetition grows, and the logging increasingly obscures the algorithmic core, the more metrics and statistics (say, standard deviations in addition to means) we want to track. …
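A small helper in the spirit described above can factor the record-then-summarize pattern out of the loop. This is a hypothetical sketch, not the framework’s actual utility:

```python
from collections import defaultdict
from statistics import mean, stdev

class StatsTracker:
    """Record successive values per metric; summarize on demand."""

    def __init__(self):
        self._values = defaultdict(list)

    def record(self, **metrics):
        """E.g. tracker.record(loss=0.7, accuracy=0.5)."""
        for name, value in metrics.items():
            self._values[name].append(float(value))

    def summarize(self, reset=True):
        """Return mean/std/count per metric, optionally clearing state."""
        summary = {}
        for name, vals in self._values.items():
            summary[name] = {
                "mean": mean(vals),
                "std": stdev(vals) if len(vals) > 1 else 0.0,
                "n": len(vals),
            }
        if reset:
            self._values.clear()
        return summary
```

The training loop then shrinks to `tracker.record(loss=loss.item(), accuracy=acc)` per step and one `tracker.summarize()` per epoch, whose output can be forwarded to both the console and TensorBoard.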

A configurable, tree-structured PyTorch sampler to take advantage of any useful example metadata

Photo by Christina Winter on Unsplash

When you are building your awesome deep learning application with PyTorch, the torchvision package provides convenient interfaces to many existing datasets, such as MNIST and ImageNet. Stochastic gradient descent proceeds by continually sampling instances into mini-batches. In many cases, you don’t have to lose sleep over this and can stick with the default behavior: go through the list of images one by one, and reshuffle the list after every epoch. If you have never had reason to modify that in your modeling experience, you can stop reading here.
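For reference, the default behavior amounts to the following: with `shuffle=True`, the `DataLoader` draws a fresh random permutation of the dataset every epoch (a toy dataset is used here for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten dummy "examples" so the sampling order is easy to inspect.
data = TensorDataset(torch.arange(10).float())

# shuffle=True installs a RandomSampler under the hood.
loader = DataLoader(data, batch_size=4, shuffle=True)

for epoch in range(2):
    order = [int(v) for (batch,) in loader for v in batch]
    print(f"epoch {epoch}: {order}")  # a new permutation each epoch
```

Every epoch visits each example exactly once, just in a different order; the custom sampler discussed next departs from this one-pass-per-epoch scheme.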

However, when you are working on a custom dataset that you aggregated and curated yourself, you might end up writing your own subclasses of Dataset, DataLoader, or Sampler. This was the case for us at Atomwise, where we try to predict the bioactivity of drug candidates based on structural models. Without going into too much detail, let me briefly describe, for the purpose of the following exposition, a simplified version of our data schema. Each example consists of a pair of files with spatial coordinates, one for a protein target and one for a ligand molecule binding to it. There are multiple labels corresponding to different endpoints and confidences, depending on the assay (the type of chemical or biological experiment that was performed to determine activity). Some negative examples can be synthetically generated, in a number of different ways. We are continually asking ourselves, “What is the best way to use this data?” And, as you might guess, the answer ranges from “it depends” to “it’s complicated”. In the drug discovery domain, it can be tricky to come up with canonical, universal recipes for benchmark construction. …
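To make the idea concrete, here is a minimal two-level sketch of metadata-driven sampling: first choose a group (say, an assay type or a synthetic-negative source) according to configurable weights, then draw an example uniformly within that group. The group names and weights are hypothetical, and our production sampler supports deeper trees than this:

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class GroupedSampler(Sampler):
    """Sample a group by weight, then an example uniformly within it."""

    def __init__(self, group_of, weights, num_samples):
        # group_of: list mapping example index -> group label
        # weights: dict mapping group label -> sampling weight
        self.by_group = defaultdict(list)
        for idx, label in enumerate(group_of):
            self.by_group[label].append(idx)
        self.groups = list(weights)
        self.weights = [weights[g] for g in self.groups]
        self.num_samples = num_samples

    def __iter__(self):
        for _ in range(self.num_samples):
            g = random.choices(self.groups, weights=self.weights, k=1)[0]
            yield random.choice(self.by_group[g])

    def __len__(self):
        return self.num_samples
```

Passing such a sampler to `DataLoader(dataset, sampler=sampler, batch_size=...)` replaces the default per-epoch shuffle; note that examples may now repeat within an epoch, since each draw is independent.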

This post is intended for my fellow machine learning engineers who are curious about applications in medicine, biology, or chemistry, but lack a prior formal background in these fields. I have been in this position myself; my goal in this post is to give you a concise starting point into the field of drug discovery.

In the first part, I sketched the “what” of machine learning in drug discovery: its objectives and role within the drug research pipeline, the types and quality of experimental data, and benchmarks and evaluation metrics. That set the scene for the present post, where we are going to explore the “how” of actual approaches in greater detail. Unfortunately, space does not permit complete coverage of the rather large body of literature; I apologize in advance for subjectively leaving out some works and references. Nevertheless, I am going to discuss a number of prototypes that span the spectrum of approaches developed so far. …


Having applied machine learning to search and advertising for many years, two years ago I was ready for a transition to something new. In particular, the field of life sciences seemed to stand out as an opportunity to have a rewarding and positive impact. I was excited to join Atomwise, working on deep learning for drug discovery.

Deep neural networks started to become particularly popular around 2012, when researchers from the University of Toronto [Krizhevsky et al, 2012] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). In recent years, this brand of machine learning techniques has revolutionized several fields of artificial intelligence, such as computer vision, natural language processing, and game playing. Will it be able to similarly transform chemistry, biology, and medicine? Experimental sciences give rise to a wealth of unstructured, noisy, and sometimes poorly understood data. One central attraction of deep neural nets is their ability to build complex and suitable featurizations without the need to handcraft them explicitly. Progress has undoubtedly been made, and compelling ideas have been proposed and are being developed. That same year, Kaggle hosted the Merck Molecular Activity Challenge, which sparked a lot of interest as well, spurring research into life-science applications and capturing the attention of the popular press. …


Stefan Schroedl

Head of Machine Learning @ Atomwise — Deep Learning for Better Medicines, Faster. Formerly Amazon, Yahoo, DaimlerChrysler.
