The making of my Dreamforce Presentation titled “Identify Protein Structures Using Einstein”

Gnanasekaran Thoppae
Published in Gnana’s Blog · Nov 13, 2017

It was a pleasure presenting at Dreamforce 2017 earlier this week, on arguably one of the most exotic subjects among all 2,700+ sessions: protein 3D structures!

The Origins and Inspiration

It was Dreamforce 2016. I was sitting in Benioff’s keynote when he and Parker announced Salesforce’s new #AI platform, Einstein, which is essentially an umbrella product name covering the tools and APIs behind Salesforce’s #AI capabilities. Afterwards, I visited the Trailhead Forest at Moscone West, where I saw plenty of demos of image-based prediction. One I particularly remember was the new image-based house search feature of the Dreamhouse app, the reference implementation for all things Salesforce.

Waking, Waking Mr. Alex

The more I read and talked about Einstein, the more I realized that it is just another API that can be consumed from both within and outside the Salesforce platform. It quickly dawned on me that I could rekindle my university days and wake up my inner biologist. Remember, #AI technologies have been around for a long time; they are nothing new. In 1996, for my master’s thesis, I created predictive models using a “learning” algorithm to predict how likely a protein sequence is to match a known 3D structure. I eventually published that work in 2000.

Sourcing the Images

I set out to see how I could use the Einstein Vision API and mold it to fit my own use case, something similar to my past study. As I investigated different application areas, it occurred to me that the largest protein structure database, the Protein Data Bank, contains 3D images of protein structures frozen at certain angles and represented as “ribbon” model diagrams. In my fit-for-purpose analysis, I found that a whole lot of protein structures have been identified since I left the bioinformatics scene 17 years ago.

They come in different sizes and shapes

With renewed confidence that these images could be used to generate a model and make predictions with the Einstein Vision APIs, I continued to build up my technical know-how. I attended webinars on the topic, followed the Trailhead content that was rolled out later, and eventually built my own custom classifier (a deep learning predictive model) from a selective dataset of three distinct proteins (Lysozyme, Hemoglobin and Porin), each represented by fifty closely related 3D protein structure images.
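For readers who want to try something similar, here is a minimal Python sketch of one preparation step: packaging the training images into a single .zip. It assumes (this is not stated above) that Einstein Vision builds an image dataset from a .zip whose top-level folders name the labels, such as `Lysozyme/`, `Hemoglobin/` and `Porin/`. The function name and folder paths are hypothetical, and the actual upload and training calls are deliberately left out.

```python
import os
import zipfile

# Assumption: the classifier's dataset is a .zip with one top-level folder
# per label (e.g. Lysozyme/, Hemoglobin/, Porin/) holding that label's images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def build_dataset_zip(root_dir: str, zip_path: str) -> dict:
    """Hypothetical helper: zip one subfolder per label, return image counts."""
    counts = {}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for label in sorted(os.listdir(root_dir)):
            label_dir = os.path.join(root_dir, label)
            if not os.path.isdir(label_dir):
                continue  # skip stray files sitting at the root
            counts[label] = 0
            for name in sorted(os.listdir(label_dir)):
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    # Store as "<label>/<file>" so the folder name carries the label.
                    zf.write(os.path.join(label_dir, name), arcname=f"{label}/{name}")
                    counts[label] += 1
    return counts
```

Calling `build_dataset_zip("proteins", "proteins.zip")` on a folder laid out like the dataset above would return a per-label count, a quick sanity check that each protein really has its fifty images before anything is uploaded.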

Test, Test and Test

While experimenting with the API, I observed that the predictions did indeed work. I tested the model with various protein structure images that weren’t part of the training dataset described above and still received astonishingly close predictions. For example, when I gave the model an image of a membrane protein, it predicted a high probability of it being a Porin, which is itself a membrane protein. While I was doing this exercise in my spare time, the Dreamforce call for papers came out. I hesitated over whether the topic was relevant to Dreamforce, but it was an opportunity to share what I had found, and to my surprise my abstract was selected!
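Reading a prediction back is just as simple. The sketch below assumes the prediction response is JSON containing a `probabilities` list of label/probability pairs; the sample response, the `top_label` function and the 0.5 threshold are all hypothetical and are not output from the model described above.

```python
import json

# Hypothetical sample response, assuming a "probabilities" list of
# {"label": ..., "probability": ...} entries. Not real model output.
SAMPLE_RESPONSE = """
{"probabilities": [
  {"label": "Porin", "probability": 0.87},
  {"label": "Lysozyme", "probability": 0.08},
  {"label": "Hemoglobin", "probability": 0.05}
], "object": "predictresponse"}
"""


def top_label(response_text: str, min_probability: float = 0.5):
    """Return (label, probability) for the best guess, or None if it is below the threshold."""
    probabilities = json.loads(response_text)["probabilities"]
    best = max(probabilities, key=lambda p: p["probability"])
    if best["probability"] < min_probability:
        return None  # no label is confident enough to report
    return best["label"], best["probability"]


print(top_label(SAMPLE_RESPONSE))  # ('Porin', 0.87)
```

The threshold makes an unfamiliar structure come back as "no confident match" rather than forcing it into one of the three trained classes.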

Final thoughts

So there you go: the background to my Dreamforce 2017 presentation. If there is one moral I can leave you with, it is this: do not dismiss your topic as “too generic” or “too exotic and different”. If what you learnt is worth sharing, please propose an abstract. If you proposed papers for Dreamforce but weren’t selected, don’t be dejected; there are plenty of community events such as Tahoe Dreamin’, London’s Calling and French Touch Dreamin.

Don’t stop sharing and stay young by learning!

Enabling organizations to exploit the strategic value of the Salesforce platform.