An Interview with Gordon Webster, Author of Getting Started With Python In The Lab: An Introductory Python Tutorial For Life Scientists
Published Feb 17, 2017 by Len Epp
Gordon Webster is the author of the Leanpub book Getting Started With Python In The Lab: An Introductory Python Tutorial For Life Scientists and co-author of Python for the Life Sciences: A gentle introduction to Python for life scientists. In this interview, Leanpub co-founder Len Epp talks with Gordon about his career, his books, and his experience self-publishing on Leanpub.
This interview was recorded on October 12, 2016.
This interview has been edited for conciseness and clarity.
Len: Hi, I’m Len Epp from Leanpub, and in this Leanpub podcast, I’ll be interviewing Gordon Webster. Gordon earned his PhD in biophysics and structural biology at the University of London, and has worked in life science R&D in both Europe and the US. He’s currently based in Cambridge, Massachusetts.
He has both academic and commercial experience, and is the author of a number of patents, in addition to scientific articles. In his profile, he writes that his “career path has reflected his belief that the most interesting and potentially promising areas of research lie at the intersections between the traditional scientific disciplines,” and I’m sure we’ll get to talking about that in just a bit.
Gordon is the author of the Leanpub book, Getting Started With Python In The Lab: An Introductory Python Tutorial For Life Scientists, and, more recently, co-author with Alex Lancaster of the Leanpub book, Python For The Life Sciences: A gentle introduction to Python for life scientists.
Python For The Life Sciences is a great introduction to computer programming, written with the interests of biologists in mind — in particular those who haven’t written any code before. Along with the book, you get code samples that you can learn from, and even use for your own research. The book covers topics including biochemistry and gene sequencing, molecular mechanics and agent-based models of complex systems.
In this interview, we’re going to talk about Gordon’s professional interests, his books, his experience using Leanpub, and at the very end, ways we can improve Leanpub for him and other authors.
So, thank you, Gordon, for being on the Leanpub podcast.
Gordon: Oh, thank you for having me.
Len: I was wondering if you wouldn’t mind telling us a little bit about yourself, and what I like to call an interviewee’s “origin story” — how you first became interested in biophysics and structural biology, and how you got to where you are now.
Gordon: I think my interest in biophysics started with seeing three-dimensional structures of DNA and proteins and stuff like that. I remember being very captivated by that intersection of physics and biology. And so I went into biophysics — kind of related to the thing you mentioned a moment ago, about the fact that I really enjoy things that are on the boundaries of two different disciplines.
So the idea of using physics to study biology really appealed to me. There’s a certain mindset and methodology to physics that doesn’t always work in biology, I have to say. But it’s an incredibly interesting area.
The other thing that spurred my interest in biophysics was computers. I remember getting a home computer in the 1980s, and I was completely hooked from the minute I started writing BASIC on it. All through college, I pursued projects and electives where I had a chance to do computing, so that’s always been a big part of my career too. Biophysics is a very computational, quantitative, numerically intensive field, and so computing has always played a very large part in it.
Len: And what is the difference between what one might conventionally understand to be biology, and biophysics?
Gordon: I paint this picture of a spectrum. At one end of the spectrum of biology you have evolution and field biology: studying species and animals and the way they interact, and all that kind of thing. And then there’s all the classification, taxonomy and botany and so on. And then at the opposite end of the spectrum, you have biology at the almost atomic scale: molecular biology.
I call it the study of dead stuff. And it’s kind of ironic that when you get to the very small scale in biology, down to atoms and molecules, nothing really looks like biology anymore, because you’re essentially studying things that are governed by the laws of physics and chemistry.
And it isn’t till you get further towards that first end of the spectrum that I described, where you start to look at organisms and reproduction and survival and evolution, and populations of organisms and the dynamics of those populations — that you see anything that you could really call biology. So it’s kind of interesting that at the very small scale, a lot of the stuff that biologists study really looks like chemistry and physics.
Len: That’s really fascinating. I wanted to ask you, what was the subject of your research for your PhD at the University of London?
Gordon: I studied structural and computational biology. There was great interest at that time in finding ways of shutting off certain gene sequences, and we didn’t have the kind of technology for that which is out there now, like silencing RNAs.
People were very interested in looking at drugs that could bind to DNA, and actually close down a certain gene, essentially by binding to the beginning of the gene, or the gene promoter — and shutting off that gene. The goal was always to try to be able to control gene expression, so that you could — for example — cure cancer, or other diseases that had a genetic component.
Len: I’m sure probably some of the people listening to this podcast have heard about CRISPR and how powerful that is. I was wondering, since I’ve got you here, if you wouldn’t mind maybe explaining a little bit about what that is, and why it’s so important.
Gordon: CRISPR is an interesting system. It’s a sort of gene editing system, based on enzymes, that people have found in organisms. It’s not human made; it wasn’t invented. It existed in nature. And now there are a number of companies who are trying to essentially patent it, and develop it for use as a gene editing tool.
The dogma in biology had always been that once a gene sequence is established in a cell, it’s there forever and there’s not much you can do about it. You can put things into the cell, maybe to switch the gene off, but then those things need to be there all the time.
The difference with the CRISPR approach is that now you’re going in and editing the genes themselves in the cell, so you’re interfering with the cell’s processes at the genetic level, which is something we’ve not really been able to do before.
Len: And what do you think some of the new applications might be, that people can make of this?
Gordon: Obviously, people are very interested in disease. There are genetic diseases where people are born without the gene that codes for a vital enzyme, for example one that processes carbohydrates in the cell.
There are some people who have deficiencies in processing certain kinds of chemicals that are essential to growth and life. Those people often don’t live very long; they often die as children. I know there’s a lot of interest in trying to fix those genes, whereas previously, all you could do was try to intervene with drugs and things like that.
Now there’s an effort to fix those kinds of diseases at the genetic level, which, again, is something we’ve never really been able to do before. There were attempts in the 90s: you’ve probably heard about gene therapy, where people were trying to use viruses. And viruses also have a very interesting kind of gene editing capability.
So for example, a lot of viruses, when they invade cells, they’ll splice their own gene sequences into cells, and co-opt the cell to produce more virus, instead of producing what the cell wants to produce. And so people thought that maybe viruses could be a way to do gene editing, and a lot of the gene therapy early on, was done with viruses.
And that field is still going. It’s not dead or anything. But I think that the CRISPR thing is an advance beyond that, in terms of having much more control over the way the gene is edited. The problem with the viruses, I think, is that it wasn’t always very easy to control where the virus would put the gene that you wanted into the cell.
Len: My next question is kind of personal, and a little bit selfish. I lived and worked in London for a few years, and I did my doctorate in the UK at Oxford. I always thought Oxford was the perfect distance from London: just far enough away that it took some time to get there, but close enough that you could still go and enjoy London.
There are so many great universities in London, and I always wondered what it would be like to actually be a student there, with all the fantastic distractions of London life around you. What was it like doing your doctorate in London?
Gordon: It was awesome. And you’re right that it was sometimes not easy to focus on work with all that going on. But you also have to bear in mind that I was there in the 80s. This was the era when bands like [The Clash](https://en.wikipedia.org/wiki/The_Clash) were playing at the Hammersmith Palais. It was an incredible time to be young, and to be a student in London. I absolutely had a marvelous time there. Maybe sort of too good a time, sometimes. But yeah, it was fantastic. It’s a wonderful city.
Len: I wasn’t there in the same era you were, but like, just going to Camden any given night, you can find fantastic bands playing. It’s just so amazing.
I wanted to ask you about Amber Biology, which is the consulting firm that you have with your co-author, Alex Lancaster. When did you set up your consultancy, and what kind of work do you do?
Gordon: I created the consultancy about three years ago. Originally I had a partner, somebody I used to work with when I was working more in the mainstream of the pharmaceutical business. He was somewhat engaged at the beginning, but he had a day job he didn’t really want to give up, and he ended up becoming more or less a silent partner. The company was kind of moribund for a few months. In the end, I persuaded him to relinquish his partnership so that I could work with Alex, because Alex was very interested in being actively engaged in Amber Biology. So we had a change of partnership last year.
The paperwork finally went through in the summer of last year, around when we started on the book as well, and since then we’ve been building the business. The business has been going for three years as Amber Biology, but Alex and I have been working together for about a year and a half now.
The kind of work we do is all computational biology: anything in biology that can be done on computers. When you talk about biology and computers, a lot of people immediately think of bioinformatics. It’s the big area that everybody’s heard of: gene sequencing, genomics and gene analysis. And that’s certainly stuff that we do as well.
But both of us have a background in modeling and simulation in biology, and that’s an area that we are really keen to pursue. There’s a whole backstory here, and we can get into that if you’re interested. But I would say it’s still very early in biology for people doing modeling and simulation.
If you think about physics and civil engineering and fields like that, simulation and modeling are part of the mainstream of research. In physics, for example, people model the movement of stars and planets using gravitational models, and they plug the observations from telescopes into them.
And then when the model deviates from the observations, that’s actually interesting. Here’s an example I like to give of how models can be wrong but still informative: if you’re studying a binary star system, you plug in the Newtonian gravitational model, and you find that it doesn’t match the observations.
What that often tells you is that there’s hidden mass there that you can’t see with the telescope, and there are one or more planets orbiting one of the stars. And so the deviation of the model from the observations, gives you a clue as to how much mass is missing and where that mass is.
And that kind of mindset, of using modeling and simulation, is really prevalent in physics, and similarly in civil engineering. If you want to build a suspension bridge, it’s going to get built virtually, in CAD, before any steel or concrete goes up in the real world. All the pieces get tested in CAD, and there’s feedback from the physical testing of all the pieces of the bridge back into the computer model.
That’s the kind of place that we would like to see biology go. But it’s still extremely early, and most modeling in biology right now is the preserve of people doing theoretical biology. Those people often have backgrounds in, for example, computer science, and they’re doing the kind of thing you talked about earlier: straddling different disciplines, and bringing computer science ideas into biology. This is the area that we’re really interested in. But like I said, it’s very early in biology right now.
Len: I really liked that analogy. I think I found it in something you wrote on your blog, although I don’t know if you used this example specifically. I think Neptune was discovered because people saw deviations from the expected movement of another planet. From those deviations, and the model they had of the way the whole system worked, they deduced what must be going on. What you’re saying, I think, is that biology, given our current understanding of it, is too complex to have a whole model in that simple way -
Gordon: Right, exactly.
Len: I mean, people think physics is really complicated, but even physicists will tell you that in some ways it’s very simple. And it reminded me of the story of Vulcan, a planet people thought existed because they saw deviations in the movement of Mercury that they couldn’t explain. People eventually realized from observation that there was no planet there causing the deviation, so it had to be something else, and it took decades, until Einstein, and a fundamental change in the entire model, to understand why Mercury was moving the way it was. And I guess what you’re saying is that biology doesn’t even have a model of the first type yet, let alone reaching that second step. You have a blog -
Gordon: Yes, that’s exactly right. And the other issue is that people who have not had a lot of experience with modeling, which is true for the most part in biology — they tend to think of modeling like weather forecasting. The idea is you have this very big, very complete model with, essentially, data points for everything…
All the data are very well represented and very complete. And then you run the model and you make predictions. The idea that an incomplete or partial model could be of any value is something that, I think, most people in the biological field dismiss, and they tend to dismiss modeling because of these kinds of fears. Because of the complexity: how could you model the inside of a cell, when there are just too many moving parts?
Len: You have a great blog post called “Big Data Does Not Equal Big Knowledge”. I’m sure everyone’s heard talk about Big Data by now, and I was wondering if you wouldn’t mind talking a little bit about what you were getting at in that blog post. You talk in particular about how visualizations, or the type of visualizations that people often get from data, are not necessarily as useful in the life sciences as they might be in other fields.
Gordon: With these numerical quantitative approaches, it’s a little like the kind of demographic data mining that political campaigns and advertisers do. It’s like sort of looking at trends in the data. And I think there are lots of areas where that kind of approach works really well.
And in biology, you can do it too. I think I give the example of dose response curves and things like that, where you have a relatively simple system with not too many variables operating under the surface. But where this kind of approach has really failed dismally is in areas like gene expression and genomics. People were sure that once the Human Genome Project was complete... I remember, I think it was Watson, saying, “Oh, within a couple of years of this, cancer will be a thing of the past, and we’ll have a handle on all of the disease genotypes,” and so on.
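A dose-response curve is a good example of the “relatively simple system” described here: one equation with a handful of parameters often describes the data well. The sketch below uses the four-parameter Hill (log-logistic) equation, a common choice picked for illustration rather than anything taken from Gordon’s post; the parameter values are made up.

```python
def hill_response(dose, bottom, top, ec50, hill):
    """Four-parameter Hill equation.

    The response rises from `bottom` (no drug) toward `top` (saturating dose),
    passing the midpoint when dose == ec50; `hill` controls the steepness.
    """
    if dose <= 0:
        return bottom  # no drug: baseline response
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)


# Hypothetical parameters, not fitted to any real data
params = dict(bottom=0.0, top=100.0, ec50=10.0, hill=1.0)
for dose in [0.1, 1.0, 10.0, 100.0]:
    print(f"dose {dose:>6}: response {hill_response(dose, **params):.1f}")
```

At a dose equal to `ec50` the response sits exactly halfway between `bottom` and `top`, which is what makes EC50 a convenient one-number summary of potency.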
And really, what we learned from that is that we don’t understand the genome as well as we thought we did. Having the human genome sequence is a bit like having the physical location of every neuron in your brain: it’s still a long way from explaining consciousness. You could map the brain in the greatest detail, but it still wouldn’t tell you exactly how the system works.
And I think with Big Data, what people are trying to do is say, “Well, I don’t really understand what mechanisms are under the hood here. But if I look at the data under one set of conditions and under another set of conditions, and I carefully weight the data so that I’m not comparing apples and oranges, then if I can see some significant differences in the data, those may point to where the problem, or whatever it is I’m trying to investigate, actually lies.”
And it’s a valid approach in a lot of ways. I mean it’s not crazy. And some of the low-hanging fruit has probably been picked in that approach. But, for example, patterns of gene expression, or patterns of phosphorylation in the phenotypes of cells — those things are so complex, there’s so many different moving parts.
And it might be, for example, that what you’re looking for isn’t the biggest difference in gene expression between one condition and another, but some pattern of differential expression that’s buried in all the noise you cut out because you think it’s not significant. It might be some recurring pattern of 10 different genes, each with a very small deviation, that becomes significant when you look at them all together.
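The idea of ten genes that each shift only slightly, but significantly when taken together, can be illustrated with Stouffer’s method for combining z-scores, Z = Σzᵢ/√k. This technique was chosen for the sketch and is not one named in the interview, and the z-scores are invented. The method also assumes the genes are independent, which co-regulated genes usually are not, so treat this as a toy.

```python
import math


def stouffer_z(z_scores):
    """Combine independent z-scores with Stouffer's method: sum(z) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical z-scores: ten genes, each shifted slightly in the same direction
gene_z = [1.0] * 10
threshold = 1.96  # roughly p < 0.05, two-sided

n_hits = sum(abs(z) > threshold for z in gene_z)
combined = stouffer_z(gene_z)
print(f"genes significant on their own: {n_hits}")
print(f"combined z = {combined:.2f}, two-sided p = {two_sided_p(combined):.4f}")
```

No single gene clears the 1.96 cutoff, yet the ten small shifts combine to a z-score of about 3.16. That is the kind of signal aggressive per-gene filtering would throw away.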
So these are the kinds of things that Big Data is trying to uncover. And with visualization, you usually apply a lot of filters to the data. You try to pull out the differences in the data, in the way that a sound engineer would try to filter background noise out of a recording, as you were saying earlier about your software for doing audio filtering.
And I think that the problem is that, it’s an effort to sidestep the complexity of the biology. It’s partly driven by this fear that, “Well, I could build a model, but how could I ever build a complete model?” It’s always going to be a partial model at best. And so that probably isn’t going to work.
Len: You mentioned earlier, failure, and you just mentioned side-stepping. That leads me into my next planned question for you, which was — I wanted to talk to you a little bit about Theranos. I’m sure a lot of people listening have heard about this company that’s turned into a pretty catastrophic failure in the health sciences area. I know you’ve written about it on your blog, The Digital Biologist, which is the reason I’m bringing it up.
Could you explain a little bit about what Theranos is, and how it failed? And how can something like this happen in the sciences? I think it’s a question that a lot of people have. The lay person associates science with rigor, and there appears to have been this huge fraud.
Gordon: Right. I think the one thing I would say is that, yes, lay people do tend to think of scientists as being almost like Mr. Spock: logical, making every decision free of all that other human baggage like emotion and ambition and greed.
And the truth is, it’s really still very much a human activity. And the application of the scientific method — there’s this kind of ideal view of it, if you look at the books on the philosophy of science and Karl Popper and all this kind of stuff. There’s this very idealized, sort of Platonic ideal of what the scientific method is. But when you start to combine science and commerce, then all that human stuff, it still plays a role. And honestly, it plays a role even in academic research.
There it’s not so much about money, but about prestige and ambition. People in academic research sometimes stray because they want a particular result, because they know it’s going to get them that professorship, or the prize, or the prestige or recognition within the community that they want. Cases of academics going off the rails over issues of prestige and standing in the community are well documented.
And when you start to think about that in the context of the Theranos thing, I mean there, the stakes are even higher. You’re talking about ambition and prestige and standing, but also about billions of dollars and entire careers. And so the human stuff definitely plays a role in science.
Len: It’s really interesting. I remember when I first heard about the company and looked into it, I saw that, in addition to being very human, and a business with lots of money at stake, it was extremely political. Just look at who’s currently on its [corporate] board.
You’ve got a former Wells Fargo CEO (a company that’s also in the news these days) and a retired Marine Corps general. Its list of counselors currently includes Bill Frist, the former US Senate Majority Leader, and Sam Nunn, a former Senator and Chairman of the Senate Armed Services Committee. And incredibly, to top it all off, Henry Kissinger was involved.
Do you think that one of the reasons they could get away with their self-representation was, all of these powerful people were behind it, and that that may have deterred people from seeing the truth early?
Gordon: Yes. I liken the Theranos problem, and problems that a lot of biotech and pharmaceutical companies have generally, to the kind of problems that NASA had, for example. I understand that at one time NASA had a management culture with a lot of people managing projects who were driven by deadlines and, as you said, political considerations; people who weren’t really engineers, and didn’t really understand the risks or the complex systems they were building.
The Challenger disaster was an example where that management culture essentially overrode the culture that should have prevailed at NASA. In my opinion, for those kinds of projects, you need engineers who understand the systems being built, and the risks inherent in those systems, running the project.
And at Theranos, and not only Theranos but other biotech and pharma companies too, what you often have is a management mindset where people maybe did an undergraduate degree in science but have never really done much research. I don’t want to slam MBAs here, but you definitely see a lot of MBAs in high places in the pharmaceutical industry, driving R&D, who’ve never really done any R&D themselves.
So I think you have this management culture now, where people go to business school and get an MBA, and they feel it qualifies them to oversee all kinds of human activities, whether or not they really understand the risks and the processes inherent in whatever it is the company or organization is making.
And I feel like with Theranos, you have a similar kind of thing. I mean they didn’t publish any data. Everything was just like radio silence, in terms of actually validating the technology. And they held out for a really long time. I mean — obviously now, we know, thanks to the Wall Street Journal’s reporting — that that was because, essentially the stuff didn’t work.
But it would’ve behooved them to take a more rigorous approach, and to know, before they went down the path of wasting all those millions of dollars, that this technology wasn’t going to work. Somebody else might have intervened, or the R&D might have been done differently. Or they might have pivoted much earlier, as they’ve pivoted recently to this new thing where they’re no longer a provider of blood tests. Now, as I understand it, they’re going to be some kind of hardware developer?
Len: It’s really interesting, what you said about MBAs and the concept of overseeing. That subject has come up repeatedly in interviews I’ve done for this podcast, and I think it’s partly because a lot of the people I interview are software developers. I could talk about it for a long time, but I believe the theory behind a lot of management is based on being able to oversee: being able to see what people are doing. And that’s deep in the structure of the way MBAs are taught. So deep that people don’t even know it’s there.
For example, if you’re managing people laying bricks, or if you’re managing people building a house — you can have a kind of abstraction around watching. You can see whether people are having a pint, or laying the bricks. You can see whether people are hammering. You can hear whether they’re hammering the nails, or whether they’re playing cards.
But a scientist? I mean, you can see her at work in the lab. Or a software engineer, you can see him sitting at a desk with a computer. But you can’t see the work. Because the work is mental. And it seems to me that this is a real problem in an era where software is eating the world. We need a new way of thinking about management. And the thing that keeps coming up in these conversations is, that you need to have domain-specific expertise, in order to manage people who are doing work that involves primarily things you can’t see.
Gordon: Absolutely. And the other thing about R&D generally is that it’s a very non-linear process. I think the management mindset has arisen out of the kinds of industries where you have a production line, with a chain of processes from A through Z. You go A, B, C, D, and if there’s a bottleneck at D, you fix that bottleneck so the thing works better. It’s very much a process of box checking and crossing t’s and dotting i’s, with a defined process, and I feel like that management mindset works really well for that.
So the kind of people who manage well in the pharma business tend to be more on the regulatory side, I feel. Once a drug gets through the R&D phase and is in development and clinical testing, the process is still not completely linear, for sure, but there’s much more of that production line mindset there.
But R&D — it’s iterative, it’s non-linear. You start an experiment. You see something really interesting. It can take you off in a whole different direction. No amount of management deadlines can mandate that nature is going to behave in the way that you want it to. “Well, we must have an answer from this particular cell line before December, so we can tell the investor something.” And if the cell line doesn’t want to behave as you expect it to, then you’ve got all kinds of questions you haven’t answered.
This comes back to the modeling again. I see modeling as very much an adjunct to this experimental sort of method, where you’re basically helping to determine what the next experiment is that you really ought to do, to answer the questions you need to answer.
I wrote a piece on this on LinkedIn, where I likened a good R&D team to a jazz ensemble rather than an orchestra, and I talk about this quite a lot in that piece too. I have certainly worked at companies where I’ve seen this kind of management decision-making going on. It’s based more on what the company needs to do, without a real understanding of what the company is actually able to do.
I’ve worked quite a lot in software development too, and I’ve also seen that in software development where you have a similar kind of situation, where you have people who’ve never really written or tested code — overseeing a software development effort.
One company I worked at, I ended up having to leave, because they felt that testing was an unnecessary waste of time and money. This was not an ideal world, I was told: in an ideal world we would test everything, but here we just don’t have time to do that. And so it all turned really sour for them because, of course, a lot of the software they were rolling out just didn’t work. It was embarrassing, and I was embarrassed to be a part of that effort.
Len: I’ve spoken with a couple of professional testers about that very issue, and so I have a little bit of a sense of how frustrating it can be to be part of a project where people are just profoundly mishandling it under a false view of efficiency.
Speaking of writing code, I wanted to ask you: what was the inspiration for you and Alex to write Python For The Life Sciences, a book devoted to helping people who haven’t coded before learn how to do so?
Gordon: The biggest single reason is that, well, partly it stems from what we said earlier about the fact that modeling and computational approaches are still relatively non-mainstream in biology. What that translates into is that there isn’t really much in the way of a computing component in the core life sciences curriculum. So most biologists can go through college, and pretty much avoid using computers — other than maybe for writing their articles, and maybe using Excel spreadsheets and stuff like that.
If a few of them are lucky, they may get to have some kind of training in MATLAB, or R or something like that. But for the most part, a lot of biologists graduate and start doing research. They go into grad school, or even become postdocs, without ever having really done much in the way of computational research. And what you see in labs is people doing endless calculations, still with hand calculators. People are using Excel to process all their spreadsheet data, painstakingly copying stuff into tables.
Nowadays, there’s more lab automation, so a lot of lab instruments produce data that’s ready to be visualized in Excel. Excel is a great tool for what it is, but it’s not really the tool for most quantitative biology. There are certain things, for sure, you can do with it. But being able to write code gives you the opportunity to look at your data in ways that just aren’t possible using hand calculators and spreadsheets. And so that was really our major…
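As a rough illustration of the kind of spreadsheet chore that a few lines of code can take over, here is a sketch that groups replicate readings by condition and averages them. The column names and numbers are hypothetical. A real script would open the CSV file exported by the instrument instead of parsing an inline string.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical plate-reader export; in practice you would open() the CSV file
raw = """condition,replicate,absorbance
control,1,0.210
control,2,0.198
treated,1,0.452
treated,2,0.470
"""

# Group the absorbance readings by experimental condition
readings = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    readings[row["condition"]].append(float(row["absorbance"]))

# Average the replicates for each condition
means = {cond: mean(vals) for cond, vals in readings.items()}
for cond, avg in means.items():
    print(f"{cond}: mean absorbance {avg:.3f} (n={len(readings[cond])})")
```

Because the script is plain text, the same steps can be rerun on the next plate without any copying and pasting.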
So when I worked at one pharmaceutical company, for example, I remember that really the only numerical software in the entire company that was used by everybody, from the financial people in the accounts department to the scientists at the bench, was Microsoft Excel. And that was in a company doing the kind of quantitative work that drug development is. This is the progressive aggregation of knowledge. It struck me as kind of bizarre that in such a quantitative field, where data and numbers are so prevalent, and more so now than ever, you have so many people who have no real way of using computers to their full potential.
Len: And why did you choose to focus on Python?
Gordon: Because it’s, as I think we call it in the book, the Swiss Army Knife of programming languages. It’s a wonderful language that you can just start using right away. I also trained in Java, for example, and Java requires you to use the object-oriented programming paradigm right from the outset. So there’s a steep learning curve there for anybody who’s not familiar with object-oriented programming.
And it’s also kind of a sledgehammer to crack a nut. If you just want to write some small scripts to open a file and read some data in, and reprocess the data in a different format, or find some patterns in a sequence, or something like that, you don’t really need to be writing object-oriented code all the time. So I like the fact that Python gives you that option to just jump in and start writing the procedural code that we all used to write, when we were writing in C and Basic and stuff.
Or, for more complex applications, you can scale up and use the object-oriented programming paradigm to help you organize all the moving parts, and write applications in the large.
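As a rough illustration of the kind of short procedural script Gordon describes (read some data in, then find patterns in a sequence), here is a minimal Python sketch. It is not taken from either book: the FASTA-formatted input, the EcoRI recognition site `GAATTC` used as the motif, and the function names `read_fasta` and `find_motif` are all hypothetical choices made for the example. Note that there is no class anywhere, just two functions and a loop.

```python
import re

# Hypothetical example input in FASTA format (assumed for illustration; not real data).
FASTA_TEXT = """>example_sequence
ATGGAATTCCGTA
GGCTAGAATTCTT
"""


def read_fasta(text):
    """Return a dict mapping each FASTA header to its concatenated sequence."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]  # header line: start a new record
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line.upper())  # sequence line for the current record
    return {key: "".join(parts) for key, parts in sequences.items()}


def find_motif(sequence, motif):
    """Return the 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets re.finditer report overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


for name, seq in read_fasta(FASTA_TEXT).items():
    print(name, find_motif(seq, "GAATTC"))
```

Running it prints `example_sequence [3, 18]`. If a script like this grew into a larger application, the same functions could later be gathered into classes, which is the scaling-up path Gordon mentions.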
Len: You mentioned earlier, and I believe I read on one of your blog posts, that the book took you about a year to complete.
Or to get it to the state it’s in now. Was it your plan from the beginning for it to take a year?
Gordon: No. I think the book ended up being much bigger than we thought it would be. I think it was going to be a little 50- to 100-page thing about biocomputing with Python, almost like a get-you-going tutorial. But then it just blossomed, mainly based on both of our previous experiences in modeling and using Python in our own research. “Oh, wouldn’t that be cool? Remember when I did that stuff with the robots?” And, “Remember when I did that stuff with next-generation sequencing? We should include some of that.”
And so there was definitely, I guess in the software world you’d call it, feature creep. But we’re very happy with the way it turned out. We’re glad we did it. It’s much more of a full-fledged book than I think we imagined in the beginning.
Len: It looks great. And I wanted to ask you: you didn’t use the Leanpub workflow to make your book; rather, you used our “Bring Your Own Book” feature to upload your book, so you can sell it in our bookstore.
I was wondering what tools you used to make your great-looking book?
Gordon: The entire book was actually built and edited in Google Docs, because we needed a collaborative platform. I use Macs, and I run Linux and Windows as virtual machines on my Mac. But Alex is a Linux guy, so we couldn’t really use a tool that was primarily in the Mac world. We settled on Google Docs, and it worked really, really well until we got up to about 250 pages. Then you start to see the limitations of trying to edit large documents in a web browser.
I’ve got to give the Google people credit. Google Docs is a great tool. But once we reached around the 300-page mark, which is pretty much the maximum size that’s practical for a Google document, we started to see that it was sometimes unresponsive.
And the other issues that we had were — when you create a PDF out of the Google Doc, it does some silly things. For example, all of the internal links point back to the original Google document, and not to the new PDF. So if you have a link in your new PDF to page 100, it will actually point to page 100 in the original Google Doc. Which is kind of absurd. I mean, if you’re exporting to PDF, you would hope those internal links would remain internal.
So what we ended up having to do was save the entire document as a .docx file in Microsoft Word format, and then we used the Mac Pages program. Well, initially, let me say, I tried using Microsoft Word 2011, which is the version I happened to have on my Mac. And that does not preserve the links.
When we first published the book on Leanpub, all the external links inside it were dead, because Word didn’t handle those properly. When we put it into the Mac Pages program, it did a good job of exporting the document. There were also some other issues with Word. It didn’t really know how to keep images where we’d placed them in the text; they would stray into the margins of the page and look kind of ugly. So you ended up having to do a lot of fixing of image positions and stuff like that.
So in the end, the workflow was — Google Docs, save as .docx, import into Mac Pages, fix any kind of page formatting stuff that we needed to fix, and then export as a PDF. And that worked for us.
Len: That’s quite a journey. Thanks for all of the details.
I wanted to ask a question on behalf of any other self-published authors listening: both of your books have great covers. I really like the one for Python For The Life Sciences, where a strand of DNA is the snake, presumably a python.
I was wondering, do you have any advice for people about how to find a source of good book covers?
Gordon: I used Keynote to make that cover, which is kind of the Mac equivalent of PowerPoint. I find it to be a really versatile graphic design tool. I don’t claim any great expertise or knowledge in graphic design, but Keynote is a really great tool if you want to blend some images together and make some simple shapes. If you look at the cover of the book, it’s all fairly simple shapes; it just takes a bit of playing around with gradients and colors to get it right. Keynote is a fantastic tool for putting together designs. That cover was designed entirely in Keynote.
Len: You have a section at the back of your book where you ask readers to email you any errors or omissions they find. Have you had any responses like that?
Gordon: Not yet, no. But one of the things that attracted us to Leanpub is that we both have software development backgrounds, and we really like the iterative publishing model. It’s liberating to be able to get a book out there without having to worry that every little typo is fixed and every diagram has the right caption.
Obviously we did our very best. We didn’t want to put something out there that looked sloppy or half-finished, for our own pride as much as anything. But there are always errors, and it’s great to know that with the Leanpub model you have a way to go back in, fix the errors, and upload the book, and all your readers are able to benefit from that too. That’s a really nice feature, and something that really attracted us to Leanpub.
Len: Do you plan to make a print version of your book?
Gordon: We actually do. Here, I can show you on the video: here’s a proof copy, with the cover. I expanded the cover so that it wraps around the back and includes the spine. We went to a local bookstore that has a machine called a Gutenberg Machine, obviously named after the inventor of the printing press, and it does a really nice job of printing books on demand. The interior of this copy is black and white, because the machine unfortunately doesn’t do color. But we are exploring a number of places right now where we might be able to produce physical copies of the book.
There are people who still like physical copies of books. And I think for libraries and schools and things like that, there’s still a place for having a physical copy of a book.
Len: If there was one feature we could build for you, or one problem we could fix, what would that be?
Gordon: I think it would be great to have a more fully fledged tool for putting the book together and collaborating on it. And not necessarily in a web browser; it could be an app. For example, I’ve made some photo books previously, and a lot of those online photo book services have an app you can download to your desktop. You can actually build the book in the app, and then it publishes it to the website for you. So you’re not forced to work in a browser, with all of the limitations that entails.
So I feel like it would be great to have something like that. And also a tool for creating a book that would let you go immediately into multiple formats: PDF, MOBI, eBook, I mean ePub. All that sort of thing.
Len: We do have that if you write your book using Leanpub.
Gordon: Right, right.
Len: We automatically produce PDF, EPUB, and MOBI, and you can make a website if you want to. But collaboration is a huge area, and it’s something we’re definitely going to be working on at some point.
Gordon: I can give you some tangible examples of the kinds of problems we faced. We had lists of topics and things we wanted to cover, and we put them all up on this whiteboard that’s actually behind my desk. And then you’d have things like, “Okay, I wrote this chapter. Do we introduce Matplotlib in chapter four or chapter six? Oh, I think you introduced it.” So in some parts of the book we would have explanations for things that somebody had already introduced earlier in the book, and we’d have to move the explanation to that earlier point.
Had we covered all the topics? It would be really nice to have a kind of meta book assembler, so that you could assemble the book as an outline, with all the topics you want to cover. Then, as people work on it, you could tick off the topics and note where they first appear in the book, and all that kind of thing. It’s more about the structure of the book: a way to collaboratively define and keep track of the structure of the book as you’re working on it.
Most book editors focus very much on layout: putting images in the text, the markup, what’s bold and where the links are, chapter headings and tables of contents, and stuff like that. Which is great. You need all that stuff too. But I don’t see much in the way of the meta kind of... do you know what I mean? I don’t know if meta’s the right word, but…
Len: I do know what you mean. Thanks, that’s very clear, and it’s a really great observation, especially since so many writing tools place so much emphasis on formatting but not on structure. Presumably, for most texts and most purposes, the structure of a text matters far more to the reader than its formatting. So that’s a really good observation. Thanks for that; we’ll process it internally.
Gordon: Alex and I have both used TeX and LaTeX; I don’t know if you’ve ever used them? They’re markup languages for creating typeset text. Something along those lines would let you really define the meta structure of the book as well.
I mean, like you said — the structure of the book is really important. And then to be able to, in essence, kind of apply stylesheets, like that CSS kind of model, where you have the structure of the book, and you say, “Okay, a chapter’s going to have a header and a footer. And it’s going to have this block of content at the beginning that describes the chapter, and maybe a picture and all that kind of stuff.”
And you lay that all out, and then you can say, “Okay, let’s look at it in this style. Let’s look at it in this style.” And yeah, really, really decouple the content and the structure of the book from the layout.
Len: That’s a request we’ve had from some of our best authors in the past, and it’s something we’re thinking about. Conceptually, it’s very consistent with Leanpub’s approach to writing, which is that when you’re writing, you should be writing, and you should consider formatting to be a separate process.
For 99.9% of books, that’s the appropriate approach. And separating those two things conceptually is very important to us.
Len: Unfortunately, I think our time is about up. And I just wanted to say — thank you for a great interview, and for making such a great book.
Gordon: Oh thank you.
Len: And for being a Leanpub author.
Gordon: Thank you very much, it’s been a pleasure. We loved it. I’m sure it won’t be our last one.
Originally published at leanpub.com.