On being (and remaining) a novice coder

It isn’t pretty. But it got the work done.

When it became obvious that I would need at least a rudimentary understanding of coding to finish my ‘medium data’ PhD project, I did what a lot of novice programmers do. I took a few classes, both within the university and those offered by organizations like Software Carpentry and Ladies Learning Code. I began looking at the work ‘real’ programmers do on GitHub, and tried to collaborate with (i.e., get some sweet example code from) more experienced people. And lastly, I spent hours at my computer tackling simple problems, which meant endless internet searches for solutions and much frustration (so much frustration…).

After a few years of this, I would now call myself a competent beginner at coding. Being a ‘competent beginner’ means a few things. I can make nice graphics, manipulate databases and datasets with some ease, and successfully take on a data analysis project from beginning (data compilation and cleaning) to end (statistical analysis and pretty graphics). I’ve even had a few flashes of brilliance: moments where programming let me perform a task beyond what I think ‘beginner’ programming means.

How do I reach the next stage of coding? This is the stage of truly elegant code, in which a thoughtful design architecture determines how a program is written and executed. Even to an amateur, the difference between this type of code and ‘beginner’ code is easy to see. Elegant code tends to be shorter. It calls on a well-defined set of other functions to reduce the complexity of any single piece of code. It also takes advantage of innate properties of programming languages (such as vectorization) to drastically reduce the computation time that complex tasks require. Much of this is covered in formal education in computer programming, but it’s a step most of us miss if we are primarily self-taught.

Perhaps making this leap is akin to learning a language. You begin by learning words, and then progress to learning the structure of the language itself so that you can put those words into meaningful sentences. Learning to code is somewhat like this: you pick up how to do little tasks, and then assemble them to carry out all of the steps needed to analyze data. What is the best way to make the next leap, moving beyond frankensteining code together and towards designing truly elegant (and quickly executed) code?

I wonder whether this stage can truly be self-taught. It clearly requires a degree of analysis and knowledge that is likely beyond most amateur data scientists. Access to a knowledge pool of adept programmers is one likely avenue forward for DIY programmers. So too are opportunities for direct feedback on your own code (such as this R-specific study group in Vancouver, British Columbia). What is clear is that this next step towards coding literacy is a big one, requiring both outside help and much time spent refining and thinking about how to actually write code.

And maybe coding literacy isn’t about attaining perfect code elegance. This is likely true for a lot of people like me: people without a background in computer science or coding who nonetheless need these skills to handle the complexity of the data that modern science creates. Indeed, creating ingenious coders isn’t the aim of programs that promote basic coding literacy among people outside computer science. So perhaps it doesn’t matter that my code needs many more lines than another person’s to perform the same data operation (or that it takes my computer longer to run). Maybe the accomplishment lies in the fact that I, a non-computer scientist, was able to do it in the first place.