Beginning My Quest: Become a Software Engineer/Data Scientist or Bust!

When I was young, I was basically certain I would become a computer programmer. I loved everything about computers, and not just games — I loved doing the things we had to do on Windows systems in 1997, like configuring CONFIG.SYS and AUTOEXEC.BAT, and like so many kids of that era my family treated me like a sort of wizard for learning those things. I wanted foremost to be a wizard, of course, or an astronaut, failing that; but my third and clear fallback was computers. I even had my plan together by the age of fourteen — also the age I built my first PC — to go to the university in state that most directly specialized in computer science. But that was also the year I discovered speech communication, through competitive forensics, and from then on, my educational interests constantly were pulled between my original plan of becoming a programmer, and pursuing social sciences, a realm that fascinated me because, as it turns out, people are interesting too.

Ultimately, by college, the social sciences had won me over. I went to a school where I could focus on both communication and philosophy and continue to compete in forensics (the talking-out-loud kind, not the criminal or the digital) and show off my public speaking abilities. If I’m being fully honest, if I could travel through time, that is the one life decision I would change. By age 27 I had a Ph.D. in communication and was in a tenure-track job at Rowan University in Glassboro, New Jersey, and I knew right away that while I didn’t regret having the knowledge from my Ph.D., I didn’t want to be a college professor. I almost immediately began to hatch a plan to shift into software development as I had originally intended, hopefully applying many of the skills I had from my Ph.D. But until a family emergency two and a half years later forced me to leave my post at Rowan, I was unable to put this plan into operation, due to the demands of working as faculty (although I did undertake technical and statistical courses and grant applications during my time at Rowan).

I am not a newbie to computers. By the time I was nine, I had learned to program and make small utilities and games using Microsoft QBASIC and a dialect of Macintosh BASIC for the 68k-era Macintosh I had in my room as a personal computer. I had begun a preparatory programming curriculum in high school, completing a full software program (I guess we’d call it an app now) in Microsoft Visual C++ by the time I was 17. And I continued to dabble in programming as well as systems administration as a hobby, becoming one of those people who installed Linux to see what it was like, and then using those skills to start a web forum that I operated as the sole technical administrator from 2012 until either 2016 or 2017. During that time I administered a set of MariaDB databases and migrated content management systems. During my college years, I had also learned the basic syntax of Python. Despite this, I need extensive skills-building (and, if I want to get a job, portfolio-building) in the technologies that drive the areas where I can best put my skills to use: modern web development and problem solving, driven by JavaScript and back-end applications, and modern data science and machine learning, driven by the familiar Python but working in complex, algorithmic ways in which I have never been trained.

I live in the rural Midwest. It’s hard to network to get any job here, much less a job in software development — and I am a city girl at heart, even though I’ve spent all but one year of my life in the rural Midwest, South, or the New Jersey suburbs. I am facing an uphill fight here, but with the family emergency that claimed my previous career out of the way, I am now dedicated with passion and zeal to learning to code. I’m starting this blog as a way to keep myself accountable, and also to post about the projects I’m working on. You see, I’m a project-based learner; it’s very hard for me to work through “hello world” exercises or even do things like analyze Python’s existing data sets. I’ll do it, as a test, but I already have at least four major projects I’m working on — more on that later — and my goal is for these learning projects to become portfolio projects when I go up for a job. This is ambitious, but so is trying to accomplish anything when one lives so far from civilization that there are more cows than humans within a one mile drive from you. Ideally, this blog will become more of an amateur developer/researcher blog that showcases my skills as my abilities mature and I return to the path I should have followed more than a decade ago.

My Method

My learning method is simple. It’s not without cost, but I have some money saved and have decided it’s worth committing. I had considered a coding boot camp, but the advice I found on most of them was that they weren’t worth it, and in the era of COVID, if I’m going to be at home and talking to people for help anyway, it made more sense to simply do that. Additionally, I’m a text-based learner; I read my first programming language library manual when I was nine. (I can’t say I actually learned Borland Power C, at least partly because I didn’t have access to the proprietary compiler and libraries being documented, but I definitely read it, and tried, and got more than I would have from watching someone explain it.) I know I will need interactive feedback and guidance, and starting this blog is partly a way to seek that as well. I’ve also made an account with freeCodeCamp and subscribed to Codecademy for interactive tutorials on language syntax (which, again, I find more helpful than videos).

In terms of books, given my initial goal of learning as much data science and back-end software development as possible before diving into the nuts and bolts of the front end, I’ve started by purchasing the following resources:

For Python & data science:

  • Statistics in a Nutshell: A Desktop Quick Reference, 2nd Edition, by Sarah Boslaugh. While my Ph.D. provided statistics training, and I was working on some empirical work at the time I left my university post, I need a reference for the deeper elements of stats, since so much of my work was purely analytic. (These things cross over more than you think: my undergrad double major in philosophy meant a logic class, which is how I know what the actual definition of an algorithm is!)
  • Problem Solving with Algorithms and Data Structures Using Python, 2nd Edition, by Bradley N. Miller and David L. Ranum. This is a computer science primer that also covers Python. Since I already understand Python, but never got the chance to take college-level courses in it, this will help fill knowledge gaps, and from skimming through it, I expect it will also provide some direct guidance on the kinds of problems I’m trying to solve for my portfolio projects. Most of all, it looks like it will help me at least start to understand computer science in the way I’ve read many employers look for (and worry may be lacking in a job candidate who doesn’t have a computer science degree).
  • Python Data Science: A Simple and Effective Guide to Python Data Science by Christopher Wilkinson. This was actually the first book I got on my quixotic quest. It helped me nail down Python syntax, and it’s really helping me get basic things like web scrapers up and running. It’s also a nice reference that’s a good size for me to carry when working out on the porch.
  • Practical Web Scraping for Data Science: Best Practices and Examples with Python, 1st Edition, by Seppe vanden Broucke and Bart Baesens. Given recent US Postal Service delays, this one is still held up in the mail. But as I’ve mentioned, especially as someone with a Ph.D. and numerous qualifications in textual research, it’s important for me to work on real data projects to build a portfolio. Although all the Python data science books cover scraping, I have some rather complex data needs for some of my research, and I’m hoping this is the dedicated guide I need for that.
  • Natural Language Processing in Action: Understanding, Analyzing, and Generating Text with Python, 1st Edition, by Hobson Lane, Hannes Hapke, and Cole Howard. I don’t think I’m quite ready for the meat of this book yet, and none of my current example projects involve deep natural language processing, but I know that many of the jobs using these skills involve it, and it does interest me. I’ve mentioned there are things about software development and data science that have changed since the last time I looked into them, and quantitative language processing is something that wasn’t mainstream even back in 2012 when I took quantitative methods in my doctoral program. I don’t fully understand it, but I know it’s big, I’m sure it’s useful for my projects, and I want to understand it!

For Web Development and JavaScript:

  • First of all, Codecademy and freeCodeCamp deliver really good overall pathways here, and I am working through them. There are so many good online resources on this that I wanted to spend my book money primarily on data science and Python, where I’ve been less excited about the online curricula. I’m working through these, but slowly, because, as I’ve mentioned, the back end is what excites me, and learning Node.js is not my highest priority when most of the problems I’m looking to solve have more directly evident pathways in Python. (Don’t get me wrong, I will learn Node though.)
  • The Road to GraphQL: Your Journey to Master Pragmatic GraphQL in JavaScript with React.js and Node.js by Robin Wieruch. This reflects my approach of starting with data and with the back end. I’m hoping to master some front-end techniques while also learning an alternative visualization approach for data. These techniques can hopefully then inform my learning of broader JavaScript front-end and back-end frameworks.
  • The Road to React: Your Journey to Master Plain Yet Pragmatic React.js, also by Robin Wieruch. I know that React is big right now in the industry, and that it is a path to creating front-end websites and apps.

I’ve also picked up a couple of C# tutorials for when I don’t feel like working on my actual coding projects — hobby game dev has long been a dream of mine, and while I’m not putting the gaming industry on my radar (at least not in terms of developing games myself) career-wise, I’d love to have the skills to make my own games and game engines/sub-engines, and C# is the language of Unity.

In addition to all this, I made the decision to use some of my savings to purchase a MacBook Air of the latest model, so that I have something with a Unix ecosystem out of the box (I have not been terribly impressed with the new Windows Subsystem for Linux so far) and something that is comfortable to carry around, so I’m motivated to work anywhere. So far, having this new computer has helped a lot with staying productive.

My Learning/Portfolio Goals

So, what are my learning projects? I’m obviously not just going to read these books cover to cover and understand things. And I’m not saying I refuse to work through exercises; I’ve done plenty on freeCodeCamp and Codecademy. But I do have specific interests I’d like to research, implement, and blog about. These are my projects, in rough order of priority for implementation:

  • I have particular questions regarding minority representation (specifically LGBT people) in movies, and the common claim that business sense says don’t put gay people in films or they flop. I want to test if that’s the case, explain when it is or isn’t the case, and publish a full report. I’ve done some manual data analysis on this on my own and found some interesting facts (for example, the other common claim, that gay films do poorly internationally, is just wrong based on the data I collected by hand).
    In any case, I need to scrape complex data on movies, and since some of it (the actual cost and profit margins of movies beyond raw box office data) is often unavailable and, when available, almost always unreliable, I need to find new measures to infer these things.
    A note here: I know that “working on movie data” is a common new data scientist project, and I am in fact a newbie. However, I am very specifically not just querying the IMDb API or working from their .tsv files. One of my big data scraping goals, which I can publish as an independent open-source mini-portfolio project, is a way to access Rotten Tomatoes reviews and scores, as their API is notoriously limited and basically off limits to researchers and students.
    Once I have this data, I can start to apply statistics knowledge and visualization, and possibly implement something of a “live” webpage using front-end technology to visualize results. I may also create a “less controversial” public facing version of this project investigating my theory that non-superhero action movies are becoming financially unsuccessful and trying to determine why.
  • I mentioned I’m a video game fan. I’m interested in trends around the reviews of different games: why some games generate controversy, why others seem to get universal praise, and so on. To examine this scientifically, I need to scrape reviews from major gaming websites (I have already implemented an almost-functional parser for some sites) and access social media data (I have acquired a Twitter developer token). I can then apply analytical techniques such as basic correlation and other methods my academic training prepared me for, and possibly, eventually, natural language processing. Again, this could become a public white paper or report, documenting and linking to my methods on GitHub.
    Both of these projects have extant code, and I’m simply working through the problems and solutions related to them. The next projects haven’t been started yet, but I will proceed with them as time allows.
  • A web game engine similar to Twine, but implemented using an up-to-date framework (either React or Vue), that adds features common in the old text games Twine draws inspiration from (the kind of parser-driven games made by Infocom and, later, with engines like Inform) while maintaining the choose-your-own-adventure nature. This would be a great chance for me to explore JavaScript in detail, implement a React.js web service, and fully grasp object-oriented programming in both the UI and game flow elements of the project. (I don’t consider this a “game dev” project because Twine is fundamentally a creator of web apps and I aim to do the same thing. Twine has been used for customer service apps and other sorts of decision trees, so I think implementing this would still make for a good portfolio project outside of the game industry.)
  • This last project is a bit outside my normal wheelhouse, but my father has been complaining for years that he dislikes all the currently available money management software packages (he tracks his expenses meticulously, categorizes them, and still balances his checkbook). He loved the last released version of Microsoft Money, and if I sit down with my father to find out exactly what Microsoft Money did, I think I can implement a modern web app (and then make it native/local with Electron.js) that he can use, and that shows I can do UI work as well as complex financial operations.
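
As a rough illustration of the scrape-then-analyze workflow behind the first two projects, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the span markup with a "score" class, the ScoreParser class, and the parse_scores and pearson_r helpers are all hypothetical, and the sample HTML is a made-up placeholder rather than real review data. A real site would need its own parsing rules.

```python
import math
from html.parser import HTMLParser
from statistics import mean

# Placeholder markup invented for this sketch; not taken from any real site.
SAMPLE_HTML = """
<div class="review"><span class="score">8.5</span></div>
<div class="review"><span class="score">6.0</span></div>
<div class="review"><span class="score">9.0</span></div>
"""


class ScoreParser(HTMLParser):
    """Collect numeric scores from <span class="score"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_score = False
        self.scores = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "span" and ("class", "score") in attrs:
            self._in_score = True

    def handle_data(self, data):
        text = data.strip()
        if self._in_score and text:
            try:
                self.scores.append(float(text))
            except ValueError:
                pass  # skip anything that is not a plain number

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_score = False


def parse_scores(html):
    """Return every score found in the given HTML string, in document order."""
    parser = ScoreParser()
    parser.feed(html)
    return parser.scores


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(parse_scores(SAMPLE_HTML))
```

Once two measures have been collected for the same set of titles (say, review scores and some other per-title number), passing both lists to pearson_r gives the kind of basic correlation mentioned above.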

I’m sure I’ll have other project ideas, but these are the ones I have right now.

I need to leave the Midwest. I need to work in software/data, and I’m going to do it. I would pay for a bootcamp, but everything I’ve been told is that portfolio is where it’s at — so here I am, this is how I’m going to do this. I’m gonna make it work.

--

Eleanor Amaranth Lockhart, Ph.D.
Out of the Midwest with Software & Data

Dr. Eleanor (Ellie) Amaranth Lockhart holds a Ph.D. in communication from Texas A&M & is currently researching topics related to popular culture & data science!