Will it solve the world’s problems? No. But words matter, and it’s a tiny thing that you can do right now.

It is no secret that technology and engineering are traditionally unwelcoming fields for racial minorities. That’s especially true in the United States, where Black Americans were literally the slaves of white masters barely 155 years ago. The memory of that injustice is far from distant: The last living witness of Abraham Lincoln’s assassination was a contestant on a 1950s TV game show.

Structural injustice permeates every aspect of life in the United States, from big, obvious things, like the criminal justice system, education, and housing, to the harder-to-see but still insidious things, like the everyday language that we use.


Why the linguistics of mathematics matters in programming and tech

DataSeries highlight:

  • Math is, at its heart, a language. And language is a human creation, highly expressive but full of assumed background knowledge and fuzzy meanings. This article explores how the way we write math relates to programming and tech.

Mathematicians are fond of saying that mathematical notation is unambiguous.

Explain this, then.

Let this be Figure 1:

Notation for a mixed number

And let this be Figure 2:


The metadata science of hacking Jupyter notebooks with SQL and command line fu

Who doesn’t love Jupyter notebooks? They’re interactive, giving you the instant gratification of immediate feedback. They’re extensible — you can even deploy them as websites. Most importantly for data scientists and machine learning engineers, they’re expressive — they span the space between the scientists and engineers who manipulate data and the lay audience that consumes and wants to understand the information that data represents.

But Jupyter notebooks have their drawbacks. They’re big JSON files that store the code, markdown, input, output, and metadata of every cell that you run. …
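To make the "big JSON file" point concrete, here is a minimal sketch in Python. It assumes the nbformat 4 layout, and the tiny notebook string and the `summarize_notebook` helper are made up for illustration; they are not taken from the article.

```python
import json

# A tiny hand-written notebook in the nbformat 4 layout (illustrative only).
notebook_json = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "source": ["print(1 + 1)"],
     "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\\n"]}]}
  ]
}
"""


def summarize_notebook(raw: str) -> dict:
    """Hypothetical helper: count cells by type and stored outputs."""
    notebook = json.loads(raw)
    counts = {}
    for cell in notebook["cells"]:
        counts[cell["cell_type"]] = counts.get(cell["cell_type"], 0) + 1
    # Only code cells carry an "outputs" list; markdown cells have none.
    stored_outputs = sum(len(cell.get("outputs", [])) for cell in notebook["cells"])
    return {
        "cells_by_type": counts,
        "stored_outputs": stored_outputs,
        "kernel": notebook["metadata"]["kernelspec"]["name"],
    }


summary = summarize_notebook(notebook_json)
print(summary)
```

Because the file is plain JSON, the same approach should work on a real `.ipynb` read from disk, though real notebooks usually carry far more metadata and output than this toy example.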


Learning a language is the same whether it’s natural or artificial.

Photo by Kevin Ku on Unsplash

My greatest love has always been language. I picked up German as a child, then Spanish. I taught myself Japanese and Mandarin. Then Sanskrit, Ancient Greek, Swahili, siSwati, Malay, Bahasa Melayu. In graduate school, I learned Czech.

I don’t think I have any special genius for language. It’s something that anyone can learn and apply. And it lets me pick up programming languages even more rapidly than I pick up natural ones.

Python as a second language

One of the hardest parts about learning programming and machine learning is attempting to keep up with the state of the art. These fields encompass mathematics, computer science…


Unit test your machine learning models, profile your code, and take full advantage of C’s natural language processing speed.


I read a blog post that claimed to have profiled spaCy and NLTK for natural-language data preprocessing, and to have found NLTK far faster.

uhh (via giphy, @ TheLateShow)

What?

NLTK is the Jeep Grand Cherokee of natural language processing toolkits: it’s huge, it’s been around for a long…


Your guide to a fresh decade of better projects and cleaner code

Today’s the traditional day to reflect on the past 364 and make declarations of self-betterment.

It’s been an eventful year. Models like BERT have rocked the traditional field; Kubernetes and Docker have joined forces; companies have come under fire for using prejudicial algorithms to make business decisions; controversy has roiled data science boot camps; job descriptions demand an ever-expanding set of disjoint skills. It’s clearer than ever that ours is a multifaceted industry where it’s challenging to stay on your feet and relevant.

Here are 10 suggestions each to help you level up as a data scientist and programmer, with…


Optimize your programming by searching like a professional

Photo by Clay Banks on Unsplash

The problem with “Just Google it”

“LMGTFY” is a long-running joke website. “RTFM” is a disdainful sneer. (Ouais, aussi en français.)

But if finding answers were as easy as typing a natural-language query into a search engine and reading a manual page, there’d be no such thing as graduate degrees in library science and I’d be out of a job.

LMGTFY: Answering open-ended questions is not a trivial task.

Apart from being an unsolved, NLP-hard machine-learning question, finding the answers to “why isn’t my code compiling?” or “what’s wrong with my model?” …


Set up nearly automatic Python virtual environments and create Jupyter notebooks and more in Visual Studio Code.

Why use virtual environments in data science?

We know the importance of dependency management for package development and software developers. But what about for people doing data science who aren’t deploying to PyPI or conda-forge?

Virtual environments can help you fix things when they break.

If you’ve been using Python for any length of time, you’ve dealt with a cluttered development environment with too many packages installed. You don’t need them all at once, and working out by hand which ones your project actually needs is frustrating.

Packages don’t always get upgraded at the same time, and many are not compatible with each other or even with the version…


Learn Docker, Docker Compose, and Cookiecutter for project management

When you have a great new idea, the last thing you want is to set aside the data and make a clean workspace. Each of us has indulged in just doing “one quick thing” in the IDE without thinking of best practices for dependency management and reproducibility. If we’re honest with ourselves, do we even know what “best practices” are?

Classic.

Virtual environments aren’t easy, whatever tutorials may say.

Sometimes, I’ve spent more time getting my development environment set up correctly (or fixing what’s gone wrong) than working with data and models.

Anaconda frequently causes problems, particularly with R and NLP libraries (but that deserves its own post). I’ve…

Ray Johns

Opinions are my own. Machine Learning, AI, Linguistics, NLP with Deep Learning | { B.A. : Dartmouth; J.D. : Yale; M.S. : Simmons; CS Graduate Courses: Stanford}
