If you’re using Amazon Redshift, you’re likely loading in high volumes of data on a regular basis. The most efficient, and common, way to get data into Redshift is by putting it into an S3 bucket and using the COPY command to load it into a Redshift table.

Here’s an example COPY statement to load a CSV file named file.csv from the bucket-name S3 bucket into a table named my_table.

COPY my_schema.my_table
from 's3://bucket-name/file.csv'
<authorization>

Simple enough, right? What about that pesky <authorization> line? …
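As a rough illustration of what can stand in for the <authorization> placeholder, here is a minimal Python sketch that builds a COPY statement using Redshift's IAM_ROLE clause. It assumes role-based authorization, which is only one of the options COPY accepts. The function name build_copy_statement and the role ARN are hypothetical and not taken from the original post.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads a CSV file from S3.

    The values are interpolated directly into the SQL text, so only pass
    trusted, hard-coded values (never raw user input).
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CSV;"
    )


statement = build_copy_statement(
    table="my_schema.my_table",
    s3_uri="s3://bucket-name/file.csv",
    # Hypothetical ARN: replace with a role attached to your cluster.
    iam_role_arn="arn:aws:iam::123456789012:role/my-redshift-role",
)
print(statement)
```

The resulting string can be run through whatever SQL client you already use with Redshift. The role needs read access to the bucket for the load to succeed.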

It’s no secret that regulations such as GDPR are impacting how companies are handling personal data. It’s not just impacting marketers and those in charge of production databases either. The legal and ethical issues in handling such data have made their way into the domain of data science, machine learning and data warehousing.

I’ve worked for and advised several companies confronting the dual challenges of using a modern data warehouse to support data analysts and data scientists while ensuring they protect customer data. Protecting private data in a warehouse that’s used for modeling and analysis is a different challenge from protecting a production app database. First, more people typically get access to a data warehouse than a production database. Second, the activities of data scientists and analysts often involve downloading raw data for use in Python scripts, Jupyter Notebooks and so on. …

In the grand scheme of the web, 15,000 web pages is a drop in the bucket. However, you can learn a lot by sourcing and scraping that many from across a diverse set of sites. In building a Python app to find and Tweet out interesting data science content, I had to gather a lot of potential articles, videos and blog posts to work with and then scrape those to learn more about them. Here’s some of what I learned along the way.

Finding pages to scrape is a task in itself

I talked a little bit about how I sourced URLs to go after in the post I linked above, but getting started was harder than I initially thought. One of my main sources was links people posted on Twitter. …

I’m still digesting all the news out of AWS re:Invent 2018, but boy is there a lot to like if you’re a software engineer and have an interest in machine learning.

For years, engineers working on data science teams have essentially been working on data pipelines and supporting the work of a team of data scientists who need data to build models. …

Add my voice to the chorus of those taking breaks or fully walking away from social media. I just completed one week away from the two social media platforms I had been spending time on, Twitter and Facebook. Others have shared their thoughts and research behind the need for a break, but I found a few unexpected takeaways that I feel are worth surfacing.

First, a few things about my week away:

  1. My primary motivation for taking a step back was not that I shared much on social media and became anxious about “likes”, “follows”, etc. In fact, I don’t share much; I consume. For me, Twitter especially was a tool for consuming more and more information. …

Once in a while, an answer on Quora really speaks to me. It’s rare, I grant you, but when it happens it provides a sense of clarity around a question I’ve had rattling around my head for a while. Or perhaps it answers a question I’ve had subconsciously, but haven’t been able to articulate.

Recently, I came across the question, “What is a data scientist?”, along with this answer from Michael Hochster. It’s worth reading the whole thing (it’s not long), but to summarize:

Data Scientists are people with some mix of coding and statistical skills who work on making data useful in various ways. …

One symptom of the hype surrounding data science is the feeling that we as practitioners need to live up to it. It’s easy to blame Marketing, Sales, bloggers and even investors, but we are just as guilty of making what we do more complicated than it needs to be.

If you’ve spent more than a week working in a department with “Data Science”, “Machine Learning” or (bless your heart) “Artificial Intelligence” in the title, you’ve felt this pressure and know exactly what I mean. …

We all scrape web data. Well, those of us who work with data do. Data scientists, marketers, data journalists, and the data curious alike. Lately, I’ve been thinking more about the ethics of the practice and have been dissatisfied by the lack of consensus on the topic.

Let me be clear that I’m talking about ethics, not the law. The law regarding scraping web data is complex, fuzzy and ripe for reform, but that’s another matter. …

Maybe you haven’t heard, but working from somewhere besides the same office every day is more than just a fad. Despite some of the old guard (looking at you, IBM) pulling back, remote work really is thriving.

The thing is though, there’s a big difference between working from home every Friday and not stepping foot in an office for weeks or months at a time. Both have value, but if you’re looking to work 100% out of the office your job hunt needs to differ from the traditional. If you haven’t heard the term “remote-first” before, it’s time to get familiar.

Remote-first doesn’t necessarily mean that everyone in the company works remotely. In fact, in my experience it’s rare to find a 100% remote company that’s fewer than a handful of people. Sure, there are exceptions, but the reality is that most companies want an office of some sort, and there is going to be a mix of remote and in-office colleagues that need to work well together. For example, Stack Overflow, based in NYC, has several offices yet has a great reputation as a remote-first company. …


James Densmore

Data Science and Data Engineering Consultant at Data Liftoff https://www.dataliftoff.com
