Learn the basics of Kafka Console Producers & Consumers with this hands-on article and video

Photo by Markus Spiske from Pexels

In a world of big data, a reliable streaming platform is a must. That’s where Kafka comes in. You already have it installed and configured with Docker. If that’s not the case, read this article or watch this video before continuing.

Today you’ll learn all about Kafka Topics, console Producers, and Consumers. You’ll master the Kafka shell, and by the end of the article you’ll be ready for more advanced examples, such as working with Kafka in Python.

The best part is — the video guide is available once again:

Today’s article covers the following:

  • Kafka topics in a…


And how to create your first Kafka Topic. Video guide available.

Photo by ThisisEngineering RAEng on Unsplash

In a world of big data, a reliable streaming platform is a must. That’s where Kafka comes in. And today, you’ll learn how to install it on your machine and create your first Kafka topic.

Want to sit back and watch? I’ve got you covered:

Today’s article covers the following topics:

  • Approaches to installing Kafka
  • Terminology rundown — Everything you need to know
  • Install Kafka using Docker
  • Connect to Kafka shell
  • Create your first Kafka topic
  • Connect Visual Studio Code to Kafka container
  • Summary & Next steps

Approaches to installing Kafka

You can install Kafka on any OS, like Windows, Mac, or Linux…


CSVs cost you time, disk space, and money. Here are five alternatives every data scientist must know.

Photo by jaikishan patel on Unsplash

Everyone and their grandmother knows what a CSV file is. But is it the optimal way to store data? Heck no. It’s probably the worst storage format if you don’t plan to view or edit data on the fly.

If you’re storing large volumes of data, opting for CSVs will cost you both time and money.

Today you’ll learn about five CSV alternatives. Each provides an advantage, either in read/write time or in file size. Some are even better in all areas.

Let’s set up the environment before going over the file formats.

Getting started — Environment setup

You’ll need a couple of libraries to…


It’s also 2.5 times lighter and offers functionality every data scientist must know.

Photo by KTRYNA on Unsplash

Storing data in the cloud can cost you a pretty penny. Naturally, you’ll want to stay away from the most widely known data storage format — CSV — and pick something a little lighter. That is, if you don’t care about viewing and editing data files on the fly.

Today you’ll learn about one of the simplest ways to store almost anything in Python — Pickle. Pickling isn’t limited to datasets only, as you’ll see shortly, but every example in the article is based on datasets.

What is Pickle exactly?

In Python, you can use the pickle module to serialize objects and save them…
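To make the idea concrete, here is a minimal sketch of a pickle round trip. The `records` list and the `records.pkl` file name are made up for illustration and don’t come from the article; the sketch only assumes Python’s standard library.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical sample data; any picklable Python object would work here.
records = [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.75}]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "records.pkl"  # illustrative file name

    # Serialize the object to disk. Pickle files are binary, hence "wb".
    with open(target, "wb") as f:
        pickle.dump(records, f)

    # Read the bytes back and rebuild an equal (but separate) object.
    with open(target, "rb") as f:
        restored = pickle.load(f)
```

One caveat worth keeping in mind: unpickling can execute arbitrary code, so only load pickle files from sources you trust.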


CSVs cost you time, disk space, and money. There’s a solution.

Photo by Christina Morillo from Pexels

CSV isn’t the only available data storage format. In fact, it’s likely the last one you should choose if you don’t plan to view and edit the data on the fly. Going with CSV would be a long and expensive mistake if you plan to dump large datasets and use automation for processing.

Picture this — you collect large volumes of data and store them in the cloud. You didn’t do much research on file formats, so you opt for CSVs. Your expenses are through the roof! A simple tweak can reduce them by half, if not more. …


Python’s os module is a nightmare for managing files and folders. You should try pathlib.

Photo by Pontus Wellgraf on Unsplash

File and folder management with Python’s os module is a nightmare. Yet, it’s an essential part of every data science workflow. Saving reports, reading configuration files, you name it — there’s no way around it.

Picture this: you spend weeks building an API around your model, and it works flawlessly, at least on your machine. Once deployed, it’s a whole different story. Your API fails in unexpected places or won’t run at all, because the absolute paths you’ve hardcoded simply don’t exist.

There’s a no-brainer solution. The pathlib library comes by default with Python 3.4 and above. It’s by far the…
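The hardcoded-path problem described above could be avoided along these lines. This is a rough sketch, not the article’s own code: the `resolve_config` helper and the `config/settings.json` layout are hypothetical, and a temporary directory stands in for the project folder so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def resolve_config(base_dir: Path, *parts: str) -> Path:
    """Build a path relative to base_dir instead of hardcoding an absolute one."""
    return base_dir.joinpath(*parts)


# In a real script, base_dir would more likely be Path(__file__).resolve().parent,
# so the path follows the code wherever it gets deployed.
with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp)
    config_path = resolve_config(base_dir, "config", "settings.json")

    # Create the missing folder, then write and read the file through pathlib.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text('{"debug": false}')
    loaded = config_path.read_text()
```

Because every path is derived from a base directory, moving the project to another machine changes only that one starting point.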


This free Python library could save you a ton of time. But is there a catch?

Photo by Bermix Studio on Unsplash

Deep learning has come a long way in recent years. Practitioners are now well beyond simple image classification tasks. It’s becoming easier to detect or even segment objects of interest, both in images and video. Yet some things in computer vision haven’t changed in years. Visualization is one of them.

Pharos is a free library for visualizing advanced computer vision datasets — think object detection. It builds a Flask web app around your dataset, thus making the exploration effortless.

By using Pharos, you can easily explore how the dataset was labeled and decide if there…


CSVs are costing you time, disk space, and money. It’s time to end it.

Photo by Tom Swinnen from Pexels

CSV is not the only data storage format out there. In fact, it’s likely the last one you should consider. If you don’t plan to edit the saved data manually, you’re wasting both time and money by sticking to it.

Picture this — you collect large volumes of data and store them in the cloud. You didn’t do much research on file formats, so you opt for CSVs. Your expenses are through the roof! A simple tweak can reduce them by half, if not more. That tweak is — you’ve guessed it — choosing a different file format.

Today you’ll…


Features vs. simplicity — here are the top picks for everyday data science workflows

Photo by XXSS IS BACK from Pexels

The world of data science IDEs can be overwhelming. You can go from plain text editors for ultimate simplicity to IDEs so feature-rich they will make your head spin. Analysis paralysis gets even worse if you’re willing to pay for a piece of coding software.

You want something simple, yet capable. You want something professional and feature-rich, yet not overwhelming. Does it ring a bell? It likely won’t be a one-time decision.

The short answer is that there’s no one-size-fits-all solution. It’s a personal preference. I have mine, but I’ll try to stay as unbiased as possible while comparing these four.

JupyterLab


And a ton of free resources to learn them today

Photo by Andrea Piacquadio from Pexels

Data science is hard. You’ll have to learn a handful of libraries as a beginner, even to solve the most fundamental tasks. Adding insult to injury, the libraries change and get updated constantly, and there’s almost always a better tool for the job.

The problem of not knowing which tool to use is simple to understand — it results in failing completely or not doing a task optimally. What’s also dangerous is not knowing libraries well enough. You end up implementing algorithms from scratch, completely unaware there’s already a function for that. Both cost you time, nerves, and potentially money.

Dario Radečić
