Your first step into the Data world

Alexandre Bergere
Published in datalex
7 min read, Oct 8, 2019

I recently decided to become a mentor on OpenClassrooms, and before my first meeting I was thinking to myself: “How can I best help students? How can I share the most important information? How can I show them so many different aspects of the data world?”

Working with data is not just processing it and storing it in data warehouses. You have to understand it: the data itself and all the context around it. You have to clean, process, model and store it, and finally visualize it to make it valuable. And as with every aspect of IT, technologies and methods evolve very fast.
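To make these stages concrete, here is a minimal sketch in Python using only the standard library. The records, the field names and the `clean`, `summarize` and `store` helpers are all hypothetical, invented for illustration; a real project would typically rely on dedicated tools for each stage.

```python
import json
from collections import defaultdict

# Hypothetical raw records: inconsistent casing, stray spaces, a missing value.
RAW_ROWS = [
    {"city": " Paris ", "temperature": "21.5"},
    {"city": "paris", "temperature": "19.5"},
    {"city": "Lyon", "temperature": ""},
    {"city": "LYON", "temperature": "24.0"},
]


def clean(rows):
    """Clean: normalize city names and drop rows without a temperature."""
    cleaned = []
    for row in rows:
        value = row["temperature"].strip()
        if not value:
            continue  # skip rows we cannot use
        cleaned.append(
            {"city": row["city"].strip().title(), "temperature": float(value)}
        )
    return cleaned


def summarize(rows):
    """Process: compute the average temperature per city."""
    by_city = defaultdict(list)
    for row in rows:
        by_city[row["city"]].append(row["temperature"])
    return {city: sum(values) / len(values) for city, values in by_city.items()}


def store(summary):
    """Store: serialize the result (a JSON string stands in for a warehouse)."""
    return json.dumps(summary, sort_keys=True)


summary = summarize(clean(RAW_ROWS))
stored = store(summary)
print(stored)
```

Each stage is a small function, so you can inspect the data between steps, which is a large part of understanding it.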

That’s why, throughout this article, I hope I can share some tips and guidelines to help them.

https://dataintensive.net/images/map-ch3.jpg

1. The right tools

First of all, when you start in data, you have to choose your language: Python or R? Why not specialize in one of the major vendors’ tools: Informatica, Talend, Qlik, MSBI? You will certainly use SQL, but with which database: PostgreSQL, SQL Server, Oracle, MariaDB?

Still have no idea how SQL works? Reading this article will be a good start.
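If you want to try SQL right away, here is a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical, and SQLite is used only because it needs no installation; the query itself is plain SQL and should run with little or no change on the databases listed above.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("bob", 7.5), ("carol", 5.0)],
)

# Total spent per customer, keeping only totals above 10, largest first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for customer, total in totals:
    print(customer, total)
```

The query reads almost like a sentence: pick columns, group the rows, filter the groups, sort the result.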

Here are some useful tools you should have:

  • Try different IDEs and pick your favorite one: VSCode, PyCharm, RStudio
  • No matter what language you’re coding in, when you work with others you have to do it properly and version your code. Git is not optional anymore: GitHub, GitLab.
  • Notebooks are a great way to discover and visualize data. Make your code read like a tutorial: Jupyter, Zeppelin, Azure Notebooks.
  • Use easy-to-use data visualisation tools. Working in data doesn’t mean you have to write code for every exploration or visualization! There are loads of great tools that make it easier: Power BI Desktop, Pentaho, even Excel.

Don’t look for complexity, look for efficiency.

Of course, the best tool is your own knowledge and expertise. That’s why you always have to keep up with new technologies, releases, concepts, methods …

Here are some concepts, people and technologies that you should familiarise yourself with:

For a global view:

https://mattturck.com/data2021/

First of all, you must read Designing Data-Intensive Applications.

2. Update 1 hour every day — Thanks to Paris’ public transport

During your day, try to find an hour to stay updated and read useful articles. Choose your favorite moment: during breakfast, lunch break, a boring meeting, or on public transport … but do it often!

Once you’ve found the time, you need to find the right content!

  • Social media

Contrary to what your teachers or parents may have told you, social media can be useful! Of course, we are not talking about Nicki Minaj’s latest post, but about following the people and companies who will give you the right information and point you to well-informed articles.

Start by following the languages you’re using (along with their companies and communities) on Twitter, LinkedIn, Facebook or GitHub. Your feed will show what they like and share; don’t hesitate to follow those accounts as well.

  • Newspapers

Why not subscribe to some publications that will do the work for you? Online or on paper, choose what suits you best. Here are just some examples: Medium, TechCrunch, Wired, The Verge, Mashable …

  • Speakers

Sometimes it can be easier to hear directly from the experts.

Throughout the year, all over the world, loads of events bring together speakers to talk over several days about technologies (from beginner to expert level), aspects of data, use cases … The most famous events often come with a price tag.

Summits from the cloud leaders (AWS, GCP, Microsoft) are often free and very informative. You can even find free stickers, goodies and food, so what are you waiting for?

Microsoft also offers free learning days: https://openhack.microsoft.com/.

Find experts speaking about their favorite languages, technologies or topics on apps like Meetup or Eventbrite. These talks are mostly scheduled after work hours, but you can also find breakfast or weekend talks.

And if you don’t want to move from your bed, why not choose from the thousands of videos available on YouTube?

  • Read some books

It may seem old-fashioned, but nothing beats a good book for understanding a principle or its deeper aspects.

Here are some good books to start with: The Kimball Group Reader, The Data Warehouse Toolkit, Hands-On Machine Learning with Scikit-Learn and TensorFlow, Software in 30 Days, Python Data Science Handbook, Designing Data-Intensive Applications

And for French readers: “Data Mining et Statistiques décisionnelles”.

3. Keep a trace

It’s great that you read something interesting. Unfortunately, we don’t all have Sheldon’s brain, so do something to remember it:

  • Use Notion, PowerPoint, Google Slides, OneNote, LibreOffice or even Prezi to keep, summarize and lay out all this new information. And why not share it with the community if you think it could be useful to someone else?
  • Use Notebooks to save your favourite code snippets alongside their visual output. Nothing beats a good Notebook for understanding the language you’re learning.
  • Use GitHub to share some of your projects or anything that could be useful to the community.
  • Use the bookmark bar, with well-organized folders, to save articles or websites that you could use later.

Loads of people summarize useful resources. Here are some of them:

4. Always keep updated and trained

Even though you can always keep yourself updated with free content, loads of platforms do a really good job of helping you along the way. You may have to pay for some of the main players, but the results are definitely worth it. Here are some leaders in training:

  • DataCamp: specialized in Python, R and SQL. DataCamp focuses on practice with an interactive online tool.
  • Pluralsight and edX: leaders in offering a variety of video training courses for software developers, IT administrators, and creative professionals.
  • OpenClassrooms: led by the credo “Learn to learn efficiently”, the platform offers loads of different paths mixing videos and exercises.

And some sources to train yourself on the main cloud computing platforms:

And when you’re ready, why not take some certifications?

5. Leaving your comfort zone

Open your tech landscape

You know how to use pandas and write SQL queries? That may be enough for today, but not for tomorrow.

As you know (or have just discovered in this article?!), hundreds of technologies exist on the market. Of course, it’s impossible to become an expert in all of them, but you should know at least a little about a lot of technologies.

First, look for other technologies that share the characteristics of the one you’re used to. Understand when you should use Python instead of R, for example.

Don’t stay stuck in the data field; be open and curious about all the other aspects: Architecture, Cloud Computing, Security, Web … Most of the time, you will have to work in, or at least understand, these other fields to become an expert in data.

Don’t remain compartmentalized in technology

Don’t stay focused on tech; be curious about use cases and interactions with all other fields: sociology, banking, insurance, retail, philosophy …

When you work in data, nobody expects you to develop algorithms without understanding the business, the needs or the context. You have to understand the data to exploit it, and you need knowledge of the sector you work in.

If you’re working in the energy field, consider some online training on it: a better understanding of the context will give you a better approach and lead to better analysis.

Some useful readings: The Signal and the Noise: The Art and Science of Prediction, Weapons of Math Destruction

Don’t forget that wealth comes from diversity: don’t rely on just one source, use as many as possible.

Alexandre Bergere, editor for datalex. Data Architect & Solution Architect independent ☁️ Delta & openLineage lover.