Dropbase
Published in

Dropbase

How to work with really large Excel files that slow you down

We cover using Power Query in Excel for very large csv’s and it’s drawbacks. Followed by a Bring Your Own Database solution and the ease of using Dropbase to solve all of your flat file conundrums

How to utilize Power Query to work with large csv files

  1. Locate your Text/CSV data source file. In this method, you cannot change the filename or move the source file once you have created your query. So make sure your source file has a great name and that you can access it.
  2. Open up Excel and navigate to Data > Get & Transform Data > From Text/CSV. This will open up a file browser where you can select your source file. Click Import.
  3. Here’s the first important bit: once a data preview window opens up, click on the small arrow besides Load to open a dropdown menu and click on Load To…

The Drawbacks of using Excel

  1. Power Query does not do repeatability well. If you’d like to apply the same transformation steps on an newer source file, then you have to either do the steps all over again or write a bit of M code to point the query to another file.
  2. Using Power Query locks you down to Excel. Using Excel prevents you from exporting your transformed steps to other tools. The transformed steps exist only as a query on top of the original file and you can’t pipe the results to another tool. Quite a big drawback if you’re part of a team!
  3. Excel doesn’t store a edited version of the source file. You may have noticed while using Power Query that all of your transformation steps only created a view of the transformed data. Your underlying source file wasn’t altered. This is good if your incoming file is cleaned and acts as a source of truth. But what if the file isn’t cleaned or formatted in the first place?

Setting up your own database

  • Personal server + database software: In this option, you have or will have a dedicated server that you will install a database software onto. You’ll have the security of always knowing where your data is but this is the most technically demanding and least scalable option.
  • Cloud server+ database software: You can purchase/rent server space from providers such as Cloudware, where you can install a database software. Definitely easier to set up than the previous option but you’ll incur consistent costs.
  • Cloud database: By far the most popular and easiest option is to just use a managed database. These have familiar names like Amazon Web Services (AWS) and Microsoft Azure. You can find a list of the best ones over on Simplilearn
  • For the first two choices in the previous step, you have a wide berth for what server software you can install. You could start with a free software like MySQL and see if this option is for you.
  • For cloud databases, the software you have access to is dependent on the specific service provider although you usually have many options.

How to use Dropbase to work with large csv files

  1. You drag and drop your csv file into Dropbase
  2. Apply any transformation steps and save them as pipelines
  3. Click Load to Database

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store