Overcome 50 MB Logic App file size limits — using the power of Azure DataFactory

Mohammed Brückner
Serverless and Low Code pioneers
5 min read · Oct 9, 2022

The (low code) Logic App practitioners among us value Logic Apps for making regular or automated, orchestrated tasks a breeze, particularly with all the connectors into the wider Microsoft cloud. (In case you did not know: Logic Apps is very similar to Power Automate, which is part of the Power Platform family.)

As a practitioner you also know that it is not made for dealing with big file contents on its own. And that is fine; after all, architecture is a game of trade-offs. It just begs the question: what do you do when your Logic App flow needs to deal with a large file and the connector(s) you are using offer no built-in way of delegating the data operation to another service? And how do you solve this cost-effectively?

Some random Dall-E2 AI Art thrown in because it has something to do with cloud and cloud bills

Well, the powerhouse for Extract, Transform and Load (ETL) in Azure is Azure DataFactory. And it is easy to hook into a Logic App flow. Now let's put that into perspective, since you have probably read "easy" in any given tutorial just to find out it is anything but easy. ;D DataFactory does come with a certain learning curve, has its own taxonomy and, yes, it takes a little bit of toying with it to use it effectively. When I wrote "easy", what I meant was that it does not take much to make an Azure DataFactory pipeline run part of your Logic App flow. Just setting expectations. What a pipeline is and what it takes will hopefully become clearer once you have gone through this article, so hang in there, please.

PART 1: The Logic App flow

For our scenario, let's assume you have a big file sitting on Dropbox. The flow could therefore be triggered like this.

The idea now is to handle the new Dropbox file using Azure DataFactory instead of Logic Apps. (DataFactory does not have the same limitations with regard to file size.) Creating a DataFactory pipeline run takes a couple of settings. (A pipeline run is basically like a Logic App flow run, just as a DataFactory job.) In this case I am passing along a few parameters to pick the right file on Dropbox and the right Blobstorage container, together with a fresh (short-lived) access token for Dropbox. Note: using Azure Key Vault for tokens (instead) is a good idea.
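
If you want to see what that pipeline-run action boils down to outside the designer, here is a minimal Python sketch against the Data Factory REST API ("Pipelines - Create Run"). The subscription, resource group, factory and pipeline names are placeholders; the parameter names mirror the ones used further down (a_token, subfolder, file_name), while the container parameter is just an assumed example for your own setup.

# Minimal sketch: start a DataFactory pipeline run via the ARM REST API,
# passing the same parameters the Logic App action would pass.
# All resource names below are placeholders, not real values.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "<pipeline-name>"

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY_NAME}/pipelines/{PIPELINE_NAME}/createRun"
    "?api-version=2018-06-01"
)

# Pipeline parameters: which Dropbox file to fetch, which Blobstorage container
# to write to, and a short-lived Dropbox access token (Key Vault is the better home).
parameters = {
    "subfolder": "big-files",
    "file_name": "huge-video.mp4",
    "container": "ingest",
    "a_token": "<short-lived-dropbox-token>",
}

response = requests.post(url, headers={"Authorization": f"Bearer {token}"}, json=parameters)
response.raise_for_status()
run_id = response.json()["runId"]  # keep this to poll the run status later
print("Pipeline run started:", run_id)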

By the way, as the comment says: as of October 2022, the content type of blobs written to Blobstorage always defaults to application/octet-stream, and the Logic App connector has no way to set the content type. The Azure Blobstorage API does, however, so what you can do is fire an API call against it and set the right content type. Doing that out of Logic Apps with Managed Identities is rather painless. It is not essential for the focus of this article, though. (Check this one instead.)
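
For completeness, here is a minimal sketch of that content-type fix, calling the Blob service "Set Blob Properties" operation. In the Logic App this would be an HTTP action running under a Managed Identity; here DefaultAzureCredential stands in, and the account, container, blob name and content type are placeholders.

# Minimal sketch: fix the content type of a blob after the copy, using the
# Blob service "Set Blob Properties" operation with Azure AD auth.
import requests
from azure.identity import DefaultAzureCredential

ACCOUNT = "<storage-account>"
CONTAINER = "ingest"
BLOB_NAME = "huge-video.mp4"

credential = DefaultAzureCredential()
token = credential.get_token("https://storage.azure.com/.default").token

url = f"https://{ACCOUNT}.blob.core.windows.net/{CONTAINER}/{BLOB_NAME}?comp=properties"
headers = {
    "Authorization": f"Bearer {token}",
    "x-ms-version": "2021-08-06",           # OAuth calls require a recent service version
    "x-ms-blob-content-type": "video/mp4",  # the content type the connector could not set
}

response = requests.put(url, headers=headers)
response.raise_for_status()  # 200 OK means the blob now carries the right content type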

Coming back to the flow: now that the pipeline run is initiated, the DataFactory pipeline runs asynchronously in the wide cloud computing void. The only way to find out how far it got is to check in on the job run. That is what we do in an "Until" loop, using a custom variable for flow control. While the state is "InProgress" we wait another minute or so and iterate again until it changes. (You can, and should, adjust the limits on the Until loop.)
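
Expressed outside the designer, that Until loop amounts to something like the sketch below, polling the "Pipeline Runs - Get" endpoint with the runId returned by the createRun call shown earlier. The resource names are the same placeholders as before.

# Minimal sketch of the "Until" loop: poll the run status once a minute until
# it leaves the Queued/InProgress state. RUN_ID is the runId from createRun.
import time
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
RUN_ID = "<runId returned by createRun>"

credential = DefaultAzureCredential()
status_url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.DataFactory"
    f"/factories/{FACTORY_NAME}/pipelineruns/{RUN_ID}?api-version=2018-06-01"
)

status = "InProgress"
for _ in range(60):  # cap the iterations, just like the limits on the Until loop
    token = credential.get_token("https://management.azure.com/.default").token
    run = requests.get(status_url, headers={"Authorization": f"Bearer {token}"})
    run.raise_for_status()
    status = run.json()["status"]  # Queued, InProgress, Succeeded, Failed, Cancelled
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(60)  # wait another minute or so, then check again

print("Pipeline run finished with status:", status)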

PART 2: The DataFactory Pipeline

A DataFactory pipeline is determined by the activities attached to it, just like a Logic App flow is determined by its actions. The only activity in my case is a "Copy data" step, fetching data from an HTTP endpoint (the Dropbox API) and pushing it to Azure Blobstorage. This is what it looks like in the DataFactory Studio.

And here are the (mapped) parameters coming from Logic Apps.

The "Copy data" activity is determined mainly by its source and sink, meaning where the data comes from and where it needs to go.

The (HTTP) headers can be defined dynamically in DataFactory, and in my case they looked like this.

Authorization: Bearer @{pipeline().parameters.a_token}
Dropbox-API-Arg: {"path": "/@{pipeline().parameters.subfolder}/@{pipeline().parameters.file_name}"}
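
Just to make those expressions concrete, this is roughly the HTTP request the Copy data source ends up firing, written out in plain Python. Data Factory of course streams the response into the Blob sink rather than a local file, and the token, folder and file name are placeholders.

# For reference: the Dropbox download request behind the Copy data source.
# The two headers match the dynamic expressions above.
import json
import requests

DROPBOX_TOKEN = "<short-lived-dropbox-token>"
SUBFOLDER = "big-files"
FILE_NAME = "huge-video.mp4"

headers = {
    "Authorization": f"Bearer {DROPBOX_TOKEN}",
    "Dropbox-API-Arg": json.dumps({"path": f"/{SUBFOLDER}/{FILE_NAME}"}),
}

# stream=True so even a multi-GB file never has to fit into memory at once
with requests.post("https://content.dropboxapi.com/2/files/download",
                   headers=headers, stream=True) as response:
    response.raise_for_status()
    with open(FILE_NAME, "wb") as local_file:
        for chunk in response.iter_content(chunk_size=4 * 1024 * 1024):
            local_file.write(chunk)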

And that is pretty much it. Azure DataFactory working hand in hand with Logic Apps and happily chewing through massive files, if you want it to. DataFactory is completely serverless and pay-per-use. Shuffling hundreds and hundreds of MB from Dropbox into Azure Blobstorage cost me a total of 20 euro cents, or at least something in that ballpark. Give it a try. No reason to be afraid of a shocking bill.

Now use your new skills to build an end-to-end video encoding flow in Azure.

Or create another pipeline for copying files from one Storage Account to another, the way you want and need it.

NB: Of course everything has limits, and DataFactory is no different. Check out the limits here.


Author of "IT is not magic, it's architecture" and "The DALL-E Cookbook For Great AI Art: For Artists. For Enthusiasts." Visit https://platformeconomies.com