It worked on my machine…

Eugene Niemand
Feb 7 · 3 min read

Azure Portal Corrupts Downloaded Files

Credit: http://www.keepcalmandposters.com/poster/5273092_keep_calm_it_works_on_my_machine

Have you ever had a situation where you’re working with a tester who has reported a bug, and you can’t for the life of you recreate it? It does after all work on your own machine…

We hit exactly this with a parquet file: downloading it with the Azure Portal Data Explorer corrupted the file and made it unreadable by Spark and parq, whereas downloading the same file with the Azure Storage Explorer produced a readable copy.

We were two days from deploying to production when our tester raised a bug, reporting that he could not read the file he had downloaded using the Azure Portal Data Explorer. Executing the below, we got a very strange and unhelpful error. We got the same error using the spark-shell.

parq v3.2.0[FAIL] don't know what type: 15
don't know what type: 15

We started comparing versions of our Scala binaries to see if there were any obvious differences. This worked two days ago, so what had changed? We looked at logic, data types and so on, and found nothing obvious. We tried increasing resources on our Spark cluster, still with no luck.

We ran our integration tests locally, used the same command and it all worked a treat. Okay, so the code is fine and produces a valid parquet file. We manually uploaded the same JAR to the cluster and ran the job again. The tester downloaded the file and the error occurred again; the engineer downloaded the file and the command worked fine.

Credit: http://qadesigngurus.blogspot.com/2016/05/dev-vs-qa.html

This is when it struck me (at the time I was sitting between the tester and the engineer running the job): our tester was using the Azure Portal and the engineer was using the Azure Storage Explorer to download the file. The engineer then tried the portal and, lo and behold, the same error occurred.

Upon further investigation, we noticed that the downloaded files had different sizes: the one from the portal was 5812KB and the Storage Explorer version was 4491KB. This is really strange, given that both are supposedly copies of the same binary file, yet the tool used to download it somehow affects the contents.
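If you suspect two downloads of the same blob differ, a quick way to confirm it is to compare their sizes and checksums. The following is a minimal sketch in Python rather than anything we ran at the time; the function names `file_fingerprint` and `same_content` and the file names in the usage comment are made up for illustration.

```python
import hashlib


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for the file at `path`."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files have the same size and SHA-256 digest."""
    return file_fingerprint(path_a) == file_fingerprint(path_b)


# Hypothetical usage, with one copy downloaded by each tool:
#   print(file_fingerprint("from_portal.parquet"))
#   print(file_fingerprint("from_storage_explorer.parquet"))
#   print(same_content("from_portal.parquet", "from_storage_explorer.parquet"))
```

A mismatch here tells you the download path changed the bytes, which points away from the job that produced the file.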

We still don’t know why this happened or what other files this could have an effect on. So if you come across this issue, my advice would be to avoid using the Azure Portal, and instead use the Azure Storage Explorer.
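If you want a cheap sanity check on a downloaded file before handing it to Spark, one option is to look at the Parquet magic bytes: a regular (unencrypted) Parquet file starts and ends with the four bytes `PAR1`. The sketch below is only a coarse first check, and the helper name `has_parquet_magic` is made up for illustration. A file can pass it and still have damaged metadata or data pages, so it may well not catch the corruption described here.

```python
import os

PARQUET_MAGIC = b"PAR1"


def has_parquet_magic(path):
    """Coarse check: does the file start and end with the Parquet magic bytes?"""
    size = os.path.getsize(path)
    # Header magic (4) + footer length field (4) + footer magic (4).
    if size < 12:
        return False
    with open(path, "rb") as handle:
        head = handle.read(4)
        handle.seek(-4, os.SEEK_END)
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Comparing checksums against a copy you know is good remains the stronger test.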

About the author

Eugene is a Senior Data Engineer at ASOS with a passion for Test Driven Development, Agile Methodologies, Continuous Integration and Delivery using Microsoft Azure.

Eugene Niemand’s Blog

Technology and Programming professional and enthusiast

Eugene Niemand

Written by

Lead Data QA Engineer at ASOS.com - I have a passion for Test Driven Development, Agile Methodologies, Continuous Integration and Delivery using Microsoft Azure

