Easy is hard, and hard is easy

That was one of the first things my professor said during my freshman year while we were being introduced to physical computing, and damn is that still holding true.

I haven’t written a puzzle-solving code post in a bit, so I thought I’d document a recent obstacle I ran into while trying to implement an “easy” solution.

The objective was to take objects in Array A, and then check if Array B contained the same object. If so, Array B would have the duplicate object filtered out.

I’m currently building my personal portfolio site which uses AWS S3 to store the files, and MongoDB to group the S3 URLs along with a photo genre and a couple other fields. Whenever I upload new media via a form, the file travels into my S3 bucket, and then the entire bucket’s contents are retrieved. Recently I’d noticed something odd, which I’ll get into in a bit, and realized it was a bug I’d overlooked.

Earlier, I’d figured out the flow of uploading a new photo as: “Send it from the form to the backend where it’ll get sent to S3. Then, just download the files and add them to Mongo for querying later on”. On paper, I’d checked off all those boxes, but something was off. For instance, I would upload one or two new photos at a time, and a huge array would instead be added into Mongo.

I could see the URLs of the photos I’d just added in the list, but where the heck did these other values come from?? I would then check the webpage where the photos were displayed client-side, and I’d have a repeated set of photos I’d already added.

Then it dawned on me.

… and then the entire bucket’s contents are retrieved

Yep, I might’ve been adding just one or two new files from the form, but the backend was taking everything that was already in S3, plus the new files and submitting all of that into Mongo. That’s why I’d been getting a huge array with duplicate results in the database.

To try and fix this, I came up with the following plan: “Compare the photos already in my database with an array of everything from S3, and then only upload the non-duplicates.”

Seemed pretty foolproof at first. I’d make a separate array to store the objects from Mongo, and then compare it against photoArray which held everything from S3. Then, I’d use something like array.includes() along with array.filter().

Step 1: Getting the existing objects from Mongo

Now that I had two arrays, all I had to do was plug them into this line of code below and just not mess up the order of which array would come first.
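A minimal sketch of a line with that shape, assuming both arrays hold plain objects with src and genre fields. The sample URLs, genre values, and _id string are placeholders made up for illustration:

```javascript
// Placeholder data: URLs, genres, and the _id value are invented for this sketch.
const previousArray = [
  { _id: "abc123", src: "https://example.com/DSC00833.jpg", genre: "street" },
];
const photoArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
  { src: "https://example.com/DSC04688.jpg", genre: "street" },
];

// Keep only the photos that previousArray does not already include.
// Order matters: photoArray is the one being filtered.
const filterResult = photoArray.filter(
  (photo) => !previousArray.includes(photo)
);

console.log(filterResult);
```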

If previousArray included a value that was also in photoArray, then it would get filtered out. I gave the code a test run, and noticed that filterResult was returning an array that was very similar to photoArray.

The screenshot above shows one object in previousArray which correctly reflected the single entry in Mongo. The array on the bottom with two objects was the data inside of filterResult and photoArray. (I was uploading two files at once here).

If you look closely, you’ll see that DSC00833.jpg is the file already inside Mongo. I’d uploaded it again along with a new file to see if the filtering worked. Evidently something wasn’t working, because the same file ended up in the filtered results when I was expecting just DSC04688.jpg.

I stared for a bit, and then saw that the object in previousArray had this new field, _id, attached to it. “Oh ok,” I thought, “Mongo adds that automatically when the file enters the DB, so that extra field must be throwing off the filtering process.”

I stuck previousArray onto a .forEach() function, and then removed the _id field from each object. With that done, previousArray and photoArray would always have just the src and genre fields when they got compared. I ran the function again…. and wut?
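A minimal sketch of that cleanup step, assuming previousArray holds plain JavaScript objects (not a driver-specific document wrapper). The sample values are placeholders:

```javascript
// Placeholder entry standing in for a document read back from Mongo.
const previousArray = [
  { _id: "abc123", src: "https://example.com/DSC00833.jpg", genre: "street" },
];

// Strip the auto-generated _id so only src and genre remain.
previousArray.forEach((photo) => {
  delete photo._id;
});

console.log(Object.keys(previousArray[0])); // [ 'src', 'genre' ]
```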

filterResult was showing 2 objects instead of 0

All three arrays are showing the same objects! How come filterResult was displaying these when it should be empty? If photoArray and previousArray contained objects with the same field values, then filterResult should have returned an empty array.

Well, it turns out Array.prototype.includes() doesn’t treat two separate objects with the same field values as equal, and neither does Object.is(), which I also tried. Both compare objects by reference rather than by content: each object lives at its own spot in memory, so two look-alike objects still count as different values.
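Here is a small demonstration of that behavior. The two objects are placeholders with identical field values:

```javascript
// Two separate objects with exactly the same contents.
const a = { src: "https://example.com/DSC00833.jpg", genre: "street" };
const b = { src: "https://example.com/DSC00833.jpg", genre: "street" };

const strictlyEqual = a === b;            // compares references
const sameByObjectIs = Object.is(a, b);   // also compares references
const foundLookAlike = [a].includes(b);   // looks for b itself, not a look-alike
const foundItself = [a].includes(a);      // the very same reference

console.log(strictlyEqual, sameByObjectIs, foundLookAlike, foundItself);
// false false false true
```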

Well that’s certainly annoying. The simple description of “using array.includes() to filter out duplicates from an array” was starting to become pretty complex it seemed.

I knew the src field was the only value that really mattered when it came to checking for duplicates, so I decided to start there. I began by taking previousArray and looping through it to get the src values by themselves in a separate array.
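A minimal sketch of pulling out the src values, using the previousArraySources name that appears later in this post. The sample data is made up:

```javascript
// Placeholder entries standing in for what was read back from Mongo.
const previousArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
  { src: "https://example.com/DSC01234.jpg", genre: "portrait" },
];

// Loop through previousArray and collect only the src strings.
const previousArraySources = [];
for (const photo of previousArray) {
  previousArraySources.push(photo.src);
}

console.log(previousArraySources);
```

Strings are compared by value, so unlike the objects themselves, these can be checked safely with === or includes().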

At first, I had previousArray store only one object since I wasn’t sure how to make it work with multiple objects to start. I figured I’d be able to work out the general solution once I had it working with just one object.

I set a variable named selectedSource to previousArraySources[0] and then played around with how to filter out selectedSource from photoArray. I’d seen that array.prototype.filter() takes a callback, so I thought I’d do that.

That seemed to do the trick! photoArray would be looped through, and only the objects for which containsValue returned true would get added to filterResult.
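A sketch of the single-source version. The names selectedSource, containsValue, and filterResult come from the text above, but the body of containsValue is an assumption about how such a callback could be written, and the data is made up:

```javascript
// Placeholder data for illustration.
const previousArraySources = ["https://example.com/DSC00833.jpg"];
const photoArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
  { src: "https://example.com/DSC04688.jpg", genre: "street" },
];

const selectedSource = previousArraySources[0];

// Assumed callback: returns true for photos that should be kept,
// i.e. those whose src differs from the one already in the database.
function containsValue(photo) {
  return photo.src !== selectedSource;
}

const filterResult = photoArray.filter(containsValue);

console.log(filterResult); // only the DSC04688.jpg object remains
```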

Now onto having multiple selectedSource values

The obvious place to start was to make a for-loop to update the value of selectedSource with the now multiple values stored in previousArraySources. The last piece of the puzzle came together by accident, and it’s where photoArray is reassigned to filterResult on every pass through the loop. I was looking at the objects in filterResult, and wondering “huh, why don’t I just feed this filtered array back into the for-loop?” so I set photoArray equal to filterResult which ended up doing the trick! Each pass filters out one more known src, so by the end of the loop only the new photos are left.
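The loop described above can be sketched like this. It is a reconstruction under assumptions, not the exact code from the site: the data is made up, and photoArray is declared with let so it can be reassigned:

```javascript
// Placeholder data for illustration.
const previousArraySources = [
  "https://example.com/DSC00833.jpg",
  "https://example.com/DSC01234.jpg",
];
let photoArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
  { src: "https://example.com/DSC01234.jpg", genre: "portrait" },
  { src: "https://example.com/DSC04688.jpg", genre: "street" },
];

// Start from the full list so an empty previousArraySources changes nothing.
let filterResult = photoArray;

for (let i = 0; i < previousArraySources.length; i++) {
  const selectedSource = previousArraySources[i];
  // Drop every photo whose src matches the current known source...
  filterResult = photoArray.filter((photo) => photo.src !== selectedSource);
  // ...then feed the shrunken array back in for the next pass.
  photoArray = filterResult;
}

console.log(filterResult); // only the DSC04688.jpg object remains
```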

Full page solution here:

Now is this the most efficient way to get what I want? Most likely not. I’m just trying to get more comfortable with manipulating data, which I accomplished here, as well as getting my site up and running. I’m sure I’ll look back in a year with knowledge of time-complexity or some other spooky concepts, and laugh at the rookie mistakes I’m making now!
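For comparison, here is one possible single-pass variant using a Set. This is not the approach used on the site, just a sketch of an alternative; knownSources and newPhotos are made-up names, and so is the data:

```javascript
// Placeholder data for illustration.
const previousArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
];
const photoArray = [
  { src: "https://example.com/DSC00833.jpg", genre: "street" },
  { src: "https://example.com/DSC04688.jpg", genre: "street" },
];

// Collect every known src once; Set lookups do not rescan the whole list.
const knownSources = new Set(previousArray.map((photo) => photo.src));

// Keep only photos whose src is not already known.
const newPhotos = photoArray.filter((photo) => !knownSources.has(photo.src));

console.log(newPhotos); // only the DSC04688.jpg object remains
```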

A blog for documenting my thought processes over the course of a project and how I can improve upon them in the future. Writing challenges me to build a “tree trunk’s foundation” [Tim Urban] of knowledge as I tackle new challenges.

Timmy Zhou
