Easy is hard, and hard is easy
That was one of the first things my professor said during my freshman year while we were being introduced to physical computing, and damn is that still holding true.
I haven’t written a puzzle-solving code post in a bit, so I thought I’d document a recent obstacle I ran into while trying to implement an “easy” solution.
The objective was to take the objects in Array A and then check whether Array B contained the same objects. If so, the duplicates would be filtered out of Array B.
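To make that concrete (borrowing the filenames that show up later in this post; the arrays here are just illustrative):

```javascript
// Array A: what's already stored; Array B: the incoming batch
const arrayA = [{ src: 'DSC00833.jpg' }];
const arrayB = [{ src: 'DSC00833.jpg' }, { src: 'DSC04688.jpg' }];

// After filtering, Array B should only hold the object that isn't in Array A:
// [{ src: 'DSC04688.jpg' }]
```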
I’m currently building my personal portfolio site which uses AWS S3 to store the files, and MongoDB to group the S3 URLs along with a photo genre and a couple other fields. Whenever I upload new media via a form, the file travels into my S3 bucket, and then the entire bucket’s contents are retrieved. Recently I’d noticed something odd, which I’ll get into in a bit, and realized it was a bug I’d overlooked.
Earlier, I’d figured out the flow of uploading a new photo as: “Send it from the form to the backend where it’ll get sent to S3. Then, just download the files and add them to Mongo for querying later on”. On paper, I’d checked off all those boxes, but something was off. For instance, I would upload one or two new photos at a time, and a huge array would instead be added into Mongo.
I could see the URLs of the photos I’d just added in the list, but where the heck did these other values come from?? I would then check the webpage where the photos were displayed client-side, and I’d see a repeated set of photos I’d already added.
Then it dawned on me.
“… and then the entire bucket’s contents are retrieved”
Yep, I might’ve been adding just one or two new files from the form, but the backend was taking everything that was already in S3, plus the new files, and submitting all of that into Mongo. That’s why I’d been getting a huge array with duplicate results in the database.
To try and fix this, I came up with the following plan: “Compare the photos already in my database with an array of everything from S3, and then only upload the non-duplicates”
Seemed pretty foolproof at first. I’d make a separate array to store the objects from Mongo, and then compare it against photoArray, which held everything from S3. Then, I’d use something like array.includes() along with array.filter().
Now that I had two arrays, all I had to do was plug them into this line of code below and just not mess up the order of which array would come first.
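The gist of it was something like this (a rough sketch of the idea, not a copy-paste of the actual line):

```javascript
// Keep only the S3 photos that previousArray doesn't already include
const filterResult = photoArray.filter(photo => !previousArray.includes(photo));
```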
If previousArray included a value that was also in photoArray, then it would get filtered out. I gave the code a test run, and noticed that filterResult was returning an array that was very similar to photoArray.
The screenshot above shows one object in previousArray, which correctly reflected the single entry in Mongo. The array on the bottom with two objects was the data inside of filterResult and photoArray. (I was uploading two files at once here.)
If you look closely, you’ll see that DSC00833.jpg is the file already inside Mongo. I’d uploaded it again along with a new file to see if the filtering worked. Evidently something wasn’t working, because the same file ended up in the filtered results when I was expecting just DSC04688.jpg.
I stared for a bit, and then saw that the object in previousArray had this new field, _id, attached to it. “Oh ok,” I thought, “Mongo adds that automatically when the file enters the DB, so that extra field must be throwing off the filtering process.”
I stuck previousArray onto a .forEach(), and then removed the _id field from each object.
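That cleanup was more or less this (a sketch, assuming the Mongo results behave like plain objects):

```javascript
// Strip out the Mongo-generated _id so only src and genre are left to compare
previousArray.forEach(photo => {
  delete photo._id;
});
```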
With that done, previousArray and photoArray would always have just the src and genre fields when they got compared. I ran the function again… and wut?
All three arrays are showing the same objects! How come filterResult was displaying these when it should be empty? If photoArray and previousArray contained objects with the same field values, then filterResult should have returned an empty array.
Well, it turns out Array.prototype.includes() doesn’t treat two objects with the same field values as the same object, and neither does Object.is(), which I also tried. Both check reference equality, so two separately created objects (or arrays) occupy different places in memory even when their contents match.
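Here’s a quick sketch of what was tripping me up (the genre value is made up):

```javascript
const a = { src: 'DSC00833.jpg', genre: 'portrait' };
const b = { src: 'DSC00833.jpg', genre: 'portrait' };

console.log([a].includes(b)); // false — includes() compares references, not contents
console.log(Object.is(a, b)); // false — two separate objects in memory
console.log(Object.is(a, a)); // true  — same reference
```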
Well that’s certainly annoying. The simple description of “using array.includes() to filter out duplicates from an array” was starting to become pretty complex, it seemed.
I knew the src field was the only value that really mattered when it came to checking for duplicates, so I decided to start there. I began by taking previousArray and looping through it to get the src values by themselves in a separate array.
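Something like this (a minimal sketch of that loop):

```javascript
// Collect just the src strings from the objects that came out of Mongo
const previousArraySources = [];
previousArray.forEach(photo => {
  previousArraySources.push(photo.src);
});
```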
At first, I had previousArray store only one object, since I wasn’t sure how to make it work with multiple objects right away. I figured I’d be able to work out a solution once I had it running with just one object.
I set a variable named selectedSource to previousArraySources[0] and then played around with how to filter out selectedSource from photoArray. I’d seen that Array.prototype.filter() accepts a callback, so I thought I’d use that.
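Roughly what that looked like (a reconstruction using the names above, not the exact code):

```javascript
// containsValue returns true for any photo that should stay in the results,
// i.e. anything whose src doesn't match the source being filtered out
const containsValue = (photo) => photo.src !== selectedSource;

const filterResult = photoArray.filter(containsValue);
```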
That seemed to do the trick! photoArray would be looped through, and only the entries where containsValue returned true would get added to filterResult.
Now onto having multiple selectedSource values
The obvious place to start was a for-loop to update the value of selectedSource with the multiple values now stored in previousArraySources. The last piece of the puzzle came together by accident, and it’s where photoArray gets updated with filterResult on every pass of the loop. I was looking at the objects in filterResult and wondering, “huh, why don’t I just feed this filtered array back into the for-loop?” So I set photoArray equal to filterResult, which ended up doing the trick!
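Put together, the whole thing looked something like this (a sketch of the approach rather than the exact code on the page):

```javascript
// photoArray is declared with let so it can be reassigned on each pass
for (let i = 0; i < previousArraySources.length; i++) {
  const selectedSource = previousArraySources[i];

  // Drop any photo whose src matches one already stored in Mongo
  const filterResult = photoArray.filter(photo => photo.src !== selectedSource);

  // Feed the filtered array back in so the next pass works on a smaller set
  photoArray = filterResult;
}

// Whatever is left in photoArray is new and safe to insert into Mongo
```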
Full page solution here:
Now is this the most efficient way to get what I want? Most likely not. I’m just trying to get more comfortable with manipulating data, which I accomplished here, as well as getting my site up and running. I’m sure I’ll look back in a year with knowledge of time-complexity or some other spooky concepts, and laugh at the rookie mistakes I’m making now!