jobSort() Story

Austin Tackaberry
Mar 10, 2018


Before I dive into the history of jobSort(), here are a couple links:

How I came up with the idea

I knew I would be applying to jobs soon. I stumbled upon Hacker News: Who is Hiring and noticed that there didn’t seem to be a great interface for viewing the listings, so I thought I would make one. I also knew about a couple of other sites with developer job listings, so I thought I would add another layer to my web app and make a little job board site. The large job boards had so many listings that it was hard to filter down to the jobs you actually wanted. Mine would be a smaller job board, one that only grabbed listings from niche developer job sites. I ultimately landed on Hacker News: Who is Hiring, GitHub, and Stack Overflow.

There isn’t really a great story of how I came up with the sorting feature. In case you don’t know, jobSort() sorts listings based on the technologies you are looking for and how much you care about each one. You assign a weight to each technology. For example, giving JavaScript a higher number moves job listings with JavaScript in the description closer to the top. I knew that other job sites let you filter by technology, which was nice, but what if you knew Python, Django, Flask, Pandas, and scikit-learn, but you really wanted to work with Django and didn’t care quite as much about the others? That would require multiple searches and/or lots of scrolling.
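As a rough illustration of the weighting idea, here is a minimal sketch. It is not the actual jobSort() code: the data shapes (`technologies` arrays on each listing, a `weights` object) and the function names are assumptions made up for this example. A listing's score is simply the sum of the weights of the technologies it mentions.

```javascript
// Hypothetical data shapes: each listing carries a `technologies` array,
// and `weights` maps a technology name to the weight the user assigned.
function scoreListing(listing, weights) {
  // Technologies the user did not weight contribute nothing.
  return listing.technologies.reduce(
    (total, tech) => total + (weights[tech] || 0),
    0
  );
}

function sortListings(listings, weights) {
  // Attach a score to a copy of each listing, then sort highest score first.
  return listings
    .map((listing) => ({ ...listing, score: scoreListing(listing, weights) }))
    .sort((a, b) => b.score - a.score);
}

const weights = { Django: 5, Python: 2, Flask: 1 };
const listings = [
  { company: 'A', technologies: ['Python', 'Flask'] },
  { company: 'B', technologies: ['Django', 'Python'] },
  { company: 'C', technologies: ['Java'] },
];
const sorted = sortListings(listings, weights);
```

With these made-up weights, the Django listing lands on top even though every Python listing would have matched a plain filter.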

Getting the data

Now that I had the idea, I had to figure out how to get the data from each of the sources.

GitHub had an API. Easy.

Stack Overflow was tricky for a couple of reasons. I knew I would have to scrape it, and I did that by taking the location and job title the user entered and adding them to the HTTP GET request. Easy enough, right? Well, kinda. I was able to get all of the job listings, but the descriptions weren’t on that page. You had to go to each listing’s page in order to get its description. These were crucial, because how else was I going to find out which technologies each job mentioned for the jobSort() algorithm? So I figured out how to get all those links and make all those additional GET requests…but it took like 15 seconds. And that was the day I learned about Promise.all. Quite the light bulb moment right there. Oh, and Stack Overflow hates being scraped, so I kept getting blocked. And when I hosted my site on Heroku, it didn’t work at all. Turns out my Heroku server was permanently blocked from Stack Overflow.
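To show why Promise.all makes such a difference, here is a small sketch comparing the two approaches. `fetchDescription` is a hypothetical stand-in that just waits 100 ms instead of making a real HTTP request, and the URLs are placeholders, so none of this is the original scraping code.

```javascript
// Hypothetical stand-in for an HTTP request: resolves after a short delay.
function fetchDescription(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`description for ${url}`), 100);
  });
}

// One request at a time: total time grows with the number of URLs.
async function fetchSequentially(urls) {
  const results = [];
  for (const url of urls) {
    results.push(await fetchDescription(url));
  }
  return results;
}

// All requests start immediately. Promise.all resolves once every one has
// finished, and the results keep the same order as the input URLs.
function fetchInParallel(urls) {
  return Promise.all(urls.map((url) => fetchDescription(url)));
}
```

With 20 placeholder URLs, the sequential version takes roughly 20 × 100 ms, while the parallel version takes about as long as the slowest single request. One caveat: Promise.all rejects as soon as any single request fails, so a real scraper would need error handling around each request.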

Hacker News was a pain because, as you may know, it is not formatted AT ALL. Even when you thought you noticed some patterns, there was always that one person who totally messed it up. It seemed like all comments had the company name first. Oh, that’s nice. NOPE, this person decided to put compensation first. So writing that algorithm was fun. I tried to eliminate false positives and live with the false negatives. It was okay if my algorithm couldn’t find the company name, but it wasn’t okay if my algorithm thought that the company name was “Software Engineer”.
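The "a miss is better than a wrong guess" idea could look something like the sketch below. It is an illustration only, not the parser jobSort() uses. It assumes the first line of a comment is a pipe-separated header, and the `NOT_A_COMPANY` patterns and `guessCompany` name are invented for this example.

```javascript
// Hypothetical patterns for header segments that are clearly NOT a company.
const NOT_A_COMPANY = [
  /[$€£]|\d+\s*k\b/i, // compensation, e.g. "$120k-150k"
  /\b(engineer|developer|designer|manager|scientist)s?\b/i, // job titles
  /\b(remote|onsite|full[- ]?time|part[- ]?time|visa)\b/i, // work arrangement
];

function guessCompany(commentText) {
  const firstLine = commentText.split('\n')[0];
  const segments = firstLine
    .split('|')
    .map((segment) => segment.trim())
    .filter(Boolean);

  // No recognizable header: return null rather than risk a wrong name.
  if (segments.length < 2) return null;

  // Take the first segment that does not look like something else.
  const candidate = segments.find(
    (segment) => !NOT_A_COMPANY.some((pattern) => pattern.test(segment))
  );
  return candidate || null;
}
```

A header such as `$120k-150k | Acme | Software Engineer` yields `Acme`, and `Software Engineer | Remote` yields `null`. A sketch this small still has blind spots (it would happily return a city name if the company is missing), which is exactly why the real thing took many iterations.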

The issues start

Alright, cool. So every time the user did a search, all this stuff would happen. It was fun learning how promises worked and determining what could be run in parallel vs what needed to be done in series. But I had another problem. Every time the user did a search, every Hacker News comment location was sent to Google’s geocoding API to get the coordinates. There are more than 600 comments. Not only does Google have a rate limit, but it also has a daily limit. This clearly was not going to work.

But then I thought, why was I having all this happen every time the user made a search? The Hacker News comments don’t change that often, and they are the same regardless of user input. Certainly, it would make more sense to make a Node app that scraped the post every so often and stored the info in a database. I had some experience with MongoDB, but I had heard some negative comments about NoSQL databases, and I saw that MySQL was in a lot of job listings, so I went with that. I was able to get around the rate limit, because I could set a timer to run 50 geocoding requests every second. However, I still ran into the daily limit, even if I only ran this hourly. I had another thought. What if I checked whether the location of each post had actually changed? From one hour to the next, most of the comments should be the same. I shouldn’t have to do another geocoding request if the post hasn’t changed.
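Both ideas, batching requests to stay under the rate limit and skipping locations that were already geocoded, can be sketched together. This is a simplified illustration under stated assumptions: `geocode` is a hypothetical function wrapping whatever geocoding API is used, and an in-memory `Map` stands in for the database lookup.

```javascript
// `geocode` is a hypothetical async function: location string -> coordinates.
// `cache` is a Map of location -> coordinates (a stand-in for the database).
async function geocodeNewLocations(
  locations,
  cache,
  geocode,
  batchSize = 50,
  delayMs = 1000
) {
  // Only look up locations we have not seen before, and each one only once.
  const pending = [...new Set(locations)].filter((loc) => !cache.has(loc));

  for (let i = 0; i < pending.length; i += batchSize) {
    const batch = pending.slice(i, i + batchSize);
    const coords = await Promise.all(batch.map((loc) => geocode(loc)));
    batch.forEach((loc, j) => cache.set(loc, coords[j]));

    // Pause between batches so we send at most `batchSize` requests per interval.
    if (i + batchSize < pending.length) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return cache;
}
```

On the first run every location is new, so it costs one request each. On later runs only comments with a location not yet in the cache trigger a request, which is what keeps the daily total low.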

Going back to the Stack Overflow issue, I tried to email Stack Overflow, but that didn’t work. I tried writing a meta post asking to get my Heroku server unblocked. But even if I got it unblocked, I wouldn’t have been able to have more than 5 users at a time, because Stack Overflow would have blocked my server again. I thought about moving that request to the client side, but that also wasn’t ideal. Ultimately, even though it hurt me deeply, I ended up removing Stack Overflow and GitHub altogether. The real value in my site was formatting Hacker News anyway. It would have been nice to have all 3, but it was going to be more pain than it was worth. It sucked removing all that code, and all my beautiful promises, but that’s the way it goes sometimes.

jobSort() algorithm

Overall, there weren’t too many issues with the algorithm. I became intimate with RegEx whether I wanted to or not. I had to think critically about how to find C, C++, C#, Go, R and determine that they were in fact languages. Even React vs React Native. Or JavaScript vs JS. Node vs NodeJS. It was an iterative process, but overall not too many major obstacles.
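To give a flavor of why short names are awkward, here is a hedged sketch of the kind of patterns involved. These are not the expressions jobSort() uses. The `TECH_PATTERNS` table and `findTechnologies` function are invented for illustration, and they rely on lookbehind assertions, which need a reasonably recent Node version. The usual `\b` word boundary does not help after `+` or `#`, so the patterns check the neighboring characters explicitly.

```javascript
// Illustrative patterns only. Single-letter and short names are matched
// case-sensitively and must not touch letters, digits, "+", "#" or a leading ".".
const TECH_PATTERNS = {
  C: /(?<![\w+#.])C(?![\w+#])/,
  'C++': /(?<![\w+#.])C\+\+(?![\w+#])/,
  'C#': /(?<![\w+#.])C#(?![\w+#])/,
  Go: /(?<![\w+#.])(?:Go|Golang)(?![\w+#])/,
  R: /(?<![\w+#.])R(?![\w+#&])/,
  JavaScript: /(?<![\w+#.])(?:JavaScript|JS)(?![\w+#])/i,
  Node: /(?<![\w+#.])Node(?:\.?js)?(?![\w+#])/i,
  'React Native': /(?<![\w+#.])React[\s-]?Native(?![\w+#])/i,
  // Plain React must not be followed by "Native".
  React: /(?<![\w+#.])React(?:\.?js)?(?![\w+#])(?![\s-]?Native)/i,
};

function findTechnologies(description) {
  return Object.keys(TECH_PATTERNS).filter((tech) =>
    TECH_PATTERNS[tech].test(description)
  );
}
```

For a description like "We use C++, C#, and React Native with NodeJS.", this picks out C++, C#, Node, and React Native without also reporting C, JavaScript, or React. It still produces false positives, for example a sentence that starts with the word "Go", which is where the iteration comes in.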

Autocomplete input for technologies

I wanted the user to be able to type the technologies, and I wanted there to be a dropdown menu offering suggestions. I also wanted to prevent the user from typing in a word that was not in the technologies array. jobSort() has a list of technologies to look for in a job description. If the user types in a technology that isn’t in that list, that would cause issues. I could have it be added to the list of technologies, but what if it was a typo? The quick solution I found for this functionality was the HTML5 datalist element. It had some autocomplete functionality and there was a dropdown menu. The only issue was that it was horrible on mobile, and there was no control. I wanted to specify the height of the menu, and I couldn’t. I wanted to hide it when the input value was empty, and I couldn’t. I dealt with this for a while.

At a coffee and code meetup in SF, Anthony Ng showed me an OSS project to contribute to, Downshift. It was the perfect library. It is an autocomplete library made by Kent C. Dodds that allows you to make all the decisions regarding its functionality. What a beautiful moment that was.

Later refactoring

I ended up refactoring several times later on. I had it all in one massive <App /> component (I know, I know), so I modularized it. Then, I added testing with Mocha/Chai/Sinon. Then, I added Prettier and ESLint with the Airbnb config.

Conclusion

So now I have a Node app that scrapes the latest Hacker News: Who is Hiring post, parses the data, determines the coordinates for each location, and stores all the information in a MySQL database. And then there’s the web app, where the user types in a location and the technologies they want in a job and weights them appropriately. There’s a neat little text loader that types out some jobSort() code. And then you get filtered and sorted results. Individual listings can be hidden, and there’s a short/long description toggle button.

It was a perfect project for my skill level at the time. I posted it to https://reddit.com/r/cscareerquestions and received 650 new users, which was pretty cool. Overall, people seemed to like it. Turns out there are similar websites out there. None of them had the jobSort() algorithm though. Now all I need is returning users!
