I created a virtual hiring assistant using AI

December 10, 2023

Hiring is difficult. More so when you’re a part of a fast-growing start-up, building and breaking things, and you have urgent, important things on your plate all the time.

A couple of months ago, we decided to get more hands on deck in our product team. I posted about an Associate Product Manager internship position on LinkedIn and X, and over the next couple of weeks, we got over 800 applicants.

The volume of inbound resumes rapidly went past the 1000 mark, and the best way to find qualified candidates on Greenhouse was hilariously underwhelming — read through resumes after applying clever keyword searches and other basic filters.

Ideally, I’d want to read through every single resume, but within a couple of days of trying, I realized that was unrealistic. At one point, resumes were coming in faster than I could process them.

There had to be a better way.

But a month passed and I could not prioritize it. I thought of ideas, but kept procrastinating on the actual work of building something to help me out.

Push came to shove. We had 1300+ candidates who had applied, on average, a month earlier. Many had probably already chosen competing opportunities, and my day was getting out of hand. A significant chunk of the work on my plate would have been an excellent learning opportunity for an APM, but was no longer high-leverage for me. We needed to hire. Fast.

One fine Saturday morning, I decided to do something about it. In the next 7 hours, I built a tool that I’d envisioned a month ago. In the next 7 days, there were 40 qualified candidates in later stages of the hiring funnel. Within 14 days of getting off my ass and writing Hire Stack, we rolled out an offer.

This is the story of how Hire Stack works.

The Funnel

For this particular position, this is what the hiring funnel looked like:

Resume Review: The first step was to sift through the resumes. We looked for relevant experience, educational background, and skills that align with the role.
Intro Call: Selected candidates were then invited for a brief introductory call. This helped both sides align on logistics and gauge fit for the position.
Assignment: We shared a practical assignment relevant to the role. This tested the candidates' ability to handle real-world scenarios they might encounter in the position.
Assignment Discussion: Candidates who did well on the assignment were invited to discuss their approach and solutions, providing insights into their problem-solving and analytical skills.
General Interview: This stage involved a more in-depth interview, focusing on a range of topics from technical knowledge to past experiences.
Cultural Interview: The final stage assessed candidates' fit within our company culture and team dynamics.

The Bottleneck

With 1300+ applicants, the main bottleneck was Resume Review. Reviewing applicants took too much time, and without a clear heuristic for ordering them, the reviews were not doing justice to the funnel: a qualified candidate might never make it to the top of the stack being reviewed.

Another bottleneck, albeit a less severe one, was the logistics of scheduling the Intro Call.

The Approach

My approach focused on taking myself out of the loop, as much as possible, for the two identified bottlenecks: Resume Review and Intro Call logistics. Automating the rest of the funnel was a bonus goal. Here’s what I ended up doing:

Fetching Resumes

First and foremost, for any automation, I needed a way to fetch all the resumes for the job from Greenhouse.

Greenhouse does not allow downloading all the resumes for a job in one go from its dashboard. What a shame. The only ways to get all the resumes for a job are to use Greenhouse’s Harvest API, or to download them from the dashboard one by one or in batches of 30.

Harvest API

I could not get the Harvest API to work. The tokens I got ran into auth errors. I did not have admin access to figure out the right API keys, and the back and forth for getting the right access took too much time.

Selenium FTW

Desperate to make progress, I fell back to browser automation. Greenhouse allows you to export an Excel sheet that holds basic contact info, plus job and application identifiers, for all candidates for a job post. I figured out that any row in this sheet holds enough information to construct the URL of a candidate’s application page on Greenhouse.
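Here is a minimal sketch of that URL construction. It is not the actual Hire Stack code: it assumes the export has been saved as CSV, and both the column names (`Candidate ID`, `Application ID`) and the URL template are assumptions for illustration. Check a real application page in your own account and adjust them.

```python
import csv
import io

# Hypothetical URL template, not taken from Greenhouse documentation.
URL_TEMPLATE = (
    "https://app.greenhouse.io/people/{candidate_id}"
    "?application_id={application_id}"
)


def application_urls(export_csv, template=URL_TEMPLATE):
    """Return one application-page URL per row of the exported sheet."""
    urls = []
    for row in csv.DictReader(export_csv):
        # Column names are assumptions about the export's header row.
        urls.append(
            template.format(
                candidate_id=row["Candidate ID"].strip(),
                application_id=row["Application ID"].strip(),
            )
        )
    return urls


# Usage with an in-memory stand-in for the exported sheet.
sample = io.StringIO(
    "Candidate ID,Application ID,Email\n"
    "101,9001,a@example.com\n"
    "102,9002,b@example.com\n"
)
urls = application_urls(sample)
```

Keeping the template as a parameter means the same function still works if the page URL turns out to have a different shape.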

A couple of hours of effort and I had an automation running that was downloading all 1300 resumes by using Selenium and the Safari driver, taking roughly 5 seconds per resume.

That’s still a lot of time: the full run took a couple of hours. But I got all the resumes dumped on my system. I recently got admin access to the Harvest API, and I’ll update this article if I get it to work.
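The download loop can be sketched roughly as follows. This is an illustration under assumptions, not the script from the repository: `driver` is any object with a Selenium-style `get(url)` method, and `download_resume` is a hypothetical callback that clicks whatever download link the page exposes.

```python
import time


def download_all(driver, urls, download_resume, pause_seconds=5.0,
                 sleep=time.sleep):
    """Visit each application page and trigger a resume download.

    Returns the URLs whose download failed, so they can be retried later.
    """
    failed = []
    for url in urls:
        driver.get(url)
        sleep(pause_seconds)  # give the page time to load before acting
        try:
            ok = download_resume(driver)  # hypothetical page-specific step
        except Exception:
            ok = False
        if not ok:
            failed.append(url)  # keep going instead of aborting the run
    return failed


def estimated_hours(n_resumes, seconds_each=5.0):
    """Rough wall-clock estimate for a sequential run."""
    return n_resumes * seconds_each / 3600
```

With Selenium installed, `driver` would be `selenium.webdriver.Safari()`. `estimated_hours(1300)` comes to about 1.8, in line with the couple of hours the run took.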

Evaluating Candidates

Human in the loop: AI assists you in evaluating resumes. It shouldn’t entirely replace you. Every candidate should get a fair shot, and AI’s biases should not disadvantage anyone. In our approach, we’ll leverage AI to sort resumes, with a human in the loop to ensure the validity of evaluations.

So far, so good. Now comes the meaty part — writing an intelligence layer on top of 1300 resumes to sort them. Before we automate evaluation though, we need to align on the evaluation criteria.

Setting the Evaluation Criteria

One important takeaway I have from this project is that good AI-based automation mimics how you’d go about solving the problem. So I wrote down my criteria for evaluating candidates, along with the weights I assign to them. I ended up with three — education, product experience, and product-adjacent experience (design + engineering). I assigned weights to these criteria in the same order, and elaborated on them, setting context for an LLM.

Along with this, I also included the job description of the role in the LLM’s context, letting it form its own interpretation of a fit, since the job description also set clear expectations for the role and described what Enterpret does.

Automated Evaluations

Now that we’ve articulated our evaluation criteria and collected the resumes of 1300 candidates, it’s time to write the intelligence layer. I used LangChain to write a simple pluggable prompt that included the job description and our evaluation criteria, with a slot for the text parsed from a candidate’s resume. The output was a set of scores, one per criterion, each out of 100. The LLM layer was GPT4. I tried 3.5-turbo as well, but the quality wasn’t that great.
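To make the idea of a pluggable prompt concrete, here is a rough sketch. It uses plain Python string formatting rather than LangChain, and the prompt wording, the JSON reply format, and the criterion key names are all assumptions for illustration, not what Hire Stack actually sends.

```python
import json

# Hypothetical key names for the three criteria.
CRITERIA = ("education", "product_experience", "product_adjacent_experience")

PROMPT_TEMPLATE = """You are screening resumes for the role described below.

Job description:
{job_description}

Evaluation criteria:
{criteria_description}

Resume:
{resume_text}

Score the resume from 0 to 100 on each criterion. Reply with JSON only,
using exactly these keys: {keys}.
"""


def build_prompt(job_description, criteria_description, resume_text):
    """Fill the fixed template with the per-candidate resume text."""
    return PROMPT_TEMPLATE.format(
        job_description=job_description.strip(),
        criteria_description=criteria_description.strip(),
        resume_text=resume_text.strip(),
        keys=", ".join(CRITERIA),
    )


def parse_scores(reply):
    """Parse the model's JSON reply and validate each score."""
    data = json.loads(reply)
    scores = {}
    for name in CRITERIA:
        value = float(data[name])  # KeyError if a criterion is missing
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score out of range: {value}")
        scores[name] = value
    return scores
```

Validating the reply up front matters at this volume: one malformed response out of 1300 is easier to catch and retry here than to discover later in the sorted output.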

For each candidate, Hire Stack would score their resume out of 100 on each of our 3 evaluation criteria and create a sorted output matrix of the individual criterion scores and their weighted sum.
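The weighted sum and sort can be sketched like this. The weights below are hypothetical, since this post does not give the actual numbers, and the candidate names and scores are made up.

```python
# Hypothetical weights, highest for education, then product experience,
# then product-adjacent experience.
WEIGHTS = {
    "education": 0.5,
    "product_experience": 0.3,
    "product_adjacent_experience": 0.2,
}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (0-100) into a single number."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_candidates(all_scores, weights=WEIGHTS):
    """Return (candidate, scores, weighted sum) rows, best first."""
    rows = [
        (candidate, scores, weighted_score(scores, weights))
        for candidate, scores in all_scores.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_candidates({
    "candidate_a": {"education": 90, "product_experience": 40,
                    "product_adjacent_experience": 60},
    "candidate_b": {"education": 70, "product_experience": 90,
                    "product_adjacent_experience": 50},
})
```

Keeping the individual criterion scores next to the weighted sum makes it easier for the human reviewer to see why a candidate landed where they did.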

Sampling and Spot Checks

The script also allows you to sample resumes. I’d recommend sampling at a rate of ~1-10% initially, while you’re playing around with the evaluation criteria prompt. I set a sampling rate of 5%, went through all the evaluated resumes a few times, and tweaked the criteria description whenever I felt that GPT4’s assessment was not aligned with mine. GPT4 was often overly lenient with product experience and education.
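A sampling step like this one could look roughly as follows. The function name, the fixed seed, and the file layout are assumptions for illustration.

```python
import random


def sample_resumes(paths, rate=0.05, seed=0):
    """Pick a reproducible random subset of resumes for a spot check."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    paths = list(paths)
    if not paths:
        return []
    k = max(1, round(len(paths) * rate))
    # A fixed seed returns the same subset on every run, so scores can be
    # compared before and after a prompt tweak.
    rng = random.Random(seed)
    return rng.sample(paths, k)


# Hypothetical file layout: 1300 resumes at a 5% rate gives 65 to review.
paths = [f"resumes/{i:04d}.pdf" for i in range(1300)]
subset = sample_resumes(paths, rate=0.05)
```

Changing the seed between rounds is the alternative design: it trades comparability for a fresh look at different resumes.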

Final Run

After going through a few rounds of sampled resume evaluations, I felt confident enough in GPT4’s assessment to run it on the full batch of 1300 resumes. This took a good couple of hours at a modest rate limit and burned through a few dollars on OpenAI.

I now had a list of 1300 candidates, sorted by their fit for the role, based on carefully thought-out evaluation criteria. Next, I started going through resumes from the top, re-evaluating the sorted candidates from my end, and putting a few in the to-reach-out bucket.
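One simple way to stay under a rate limit during a long batch run is to space out the calls. The sketch below assumes a hypothetical `evaluate` callable that scores one resume, and the default of 20 calls per minute is an arbitrary placeholder, not the limit used for this run.

```python
import time


def run_throttled(items, evaluate, max_per_minute=20,
                  clock=time.monotonic, sleep=time.sleep):
    """Call evaluate(item) for each item, spacing calls to respect a limit."""
    min_interval = 60.0 / max_per_minute
    results = {}
    last_call = None
    for item in items:
        if last_call is not None:
            # Sleep only for whatever is left of the minimum interval.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[item] = evaluate(item)
    return results
```

The clock and sleep functions are parameters only so the loop can be exercised without actually waiting.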

In the next 15 mins, I found 45 people to reach out to, and coordinate next steps with. Sigh. This would have taken hours, if not days, earlier. That brings us to our next stage of automation — the scheduling.

Scheduling Intro Calls

God bless Calendly. The ability to share a link that lets others block time on your calendar at a mutually convenient time is brilliant. I love that it has become an industry standard. For my job here, I relied on a combination of a Google Calendar appointment page (inspired by Calendly), and a personalized automated email.

Nothing technically brilliant in this automation. I wrote a script that went through the evaluation results and reached out to shortlisted candidates by email, plugging in their name and the link to the 15 min appointment page. With a max of 4 appointments a day, in a 2 hour slot in my day, I was fully booked for a week by Monday. I’d started this experiment on Saturday.
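The personalized email part can be sketched with the standard library alone. The sender address, booking link, and wording below are placeholders, and the daily cap of 4 appointments is assumed to be configured on the appointment page rather than in the script.

```python
from email.message import EmailMessage

# Placeholder values for illustration.
SENDER = "hiring@example.com"
BOOKING_LINK = "https://calendar.example.com/intro-call-15-min"

BODY = """Hi {first_name},

Thanks for applying to the Associate Product Manager internship.
We would like to set up a 15-minute intro call.
Please pick a slot that works for you:

{booking_link}

Looking forward to speaking with you.
"""


def build_invite(name, email, booking_link=BOOKING_LINK, sender=SENDER):
    """Build one personalized invitation; sending is left to the caller."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = email
    msg["Subject"] = "Intro call: Associate Product Manager internship"
    first_name = name.split()[0] if name.strip() else "there"
    msg.set_content(
        BODY.format(first_name=first_name, booking_link=booking_link)
    )
    return msg


shortlist = [("Jane Doe", "jane@example.com"), ("Sam Lee", "sam@example.com")]
invites = [build_invite(name, email) for name, email in shortlist]
```

Each message could then be handed to `smtplib.SMTP.send_message` or to whatever mail service you already use.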

That’s all about Hire Stack! We eventually rolled out an offer to one of the candidates who was ranked in the top 1% of our evaluations, after going through the rest of the funnel.

You can find the code for all the steps along with instructions on setting up Hire Stack on your system on GitHub. If you don’t use Greenhouse, all you need is a bunch of resumes, and you’re ready to get started!

As of last week, Hire Stack is actively being used for hiring other roles. This makes me happy!
