Engineers can’t gauge their own interview performance. And that makes them harder to hire.

interviewing.io is an anonymous technical interviewing platform. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle. In the past few months, we’ve amassed over 600 technical interviews along with their associated data and metadata. Interview questions tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role at a top company, and interviewers typically come from a mix of larger companies like Google, Facebook, and Twitter, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more.

Over the course of the next few posts, we’ll be sharing some { unexpected, horrifying, amusing, ultimately encouraging } things we’ve learned. In this blog’s heroic maiden voyage, we’ll be tackling people’s surprising inability to gauge their own interview performance and the very real implications this finding has for hiring.

First, a bit about setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews. If both people find each other competent and pleasant, they have the option to unmask. Overall, interviewees tend to do quite well on the platform, with just under half of interviews resulting in a “yes” from the interviewer.

If you’re curious, we have a few public recordings of interviews done on the platform, so you can watch and see what an interview is really like. In addition to these, our feedback forms are attached below. There is one direct yes/no question, and we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. In this post, we’ll be focusing on the technical score an interviewer gives an interviewee and the interviewee’s self-assessment (both are circled below). For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

Feedback form for interviewers
Feedback form for interviewees

Perceived versus actual performance

Below, you can see the distribution of people’s actual technical performance (as rated by their interviewers) and the distribution of their perceived performance (how they rated themselves) for the same set of interviews.[1]

You might notice right away that there’s some disparity between the two distributions, but things get interesting when you plot perceived vs. actual performance for each interview. Below is a heatmap of the data, where darker areas represent a higher concentration of interviews. For instance, the darkest square represents interviews where both perceived and actual performance were rated a 3. You can hover over each square to see the exact interview count (denoted by “z”).
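
If you’re curious how a chart like this gets built, here’s a minimal sketch using Plotly’s Python library (our graphs were made with Plotly, though not necessarily this way); the interviews list of (actual, perceived) score pairs is made up for illustration:

```python
# A sketch of the perceived-vs-actual heatmap, assuming a hypothetical
# list of (actual, perceived) score pairs on the 1-4 scale.
import numpy as np
import plotly.graph_objects as go

interviews = [(3, 3), (3, 2), (4, 3), (2, 2), (3, 1), (3, 3)]  # placeholder data

# Bucket interviews into a 4x4 grid: rows are perceived, columns are actual.
counts = np.zeros((4, 4), dtype=int)
for actual, perceived in interviews:
    counts[perceived - 1][actual - 1] += 1

fig = go.Figure(go.Heatmap(
    z=counts,            # hovering a square shows its interview count as "z"
    x=[1, 2, 3, 4],      # actual performance (interviewer's technical score)
    y=[1, 2, 3, 4],      # perceived performance (self-assessment)
    colorscale="Blues",  # darker squares = more interviews
))
fig.update_layout(xaxis_title="Actual performance",
                  yaxis_title="Perceived performance")
fig.show()
```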

If you run a regression on this data[2], you get an R-squared of only 0.24, and once you take away the worst interviews, it drops even further, to 0.16. For context, R-squared measures how well empirical data fits a mathematical model, on a scale from 0 to 1, where 0 means everything is noise and 1 means everything fits perfectly. In other words, even though some small positive relationship between actual and perceived performance does exist, it is not a strong, predictable correspondence.
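
If you want to replicate this kind of calculation, here’s a minimal sketch with SciPy, using made-up score pairs rather than our actual data; linregress reports the correlation coefficient r, and squaring it gives R-squared:

```python
# A minimal sketch of the linear regression described above; the paired
# scores here are hypothetical, not the actual interviewing.io data.
from scipy.stats import linregress

actual    = [3, 2, 4, 3, 1, 3, 2, 4, 3, 2]  # interviewer's technical score (1-4)
perceived = [2, 2, 3, 1, 2, 4, 3, 3, 2, 3]  # interviewee's self-assessment (1-4)

result = linregress(actual, perceived)
print(f"R-squared: {result.rvalue ** 2:.2f}")  # 0 = all noise, 1 = perfect fit
```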

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer.

Gayle Laakmann McDowell of Cracking the Coding Interview fame has written quite a bit about how bad people are at gauging their own interview performance, and it’s something that I had noticed anecdotally when I was doing recruiting, so it was nice to see some empirical data on that front. In her writing, Gayle mentions that it’s the job of a good interviewer to make you feel like you did OK even if you bombed. I was curious about whether that’s what was going on here, but when I ran the numbers, there wasn’t any relationship between how highly an interviewer was rated overall and how off their interviewees’ self-assessments were, in one direction or the other.
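
For the curious, that check boils down to something like the sketch below: correlate each interviewer’s overall rating with how far off their interviewees’ self-assessments were, on average. All the numbers here are hypothetical:

```python
# A sketch of the check described above, with hypothetical per-interviewer
# aggregates: overall rating vs. mean self-assessment error.
from scipy.stats import pearsonr

interviewer_rating    = [3.5, 4.0, 2.8, 3.9, 3.1, 3.6]  # avg rating from interviewees (1-4)
self_assessment_error = [1.2, 0.3, 0.8, 1.0, 0.5, 0.9]  # mean |perceived - actual| per interviewer

r, p = pearsonr(interviewer_rating, self_assessment_error)
print(f"r = {r:.2f} (p = {p:.2f})")  # near-zero r means no relationship
```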

Ultimately, this isn’t a big data set, and we will continue to monitor the relationship between perceived and actual performance as we host more interviews, but we did find that this relationship emerged very early on and has continued to persist with more and more interviews — R-squared has never exceeded 0.26 to date.

Why this matters for hiring

Now here’s the actionable and kind of messed up part. As you recall, during the feedback step that happens after each interview, we ask interviewees whether they’d want to work with their interviewer. As it turns out, there’s a highly statistically significant relationship (p < 0.0008) between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you[3]. And by extension, it means that in every interview cycle, some portion of interviewees lose interest in joining your company just because they didn’t think they did well, even though they actually did.
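
If you want to poke at this kind of relationship in your own data, a chi-squared test on a 2x2 contingency table is one standard way to test for a relationship between two yes/no answers (we’re not claiming this is the exact test behind our p-value). A sketch with made-up counts:

```python
# A chi-squared independence test on a hypothetical 2x2 table of
# (self-assessed performance) x (would want to work with interviewer).
from scipy.stats import chi2_contingency

#            would work with, would not
table = [[90, 10],   # thought they did well
         [45, 55]]   # thought they did poorly

chi2, p, dof, expected = chi2_contingency(table)
print(f"p-value: {p:.4g}")
```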

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

Lastly, a quick shout-out to Statwing and Plotly for making terrific data analysis and graphing tools respectively.

[1] There are only 254 interviews represented here because not all interviews in our data set had comprehensive, mutual feedback. Moreover, we realize that raw scores don’t tell the whole story and will be focusing on standardization of these scores and the resulting rat’s nest in our next post. That said, though interviewer strictness does vary, we gate interviewers pretty heavily based on their background and experience, so the overall bar is high and comparable to what you’d find at a good company in the wild.

[2] Here we are referring to linear regression, and though we tried fitting a number of different curves to the data, they all sucked.

[3] In our data, people were 3 times less likely to want to work with their interviewers when they thought they did poorly.

15 thoughts on “Engineers can’t gauge their own interview performance. And that makes them harder to hire.”

  1. I thought about alluding to the infamous Dunning-Kruger effect, but it’s not nearly as pronounced in our data as impostor syndrome… which makes me pretty happy, to be honest 🙂

  2. Interesting article! I was wondering about that one really actionable component…

    At some companies (especially larger ones, I imagine), the interviewer doesn’t have the final say on whether someone continues with the interview process, even for a phone screen. I’ve suggested to the committee that someone continue on and had them turned down anyway. So what is your advice for this situation, where giving genuine positive feedback immediately could result in a *really* awkward situation?

    1. I know that some hiring processes can be complicated and involve a lot of moving parts. In those cases, probably the best thing to do is to try to give feedback as quickly as possible. Even if it’s not immediate, I expect that if it’s done within a day or two, it’ll get the candidate before they have a chance to do the rationalization song and dance.

      I’d be interested to figure out exactly where the line of demarcation is for something like this, i.e. exactly when the rationalization window closes.

      That aside, I’d be curious about why the interviewer doesn’t have the final say for a phone screen. It makes sense when there are multiple interviewers involved, as there might be for an onsite… but in the case where it’s just one interviewer, it seems like turnaround should be pretty quick.

      1. Being as quick as possible is probably the best bet, I agree.

        As far as final say, there are two things. The first is that someone may be extremely personable and, people not being entirely rational, an interviewer might give them the benefit of the doubt when the evidence of technical skills isn’t there. A more impartial person on a committee wouldn’t have that bias. The other is to maintain a standard, which can also swing the other way: someone may say no to a candidate, and the committee may think the interviewer was too harsh.

  3. Could you break out the correlation between candidate self-assessment and wanting to work with the interviewer by whether or not the interviewer would want to work with them, the interviewer’s rating of the candidate, or both?

  4. This is a bunch of nonsense. I disagree with this author at a fundamental level. If resumes “suck,” solving algorithm problems also doesn’t reflect how well a candidate can perform on the job. I am a VP of Engineering at a startup in the Bay Area, and I have interviewed a lot of candidates who are very good at algorithms but fail to succeed at my company.
