Impostor syndrome strikes men just as hard as women… and other findings from thousands of technical interviews

Posted on October 30th, 2018.

The modern technical interview is a rite of passage for software engineers and (hopefully!) the precursor to a great job. But it’s also a huge source of stress and endless questions for new candidates. Just searching “how do I prepare for a technical interview” turns up millions of Medium posts, coding bootcamp blogs, Quora discussions, and entire books.

Despite all this conversation, people struggle to know how they’re even doing in interviews. In a previous post, we found that a surprisingly large number of interviewing.io’s users consistently underestimate their performance, making them more likely to drop out of the process and ultimately harder to hire. Now, and with considerably more data (over 10k interviews led by real software engineers!), we wanted to go deeper: what seems to make candidates worse at gauging their own performance?

We know some general facts that make accuracy a challenge: people aren’t always great at assessing or even remembering their performance on difficult cognitive tasks like writing code.1 Technical interviews can be particularly hard to judge if candidates don’t have much experience with questions with no single right answer. Since many companies don’t share any kind of detailed post-interview feedback (beyond a yes/no) with candidates for liability reasons, many folks never get any sense of how they did, what they did well, or what could have been better.2, 3 Indeed, pulling back the curtain on interviewing, across the industry, was one of the primary motivators for building interviewing.io!

But to our knowledge, there’s little data out there looking specifically at how people feel after real interviews on this scale, across different companies. So we gathered it, giving us the ability to test interesting industry assumptions about engineers and coding confidence.

One big factor we were interested in was impostor syndrome. Impostor syndrome resonates with a lot of engineers,4 indicating that many wonder whether they truly match up to colleagues and discount even strong evidence of competence as a fluke. Impostor syndrome can make us wonder whether we can count on the positive performance feedback that we’re getting, and how much our opportunities have come from our own effort, versus luck. Of particular interest to us was whether this would show up for women on our platform. There’s a lot of research evidence that candidates from underrepresented backgrounds experience a greater lack of belonging that feeds impostor syndrome,5 and this could show up as inaccuracy about judging your own interview performance.

The setup

interviewing.io is a platform where people can practice technical interviewing anonymously, and if things go well, get jobs at top companies in the process. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle.

When an interviewer and an interviewee match on interviewing.io, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question (feel free to watch this process in action on our interview recordings page).  After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews.

Here’s an example of an interviewer feedback form:

Feedback form for interviewers

Immediately after the interview, candidates answered a question about how well they thought they’d done on the same 1-4 scale:

Feedback form for interviewees

For this post, we looked at over 10k technical interviews led by real software engineers from top companies. In each interview, a candidate was rated by an interviewer on their problem-solving ability, technical ability, and communication skills, as well as whether the interviewer would advance them to the next round. This gave us a measure of how different someone’s self-rating was from the rating that the interviewer actually gave them, and in which direction. In other words, how skewed was their estimation from their true performance?
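To make the skew measure concrete, here’s a minimal sketch of the calculation (field names and values are hypothetical, and this isn’t our actual analysis code):

```python
# Each interview pairs the interviewer's 1-4 overall rating with the
# candidate's 1-4 self-rating. Field names and values here are made up.
interviews = [
    {"interviewer_rating": 3, "self_rating": 2},
    {"interviewer_rating": 4, "self_rating": 4},
    {"interviewer_rating": 2, "self_rating": 3},
]

def rating_skew(record):
    """Negative = candidate underestimated themselves, zero = accurate,
    positive = candidate overestimated themselves."""
    return record["self_rating"] - record["interviewer_rating"]

skews = [rating_skew(r) for r in interviews]
print(skews)                    # e.g. [-1, 0, 1]
print(sum(skews) / len(skews))  # average skew across interviews
```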

Going in, we had some hunches about what might matter:

  • Gender. Would women be harder on their coding performance than men?
  • Having been an interviewer before. It seems reasonable that having been on the other side will pull back the curtain on interviews.
  • Being employed at a top company. Similar to above.
  • Being a top-performing interviewee on interviewing.io — people who are better interviewees overall might have more confidence and awareness of when they’ve gotten things right (or wrong!)
  • Being in the Bay Area or not. Since tech is still so geographically centered on the Bay Area, we considered that folks who live in a more engineering-saturated culture could have greater familiarity with professional norms around interviews.
  • Within the interview itself, question quality and interviewer quality. Presumably, a better interviewer is also a better communicator, whereas a confusing interviewer might throw off a candidate’s entire assessment of their performance. We also looked at whether it was a practice interview or an interview for a specific company role.
  • For some candidates, we could also look at a few measures of their personal brand within the industry, like their number of GitHub and Twitter followers. Maybe people with a strong online presence are more sure of themselves when they interview?

So what did we find?

Women are just as accurate as men at assessing their technical ability

Contrary to expectations around gender and confidence, we didn’t find a reliable, statistically significant gender difference in accuracy. At first, it looked like female candidates were more likely to underestimate their performance, but when we controlled for other variables, like experience and rated technical ability, it turned out the key differentiator was experience. More experienced engineers are more accurate about their interview performance, and men are more likely to be experienced engineers, but experienced female engineers are just as accurate about their technical ability.

Based on previous research, we hypothesized that impostor syndrome and a greater lack of belonging could result in female candidates penalizing their interview performance, but we didn’t find that pattern.6 However, our finding echoes a research project from the Stanford Clayman Institute for Gender Research, which looked at 1,795 mid-level tech workers from high tech companies. They found that women in tech aren’t necessarily less accurate when assessing their own abilities, but do have significantly different ideas about what success requires (e.g., long working hours and risk-taking). In other words, women in tech may not doubt their own abilities but might have different ideas about what’s expected. And a survey from Harvard Business Review  asking over a thousand professionals about their job application decisions also made this point. Their results emphasized that gender gaps in evaluation scenarios could be more about different expectations for how scenarios like interviews are judged.

That said, we did find one interesting difference: women went through fewer practice interviews overall than men did. The difference was small but statistically significant, and it harkens back to our earlier finding that women leave interviewing.io after a bad interview roughly 7 times as often as men do.

But in that same earlier post, we also found that masking voices didn’t impact interview outcomes. This whole cluster of findings affirms what we suspected and what the folks doing in-depth studies of gender in tech have found: it’s complicated. Women’s lack of persistence in interviews can’t be explained only by impostor syndrome about their own abilities, but it’s still likely that they’re interpreting negative feedback more severely and making different assumptions about interviews.

Here’s the distribution of accuracy distance for both female and male candidates on our platform (zero indicates a rating that matches the interviewer’s score, while negative values indicate an underestimated score and positive values an overestimated one). The two groups look pretty much identical:

Accuracy by gender

What else didn’t matter?

Another surprise: having been an interviewer didn’t help. Even people who had been interviewers themselves didn’t seem to get an accuracy boost from that experience. Personal brand was another non-finding: people with more GitHub followers weren’t more accurate than people with few to no GitHub followers. Nor did interviewer rating matter (i.e. how well an interviewer was reviewed by their candidates), although to be fair, interviewers are generally rated quite highly on the site.

So what was a statistically significant boost to accurate judgments of interview performance? Mostly, experience.

Experienced engineers have a better sense of how well they did in interviews than engineers earlier in their careers.7 And it doesn’t seem to be simply that better coders are better at gauging their interview performance; there is a small lift from raw skill, with higher-rated engineers being somewhat more accurate, but even top-performing junior candidates struggled to accurately assess their performance.8

experienced versus juniors

Our data mirrors a trend seen in Stack Overflow’s 2018 Developer survey. They asked respondents several questions about confidence and competition with other developers, and noted that more experienced engineers feel less competitive and more confident.9 This isn’t necessarily surprising: experience is correlated with skill level, after all, and highly skilled people are likely to be more confident. But our analysis let us control for performance and code skill within career groups, and we still found that experienced engineers were better at predicting their interview scores. There are probably multiple factors here: experienced engineers have been through more interviews, have led interviews themselves, and have a stronger sense of belonging, all of which may combat impostor syndrome.

Insider knowledge and context also seem to help: we found small but statistically significant lifts in accuracy from being located in the Bay Area and from working at a top company. Like the experienced career group, engineers who are more likely to have contextual industry knowledge are also more accurate. However, the lift from working at a top company seems to mostly reflect overall technical ability: being at a top company is essentially a proxy for being a more experienced, higher-quality engineer.

Finally, as you get better at interviewing and move into company interviews, you do get more accurate. People were more accurate about their performance in company interviews than in practice interviews, and their overall ranking on interviewing.io also predicted accuracy: we give users an overall ranking based on their performance over multiple interviews, weighted toward more recent ones, and people who scored in the top 25% were more likely to be accurate about their interview performance.

So how good are people at gauging their interview performance overall? We’ve looked at this before with roughly a thousand interviews, and now, with ten thousand, the finding continues to hold up. Candidates were accurate about how they did in only 46% of interviews and underestimated themselves in 35% (the remaining 19%, of course, are the overestimators). Still, candidates are generally on the right track — it’s not as if people who score a 4 are always giving themselves a 1.10 Self-ratings are statistically significant predictors of actual interview scores (and positively correlated), but that relationship is noisy.
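To make that breakdown concrete, here’s a toy sketch of how it falls out of the skew measure from earlier (the skew values below are made up, not our real data):

```python
# Toy sketch: binning per-interview skew (self-rating minus interviewer
# rating) into accurate / underestimated / overestimated buckets.
skews = [-1, 0, 0, 1, -2, 0, -1, 1, 0, -1]

accurate = sum(1 for s in skews if s == 0) / len(skews)
under    = sum(1 for s in skews if s < 0) / len(skews)
over     = sum(1 for s in skews if s > 0) / len(skews)

print(f"accurate: {accurate:.0%}, underestimated: {under:.0%}, overestimated: {over:.0%}")
```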

The implications

Accurately judging your own interview performance is a skill in its own right and one that engineers need to learn from experience and context in the tech industry. But we’ve also learned that many of the assumptions we made about performance accuracy didn’t hold up to scrutiny — female engineers had just as accurate a view of their own skills as male ones, and engineers who had led more interviews or were well known on GitHub weren’t particularly better at gauging their performance.

What does this mean for the industry as a whole? First off, impostor syndrome appears to be the bleary-eyed monster that attacks regardless of gender, and how good you are, where you are, or how famous you are doesn’t matter much. Seniority does help mitigate some of the pain, but impostor syndrome affects everyone, regardless of who they are or where they’re from. So, maybe it’s time for a kinder, more empathetic interviewing culture. And a culture that’s kinder to everyone, because though marginalized groups who haven’t been socialized in technical interviewing are hit the hardest by shortcomings in the interview process, no one is immune to self-doubt.

We’ve previously discussed what makes someone a good interviewer, and empathy plays a disproportionately large role. And we’ve seen that providing immediate post-interview feedback is really important for keeping candidates from dropping out. So, whether you’re motivated by kindness and ideology or cold, hard pragmatism, a bit more kindness and understanding toward your candidates is in order.

Cat Hicks, the author of this guest post, is a researcher and data scientist with a focus on learning. She’s published empirical research on learning environments, and led research on the cognitive work of engineering teams at Google and Travr.se. She holds a PhD in Psychology from UC San Diego.

1Self-assessment has been explored in a number of domains and is often used to measure learning. One important criticism is that it’s highly impacted by people’s motivation and emotional state at the time of asking. See: Sitzmann, T., Ely, K., Brown, K. G., & Bauer, K. N. (2010). Self-assessment of knowledge: A cognitive learning or affective measure? Academy of Management Learning & Education, 9(2), 169-191.

2Designing a good technical interview is no small task on the interviewer side. For an informal discussion of this, see this post.

3For some anecdotal conversation about interview self-assessment, see this one.

4E.g., this article and this one.

5Some examples of further reading in social science research:
Good, C., Rattan, A., & Dweck, C. S. (2012). Why do women opt out? Sense of belonging and women’s representation in mathematics. Journal of Personality and Social Psychology, 102(4), 700.
Master, A., Cheryan, S., & Meltzoff, A. N. (2016). Computing whether she belongs: Stereotypes undermine girls’ interest and sense of belonging in computer science. Journal of Educational Psychology, 108(3), 424.

6One complication for our dataset is the representation of experienced female engineers: we simply didn’t have very many, which reflects the demographics of the tech industry, but it also means that selection biases within the small group of experienced female engineers we do have are more likely to be present, and this isn’t the be-all and end-all of exploring group differences. We’d like to continue looking at interviews with female participants to explore this fully.

7These effects and the previous non-findings were all explored in a linear mixed model. Significant results for the individual effects are all p < .05.

8Experienced engineers have an average skew of -.14, junior engineers -.22, and new grads -.25.

9See also: https://insights.dice.com/2018/03/19/imposter-syndrome-tech-pros-age/

10Another wrinkle with the design behind this data is that there’s a floor and a ceiling on the scale: people who always score a 4, for example, can’t ever overrate themselves, because they’re already at the top of the scale. We dealt with this in a couple of ways: by excluding people at the floor and ceiling and re-running the analyses on the middle subset, and by binning skew into either accurate or not and looking at that. The findings hold up under both approaches.
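For the curious, here’s a small sketch of what those two checks might look like in code (made-up records and hypothetical field names, not our actual pipeline):

```python
# Records mirror the earlier sketch: the interviewer's 1-4 rating and the
# candidate's 1-4 self-rating for each interview. Data here is made up.
records = [
    {"interviewer_rating": 4, "self_rating": 4},
    {"interviewer_rating": 1, "self_rating": 2},
    {"interviewer_rating": 3, "self_rating": 2},
    {"interviewer_rating": 2, "self_rating": 2},
]

# Check 1: exclude interviews at the floor/ceiling of the scale, where skew
# can only go in one direction, and re-run the analysis on the middle subset.
middle = [r for r in records if r["interviewer_rating"] not in (1, 4)]

# Check 2: collapse skew into a binary accurate / not-accurate label.
accurate_flags = [r["self_rating"] == r["interviewer_rating"] for r in records]

print(len(middle), accurate_flags)
```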

Exactly what to say when recruiters ask you to name the first number… and other negotiation word-for-words

Posted on August 16th, 2018.

There are a lot of resources out there that talk about salary negotiation but many tend to skew a bit theoretical. In my experience, one of the hardest things about negotiating your salary is knowing what to say in tough, ambiguous situations with a power balance that’s not in your favor. What’s OK? What’s rude? What are the social norms? And so on.

Before I started interviewing.io, I worked as a software engineer, an in-house recruiter, and an agency recruiter, so I’ve literally been on all sides of the negotiating table. For the last few years, I’ve been guest-lecturing MIT’s 6.UAT, a class about technical communication for computer science majors. Every semester, negotiation is one of the most-requested topics from students. In this post, I’m sharing the content of that lecture, which is targeted toward students but has served seasoned industry folks just as well. You’re never too young or too old to advocate for yourself.

Btw, if you don’t like reading and prefer long, rambly diatribes in front of an unsightly glass wall, I covered most of this material (and other stuff) in a webinar I did with the fine people at Udacity (where I used to run hiring) a few months ago. So, pick your poison.

Why negotiate at all, especially if I’m junior?

If you’re early in your career, you might say that negotiation isn’t worth the hassle — after all, junior roles have pretty narrow salary bands. There are a few reasons this view is short-sighted and wrong. First, though it’s pretty unlikely in the grand scheme of things, if you’re applying to a startup, there might come a magical day when your equity is worth something. This is especially true if you’re an early employee — with a good exit, a delta of a few tenths of a percent might end up being worth a down payment on a home in San Francisco.

But, let’s get real, your equity is likely worthless (except interviewing.io’s equity… that’s totes gonna be worth something), so let me give you a better, more immediate reason to learn to haggle early in your career, precisely because that’s when the stakes are low. Humans are frighteningly adaptable creatures. Scared of public speaking? Give 3 talks. The first one will be gut-wrenchingly horrific, the stuff of nightmares. Your voice will crack, you’ll mumble, and the whole time, you’ll want to vomit. The next one will be nerve-wracking. The last one will mostly be OK. And after that, you’ll be just fine. Same thing applies to approach anxiety, mathematical proofs, sex, and, you guessed it, salary negotiation!

So, make all the awkward, teeth-cringing mistakes now, while the stakes are low and failure will only cost you $5K or $10K a year. Because the further along you get in your career, the bigger the upside will be… and the bigger the downside will be for not negotiating. Not only will the salary bands be wider for senior roles, but as you get more senior, more of your comp comes from equity, and equity has an even wider range for negotiating. Negotiating your stock well can make 6-figure differences and beyond (especially if you apply some of these same skills to negotiating with investors over term sheets, should you ever start your own company)… so learn these skills (and fail) while you’re young, because the older you get, the harder it’s going to be to start and the more high-stakes it’s going to be.

So, below, as promised, I’ll give you a few archetypal, stress-inducing situations and what to say, word-for-word in each one. But first, let me address the elephant in the room…

Will my offer be rescinded if I try to negotiate?

As I mentioned earlier, this blog post is coming out of a lecture I give at MIT. Every semester, I start the negotiation portion of the lecture with the unshakeable refrain that no one will ever rescind your offer for negotiating. Last semester was different, though. I was just starting to feel my oats and get into my talk (the negotiation piece comes about halfway through) and smugly recited the bit about offers never being rescinded, followed by my usual caveat… “unless you act like a douche while negotiating.” Then, a hand shot up in the back of the room. Ah ha, I thought to myself, one of the non-believers. Ready to placate him, I called on the gentleman in the back.

“My offer got rescinded for negotiation.”

The class broke out into uproarious laughter. I laughed too. It was kind of funny… but it was also unnerving, and I wanted to get to the bottom of it.

“Were you a giant jerk when you negotiated?”

“Nope.” Shit, OK, what else can I come up with…

“Were you applying at a really small company with maybe one open role?” I asked, hoping against hope that he’d say yes.

“Yes.”

“Thank god.”

So, there’s the one exception I’ve found so far to my blanket statement. After working with hundreds and hundreds of candidates back when I was still a recruiter, I had never heard of or seen an offer get rescinded (and none of my candidates acted like douches while negotiating, thank god), until then. So, if you’re talking to a super small company with one role that closes as soon as they find someone, then yes, they might rescind the offer.

But, to be honest, and I’m not just saying this because I was wrong in front of hundreds of bloodthirsty undergrads, an early startup punishing a prospective employee for being entrepreneurial is a huge red flag to me.

OK, so, now onto the bit where I tell you exactly what to say.1

What to say when asked to name the first number

There will come a time in every job search where a recruiter will ask you about your compensation expectations. This will likely happen very early in said search, maybe even during the first call you’ll ever have with the company.

I think this is a heinous, illaudable practice fraught with information asymmetry. Companies know their salary ranges and roughly what you’re worth to them before they ever talk to you (barring phenomenal performance in interviews, which kicks you into a different band). And they know what the cost of living is in your area. So they already have all the info they need about you, while you have none about them or the role or even your market value. Sure, there are some extenuating circumstances where you are too expensive, e.g. you’re an L6 at Google and are talking to an early-stage startup that can only afford to pay you 100K a year in base, but honestly, even in that situation, if the job is cool enough and you have the savings, you might take it anyway.

So, basically, naming a number first can only hurt you and never help you. Don’t do it. Now, here’s exactly what to say when asked to name the first number.

At this point, I don’t feel equipped to throw out a number because I’d like to find out more about the opportunity first – right now, I simply don’t have the data to be able to say something concrete. If you end up making me an offer, I would be more than happy to iterate on it if needed and figure out something that works. I also promise not to accept other offers until I have a chance to discuss them with you.

TADA!

What to say when you’re handed an exploding offer

Exploding offers, in my book, are the last bastion of the incompetent. The idea goes something like this… if we give a candidate an aggressive deadline, they’ll have less of a chance to talk to other companies. Game theory for the insipid.

Having been on the other side of the table, I know just how arbitrary offer deadlines often are. Deadlines make sense when there is a limited number of positions and applicants all come in at the same time (e.g. internships). They make no sense in this market, where companies are perpetually hiring — the deadline is an entirely artificial construct. Joel Spolsky, co-founder of Stack Overflow and Trello, had something particularly biting to say on the matter of exploding offers many years ago (the full post, Exploding Offer Season, is really good):

“Here’s what you’re thinking. You’re thinking, well, that’s a good company, not my first choice, but still a good offer, and I’d hate to lose this opportunity. And you don’t know for sure if your number one choice would even hire you. So you accept the offer at your second-choice company and never go to any other interviews. And now, you lost out. You’re going to spend several years of your life in some cold dark cubicle with a crazy boss who couldn’t program a twenty out of an ATM, while some recruiter somewhere gets a $1000 bonus because she was better at negotiating than you were.”

Even in the case of internships, offer deadlines need not be as aggressive as they often are, and I’m happy to report that many college career centers have taken stands against exploding offers. Nevertheless, if you’re not a student or if your school hasn’t outlawed this vile practice, here’s exactly what to say if it ever happens to you.

I would very much appreciate having a bit more time. I’m very excited about Company X. At the same time, choosing where I work is extremely important to me. Of course, I will not drag things out, and I will continue to keep you in the loop, but I hope you can understand my desire to make as informed of a decision as possible. How about I make a decision by…?

The reverse used car salesman… or what to say to always get more

At the end of the day, the best way to get more money is to have other offers. I know, I know, interviewing sucks and is a giant gauntlet-slog, but in many cases, having just one other offer (so, I don’t know, spending a few extra days of your time spread over a few weeks) can get you at least $10K extra. It’s a pretty rational, clear-cut argument for biting the slog-bullet and doing a few more interviews.

One anecdote I’ll share on the subject goes like this. A few years ago, a close friend of mine who’s notoriously bad at negotiation and hates it with a passion was interviewing at one of the big 4 companies. I was trying to talk him into getting out there just a little bit, for the love of god, and talking to at least one more company. I ended up introducing him to a mid-sized startup, where he quickly got an onsite interview. Just mentioning to his recruiter at the bigco that he had an onsite at this company got him an extra $5K in his signing bonus.

Offers are, of course, better than onsites, but in a pinch, even onsites will do… because every onsite increases your odds of not accepting the offer from the company you’re negotiating with. So, let’s say you do have some offers. Do you reveal the details?

The answer is that it depends. If the cash portions of your other offers are worth more than the offer you’re negotiating, then you can reveal the details. If they’re worth more in total but less in cash, it’s a bit dicier, because equity at smaller companies is kind of worthless… you can still use it as leverage if you tell the story that that equity is worth more to YOU, but that’s going to take a bit more finesse, so if you’ve never negotiated before, you might want to hold off.

If the cash portion of your other offers isn’t worth more, it’s sufficient to say that you have offers, and when pressed, you can simply say that you’re not sharing the details (it’s OK not to share them).

But whether you reveal details or not, here’s the basic formula for getting more. See why I call it the reverse used car salesman?

I have the following onsites/offers, and I’m still interviewing at Company X and Company Y, but I’m really excited about this opportunity and will drop my other stuff and SIGN TODAY if…

So, “if” what? I propose listing 3 things you want, which will typically be:

  • Equity
  • Salary
  • Signing/relocation bonus

The reason I list 3 things above isn’t that I expect you’ll be able to get all 3; it’s that this way, you’re giving the person you’re negotiating with some options. In my experience, you’ll likely get 2 out of the 3.

So, what amounts should you ask for when executing on the reverse used car salesman? It’s usually easier to get more equity and a bigger bonus than more salary (they’re taxed differently from the company’s perspective, and a signing bonus is a one-off rather than something that repeats every year). Therefore, it’s not crazy to ask for 1.5X-2X the equity and an extra 10-15% in salary. For the bonus portion, a lot depends on the size of the company, but if you’re talking to a company that’s beyond seed stage, you can safely ask for at least 20% of your base salary as a signing bonus.2
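To make the arithmetic concrete, here’s a toy example of those rules of thumb applied to a hypothetical offer (all numbers are invented, not advice about any specific company):

```python
# Hypothetical initial offer, in dollars per year (equity as annualized value).
offer = {"base_salary": 130_000, "equity": 40_000, "signing_bonus": 0}

counter = {
    # Ask for 1.5x-2x the equity (using 1.5x here).
    "equity": offer["equity"] * 1.5,
    # Ask for an extra 10-15% in base salary (using 10% here).
    "base_salary": offer["base_salary"] * 1.10,
    # Ask for at least 20% of base as a signing bonus.
    "signing_bonus": offer["base_salary"] * 0.20,
}

for component, amount in counter.items():
    print(f"{component}: ${amount:,.0f}")
```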

What if the company says no to all or most of these and is a big enough brand that you don’t have much of a leg to stand on? You can still get creative. One of our users told me about a sweet deal he came up with — he said he’d sign today if he got to choose the team he joined, and he had a specific team in mind.

Other resources

As I mentioned at the beginning of this post, there are plenty of blog posts and resources on the internets about negotiation, so I’ll just mention two of my favorites. The first is a riveting, first-hand account of negotiation adventures from one of my favorite writers in this space, Haseeb Qureshi. In his post, Haseeb talks about how he negotiated for a 250K (total package) offer with Airbnb and what he learned along the way. It’s one of the most honest and thoughtful accounts of the negotiation process I’ve ever read.

The second post I’ll recommend is a seminal work in salary negotiation by Patrick McKenzie (patio11 on Hacker News, in case that’s more recognizable). I read it back when I was still an engineer, and it was one of those things that indelibly changed how I looked at the world. I still madly link anyone and everyone who asks me about negotiation to this piece of writing, and it’s still bookmarked in my browser.

If you’re an interviewing.io user and have a job offer or five that you’re weighing and want to know exactly what to say when negotiating in your own nuanced, unique situation, please email me, and I’ll whisper sweet, fiscal nothings in your ear like a modern-day Cyrano de Bergerac wooing the sweet mistress that is capitalism.3

1If you’re interviewing at interviewing.io, USE THESE ON ME. IT’LL BE GREAT. And while you’re at it, use these on me as well.

2Some of the larger tech companies offer huge signing bonuses to new grads (~100K-ish). Obviously this advice is not for that situation.

3An increasing number of our customers pay us on subscription, so we don’t get more money if you do.4 And for the ones who don’t, salary and recruiting fees typically come out of a different budget.

4In the early days of interviewing.io, we tried to charge a flat per-hire fee in lieu of a percentage of salary, precisely for this reason — we wanted to set ourselves up as an entirely impartial platform where lining up with our candidates’ best interests was codified into our incentive structure. Companies were pretty weirded out by the flat fee, so we went back to doing percentages, but these days we’re moving over as many of our customers to subscription as possible — it’s cheaper for them, better for candidates, and I won’t lie that I like to see that recurring revenue.

We looked at how a thousand college students performed in technical interviews to see if where they went to school mattered. It didn’t.

Posted on February 13th, 2018.

interviewing.io is a platform where engineers practice technical interviewing anonymously. If things go well, they can unlock the ability to participate in real, still anonymous, interviews with top companies like Twitch, Lyft and more. Earlier this year, we launched an offering specifically for university students, with the intent of helping level the playing field right at the start of people’s careers. The sad truth is that with the state of college recruiting today, if you don’t attend one of very few top schools, your chances of interacting with companies on campus are slim. It’s not fair, and it sucks, but university recruiting is still dominated by career fairs. Companies pragmatically choose to visit the same few schools every year, and despite the career fair being one of the most antiquated, biased forms of recruiting that there is, the format persists, likely due to the fact that there doesn’t seem to be a better way to quickly connect with students at scale. So, despite the increasingly loud conversation about diversity, campus recruiting marches on, and companies keep doing the same thing expecting different results.

In a previous blog post, we explained why companies should stop courting students from the same five schools. Regardless of your opinion on how important that idea is (for altruistic reasons, perhaps), you may have been left skeptical about the value and practicality of broadening the college recruiting effort, and you probably concede that it’s rational to visit top schools, given limited resources — while people are often willing to agree that there are perfectly qualified students coming out of non-top colleges, they maintain that those students are relatively rare. We’re here to show you, with some nifty data from our university platform, that this is not true.

To be fair, this isn’t the first time we’ve looked at whether where you went to school matters. In a previous post, we found that taking Udacity and Coursera programming classes mattered way more than where you went to school. And way back when, one of our founders figured out that where you went to school didn’t matter at all but that the number of typos and grammatical errors on your resume did. So, what’s different this time? The big, exciting thing is that these prior analyses were focused mostly on engineers who had been working for at least a few years already, making it possible to argue that a few years of work experience smooths out any performance disparity that comes from having attended (or not attended) a top school. In fact, the good people at Google found that while GPA didn’t really matter after a few years of work, it did matter for college students. So, we wanted to face this question head-on and look specifically at college juniors and seniors while they’re still in school. Even more pragmatically, we wanted to see if companies limiting their hiring efforts to just top schools means they’re going to get a higher caliber of candidate.

Before delving into the numbers, here’s a quick rundown of how our university platform works and the data we collect.

The setup

For students who want to practice on interviewing.io, the first step is a brief (~15-minute) coding assessment on Qualified to test basic programming competency. Students who pass this assessment, i.e. those who are ready to code while another human being breathes down their neck, get to start booking practice interviews.

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. Check out our recordings page to see this process in action.

Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from top companies like Google, Facebook, Dropbox, Airbnb, and more.

After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!”. On our platform, a score of 3 or above has generally meant that the person was good enough to move forward. You can see what our feedback form looks like below:

Feedback form for interviewers

On our platform, we’re fortunate to have thousands of students from all over the U.S., spanning over 200 universities. We thought this presented a unique opportunity to look at the relationship between school tier and interview performance for both juniors (interns) and seniors (new grads). To study this relationship, we first split schools into the following four tiers, based on rankings from U.S. News & World Report:

  • “Elite” schools (e.g. MIT, Stanford, Carnegie Mellon, UC-Berkeley)
  • Top 15 schools (not including top tier, e.g. University of Wisconsin, Cornell, Columbia)
  • Top 50 schools (not including top 15, e.g. Ohio State University, NYU, Arizona State University)
  • The rest (e.g. Michigan State, Vanderbilt University, Northeastern University, UC-Santa Barbara)

Then, we ran some statistical significance testing on interview scores vs. school tier to see if school tier mattered, for both interns (college juniors) and new grads (college seniors), comprising a set of roughly 1000 students.
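For reference, here’s a minimal sketch of what that kind of test looks like (toy scores rather than our real data; the exact test we ran may differ, and a one-way ANOVA on the 1-4 scores by tier is just one reasonable choice):

```python
from scipy import stats

# Toy technical scores (1-4) grouped by school tier; not real interview data.
scores_by_tier = {
    "elite": [3, 4, 2, 3, 3, 2, 4, 3],
    "top15": [3, 2, 4, 3, 2, 3, 3, 4],
    "top50": [2, 3, 3, 4, 3, 3, 2, 4],
    "rest":  [3, 3, 2, 4, 2, 3, 4, 3],
}

# One-way ANOVA across the four tiers: a large p-value is consistent with
# "no detectable difference in interview performance between tiers".
f_stat, p_value = stats.f_oneway(*scores_by_tier.values())
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```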

Does school have anything to do with interview performance?

In the graphs below, you can see technical score distributions for interviews with students in each of the four school tiers (see legend). As you recall from above, each interview is scored on a scale of 1 to 4, where 1 is the worst and 4 is the best.

First, the college juniors…

interns by tier

And then, the seniors…

New grads by tier

What’s pretty startling is that the shape of these distributions, for both juniors and seniors, is remarkably similar. Indeed, statistical significance testing revealed no difference between students of any tier when it came to interview performance.1 What this means is that top-tier students are achieving the same results as those in no-name schools. So the question becomes: if the students are comparable in skill, why are companies spending egregious amounts of money attracting only a subset of them?

Okay, so what are companies missing?

Besides missing out on great, cheaper-to-acquire future employees, companies are missing out on an opportunity to save time and money. Right now, a ridiculous amount of money is being spent on university recruiting. We’ve previously cited the $18K price tag just for entry to the MIT career fair. And a study by Lauren Rivera, described in the Harvard Business Review, revealed that one firm budgeted nearly $1M just for social recruiting events on a single campus.

The high price tag of these events also means it makes even less sense for smaller companies or startups to try to compete with high-profile, high-profit tech giants. Most of the top schools that are being heavily pursued already have plenty of recruiters vying for their students. Unwittingly, this pursuit seems to run contrary to most companies’ desires for high diversity and long-term sustainable growth.

Even when companies do believe talent is evenly distributed across school tiers, there are still reasons why they might recruit at top schools. Other factors help elevate certain schools in a recruiter’s mind. There are long-standing company-school relationships (for example, the number of alumni who currently work at the company). There are signaling effects too — companies get Silicon Valley bonus points by saying their eng team is composed of a bunch of ex-Stanford, ex-MIT, ex- etc. etc. students.

A quick word about selection bias

Since this post appeared on Hacker News, there’s been some loud, legitimate discussion about how the pool of students on interviewing.io may not be representative of the population at large, because we have a self-selected pool of students who decided to practice interviewing. Certainly, all the blog posts we publish are subject to this (very valid) line of criticism, and this post in particular. As such, selection bias in our user pool might mean that 1) we’re getting only the worst students from top schools (because, presumably, the best ones don’t need the practice), or 2) we’re getting only the best/most motivated students from non-top schools, or both. Either (or both) of these is entirely possible, but we have a few reasons to believe that what we’ve published here might hold true regardless.

First off, in our experience, regardless of their background or pedigree, everyone is scared of technical interviewing. Case in point… before we started working on interviewing.io, we didn’t really have a product yet. So before investing a lot of time and heartache into this questionable undertaking, we wanted to test the waters to see if interview practice was something engineers really wanted, and more so, who these engineers that wanted practice were. So, we put up a pretty mediocre landing page on Hacker News… and got something like 7,000 signups the first day. Of these 7,000 signups, roughly 25% were senior (4+ years of experience) engineers from companies like Google and Facebook (this isn’t to say that they’re necessarily the best engineers out there… but just that the engineers the market seems to value the most still need our service).

Another data point comes from one of our founders. Every year, Aline does a guest lecture on job search preparedness for a technical communication course at MIT. This course is one way to fulfill the computer science major’s communication requirement, so enrollment tends to span the gamut of computer science students. Before every lecture, she sends out a survey asking students what their biggest pain points are in preparing for their job search. Every year, trepidation about technical interviewing is either at the top of the list or 2nd from the top.

And though this doesn’t directly address the issue of whether we’re only getting the best of the worst or the worst of the best (I hope the above has convinced you there’s more to it than that), here’s the distribution of school tiers among our users, which I expect mirrors the kinds of distributions companies see in their student applicant pool:

Distribution of school tiers among interviewing.io students

So what can companies do?

Companies may never stop recruiting at top-tier schools entirely, but they ought to at least include schools outside of that very small circle in the search for future employees. The takeaway from the data is simple: for good engineers, school means a lot less than we think. The time and money that companies put into competing for candidates from the same select few schools would be better spent creating opportunities that include everyone, as well as developing tools to vet students more fairly and efficiently.

As you saw above, we used a 15-minute coding assessment to cull our inbound student flow, and just a short challenge leveled the playing field between students from all walks of life. At the very least, we’d recommend employers do the same thing in their process. But, of course, we’d be remiss if we didn’t suggest one other thing.

At interviewing.io, we’ve proudly built a platform that grants the best-performing students access to top employers, no matter where they went to school or where they come from. Our university program, in particular, lets companies reach a vastly larger pool of students for the same cost as attending one or two career fairs at top target schools. Want diverse, top talent without the chase? Sign up to be an employer on our university platform!

1Of course, this hinges on everyone completing a quick 15-minute coding challenge first, to ensure they’re ready for synchronous technical interviews. We’re excited about this because companies can replicate this step in their process as well!

What do the best interviewers have in common? We looked at thousands of real interviews to find out.

Posted on November 29th, 2017.

At interviewing.io, we’ve analyzed and written at some depth about what makes for a good interview from the perspective of an interviewee. However, despite the inherent power imbalance, interviewing is a two-way street. I wrote a while ago about how, in this market, recruiting isn’t about vetting as much as it is about selling, and not engaging candidates in the course of talking to them for an hour is a woefully missed opportunity. But, just like solving interview questions is a learned skill that takes time and practice, so, too, is the other side of the table. Being a good interviewer takes time and effort and a fundamental willingness to get out of autopilot and engage meaningfully with the other person.

Of course, everyone and their uncle has strong opinions about what makes someone a good interviewer, so instead of waxing philosophical, we’ll present some data and focus on analytically answering questions like… Does it matter how strong of an engineering brand your company has, for instance? Do the questions you ask actually help get candidates excited? How important is it to give good hints to your candidate? How much should you talk about yourself? And is it true that, at the end of the day, what you say is way less important than how you make people feel?1 And so on.

Before I delve into our findings, I’ll say a few words about interviewing.io and the data we collect.

The setup

interviewing.io is an anonymous technical interviewing platform. On interviewing.io, people can practice technical interviewing anonymously, and if things go well, unlock real (still anonymous) interviews with companies like Lyft, Twitch, Quora, and more.

The cool thing is that both practice and real interviews with companies take place within the interviewing.io ecosystem. As a result, we’re able to collect quite a bit of interview data and analyze it to better understand technical interviewing. One of the most important pieces of data we collect is feedback from both the interviewer and interviewee about how they thought the interview went and what they thought of each other. If you’re curious, you can watch a real interview on our recordings page, and see what the feedback forms for interviewers and interviewees look like below — in addition to one direct yes/no question, we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, one of which is their own take on how they thought they did.

Feedback form for interviewers

Feedback form for interviewees

In this post, we’ll be analyzing feedback and outcomes of thousands of real interviews with companies to figure out what traits the best interviewers have in common.

Before we get into the nitty-gritty of individual interviewer behaviors, let’s first put the value of a good interviewer in context by looking at the impact of a company’s brand on the outcome. After all, if brand matters a lot, then maybe being a good interviewer isn’t as important as we might think.

Brand strength

So, does brand really matter for interview outcomes? One quick caveat before we get into the data: every interview on the platform is user-initiated. In other words, once you unlock our jobs portal (you have to do really well in practice interviews to do so), you decide who you talk to. So, candidates talking to companies on our platform will be predisposed to move forward because they’ve chosen the company in the first place. And, as should come as no surprise to anyone, companies with a very strong brand have an easier time pulling candidates (on our platform and out in the world at large) than their lesser-known counterparts. Moreover, many of the companies we work with do have a pretty strong brand, so our pool isn’t representative of the entire branding landscape. However, all is not lost — in addition to working with very recognizable brands, we work with a number of small, up-and-coming startups, so if you, the reader, are coming from a company that’s doing cool stuff but hasn’t yet become a household name, we hope our findings apply to you as well. And, as you’ll see, getting candidates in the door isn’t the same as keeping them.

To try to quantify brand strength, we used three different measures: the company’s Klout Score (yes, that still exists), its Mattermark Mindshare Score, and its score on Glassdoor (under general reviews).2

When we looked at interview outcomes relative to brand strength, its impact was not statistically significant. In other words, we found that brand strength didn’t matter at all when it came to either whether the candidate wanted to move forward or how excited the candidate was to work at the company.

This was a bit surprising, so I decided to dig deeper. Maybe brand strength doesn’t matter overall but matters when the interviewer or the questions they asked aren’t highly rated? In other words, can brand buttress less-than-stellar interviewers? Not so, according to our data. Brand didn’t matter even when you corrected for interviewer quality. In fact, of the top 10 best-rated companies on our platform, half have no brand to speak of, 3 are mid-sized YC companies that command respect in Bay Area circles but are definitely not universally recognizable, and only 2 have anything approaching household name status.
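To give a sense of what “corrected for interviewer quality” can look like, below is a sketch with simulated data and hypothetical column names (this is not our actual model or data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: one row per company interview. Column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "brand_score": rng.uniform(0, 1, n),          # stand-in for Klout/Mattermark/Glassdoor
    "interviewer_rating": rng.integers(1, 5, n),  # 1-4 rating candidates gave the interviewer
    "interview_score": rng.integers(1, 5, n),     # 1-4 rating the interviewer gave the candidate
})

# Simulate an outcome driven by interviewer quality and interview performance,
# but not by brand, mirroring the finding described above.
logits = -4 + 0.8 * df["interviewer_rating"] + 0.6 * df["interview_score"]
df["moved_forward"] = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Logistic regression: does brand still predict the outcome once interviewer
# quality and interview performance are controlled for?
model = smf.logit(
    "moved_forward ~ brand_score + interviewer_rating + interview_score",
    data=df,
).fit(disp=0)
print(model.pvalues)  # a non-significant brand_score coefficient matches the finding
```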

So, what’s the takeaway here? Maybe the most realistic thing we can say is that while brand likely matters a lot for getting candidates in the door, once they’re in, no matter how well-branded you are, they’re yours to lose.

Choosing the question

If brand doesn’t matter once you’ve actually gotten a candidate in the door, then what does? Turns out, the questions you ask matter a TON. As you recall, feedback on interviewing.io is symmetric, which means that in addition to the interviewer rating the candidate, the candidate also rates the interviewer, and one of the things we ask candidates is how good the question(s) they got asked were.

Question quality was extremely significant (p < 0.002 with an effect size of 1.25) when it came to whether the candidate wanted to move forward with the company. This held both when candidates did well and when they did poorly.

While we obviously can’t share the best questions (these are company interviews, after all), we can look at what candidates had to say about the best and worst-rated questions on the platform.

The good

I liked the fact that questions were building on top of each other so that previous work was not wasted and finding ways to improve on the given solution.

Always nice to get questions that are more than just plain algorithms.

Really good asking of a classic question, opened my mind up to edge cases and considerations that I never contemplated the couple of times I’ve been exposed to the internals of this data structure.

This was the longest interviewing.io interview I have ever done, and it is also the most enjoyable one! I really like how we started with a simple data structure and implemented algorithms on top of it. It felt like working on a simple small-scale project and was fun.

He chose an interesting and challenging interview problem that made me feel like I was learning while I was solving it. I can’t think of any improvements. He would be great to work with.

I liked the question — it takes a relatively simple algorithms problem (build and traverse a tree) and adds some depth. I also liked that the interviewer connected the problem to a real product at [Redacted], which made it feel less like a toy problem and more like a pared-down version of a real problem.

This is my favorite question that I’ve encountered on this site. It was one of the only ones that seem like it had actual real-life applicability and was drawn from a real (or potentially real) business challenge. And it also nicely wove in challenges like complexity, efficiency, and blocking.

The bad

Question wasn’t straightforward and it required a lot of thinking/understanding since functions/data structures weren’t defined until a lot later. [Redacted] is definitely a cool company to work for, but some form of structure in interviews would have been a lot more helpful. Spent a long time figuring out what the question is even asking, and interviewer was not language-agnostic.

I was expecting a more technical/design question that showcases the ability to think about a problem. Having a domain-specific question (regex) limits the ability to show one’s problem-solving skills. I am sure with enough research one could come up with a beautiful regex expression but unless this is something one does often, I don’t think it [makes for] a very good assessment.

This is not a good general interview question. A good interview question should have more than one solution with simplified constraints.

Anatomy of a good interview question

  1. Layer complexity (including asking a warmup)
  2. No trivia
  3. Real-world components/relevance to the work the company is doing are preferable to textbook algorithmic problems
  4. If you’re asking a classic algorithmic question, that’s ok, but you ought to bring some nuance and depth to the table, and if you can teach the interviewee something interesting in the process, even better!

Asking the question

One of the other things we ask candidates after their interviews is how helpful their interviewer was in guiding them to the solution. Providing your candidate with well-timed hints that get them out of the weeds without giving away too much is a delicate art that takes a lot of practice (and a lot of repetition), but how much does it matter?

As it turns out, being able to do this well matters a ton. Being good at providing hints was extremely significant (p < 0.00001 with an effect size of 2.95) when it came to whether the candidate wanted to move forward with the company (as before, we corrected for whether the interview went well).

You can see for yourself what candidates thought of their interviewers when it came to their helpfulness and engagement below. Though this attribute is a bit harder to quantify, it seems that hint quality is actually a specific instance of something bigger, namely the notion of turning something inherently adversarial into a collaborative exercise that leaves both people in a better place than where they started.3

And if you can’t do that every time, then at the very least, be present and engaged during the interview. And no matter what the devil on your shoulder tells you, no good will ever come of opening Reddit in another tab.4

One of the most memorable, pithy conversations I ever had about interviewing was with a seasoned engineer who had spent years as a very senior software architect at a huge tech company before going back to what he’d always liked in the first place, writing code. He’d conducted a lot of interviews over a career spanning several decades, and after trying out a number of different interview styles, what he settled on was elegant, simple, and satisfying. According to him, the purpose of any interview is to “see if we can be smart together.” I like that so much, and it’s advice I repeat whenever anyone will listen.

The good

I liked that you laid out the structure of the interview at the outset and mentioned that the first question did not have any tricks. That helped set the pace of the interview so I didn’t spend an inordinate amount of time on the first one.

The interview wasn’t easy, but it was really fun. It felt more like making a design discussion with a colleague than an interview. I think the question was designed/prepared to fill the 45 minute slot perfectly.

I’m impressed by how quickly he identified the issue (typo) in my hash computation code and how gently he led me to locating it myself with two very high-level hints (“what other test cases would you try?” and “would your code always work if you look for the pattern that’s just there at the beginning of the string?”). Great job!

He never corrected me, instead asked questions and for me to elaborate in areas where I was incorrect – I very much appreciate this.

The question seemed very overwhelming at first but the interviewer was good at helping to break it down into smaller problems and suggest we focus on one of those first.

The bad

[It] was a little nerve-wracking hearing you yawn while I was coding.

What I found much more difficult about this interview was the lack of back and forth as I went along, even if it was simple affirmation that “yes, that code you just wrote looks good”. There were times when it seemed like I was the only one who had talked in the past five minutes (I’m sure that’s an exaggeration). This made it feel much more like a performance than like a collaboration, and my heart was racing at the end as a result.

While the question was very straightforward, and [he] was likely looking for me to blow through it with no prompting whatsoever in order to consider moving forward in an interview process, it would have been helpful to get a discussion or even mild hinting from him when I was obviously stuck thinking about an approach to solve the problem. While I did get to the answer in the end, having a conversation about it would have made it feel more like a journey and learning experience. That would have also been a strong demonstration of the collaborative culture that exists while working with teams of people at a tech company, and would have sold me more vis-a-vis my excitement level.

If an interview is set to 45 minutes, the questions should fit this time frame, because people plan accordingly. I think that if you plan to have a longer interview you should notify the interviewee beforehand, so he can be ready for it.

One issue I had with the question though is what exactly he was trying to evaluate from me with the question. At points we were talking about very nitty-gritty details about python linked list or array iteration, but it was unclear at any point if that was what he was judging me on. I think in the future he could outline at the beginning what exactly he was looking for with the problem in order to keep the conversation focused and ensure he is well calibrated judging candidates.

Try to be more familiar with all the possible solutions to the problem you choose to pose to the candidate. Try to work on communicating more clearly with the candidate.

Anatomy of a good interview

  1. Set expectations, and control timing/pacing
  2. Be engaged!
  3. Be familiar with the problem and its associated rabbit holes/garden paths
  4. Strike a good balance between giving hints and letting the candidate think
  5. Turn the interview into a collaborative exercise where both people are free to be smart together

The art of storytelling… and the importance of being human

Beyond choosing and crafting good questions and being engaged (but not overbearing) during the interview, what else do top-rated interviewers have in common?

The pervasive common thread I noticed among the best interviewers on our platform is, as above, a bit hard to quantify but dovetails well with the notion of being engaged and creating a collaborative experience. It’s taking a dehumanizing process and elevating it to an organic experience between two capable, thinking humans. Many times, that translates into revealing something real about yourself and telling a story. It can be sharing a bit about the company you work at and why, out of all the places you could have landed, you ended up there. Or some aspect of the company’s mission that resonated with you specifically. Or how the projects you’ve worked on tie into your own, personal goals.

The good

I like the interview format, in particular how it was primarily a discussion about cool tech, as well as an honest description of the company… the discussion section was valuable, and may be a better gauge of fit anyway. It’s nice to see a company which places value on that 🙂

The interviewer was helpful throughout the interview. He didn’t mind any questions on their company’s internal technology decisions, or how it’s structured. I liked that the interviewer gave me a good insight of how the company functions.

Extremely kind and very generous with explaining everything they do at [redacted]. Really interested in the technical challenges they’re working on. Great!

Interesting questions but the most valuable and interesting thing were the insights he gave me about [redacted]. He sounded very passionate about engineering in general, particularly about the challenges they are facing at [redacted]. Would love to work with him.

The bad

[A] little bit of friendly banter (even if it’s just “how are you doing”?) at the very beginning of the interview would probably help a bit with keeping the candidate calm and comfortable.

I thought the interview was very impersonal, [and] I could not get a good read on the goal or mission of the company.

And, as we wrote about in a previous post, one of the most genuine, human things of all is giving people immediate, actionable feedback. As you recall, during the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a very statistically significant relationship (p < 0.00005)5 between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you. And by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

How to be human

  1. Talk about what your company does… and what specifically about it appealed to you and made you want to join
  2. Talk about what you’re currently working on and how that fits in with what you’re passionate about
  3. When you like a candidate, give positive feedback as quickly as you can to save them from the self-flagellation that they’ll likely go through otherwise… and which might make them rationalize away wanting to work with you
  4. And, you know, be friendly. A little bit of warmth can go a long way.

Becoming a better interviewer

Interviewing people is hard. It’s hard to come up with good questions, it’s hard to give a good interview, and it’s especially hard to be human in the face of conducting a never-ending parade of interviews. But being a good interviewer is massively important. As we saw, while your company’s brand will get people in the door, once they’ve reached the technical interview, the playing field is effectively level, and you can no longer use your brand as a crutch to mask poor questions or a lack of engagement. And in this market, where the best candidates have a ton of options, a good interview that elevates a potentially cold, transactional interaction into something real and genuine can become the selling point that gets great engineers to work for you, whether you’re a household name or a startup that just got its first users.

Given how important it is to do interviews well, what are some things you can do to get better right away? One thing I found incredibly useful for coming up with good, original questions is to start a shared doc with your team where every time someone solves a problem they think is interesting, no matter how small, they jot down a quick note. These notes don’t have to be fleshed out at all, but they can be the seeds for unique interview questions that give candidates insight into the day-to-day at your company. Turning these disjointed seeds into interview questions takes thought and effort — you have to prune out a lot of the details and distill the essence of the problem into something it doesn’t take the candidate a lot of work/setup to grok, and you’ll likely have to iterate on the question a few times before you get it right — but the payoff can be huge.

Another thing you can do to get actionable feedback like the kind you saw in this post (and then immediately level up) is to get on interviewing.io as an interviewer. If you interview people in our double-blind practice pool, no one will know who you are or which company you represent, which means that you get a truly unbiased take on your interviewing ability, which includes your question quality, how excited people would be to work with you, and how good you are at helping people along without giving away too much. It’s also a great way to go beyond your team, which can be pretty awkward, and try out new questions on a very engaged, high-quality user base. You’ll also get to keep replays of your interviews so you can revisit crucial moments and figure out exactly what you need to do to get better next time.

Become a better interviewer with honest, actionable feedback from candidates

Want to hone your skills as an interviewer? Want to help new interviewers at your company warm up before they officially get added to your interview loops? You can sign up to our platform as an interviewer, or (especially for groups) ping us at interviewers@interviewing.io.

1“People will forget what you said, people will forget what you did, but people will never forget how you made them feel.” -Maya Angelou

2It’s important to call out that brand and engineering brand are two separate things that can diverge pretty wildly. For instance, Target has a strong brand overall but probably not the best engineering brand (sorry). Heap, on the other hand, is one of the better-respected places to work among engineers (both on interviewing.io and off), but it doesn’t have a huge overall brand. Both the Klout and Mattermark Mindshare scores aren’t terrible for quantifying brand strength, but they’re not amazing at engineering brand strength (they’re high for Target and low for Heap). The Glassdoor score is a bit better because reviewers tend to skew engineering-heavy, but it’s still not that great of a measure. So, if anyone has a better way to quantify this stuff, let me know. If I were doing it, I’d probably look at GitHub repos of the company and its employees, who their investors are, and so on and so forth. But that’s a project that’s out of scope for this post.

3If you’re familiar with Dan Savage’s campsite rule for relationships, I think there should be a similar rule for interviewing… leave your candidates in better shape than when you found them.

4Let us save you the time: Trump is bad, dogs are cute, someone ate something.

5This time with even more significance!


If you care about diversity, don’t just hire from the same five schools

Posted on October 24th, 2017.

EDIT: Our university hiring platform is now on Product Hunt!

If you’re a software engineer, you probably believe that, despite some glitches here and there, folks who have the technical chops can get hired as software engineers. We regularly hear stories about college dropouts, who, through hard work and sheer determination, bootstrapped themselves into millionaires. These stories appeal to our sense of wonder and our desire for fairness in the world, but the reality is very different. For many students looking for their first job, the odds of breaking into a top company are slim because they will likely never even have the chance to show their skills in an interview. For these students (typically ones without a top school on their resume), their first job is often a watershed moment where success or failure can determine which opportunities will be open to them from that point forward and ultimately define the course of their entire career. In other words, having the right skills as a student is nowhere near enough to get you a job at a top-tier tech company.

To make this point concrete, consider three (fictitious, yet indicative) student personas, similar in smarts and skills but attending vastly different colleges. All are seeking jobs as software engineers at top companies upon graduation.

Mason goes to Harvard. He has a mediocre GPA but knows that doesn’t matter to tech companies, where some of his friends already work. Come September, recent graduates and alums fly back to campus on their company’s dime in order to recruit him. While enjoying a nice free meal in Harvard Square, he has the opportunity to ask these successful engineers questions about their current work. If he likes the company, all he has to do is accept the company’s standing invitation to interview on campus the next morning.

Emily is a computer science student at a mid-sized school ranked in the top 30 for computer science. She has solid coursework in algorithms under her belt, a good GPA, and experience as an engineering intern at a local bank. On the day of her campus’s career fair, she works up the courage to approach companies – this will be her only chance to interact with companies where she dreams of working. Despite the tech industry being casual, the attire of this career fair is business formal with a tinge of sweaty. So after awkwardly putting together an outfit she would never wear again1, she locates an ancient printer on the far side of campus and prints 50 copies of her resume. After pushing through the crowds to line up at the tech companies’ booths, she hands her resume to every one of them over the course of several hours. She won’t find out for two more weeks if she got any interviews.

Anthony goes to a state school near the town where he grew up. He is top in his class, as well as a self-taught programmer, having gone above and beyond his coursework to hack together some apps. His school’s career fair has a bunch of local, non-tech employers. He has no means of connecting with tech companies face-to-face and doesn’t know anyone who works in tech. So, he applies to nearly a hundred tech companies indiscriminately through their websites, uploading his resume and carefully crafted cover letter. He will probably never hear from them.

Career fair mania

The status quo in university recruiting revolves around career fairs and in-person campus recruiting, which have serious limitations. For one, they are extremely expensive, especially at elite schools. Prime real estate at the MIT career fair will run you a steep $18,000 for entry alone. That’s not counting the price of swag (which gets more exorbitant each year), travel, and, most importantly, the opportunity cost of the attending engineers’ time. While college students command the lowest salaries, it’s not uncommon for tech companies to spend 50% more on recruiting a student than a senior engineer.

At elite schools, the lengths to which companies go to differentiate themselves are becoming more extreme with each passing year. In fact, students at elite colleges suffer from company overload because every major tech company, big and small, is trying to recruit them. All of this, while students at non-elite colleges are scrambling to get their foot in the door without any recruiters, let alone VPs of high-profile companies, visiting their campus.

Of course, due to this cost, companies are limited in their ability to visit colleges in person, and even large companies can visit around 15 or 20 colleges at most. This strategy overlooks top students at solid CS programs that are out of physical reach.

In an effort to overcome this, companies attend conferences and hackathons out of desperation to reach students at other colleges. The sponsorship tier for the Grace Hopper Conference, the premier gathering for women in tech, tops out at $100,000, with the sponsorship tier that includes a single interview booth starting at $30,000. Additionally, larger companies send representatives (usually engineers) to large hackathons in an effort to recruit students in the midst of a 48-hour all-nighter. However, the nature of in-person career fairs and events is that not all students will be present. Grace Hopper is famously expensive to attend as a student, especially when factoring in airfare and hotel.

This cost is inefficient at best, and prohibitive at worst, especially for small startups with low budget and brand. Career fairs serve a tiny portion of companies and a tiny portion of students, and the rest are caught in the pecuniary crossfire. Demand for talented engineers out of college who bring a different lived experience to tech has never been higher, yet companies are passing on precisely these students via traditional methods. Confounding the issue even further is the fundamental question of whether having attended a top school has much bearing on candidate quality in the first place (more on that in the section on technical screening below).

Homogeneity of hires

The focus of companies on elite schools has notable, negative implications for the diversity of their applicants. In particular, many schools that companies traditionally visit are notably lacking in diversity, especially when it comes to race and socioeconomic status. According to a survey of computer science students at Stanford, there were just fifteen Hispanic female and fifteen black female computer science majors in the 2015 graduating class total. In this analysis, the Stanford 2015 CS major was 9% Hispanic and 6% black. According to a 2015 analysis, the Harvard CS major was just 3% black and 5% Hispanic. Companies that are diversity-forward and constrained to recruiting at the same few schools end up competing over this small pool of diverse students. Meanwhile, there is an entire ocean of qualified, racially diverse students from less traditional backgrounds whom companies are overlooking.

The focus on elite schools also has meaningful implications on socioeconomic diversity. According to a detailed New York Times infographic, “four in 10 students from the top 0.1 percent attend an Ivy League or elite university, roughly equivalent to the share of students from poor families who attend any two- or four-year college.” The infographic highlights the rigid segmentation of students by class background in college matriculation.

Source: New York Times

The article finds that the few lower-income students who end up at elite colleges do about as well as their more affluent classmates but that attending an elite versus non-elite college makes a huge difference in future income.

The focus of tech companies on elite schools lends credence to this statistic, codifying the rigidity with which students at elite colleges are catapulted into the 1 percent, while others are left behind. Career-wise, it’s that first job or internship you get while you’re still in school that can determine what opportunities you have access to in the future. And yet, students at non-elite colleges have trouble accessing these very internships and jobs, or even getting a meager first round interview, contributing to the lack of social mobility in our society not for lack of skills but for lack of connections. This sucks. A lot.

The technical screen

Let’s return to our three students. Let’s say that Emily, the student who attended her college’s career fair, gets called back by one or two companies for a first round interview if her resume meets the criteria that companies are looking for. Not having an internship at a top tech company already — quite the catch-22 — puts her at a disadvantage. Anthony has little to no chance of hearing back from employers via his applications online, but let’s say that by some miracle he lands a phone screen with one of the tech giants (his best shot, as there are more recruiters to look through the resume dump on the other end).

What are their experiences when it comes to prepping for upcoming technical interviews?

Mason, the Harvard student, attends an event on campus with Facebook engineers teaching him how to pass the technical interview. He also accepts a few interviews at companies he’s less excited about, for practice and just in case. While he of course needs to be sharp and prepare in order to get good at these sorts of algorithmic problems, he has all of the resources he could ask for and more at his disposal. Unsurprisingly, his Facebook interview goes well.

Emily’s school has an informal, undergraduate computer science club whose members are collectively reading technical interviewing guides and trying to figure out what tech companies want from them. She has a couple of interviews lined up, all of which are for jobs she’s desperate to get. They trade tips after interviews but ultimately have a shaky understanding of what they did right and wrong in the absence of post-interview feedback from companies. Only a couple of alumni from their school have made it to top tech companies in the past, so they lack the kind of information that Mason has on what companies are looking for. (E.g. Don’t be afraid to take hints, make sure to explain your thought process, what the heck is this CoderPad thing anyway…)

Anthony doesn’t know anyone who has a tech job like the one he’s interviewing for, and only one of his friends is also interviewing. He doesn’t know where to start when it comes to getting ready for his upcoming interview at GoogFaceSoft. He only has one shot at it with no practice interviews lined up. He prepares by googling “tech interview questions” and stumbles upon a bunch of unrealistic interview questions, many of them behavioral or outdated. He might be offered the interview and be fit for the job, but he sure doesn’t know how to pass the interview.

For students who may be unfamiliar with the art of the technical interview, algorithmic interviews can be mystifying, leading to an imbalance of information on how to succeed. Given that technical interviewing is a game, it is important that everyone knows the rules, spoken and unspoken. There are many practice resources available, but no amount of reading and re-reading Cracking the Coding Interview can prepare you for that moment when you are suddenly in a live, technical phone screen with another human.

We built a better way to hire

Ultimately, as long as university hiring relies on a campus-by-campus approach, the status quo will continue to be fundamentally inefficient and unmeritocratic. No company, not even the tech giants, can cover every school or every resume submitted online. And, in the absence of any meaningful information on a student’s resume, companies default to their university as the only proxy. This approach is inefficient at best and, at worst, it’s the first in a series of watershed moments that derail the promise of social mobility for the non-elite, taking with them any hope of promoting diversity among computer science students.

Because this level of inequity, placed for maximum damage right at the start of people’s careers, really pissed us off, we decided to do something about it. interviewing.io’s answer to the unfortunate status quo is a university-specific hiring platform. If you’re already familiar with how core interviewing.io works, you’ll see that the premise is exactly the same. We give out free practice to students, and use their performance in practice to identify top performers, completely independently of their pedigree. Those top performers then get to interview with companies like Lyft and Quora on our platform. In other words, we’re excited to provide students with pathways into tech that don’t involve going to an elite school or knowing someone on the inside. So far, we’ve been very pleased with the results. You can see our student demographics and where they’re coming from below. Students from all walks of life, whether they’re from MIT or a school you’d never visit, are flocking to the platform, and we couldn’t be prouder.

school tier distribution

interviewing.io evaluates students based on their coding skills, not their resume. We are open to students regardless of their university affiliation, college major, and pretty much anything else (we ask for your class year to make sure you’re available when companies want you and that’s about it). Unlike traditional campus recruiting, we attract students organically (getting free practice with engineers from top companies is a pretty big draw) from schools big and small from across the country.

student heatmap

We’re also proud that almost 40 percent of our university candidates come from backgrounds that are underrepresented in tech.


Because of our completely blind, skills-first approach, we’ve seen an interesting phenomenon happen time and time again: when a student unmasks at the end of a successful interview, the company in question realizes that the student who just aced their technical phone screen was one whose resume was sitting at the bottom of the pile all along.

In addition to identifying top students who bring a different lived experience to tech, we’re excited about the economics of our model. With interviewing.io, a mid-sized startup can staff their entire intern class for the same cost as attending 1-2 career fairs at top schools… with a good chunk of those interns coming from underrepresented backgrounds. Want to hire interns and new grads in the most efficient, fair way possible? Sign up to be an employer on our university platform!

Meena runs interviewing.io’s university hiring platform. We help companies hire college students from all over the US, with a focus on diversity. Prior to joining interviewing.io, Meena was a software engineer at Clever, and before that, Meena was in college on the other side of the engineer interviewing equation.

1At least her school didn’t send out this.


We analyzed thousands of technical interviews on everything from language to code style. Here’s what we found.

Posted on June 13th, 2017.

Note: Though I wrote most of the words in this post, the legendary Dave Holtz did the heavy lifting on the data side. See more of his work on his blog.

If you’re reading this post, there’s a decent chance that you’re about to re-enter the crazy and scary world of technical interviewing. Maybe you’re a college student or fresh grad who is going through the interviewing process for the first time. Maybe you’re an experienced software engineer who hasn’t even thought about interviews for a few years. Either way, the first step in the interviewing process is usually to read a bunch of online interview guides (especially if they’re written by companies you’re interested in) and to chat with friends about their experiences with the interviewing process (both as an interviewer and interviewee). More likely than not, what you read and learn in this first, “exploratory” phase of the interview process will inform how you choose to prepare moving forward.

There are a few issues with this typical approach to interview preparation:

  • Most interview guides are written from the perspective of one company. While Company A may really value efficient code, Company B may place more of an emphasis on high-level problem-solving skills. Unless your heart is set on Company A, you probably don’t want to give too much weight to what they value.
  • People lie sometimes, even if they don’t mean to. In writing, companies may say they’re language agnostic, or that it’s worthwhile to explain your thought process, even if the answer isn’t quite right. However, it’s not clear if this is actually how they act! We’re not saying that tech companies are nefarious liars who are trying to mislead their applicant pool. We’re just saying that sometimes implicit biases sneak in and people aren’t even aware of them.
  • A lot of the “folk knowledge” that you hear from friends and acquaintances may not be based in fact at all. A lot of people assume that short interviews spell doom. Similarly, everyone can recall one long interview after which they’ve thought to themselves, “I really hit it off with that interviewer, I’ll definitely get passed on to the next stage.” In the past, we’ve seen that people are really bad at gauging how they did in interviews. This time, we wanted to look directly at indicators like interview length and see if those actually matter.

Here at interviewing.io, we are uniquely positioned to approach technical interviews and their outcomes in a data-driven way. This time, we’ve opted for a quick (if not dirty) and quantitative analysis. In other words, rather than digging deep into individual interviews, we focused on easily measurable attributes that many interviews share, like duration and language choice. In upcoming posts, we’ll be delving deeper into the interview content itself. If you’re new to our blog and want to get some context about how interviewing.io works and what interview data we collect, please take a look at the section called “The setup” below. Otherwise, please skip over that and head straight for the results!

The setup

interviewing.io is a platform where people can practice technical interviewing anonymously, and if things go well, unlock the ability to interview anonymously, whenever they’d like, with top companies like Uber, Lyft, and Twitch. The cool thing is that both practice interviews and real interviews with companies take place within the interviewing.io ecosystem. As a result, we’re able to collect quite a bit of interview data and analyze it to better understand technical interviews, the signal they carry, what works and what doesn’t, and which aspects of an interview might actually matter for the outcome.

Each interview, whether it’s practice or real, starts with the interviewer and interviewee meeting in a collaborative coding environment with voice, text chat, and a whiteboard, at which point they jump right into a technical question. Interview questions tend to fall into the category of what you’d encounter in a phone screen for a back-end software engineering role. During these interviews, we collect everything that happens, including audio transcripts, data and metadata describing the code that the interviewee wrote and tried to run, and detailed feedback from both the interviewer and interviewee about how they think the interview went and what they thought of each other.

If you’re curious, you can see what the feedback forms for interviewers and interviewees look like below — in addition to one direct yes/no question, we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of the things we ask is whether an interviewee has previously seen the question they just worked on.

Feedback form for interviewers

Feedback form for interviewees

The results

Before getting into the thick of it, it’s worth noting that the conclusions below are based on observational data, which means we can’t make strong causal claims… but we can still share surprising relationships we’ve observed and explain what we found so you can draw your own conclusions.

Having seen the interview question before

“We’re talking about practice!” -Allen Iverson

First things first. It doesn’t take a rocket scientist to suggest that one of the best ways to do better in interviews is to… practice interviewing. There are a number of resources out there to help you practice, ours among them. One of the main benefits of working through practice problems is that you reduce the likelihood of being asked to solve something you’ve never seen before. Balancing that binary search tree will be much less intimidating if you’ve already done it once or twice.

We looked at a sample of ~3000 interviews and compared the outcome to whether the interviewee had seen the interview question before. You can see the results in the plot below.

seen_interview_before_plot

Unsurprisingly, interviewees who had seen the question were 16.6% more likely to be considered hirable by their interviewer. This difference is statistically significant (p < 0.001).1
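If you want a feel for the math behind a claim like that, a chi-square test on the 2x2 table of outcomes is one standard way to check it. The counts below are invented for illustration and are not our data:

```python
# A hedged sketch of the kind of test behind a claim like "16.6% more likely,
# p < 0.001": compare hirable rates for people who had vs. hadn't seen the
# question, using a chi-square test on the 2x2 contingency table.
# The counts below are made up for illustration.
from scipy.stats import chi2_contingency

#                 hirable   not hirable
seen_question   = [  520,        780  ]
unseen_question = [  590,       1110  ]

chi2, p_value, dof, expected = chi2_contingency([seen_question, unseen_question])

seen_rate = seen_question[0] / sum(seen_question)
unseen_rate = unseen_question[0] / sum(unseen_question)
print(f"seen: {seen_rate:.1%}, unseen: {unseen_rate:.1%}, p = {p_value:.4g}")
```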

Does it matter what language you code in?

“Whoever does not love the language of his birth is lower than a beast and a foul smelling fish.” -Jose Rizal

You might imagine that different languages lead to better interviews. For instance, maybe the readability of Python gives you a leg up in interviews. Or perhaps the fact that certain languages handle data structures in a particularly clean way makes common interview questions easier. We wanted to see whether or not there were statistically significant differences in interview performance across different interview languages.

To investigate, we grouped interviews on our platform by interview language and filtered out any languages that were used in fewer than 5 interviews (this only threw out a handful of interviews). After doing this, we were able to look at interview outcome and how it varied as a function of interview language.
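In pandas terms, that grouping looks roughly like the sketch below. The column names are hypothetical, and the confidence intervals here use a simple normal approximation, which may not match exactly what we plotted:

```python
# A rough sketch of the grouping described above: success rate by interview
# language with a normal-approximation 95% confidence interval, dropping
# languages with fewer than 5 interviews. Column names are hypothetical.
import numpy as np
import pandas as pd

interviews = pd.read_csv("interviews.csv")  # columns: language, hirable (0/1)

by_language = (
    interviews.groupby("language")["hirable"]
    .agg(n="count", success_rate="mean")
    .query("n >= 5")
)
by_language["ci_95"] = 1.96 * np.sqrt(
    by_language["success_rate"] * (1 - by_language["success_rate"]) / by_language["n"]
)
print(by_language.sort_values("success_rate", ascending=False))
```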

The results of that analysis are in the chart below. Any non-overlapping confidence intervals represent a statistically significant difference in how likely an interviewee is to ‘pass’ an interview, as a function of interview language. Although we don’t do a pairwise comparison for every possible pair of languages, the data below suggest that, generally speaking, there aren’t statistically significant differences in success rates across interview languages.2

interview_varies_with_success_rate_plot

That said, one of the most common mistakes we’ve observed qualitatively is people choosing languages they’re not comfortable in and then messing up basic stuff like array length lookup, iterating over an array, instantiating a hash table, and so on. This is especially mortifying when interviewees purposely pick a fancy-sounding language to impress their interviewer. Trust us, wielding your language of choice comfortably beats out showing off in a fancy-sounding language you don’t know well, every time.
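If it’s been a while since you’ve written your language of choice, it’s worth rehearsing those basics before you’re on the spot. In Python, for instance:

```python
# The "basic stuff" above, in Python, for the record:
nums = [3, 1, 4, 1, 5]

length = len(nums)                 # array length lookup

for i, value in enumerate(nums):   # iterating over an array (with indices)
    print(i, value)

counts = {}                        # instantiating a hash table (dict)
for value in nums:
    counts[value] = counts.get(value, 0) + 1
```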

Even if language doesn’t matter… is it advantageous to code in the company’s language of choice?

“God help me, I’ve gone native.” -Margaret Blaine

It’s all well and good that, in general, interview language doesn’t seem particularly correlated with performance. However, you might imagine that there could be an effect depending on the language that a given company uses. You could imagine a Ruby shop saying “we only hire Ruby developers, if you interview in Python we’re less likely to hire you.” On the flip side, you could imagine that a company that writes all of their code in Python is going to be much more critical of an interviewee in Python – they know the ins and outs of the language, and might judge the candidate for doing all sorts of “non-pythonic” things during their interview.

The chart below is similar to the chart which showed differences in interview success rate (as measured by interviewers being willing to hire the interviewee) for C++, Java, and Python. However, this chart also breaks out performance by whether or not the interview language is in the company’s stack. We restrict this analysis to C++, Java, and Python because these are the three languages where we had a good mixture of interviews where the company did and did not use that language. The results here are mixed. When the interview language is Python or C++, there’s no statistically significant difference between the success rates for interviews where the interview language is or is not a language in the company’s stack. However, interviewees who interviewed in Java were more likely to succeed when interviewing with a Java shop (p=0.037).

So, why is it that coding in the company’s language seems to be helpful when it’s Java, but not when it’s Python or C++? One possible explanation is that the communities that exist around certain programming languages (such as Java) place a higher premium on previous experience with the language. Along these lines, it’s also possible that interviewers from companies that use Java are more likely to ask questions that favor those with a pre-existing knowledge of Java’s idiosyncrasies.

language_success_rate_company_plot

What about the relationship between what language you program in and how good of a communicator you’re perceived to be?

“To handle a language skillfully is to practice a kind of evocative sorcery.” -Charles Baudelaire

Even if language choice doesn’t matter that much for overall performance (Java-wielding companies notwithstanding), we were curious whether different language choices led to different outcomes in other interview dimensions. For instance, an extremely readable language, like Python, may lead to interview candidates who are assessed to have communicated better. On the other hand, a low-level language like C++ might lead to higher scores for technical ability. Furthermore, very readable or low-level languages might lead to correlations between these two scores (for instance, maybe they’re a C++ interview candidate who can’t explain at all what he or she is doing but who writes very efficient code). The chart below suggests that there isn’t really any observable difference between how candidates’ technical and communication abilities are perceived, across a variety of programming languages.

Furthermore, no matter what, poor technical ability seems highly correlated with poor communication ability – regardless of language, it’s relatively rare for candidates to perform well technically but not effectively communicate what they’re doing (or vice versa), largely (and fortunately) debunking the myth of the incoherent, fast-talking, awkward engineer.3

Interview duration

“It’s fine when you careen off disasters and terrifyingly bad reviews and rejection and all that stuff when you’re young; your resilience is just terrific.” -Harold Prince

We’ve all had the experience of leaving an interview and just feeling like it went poorly. Often, that feeling of certain underperformance is motivated by rules of thumb that we’ve either come up with ourselves or heard repeated over and over again. You might find yourself thinking, “the interview didn’t last long? That’s probably a bad sign… ” or “I barely wrote anything in that interview! I’m definitely not going to pass.” Using our data, we wanted to see whether these rules of thumb for evaluating your interview performance had any merit.

First, we looked at the length of the interview. Does a shorter interview mean you were such a trainwreck that the interviewer just had to stop the interview early? Or was it maybe the case that the interviewer had less time than normal, or had seen in just a short amount of time that you were an awesome candidate? The plot below shows the distributions of interview length (measured in minutes) for both successful and unsuccessful candidates. A quick look at this chart suggests that there is no difference in the distribution of interview lengths between interviews that go well and interviews that don’t — the average length of interviews where the interviewer wanted to hire the candidate was 51.00 minutes, whereas the average length of interviews where the interviewer did not was 49.95 minutes. This difference is not statistically significant.4

interview_duration_plot
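For the curious, the permutation test mentioned in footnote 4 boils down to something like the sketch below. The durations here are made up, and the real analysis ran over far more interviews than a handful:

```python
# A minimal sketch of a permutation test on interview durations: shuffle the
# "hire" labels many times and see how often the shuffled difference in mean
# interview length is at least as large as the observed one.
import numpy as np

rng = np.random.default_rng(0)
hired_lengths = np.array([48, 55, 51, 60, 45])      # minutes, illustrative
not_hired_lengths = np.array([50, 47, 52, 49, 51])  # minutes, illustrative

observed = hired_lengths.mean() - not_hired_lengths.mean()
pooled = np.concatenate([hired_lengths, not_hired_lengths])
n_hired = len(hired_lengths)

perm_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    perm_diffs.append(shuffled[:n_hired].mean() - shuffled[n_hired:].mean())

p_value = np.mean(np.abs(perm_diffs) >= abs(observed))
print(f"observed difference: {observed:.2f} min, p = {p_value:.3f}")
```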

Amount of code written

“Brevity is the soul of wit.” -William Shakespeare

You may have experienced an interview where you were totally stumped. The interviewer asks you a question you barely understand, you repeat back to him or her “binary search what?”, and you basically write no code during your interview. You might hope that you could still pass an interview like this through sheer wit, charm, and high-level problem-solving skills. In order to assess whether or not this was true, we looked at the final character length of code written by the interviewee. The plot below shows the distributions of character length for both successful and unsuccessful interviews. A quick look at this chart suggests that there is a difference between the two — interviews that don’t go well tend to have less code. There are two phenomena that may contribute to this. First, unsuccessful interviewees may write less code to begin with. Additionally, they may be more prone to delete large swathes of code they’ve written that either don’t run or don’t return the expected result.

interview_code_length_plot

On average, successful interviews had final code that was 2045 characters long, whereas unsuccessful ones averaged 1760 characters. That’s a big difference! This finding is statistically significant and probably not very surprising.

Code modularity

“The mark of a mature programmer is willingness to throw out code you spent time on when you realize it’s pointless.” -Bram Cohen

In addition to just looking at how much code you write, we can also think about the type of code you write. Conventional wisdom suggests that good programmers don’t recycle code – they write modular code that can be reused over and over again. We wanted to know if that type of behavior was actually rewarded during the interview process. In order to do so, we looked at interviews conducted in Python5 and counted how many function definitions appeared in the final version of the interview. We wanted to know if successful interviewees defined more functions — while a higher count of function definitions is not the definition of modularity, in our experience, it’s a pretty strong signal of it. As always, it’s impossible to make strong causal claims about this – it might be the case that certain interviewers (who are more or less lenient) ask interview questions that lend themselves to more or fewer functions. Nonetheless, it is an interesting trend to investigate!
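For reference, here’s one way you might count function definitions in a blob of interview code. This is an illustrative sketch rather than our exact parsing script:

```python
# One way to count function definitions in a snippet of interview code,
# assuming the final code is available as a string. (Falling back to a regex
# if the code doesn't parse, since interview code often doesn't.)
import ast
import re

def count_function_defs(source):
    try:
        tree = ast.parse(source)
        return sum(
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            for node in ast.walk(tree)
        )
    except SyntaxError:
        return len(re.findall(r"^\s*def\s+\w+\s*\(", source, flags=re.MULTILINE))

example = """
def helper(x):
    return x * 2

def solve(nums):
    return [helper(n) for n in nums]
"""
print(count_function_defs(example))  # 2
```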

The plot below shows the distribution of the number of Python functions defined for both candidates who the interviewer said they would hire and candidates who the interviewer said they would not hire. A quick look at this chart suggests that there is a difference in the distribution of function definitions between interviews that go well and interviews that don’t. Successful interviewees seem to define more functions.

python_functions_plot

On average, successful candidates interviewing in Python define 3.29 functions, whereas unsuccessful candidates define 2.71 functions. This finding is statistically significant. The upshot here is that interviewers really do reward the kind of code they say they want you to write.

Does it matter if your code runs?

“Move fast and break things. Unless you are breaking stuff, you are not moving fast enough.” -Mark Zuckerberg
“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.” -Brian Kernighan

A common refrain in technical interviews is that interviewers don’t actually care if your code runs – what they care about is problem-solving skills. Since we collect data on the code interviewees run and whether or not that code compiles, we wanted to see if there was evidence for this in our data. Is there any difference between the percentage of code that compiles error-free in successful interviews versus unsuccessful interviews? Furthermore, can interviewees actually still get hired, even if they make tons of syntax errors?

In order to get at this question, we looked at the data. We restricted our dataset to interviews longer than 10 minutes with more than 5 unique instances of code being executed. This helped filter out interviews where interviewers didn’t actually want the interviewee to run code, or where the interview was cut short for some reason. We then measured the percent of code runs that resulted in errors.6 Of course, there are some limitations to this approach – for instance, candidates could execute code that does compile but gives a slightly incorrect answer. They could also get the right answer and write it to stderr! Nonetheless, this should give us a directional sense of whether or not there’s a difference.
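Concretely, the per-interview measurement might look something like the sketch below, using the heuristic from footnote 6. The data structures here are hypothetical stand-ins for what we actually store:

```python
# A rough sketch of the error-rate measurement: for each interview, look at
# every code execution, flag it as an error if it raised or if its output
# mentions "error"/"traceback" (the heuristic from footnote 6), then bucket
# the interview by its error-free percentage. Data structures are hypothetical.
def run_is_error(run):
    output = (run.get("output") or "").lower()
    return run.get("raised", False) or "error" in output or "traceback" in output

def error_free_bucket(runs):
    if len(runs) <= 5:
        return "excluded"  # too few executions to say much
    error_free = sum(not run_is_error(r) for r in runs) / len(runs)
    low = min(int(error_free * 10) * 10, 90)
    return f"{low}%-{low + 10}%"

runs = [
    {"output": "Traceback (most recent call last): ...", "raised": True},
    {"output": "[1, 2, 3]", "raised": False},
    {"output": "42", "raised": False},
    {"output": "NameError: name 'n' is not defined", "raised": False},
    {"output": "[1, 2, 3]", "raised": False},
    {"output": "[1, 2, 3]", "raised": False},
]
print(error_free_bucket(runs))  # 4 of 6 runs error-free -> "60%-70%"
```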

The chart below gives a summary of this data. The x-axis shows the percentage of code executions that were error-free in a given interview. So an interview with 3 code executions and 1 error message would count towards the “60%-70%” bucket. The y-axis indicates the percentage of all interviews that fall in that bucket, for both successful and unsuccessful interviews. Just eyeballing the chart below, one gets the sense that on average, successful candidates run more code that goes off without an error. But is this difference statistically significant?

does_code_compile2

On average, successful candidates’ code ran successfully (didn’t result in errors) 64% of the time, whereas unsuccessful candidates’ code ran successfully 60% of the time, and this difference was indeed significant. Again, while we can’t make any causal claims, the main takeaway is that successful candidates do usually write code that runs better, despite what interviewers may tell you at the outset of an interview.

Should you wait and gather your thoughts before writing code?

“Never forget the power of silence, that massively disconcerting pause which goes on and on and may at last induce an opponent to babble and backtrack nervously.” -Lance Morrow

We were also curious whether or not successful interviewees tended to take their time in the interview. Interview questions are often complex! After being presented with a question, there might be some benefit to taking a step back and coming up with a plan, rather than jumping right into things. In order to get a sense of whether or not this was true, we measured how far into a given interview candidates first executed code. Below is a histogram showing how far into interviews both successful and unsuccessful interviewees first ran code. Looking quickly at the histogram, you can tell that successful candidates do in fact wait a bit longer to start running code, although the magnitude of the effect isn’t huge.

how_soon_run_code_plot

More specifically, on average, candidates with successful interviews first run code 27% of the way through the interview, whereas candidates with unsuccessful interviews first run code 23.9% of the way into the interview, and this difference is significant. Of course, there are alternate explanations for what’s happening here. For instance, perhaps successful candidates are better at taking the time to sweet-talk their interviewer. Furthermore, the usual caveat that we can’t make causal claims applies – if you just sit in an interview for an extra 5 minutes in complete silence, it won’t help your chances. Nonetheless, there does seem to be a difference between the two cohorts.

Conclusions

All in all, this post was our first attempt to understand what does and does not typically lead to an interviewer saying “you know what, I’d really like to hire this person.” Because all of our data are observational, it’s hard to make causal claims about what we see. While successful interviewees may exhibit certain behaviors, adopting those behaviors doesn’t guarantee success. Nonetheless, it does allow us to support (or call bullshit on) a lot of the advice you’ll read on the internet about how to be a successful interviewee.

That said, there is much still to be done. This was a first, quantitative pass over our data (which is, in many ways, a treasure trove of interview secrets), but we’re excited to do a deeper, qualitative dive and actually start to categorize different questions to see which carry the most signal, as well as really get our heads around second-order behaviors that you can’t measure easily by running a regex over a code sample or measuring how long an interview took. If you want to help us with this and are excited to listen to a bunch of technical interviews, drop me a line (at aline@interviewing.io)!

1All error bars in this post represent a 95% confidence interval.

2There were more languages than these on our platform, but the more obscure the language, the less data points we have. For instance, all interviews in Brainfuck were clearly successful. Kidding.

3The best engineers I’ve met have also been legendarily good at breaking down complex concepts and explaining them to laypeople. Why the infuriating myth of the socially awkward, incoherent tech nerd continues to exist, I have absolutely no idea.

4For every comparison of distributions in this post, we use a Fisher-Pitman permutation test to compare the difference in the means of the distributions.

5We limit this analysis to interviews in Python because it lends itself particularly well to the identification of function definitions with a simple parsing script.

6We calculate this by looking at what percentage of the time the interviewee executed code that either resulted in an error or produced output containing the term “error” or “traceback.”


LinkedIn endorsements are dumb. Here’s the data.

Posted on February 27th, 2017.

If you’re an engineer who’s been endorsed on LinkedIn for any number of languages/frameworks/skills, you’ve probably noticed that something isn’t quite right. Maybe they’re frameworks you’ve never touched or languages you haven’t used since freshman year of college. No matter the specifics, you’re probably at least a bit wary of the value of the LinkedIn endorsements feature. The internets, too, don’t disappoint in enumerating some absurd potential endorsements or in bemoaning the lack of relevance of said endorsements, even when they’re given in earnest.

Having a gut feeling for this is one thing, but we were curious about whether we could actually come up with some numbers that showed how useless endorsements can be, and we weren’t disappointed. If you want graphs and numbers, scroll down to the “Here’s the data” section below. Otherwise, humor me and read my completely speculative take on why endorsements exist in the first place.

LinkedIn endorsements are just noisy crowdsourced tagging

Pretend for a moment that you’re a recruiter who’s been tasked with filling an engineering role. You’re one of many people who pays LinkedIn ~$9K/year for a recruiter seat on their platform1. That hefty price tag broadens your search radius (which is otherwise artificially constrained) and lets you search the entire system. Let’s say you have to find a strong back-end engineer. How do you begin?

Unfortunately, LinkedIn’s faceted search (pictured below) doesn’t come with a “can code” filter2.

So, instead of searching for what you really want, you have to rely on proxies. Some obvious proxies, even though they’re not that great, might be where someone went to school or where they’ve worked before. However, if you need to look for engineering ability, you’re going to have to get more specific. If you’re like most recruiters, you’ll first look for the main programming language your company uses (despite knowledge of a specific language not being a good indicator of programming ability and despite most hiring managers not caring which languages their engineers know) and then go from there.

Now pretend you’re LinkedIn. You have no data about how good people are at coding, and though you do have a lot of resume/biographical data, that doesn’t tell the whole story. You can try relying on engineers filling in their own profiles with languages they know, but given that engineers tend to be pretty skittish about filling in their LinkedIn profile with a bunch of buzzwords, what do you do?

You build a crowdsourced tagger, of course! Then, all of a sudden, your users will do your work for you. Why do I think this is the case? Well, if LinkedIn cared about true endorsements rather than perpetuating the skills-based myth that keeps recruiters in their ecosystem, they could have written a weighted endorsement system by now, at the very least. That way, an endorsement from someone with expertise in some field might mean more than an endorsement from your mom (unless, of course, she’s an expert in the field).
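To be clear about what I mean, here’s a purely hypothetical sketch of such a weighting scheme; nothing here reflects how LinkedIn actually computes anything:

```python
# Purely hypothetical: the kind of weighted endorsement score one could
# compute, where an endorsement counts more when the endorser is themselves
# versed in the same skill or has actually worked with the endorsee.
def weighted_endorsement_score(endorsers, skill):
    score = 0.0
    for endorser in endorsers:
        weight = 1.0
        if skill in endorser.get("own_top_skills", []):
            weight += 2.0          # endorsed by a practitioner of the skill
        if endorser.get("worked_with_endorsee"):
            weight += 1.0          # endorsed by an actual colleague
        score += weight
    return score

python_endorsers = [
    {"own_top_skills": ["Python", "Django"], "worked_with_endorsee": True},
    {"own_top_skills": ["Recruiting"], "worked_with_endorsee": False},  # your mom, roughly
]
print(weighted_endorsement_score(python_endorsers, "Python"))  # 5.0
```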

But they don’t do that, or at least they don’t surface it in candidate search. It’s not worth it. Because the point of endorsements isn’t to get at the truth. It’s to keep recruiters feeling like they’re getting value out of the faceted search they’re paying almost $10K per seat for. In other words, improving the fidelity of endorsements would likely cannibalize LinkedIn’s revenue.

You could make the counterargument that despite the noise, LinkedIn endorsements still carry enough signal to be a useful first-pass filter and that having them is more useful than not having them. This is the question I was curious about, so I decided to cross-reference our users’ interview data with their LinkedIn endorsements.

The setup

So, what data do we have? First, for context, interviewing.io is a platform where people can practice technical interviewing anonymously with interviewers from top companies and, in the process, find jobs. Do well in practice, and you get guaranteed (and anonymous!) technical interviews at companies like Uber, Twitch, Lyft, and more. Over the course of our existence, we’ve amassed performance data from close to 5,000 real and practice interviews.

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role. Some examples of these interviews can be found on our public recordings page.

After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!”. On our platform, a score of 3 or above has generally meant that the person was good enough to move forward. You can see what our feedback form looks like below:

new_interviewer_feedback_circled

As promised, I cross-referenced our data with our users’ LinkedIn profiles and found some interesting, albeit not that surprising, stuff.

Endorsements vs. what languages people actually program in

The first thing I looked at was whether the programming language people interviewed in most frequently had any relationship to the programming language for which they were most endorsed. It was nice that, across the board, people tended to prefer one language for their interviews, so we didn’t really have a lot of edge cases to contend with.

It turns out that people’s interview language of choice matched their most endorsed language on LinkedIn just under 50% of the time.

Of course, just because you’ve been endorsed a lot for a specific language doesn’t mean that you’re not good at the other languages you’ve been endorsed for. To dig deeper, I took a look at whether our users had been endorsed for their interview language of choice at all. It turns out that people were endorsed for their language of choice 72% of the time. This isn’t a particularly powerful statement, though, because most people on our platform have been endorsed for at least 5 programming languages.

That said, even when an engineer had been endorsed for their interview language of choice, that language appeared in their “featured skills” section only 31% of the time. This means that most of the time, recruiters would have to click “View more” (see below) to see the language that people prefer to code in, if it’s even listed in the first place.

So, how heavily were people endorsed for their language of choice? Quantifying endorsements3 is a bit fuzzy, but to answer this meaningfully, I looked at how often people were endorsed for that language relative to how often they were endorsed for their most-endorsed language, in the cases when the two languages weren’t the same (recall that this happened about half the time). Perhaps if these numbers were close to 1 most of the time, then endorsements might carry some signal. As you can see in the histogram below, this was not the case at all.

The x-axis above is how often people were endorsed for their interview language of choice relative to their most-endorsed language. The bars on the left are cases when someone was barely endorsed for their language of choice, and the bars all the way to the right are cases when people were endorsed for both languages equally often. All told, the distribution is actually pretty uniform, making for more noise than signal.
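If you wanted to reproduce this kind of comparison on your own data, the per-person ratio is simple to compute. The profile fields below are hypothetical:

```python
# A sketch of the ratio behind the histogram above: endorsements for the
# candidate's interview language of choice, divided by endorsements for their
# most-endorsed language (only when the two differ). Fields are hypothetical.
def endorsement_ratio(profile):
    endorsements = profile["endorsements"]      # e.g. {"Java": 31, "Python": 4}
    choice = profile["interview_language"]      # e.g. "Python"
    top_language = max(endorsements, key=endorsements.get)
    if top_language == choice:
        return None  # excluded from this comparison
    return endorsements.get(choice, 0) / endorsements[top_language]

profile = {"endorsements": {"Java": 31, "Python": 4}, "interview_language": "Python"}
print(endorsement_ratio(profile))  # ~0.13, i.e. barely endorsed for their language of choice
```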

Endorsements vs. interview performance

The next thing I looked at was whether there was any correspondence between how heavily endorsed someone was on LinkedIn and their interview performance. This time, to quantify the strength of someone’s endorsements4, I looked at how many times someone was endorsed for their most-endorsed language and correlated that to their average technical score in interviews on interviewing.io.

Below, you can see a scatter plot of technical ability vs. LinkedIn endorsements, as well as my attempt to fit a line through it. As you can see, the R^2 is piss-poor, meaning that there isn’t a relationship between how heavily endorsed someone is and their technical ability to speak of.
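The fit itself is nothing fancy; here’s roughly what it looks like with scipy, with placeholder arrays standing in for the real data:

```python
# Regress average technical score on endorsement count and report R^2.
# The arrays below are placeholders, not our data.
import numpy as np
from scipy.stats import linregress

endorsement_counts = np.array([2, 5, 9, 14, 22, 40, 63, 99])
avg_technical_scores = np.array([3.1, 2.4, 2.9, 3.3, 2.2, 3.0, 2.6, 3.2])

fit = linregress(endorsement_counts, avg_technical_scores)
print(f"slope = {fit.slope:.3f}, R^2 = {fit.rvalue ** 2:.3f}")
```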

Endorsements vs. no endorsements… and closing thoughts

Lastly, I took a look at whether having any endorsements in the first place mattered with respect to interview performance. If I’m honest, I was hoping there’d be a negative correlation, i.e. if you don’t have endorsements, you’re a better coder. After running some significance testing, though, it became clear that having any endorsements at all (or not) doesn’t matter.

So, where does this leave us? As long as there’s money to be made in peddling low-signal proxies, endorsements won’t go away and probably won’t get much better. It is my hope, though, that any recruiters reading this will take a second look at the candidates they’re sourcing and try to, where possible, look at each candidate as more than the sum of their buzzword parts.

Thanks to Liz Graves for her help with the data annotation for this post.

1Roughly 60% of LinkedIn’s revenue comes from recruiting, so you can see why this stuff matters.

2You know what comes with a “can code” filter? interviewing.io does! We know how people do in rigorous, live technical interviews, which, in turn, lets us reliably predict how well they will do in future interviews. Roughly 60% of our candidates pass technical phone screens and make it onsite. Want to use us to hire?

3There are a lot of possible approaches to comparing endorsements, to each other and to other stuff. In this post, I decided to, as much as possible, mimic how a recruiter might think about a candidate’s endorsements when looking at their profile. Recruiters are busy (I know; I used to be one) and get paid to make quick judgments. Therefore, given that LinkedIn doesn’t normalize endorsements for you, if a recruiter wanted to do it, they’d have to actually add up all of someone’s endorsements and then do a bunch of pairwise division. This isn’t sustainable, and it’s much easier and faster to look at the absolute numbers. For this exact reason, when comparing the endorsements for two languages, I chose to normalize them relative to each other rather than relative to all other endorsements. And when trying to quantify the strength of someone’s programming endorsements as a whole, I opted to just count the number of endorsements for someone’s most-endorsed language.

4See footnote 3 above; I used the same rationale.


Lessons from 3,000 technical interviews… or how what you do after graduation matters way more than where you went to school

Posted on December 28th, 2016.

The first blog post I published that got any real attention was called “Lessons from a year’s worth of hiring data“. It was my attempt to understand what attributes of someone’s resume actually mattered for getting a software engineering job. Surprisingly, as it turned out, where someone went to school didn’t matter at all, and by far and away, the strongest signal came from the number of typos and grammatical errors on their resume.

Since then, I’ve discovered (and written about) how useless resumes are, but ever since writing that first post, I’ve been itching to do something similar with interviewing.io’s data. For context, interviewing.io is a platform where people can practice technical interviewing anonymously and, in the process, find jobs — do well in practice, and you get guaranteed (and anonymous!) technical interviews at companies like Uber, Twitch, Lyft, and more. Over the course of our existence, we’ve amassed performance data from thousands of real and practice interviews. Data from these interviews sets us up nicely to look at what signals from an interviewee’s background might matter when it comes to performance.

As often happens, what we found was surprising, and some of it runs counter to things I’ve said and written on the subject. More on that in a bit.

The setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question (check out our recordings page to see this in action). Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from a mix of large companies like Google, Facebook, and Uber, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more.

After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!”. On our platform, a score of 3 or above has generally meant that the person was good enough to move forward. You can see what our feedback form looks like below:

Interviewer feedback form, with the technical score circled

The results

To run the analysis for this post, we cross-referenced interviewees’ average technical scores (circled in red in the feedback form above) with the attributes below to see which ones mattered most. Here’s the full attribute list1:

  • Attended a top computer science school
  • Worked at a top company
  • Took classes on Udacity/Coursera2
  • Founded a startup
  • Master’s degree
  • Years of experience

Of all of these, only 3 attributes emerged as statistically significant: top school, top company, and classes on Udacity/Coursera. Apparently, as the fine gentlemen of Metallica once said, nothing else matters. In the graph below, you can see the effect size of each of the significant attributes (attributes that didn’t achieve significance don’t have bars).
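
For the statistically inclined: one way to get effect sizes and significance for a mix of binary and numeric attributes like these is a plain linear regression of average technical score on all of them at once. The snippet below simulates data with that shape (every number in it is made up) just to show the mechanics, not to reproduce our results:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000  # simulated interviewees; the real analysis used our interview data

# Simulated frame with the attributes from the list above. The effect sizes
# baked in here are arbitrary and exist only so the example runs end to end.
df = pd.DataFrame({
    "top_school": rng.integers(0, 2, n),
    "top_company": rng.integers(0, 2, n),
    "mooc": rng.integers(0, 2, n),
    "founder": rng.integers(0, 2, n),
    "masters": rng.integers(0, 2, n),
    "years_experience": rng.integers(1, 15, n),
})
df["avg_technical_score"] = (
    2.5 + 0.10 * df.top_school + 0.20 * df.top_company + 0.25 * df.mooc
    + rng.normal(0, 0.5, n)
).clip(1, 4)

model = smf.ols(
    "avg_technical_score ~ top_school + top_company + mooc"
    " + founder + masters + years_experience",
    data=df,
).fit()
print(model.summary())  # coefficients are effect sizes; check p-values for significance
```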

As I said at the outset, these results were quite surprising, and I’ll take a stab at explaining each of the outcomes below.

Top school & top company

Going into this, I expected top company to matter but not top school. The company thing makes sense — you’re selecting people who’ve successfully been through at least one interview gauntlet, so the odds of them succeeding at future ones should be higher.

Top school is a bit more muddy, and it was indeed the least impactful of the significant attributes. Why did schooling matter in this iteration of the data but didn’t matter when I was looking at resumes? I expect the answer lies in the disparity between performance in an isolated technical phone screen versus what happens when a candidate actually goes on site. With the right preparation, the technical phone interview is manageable, and top schools often have rigorous algorithms classes and a culture of preparing for technical phone screens (to see why this culture matters and how it might create an unfair advantage for those immersed in it, see my post about how we need to rethink the technical interview). Whether passing an algorithmic technical phone screen means you’re a great engineer is another matter entirely and hopefully the subject of a future post.

Udacity/Coursera

MOOC participation (Udacity and Coursera in particular, as those were the ones interviewing.io users gravitated to most) mattering as much as it did (and mattering way more than pedigree) was probably the most surprising finding here, and so it merited some additional digging.

In particular, I was curious about the interplay between MOOCs and top schools, so I partitioned MOOC participants into people who had attended top schools vs. people who hadn’t. When I did that, something startling emerged. For people who attended top schools, completing Udacity or Coursera courses didn’t appear to matter. However, for people who did not, the effect was huge, so huge, in fact, that it dominated the board. Moreover, interviewees who attended top schools performed significantly worse than interviewees who had not attended top schools but HAD taken a Udacity or Coursera course.
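
Mechanically, that partition is just a group-by on two boolean columns. Here’s a self-contained sketch with simulated numbers (including a baked-in interaction so the output looks something like what I described, rather than our real data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000  # simulated interviewees

df = pd.DataFrame({
    "top_school": rng.integers(0, 2, n),
    "mooc": rng.integers(0, 2, n),
})
# Simulate the interaction described above: MOOCs help, but mostly for people
# who didn't attend a top school. These effect sizes are invented.
df["avg_technical_score"] = (
    2.6 + 0.1 * df.top_school + 0.3 * df.mooc * (1 - df.top_school)
    + rng.normal(0, 0.5, n)
).clip(1, 4)

print(
    df.groupby(["top_school", "mooc"])["avg_technical_score"]
      .agg(["mean", "count"])
)
```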

So, what does this mean? Of course (as you’re probably thinking to yourself while you read this), correlation doesn’t imply causation. As such, rather than MOOCs being a magic pill, I expect that people who gravitate toward online courses (and especially those who might have a chip on their shoulder about their undergrad pedigree and end up drinking from the MOOC firehose) already tend to be abnormally driven. But, even with that, I’d be hard pressed to say that completing great online CS classes isn’t going to help you become a better interviewee, especially if you didn’t have the benefit of a rigorous algorithms class up until then. Indeed, a lot of the courses we saw people take focused around algorithms, so it’s no surprise that supplementing your preparation with courses like this could be tremendously useful. Some of the most popular courses we saw were:

Udacity
Design of Computer Programs
Intro to Algorithms
Computability, Complexity & Algorithms

Coursera
Algorithms Specialization
Functional Programming Principles in Scala
Machine Learning
Algorithms on Graphs

Founder status

Having been a founder didn’t matter at all when it came to technical interview performance. This, too, isn’t that surprising. The things that make one a good founder are not necessarily the things that make one a good engineer, and if you just came out of running a startup and are looking to get back into an individual contributor role, odds are, your interview skills will be a bit rusty. This is, of course, true of folks who’ve been in industry but out of interviewing for some time, as you’ll see below.

Master’s degree & years of experience

No surprises here. I’ve ranted quite a bit about the disutility of master’s degrees, so I won’t belabor the point.

Years of experience, too, shouldn’t be that surprising. For context, our average user has about 5 years of experience, with most having between 2 and 10. I think we’ve all anecdotally observed that the time spent away from your schooling doesn’t do you any favors when it comes to interview prep. You can see a scatter plot of interview performance vs. years of experience below as well as my attempt to fit a line through it (as you can see, the R^2 is piss poor, meaning that there isn’t a relationship to speak of).

Closing thoughts

If you know me, or even if you’ve read some of my writing, you know that, in the past, I’ve been quite loudly opposed to the concept of pedigree as a useful hiring signal. With that in mind, I feel like I owe it to you to clearly acknowledge, up front, that what we found this time runs counter to my stance. But that’s the whole point, isn’t it? You live, you get some data, you make some graphs, you learn, you make new graphs, and you adjust. Even with this new data, I’m excited to see that what mattered way more than pedigree was the actions people took to better themselves (in this case, rounding out their existing knowledge with MOOCs), regardless of their background.

Most importantly, these findings have done nothing to change interviewing.io’s core mission. We’re creating an efficient and meritocratic way for candidates and companies to find each other, and as long as you can code, we couldn’t care less about who you are or where you come from. In our ideal world, all these conversations about which proxies matter more than others would be moot non-starters because coding ability would stand for, well, coding ability. And that’s the world we’re building.

Thanks to Roman Rivilis for his help with data annotation for this post.

1For fun, we tried relating browser and operating system choice to interview performance, (smugly) expecting Chrome users to dominate. Not so. Browser choice didn’t matter, nor did what OS people used while interviewing.

2We got this data from looking at interviewees’ LinkedIn profiles.


You can’t fix diversity in tech without fixing the technical interview.

Posted on November 2nd, 2016.

In the last few months, several large players, including Google and Facebook, have released their latest and ultimately disappointing diversity numbers. Even with increased effort and resources poured into diversity hiring programs, Facebook’s headcount for women and people of color hasn’t really increased in the past 3 years. Google’s numbers have looked remarkably similar, and both players have yet to make significant impact in the space, despite a number of initiatives spanning everything from a points system rewarding recruiters for bringing in diverse candidates, to increased funding for tech education, to efforts to hire more diverse candidates in key leadership positions.

Why have gains in diversity hiring been so lackluster across the board?

Facebook justifies these disappointing numbers by citing the ubiquitous pipeline problem, namely that not enough people from underrepresented groups have access to the education and resources they need to be set up for success. And Google’s take appears to be similar, judging from what portion of their diversity-themed, forward-looking investments are focused on education.

In addition to blaming the pipeline, since Facebook’s and Google’s announcements, a growing flurry of conversations have loudly waxed causal about the real reason diversity hiring efforts haven’t worked. These have included everything from how diversity training isn’t sticky enough, to how work environments remain exclusionary and thereby unappealing to diverse candidates, to improper calibration of performance reviews, to not accounting for how marginalized groups actually respond to diversity-themed messaging.

While we are excited that more resources are being allocated to education and inclusive workplaces, at interviewing.io, we posit another reason for why diversity hiring initiatives aren’t working. After drawing on data from thousands of technical interviews, it’s become clear to us that technical interviewing is a process whose results are nondeterministic and often arbitrary. We believe that technical interviewing is a broken process for everyone but that the flaws within the system hit underrepresented groups the hardest… because they haven’t had the chance to internalize just how much of technical interviewing is a numbers game. Getting a few interview invites here and there through increased diversity initiatives isn’t enough. It’s a beginning, but it’s not enough. It takes a lot of interviews to get used to the process and the format and to understand that the stuff you do in technical interviews isn’t actually the stuff you do at work every day. And it takes people in your social circle all going through the same experience, screwing up interviews here and there, and getting back on the horse to realize that poor performance in one interview isn’t predictive of whether you’ll be a good engineer.

A brief history of technical interviewing

A definitive work on the history of technical interviewing was surprisingly hard to find, but I was able to piece together a narrative by scouring books like How Would You Move Mount Fuji, Programming Interviews Exposed, and the bounty of the internets. The story goes something like this.

Technical interviewing has its roots as far back as 1950s Palo Alto, at Shockley Semiconductor Laboratories. Shockley’s interviewing methodology came out of a need to separate the innovative, rapidly moving, Cold War-fueled tech space from the hiring approaches taken in more traditionally established, skills-based, assembly-line industries. And so, he relied on questions that could quickly gauge analytical ability, intellect, and potential. One canonical question in this category has to do with coins. You have 8 identical-looking coins, except one is lighter than the rest. Figure out which one it is with just two weighings on a pan balance.
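
If you want to convince yourself the coin question has a clean answer, here’s a quick sketch of the classic two-weighing solution (weigh 3 against 3, then narrow down). This is just an illustration of the puzzle, not anything Shockley actually handed out:

```python
def find_light_coin(weights):
    """Locate the single lighter coin among 8 using two pan-balance weighings."""
    def weigh(left, right):
        # -1: left pan is lighter, +1: right pan is lighter, 0: balanced
        l = sum(weights[i] for i in left)
        r = sum(weights[i] for i in right)
        return -1 if l < r else (1 if l > r else 0)

    # First weighing: coins 0-2 vs. coins 3-5, leaving coins 6 and 7 aside.
    first = weigh([0, 1, 2], [3, 4, 5])
    if first == 0:
        # The light coin is 6 or 7; the second weighing compares them directly.
        return 6 if weigh([6], [7]) == -1 else 7
    group = [0, 1, 2] if first == -1 else [3, 4, 5]
    # Second weighing: two coins from the lighter group against each other.
    second = weigh([group[0]], [group[1]])
    if second == 0:
        return group[2]
    return group[0] if second == -1 else group[1]

# Sanity check: the light coin is found no matter where it hides.
for light in range(8):
    coins = [10] * 8
    coins[light] = 9
    assert find_light_coin(coins) == light
```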

The techniques that Shockley developed were adapted by Microsoft during the 90s, as the first dot-com boom spurred an explosion in tech hiring. Facing the same constraints Shockley had, namely high volume and a high bar for analytical ability and adaptability, Microsoft, too, needed to vet people quickly for potential — as software engineering became increasingly complex over the course of the dot-com boom, it was no longer possible to have a few centralized “master programmers” manage the design and then delegate away the minutiae. Even rank and file developers needed to be able to produce under a variety of rapidly evolving conditions, where mastery of specific skills alone wasn’t enough.

The puzzle format, in particular, was easy to standardize because individual hiring managers didn’t have to come up with their own interview questions, and a company could quickly build up its own interchangeable question repository.

This mentality also applied to the interview process itself — rather than having individual teams run their own processes and pipelines, it made much more sense to standardize things. This way, in addition to questions, you could effectively plug and play the interviewers themselves — any interviewer within your org could be quickly trained up and assigned to speak with any candidate, independent of prospective team.

Puzzle questions were a good solution for this era for a different reason. Collaborative editing of documents didn’t become a thing until Google Docs’ launch in 2007. Without that capability, writing code in a phone interview was untenable — if you’ve ever tried to talk someone through how to code something up without at least a shared piece of paper in front of you, you know how painful it can be. In the absence of being able to write code in front of someone, the puzzle question was a decent proxy. Technology marched on, however, and its evolution made it possible to move from the proxy of puzzles to more concrete, coding-based interview questions. Around the same time, Google itself publicly disavowed the efficacy of puzzle questions.

So where does this leave us? Technical interviews are moving in the direction of more concreteness, but they are still very much a proxy for the day-to-day work that a software engineer actually does. The hope was that the proxy would be decent enough, but it was always understood that that’s what they were and that the cost-benefit of relying on a proxy worked out in cases where problem solving trumped specific skills and where the need for scale trumped everything else.

As it happens, elevating problem-solving ability and the need for a scalable process are both eminently reasonable motivations. But here’s the unfortunate part: the second reason, namely the need for scalability, doesn’t apply in most cases. Very few companies are large enough to need plug and play interviewers. But coming up with interview questions and processes is really hard, so despite their differing needs, smaller companies often take their cues from the larger players, not realizing that companies like Google are successful at hiring because the work they do attracts an assembly line of smart, capable people… and that their success at hiring is often despite their hiring process and not because of it. So you end up with a de facto interviewing cargo cult, where smaller players blindly mimic the actions of their large counterparts and blindly hope for the same results.

The worst part is that these results may not even be repeatable… for anyone. To show you what I mean, I’ll talk a bit about some data we collected at interviewing.io.

Technical interviewing is broken for everybody

Interview outcomes are kind of arbitrary
interviewing.io is a platform where people can practice technical interviewing anonymously and, in the process, find jobs. Interviewers and interviewees meet in a collaborative coding environment and jump right into a technical interview question. After each interview, both sides rate one another, and interviewers rate interviewees on their technical ability. And the same interviewee can do multiple interviews, each of which is with a different interviewer and/or different company, and this opens the door for some interesting and somewhat controlled comparative analysis.

We were curious to see how consistent the same interviewee’s performance was from interview to interview, so we dug into our data. After looking at thousands of interviews on the platform, we’ve discovered something alarming: interviewee performance from interview to interview varied quite a bit, even for people with a high average performance. In the graph below, every icon represents the mean technical score for an individual interviewee who has done 2 or more interviews on interviewing.io. The y-axis is standard deviation of performance, so the higher up you go, the more volatile interview performance becomes.

As you can see, roughly 25% of interviewees are consistent in their performance, but the rest are all over the place. And over a third of people with a high mean (>=3) technical performance bombed at least one interview.
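
Computing the quantities behind that graph is straightforward: group interviews by interviewee and take the mean and standard deviation of their 1-4 technical scores. A toy version with invented scores (the real one runs over thousands of interviews):

```python
import pandas as pd

# Toy interview log: one row per interview, with an anonymous interviewee id
# and the 1-4 technical score the interviewer gave.
interviews = pd.DataFrame({
    "interviewee_id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "technical_score": [4, 2, 3, 3, 3, 2, 4, 1, 3],
})

per_person = (
    interviews.groupby("interviewee_id")["technical_score"]
    .agg(mean="mean", std="std", n="count")
    .query("n >= 2")  # only people with 2+ interviews, as in the graph
)
print(per_person)
print(f"perfectly consistent: {(per_person['std'] == 0).mean():.0%}")
```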

Despite the noise, from the graph above, you can make some guesses about which people you’d want to interview. However, keep in mind that each person above represents a mean. Let’s pretend that, instead, you had to make a decision based on just one data point. That’s where things get dicey. Looking at this data, it’s not hard to see why technical interviewing is often perceived as a game. And, unfortunately, it’s a game where people often can’t tell how they’re doing.

No one can tell how they’re doing
I mentioned above that on interviewing.io, we collect post-interview feedback. In addition to asking interviewers how their candidates did, we also ask interviewees how they think they did. Comparing those numbers for each interview showed us something really surprising: people are terrible at gauging their own interview performance, and impostor syndrome is particularly prevalent. In fact, people underestimate their performance over twice as often as they overestimate it. Take a look at the graph below to see what I mean:

Note that, in our data, impostor syndrome knows no gender or pedigree — it hits engineers on our platform across the board, regardless of who they are or where they come from.

Now here’s the messed up part. During the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a very strong relationship between whether people think they did well and whether they would indeed want to work with the interviewer — when people think they did poorly, even if they actually didn’t, they may be a lot less likely to want to work with you. And, by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.

As a result, companies are losing candidates from all walks of life because of a fundamental flaw in the process.

Poor performances hit marginalized groups the hardest
Though impostor syndrome appears to hit engineers from all walks of life, we’ve found that women get hit the hardest in the face of an actually poor performance. As we learned above, poor performances in technical interviewing happen to most people, even people who are generally very strong. However, when we looked at our data, we discovered that after a poor performance, women are 7 times more likely to stop practicing than men:

A bevy of research appears to support confidence-based attrition as a very real cause for women departing from STEM fields, but I would expect that the implications of the attrition we witnessed extend beyond women to underrepresented groups, across the board.

What the real problem is

At the end of the day, because technical interviewing is indeed a game, like all games, it takes practice to improve. However, unless you’ve been socialized to expect and be prepared for the game-like aspect of the experience, it’s not something that you can necessarily intuit. And if you go into your interviews expecting them to be indicative of your aptitude at the job, which is, at the outset, not an unreasonable assumption, you will be crushed the first time you crash and burn. But the process isn’t a great or predictable indicator of your aptitude. And on top of that, you likely can’t tell how you’re doing even when you do well.

These are issues that everyone who’s gone through the technical interviewing gauntlet has grappled with. But not everyone has the wherewithal or social support to realize that the process is imperfect and to stick with it. And the less people like you are involved, whether it’s because they’re not the same color as you or the same gender or because not a lot of people at your school study computer science or because you’re a dropout or for any number of other reasons, the less support or insider knowledge or 10,000 foot view of the situation you’ll have. Full stop.

Inclusion and education aren’t enough

To help remedy the lack of diversity in its headcount, Facebook has committed to three actionable steps on varying time frames. The first step revolves around creating a more inclusive interview/work environment for existing candidates. The other two are focused on addressing the perceived pipeline problem in tech:

  • Short Term: Building a Diverse Slate of Candidates and an Inclusive Working Environment
  • Medium Term: Supporting Students with an Interest in Tech
  • Long Term: Creating Opportunity and Access

Indeed, efforts to promote inclusiveness and increased funding for education are extremely noble, especially in the face of potentially not being able to see results for years in the case of the latter. However, both take a narrow view of the problem and both continue to funnel candidates into a broken system.

Erica Baker really cuts to the heart of it in her blog post about Twitter hiring a head of D&I:

“What irks me the most about this is that no company, Twitter or otherwise, should have a VP of Diversity and Inclusion. When the VP of Engineering… is thinking about hiring goals for the year, they are not going to concern themselves with the goals of the VP of Diversity and Inclusion. They are going to say ‘hiring more engineers is my job, worrying about the diversity of who I hire is the job of the VP of Diversity and Inclusion.’ When the VP of Diversity and Inclusion says ‘your org is looking a little homogenous, do something about it,’ the VP of Engineering won’t prioritize that because the VP of Engineering doesn’t report to the VP of Diversity and Inclusion, so knows there usually isn’t shit the VP of Diversity and Inclusion can do if the Eng org doesn’t see some improvement in diversity.”

Indeed, this is sad, but true. When faced with a high-visibility conundrum like diversity hiring, a pragmatic and even reasonable reaction on any company’s part is to make a few high-profile hires and throw money at the problem. Then, it looks like you’re doing something, and spinning up a task force or a department or new set of titles is a lot easier than attempting to uproot the entire status quo.

As such, we end up with a newly minted, well-funded department pumping a ton of resources into feeding people who haven’t yet learned that interviewing is a game into a broken, nondeterministic machine of a process made further worse by the fact that said process favors confidence and persistence over bona fide ability… and where the link between success in navigating said process and subsequent on-the-job performance is tenuous at best.

How to fix things

In the evolution of the technical interview, we saw a gradual reduction in the need for proxies as the technology to write code together remotely emerged; with its advent, abstract, largely arbitrary puzzle questions could start to be phased out.

What’s the next step? Technology has the power to free us from relying on proxies, so that we can look at each individual as an indicative, unique bundle of performance-based data points. At interviewing.io, we make it possible to move away from proxies by looking at each interviewee as a collection of data points that tell a story, rather than one arbitrary glimpse of something they did once.

But that’s not enough either. Interviews themselves need to continue to evolve. The process itself needs to be repeatable, predictive of aptitude at the actual job, and not a system to be gamed, where knowing the rules confers a huge advantage. And the larger organizations whose processes act as a template for everyone else need to lead the charge. Only then can we really be welcoming to a truly diverse group of candidates.


After a lot more data, technical interview performance really is kind of arbitrary.

Posted on October 13th, 2016.

interviewing.io is a platform where people can practice technical interviewing anonymously, and if things go well, get jobs at top companies in the process. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle.

In February of 2016, we published a post about how people’s technical interview performance, from interview to interview, seemed quite volatile. At the time, we just had a few hundred interviews to draw on, so as you can imagine, we were quite eager to rerun the numbers with the advent of more data. After drawing on over a thousand interviews, the numbers hold up. In other words, technical interview outcomes do really seem to be kind of arbitrary.

The setup

When an interviewer and an interviewee match on interviewing.io, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews.

After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!” (you can see the feedback form here). On our platform, a score of 3 or above has generally meant that the person was good enough to move forward.

At this point, you might say, that’s nice and all, but what’s the big deal? Lots of companies collect this kind of data in the context of their own pipelines. Here’s the thing that makes our data special: the same interviewee can do multiple interviews, each of which is with a different interviewer and/or different company, and this opens the door for some pretty interesting and somewhat controlled comparative analysis.

Performance from interview to interview really is arbitrary

If you’ve read our first post on this subject, you’ll recognize the visualization below. For the as yet uninitiated, every icon represents the mean technical score for an individual interviewee who has done 2 or more interviews on the platform. The y-axis is standard deviation of performance, so the higher up you go, the more volatile interview performance becomes. If you hover over each icon, you can drill down and see how that person did in each of their interviews. Anytime you see bolded text with a dotted underline, you can hover over it to see relevant data viz. Try it now to expand everyone’s performance. You can also hover over the labels along the x-axis to drill into the performance of people whose means fall into those buckets.

Standard Dev vs. Mean of Interviewee Performance
(1316 Interviews w/ 259 Interviewees)

As you can see, roughly 20% of interviewees are consistent in their performance (down from 25% the last time we did this analysis), and the rest are all over the place. If you look at the graph above, despite the noise, you can probably make some guesses about which people you’d want to interview. However, keep in mind that each icon represents a mean. Let’s pretend that, instead, you had to make a decision based on just one data point. That’s where things get dicey.1 For instance:

  • Many people who scored at least one 4 also scored at least one 2.
  • And as you saw above, a good amount of people who scored at least one 4 also scored at least one 1.
  • If we look at high performers (mean of 3.3 or higher), we still see a fair amount of variation.
  • Things get really murky when we consider “average” performers (mean between 2.6 and 3.3).

What do the most volatile interviewees have in common?

In the plot below, you can see interview performance over time for interviewees with the highest standard deviations on the platform (the cutoff we used was a standard dev of 1 or more, and this accounted for roughly 12% of our users). Note that the mix of dashed and dotted lines is purely visual — this way it’s easier to follow each person’s performance path.

So, what do the most highly volatile performers have in common? The answer appears to be, well, nothing. About half were working at top companies while interviewing, and half weren’t. The breakdown of top school vs. not was roughly 60/40. And years of experience didn’t have much to do with it either — a plurality of interviewees had between 2 and 6 years of experience, with the rest all over the board (varying between 1 and 20 years).

So, all in all, the factors that go into performance volatility are likely a lot more nuanced than the traditional cues we often use to make value judgments about candidates.

Why does volatility matter?

I discussed the implications of these findings for technical hiring at length in the last post, but briefly, a noisy, non-deterministic interview process does no favors to either candidates or companies. Both end up expending a lot more effort to get a lot less signal than they ought, and in a climate where software engineers are at such a premium, noisy interviews only serve to exacerbate the problem.

But beyond micro and macro inefficiencies, I suspect there’s something even more insidious and unfortunate going on here. Once you’ve done a few traditional technical interviews, the volatility and lack of determinism in the process is something you figure out anecdotally and kind of accept. And if you have the benefit of having friends who’ve also been through it, it only gets easier. What if you don’t, however?

In a previous post, we talked about how women quit interview practice 7 times more often than men after just one bad interview. It’s not too much of a leap to say that this is probably happening to any number of groups who are underrepresented/underserved by the current system. In other words, though it’s a broken process for everyone, the flaws within the system hit these groups the hardest… because they haven’t had the chance to internalize just how much of technical interviewing is a game. More on this subject in our next post!

What can we do about it?

So, yes, the state of technical hiring isn’t great right now, but here’s what we can say. If you’re looking for a job, the best piece of advice we can give you is to really internalize that interviewing is a numbers game. Between the kind of volatility we discussed in this post, impostor syndrome, poor evaluation techniques, and how hard it can be to get meaningful, realistic practice, it takes a lot of interviews to find a great job.

And if you’re hiring people, in the absence of a radical shift in how we vet technical ability, we’ve learned that drawing on aggregate performance is much more meaningful than making such an important decision based on one single, arbitrary interview. Not only can aggregate performance help correct for an uncharacteristically poor performance, but it can also weed out people who eventually do well in an interview by chance or those who, over time, simply up and memorize Cracking the Coding Interview. At interviewing.io, even after just a handful of interviews, we have a much better picture of what someone is capable of and where they stack up than a single company would after a single interview, and aggregate data tells a much more compelling, repeatable story than one, arbitrary data point.

1At this point you might say that it’s erroneous and naive to compare raw technical scores to one another for any number of reasons, not the least of which is that one interviewer’s 4 is another interviewer’s 2. For a comprehensive justification of using raw scores comparatively, please check out the appendix to our previous post on this subject. Just to make sure the numbers hold up, I reran them, and this time, our R-squared is even higher than before (0.41 vs. 0.39 last time).

Huge thanks to Ian Johnson, creator of d3 Building Blocks, who made the graph entitled Standard Dev vs. Mean of Interviewee Performance (the one with the icons) as well as all the visualizations that go with it.


People are still bad at gauging their own interview performance. Here’s the data.

Posted on September 8th, 2016.

interviewing.io is a platform where people can practice technical interviewing anonymously, and if things go well, get jobs at top companies in the process. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle.

At the end of 2015, we published a post about how people are terrible at gauging their own interview performance. At the time, we just had a few hundred interviews to draw on, so as you can imagine, we were quite eager to rerun the numbers with the advent of more data. After drawing on roughly one thousand interviews, we were surprised to find that the numbers have really held up, and that people continue to be terrible at gauging their own interview performance.

The setup

When an interviewer and an interviewee match on interviewing.io, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question (feel free to watch this process in action on our recordings page).  After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews.

If you’re curious, you can see what the feedback forms look like below — in addition to one direct yes/no question, we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

Feedback form for interviewers

Feedback form for interviewees

Perceived versus actual performance… revisited

Below are two heatmaps of perceived vs. actual performance per interview (for interviews where we had both pieces of data). In each heatmap, the darker areas represent higher interview concentration. For instance, the darkest square represents interviews where both perceived and actual performance was rated as a 3. You can hover over each square to see the exact interview count (denoted by “z”).

The first heatmap is our old data:

And the second heatmap is our data as of August 2016:

As you can see, even with the advent of a lot more interviews, the heatmaps look remarkably similar. The R-squared for a linear regression on the first data set is 0.24. And for the more recent data set, it’s dropped to 0.18. In both cases, even though some small positive relationship between actual and perceived performance does exist, it is not a strong, predictable correspondence.

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer. Take a look at the graph below to see what I mean.

The x-axis is the difference between actual and perceived performance, i.e. actual minus perceived. In other words, a negative value means that you overestimated your performance, and a positive one means that you underestimated it. Therefore, every bar above 0 is impostor syndrome country, and every bar below zero belongs to its foulsome, overconfident cousin, the Dunning-Kruger effect.1

On interviewing.io (though I wouldn’t be surprised if this finding extrapolated to the qualified engineering population at large), impostor syndrome plagues interviewees roughly twice as often as Dunning-Kruger. Which, I guess, is better than the alternative.
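
The split itself is easy to compute once you have the paired ratings: subtract perceived from actual for each interview and count which side of zero you land on. A sketch with placeholder ratings:

```python
import numpy as np

# Placeholder paired ratings per interview, both on the 1-4 scale: how the
# interviewer scored the interviewee (actual) and how the interviewee scored
# themselves (perceived).
actual    = np.array([3, 4, 2, 3, 4, 1, 3, 4, 2, 3])
perceived = np.array([2, 3, 2, 4, 2, 1, 2, 3, 1, 2])

diff = actual - perceived          # positive: underestimated themselves
impostor = int((diff > 0).sum())   # impostor syndrome country
dunning_kruger = int((diff < 0).sum())
print(f"underestimated: {impostor}, overestimated: {dunning_kruger}, "
      f"ratio: {impostor / max(dunning_kruger, 1):.1f}x")
```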

Why people underestimate their performance

With all this data, I couldn’t resist digging into interviews where interviewees gave themselves 1’s and 2’s but where interviewers gave them 4’s to try to figure out if there were any common threads. And, indeed, a few trends emerged. The interviews that tended to yield the most interviewee impostor syndrome were ones where question complexity was layered. In other words, the interviewer would start with a fairly simple question and then, when the interviewee completed it successfully, they would change things up to make it harder. Lather, rinse, repeat. In some cases, an interviewer could get through up to 4 layered tiers in about an hour. Inevitably, even a good interviewee will hit a wall eventually, even if the place where it happens is way further out than the boundary for most people who attempt the same question.

Another trend I observed had to do with interviewees beating themselves up for issues that mattered a lot to them but fundamentally didn’t matter much to their interviewer: off-by-one errors, small syntax errors that made it impossible to compile their code (even though everything was semantically correct), getting big-O wrong the first time and then correcting themselves, and so on.

Interestingly enough, how far off people were in gauging their own performance was independent of how highly rated (overall) their interviewer was or how strict their interviewer was.

With that in mind, if I learned anything from watching these interviews, it was this. Interviewing is a flawed, human process. Both sides want to do a good job, but sometimes the things that matter to each side are vastly different. And sometimes the standards that both sides hold themselves to are vastly different as well.

Why this (still) matters for hiring, and what you can do to make it better

Techniques like layered questions are important to sussing out just how good a potential candidate is and can make for a really engaging positive experience, so removing them isn’t a good solution. And there probably isn’t that much you can do directly to stop an engineer from beating themselves up over a small syntax error (especially if it’s one the interviewer didn’t care about). However, all is not lost!

As you recall, during the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a highly statistically significant relationship between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you. And by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

1I’m always terrified of misspelling “Dunning-Kruger” and not double-checking it because of overconfidence in my own spelling abilities.


We built voice modulation to mask gender in technical interviews. Here’s what happened.

Posted on June 29th, 2016.

interviewing.io is a platform where people can practice technical interviewing anonymously and, in the process, find jobs based on their interview performance rather than their resumes. Since we started, we’ve amassed data from thousands of technical interviews, and in this blog, we routinely share some of the surprising stuff we’ve learned. In this post, I’ll talk about what happened when we built real-time voice masking to investigate the magnitude of bias against women in technical interviews. In short, we made men sound like women and women sound like men and looked at how that affected their interview performance. We also looked at what happened when women did poorly in interviews, how drastically that differed from men’s behavior, and why that difference matters for the thorny issue of the gender gap in tech.

The setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from a mix of large companies like Google, Facebook, Twitch, and Yelp, as well as engineering-focused startups like Asana, Mattermark, and others. For more context, some examples of interviews done on the platform can be found on our public recordings page.

After every interview, interviewers rate interviewees on a few different dimensions.

Feedback form for interviewers

As you can see, we ask the interviewer if they would advance their interviewee to the next round. We also ask about a few different aspects of interview performance using a 1-4 scale. On our platform, a score of 3 or above is generally considered good.

Women historically haven’t performed as well as men…

One of the big motivators to think about voice masking was the increasingly uncomfortable disparity in interview performance on the platform between men and women1. At that time, we had amassed over a thousand interviews with enough data to do some comparisons and were surprised to discover that women really were doing worse. Specifically, men were getting advanced to the next round 1.4 times more often than women. Interviewee technical score wasn’t faring that well either — men on the platform had an average technical score of 3 out of 4, as compared to a 2.5 out of 4 for women.

Despite these numbers, it was really difficult for me to believe that women were just somehow worse at computers, so when some of our customers asked us to build voice masking to see if that would make a difference in the conversion rates of female candidates, we didn’t need much convincing.

… so we built voice masking

Since we started working on interviewing.io, in order to achieve true interviewee anonymity, we knew that hiding gender would be something we’d have to deal with eventually but put it off for a while because it wasn’t technically trivial to build a real-time voice modulator. Some early ideas included sending female users a Bane mask.

Early voice masking prototype (drawing by Marcin Kanclerz)

When the Bane mask thing didn’t work out, we decided we ought to build something within the app, and if you play the videos below, you can get an idea of what voice masking on interviewing.io sounds like. In the first one, I’m talking in my normal voice.

And in the second one, I’m modulated to sound like a man.2
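
Our modulator runs in real time inside the app, so what follows is emphatically not the production code, but if you want to play with the basic idea offline, a few lines of librosa get you a serviceable pitch shift. The filename is a placeholder:

```python
import librosa
import soundfile as sf

# Load any mono voice recording; "interview_audio.wav" is a placeholder path.
y, sr = librosa.load("interview_audio.wav", sr=None)

# Positive n_steps raises the pitch, negative lowers it. Four semitones is an
# arbitrary choice for illustration; real-time masking involves more than pitch.
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

sf.write("interview_audio_shifted.wav", shifted, sr)
```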

Armed with the ability to hide gender during technical interviews, we were eager to see what the hell was going on and get some insight into why women were consistently underperforming.

The experiment

The setup for our experiment was simple. Every Tuesday evening at 7 PM Pacific, interviewing.io hosts what we call practice rounds. In these practice rounds, anyone with an account can show up, get matched with an interviewer, and go to town. And during a few of these rounds, we decided to see what would happen to interviewees’ performance when we started messing with their perceived genders.

In the spirit of not giving away what we were doing and potentially compromising the experiment, we told both interviewees and interviewers that we were slowly rolling out our new voice masking feature and that they could opt in or out of helping us test it out. Most people opted in, and we informed interviewees that their voice might be masked during a given round and asked them to refrain from sharing their gender with their interviewers. For interviewers, we simply told them that interviewee voices might sound a bit processed.

We ended up with 234 total interviews (roughly 2/3 male and 1/3 female interviewees), which fell into one of three categories:

  • Completely unmodulated (useful as a baseline)
  • Modulated without pitch change
  • Modulated with pitch change

You might ask why we included the second condition, i.e. modulated interviews that didn’t change the interviewee’s pitch. As you probably noticed, if you played the videos above, the modulated one sounds fairly processed. The last thing we wanted was for interviewers to assume that any processed-sounding interviewee must summarily have been the opposite gender of what they sounded like. So we threw that condition in as a further control.

The results

After running the experiment, we ended up with some rather surprising results. Contrary to what we expected (and probably contrary to what you expected as well!), masking gender had no effect on interview performance with respect to any of the scoring criteria (would advance to next round, technical ability, problem solving ability). If anything, we started to notice some trends in the opposite direction of what we expected: for technical ability, it appeared that men who were modulated to sound like women did a bit better than unmodulated men and that women who were modulated to sound like men did a bit worse than unmodulated women. Though these trends weren’t statistically significant, I am mentioning them because they were unexpected and definitely something to watch for as we collect more data.

On the subject of sample size, we have no delusions that this is the be-all and end-all of pronouncements on the subject of gender and interview performance. We’ll continue to monitor the data as we collect more of it, and it’s very possible that as we do, everything we’ve found will be overturned. I will say, though, that had there been any staggering gender bias on the platform, with a few hundred data points, we would have gotten some kind of result. So that, at least, was encouraging.
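
For those wondering what “no effect” means mechanically: with three conditions and ordinal 1-4 scores, something like a Kruskal-Wallis test is one reasonable way to check for differences between groups (I’m not claiming this is the exact test we ran). A sketch with placeholder scores:

```python
from scipy import stats

# Placeholder technical scores (1-4) per experimental condition; the real
# analysis used the 234 experiment interviews described above.
unmodulated        = [3, 2, 4, 3, 3, 2, 4, 3]
modulated_no_pitch = [3, 3, 2, 4, 2, 3, 3, 2]
modulated_pitch    = [2, 4, 3, 3, 4, 2, 3, 3]

h_stat, p_value = stats.kruskal(unmodulated, modulated_no_pitch, modulated_pitch)
print(f"H={h_stat:.2f}, p={p_value:.3f}")  # a large p-value: no detectable effect
```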

So if there’s no systemic bias, why are women performing worse?

After the experiment was over, I was left scratching my head. If the issue wasn’t interviewer bias, what could it be? I went back and looked at the seniority levels of men vs. women on the platform as well as the kind of work they were doing in their current jobs, and neither of those factors seemed to differ significantly between groups. But there was one nagging thing in the back of my mind. I spend a lot of my time poring over interview data, and I had noticed something peculiar when observing the behavior of female interviewees. Anecdotally, it seemed like women were leaving the platform a lot more often than men. So I ran the numbers.

What I learned was pretty shocking. As it happens, women leave interviewing.io roughly 7 times as often as men after they do badly in an interview. And the numbers for two bad interviews aren’t much better. You can see the breakdown of attrition by gender below (the differences between men and women are indeed statistically significant with P < 0.00001).
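
If you want to check significance on numbers like these yourself, a chi-squared test on the 2x2 table of gender vs. left/stayed is the standard move. The counts below are stand-ins, not our real data:

```python
from scipy import stats

# Hypothetical 2x2 contingency table: rows are gender, columns are
# (left after one bad interview, kept practicing).
table = [
    [70, 30],   # women: left, stayed
    [10, 90],   # men:   left, stayed
]

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.2e}")
```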

Also note that, as much as possible, I corrected for people leaving the platform because they found a job (practicing interviewing isn’t that fun after all, so you’re probably only going to do it if you’re still looking), because they were just trying out the platform out of curiosity, or because they didn’t like something else about their interviewing.io experience.

A totally speculative thought experiment

So, if these are the kinds of behaviors that happen in the interviewing.io microcosm, how much is applicable to the broader world of software engineering? Please bear with me as I wax hypothetical and try to extrapolate what we’ve seen here to our industry at large. And also, please know that what follows is very speculative, based on not that much data, and could be totally wrong… but you gotta start somewhere.

If you consider the attrition data points above, you might want to do what any reasonable person would do in the face of an existential or moral quandary, i.e. fit the data to a curve. An exponential decay curve seemed reasonable for attrition behavior, and you can see what I came up with below. The x-axis is the number of what I like to call “attrition events”, namely things that might happen to you over the course of your computer science studies and subsequent career that might make you want to quit. The y-axis is what portion of people are left after each attrition event. The red curve denotes women, and the blue curve denotes men.

Now, as I said, this is pretty speculative, but it really got me thinking about what these curves might mean in the broader context of women in computer science. How many “attrition events” does one encounter between primary and secondary education and entering a collegiate program in CS and then starting to embark on a career? So, I don’t know, let’s say there are 8 of these events between getting into programming and looking around for a job. If that’s true, then we need 3 times as many women studying computer science as men to get to the same number in our pipelines. Note that that’s 3 times more than men, not 3 times more than there are now. If we think about how many there are now, which, depending on your source, is between 1/3 and 1/4 of the number of men, to get to pipeline parity, we actually have to increase the number of women studying computer science by an entire order of magnitude.
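
The arithmetic behind that last claim is simple enough to do in a few lines. The per-event attrition rates below are invented (I only picked them so that women’s rate is roughly 7 times men’s, in line with the attrition numbers above):

```python
events = 8                # hypothetical "attrition events" before the job hunt
attrition_men = 0.02      # invented per-event quit rate for men
attrition_women = 0.14    # invented, roughly 7x the men's rate

surviving_men = (1 - attrition_men) ** events
surviving_women = (1 - attrition_women) ** events

print(f"men remaining:   {surviving_men:.1%}")
print(f"women remaining: {surviving_women:.1%}")
# To reach parity at the end, you'd need to start with this many times more women:
print(f"required starting ratio: {surviving_men / surviving_women:.1f}x")
```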

Prior art, or why maybe this isn’t so nuts after all

Since gathering these findings and starting to talk about them a bit in the community, I began to realize that there was some supremely interesting academic work being done on gender differences around self-perception, confidence, and performance. Some of the work below found slightly different trends than we did, but it’s clear that anyone attempting to answer the question of the gender gap in tech would be remiss in not considering the effects of confidence and self-perception in addition to the more salient matter of bias.

In a study investigating the effects of perceived performance to likelihood of subsequent engagement, Dunning (of Dunning-Kruger fame) and Ehrlinger administered a scientific reasoning test to male and female undergrads and then asked them how they did. Not surprisingly, though there was no difference in performance between genders, women underrated their own performance more often than men. Afterwards, participants were asked whether they’d like to enter a Science Jeopardy contest on campus in which they could win cash prizes. Again, women were significantly less likely to participate, with participation likelihood being directly correlated with self-perception rather than actual performance.3

In a different study, sociologists followed a number of male and female STEM students over the course of their college careers via diary entries authored by the students. One prevailing trend that emerged immediately was the difference between how men and women handled the “discovery of their [place in the] pecking order of talent, an initiation that is typical of socialization across the professions.” For women, realizing that they may no longer be at the top of the class and that there were others who were performing better, “the experience [triggered] a more fundamental doubt about their abilities to master the technical constructs of engineering expertise [than men].”

And of course, what survey of gender difference research would be complete without an allusion to the wretched annals of dating? When I told the interviewing.io team about the disparity in attrition between genders, the resounding response was along the lines of, “Well, yeah. Just think about dating from a man’s perspective.” Indeed, a study published in the Archives of Sexual Behavior confirms that men treat rejection in dating very differently than women, even going so far as to say that men “reported they would experience a more positive than negative affective response after… being sexually rejected.”

Maybe tying coding to sex is a bit tenuous, but, as they say, programming is like sex — one mistake and you have to support it for the rest of your life.

Why I’m not depressed by our results and why you shouldn’t be either

Prior art aside, I would like to leave off on a high note. I mentioned earlier that men are doing a lot better on the platform than women, but here’s the startling thing. Once you factor out interview data from both men and women who quit after one or two bad interviews, the disparity goes away entirely. So while the attrition numbers aren’t great, I’m massively encouraged by the fact that at least in these findings, it’s not about systemic bias against women or women being bad at computers or whatever. Rather, it’s about women being bad at dusting themselves off after failing, which, despite everything, is probably a lot easier to fix.

1Roughly 15% of our users are female. We want way more, but it’s a start.

2If you want to hear more examples of voice modulation or are just generously down to indulge me in some shameless bragging, we got to demo it on NPR and in Fast Company.

3In addition to asking interviewers how interviewees did, we also ask interviewees to rate themselves. After reading the Dunning and Ehrlinger study, we went back and checked to see what role self-perception played in attrition. In our case, the answer is, I’m afraid, TBD, as we’re going to need more self-ratings to say anything conclusive.


Technical interview performance is kind of arbitrary. Here’s the data.

Posted on February 17th, 2016.

Note: Though I wrote most of the words in this post, there are a few people outside of interviewing.io whose work made it possible. Ian Johnson, creator of d3 Building Blocks, created the graph entitled Standard Dev vs. Mean of Interviewee Performance (the one with the icons) as well as all the interactive visualizations that go with it. Dave Holtz did all the stats work for computing the probability of people failing individual interviews. You can see more about his work on his blog.

interviewing.io is a platform where people can practice technical interviewing anonymously and, in the process, find jobs. In the past few months, we’ve amassed data from hundreds of interviews, and when we looked at how the same people performed from interview to interview, we were really surprised to find quite a bit of volatility, which, in turn, made us question the reliability of single interview outcomes.

The setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice1, text chat, and a whiteboard and jump right into a technical question. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from a mix of large companies like Google, Facebook, and Yelp, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more. If you’d like to see an interview in action, head over to our public recordings page for a few examples.

After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “meh” and 4 is “amazing!” (you can see the feedback form here). On our platform, a score of 3 or above has generally meant that the person was good enough to move forward.

At this point, you might say, that’s nice and all, but what’s the big deal? Lots of companies collect this kind of data in the context of their own pipelines. Here’s the thing that makes our data special: the same interviewee can do multiple interviews, each of which is with a different interviewer and/or different company, and this opens the door for some pretty interesting and somewhat controlled comparative analysis.

Performance from interview to interview is pretty volatile

Let’s start with some visuals. In the graph below, every point represents the mean technical score for an individual interviewee who has done 2 or more interviews on the platform2. The y-axis is standard deviation of performance, so the higher up you go, the more volatile interview performance becomes. If you hover over each point, you can drill down and see how that person did in each of their interviews. Anytime you see bolded text with a dotted underline, you can hover over it to see relevant data viz. Try it now to expand everyone’s performance. You can also hover over the labels along the x-axis to drill into the performance of people whose means fall into those buckets.

Standard Dev vs. Mean of Interviewee Performance
(299 Interviews w/ 67 Interviewees)

As you can see, roughly 25% of interviewees are consistent in their performance, and the rest are all over the place3. If you look at the graph above, despite the noise, you can probably make some guesses about which people you’d want to interview. However, keep in mind that each point represents a mean. Let’s pretend that, instead, you had to make a decision based on just one data point. That’s where things get dicey. For instance:

  • Many people who scored at least one 4 also scored at least one 2.
  • If we look at high performers (mean of 3.3 or higher), we still see a fair amount of variation.
  • Things get really murky when we consider “average” performers (mean between 2.6 and 3.3).
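
As a concrete illustration of the aggregation behind the graph above, here is a minimal sketch, assuming a flat table of interview scores; the column names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical input: one row per interview, with the interviewer's 1-4
# technical score. Column names are made up for this sketch.
interviews = pd.DataFrame({
    "interviewee_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "technical_score": [4, 2, 3, 3, 3, 2, 4, 3],
})

# Keep people with 2+ interviews, then compute each person's mean and
# standard deviation of performance (the x and y axes of the graph).
per_person = (
    interviews.groupby("interviewee_id")["technical_score"]
    .agg(n="count", mean="mean", std="std")
    .query("n >= 2")
)
print(per_person)
```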

To me, looking at this data and then pretending that I had to make a hiring decision based on one interview outcome felt a lot like peering into some beautiful, lavishly appointed parlor through a keyhole. Sometimes you see a piece of art on the wall, sometimes you see the liquor selection, and sometimes you just see the back of the couch.

At this point you might say that it’s erroneous and naive to compare raw technical scores to one another for any number of reasons, not the least of which is that one interviewer’s 4 is another interviewer’s 2. We definitely share this concern and address it in the appendix of this post. It does bear mentioning, though, that most of our interviewers are coming from companies with strong engineering brands and that correcting for brand strength didn’t change interviewee performance volatility, nor did correcting for interviewer rating.

So, in a real-life situation, when you’re trying to decide whether to advance someone to onsite, you’re probably trying to avoid two things — false positives (bringing in people below your bar by mistake) and false negatives (rejecting people who should have made it in). Most top companies’ interviewing paradigm is that false negatives are less bad than false positives. This makes sense, right? With a big enough pipeline and enough resources, even with a high false negative rate, you’ll still get the people you want. With a high false positive rate, you might get cheaper hiring, but you do potentially irreversible damage to your product, culture, and future hiring standards in the process. And of course, the companies setting the hiring standards and practices for an entire industry ARE the ones with the big pipelines and seemingly inexhaustible resources.

The dark side of optimizing for high false negative rates, though, rears its head in the form of our current engineering hiring crisis. Do single interview instances, in their current incarnation, give enough signal? Or amidst so much demand for talent, are we turning away qualified people because we’re all looking at a large, volatile graph through a tiny keyhole?

So, hyperbolic moralizing aside, given how volatile interview performance is, what are the odds that a good candidate will fail an individual phone screen?

Odds of failing a single interview based on past performance

Below, you can see the distribution of mean performance throughout our population of interviewees.

In order to figure out the probability that a candidate with a given mean score would fail an interview, we had to do some stats work. First, we broke interviewees up into cohorts based on their mean scores (rounded to the nearest 0.25). Then, for each cohort, we calculated the probability of failing, i.e. of getting a score of 2 or less. Finally, to work around our starting data set not being huge, we resampled our data. In our resampling procedure, we treated an interview outcome as a multinomial distribution, or in other words, pretended that each interview was a roll of a weighted, 4-sided die corresponding to that candidate’s cohort. We then re-rolled the dice a bunch of times to create a new, “simulated” dataset for each cohort and calculated new probabilities of failure for each cohort using these data sets. Below, you can see the results of repeating this process 10,000 times.
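
Here is a minimal sketch of that resampling procedure as described above; the cohort scores below are hypothetical, and the "score of 2 or less" failure threshold comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def failure_rate_distribution(scores, n_boot=10_000):
    """Resample one cohort's interview scores (1-4) as draws from a
    multinomial ("weighted 4-sided die") and return the bootstrap
    distribution of P(score <= 2), i.e. the probability of failing."""
    scores = np.asarray(scores)
    n = len(scores)
    # Empirical probability of each score 1..4 within the cohort.
    probs = np.array([(scores == s).mean() for s in (1, 2, 3, 4)])
    fail_rates = np.empty(n_boot)
    for i in range(n_boot):
        counts = rng.multinomial(n, probs)      # one simulated dataset
        fail_rates[i] = counts[:2].sum() / n    # fraction of 1s and 2s
    return fail_rates

# Hypothetical cohort: interviewees whose mean score rounds to 3.0.
cohort_scores = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
dist = failure_rate_distribution(cohort_scores)
print(np.percentile(dist, [2.5, 50, 97.5]))     # rough 95% interval
```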

As you can see, a lot of the distributions above overlap with one another. This is important because these overlaps tell us that there may not be statistically significant differences between those groups (e.g. between 2.75 and 3). Certainly, with the advent of a LOT more data, the delineations between cohorts may become clearer. On the other hand, if we do need a huge amount of data to detect differences in failure rate, it might suggest that people are intrinsically highly variable in their performance. At the end of the day, while we can confidently say that there is a significant difference between the bottom end of the spectrum (2.25) and the top end (3.75), for people in the middle, things are murky.

Nevertheless, using these distributions, we did attempt to compute the probability that a candidate with a certain mean score would fail a single interview (see below — the shaded areas encapsulate a 95% confidence interval). The fact that people who are overall pretty strong (e.g. mean ~= 3) can mess up technical interviews as much as 22% of the time shows that there’s definitely room for improvement in the process, and this is further exacerbated by the general murkiness in the middle of the spectrum.

Is interviewing doomed?

Generally, when we think of interviewing, we think of something that ought to have repeatable results and carry a strong signal. However, the data we’ve collected, meager though it might be, tells a different story. And it resonates with both my anecdotal experience as a recruiter and with the sentiments we’ve seen echoed in the community. Zach Holman’s Startup Interviewing is Fucked hits on the disconnect between interview process and the job it’s meant to fill, the fine gentlemen of TripleByte reached similar conclusions by looking at their own data, and one of the more poignant expressions of inconsistent interviewing results recently came from rejected.us.

You can bet that many people who are rejected after a phone screen by Company A but do better during a different phone screen and ultimately end up somewhere traditionally reputable are getting hit up by Company A’s recruiters 6 months later. And despite everyone’s best efforts, the murky, volatile, and ultimately stochastic circle jerk of a recruitment process marches on.

So yes, one possible conclusion is certainly that technical interviewing itself is indeed fucked and doesn’t provide a reliable, deterministic signal from a single interview instance. Algorithmic interviews are a hotly debated topic and one we’re deeply interested in teasing apart. One thing in particular we’re very excited about is tracking interview performance as a function of interview type, as we get more and more interviewing types/approaches happening on the platform. Indeed, one of our long-term goals is to really dig into our data, look at the landscape of different interview styles, and make some serious data-driven statements about what types of technical interviews lead to the highest signal.

In the meantime, however, I am leaning toward the idea that drawing on aggregate performance is much more meaningful than making such an important decision based on one single, arbitrary interview. Not only can aggregate performance help correct for an uncharacteristically poor performance, but it can also weed out people who do well in a single interview by chance or those who, over time, submit to the beast and memorize Cracking the Coding Interview. I know it’s not always practical or possible to gather aggregate performance data in the wild, but at the very least, in cases where a candidate’s performance is borderline or where their performance differs wildly from what you’d expect, it might make sense to interview them one more time, perhaps focusing on slightly different material, before making the final decision.

 

Appendix: The part where we tentatively justify using raw scores for comparative performance analysis

For the skeptical, inquiring minds among you who realize that using raw coding scores to evaluate an interviewee has some pretty obvious problems, we’ve included this section. The issue is that even though our interviewers tend to come from companies with high engineering bars, raw scores are still just one piece of feedback: they don’t adjust for interviewer strictness (e.g. one interviewer’s 4 could be another interviewer’s 2), and they don’t adjust well to changes in skill over time. Internally, we actually use a more complex and comprehensive rating system when determining skill, and if we can show that raw scores align with the ratings we calculate, then we don’t feel so bad about using raw scores comparatively.

Our rating system works something like this:

  1. We create a single score for each interview based on a weighted average of each feedback item.
  2. For each interviewer, we pit all the interviewees they’ve interviewed against one another using this score.
  3. We use a Bayesian ranking system (a modified version of Glicko-2) to generate a rating for each interviewee based on the outcome of these competitions.

As a result, each person is only rated based on their score as it compares to other people who were interviewed by the same interviewer. That means one interviewer’s score is never directly compared to another’s, and so we can correct for the hairy issue of inconsistent interviewer strictness.
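
Here is a minimal sketch of the pairwise structure described above. The data, names, and weights are hypothetical, and the rating step in the real system is a modified Glicko-2; a plain Elo update stands in for it here purely for illustration.

```python
from collections import defaultdict
from itertools import combinations

K = 32
ratings = defaultdict(lambda: 1500.0)   # every interviewee starts at 1500

def elo_update(winner, loser):
    """Simple Elo update standing in for the real (Glicko-2-style) rating step."""
    expected_win = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected_win)
    ratings[loser] -= K * (1 - expected_win)

# feedback[interviewer] = list of (interviewee, composite_score) pairs, where
# composite_score is a weighted average of that interview's feedback items.
feedback = {
    "interviewer_a": [("alice", 3.4), ("bob", 2.1), ("carol", 3.9)],
    "interviewer_b": [("bob", 3.0), ("dave", 2.5)],
}

# Only interviewees seen by the *same* interviewer are ever compared, which
# sidesteps differences in interviewer strictness.
for interviewer, results in feedback.items():
    for (a, score_a), (b, score_b) in combinations(results, 2):
        if score_a != score_b:
            winner, loser = (a, b) if score_a > score_b else (b, a)
            elo_update(winner, loser)

print(dict(ratings))
```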

So, why am I bringing this up at all? You’re all smart people, and you can tell when someone is waving their hands around and pretending to do math. Before we did all this analysis, we wanted to make sure that we believed our own data. We’ve done a lot of work to build a ratings system we believe in, so we correlated those ratings with raw coding scores to see how well raw scores track actual skill.

These results are pretty strong. Not strong enough for us to rely on raw scores exclusively but strong enough to believe that raw scores are useful for determining approximate candidate strength.

1While listening to interviews day in and day out, I came up with a drinking game. Every time someone thinks the answer is hash table, take a drink. And every time the answer actually is hash table, take two drinks.4

2This is data as of January 2016, and there are only 299 interviews because not all interviews have enough feedback data and because we threw out everyone with fewer than 2 interviews. Moreover, one thing we don’t show in this graph is the passage of time, so you can’t see how any given person’s performance changed over time — that version of the graph is kind of a hot mess.

3We were curious to see if volatility varied at all with people’s mean scores. In other words, were weaker players more volatile than strong ones? The answer is no — when we ran a regression on standard deviation vs. mean, we couldn’t come up with any meaningful relationship (R-squared ~= 0.03), which means that people are all over the place regardless of how strong they are on average.

4I almost died.

Thanks to Andrew Marsh for co-authoring the appendix, to Plotly for making a terrific graphing product, and to everyone who read drafts of this behemoth.


Engineers can’t gauge their own interview performance. And that makes them harder to hire.

Posted on December 15th, 2015.

interviewing.io is an anonymous technical interviewing platform. We started it because resumes suck and because we believe that anyone, regardless of how they look on paper, should have the opportunity to prove their mettle. In the past few months, we’ve amassed over 600 technical interviews along with their associated data and metadata. Interview questions tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role at a top company, and interviewers typically come from a mix of larger companies like Google, Facebook, and Twitter, as well as engineering-focused startups like Asana, Mattermark, KeepSafe, and more.

Over the course of the next few posts, we’ll be sharing some { unexpected, horrifying, amusing, ultimately encouraging } things we’ve learned. In this blog’s heroic maiden voyage, we’ll be tackling people’s surprising inability to gauge their own interview performance and the very real implications this finding has for hiring.

First, a bit about setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. After each interview, people leave one another feedback, and each party can see what the other person said about them once they both submit their reviews. If both people find each other competent and pleasant, they have the option to unmask. Overall, interviewees tend to do quite well on the platform, with just under half of interviews resulting in a “yes” from the interviewer.

If you’re curious, we have a few public recordings of interviews done on the platform, so you can watch and see what an interview is really like. In addition to these, our feedback forms are attached below. There is one direct yes/no question, and we also ask about a few different aspects of interview performance using a 1-4 scale. We also ask interviewees some extra questions that we don’t share with their interviewers, and one of those questions is about how well they think they did. In this post, we’ll be focusing on the technical score an interviewer gives an interviewee and the interviewee’s self-assessment (both are circled below). For context, a technical score of 3 or above seems to be the rough cut-off for hirability.

Feedback form for interviewers

Feedback form for interviewees

Perceived versus actual performance

Below, you can see the distribution of people’s actual technical performance (as rated by their interviewers) and the distribution of their perceived performance (how they rated themselves) for the same set of interviews.1

You might notice right away that there is a little bit of disparity, but things get interesting when you plot perceived vs. actual performance for each interview. Below is a heatmap of the data where the darker areas represent higher interview concentration. For instance, the darkest square represents interviews where both perceived and actual performance were rated as a 3. You can hover over each square to see the exact interview count (denoted by “z”).

If you run a regression on this data2, you get an R-squared of only 0.24, and once you take away the worst interviews, it drops down even further to 0.16. For context, R-squared is a measurement of how well you can fit empirical data to some mathematical model. It’s on a scale from 0 to 1, with 0 meaning that everything is noise and 1 meaning that everything fits perfectly. In other words, even though some small positive relationship between actual and perceived performance does exist, it is not a strong, predictable correspondence.
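
For anyone who wants to run this kind of check on their own data, here is a minimal sketch; the paired scores are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical paired data: the interviewer's technical score vs. the
# interviewee's self-rating for the same interview (both on a 1-4 scale).
actual = np.array([3, 2, 4, 3, 1, 3, 4, 2, 3, 3])
perceived = np.array([2, 2, 3, 4, 2, 2, 3, 3, 2, 4])

# Ordinary least-squares fit of perceived on actual; for a simple linear
# regression, R-squared is the squared correlation coefficient.
slope, intercept, r_value, p_value, std_err = stats.linregress(actual, perceived)
print(f"R-squared: {r_value ** 2:.2f}")
```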

You can also see there’s a non-trivial amount of impostor syndrome going on in the graph above, which probably comes as no surprise to anyone who’s been an engineer.

Gayle Laakmann McDowell of Cracking the Coding Interview fame has written quite a bit about how bad people are at gauging their own interview performance, and it’s something that I had noticed anecdotally when I was doing recruiting, so it was nice to see some empirical data on that front. In her writing, Gayle mentions that it’s the job of a good interviewer to make you feel like you did OK even if you bombed. I was curious about whether that’s what was going on here, but when I ran the numbers, there wasn’t any relationship between how highly an interviewer was rated overall and how off their interviewees’ self-assessments were, in one direction or the other.

Ultimately, this isn’t a big data set, and we will continue to monitor the relationship between perceived and actual performance as we host more interviews, but we did find that this relationship emerged very early on and has continued to persist with more and more interviews — R-squared has never exceeded 0.26 to date.

Why this matters for hiring

Now here’s the actionable and kind of messed up part. As you recall, during the feedback step that happens after each interview, we ask interviewees if they’d want to work with their interviewer. As it turns out, there’s a very statistically significant relationship (p < 0.0008) between whether people think they did well and whether they’d want to work with the interviewer. This means that when people think they did poorly, they may be a lot less likely to want to work with you3. And by extension, it means that in every interview cycle, some portion of interviewees are losing interest in joining your company just because they didn’t think they did well, despite the fact that they actually did.
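
The post does not say which test produced that p-value, but as one hedged illustration, a chi-square test of independence on a 2x2 table of perceived outcome vs. willingness to work with the interviewer is a straightforward way to run this kind of check; the counts below are made up.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table.
# Rows: thought they did well / thought they did poorly.
# Columns: would work with the interviewer / would not.
table = np.array([[90, 20],
                  [40, 35]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```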

How can one mitigate these losses? Give positive, actionable feedback immediately (or as soon as possible)! This way people don’t have time to go through the self-flagellation gauntlet that happens after a perceived poor performance, followed by the inevitable rationalization that they totally didn’t want to work there anyway.

Lastly, a quick shout-out to Statwing and Plotly for making terrific data analysis and graphing tools respectively.

1There are only 254 interviews represented here because not all interviews in our data set had comprehensive, mutual feedback. Moreover, we realize that raw scores don’t tell the whole story and will be focusing on standardization of these scores and the resulting rat’s nest in our next post. That said, though interviewer strictness does vary, we gate interviewers pretty heavily based on their background and experience, so the overall bar is high and comparable to what you’d find at a good company in the wild.

2Here we are referring to linear regression, and though we tried fitting a number of different curves to the data, they all sucked.

3In our data, people were 3 times less likely to want to work with their interviewers when they thought they did poorly.


Resumes suck. Here’s the data.

Posted on November 10th, 2014.

Note: This post is syndicated from Aline Lerner’s personal blog. Aline is the CEO and co-founder of interviewing.io, and results like these are what inspired her to start this company.

About a year ago, after looking at the resumes of engineers we had interviewed at TrialPay in 2012, I learned that the strongest signal for whether someone would get an offer was the number of typos and grammatical errors on their resume. On the other hand, where people went to school, their GPA, and highest degree earned didn’t matter at all. These results were pretty unexpected, ran counter to how resumes were normally filtered, and left me scratching my head about how good people are at making value judgments based on resumes, period. So, I decided to run an experiment.

In this experiment, I wanted to see how good engineers and recruiters were at resume-based candidate filtering. Going into it, I was pretty sure that engineers would do a much better job than recruiters. (They are technical! They don’t need to rely on proxies as much!) However, that’s not what happened at all. As it turned out, people were pretty bad at filtering resumes across the board, and after running the numbers, it began to look like resumes might not be a particularly effective filtering tool in the first place.

Setup

The setup was simple. I would:

  1. Take resumes from my collection.
  2. Remove all personally identifying info (name, contact info, dates, etc.).
  3. Show them to a bunch of recruiters and engineers.
  4. For each resume, ask just one question: Would you interview this candidate?

Essentially, each participant saw something like this:

If the participant didn’t want to interview the candidate, they’d have to write a few words about why. If they did want to interview, they also had the option of substantiating their decision, but, in the interest of not fatiguing participants, I didn’t require it.

To make judging easier, I told participants to pretend that they were hiring for a full-stack or back-end web dev role, as appropriate. I also told participants not to worry too much about the candidate’s seniority when making judgments and to assume that the seniority of the role matched the seniority of the candidate.

For each resume, I had a pretty good idea of how strong the engineer in question was, and I split resumes into two strength-based groups. To make this judgment call, I drew on my personal experience — most of the resumes came from candidates I placed (or tried to place) at top-tier startups. In these cases, I knew exactly how the engineer had done in technical interviews, and, more often than not, I had visibility into how they performed on the job afterwards. The remainder of resumes came from engineers I had worked with directly. The question was whether the participants in this experiment could figure out who was who just from the resume.

At this juncture, a disclaimer is in order. Certainly, someone’s subjective hirability based on the experience of one recruiter is not an oracle of engineering ability — with the advent of more data and more rigorous analysis, perhaps these results will be proven untrue. But, you gotta start somewhere. That said, here’s the experiment by the numbers.

  • I used a total of 51 resumes in this study. 64% belonged to strong candidates.
  • A total of 152 people participated in the experiment.
  • Each participant made judgments on 6 randomly selected resumes from the original set of 51, for a total of 716 data points1.

If you want to take the experiment for a whirl yourself, you can do so here.

Participants were broken up into engineers (both engineers involved in hiring and hiring managers themselves) and recruiters (both in-house and agency). There were 46 recruiters (22 in-house and 24 agency) and 106 engineers (20 hiring managers and 86 non-manager engineers who were still involved in hiring).

Results

So, what ended up happening? Below, you can see a comparison of resume scores for both groups of candidates. A resume score is the average of all the votes each resume got, where a ‘no’ counted as 0 and a ‘yes’ vote counted as 1. The dotted line in each box is the mean for each resume group — you can see they’re pretty much the same. The solid line is the median, and the boxes contain the 2nd and 3rd quartiles on either side of it. As you can see, people weren’t very good at this task — what’s pretty alarming is that scores are all over the place, for both strong and less strong candidates.

Another way to look at the data is to look at the distribution of accuracy scores. Accuracy in this context refers to how many resumes people were able to tag correctly out of the subset of 6 that they saw. As you can see, results were all over the board.

On average, participants guessed correctly 53% of the time. This was pretty surprising, and at the risk of being glib, according to these results, when a good chunk of people involved in hiring make resume judgments, they might as well be flipping a coin.

Source: https://what-if.xkcd.com/19/
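
As a concrete sketch of the two summary numbers used above (a resume’s score and a participant’s accuracy), here is one way to compute them; the vote data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical vote data: one row per (participant, resume) judgment.
votes = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3"],
    "resume":      ["r1", "r2", "r1", "r3", "r2", "r3"],
    "said_yes":    [1, 0, 1, 1, 0, 1],    # would interview?
    "is_strong":   [1, 0, 1, 0, 0, 0],    # ground-truth label for the resume
})

# Resume score: mean of the yes(1)/no(0) votes each resume received.
resume_scores = votes.groupby("resume")["said_yes"].mean()

# Accuracy: fraction of a participant's judgments that matched the label.
votes["correct"] = (votes["said_yes"] == votes["is_strong"]).astype(int)
accuracy = votes.groupby("participant")["correct"].mean()

print(resume_scores, accuracy, sep="\n")
```
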
What about performance broken down by participant group? Here’s the breakdown:

  • Agency recruiters – 56%
  • Engineers – 54%
  • In-house recruiters – 52%
  • Eng hiring managers – 48%

None of the differences between participant groups were statistically significant. In other words, all groups did equally poorly. For each group, you can see how well people did below.

To try to understand whether people really were this bad at the task or whether perhaps the task itself was flawed, I ran some more stats. One thing I wanted to understand, in particular, was how high inter-rater agreement was. In other words, when rating resumes, were participants disagreeing with each other more often than you’d expect to happen by chance? If so, then even if my criteria for whether each resume belonged to a strong candidate weren’t perfect, the results would still be compelling — no matter how you slice it, if people involved in hiring consistently can’t come to a consensus, then something about the task at hand is too ambiguous.

The test I used to gauge inter-rater agreement is called Fleiss’ kappa. The result is on the following scale of -1 to 1:

  • -1 perfect disagreement; no rater agrees with any other
  • 0 random; the raters might as well have been flipping a coin
  • 1 perfect agreement; the raters all agree with one another

Fleiss’ kappa for this data set was 0.13, which is close to zero, implying agreement only mildly better than a coin flip. In other words, the task of making value judgments based on these resumes was likely too ambiguous for humans to do well on with the given information alone.
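
For reference, here is a minimal sketch of computing Fleiss’ kappa with statsmodels; the vote counts are hypothetical, and classic Fleiss’ kappa assumes every item was rated by the same number of raters, which was not strictly true in this experiment.

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# Hypothetical counts: one row per resume, columns are [no, yes] votes,
# pretending every resume received exactly 10 judgments.
votes = np.array([
    [3, 7],
    [6, 4],
    [5, 5],
    [4, 6],
    [7, 3],
])

print(f"Fleiss' kappa: {fleiss_kappa(votes, method='fleiss'):.2f}")
```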

TL;DR Resumes might actually suck.

Some interesting patterns

In addition to finding out that people aren’t good at judging resumes, I was able to uncover a few interesting patterns.

Time spent didn’t matter
We’ve all heard of (and were probably a bit incredulous about) the study that showed recruiters spend less than 10 seconds on a resume on average. In this experiment, people took a lot longer to make value judgments: a median of 1 minute and 40 seconds per resume. In-house recruiters were fastest, and agency recruiters were slowest. However, how long someone spent looking at a resume appeared to have no bearing, overall, on whether they’d guess correctly.

Different things mattered to engineers and recruiters
Whenever a participant deemed a candidate not worth interviewing, they had to substantiate their decision. Though these criteria are clearly not the be-all and end-all of resume filtering — if they were, people would have done better — it was interesting to see that engineers and recruiters were looking for different things.2

Recruiter rejection reasons
Engineer rejection reasons

Incidentally, lack of relevant experience didn’t refer to lack of experience with a specific stack. Verbatim rejection reasons under this category tended to say stuff like “projects not extensive enough”, “lack of core computer science”, or “a lot of academic projects around EE, not a lot on the resume about programming or web development”. Culture fit in the engineering graph denotes concerns about engineering culture fit, rather than culture fit overall. This could be anything from concern that someone used to working with Microsoft technologies might not be at home in a RoR shop to worrying that the candidate is too much of a hacker to write clean, maintainable code.

Different groups did better on different kinds of resumes
First of all, and not surprisingly, engineers tended to do slightly better on resumes that had projects. Engineers also tended to do better on resumes that included detailed and clear explanations of what the candidate worked on. To get an idea of what I mean by detailed and clear explanations, take a look at the two versions below (source: Lessons from a year’s worth of hiring data). The first description can apply to pretty much any software engineering project, whereas after reading the second, you have a pretty good idea of what the candidate worked on.
bad_description

good_description

Recruiters, on the other hand, tended to do better with candidates from top companies. This also makes sense. Agency recruiters deal with a huge, disparate candidate set while also dealing with a large number of companies in parallel. They’re going to have a lot of good breadth-first insight including which companies have the highest engineering bar, which companies recently had layoffs, which teams within a specific company are the strongest, and so on.

Resumes just aren’t that useful

So, why are people pretty bad at this task? As we saw above, it may not be a matter of being good or bad at judging resumes but rather a matter of the task itself being flawed — at the end of the day, the resume is a low-signal document.

If we’re honest, no one really knows how to write resumes particularly well. Many people get their first resume writing tips from their university’s career services department, which is staffed with people who’ve never held a job in the field they’re advising for. Shit, some of the most fervent resume advice I ever got was from a technical recruiter, who insisted that I list every technology I’d ever worked with on every single undergrad research project I’d ever done. I left his office in a cold sweaty panic, desperately trying to remember what version of Apache MIT had been running at the time.

Very smart people, who are otherwise fantastic writers, seem to check every ounce of intuition and personality at the door and churn out soulless documents expounding their experience with the software development life cycle or whatever… because they’re scared that sounding like a human being on their resume or not peppering it with enough keywords will eliminate them from the applicant pool before an engineer even has the chance to look at it.

Writing aside, reading resumes is a shitty and largely thankless task. If it’s not your job, it’s a distraction that you want to get over with so you can go back to writing code. And if it is your job, you probably have a huge stack to get through, so it’s going to be hard to do deep dives into people’s work and projects, even if you’re technical enough to understand them, provided they even include links to their work in the first place. On top of that, spending more time on a given resume may not even yield a more accurate result, at least according to what I observed in this study.

How to fix top-of-the-funnel filtering

Assuming that my results are reproducible and people, across the board, are really quite bad at filtering resumes, there are a few things we can do to make top-of-the-funnel filtering better. In the short term, improving collaboration across different teams involved in hiring is a good start. As we saw, engineers are better at judging certain kinds of resumes, and recruiters are better at others. If a resume has projects or a GitHub account with content listed, passing it over to an engineer to get a second opinion is probably a good idea. And if a candidate is coming from a company with a strong brand, but one that you’re not too familiar with, getting some insider info from a recruiter might not be the worst thing.

Longer-term, how engineers are filtered fundamentally needs to change. In my TrialPay study, I found that, in addition to grammatical errors, one of the things that mattered most was how clearly people described their work. In this study, I found that engineers were better at making judgments on resumes that included these kinds of descriptions. Given these findings, relying more heavily on a writing sample during the filtering process might be in order. For the writing sample, I am imagining something that isn’t a cover letter — people tend to make those pretty formulaic and don’t talk about anything too personal or interesting. Rather, it should be a concise description of something you worked on recently that you are excited to talk about, as explained to a non-technical audience. I think the non-technical audience aspect is critical because if you can break down complex concepts for a layman to understand, you’re probably a good communicator and actually understand what you worked on. Moreover, recruiters could actually read this description and make valuable judgments about whether the writing is good and whether they understand what the person did.

Honestly, I really hope that the resume dies a grisly death. One of the coolest things about coding is that it doesn’t take much time/effort to determine if someone can perform above some minimum threshold — all you need is the internets and a code editor. Of course, figuring out if someone is great is tough and takes more time, but figuring out if someone meets a minimum standard (mind you, the same kind of minimum standard we’re trying to meet when we go through a pile of resumes) is pretty damn fast. And in light of this, relying on low-signal proxies doesn’t make sense at all.

Acknowledgements

A huge thank you to:

  • All the engineers who let me use their resumes for this experiment
  • Everyone who participated and took the time to judge resumes
  • The fine people at Statwing and Plotly
  • Stan Le for doing all the behind-the-scenes work that made running this experiment possible
  • All the smart people who were kind enough to proofread this behemoth
1This number is less than 152*6=912 because not everyone who participated evaluated all 6 resumes.
2I created the categories below from participants’ full-text rejection reasons, after the fact.


Lessons from a year’s worth of hiring data

Posted on June 21st, 2013.

Note: This post is syndicated from Aline Lerner’s personal blog. Aline is the CEO and co-founder of interviewing.io, and results like these are what inspired her to start this company.

I ran technical recruiting at TrialPay for a year before going off to start my own agency. Because I used to be an engineer, one part of my job was conducting first-round technical interviews, and between January 2012 and January 2013, I interviewed roughly 300 people for our back-end/full-stack engineer position.

TrialPay was awesome and gave me a lot of freedom, so I was able to use my intuition about whom to interview. As a result, candidates ranged from self-taught college dropouts or associate’s degree holders to PhD holders, ACM winners, MIT/Harvard/Stanford/Caltech students, and Microsoft, Amazon, Facebook, and Google interns and employees with a lot of people in between.

While interviewing such a wide cross section of people, I realized that I had a golden opportunity to test some of the prevalent folk wisdom about hiring. The results were pretty surprising, so I thought it would be cool to share them. Here’s what I found:

  • typos and grammatical errors matter more than anything else
  • having attended a top computer science school doesn’t matter
  • listing side projects on your resume isn’t as advantageous as expected
  • GPA doesn’t seem to matter

And the least surprising thing that I was able to confirm was that:

  • having worked at a top company matters

Of course, a data set of size 300 is a pittance, and I’m a far cry from a data scientist. Most of the statistics here were done with the help of Statwing and with Wikipedia as a crutch. With the advent of more data and more rigorous analysis, perhaps these conclusions will be proven untrue. But, you gotta start somewhere.

Why any of this matters

In the status quo, most companies don’t run exhaustive analyses of hiring data, and the ones that do keep it closely guarded and only share vague generalities with the public. As a result, a certain mysticism persists in hiring, and great engineers who don’t fit in “the mold” end up getting cut before another engineer has the chance to see their work.

Why has a pedigree become such a big deal in an industry that’s supposed to be a meritocracy? At the heart of the matter is scarcity of resources. When a company gets to be a certain size, hiring managers don’t have the bandwidth to look over every resume and treat every applicant like a unique and beautiful snowflake. As a result, the people doing initial resume filtering are not engineers. Engineers are expensive and have better things to do than read resumes all day. Enter recruiters or HR people. As soon as you get someone who’s never been an engineer making hiring decisions, you need to set up proxies for aptitude. Because these proxies need to be easily detectable, things like a CS degree from a top school become paramount.

Bemoaning that non-technical people are the first to filter resumes is silly because it’s not going to change. What can change, however, is how they do the filtering. We need to start thinking analytically about these things, and I hope that publishing this data is a step in the right direction.

Method

To sort facts from folk wisdom, I isolated some features that were universal among resumes and would be easy to spot by technical and non-technical people alike and then ran statistical significance tests on them. My goal was to determine which features were the strongest signals of success, which I defined as getting an offer. I ran this analysis on people whom we decided to interview rather than on every applicant; roughly 9 out of 10 applicants were screened out before the first round. The motivation there was to gain some insight into what separates decent candidates from great ones, which is a much harder question than what separates poor candidates from great ones.

Certainly there will be some sampling bias at play here, as I only looked at people who chose to apply to TrialPay specifically, but I’m hoping that TrialPay’s experience could be a stand-in for any number of startups that enjoy some renown in their specific fields but are not known globally. It also bears mentioning that this is a study into what resume attributes are significant when it comes to getting hired rather than when it comes to on-the-job performance.

Here are the features I chose to focus on (in no particular order):

  • BS in Computer Science from a top school (as determined by U.S. News and World Report)
  • Number of grammatical errors, spelling errors, and syntactic inconsistencies
  • Frequency of buzzwords (programming languages, frameworks, OSes, software packages, etc.)
  • How easy it is to tell what someone did at each of their jobs
  • Highest degree earned
  • Presence of personal projects
  • Work experience in a top company
  • Undergraduate GPA

TrialPay’s hiring bar and interview process

Before I share the actual results, a quick word about context is in order. TrialPay’s hiring standards are quite high. We ended up interviewing roughly 1 in 10 people that applied. Of those, after several rounds of interviewing (generally a phone screen followed by a live coding round followed by onsite), we extended offers to roughly 1 in 50, for an ultimate offer rate of 1 in 500. The interview process is pretty standard, though the company shies away from asking puzzle questions that depend on some amount of luck/clicking to get the correct answer. Instead, they prefer problems that gradually build on themselves and open-ended design and architecture questions. For a bit more about what TrialPay’s interview process (used to) look like, check out Interviewing at TrialPay 101.

The results

Now, here’s what I discovered. The bar height represents effect size. Every feature with a bar was statistically significant, and if you mouse over each bar, you can also see the p-value. These results were quite surprising, and I will try to explain and provide more info about some of the more interesting stuff I found.
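
The post does not spell out which tests were run under the hood, but as a hedged illustration, here is how one might check a single binary resume feature against offer outcomes; the counts are hypothetical, and Fisher’s exact test is just one reasonable choice for a 2x2 table with small counts.

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table for one binary feature (say, "worked at a top company").
# Rows: feature present / absent. Columns: offer / no offer.
table = [[12, 48],
         [8, 132]]

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```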

The most significant feature by far was the presence of typos, grammatical errors, or syntactic inconsistencies.

Errors I counted included everything from classic transgressions like mixing up “its” and “it’s” to typos and bad comma usage. In the figure below, I’ve created a fictional resume snippet to highlight some of the more common errors.

errors-visual-annotated

This particular result was especially encouraging because it’s something that can be spotted by HR people as well as engineers. When I surveyed 30 hiring managers about which resume attributes they thought were most important, however, no one ranked number of errors highest. Presumably, hiring managers don’t think that this attribute is that important for a couple of reasons: (1) resumes that are rife with mistakes get screened out before even getting to them and (2) people almost expect engineers to be a bit careless with stuff like spelling and grammar. With respect to the first point, keep in mind that the resumes in this analysis were only of people whom we decided to interview. With respect to the second point, namely that engineers shouldn’t be held to the same writing standards as people in more humanities-oriented fields, I give you my next chart. Below is a breakdown of how resumes that ultimately led to an offer stacked up against those that didn’t. (Here, I’m showing the absolute number of errors, but when I ran the numbers against number of errors adjusted for resume length, the results were virtually identical.)

If you want to play with these histograms, just click on the image, and an interactive version will pop up in a separate window.

errorshistogram

As you can see, the distributions look quite different between the group of people who got offers and those that didn’t. Moreover, about 87% of people who got offers made 2 or fewer mistakes.

In startup situations, not only are good written communication skills extremely important (a lot of heavy lifting and decision making happens over email), but I have anecdotally found that being able to write well tends to correlate very strongly with whether a candidate is good at more analytical tasks. Not submitting a resume rife with errors is a sign that the candidate has strong attention to detail, which is an invaluable skill when it comes to coding, where there are often all manner of funky edge cases and where you’re regularly called upon to review others’ code and help them find obscure errors that they can’t seem to locate because they’ve been staring at the same 10 lines of code for the last 2 hours.

It’s also important to note that a resume isn’t something you write on the spot. Rather, it’s a document that you have every opportunity to improve. You should have at least 2 people proofread your resume before submitting it. When you do submit, you’re essentially saying, “This is everything I have done. This is what I’m proud of. This is the best I can do.” So make sure that that is actually true, and don’t look stupid by accident.

Top company

No surprises here. The only surprise is that this attribute wasn’t more significant. Though I’m generally not too excited by judging someone on pedigree, having been able to hold down a demanding job at a competitive employer shows that you can actually, you know, hold down a demanding job at a competitive employer.

Of all the companies that our applicants had on their resumes, I classified the following as elite: Amazon, Apple, Evernote, Facebook, Google, LinkedIn, Microsoft, Oracle, any Y Combinator startup, Yelp, and Zynga.

Undergraduate GPA

After I ran the numbers to try to figure out whether GPA mattered, the outcome was a bit surprising: GPA appeared to not matter at all. Take a look at the GPA distribution for candidates who got offers versus candidates that didn’t (click to get a bigger, more interactive version).

As a caveat, it’s worth mentioning that roughly half of our applicants didn’t list their GPAs on their resumes, so not only is the data set smaller, but there are probably some biases at play. I did some experiments with filling in the missing data and separating out new grads, and I will discuss those results in a future post.

Is it easy to tell what the candidate actually did?

Take a look at this role description:

good_description

Now take a look at this one:

bad_description

In which of these is it easier to tell what the candidate did? I would argue that the first snippet is infinitely more clear than the second. In the first, you get a very clear idea of what the product is, what the candidate’s contribution was in the context of the product, and why that contribution matters. In the second, the candidate is using some standard industry lingo as a crutch — what he said could easily be applied to pretty much any software engineering position.

Judging each resume along these lines certainly wasn’t an exact science, and not every example was as cut-and-dried as the one above. Moreover, while I did my best to avoid confirmation bias when deciding whether I could tell what someone did, I’m sure that the system wasn’t perfect. All this said, however, I do find this result quite encouraging. People who are passionate about and good at what they do tend to also be pretty good at cutting to the chase. I remember the feeling of having to write my resume when I was looking for my first coding job, and I distinctly remember how easily words flowed when I was excited about a project versus when I knew inside that whatever I had been working on was some bullshit crap. The latter case is when words like “software development life cycle” and a bunch of acronyms reared their ugly heads… a pitiful attempt to divert the reader from lack of substance by waving a bunch of impressive sounding terms in his face.

This impression is further confirmed by a word cloud generated from candidate resumes that received an offer versus those that didn’t. For these clouds, I took words that appeared very frequently in one data set relative to how often they appeared in the other one.
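
Here is a minimal sketch of that relative-frequency idea; the two text snippets stand in for the full resume corpora and are obviously made up.

```python
from collections import Counter
import re

def word_freqs(text):
    """Return each word's share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Hypothetical corpora: all resume text from the offer and no-offer groups.
offer = word_freqs("shipped and managed a team to create and ship features")
no_offer = word_freqs("used various technologies in the software development life cycle")

# Rank words by how much more frequent they are in the offer corpus than in
# the no-offer corpus (with a small floor so unseen words don't divide by zero).
ratio = {w: f / (no_offer.get(w, 0.0) + 1e-6) for w, f in offer.items()}
for word, r in sorted(ratio.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(word, round(r, 1))
```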

Word cloud: Offer

Word cloud: No offer

As you can see, “good” resumes focused much more on action words/doing stuff (“manage”, “ship”, “team”, “create”, and so on) versus “bad” resumes which, in turn, focused much more on details/technologies used/techniques.

Highest degree earned

Though highest degree earned didn’t appear to be significant in this particular data set, there was a definite trend that caught my attention. Take a look at the graph of offers extended as a function of degree.

 

As you can see, the higher the degree, the lower the offer rate. I’m confident that with the advent of more data (especially more people without degrees and with master’s degrees), this relationship will become more clear. I believe that self-motivated college dropouts are some of the best candidates around because going out of your way to learn new things on your own time, in a non-deterministic way, while juggling the rest of your life is, in some ways, much more impressive than just doing homework for 4 years. I’ve already ranted quite a bit about how worthless I find most MS degrees to be, so I won’t belabor the point here.1

BS in Computer Science from a top school

But wait, you say, even if highest degree earned doesn’t matter, not all BS degrees are created equal! And, having a BS in Computer Science from a top school must be important because it’s in every fucking job ad I’ve ever seen!

And to you I say, Tough shit, buddy. Then I feel a bit uncomfortable using such strong language, in light of the fact that n ~= 300. However, roughly half of the candidates (122, to be exact) in the data set were sporting some fancy pieces of paper. And yet, our hire rate was not too different between people who had said fancy pieces of paper and those who didn’t. In fact, in 2012, half of the offers we made at TrialPay were to people without a BS in CS from a top school. This doesn’t mean that every dropout or student from a 3rd rate school is an unsung genius — there were plenty that I cut before interviewing because they hadn’t done anything to offset their lack of pedigree. However, I do hope that this finding gives you a bit of pause before taking the importance of a degree in CS from a top school at face value.

pedigree

In a nutshell, when you see someone who doesn’t have a pedigree but looks really smart (has no errors/typos, very clearly explains what they worked on, shows passion, and so forth), do yourself a favor and interview them.

Personal projects

Of late, it’s become accepted that one should have some kind of side projects in addition to whatever it is you’re doing at work, and this advice becomes especially important for people who don’t have a nice pedigree on paper. Sounds reasonable, right? Here’s what ends up happening. To game the system, applicants start linking to virtually empty GitHub accounts that are full of forked repos where they, at best, fixed some silly whitespace issue. In other words, it’s like 10,000 forks when all you need is a glimmer of original thought.

Yay forks.

Outside of that, there’s the fact that not all side projects are created equal. I can find some silly tutorial for some flashy UI thing, copy the code from it verbatim, swap in something that makes it a bit personal, and then call that a side project on my resume. Or I can create a new, actually useful JavaScript framework. Or I can spend a year bootstrapping a startup in my off hours and get it up to tens of thousands of users. Or I can arbitrarily call myself CTO of something I spaghetti-coded in a weekend with a friend.

Telling the difference between these kinds of projects is somewhat time-consuming for someone with a technical background and almost impossible for someone who’s never coded before. Therefore, while awesome side projects are a HUGE indicator of competence, if the people reading resumes can’t (either because of lack of domain-specific knowledge or because of time considerations) tell the difference between awesome and underwhelming, the signal gets lost in the noise.

Conclusion

When I started this project, it was my hope that I’d be able to debunk some myths about hiring or at least start a conversation that would make people think twice before taking folk wisdom as gospel. I also hoped that I’d be able to help non-technical HR people get better at filtering resumes so that fewer smart people would fall through the cracks. Some of my findings were quite encouraging in this regard because things like typos/grammatical errors, clarity of explanation, and whether someone worked at an elite company are all attributes that a non-technical person can parse. I was also especially encouraged by undergraduate pedigree not necessarily being a signal of success. At the end of the day, spotting top talent is extremely hard, and much more work is needed. I’m optimistic, however. As more data becomes available and more companies embrace the spirit of transparency, proxies for aptitude that don’t stand up under scrutiny will be eliminated, better criteria will take their place, and smart, driven people will have more opportunities to do awesome things with their careers than ever before.

Acknowledgements

A huge thank you to:

  • TrialPay, for letting me play with their data and for supporting my ideas, no matter how silly they sounded.
  • Statwing, for making statistical analysis civilized and for saving me from the horrors of R (or worse, Excel).
  • Everyone who suggested features, helped annotate resumes, or proofread this monstrosity.

Lastly, see Hacker News for some good discussion.

1It is worth mentioning that my statement about MS degrees potentially being a predictor of poor interview performance does not contradict this data — when factoring in other roles I interviewed for, especially more senior ones like Director of Engineering, the (negative) relationship is much stronger.

Looking for a job yourself? Work with a recruiter who’s a former engineer and can actually understand what you’re looking for. Drop me a line at aline@alinelerner.com.