We built voice modulation to mask gender in technical interviews. Here’s what happened.

interviewing.io is a platform where people can practice technical interviewing anonymously and, in the process, find jobs based on their interview performance rather than their resumes. Since we started, we’ve amassed data from thousands of technical interviews, and in this blog, we routinely share some of the surprising stuff we’ve learned. In this post, I’ll talk about what happened when we built real-time voice masking to investigate the magnitude of bias against women in technical interviews. In short, we made men sound like women and women sound like men and looked at how that affected their interview performance. We also looked at what happened when women did poorly in interviews, how drastically that differed from men’s behavior, and why that difference matters for the thorny issue of the gender gap in tech.

The setup

When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from a mix of large companies like Google, Facebook, Twitch, and Yelp, as well as engineering-focused startups like Asana, Mattermark, and others. For more context, some examples of interviews done on the platform can be found on our public recordings page.

After every interview, interviewers rate interviewees on a few different dimensions.

Feedback form for interviewers

As you can see, we ask the interviewer if they would advance their interviewee to the next round. We also ask about a few different aspects of interview performance using a 1-4 scale. On our platform, a score of 3 or above is generally considered good.

Women historically haven’t performed as well as men…

One of the big motivators to think about voice masking was the increasingly uncomfortable disparity in interview performance on the platform between men and women [1]. At that time, we had amassed over a thousand interviews with enough data to do some comparisons and were surprised to discover that women really were doing worse. Specifically, men were getting advanced to the next round 1.4 times more often than women. Technical scores weren’t faring that well either: men on the platform had an average technical score of 3 out of 4, as compared to 2.5 out of 4 for women.

Despite these numbers, it was really difficult for me to believe that women were just somehow worse at computers, so when some of our customers asked us to build voice masking to see if that would make a difference in the conversion rates of female candidates, we didn’t need much convincing.

… so we built voice masking

Ever since we started working on interviewing.io, we knew that true interviewee anonymity would eventually mean hiding gender, but we put it off for a while because building a real-time voice modulator wasn’t technically trivial. Some early ideas included sending female users a Bane mask.

Early voice masking prototype (drawing by Marcin Kanclerz)

When the Bane mask thing didn’t work out, we decided we ought to build something within the app, and if you play the videos below, you can get an idea of what voice masking on interviewing.io sounds like. In the first one, I’m talking in my normal voice.

And in the second one, I’m modulated to sound like a man. [2]

Armed with the ability to hide gender during technical interviews, we were eager to see what the hell was going on and get some insight into why women were consistently underperforming.

The experiment

The setup for our experiment was simple. Every Tuesday evening at 7 PM Pacific, interviewing.io hosts what we call practice rounds. In these practice rounds, anyone with an account can show up, get matched with an interviewer, and go to town. And during a few of these rounds, we decided to see what would happen to interviewees’ performance when we started messing with their perceived genders.

In the spirit of not giving away what we were doing and potentially compromising the experiment, we told both interviewees and interviewers that we were slowly rolling out our new voice masking feature and that they could opt in or out of helping us test it out. Most people opted in, and we informed interviewees that their voice might be masked during a given round and asked them to refrain from sharing their gender with their interviewers. For interviewers, we simply told them that interviewee voices might sound a bit processed.

We ended up with 234 total interviews (roughly 2/3 male and 1/3 female interviewees), which fell into one of three categories:

  • Completely unmodulated (useful as a baseline)
  • Modulated without pitch change
  • Modulated with pitch change

You might ask why we included the second condition, i.e. modulated interviews that didn’t change the interviewee’s pitch. As you probably noticed if you played the videos above, the modulated one sounds fairly processed. The last thing we wanted was for interviewers to assume that any processed-sounding interviewee must automatically be the opposite gender of what they sounded like. So we threw that condition in as a further control.

The results

After running the experiment, we ended up with some rather surprising results. Contrary to what we expected (and probably contrary to what you expected as well!), masking gender had no effect on interview performance with respect to any of the scoring criteria (would advance to next round, technical ability, problem solving ability). If anything, we started to notice some trends in the opposite direction of what we expected: for technical ability, it appeared that men who were modulated to sound like women did a bit better than unmodulated men and that women who were modulated to sound like men did a bit worse than unmodulated women. Though these trends weren’t statistically significant, I am mentioning them because they were unexpected and definitely something to watch for as we collect more data.

On the subject of sample size, we have no delusions that this is the be-all and end-all of pronouncements on the subject of gender and interview performance. We’ll continue to monitor the data as we collect more of it, and it’s very possible that as we do, everything we’ve found will be overturned. I will say, though, that had there been any staggering gender bias on the platform, with a few hundred data points, we would have gotten some kind of result. So that, at least, was encouraging.

So if there’s no systemic bias, why are women performing worse?

After the experiment was over, I was left scratching my head. If the issue wasn’t interviewer bias, what could it be? I went back and looked at the seniority levels of men vs. women on the platform as well as the kind of work they were doing in their current jobs, and neither of those factors seemed to differ significantly between groups. But there was one nagging thing in the back of my mind. I spend a lot of my time poring over interview data, and I had noticed something peculiar when observing the behavior of female interviewees. Anecdotally, it seemed like women were leaving the platform a lot more often than men. So I ran the numbers.

What I learned was pretty shocking. As it happens, women leave interviewing.io roughly 7 times as often as men after they do badly in an interview. And the numbers for two bad interviews aren’t much better. You can see the breakdown of attrition by gender below (the differences between men and women are indeed statistically significant with P < 0.00001).

Also note that as much as possible, I corrected for people leaving the platform because they found a job (practicing interviewing isn’t that fun after all, so you’re probably only going to do it if you’re still looking), were just trying out the platform out of curiosity, or they didn’t like something else about their interviewing.io experience.
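For readers wondering how a difference in attrition rates like this might be checked for significance, here is a minimal sketch. The post does not say which test produced the p-value above, so the two-proportion z-test below is an assumption, and the counts in the usage lines are invented placeholders, not interviewing.io data.

```python
import math


def two_proportion_z_test(left_a, total_a, left_b, total_b):
    """Two-sided z-test for a difference between two attrition proportions.

    left_*  : number of people in each group who left after a bad interview
    total_* : number of people in each group who had a bad interview
    Returns (z, p_value). Relies on the normal approximation, so it is only
    reasonable when the counts in each group are not tiny.
    """
    p_a = left_a / total_a
    p_b = left_b / total_b
    # Pooled proportion under the null hypothesis that both groups leave at the same rate.
    pooled = (left_a + left_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts only (hypothetical): 35 of 100 in group A leave vs. 5 of 100 in group B.
z, p = two_proportion_z_test(35, 100, 5, 100)
print(f"z = {z:.2f}, p = {p:.2e}")
```

With small groups, an exact test (such as Fisher's) would be the safer choice than this approximation.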

A totally speculative thought experiment

So, if these are the kinds of behaviors that happen in the interviewing.io microcosm, how much is applicable to the broader world of software engineering? Please bear with me as I wax hypothetical and try to extrapolate what we’ve seen here to our industry at large. And also, please know that what follows is very speculative, based on not that much data, and could be totally wrong… but you gotta start somewhere.

If you consider the attrition data points above, you might want to do what any reasonable person would do in the face of an existential or moral quandary, i.e. fit the data to a curve. An exponential decay curve seemed reasonable for attrition behavior, and you can see what I came up with below. The x-axis is the number of what I like to call “attrition events”, namely things that might happen to you over the course of your computer science studies and subsequent career that might make you want to quit. The y-axis is what portion of people are left after each attrition event. The red curve denotes women, and the blue curve denotes men.

Now, as I said, this is pretty speculative, but it really got me thinking about what these curves might mean in the broader context of women in computer science. How many “attrition events” does one encounter between primary and secondary education and entering a collegiate program in CS and then starting to embark on a career? So, I don’t know, let’s say there are 8 of these events between getting into programming and looking around for a job. If that’s true, then we need 3 times as many women studying computer science as men to get to the same number in our pipelines. Note that that’s 3 times more than men, not 3 times more than there are now. If we think about how many there are now, which, depending on your source, is between 1/3 and 1/4 of the number of men, to get to pipeline parity, we actually have to increase the number of women studying computer science by an entire order of magnitude.
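The arithmetic behind that last claim can be spelled out in a few lines. This is a sketch under a simplifying assumption that is not stated in the post: each attrition event keeps a fixed fraction of each group, so the gap compounds geometrically. The function names are made up for illustration, and only the figures 8, 3, 1/3, and 1/4 come from the text above.

```python
def required_starting_ratio(per_event_ratio, events):
    """Women-to-men starting ratio needed for equal numbers after `events` events.

    per_event_ratio is (men's retention per event) / (women's retention per event).
    If men keep a fraction m and women a fraction w at every event, then after n
    events m**n and w**n remain, so parity needs (m / w)**n times as many women.
    """
    return per_event_ratio ** events


def implied_per_event_ratio(starting_ratio, events):
    """Inverse of the above: the per-event retention ratio implied by a target ratio."""
    return starting_ratio ** (1.0 / events)


def required_increase(starting_ratio, men_per_woman_today):
    """Factor by which today's number of women would have to grow to reach parity."""
    return starting_ratio * men_per_woman_today


EVENTS = 8          # from the post
TARGET_RATIO = 3.0  # from the post: 3 times as many women as men

# Per-event retention gap implied by the post's numbers (roughly 1.15).
print(implied_per_event_ratio(TARGET_RATIO, EVENTS))

# Women are currently 1/3 to 1/4 the number of men, i.e. 3 to 4 men per woman.
for men_per_woman in (3, 4):
    print(men_per_woman, required_increase(TARGET_RATIO, men_per_woman))
```

Under these assumptions the required increase comes out between 9x and 12x, which is where "an entire order of magnitude" comes from.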

Prior art, or why maybe this isn’t so nuts after all

Since gathering these findings and starting to talk about them a bit in the community, I began to realize that there was some supremely interesting academic work being done on gender differences around self-perception, confidence, and performance. Some of the work below found slightly different trends than we did, but it’s clear that anyone attempting to answer the question of the gender gap in tech would be remiss in not considering the effects of confidence and self-perception in addition to the more salient matter of bias.

In a study investigating the effect of perceived performance on the likelihood of subsequent engagement, Dunning (of Dunning-Kruger fame) and Ehrlinger administered a scientific reasoning test to male and female undergrads and then asked them how they did. Not surprisingly, though there was no difference in performance between genders, women underrated their own performance more often than men. Afterwards, participants were asked whether they’d like to enter a Science Jeopardy contest on campus in which they could win cash prizes. Again, women were significantly less likely to participate, with participation likelihood being directly correlated with self-perception rather than actual performance. [3]

In a different study, sociologists followed a number of male and female STEM students over the course of their college careers via diary entries authored by the students. One prevailing trend that emerged immediately was the difference between how men and women handled the “discovery of their [place in the] pecking order of talent, an initiation that is typical of socialization across the professions.” For women, realizing that they may no longer be at the top of the class and that there were others who were performing better, “the experience [triggered] a more fundamental doubt about their abilities to master the technical constructs of engineering expertise [than men].”

And of course, what survey of gender difference research would be complete without an allusion to the wretched annals of dating? When I told the interviewing.io team about the disparity in attrition between genders, the resounding response was along the lines of, “Well, yeah. Just think about dating from a man’s perspective.” Indeed, a study published in the Archives of Sexual Behavior confirms that men treat rejection in dating very differently than women, even going so far as to say that men “reported they would experience a more positive than negative affective response after… being sexually rejected.”

Maybe tying coding to sex is a bit tenuous, but, as they say, programming is like sex — one mistake and you have to support it for the rest of your life.

Why I’m not depressed by our results and why you shouldn’t be either

Prior art aside, I would like to leave off on a high note. I mentioned earlier that men are doing a lot better on the platform than women, but here’s the startling thing. Once you factor out interview data from both men and women who quit after one or two bad interviews, the disparity goes away entirely. So while the attrition numbers aren’t great, I’m massively encouraged by the fact that at least in these findings, it’s not about systemic bias against women or women being bad at computers or whatever. Rather, it’s about women being bad at dusting themselves off after failing, which, despite everything, is probably a lot easier to fix.

[1] Roughly 15% of our users are female. We want way more, but it’s a start.

[2] If you want to hear more examples of voice modulation or are just generously down to indulge me in some shameless bragging, we got to demo it on NPR and in Fast Company.

[3] In addition to asking interviewers how interviewees did, we also ask interviewees to rate themselves. After reading the Dunning and Ehrlinger study, we went back and checked to see what role self-perception played in attrition. In our case, the answer is, I’m afraid, TBD, as we’re going to need more self-ratings to say anything conclusive.

265 thoughts on “We built voice modulation to mask gender in technical interviews. Here’s what happened.”

  7. Well, or it could just be that overall, men and women are actually different in ways that are not due to socialization.

    I read the entire article and found it contains hints of presupposition that it cannot be this way, throughout.

    This doesn’t mean that any particular woman cannot outperform any particular man; it’s just a matter of aggregates. Nor is this support for bias against women who are able to compete; it’s just a recognition that an a priori expectation of identical outcomes, with every profession mirroring the sex-based demographics of society, may be barking up the wrong tree.

    1. One thing you don’t seem to touch on was that by the time women are taking their interviews, they’ve already experienced more attrition events. So it may not be accurate to say that women drop off from attrition events at higher rates, when they may be on their 10th rather than 4th setback. I know by the time I got to my first CS class, I’d already had numerous setbacks and several male peers who actively tried to discourage or prevent me from studying CS. It seems like a failing of your analysis not to consider that possibility, but rather claim women are just worse at brushing themselves off.

      It is nice to know that the interviewer bias wasn’t a huge factor in your results, but sexism is still a very real issue. All you have to do is read the comments section for this article and you’ll see exactly the attitudes that eventually drive many women from CS.

      1. That’s funny. Do attitudes of women drive away men from teaching, or nursing? Or, perhaps, are you saying that one gender is more sensitive, and needs more support?

        Because men deal with a lot of stuff that women would see as “bad attitudes”, by other men. Yet, they themselves succeed.

      2. I think that’s a very interesting idea!

        If we assume that everyone, regardless of sex, has a set amount of “you suck, go away” that they can withstand, after which they quit,

        but women receive more of that negative feedback just because of their sex,
        then women will quit faster (since they have already received a lot of extra abuse).

        Plus, I guess some form of “stereotype threat” could be at play too:
        – man fails -> (subconsciously thinks) I guess I need to learn more -> learns more -> tries again
        – woman fails -> (subconsciously thinks) women like me don’t belong here -> I can’t do that -> quits

  9. Name Withheld

    What an interesting article.

    The section about attrition events reminded me of several more studies and reports along a similar line. I’ve read in the past that women who graduate with STEM degrees often change careers into non engineering fields after something like 1-4 years, while men stay in the field much longer. I’m sure this is familiar to people reading and I hope someone can post a link to the actual report.

    This is anecdotal, but I’m also thinking back to my time as a student, both graduate and undergraduate. Besides the typical male dominated demographics, I noticed many women ended up changing their major to a non scientific degree. I never gave much thought because I knew a lot of men who changed majors too. But after reading this article, it makes me suspect that women tend to change to non STEM degrees at a higher rate.

  10. Why should this surprise anyone? No one assumes the fact that men make up the majority of the Darwin Awards and prison population due to sexism. Does equality mean “men and women are exactly the same, except when men are inferior”?

  11. Frank Ch. Eigler

    “to get to pipeline parity, we actually have to increase the number of women studying computer science by an entire order of magnitude.”

    Maybe the putative benefits of pipeline parity are not worth the cost of such invasive measures.

  21. Are the interviewers female? You are missing a huge point that a lot of women communicate differently. I’ve only learned a couple CS concepts from women (one TA in undergrad and one colleague who taught me MVC). That and my mom who taught me math basics and how to drive. Learned much faster and things made more sense. Even though my dad knew much more advanced math, I never quite understood him the way I did my mom.

    1. Your modulated voice sounds gay to me. There is a good possibility that we observe gay people being discriminated here. Tech is a macho culture.

      1. “Once you factor out interview data from both men and women who quit after one or two bad interviews, the disparity goes away entirely”

        I’d presume that the voice would still sound gay after two interviews, so if that were the case, the disparity wouldn’t go away. Or am I missing something?

    2. would that mean that single-sex education is better? since you can match the sex of the teacher to the pupils?

  26. I’m doubtful of the claim that a woman’s modulated voice sounds like a man’s. Besides tone and pitch, there are speech patterns which get coded “female” that this experiment does not account for. It seems irresponsible to deduce from this single flawed experiment that systemic bias is not real and the solution is that women simply need to “toughen up” to solve the gender disparity in tech.
