Exploratory Data Analysis (EDA): Understanding the Gender Divide in Data Science Roles

with Shreejaya Bharathan on 2018 Kaggle ML & DS Survey data

Women have historically been underrepresented in STEM fields and face discrimination in the workplace. According to a study conducted in 2018, “63 percent of the time, women receive lower salary offers than men for the same job at the same company.”

Does the Data Science industry inherit these biases from other traditional industries?

Dataset

We chose the 2018 Kaggle ML & DS Survey to analyze the gender divide in Data Science. The survey has information about education, skills, age, income, and roles, all of which are relevant to our study.

https://www.kaggle.com/kaggle/kaggle-survey-2018

We also used the 2017 version of the survey for comparative analysis.

https://www.kaggle.com/kaggle/kaggle-survey-2017

Most data scientists use Kaggle to get datasets, collaborate, or participate in data challenges, and the survey has over 23K responses, so we considered it a good representation of the whole community.

EDA and visualization

We preprocessed the data by removing all responses that took less than 5 minutes to complete: the survey had about 50 questions, so a genuine response would take longer than that. Next, since we are comparing men and women, we removed responses with any other gender or with a missing gender. All data manipulation was done using pandas, and plots were created using ggplot2.
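As a rough illustration of the two filtering steps described above, here is a minimal pandas sketch. The column names (`duration_seconds`, `gender`) and the toy rows are assumptions made for this example, not the survey's actual headers or data.

```python
import pandas as pd

# Toy stand-in for the survey responses; column names are hypothetical.
responses = pd.DataFrame({
    "duration_seconds": [120, 900, 450, 1500, 200, 610],
    "gender": ["Male", "Female", "Male", None, "Female", "Prefer not to say"],
})

MIN_DURATION_SECONDS = 5 * 60  # the 5-minute threshold


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rushed responses, then keep only rows answered Male or Female."""
    long_enough = df["duration_seconds"] >= MIN_DURATION_SECONDS
    # isin() is False for missing values, so this also drops missing genders.
    male_or_female = df["gender"].isin(["Male", "Female"])
    return df[long_enough & male_or_female].reset_index(drop=True)


cleaned = preprocess(responses)
print(cleaned)
```

Combining both conditions in one boolean mask keeps the filter easy to audit: each rule has a name, and either one can be changed without touching the other.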

Check out our code here:

https://www.kaggle.com/kaggle/kaggle-survey-2018

Only 16% of the respondents were women

% Respondents 2018 — Female vs Male

It was quite shocking to see that women make up such a small portion of the community.
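A percentage like this can be read straight off the cleaned gender column. The snippet below is only a sketch on made-up answers (so the numbers differ from the survey's); the series name `gender` is an assumption.

```python
import pandas as pd

# Made-up answers to a hypothetical gender question (not the survey data).
gender = pd.Series(["Male"] * 6 + ["Female"] * 2, name="gender")

# normalize=True returns fractions of the total instead of raw counts.
share_pct = gender.value_counts(normalize=True).mul(100).round(1)
print(share_pct)
```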

The United States has the highest proportion of women in Data Science

% Women by countries — 2017 vs 2018

We then compared the proportion of women in the five countries with the most respondents and how this proportion changed from 2017 to 2018. We discovered that in the US, almost a quarter of the data science community is women, and it is doing much better than the other countries in terms of gender ratio. For Russia, the percentage of women actually seems to be going down!
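One way to build this kind of comparison is to stack both years into a single table, keep the countries with the most respondents, and take the mean of a boolean "is female" flag per country and year. The sketch below uses invented rows and placeholder country names ("A", "B", "C"), and keeps the top 2 countries rather than 5 to stay small.

```python
import pandas as pd

# Invented responses from both survey years; all names and values are hypothetical.
combined = pd.DataFrame({
    "year":    [2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018],
    "country": ["A", "A", "B", "B", "C", "A", "A", "B", "B"],
    "gender":  ["Female", "Male", "Male", "Male", "Female",
                "Female", "Female", "Male", "Female"],
})

# Keep only the countries with the most respondents (top 2 in this toy example).
top_countries = combined["country"].value_counts().nlargest(2).index
subset = combined[combined["country"].isin(top_countries)]

# The mean of a boolean column is the fraction of True values.
is_female = subset["gender"].eq("Female")
pct_women = (
    is_female.groupby([subset["country"], subset["year"]])
    .mean()
    .mul(100)
    .unstack("year")
)
pct_women["change"] = pct_women[2018] - pct_women[2017]
print(pct_women)
```

A negative value in the `change` column would flag a country where the share of women fell between the two surveys.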

Women and Men have a similar distribution of Roles

It’s good to see that men and women have roughly the same distribution of roles. About half of the respondents are Data Scientists, 20–30% are Data Analysts, and the rest are Research Scientists, Data Engineers, or DBA/Database Engineers.
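To compare role distributions between two groups of very different sizes, it helps to normalize within each gender rather than over all respondents. A minimal sketch with `pd.crosstab` follows; the rows and the column names `gender` and `role` are invented for illustration.

```python
import pandas as pd

# Invented rows; "gender" and "role" are hypothetical column names.
roles = pd.DataFrame({
    "gender": ["Female"] * 4 + ["Male"] * 4,
    "role": [
        "Data Scientist", "Data Scientist", "Data Analyst", "Research Scientist",
        "Data Scientist", "Data Scientist", "Data Analyst", "Data Engineer",
    ],
})

# normalize="index" makes each row (each gender) sum to 1.
role_pct = pd.crosstab(roles["gender"], roles["role"], normalize="index").mul(100)
print(role_pct.round(1))
```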


Women hold more advanced degrees than men

The percentage of respondents holding an advanced degree, such as a Master’s or a Ph.D., is greater for women than for men.

Women with the same level of education earn less than men

It was really sad to see that the highest average salary earned by women (for Ph.D. holders) is, in fact, lower than the average salary earned by men at any level of education. Women earn about 20–25% less than men at any given level of education!
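The gap quoted here is a relative difference in mean salary, expressed as a share of the men's mean. The sketch below shows that arithmetic on invented figures; it assumes salary is already a single number per respondent, and the column names are hypothetical.

```python
import pandas as pd

# Invented salaries; values and column names are hypothetical.
salaries = pd.DataFrame({
    "education": ["Bachelor's", "Bachelor's", "Master's", "Master's",
                  "Doctoral", "Doctoral"],
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "salary_usd": [40_000, 50_000, 60_000, 80_000, 72_000, 90_000],
})

# One row per education level, one column per gender, mean salary in each cell.
mean_salary = salaries.pivot_table(
    index="education", columns="gender", values="salary_usd", aggfunc="mean"
)

# Gap as a percentage of the men's mean salary at the same education level.
mean_salary["gap_pct"] = (
    (mean_salary["Male"] - mean_salary["Female"]) / mean_salary["Male"] * 100
)
print(mean_salary.round(1))
```

The same pivot works for other groupings, for example age group instead of education level, by changing the `index` argument.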

Income gap increases with age

Women and men tend to start off with similar salaries, but the gap gets wider and wider as we look at higher age groups. This suggests that a smaller percentage of women are considered for promotions and higher roles.

Summary

The Kaggle DS and ML surveys gave us thought-provoking insight into the state of women in Data Science. Because Data Science is a relatively new field, it’s not too late for us to ensure that the community is inclusive and welcoming to everyone.
