Sexism in films in data: IMDB, Hollywood and Language

First up:

DISCLAIMER 1: This is the first of many disclaimers in this post: I am terribly placed to write this. For a start, I’m a guy, so although I care about sexism, I’m probably not the best person to talk about it. Also, I genuinely do not like movies – now that TV has made good long-form series the standard, it doesn’t make sense for me to pay £10 to go and see some superhero origin story with some rich white dudes whose names I’m supposed to know. I’m sure the medium has its benefits and I’m being unfair, but in the spirit of journalistic integrity, before I write this post I need to admit that over the last year I’ve watched Peep Show all the way through more times than I’ve been to the cinema.
Anyway, me being a philistine aside, I found the Kaggle dataset of IMDB movies and wanted to see whether the sexism in Hollywood shows up in the data too.

What I’ve got: IMDB data for 5,000 movies
What I’m going to do with it: Show that movies featuring women are rated lower and have lower budgets, but are more profitable than movies featuring men. Also that films with men in them have titles that are incredibly phallic.


