“WE ARE NOT A CULT”: Analysing the lifestyle of Dogspotting through data

Everything kind of sucks everywhere at the moment. One thing that doesn’t suck is the Facebook group called ‘Dogspotting’. In Dogspotting, members post pictures of dogs they’ve seen in the street, and other players rate them with points. These points are accrued in the hope of winning ‘The Big Prize’. The rules are strictly enforced by a team of dedicated admins, who know that they are under the scrutiny of not only the dogspotting people’s court but also the hacks at the dogspotting gossip and gab magazine.

It’s silly, but it’s the best kind of silly. It’s also extremely popular: the group has almost a quarter of a million members and it’s still growing. I thought I’d take a look at how and when people spot dogs, partly to help me on the way to winning The Big Prize and partly just for fun. So I took every single dogspotting post from the group’s inception to now: 229,971 posts, scraped using this code by GitHub user minimaxir.

What I’ve got: every single dogspot ever made

What I’m going to do with it: See what, when and why people spot dogs, to help me win The Big Prize
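
As a rough idea of what the "when" part could look like, here is a minimal sketch that counts posts by hour of day. The sample rows, the timestamp format and the function name `posts_per_hour` are all made up for illustration; the real scraped data may be laid out differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (timestamp, post text). Not real Dogspotting data,
# and the timestamp format is an assumption about the scraper's output.
SAMPLE_POSTS = [
    ("2016-07-23 13:57:12", "spotted a corgi"),
    ("2016-07-23 13:05:40", "two pugs"),
    ("2016-07-22 08:15:00", "husky on a bus"),
    ("2016-07-21 18:30:09", "greyhound"),
]


def posts_per_hour(posts):
    """Count posts by the hour of day (0-23) in which they were made."""
    hours = (
        datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
        for timestamp, _text in posts
    )
    return Counter(hours)


counts = posts_per_hour(SAMPLE_POSTS)
busiest_hour, busiest_count = counts.most_common(1)[0]
print(f"Busiest hour: {busiest_hour}:00 with {busiest_count} posts")
```

The same counting trick should work for day of week or month by swapping `.hour` for `.weekday()` or `.month`.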


Analysing Handwritten Digits Using Principal Component Analysis

I do quite a few projects which end up with a few cool graphs in them but no interesting conclusions or discoveries. Instead of just leaving them to rot in my ‘odd_projects’ folder, I thought I’d start publishing some short posts outlining what I did (like really just an outline, I probably won’t go very deep into the theory) and sharing the graphs, so here goes:

What I have: The MNIST database, a database of 70,000 handwritten digits labelled by what number they’re meant to be

What I’m going to do with it: Use principal component analysis to compare relative difficulties of classifying handwritten digits
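
For anyone who hasn’t met principal component analysis before, here is a minimal sketch of the core idea using only the Python standard library: centre the data, build the covariance matrix, and find the direction of greatest variance by power iteration. The toy 2-D points and the function name `first_principal_component` are made up for illustration. An MNIST image would be a row of 784 pixel values rather than 2, and in practice a library such as NumPy or scikit-learn would do this far faster.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, fraction of variance explained) for the top component."""
    n, d = len(rows), len(rows[0])

    # Centre each column on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(row[a] * row[b] for row in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise; the vector converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the variance along v (the top eigenvalue).
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical toy data lying roughly along the line y = x.
TOY_POINTS = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]

direction, explained = first_principal_component(TOY_POINTS)
print(direction, explained)
```

Because the toy points sit almost on a straight line, nearly all of the variance lands on the first component. Messier data spreads its variance over more components, which is the kind of difference a per-digit comparison could look at.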
