I’ve got no business writing a blog about statistics. This isn’t going to be zeitgeisty and impactful because I’m neither of those things, and it isn’t going to ‘make statistics fun’ because statistics generally isn’t fun. However, since we’ve accepted that, we can have some fun by asking questions that the smart people don’t have time for. That’s what gutterstats is about: stupid questions with stupid answers. Today’s question is: who’s hungry? We’re gonna look at the kind of person who tweets about their stomach and see how they differ from the ‘average’ tweeter.
At some point over the last year, computers started doing what I told them to. So I told them to take every tweet sent with a certain set of words. I had a couple of ideas, but my brief was “well, let’s just get some tweets”. I tried something that was trending, which was the Milibrand meeting, and alongside it I just got every tweet of anyone saying they were hungry.
Guess which is which.
If the words on the left slowed down for a second, you’d see those are the cries of the world’s hungry. The fact that it is updating much quicker than the Milibrand one shows that the public are much much more inclined to tell you about the status of their stomach than what they think about Russell Brand, thank god.
So that settled it. I took an hour and a bit (3.48pm – 5.21pm 30th June) of people telling me they’re hungry and saw what I could get from it. There were 13,782 tweets. THIRTEEN THOUSAND.
The first thing I got from it is apparently you can see emojis on the command line:
Here’s another bunch of inconsequential shit that I gleaned from it under broad, ill-defined and useless titles
Is there originality in hunger?
A few of the tweets were word for word exactly the same as one another. Look at these hungry, unoriginal squares.
Top 10 hungry tweets
- I’m hungry (182 tweets)
- RT @HashtagAbdul: Yo, I’m hungry. (128 tweets)
- I’m so hungry (105 tweets)
- RT @Scripture_Truth: If your enemy is hungry, give him food to eat; if he is thirsty, give him water to drink. -Proverbs 25:21 (99 tweets)
- Hungry (84 tweets)
- RT @TheEarthPeople: Starving boy and a missionary. [some link] (82 tweets)
- Hungry af (62 tweets)
- I’m starving (54 tweets)
- Starving (52 tweets)
- So hungry (47 tweets)
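A rough sketch of how a list like this could be put together, assuming the tweets are already sitting in a plain Python list of strings. The `top_tweets` helper and the `sample` data are made up for illustration; this isn't necessarily how the numbers above were produced.

```python
from collections import Counter

def top_tweets(tweets, n=10):
    """Return the n most common tweet texts with their counts."""
    # Strip surrounding whitespace so "Hungry " and "Hungry" count as the same tweet.
    cleaned = [t.strip() for t in tweets]
    return Counter(cleaned).most_common(n)

# Made-up sample data, purely for illustration.
sample = ["I'm hungry", "Hungry", "I'm hungry", "So hungry", "Hungry ", "I'm hungry"]

for text, count in top_tweets(sample, n=3):
    print(f"- {text} ({count} tweets)")
```

Matching is exact, so "I'm hungry" and "Im hungry" would be counted as different tweets.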
The top one is obvious, but the second one really puzzled me and got me going a bit Serial. I looked up Hashtag Abdul (https://twitter.com/HashtagAbdul) to see what was going on.
His tweet was from 9 days before, it’s had 777 retweets at time of looking and 254 favourites. He’s making out like he’s got no idea what’s going on.
I was mega suspicious of whether this guy had been buying retweets or whatever. People retweet boring stuff all the time, sure, but why 9 days later? Well, because of this:
Pretty weird that he pushed this in the middle of the time I was trawling for this data. You’re going to have to decide what’s most likely out of either a) the world’s big enough for something like this to happen by chance or b) me and #Abdul are colluding to try and get him his ≈1k retweets. I’m not giving you any clues.
‘hungry’ tweets came up about 7x more than ‘starving’ tweets.
‘hungry’ or ‘starving’ tweets came up about 24x more than ‘HUNGRY’ or ‘STARVING’.
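For the curious, ratios like these could be worked out with something along these lines. It's a minimal sketch that assumes a case-sensitive substring match (so "Hungry" with a capital H counts for neither side); the `count_with_any` helper and the `sample` tweets are invented for the example.

```python
def count_with_any(tweets, words):
    """Count tweets containing at least one of `words` (case-sensitive substring match)."""
    return sum(any(w in t for w in words) for t in tweets)

# Made-up sample data, purely for illustration.
sample = ["so hungry", "HUNGRY", "I'm starving", "hungry af", "lunch was great"]

quiet = count_with_any(sample, ["hungry", "starving"])
shouty = count_with_any(sample, ["HUNGRY", "STARVING"])

# Guard against dividing by zero if nobody is shouting.
if shouty:
    print(f"quiet tweets came up about {quiet / shouty:.0f}x more than shouty ones")
```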
The dangers of overtweeting
There’s plenty of information given to you if you trawl these tweets off the internet. One thing we get is how many other tweets the user has.
If someone’s likely to tweet about how hungry they are, it wouldn’t be too much of a leap to say that they’re likely to tweet a lot, and therefore the average amount of tweets a user has should be more if they’ve tweeted “hungry”.
The problem is that it’s not clear how much an ‘average person’ tweeting at the moment will tweet. What we need are samples of tweets which differ from the hungry tweets in a way that we have a grasp over.
We could just take every tweet sent, but that would be likely to be a lot of spambots, which would be a shitty control sample because spambots don’t get hungry as much as regular people do.
So I came up with two different control samples. First was anyone who tweeted with the word ‘and’, second was anyone who tweeted with the word ‘lol’.
The first thing to point out is that people tweet these words a shitload. It took me an hour and a half to get ≈13,000 tweets with ‘hungry’ in them; it took me about 3 minutes to get the same number of tweets with ‘and’ or ‘lol’.
So I did the normal science thing and took the ‘mean’ of the total tweets. In effect what a mean is is dumping all your cookies into a communal jar and sharing them out. The result was pretty bizarre.
Mean amount of tweets of a user which tweets [word]:
The order in which these are ranked is just about believable: if you’re sticking two separate thoughts in a tweet (with ‘and’) you’re likely to be more economical with tweets than hungry people or lollers.
However, this is obviously wrong. The average user tweeting ‘lol’ having 30,000 tweets doesn’t seem to correspond to reality at all, it’s so large. I’ve been tweeting nonsense daily for all of my adult life and I’m just pushing 14,000.
The problem here is in two parts: a) losers on the internet and b) using the mean as the average. Take this situation:
Ten people walk into a dog friendly pub. The owner checks how many dogs they have at the door, the list goes:
Another person walks in, how many dogs would you guess they have?
Guy number 9 has a whole mess of dogs, he’s really pushing the limits of the “dog friendly” label. If another person walks in, how many dogs should we expect them to have? If we just go with the mean, we’d expect the next person should have 63,247 dogs. This is really stupid and is due to the fact that there are just some bozos with a whole mess of dogs which are impacting on any reasonable guess we can have. The same is true of this twitter list. I pulled up the person with the most tweets out of the ‘and’s. They had just short of TWO MILLION TWEETS. People with this amount of tweets and dogs are just ruining it for us normos.
The best way to ignore those morons is using the only other good average, the median (who uses mode?). What we do is rank the dog owners by how many dogs they have; 0,1,1,1,1,1,1,2,2,632456; and pick the one in the middle. With ten people the middle falls between the 5th and 6th, and both of those have 1 dog. That makes us guess that the next person will have 1 dog, much more believable.
So let’s do the same with our tweets.
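If you'd rather check the dog maths than take my word for it, here's a quick sketch using Python's built-in `statistics` module on the ranked list of dogs from the pub.

```python
from statistics import mean, median

# The ten dog counts from the pub door, ranked from fewest to most.
dogs = [0, 1, 1, 1, 1, 1, 1, 2, 2, 632456]

# The mean shares every dog out equally, so one bozo drags it way up.
print(mean(dogs))    # 63246.6, which rounds to 63,247

# The median only cares about the middle of the ranking.
# With ten values it averages the 5th and 6th, which are both 1.
print(median(dogs))  # 1.0
```

Swap the 632456 for any other huge number and the mean moves with it, while the median stays put at 1.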
Median amount of tweets of a user who tweets [word]
On one hand this is much nicer and on the other hand this is absolutely wild.
On the nicer hand we see a believable ranking: lollers tweet more than hungry people, who tweet more than the boring ands. We also see a more believable number of tweets – Joey Essex has about 20,000 and Aaron Rodgers has 2,000 and they’re a pretty realistic cross section of society.
Now on the wild side. If the median is a good representative of the average twitter user for these samples then someone who tweets ‘lol’ has double the number of tweets that someone who tweets ‘and’ has. This can’t be a coincidence. If I’d taken 5 people and made this claim then it would look like just some fluke that there was this distinction. I took 13,000 people, which is far more than necessary to give this significance.
Anyone who grew up on MSN Messenger knows that ‘lol’ is a perfect way to end staccato messages: “I’m just hanging out lol” “I’m really tired lol”. ‘Lol’ is an apology for a message you know doesn’t really stand up by itself. If you’re sending an email though, a lol isn’t going to cut it, you’ve got to go for “I’m just hanging out and I’m really tired so I probably can’t meet today”. Maybe the difference between tweeting ‘and’ and tweeting ‘lol’ is whether you see twitter more like emails or MSN. If you’ll allow me to spout some ill-thought-out nonsense, you could argue that the fact that the average loller has twice as many tweets as the ‘and’ people implies that they’re tweeting the exact same stuff, but the ‘and’s are doubling up on their tweets.
To finish on a high, let’s go with my favourite stat:
Of the tweets collected with [word] in them, how many were their user’s FIRST EVER TWEET
I’m very willing to believe that more people would rather their first tweet be about two separate brilliant ideas connected with ‘and’ than about their hunger. Chalk this one up to grandiose but ultimately doomed plans of a continued quality of content. I can think of no more suitable way to end my first blogpost.
Anyway, the next time I post I’ll have more than an hour’s tweets. This poor sucker is currently reading all the boring ‘royal baby’ tweets, and can do so even when I wanna watch Netflix.