How does emoji usage on Twitter differ across the United States? Great question! Let’s explore…
Here’s a larger version of this map, created with d3.js. In addition, all of the code for this analysis is on GitHub.
Details
Using Twitter’s streaming API, I collected ~1.8 million tweets (over a week last month) that were geolocated within the USA. Of those, ~410k tweets contained at least one emoji. From there, I split up the tweets based on the state they came from and then tallied up all of the emojis.
A couple of initial thoughts: 1) the emoji dictionary is expansive: there were 1,067 different emojis used across all of the tweets. 2) The “face with tears of joy” emoji (😂) is crazy popular. Of the 821k total emojis (many tweets contain more than one), 16% (134k) were 😂s; this is more than the second (😭), third (❤️), fourth (😍), and fifth (🔥) most popular emojis combined.
If you simply plot the most commonly used emoji for each state (which I did at first), you get a pretty boring and repetitive map with 49 😂s out of 51 (including DC). To overcome the monotony, we can compute the “term frequency–inverse document frequency” (tf-idf for short) for each emoji. This is a numerical technique for finding the words (emojis) that are significant to a document (the tweets within a state) within a larger corpus (all USA tweets). As explained by Wikipedia, “the tf-idf value increases proportionally to the number of times a word appears in the document, but is often offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.”
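For reference, here is one common way to write tf-idf down. This is an assumption on my part about the variant: tf-idf has several weighting schemes, and the exact one used for the maps isn't spelled out here. The symbols are illustrative: n(e, s) is the number of times emoji e appears in state s, and N is the number of states (51 with DC).

```latex
\mathrm{tf}(e, s) = \frac{n(e, s)}{\sum_{e'} n(e', s)}, \qquad
\mathrm{idf}(e) = \log \frac{N}{\left|\{\, s : n(e, s) > 0 \,\}\right|}, \qquad
\text{tf-idf}(e, s) = \mathrm{tf}(e, s) \cdot \mathrm{idf}(e)
```

Under this variant, an emoji that shows up in every state has idf = log(1) = 0, which is how something as universal as 😂 would drop out of the rankings.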
With emojis ranked by tf-idf, the maps (top 1, top 4, also see image above) are a lot more interesting. I love seeing emojis for a state match up with things I associate with that state, like the mountains (🏔) in Alaska and Colorado, volcano (🌋) in Hawaii, shamrock (☘) in Massachusetts, and checkered flag (🏁) in Indiana. I also like seeing the creative use of emojis that resemble letters, like the 〰️ (Wavy Dash) for “W” in Washington or the 🔰 (Japanese Symbol for Beginner) for “V” in Virginia. Some of these don’t make as much sense to me, like the mouse (🐁) in South Dakota or nose (👃) in West Virginia, but that might be more of a reflection on me and my unculturedness 😜.
How do you feel about the emojis that represent your state? I hope you think they’re 👌.
Code
If you’re curious about how I collected and aggregated the data, or want to extend this analysis or do something similar, see below for some code snippets.
step 1: fetch tweets (within USA)
🚨 100 tweets... (4 secs elapsed)
🚨 200 tweets... (8 secs elapsed)
🚨 300 tweets... (13 secs elapsed)
🚨 400 tweets... (17 secs elapsed)
🚨 500 tweets... (21 secs elapsed)
🚨 600 tweets... (26 secs elapsed)
🚨 700 tweets... (29 secs elapsed)
🚨 800 tweets... (34 secs elapsed)
🚨 900 tweets... (38 secs elapsed)
🚨 1000 tweets... (42 secs elapsed)
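The collection loop behind a progress log like this can be sketched roughly as follows. This is a minimal sketch, not the original collection code: `collect`, `in_usa`, and `fake_stream` are hypothetical names, and the generator stands in for whatever iterable of tweet dicts a streaming client would hand you. The only thing taken from the data is the `place.country_code` field.

```python
import time

def in_usa(tweet):
    """True if the tweet's place is tagged with country code 'US'."""
    place = tweet.get("place") or {}  # 'place' can be missing or None
    return place.get("country_code") == "US"

def collect(stream, limit, report_every=100):
    """Keep up to `limit` US-geolocated tweets from an iterable of tweet dicts."""
    start = time.time()
    kept = []
    for tweet in stream:
        if not in_usa(tweet):
            continue
        kept.append(tweet)
        # Report progress every `report_every` kept tweets.
        if len(kept) % report_every == 0:
            elapsed = int(time.time() - start)
            print(f"{len(kept)} tweets... ({elapsed} secs elapsed)")
        if len(kept) >= limit:
            break
    return kept

# Hypothetical stand-in for the live stream: alternating US and non-US tweets.
fake_stream = (
    {"id": str(i), "text": "hi",
     "place": {"country_code": "US" if i % 2 == 0 else "CA"}}
    for i in range(1000)
)
tweets = collect(fake_stream, limit=300)
```

Swapping `fake_stream` for a real streaming client's iterator is the only change needed, as long as it yields dicts shaped like Twitter's tweet objects.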
step 2: filter tweets to ones containing emojis
done with 0...
done with 200000...
done with 400000...
done with 600000...
done with 800000...
done with 1000000...
done with 1200000...
done with 1400000...
done with 1600000...
done with 1800000...
413433 tweets with emojis
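The filtering step boils down to pulling the emojis out of each tweet's text and keeping only the tweets where that list is non-empty. Here is a minimal sketch of the idea. `EMOJI_NAMES` is a tiny hand-written stand-in for a full emoji dictionary, and `extract_emojis` and `keep_emoji_tweets` are hypothetical helper names; the `emojis` and `emojis_names` keys match the fields in the tweet record shown in step 3.

```python
# Tiny stand-in for a full emoji dictionary (assumption: a real run would
# use a complete emoji-to-name mapping with 1,000+ entries).
EMOJI_NAMES = {
    "👀": ":eyes:",
    "😏": ":smirking_face:",
    "😂": ":face_with_tears_of_joy:",
    "🔥": ":fire:",
}

def extract_emojis(text):
    """Return the emojis found in `text`, in order, duplicates included.

    Works character by character, so it only handles single-code-point
    emojis; multi-code-point sequences would need extra care.
    """
    return [ch for ch in text if ch in EMOJI_NAMES]

def keep_emoji_tweets(tweets):
    """Yield annotated copies of the tweets that contain at least one emoji."""
    for tweet in tweets:
        emojis = extract_emojis(tweet["text"])
        if emojis:
            yield {**tweet,
                   "emojis": emojis,
                   "emojis_names": [EMOJI_NAMES[e] for e in emojis]}

sample = [
    {"id": "1", "text": "I'm here for Drake and Vanessa... Dranessa 👀😏"},
    {"id": "2", "text": "no emojis here"},
    {"id": "3", "text": "😂😂 🔥"},
]
with_emojis = list(keep_emoji_tweets(sample))
```

Duplicates are kept on purpose: a tweet with three 😂s should count three times when the emojis are tallied later.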
step 3: aggregate by emoji and state
{'coordinates': None,
 'emojis': ['👀', '😏'],
 'emojis_names': [':eyes:', ':smirking_face:'],
 'id': '866461716569350145',
 'place': {'attributes': {},
           'bounding_box': {'coordinates': [[[-85.605166, 30.355644],
                                             [-85.605166, 35.000771],
                                             [-80.742567, 35.000771],
                                             [-80.742567, 30.355644]]],
                            'type': 'Polygon'},
           'country': 'United States',
           'country_code': 'US',
           'full_name': 'Georgia, USA',
           'id': '7142eb97ae21e839',
           'name': 'Georgia',
           'place_type': 'admin',
           'url': 'https://api.twitter.com/1.1/geo/id/7142eb97ae21e839.json'},
 'text': "I'm here for Drake and Vanessa... Dranessa 👀😏",
 'time': 'Mon May 22 01:12:25 +0000 2017',
 'user': 'LongLiveDenzy'}
[('Texas', 55023),
('California', 48004),
('Florida', 26849),
('New York', 20858),
('Ohio', 18172),
('Georgia', 17905),
('Illinois', 13705),
('Louisiana', 13586),
('North Carolina', 11725),
('Pennsylvania', 10869)]
[(':face_with_tears_of_joy:', 133771),
(':loudly_crying_face:', 47571),
(':red_heart:', 30301),
(':smiling_face_with_heart-eyes:', 28715),
(':fire:', 18188),
(':female_sign:', 17195),
(':weary_face:', 15559),
(':skull:', 13885),
(':face_with_rolling_eyes:', 13747),
(':person_shrugging:', 12666)]
Alabama:
[(':face_with_tears_of_joy:', 2882), (':loudly_crying_face:', 590), (':female_sign:', 498)]
Alaska:
[(':face_with_tears_of_joy:', 80), (':red_heart:', 50), (':loudly_crying_face:', 33)]
Arizona:
[(':face_with_tears_of_joy:', 1985), (':loudly_crying_face:', 745), (':red_heart:', 718)]
Arkansas:
[(':face_with_tears_of_joy:', 958), (':fire:', 258), (':red_heart:', 221)]
California:
[(':face_with_tears_of_joy:', 13933), (':loudly_crying_face:', 5584), (':red_heart:', 4001)]
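Tallies like the ones above (tweets per state, emojis overall, and emojis within each state) fall out naturally from `collections.Counter`. The sketch below is one way to do it, not necessarily how the numbers here were produced. It assumes every tweet's `place['name']` is already a state name, as in the Georgia record; tweets tagged at the city level would first need to be mapped to their state. The `aggregate` function and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict

def aggregate(tweets):
    """Tally tweets per state, emojis overall, and emojis within each state."""
    state_tweets = Counter()          # state -> number of tweets
    emoji_totals = Counter()          # emoji name -> count across all states
    by_state = defaultdict(Counter)   # state -> (emoji name -> count)
    for tweet in tweets:
        state = tweet["place"]["name"]  # assumption: already a state name
        state_tweets[state] += 1
        emoji_totals.update(tweet["emojis_names"])
        by_state[state].update(tweet["emojis_names"])
    return state_tweets, emoji_totals, by_state

# Made-up records for illustration only.
sample = [
    {"place": {"name": "Georgia"},
     "emojis_names": [":eyes:", ":smirking_face:"]},
    {"place": {"name": "Georgia"},
     "emojis_names": [":face_with_tears_of_joy:", ":face_with_tears_of_joy:"]},
    {"place": {"name": "Texas"},
     "emojis_names": [":face_with_tears_of_joy:", ":fire:"]},
]
state_tweets, emoji_totals, by_state = aggregate(sample)
```

From there, `state_tweets.most_common(10)`, `emoji_totals.most_common(10)`, and `by_state[state].most_common(3)` give lists in the same shape as the three outputs above.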
step 4: add tf-idf (term frequency–inverse document frequency)
Alabama:
1. :double_exclamation_mark: tf-idf: 0.00104
2. :speaking_head: tf-idf: 0.00094
3. :cat_face_with_tears_of_joy: tf-idf: 0.00074
Alaska:
1. :snow-capped_mountain: tf-idf: 0.00971
2. :mount_fuji: tf-idf: 0.0053
3. :passenger_ship: tf-idf: 0.00486
Arizona:
1. :A_button_(blood_type): tf-idf: 0.00311
2. :deciduous_tree: tf-idf: 0.00198
3. :cactus: tf-idf: 0.00138
Arkansas:
1. :cloud: tf-idf: 0.01442
2. :double_exclamation_mark: tf-idf: 0.00209
3. :water_wave: tf-idf: 0.00207
California:
1. :heavy_minus_sign: tf-idf: 0.00147
2. :thermometer: tf-idf: 0.0009
3. :cat_face_with_tears_of_joy: tf-idf: 0.00068
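A ranking like this can be computed from the per-state emoji counts in a few lines. The sketch below uses one common tf-idf variant, which is an assumption on my part: tf is an emoji's share of all emojis in a state, and idf is the log of (number of states / number of states where the emoji appears). Other weightings exist and would give different scores. The function names and the counts are made up for illustration, so the numbers won't match the output above.

```python
import math
from collections import Counter

def tf_idf(by_state):
    """Score each (state, emoji) pair; `by_state` maps state -> Counter of emojis."""
    n_states = len(by_state)
    # Document frequency: in how many states does each emoji appear?
    doc_freq = Counter()
    for counts in by_state.values():
        doc_freq.update(counts.keys())
    scores = {}
    for state, counts in by_state.items():
        total = sum(counts.values())  # all emojis used in this state
        scores[state] = {
            emoji: (n / total) * math.log(n_states / doc_freq[emoji])
            for emoji, n in counts.items()
        }
    return scores

def top_k(scores, state, k=3):
    """Return the k highest-scoring (emoji, score) pairs for a state."""
    ranked = sorted(scores[state].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Made-up counts for illustration only (not the real data).
by_state = {
    "Alaska": Counter({":face_with_tears_of_joy:": 80, ":snow-capped_mountain:": 5}),
    "Arizona": Counter({":face_with_tears_of_joy:": 1985, ":cactus:": 12}),
    "Hawaii": Counter({":face_with_tears_of_joy:": 300, ":volcano:": 9}),
}
scores = tf_idf(by_state)
best_for_alaska = top_k(scores, "Alaska", k=1)
```

Because 😂 appears in every state in this toy data, its idf is log(1) = 0 and it scores zero everywhere, leaving the state-specific emojis on top.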