BTS x UNICEF “Love Myself” Campaign

Same message, different interpretations across K-pop fandoms

On September 24, 2018, RM, the leader of BTS, delivered a powerful speech to the UN General Assembly that became a landmark moment in K-pop history. His message encouraged people to “speak [them]selves (…) no matter who [they] are, where [they’re] from, [their] skin color, gender identity.” The mention of gender identity resonated strongly with international ARMYs and positioned him as an advocate for LGBTQ+ rights. The speech also resonated in Korea, but for different reasons. Although K-pop is produced in a conservative and heteronormative society, it challenges gender norms worldwide. This article explores two questions: (1) How did RM’s speech resonate differently among international and Korean K-pop fandoms? (2) What does this discrepancy reveal about BTS’s influence on their fans? The study draws on on-site fan interviews and social media analysis of tweets to answer them.

In this paper, I provide a comparative analysis of K-pop fandoms’ reactions to BTS’s speech at the United Nations. My theoretical framework combines the social background of Korean society with performative conceptions of gender, adapted to the audiences under study. From this, I use the concept of “localized perception of gender” to account for the differing sensibilities of Korean and non-Korean fans to RM’s speech. Finally, to advocate for an interdisciplinary approach to Korean Studies, I combine data science techniques with ethnographic fieldwork.

The results show that non-Korean BTS fans used the UN speech as a way to express their views on gender identity. They were particularly receptive to the part of the speech dealing with that issue, which led them to re-interpret other BTS-related content, such as song lyrics or TV appearances. This queering of BTS by the non-Korean K-pop fandom is distinctive in that it focuses on real events rather than fan-produced content such as fan fiction or fan videos.

The Korean fans of BTS also used the UN speech to express themselves, but with the additional factor of their place in society. With a focus on the self, Korean fans used this powerful moment to cope with the difficulties of daily life, which largely revolve around fitting into Korean society. Their reactions also added a nationalist dimension to the speech, with Korean ARMYs proud of their idols representing Korea in front of the rest of the world. Even if the non-Korean fandom appears particularly focused on gender issues while the Korean fandom seems less receptive to that part of the speech, it is essential to note that there are exceptions, such as the making of the Rainbow ARMY Scout.

In both fandoms, the common thread of self-expression and self-love appears as a continuation of BTS’s message through the Love Yourself trilogy. This study draws a simple distinction between “Korean” and “non-Korean” fandoms; to make it more insightful, it would be important to dig deeper into how cultural values such as the self vary across cultural backgrounds. Because this study focuses on BTS, it would also be worth examining other K-pop idols who carry a message of self-love, comparing them with BTS, and asking whether their fans engage in a similar process of queer interpretation.

The following notebook excerpts provide greater detail on the text analysis of the corpora used in the project. They show the code behind the results exploited in the paper, along with broader results and methods for exploring them further.

Methodology

To explore the opinions of international and Korean fans, I combine two study methods. The first is Internet ethnography, a qualitative method that consists of participant observation of the Internet through the analysis of content such as community postings or social media conversations. This largely qualitative approach is augmented by data analysis of Twitter feeds. Twitter is the most useful social media platform here since it hosts interaction between international and Korean K-pop fans. Moreover, it is not only the platform BTS use most to communicate with their fans but also the platform chosen by UNICEF to promote the “Love Myself” campaign.

Since Twitter’s API makes it impossible to retrieve tweets older than seven days, I worked around this limitation with GetOldTweets3, a tool running on Python 3. To scrape the reactions of non-Korean BTS fans, I used the keywords “BTS+UN+Speech,” restricted the results to English, and gathered 14,258 tweets. For the Korean fans, I used the queries “방탄+유엔+연설” (lit. Bangtan+uen+yŏnsŏl) and “방탄+UN+연설” (lit. Bangtan+UN+yŏnsŏl) and collected a total of 2,806 tweets. Both processes compiled tweets posted between September 2018 and November 2019. To clarify the methodology and provide more detailed results, this paper is accompanied by a companion Jupyter notebook that the reader can access through a repository on my GitHub account.

Preparation of the tweet corpora and tokenization

The tweets were compiled in two separate CSV files, one for the international and one for the Korean fandom. The first task was to extract the content of the tweets from each CSV file to create the Twitter corpora (cf. Fig 1a. & 1b. of the companion notebook). To do so, I read the CSV files with Pandas, one of the most common data analysis libraries for Python, and joined the content of each text cell into a single string (cf. Fig 1c. of the companion notebook).

Once the corpora were compiled, the next step was to tokenize and clean the data. According to Manning, Raghavan, and Schütze, “given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens.” I tokenized the international fans’ reactions corpus using the Natural Language Toolkit (NLTK), which “provides basic classes for representing data relevant to natural language processing [in Python.]”

For the corpus of Korean fans’ reactions, NLTK could not be used. One of the main challenges of this study was applying text analysis tools to corpora in two different languages. Korean is an agglutinative language, so whitespace is not a sufficient unit of tokenization. I therefore used KoNLPy, a set of tools dedicated to Korean natural language processing. KoNLPy gathers several tokenization classes built on different dictionaries. I chose Open Korean Text (Okt), “an open-source Korean tokenizer written in Scala, developed by Will Hohyon Ryu.” The main reason for choosing Okt over the other classes is that its developer worked at Twitter, so its dictionary was built on Twitter corpora. For both corpora, it was also important to filter the content of the tweets with a list of stop words. In natural language processing, stop words are “common words that appear of little value in helping select documents matching a user need.” For the English corpus, I used NLTK’s pre-set stop word list, extended with terms that distorted the data, such as the search keywords, BTS members’ names, hyperlinks, and unrelated hashtags. For the Korean corpus, I used a pre-made stop word list extended with the Korean equivalents of the same unnecessary keywords. As the text analysis progressed, I came across new stop words to add and edited the lists numerous times. For the code that produces the “clean” corpora, refer to Fig. 1d. and 1e. of the companion notebook.
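The cleaning steps described in this section (lowercasing, dropping non-alphabetic tokens, removing stop words) can be condensed into a small self-contained sketch, with an invented stop-word list standing in for NLTK's and its custom extensions:

```python
# Toy stop-word list: a few function words plus the search keywords,
# standing in for NLTK's list and the custom extensions.
stop_words = {"the", "a", "and", "to", "of", "in", "bts", "un", "speech"}

def clean_tokens(tokens, stop_words):
    # Lowercase every token, then keep only purely alphabetic words
    # that are not in the stop-word list (this drops punctuation and
    # numbers as a side effect of isalpha()).
    lowered = [w.lower() for w in tokens]
    return [w for w in lowered if w.isalpha() and w not in stop_words]

raw = ["The", "BTS", "UN", "speech", "moved", "14", ",", "000", "fans", "!"]
cleaned = clean_tokens(raw, stop_words)
print(cleaned)
```

The same three operations appear, in this order, in the corpus-cleaning cells of Fig. 1d.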

#Loading the Necessary Libraries

import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords
from nltk import FreqDist
from nltk.collocations import *

from konlpy.tag import Okt 
okt = Okt()

import numpy as np
import pandas as pd
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

import matplotlib.pyplot as plt

%matplotlib inline
Fig. 1a – Introducing the first data set, “Reactions from the International Fandom”
# Load in the dataframe
df_en = pd.read_csv("output_got_BTS+UN+Speech(EN).csv", sep=';', encoding = "ISO-8859-1")
# Looking at first 5 rows of the dataset
df_en.head()
[Output of df_en.head(): the first five rows of the data set, with columns date, username, to, replies, retweets, favorites, text, geo, mentions, hashtags, id, and permalink.]
Fig. 1b – Introducing the second data set, “Reactions from the Korean Fandom”
# Load in the dataframe
df_kr = pd.read_csv("output_got_방탄+유엔+연설.csv", encoding = "UTF-8")
# Looking at first 5 rows of the dataset
df_kr.head()
[Output of df_kr.head(): the first five rows of the data set, with the same columns as the English set: date, username, to, replies, retweets, favorites, text, geo, mentions, hashtags, id, and permalink.]
Fig. 1c – Elaborating corpora with only the content of the tweets
tweets_en = " ".join([str(tweet) for tweet in df_en.text])
print("There are {} characters in the combination of all tweets in English.".format(len(tweets_en)))
There are 2852817 characters in the combination of all tweets in English.
tweets_kr = " ".join([str(tweet) for tweet in df_kr.text])
print("There are {} characters in the combination of all tweets in Korean.".format(len(tweets_kr)))
There are 272717 characters in the combination of all tweets in Korean.
Fig. 1d – Tokenizing the corpora and cleaning them by removing stopwords, punctuation and numbers. (International Fans corpus)
stop_words_en = stopwords.words('english')
stop_words_en.extend(['pic', 'gon', 'na', 'wan', 'na', 'com', 'http', 'https', 'bts', 'un', 'speech', 'twitter', 'rm', 'RM', 'namjoon', 'kim', 'BTS_twt', 'general', 'assembly'])

tokens_en = word_tokenize(tweets_en)
#Make all words lowercase
tokens_en_lower = [w.lower() for w in tokens_en]
#Remove stopwords, punctuation, and numbers.
content_en = [w for w in tokens_en_lower if w not in stop_words_en and w.isalpha()]
#Number of words in the "clean" corpus
print("Number of words in the tweet corpus: ", len(content_en))
Number of words in the tweet corpus:  196094
Fig. 1e – Tokenizing the corpora and cleaning them by removing stopwords, punctuation and numbers. (Korean Fans corpus)
#Read the pre-made Korean stop word list (one entry per line); reading the
#file as a single string would wrongly filter any token that appears as a
#substring of it.
stop_words_kr = open("stopwords-ko.txt").read().splitlines()

#These parameters make it possible to get the base form of verbs even when they are conjugated, which is very useful.
tokens_kr = okt.morphs(tweets_kr, norm=True, stem=True)
#Remove stopwords, punctuation, and numbers.
content_kr = [w for w in tokens_kr if w not in stop_words_kr]
#Number of words in the "clean" corpus
print("Number of words in the tweet corpus: ", len(content_kr))
Number of words in the tweet corpus:  38229
4. BTS as a Tool for Expressing Gender Identity
Fig. 2 – List of most recurring terms among tweets from the international fandom
#Count the frequency of each word
from collections import Counter
#Word Frequency
counts = Counter(content_en)
#Convert counter object to data frame
word_count = pd.DataFrame.from_dict(counts, orient='index').reset_index()
word_count.rename(columns = {'index' : 'Word',0:'Frequency'},inplace = True)
word_count = word_count.sort_values('Frequency',ascending=False)

word_count['Rank'] = np.arange(1,1+len(word_count.Frequency))
word_count = word_count.reindex(columns=['Rank','Word','Frequency'])


#Number of unique words
print("Number of unique words in the tweets : ", len(word_count))

#Top twenty words
word_count.head(20)
Number of unique words in the tweets :  14110
Rank  Word     Frequency
1     love     2175
2     like     1742
3     one      1187
4     really   1147
5     year     1094
6     people   1085
7     proud    1073
8     gave     1034
9     said     1007
10    even     992
11    world    986
12    army     978
13    know     927
14    also     918
15    made     916
16    us       871
17    time     853
18    much     841
19    unicef   832
20    thank    807
#Export results in a CSV file
word_count.to_csv('wordcount_en.csv')
word_count[word_count['Word'] == "gender"]
Rank  Word    Frequency
183   gender  208
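The Counter-to-ranked-table step above can be reduced, for illustration, to `Counter.most_common` alone (toy tokens, standard library only; the notebook additionally converts the result into a Pandas DataFrame for export):

```python
from collections import Counter

tokens = ["love", "myself", "love", "speech", "love", "myself"]
counts = Counter(tokens)

# Rank words by descending frequency, mirroring the word-count table:
# most_common() already returns (word, frequency) pairs sorted by count.
ranking = [(rank, word, freq)
           for rank, (word, freq) in enumerate(counts.most_common(), start=1)]
print(ranking)
```

Looking a single word up by rank, as done for “gender” above, is then just a matter of filtering this list.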
Fig. 2a – Main collocations of the reactions to the speech (bigrams)
from nltk.collocations import BigramCollocationFinder 
from nltk.metrics import BigramAssocMeasures

bigram_collocation = BigramCollocationFinder.from_words(content_en) 
bigram_collocation.nbest(BigramAssocMeasures.likelihood_ratio, 15)
[('cultural', 'merit'),
 ('various', 'countries'),
 ('gender', 'identity'),
 ('jimmy', 'fallon'),
 ('social', 'artist'),
 ('liked', 'youtube'),
 ('year', 'since'),
 ('hair', 'products'),
 ('curriculum', 'various'),
 ('matter', 'skin'),
 ('youtube', 'video'),
 ('grammy', 'museum'),
 ('school', 'curriculum'),
 ('favorite', 'social'),
 ('last', 'year')]
from nltk.collocations import TrigramCollocationFinder 
from nltk.metrics import TrigramAssocMeasures 

trigram_collocation = TrigramCollocationFinder.from_words(content_en)  
trigram_collocation.nbest(TrigramAssocMeasures.likelihood_ratio, 15)
[('order', 'cultural', 'merit'),
 ('curriculum', 'various', 'countries'),
 ('cultural', 'merit', 'award'),
 ('liked', 'youtube', 'video'),
 ('various', 'countries', 'null'),
 ('cultural', 'merit', 'medal'),
 ('material', 'various', 'countries'),
 ('gender', 'identity', 'speak'),
 ('favorite', 'social', 'artist'),
 ('color', 'gender', 'identity'),
 ('cultural', 'merit', 'medals'),
 ('got', 'cultural', 'merit'),
 ('received', 'cultural', 'merit'),
 ('receiving', 'cultural', 'merit'),
 ('receive', 'cultural', 'merit')]
#"cultural merit" dominated the collocations (news about BTS receiving the
#Order of Cultural Merit), so both terms are added to the stop words and the
#corpus is filtered again.
stop_words_en.extend(['cultural', 'merit'])
content_en = [w for w in tokens_en_lower if w not in stop_words_en and w.isalpha()]
Fig. 2b – Main collocations of the reactions to the speech (trigrams)
from nltk.collocations import TrigramCollocationFinder 
from nltk.metrics import TrigramAssocMeasures 

trigram_collocation = TrigramCollocationFinder.from_words(content_en)  
trigram_collocation.nbest(TrigramAssocMeasures.likelihood_ratio, 15)
[('curriculum', 'various', 'countries'),
 ('liked', 'youtube', 'video'),
 ('various', 'countries', 'null'),
 ('material', 'various', 'countries'),
 ('gender', 'identity', 'speak'),
 ('favorite', 'social', 'artist'),
 ('color', 'gender', 'identity'),
 ('various', 'countries', 'via'),
 ('last', 'year', 'since'),
 ('colour', 'gender', 'identity'),
 ('school', 'curriculum', 'various'),
 ('skin', 'color', 'gender'),
 ('part', 'school', 'curriculum'),
 ('various', 'countries', 'soompi'),
 ('race', 'gender', 'identity')]
Fig. 3 – Collocations for a specific word, “gender”
#Find bigrams containing the keyword "gender". A finder over the English
#corpus is defined first; the exact window_size is an assumption, but pairs
#such as ('skin', 'gender') imply one wider than 2, i.e. collocations that
#span intervening words.
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(content_en, window_size=5)
kw_filter = lambda *w: 'gender' not in w
finder.apply_ngram_filter(kw_filter)
finder.nbest(bigram_measures.mi_like, 15)
[('gender', 'identity'),
 ('skin', 'gender'),
 ('matter', 'gender'),
 ('color', 'gender'),
 ('colour', 'gender'),
 ('gender', 'speak'),
 ('transcends', 'gender'),
 ('race', 'gender'),
 ('conviction', 'gender'),
 ('gender', 'neutral'),
 ('motivational', 'gender'),
 ('gender', 'find'),
 ('gender', 'name'),
 ('gender', 'race'),
 ('age', 'gender')]
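The keyword filter above works by elimination: `apply_ngram_filter` drops every n-gram for which the supplied function returns True, so keeping only bigrams that contain “gender” means filtering out those where it is absent. Its effect can be reproduced on a handful of invented bigrams:

```python
# Invented bigrams standing in for the finder's candidates.
bigrams = [("gender", "identity"), ("skin", "color"),
           ("race", "gender"), ("last", "year")]

# NLTK's apply_ngram_filter removes n-grams where the filter is True;
# what survives is exactly the complement.
kw_filter = lambda *w: "gender" not in w
kept = [bg for bg in bigrams if not kw_filter(*bg)]
print(kept)
```

The double negation (“drop those that do not contain the keyword”) is what makes the lambda read backwards at first glance.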
5. Between Self and National Pride
Fig. 8a & 8b – Word Count for Korean Corpus
#Count the frequency of each word
from collections import Counter
#Word Frequency
counts_kr = Counter(content_kr)
#Convert counter object to data frame
word_count_kr = pd.DataFrame.from_dict(counts_kr, orient='index').reset_index()
word_count_kr.rename(columns = {'index' : 'Word',0:'Frequency'},inplace = True)
word_count_kr = word_count_kr.sort_values('Frequency',ascending=False)

word_count_kr['Rank'] = np.arange(1,1+len(word_count_kr.Frequency))
word_count_kr = word_count_kr.reindex(columns=['Rank','Word','Frequency'])


#Number of unique words
print("Number of unique words in the tweets : ", len(word_count_kr))

#Top twenty-five words
word_count_kr.head(25)
Number of unique words in the tweets :  7268
Rank  Word                      Frequency
1     (not rendered)            325
2     너무 (so much)            311
3     ㅋㅋㅋ (laughter)          293
4     사랑 (love)               283
5     아이돌 (idol)             279
6     아미 (ARMY)               268
7     가다 (to go)              231
8     그렇다 (to be so)         230
9     나오다 (to come out)      223
10    보라 (purple)             185
11    (not rendered)            184
12    오늘 (today)              182
13    보고 (watching)           182
14    ㅠㅠㅠ (crying)            178
15    모르다 (to not know)      161
16    영어 (English)            153
17    많다 (to be many)         151
18    받다 (to receive)         151
19    오다 (to come)            146
20    ㅠㅠ (crying)              144
21    그래미 (Grammy)           139
22    자랑스럽다 (to be proud)  136
23    지금 (now)                133
24    가수 (singer)              129
25    한국 (Korea)               128
#Export results in a CSV file
word_count_kr.to_csv('wordcount_kr.csv')
word_count_kr[word_count_kr['Word'] == "성"]
Rank  Word  Frequency
371   성    20
word_count_kr[word_count_kr['Word'] == "성별"]
Rank  Word  Frequency
1404  성별  4
word_count_kr[word_count_kr['Word'] == "성정체성"]
Rank  Word      Frequency
2142  성정체성  3
word_count_kr[word_count_kr['Word'] == "젠더"]
Rank  Word  Frequency
5929  젠더  1
Fig. 9a & 9b – Comparison of Word Clouds
# Generate a word cloud image
wordcloud = WordCloud(collocations=False, stopwords=stop_words_en, background_color="white", height=3000, width=3000).generate(tweets_en)

# Display the generated image:
# the matplotlib way:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
wordcloud.to_file("wordcloud_img/speech_EN/fig9.png")
<wordcloud.wordcloud.WordCloud at 0x7feb68c11130>
# Create the stop word list for the Korean word cloud (a new name, so the
# nltk.corpus stopwords import is not shadowed):
stopwords_kr_wc = set(STOPWORDS)
stopwords_kr_wc.update(["빌보드1위", "진짜", "RM", "랩몬스터", "hankooki", "BTS_twtpic", "pic", "유엔연설", "https", "BTS_twt", "BTS", "twt", "twitter", "방탄의", "방탄이", "유엔에서", "방탄소년단", "방탄", "유엔", "연설", "지민", "뷔", "정국", "남준", "슈가", "idol", "hankooki.com", "sports", "@BTS_twt", "빌보드", "제이홉", "fancake", "연설을", "연설이", "김남준", "방탄은", "star_single", "plugin"])

# Generate a word cloud image
font_path = "/Library/Fonts/AppleGothic.ttf"
wordcloud_kr = WordCloud(font_path = font_path, collocations=False, stopwords=stopwords_kr_wc, background_color="white", height=3000, width=3000).generate(tweets_kr)

# Display the generated image (the Korean cloud, not the English one):
# the matplotlib way:
plt.imshow(wordcloud_kr, interpolation='bilinear')
plt.axis("off")
plt.show()
wordcloud_kr.to_file("wordcloud_img/speech_KR/fig10.png")
<wordcloud.wordcloud.WordCloud at 0x7f8ba161fd90>
6. Difference of Interpretation and Skepticism as a Weapon of Protection
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(content_kr, window_size=10)
kw_filter = lambda *w: '성정체성' not in w
finder.apply_ngram_filter(kw_filter)
finder.nbest(bigram_measures.likelihood_ratio, 15)
[('피부색', '성정체성'),
 ('성정체성', '따윈'),
 ('성정체성', '부숴'),
 ('성정체성', '불쾌하다'),
 ('성정체성', '슨건데'),
 ('성정체성', '어떻든'),
 ('도넛', '성정체성'),
 ('상당하다', '성정체성'),
 ('성찰', '성정체성'),
 ('챔피언', '성정체성'),
 ('성정체성', '스픽'),
 ('성정체성', '요약'),
 ('틀리다', '성정체성'),
 ('성정체성', '비꼬다'),
 ('성차별', '성정체성')]
bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(content_kr, window_size=15)
kw_filter = lambda *w: '성정체성' not in w
finder.apply_ngram_filter(kw_filter)
finder.nbest(bigram_measures.likelihood_ratio, 15)
[('성정체성', '도하'),
 ('피부색', '성정체성'),
 ('pic.twitter.com/0bH8e873El', '성정체성'),
 ('값어치', '성정체성'),
 ('깨끗하다', '성정체성'),
 ('논란중', '성정체성'),
 ('도넛', '성정체성'),
 ('상당하다', '성정체성'),
 ('성정체성', 'ㅡㅡ'),
 ('성정체성', '따윈'),
 ('성정체성', '부숴'),
 ('성정체성', '불쾌하다'),
 ('성정체성', '슨건데'),
 ('성정체성', '어떻든'),
 ('성정체성', '투도')]
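The window_size parameter used above controls how far apart two tokens may stand and still be counted as co-occurring; with window_size=10, 성정체성 (“gender identity”) is paired with any token up to nine positions away. A simplified, standard-library sketch of this windowed pairing (NLTK's finder also tracks frequencies for the likelihood-ratio scoring, which is omitted here):

```python
def windowed_bigrams(tokens, window_size=2):
    # Pair each token with every later token inside the window,
    # mirroring BigramCollocationFinder's window_size parameter
    # (window_size=2 means adjacent tokens only).
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window_size]:
            pairs.append((left, right))
    return pairs

tokens = ["skin", "color", "gender", "identity"]
adjacent = windowed_bigrams(tokens, window_size=2)
windowed = windowed_bigrams(tokens, window_size=3)
print(adjacent)
print(windowed)
```

Widening the window is what lets pairs such as ('피부색', '성정체성') surface even when other words stand between them in the tweet.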
Bonus – Main collocations of the reactions to the speech (Korean corpus)
from nltk.collocations import BigramCollocationFinder 
from nltk.metrics import BigramAssocMeasures

bigram_collocation_kr = BigramCollocationFinder.from_words(content_kr)
bigram_collocation_kr.nbest(BigramAssocMeasures.likelihood_ratio, 15)
[('칼', '군무'),
 ('브링더', '소울'),
 ('장기', '휴가'),
 ('문화', '훈장'),
 ('참여', '부탁드리다'),
 ('아포', '방포'),
 ('부탁드리다', '#BTSpic'),
 ('명동', '전광판'),
 ('군무', '라이브'),
 ('작다', '화양연화'),
 ('실력', '파'),
 ('전광판', '무료'),
 ('페이크', '럽'),
 ('귀', '움'),
 ('휴가', '브링더')]
from nltk.collocations import TrigramCollocationFinder 
from nltk.metrics import TrigramAssocMeasures 

trigram_collocation_kr = TrigramCollocationFinder.from_words(content_kr)  
trigram_collocation_kr.nbest(TrigramAssocMeasures.likelihood_ratio, 15)
[('칼', '군무', '라이브'),
 ('휴가', '브링더', '소울'),
 ('잘생기다', '칼', '군무'),
 ('참여', '부탁드리다', '#BTSpic'),
 ('장기', '휴가', '브링더'),
 ('페르소나', '장기', '휴가'),
 ('브링더', '소울', '아미'),
 ('명동', '전광판', '무료'),
 ('윙즈', '칼', '군무'),
 ('작다', '화양연화', '달방'),
 ('스타', '지금', '투표'),
 ('섹시하다', '칼', '군무'),
 ('칼', '군무', '뮤비'),
 ('군무', '라이브', '실력'),
 ('전광판', '무료', '광고')]