This exercise lets you practice text mining. It extends the example you know from the text: instead of only eight tweets, you will analyze the complete corpus of Donald Trump's tweets from 2017 and use word clouds to visualize how each preprocessing step changes the data.
Data and Libraries
Your task in this exercise is to analyze textual data. You will perform various processing steps and see how the results of a simple word cloud visualization evolve. You can find everything you need in the nltk and wordcloud libraries (plus some basic tools, e.g., for regular expressions).
For this exercise, we provide Donald Trump's tweets from 2017. You can download the data here; each line contains a single tweet.
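Since each line of the file holds one tweet, loading the data can be as simple as reading the file line by line. The following is a minimal sketch, not the reference solution; the function name `load_tweets` and the UTF-8 encoding are assumptions, so adjust them to the file you actually downloaded.

```python
def load_tweets(path, encoding="utf-8"):
    """Read a text file with one tweet per line.

    `path` is wherever you saved the downloaded data (hypothetical);
    the encoding is an assumption and may need to be changed.
    """
    with open(path, encoding=encoding) as handle:
        # Strip the trailing newline and skip lines that are empty.
        return [line.strip() for line in handle if line.strip()]
```

A call such as `tweets = load_tweets("trump_tweets_2017.txt")` (the file name is a placeholder) then gives you a list of strings, one per tweet.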
Word clouds without preprocessing
Load the data and create a word cloud without any further processing of the text. Does this already work? What problems do you see?
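One possible starting point is sketched below; it is an illustration under stated assumptions, not the intended solution. The helper names `raw_word_frequencies` and `plot_raw_word_cloud` are made up for this example, and matplotlib is assumed to be installed for displaying the image. The sketch counts whitespace-separated tokens exactly as they appear and passes the counts to `WordCloud.generate_from_frequencies`.

```python
from collections import Counter


def raw_word_frequencies(tweets):
    """Count whitespace-separated tokens without any cleaning."""
    counts = Counter()
    for tweet in tweets:
        # No lowercasing, no punctuation or URL removal: tokens stay raw.
        counts.update(tweet.split())
    return counts


def plot_raw_word_cloud(tweets, max_words=100):
    """Draw a word cloud from the raw token counts (hypothetical helper)."""
    # Imported here so the counting function works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    frequencies = raw_word_frequencies(tweets)
    cloud = WordCloud(
        width=800, height=400, max_words=max_words, background_color="white"
    ).generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
```

The counts are computed by hand because `WordCloud.generate` applies its own tokenization and, by default, a built-in stop word list, so its output would not show the text entirely unprocessed. Calling `plot_raw_word_cloud(tweets)` on the loaded tweets lets you inspect which tokens dominate the raw data.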