Collecting data is always an important step for data miners, and data that comes from Twitter is especially interesting for text miners.
We will introduce what the R package twitteR offers for collecting data from Twitter.
First of all, we need to open a secure connection with Twitter. To do so, please refer to the page R OAuth for TwitteR.
# Let's start by loading the library and the certification
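The code for this step is not shown here, so what follows is only a sketch. It assumes a recent twitteR release, where `setup_twitter_oauth()` takes the four keys of your Twitter application; older releases instead registered a saved ROAuth credential object with `registerTwitterOAuth()`. The environment variable names and the helper `connect_twitter()` are made up for this example.

```r
library(twitteR)

# Hypothetical helper: reads the four keys from environment variables
# (the variable names are an assumption, pick your own) and opens the
# authenticated session that later twitteR calls rely on.
connect_twitter <- function() {
  setup_twitter_oauth(
    consumer_key    = Sys.getenv("TWITTER_CONSUMER_KEY"),
    consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET"),
    access_token    = Sys.getenv("TWITTER_ACCESS_TOKEN"),
    access_secret   = Sys.getenv("TWITTER_ACCESS_SECRET")
  )
}
```

Calling `connect_twitter()` once at the start of a session should be enough for the steps that follow.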
# Say we are interested in tweets about Continental Airlines (merged into United Airlines), which uses the Twitter handle @United. We will collect 1,000 tweets.
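A minimal sketch of this collection step, assuming the session is already authenticated and that your API access still allows searching. `searchTwitter()` is the twitteR search function; the wrapper name `fetch_tweets()` is hypothetical and only keeps the network call from firing when the script is sourced.

```r
library(twitteR)

# Hypothetical wrapper around searchTwitter(); needs an authenticated session.
# Returns a list of status objects, possibly fewer than n when the search
# API has fewer matching tweets available.
fetch_tweets <- function(query, n = 1000) {
  searchTwitter(query, n = n)
}

# Example call (requires network access and valid credentials):
# un.tweets <- fetch_tweets("@United", n = 1000)
```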
# Let's answer a few basic questions
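The kind of questions meant here could look like the sketch below. To keep it runnable without a Twitter connection, it uses three invented stand-in tweets that only mimic the `getScreenName()` and `getText()` accessors of twitteR status objects; with real data, `un.tweets` would be the list returned by `searchTwitter()`.

```r
# Stand-in tweets (invented for illustration, not real Twitter data).
# Each one is a plain list exposing the two accessors used below.
make_tweet <- function(user, text) {
  list(getScreenName = function() user, getText = function() text)
}
un.tweets <- list(
  make_tweet("traveler_a", "Flight delayed again, thanks @United"),
  make_tweet("traveler_b", "Great crew on my @United flight to Denver"),
  make_tweet("traveler_c", "Lost my bags and spent 2 hours on hold with @United")
)

length(un.tweets)         # how many tweets did we get?
class(un.tweets)          # what kind of object holds them?
tweet <- un.tweets[[1]]   # take the first tweet
tweet$getScreenName()     # who wrote it?
tweet$getText()           # what does it say?
```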
# To go further in our example, we will load a new package (plyr), which contains tools for splitting, applying and combining data
# Get all the text from our Twitter data set in data frame format
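One way to do this, sketched with plyr's `laply()`, which applies a function to every element of a list and returns the results as a vector. The tweets are invented stand-ins that mimic the `getText()` accessor of twitteR status objects, and the name `un.df` is an assumption made for this example.

```r
library(plyr)

# Stand-in tweets (invented for illustration, not real Twitter data)
make_tweet <- function(text) list(getText = function() text)
un.tweets <- list(
  make_tweet("Flight delayed again, thanks @United"),
  make_tweet("Great crew on my @United flight to Denver"),
  make_tweet("Lost my bags and spent 2 hours on hold with @United")
)

# Pull the text out of every tweet: one character string per tweet
un.text <- laply(un.tweets, function(t) t$getText())

# Store the text as a one-column data frame
un.df <- data.frame(text = un.text, stringsAsFactors = FALSE)
```

With real status objects, twitteR's own `twListToDF()` is another option when you want every field of the tweets and not only the text.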
# Now we only have text entries in our data set un.text.
# Another interesting way to work with Twitter data is to start from the Trending Topics.
This function will return the top 30 trending topics for each day of the past week, starting with yesterday.
# After that, extract data about one specific trend with the function searchTwitter
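A rough sketch of these two steps, under the assumption of a recent twitteR release. There, `getTrends()` takes a location id (WOEID) and returns a data frame of the trends current for that location, so the per-day view described above may not be available. The helper name `tweets_for_top_trend()` is hypothetical.

```r
library(twitteR)

# Hypothetical helper; needs an authenticated session and network access.
# woeid = 1 is the "worldwide" location id.
tweets_for_top_trend <- function(woeid = 1, n = 100) {
  trends <- getTrends(woeid)       # data frame with one row per trend
  top.trend <- trends$name[1]      # name of the first listed trend
  searchTwitter(top.trend, n = n)  # tweets mentioning that trend
}

# Example call:
# trend.tweets <- tweets_for_top_trend()
```

`availableTrendLocations()` lists the locations for which trends can be requested, in case worldwide trends are too broad.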
Once we have a data set in R data frame format, it is time to mine the text, for instance with the functions included in the R package tm. Let's get started.
# Let's apply some transformations to our Twitter data set
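The transformations could look like the sketch below. Which ones the article applied is not shown, so this is just a common cleaning sequence from tm, run on three invented tweets standing in for `un.text`. Note that plain functions such as `tolower` need to be wrapped in `content_transformer()` in current tm versions.

```r
library(tm)

# Stand-in tweets (invented for illustration, not real Twitter data)
un.text <- c(
  "Flight delayed again, thanks @United",
  "Great crew on my @United flight to Denver",
  "Lost my bags and spent 2 hours on hold with @United"
)

# Build a corpus with one document per tweet
un.corpus <- VCorpus(VectorSource(un.text))

un.corpus <- tm_map(un.corpus, content_transformer(tolower))       # lower case
un.corpus <- tm_map(un.corpus, removePunctuation)                  # drop , @ ! ...
un.corpus <- tm_map(un.corpus, removeNumbers)                      # drop digits
un.corpus <- tm_map(un.corpus, removeWords, stopwords("english"))  # drop "my", "on", ...
un.corpus <- tm_map(un.corpus, stripWhitespace)                    # collapse blanks

# Cleaned text, one string per tweet
cleaned <- unname(sapply(un.corpus, content))
```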
# Some commands to view corpora data
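A few tm commands that are handy for looking inside a corpus, shown on a tiny corpus built from invented tweets so the snippet runs on its own. The selection is an assumption, since the article's original commands are not shown.

```r
library(tm)

# Stand-in tweets (invented for illustration, not real Twitter data)
un.text <- c(
  "Flight delayed again, thanks @United",
  "Great crew on my @United flight to Denver",
  "Lost my bags and spent 2 hours on hold with @United"
)
un.corpus <- VCorpus(VectorSource(un.text))

print(un.corpus)               # short summary of the corpus
length(un.corpus)              # number of documents
inspect(un.corpus[1:2])        # overview of the first two documents
as.character(un.corpus[[1]])   # full text of the first document
```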
# Functions with basic statistics
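Basic statistics in tm usually start from a term-document matrix. The sketch below builds one from three invented tweets and then asks for the frequent terms; the threshold of 2 is arbitrary and would be much higher on 1,000 real tweets. By default tm ignores words shorter than three characters when it builds the matrix.

```r
library(tm)

# Stand-in tweets (invented for illustration, not real Twitter data)
un.text <- c(
  "Flight delayed again, thanks @United",
  "Great crew on my @United flight to Denver",
  "Lost my bags and spent 2 hours on hold with @United"
)
un.corpus <- VCorpus(VectorSource(un.text))
un.corpus <- tm_map(un.corpus, content_transformer(tolower))
un.corpus <- tm_map(un.corpus, removePunctuation)

# Rows are terms, columns are tweets
un.tdm <- TermDocumentMatrix(un.corpus)

# Terms that occur at least twice across all tweets
frequent <- findFreqTerms(un.tdm, lowfreq = 2)

# Total count per term, most used words first
word.freq <- sort(rowSums(as.matrix(un.tdm)), decreasing = TRUE)
head(word.freq, 10)
```

From the same matrix, `findAssocs()` can list the terms that tend to appear together with a given word.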
At this point we have extracted the most frequently used words in tweets about Continental Airlines. Cool (~_~;), isn't it?
You can post any comment about this article at cRomoData.com