5 Ways to Squeeze Information out of Tweets, Comments and Reviews
Through posted comments, tweets and reviews your
customers can provide you with a lot of useful feedback. It has
become essential to rapidly glean information from textual data
in order to respond to emerging issues and trending topics.
In this short post I’ll summarise different approaches to
analysing text. The follow-up article “Words Worth –
Extracting Meaning and Sentiment from Textual Data” shows
how to apply these techniques using FastStats.
1. Selection
Selection uses a query to count or extract all the text that
includes a certain word, a set of words, or a wildcard pattern.
For example an airline might wish to count, prioritise and
respond to any tweets that mention delay or delayed.
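As a rough illustration of the airline query, the sketch below selects texts containing any word that matches a wildcard pattern. It is plain Python rather than FastStats; the function name `select` and the sample tweets are invented for the example.

```python
import fnmatch
import re


def select(texts, pattern):
    """Return the texts containing at least one word that matches the wildcard pattern."""
    matched = []
    for text in texts:
        # Split the text into lower-case words before matching.
        words = re.findall(r"[a-z0-9']+", text.lower())
        if any(fnmatch.fnmatchcase(word, pattern) for word in words):
            matched.append(text)
    return matched


# Made-up sample tweets, for illustration only.
tweets = [
    "Two hour delay at the gate and no announcements",
    "Flight delayed again, third time this month",
    "Lovely crew and a smooth landing",
]

delayed = select(tweets, "delay*")
print(len(delayed))  # 2
```

The pattern `delay*` catches both "delay" and "delayed", so one query covers both spellings.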
My text is from movie reviews but you get the idea. Of
course this is most useful when you know the words that you are
looking for.
2. Word Frequencies
When there is a large volume of text items to consider, the first
task is to find which words occur most frequently in order to get
an idea of the important topics. This involves “shredding” the
text into words, calculating the word frequencies and ranking
them. The results can be displayed in a table or in a word cloud,
which uses font size to reflect the relative frequencies.
This can work well when comparing reviews of two products or
trying to distil the essence of a large number of reviews.
The two word clouds above are for horror films and
comedies. In practice you need to be able to exclude very common
words and also those that are generic in your topic area, e.g.
“hotel” for a holiday company. To analyse hashtags you need to
filter the words to include only those prefixed by #.
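The shred, count and rank steps described in this section can be sketched in a few lines of Python. This is a minimal illustration, not how FastStats does it; the stop list `COMMON_WORDS`, the function `word_frequencies` and the sample reviews are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative list of very common words; a real list would be far longer.
COMMON_WORDS = {"a", "and", "the", "was", "but", "of", "it", "is", "so"}


def word_frequencies(texts, exclude=frozenset(), hashtags_only=False):
    """Count words across texts, skipping common words and topic-generic ones."""
    counts = Counter()
    for text in texts:
        # "Shred" the text into lower-case words, keeping a leading # if present.
        for word in re.findall(r"#?[a-z0-9']+", text.lower()):
            if hashtags_only and not word.startswith("#"):
                continue
            if word in COMMON_WORDS or word in exclude:
                continue
            counts[word] += 1
    return counts


# Made-up sample reviews, for illustration only.
reviews = [
    "The film was scary and the ending was scary too",
    "A scary film but the plot was thin #horror",
    "Thin plot, great ending #horror #mustsee",
]

# "film" is generic for movie reviews, so it is excluded from the ranking.
frequencies = word_frequencies(reviews, exclude={"film"})
print(frequencies.most_common(3))

# Hashtag analysis: keep only the words prefixed by #.
print(word_frequencies(reviews, hashtags_only=True))
```

The ranked counts are what a word cloud would map to font size.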
3. Scoring Words
When we read a review we can use our knowledge of language, the
subject area and the audience to quickly judge whether it
expresses a good, bad or neutral opinion. Faced with large
quantities of reviews it would be useful to automate this
process. One approach is to train a model to recognise the
keywords associated with known good or bad reviews then use this
model to “score” new reviews.
The keywords and their coefficients define the model. For example,
using our knowledge of language and some gut feeling, we could
assign the following values to keywords:
great = 8
unique = 8
funny = 6
laugh = 6
overrated = -5
bad = -8
boring = -10
So with this simple model a review that reads:
"great film with a uniquely funny twist"
scores 8 + 8 + 6 = 22 (counting "uniquely" as "unique")
"overrated - a few laughs but so boring"
scores -5 + 6 - 10 = -9 (counting "laughs" as "laugh")
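The arithmetic for the two example reviews can be reproduced with a short Python sketch. It assumes a crude prefix match so that "uniquely" counts as "unique" and "laughs" as "laugh"; that matching rule is an assumption for illustration (it would also wrongly match "badge" to "bad") and says nothing about how FastStats matches words.

```python
import re

# Keyword values taken from the list above.
KEYWORD_SCORES = {
    "great": 8,
    "unique": 8,
    "funny": 6,
    "laugh": 6,
    "overrated": -5,
    "bad": -8,
    "boring": -10,
}


def score_review(text, scores=KEYWORD_SCORES):
    """Sum the values of the keywords found in a review."""
    total = 0
    for word in re.findall(r"[a-z']+", text.lower()):
        for keyword, value in scores.items():
            # Assumption: a word scores if it starts with a keyword.
            if word.startswith(keyword):
                total += value
                break
    return total


print(score_review("great film with a uniquely funny twist"))  # 22
print(score_review("overrated - a few laughs but so boring"))  # -9
```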
The model is improved by basing the coefficients on the odds of
the keyword appearing in good vs. bad reviews. Using this
technique FastStats can calculate the coefficient estimates from
a training set of reviews.
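One generic way to base a coefficient on odds is the log of the odds ratio: the odds of the keyword appearing in a good review divided by the odds of it appearing in a bad one. The sketch below is an assumption about what such a calculation could look like, not the formula FastStats uses. The function name, the add-one smoothing and the tiny training set are all invented for illustration.

```python
import math
import re


def log_odds_coefficients(good_reviews, bad_reviews, keywords):
    """Estimate one coefficient per keyword from labelled good and bad reviews."""

    def share(reviews, keyword):
        # Proportion of reviews containing the keyword, with add-one smoothing
        # so the proportion is never exactly 0 or 1.
        hits = sum(1 for r in reviews if keyword in re.findall(r"[a-z']+", r.lower()))
        return (hits + 1) / (len(reviews) + 2)

    coefficients = {}
    for keyword in keywords:
        p_good = share(good_reviews, keyword)
        p_bad = share(bad_reviews, keyword)
        odds_good = p_good / (1 - p_good)
        odds_bad = p_bad / (1 - p_bad)
        # Positive when the keyword favours good reviews, negative otherwise.
        coefficients[keyword] = math.log(odds_good / odds_bad)
    return coefficients


# Made-up training set, far too small for real use.
good = ["great fun", "great cast", "a bit boring"]
bad = ["boring plot", "boring and bad", "great trailer, boring film"]

coefficients = log_odds_coefficients(good, bad, ["great", "boring"])
print(coefficients)
```

With this training set "great" gets a positive coefficient and "boring" a negative one, matching the direction of the gut-feeling values.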
4. Modelling Language
The problem with scoring individual words is that we are missing
all the subtlety of language. Scoring “bad” as -8,
for example, might make sense overall but there is a big
difference between “exceptionally bad”, “plain bad” and “really
not bad at all”.
The model can therefore be improved by recognising “qualifier”
words such as “very”, “rather”, and “not” which amplify, reduce
or even negate the scoring impact of the keyword.
The FastStats text model allows for a pre-scored table of qualifier
words. Of course we can’t hope to model all the complexity of
language and will certainly be defeated by sarcasm. However,
within a limited subject area where simple language is the norm,
it is possible to achieve a reasonable level of automation.
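To make the qualifier idea concrete, here is a minimal Python sketch in which each qualifier multiplies the score of the keyword that directly follows it. The multiplier values and the function name are hypothetical, chosen only so that the three "bad" phrases come out in a sensible order. They are not taken from the FastStats qualifier table.

```python
import re

KEYWORD_SCORES = {"bad": -8, "great": 8}

# Hypothetical multipliers: above 1 amplifies, below 1 reduces, negative negates.
QUALIFIERS = {
    "exceptionally": 2.0,
    "very": 1.5,
    "really": 1.5,
    "plain": 1.0,
    "rather": 0.5,
    "not": -0.5,
}


def score_with_qualifiers(text):
    """Score keywords, scaling each by the qualifiers immediately before it."""
    total = 0.0
    multiplier = 1.0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in QUALIFIERS:
            # Qualifiers stack, so "really not" combines both effects.
            multiplier *= QUALIFIERS[word]
        elif word in KEYWORD_SCORES:
            total += multiplier * KEYWORD_SCORES[word]
            multiplier = 1.0
        else:
            # Any other word breaks the chain of qualifiers.
            multiplier = 1.0
    return total


print(score_with_qualifiers("exceptionally bad"))      # -16.0
print(score_with_qualifiers("plain bad"))              # -8.0
print(score_with_qualifiers("really not bad at all"))  # 6.0
```

Because qualifiers only count when they sit directly before the keyword, a phrase such as "not a bad film" still scores -8 here, which is one of the ways a simple model falls short of real language.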
5. Subject Sentiment
A similar word scoring strategy can be used in the scenario where
you are interested in the reviewer’s sentiment around a
particular subject. For example if your company produces bicycles
and your latest model has changed from an aluminium to a steel
frame you might be sensitive to whether the adjective qualifier
words near “frame” are expressing positive or negative sentiment.
If you detect a lot of negative opinion then you might try to
counter that with your own expert reviews and news releases.
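The bicycle frame example can be sketched by scoring only the words that fall within a few positions of the subject. The window size, the word values and the sample reviews below are assumptions made for illustration, not FastStats behaviour.

```python
import re

# Hypothetical sentiment values for words that might describe a bicycle.
SENTIMENT_SCORES = {
    "solid": 4,
    "comfortable": 5,
    "great": 8,
    "heavy": -4,
    "sluggish": -6,
    "bad": -8,
}


def subject_sentiment(text, subject, window=3):
    """Sum the sentiment of words within `window` positions of the subject."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, word in enumerate(words):
        if word == subject:
            # Words just before and just after this mention of the subject.
            nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            total += sum(SENTIMENT_SCORES.get(w, 0) for w in nearby)
    return total


# Made-up sample reviews, for illustration only.
reviews = [
    "Great brakes but the new steel frame feels heavy and sluggish",
    "The frame is solid and comfortable on long rides",
    "Great saddle, shame about the price",
]

frame_scores = [subject_sentiment(review, "frame") for review in reviews]
print(frame_scores)  # [-4, 4, 0]
```

In the first review "great" describes the brakes and sits outside the window, so it does not offset the negative words near "frame". The third review never mentions the frame and scores 0.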
Extracting the most frequently used words in reviews can give you
actionable insight into customer trends and sentiment.
Through automated word scoring you can judge the sentiment behind
customer reviews and respond accordingly.
Word scoring around subject sentiment enables you to judge how
well a new product or service has been received in the market.
Learn more about how you can gain a business advantage by
understanding what your customers are saying by downloading our
free eGuide ‘Words Worth: Extracting Meaning and Sentiment
from Textual Data’ now.