Precision and Recall in Social Listening
For the last 4 years, we have been talking about the importance of sentiment accuracy in social listening. When people asked: “What is sentiment accuracy?” we responded along these lines:
• 80% sentiment accuracy means: if you are given 100 posts from the web about your brand that are annotated with positive, negative or neutral sentiment, you will agree with 80 of them and disagree with 20.
• 80% sentiment accuracy means: if you are given 100 positive posts from the web about your brand, only 80 will be positive; the rest will be negative, neutral or irrelevant.
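The two readings above measure different things. A minimal sketch, using made-up labels, shows the difference: the first is overall agreement across all labels, the second looks only at the posts labelled positive.

```python
# Two readings of "80% sentiment accuracy", on a toy sample of posts.
# All labels below are illustrative, not real data.

# Reading 1: overall agreement -- what fraction of the algorithm's
# labels does a human reviewer agree with?
algorithm = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "neu"]
human     = ["pos", "neg", "neu", "neg", "neg", "pos", "pos", "neg", "pos", "neu"]

agreement = sum(a == h for a, h in zip(algorithm, human)) / len(human)
print(f"overall agreement: {agreement:.0%}")  # 8 of 10 labels match -> 80%

# Reading 2: of the posts the algorithm labelled positive, how many
# are actually positive according to the human?
labelled_pos = [h for a, h in zip(algorithm, human) if a == "pos"]
share_truly_pos = labelled_pos.count("pos") / len(labelled_pos)
print(f"truly positive among 'positive' posts: {share_truly_pos:.0%}")  # 3 of 4 -> 75%
```

As we will see below, the second reading is closer to what the literature calls precision.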
We then went on to explain that 100% sentiment accuracy is not attainable because even humans do not agree among themselves. In 10%-30% of cases, there may be a lack of consensus on whether a post is positive, negative or neutral. If we can accept that ambiguity will always exist, due to sarcasm and other complex forms of expression, then how do we expect a machine learning algorithm to agree with all of the humans checking the data?
Maybe at this point we should also explain that in social listening, the most popular way to check sentiment accuracy is to extract a random sample of 1000 posts and have 2-3 humans manually annotate them with sentiment. We then compare the sentiment the algorithm has assigned to each post against the human annotations and determine the percent agreement between the human curators and the algorithm.
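The sampling check described above can be sketched as follows. This is a rough illustration with simulated labels standing in for real posts, and it assumes (as one common choice) that the majority vote of the human curators is used as the reference label:

```python
# Sketch of the accuracy check: 3 humans annotate a 1000-post sample,
# and we measure how often the algorithm agrees with their consensus.
# All data here is simulated; error rates are illustrative assumptions.
from collections import Counter
import random

random.seed(0)
LABELS = ["positive", "negative", "neutral"]

# Toy stand-in for the "true" sentiment of 1000 sampled posts.
sample = [random.choice(LABELS) for _ in range(1000)]

def noisy(labels, error_rate):
    """Simulate an annotator who sometimes picks a random label."""
    return [l if random.random() > error_rate else random.choice(LABELS)
            for l in labels]

humans = [noisy(sample, 0.15) for _ in range(3)]   # humans disagree sometimes
algorithm = noisy(sample, 0.25)                    # the algorithm errs more

def majority(votes):
    """Majority vote of the human curators for one post."""
    return Counter(votes).most_common(1)[0][0]

reference = [majority(v) for v in zip(*humans)]
agreement = sum(a == r for a, r in zip(algorithm, reference)) / len(reference)
print(f"algorithm vs. human consensus: {agreement:.1%}")
```

Note that even the humans in this sketch disagree with each other, which is exactly why 100% agreement with the algorithm is not a realistic target.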
As clients of social media monitoring become more sophisticated, they start asking questions like: “When you say accuracy do you mean precision or recall?” If the vendor is one of the usual suspects that offer social media monitoring tools, then chances are that they will not understand the question. For them, we share here a simple Wikipedia definition: “In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results.” Another more detailed definition provided on Wikipedia is this:
“In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labelled (by the algorithm) as belonging to the positive class) divided by the total number of elements labelled (by the algorithm) as belonging to the positive class (i.e. the sum of true positives and false positives - which are items incorrectly labelled as belonging to the class). Recall, in this context, is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives - which are items that were not labelled as belonging to the positive class but should have been).”
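The quoted definitions translate directly into a few lines of code. A minimal sketch, with an illustrative six-post example, computes precision and recall for one sentiment class from true positives, false positives and false negatives:

```python
# Precision and recall for a single class, following the definitions
# quoted above. The labels in the example are illustrative.

def precision_recall(actual, predicted, target_class):
    """Precision and recall for `target_class` given parallel label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == target_class and a == target_class)  # true positives
    fp = sum(1 for a, p in pairs if p == target_class and a != target_class)  # false positives
    fn = sum(1 for a, p in pairs if p != target_class and a == target_class)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: human labels vs. algorithm labels for six posts.
actual    = ["pos", "pos", "pos", "neg", "neu", "neg"]
predicted = ["pos", "neg", "pos", "pos", "pos", "neg"]

p, r = precision_recall(actual, predicted, "pos")
print(f"precision(pos) = {p:.2f}, recall(pos) = {r:.2f}")  # 0.50 and 0.67
```

Here the algorithm labelled four posts positive but only two truly are (precision 2/4 = 0.50), while it found two of the three truly positive posts (recall 2/3 ≈ 0.67).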
Although for the past 4 years we have been calling it "accuracy" for simplicity's sake, what we were really measuring was precision. Now that consumer insights managers are getting involved in social listening, we vendors need to adapt the way we talk and explain these new terms. We should add that precision and recall are relevant not only for measuring sentiment accuracy but also for measuring semantic accuracy, i.e. how accurately a solution can report the topics and themes of online conversations.
I also doubt that these definitions are on the radar of ESOMAR, MRS, MRA or CASRO. If that is the case, I suggest that the market research associations start defining how the accuracy of social listening data is measured, for the sake of all the market research companies and clients looking for guidance. If they need help, we, the practitioners of social listening and analytics, are here to offer a helping hand in better defining the market research methods of the future.
Image source: By Walber (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons