Is Social Media Analytics Possible Without Taxonomies?

If you are in the insights business, the answer to this question is a definitive NO. If you are in PR, or in other adjacent marketing disciplines… then possibly your answer will be YES.

Our answer is still a NO. Our advice to you, Mr. PR Manager: call your colleague in consumer insights and ask for help.

Let us first establish what 'social media analytics' and 'taxonomy' mean. According to Wikipedia:

  • Social media analytics is: “Measuring + Analysing + Interpreting interactions and associations between people, topics and ideas”. Some people use “social analytics” interchangeably with “social media analytics”, but Wikipedia tells us otherwise: “The practise of Social Analytics is to report on tendencies of the times. It is a philosophical perspective that has been developed since the early 1980s”. In other words, it is something different. There are, of course, other definitions that treat the two terms as part of each other, such as the Gartner definition.

  • Taxonomy is: “the practice and science of classification. The word is also used as a count noun: a taxonomy, or taxonomic scheme, is a particular classification. Many taxonomies have a hierarchical structure, but this is not a requirement.”

Taxonomies in a social listening context are used to describe a product category, an industry vertical, or simply a subject, such as the 2015 parliamentary elections in the UK. When used to analyse posts from social media, they act as “dictionaries” of the words people use to discuss the subject or category online. Taxonomies can be flat or hierarchical; for market researchers there is more value in a hierarchical taxonomy because of the drill-down capability it provides.

A taxonomy could be created to represent the logical structure of how a market research analyst sees and understands the product category. This approach, however, is not good enough when the taxonomy is to be used for social media analytics; it should instead be derived directly from the harvested social media posts that need to be analysed.

In social listening, when millions of posts about a product category in a specific language need to be analysed in order to extract insights, the following disciplines and skill sets are required:

  • Machine learning, to annotate sentiment with the highest possible accuracy
  • Taxonomy, to identify the themes and sub-themes (or topics) of conversation

Some of these processes are automated, some are semi-automated and some are manual. The good news is that when manual work is required it is mainly done during the set-up of a social media monitoring programme.

Most social media monitoring tools - including the really popular ones - do not actually make use of a taxonomy; all a user can do is use search terms. One could speculate that this approach is equivalent to a flat taxonomy but unfortunately it is not; a taxonomy implies that multiple relevant words and phrases roll up under a topic or theme. In the case of a 'search term only' approach, analysis of the posts will be shown only for posts that contain the specific search term. So if a user wanted to look at social media sentiment within topics and sub-topics, that would not be possible without a hierarchical taxonomy.
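To make the roll-up idea concrete, here is a minimal sketch of how a hierarchical taxonomy could be applied to posts. Everything in it is an assumption made for illustration: the themes, sub-topics and phrases are invented, the matching is naive substring matching, and the sentiment labels are taken as given from some separate model or annotator. It is not a description of how any particular tool works.

```python
from collections import defaultdict

# Hypothetical two-level taxonomy: theme -> sub-topic -> words and phrases
# that roll up under it. A real taxonomy would be derived from harvested posts.
TAXONOMY = {
    "taste": {
        "sweetness": ["too sweet", "sugary", "not sweet enough"],
        "texture": ["crunchy", "soggy", "creamy"],
    },
    "price": {
        "value": ["overpriced", "good value", "cheap"],
        "promotions": ["discount", "2 for 1", "coupon"],
    },
}


def tag_post(text):
    """Return the set of (theme, sub_topic) pairs whose phrases occur in the text."""
    lowered = text.lower()
    return {
        (theme, sub_topic)
        for theme, sub_topics in TAXONOMY.items()
        for sub_topic, phrases in sub_topics.items()
        if any(phrase in lowered for phrase in phrases)
    }


def sentiment_by_topic(posts):
    """Count sentiment labels per theme and per (theme, sub_topic).

    `posts` is an iterable of (text, sentiment) pairs; the sentiment label
    is assumed to come from a separate model or human annotator.
    """
    by_theme = defaultdict(lambda: defaultdict(int))
    by_sub_topic = defaultdict(lambda: defaultdict(int))
    for text, sentiment in posts:
        tags = tag_post(text)
        # Count a post once per theme, even if it hits several sub-topics.
        for theme in {theme for theme, _ in tags}:
            by_theme[theme][sentiment] += 1
        for tag in tags:
            by_sub_topic[tag][sentiment] += 1
    return by_theme, by_sub_topic
```

A post that never mentions the word "taste" still rolls up under the taste theme if it says "soggy" or "too sweet", which is exactly what a 'search term only' approach cannot do.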

One scenario remains to be investigated in answering the question in the title of this blog post:

What if a user only wants to analyse sentiment in social media? Well, if the point of the research was about one specific notion or keyword, or high level term, then perhaps it would be the one exception when a taxonomy would not be necessary for social media analytics; however, if a whole product category was monitored (without using a taxonomy) then this would be a lost opportunity because the user would not know what subject the sentiment was about.

Taxonomies are a very broad topic to which we will probably need to dedicate a number of blog posts similar to this one. They are also very necessary for social media listening and analytics; for example, a huge opportunity lies out there for the company that will own detailed hierarchical taxonomies of all the major product categories sold in supermarkets.

As we always say, ‘there will be a day when marketing directors will not be able to perform their jobs without a social media analytics dashboard on their computer, or tablet for that matter’. Tell us what you think about taxonomies; do you think they represent an opportunity, or rather an insurmountable challenge? After all, it takes refined computing processes and a considerable effort from a small army of experienced and smart people to create one!

Social Listening and Online Communities: 1+1=3?


Two out of the top 3 trends in market research repeatedly reported by GreenBook’s GRIT report are social listening and online communities. The third is mobile research which, being a method of collecting data for surveys, can be part of online communities anyway.

We have written about private online communities and social media listening separately many times before, but this blog post is dedicated to the power of integrating the two disciplines.

Back in February, Eric Salama, the CEO of Kantar Research, spoke at the Insight Innovation Exchange conference in Amsterdam about his view of the future of market research. One of the concepts that stuck with me was that in the future, market research will be divided into “learning applications” and “action applications”. My interpretation of these two types of apps is that the former are pure market research as we know it, and the latter are adjacent marketing activities that today are not governed by the ESOMAR or MRS codes of conduct. Examples of action applications are programmatic advertising, customer advocacy, and agile customer engagement.

Two of the following three ways to integrate social listening and online community platforms are action applications, and one is a learning application. Let’s see if you agree that 1+1 will equal more than 2 in these three cases:

  1. Member recruitment for online communities
    For the first time in the history of marketing and market research, we can now find respondents for ad-hoc research or members of communities based on their perceptions, without having to use a screener questionnaire. We can use social listening to gather all the posts from the web that are aligned with an idea, agree with a concept or express love for a brand. Because the opinions expressed in social media posts are unsolicited, they are of better quality than those expressed in a screener questionnaire given to people from a consumer panel: panelists have an incentive to work out how to answer “right” so that they will be invited to participate in a survey (the so-called expert respondents).
  2. Listen-probe-listen-probe
    A virtuous circle can be created by integrating listening and communities. A brand or organisation can first “listen” to what people say on the web about the subjects of interest, and then engage with the members of their private online communities to ask questions (probe) about what they learnt from the harvesting and analysis of online posts. Through the probing they are bound to discover information that will improve the way they do their social media monitoring. And so on and so forth… Every time they complete a listen-probe-listen cycle, new valuable insights can be extracted that were never attainable before.
  3. Amplified customer advocacy
    Product category influencers can be identified through the content of their online posts and the size of their networks. They can then be invited to join an exclusive private online community for co-creation of digital content and customer advocacy amplification, i.e. the sharing of the digital content with their friends and network.

Connecting the dots is a very powerful notion in market research. As shared on this blog several times, we firmly believe that a true business insight is more likely to be the result of synthesising data from multiple sources than of analysing a (small) data-set to death. The insights expert is a necessary part of this equation (1+1=3). There is also a new breed of skill-set that is becoming more and more an integral part of those market research agencies that “get it”: the data scientist, who is among other things a machine learning specialist not daunted by tera-, peta-, exa- or zettabytes. Thoughts?

The 3 Things You Should Get Right If You Use Social Media Listening

Are you Listening?

Social media listening has many names; the most accurate term to describe this new marketing discipline is probably Active Web Listening. “Web” is more appropriate than “social” because when people share their views about brands, organisations and people, they do so not only on the well-known social media sites but also on blogs, forums, and commercial websites (such as Amazon). Sometimes we also want to listen to what is in the news – editorial content – as well. The word “active” emphasises that it is not enough to just "listen"; you have to do something about it, which assumes that you understand what people are saying and what the issues are. Having said all that, the most popular term used in a Google search by people looking for such solutions is 'social media monitoring'.

Now that we have the nomenclature out of the way, let’s discuss how to do social media listening properly; we need to pay attention to 3 things really:

  1. Noise

  2. Sentiment Accuracy

  3. Drill-down capability

Let’s have a closer look at these 3 things one by one:

  1. Noise
    Any query initially used to monitor a subject or product category will almost certainly return posts that are not relevant to the subject. Sometimes the irrelevant posts are 80%-90% of the total posts harvested from the web. For example, if we have a query with just one search term, e.g. Apple (Computers), we will get lots of posts about apple – the fruit. The usual way to get rid of noise is to use a Boolean logic query, something along the lines of: Apple AND (Computers OR phone OR Tablet) NOT taste … etc.

  2. Sentiment Accuracy
    This is probably the most difficult problem to solve when it comes to making sense out of social media. Most end-clients (brands) of social media monitoring and analytics have developed ways to extract value out of their existing social media monitoring dashboards without making use of sentiment analytics. In other words, they know how many posts are talking about their brand and their competitors, but they do not know how many of these posts are negative and how many are positive. They also have no idea what their Net Sentiment Score is when benchmarked against their competitors (NSS is a very useful metric and a DigitalMR trademark). We believe the reason they chose to ignore sentiment is simply that no supplier of theirs is able to deliver a sentiment accuracy over 60%.

    This ended on December 31st 2014 when DigitalMR completed the 2.5-year development of listening247. Through the use of a unique combination of machine learning algorithms and computational linguistic methods, the DigitalMR R&D team was able to achieve sentiment accuracy over 85% in multiple languages and product categories. A machine learning model usually delivers 70% - 75% sentiment accuracy initially, and then with continuous fine tuning (for about a month) it climbs slowly but surely to 85% and even higher.

    The key to establishing the sentiment accuracy is for a number of humans to agree with the sentiment that the algorithms assigned to the posts. We use random samples and ask the end user (client) or an independent third party to go through the posts and annotate sentiment manually. We then compare the results of listening247 with the human annotations and establish the degree of agreement. The caveat here is that sentiment accuracy can never be 100%, since even humans disagree with each other 20%-30% of the time due to sarcasm and general ambiguity.

  3. Drill-down capability
    The drill-down capability depends on two things: a drill-down dashboard and an appropriate taxonomy that describes the topics discussed around a subject or product category. It is fairly easy to drill down into posts about a single brand, and then within that brand to drill down into a key term used in the discussions, and then within that term, to look at only the negative posts. What is not easy to do is look at the posts around a topic or discussion driver, then drill down to see what the sub-topics around that main topic are, and then drill down further  to see what people are saying about one attribute (of the many) within the (chosen) sub-topic. After all that, we can still have a look at a specific brand, the sentiment, and the source of the posts at the attribute level; a total of 8 drill-down levels are possible with an approach like this.
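The Boolean noise filter described under Noise can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the excluded terms beyond "taste" are invented for the example, and matching is done on whole lower-cased words. Real monitoring platforms each have their own query syntax.

```python
import re


def words(text):
    """Lower-case a post and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def is_relevant(
    text,
    required=("apple",),
    any_of=("computer", "computers", "phone", "tablet"),
    excluded=("taste", "fruit", "pie"),  # "fruit" and "pie" are assumed extras
):
    """Evaluate: required AND (any_of[0] OR any_of[1] OR ...) NOT excluded."""
    tokens = words(text)
    return (
        all(term in tokens for term in required)
        and any(term in tokens for term in any_of)
        and not any(term in tokens for term in excluded)
    )


posts = [
    "My new Apple tablet is great",
    "This apple pie has a lovely taste",
]
# Keep only the posts that pass the filter.
relevant_posts = [post for post in posts if is_relevant(post)]
```

Grouping the OR terms explicitly matters: without the grouping, a query engine may read the expression as (Apple AND Computers) OR phone OR Tablet and let through every post that mentions a phone.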

A delegate at the MRS Healthcare research conference last week in London said that if anyone could take thousands of posts in any language, and was able to analyse for topics and sentiment, they would consider this a superpower equal to that of super heroes such as Superman and Spiderman. Well it is quite telling that a colleague in the business of market research did not even know that this is possible and that the only superpower we need to achieve it is machine learning capability.

Here is where the magic comes in (if you get the above 3 things right): social media listening takes unstructured text (consisting of thousands of posts) and gives it structure, which allows a quantitative analysis and interpretation that would otherwise be impossible. It also lets you narrow down to a few homogeneous posts that you can read for a qualitative take and further probing.

Can market research get any better? What do you think?

Market Research as we know it is perfectly unsuited for the digital economy

In the January 2015 edition of Research World – the ESOMAR monthly magazine – there is an article that I co-authored with Dimitris A. Mavros, the Managing Director of MRB Hellas, the third largest Market Research agency in Greece. The article is about the impact of the digital era on traditional market research agencies; our premise is that if they remain traditional, the impact will not be good!



Here are our 10 predictions about the future of the market research industry:

  1. The traditional market research agencies that refuse to change will go out of business
  2. DIY market research will catch on even more and will democratise our sector
  3. Social listening analytics will be a must-have for every marketing and market research manager
  4. Agile research will become mainstream and will be facilitated by online communities
  5. Micro surveys and intercepts will eventually replace long monthly customer tracking studies
  6. Processing behavioural data in motion and delivering real-time micro insights will be a core competence of any insights expert agency
  7. Adjacent marketing services such as customer engagement, enterprise feedback management, customer advocacy, will become solutions offered by the market research companies of the future
  8. Data scientists will be the new insight experts, utilising a lot more predictive analytics than rear-view mirror analytics
  9. The code of conduct of market research associations such as ESOMAR and MRS will be revised as it does not apply to the digital economy. If not, the new breed of MR agencies will refuse to be members of such archaic organisations, and the latter will die out
  10. Nielsen will no longer be the largest market research company in the world

[Table: Client vs Supplier next-gen market research interest]

The above table from GRIT Winter 2014 more or less confirms some of our predictions; the source of this data is market research agencies and end clients of market research. The social media analytics figure is interesting because 47% of end clients claim to be using it whilst only 34% of the agencies claim the same. This could mean that end clients are using technology companies that are not market research suppliers.

I would be very interested to start a conversation with colleagues who have an opinion on the matter. As a company, DigitalMR has held the above positions since 2010, when it was established; in 4 years we have not had to change our minds on any of them. If anything, we see social traction confirming those positions. I am sure there will be more than two opposing views, and maybe we can define different narratives and segments among us.

The bottom line is: change or perish. If you are a traditional agency it is not too late. A good first step would be to include social media listening and online communities in your solutions portfolio. DigitalMR is looking for selected market research agencies to be its partners in certain countries and industry sectors. Please do get in touch; if nothing else, we can have a pleasant chat about the future of market research or have coffee if you are visiting London.

Do you know of any 2nd or 3rd generation social media listening platforms?

Market research companies do not really have a reputation of being a very innovative bunch; for years there has been no revolutionary innovation in market research tools.

Social media listening is a discipline that, as we have written many times before, was elusive for market research. Neither the corporate nor the agency researchers saw the value; they did not trust it, it was not “representative”…yes, they used the 'R' word…ooouuuuuuuuuhhhhhhhhh. The first generation of social media monitoring tools (which are still in full use today) are not deemed useful for market research by the corporate researchers; rightly so if you ask me. Their sentiment accuracy is less than 60%, and they are usually only capable of dealing with one specific language. There is no capability of drilling down into topics and sub-topics; when they provide sentiment it is only at the brand or search term level; and the percentage of noise (irrelevance to the topic) that a user search query returns is 80-90%.

I had not realised that our own listening247 is a 2nd generation social media listening tool until Lenny Murphy (@lennyism), Editor-in-Chief of the GreenBook Research Industry Trends Report and blog, called it that during a phone conversation a couple of weeks ago. I guess what 2nd generation means is a platform that was specifically developed for consumer insights, one that addresses all the shortcomings of 1st generation tools that are mentioned above. Social media research is now a very concrete discipline with new tools such as listening247 and online community tools (such as communities247) that complement it. We believe that the biggest shortcoming listening247 addresses is multilingual sentiment accuracy: the result of almost 3 years of hard work and relentless focus by the DigitalMR R&D team is the ability to consistently reach over 85% sentiment accuracy in any language.

I was very pleased to hear Lenny go on to describe what he thought 3rd generation tools will be like; he said they will link sentiment to customer behaviour and customer profiles, and there will be more focus on analysing images, voice, and video for sentiment and more granular emotions. He also spoke about using text analytics to deal with sources of unstructured text other than social media, e.g. email databases, instant messaging, call centre conversations etc. In case you are wondering why I was pleased, you can reach out and ask me on @DigitalMR_CEO.

Some of these new market research methods will disrupt traditional market research as we know it even further. The do-it-yourself aspect of these new platforms will help democratise the space and allow current non-users to convert to online market research, mainly because these platforms will be affordable, efficient and effective.

End clients of market research have already started realising that 1st generation social media monitoring tools have very low sentiment accuracy, and even if they are captive to these tools, they still ask DigitalMR to score their posts for sentiment. In most cases, the sentiment accuracy difference between listening247 and the other tool is over 30 percentage points. If you would like to know how we define and measure accuracy please ping me on Twitter and I will be more than happy to provide definitions and examples.
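For readers who want a feel for the comparison against human annotation described in the earlier post, simple percent agreement on a random sample is one way to do it. The sketch below is an assumption made for illustration, not DigitalMR's actual definition of accuracy; the function names and label values are hypothetical.

```python
import random


def sample_posts(post_ids, sample_size, seed=0):
    """Draw a reproducible random sample of post ids for manual annotation."""
    rng = random.Random(seed)
    ids = list(post_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def agreement_rate(model_labels, human_labels):
    """Share of posts, among those carrying both labels, where the labels match.

    Both arguments map post id -> "negative" | "neutral" | "positive".
    """
    shared = model_labels.keys() & human_labels.keys()
    if not shared:
        raise ValueError("no posts carry both a model and a human label")
    matches = sum(model_labels[post] == human_labels[post] for post in shared)
    return matches / len(shared)
```

Running `agreement_rate` once per tool against the same human-annotated sample and subtracting the two results, multiplied by 100, gives a difference in percentage points.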

At DigitalMR we have a couple of other rabbits in the hat. We are not sure if they will help us qualify as a 4th generation social media listening tool, but we are certainly happy that we are currently considered 2nd generation going on 3rd :) since the majority of the current players are still gen 1.