A blog post written at 300 km/h

As promised last week, here I am writing another blog post as I travel at 300 km/h on the Eurostar from London to Brussels on a Sunday afternoon. I am heading to my second consecutive appearance as a speaker at LT-Accelerate, a conference about language technologies rather than the usual market research conferences I attend.

Last year at LT-Accelerate I spoke about rich analytics for social listening and stressed the importance of semantic analysis and accuracy; this year I will be describing what differentiates the more than 1,000 social media monitoring tools currently on the market.

Looking at this from a market research and customer insight perspective, we categorised the social listening tools into three generations:

  1. GEN 1: sentiment accuracy below 60%, search-based topic analysis, limited attention to noise elimination, automated sentiment analysis usually in only one or two languages
  2. GEN 2: sentiment and semantic accuracy over 75% in any language, an inductive approach to reporting topics of conversation, significantly reduced noise (less than 5% irrelevant posts)
  3. GEN 3: everything a Gen 2 tool can do, plus emotion detection, automated image analysis for brands in terms of theme and possibly sentiment, user profiling, and guidance for integration with consumer tracking surveys and other data sources

If you want to know which generation your current social media monitoring tool belongs to, all you need to do is ask your vendor what their sentiment and semantic accuracy is, and whether they can detect emotions and analyse images for insights.
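The vendor questions above can be turned into a tiny decision rule. This is only an illustrative sketch of the three-generation criteria as described in this post — the thresholds come from the list above, and the function name is mine, not an industry standard:

```python
def classify_generation(sentiment_accuracy: float,
                        noise_rate: float,
                        detects_emotions: bool,
                        analyses_images: bool) -> int:
    """Return 1, 2, or 3: the generation a social listening tool belongs to.

    sentiment_accuracy: fraction of posts with correct sentiment (0.0-1.0)
    noise_rate: fraction of irrelevant posts in the reported data (0.0-1.0)
    """
    # Gen 2 bar from the list above: over 75% accuracy, under 5% noise.
    is_gen2 = sentiment_accuracy > 0.75 and noise_rate < 0.05
    # Gen 3 adds (among other things) emotion detection and image analysis.
    if is_gen2 and detects_emotions and analyses_images:
        return 3
    if is_gen2:
        return 2
    return 1

print(classify_generation(0.60, 0.20, False, False))  # → 1
print(classify_generation(0.80, 0.03, False, False))  # → 2
print(classify_generation(0.80, 0.03, True, True))    # → 3
```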

The main reason I go to conferences such as this one is to demonstrate thought leadership in the field of market research and customer insights, with the hope that prospective clients, partners, and vendors will come forward and initiate conversations that could develop into mutually beneficial deals.

Last year only half of the conference delegates showed up because of the terrorist attacks that had happened in Paris. Brussels was on high terrorist alert from the Sunday before the conference; the prudent thing to do was to stay at home and switch to a Skype presentation, as some speakers did. My take on the situation was that a city is at its safest when it is on high alert, so I decided not to change my plans. Indeed, as I arrived at the train station last year and made my way to my hotel, the streets were deserted apart from armed soldiers. It was eerie, but funnily enough it felt quite safe.

So here I am again this year on my way to Brussels Central station, and in the absence of a red alert due to terrorist threats I somehow feel less safe. I am making a mental note to remain vigilant and pay attention to what is going on around me; to look out for any suspicious behaviour, in other words.

Enough reminiscing; back to the essence of this post. I am sure there are other meaningful ways to categorise social listening tools, and I would be very interested to find out how other people classify them. One plausible way is by use case; another is by the target customer or department the tool was created for, such as:

  • PR
  • Communications
  • Operations
  • Customer Service
  • New Product Development
  • Customer Insights

In my opinion around 98% of the current tools on the market belong to Gen 1, around 1% belong to Gen 2, and only a handful belong to Gen 3. I would not be the least bit surprised if the only social listening tool that meets all the Gen 3 criteria is listening247®. Clearly, only Gen 2 and Gen 3 tools are suitable for market research and customer insights. Gen 1 tools would be disqualified from the get-go, if nothing else due to the noise (irrelevant posts) that is analysed and included in what is reported to the user as relevant.

How do you classify social listening tools? Please feel free to share your approach with me on Twitter @DigitalMR_CEO.

A blog post about…writing a blog post!

I know I should be writing at least four blog posts per month, but somehow I only manage one. Call it procrastination, call it daily re-prioritisation of tasks; whatever the reason, it does not matter. One of my favourite business truisms is: “There are reasons and results, reasons simply don’t count”.

Especially when you consider that there are a number of reasons (no pun intended) why it is beneficial to write frequently on your company’s website or blog:

1) You hopefully demonstrate thought leadership in your subject of expertise

2) You improve the SEO of your website

3) You can initiate a dialogue with prospective customers or other stakeholders

Reasons 4 and 5 are more personal and can be seen as the icing on the cake: writing can be fulfilling, and if you aspire to publish a book one day, you may discover that you have already written it bit by bit, without making a big deal out of it.

My subject of expertise is online market research, with a special focus on “social listening” and “online communities”. The former is searched for on Google 480 times a month in the UK alone, and the latter 390 times — according to HubSpot, by the way. For both phrases, DigitalMR currently sits on the second page of Google search results. One of the reasons I mention them in this and other blog posts is SEO; I would really love it if we moved up a few places in the Google ranking for these keywords. It could increase the monthly visitors to the DigitalMR website by at least 1,000, considering that people in the US and many other countries also use Google to find information on social listening and online communities in English.

There is also the issue of the personal brand, which is apparently very important for leaders of start-ups. A good friend and advisor told me recently that tech start-ups are the new rock bands. When I was growing up we would look at photos of rock bands on album covers and get inspired; now we look at photos of Zuckerberg and Musk to get our cues on what is trendy.

I was actually planning to write a blog post about the US elections this weekend; instead you are getting a boring introspection. We ran a poll on Twitter on October 3rd, and the outcome was that Trump received 53% of the votes vs 47% for Clinton. I did not like the result, and to be honest I did not believe it either, so we said nothing about it. Now I feel compelled to say: our Twitter poll predicted the US election result… but I am not going to do it; you see how I did this?

The point is, there is saturation in the media on the subject… no one is interested in another Trump story. So now we are getting somewhere: the reason I am writing an introspection piece is that I want us to be unique and different, and to offer original ideas that have not been recycled a million times.

How am I doing? If you are still reading and you are not my mom or my wife (I don’t think she reads these either), then maybe I am on to something. I will be checking Google Analytics and HubSpot all of next week to learn something about this style of blog post. I do like A/B marketing tests; I do not know how I will turn this one into a test, but I need to figure something out within the next 10-20 lines so that I can offer some take-home value to you, the reader.

I am guessing I will write another blog post next weekend on the Eurostar, on my way to Brussels for the LT-Accelerate conference. That one will be more about social listening or online communities – see how I did this again? Well, don’t get too excited; it is not that easy. It is not enough to mention a keyword several times in a page in order to rank on Google. As a matter of fact, I might be penalised by the Google algorithm if any keyword makes up more than 3% of the whole document – so I heard, but you can never be sure with Google’s algorithms. The latest is Penguin 4.0, and before that we had Possum and Panda. I may need another few sentences so that the mentions of social listening stay well below 3% of the whole post. The previous word was number 749; 3% of that is about 22 mentions, so I think we are safe – I only mentioned social listening and online communities 4 times.
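For the curious, the back-of-the-envelope arithmetic looks like this in code. The 3% cap is hearsay, as noted above, and the numbers are the ones from this post:

```python
# Keyword-density sketch using the figures from the paragraph above.
total_words = 749   # word count of the post so far
threshold = 0.03    # the rumoured 3% cap (hearsay, not a documented Google rule)
mentions = 4        # times the keyword appears

density = mentions / total_words
print(round(density, 4))             # ≈ 0.0053, comfortably under the 3% cap
print(int(total_words * threshold))  # → 22 mentions would reach the cap
```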

As always, if you have any comments feel free to share them with me on Twitter @DigitalMR_CEO.

Does the Market Research Industry Need a New Name?

When I read titles such as this one in articles or blog posts, I expect that the author’s answer is YES, and I prepare myself to find “holes” in their arguments. So I invite you, dear readers, to put on the sceptic’s hat and share your views on this (in my opinion) existential question.

I have always had a problem with the term “market research”: it is confusingly close to “marketing research”, and it is not an all-inclusive term for what ESOMAR, the MRS, or CASRO consider our industry to be. Over the past few years, knowing what I know about the type of research made possible by technology (such as social listening and online communities), my concerns about this name being unrepresentative have only grown.

If the job titles of the people who carry out MR within brand organisations are any indication, then Consumer Insights or Customer Insights should qualify as contenders for a new name. “Market research” is not (just) about the number of responses; it is about the insights, so shouldn’t the name of the industry be more representative in this sense?

The limitation of market or marketing research as names is that they only go halfway. What does that mean? Take a look at the data operations involved in the full spectrum of market research activity:

  • Collection tool set-up
  • Collection
  • Cleaning/validation
  • Processing
  • Analysis/Synthesis
  • Action Discovery
  • Visualisation

Market research sounds like it is over when the data is collected and perhaps understood, but what about the rest of the activity that takes place in order to discover the elusive insights? We have spoken before about “insights” being a buzzword that different people understand differently. Our definition, however, is very clear: an insight is a “nugget” that can usually be discovered by synthesising information from more than one source and adding a good measure of intuition; it is actionable, and it delivers a positive result when actioned. It is not a number that can be extracted from a single market research report.

The need to rename our industry becomes more acute nowadays, when a lot of data (big data) is easily accessible to organisations and thus commoditised. The use of artificial intelligence, and specifically machine learning, makes it much easier to look for insights in big datasets. We can get data by asking questions (traditional market research, e.g. surveys), by “listening” on social media, by tracking behaviour through transactions, or by observation. The best approach is to integrate all of the above sources in order to increase our chances of discovering unique business insights.

It looks like the word “insight” is mentioned a lot in this post… perhaps a giveaway of the new name for the MR industry? Well, “insight” is good, but “foresight” is probably even better. Does the current market research industry have what it takes to be a player in foresight generation? Instead of using the rear-view-mirror techniques that business intelligence prescribes, can our industry make the leap toward predictive analytics?

Maybe we as an industry are a bit late to the game; the IBMs and Teradatas of this world have been players in this space for a few years now. In any case, the first step toward any change is introspection and soul-searching. What is the role of our industry? If it is still to gather data, then we should continue calling ourselves market researchers; but if it is going to be more than that, then here are some candidates to trigger debate:

  • Insight Management
  • Foresight Research
  • Insight to Foresight Management
  • Business Decision Support
  • Organisation Decision Support

Changing the name will be the first step towards accepting some of the real changes that are taking place; the new truths that define who we are. Please feel free to tweet me at @DigitalMR_CEO if you have other suggestions for renaming the MR industry, or to let me know if you disagree. I look forward to learning more about your views and opinions on the above.

What is Market Research and What isn't? What is Ethical and What isn't?


These are some really big questions, probably too big to handle in one blog post. We have to start somewhere, though; this is too important to let pass without discussions that should ideally involve as many market research practitioners as possible, on both the agency and the client side. The real trigger for writing this post, however, is the 7th R&D grant DigitalMR has won, this time as part of a consortium of 7 entities: two from Portugal, two from Spain, two from the UK (DigitalMR and City University), and one from Germany. The project is called DiSIEM (Diversity enhancements for Security Information and Event Management) and it is funded by the European Union’s Horizon 2020 framework. It is a trigger because a whole work package on ethics — a very important element of humanity — has to be delivered at the start of the project. The handling of these questions should permeate all industries, and market research is no exception.

With social listening & analytics now in the picture, along with behaviour tracking, the line where market research stops and marketing starts is getting a bit blurry for me. I hope it is just me. If you have a clear view, then please share it with the global insights community, because some other people and organisations seem to be confused too.

It looks like the market research associations are trying their best to get ahead of this new world, with all its new disciplines and blurry lines. ESOMAR seems to take its cue from the EU, and sometimes tries to influence its legislation. I am not sure how successful they are.

Here is an excerpt from the ESOMAR guideline for social media research:

“Researchers must not allow personal data they collect in a market research project to be used for any other purpose than market, social and opinion research. If it is intended to collect personalised social media data for other purposes, they must clearly differentiate this activity from their research activities and not misrepresent it as research.”

This is not very different from what is in the code of conduct for market research, so no surprises here. My interpretation is that a company in this space IS ALLOWED to collect personalised social media data for purposes other than research – as long as it clearly differentiates that activity from market research and does not misrepresent it as such.

Now that the rules around collecting data are clear we can concentrate on reporting personally identifiable data. Here is another excerpt from the ESOMAR Guideline:

“Social media platforms offer many opportunities to view personally identifiable data. Some people post information that overtly discloses their identity, are aware of this and have a diminished expectation of privacy. Others are not aware that the services they are using are open for others to collect data from or think that they have disguised their identity by using a pseudonym or username. However, online services are now available that make it possible in many cases to identify a “poster” from their username or comments and can link that to many other aspects of personally identifiable data including their address, phone number, likely income and socio demographic data.”

Well, I am not sure it is a market research practitioner’s problem if someone is not aware that he or she is posting on a public website, even though this is common knowledge (e.g. on Twitter) and openly explained on the relevant websites and in their Terms of Use (ToU).

…and I certainly disagree with the next excerpt:

“Given this, data cannot always be 100% anonymised on the internet by merely removing the username and linked URL from the comment. Therefore if researchers wish to quote publicly made comments in reports or to pass these on to people not bound by the ICC/ESOMAR Code (or a contract linked to this), they must first check if the user’s identity can be easily discoverable using online search services. If it can, they must make reasonable efforts to either seek permission from the user to quote them or mask the comment to such an extent that the identity of the user cannot be obtained.”

I am not quite sure why the market research associations feel this way. I assume it is because they want to extend survey thinking to everything else that counts as market research nowadays. However, there is a big difference between surveys and social listening. In the survey world, market research agencies actually collect the personal information, and are thus responsible for safeguarding it under the codes of conduct and the law. In social media research, agencies have nothing to do with putting the personal information in the public domain, so why should they be responsible for masking it or anonymising the posts?

I have asked this question many times before, but I have not yet received a satisfactory answer: what about copyright law? Doesn’t the author of any published content have the right, and the expectation, to be credited when that content is quoted?

Of course, the guidelines also contain this sentence:

“If consent has not been obtained (directly or under the ToU) researchers must ensure that they report only depersonalised data from social media sources.”

This effectively allows a market research agency to report personalised data from, say, Twitter, if Twitter’s ToU state that by publishing on the platform users should expect to be quoted along with whatever personal information they have shared. Hmm, confusing… which one is it: de-personalise or give credit?

Tweet to @DigitalMR_CEO if you have an opinion!

The Impact of Ambiguity in Social Listening and Analytics

There are many forms of ambiguity in social media posts, the most popular being sarcasm. It is sometimes confused with, or used interchangeably with, irony. Here is a definition of the two terms from stackexchange.com:

“Irony is used to convey, usually, the opposite meaning of the actual things you say, but its purpose is not intended to hurt the other person. Sarcasm, while still keeping the "characteristic" that you mean the opposite of what you say, unlike irony it is used to hurt the other person.”

For the purposes of this blog post, irony and sarcasm present the same problem when trying to automatically annotate a post with a sentiment or an emotion. The author of a social media post may write something positive about a brand, e.g. “I love the new flavour”, but if it is sarcastic then it is really a negative post – and vice versa, e.g. “don’t you hate this ice cream flavour?”.

DigitalMR’s claim to fame since 2014 is that its R&D focus on solving the problem of low accuracy in automated sentiment analysis in any language has produced a solution – listening247 – that delivers over 80% sentiment and semantic precision (precision being one of the accuracy metrics in big data analytics). The reason it is not, and cannot really be, 100% is ambiguity. The consequence of ambiguity in this context is that humans will not agree among themselves about the sentiment of a sarcastic or ironic post. Some will think it is positive, some negative, and in some cases others will think it is neutral for the brand mentioned in the post – i.e. the sentiment is directed not at the brand but at something else (see Fig. 1). It follows that we cannot expect an algorithm to produce a result that everyone agrees with in such a case. In our research we have found that, on average, 10%-30% of posts about a category contain some form of ambiguity. In the example below, 43% was the highest level of agreement among 30 market research practitioners; this is why 80% precision is an excellent result for automated sentiment analysis.
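For readers unfamiliar with the metric, precision for a sentiment label can be computed as follows. This is a generic, illustrative calculation with made-up labels — not DigitalMR data or the listening247 implementation:

```python
def precision(predicted, human, label):
    """Of the posts the model labelled `label`, what fraction did humans also
    label `label`? (Standard precision for one class.)"""
    picked = [(p, h) for p, h in zip(predicted, human) if p == label]
    if not picked:
        return 0.0
    return sum(1 for p, h in picked if h == label) / len(picked)

# Toy example: five posts, model predictions vs human annotations.
model_labels = ["pos", "pos", "neg", "pos", "neu"]
human_labels = ["pos", "neg", "neg", "pos", "neu"]

print(round(precision(model_labels, human_labels, "pos"), 3))  # → 0.667
```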

Figure 1: Manual Sentiment Curation of an Ambiguous Tweet (Base n=30)

Some of you are already aware that DigitalMR uses machine learning to annotate sentiment automatically. Machine learning implies that an algorithm, or a combination of algorithms, is trained on a training dataset to create a model that does the job. There is one scenario in which 100% sentiment precision can be expected: when supervised machine learning is used (as opposed to semi-supervised or unsupervised), humans create the training dataset manually. If only one human is responsible for creating the training dataset, then the model will use only that person’s judgement to annotate posts for sentiment. In such a case, because only one person has to agree with the sentiment annotated by the model, if that same person is the judge of the model’s precision, then 100% precision is achievable – because that person will not disagree with herself.
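The single-annotator point can be shown with a deliberately trivial sketch (pure Python, no real machine learning): if the “model” simply reproduces one curator’s own labels and that same curator judges it, measured agreement is 100% by construction. The posts and labels below are invented for the example:

```python
# One curator's manual annotations: the training dataset.
curator_labels = {
    "i love the new flavour": "positive",
    "don't you hate this ice cream flavour?": "negative",
    "just bought the vanilla one": "neutral",
}

def model(post):
    # A degenerate "model" that has memorised the curator's own labels.
    return curator_labels[post]

# The same curator judges the model, so agreement is 1.0 by construction.
agreement = sum(model(p) == curator_labels[p] for p in curator_labels) / len(curator_labels)
print(agreement)  # → 1.0
```

The circularity is the whole point: the figure says nothing about how the model would fare against a second annotator, which is exactly where ambiguity reappears.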

Needless to say, when machine learning is used for automated sentiment analysis, the identification of sarcasm and irony is, by definition, addressed. “Why?”, you may ask. Because a human curator (the person who creates the training dataset) understands sarcasm and irony, and more often than not he or she will detect it and annotate a post accordingly.