5 Tips To Reduce Noise In Social Listening


When we say noise in a social listening or social analytics context, we mean posts that are irrelevant to the subject being researched. If, for example, our social media monitor is set up to harvest online posts about beer, the search query will be structured around brands of interest and other beer-related keywords. It is horrifying to consider that 80%-90% of the posts returned by an initial harvesting query are irrelevant, i.e. noise.

So how do we get rid of this noise?  


Here are our top 5 tips on how to significantly reduce noise in your social listening reports:

1. The team responsible for the search queries should be made up of intelligent people with plain common sense and a strong vocabulary in the language being harvested, including colloquialisms and slang.

2. The researchers should have an intimate knowledge of the research subject or product category. For example, if the category is cars, it helps if one of the team members is a “petrol head”; if it is watches, someone should be a watch enthusiast who knows most of the makes.

3. There is no substitute for thorough research before you create your first search query. It is important to discover as many synonyms and homonyms as possible so that the very first search query is informed by them. An example of a homonym is apple (computers) and apple (fruit), or mine (for gold) and mine (that explodes). If we are interested in harvesting posts about Apple the company and its products, then our search query should exclude posts about apple the fruit.

4. The best way to apply tip 3 is to write the search query with Boolean operators (AND, OR, NOT), which many tools combine with regular expressions. A simple query for the apple example would look like this: apple AND (computer OR phone) NOT (juice OR fruit).

5. After the first regular expression query has harvested the first batch of online posts, our intelligent researchers (from tip 1) check a large enough random sample of those posts and look for irrelevant ones. The patterns of noise they identify feed into version 2 of the search query, which avoids harvesting the posts that were found to be irrelevant. This is an iterative process that continues for as long as our human researchers find patterns that can be added to the regular expression string to exclude the noise.
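
As a rough illustration of tips 4 and 5, the sketch below expresses the apple query in Python with the standard re module and then draws a random sample of the harvested posts for manual review. The term lists, the sample posts and the function names is_relevant and review_sample are assumptions made for this example; real monitoring tools have their own query syntax, and a production query would be far longer.

```python
import random
import re

# Hypothetical term lists for the "apple" example; a real query would be far longer.
REQUIRED = re.compile(r"\bapple\b", re.IGNORECASE)
CONTEXT = re.compile(r"\b(computer|phone)s?\b", re.IGNORECASE)
EXCLUDED = re.compile(r"\b(juice|fruit)s?\b", re.IGNORECASE)


def is_relevant(post: str) -> bool:
    """Apply: apple AND (computer OR phone) NOT (juice OR fruit)."""
    return bool(
        REQUIRED.search(post)
        and CONTEXT.search(post)
        and not EXCLUDED.search(post)
    )


def review_sample(posts, size, seed=0):
    """Draw a reproducible random sample of posts for human review."""
    rng = random.Random(seed)
    return rng.sample(posts, min(size, len(posts)))


# Made-up posts standing in for a harvested batch.
posts = [
    "My new Apple phone arrived today",
    "Apple juice is better than orange juice",
    "Is the Apple computer worth the price?",
    "I love fruit, especially an apple a day",
    "Phone battery died again",
]

harvested = [p for p in posts if is_relevant(p)]
to_review = review_sample(harvested, size=2)
```

In this toy batch only the first and third posts survive the filter. Each noise pattern the reviewers spot in the sample would become a new exclusion term for the next version of the query.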

By the end of these noise cleaning iterations, in most cases we end up with regular expression strings that are multiple pages long; this process usually takes a skilled team a few days. The good news is that we only need to go through this arduous process once, when setting up the social listening for a product category for the first time. Because languages are alive and evolve, the search queries used for social listening harvesting will likely need to be reviewed once a year. We should look for new words, phrases and acronyms that have become popular and are either relevant (to be included) or irrelevant (to be excluded).
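
One possible way to support the yearly review is to list terms that appear repeatedly in recent posts but never appeared in an older batch, and then let a researcher decide whether each term should be included or excluded. The sketch below is only an assumption about how this could be done; the function names, the min_count threshold and the beer posts are invented for illustration.

```python
import re
from collections import Counter


def tokens(posts):
    """Lower-case word tokens (letters, digits, apostrophes) from a list of posts."""
    return [t for p in posts for t in re.findall(r"[a-z0-9']+", p.lower())]


def emerging_terms(old_posts, new_posts, min_count=2):
    """Terms seen at least min_count times in new_posts and never in old_posts."""
    known = set(tokens(old_posts))
    counts = Counter(tokens(new_posts))
    return sorted(t for t, n in counts.items() if n >= min_count and t not in known)


# Made-up posts standing in for two harvesting periods.
last_year = ["craft beer tasting tonight", "best lager in town"]
this_year = [
    "this hazy ipa is great",
    "hazy beer tasting tonight",
    "trying a hazy lager",
]

candidates = emerging_terms(last_year, this_year)
```

The output is only a list of candidates; a human who knows the category still has to judge each one.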

It is the users of DIY social media monitoring tools such as Sysomos, Brandwatch, Radian6 or Meltwater Buzz that I am most concerned about. If they are not familiar with the issues above, they are probably analysing data on beautiful dashboards and proudly sharing them with supervisors and colleagues, not knowing that the proverbial “garbage in, garbage out” applies in an extreme form.

Please do share your own experiences of reducing noise in social media data.

