“I keep finding errors in my social media monitoring data and management is losing confidence in my reports. What should I do?”
As data volumes proliferate, I (and others) increasingly hear complaints about the integrity of the data that many popular monitoring systems provide. The number of decisions currently being made on bad data is frightening. A recent study we conducted found that 60 percent of the data delivered to a client was invalid (including spam, paid posts, ads for Viagra, and duplicates).
The good news is that you can get cleaner data, but it will take a little work. First, look over my How To Get Good Data Checklist. One of the most important things on this list is to remember to review your search terms with your supplier each month.
Second, let’s deal with some specific situations. Do any of these sound familiar?
1. An article appeared in April but is listed in your June results
Fix: Check whether your vendor is reporting “day of publication” or “day of collection.”
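If your export includes both dates, a short script can show which items land in the wrong month. This is a minimal sketch, assuming each item is a dict with hypothetical `published` and `collected` date fields. Your vendor's column names will differ.

```python
from datetime import date

def misdated_items(items, period_start, period_end):
    """Return items collected inside the reporting period but published outside it."""
    flagged = []
    for item in items:
        collected_in_period = period_start <= item["collected"] <= period_end
        published_in_period = period_start <= item["published"] <= period_end
        if collected_in_period and not published_in_period:
            flagged.append(item)
    return flagged

# Hypothetical data: an April article that was collected in June, plus a June article.
items = [
    {"url": "http://example.com/april-story", "published": date(2012, 4, 12), "collected": date(2012, 6, 3)},
    {"url": "http://example.com/june-story", "published": date(2012, 6, 5), "collected": date(2012, 6, 6)},
]
june_start, june_end = date(2012, 6, 1), date(2012, 6, 30)
for item in misdated_items(items, june_start, june_end):
    print(item["url"])  # prints http://example.com/april-story
```

If the vendor supplies only one date, ask which one it is before you trust any month-to-month trend.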
2. Wrong circulation numbers
Explanation and fix:
For online data: First, check the URL. Is your system reporting the circulation for all of Yahoo, or for the specific blog in which your news appeared?
This happens with Facebook data all the time. Routinely, agencies and vendors report “impressions” in the billions. Billions, really? Sure, if you assume that every Facebook post reaches the entire 850,000,000-member network. (Yes, that is how some firms report the “circulation” for Facebook.)
Let’s think about that for a moment. If your target is the U.S., that audience is only about 150 million members. Then you have to take into account that only about 10 percent of what is posted on Facebook is public, and only about 10 percent of what is public is actually seen. At best, you’re talking about a reach of 1.5 million in the U.S. (150 million × 10 percent × 10 percent).
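The back-of-the-envelope estimate is easy to rerun with your own assumptions. A minimal sketch: the 150 million members and the two 10 percent shares are the rough figures from the text, not measured values, and the function name is hypothetical.

```python
def estimated_reach(members, public_pct, seen_pct):
    """Members x percent of posts that are public x percent of public posts actually seen.

    Percentages are whole numbers, so integer arithmetic keeps the result exact.
    """
    return members * public_pct * seen_pct // (100 * 100)

us_reach = estimated_reach(150_000_000, 10, 10)
print(us_reach)  # prints 1500000
```

Swap in your own market size or visibility shares; the point is that each factor shrinks the number, and none of them justifies a figure in the billions.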
Let’s not even go there for Twitter. The good news is that Twitter actually tells you how many followers each individual author has. So why do some organizations report a Twitter “circulation” of 42 million? Because it’s easier than tying each tweet to the follower count of the person who posted it.
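Tying each tweet to its author takes only a lookup once you have follower counts. This sketch assumes hypothetical data shapes: a list of tweets with an `author` field and a dict mapping each author to a follower count. The total is an upper bound, since not every follower sees every tweet.

```python
def potential_impressions(tweets, followers_by_author):
    """Credit each tweet with its own author's follower count, not a platform-wide total.

    Authors missing from the follower lookup contribute 0 rather than an inflated guess.
    """
    return sum(followers_by_author.get(tweet["author"], 0) for tweet in tweets)

# Hypothetical data: alice tweets twice, bob once.
tweets = [{"author": "alice"}, {"author": "bob"}, {"author": "alice"}]
followers_by_author = {"alice": 1200, "bob": 350}
print(potential_impressions(tweets, followers_by_author))  # prints 2750
```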
For traditional media data: Do a common-sense check. Put the data into an Excel spreadsheet and sort by circulation (OTS, or opportunities to see) from largest to smallest. Now visit the URLs of the 10 items with the biggest circulation. Are the numbers for real? For online news, note that www.compete.com reports only on the root URL, not on subdomains. Make sure the URL, and the circulation figure attached to it, corresponds to where the item actually ran.
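The same check can be scripted. A minimal sketch, assuming each item has hypothetical `url`, `circulation`, and `circulation_host` fields, where `circulation_host` is the hostname the traffic figure was actually measured for:

```python
from urllib.parse import urlparse

def top_by_circulation(items, n=10):
    """Return the n items with the largest circulation (OTS), biggest first."""
    return sorted(items, key=lambda item: item["circulation"], reverse=True)[:n]

def host_mismatch(item):
    """True if the circulation figure belongs to a different host than the item's own URL."""
    return urlparse(item["url"]).hostname != item["circulation_host"]

# Hypothetical data: the first item's figure is for the whole root domain.
items = [
    {"url": "http://news.example.com/story", "circulation": 90_000_000, "circulation_host": "example.com"},
    {"url": "http://smallblog.example.org/post", "circulation": 4_000, "circulation_host": "smallblog.example.org"},
]
for item in top_by_circulation(items):
    if host_mismatch(item):
        print("check:", item["url"])  # prints check: http://news.example.com/story
```

Comparing hostnames catches subdomain problems, but not a blog that lives under a path on the same host (a section of a big portal, for example). Those still need a human look.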
3. Spam
Fix: Use the “Find” function in Excel to search for “Vioxx,” “Viagra,” and other common spam terms.
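Outside Excel, the same search is a case-insensitive substring match. This is a minimal sketch; the `text` field name is a hypothetical stand-in for whatever column holds the item's content.

```python
# Starter list from the text; add the spam terms that turn up in your own data.
SPAM_TERMS = ["viagra", "vioxx"]

def spam_hits(items, terms=SPAM_TERMS):
    """Return items whose text contains any spam term, ignoring case."""
    lowered = [term.lower() for term in terms]
    return [item for item in items if any(term in item["text"].lower() for term in lowered)]

items = [
    {"url": "http://example.com/1", "text": "Cheap VIAGRA, no prescription"},
    {"url": "http://example.com/2", "text": "Our product review for June"},
]
print([item["url"] for item in spam_hits(items)])  # prints ['http://example.com/1']
```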
4. Duplicates
Fix: In Excel, go to the Data tab and click “Remove Duplicates.” You can select which columns to compare; we typically use the item URLs.
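Here is a scripted equivalent that keeps the first copy of each URL. It is a minimal sketch that assumes a hypothetical `url` field and compares URLs as exact strings.

```python
def remove_duplicates(items, key="url"):
    """Keep the first item for each value of `key` and drop later repeats."""
    seen = set()
    unique = []
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            unique.append(item)
    return unique

items = [
    {"url": "http://example.com/a", "title": "Story A"},
    {"url": "http://example.com/b", "title": "Story B"},
    {"url": "http://example.com/a", "title": "Story A (syndicated copy)"},
]
print(len(remove_duplicates(items)))  # prints 2
```

Because the match is exact, variants such as http vs. https or a trailing slash are treated as different items. Normalize URLs first if your data mixes them.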
5. Press releases that are picked up only because the company’s boilerplate says, “has worked with clients such as…”
Fix: Search for “PR Newswire,” “PR Web,” or just “wire,” and you’ll be able to quickly identify the press releases.
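A plain search for "wire" also turns up words like "wireless," so a regular expression with word boundaries keeps the list shorter. This is a sketch that assumes hypothetical `source` and `text` fields; the results are a review list, not an automatic delete list.

```python
import re

# "newswire" covers PR Newswire; "PR ?Web" covers "PR Web" and "PRWeb";
# \bwire\b matches the standalone word "wire" but not "wireless" or "wired".
WIRE_PATTERN = re.compile(r"newswire|PR ?Web|\bwire\b", re.IGNORECASE)

def likely_press_releases(items):
    """Return items whose source or text mentions a wire service."""
    return [item for item in items if WIRE_PATTERN.search(item["source"] + " " + item["text"])]

items = [
    {"source": "PR Newswire", "text": "Acme announces new partner"},
    {"source": "Tech Daily", "text": "New wireless plans reviewed"},
    {"source": "Business Wire", "text": "Acme names new CEO"},
]
print([item["source"] for item in likely_press_releases(items)])
# prints ['PR Newswire', 'Business Wire']
```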
6. Underwriting credits
Explanation and fix: Many broadcast clipping services now cover radio, including transcripts from National Public Radio and individual shows like “Marketplace” and “Living on Earth.” Unfortunately, they haven’t figured out that those nice, message-rich statements at the end of the broadcast are, in fact, paid underwriting credits. Remove them: they should not count as “earned” media.
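If your transcripts arrive as text, you can flag likely credits before they reach the counts. This is a sketch that rests on an assumption: that credits open with a phrase like "Support for <program> comes from," which you should check against your own transcripts. The 80-character window is also an illustrative choice.

```python
import re

# Assumption: underwriting credits usually open with "Support for <program> comes from".
# The 80-character window is an illustrative limit; tune it against your transcripts.
UNDERWRITING = re.compile(r"\bsupport for\b.{0,80}?\bcomes from\b", re.IGNORECASE | re.DOTALL)

def is_underwriting_credit(snippet):
    """True if a transcript snippet reads like a paid underwriting credit."""
    return UNDERWRITING.search(snippet) is not None

print(is_underwriting_credit("Support for Marketplace comes from Acme Corp, makers of widgets."))  # prints True
print(is_underwriting_credit("Acme Corp reported higher earnings today."))  # prints False
```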
7. Mentions from the UK, Germany, Australia, Canada, France and India in supposedly “U.S. Only” clips
Fix: Look for country-code URL domains such as .uk, .de, .au, .ca, .fr, and .in, and remove those items.
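A script can read the last part of each URL's hostname and drop the foreign ones. This is a minimal sketch with a hypothetical `url` field. It won't catch non-U.S. outlets that publish on .com, so spot-check what remains.

```python
from urllib.parse import urlparse

# Country codes for the UK, Germany, Australia, Canada, France, and India; extend as needed.
NON_US_TLDS = {"uk", "de", "au", "ca", "fr", "in"}

def top_level_domain(url):
    """Last label of the URL's hostname, e.g. 'uk' for www.bbc.co.uk."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]

def drop_non_us(items):
    """Remove items whose URL ends in one of the listed country-code domains."""
    return [item for item in items if top_level_domain(item["url"]) not in NON_US_TLDS]

items = [
    {"url": "http://www.bbc.co.uk/news/story"},
    {"url": "http://www.lemonde.fr/article"},
    {"url": "http://www.example.com/post"},
]
print([item["url"] for item in drop_non_us(items)])  # prints ['http://www.example.com/post']
```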
8. Paid bloggers
Fix: Sort all blogs by frequency and tone. If one blogger is consistently positive and posts regularly, check that blogger out. I’m not saying they are all paid, but you should double-check. Any blog whose home page says “I was paid or compensated to write this” is not earned media.
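The frequency-and-tone sort can be turned into a shortlist. This minimal sketch assumes each post has hypothetical `blogger` and `tone` fields, with tone already coded as "positive," "neutral," or "negative." Both thresholds are illustrative, and a flag means "check this blogger," not "this blogger is paid."

```python
from collections import defaultdict

def bloggers_to_check(posts, min_posts=5, min_positive_share=0.9):
    """Return bloggers who post often and are almost always positive, most prolific first.

    The thresholds are illustrative defaults, not recommendations.
    """
    post_counts = defaultdict(int)
    positive_counts = defaultdict(int)
    for post in posts:
        post_counts[post["blogger"]] += 1
        if post["tone"] == "positive":
            positive_counts[post["blogger"]] += 1
    flagged = [
        blogger for blogger, count in post_counts.items()
        if count >= min_posts and positive_counts[blogger] / count >= min_positive_share
    ]
    return sorted(flagged, key=lambda blogger: post_counts[blogger], reverse=True)

posts = (
    [{"blogger": "gadgetfan", "tone": "positive"}] * 6
    + [{"blogger": "critic", "tone": "negative"}] * 6
    + [{"blogger": "casual", "tone": "positive"}] * 2
)
print(bloggers_to_check(posts))  # prints ['gadgetfan']
```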