Identifying and Removing Google Analytics Spam (Post 1 of 2)
Google Analytics spam has been a constant thorn in the side of agencies and organisations that rely on accurate web analytics reporting to drive their business decisions. How badly spammy data distorts Google Analytics reports varies from company to company. For small local businesses with modest traffic, for example, spam hits can make up a large share of recorded sessions, completely distorting their reports and rendering their data almost useless.
Up until recently, this spam was mainly identifiable in Google Analytics as:
- Fake landing pages
- Fake sources/referrers
- Fake keywords
Vote For Trump?
Over the past couple of months, however, we’ve noticed a new culprit making its way into clients’ Google Analytics accounts: language spam, which shows up as fake values in the Language report. The main offender, which many of you may have heard about, can be seen below. This spam uses legitimate site domains as hostnames and even includes a not-so-secret “Vote for Trump” message.
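One way to spot language spam is to compare the values in the Language report against the format real browsers send. Browsers normally report a short language tag such as en-us or en-gb, so a value containing spaces, punctuation or a whole sentence is a strong hint that it was injected. The sketch below assumes you have exported the Language report as a list of (language, sessions) pairs; the function name, the sample rows and the regular expression are our own illustrative assumptions, and the pattern is a simplified approximation of a language tag rather than a full BCP 47 validator.

```python
import re

# Simplified approximation of a browser language tag such as "en",
# "en-us" or "zh-hant-tw". An assumption, not a full BCP 47 validator.
LANGUAGE_TAG = re.compile(r"[a-z]{2,3}(?:-[a-z0-9]{2,8})*")


def find_language_spam(rows):
    """Return rows whose Language value does not look like a real tag.

    `rows` is a hypothetical export of the Language report:
    a list of (language, sessions) tuples.
    """
    suspicious = []
    for language, sessions in rows:
        # Normalise case and whitespace before matching.
        value = language.strip().lower()
        if value == "(not set)":
            continue  # missing data, not spam by itself
        if not LANGUAGE_TAG.fullmatch(value):
            suspicious.append((language, sessions))
    return suspicious


# Made-up sample rows for illustration only.
report = [
    ("en-us", 1200),
    ("en-gb", 340),
    ("Vote for Trump! (made-up spam value)", 57),
    ("(not set)", 3),
]

for language, sessions in find_language_spam(report):
    print(f"{sessions:>5}  {language}")
```

Anything this flags is a candidate for review, not proof of spam; an unusual but genuine browser setting could also fail such a simple pattern.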
The story behind this spam and its creator, the Russian spammer Vitaly Popov, is quite interesting, but it isn’t what this series of posts will focus on (you can read more about him here). Instead, we’re going to focus on the key types of spam that we see in Google Analytics (covered in this post) and how to remove them from your Google Analytics reports (covered in our next blog post).
Types of Google Analytics Spam
There are two broad categories of spam that can find their way into your Google Analytics reports:
- Ghost spam
- Crawler spam
The majority of spam that we have encountered in our clients’ Google Analytics accounts has been “ghost spam”. Ghost spam never actually accesses your site at all. Instead, the spammer sends fake hits straight to Google Analytics using the Measurement Protocol. For this reason, ghost spam will typically report a hostname that isn’t yours, or a hostname of (not set) (if a visitor had actually landed on your site, the hostname would usually be your site’s domain, e.g. www.arekibo.com). This is important to bear in mind when it comes to creating a solution for ghost spam in Google Analytics (more on that later!).
Example: The “Vote For Trump” spam above is a form of ghost spam. This spam has been finding its way into a huge number of Google Analytics properties via the measurement protocol using legitimate websites as hostnames.
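Because ghost spam never loads your pages, the hostname it reports is the easiest way to tell it apart from real traffic. The sketch below assumes a hypothetical export of the Hostname report as (hostname, sessions) pairs and a set of hostnames you control; the example.com domains and the function name are placeholders of our own. Rows outside that set, including (not set), are flagged for review rather than discarded, and the check cannot catch ghost spam that happens to report one of your own hostnames.

```python
# Hostnames your tracked pages are legitimately served from.
# Placeholder values: replace them with your own domains.
VALID_HOSTNAMES = {"www.example.com", "example.com", "shop.example.com"}


def flag_ghost_candidates(rows, valid_hostnames=VALID_HOSTNAMES):
    """Split hypothetical (hostname, sessions) rows into kept and flagged.

    A hostname that is not one of yours, or is "(not set)", suggests the
    hit did not come from a visitor loading your pages, so it is flagged
    as likely ghost spam. Services such as Google Translate can also show
    up under other hostnames, so review flagged rows before excluding them.
    """
    kept, flagged = [], []
    for hostname, sessions in rows:
        host = hostname.strip().lower()
        if host in valid_hostnames:
            kept.append((hostname, sessions))
        else:
            flagged.append((hostname, sessions))
    return kept, flagged


# Made-up sample rows for illustration only.
report = [
    ("www.example.com", 950),
    ("(not set)", 41),
    ("spam-host.example.net", 12),
]

kept, flagged = flag_ghost_candidates(report)
print("kept:", kept)
print("flagged:", flagged)
```

Keeping the list of valid hostnames explicit (an allow-list) rather than chasing individual spam domains means new ghost spam is flagged automatically, as long as it reports a hostname you don’t own.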
Crawler spam, on the other hand, does in fact access your site. It usually takes the form of spam bots that crawl your site while ignoring your robots.txt and any other rules meant to block them. This type of spam is more difficult to identify because a) it uses a valid hostname; and b) unlike most ghost spam, it will usually record Page View and Time on Site figures greater than zero.
That said, the main offenders are well known, and new offenders of this type are appearing less and less often.
Example: The timer4web.com referral in the screenshot below is an example of crawler spam. This URL in particular is used by an affiliate of a malicious advertising platform. It appears in Google Analytics reports in the hope that whoever sees it will be curious enough to investigate and ultimately visit the site (don’t).
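Since crawler spam arrives with a valid hostname and non-zero engagement, a hostname check won’t catch it; matching the referral source against a list of known offenders is a more practical approach. The sketch below assumes a hypothetical export of the Referrals report as (source, sessions) pairs and a small, hand-maintained blocklist. timer4web.com is the only entry taken from this post; the function names and sample rows are our own illustrations, and a real blocklist would need regular updating.

```python
# Hand-maintained list of known crawler-spam referrers. Only timer4web.com
# comes from this post; extend the set as new offenders appear.
KNOWN_SPAM_REFERRERS = {"timer4web.com"}


def is_known_spam(source, blocklist=KNOWN_SPAM_REFERRERS):
    """True if `source` is a blocklisted domain or one of its subdomains."""
    host = source.strip().lower()
    # Match on a dot boundary so "nottimer4web.com" is not flagged.
    return any(host == bad or host.endswith("." + bad) for bad in blocklist)


def flag_crawler_spam(rows, blocklist=KNOWN_SPAM_REFERRERS):
    """Return the hypothetical (source, sessions) rows from known spam referrers."""
    return [(source, sessions) for source, sessions in rows
            if is_known_spam(source, blocklist)]


# Made-up sample rows for illustration only.
referrals = [
    ("google.com", 500),
    ("timer4web.com", 23),
    ("www.timer4web.com", 4),
    ("nottimer4web.com", 2),
]

print(flag_crawler_spam(referrals))
```

Matching on the dot boundary rather than a plain substring avoids accidentally flagging a legitimate domain that merely contains a spam domain’s name.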
Hopefully this introduction has given you a good overview of the main types of spam that you’re likely to find in your Google Analytics reports. In our next blog post, 6 Steps to Dealing With Google Analytics Spam, we’ll explain how you can stop all the spam types we’ve identified from messing up your Google Analytics data.
If you’ve noticed any other kind of spam that we haven’t covered in this post, let us know in the comments section. Feel free to get in touch with any other Google Analytics-related queries at firstname.lastname@example.org.