Referral spam has been slowly but steadily creeping into Google Analytics reports, and it has reached a point where it can no longer be ignored. Referral spam involves sending fake referral traffic to a website using spambots. It sounds harmless, but it can become a serious problem as the number of fake referrals grows.
Bots are crawler programs built to perform repetitive tasks quickly and accurately. They are mostly used for indexing website content, but they can also be used maliciously for:
– Harvesting email addresses
– Scraping website content
– Spreading malware
– Click fraud
– Artificially inflating website traffic
Bots can be either good or bad; it depends on who is using them and what their intentions are.
When these bots are used for spamming, they are called spambots. Spambots crawl thousands of websites daily and send them HTTP requests carrying fake referrer headers. The fake headers make the requests look like ordinary referral visits, which helps the bots avoid being detected as bots.
The fake referrer headers contain the URLs of websites the spammer wants to promote by building backlinks.
This is how it works: the spambot sends an HTTP request with a fake referrer to your server, which receives the request and records it in its log. If your server logs can be crawled and indexed by Google, Google will treat the referrer value in the log as a backlink, which in turn affects the search engine ranking of the website the spammer is promoting.
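To make the mechanism concrete, here is a minimal sketch of what the server actually records. It assumes your server writes the Apache/Nginx "combined" log format, where the referrer is the second-to-last quoted field; the log line and the domain spam-site.example are hypothetical placeholders.

```python
import re
from typing import Optional

# Combined log format:
# host ident user [time] "request" status size "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def extract_referrer(line: str) -> Optional[str]:
    """Return the referrer recorded in one log line, or None if absent/unparseable."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    referrer = match.group("referrer")
    # A lone "-" means the client sent no Referer header at all.
    return None if referrer == "-" else referrer

# Hypothetical log line: the spambot claims to come from spam-site.example.
sample = ('203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 '
          '"http://spam-site.example/" "Mozilla/5.0"')
print(extract_referrer(sample))  # http://spam-site.example/
```

Whatever the bot puts in the header ends up verbatim in the log; the server has no way of checking it.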
Why do I need to deny spambots access to my site?
There are two main reasons to block these bots from your website. First: corruption of Google Analytics data. A few hundred spam hits per month are insignificant for a big site with millions of sessions per month. But on a small local site with about 20 sessions per day, spam referrals can make up about 70% of the traffic. This drowns out the legitimate traffic and makes market analysis very difficult.
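The arithmetic behind this is simple: the spam share is spam sessions divided by all recorded sessions. The sketch below uses hypothetical figures (300 spam sessions a month on both sites is an assumption, not a measurement); a heavier spam volume pushes the small site's share higher still.

```python
def spam_share(spam_sessions: int, legitimate_sessions: int) -> float:
    """Percentage of all recorded sessions that come from spam referrers."""
    total = spam_sessions + legitimate_sessions
    return 100.0 * spam_sessions / total if total else 0.0

# Hypothetical figures: the same 300 spam sessions per month on two sites.
big_site = spam_share(300, 1_000_000)  # millions of sessions per month
small_site = spam_share(300, 20 * 30)  # roughly 20 legitimate sessions per day
print(f"big site: {big_site:.2f}%  small site: {small_site:.1f}%")
# big site: 0.03%  small site: 33.3%
```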
Second: security and server load. Spambots crawl your site and use server resources for things you never asked them to do. They overload the server, which leads to slower load times, higher bounce rates and, eventually, lower rankings. On top of that, you have no idea what they are doing on your site while they are there. For all you know, they could be probing for plugin and server vulnerabilities.
How to detect and fix spam referrers
Detecting spam on your site is the easy part. Just follow these steps:
Step 1: Open the Referrals report in Google Analytics and sort it by bounce rate in descending order.
Step 2: Look for referrers with a 0% or 100% bounce rate and 10 or more sessions. These are most likely spam referrers.
Step 3: If any of these suspicious websites appears in the list below, you can be sure it is a spam referrer without further investigation.
These are the most common spam referrers:
Step 4: If the list above does not confirm the referrer's identity, visit the website to check whether it is legitimate. Make sure you have antivirus software installed before doing this.
Step 5: Once you have confirmed which referrers are bad bots, you can take measures to block them from your site.
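If you export the Referrals report, steps 1 to 3 can be approximated in a few lines. This is a minimal sketch: the `ReferralRow` structure and the `KNOWN_SPAM` set are hypothetical stand-ins for your export and for the list of spam referrers you trust, and anything flagged as "review" still needs the manual check from step 4.

```python
from dataclasses import dataclass

@dataclass
class ReferralRow:
    source: str         # referring hostname as shown in the Referrals report
    sessions: int
    bounce_rate: float  # percentage, 0.0 to 100.0

# Hypothetical placeholder blocklist; replace with confirmed spam referrers.
KNOWN_SPAM = {"spam-site.example", "another-spam.example"}

def suspicious_referrers(rows, min_sessions=10, known_spam=KNOWN_SPAM):
    """Flag referrers with a 0% or 100% bounce rate, then check them against a blocklist."""
    flagged = []
    # Step 1: sort by bounce rate, highest first.
    for row in sorted(rows, key=lambda r: r.bounce_rate, reverse=True):
        # Step 2: an extreme bounce rate on a meaningful number of sessions.
        if row.sessions >= min_sessions and row.bounce_rate in (0.0, 100.0):
            # Step 3: a blocklist hit confirms it; otherwise review manually (step 4).
            status = "confirmed" if row.source in known_spam else "review"
            flagged.append((row.source, status))
    return flagged

rows = [
    ReferralRow("spam-site.example", 42, 100.0),
    ReferralRow("news.example.org", 120, 55.2),
    ReferralRow("odd-referrer.example", 15, 0.0),
    ReferralRow("tiny.example", 3, 100.0),
]
print(suspicious_referrers(rows))
# [('spam-site.example', 'confirmed'), ('odd-referrer.example', 'review')]
```

The 10-session threshold keeps tiny referrers (like `tiny.example` above) from being flagged on noise alone.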
Blocking spammy referrers
Apply view-level filters to exclude referrer spam.
Here are some regexes that work. Remember that each filter pattern is limited to 255 characters, so you will need to create multiple filters.
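When your list of spam domains grows, packing it into patterns by hand gets tedious. The sketch below is one way to do it, assuming the 255-character limit applies to each pattern and that each pattern is a plain `a|b|c` alternation; the domains are hypothetical placeholders.

```python
MAX_PATTERN_LENGTH = 255  # per-pattern limit mentioned above

def build_filter_patterns(domains, limit=MAX_PATTERN_LENGTH):
    """Pack domains into "a|b|c" alternations, each no longer than `limit` characters."""
    patterns, current = [], ""
    for domain in domains:
        # Domains only contain letters, digits, hyphens and dots, so escaping
        # the dots is enough to make each one match literally.
        escaped = domain.replace(".", r"\.")
        if len(escaped) > limit:
            raise ValueError(f"{domain!r} alone exceeds {limit} characters")
        candidate = f"{current}|{escaped}" if current else escaped
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current pattern is full: start the next filter.
            patterns.append(current)
            current = escaped
    if current:
        patterns.append(current)
    return patterns

# Hypothetical placeholder domains; use your own confirmed spam referrers.
spam_domains = [f"spam-site-{i:02d}.example" for i in range(30)]
for number, pattern in enumerate(build_filter_patterns(spam_domains), start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each printed pattern goes into its own exclude filter.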
Solution 1: REGEX for the worst offenders
Add this as both a filter and a segment. This regex covers most of the referral spam sites identified so far.
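Before saving a filter, it can help to check what a pattern will actually exclude. This sketch mimics an exclude filter on the Campaign Source field, assuming (as in Universal Analytics) that a pattern matches anywhere in the field unless anchored with `^` or `$`. The pattern here is a hypothetical placeholder, not the regex above.

```python
import re

# Hypothetical exclude pattern; substitute the regex from your own filter.
# Dots are escaped: an unescaped "." would match any character.
EXCLUDE_PATTERN = re.compile(r"spam-site\.example|another-spam\.example")

def passes_exclude_filter(source: str) -> bool:
    """Keep a hit only if the pattern matches nowhere in its source."""
    return EXCLUDE_PATTERN.search(source) is None

sources = ["google", "www.spam-site.example", "another-spam.example", "news.example.org"]
kept = [s for s in sources if passes_exclude_filter(s)]
print(kept)  # ['google', 'news.example.org']
```

Because matching is unanchored, `www.spam-site.example` is excluded too, which is usually what you want for spam subdomains.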
Solution 2: REGEX for the wannabes & stragglers
To get around the 255-character limit, add a second filter.
Here are some more offenders.
Solution 3: Filter all referral spam sources (http://viget.com/advance/removing-referral-spam-from-google-analytics)
Blocking referral spam may sometimes require a more extensive filter that encompasses all offenders.
While the below list covers many known offenders, it is by no means exhaustive.
Featured Regular Expressions:
Other measures you could take to avoid spambots include:
– Monitoring your server logs at least once per week
– Using a firewall
– Asking your server administrator for help
– Investing in penetration testing or a bot protection service
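For the weekly log check, a short script can tally referring hostnames so unfamiliar ones stand out. This is a minimal sketch that assumes the combined log format; the log lines are hypothetical, and in practice you would read them from your own access log.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches: "request" status size "referrer" in a combined-format log line.
REFERRER_FIELD = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')

def referrer_hosts(log_lines):
    """Count referring hostnames across access-log lines."""
    counts = Counter()
    for line in log_lines:
        match = REFERRER_FIELD.search(line)
        # Skip unparseable lines and requests that sent no referrer ("-").
        if match is None or match.group(1) == "-":
            continue
        host = urlparse(match.group(1)).hostname
        if host:
            counts[host] += 1
    return counts

# Hypothetical log lines for illustration.
lines = [
    '198.51.100.4 - - [12/Oct/2015:08:01:02 +0000] "GET / HTTP/1.1" 200 5120 "http://spam-site.example/" "Mozilla/5.0"',
    '198.51.100.4 - - [12/Oct/2015:08:01:09 +0000] "GET /about HTTP/1.1" 200 2048 "http://spam-site.example/" "Mozilla/5.0"',
    '192.0.2.10 - - [12/Oct/2015:09:15:40 +0000] "GET / HTTP/1.1" 200 5120 "https://www.google.com/" "Mozilla/5.0"',
    '192.0.2.11 - - [12/Oct/2015:09:20:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "curl/7.43.0"',
]
print(referrer_hosts(lines).most_common())
# [('spam-site.example', 2), ('www.google.com', 1)]
```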
Hope this helps.