The world beyond Google Analytics - Piwik, a free & self-hosted alternative for tracking visitors

If you run an e-commerce website, there is a high chance you use visitor tracking tools, or at least consider using some. The most obvious and attractive solution for website owners is Google Analytics (GA), but there are alternatives on the market, such as Parse.ly, KISSMetrics, Clicky, Woopra or Piwik.

In this article we would like to focus on Piwik.

Why would anyone consider stepping away from GA? Why did we?

First of all, we needed raw visitor data to build a custom BI model on, and we didn’t feel like spending $150,000/year on Google Analytics Premium. We needed a tool able to handle heavy load (up to 1000 actions per minute) and tons of data. What is more, we didn’t want to give up a convenient user interface. So, in a nutshell, we were looking for GA Premium, but free of charge. Piwik meets these criteria, which is why it is sometimes called a “serious contender”.

Piwik is a self-hosted, open source solution used by over 900,000 websites. It is particularly popular in Europe - e.g. it has a 16% market share in Germany (based on top-level domains). It is supported by a thriving community and a great team. Since it’s mainly self-hosted software, it takes some time to set up, but the investment pays off.

(disclaimer - we are not affiliated with any of the providers listed here in any way)

It’s you who owns the data

Data privacy is a major concern for many users. Your Internet activity can disclose a lot of information about your life or work, not to mention the fact that the majority of websites are tracked by several companies. In recent years, awareness of the topic has been growing steadily.

With Piwik you can be 100% sure that neither logs nor reports will be sent to other servers - in the end, all your data is stored safely in your own MySQL database and will never be shared with third parties.

Raw data science

Another advantage is the flexibility you get from having access to the raw data.

For big players who invest large amounts of money in marketing, spanning multiple marketing channels, technologies, campaigns, and countries, it’s crucial to be able to precisely measure the impact of all their marketing actions.

These companies usually employ complex custom reports and business intelligence solutions that aggregate data across multiple sources, or even implement machine-learning-aided solutions to help track defined KPIs.

Without access to raw data, it’s hardly possible to put any of these advanced solutions in place. With Piwik you get access to raw data out of the box (SQL database or CSV exports), whereas with GA you are required to go Premium and pay a hefty fee every year.
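As a rough illustration of what raw data makes possible, the sketch below counts visits per minute from a CSV export. The column name `visit_first_action_time`, the timestamp format and the sample rows are assumptions made for this example, so check them against your own export before reusing it.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample of a raw visit export; a real export has many more columns.
SAMPLE_CSV = """idvisit,visit_first_action_time
1,2015-10-06 20:14:05
2,2015-10-06 20:14:48
3,2015-10-06 20:15:10
"""


def visits_per_minute(csv_file, time_column="visit_first_action_time"):
    """Count visits per minute, keyed by the minute in which each visit started."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        started = datetime.strptime(row[time_column], "%Y-%m-%d %H:%M:%S")
        # Drop the seconds so that all visits in the same minute share one key.
        counts[started.replace(second=0)] += 1
    return counts


if __name__ == "__main__":
    per_minute = visits_per_minute(io.StringIO(SAMPLE_CSV))
    for minute, n in sorted(per_minute.items()):
        print(minute, n)
```

A per-minute series like this is the kind of input a custom BI model can be built on; the same function works on a real file opened with `open(path, newline="")`.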

Dealing with ad blockers

In the world of annoying ads, ad blocker software has become very popular. Some reports claim that “ad blocking grew by 41% globally in the last 12 months”. These ad blocking tools significantly limit the marketing tracking capabilities of third-party advertisers, so your customer analytics data is likely to be incomplete or biased.

Obviously, there are ways for developers to work around these limitations with GA, but it’s usually a losing battle against aggressive and resilient blockers.

An open-source, self-hosted solution such as Piwik lets you customize the tracking pixel scripts or rewrite the tracker URL to bypass ad blockers more effectively and, as a consequence, keep your analytics data much more complete.

The “Cookie law”

In March 2011, the Independent Center for Privacy Protection in Germany (ULD) recommended Piwik as privacy-compliant web analytics software. (source)
In January 2014, the Center for Data Privacy Protection in France (CNIL) recommended Piwik as the only tool that can easily ensure full compliance with privacy regulations. In July 2014, CNIL recommended Piwik as the only analytics tool that does not require cookie consent. (source)
Piwik’s privacy compliance is also reflected by the many government agencies (in Europe, Asia, North America and Africa) that already trust and rely on Piwik for self-hosted web analytics. (source)

In short, Piwik with disabled cookies, anonymised visitor IP addresses and an opt-out button on your website allows you to comply with European law without the annoying cookie notice, while still collecting the customer analytics that you need.
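To make the idea of IP anonymisation concrete, here is a minimal sketch of masking the trailing bytes of an address before it is stored. It illustrates the general technique only and is not Piwik’s own implementation; the function name `anonymise_ip` and the default of two masked bytes are assumptions chosen for the example.

```python
import ipaddress


def anonymise_ip(ip, masked_bytes=2):
    """Return the address with its last `masked_bytes` bytes set to zero."""
    addr = ipaddress.ip_address(ip)
    total_bits = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    keep_bits = max(total_bits - 8 * masked_bytes, 0)
    # strict=False lets us build a network from a host address;
    # its network address is the host address with the low bits zeroed.
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(anonymise_ip("192.168.12.34"))                  # two bytes masked
    print(anonymise_ip("192.168.12.34", masked_bytes=1))  # one byte masked
```

The more bytes are masked, the harder it is to tie a visit to a person, at the cost of coarser geolocation.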

Safe Harbor agreement

The Safe Harbor agreement is a set of rules prepared by the U.S. Department of Commerce. It was reached in 2000 and provided a convenient way for US companies to receive data from Europe without violating European law.

On October 6, 2015, however, the Safe Harbor agreement was declared invalid. What does it mean? Basically, as a European company, you have to be careful when sharing users' personal or sensitive data with third parties, because it’s your responsibility to guarantee an adequate level of data protection.

In effect, over 5,000 US companies, including Google, will have to register and inform EU Data Protection Authorities about their privacy practices and issue a new data-processing clause or addendum (e.g. Google, Salesforce).

With Piwik as a self-hosted solution, you are free to choose how and where you store the data. You don’t need to worry whether your US-based service providers handle your sensitive data with enough protection.

What's more?

Most Piwik features are on par with Google Analytics - real-time reporting, segmentation, customizable dashboards, behaviour and conversion tracking, mobile & e-commerce analytics and more.

Besides that, there are some areas in which Piwik seems to beat GA:

  • tracking file downloads (GA requires sending a custom event to do that),
  • tracking outbound links,
  • tracking cart abandonment,
  • a clear roadmap with long term goals to achieve,
  • tracking a particular visitor along with behaviour and shopping cart history, as opposed to gathering only general traffic statistics,
  • allowing third-party plugins for extra features.

There are some downsides as well:

  • you won’t get insights into Google AdWords or AdSense,
  • you won’t get the additional context information that Google provides, such as age, gender and interests.

Our case

We used Piwik raw data to measure website traffic and track visitor behaviour during a TV advertising campaign. In short, the question was: how do you measure the whole impact of a TV campaign?

The tricky part is that TV advertising is not a directly traceable marketing channel - i.e. it’s difficult to figure out how many paying customers you gained after airing a particular spot on TV. The most obvious idea is to observe online traffic and assume that the people who visit your website within minutes of the spot’s airtime come from your campaign. On top of that, there are more accurate techniques that take into account the interference from other marketing channels and campaigns, or seasonal trends.
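The “most obvious idea” can be sketched in a few lines: count the visits in a short window after the airtime and compare them with an equally long window just before it. This is only the naive baseline comparison, not the more accurate techniques mentioned above; the function names, the five-minute window and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def count_in_window(visit_times, start, length):
    """Number of visits t with start <= t < start + length."""
    end = start + length
    return sum(1 for t in visit_times if start <= t < end)


def naive_tv_uplift(visit_times, airtime, window=timedelta(minutes=5)):
    """Visits in the window after the spot minus visits in the window before it."""
    after = count_in_window(visit_times, airtime, window)
    baseline = count_in_window(visit_times, airtime - window, window)
    return after - baseline


if __name__ == "__main__":
    airtime = datetime(2015, 10, 6, 20, 15)
    # Made-up visit timestamps, given as offsets in seconds from the airtime.
    offsets = (-200, -50, 10, 40, 90, 150, 280)
    visits = [airtime + timedelta(seconds=s) for s in offsets]
    print(naive_tv_uplift(visits, airtime))
```

A positive result suggests extra traffic after the spot, but it says nothing about other campaigns running at the same time or about seasonal trends, which is why a baseline like this is only a starting point.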

Together with our business partners, we used an indirect methodology related to Temporal Canonical Correlation Analysis. In a nutshell, by decomposing a KPI time series into a component related to a particular marketing channel and a component related to side effects, we were able to explain how the TV campaign affects a given KPI.

In brief, we needed the raw visitor data to feed into our BI machinery and squeeze out as much information as possible, so as to be better prepared for the upcoming ad campaigns. With a limited budget and only GA, we wouldn’t have been able to do that.

As a result, we helped increase the long-term ROI of TV commercial spending. Moreover, based on the gathered data, we were also able to improve the budget allocation for future commercials.

Conclusion

Piwik can be a serious competitor for Google Analytics. They both offer amazing features. While GA is super simple to set up, with Piwik you have to invest a few hours to set up the infrastructure and then probably a few more to fine-tune the software to your needs and traffic.

If you are a regular user with no specific requirements, you are good to go with GA. But if the benefit of complete control over the data far surpasses the disadvantages of self-hosting, you should definitely consider Piwik a good option.

Stay tuned - our next post in the Piwik series will shed more light on the technical side - setting up infrastructure on OpenShift and Amazon EC2, load-testing and performance tuning, operations and scalability.