Piwik for High-traffic Websites
Our case in numbers
Piwik is certainly a viable alternative to GA in a case like ours – tracking the overall impact of a TV campaign based on website traffic. TV is a marketing medium that is not directly traceable. To make it even harder, there are a few more variables in the environment – other marketing channels, interfering campaigns, periodic trends, etc. In our specific case we needed the raw visitor data in order to feed it into BI algorithms and make data-driven decisions when planning upcoming campaigns.
Together with our business partners, we set up Piwik for a popular German e-commerce website. On top of the usual page-visit load we had to take into account the increased traffic coming from the TV campaign. By rough estimates, we had to be prepared to handle at least 150k visits per day, or 4.5M visits per month (the campaign was scheduled for the whole month), with each visit generating six actions on average – about 10 tracking requests per second if the traffic were flat. Once you factor in the daily traffic cycle and possible peaks, it turns out we had to be prepared to handle roughly 150 requests per second.
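The back-of-envelope capacity math can be sketched as follows; the 15x peak factor is our assumption for translating flat average load into peak load:

```python
# Back-of-envelope capacity estimate for the tracking endpoint.
visits_per_day = 150_000      # rough estimate for the campaign
actions_per_visit = 6         # average tracking requests per visit
seconds_per_day = 24 * 60 * 60

requests_per_day = visits_per_day * actions_per_visit
avg_rps = requests_per_day / seconds_per_day  # flat average over the day

# Traffic is far from flat: assume peaks of roughly 15x the daily average
# (a hypothetical factor covering the daily cycle plus post-spot spikes).
PEAK_FACTOR = 15
peak_rps = avg_rps * PEAK_FACTOR

print(f"{avg_rps:.1f} req/s on average, ~{peak_rps:.0f} req/s at peak")
# → 10.4 req/s on average, ~156 req/s at peak
```

The exact peak multiplier is guesswork, but it shows why the average request rate alone is a misleading sizing target for TV-driven traffic.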
We have decided to give OpenShift a chance. In a startup environment, it’s crucial to deliver fast – with Piwik quickstart available on GitHub we launched an MVP with the basic cartridges in a few minutes, including a HAProxy gear for load-balancing the incoming requests. We have also installed DataDog and Logentries plugins to collect metrics and logs from the application.
When building any kind of system, it's essential for us to see it fail – to understand exactly what happens when things go wrong and how the components behave under the hood, so we know how to fix things and how to recover in an emergency. You want to know the application inside out and be able to react quickly before you launch, say, an (expensive) TV marketing campaign.
That's why we started with a very basic setup: two web servers running Piwik, talking to a dedicated MySQL database server, plus an external HAProxy spreading the traffic across the web servers. Each gear was allocated 4 vCPUs and 2GB of memory (Large gears).
With this very simple architecture and a pretty much default Piwik installation, we couldn't get far in terms of performance.
Before we moved any further, we had to solve one problem: OpenShift's gear auto-scaling feature.
In our case, booting new gears took so long that the app would be completely flooded before a new gear eventually became available. That's because the app is built on the application gear, and when you scale up, each new gear builds a fresh copy of the app. Since a traffic peak on the website is quite likely soon after a TV spot airs, we couldn't rely on the auto-scale feature. So we quickly moved to manual scaling (which required additional work, e.g. scripts for synchronizing the Piwik config.ini.php file across all gears).
We used loader.io to stress test our Piwik instance with thousands of concurrent connections.
The chart below shows the initial load-test with ~150 requests per second (Y axis) and only ~50 responses per second; the rest ended up timing out.
As we were limited by the memory allocated to a single OpenShift gear (a maximum of 2GB, for the app servers and the db server alike), we gradually scaled the application horizontally by throwing in more gears, beginning with 2 and ending up with 10.
In fact, this helped. But, as you might expect, doubling the number of app servers doesn't necessarily double the overall performance. With 6 app servers already up and running, throwing in another one didn't make much difference (if any). We soon realized that the single MySQL gear with only 2GB of memory was the bottleneck.
Piwik optimisation for speed
We didn't want to leave OpenShift just because of the memory limitations – we had hardly optimised the software so far! We followed the tuning advice from the Piwik blog step by step to optimize our installation.
The first visible improvement came after we stepped away from real-time reports. We were tracking a large website, and with real-time reports enabled, accessing the Piwik dashboard would cause timeouts and freeze the application – or, in the worst case, bring the whole thing down. Triggering report calculation would instantly eat up tons of resources. Thankfully, you can switch off live reports and pre-calculate them periodically with a cron job.
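In Piwik 2.x this is a two-step change: disable browser-triggered archiving, then run the archiver from cron. A sketch for config/config.ini.php (the setting name comes from Piwik's global.ini.php):

```ini
[General]
; Stop dashboard visits from triggering expensive report archiving;
; reports will only be built by the scheduled console archiver.
enable_browser_archiving_triggering = 0
```

Reports are then pre-computed on a schedule, e.g. hourly via a crontab entry running Piwik's console archiver: `5 * * * * /path/to/piwik/console core:archive --url=http://your-piwik-host/ > /dev/null` (the path and host are placeholders).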
For higher performance we increased the maximum number of client connections to the MySQL database from 100 to 250 (the OPENSHIFT_MYSQL_MAX_CONNECTIONS environment variable).
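On OpenShift v2 that limit could be raised with the rhc client and a cartridge restart – a sketch, assuming an app named piwik (the app and cartridge names are placeholders for your own):

```shell
# Raise the MySQL connection limit for an OpenShift v2 app, then restart
# the database cartridge so the new limit takes effect.
rhc env set OPENSHIFT_MYSQL_MAX_CONNECTIONS=250 --app piwik
rhc cartridge restart mysql-5.5 --app piwik
```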
Piwik also recommends keeping the number of unique URLs tracked low. We reviewed the website's URL parameters and filtered out the ones that didn't bring us any value.
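Piwik can strip such parameters globally before URLs are stored – a sketch for config/config.ini.php (the setting name comes from Piwik's global.ini.php; the parameter list below is illustrative, not our actual one):

```ini
[Tracker]
; Query parameters removed from tracked URLs before they are stored;
; fewer unique URLs means smaller tables and faster archiving.
url_query_parameter_to_exclude_from_url = "gclid,phpsessid,jsessionid,sessionid"
```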
We also decided to use APC as a PHP cache to reduce disk I/O, and changed how visitors are geolocated, using the GeoIP PECL module to determine visitor locations efficiently.
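Enabling APC is a php.ini change – a minimal sketch (the shared-memory size is an assumed value; size it to your workload):

```ini
apc.enabled = 1      ; cache compiled PHP opcodes in shared memory
apc.shm_size = 128M  ; shared memory segment size (assumed value)
```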
With all these changes in place, we got rid of the timeout errors and noticeably decreased the average response time. The infrastructure seemed stable under a load of 120 requests per second. So we pushed a little harder to see where it breaks (chart below; Y axis is requests per second; 10-minute test).
In theory we were ready for the show, but we were still somewhat worried about the MySQL gear.
We decided to welcome the TV campaign with 10 app gears (just to be safe) and 1 database gear. Our traffic assumptions turned out to be fairly close to reality: instead of the anticipated 150,000 visits per day, the first day of the campaign ended with 260,181 visits – 73% more than expected (chart below; Y axis is visits per minute). With slightly higher visitor engagement, this translated into as much as 78 requests per second at peak time.
With this setup we were able to handle the load successfully. The average response time was still closer to seconds than milliseconds, 99th-percentile latency soared up to 23 seconds, and the overall timeout ratio was close to 0.7%. Not perfect – we failed to track ~1.8k visits on the first day – but good enough for marketing-analytics purposes.
After the campaign
In our case the biggest issue with hosting Piwik on OpenShift was the database cartridge. First of all, we had provided only 2GB of memory for the database, as opposed to the 16GB recommended for large load-balanced installations. We were limited by gear sizes on OpenShift, and Dedicated Node Services were too expensive for us. We reached the point where adding new gears made no sense because of the database bottleneck.
We certainly didn't exhaust all the options for optimising Piwik. It's worth mentioning that, beginning with 2.10.0, Piwik supports caching backends such as Redis for queuing incoming tracking requests for delayed processing. We did without it. We also didn't take advantage of MySQL's InnoDB tables (which offer greater reliability and reduce I/O overhead when properly tuned), since we stuck with MyISAM as the storage engine. And there are at least a few tuning options for InnoDB itself.
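Had we switched engines, the conversion itself is a one-off ALTER per log table – a sketch using Piwik's default table names with the standard "piwik_" prefix (these statements rebuild the tables, so run them in a maintenance window):

```sql
-- Convert the largest Piwik log tables from MyISAM to InnoDB.
ALTER TABLE piwik_log_visit ENGINE=InnoDB;
ALTER TABLE piwik_log_link_visit_action ENGINE=InnoDB;
ALTER TABLE piwik_log_action ENGINE=InnoDB;
ALTER TABLE piwik_log_conversion ENGINE=InnoDB;
```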
At some point we felt that the cost of further optimisation would be disproportionate to the performance gain, assuming we stuck with the 2GB memory gears.
Considering a move to Amazon Web Services
With some Piwik experience already under our belts, we decided to move away from OpenShift to Amazon Web Services. The TV campaign was already over, but we wanted to be better prepared for the future.
Setting up a Piwik instance on Amazon EC2 requires more technical knowledge than OpenShift, but in exchange you can overcome the database limitations. Amazon RDS brings more flexibility in configuring your database: besides standard instances, Amazon provides memory-optimized instances with as much as 244GB of memory on board!
We went with a db.m3.xlarge instance for the database (4 vCPUs & 15GB memory) plus two EC2 c3.2xlarge instances (8 vCPUs & 15GB memory each), and distributed the application traffic with an Elastic Load Balancer.
This straightforward setup already yielded much better results than our OpenShift deployment with 11 gears. Interestingly, even at 240 requests per second we managed to keep average response times below 300ms, with no timeouts or errors. It's important to mention, though, that these were load tests performed with loader.io, not real user traffic.
Apart from a plethora of configuration options, Amazon offers a couple of convenient tools to help you manage the infrastructure. For instance, Amazon Virtual Private Cloud (Amazon VPC) lets you define a virtual network in your own logically and resource-isolated area within the AWS cloud – helpful when managing more than a single Piwik instance.
Another advantage of hosting your solution on AWS is the option to increase fault tolerance by placing your EC2 instances in multiple Availability Zones.
Auto-scaling is also available on AWS, again with a variety of extra options: scaling based on a fixed schedule, application load metrics, or Amazon SQS queue size. We could thus align the auto-scaling schedule with the TV spot airtimes, just to be prepared for the extra load.
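With the AWS CLI, scheduled actions can bump capacity before the evening spots air and drop it afterwards – a sketch with a hypothetical Auto Scaling group name, times, and sizes:

```shell
# Scale the (hypothetical) "piwik-web" group up ahead of evening spots
# and back down afterwards; recurrence uses cron syntax in UTC.
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name piwik-web \
  --scheduled-action-name before-tv-spots \
  --recurrence "0 18 * * *" \
  --min-size 2 --max-size 8 --desired-capacity 6

aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name piwik-web \
  --scheduled-action-name after-tv-spots \
  --recurrence "0 22 * * *" \
  --min-size 2 --max-size 8 --desired-capacity 2
```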
In terms of pricing: we spent $900/mo. on 11 Large OpenShift gears plus extra storage space. For a comparable amount of money, on Amazon we could afford fewer but far more powerful machines (in fact we didn't need as many as 10 web servers), which in the end significantly surpassed OpenShift in performance – and we were still far from any resource constraints.
In the end, we survived the TV campaign hosting our Piwik on OpenShift. The biggest issue with this provider was the single limited-memory database gear: we reached the point where adding new application gears made no sense because of the database bottleneck.
We didn't want to spend even more money ($1000/mo.) on OpenShift Dedicated Node Services to get more powerful machines. So in the meantime we explored how well Piwik works on Amazon EC2 and RDS, assuming a similar hosting budget. The results look very promising: we were able to handle twice the load easily, with Piwik rock solid, and still had plenty of headroom.
Next time our clients ask us to set up Piwik for tracking a high-traffic website we’ll most probably go for Amazon Web Services right away.