A couple of months ago, we successfully migrated a large part of our infrastructure from Heroku to AWS. Now that the dust (or should I say the cloud) has settled, we’d like to share the main driver behind our decision and how we approached the transfer without stopping the Voucherify API, even for a minute.
To better understand our reasoning here, let’s take a quick look at what Voucherify is and what the architecture looks like.
Voucherify offers programmable building blocks for coupon, referral, and loyalty campaigns. It’s basically an API-first platform that devs can use to build complex and personalized promotional campaigns, such as sending a customer an email with a specific coupon code when they join a “premium” segment. It also allows companies to track coupon redemptions to figure out which promotions work best. Lastly, it provides a dashboard for marketers, which takes the burden of maintaining and monitoring promotions off developers’ shoulders.
The platform consists basically of 3 components:
When it comes to data storage, we use a trio of Postgres, Mongo, and Redis.
This is how it looks after the migration:
We serve over 100 customers, who send a couple of million API calls monthly, including both regular requests and more resource-intensive ones such as bulk imports/exports or syncs with third-party integrations.
Heroku was a perfect solution when we kicked off Voucherify in 2015. It gave us cost-effective hosting and a fantastic continuous deployment workflow. Anyone who has used Heroku before knows how simple it is to integrate it with GitHub and how fast you can deploy. On top of that, Heroku is well-documented and the community is pretty vibrant.
All of this allows you to focus on iterating on your product without having to assign a dedicated person to devops for quite a long time; it was ~16 months in our case. Hosting on Heroku is really about speed. You just build, ship, and scale without the bother of infrastructure (deployment scripts, scaling, or security). But the speed also shows in the low latency ensured by Heroku’s data centers located around the world. This is super important for us because the main priority for our API-first platform is developer experience - and I don’t know any dev who’s happy with sluggish responses.
Heroku was fine, but our platform started to grow more dynamically. New enterprise customers wanted to make tens of thousands of API calls per day, and our guerrilla approach to scaling the dynos was becoming more and more costly. We knew that it wouldn’t scale financially in the long run. As we’re a bootstrapped business, we had to react.
This is how our billing looked:
But the pricing wasn’t the only reason we walked away from Heroku.
Firstly, we started facing strange and hard-to-debug infrastructure problems. A couple of times we noticed that our platform was misbehaving and responding non-deterministically. Off we went to find the cause, only to find out (much, much later) that the problems lay with Heroku. The status page just wasn’t updated right away. Sometimes they reacted quickly; sometimes we had to work around the problem ourselves because the fix took several hours.
Secondly, Heroku (or I guess any other PaaS) is poor at resource utilization. Heroku plans bundle CPU and memory in fixed proportions. As much as we know that such a resource-limiting policy is important and justified, every application is different and needs its own CPU/memory profile. In effect, we paid for additional unused CPU power when we upgraded the plan for more memory. This gets more and more visible and costly as you scale your app. Let’s take a look at our case. Tom, our infrastructure engineer, says:
Now, for exactly the same monthly price ($750 - only services, without databases) we no longer have trouble managing bulk operations because we utilize resources properly. Moreover, we can handle 600% more traffic.
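To put the "600% more traffic" figure in perspective, here is a quick back-of-the-envelope calculation. It only uses the two numbers quoted above (the $750 monthly price and the 600% increase); the function name is ours, and it assumes that "600% more" means seven times the original capacity.

```python
def relative_cost_per_request(old_price: float, new_price: float,
                              traffic_increase_pct: float) -> float:
    """Return the new cost per request as a fraction of the old one."""
    # "600% more traffic" means the original capacity plus six times as much.
    capacity_multiplier = 1 + traffic_increase_pct / 100
    return (new_price / old_price) / capacity_multiplier


ratio = relative_cost_per_request(old_price=750, new_price=750,
                                  traffic_increase_pct=600)
print(f"Cost per request is {ratio:.1%} of what it was")  # 14.3%
```

In other words, at an unchanged bill the cost of serving a single request drops to roughly one seventh.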
Thirdly, the lack of a private (dedicated) IP address - we received notifications that our application was generating spam traffic, which was not the case. Moreover, some of our enterprise clients have security policies controlling outbound traffic, and they asked us for an IP address to add to their firewall - with Heroku we couldn’t satisfy this requirement.
Lastly, limited addons - some Heroku addons are not compatible with recent versions of the software Voucherify integrates with. For example, the Compose addon is only available in version 2.6.
All in all, Heroku became both more money- and time-consuming.
Why AWS? Well, I guess it’s sort of a disappointing answer, but it’s the most popular cloud provider out there and we have significant experience with it from previous projects. We didn’t run thorough research. Plus, our database instances were already hosted on AWS in the same regions.
Our API handles hundreds of thousands of coupon redemption requests all over the globe. If the API goes down, somewhere in the world a bunch of folks get embarrassed and then frustrated when their latte discount becomes invalid right at the checkout. Or somebody can’t use a birthday gift card to pay for that new drone they’ve long dreamed of. Such unpleasant cases quickly escalate to our customers, who get disappointed, and that’s the last thing we want.
This is why we came up with a step-by-step availability-first migration strategy. Here’s how we did it.
- 10% AWS - 90% Heroku
- 25% - 75%
- 50% - 50%
- 100% AWS
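The post doesn’t say which mechanism did the actual splitting (weighted DNS records and a load balancer are both common choices), so treat the following as a minimal sketch of the idea rather than our production setup. It assumes each request carries some stable key, such as a client ID, and hashes it into one of 100 buckets; all names here (`PHASES`, `bucket`, `choose_backend`) are hypothetical.

```python
import hashlib

# Percentage of traffic sent to AWS in each rollout step listed above.
PHASES = [10, 25, 50, 100]


def bucket(key: str) -> int:
    """Map a request key to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_backend(key: str, aws_percent: int) -> str:
    """Send the lowest `aws_percent` buckets to AWS and the rest to Heroku."""
    return "aws" if bucket(key) < aws_percent else "heroku"


def aws_share(aws_percent: int, sample_size: int = 10_000) -> float:
    """Measure the fraction of synthetic clients routed to AWS."""
    keys = (f"client-{i}" for i in range(sample_size))
    hits = sum(choose_backend(k, aws_percent) == "aws" for k in keys)
    return hits / sample_size


if __name__ == "__main__":
    for percent in PHASES:
        print(f"target {percent}% -> measured {aws_share(percent):.1%}")
```

Because the bucket depends only on the key, a client that lands on AWS at 10% stays on AWS at 25%, 50%, and 100%, so nobody bounces between platforms as the split moves. Rolling back is a matter of lowering the percentage again.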
In the end, we got a more predictable platform and future-proofed it against growing traffic for the next couple of months. We still use Heroku for a couple of services like our dashboard, because it’s easier to deploy and it doesn’t need that many resources after all. Heroku Connect is worth noting too. We love it because it reduces the effort of syncing with Salesforce.
Lastly, we also migrated our Postgres instance away from Heroku. Doing this without stopping production required us to apply some interesting tricks. We’ll describe them in the next post. Stay tuned!