What does Unsplash cost in 2019?
What does it cost to run a site with tens of millions of users and billions of photos viewed per month? $10,000? $100,000? $500,000?
Three years ago, we wrote ‘What does Unsplash cost?’ to give a totally transparent look at the bills associated with hosting one of the largest photography sites in the world.
Since then, Unsplash has continued to grow tremendously, now powering more image use than the major image media incumbents, Shutterstock, Getty, and Adobe, combined.
With Unsplash’s public API, we power over 1,000 mainstream applications, including Medium, Trello, Squarespace, Tencent, Naver, Square, Adobe, and Dropbox.
All of that growth means two things: more traffic and bigger bills.
In the interest of transparency, Chris and I thought we were overdue for an update.
It’s 2019. What does it cost to host Unsplash?
Back in 2016, Unsplash had just crossed 1 billion images viewed and 5.5M photos downloaded per month.
Our team was smaller and our product was a lot less developed, which meant fewer services and less in-house processing. We had one main application, a traditional Rails monolith, that consumed a handful of services to create the basic Unsplash experience.
Heavy features like search and realtime photo stats were in their infancy, which meant much simpler data processing requirements and reliance on third-party services like Keen and a handful of cron jobs.
The final monthly breakdown for April 2016 was:
- Web Servers: $2,731.23
- Monitoring: $630.00
- Data Processing: $1,000.00
- Image Hosting: $11,170.00
- Other: $2,127.39
Total (USD): $17,658.62
A lot has changed.
For one, Unsplash is a hell of a lot bigger. 10+ times bigger. We now get more traffic from our API partners than our own website and official apps, despite these growing significantly.
Partnering with some of the largest consumer-facing apps in the world has pushed our engineering team to match their practices around redundancy, monitoring, and availability, which requires more supporting resources and services.
Our product team has continued to push the envelope for core features like search and contributor stats, requiring more and more data to be processed in greater and greater volumes.
All of these things have pushed our architecture to be more complex, while also increasing the baseline costs.
Web Servers, total monthly cost: $29,763
We continue to use Heroku as our main web platform. Despite its premium cost over AWS, Azure, and Google Cloud, Heroku’s built-in deployment and configuration tools allow our team to move faster, more confidently, and more reliably.
As we’ve detailed previously, the alternatives would undoubtedly be cheaper on paper. But in reality, the simplicity and freedom Heroku offers a small, product-focused team is a major cost saving in itself.
In addition to our main web servers and databases using Heroku, we use Fastly for distributed CDN caching, Elastic Cloud for our Elasticsearch clusters, and Stream for our feed and notification architecture.
Monitoring, total monthly cost: $7,679
Our team is small for Unsplash’s size, with our total product team coming in at just 11 people.
With no one dedicated to dev-ops, ensuring Unsplash runs smoothly and never goes down requires a lot of instrumentation and reporting.
Despite the volume of metrics we monitor and report on, New Relic, Sentry, and Datadog remain fairly inexpensive solutions. Our logging is certainly our largest monitoring expense, but the detailed information is crucial when debugging issues or rolling out new features.
Data Processing, total monthly cost: $15,223
Data processing has been the area with the largest relative increase since 2016. Back then, analytics and data were an afterthought in our development process. We relied on tools like Google Analytics for user analytics and Keen for product metrics like photo views and downloads.
Since then, we’ve needed to expand our data collection, aggregation, and reporting significantly, both from a product and a company perspective. As Unsplash has grown, the volume has also increased considerably, with hundreds of millions of events tracked every day.
We’ve replaced Google Analytics and Keen with an open-source data pipeline, Snowplow Analytics. Snowplow takes care of the data collection and formatting, allowing Tim, our data engineer, to focus on data aggregation, modelling, and visualization.
We’ve also expanded the role of the data architecture in the product to handle all of our machine learning and search processing. As we go forward, we expect this to continue to be the biggest area of expansion.
Image Hosting, total monthly cost: $42,408
Imgix is our single biggest expense, but we love it. Yes, there are cheaper options, but trust us when we say that they aren’t as good for what we do.
We send petabytes of data through Imgix’s CDN and render more than 250 million variations of our source images every month. Their reliability, performance, and flexibility are unmatched, and negotiating our contract through them actually lowers our CDN costs, thanks to their bulk negotiations with CDN providers.
The final monthly breakdown for February 2019 was:
- Web Servers: $29,763
- Monitoring: $7,679
- Data Processing: $15,223
- Image Hosting: $42,408
- Other: $3,580
Total (USD): $98,653
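To see how the bill splits across categories, the arithmetic above can be sketched in a few lines of Python. The figures are copied from the February 2019 breakdown; the names `costs_2019` and `shares` are purely illustrative and not part of any Unsplash tooling.

```python
# Monthly figures for February 2019, copied from the breakdown above (USD).
costs_2019 = {
    "Web Servers": 29_763,
    "Monitoring": 7_679,
    "Data Processing": 15_223,
    "Image Hosting": 42_408,
    "Other": 3_580,
}


def shares(costs):
    """Return each category's share of the total bill, as a percentage."""
    bill = sum(costs.values())
    return {name: 100 * amount / bill for name, amount in costs.items()}


total = sum(costs_2019.values())

# Print categories from the largest share to the smallest.
for name, pct in sorted(shares(costs_2019).items(), key=lambda kv: -kv[1]):
    print(f"{name:<16} {pct:5.1f}%")
print(f"{'Total':<16} ${total:,}")
```

On these numbers, image hosting alone accounts for roughly 43% of the monthly bill, and web servers for about 30%.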
Comparing across the years, some trends emerge.
Despite growing top-line metrics over 12x and significantly expanding the systems to include more features, reliability, and redundancies, hosting costs in total have only increased about 5.6x.
There are a few reasons for this:
- As systems approach a certain cost threshold, it becomes worthwhile to spend engineering salary on technical optimizations. We try to avoid this as it takes engineering resources away from user-facing feature development, but over the years we’ve made significant improvements to low-level caches, bulk data aggregations, and HTTP caching.
- At larger and larger volumes, it becomes easier to negotiate bulk discounts from services.
- Resources can be more fully utilized at high capacity. This is especially true for our Redis and Redshift clusters.
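The comparison between the two bills can be checked with a rough sketch like the one below. It only uses the April 2016 and February 2019 figures listed in this post; the variable names are illustrative assumptions, and the 12x traffic figure is not recomputed here because the underlying metrics are not given.

```python
# Monthly bills in USD, copied from the two breakdowns in this post.
costs_2016 = {
    "Web Servers": 2_731.23,
    "Monitoring": 630.00,
    "Data Processing": 1_000.00,
    "Image Hosting": 11_170.00,
    "Other": 2_127.39,
}
costs_2019 = {
    "Web Servers": 29_763,
    "Monitoring": 7_679,
    "Data Processing": 15_223,
    "Image Hosting": 42_408,
    "Other": 3_580,
}

# Growth multiple per category: the 2019 cost divided by the 2016 cost.
growth = {name: costs_2019[name] / costs_2016[name] for name in costs_2016}

# Growth multiple of the whole bill.
total_growth = sum(costs_2019.values()) / sum(costs_2016.values())

for name, multiple in sorted(growth.items(), key=lambda kv: -kv[1]):
    print(f"{name:<16} {multiple:5.1f}x")
print(f"{'Total':<16} {total_growth:5.1f}x")
```

On these figures the total bill grew by roughly 5.6x. Data processing grew the most (about 15x), while image hosting, the largest line item, grew by only about 3.8x, which is a large part of why the total stays well below the growth in traffic.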
At the same time, the ratio between our hosting costs and the cost of the non-hosting software we use, like GitHub, Looker, and Slack, continues to increase, since the latter is a function of engineering team size. To put that in perspective, per engineer, Unsplash supports more users than Facebook did at the equivalent point in time.
Hopefully getting a behind-the-scenes look at what it costs to run a site like Unsplash will help you with your own business, or at least give you a better understanding of what’s involved.
If you’re in a position to share your company’s costs, we’d love to see them.
If you have any questions, or want to dive deeper into this topic, give us a shout on Twitter @lukechesser & @chrisliverani. If you liked reading this, you might like hearing about how we scaled Unsplash with a small team.