Platforms like Canva have witnessed astronomical growth since their inception. Canva serves over 100 million monthly active users and hosts more than 15 billion designs, so the scale of the data it manages is staggering. With this growth comes the challenge of managing and optimizing the storage of over 230 petabytes of data on AWS, specifically in Amazon S3.
This article explores the strategies and tools Canva used to optimize their storage costs, focusing on the transition to S3 Glacier Instant Retrieval and the significant cost savings that followed. The lessons learned from Canva’s approach can serve as a valuable guide for other companies looking to optimize their AWS storage expenditures.
Understanding Amazon S3 and Its Storage Classes
Amazon S3 (Simple Storage Service) is a scalable cloud storage solution designed to handle vast amounts of data. It allows users to upload data as objects into containers called buckets, with each object addressable by a unique URL. S3 scales automatically to store anything from a few gigabytes to hundreds of petabytes, and users are charged only for what they use.
To cater to different data access patterns and cost requirements, AWS offers several S3 storage classes:
- S3 Standard: Ideal for frequently accessed data, such as common design patterns and high-demand assets.
- S3 Standard-Infrequent Access (S3 Standard-IA): A cost-effective option for data that is accessed less frequently but still requires rapid retrieval.
- S3 Glacier Flexible Retrieval: Provides low-cost archival storage with retrieval times ranging from minutes to hours, suitable for data like logs and backups.
- S3 Glacier Instant Retrieval: Launched in November 2021, this storage class combines low-cost archival storage with fast, millisecond-level retrieval times, making it an attractive option for infrequently accessed data that still needs quick access.
The Challenge of Managing Massive Data Growth
Canva’s exponential growth led to the accumulation of a vast amount of user-generated content. While much of this content is accessed frequently in the days immediately following its creation, the access rates drop significantly over time. Historically, Canva stored such data in S3 Standard-IA to balance cost and access needs. However, the introduction of S3 Glacier Instant Retrieval prompted Canva to reevaluate their storage strategy.
Using AWS’s S3 Storage Class Analysis tool, Canva gained insights into their data access patterns. The analysis revealed that although 90% of their data was stored in S3 Standard-IA, that data accounted for only 30-40% of data accesses. Because so much of it was rarely read, S3 Glacier Instant Retrieval appeared to be a better fit for much of this infrequently accessed data, promising significant cost savings.
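One way to read those percentages is as accesses per unit of stored data. The sketch below is illustrative only: it assumes the remaining 10% of data receives the remaining accesses, and it uses 35% as the midpoint of the 30-40% range. The function name `access_density` is hypothetical, not part of any AWS tool.

```python
def access_density(share_of_accesses: float, share_of_data: float) -> float:
    """Accesses per unit of stored data, relative to the overall average (1.0)."""
    if share_of_data <= 0:
        raise ValueError("share_of_data must be positive")
    return share_of_accesses / share_of_data


# Assumed inputs: 90% of data in Standard-IA receiving 35% of accesses
# (midpoint of the 30-40% range); the remaining 10% of data gets the rest.
ia_density = access_density(0.35, 0.90)
other_density = access_density(0.65, 0.10)

print(f"Standard-IA data: {ia_density:.2f}x the average access rate")
print(f"Remaining data:   {other_density:.2f}x the average access rate")
print(f"Ratio:            {other_density / ia_density:.1f}x")
```

Under these assumptions, each byte in Standard-IA is read roughly 17 times less often than a byte stored elsewhere, which is the kind of gap that makes a colder storage class worth evaluating.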
Calculating Potential Savings and Transition Costs
Moving 207 petabytes (90% of 230 PB) of data from S3 Standard-IA to S3 Glacier Instant Retrieval promised substantial savings. Canva projected monthly savings of approximately $1.8 million, which works out to roughly $21.6 million annually. However, the transition was not without its challenges.
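The projection can be approximated with simple arithmetic. The sketch below assumes list prices of $0.0125 per GB-month for S3 Standard-IA and $0.004 per GB-month for S3 Glacier Instant Retrieval, and decimal units (1 PB = 1,000,000 GB). These prices are assumptions that vary by region and over time; they are not figures stated by Canva.

```python
GB_PER_PB = 1_000_000  # decimal units (assumption)

# Assumed per-GB monthly storage prices; check current AWS pricing for your region.
STANDARD_IA_PRICE = 0.0125
GLACIER_IR_PRICE = 0.004


def monthly_savings(petabytes: float, from_price: float, to_price: float) -> float:
    """Monthly storage savings in dollars from moving data to a cheaper class."""
    return petabytes * GB_PER_PB * (from_price - to_price)


moved_pb = 207  # 90% of 230 PB, as stated in the article
monthly = monthly_savings(moved_pb, STANDARD_IA_PRICE, GLACIER_IR_PRICE)
annual = monthly * 12

print(f"Estimated monthly savings: ${monthly:,.0f}")
print(f"Estimated annual savings:  ${annual:,.0f}")
```

With these assumed prices the estimate lands close to the $1.8 million monthly figure. It covers storage charges only and ignores retrieval and request fees.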
AWS charges a one-time fee of $0.02 per 1,000 objects for lifecycle transitions into S3 Glacier Instant Retrieval. With over 300 billion objects in their S3 inventory, Canva faced a potential transition cost exceeding $6 million if everything were moved. Because the fee is charged per object regardless of size, it was crucial to analyze which data should be moved to ensure the transition would be cost-effective in the long run.
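The transition cost follows directly from the object count. This is a minimal sketch using the fee and object count quoted above; the function name `transition_cost` is hypothetical.

```python
def transition_cost(object_count: int, fee_per_1000: float = 0.02) -> float:
    """One-time cost in dollars of transitioning objects between storage classes."""
    return object_count / 1000 * fee_per_1000


total_objects = 300_000_000_000  # "over 300 billion objects"
full_cost = transition_cost(total_objects)
per_object = transition_cost(1)

print(f"Cost to transition every object: ${full_cost:,.0f}")
print(f"Cost per object: ${per_object:.5f}")
```

The per-object fee does not depend on object size, so a bucket full of tiny objects pays the same transition fee per object as a bucket of large ones while saving far less on storage.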
Strategic Data Transition
Canva approached the transition by focusing on data whose average object size was 400 KB or larger, as the storage savings on these objects would repay the transition fee and reach a positive return on investment (ROI) within six months. Smaller objects, particularly those around 20 KB, were found to be more cost-effective to store in S3 Standard, because S3 Standard-IA and S3 Glacier Instant Retrieval bill every object as if it were at least 128 KB.
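Both thresholds can be checked with a rough per-object model. The sketch below assumes list prices of $0.023 (S3 Standard), $0.0125 (S3 Standard-IA) and $0.004 (S3 Glacier Instant Retrieval) per GB-month, decimal units, and the $0.02 per 1,000 objects transition fee. The prices and all names are assumptions for illustration, and the model ignores retrieval fees, request fees and minimum storage duration charges.

```python
BYTES_PER_GB = 1_000_000_000          # decimal units (assumption)
MIN_BILLABLE_BYTES = 128_000          # 128 KB minimum billable size in IA and Glacier IR
TRANSITION_FEE_PER_OBJECT = 0.02 / 1000

# Assumed per-GB monthly storage prices; actual prices vary by region and tier.
PRICES = {"STANDARD": 0.023, "STANDARD_IA": 0.0125, "GLACIER_IR": 0.004}


def monthly_storage_cost(size_bytes: int, storage_class: str) -> float:
    """Monthly storage cost in dollars for one object in the given class."""
    billable = size_bytes
    if storage_class in ("STANDARD_IA", "GLACIER_IR"):
        # Small objects are billed as if they were 128 KB.
        billable = max(size_bytes, MIN_BILLABLE_BYTES)
    return billable / BYTES_PER_GB * PRICES[storage_class]


def break_even_months(size_bytes: int) -> float:
    """Months until Standard-IA to Glacier IR savings repay the transition fee."""
    saving = (monthly_storage_cost(size_bytes, "STANDARD_IA")
              - monthly_storage_cost(size_bytes, "GLACIER_IR"))
    return TRANSITION_FEE_PER_OBJECT / saving


print(f"400 KB object breaks even after {break_even_months(400_000):.1f} months")

small = 20_000  # a 20 KB object
print(f"20 KB in Standard:    ${monthly_storage_cost(small, 'STANDARD'):.8f}/month")
print(f"20 KB in Standard-IA: ${monthly_storage_cost(small, 'STANDARD_IA'):.8f}/month")
```

Under these assumptions a 400 KB object recovers its transition fee in just under six months, while a 20 KB object costs less per month in S3 Standard than in S3 Standard-IA because of the 128 KB billing minimum.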
This strategic approach ensured that Canva would achieve significant cost savings without incurring excessive transition costs. The implementation was straightforward: Canva applied lifecycle policies to the targeted S3 buckets, automating the migration of objects to S3 Glacier Instant Retrieval. Within two days, nearly 80 billion objects were successfully migrated, resulting in ongoing savings of approximately $300,000 per month, or $3.6 million annually.
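A lifecycle rule of this kind might look like the following sketch, which builds the rule as a plain dictionary and applies it with boto3. The rule ID, bucket name, 30-day age and per-object size filter are hypothetical choices for illustration; the article does not describe the exact rules Canva used, and Canva selected data by average object size rather than necessarily filtering individual objects.

```python
def glacier_ir_lifecycle_rule(rule_id: str, min_size_bytes: int, after_days: int) -> dict:
    """Build a lifecycle rule that moves larger objects to Glacier Instant Retrieval."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Only objects larger than this size (in bytes) are transitioned.
        "Filter": {"ObjectSizeGreaterThan": min_size_bytes},
        "Transitions": [{"Days": after_days, "StorageClass": "GLACIER_IR"}],
    }


def apply_rule(bucket: str, rule: dict) -> None:
    """Apply the rule to a bucket. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the rule builder works without boto3 installed

    s3 = boto3.client("s3")
    # Note: this call replaces the bucket's entire existing lifecycle configuration.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )


# Hypothetical values: 400 KB threshold, transition 30 days after creation.
rule = glacier_ir_lifecycle_rule("to-glacier-ir", 400_000, 30)
print(rule)
# apply_rule("example-bucket", rule)  # uncomment to apply to a real bucket
```

Because `put_bucket_lifecycle_configuration` overwrites any existing rules, a real rollout would merge the new rule into the bucket's current configuration before applying it.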
Canva’s experience highlights the importance of understanding data access patterns and leveraging the appropriate AWS tools and storage classes to optimize costs effectively. The transition to S3 Glacier Instant Retrieval required careful planning and a significant one-time investment, but the resulting savings justified the effort.
For companies managing large amounts of data on AWS, Canva’s approach offers valuable insights. By utilizing tools like S3 Storage Class Analysis and adopting a strategic approach to data migration, businesses can achieve substantial cost savings while maintaining efficient data access and management.
AWS continues to invest in storage solutions tailored to various use cases, making it a valuable partner for businesses of all sizes. As data volumes grow and access patterns evolve, the ability to optimize storage costs will remain a critical factor in managing cloud infrastructure effectively.
If you’re looking to make the most of your cloud storage and reduce costs, our team at ZirconTech is here to help. Let’s explore how you can optimize your storage solutions together.