It’s unfair to say that AWS data transfer costs are designed to be opaque and confusing. It is, however, probably very fair to say that AWS data transfer costs can easily get businesses in a muddle. The good news is that if you understand the underlying principles, you can usually work out what you need to do to tame them. Here is a quick guide to what you need to know.
Taming AWS data transfer costs starts with building the right infrastructure
At the risk of stating the extremely obvious, the best way to tame your AWS data transfer costs is to minimize both the amount of data you transfer and the distance over which you transfer it.
While this is a very straightforward principle, the nature of the cloud can make it something of a challenge to implement in practice, especially if you take a piecemeal approach to developing your cloud infrastructure.
Quite simply, if there is a corporate culture of “just adding bits here and there” as the perceived need or want arises, then there is a very good chance you’re quickly going to end up with cloud infrastructure which is all over the place. That’s even before you consider the distinct possibility that infrastructure will be left active long after it has ceased to be needed (if it was ever really needed at all).
In short, do whatever you need to do to ensure that your AWS cloud infrastructure is developed in a considered manner rather than just thrown together.
Importing data from the internet is usually either free or very affordable
You’re probably not going to have a great deal of flexibility with regard to how much data you import from the internet, but the good news is that importing it into the headline AWS services (e.g. EC2 instances, RDS instances, S3 storage) is either free or very affordable.
The cost of exporting data to the internet can vary greatly depending on the region
AWS regions are an interesting topic. On the one hand, compliance reasons may limit your options. On the other hand, even considering compliance, there may be some room to maneuver and if there is, it’s worth considering whether or not you could use this to your advantage.
For example, even if compliance reasons require you to keep data in the U.S., you still have four regions from which to choose. Likewise, if you need to keep it in the EU, you have four or five regions from which to choose (London is due to exit the EU shortly).
You may even want to consider using different regions for different purposes. For example, you could use your nearest region when you want to minimize latency and a more economical region when you’re happy to wait a bit longer for your tasks to complete if it means a lower cost.
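To make the comparison concrete, a rough sketch along the following lines could estimate monthly export costs for a few candidate regions. The region identifiers are real AWS region codes, but the per-GB rates are placeholder values chosen purely for illustration, not actual AWS prices, and the function names are hypothetical. The sketch also assumes a flat rate, whereas real pricing is usually tiered by volume.

```python
# Hypothetical per-GB internet egress rates in USD. These are placeholders,
# NOT real AWS prices; look up current figures before relying on any result.
ASSUMED_EGRESS_RATE_PER_GB = {
    "us-east-1": 0.09,
    "eu-west-1": 0.09,
    "sa-east-1": 0.15,
}


def monthly_egress_cost(region: str, gb_per_month: float) -> float:
    """Estimate monthly internet egress cost for a region at a flat rate."""
    return ASSUMED_EGRESS_RATE_PER_GB[region] * gb_per_month


def cheapest_region(candidates, gb_per_month: float) -> str:
    """Return the candidate region with the lowest estimated egress cost."""
    return min(candidates, key=lambda r: monthly_egress_cost(r, gb_per_month))


if __name__ == "__main__":
    allowed = ["us-east-1", "sa-east-1"]  # e.g. regions compliance permits
    for region in allowed:
        print(region, monthly_egress_cost(region, 1000))
    print("cheapest:", cheapest_region(allowed, 1000))
```

Restricting the candidate list to the regions your compliance rules allow keeps the comparison honest: the cheapest region overall is irrelevant if you cannot use it.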
Transferring data between services is where life can get complicated
This is the part where you really need to read the fine print. As a rule of thumb, you will get free AWS data transfers within an Availability Zone, and transfers between services in the same region are often (although not always) free. Be aware, however, that traffic which crosses Availability Zones within a region, for example between EC2 instances in different zones, is typically charged. With some services, you can get free data transfers within a region for certain operations but not for others. For example, backup, restore, load, and unload operations between Amazon Redshift and Amazon S3 are all free (within the same region) but other operations are chargeable.
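One way to keep this rule of thumb in view is to classify each data flow by how far it travels before checking the relevant pricing page. The sketch below is a minimal illustration: the `Endpoint` type, the `transfer_scope` function, and the wording in `EXPECTATION` are all hypothetical, and the expectations are rough guidance rather than AWS pricing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Where a resource lives. Both fields are plain strings for illustration."""
    region: str
    availability_zone: str


def transfer_scope(src: Endpoint, dst: Endpoint) -> str:
    """Classify a transfer by how far the data travels."""
    if src.region != dst.region:
        return "cross-region"
    if src.availability_zone != dst.availability_zone:
        return "cross-az"
    return "same-az"


# Rule-of-thumb expectations only; each service's fine print decides.
EXPECTATION = {
    "same-az": "usually free",
    "cross-az": "check the service's pricing; may be free or charged",
    "cross-region": "expect a per-GB charge",
}

if __name__ == "__main__":
    app = Endpoint("us-east-1", "us-east-1a")
    db = Endpoint("us-east-1", "us-east-1b")
    scope = transfer_scope(app, db)
    print(scope, "->", EXPECTATION[scope])
```

Running every significant flow in your architecture through a check like this gives you a short list of the transfers whose pricing is actually worth reading up on.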
Why AWS data transfer costs can still end up being higher than you think they should
Assuming you have sorted out your cloud infrastructure properly so that data is flowing the way it should, the likeliest reason your AWS data transfer costs are escalating more than you think they should is that you are transferring more data than you realize.
The question then becomes whether or not you will just have to live with this or whether you can adjust the behavior of your application to reduce the costs without excessive negative impact on the user experience.
For example, let’s say you have an application which regularly requests large quantities of data from S3. If this app is an essential, customer-facing app which needs to work at maximum speed, then you may just have to live with this and swallow the cost.
If it’s not, however, then you could look at reducing the number of requests it makes. This only helps if the amount of data transferred per request stays the same, rather than increasing to compensate for the reduced number of requests. Alternatively, you might want to see if you could live with slower storage, such as Amazon Glacier, which can actually work a whole lot more quickly than its name might suggest.
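The caveat about request size is easy to check with some back-of-the-envelope arithmetic. The sketch below uses a hypothetical helper, `monthly_transfer_gb`, and made-up workload figures; it simply multiplies requests by average size to show when cutting requests does and does not reduce the volume transferred.

```python
def monthly_transfer_gb(requests_per_day: float,
                        avg_mb_per_request: float,
                        days: int = 30) -> float:
    """Estimate data transferred per month in GB (1 GB = 1024 MB)."""
    return requests_per_day * avg_mb_per_request * days / 1024


if __name__ == "__main__":
    # Illustrative workload figures, not measurements.
    baseline = monthly_transfer_gb(1024, 10)
    fewer_requests = monthly_transfer_gb(512, 10)   # half the requests, same size
    bigger_requests = monthly_transfer_gb(512, 20)  # half the requests, double size

    print("baseline:", baseline)
    print("fewer requests, same size:", fewer_requests)
    print("fewer requests, larger size:", bigger_requests)
```

Halving the requests halves the volume only while the average request size holds steady; if each request grows to compensate, the total is back where it started.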