Cost Optimization: The Misunderstood Pillar
Slashing overspend and seeing the bill is only part of it
Until we get it right, it’s time to get back to basics.
Today, I want to simply go over the 5 provided design principles of the Cost Optimization pillar according to AWS. For reference, I’ll be quoting the link above to get the definitions. (I didn’t come up with these on my own!)
1. Implement cloud financial management
“To achieve financial success and accelerate business value realization in the cloud, you must invest in Cloud Financial Management. Your organization must dedicate the necessary time and resources for building capability in this new domain of technology and usage management. Similar to your Security or Operations capability, you need to build capability through knowledge building, programs, resources, and processes to help you become a cost efficient organization.”
This is actually one of the best-addressed principles I see with customers. At this point, most organizations recognize the importance of managing and tracking costs effectively. The execution, well, that’s another story.
True FinOps provides the bridge necessary to make this work. That doesn’t mean standing up a program and calling it a day. That’s called checking a box. It requires total buy-in, collaboration, and accountability. It’s best supported culturally, and leaders drive culture by identifying cost optimization as a requirement in daily communications and putting it on paper in the form of SOPs. When leadership isn’t aligned on this, it shows. A cost-sensitive culture isn’t what we’re trying to achieve here. What we desire is a cost-conscious culture where accountability for spend is shared and reinforced continuously.
Cloud supports the business!
2. Adopt a consumption model
“Pay only for the computing resources you consume, and increase or decrease usage depending on business requirements. For example, development and test environments are typically only used for eight hours a day during the work week. You can stop these resources when they’re not in use for a potential cost savings of 75% (40 hours versus 168 hours).”
The first component of this is what I think draws people to cloud in the first place. The idea of the PAYG model is truly second to none. Much like the previous item, it relies on planning and routine inspection to maintain efficiency.
Understanding where PAYG can be used to good effect is important. In the example called out above, knowing the hours of actual usage and consciously shutting resources down when they aren’t needed is exactly that kind of understanding.
To know this, organizations need to understand their applications and the teams that take care of them. These decisions should be informed by engineers and driven by leadership. Like leaving the lights on after closing time, organizations that don’t pay attention to the little things will inevitably struggle with PAYG and in turn overpay.
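The arithmetic behind the AWS example is worth seeing once. Here is a minimal Python sketch of it; the hourly rate is a made-up number of mine, not an AWS price, and the function names are just for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def savings_fraction(hours_running_per_week: float) -> float:
    """Fraction of an always-on bill avoided by running only the given hours."""
    if not 0 <= hours_running_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours must be between 0 and 168")
    return 1 - hours_running_per_week / HOURS_PER_WEEK


def weekly_cost(hourly_rate: float, hours_running_per_week: float) -> float:
    """Weekly cost of a resource billed per running hour."""
    return hourly_rate * hours_running_per_week


if __name__ == "__main__":
    rate = 0.10  # hypothetical hourly rate in USD, for illustration only
    always_on = weekly_cost(rate, HOURS_PER_WEEK)
    work_hours_only = weekly_cost(rate, 40)
    print(f"always on:       ${always_on:.2f} per week")
    print(f"work hours only: ${work_hours_only:.2f} per week")
    print(f"savings:         {savings_fraction(40):.0%}")
```

Running 40 hours out of 168 avoids a little over three quarters of the always-on cost, which lines up with the roughly 75% in the quote. It only holds for resources that stop billing when stopped, so storage and similar charges would still accrue.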
3. Measure overall efficiency
“Measure the business output of the workload and the costs associated with delivery. Use this data to understand the gains you make from increasing output, increasing functionality, and reducing cost.”
This is the first of the principles where I really see organizations struggle.
Ultimately, it’s not what’s being measured but where it’s being measured. I think many organizations see this as a downstream effort, where the bill summarizes that efficiency.
The bill is a lagging indicator. It shows the truth but hides the path there and the inefficiencies that accumulated along the way. It can identify causes of spend in aggregate, and even name the offenders. But it cannot replace a design conversation and an understanding of the many things that contribute to overall spend. The bill doesn’t measure efficiency in a precise manner.
Let me explain with an analogy:
Imagine you start a diet plan with the goal of losing 20 lbs. After a few weeks you’ve lost some weight but eventually you hit a standstill, and you don’t know why. Unfortunately, you didn’t track anything!
You ran a Fit-stagram to post your workouts a few days of the week but didn’t track anything else with any reliable precision. No logged meals to go off of. You ate less, but not consistently. You woke up some days at 7, but you slept in on others. You went to sleep on time…sometimes. You had a carb binge at least twice since you got started. There was a sneaky ice cream tub in your freezer you sometimes turned to in your darkest hours.
All told, it’s a crapshoot why you’ve stalled. The number says you’ve lost weight, but that’s it. You know something is up and that your results aren’t what you expected.
That’s kind of how your bill is. It’s the scale. Maybe a better scale, but still a scale.
Measuring efficiency can be done at a coarse or a fine level, but it certainly helps to start at a low level and work outward. In my previous post about using an AWS Transit Gateway as a checkpoint for chargeback, you saw a way to measure and accurately attribute cost to consumers. This doesn’t have to stop there, though. You can measure nearly anything in AWS, or any cloud for that matter, by maintaining effective observability over an environment.
For example, designing an environment that supports effective observability can help reduce overall expenditure because it improves accountability and the ability to measure. Much like keeping unhealthy foods out of your reach during a diet, setting the conditions for better cost management can be a way of improving overall efficiency. But you can’t do that if you aren’t consciously measuring and assessing.
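One simple way to measure business output against cost, as the AWS definition puts it, is a unit cost: what it costs to deliver one unit of whatever the workload produces. The sketch below is my own illustration in Python. The Period type, the months, and every number in it are hypothetical, and "orders processed" is just a stand-in for whatever output matters to your business.

```python
from dataclasses import dataclass


@dataclass
class Period:
    label: str
    cloud_cost: float     # total spend for the workload in USD (hypothetical)
    business_output: int  # e.g. orders processed (hypothetical unit)

    @property
    def unit_cost(self) -> float:
        """Cost to deliver one unit of business output."""
        if self.business_output <= 0:
            raise ValueError("business_output must be positive")
        return self.cloud_cost / self.business_output


def relative_change(before: float, after: float) -> float:
    """Relative change from before to after (negative means it went down)."""
    return (after - before) / before


march = Period("March", cloud_cost=10_000.0, business_output=50_000)
april = Period("April", cloud_cost=12_000.0, business_output=80_000)

if __name__ == "__main__":
    bill = relative_change(march.cloud_cost, april.cloud_cost)
    unit = relative_change(march.unit_cost, april.unit_cost)
    print(f"bill change:      {bill:+.0%}")
    print(f"unit cost change: {unit:+.0%}")
```

In this made-up case the bill went up by 20% while the cost per order went down by 25%. The scale alone would tell you that you gained weight; the unit cost tells you the workload actually got more efficient.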
4. Stop spending money on undifferentiated heavy lifting
“AWS does the heavy lifting of data center operations like racking, stacking, and powering servers. It also removes the operational burden of managing operating systems and applications with managed services. This allows you to focus on your customers and business projects rather than on IT infrastructure.”
Perhaps the most obvious one of them all, but is it really?
There are many organizations that only understand it from the hardware perspective, which is only half the battle. Part of operationalizing on public cloud is owning the mindset shift: moving the burden of IT infrastructure management onto the platform itself and allowing workers to focus on supporting the needs of the business.
Not sure what I mean? Permit me to use another analogy. This one is faster, I promise!
My dad has been balancing an actual checkbook now for my entire life. In spite of the fact that he receives online statements, does online banking, and is otherwise entirely reliant on online systems, he maintains this checkbook routine to this day. Fortunately for him, I think it keeps his mind sharp with numbers and gives him an outlet in his retirement—so I approve, I guess.
However, it still reminds me that adopting something (online banking) doesn’t mean it gets operationalized in reality.
I frequently meet with customers that are using their employees to do manual infrastructure tasks. While I am all about meeting business requirements, running the cloud like a datacenter and avoiding the true benefits of using it in order to have workers perform routine tasks manually is a waste of time and effort. It’s the opposite of cost optimization. It values neither time nor money. A worker doing a task that can be automated effectively is spending money in the form of that person’s salary. Why spend that money on doing something routine when it can be spent improving systems?
Operationalizing the public cloud in support of broader business goals should always be the end goal of this principle.
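To put a rough number on the salary point above, here is a back-of-the-envelope sketch in Python. All of the inputs (hours per week, a loaded hourly rate, the effort to automate) are hypothetical values I picked for illustration, and the function names are my own.

```python
def annual_toil_cost(hours_per_week: float,
                     loaded_hourly_rate: float,
                     weeks_per_year: int = 52) -> float:
    """Yearly cost of a recurring manual task, paid for in salary."""
    return hours_per_week * loaded_hourly_rate * weeks_per_year


def payback_weeks(automation_effort_hours: float,
                  hours_saved_per_week: float) -> float:
    """Weeks until the time spent automating is recovered."""
    if hours_saved_per_week <= 0:
        raise ValueError("hours_saved_per_week must be positive")
    return automation_effort_hours / hours_saved_per_week


if __name__ == "__main__":
    # Hypothetical: 5 hours a week of manual patching at a $60/hour loaded rate
    print(f"annual cost of the manual task: ${annual_toil_cost(5, 60):,.0f}")
    # Hypothetical: 40 hours of engineering effort to automate it away
    print(f"payback period: {payback_weeks(40, 5):.0f} weeks")
```

It is a crude model that ignores maintenance of the automation itself, but even a rough figure like this makes the trade-off easier to discuss.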
5. Analyze and attribute expenditure
“The cloud makes it easier to accurately identify the cost and usage of workloads, which then allows transparent attribution of IT costs to revenue streams and individual workload owners. This helps measure return on investment (ROI) and gives workload owners an opportunity to optimize their resources and reduce costs.”
Principle 5 is perhaps the most underdone of them all.
Everyone talks about chargeback, but next to no one measures ROI or attributes cost to revenue streams. I see it so infrequently that it feels borderline fictional to see it in writing.
It’s a simple question: What is the impact of this application on revenue?
Werner Vogels, the CTO of Amazon, hits the nail on the head in the second law of The Frugal Architect:
“When designing and building systems, we must consider the revenue sources and profit levers. It’s important to find the dimension you’re going to make money over, then make sure the architecture follows the money.” - Werner
Follow the money. Build with revenue generation in mind. Even if the end goal isn’t profit (I see you, PubSec customers), being able to quantify the impact an application has on your business or organization is a massive boon to cost optimization understanding and planning. When that number is clearly defined, it makes it painfully obvious where cost should and shouldn’t accrue.
This has to be driven by leadership from the top and across multiple teams and should not be delegated to cloud engineers to determine. It could start with a simple asset inventory or doing a business impact assessment. It doesn’t need to be complicated, but it should be well-understood and more importantly, quantified.
The engineers are often removed from the money, even if they’re greatly influenced by it. Everyone understands budgets and cost centers. Just hearing it makes most people shudder. But if engineers truly understood the revenue generation they bring to companies through the building and maintenance of applications, wouldn’t decisions made regarding those applications be much clearer, especially in the realm of cost optimization? I think so.
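Here is a small Python sketch of what attributing spend to workloads and setting it against revenue could look like. Everything in it is assumed for illustration: the line items, the "workload" tag key, and the revenue figures are invented, and a real version would read from a tagged billing export rather than a hard-coded list. It computes cost as a share of revenue, which is a simpler stand-in for a full ROI calculation.

```python
from collections import defaultdict

# Hypothetical billing line items; the "workload" tag key is an assumption.
line_items = [
    {"service": "EC2", "cost": 1200.0, "tags": {"workload": "checkout"}},
    {"service": "RDS", "cost": 800.0, "tags": {"workload": "checkout"}},
    {"service": "S3", "cost": 300.0, "tags": {"workload": "reporting"}},
    {"service": "NAT Gateway", "cost": 150.0, "tags": {}},
]

# Hypothetical monthly revenue attributed to each workload, in USD.
revenue = {"checkout": 50_000.0, "reporting": 2_000.0}


def attribute_costs(items, tag_key="workload"):
    """Sum cost per workload; untagged spend lands in 'unattributed'."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, "unattributed")
        totals[owner] += item["cost"]
    return dict(totals)


def cost_share_of_revenue(costs, revenue_by_workload):
    """Cost divided by revenue, for workloads with known nonzero revenue."""
    return {
        name: cost / revenue_by_workload[name]
        for name, cost in costs.items()
        if revenue_by_workload.get(name)
    }


if __name__ == "__main__":
    costs = attribute_costs(line_items)
    shares = cost_share_of_revenue(costs, revenue)
    for name, cost in costs.items():
        share = shares.get(name)
        note = f"{share:.1%} of revenue" if share is not None else "no revenue mapped"
        print(f"{name}: ${cost:,.2f} ({note})")
```

The "unattributed" bucket is deliberate. Spend that nobody owns is exactly the kind of thing this principle is meant to surface, so I would rather see it as its own line than have it quietly dropped.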
Conclusion
I hope this was a useful review of the 5 principles! Sometimes it’s good to get back to basics and look at them from a new perspective. I remember the first time I read these they didn’t sink in like they do now. I imagine in a few years’ time they’ll look even more different to me and I’ll probably disagree with myself on some aspect of this.
I hope it did the same for you and that you’ll reconsider how you look at them too. Becoming cost-conscious is all about continuous improvement and accountability, and that starts with our own understanding.