We built a feature that lets each client use their own branded domain, with ingress and egress IPs kept separate from those of every other client in our system.
Because we run entirely on AWS, we started with CloudFormation. Managed state, atomic operations, automatic rollback, and easy cleanup made it a convenient way to build this for our clients.
From AWS Scripts to a Language Your Infrastructure Actually Speaks
A practitioner’s walk through the mental shift from cloud-native IaC tools to building abstractions that work across every provider — and why the journey changes how you think about infrastructure forever.
Every engineer who’s worked in the cloud long enough has a moment where a bash script starts to feel like a liability. You wrote it at 2am to spin up a staging environment. It worked. Then three months later no one knows what it does, including you.
Infrastructure as Code is the answer to that moment. But IaC is not one thing — it’s a spectrum that runs from “shell scripts with a YAML config” all the way to “a full programming model that describes the desired state of the world.” Where you sit on that spectrum says a lot about how mature your infrastructure practice is, and how much pain you’re willing to absorb as you scale.
This post walks through that evolution: from the first time you reach for aws cloudformation to the point where you’re building your own abstraction layer that treats AWS, GCP, and Azure as interchangeable backends. We’ll look at the tools, yes — but more importantly, we’ll look at the mindset that each step requires.
Act I — The native phase
When you’re a startup on AWS, the path of least resistance is CloudFormation. It’s built in, it’s free, and it speaks the exact same API as everything else you’re touching. A simple EC2 instance looks like this:
# cloudformation/compute.yaml
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.medium
      ImageId: ami-0c55b159cbfafe1f0
      Tags:
        - Key: Environment
          Value: production
It works. It’s declarative. You check it in and feel like a responsible adult. Then the template grows to 2,000 lines and you start copying and pasting resource blocks between stacks. There’s no abstraction — YAML doesn’t have functions, loops, or types. You have parameters and mappings, which are the programming equivalent of duct tape.
AWS CDK solves this by letting you write actual TypeScript or Python, then synthesizing CloudFormation from it. Suddenly your infrastructure has real abstractions:
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// A reusable "web server" construct that encodes your org's defaults
class WebServer extends Construct {
  constructor(
    scope: Construct,
    id: string,
    props: { env: string; vpc: ec2.IVpc },
  ) {
    super(scope, id);
    new ec2.Instance(this, 'Instance', {
      vpc: props.vpc, // ec2.Instance requires a VPC to launch into
      instanceType: ec2.InstanceType.of(
        ec2.InstanceClass.T3,
        props.env === 'prod'
          ? ec2.InstanceSize.LARGE
          : ec2.InstanceSize.SMALL,
      ),
      machineImage: ec2.MachineImage.latestAmazonLinux2023(),
    });
  }
}
This feels like a leap forward. You have real programming constructs. You can build opinionated abstractions that encode your team’s best practices once and reuse them everywhere. The CDK mental model is powerful: describe what you want, not how to get it.
The native tool trap. The problem with CDK isn’t the tool — it’s the assumptions baked in. Every API call, every resource type, every default value is AWS-specific. The moment your company signs a GCP contract for a machine-learning workload, your entire abstraction library is useless.
Act II — The provider-agnostic awakening
Terraform was built from the start around a different premise: infrastructure is a graph of resources, and each resource belongs to a provider plugin. You write HCL, the provider translates it to API calls, and the language, workflow, and state model stay the same whichever provider you target. Resource types are still provider-specific, but everything around them carries over.
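# One configuration language and workflow, with providers declared as plugins
terraform {
  required_providers {
    aws    = { source = "hashicorp/aws" }
    google = { source = "hashicorp/google" }
  }
}

# Assumes var.instance_type, var.environment, and a data "aws_ami" "ubuntu"
# lookup are defined elsewhere in the module.
resource "aws_instance" "app" {
  instance_type = var.instance_type
  ami           = data.aws_ami.ubuntu.id
  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# To move to GCP, you swap the provider and the resource type.
# Your variable definitions stay the same.
resource "google_compute_instance" "app" {
  name         = "app"
  machine_type = "n1-standard-2"
  zone         = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  # Required by the google provider; "default" is the project's default network.
  network_interface {
    network = "default"
  }
}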
Terraform teaches you a crucial lesson: infrastructure has a shape. That shape is a directed acyclic graph of resources with dependencies. The specific API calls to materialize that graph are the provider’s problem, not yours. Once you internalize this, you start thinking about compute, storage, and networking as abstract resource kinds — not as AWS-specific services.
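To make the graph idea concrete, here is a minimal sketch of dependency ordering. It is not how Terraform itself is implemented; the `ResourceNode` shape and the `applyOrder` function are hypothetical names chosen for illustration. It shows only the core property: a resource can be created once everything it depends on exists.

```typescript
// Hypothetical resource shape: an id plus the ids it depends on.
interface ResourceNode {
  id: string;
  dependsOn: string[];
}

// Return ids in an order where every resource comes after its dependencies.
// Throws on cycles or unknown dependencies, since the graph must be acyclic.
function applyOrder(resources: ResourceNode[]): string[] {
  const byId = new Map<string, ResourceNode>();
  for (const r of resources) byId.set(r.id, r);

  const order: string[] = [];
  const done = new Set<string>();
  const inProgress = new Set<string>();

  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (inProgress.has(id)) throw new Error(`dependency cycle at ${id}`);
    const node = byId.get(id);
    if (!node) throw new Error(`unknown resource ${id}`);
    inProgress.add(id);
    for (const dep of node.dependsOn) visit(dep);
    inProgress.delete(id);
    done.add(id);
    order.push(id);
  };

  for (const r of resources) visit(r.id);
  return order;
}

const exampleOrder = applyOrder([
  { id: "instance", dependsOn: ["subnet", "security_group"] },
  { id: "subnet", dependsOn: ["network"] },
  { id: "security_group", dependsOn: ["network"] },
  { id: "network", dependsOn: [] },
]);
console.log(exampleOrder.join(" -> "));
// prints "network -> subnet -> security_group -> instance"
```

Nothing in the ordering logic mentions a cloud. That is the sense in which the graph is yours and the API calls are the provider's.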
The shift from “I’m deploying an EC2 instance” to “I’m deploying a compute resource” sounds semantic. It isn’t. It’s the difference between building on a specific cloud and building for the cloud in general.
Act III — Building your own abstraction layer
Once you’re operating at scale — multiple teams, multiple environments, multiple clouds — even Terraform starts to show its seams. The HCL language is declarative but not expressive. You can’t easily enforce org-wide policies, you can’t easily build a UI on top of it, and you can’t easily implement approval workflows for production changes.
This is where the most interesting engineering happens: building a platform layer that sits above the IaC tools. The key insight is that you need to separate three concerns that are often tangled together:
- What you want — the desired state of your infrastructure (a web server, a database, a queue)
- Where you want it — the cloud provider, region, and environment context
- How to get there — the specific API calls and IaC syntax to materialize the desired state
The platform layer uses Terraform as the execution engine — you don’t reinvent what it does. What you add is a control plane: a service that accepts abstract resource specifications, translates them to provider-specific HCL via an adapter, and then runs the IaC tool as a subprocess with proper lifecycle management.
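One way to keep the three concerns apart is to give each its own type and let a single function wire them together. The sketch below is an assumption about how that could look, not a description of any particular platform: `AbstractSpec`, `DeployTarget`, `SynthFn`, `RunFn`, and `submitChange` are all hypothetical names, and the runner is a stand-in for launching the IaC tool.

```typescript
// What: a provider-neutral description of the resource (hypothetical shape).
interface AbstractSpec {
  kind: "compute" | "database" | "queue";
  profile: string;
}

// Where: provider, region, and environment context.
type ProviderName = "aws" | "gcp" | "azure";
interface DeployTarget {
  provider: ProviderName;
  region: string;
  environment: string;
}

// How: an adapter turns (what, where) into IaC source text.
type SynthFn = (spec: AbstractSpec, target: DeployTarget) => string;

// Stand-in for running the IaC tool. A real control plane would launch the
// tool as a subprocess and do so asynchronously; it is injected and
// synchronous here only to keep the sketch small and easy to exercise.
type RunFn = (source: string) => boolean;

function submitChange(
  spec: AbstractSpec,
  target: DeployTarget,
  adapters: Partial<Record<ProviderName, SynthFn>>,
  run: RunFn,
): { applied: boolean; source: string } {
  const synth = adapters[target.provider];
  if (!synth) {
    throw new Error(`no adapter registered for ${target.provider}`);
  }
  const source = synth(spec, target);
  return { applied: run(source), source };
}

// Usage with a placeholder adapter and a runner that only records its input.
const seenSources: string[] = [];
const submitResult = submitChange(
  { kind: "compute", profile: "medium-compute" },
  { provider: "aws", region: "us-east-1", environment: "staging" },
  { aws: (s, t) => `# ${s.kind}/${s.profile} in ${t.region}` },
  (src) => {
    seenSources.push(src);
    return true;
  },
);
console.log(submitResult.source);
// prints "# compute/medium-compute in us-east-1"
```

Because the caller only ever builds a spec and a target, adding a provider means registering one more adapter, and swapping the execution engine means passing a different runner.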
The evolution at a glance
| Era | Tooling | Key strength | Breaking point |
|---|---|---|---|
| 1 — Native | CloudFormation, CDK | Zero setup, full AWS coverage | Any non-AWS requirement |
| 2 — Agnostic | Terraform, Pulumi | Multi-provider, plan before apply | HCL isn’t a real language; no built-in policy layer |
| 3 — Platform | Internal control planes | Full abstraction, approval gates, drift detection | Highest build cost; only worth it at scale |
The adapter pattern in practice
Here’s what that adapter contract looks like. Every cloud provider implements the same interface — but the code inside is completely cloud-specific:
// Every cloud provider implements this contract.
// The control plane only talks to this interface.
interface CloudAdapter {
  // Turn an abstract resource spec into provider-specific HCL
  synthesize(resource: CloudResource, ctx: SynthContext): string;
  // Validate before any HCL is generated
  validate(spec: ResourceSpec): ValidationResult;
  // Map a profile name ("medium-compute") to a real instance type
  resolveProfile(profile: string, region: string): ResolvedProfile | null;
  // Compare desired state to what's actually running
  getActualState(
    resource: CloudResource,
    creds: CloudCredentials,
  ): Promise<ResourceState>;
  // Return the minimal IAM-equivalent policy needed to operate
  getRequiredPolicy(scope: PolicyScope): string;
}
The synthesize method is where the real translation happens. The caller says “I want a compute resource with the medium profile in us-east-1.” The AWS adapter turns that into an aws_instance block targeting a t3.medium. The GCP adapter turns it into a google_compute_instance block targeting n1-standard-2. The caller never sees the difference.
Control plane (abstract ResourceSpec)
                  │
                  ▼
       CloudAdapter interface
         ┌────────┼────────┐
         ▼        ▼        ▼
        AWS     Azure     GCP
      adapter  adapter  adapter
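Here is a rough sketch of what two `synthesize` implementations might look like. It covers only that one method, and it uses simplified stand-ins (`ComputeRequest`, `ComputeSynthesizer`) for the types in the contract above; those names, the `lookupProfile` helper, the `data.aws_ami.base` reference, and the choice of zone `a` are all assumptions made for illustration.

```typescript
// Simplified stand-ins for the CloudResource and SynthContext types in the
// contract above; the field names here are assumptions.
interface ComputeRequest {
  resourceName: string;
  profile: string;
  region: string;
}

interface ComputeSynthesizer {
  synthesize(req: ComputeRequest): string;
}

// Fail loudly on unknown profiles instead of guessing a size.
function lookupProfile(table: Record<string, string>, profile: string): string {
  const value = table[profile];
  if (value === undefined) throw new Error(`unknown profile: ${profile}`);
  return value;
}

const awsCompute: ComputeSynthesizer = {
  synthesize(req) {
    const size = lookupProfile({ "medium-compute": "t3.medium" }, req.profile);
    // AWS takes its region from the provider block, so req.region is unused.
    // data.aws_ami.base is a hypothetical data source defined elsewhere.
    return [
      `resource "aws_instance" "${req.resourceName}" {`,
      `  instance_type = "${size}"`,
      `  ami           = data.aws_ami.base.id`,
      `}`,
    ].join("\n");
  },
};

const gcpCompute: ComputeSynthesizer = {
  synthesize(req) {
    const size = lookupProfile({ "medium-compute": "n1-standard-2" }, req.profile);
    // Assumes zone "a" exists in the region; a real adapter would choose one.
    return [
      `resource "google_compute_instance" "${req.resourceName}" {`,
      `  name         = "${req.resourceName}"`,
      `  machine_type = "${size}"`,
      `  zone         = "${req.region}-a"`,
      `  boot_disk {`,
      `    initialize_params {`,
      `      image = "debian-cloud/debian-11"`,
      `    }`,
      `  }`,
      `  network_interface {`,
      `    network = "default"`,
      `  }`,
      `}`,
    ].join("\n");
  },
};

const awsHcl = awsCompute.synthesize({
  resourceName: "app",
  profile: "medium-compute",
  region: "us-east-1",
});
const gcpHcl = gcpCompute.synthesize({
  resourceName: "app",
  profile: "medium-compute",
  region: "us-central1",
});
console.log(awsHcl);
console.log(gcpHcl);
```

The two requests differ only in region, yet the outputs share almost nothing. That asymmetry is the adapter's job to absorb.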
The mindset, not the tools
The tool evolution matters less than the mental models it forces you to build. Here’s what changes as you go deeper:
1 — Infrastructure is desired state, not commands
The moment you stop thinking in imperative API calls and start thinking in declarations of desired state, everything clicks. You describe a world. The engine figures out the delta from the current world and applies the minimum necessary changes to close the gap. This is why Terraform’s plan command is so powerful — it makes the delta visible before you commit to it.
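The delta idea fits in a few lines. This is a deliberately small sketch of a diff between desired and current state, not Terraform's plan algorithm; `FieldMap`, `StateMap`, `PlannedChange`, and `planChanges` are hypothetical names.

```typescript
type FieldMap = Record<string, string>;
type StateMap = Record<string, FieldMap>; // resource id -> its attributes

interface PlannedChange {
  id: string;
  action: "create" | "update" | "delete";
}

// True when both maps have exactly the same keys and values.
function sameFields(a: FieldMap, b: FieldMap): boolean {
  const keys = Object.keys(a);
  return (
    keys.length === Object.keys(b).length && keys.every((k) => a[k] === b[k])
  );
}

// Compute the smallest set of actions that turns `current` into `desired`.
function planChanges(desired: StateMap, current: StateMap): PlannedChange[] {
  const changes: PlannedChange[] = [];
  for (const id of Object.keys(desired).sort()) {
    const want = desired[id];
    const have = current[id];
    if (want === undefined) continue;
    if (have === undefined) {
      changes.push({ id, action: "create" });
    } else if (!sameFields(want, have)) {
      changes.push({ id, action: "update" });
    }
  }
  for (const id of Object.keys(current).sort()) {
    if (desired[id] === undefined) changes.push({ id, action: "delete" });
  }
  return changes;
}

const delta = planChanges(
  { web: { size: "medium" }, queue: { size: "small" } },
  { web: { size: "small" }, cache: { size: "small" } },
);
console.log(delta.map((c) => `${c.action} ${c.id}`).join(", "));
// prints "create queue, update web, delete cache"
```

Note that the function returns a list of changes and applies nothing. Keeping the plan as data is what makes it reviewable.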
2 — Profiles beat raw specs for real users
When a developer wants to deploy a web server, they don’t want to pick an instance type. They want to say “medium compute” and have the platform resolve that to the right thing for their cloud and region. Building a profile system — where medium-compute means t3.medium on AWS and n1-standard-2 on GCP — is one of the highest-leverage things you can do as a platform team. It also gives you a lever for cost optimization without touching application code.
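A profile system can start as a lookup table. The sketch below is one possible shape, assuming a catalog keyed by profile and cloud with optional per-region overrides. The `medium-compute` defaults come from the text above; the `ap-south-1` override is invented purely to show the mechanism, and `computeProfiles` and `resolveInstanceType` are hypothetical names.

```typescript
type CloudName = "aws" | "gcp";

interface ProfileEntry {
  fallback: string; // used when no regional override exists
  byRegion?: Record<string, string>; // optional per-region overrides
}

// Hypothetical catalog. The regional override is illustrative only.
const computeProfiles: Record<string, Record<CloudName, ProfileEntry>> = {
  "medium-compute": {
    aws: { fallback: "t3.medium", byRegion: { "ap-south-1": "t3a.medium" } },
    gcp: { fallback: "n1-standard-2" },
  },
};

// Resolve a profile to a concrete type, or null if the profile is unknown.
function resolveInstanceType(
  profile: string,
  cloud: CloudName,
  region: string,
): string | null {
  const perCloud = computeProfiles[profile];
  if (!perCloud) return null;
  const entry = perCloud[cloud];
  const override = entry.byRegion ? entry.byRegion[region] : undefined;
  return override !== undefined ? override : entry.fallback;
}

console.log(resolveInstanceType("medium-compute", "aws", "us-east-1")); // t3.medium
console.log(resolveInstanceType("medium-compute", "gcp", "us-central1")); // n1-standard-2
console.log(resolveInstanceType("medium-compute", "aws", "ap-south-1")); // t3a.medium
console.log(resolveInstanceType("huge-compute", "aws", "us-east-1")); // null
```

The cost lever is the catalog itself: change one entry and every consumer of that profile picks up the new size on its next deploy, with no application change.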
3 — State is the hard part
Every IaC system has a state problem. What does the world look like right now? CloudFormation stores it in the stack. Terraform stores it in a state file. Your own platform stores it in a database. The critical thing is that state is the source of truth, not the config file. Drift — where the actual world diverges from the recorded state — is the enemy, and detecting it requires actively reconciling what your state says against what the cloud API returns.
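Drift detection, reduced to its core, is a comparison between what your state recorded and what the cloud reports now. The following is a minimal sketch under that assumption. `detectDrift` and its types are hypothetical, and `fetchActual` stands in for a real cloud API call, which would be asynchronous.

```typescript
type Fields = Record<string, string>;

interface RecordedResource {
  id: string;
  fields: Fields;
}

type DriftFinding =
  | { id: string; kind: "missing" }
  | { id: string; kind: "changed"; changedFields: string[] };

// `fetchActual` returns null when the resource no longer exists.
// Only fields that the recorded state tracks are compared.
function detectDrift(
  recorded: RecordedResource[],
  fetchActual: (id: string) => Fields | null,
): DriftFinding[] {
  const findings: DriftFinding[] = [];
  for (const res of recorded) {
    const actual = fetchActual(res.id);
    if (actual === null) {
      findings.push({ id: res.id, kind: "missing" });
      continue;
    }
    const changedFields = Object.keys(res.fields)
      .filter((key) => actual[key] !== res.fields[key])
      .sort();
    if (changedFields.length > 0) {
      findings.push({ id: res.id, kind: "changed", changedFields });
    }
  }
  return findings;
}

// Pretend someone resized web-1 by hand and deleted queue-1 in the console.
const cloudNow: Record<string, Fields> = {
  "web-1": { instance_type: "t3.large", subnet: "private-a" },
};
const drift = detectDrift(
  [
    { id: "web-1", fields: { instance_type: "t3.medium", subnet: "private-a" } },
    { id: "queue-1", fields: { retention: "4d" } },
  ],
  (id) => cloudNow[id] ?? null,
);
console.log(JSON.stringify(drift));
```

Running this on a schedule and alerting on non-empty results is usually enough to catch manual console changes before the next apply trips over them.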
4 — Async by default, approval gates for production
Long-running infrastructure changes should never block an HTTP request. The right model is: accept a change, create an operation record, enqueue a job, stream events back to the client as the job runs. And for production environments, insert an approval step before the apply — a human should always have a chance to review a Terraform plan before it touches prod.
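The operation record in that model is essentially a small state machine. Here is a sketch of one, assuming a handful of states and events that are not taken from any specific system: `OpState`, `OpEvent`, `ChangeOperation`, and `advance` are hypothetical names, and the queue and event streaming are left out so the approval rule stays visible.

```typescript
type OpState =
  | "queued"
  | "awaiting_approval"
  | "applying"
  | "succeeded"
  | "failed";

type OpEvent =
  | "plan_ready"
  | "approved"
  | "rejected"
  | "apply_ok"
  | "apply_failed";

interface ChangeOperation {
  id: string;
  environment: string;
  state: OpState;
}

// Pure transition function: returns the next record or throws if the event
// is not allowed in the current state. Production always stops for approval.
function advance(op: ChangeOperation, ev: OpEvent): ChangeOperation {
  const needsApproval = op.environment === "production";
  switch (op.state) {
    case "queued":
      if (ev === "plan_ready") {
        return { ...op, state: needsApproval ? "awaiting_approval" : "applying" };
      }
      break;
    case "awaiting_approval":
      if (ev === "approved") return { ...op, state: "applying" };
      if (ev === "rejected") return { ...op, state: "failed" };
      break;
    case "applying":
      if (ev === "apply_ok") return { ...op, state: "succeeded" };
      if (ev === "apply_failed") return { ...op, state: "failed" };
      break;
  }
  throw new Error(`event ${ev} is not valid in state ${op.state}`);
}

let prodOp: ChangeOperation = {
  id: "op-1",
  environment: "production",
  state: "queued",
};
prodOp = advance(prodOp, "plan_ready"); // now waiting for a human
prodOp = advance(prodOp, "approved");
prodOp = advance(prodOp, "apply_ok");
console.log(prodOp.state);
// prints "succeeded"
```

Because `advance` rejects invalid transitions, there is no code path from `queued` to `applying` in production that skips the approval state. The HTTP handler only has to create the record and return its id.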
Beware of premature abstraction. If you’re a 10-person team on AWS, you don’t need a cross-cloud control plane. CloudFormation or CDK will serve you well for years. The platform layer earns its keep when you have multiple teams, multiple clouds, and compliance requirements around who can deploy what where. Build it when the pain of not having it is measurable.
Where does this leave us?
The arc from CloudFormation to a full control plane is really a story about abstractions becoming more honest. CloudFormation is honest about being AWS-specific. CDK adds expressiveness but stays honest about the same constraint. Terraform is honest about provider diversity but hands you HCL instead of a real programming model. A control plane is honest about the full complexity — multi-cloud, multi-team, approval workflows, drift detection — while hiding the details behind a well-designed interface.
The next time you’re evaluating an IaC tool, don’t just ask “does it support my cloud?” Ask: what does this tool think infrastructure is? What’s its mental model of resources, state, and change? The best tool is the one whose mental model matches where you’re headed — not just where you are today.
The patterns in this post come from building production IaC systems at scale — from early CloudFormation templates to custom control planes that synthesize Terraform HCL from abstract resource specifications across AWS, GCP, and Azure. The journey changes how you think about infrastructure, and that’s the real point.