November 1, 2017

Yet Another Explanation of What is Cloud Native

Cloud Native software gets its configuration dynamically. A plain-language explanation of IaaS, PaaS, CaaS, microservices, and what "cloud native" actually means for reliability.

Cloud Native Software is software that gets its configuration information dynamically.

TL;DR — Configuration is derived at runtime, not baked in at deploy time. The more dynamic the configuration, the more cloud native.


Background: Release Management vs Configuration Management

Release Management is what we've been doing with software since the earliest days of computing: naming and managing a collection of versioned files. Release management systems often pull directly from a hosting service such as GitHub, using a git SHA or a collection of SHAs to identify exactly which files make up a release. Naming those collections is where version numbers like 1.0, 1.0.1, 1.0.2 come from. In practice, when an operating system or application is installed, it brings a fixed set of files with it.

Configuration Management is distinct. Configuration is the information that makes a given release of software specific to a particular deployment. The most common examples are the IP addresses and passwords entered when an OS or application is first installed. Another example is an email client configured with your address and password. Many people run the same version of the same email app, but each person's configuration is unique.

Cloud Native Computing is the art of deriving as much of that configuration information as possible dynamically: from other systems, at runtime, rather than from static files. Static configuration text in a file, git repo, or database makes software less cloud native. Calling an IaaS platform's API and getting back a reference to a fresh VM is more cloud native. Wrapping that call in something that composes a fault-resistant pattern of hosts is more cloud native still.
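As a rough illustration of that last step, the sketch below wraps a VM-creation call in a helper that spreads hosts across zones. Everything here is an assumption made for illustration: FakeIaaS is an in-memory stand-in rather than a real cloud SDK, and the names create_vm, provision_pool, and the zone labels are hypothetical. The point is only that the addresses are learned at runtime instead of being written into a file.

```python
import itertools
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VM:
    vm_id: str
    zone: str
    address: str


class FakeIaaS:
    """In-memory stand-in for an IaaS API (hypothetical, not a real SDK)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def create_vm(self, zone: str) -> VM:
        # A real platform would allocate the address; the caller never picks it.
        n = next(self._counter)
        return VM(vm_id=f"vm-{n}", zone=zone, address=f"10.0.{n}.1")


def provision_pool(api: FakeIaaS, zones: Sequence[str], size: int) -> List[VM]:
    """Create `size` VMs, assigned round-robin across `zones`.

    Spreading hosts over zones means losing one zone still leaves
    hosts running in the others.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    return [api.create_vm(zones[i % len(zones)]) for i in range(size)]


if __name__ == "__main__":
    for vm in provision_pool(FakeIaaS(), ["zone-a", "zone-b", "zone-c"], 5):
        print(vm.vm_id, vm.zone, vm.address)
```

Nothing in the calling code names a host or an IP address ahead of time; both come back from the API call.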


The Stack, Defined

IaaS — Infrastructure as a Service — Call an API, get a VM. AWS, GCP, Azure. The unit of work is a virtual machine.

PaaS — Platform as a Service — The services your application needs are already running and exposed as APIs. Databases, queues, DNS. If the underlying service is implemented cloud-natively with many interchangeable workers, it's fault-tolerant. If it's MySQL on a single host, it's just hosted infrastructure with a different name.

CaaS — Container as a Service — Containers replaced VMs as the unit of work. Apache Mesos and Cloud Foundry announced Docker support and effectively turned PaaS into CaaS. The unit of work shrank; the model is otherwise similar.

FaaS — Function as a Service (Serverless) — The extreme case: a unit of work with no IP address or URL at deploy time. OpenWhisk and similar systems handle addressing and scaling transparently. The classic "hello world" assumed it was running on a known host with a name. That assumption fails in the cloud, where hosts come and go, but FaaS sidesteps the problem entirely because the function never needs to know where it runs.
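A minimal sketch of what such a unit of work can look like follows. It uses the main(args) entry point that takes and returns a dictionary, which is the shape OpenWhisk uses for Python actions; the parameter name "name" and the greeting text are assumptions for illustration.

```python
from typing import Any, Dict


def main(args: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point the platform invokes with a dictionary of parameters.

    The function contains no host name, port, or URL: the platform
    decides where it runs and how it is reached.
    """
    name = args.get("name", "world")
    return {"greeting": f"Hello, {name}!"}


if __name__ == "__main__":
    # Local invocation for illustration; on a FaaS platform the runtime calls main().
    print(main({"name": "cloud"}))
```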


Managing the Chaos: Decomposing Workloads

When your application needs more than one microservice, you need a discipline for managing complexity.

Intel's Tick-Tock model is still instructive. Intel managed the complexity of processor development by limiting each generation to either a manufacturing process shrink (the "tick") or a new microarchitecture (the "tock"), never both. The benefits: engineers had clarity of focus, two separate teams could optimize independently, and when something broke there was only one kind of change to suspect.

Applied to cloud-native systems: change the architecture separately from changing the scale of deployment. Don't refactor services and resize clusters in the same release. The error space is easier to reason about when you constrain what can change at once.
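One way to enforce that rule is a pre-release check, sketched below. The Release structure, its two fields, and the check_release function are hypothetical names invented for this example; a real pipeline would derive the same information from its own manifests.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Release:
    # Which version of each service is deployed (the "architecture").
    service_versions: Dict[str, str]
    # How many replicas of each service run (the "scale").
    replica_counts: Dict[str, int]


def change_kinds(old: Release, new: Release) -> Set[str]:
    """Return which kinds of change separate two releases."""
    kinds: Set[str] = set()
    if old.service_versions != new.service_versions:
        kinds.add("architecture")
    if old.replica_counts != new.replica_counts:
        kinds.add("scale")
    return kinds


def check_release(old: Release, new: Release) -> Set[str]:
    """Reject a release that changes architecture and scale together."""
    kinds = change_kinds(old, new)
    if len(kinds) > 1:
        raise ValueError("release changes both architecture and scale; split it in two")
    return kinds


if __name__ == "__main__":
    current = Release({"api": "1.0"}, {"api": 3})
    print(check_release(current, Release({"api": "1.1"}, {"api": 3})))
```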

The 12-Factor App methodology (Heroku, 2011) remains a practical checklist: configuration from environment variables, stateless processes, attached backing services, logs as streams. Each factor makes the application more amenable to dynamic configuration and therefore more cloud native.
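The first item on that checklist, configuration from environment variables, can be sketched as follows. The variable names DATABASE_URL, PORT, and LOG_LEVEL and their defaults are assumptions chosen for illustration, not names the methodology prescribes.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment at process start.

    DATABASE_URL is required (a KeyError is raised if it is missing);
    PORT and LOG_LEVEL fall back to defaults.
    """
    return Settings(
        database_url=env["DATABASE_URL"],
        port=int(env.get("PORT", "8080")),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


if __name__ == "__main__":
    # Passing a plain dict keeps the example runnable without exporting variables.
    print(load_settings({"DATABASE_URL": "postgres://db.internal/app"}))
```

Because the same release reads different values in each environment, nothing deployment-specific has to be checked into the repository.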


Why It Matters for Reliability

A cloud-native application that gets its configuration dynamically can be moved to a new host without downtime. If a host needs a kernel patch, route traffic to instances on other hosts, patch the host, then route traffic back. The application didn't go down; it moved.
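The drain, patch, restore sequence can be sketched with a toy in-memory router. The Router class, its methods, and the host names are hypothetical; a real deployment would call its load balancer's or orchestrator's API instead.

```python
from typing import Callable, List, Sequence, Set


class Router:
    """Toy router that tracks which hosts may receive traffic."""

    def __init__(self, hosts: Sequence[str]) -> None:
        self._hosts: List[str] = list(hosts)
        self._active: Set[str] = set(hosts)

    def drain(self, host: str) -> None:
        # Refuse to drain the only active host: that would be an outage.
        if self._active == {host}:
            raise RuntimeError("cannot drain the last active host")
        self._active.discard(host)

    def restore(self, host: str) -> None:
        self._active.add(host)

    def route(self, request_id: int) -> str:
        """Pick an active host for a request (simple modulo spread)."""
        candidates = [h for h in self._hosts if h in self._active]
        return candidates[request_id % len(candidates)]


def rolling_patch(router: Router, hosts: Sequence[str], patch: Callable[[str], None]) -> None:
    """Patch hosts one at a time while the remaining hosts keep serving."""
    for host in hosts:
        router.drain(host)    # route traffic away
        patch(host)           # patch the host
        router.restore(host)  # route traffic back


if __name__ == "__main__":
    router = Router(["host-a", "host-b"])
    rolling_patch(router, ["host-a", "host-b"], lambda h: print("patching", h))
```

With a single host the drain step raises, which is the sketch's way of showing that interchangeable hosts are what make the zero-downtime patch possible.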

This is the cloud dream in practice: routine operations (patches, migrations, upgrades) cause little to no downtime because the system was designed from the start to treat hosts as interchangeable.

The failure modes of non-cloud-native software — single hosts, baked-in IP addresses, static configuration files checked into repos — are not bugs. They are design choices made before the cloud existed. Cloud-native is the discipline of unmaking those choices systematically.