Configuration Drift, defined
In the realm of Infrastructure as Code (IaC), Configuration Drift happens when the state of your actual infrastructure becomes different from the defined or stored state that you created in code.
This article focuses on how to detect and handle configuration drift within a Terraform-based ecosystem; however, many of the principles applied are suitable for use with other IaC set-ups.
The three types of configuration drift
There are three types of configuration drift you will encounter when working with Terraform:
- Emergent Drift – Changes made outside of the Terraform ecosystem to infrastructure already managed by Terraform, and recorded in Terraform state.
- Bogus Drift – “Changes” in the plan/apply cycle due to list ordering and other provider idiosyncrasies.
- Introduced Drift – New infrastructure created outside of the Terraform life-cycle.
Of these three types, Emergent and Bogus Drift are the easiest to detect, since running terraform plan against your infrastructure will highlight their presence. Detecting Introduced Drift is more complex: Terraform has no record of this new infrastructure, so you need to conduct an external review of your system against the IaC definition. You could argue that the creation of new infrastructure technically isn’t configuration drift; however, this article covers the case where your IaC definition models a complex system and you need to bring that “new” infrastructure under the management of your current project.
A quick review of terraform plan
Before we get into the details, let’s cover the basics of reading the output of a terraform plan operation. If you’re a Terraform guru, please feel free to skip to the next section.
The output of a terraform plan run will show you the possible changes that Terraform intends to apply:
- Add – Terraform intends to create a new resource, indicated by a (+) in the output
- Remove – Terraform intends to remove something, indicated by a (-) in the output
- Update – Terraform intends to update an item in-place, indicated by a (~) in the output
- Recreate – Terraform intends to remove and recreate an item, indicated by a (-/+) in the output
- Nothing – Terraform hasn’t detected the need for any change, life is good.
This is a simple refresher on the plan command output. For greater detail, visit the Terraform Documentation.
BEFORE PROCEEDING: Take the time to understand the impact that each of these various kinds of changes will have on your infrastructure before you run terraform apply. Even an update-in-place operation has the potential to cause an interruption of service, depending on the provider.
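The change types above also appear in Terraform's machine-readable plan output (terraform show -json on a saved plan file), where each entry under resource_changes carries an actions list. The following is a minimal sketch, not a definitive tool, that groups resource addresses by the symbol used in the human-readable plan. The function name summarize_plan and the example_thing addresses are hypothetical and exist only for illustration.

```python
import json

# Map the "actions" list from the JSON plan representation to the
# symbols used in the human-readable plan output.
ACTION_SYMBOLS = {
    ("create",): "+",
    ("delete",): "-",
    ("update",): "~",
    ("delete", "create"): "-/+",
    ("create", "delete"): "+/-",
    ("read",): "<=",
    ("no-op",): "",
}


def summarize_plan(plan_json: str) -> dict[str, list[str]]:
    """Group resource addresses by plan symbol, skipping no-op entries."""
    plan = json.loads(plan_json)
    summary: dict[str, list[str]] = {}
    for rc in plan.get("resource_changes", []):
        actions = tuple(rc["change"]["actions"])
        symbol = ACTION_SYMBOLS.get(actions, "?")  # "?" marks an unrecognized combination
        if symbol:
            summary.setdefault(symbol, []).append(rc["address"])
    return summary


# Hypothetical sample data shaped like the "resource_changes" section.
SAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "example_thing.a", "change": {"actions": ["create"]}},
        {"address": "example_thing.b", "change": {"actions": ["update"]}},
        {"address": "example_thing.c", "change": {"actions": ["delete", "create"]}},
        {"address": "example_thing.d", "change": {"actions": ["no-op"]}},
    ]
})

if __name__ == "__main__":
    for symbol, addresses in summarize_plan(SAMPLE_PLAN).items():
        print(symbol, addresses)
```

A summary like this makes it easier to spot recreate operations (-/+) before anyone runs terraform apply.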
Detecting and handling Emergent Drift
Emergent Drift happens when infrastructure managed by Terraform is modified outside of the Terraform life-cycle. This type of configuration drift is easy to detect and remediate – it will show up in the output of a regularly run terraform plan. One caveat to this is Bogus Drift, detailed below.
Once you’ve identified Emergent Drift, the next step is to decide what to do about it. Go about this by asking the relevant questions, and then implement the solution. Where did the configuration drift come from? Does the Terraform code represent what it should look like, or does the Terraform need to be updated to reflect the infrastructure?
If the infrastructure represents the “correct” state – update your Terraform code to match and validate with terraform plan.
If the Terraform code represents the “correct” state – run terraform apply to true-up the infrastructure.
In both cases, conduct a brief review to understand how the configuration drift happened, then prioritize and address any findings moving forward.
- Run terraform plan and identify Emergent Drift (check for Bogus Drift)
- Decide what to do
- Based on that decision, change either:
  - The Terraform code – validate with terraform plan
  - The infrastructure – execute terraform apply, followed by terraform plan
- Follow up with a process/security review as necessary
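One way to make the first step repeatable is to run terraform plan with the -detailed-exitcode flag on a schedule. With that flag, Terraform exits with 0 when the plan is empty, 2 when changes are pending, and 1 on error. The sketch below assumes terraform is on the PATH and that the working directory is already initialized. The function names are illustrative, not part of any Terraform tooling.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "clean"    # plan succeeded, no changes pending
    if code == 2:
        return "changes"  # plan succeeded, changes pending (possible drift)
    return "error"        # plan failed


def check_for_changes(workdir: str) -> str:
    """Run a plan in `workdir` and report whether changes are pending.

    Assumes `terraform init` has already been run in that directory.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

A "changes" result only says the plan is not empty. It could be Emergent Drift, Bogus Drift, or code that has not been applied yet, so a person still needs to read the plan before deciding what to do.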
Detecting and handling Bogus Drift
Bogus Drift happens due to certain idiosyncrasies within Terraform and Terraform providers. It will initially appear to be Emergent Drift, but it’s actually a mismatch between how items are declared in your Terraform code and how the result is recorded in the state file.
As an example, let’s say you’ve got a list of permissions within your Terraform code defined as:
users = ["user-a", "user-b", "user-1"]
You run terraform apply, and everything looks great! Of course you immediately run terraform plan after the terraform apply – because we’re paranoid battle-hardened operators and we’ve learned to double-check our work – and it wants to update the user list… WHAT?
Here is what’s happening: the Terraform provider responsible for that particular item has recorded the list in the terraform state in a slightly different order:
users = ["user-1", "user-a", "user-b"]
When a terraform plan operation executes the comparison, it treats the list as ordered, so the two lists register as different even though they contain the same members. Now you’re stuck in a never-ending update loop, because no matter how many times you successfully run terraform apply, this comparison will still fail.
To resolve this, review your state file with terraform state pull and change the ordering of that item in your Terraform sources to match how Terraform state “thinks” it should look.
Rerun terraform plan and the Bogus Drift should be resolved.
- Run terraform apply
- Immediately execute terraform plan and find configuration drift
- Review output of terraform state pull
- Adjust your code to match state, then rerun terraform plan to verify the configuration drift is resolved
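When reviewing the state against your code, it can help to confirm that a reported difference really is ordering only before you reorder anything. The helper below is a small illustrative sketch (the function name is hypothetical) that compares two lists as multisets, using the user list from the example above.

```python
from collections import Counter


def is_ordering_only_diff(declared: list[str], recorded: list[str]) -> bool:
    """True when both lists hold the same members but in a different order."""
    return declared != recorded and Counter(declared) == Counter(recorded)


# Values taken from the example above: code order vs. state order.
declared = ["user-a", "user-b", "user-1"]
recorded = ["user-1", "user-a", "user-b"]

if __name__ == "__main__":
    print(is_ordering_only_diff(declared, recorded))
```

If the check returns False, the lists differ in membership, and you are more likely looking at Emergent Drift than Bogus Drift.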
Detecting and handling Introduced Drift
Introduced Drift happens when new infrastructure is introduced outside the Terraform life-cycle. This is the most difficult type of configuration drift to detect and handle, as Terraform simply doesn’t know that the infrastructure exists yet.
Sometimes you’ll find Introduced Drift during a destroy operation, when Terraform attempts to destroy a resource and it flat out fails due to a dependency check in the upstream provider. Other times you’ll be looking at a specific system and notice a previously unrecorded bit of infra that seems to have auto-magically appeared overnight (this often happens when two or more teams are working within the same system and aren’t communicating as well as they could be).
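A more deliberate way to find Introduced Drift is to compare an inventory of live resource identifiers against what Terraform has recorded. The sketch below reads the JSON produced by terraform show -json (the state representation) and reports identifiers that are live but not in state. It assumes each managed resource exposes an id attribute, which is common but not guaranteed for every provider, and that you obtain live_ids from your own inventory source. The function names and sample values are hypothetical.

```python
def managed_ids(state: dict) -> set[str]:
    """Collect the `id` attribute of every managed resource in the state JSON."""
    ids: set[str] = set()

    def walk(module: dict) -> None:
        for res in module.get("resources", []):
            if res.get("mode") == "managed":  # skip data sources
                rid = (res.get("values") or {}).get("id")
                if rid is not None:
                    ids.add(rid)
        for child in module.get("child_modules", []):
            walk(child)  # resources inside modules are nested

    walk((state.get("values") or {}).get("root_module", {}))
    return ids


def unmanaged(live_ids: list[str], state: dict) -> list[str]:
    """Identifiers that exist in the live system but not in Terraform state."""
    return sorted(set(live_ids) - managed_ids(state))


# Hypothetical sample shaped like `terraform show -json` output.
SAMPLE_STATE = {
    "values": {
        "root_module": {
            "resources": [
                {"address": "example_thing.a", "mode": "managed", "values": {"id": "id-001"}},
            ],
            "child_modules": [
                {"resources": [
                    {"address": "module.m.example_thing.b", "mode": "managed", "values": {"id": "id-002"}},
                ]},
            ],
        }
    }
}

if __name__ == "__main__":
    print(unmanaged(["id-001", "id-002", "id-003"], SAMPLE_STATE))
```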
Either way, you’re now in the position where you need to make a decision on what to do. In some instances the answer will be “Do nothing.” – depending on your specific system, setup, and environment. The rest of this section assumes that “Do nothing.” is the wrong answer, and we now need to get this new infrastructure managed within our existing Terraform project.
Your first step should be to review your Terraform provider for this particular resource and determine if it supports terraform import. If it does, keep on reading! If not, we’re going to have to treat this like a new infrastructure addition, including the normal shell-game while recreating it to avoid an outage. Ideally you already know how to do that, so we don’t cover it in depth here.
Next, we model the resource within Terraform like we would any other piece of new infrastructure. The key is to ensure that you match the identifier your provider uses with that used by the live system so they will line up during the import process – in practice this is typically the “name” parameter on the resource.
Then, run a terraform import on the newly modeled resource to record it into the Terraform state.
Finally run terraform plan against the newly modified state to verify our model against the imported resource, and adjust our model to match any notable differences in the imported state until the plan is “clean”. There is a non-zero chance that you will encounter Bogus Drift (see above) during this process, so make sure you understand how to deal with that as well.
At this point your “new” infrastructure is managed and you can make changes to it as you normally would in your Terraform ecosystem.
- Review your infrastructure, detect Introduced Drift
- Decide what to do – maybe nothing
- Understand the resource provider
- Model the new resource within Terraform
- Execute terraform import for the new resources
- Execute terraform plan to validate the import, adjust local Terraform as required.
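The last two steps can be sketched as a short command sequence. terraform import takes a resource address followed by the provider-specific identifier of the live resource. The helper below only builds the commands so they can be reviewed before anything runs. The function name, the address example_bucket.logs, and the identifier logs-bucket are all hypothetical.

```python
import shlex


def import_workflow(pairs: list[tuple[str, str]]) -> list[list[str]]:
    """Build one `terraform import` per (address, id) pair, then a verifying plan."""
    commands = [["terraform", "import", address, resource_id]
                for address, resource_id in pairs]
    # Exit code 0 from this plan means the model matches the imported resource.
    commands.append(["terraform", "plan", "-detailed-exitcode"])
    return commands


if __name__ == "__main__":
    for cmd in import_workflow([("example_bucket.logs", "logs-bucket")]):
        print(shlex.join(cmd))
```

The resource must already be modeled in your Terraform code at the given address before the import runs. If the closing plan is not clean, adjust the model and plan again.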