Who Changed Our Cloud Environment? Identifying Root Causes of Terraform Drift

May 24, 2023 | AWS, Terraform


Best Practice: Terraform as the Single Source of Cloud Truth

Best-practice usage of Terraform includes having it serve as the single source of truth for what is in your organization’s cloud environment. This means that all cloud resources are accurately captured within Terraform configuration, and all changes to the cloud (resource edits, resource deletions, and new resource creations) are driven by a Terraform workflow.

The easiest way to enforce this pattern is to grant edit access to the cloud only to your organization’s Terraform workflow. Unfortunately, for organizations transitioning to Terraform, or in various stages of Terraform maturity, this is too blunt an option to seriously consider. As a result, sooner or later, most organizations will experience drift and its downsides.
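To make the "Terraform-only edit access" idea concrete, here is a minimal sketch of what such a guardrail could look like on AWS, written as a Python script that builds an IAM-style policy document. The role ARN, the account ID, and the short list of read-only actions are all hypothetical placeholders, and a real policy would need a far more complete allow-list; treat this as an illustration of the pattern rather than a policy to deploy as-is.

```python
import json

# Hypothetical ARN of the role assumed by the Terraform pipeline.
TERRAFORM_ROLE_ARN = "arn:aws:iam::123456789012:role/terraform-ci"

# Illustrative and deliberately incomplete list of read-only actions
# that remain available to every principal.
READ_ONLY_ACTIONS = ["ec2:Describe*", "s3:Get*", "s3:List*"]


def terraform_only_write_policy(terraform_role_arn, read_only_actions):
    """Build a policy that denies every action outside the read-only list
    unless the caller is the Terraform role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyWritesOutsideTerraform",
                "Effect": "Deny",
                # NotAction: the deny applies to everything except these.
                "NotAction": list(read_only_actions),
                "Resource": "*",
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": terraform_role_arn}
                },
            }
        ],
    }


policy = terraform_only_write_policy(TERRAFORM_ROLE_ARN, READ_ONLY_ACTIONS)
print(json.dumps(policy, indent=2))
```

The bluntness is visible in the sketch itself: anything not explicitly listed as read-only is denied to every human and service account, which is why teams still maturing their Terraform usage rarely start here.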

Drift Happens

Drift can happen in a variety of ways, including but not limited to the following:

  • A cloud engineer makes a hot-fix in production to resolve a bug and forgets to update the corresponding Terraform code.
  • A DevOps engineer is much more comfortable with CLI tools and creates, changes, or deletes cloud resources on the command line.
  • A full stack engineer with minimal working knowledge of Terraform creates resources in their dev environment to prototype new application logic.

In any of these cases, an organization moving towards the principle of “Terraform as the Source of Truth” would need to take the following actions:

  1. Identify that drift has occurred. Ideally this is done proactively, and not when running terraform plan or terraform apply (e.g. when your engineering team is actively trying to deploy new resources and is now blocked because drift has occurred). Furthermore, how will you identify resources that have been created wholly outside of the Terraform workflow?
  2. Determine who or what changed the resource. Which service account or user caused the drift to occur? Did they have a good reason to do so? For most teams, this would require manually going through cloud admin logs to identify when a resource instance was changed, created, or deleted. If that sounds like a lot of work that is likely never to be done, that’s because for the vast majority of engineering teams, it is.
  3. Educate and lock down cloud permissions to prevent drift recurrence. Unfortunately, since step 2) requires so much time and is rarely completed, this step cannot happen and drift is likely to recur.
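To give a feel for the manual work behind steps 1) and 2), here is a minimal Python sketch, not dragondrop's implementation. For step 1 it relies on terraform plan's -detailed-exitcode flag (0 means no changes, 1 means error, 2 means changes present). For step 2 it assumes you have already exported AWS CloudTrail event records as dictionaries, and it assumes, as a heuristic, that calls made through Terraform carry "Terraform" in the record's userAgent field. The function names and sample records are hypothetical.

```python
import subprocess


def plan_exit_code(workdir):
    """Run `terraform plan -detailed-exitcode` in workdir; return its exit code."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return result.returncode


def describe_plan_result(exit_code):
    """Translate the -detailed-exitcode result into a short label."""
    if exit_code == 0:
        return "in sync"
    if exit_code == 2:
        # Either drift or configuration changes that were never applied.
        return "differences found"
    return "plan failed"


def non_terraform_writes(records):
    """Return sorted (time, actor ARN, event name) tuples for write events
    whose user agent does not mention Terraform (heuristic assumption)."""
    found = []
    for record in records:
        if record.get("readOnly", False):
            continue  # skip Describe/List/Get style calls
        if "terraform" in record.get("userAgent", "").lower():
            continue  # change came through the Terraform workflow
        actor = record.get("userIdentity", {}).get("arn", "unknown")
        found.append((record.get("eventTime", ""), actor, record.get("eventName", "")))
    return sorted(found)


# Hypothetical sample records, trimmed to the fields used above.
SAMPLE_RECORDS = [
    {"eventTime": "2023-05-20T14:02:11Z", "eventName": "AuthorizeSecurityGroupIngress",
     "readOnly": False, "userAgent": "console.amazonaws.com",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"}},
    {"eventTime": "2023-05-20T13:00:00Z", "eventName": "RunInstances",
     "readOnly": False, "userAgent": "APN/1.0 HashiCorp/1.0 Terraform/1.4.6",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:role/terraform-ci"}},
    {"eventTime": "2023-05-20T15:30:00Z", "eventName": "DescribeInstances",
     "readOnly": True, "userAgent": "aws-cli/2.11.0",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/bob"}},
]

suspects = non_terraform_writes(SAMPLE_RECORDS)
```

Note the limits of this approach: the plan only covers resources Terraform already manages, so resources created entirely outside the workflow stay invisible, and the log filtering has to be repeated per resource and per account.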

Solution: dragondrop.cloud’s Cloud Actor Identification

dragondrop’s State of Cloud Report includes information on the root causes of drift for both resources already managed by Terraform and those completely outside of Terraform control. This means when drift occurs within your organization’s cloud, you can go right to step 3), and right to improving your organization’s cloud management posture.

You can watch a video of this in action with AWS here.

Conclusion

Drift happens, but the process needed to prevent it from regularly recurring is usually very manual. This process involves so much manual toil that it often simply does not happen, leaving organizations saddled with drift and unable to fully adopt best practices with their Terraform usage.

This is why we created dragondrop — to help automate the toil often required for organizations to adopt and maintain Terraform best practices. dragondrop will answer your drift question of “Who changed our cloud environment?”, so that you can ensure it does not happen again.

dragondrop.cloud’s mission is to automate developer best practices while working with Infrastructure as Code. Our flagship product regularly scans and identifies resource changes that have occurred outside of a Terraform workflow (i.e. drift) so that dev teams can have a cloud environment that is fully represented as code. All of our tools are self-hosted by our customers, with no data ever leaving their servers. To learn more, schedule a demo or get started today!
