DevOps: Flexible Config Management? Not so fast!
[Published on DZONE at http://java.dzone.com/articles/devops-flexible-configuration]
You’re in charge of establishing a department-wide deployment automation capability. Your fellow developers are excited about it, and their managers are too. There is no shortage of ideas on how it might work:
- “Let us create our own workflows!”
- “We should be able to configure our own servers.”
- “It should be able to deploy from Nexus, Artifactory, S3, or whatever we choose.”
- “We can finally use the app versioning scheme my team likes.”
- “My team should get to do parallel installs if we want.”
- “We should have open APIs so anybody can execute their own deployment solution.”
- “Each team should be able to configure the middleware for their application’s needs.”
Developers hate being told how to do things, so there is a general consensus that if you can make this deployment tool as flexible as possible, you’ll be able to build the best deployment automation system the world has ever seen.
Sounds great, except that it’s totally wrong.
Flexibility kills quality
Configuration “drift” is evil. Drift causes downtime and rollbacks. Flexibility creates drift.
I’ve been involved in data center migration projects where almost every server in a production farm was configured differently. It’s amazing the application even worked! On many other occasions we rolled back code because the QA and Prod configurations were so different that our testing failed to uncover critical bugs. Although these environments sound ridiculous, I’m confident they describe a common scenario across enterprises. And we had talented systems administrators managing the environments; unfortunately, each one had the flexibility to manage the systems to their liking.
Our initial investment in deployment automation (and what initiated our devops strategy) was largely driven by a need to eliminate drift and increase availability. We knew automated deployments should be driven by data, and server instance data would be sourced from a CMDB. However, we quickly realized that our CMDB schema allowed for configuration drift. This led to one of our first devops principles: Don’t manage problems that you can eliminate.
Eliminate drift with inflexible schema data. Tools from operations teams tend to be server or device centric, and we wanted our deployment automation to be app and farm centric. In other words, we wanted to deploy apps to a farm entity, where the server instances are attributes of the farm. However, we found that the traditional schema for configuration data was very flexible. The diagram below shows a typical farm with multiple instances, where each instance has its own OS version. Since the OS version can be selected independently for each instance, the schema is able to represent drift across the farm. While architecting our app deployment CMDB (interestingly named deathBURRITO), we specifically did not want to manage farm configuration consistency. We simply wanted a guarantee that our farm deployments did not have drift.
To achieve this, we made a simple change to the schema that prevents the data from representing farm drift (picture below). You can still record a farm attribute incorrectly, but the mistake then applies to the whole farm rather than to part of it: the data-driven deployment is either 100% right or 100% wrong.
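The schema idea can be sketched in a few lines of Python. This is only an illustration, not the actual deathBURRITO schema: the class names, the hostnames, the farm name and the OS version string are all made up, and it assumes the change amounts to storing the OS version once on the farm instead of on each instance.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instance:
    """A server instance. Hypothetical model: it has no OS version of its own."""
    hostname: str


@dataclass(frozen=True)
class Farm:
    """A farm owns the OS version; instances are attributes of the farm."""
    name: str
    os_version: str
    instances: Tuple[Instance, ...]


def deployment_targets(farm: Farm) -> List[Tuple[str, str]]:
    """Expand a farm into (hostname, os_version) pairs for a deployment run."""
    return [(inst.hostname, farm.os_version) for inst in farm.instances]


# Illustrative data only; none of these names come from a real environment.
farm = Farm(
    name="orders-prod",
    os_version="RHEL 6.4",
    instances=(Instance("orders01"), Instance("orders02"), Instance("orders03")),
)
targets = deployment_targets(farm)
```

Because the OS version lives in exactly one place, there is no way to write down a farm whose instances disagree. If the value is wrong, it is wrong for every instance at once.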
Gratuitous flexibility and useful flexibility
Eliminating schema flexibility to control drift is not that controversial, since most people get it and support it. But when you start limiting personal preference, man, look out: people get really passionate over stupid things. So we started communicating another one of our devops principles: flexibility is not always a good thing.
Your deployment automation should start with inflexibility and provide flexibility as needed. Don’t get me wrong, we absolutely support innovation and the ability to empower our department with tools that enable creativity. I often confuse the hell out of people by saying weird stuff like, “by limiting your flexibility, I can offer you more flexibility.” And I actually mean it — because we focus on the flexibility that is actually valuable. The objective is to distinguish between value-added flexibility and gratuitous flexibility, and eliminate the gratuitous junk.
- Value-added flexibility can be represented by a middleware choice between Tomcat, JBoss and Glassfish. Each option provides different features to the development team, and they should be able to choose the best match (within reason) for their application’s requirements. Easy enough: the options have real value.
- Gratuitous flexibility can be represented by allowing multiple install directory variations for each Tomcat app. SysAdmins usually have a preference, and sometimes a very passionate one. The configuration does matter, but it should support automation and security, not personal preference. There is no inherent value in allowing your environment to have different install directories such as /opt, /app or /u01. In fact, allowing options creates complexity for install scripts, logging, permissions, service accounts, monitoring, etc. Pick one and restrict the rest.
One of the great things about automation is the ability to make the deployment platform deliver what you want, and fail what you don’t want. It’s a platform that gives the devops team enforcement power that is rarely available in the IT department. Like most organizations, you probably have many awesome design standards that are drafted, but in effect are just glorious shelfware documents. Automation gives you the power to eliminate drift, control flexibility and operationalize the shelfware designs.
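Here is a rough sketch of what "deliver what you want, and fail what you don’t want" might look like in a deployment request check. Everything in it is hypothetical: the function name, the request fields, and the choice of /opt as the single install root are assumptions for illustration, not a description of our tooling.

```python
# Value-added choice: teams may pick any of these (the options named above).
ALLOWED_MIDDLEWARE = {"tomcat", "jboss", "glassfish"}

# Gratuitous choice removed: one install root for everybody.
# Assumption: /opt is the directory that was picked.
INSTALL_ROOT = "/opt"


class DeploymentRejected(Exception):
    """Raised when a request asks for something the platform does not allow."""


def validate_request(request: dict) -> dict:
    """Return a normalized deployment plan, or fail the request outright."""
    app = request.get("app")
    if not app:
        raise DeploymentRejected("an app name is required")

    middleware = request.get("middleware", "").lower()
    if middleware not in ALLOWED_MIDDLEWARE:
        raise DeploymentRejected(f"unsupported middleware: {middleware!r}")

    # The install directory is derived, never supplied by the requester.
    if "install_dir" in request:
        raise DeploymentRejected("install_dir is not configurable")

    return {
        "app": app,
        "middleware": middleware,
        "install_dir": f"{INSTALL_ROOT}/{app}",
    }


plan = validate_request({"app": "billing", "middleware": "Tomcat"})
```

A request that picks a supported middleware goes through; a request that tries to set its own install directory fails before anything is deployed. The standard stops being a document and becomes something the platform enforces.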
So, back to my statement about limiting flexibility to offer more flexibility. I will argue that by eliminating all the gratuitous variations, you can reduce environment complexity and eliminate the associated busy work and wasted time. I also believe that eliminating the gratuitous variations will allow your devops teams to focus on delivering the value of predictable self-service deployments. Real flexibility is the ability to give your developers and QA teams self-service deployments on demand, at any time of day and over weekends.