
How We Used Immutable Servers to Simplify Our Cloud Infrastructure

At FullContact we’re always looking for ways to simplify our infrastructure. Recently, we decided to evaluate ways to improve our server creation processes. We were using Chef to bootstrap a few hundred servers with a lot of success, but there was room for improvement.

Configuration management tools have paved the way for systems administrators to think about infrastructure as code. This shift has helped with repeatability, scalability, and reliability, but the tools themselves aren’t perfect. I was an early adopter of Chef and used it heavily on multiple projects, and over that time I ran into several pain points:

  • Chef has a steep learning curve. There are different deployment models (Chef server, hosted Chef, Chef Solo).
  • Understanding the anatomy of a Chef run requires some time.
  • Knowledge of Ruby is required to use Chef.
  • Cookbooks constantly need refactoring as Chef evolves.
  • Chef cookbook dependency management isn’t trivial. There are multiple tools to help with this, but they’ve got their own learning curves as well.
  • Chef runs don’t always produce identical servers. Servers provisioned at different times may end up in different states. This commonly happens when external resources, such as package repositories, return different artifacts.

Our overall goal was simplification, and we had some pretty basic ideas on how to do this.

Opportunities for Simplification

  • Reduce the learning curve for DevOps engineers.
  • Standardize on a Linux distribution.
  • Fail at build time, not during bootstrap.
  • Reduce AWS costs by reducing the time it takes to boot servers. (It might seem small, but 5-10 minutes on each boot will add up.)
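
To illustrate the "fail at build time" goal, here is a minimal sketch of how an image build script could stop at the first failing step, so a broken image never gets registered. The `run_step` helper and the step names are hypothetical and only for illustration; they are not taken from our build scripts.

```shell
#!/bin/sh
# Hypothetical helper: run one build step and abort the whole build if it fails.
run_step() {
    step_name=$1
    shift
    echo "==> ${step_name}"
    if ! "$@"; then
        echo "build failed during: ${step_name}" >&2
        exit 1
    fi
}

# Example usage inside an image build script (commands are placeholders):
#   run_step "patch system"     apt-get -y dist-upgrade
#   run_step "install packages" apt-get -y install vim dstat
```

Because the failure happens while the image is being built, the problem surfaces once, in front of the person building the image, instead of on every instance that boots later.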

Our Plan of Action

  • Use Ubuntu 12.04 LTS for all servers.
  • Quit using the package manager for critical software.
  • Use simple, repeatable shell scripts to configure servers instead of configuration management tools.
  • Utilize UpStart for service and event management.
  • Pre-bake AMIs instead of bootstrapping servers with Chef.
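
Installing critical software outside the package manager works best when every build uses exactly the same artifact. The sketch below shows one way that could be enforced, assuming the artifact has already been downloaded and its SHA-256 checksum is recorded in the build script. The `verify_sha256` helper, the path, and the `PINNED_JDK_SHA256` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE matches the pinned SHA-256 checksum.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for ${file}" >&2
        return 1
    fi
}

# Example usage (placeholder path and checksum variable):
#   verify_sha256 /tmp/jdk.tar.gz "$PINNED_JDK_SHA256" && tar -xzf /tmp/jdk.tar.gz -C /opt
```

A mismatch makes the build fail instead of silently baking a different artifact into the image.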

Since all of our infrastructure is hosted on Amazon Web Services, we’re in a position to leverage custom AMIs for image management. Both Netflix and KickStarter have released tools to automate this process. We evaluated Netflix’s Aminator and KickStarter’s build-ubuntu-ami, and decided to use build-ubuntu-ami.

Example Use-Case

FullContact has a service-oriented architecture, much of which is backed by DropWizard applications deployed into auto-scaling groups managed by Netflix’s Asgard. Instead of having Chef cookbooks for each of these applications, we created a generic AMI that uses the metadata provided by Asgard to determine which application to deploy. This allows us to deploy all of our DropWizard applications using a single AMI.  Here are some of the steps we take when creating this AMI:

  • Patch system and install packages.
  • Set up Java – Custom install of Java under /opt/java6 or /opt/java7.
  • Set up unattended upgrades – We only install non-critical packages (vim, dstat, etc.) from the package manager, so it is safe to do unattended upgrades.
  • Create a custom DropWizard/Asgard UpStart job – Sources the Asgard-provided instance metadata to download the appropriate JAR file and create the right instance tags.
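
For the unattended-upgrades step, a function along these lines could be part of the build script. This is a sketch that assumes the stock Ubuntu `unattended-upgrades` package and its standard `APT::Periodic` settings. The function name and the overridable target path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: enable daily package list updates and unattended upgrades.
# The target path is a parameter so the function can be tried outside a real image.
setup_unattended_upgrades() {
    target=${1:-/etc/apt/apt.conf.d/20auto-upgrades}
    cat << 'EOF' > "$target"
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
}
```

Calling `setup_unattended_upgrades` with no argument writes to the default apt configuration path, which requires root.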

Example Code: DropWizard/Asgard UpStart job

function setup_dropwizard_runtime () {
cat << 'EOF' > /etc/init/dropwizard.conf
description "Setup a dropwizard application based on Asgard defined environmental variables"
start on (local-filesystems)
stop on runlevel [!2345]
task

pre-start script
 wget http://169.254.169.254/latest/user-data -O /tmp/userdata
 chmod +x /tmp/userdata
 # Source Asgard variables
 . /tmp/userdata

if [ -n "$CLOUD_DEV_PHASE" ] ; then
 config_file="${CLOUD_DEV_PHASE}.yaml"
else
 config_file='prod.yaml'
 CLOUD_DEV_PHASE=prod
fi

if [ -n "$CLOUD_REVISION" ] ; then
 version=$CLOUD_REVISION
else
 version='latest'
fi

if [ "$CLOUD_STACK" != "null" ] ; then 
 jar_name=$CLOUD_STACK.jar
else
 jar_name='root.jar'
fi

mkdir -p /etc/$CLOUD_APP /usr/local/$CLOUD_APP

/opt/ec2toolkit/bin/s3download \
 -b fullcontact-builds \
 -k "${CLOUD_APP}/${version}/${config_file}" \
 -d /etc/$CLOUD_APP/

/opt/ec2toolkit/bin/s3download \
 -b fullcontact-builds \
 -k $CLOUD_APP/$version/$jar_name \
 -d /usr/local/$CLOUD_APP/

/opt/ec2toolkit/bin/ec2tag \
 -DName=$CLOUD_APP \
 -Denvironment=$CLOUD_DEV_PHASE \
 -Drole=dropwizard

end script

script
 set -x
 # Source Asgard variables
 . /tmp/userdata

if [ -n "$CLOUD_DEV_PHASE" ] ; then
 config_file="${CLOUD_DEV_PHASE}.yaml"
else
 config_file='prod.yaml'
 CLOUD_DEV_PHASE=prod
fi

if [ -n "$CLOUD_REVISION" ] ; then
 version=$CLOUD_REVISION
else
 version='latest'
fi

if [ "$CLOUD_STACK" != "null" ] ; then
 jar_name=$CLOUD_STACK.jar
else
 jar_name='root.jar'
fi
cd /etc/$CLOUD_APP
exec java -jar -Dfile.encoding=UTF8 \
 -Dcom.sun.management.jmxremote.authenticate=false \
 -Dcom.sun.management.jmxremote \
 -Dcom.sun.management.jmxremote.port=7199 \
 -Dcom.sun.management.jmxremote.ssl=false \
 /usr/local/$CLOUD_APP/$jar_name server $config_file
end script

EOF
}
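
The defaulting logic in the job above can be exercised without booting an instance. The sketch below is a hypothetical helper, not part of the job itself: it reproduces the same fallbacks using shell parameter expansion and prints the two S3 keys the job would download. The application name in the usage note is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the S3 keys the Upstart job would fetch for the
# current CLOUD_* variables, using the same fallbacks as the job.
dropwizard_s3_keys() {
    # An unset or empty phase/revision falls back to prod/latest, as in the job.
    phase=${CLOUD_DEV_PHASE:-prod}
    version=${CLOUD_REVISION:-latest}
    # Mirrors the job's check: only the literal string "null" selects root.jar.
    if [ "${CLOUD_STACK:-}" != "null" ]; then
        jar_name="${CLOUD_STACK:-}.jar"
    else
        jar_name='root.jar'
    fi
    echo "${CLOUD_APP}/${version}/${phase}.yaml"
    echo "${CLOUD_APP}/${version}/${jar_name}"
}
```

With `CLOUD_APP=exampleapp` and `CLOUD_STACK=null` set and the other variables empty, the function prints `exampleapp/latest/prod.yaml` followed by `exampleapp/latest/root.jar`.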

We’re using the same immutable server approach with a majority of our other services.

Services with Custom Immutable AMIs

If you want to implement immutable AMIs yourself, here are some services where you can get started:

  1. ElasticSearch
  2. Cassandra
  3. Umpire
  4. RabbitMQ
  5. Squid Proxies
  6. Storm
  7. Tomcat
  8. ZooKeeper

This approach has greatly simplified our deployment process. It requires some up-front planning, but the long-term payoff is well-thought-out, reusable AMIs.
