DevOps

DevOps combines application development and operations to minimize or eliminate the disconnect between the software developers who build applications and the systems administrators who keep infrastructure running.

Why is DevOps important?

When the Agile methodology is properly used to develop software, a new bottleneck often appears during the frequent deployment and operations phases. New updates and fixes are produced so fast in each sprint that infrastructure teams can be overwhelmed with deployments and push back on the pace of delivery. To alleviate some of these issues, application developers are asked to work closely with operations folks to automate the delivery of code from development to production.

DevOps tooling resources

DevOps cannot be performed with tools alone, but having the right tools to augment the culture and processes is important to successful software delivery. The following resources discuss both Python-specific and general tools and services for DevOps environments.

  • DevOps: Python tools to get started is a presentation slideshow that explains that while DevOps is a culture, it can be supported by tools such as Fabric, Jenkins, BuildBot and Git, which, when used properly, can enable continuous software delivery.

  • For an Atlassian-centric perspective on tooling, take a look at this guide on how to choose the right DevOps tools. It is biased towards Atlassian's own tools but still has some good insight, such as using automated testing to provide immediate awareness of defects that require fixing.

General DevOps resources

The following resources give advice and approaches for building the right teams, culture, processes and tools into software development organizations.

  • DevOps vs. Platform Engineering considers DevOps to be an ad hoc approach to developing software while building a platform is a strict contract. I see this as "DevOps is a process", while a "platform is code". Running code is better than any organizational process.

  • The open source PagerDuty Incident Response guide is the amazing result of PagerDuty taking the practices they use to keep their services running and publishing them for other developers to use. Highly recommended.

  • Introduction to DevOps and Software Delivery Performance explains the four key delivery metrics of Delivery Lead Time, Deployment Frequency, Time to Restore Service, and Change Fail Rate, and then gives a high-level overview of technical, process and cultural capabilities that impact these metrics.

  • Operations for software developers for beginners gives advice to developers who have never done operations work or been on call for outages before in their career. The advantage of DevOps is greater ownership for the developers who built the applications running in production. The disadvantage, of course, is that greater ownership also leads to much greater responsibility when something breaks!

  • Google's Site Reliability Engineering (SRE) book is free online and required reading for understanding the practices and principles behind keeping the largest websites alive. Note, though, that some of the advice in the book will be considered controversial at more stodgy, traditional organizations that have done operations differently for a long time. There is also a wonderful interview with Ben Treynor, one of the authors of the book, that contains additional information.

  • The Increment, Stripe's fantastic digital and print magazine, has an issue dedicated to being on-call. It discusses many DevOps-related topics, such as what happens when your pager goes off, ownership, and how startups can differ from large companies in their incident responses.

  • Why are we racing to DevOps? is a very high-level summary of the benefits of DevOps to IT organizations. It's not specific to Python and doesn't dive into the details, but it's a decent start for figuring out why IT organizations consider DevOps the hot new topic after adopting an Agile development methodology.

  • SRE vs. DevOps: competing standards or close friends? covers Google's take on how Site Reliability Engineering (SRE) fits with the DevOps world. Roughly speaking, SRE is more closely aligned with metrics and how to operate infrastructure and applications, rather than the broader principles embodied by the DevOps philosophy.
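
The four delivery metrics named in the list above (Delivery Lead Time, Deployment Frequency, Time to Restore Service and Change Fail Rate) can be computed from a simple log of deployments. The following is a minimal sketch, not a definitive implementation: the `Deployment` record, its field names and the `delivery_metrics` function are hypothetical, and it assumes a plain mean for the two time-based metrics, while real reports may prefer medians or other definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from your own
    # version control, CI/CD and incident tracking tools.
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def delivery_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four delivery metrics over one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    if period_days <= 0:
        raise ValueError("period_days must be positive")

    # Time from commit to production for every deployment.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]

    # Deployments that caused a failure, and how long each took to restore.
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]

    return {
        # Assumption: mean lead time across all deployments.
        "delivery_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "deployment_frequency_per_day": len(deployments) / period_days,
        # None when no failed deployment has a recorded restore time.
        "time_to_restore_service": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times
            else None
        ),
        "change_fail_rate": len(failures) / len(deployments),
    }
```

Calling `delivery_metrics(deployments, period_days=30)` returns a dictionary keyed by metric name, which makes it easy to track the numbers from one reporting period to the next.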

What do you want to learn next about deployments?

I've built a Python web app, now how do I deploy it?

Can I automate testing and deployments for my app?

How do I go about testing my Python code?


Matt Makai 2012-2022