Source Control

Source control, also known as version control, stores software code files with a detailed history of every modification made to those files.

Why is source control necessary?

Version control systems allow developers to modify code without worrying about permanently screwing something up. Unwanted changes can be easily rolled back to previous working versions of the code.
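To make the idea of rolling back concrete, here is a toy sketch of a file history kept in memory. It is not how Git or Mercurial store data internally; the `FileHistory` class and its method names are hypothetical and exist only to illustrate that every change is recorded and any earlier version can be restored.

```python
from dataclasses import dataclass, field


@dataclass
class FileHistory:
    """Toy model of one file's history: a list of (message, content) snapshots."""

    versions: list[tuple[str, str]] = field(default_factory=list)

    def commit(self, message: str, content: str) -> int:
        """Record a new snapshot and return its version number."""
        self.versions.append((message, content))
        return len(self.versions) - 1

    def rollback(self, version: int) -> str:
        """Restore an earlier snapshot by recording its content as a new version."""
        _, content = self.versions[version]
        self.commit(f"Roll back to version {version}", content)
        return content

    def current(self) -> str:
        """Return the content of the most recent snapshot."""
        return self.versions[-1][1]


history = FileHistory()
good = history.commit("Add greeting", "print('hello')\n")
history.commit("Break greeting", "print('hello'\n")  # missing parenthesis
history.rollback(good)  # the broken change stays in the history, but is no longer current
```

Note that the rollback adds a new entry rather than erasing the mistake, so the full history of what happened is still available afterwards.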

Source control also makes team software development easier. One developer can combine her code modifications with other developers' code by reviewing diff views that show line-by-line changes, then merging the appropriate code into the main code branch.
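As a rough illustration of what a diff view shows, the sketch below uses Python's standard library `difflib` module to compare two versions of a small file line by line. The file contents and the "main" and "feature" labels are made up for the example, and real version control tools use their own diff implementations.

```python
import difflib

# Two hypothetical versions of the same file, one line per list item.
main_version = [
    "def greet(name):",
    "    print('Hello ' + name)",
]
feature_version = [
    "def greet(name):",
    "    print(f'Hello, {name}!')",
]


def line_diff(old, new):
    """Return a unified diff of two lists of lines as a list of strings."""
    return list(
        difflib.unified_diff(old, new, fromfile="main", tofile="feature", lineterm="")
    )


diff_lines = line_diff(main_version, feature_version)
print("\n".join(diff_lines))
```

Lines starting with `-` exist only in the old version and lines starting with `+` exist only in the new one, which is the same convention most diff views follow.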

Version control is a necessity on all software projects regardless of development time, codebase size or the programming language used. Every project should immediately begin by using a version control system such as Git or Mercurial.

Monorepo vs Multirepo

There is a spectrum of philosophies for how to store projects within source code repositories.

On one extreme end of the spectrum, every line of code for every project within an organization is stored in a single repository. That approach is called monorepo and it is used by companies like Google. On the other end of the spectrum, there are potentially tens of thousands or more repositories that store parts of projects. That approach is known as multirepo or manyrepo.

For example, in a microservices architecture, there could be thousands of microservices and each one is stored within its own repository. No one repository contains the code for the entire application created by the interaction of the microservices.

There are many hybrid strategies for how to store source code that fall between these opposite approaches. What to choose will depend on your organization's needs, resources and culture.

Source control during deployment

Pulling code during a deployment is one way source control systems fit into the deployment process.

App deployment uses a server to pull from the source control system.

Note that some developers recommend that deployment pipelines package the source code to deploy it and never have a production environment touch a source control system directly. However, for small scale deployments it's often easiest to pull from source control when you're getting started instead of figuring out how to wrap the Python code in a system installation package.
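For the simple pull-based approach, a deployment script might look something like the sketch below. It assumes Git is installed on the server and that the repository's remote is named `origin`. The function names, the `main` branch default and the example URL and path are hypothetical placeholders rather than part of any particular deployment tool.

```python
import subprocess
from pathlib import Path


def deploy_commands(repo_url, app_dir, branch="main"):
    """Return the git commands needed to bring app_dir up to date with branch."""
    if (Path(app_dir) / ".git").exists():
        # Existing checkout: fetch new commits, then match the remote branch exactly.
        return [
            ["git", "-C", app_dir, "fetch", "origin"],
            ["git", "-C", app_dir, "reset", "--hard", f"origin/{branch}"],
        ]
    # First deployment: clone the requested branch into app_dir.
    return [["git", "clone", "--branch", branch, repo_url, app_dir]]


def deploy(repo_url, app_dir, branch="main"):
    """Run each git command in order, stopping at the first failure."""
    for command in deploy_commands(repo_url, app_dir, branch):
        subprocess.run(command, check=True)


# Hypothetical repository URL and server path, shown without running git.
for command in deploy_commands("https://example.com/team/app.git", "/srv/example-app"):
    print(" ".join(command))
```

Using `reset --hard` discards any local edits made on the server, which is usually what you want for a deployment target but would be destructive in a working copy someone develops in.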

Source control projects

Numerous source control systems have been created over the past several decades. In the past, proprietary source control software offered features tailored to large development teams and specific project workflows. However, open source systems are now used for version control on the largest and most complicated software projects in existence. There's no reason why your project should use anything other than an open source version control system in today's Python development world. The primary choices are:

  • Git is a free and open source distributed version control system.

  • Mercurial is similar to Git and is also a free and open source distributed version control system.

  • Subversion is a centralized system where developers must check files in and out of the hosted repository to minimize merge conflicts.

Hosted version control services

Git and Mercurial can be downloaded and run on your own server. However, it's easy and cheap to get started with a hosted version control service. You can transition away from the service at a later time by moving your repositories if your needs change. A few recommended hosted version control services are:

  • GitLab offers both a self-hosted version of its open source software and a hosted version with pricing for businesses that need additional hosting support.

  • GitHub is a software-as-a-service platform that provides a user interface, tools and backup for developers to use with their Git repositories. Accounts are free for public open source development, and private Git repositories are also available on both free and paid plans.

  • Bitbucket is Atlassian's software-as-a-service tool that provides a user interface, comparison tools and backup for Git projects. There are many features in Bitbucket focused on making it easier for groups of developers to work on projects together. Bitbucket also has private repositories for up to five users. Users pay for hosting private repositories with more than five users.

General source control resources

Monorepo vs multirepo resources

Monorepo versus multirepo version control strategies are a weirdly contentious topic in software development, likely because once a policy is set for an organization it is exceptionally difficult to change your approach. The following resources give more insight into the debate on how to structure your repositories.

  • Monorepo, Manyrepo, Metarepo is an awesome guide to varying ways of structuring your source repositories that contain more than one project. The guide covers advantages and disadvantages of common approaches used in both small and large organizations.

  • Repo Style Wars: Mono vs Multi goes into the implications of using one side or the other and why it is unlikely you can create a combination solution that will give you the advantages of both without the disadvantages.

  • Why Google Stores Billions of Lines of Code in a Single Repository covers the history and background of Google's source control monorepo, which is one of the largest, if not the largest, monorepos for an organization in the world.

  • Advantages of monorepos goes into the advantages of using a monorepo. It does not discuss the downsides in detail but admits there are many, so the decision to use either strategy is not clear-cut.

  • Monorepos and the Fallacy of Scale argues that having all of an organization's code in a single repository encourages code sharing. The author considers the concerns often raised about tight coupling between components in a monorepo code base but says that the advantages outweigh the disadvantages overall.

Git distributed source control system

Git is the most widely used source control system. Its distributed design eliminates the need to check files in and out of a centralized repository, a requirement that becomes a problem when using Subversion without a network connection. There is a full page on Git with further details and resources.

Subversion resources

Apache Subversion (source code), often just called "Subversion" or "SVN", is a source control system implementation.

Source control learning checklist

  1. Pick a version control system. Git is recommended because there are many tutorials on the web to help both new and advanced users.

  2. Learn basic use cases for version control such as committing changes, rolling back to earlier file versions and searching for when lines of code were modified during development history.

  3. Ensure your source code is backed up in a central repository. A central repository is critical not only if your local development version is corrupted but also for the deployment process.

  4. Integrate source control into your deployment process in three ways. First, pull the project source code from version control during deployments. Second, kick off deployments when code is modified by using webhooks or polling on the repository. Third, ensure you can roll back to a previous version if a code deployment goes wrong.
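The webhook part of step 4 can be sketched as a small check that decides whether an incoming notification should kick off a deployment. This example assumes the hosting service signs each payload the way GitHub does, with an HMAC-SHA256 hex digest prefixed by `sha256=`, and that push payloads carry a `ref` field such as `refs/heads/main`. Other services use different formats, and the `should_deploy` function, secret and branch name here are hypothetical.

```python
import hashlib
import hmac
import json


def is_valid_signature(secret, body, signature_header):
    """Check that the signature header matches an HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)


def should_deploy(secret, body, signature_header, branch="main"):
    """Deploy only for correctly signed pushes to the chosen branch."""
    if not is_valid_signature(secret, body, signature_header):
        return False
    payload = json.loads(body)
    return payload.get("ref") == f"refs/heads/{branch}"


# Simulated delivery using a made-up shared secret and payload.
secret = b"example-shared-secret"
body = b'{"ref": "refs/heads/main"}'
signature = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
print(should_deploy(secret, body, signature))
```

Verifying the signature before parsing the payload means that requests from anyone who does not know the shared secret are rejected before they can trigger a deployment.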



Matt Makai 2012-2022