This is the first installment of a three-part series on version control.

Computers are everywhere, and each one needs a set of instructions to perform its tasks. These instructions are developed in the form of textual source code and often require a great deal of collaboration. How do teams of developers, working in different locations and at different times on the same codebase, manage to integrate all of their work into a coherent software project? How do they maintain a work history, potentially diverging into several branches, so that changes can be undone if necessary? Before answering these questions, let’s consider what many of us do when working on our own projects.

The need for version control

Let’s say we decide to solve the famous P versus NP problem. We begin by creating a document, pnpProof.txt, to write down our proof. As we go along we continue to save our document, but after a couple of days we realise our current line of thought isn’t leading to a solution. We then create a copy of our document, pnpProof_v2.txt, to work on a different approach without losing our previous work. Because this is a particularly difficult problem, several days later we have reached pnpProof_v10.txt. At this point we decide to enlist the help of others by placing our work in a shared directory. We quickly find a number of hurdles that need to be overcome, the least of which is our groundbreaking proof:

  • New team members have no idea what is included in each file and what changes occurred without reading each one.
  • New team members may apply different naming conventions for versioning, or may not use any form of versioning.
  • If two members are working on local copies of the same file, then the second person’s changes will overwrite the first when copying back to the central directory. That is, there is no mechanism to manage the merging of both changes together.
  • The change tracking is not robust: we can’t revert a file to the state it was in before such an overwrite, because we do not have a full history of changes to the project.

This example highlights the fact that our natural tendency to version by creating copies and renaming is difficult to manage and does not scale beyond a single developer.
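
To make the overwrite hurdle concrete, here is a minimal Python sketch of what happens when two people copy the same file out of a shared directory and copy it back. The shared directory is simulated with a temporary folder, and the contributor names and file contents are invented purely for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_lost_update() -> str:
    """Two people copy the same shared file, edit it, and copy it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"
        shared.mkdir()
        proof = shared / "pnpProof_v10.txt"
        proof.write_text("Step 1: assume P != NP\n")

        # Each contributor takes a local copy of the shared file.
        local_copies = {}
        for name in ("alice", "bob"):  # hypothetical team members
            workdir = root / name
            workdir.mkdir()
            local_copies[name] = Path(shutil.copy(proof, workdir))

        # Both append their own work to their local copy.
        with local_copies["alice"].open("a") as f:
            f.write("Step 2 (alice): a lemma\n")
        with local_copies["bob"].open("a") as f:
            f.write("Step 2 (bob): a counterexample\n")

        # Copying back: whoever copies last silently wins.
        shutil.copy(local_copies["alice"], proof)
        shutil.copy(local_copies["bob"], proof)
        return proof.read_text()


result = simulate_lost_update()
print(result)
```

The file that ends up in the shared directory contains only the second contributor’s addition. Nothing warned either person, and nothing recorded that the first change ever existed.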

Version control systems and their benefits

As described above, it can be difficult to manage versioning manually when working collaboratively. Luckily, version control systems (VCS) provide a solution to this problem. Git, Subversion (SVN) and Mercurial are among the most popular tools used today, with each tool having its own defining features, benefits and drawbacks. Some of the most powerful features offered by version control software today include:

  • The ability to track changes to each artifact in a project.
  • A fully maintained project history so that the entire repository can be rolled back to an earlier point in time either permanently or temporarily.
  • Complete tracking of which contributors made which changes to the project.
  • Metadata for each change including date, time, number of lines altered and comments.
  • A mechanism to inspect only the changes made between versions – this is known as a diff.
  • The ability to create and maintain multiple independent branches of the same project.
  • The ability to merge two project branches and resolve change conflicts interactively.

These features relieve many of the headaches presented earlier and allow teams to focus on writing code and building solutions. I used to ‘version’ my own documents using the haphazard copy-and-rename approach explained earlier. Managing each version of all of my files myself was time-consuming and confusing. Since discovering version control, my work has become more organised and I’m now able to code freely, safe in the knowledge that all of my work can be rolled back to an earlier point in time in a matter of seconds.
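
The diff mentioned in the feature list is easy to try out without installing a VCS. The sketch below uses Python’s standard difflib module to compare two versions of a document; the file names and contents are hypothetical, and real version control tools produce similar output with their own diff commands.

```python
import difflib

# Hypothetical contents of two versions of the proof document.
version_1 = [
    "Step 1: assume P != NP\n",
    "Step 2: a lemma\n",
]
version_2 = [
    "Step 1: assume P != NP\n",
    "Step 2: a stronger lemma\n",
    "Step 3: apply the lemma\n",
]

# unified_diff yields header lines followed by the changed region:
# lines starting with "-" were removed, lines starting with "+" were added,
# and lines starting with a space are unchanged context.
diff_lines = list(
    difflib.unified_diff(
        version_1,
        version_2,
        fromfile="pnpProof.txt",
        tofile="pnpProof_v2.txt",
    )
)
print("".join(diff_lines))
```

Only the lines that differ are marked, so a reviewer can see what changed between the two versions without rereading either file in full.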

Centralised vs distributed

An important distinction between VCS offerings is whether they follow a centralised or a decentralised (distributed) model. SVN is a centralised VCS, meaning that the project repository is stored on a central server and team members obtain working copies of the codebase to work on from their local machines. To commit changes to the repository, each contributor needs to connect to the central server and apply their changes there. Once this is done, the central work history is updated and the rest of the team can immediately sync the changes to their own working copies. An alternative approach is to do away with a single source of truth and let each contributor clone the project repository, so that each individual maintains an independent project repository in its own right. Using this approach, project changes can be committed locally and the team can combine their changes by merging repositories in a peer-to-peer fashion. Git is a well-known example of such a VCS.

In practice, decentralised tools are often used in a centralised way; that is, a single source of truth is still maintained for the codebase. A project will typically have a master repository stored on a server, which developers clone to their local machines. Changes to the work history are then made offline, and when the time comes to share a line of work with the rest of the team, the local repository is merged with the master repository by a process called pushing. Other contributors can pull from the master repository as often or as little as they like to merge the pushed changes of their peers into their local repositories.
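
The clone, commit, push and pull cycle described above can be sketched as a toy model in Python. This is an illustration only, not how Git or any other tool is implemented: the ToyRepo class, its methods and the contributor names are all hypothetical, and a repository is reduced to an ordered list of commit messages, whereas real tools track file contents and resolve conflicts when merging.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRepo:
    """A repository reduced to an ordered list of commit messages."""

    history: List[str] = field(default_factory=list)

    def clone(self) -> "ToyRepo":
        # A clone is a full, independent copy of the history.
        return ToyRepo(list(self.history))

    def commit(self, message: str) -> None:
        # Commits are recorded locally; no server is involved.
        self.history.append(message)

    def pull(self, remote: "ToyRepo") -> None:
        # Bring in commits from the remote that we do not have yet.
        for message in remote.history:
            if message not in self.history:
                self.history.append(message)

    def push(self, remote: "ToyRepo") -> None:
        # Publish local commits that the remote does not have yet.
        for message in self.history:
            if message not in remote.history:
                remote.history.append(message)


master = ToyRepo(["initial commit"])
alice = master.clone()
bob = master.clone()

alice.commit("alice: add lemma")  # exists only in alice's repository
bob.commit("bob: fix typo")       # exists only in bob's repository
alice.push(master)                # alice shares her work with the team
bob.pull(master)                  # bob picks up alice's commit
bob.push(master)                  # master now holds everyone's commits

print(master.history)
```

Note that alice’s repository does not contain bob’s commit at the end of this sequence. She only receives it when she next chooses to pull, which matches the “as often or as little as they like” behaviour described above.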

Keep an eye out for Part 2 of this series, where this workflow will be covered in more detail.

About the author

Tyson is an Associate Consultant at Servian with a background in data warehousing, ETL and big data. Most of his professional life is currently spent working with Talend and Amazon Web Services. With a passion for programming and mathematics, Tyson tragically spends far too much of his free time programming rather than going outside.
Tyson Liddell