10. Version Control Systems

Version control is a time machine for code.
— Anonymous

A fundamental tool of software development, version control systems (VCS) provide the means to manage changes to source code and documents safely. Why would you need one? Well, consider the comic below:

_images/phd_final.png

Fig. 5 “Piled Higher and Deeper” by Jorge Cham 

Yes sonny, back in the old days before VCS, we had to save copies and versions manually. (cough cough)  The dilemma so succinctly illustrated above is common for one working on a substantial project alone. Now, consider a team working on something even bigger—the situation quickly becomes untenable.

If you don’t have source control, you’re going to stress out trying to get programmers to work together. Programmers have no way to know what other people did. Mistakes can’t be rolled back easily. The other neat thing about source control systems is that the source code itself is checked out on every programmer’s hard drive — I’ve never heard of a project using source control that lost a lot of code.
— Joel Spolsky, Blogger/co-founder StackExchange 

As an answer to that, version control systems provide the following features and benefits that build on one another:

  • Ability to record your work regularly at discrete points in time as “snapshots.” Ideally, done immediately after improvements have been made and tests run.

  • To revert to older versions, retrieve deleted information, or determine when a bug was introduced.

  • To share work with others and keep them current.

  • To manage conflicts with others’ work.

  • To work easily from multiple locations, work, home, or mobile—keeping copies in sync.

  • To back up important files remotely—asset protection.

  • Increased freedom to experiment, through increased safety.

Freedom to Experiment

From the treehouse blog :

Let’s say you’re in the middle of making a [web]site and want to try something crazy with the css. Maybe you want to do a bit of an experiment and make everything strange shades of yellow and red. You don’t know if you’re going to keep the work when you finish it. Since it’s an experiment, it would probably be a good idea to not mess up your working version of the site.

At this point, you might be thinking of creating a copy of the site and making your changes on the copy. Maybe you’re sure that you’re going to use this work and you’re going to crank through. Maybe it doesn’t work out, though. Wouldn’t it be nice if you could take a snapshot of your site now, do your experiment, and not have to worry about whether you mess things up? That’s exactly what version control is for!

In contrast to other chapters in this book, we’ll focus on concrete implementations of popular VCS tools to a greater degree in this one.

Warning:  VCS FTW!

Tip 23: Always Use Source Code Control—Always.
Even if you are a single-person team on a one-week project. Even if it's a "throw-away" prototype. Even if the stuff you're working on isn't source code. Make sure that *everything* is under source control -- documentation, phone number lists, memos to vendors, makefiles, build and release procedure, that little shell script that burns the CD master—everything. We routinely use source code control on just about everything (including the text of this book).
— The Pragmatic Programmer, Hunt / Thomas

Indeed. Version control has become so useful and ubiquitous that it is hard to believe some developers avoided it in decades past. True, the tools are much better now than in the early days. Since the dawn of the 2010s, however, you won’t be considered a Professional Developer™ unless you understand version control systems and use them well, simple as that.

Tip:  Do The Exercises

If the subject is unfamiliar, it’s recommended that you complete this chapter, then grab a book (or two) from those recommended below and read it cover to cover.

Importantly, don’t skip the exercises; VCS concepts require practice to internalize.

10.1. Terminology

Unfortunately, version control systems have a number of names and abbreviations you’ll want to be familiar with:

  • Version Control System (VCS)

  • Revision Control System (RCS)

  • Source Control (or Code) Management (SCM)

There are also quite a few terms you’ll need to know. They are defined below and discussed further in the chapter.

Repo:

A repository to hold important code and documents. In practice a folder with special bookkeeping information to facilitate “time travel.”

Branch:

A space for or “line of” development where changes can be developed in isolation, to avoid interfering with and interference from others.

Check out, or Clone:

To retrieve a copy of a repo or select file(s) within, from an authoritative server.

Check in, or Push:

To submit or publish changes to others.

Commit:

To record, or a record of, changes made by a user at a specific point in time.

Conflict:

When more than one user makes changes to the same lines in a file contemporaneously, there will be a conflict. The first user to check in their changes will succeed; subsequent users will need to update and resolve the conflicts by merging their changes or abandoning them.

Merge:

The process of bringing the changes from one branch to another, potentially requiring conflict resolution.

Revision:

A change, or set of changes, to a document or repo.

Update:

Retrieval of the latest version from an authoritative server.

10.2. History

An important problem in program development and maintenance is version control, i.e., the task of keeping a software system consisting of many versions and configurations well organized.
— Walter F. Tichy, Author of RCS, CS Dept. Purdue University 

The need for version control systems was recognized early on. Let’s take a look at notable developments over the last few decades.

The Mesozoic Era (aka 1st Generation)

A local data model was used in the earliest VCS tools, designed before ubiquitous networking.

  • Versioning file systems :

    Most famously, the file system on VMS (used on VAX computers by DEC) had built-in versioning in the 1970s. For example, to list the fifth revision of a file at the command line you could type:

    $ DIR PROG1.COB;5
    

    This was quite helpful, for backup purposes at the very least.

  • SCCS  & RCS :

    File-based version control was an advance over the versioning filesystem, because the developer gets to decide which versions are worth recording; it is not bundled with the save button. These were crude by the standards of today, with no notion of networking. Using “locking,” only one person could edit a file at a time. In later years, with a repo kept on a network mount, they could be highly annoying. When another developer acquired the lock and forgot to check in their changes (and unlock), you’d have to hunt them down and ask them to.  :-/

Client/Server or Centralized Era (2nd Gen.)

_images/cvcs.png

Fig. 6 Centralized Version Control, courtesy 

  • CVS :

    Though built upon RCS conventions, it included a client-server networked repository. Centralization of the repo avoided having to track down others (and facilitated backups and security), a big improvement. Whoever checks in a contested file first wins; laggards have to update and merge.

  • Not long afterward, sites such as SourceForge  sprang up to offer free online hosting to open source projects. Sadly this site has been taken over by sleazy owners so should generally be avoided.

  • The successor to CVS, Subversion, will be discussed shortly.

The Proprietary Era

  • Perforce , P4:

    This was a popular choice when having a single giant repository for the whole company was considered a good idea. (It still can be.) It worked well in that role, though “the repo” was often a mess for the same reason. Still used in the game industry because it handles binary assets well.

  • Bitkeeper :

    An early notable distributed system (described below) that inspired git, not only through design but in reaction to its restrictive license policies.

  • Microsoft and IBM also offered a number  of reviled  systems that aren’t quite worth mentioning by name.

By the dawn of the 2010s, the era of proprietary version control was at its end. There are now a number of open, freedom-respecting choices that have taken the mantle of innovation and run with it. There’s no longer a reason to use an obsolete or proprietary VCS unless required to.

10.3. Contemporary Systems

Version control is indispensable on team projects. It becomes even more powerful when version control, defect tracking, and change management are integrated. The applications division of Microsoft found its proprietary version-control tool to be a “major competitive advantage” (Moore 1992).
— Steve McConnell, Code Complete (Ch. 28)

Design Overview

Though focused on the ongoing state of a portion of a filesystem, internally version control systems are designed around trees of nodes  that track file and metadata changes. Typically this is done using snapshots —the recording of state at a particular time—and/or delta  (change) encoding techniques. Revisions may be visualized as directed acyclic graphs  (or DAG) to ease the learning curve. To illustrate, consider the short repository history below:

Diagram

As shown, the first commit record created was C1, and represents the initial state of the repo. Each additional commit, for example C2 and C3, represents the additions and changes made after C1. The label “Head” is merely a pointer that refers to the location where the next child commit will appear. Each commit records:

  • The user

  • The change

  • A timestamp

  • Hash of the contents

  • The parent commit (which is why the arrows point backward).

  • A descriptive message, etc.
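
To make the list above a bit more concrete, here is a minimal sketch of such a record in Python. It is not how any particular VCS stores its data: the `Commit` class, its field names, and the sample users and messages are all made up for illustration, and hashing every field together (parent included) is an assumption made here, loosely in the style of git.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """One node in the history graph (hypothetical field names)."""
    user: str
    change: str              # the recorded change, e.g. a diff
    timestamp: str
    message: str
    parent: Optional[str]    # id of the parent commit, None for the first

    @property
    def commit_id(self) -> str:
        # Hash every field, including the parent's id, so that altering
        # an earlier commit would change the id of all its descendants.
        payload = "\n".join(
            [self.user, self.change, self.timestamp, self.message,
             self.parent or ""]
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


c1 = Commit("ann", "+hello", "2016-09-06T16:55", "Initial state", None)
c2 = Commit("bob", "+world", "2016-09-06T16:56", "Add a line", c1.commit_id)
c3 = Commit("ann", "-hello", "2016-09-06T17:04", "Remove a line", c2.commit_id)

# "Head" is merely a pointer to the commit the next child will attach to.
head = c3.commit_id

# Follow the parent links backward from head to the first commit.
by_id = {c.commit_id: c for c in (c1, c2, c3)}
history = []
cursor = head
while cursor is not None:
    commit = by_id[cursor]
    history.append(commit.message)
    cursor = commit.parent
```

Because each record only knows its parent, walking the history naturally runs newest to oldest, which is why the arrows in such diagrams point backward.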

By storing changes as line-by-line “diffs,” or differences, instead of a full copy, we avoid ballooning disk space requirements. Hash functions (most often SHA-1, for reasons of speed, size, and safety), which “map data of arbitrary size to a bit string of a fixed size,” are used to confirm the integrity of tracked assets and/or detect that changes have occurred.
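
Both ideas can be tried out with Python’s standard library. The sketch below is a toy under stated assumptions: the two file versions are invented, and `difflib.ndiff` keeps enough information to rebuild both sides, so this particular delta is not smaller than the file. Real systems use far more compact delta encodings.

```python
import difflib
import hashlib

v1 = ["# Notes\n", "alpha\n", "beta\n"]
v2 = ["# Notes\n", "alpha\n", "beta, revised\n", "gamma\n"]

# Record the line-by-line difference between the two versions.
delta = list(difflib.ndiff(v1, v2))

# Either version can be rebuilt from the delta alone.
rebuilt_v1 = list(difflib.restore(delta, 1))
rebuilt_v2 = list(difflib.restore(delta, 2))


def sha1_of(lines):
    """Map content of any size to a fixed-size, 40-character hex digest."""
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()


# Matching digests confirm the rebuilt content is intact; a differing
# digest signals that something has changed.
intact = sha1_of(rebuilt_v2) == sha1_of(v2)
changed = sha1_of(v1) != sha1_of(v2)
```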

Branching and Merging

_images/doc_time.jpg

Fig. 7 Doc Brown explains branching.   

A branch in VCS terms is a path of development that diverges from the “trunk”, or original or main path, into an “alternate timeline.” Developers use branches as private workspaces, free from the worries of interfering with, or the interference of others. Branches are most often created to isolate an experimental new feature or protect an older maintenance release from unintended changes. Unlike a real tree however, temporary branches are often merged back into the trunk (the first/main branch) once completed, then deleted.

Diagram

An experimental dev branch is created between C2 and B1.
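
As a rough sketch of the picture above, a history can be modeled as a table of parent links, with each branch being a name for the newest commit on its line of development. The commit B2 and the helper names are hypothetical, and merge commits (which have two parents) are left out to keep things short.

```python
# Each commit maps to its parent (None marks the first commit).
parents = {
    "C1": None,
    "C2": "C1",
    "C3": "C2",   # the trunk continues after the branch point
    "B1": "C2",   # first commit on the experimental dev branch
    "B2": "B1",   # hypothetical second commit on the dev branch
}

# A branch is just a name for the newest commit on that line of work.
branches = {"trunk": "C3", "dev": "B2"}


def ancestry(commit):
    """Return the commit and all of its ancestors, newest first."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain


def common_ancestor(a, b):
    """Find the newest commit shared by both lines of development."""
    seen = set(ancestry(a))
    for commit in ancestry(b):
        if commit in seen:
            return commit
    return None


fork_point = common_ancestor(branches["trunk"], branches["dev"])
```

Knowing the fork point is what lets a merge tool tell which side changed what since the two timelines diverged.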

Bookkeeping Information

Where do VCSs keep all this bookkeeping information? Typically, in a folder under the root of the repo, e.g.:

.svn/

.git/

.hg/

Remember that a leading . period in front of a filename is a Unix convention to hide a file from view. Older versions of Subversion kept one of these in every folder under the root, leading to lots of cruft. :-/
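
One practical consequence: a tool can work out which repo (if any) a file belongs to by walking up the folder tree until it meets one of these folders. A minimal sketch follows; the function name and return shape are made up here, and it ignores corner cases such as git setups where `.git` is a file rather than a folder.

```python
from pathlib import Path
from typing import Optional, Tuple

# Bookkeeping folder names from above, mapped to the tool that uses them.
MARKERS = {".svn": "Subversion", ".git": "Git", ".hg": "Mercurial"}


def find_repo_root(start: Path) -> Optional[Tuple[Path, str]]:
    """Walk upward from `start` until a bookkeeping folder is found.

    Returns (root folder, tool name), or None if no repo encloses `start`.
    """
    start = start.resolve()
    for folder in (start, *start.parents):
        for marker, tool in MARKERS.items():
            if (folder / marker).is_dir():
                return folder, tool
    return None
```

Calling `find_repo_root(Path.cwd())` from somewhere inside a working copy should return the top-level folder and the name of the tool that manages it.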

Warning:  Propellerhead Alert!

For whatever reason this area of study tends to attract propeller-head types that love to desimplify things—beware.      

Distributed Version Control (DVCS) (3rd Gen.)

Subversion = Leeches. Mercurial and Git = Antibiotics.
We have better technology now.
— Joel Spolsky, Blogger/co-founder StackExchange 
_images/p2tp.jpg

Fig. 8 Power to the People, perhaps Lennon’s most annoying track.   

The latest breakthrough in VCS has been distributed version control. Rather than there being one (and only one) official server, each user keeps a full copy of the repo and can share changes with others, participating in a peer-to-peer (P2P) manner as well. DVCS is a bit more complicated as a result. The “distributed” part is also somewhat of a misnomer: there is no coordinated computation happening among copies.

A consequence of the distributed design is that repos need to be smaller, and normally hold one project rather than have one giant repo for the whole company. Otherwise you’d have to store tons of data on each workstation and waste a lot of time shipping copies of it over the network. A consequence of this consequence is that smaller repos can now realistically be neat and tidy. Branching and merging are also easier in DVCS systems.

Separation of Commit and Publish

_images/dvcs.png

Fig. 9 Distributed Version Control, courtesy M. Ernst, UW 

Another important advance (which is somewhat orthogonal but bundled with DVCS) is the separation of the act of committing from that of sharing work, enabling developers to make their commits offline. In an increasingly mobile world with growing numbers of remote workers, you can imagine this feature is a big deal. Further, this allows developers to do a better job of saving their progress in more frequent, finer-grained commits before “inflicting” their bug-ridden changes onto others, as Joel Spolsky likens the act of publishing.

Before DVCS, a developer might have to postpone making a commit, and by extension a proper milestone and backup, until very sure things were close to perfect. Yes, that sucks hard when you stop to think about it, practically defeating the purpose of using a VCS tool in the first place.

(This separation feature is not intrinsic to DVCS, but rather a consequence of poor merging tools implemented in centralized systems of the past.)
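
The split between committing and publishing can be sketched with a toy model. Everything here is invented for illustration: the `Repo` class stores nothing but commit messages, and `push` assumes the remote’s history is a prefix of the local one, whereas real tools must also cope with histories that have diverged.

```python
class Repo:
    """A toy repository: just an ordered list of commit messages."""

    def __init__(self):
        self.commits = []

    def commit(self, message):
        # Recording a snapshot touches only this copy; no network needed.
        self.commits.append(message)

    def push(self, remote):
        # Publishing sends whatever the remote has not seen yet.
        new = self.commits[len(remote.commits):]
        remote.commits.extend(new)
        return len(new)


server = Repo()
laptop = Repo()

# Work offline: several small commits, none visible to others yet.
laptop.commit("Sketch parser")
laptop.commit("Fix off-by-one")
laptop.commit("Add tests")
visible_before = list(server.commits)

# Back online and satisfied with the result: publish in one step.
sent = laptop.push(server)
```

In a centralized system the two methods collapse into one, so every saved milestone is immediately everyone else’s problem.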

Hint:  Potential Misconception…

Just because you are using distributed version control doesn’t mean you can’t have a central server considered authoritative, always available, and backed up regularly. It just won’t be the only copy.

Note:  Online Resources

  • Understanding Version-Control Systems, an older piece  by ESR describing VCS history and design.

  • A Visual Guide to Version Control , at BetterExplained.

  • A Visual Guide to Distributed Version Control , at BetterExplained.

10.3.1. Comparison Table

An oddly-named trio of open, free, and popular choices that work on Windows, macOS, and Linux, and that could be described as an “embarrassment of riches,” is detailed below. While GUI clients are linked for convenience, developers should use the command line until proficient.

As a user of one of these three, you won’t notice a lot of difference day to day. The biggest difference (as mentioned) is that with Subversion, saving a commit and publishing the changes are done with a single command.

VCS                 Subversion         Git             Mercurial
------------------  -----------------  --------------  ------------
Type:               Centralized        Distributed     Distributed
Architecture:       Client/server      Peer to peer    Peer to peer
Main qualities:     Simple, stupid     Powerful, fast  Powerful, easy
Typical repo size:  Large              Small           Small
Site:               apache.org         git-scm.com     mercurial…
Install:            apache.org         git-scm.com     mercurial…
Book:               svnbook.org        git-scm…        hgbook…
Tutorial:           svnbook.org        git-guide…      hginit.com
GUI Tools:          tortoise           tort…, st       tortoise

Commands:
Initialize repo:    svnadmin create…   git init        hg init
Check out code:     svn checkout       git clone       hg clone
Get updates:        svn update         git pull        hg pull -u
Goto revision:      svn update -r…     git checkout…   hg update -r
Create branch:      svn copy           git branch      hg bookmark…
Goto branch:        cd /branches/…     git checkout…   hg update
Add to repo:        svn add            git add         hg add
Add to commit:                         git add
Print status:       svn status         git status      hg status
Print diff:         svn diff           git diff        hg diff
Commit & publish:   svn commit
Commit changes:                        git commit      hg commit
Publish changes:                       git push        hg push

== WTF?

Below, we’ll explore what’s interesting about each of the three choices above.

10.3.2. Subversion, aka SVN

_images/svn_logo.png

The Subversion VCS is in a somewhat unique position these days. Though its fully centralized design is largely considered obsolete, it continues to be an easy-to-use, free system that was built to fix many of the shortcomings of CVS. While it includes a few advantages over historical systems, it falls short of more modern alternatives in several important areas, such as branching, merging, and offline use.

Subversion is included here because it is still widely used, maintained, and appropriate for a few use cases, such as for folks who require centralization and/or increased security controls  for one reason or another. It is “the last and finest of the dinosaurs,” as Fred Brooks would say.

Branching and Merging

Branching and merging are considered somewhat “scary” under Subversion, because the tools are a bit braindead compared to newer systems. To do branching, you’ll first create a number of folders in your repo, by convention:

  • /trunk/

  • /branches/

Files are created under trunk then copied over (linked, actually) to the branch folder when a new branch is desired. Modifications may be merged either way, and are implemented as copying patches from folder to folder.

Merging has traditionally been very difficult with Subversion, though it has improved over the years with automated merge tracking in v1.5 and v1.8. The situation is not as dire as it used to be. 

See also:  Online Resources

  • Subversion branching quick start 

  • Favorite non obvious feature of svn? 

  • Subversion vs. Git: Myths and Facts , a spirited defense.

10.3.3. Git, aka Goddamn Idiotic Truckload…

I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.
— Linus Torvalds 
_images/jpbs.jpg

Fig. 10 Metal Gods…   

Git, named after the British slang term for a contemptible person, has taken the world by storm in just a few years. It was conceived and designed by Torvalds after a conspicuous falling-out over the use of Bitkeeper to maintain the Linux kernel sources, a use case which requires a heavy-duty distributed solution for its multi-level development team scattered across the world. Git was largely implemented, and is now maintained, by Junio Hamano. Thanks to its elegant core design, and despite being harder to learn and use (in terms of WTFs/min) than other choices, it has risen to become the most popular version control system for open-source projects and continues its growth into other contexts as well.

_images/xkcd_git.png

Fig. 11 On Git, courtesy xkcd  :-/

Though git was started at roughly the same time as the other DVCS favorite, Mercurial (section 10.3.4), it took off faster for two main reasons:

  • Cheap lightweight branching (see below)

  • Popularity of GitHub 

Branching and Merging

Super-lightweight branching was pioneered by git. The lightweight branch is simply a pointer to a commit, which is implemented as the hash of the commit (40 hexadecimal characters) saved to a file named after the branch. Branching operations are therefore instantaneous. Merging is less likely to result in conflicts, due to the increased context available from automatically determining common ancestor commits.
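
To see why this is so cheap, here is a minimal imitation in Python. It does not touch a real repository: the functions are hypothetical and write into a temporary folder. The `refs/heads/` layout mirrors the one git uses for simple branch files under `.git/`, though git can also pack many such pointers into a single file.

```python
import hashlib
import tempfile
from pathlib import Path


def create_branch(repo_dir: Path, name: str, commit_id: str) -> Path:
    """Create a branch by writing a commit id into a small file."""
    ref = repo_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_bytes((commit_id + "\n").encode("ascii"))
    return ref


def resolve_branch(repo_dir: Path, name: str) -> str:
    """Look up which commit a branch currently points at."""
    ref = repo_dir / "refs" / "heads" / name
    return ref.read_bytes().decode("ascii").strip()


# A stand-in commit id: 40 hexadecimal characters, like a SHA-1 digest.
commit_id = hashlib.sha1(b"example commit").hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp)
    ref_file = create_branch(repo_dir, "prequels", commit_id)
    size_on_disk = ref_file.stat().st_size   # the id plus one newline
    pointed_at = resolve_branch(repo_dir, "prequels")
```

No files are copied and no history is rewritten, so creating a branch costs about as much as saving a sticky note.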

Staging Area

Git has an extra staging area to help assemble and compose a commit as cleanly as possible. If you’ve used a VCS without one in the past, it will seem like extra busywork for “not a huge” benefit at first. But with a little practice, experience with a larger team, and focus on clean, single-purpose commits, its value will become clearer.
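
The idea can be sketched with a few dictionaries. This is a deliberately simplified toy with made-up file names: here each commit records only the files that were staged, whereas real git records a complete snapshot of the project.

```python
working_tree = {"parser.py": "v2", "README.md": "v2", "notes.txt": "scratch"}
staged = {}     # the staging area: what the next commit will contain
commits = []    # each commit is (message, {filename: content})


def add(filename):
    """Copy one file's current content into the staging area."""
    staged[filename] = working_tree[filename]


def commit(message):
    """Record only what was staged, then clear the staging area."""
    commits.append((message, dict(staged)))
    staged.clear()


# Two unrelated edits are in progress; commit them separately.
add("parser.py")
commit("Fix parser edge case")

add("README.md")
commit("Document new option")

# notes.txt was never staged, so it appears in no commit.
```

Splitting work this way yields two single-purpose commits instead of one grab bag, which is the payoff the extra step buys.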

See also:  Online Resources

  • The anatomy of a Git commit 

  • Understanding branches in Git 

  • Fun with merges and purposes of branches 

  • Learn Git with Bitbucket Cloud 

10.3.4. Mercurial, aka HG

Shortly before the first release, I read an article about the ongoing Bitkeeper debacle that described Larry McVoy as mercurial (in the sense of 'fickle'). Given the multiple meanings, the convenient abbreviation, and the good fit with my pre-existing naming scheme (see my email address), it clicked instantly. Mercurial is thus named in Larry's honor. I do not know if the same is true of Git.
— Matt Mackall 
_images/hg_logo.svg

Mercurial, or “hg” for short (Hg being the periodic table symbol for mercury, aka quicksilver), is an open and free DVCS also inspired by the aforementioned BitKeeper fiasco and started at about the same time by Matt Mackall. Mercurial values safety and ease of use over unadulterated power, though it can be extended and configured to be as powerful as git. Implemented in Python and C, it is easily extensible. That very extensibility, as harnessed by entities such as Facebook, has allowed it to scale past git in some narrow use cases.

To ease the learning curve, Mercurial more often uses traditional terminology to describe concepts, borrowed from systems such as CVS and SVN, tends to have safer, clearer defaults that require fewer obscure parameters, and gives less rope to hang yourself. Though Mercurial has been overshadowed by git popularity to some extent, for reasons just discussed it’s arguably a better choice for most projects and developers—those less demanding than Linux kernel development.

Branching and Merging

Mercurial has “heavyweight” long-lived branches, as well as lightweight git-style temporary branches called bookmarks. The long-lived branch is used for isolating, e.g., version X.X from new development, while the lightweight bookmarks are used for new and experimental feature work.

Merging is less likely to result in conflicts than in older systems, due to increased context available from its use of a changeset rather than revision model.   

See also:  Online Resources

  • Developer Info: docs on  hacking, extending, hg internals, and more.

Warning:  Invasion of the Git Snatchers

_images/git_logo.svg

Now that you’ve made your choice as to what VCS you’d like to use, congratulations! Give yourself a pat on the back. Unfortunately, unless you chose git, prepare to be on the receiving end of a lot of peer pressure to use git instead. You may be forced to use it due to ubiquity, team choice, and/or GitHub, even though it’s like using a chainsaw to slice a tomato for simple use cases. You probably should learn it anyway, right? It could be a lot worse, such as being forced to use a clumsy proprietary tool (shrug).

10.4. Þe Auld (Ye Olde) Crash Course

Here’s a micro crash course in the top-three VCS systems. The commands below were authored under Linux, but should work under macOS and Windows without much modification. Alternatively, you may skip to the end of this section if you are already using one proficiently.

Getting Started

You’ll need to install a VCS beforehand if you haven’t already. First, let’s create a place to work.

> mkdir repos
> cd repos
> mkdir films
> cd films

Now let’s create a repo. Choose a system and follow its section: Subversion (10.4.1), Git (10.4.2), or Mercurial (10.4.3).

If you’d like to complete all three to get a feel for them, it certainly won’t hurt your career.

10.4.1. Subversion

Sorry, Subversion needs a little setup first. Its repo must be in a different folder than a working copy, and would normally be on a server. Let’s put it in the parent folder for simplicity:

# Make local repo in parent folder, to keep separate
> svnadmin create ../films.svn-repo

# Copy to our working folder
# Replace $(…) with absolute path under Windows
> svn checkout file://$(dirname $PWD)/films.svn-repo .

# Add std folders
> mkdir trunk branches
> svn add trunk branches
> svn commit -m "Frist Ps0t: add std folders"
> cd trunk

More commonly we’d check out/clone an existing repository, but we’ve gone ahead and created one ourselves for purposes of this tutorial.

Let’s create a text file. We’ll use markdown syntax; the leading 1. creates an ordered list. Use a real editor if you’ve a mind to, copying only the double-quoted text:

> echo -e "# Top Star Wars Films\n"    >  starwars.md  # clobber
> echo "1. IV. A New Hope"             >> starwars.md  # append
> echo "1. V. The Empire Strikes Back" >> starwars.md
> echo "1. VI. Return of the Jedi"     >> starwars.md
> echo "1. VII. The Force Awakens"     >> starwars.md

Better add it to our repo and commit so we don’t lose all this work!

> svn add starwars.md  # mark for addition at next commit
A       starwars.md    # status A for Add

> svn commit -m "Add Star Wars document."  # now permanent
Adding         starwars.md
Transmitting file data .done
Committing transaction...
Committed revision 2.
_images/sad_beep.jpg

Thank you $VCS.  Should we add a few more films? How about the shitty prequels? Hmm, let’s create an experimental branch first and switch to it.

(The caret characters below are a shortcut that is filled in with the repository root URL, and need to be escaped in the Windows and fish shells.)

> svn copy ^/trunk ^/branches/prequels
# Immediate commit happens, editor opens
New branch, prequels.  # Hit save key

> cd ../branches/
> svn update           # retrieve latest
Updating '.':
A    prequels
A    prequels/starwars.md
Updated to revision 3.
> cd prequels/

Ok, branch complete. Let’s add the prequels and—oh yes, as everyone knows Empire was the best so I just moved it to the top in my editor. (Ctrl+T  FTW!)

> echo "1. III. Revenge of the Sith"  >> starwars.md
> echo "1. II. Attack of the Clones"  >> starwars.md
> echo "1. I. The Phantom Menace"     >> starwars.md

Good, let’s check our status:

> svn status
M       starwars.md

Note the status letter to the left of the tracked file. Oh, it’s now M for modified, good, good. Let’s take a look at what we’ve accomplished with the diff command. Will ya look at that?

> svn diff | colordiff
Index: starwars.md
====================================
--- starwars.md     (revision 3)
+++ starwars.md     (working copy)
@@ -1,6 +1,9 @@
 # Top Star Wars Films

+1. V. The Empire Strikes Back
 1. IV. A New Hope
-1. V. The Empire Strikes Back
 1. VI. Return of the Jedi
 1. VII. The Force Awakens
+1. III. Revenge of the Sith
+1. II. Attack of the Clones
+1. I. The Phantom Menace

The diff format looks a bit complicated at first, as it’s designed for scripting changes as well as looking at them. It is not hard to read with a bit of experience. Like a VCS, it works line by line.

By default Subversion doesn’t color its output, so in the example above we’ve piped it into colordiff to bring out the details.

Diff format uses a + character at the beginning of a line to signify an addition, which colordiff highlighted for us in green, while a - character signifies a removed line, highlighted in red. An adjacent pair of -/+ lines often signifies a modified line. The purple @@ line shows where the change sits: the starting line number and line count in the old file (-1,6) and in the new one (+1,9). A few extra lines are provided before and after a change to give the reader context.
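
If you’d like to poke at the format without a VCS installed, Python’s standard `difflib` module can produce the same unified diff layout. The sketch below rebuilds the two versions of our file by hand; the variable names are arbitrary, and the output will lack the `Index:` header and revision labels that Subversion adds.

```python
import difflib

old = [
    "# Top Star Wars Films\n",
    "\n",
    "1. IV. A New Hope\n",
    "1. V. The Empire Strikes Back\n",
    "1. VI. Return of the Jedi\n",
    "1. VII. The Force Awakens\n",
]
new = [
    "# Top Star Wars Films\n",
    "\n",
    "1. V. The Empire Strikes Back\n",
    "1. IV. A New Hope\n",
    "1. VI. Return of the Jedi\n",
    "1. VII. The Force Awakens\n",
    "1. III. Revenge of the Sith\n",
    "1. II. Attack of the Clones\n",
    "1. I. The Phantom Menace\n",
]

diff = list(difflib.unified_diff(old, new, "starwars.md", "starwars.md"))

# Separate the change lines from the ---/+++ file header lines.
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
hunk_header = next(l for l in diff if l.startswith("@@"))

print("".join(diff), end="")
```

Note that moving one line shows up as a removal plus an addition; the format has no notion of a “move.”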

On the movie front, Jedi and Force Awakens are a toss up quality-wise, so let’s leave this order for now.

> svn commit -m "I've got a bad feeling about this."
Sending        starwars.md
Transmitting file data .done
Committing transaction...
Committed revision 4.

Make sure to update to get a full history from the server, then let’s take a look at our commit log:

> cd ../..  # back to repo root
> svn up    # short for update
> svn log
-------------------------------------------------
r4 | nobody | 2016-09-06 17:04:56 -0700 | 1 line

I've got a bad feeling about this.
-------------------------------------------------
r3 | nobody | 2016-09-06 16:56:46 -0700 | 2 lines

New branch, prequels.  # Hit save key
-------------------------------------------------
r2 | nobody | 2016-09-06 16:56:22 -0700 | 1 line

Add Star Wars document.
-------------------------------------------------
r1 | nobody | 2016-09-06 16:55:46 -0700 | 1 line

Frist Ps0t: add std folders
-------------------------------------------------

Looks good so far; suppose we’ll keep it after all. In the real world, we’d run our comprehensive test suite right about now. For a list of things to confirm before publishing changes, consult the section on Publishing Code. Time to merge and commit our changes:

> svn merge file://$(dirname $PWD)/films.svn-repo/branches/prequels trunk
--- Merging r3 through r4 into 'trunk':
U    trunk/starwars.md
--- Recording mergeinfo for merge of r3 through r4 into 'trunk':
 U   trunk
> svn commit -m "merge prequels into trunk."
Sending        trunk
Sending        trunk/starwars.md
Transmitting file data .done
Committing transaction...
Committed revision 5.
> svn update
Updating '.':
At revision 5.

And we have successfully merged our experimental branch into the trunk:

> cat trunk/starwars.md
# Top Star Wars Films

1. V. The Empire Strikes Back
1. IV. A New Hope
1. VI. Return of the Jedi
1. VII. The Force Awakens
1. III. Revenge of the Sith
1. II. Attack of the Clones
1. I. The Phantom Menace

Now that we’ve finished, let’s clean up:

> svn rm branches/prequels/
D         branches/prequels             # Deleten-Sie!
D         branches/prequels/starwars.md
> svn commit -m "rm prequel branch"
Deleting       branches/prequels
Committing transaction...
Committed revision 6.

We’ve finished this section. You may continue on ahead or skip to the next section.

10.4.2. Git

First we’ll need to initialize the repository:

> git init
Initialized empty Git repository in $HOME/repos/films/.git/

More commonly we’d check out/clone an existing repository, but we’ve gone ahead and created one ourselves for purposes of this tutorial.

Let’s create a text file. We’ll use markdown syntax; the leading 1. creates an ordered list. Use a real editor if you’ve a mind to, copying only the double-quoted text:

> echo -e "# Top Star Wars Films\n"    >  starwars.md  # clobber
> echo "1. IV. A New Hope"             >> starwars.md  # append
> echo "1. V. The Empire Strikes Back" >> starwars.md
> echo "1. VI. Return of the Jedi"     >> starwars.md
> echo "1. VII. The Force Awakens"     >> starwars.md

Better add it to our repo and commit so we don’t lose all this work!

> git add starwars.md
> git status
On branch master / Initial commit

Changes to be committed:
    (use "git rm --cached <file>..." to unstage)

    new file:   starwars.md

> git commit -m "Frist Psot! Add Star Wars document."
[master (root-commit) 05fe71d] Frist Psot! Add Star Wars document.
 1 file changed, 6 insertions(+)
 create mode 100644 starwars.md
_images/sad_beep.jpg

Thank you $VCS.  Should we add a few more films? How about the shitty prequels? Hmm, let’s create an experimental branch first and switch to it.

> git branch prequels           # git checkout -b does both in one step
> git checkout prequels
Switched to branch 'prequels'

Ok, branch complete. Let’s add the prequels and—oh yes, as everyone knows Empire was the best so I just moved it to the top in my editor. (Ctrl+T  FTW!)

> echo "1. III. Revenge of the Sith"  >> starwars.md
> echo "1. II. Attack of the Clones"  >> starwars.md
> echo "1. I. The Phantom Menace"     >> starwars.md

Good, let’s check our status:

> git status
On branch prequels
Changes not staged for commit:

    modified:   starwars.md

Let’s take a look at what we’ve accomplished with the diff command. Will ya look at that?

> git diff
diff --git a/starwars.md b/starwars.md
index 03ed770..22a34ac 100644
--- a/starwars.md
+++ b/starwars.md
@@ -1,6 +1,9 @@
 # Top Star Wars Films

-1. IV. A New Hope
 1. V. The Empire Strikes Back
+1. IV. A New Hope
 1. VI. Return of the Jedi
 1. VII. The Force Awakens
+1. III. Revenge of the Sith
+1. II. Attack of the Clones
+1. I. The Phantom Menace

The diff format looks a bit complicated at first, as it’s designed for scripting changes as well as looking at them. It is not hard to read with a bit of experience. Like a VCS, it works line by line.

Diff format uses a + character at the beginning of a line to signify an addition, which git typically highlights in green on a color terminal, while a - character signifies a removed line, highlighted in red. An adjacent pair of -/+ lines often signifies a modified line. The @@ line shows where the change sits: the starting line number and line count in the old file (-1,6) and in the new one (+1,9). A few extra lines are provided before and after a change to give the reader context.

On the movie front, Jedi and Force Awakens are a toss up quality-wise, so let’s leave this order for now.

> git add starwars.md  # add to the staging area for the next commit
> git commit -m "I've got a bad feeling about this."
[prequels 4f45f4a] I've got a bad feeling about this.
 1 file changed, 4 insertions(+), 1 deletion(-)

Let’s take a look at the commit log:

> git log
commit 4f45f4a270c24f8c624872ccf55b281677be33ee
Author: NAME <EMAIL>
Date:   Tue Sep 6 18:40:03 2016 -0700

    I've got a bad feeling about this.

commit 05fe71dcb1b7eb03126e44c4f9a5f1b607d01809
Author: NAME <EMAIL>
Date:   Tue Sep 6 18:22:50 2016 -0700

    Frist Psot! Add Star Wars document.

Looks good so far; suppose we’ll keep it after all. In the real world, we’d run our comprehensive test suite right about now. For a list of things to confirm before publishing changes, consult the section on Publishing Code. Time to merge and commit our changes:

> git checkout master  # back to master
Switched to branch 'master'

> git merge prequels
Updating 05fe71d..4f45f4a
Fast-forward
 starwars.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

And we have successfully merged our experimental branch into the trunk:

> cat starwars.md
# Top Star Wars Films

1. V. The Empire Strikes Back
1. IV. A New Hope
1. VI. Return of the Jedi
1. VII. The Force Awakens
1. III. Revenge of the Sith
1. II. Attack of the Clones
1. I. The Phantom Menace

Now that we’ve finished, let’s clean up:

> git branch -d prequels
Deleted branch prequels (was 4f45f4a).

> git push  # how to publish changes when there's a remote repo

We’ve finished this section. You may continue on ahead or skip to the next section.

10.4.3. Mercurial

First we’ll need to initialize the repository:

> hg init

More commonly we’d check out/clone an existing repository, but we’ve gone ahead and created one ourselves for purposes of this tutorial.

Let’s create a text file. We’ll use Markdown syntax, where the leading 1. creates an ordered list. Use a real editor if you’ve a mind to, copying only the double-quoted text:

> echo -e "# Top Star Wars Films\n"    >  starwars.md  # clobber
> echo "1. IV. A New Hope"             >> starwars.md  # append
> echo "1. V. The Empire Strikes Back" >> starwars.md
> echo "1. VI. Return of the Jedi"     >> starwars.md
> echo "1. VII. The Force Awakens"     >> starwars.md

Better add it to our repo and commit so we don’t lose all this work!

> hg add starwars.md  # add to repo
A starwars.md
> hg commit -m "Frist Psot! Add Star Wars document." starwars.md
_images/sad_beep.jpg

Thank you $VCS.  Should we add a few more films? How about the shitty prequels? Hmm, let’s create an experimental branch first and switch to it.

> hg bookmark trunk    # need to make a default/master the 1st time
> hg bookmark prequel

Ok, branch complete. Let’s add the prequels and—oh yes, as everyone knows Empire was the best so I just moved it to the top in my editor. (Ctrl+T  FTW!)

> echo "1. III. Revenge of the Sith"  >> starwars.md
> echo "1. II. Attack of the Clones"  >> starwars.md
> echo "1. I. The Phantom Menace"     >> starwars.md

Good, let’s check our status:

> hg status
M starwars.md

Note the status letter to the left of the tracked file. Oh, it’s now M for modified, good, good. Let’s take a look at what we’ve accomplished with the diff command. Will ya look at that?

> hg diff
diff --git a/starwars.md b/starwars.md
--- a/starwars.md
+++ b/starwars.md
@@ -1,6 +1,9 @@
 # Top Star Wars Films

+1. V. The Empire Strikes Back
 1. IV. A New Hope
-1. V. The Empire Strikes Back
 1. VI. Return of the Jedi
 1. VII. The Force Awakens
+1. III. Revenge of the Sith
+1. II. Attack of the Clones
+1. I. The Phantom Menace

The diff format looks a bit complicated at first, as it’s designed for scripting changes as well as looking at them. It is not hard to read with a bit of experience. Like a VCS, it works line by line.

Diff format uses a + character at the beginning of a line to signify an addition, which colordiff highlighted for us in green, while a - char signifies a removed line, highlighted red. A paired -/+ often signifies a modified line. The purple @@ stuff shows where each change sits: the starting line number and line count in the old and new versions of the file. A few extra lines are provided before and after a change to give the reader context.

On the movie front, Jedi and Force Awakens are a toss up quality-wise, so let’s leave this order for now.

> hg commit -m "I've got a bad feeling about this."

Let’s take a look at the commit log:

> hg log
@  changeset:   1:2107a79899d3
|  bookmark:    prequel
|  tag:         tip
|  user:        USER <EMAIL>
|  date:        Tue Sep 06 20:02:15 2016 -0700
|  summary:     I've got a bad feeling about this.
|
o  changeset:   0:700a6613b23a
   user:        USER <EMAIL>
   date:        Tue Sep 06 19:16:54 2016 -0700
   summary:     Frist Psot! Add Star Wars document.

Looks good so far; suppose we’ll keep it after all. In the real world, we’d run our comprehensive test suite right about now. For a list of things to confirm before publishing changes, consult the section on Publishing Code. Time to merge and commit our changes:

Mercurial requires another commit to be made on a bookmark to create another head, so we’ll make a tiny addition first:

> hg update trunk
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(activating bookmark trunk)
> echo "1. VIII. TBD" >> starwars.md
> hg commit -m "add next film." starwars.md
created new head

> hg log
@  changeset:   2:714dddf58392
|  bookmark:    trunk
|  tag:         tip
|  parent:      0:bb99850473b7
|  user:        USER <EMAIL>
|  date:        Tue Sep 06 20:26:10 2016 -0700
|  summary:     add next film.
|
| o  changeset:   1:a41e4d13a26f
|/   bookmark:    prequel
|    user:        USER <EMAIL>
|    date:        Tue Sep 06 20:23:24 2016 -0700
|    summary:     I've got a bad feeling about this.
|
o  changeset:   0:bb99850473b7
   user:        USER <EMAIL>
   date:        Tue Sep 06 20:21:36 2016 -0700
   summary:     Frist Psot! Add Star Wars document.

> hg merge prequel   # merge the prequel head into trunk
> hg com -m "merge"  # short for commit

And we have successfully merged our experimental branch into the trunk:

> cat starwars.md
# Top Star Wars Films

1. V. The Empire Strikes Back
1. IV. A New Hope
1. VI. Return of the Jedi
1. VII. The Force Awakens
1. III. Revenge of the Sith
1. II. Attack of the Clones
1. I. The Phantom Menace
1. VIII. TBD

Now that we’ve finished, let’s clean up:

> hg bookmark -d prequel

> hg push  # how to publish changes when there's a remote repo

10.5. On Drudgery

Woe unto me…
— Book of Job 
_images/woe-is-me.png

Fig. 12 *Sniffle* 

It may seem like this VCS stuff is a lot of overhead, and in this situation it is: a toy project where you’re still clumsy and perhaps unclear on the concept. That will all change with a bit of experience working with others on something substantial. You’ll need VCS to work with a team effectively, and the first time it saves your bacon you’ll be a convert. It’s not as hard as you might be thinking at the beginning.

Tip:  Aliases Foo’!

As these commands get typed a lot, it’s recommended to create a shitload of shell aliases, and perhaps a fancy prompt command that displays e.g. branch info, to expedite everyday tasks.
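As a starting point, something like the following could go in a shell startup file such as ~/.bashrc. The alias names and the vcs_branch helper are made up for illustration, and the prompt string assumes Bash; treat it as a sketch to adapt rather than a recommended setup.

```shell
# Hypothetical alias names; pick whatever is short and memorable for you.
alias gst='git status'
alias gdf='git diff'
alias gci='git commit'
alias glg='git log --oneline --graph'

# Print the current branch name, or nothing when outside a Git repository.
vcs_branch() {
    git symbolic-ref --short HEAD 2>/dev/null
}

# Bash-style prompt: working directory, then the branch name if there is one.
PS1='\w $(vcs_branch)\$ '
```

The single quotes around the prompt string matter: they delay the call to vcs_branch until each prompt is drawn, so the branch name stays current.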


10.6. Best Practice

Best practices are useful reference points, but they must come with a warning label: The more you rely on external intelligence, the less you will value an internal idea. And this is the age of the idea.
— Gyan Nagpal, Talent Economics 

A few topics that may raise the value you’ll get out of version control tools are discussed in this section.

Clean, Single-Purpose Commits

Sometimes it is too burdensome to separate every change into its own commit. However, aiming for (and often achieving) this goal will serve you well in the longer term.
— Michael Ernst, Professor of CS & E, UW 

Remember back to the “Programming in a Nutshell” sketch from Malcolm in the Middle? When looking over a large codebase, whether to add a feature or fix a bug, you’ll undoubtedly see several things to fix while you’re in there, and most of the time you’ll be unable to resist the urge to fix them. At least that’s the case with me, grin. 😀

This poses a problem because ideally you’d like your commits to be as focused as possible, i.e., dealing with one and only one issue at a time, and implemented as completely as is practical. It will vastly simplify things when the next person needs to understand, modify, or fix your fix, and of course fixes sometimes need to be reverted completely. But what happens if an important refactor or the boss’ new pet feature was included in the same commit? A big mess, that’s what. The changes should have been independent in the first place.

To avoid this situation and keep commits as focused as possible, we have the notion of cherry-picking from a list of possible changes when composing your commit. Git makes this easier with its staging area, and Mercurial has an interactive mode as well. The process is a bit like making a shopping list, or two. You look over the list of lines you’ve changed from the repo, then group them into bundles according to purpose. You’ll be glad you did later.
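Here is a rough sketch of that shopping-list process using Git’s staging area. The file names, contents, and commit messages are invented for the example, and it splits work at the file level to stay scriptable.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Baseline: two files committed together (hypothetical names).
echo "total = a + b" > calc.txt
echo "Usage: calc A B" > README.txt
git add calc.txt README.txt
git commit -q -m "Initial import."

# One editing session touches both files for unrelated reasons.
echo "total = a + b + c" > calc.txt        # the bug fix
echo "See also: the manual" >> README.txt  # a documentation tweak

# Stage and commit only the bug fix...
git add calc.txt
git commit -q -m "Fix calc to include third operand."

# ...then the documentation change on its own.
git add README.txt
git commit -q -m "Point README at the manual."

# Each commit now touches exactly one file.
git log --oneline --stat
commits=$(git rev-list --count HEAD)
```

When unrelated changes live in the same file, git add -p lets you pick individual hunks, and hg commit --interactive offers a similar prompt under Mercurial.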

Write Descriptive Commit Messages

It goes without saying that each commit should have a concise yet descriptive message of what occurred, making it possible for team members to troubleshoot issues more quickly. Think of it as basic consideration for others, including your future self.

Inclusion of a bug-id to fully describe the problem helps considerably. Modern systems will often link it to the bug record.

FOO subsystem updated to handle BAR-type input #215.
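A message like the one above can also carry a longer explanation below the summary line. In the sketch that follows, the file name and the body text are invented for illustration; only the summary line comes from the example above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "handle BAR input" > foo.txt           # hypothetical change
git add foo.txt

# Summary line first, then a blank line, then the details.
# "-F -" tells git commit to read the message from standard input.
git commit -q -F - <<'EOF'
FOO subsystem updated to handle BAR-type input #215.

BAR-type input was previously rejected by the FOO parser.
Accept it and add a regression test.
EOF

# The first line is what shows up in one-line logs.
subject=$(git log -1 --format=%s)
echo "$subject"
```

Keeping the summary short pays off in condensed views such as git log --oneline, where only that first line is shown.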

Incorporate and Share Work Frequently

Whenever starting work on a new item, and up to several times a day, get in the habit of updating to the latest possible version to avoid conflicts. The longer one waits, the more others will have changed. The greater the number of changes to merge, the greater the likelihood of issues.

Further, do not hold on to work if it has been completed to a satisfactory level. Sharing and integrating our work frequently decreases the chance of conflicts with others for the same reason mentioned above.

While this practice may result in a higher number of small errors, they will be ones that get fixed quickly, with fewer large errors and less mistaken work thrown out later. Under Subversion, there’s a greater need to make branches since commits may directly affect teammates.

Of course, coordinate with co-workers if you will be revising the same file(s) at the same time.
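The update-early, share-early rhythm described in this section might look like the sketch below. A bare repository on local disk stands in for the team server, and the two working copies, alice and bob, are hypothetical teammates.

```shell
base=$(mktemp -d)
cd "$base"

# A bare repository stands in for the team's shared server.
git init -q --bare server.git

# Alice publishes the first commit.
git clone -q server.git alice
cd alice
git config user.name "Alice"                # placeholder identity
git config user.email "alice@example.com"   # placeholder identity
echo "1. IV. A New Hope" > starwars.md
git add starwars.md
git commit -q -m "Add first film."
git push -q origin HEAD
cd ..

# Bob gets his own copy of the project.
git clone -q server.git bob

# Alice finishes another small piece of work and shares it right away.
cd alice
echo "1. V. The Empire Strikes Back" >> starwars.md
git commit -q -am "Add second film."
git push -q origin HEAD
cd ..

# Before starting something new, Bob brings his copy up to date.
cd bob
git pull -q --ff-only
git log --oneline
```

Because Bob updated before making changes of his own, the pull is a simple fast-forward with nothing to merge. Under Mercurial, hg pull -u plays a similar role.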

History Modification

…a pathway to many abilities some consider to be unnatural.
— Chancellor Palpatine 

Over the course of a project utilizing version control, a log of events is recorded, as you’ve undoubtedly noticed by now. Indeed, it is this bookkeeping information that powers the majority of the features we desire from a VCS. For that reason, the history log is often considered sacred, but there are other views. On the subject of rewriting history, there seem to be two camps:

1) Those that believe that repo history should be permanent and written in stone, for safety and auditing reasons.

Mercurial and Subversion fall into this camp, though hg history can be modified through extensions.

2) Those that believe that history should be heavily edited to make it easy to understand, as you would refactor code. Sometimes called “rebasing.”

Git users tend to fall into this camp, though you are welcome to keep your mitts to yourself and leave its history untouched.

Which is correct?  It depends on your project, industry, and temperament.
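For a taste of the second camp, the smallest form of history rewriting is amending the most recent commit. The sketch below is an illustration only, run in a scratch repository with a placeholder identity; it is not full-blown rebasing.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "1. IV. A New Hope" > starwars.md
git add starwars.md
git commit -q -m "Frist Psot! Add Star Wars document."
before=$(git rev-parse HEAD)

# Replace the most recent commit with one carrying a corrected message.
git commit -q --amend -m "First post! Add Star Wars document."
after=$(git rev-parse HEAD)

# Still a single commit, but it is a new one with a new identifier.
git log --oneline
echo "before: $before"
echo "after:  $after"
```

The changed identifier is the catch: anyone who already pulled the old commit now has history that disagrees with yours, which is why rewriting is generally kept to work that has not been shared yet.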

Additional Considerations

A few remaining important considerations are discussed below, list courtesy of Michael Ernst at UW:

  • Remember that the tools are line-based.

    This means generally you should avoid making inconsequential formatting-based changes to files, such as removal of trailing spaces or reflowing or justifying text. In many source-based docs, spacing doesn’t matter. If a change to your coding standards needs to be made, go ahead; however, try to segregate such formatting changes from others when practical.

    Keeping lines shorter, to 80 or so characters, helps reduce the chance of multiple edits occurring on the same line, reducing conflicts.

  • Don’t commit generated files.

    Only files that have a significant amount of time invested in them should be tracked in version control. This generally means human-editable source files, not output files such as object code and binaries. These types of files can be regenerated at will and do not work well with line-based difference tracking, since they don’t have lines.

    Use of an “ignore” file, e.g. .gitignore or .hgignore, makes this easier by listing the output files to skip.

  • Practice with a merge tool before needing it.

    Don’t wait until a big deadline and merge conflict emergency to learn how to use a merge tool!

  • Don’t force things.

    If your VCS is requiring you to “force” it to continue, something is probably wrong or it is unsafe to proceed. Get help!

  • Cache your credentials using the VCS or an OS-provided keyring so you don’t have to type in your password constantly.

  • Setup of a mailing list for email notification of commits can be useful for those who need to keep on top of them.
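The “ignore” file mentioned in the list above can be tried out in a scratch repository. The patterns below are hypothetical ones for a C-style project; adjust them to whatever your build actually produces.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical patterns: object files anywhere, plus a build directory.
cat > .gitignore <<'EOF'
*.o
build/
EOF

# One source file and two generated files.
mkdir build
echo "int main(void) { return 0; }" > main.c
: > main.o
: > build/app

# Only the source file and the ignore file itself are reported as untracked.
untracked=$(git status --porcelain)
echo "$untracked"
```

Mercurial’s .hgignore serves the same purpose, though it expects regular expressions unless the file starts with a syntax: glob line.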

See also:  Online Resources

  • Version control concepts and best practices by Michael Ernst, Professor of CS & E, University of Washington 

10.7. Hosted Services

Little people doing little things in little places everywhere can change the world.
— Anwar Fazal 

With the advent of cheap disk space, bandwidth, and massive data centers, freemium online VCS hosting has gone mainstream. GitHub, Bitbucket, and GitLab are three contenders.

Tip:  “Social Networks” for Developers

Most developers should have an account on one or more of these sites—as they’ve grown from mere hosting to an industry-specific social and professional network. You’ll share code, learn from others, contribute to the common good, and start creating a public portfolio to showcase what you can do.

While GitHub is often chosen for public FLOSS projects, Bitbucket is best for private projects due to the difference in their pricing strategies. It is a shame, however, that these services are not more open products. In the group above, only GitLab has a FLOSS version. There are others, such as Kallithea.

$SERVICE is My Resume…

_images/github_resume.png

The meme that peaked a few years ago, “GitHub is My Resume” (or Bitbucket/GitLab, etc.), has truth to it. It can be quite useful to have a record of work out in the open, to use for hiring purposes. Employers: it should not be a requirement, however, as not everyone can participate, due to work contracts among other reasons.

As these networks are places to develop yourself professionally in a public forum, take care to follow best practice and act in an appropriate manner if you want to further, rather than hurt, your career.

See also:  Online Resources & Tools

  • An Agile Perspective on Branching and Merging, a catalog of workflow patterns.

  • Easy Git, the power of git with less bite.

  • git-svn

    Work in a more modern manner—make offline commits, lightweight branching/merging, cherry pick changes, etc—then push to a Subversion server without affecting coworkers. 

    …while your collaborators continue to work in their dark and ancient ways.
    — ProGit, by Schacon and Straub 
  • Meld: a standalone, opensource GUI diff and merge tool. 

We’ll continue on to discuss other facets of Configuration Management in the next chapter.

TL;DR 

  • Version control systems provide the means to manage changes to source code and documents safely.

  • You won’t be considered a Professional Developer™ unless you use and understand version control systems well.

  • Version Control System (VCS), Revision Control System (RCS), and Source Control (or Code) Management (SCM), are three names for version control you’ll encounter.

  • A local data model was used in the earliest VCS tools, designed before ubiquitous networking.

  • A centralized repository model solved the problem of forgotten locks, and facilitated backups and security.

  • The era of proprietary version control is at its end (unless you’re unlucky).

  • Internally version control systems are designed around trees of nodes that track file and metadata changes, via snapshot or delta-encoding techniques.

  • With distributed version control (DVCS) each user keeps a full copy of the repo and can share changes with others, participating in peer-to-peer networking as necessary.

  • The separation of commit and publishing steps (under DVCS) facilitates offline/mobile work and “tighter” or cleaner commits.

  • Distributed version control doesn’t preclude a central server considered authoritative, always available, and backed up regularly.

  • An oddly-named trio of open, free, and popular VCS choices are Subversion, Git, and Mercurial.

  • Shell aliases can speed day-to-day VCS tasks.

  • Commits should be as focused as possible on a single issue or a single set of related issues.

  • Write descriptive commit messages out of consideration for others, including your future self.

  • Incorporate and share work frequently to reduce conflicts and integration risks.

  • History modification may be considered to present a simplified view of what occurred during development.

  • VCS tools are generally line-based.

  • Don’t commit generated output files.

  • Practice with a merge tool before needing it.

  • Don’t force a VCS to continue if it refuses; get help.

  • Email notification of commits to a mailing list can be helpful for those who oversee development.

  • With the advent of cheap disk space, bandwidth, and massive data centers, freemium online VCS hosting has gone mainstream. GitHub, Bitbucket, and GitLab are three contenders.