Rich recently discussed version control (slides, notes for Git with Ubuntu and NYU). Version control, also called revision control or source control, is ‘the management of changes to documents, programs and other information stored as computer files’. Rich started with some ideas on what you might use it for and why you should use it. One use is synchronization: you work on your laptop at home and while traveling, and you need to keep it in sync with the server you use for simulations and backups.
Another use is collaborating on papers with many authors. In this case many different documents have to be shared, and version control is the right way to do it. It is also good for record keeping and reproducibility: you get into the habit of commenting on the changes you make, and you can return to earlier points in the code, such as ones where things seemed to be working. A final motivator is that it is useful for releasing code.
Rich gave his desiderata as: synchronize between multiple machines; share between multiple local and nonlocal coauthors; make parts of the code public (e.g., development versus stable release); and record simulation settings, changes an author has made, etc. His point: version control does all of this and more. Some more background on version control: it is an analogue of ‘track changes’ in Microsoft Word, but for all the files in a directory; it records who changed what, where and when; it can roll back changes; and it can share and synchronize all the files.
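To make ‘who changed what, where and when’ and ‘roll back’ concrete, here is a minimal toy sketch in Python. It is an in-memory model only: the class and method names (`ToyHistory`, `commit`, `log`, `diff`, `checkout`) are hypothetical and are not Git's API or internal format. Each revision keeps a full snapshot of the files plus the author, message and timestamp, and the standard library's `difflib` shows line-by-line changes between two revisions.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One recorded change: who, why, when, and a snapshot of every file."""
    author: str
    message: str
    timestamp: datetime
    files: dict  # filename -> file contents at this revision


@dataclass
class ToyHistory:
    """Hypothetical in-memory model of 'track changes' for a whole directory."""
    revisions: list = field(default_factory=list)

    def commit(self, files, author, message):
        # Store a copy of every file together with who made the change and when.
        rev = Revision(author, message, datetime.now(timezone.utc), dict(files))
        self.revisions.append(rev)
        return len(self.revisions) - 1  # revision number

    def log(self):
        # Who changed what, oldest first.
        return [(i, r.author, r.message) for i, r in enumerate(self.revisions)]

    def diff(self, old, new, filename):
        # Line-by-line differences for one file between two revisions.
        a = self.revisions[old].files.get(filename, "").splitlines(keepends=True)
        b = self.revisions[new].files.get(filename, "").splitlines(keepends=True)
        return "".join(difflib.unified_diff(a, b, f"{filename}@{old}", f"{filename}@{new}"))

    def checkout(self, number):
        # Roll back: return a copy of the files exactly as they were at that revision.
        return dict(self.revisions[number].files)


# Usage: record two versions of a (hypothetical) simulation setting, then roll back.
history = ToyHistory()
r0 = history.commit({"sim.py": "steps = 100\n"}, "rich", "Initial settings")
r1 = history.commit({"sim.py": "steps = 1000\n"}, "rich", "Longer run")
print(history.log())
print(history.diff(r0, r1, "sim.py"))
restored = history.checkout(r0)
```

Storing whole snapshots keeps the sketch simple; a real system also has to store them compactly and share them between machines, which the toy ignores.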
Rich focussed on one distributed version control system, Git, and explained why he chose Git rather than something else. The first reason is that it is distributed. Some version control systems are centralized, so everything passes through a server; this makes them slow and means you cannot work offline. In Git, everyone has a copy of the database, meaning they have the whole history. Everything is local, so it is very fast and allows for offline work, and because every copy is complete there are many backups. Branching, where a set of files is separated so that it can be worked on independently in different ways, is easy with Git. It is also open source and free. Having said that, one limitation is that it works well only on certain kinds of files: it can show and merge changes line by line in text files such as source code or LaTeX, but for binary files (e.g., Word documents or images) it can store versions without meaningfully comparing or merging them.
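The distributed and branching points can be illustrated with another small Python sketch. Only `git_blob_id` reflects real Git behaviour: Git names file contents by the SHA-1 of a `blob <size>` header, a NUL byte, and the bytes, which is what `git hash-object` prints. Everything else (`ToyRepo`, the JSON-based commit ids, the method names) is a hypothetical simplification, not Git's actual commit format or commands. The sketch shows two ideas from the talk: a branch is just a name pointing at a commit, which is why branching is cheap, and a clone copies every object, so each copy holds the whole history and can be used offline.

```python
import copy
import hashlib
import json


def git_blob_id(content: bytes) -> str:
    # Same id `git hash-object` gives: SHA-1 of "blob <size>\0" followed by the bytes.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


class ToyRepo:
    """Hypothetical sketch of a distributed repository: objects plus branch pointers."""

    def __init__(self):
        self.objects = {}               # id -> file bytes or commit record (whole history)
        self.branches = {"main": None}  # branch name -> id of its latest commit
        self.current = "main"

    def commit(self, files: dict, message: str) -> str:
        tree = {}
        for name, content in files.items():
            blob_id = git_blob_id(content)
            self.objects[blob_id] = content  # identical contents are stored once
            tree[name] = blob_id
        record = {"parent": self.branches[self.current], "message": message, "tree": tree}
        # Toy commit id: hash of the record (Git's real commit format differs).
        commit_id = hashlib.sha1(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.objects[commit_id] = record
        self.branches[self.current] = commit_id  # move the branch pointer forward
        return commit_id

    def branch(self, name: str) -> None:
        # A new branch is just another name for the current commit.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        self.current = name

    def history(self, name=None):
        # Walk parent links back from a branch tip using only local objects.
        commit_id = self.branches[name or self.current]
        messages = []
        while commit_id is not None:
            record = self.objects[commit_id]
            messages.append(record["message"])
            commit_id = record["parent"]
        return messages

    def clone(self) -> "ToyRepo":
        # Cloning copies every object, so the copy has the full history (a backup).
        return copy.deepcopy(self)


# Usage: branch to try an idea, then clone to a "laptop" that can work offline.
repo = ToyRepo()
repo.commit({"paper.tex": b"\\section{Intro}\n"}, "First draft")
repo.branch("experiment")
repo.switch("experiment")
repo.commit({"paper.tex": b"\\section{Intro}\nNew idea.\n"}, "Try a new idea")
laptop = repo.clone()
print(laptop.history("main"))        # ['First draft']
print(laptop.history("experiment"))  # ['Try a new idea', 'First draft']
```

Because the branch pointer is the only thing created by `branch`, the cost of branching does not depend on the size of the project; the `main` history is untouched by work on `experiment`.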
The last topic was where to go for more information: an American Scientist article, which makes the point that ‘scientists would do well to pick up some tools widely used in the software industry’; more on the bottleneck idea from HPC Wire; Software Carpentry, an awesome place for learning more about these kinds of things, which includes lectures on version control; InfoWorld on why Git is on the up and up; the Git project home; Git Reference; Git Wiki; a Git tutorial; a Git for scientific computing tutorial (part of Advanced Scientific Programming in Python); version control in MATLAB; Unison (for file synchronization); some Mac tools: Versions (for SVN), Gitbox, Tower and SourceTree; and last but not least, GitHub, ‘a web-based hosting service for software development projects that use the Git’.