Using Git as a "Poor Man's" Time Machine

Audience: Doesn’t cry when installing new software and using the command line, interested in reading a long rambling post about Garry Winogrand, Chuck Norris and, eventually, Git.

Recent versions of Apple’s OS X include a feature called “Time Machine”. In short, it allows you to step back in time to revisit your file system in previous states and copy files from those previous times. The idea resembles the snapshots offered by ZFS, an advanced filesystem created by Sun, though Time Machine is not actually built on ZFS. In this series of articles, we will attempt to mimic some of these features in our own creation, powered by Git.

Note: These directions are a little Windows-centric, though it should be easy to adapt them to any operating system.

It Finally Happened

So the day has finally come. All those hours and hours of blood, sweat and tears spent working on the All Important Spreadsheet, or perhaps it was a colossal document of your famous stamp collection. Maybe it was your latest digital masterpiece, a homage to Garry Winogrand’s “Park Avenue, New York” done entirely in MS Paint?

You’ve checked once…twice…three times and you are finally coming to grips with the fact that your file is missing. Well, the hope is that it’s just missing; maybe it was misplaced, accidentally saved to a different folder. After a frantic search, nothing has turned up.

After the panic subsides you start thinking of what to do next. After searching in other folders, the next logical step would be to retrieve it from yesterday’s backup (you are backing up, right?). So you pull out yesterday’s backup and realize you have just lost hours of productive time. Not only have you spent all morning searching for the file, now you have to somehow recreate all the work that was lost. There has to be a better way!

You Silly Git

Well, it’s your lucky day: über hacker Linus Torvalds already invented it, for fun, while beating Bruce Schneier at chess and punching Chuck Norris in the face. It’s called Git.

The Playschool definition of Git is as follows:

Super-duper undo for files - with cheatcodes. With its amazing branching, merging and other hoopla it is like the [Konami Code](http://en.wikipedia.org/wiki/Konami_Code) crossed with [Portal](http://www.youtube.com/watch?v=iFhPFSjNovA&feature=related) crossed with [Your Mom](http://beltespenner.com/oscommerce/images/i%20love%20your%20mom.jpg) (because she really does try to do what’s best for you even if you don't understand it at the time).

The Git Website says the following:

> ## Git is...
>
> Git is a **free & open source, distributed version control system** designed to handle everything from small to very large projects with speed and efficiency. **Every Git clone is a full-fledged repository** with complete history and full revision tracking capabilities, not dependent on network access or a central server. **Branching and merging are fast** and easy to do.

Say What?

The short of it is this: git (like other distributed revision control software) lets you record changes to your files in a meaningful way (you get to tag the changes with a name). Then you can arbitrarily pick and choose, roll back and forward, and branch and merge these changes. We are going to take the baby steps necessary to get you up and running with an “automatic” record (or commit) instead of the “real” way, which is to record these changes as you go.

This is not the intended use for Git. Git was created to be used by somewhat technical people, working independently, with text files. We will be abusing it by using it with/for non-technical people (well you might be a smarty pants but we are using a very dumb approach of “fire and forget” for the commits).

Also, we will be recording the pool (repository) of everyone’s work. Normally each person would have their own copy of all the files to work on, and when they are done working, everyone’s changes are merged into one repository.

Lastly, Git (and most other versioning software) is made to work on text files. “Why?” you may ask. Because diffing (finding the differences between two versions of the same file) text files is easy peasy, while diffing binary files (images, word documents, programs, etc.) is hard stuff. Every type of binary file has its own format, which means whoever invented the format had their own vision of how the bits should be ordered and what they really mean.

In some binary formats, to be more efficient, everything gets rewritten, not just the stuff that changed. For example, say you are working on an image and remove the background. If you compare the images side by side, it is obvious to you what has changed. But all the computer knows is bits, and half of the file may have changed for more efficient storage. But I digress…
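To make the text-versus-binary point a bit more concrete, here is a small sketch in Python (my choice for illustration; the sample data and variable names are made up). It edits one line of a text document, then compares a line-based diff of the text with a byte-by-byte comparison of the zlib-compressed versions. The compressed bytes are only a stand-in for "a binary format that rewrites things for efficiency", not a claim about any particular file format, and the exact byte counts will vary with your zlib version.

```python
import difflib
import zlib

# Hypothetical sample data: 200 short lines of text.
old_lines = [f"row {i}: value {i}\n" for i in range(200)]
new_lines = list(old_lines)
new_lines[3] = "row 3: a much longer value than before\n"  # a single edited line

# Line-based diff: the edit shows up as one removed line and one added line.
diff = list(difflib.unified_diff(old_lines, new_lines, "old", "new", n=0))
changed = [d for d in diff if d[0] in "+-" and not d.startswith(("+++", "---"))]
print("removed/added lines in text diff:", len(changed))

# Compressed stand-in for a binary format.
old_blob = zlib.compress("".join(old_lines).encode())
new_blob = zlib.compress("".join(new_lines).encode())

# Count byte positions that differ, plus any difference in length.
differing = sum(a != b for a, b in zip(old_blob, new_blob))
differing += abs(len(old_blob) - len(new_blob))
print("differing bytes in compressed form:", differing, "of", len(new_blob))
```

The text diff stays proportional to the edit, while the compressed form typically differs in far more places than the one line you touched. That is the kind of change Git cannot store as a tidy little difference.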

The point is that even though we are using Git in a way its creator hadn’t intended, it is flexible enough for the job. This is a testament to the philosophy of doing one thing and doing it well. Our project today is not an end-all-be-all; it is merely a stop-gap for situations where you don’t have regular, consistent backups and/or you want some of the benefits of version control.

Enough Already, Let’s Get To It

You’re going to hate me. Only a little though.

In the amount of time you spent reading the above drivel you could have already implemented our little project. Well, we laughed, we cried, good times…

Anyway on to it! On to…

Building a “Poor Man’s Time Machine” with Git

It’s not really the worst ever but it’s no Tardis. We will be able to move backward in time, kinda forward-ish, depending on your perspective, and sideways as well. We are mostly going to be concerned with the preventing-the-JFK-assassination-and-returning-to-the-present-day-with-nothing-else-changed rather than the Bill-and-Ted-travel-back-in-time-and-totally-screw-with-the-present-errm-future-err-whatever-dude.

When everything is said and done you will be able to: see what files have been added or changed in the previous day (or hour, or whatever interval you choose) and be able to arbitrarily grab any previous version of any file.

What you need to download (assuming you are running Windows)

What you need to know

  • What directory you are going to run this on

  • How to run a scheduled task on your file server

  • The appropriate bribe for your local sysadmin if you don’t have admin rights

What else

  • This will take up extra space on your disk, maybe a lot. Depending on the changes git may have to store a complete copy (or two or three) of any file that changes. It all depends on the number and type of changes

  • You will need to watch this like a little baby bird in a shoebox for a while. Eventually, you can probably let it fly on its own, but for now if you drop it out the window…well…the results won’t be pretty

  • I never said this was a great solution. If things start on fire, if everything gets deleted, if your server explodes, well, I’m sorry. I’ll send you a very sorrowful e-card, but that’s about it. You’re kinda on your own with this one
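To keep an eye on the disk-space cost mentioned in the list above, one option is to compare the size of the working files with the size of the hidden “.git” directory from time to time. The following is a rough Python sketch; the function name `tree_size` and the demo files are made up for illustration, and a real run would point at your shared folder after Git has been set up on it.

```python
import os
import tempfile
from pathlib import Path


def tree_size(root: Path, skip_git: bool = False) -> int:
    """Total size in bytes of all files under root.

    With skip_git=True any .git directory is left out,
    so the result covers only the working files.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_git and ".git" in dirnames:
            dirnames.remove(".git")  # tells os.walk not to descend into it
        for name in filenames:
            total += (Path(dirpath) / name).stat().st_size
    return total


# Demo on a throwaway folder with a fake .git directory.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    (shared / "FileOne.txt").write_bytes(b"12345")
    (shared / ".git").mkdir()
    (shared / ".git" / "fake-object").write_bytes(b"x" * 40)  # stand-in for Git data

    working = tree_size(shared, skip_git=True)
    history = tree_size(shared / ".git")
    print(f"working files: {working} bytes, .git: {history} bytes")
```

If the “.git” number starts to dwarf the working files, that is your baby bird telling you it needs attention.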

Time Machine Go

Okay, so install everything, I’ll wait…

Good, now let’s tell Git what folder to work on and “initialize” it. And no, don’t worry, “initialize” has nothing to do with “erase”. The easiest way to show you is from the command line, so roll up your sleeves.

During our example we will be using a folder called “SharedFolder”. This will represent the common file-share on our hypothetical server.

C:\>cd SharedFolder

C:\SharedFolder>dir
 Volume in drive C has no label.
 Volume Serial Number is 18A3-D0C5

 Directory of C:\SharedFolder

02/13/2010  12:12 PM    <DIR>          .
02/13/2010  12:12 PM    <DIR>          ..
02/13/2010  12:02 PM                 5 FileOne.txt
02/13/2010  12:02 PM                 5 FileTwo.txt
               2 File(s)             10 bytes
               2 Dir(s)     984,203,264 bytes free

C:\SharedFolder>git status
fatal: Not a git repository (or any of the parent directories): .git

C:\SharedFolder>git init
Initialized empty Git repository in C:/SharedFolder/.git/

C:\SharedFolder>git status
# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       FileOne.txt
#       FileTwo.txt
nothing added to commit but untracked files present (use "git add" to track)

In this example we have initialized the directory (telling git this is where we want to work). Since we have not added any files yet, we have not told git to actually track them. So now we will do just that.

C:\SharedFolder>git add FileOne.txt

C:\SharedFolder>git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   FileOne.txt
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       FileTwo.txt

C:\SharedFolder>git add .

C:\SharedFolder>git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   FileOne.txt
#       new file:   FileTwo.txt
#

As you have seen we can use “git add” to be picky about what files we include. In advanced usage you can even tell git what part of which files to include. The last command “git add .” is a shortcut telling git to add everything in the current folder: every new file that is not already being tracked, plus any changes to files that are.

Lastly, we are going to commit our changes to git. In effect, this is creating a checkpoint within git. Now any time in the future we can roll back to exactly this state, regardless of how many changes we have made, even if we have deleted the files entirely. As long as the hidden “.git” directory is there, all our changes are there too.

C:\SharedFolder>git commit -am "Initial Commit"
[master (root-commit) 436162e] Initial Commit
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 FileOne.txt
 create mode 100644 FileTwo.txt

C:\SharedFolder>git status
# On branch master
nothing to commit (working directory clean)

C:\SharedFolder>git log
commit 436162e50d2075366634064793ef7ef8051da871
Author: unknown <[email protected](none)>
Date:   Sat Feb 13 12:12:41 2010 -0600

    Initial Commit

C:\SharedFolder>
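The claim that a committed file can be brought back even after it has been deleted is easy to try out safely. The sketch below is in Python so that it runs on any operating system with Git installed; it replays the steps above in a throwaway folder, deletes a file, and restores it with “git checkout”. The helper name `git`, the file contents and the demo identity are all placeholders of mine, not part of the walkthrough.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside repo and return its standard output."""
    # Placeholder identity so the commit works on a machine where Git is unconfigured.
    cmd = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.invalid", *args]
    return subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True).stdout


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "FileOne.txt").write_bytes(b"12345")   # made-up contents
    (repo / "FileTwo.txt").write_bytes(b"67890")

    git(repo, "init")
    git(repo, "add", ".")
    git(repo, "commit", "-m", "Initial Commit")

    (repo / "FileOne.txt").unlink()                # "lose" the file
    missing = not (repo / "FileOne.txt").exists()

    git(repo, "checkout", "--", "FileOne.txt")     # restore the committed version
    restored = (repo / "FileOne.txt").read_bytes()
    print("missing after delete:", missing, "| restored:", restored)
```

Nothing here touches your real files; everything happens in a temporary directory that is removed when the script ends.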

Implementation

Like the flux capacitor in Dr. Brown’s DeLorean, git is doing most of the work in our little “time machine”. Since all of the hard work has already been done, there is only a small script we need to write to “steer” git.

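rem /d also switches drives in case the task starts on a different one
cd /d C:\SharedFolder
rem stage new and changed files, then record them as a single commit
git add . && git commit -am "Daily Update"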

Yep, that’s really all there is to it.

So go ahead and save this code as a batch file. Test it a few times. After you run it you should be able to do a “git log” and see a new revision (assuming that changes have been made).

At this point all there is left to do is to set up your batch file as a scheduled task and wait…
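If you are adapting this to another operating system, or would like the job to stay quiet when nothing has changed, the batch file above could be sketched in Python along the following lines. This is only one possible variant under my own assumptions: the function name `snapshot`, the commit message format and the placeholder identity are invented for illustration. It uses “git add -A” so that deleted files are recorded too, and “git status --porcelain” to check whether there is anything to commit.

```python
import subprocess
import tempfile
from datetime import datetime
from pathlib import Path

# Placeholder identity so commits work even if Git has never been configured.
IDENTITY = ["-c", "user.name=Time Machine", "-c", "user.email=timemachine@example.invalid"]


def git(repo: Path, *args: str) -> str:
    """Run one git command inside repo and return its standard output."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout


def snapshot(repo: Path) -> bool:
    """Stage everything and commit; return False when nothing changed."""
    git(repo, "add", "-A")                       # new, modified and deleted files
    if not git(repo, "status", "--porcelain").strip():
        return False                             # nothing staged: skip the commit
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    git(repo, "commit", "-m", f"Automatic snapshot {stamp}")
    return True


# Demo in a throwaway repository; a scheduled job would instead call
# snapshot() on the real shared folder, which must already be initialized.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    git(repo, "init")
    (repo / "FileOne.txt").write_bytes(b"12345")
    first = snapshot(repo)    # there is a new file to record
    second = snapshot(repo)   # nothing has changed since the first run
    print(first, second)      # True False
```

The batch file does the same job with less ceremony; the main difference is that this version does not report an error on the days when there is nothing to commit.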

Review

  • git init - Create a new git repository

  • git add SomeFile - Add SomeFile to the list of files that git will track

  • git add . - Add every untracked file to the list of files that git will track

  • git commit SomeFile - Commit the changes to SomeFile

  • git commit -am "My Message" - Commit all changes to tracked files and tag them with the message “My Message”

  • git log - Review the past changes

Further Reading

Coming Soon…

In part two we will be discussing exactly what you can do with this wonderful contraption we have built. We will learn how to compare changes, see new files that have been added, and bring back old files that have been deleted or changed. See you…in the future.

Matryoshka Dolls: We have a winner!

We have a winner

I want to thank everyone for participating in last week’s puzzle.  I would also like to announce that we have a winner.

ropers was the first (and only) person to successfully reveal the hidden message: “Open the pod bay doors, HAL.”

Hi Zach,

This is a reply to the Matryoshka Dolls Sunday Hacker Puzzle.
The tentative solution/command I've found is:

Open the pod bay doors, HAL.

However, I'm a bit unsure if that's really the *whole* solution and
all there is to this riddle -- or if there's yet more to find (see
below).

To get to this point, I have:

1. Recognized the RussianDolls.ppm file as a portable pixmap picture
in binary format with the P6 (0x5036) magic number and looked at it in
the GIMP. I noted that there were eleven red babushkas.

2. Run strings(1) on the .ppm file and looked at the output, where I
saw the Rar! magic number and the secret.txt.gpg file name.

3. Used manual file carving ( cf.
http://en.wikipedia.org/wiki/File_carving ) to isolate the RAR file.
(This proved unnecessary, as the .ppm will also open just fine as a
RAR if simply renamed to RussianDolls.rar. That's a known phenomenon
and an old trick with both RAR and ZIP files for hosting files at
imageboards.) I extracted the secret.txt.gpg file from the RAR file.

4. Used Scalpel ( http://www.digitalforensicssolutions.com/Scalpel/ )
to automatically carve the shit out of RussianDolls.ppm, but all I got
that way was a shed full of .pgp files, which are probably just false
positives.

5. Looked at the RussianDolls.ppm file with a hex editor and noted the
DEADBEEF magic number at 0xe0, which corresponds to pixels 69x0y
(FFFFDE) and 70x0y (ADBEEF). I decrypted the secret.txt.gpg file with
the lowercase deadbeef password and found it to contain the
aforementioned command from "2001: A Space Odyssey" (Stanley Kubrick,
1968) in figlet(6)'s default ASCII art font.

But reading your hints page
http://thehelpfulhacker.net/2010/02/02/sunday-hacker-puzzle-matryoshka-dolls-hints/
 confused me and made me doubt whether I've found the solution. You
emphasised the visual difference between the original image and your
version. Sure, pixels 69x0y and 70x0y are different from the original
image you referenced, but the most noticeable difference is the
missing shadow and the prominent structure/square to the left of the
largest babushka: http://i.imgur.com/Vk5R0.png
But when I calculated the hex offset that corresponds to that location
24x168y -- starting at 0x0y(!) with 345 pixels per row (in this
345x292 picture) and 3 bytes per pixel (in the binary ppm format),
that's 15 bytes (=offset at beginning of file before binary pixel data
starts, occupied by 50 36 0A 33 34 35 20 32 39 32 0A 32 35 35 0A) +
(168(!) rows * 345 columns + 24(!) _more_ columns in the next row) * 3
bytes = (168*345+24)*3+15 = 173967 bytes = 0x2A78F, I didn't find
anything noteworthy there.

So maybe the entire missing shadows and prominent squares are
incidental, and you were really just referring to FFFFDEADBEEF at
pixels 69x0y and 70x0y?

I don't want to get a better chance than anyone else, but maybe you
could publicly drop a hint that might make it clearer at what point
the chase is finished and whether the 2001 quote is the solution
(without giving it away, of course)?

Many thanks and kind regards,
--ropers
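For anyone who wants to check the offset arithmetic in the message above, here is a short Python sketch. The header bytes, image width and bytes per pixel are taken from ropers’ description; the names `HEADER`, `WIDTH` and `pixel_offset` are mine.

```python
# Values from the message above: a 345x292 binary PPM (P6), 3 bytes per pixel.
HEADER = b"P6\n345 292\n255\n"   # the 15 header bytes listed in hex in the message
WIDTH = 345
BYTES_PER_PIXEL = 3


def pixel_offset(x: int, y: int) -> int:
    """Byte offset in the file of the first byte of pixel (x, y), counting from 0."""
    return len(HEADER) + (y * WIDTH + x) * BYTES_PER_PIXEL


print(len(HEADER))                   # 15
print(pixel_offset(24, 168))         # 173967
print(hex(pixel_offset(24, 168)))    # 0x2a78f
print(hex(pixel_offset(69, 0) + 2))  # 0xe0, the third byte of pixel 69
```

The last line ties in with the DEADBEEF observation: pixel 69 starts at 0xde, so its third byte (the DE in FFFFDE) sits at 0xe0, and pixel 70 (ADBEEF) follows immediately.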

As noted, there was some confusion, which I cleared up in subsequent posts.

Thanks for playing

I had a great time with this puzzle and really enjoyed the responses from everyone.  I’ve learned how to make these puzzles better.  Hopefully some of you have learned some new skills or put some old ones to the test.

I look forward to doing something like this again, in the near future.  If anyone has requests/suggestions, please let me know - [email protected]

Matryoshka Dolls - Update...

So far I have received one correct answer, but there is still time until Sunday for you to crack the puzzle…

There might be some confusion about the two images, the original jpg and the modified ppm file.  Don’t worry about the difference in cropping, this isn’t significant.

How do you know you are at the end of this rabbit hole?  The command will be recognizable to anyone who has seen “2001: A Space Odyssey”.