The Git hashes

Not a blog about the science or practice of travel through books, but perhaps rather about science, practice and travel through books.

The Git hashes

April 5, 2011

I started thinking about the SHA-1 hashes used in Git today. The hashes are 160 bits long, and are usually represented by 40 character hex strings. Why 40?

If one character represents x bits, the character set must include 2^xdistinct characters. And with 2^xdistinct characters (or numerals), we have a base-2^x system. With just two distinct characters (A and B, say), one character represents one bit, x = 1, and we have a base-2 system. With four distinct characters (1,2,3,4, say), we have a base-4 system, and one character represents two bits since 2² = 4.

One hex character (or hexadecimal), is a character in a base-16 system (hexa – 6, decimal – 10), so a full set must include 16 distinct characters, and each character represent 4 bits. Most often, the set are taken to consist of the numerals 0-9 plus the six letters A-F. This makes a hex string easy to read and use for human beings.

The number n of x bit characters needed to store a 160 bit hash is found by solving 2¹⁶⁰ = (2^x)ⁿ, or n = 160/x. The number of hexadecimals needed are therefore n = 160/4 = 40.

On the other hand the byte character represents 8 bits, which means a full set must include 2⁸= 256 different characters. A byte string would therefore only be 160/8 = 20 characters long, but most byte character sets include characters that are not human readable, such as control characters. A character set of 256 human readable characters using some of the richer alphabets than the basic Latin one would be possible, but then come issues with transferability.

What about something inbetween? A 5 bit string would only be 160/5 = 32 characters long, and would need 2^5=32 unique characters, a base-32 system. That could be the 26 letters A-Z plus 6 numerals. It could still be case-insensitive and easy readable, yet 8 characters shorter. Nevermind…

Category:

computing, Uncategorized

Biblionautics

Not a blog about the science or practice of travel through books, but perhaps rather about science, practice and travel through books.

The Git hashes

Leave a comment Cancel reply

Welcome

Biblionautics

Not a blog about the science or practice of travel through books, but perhaps rather about science, practice and travel through books.

The Git hashes

Share this:

Related

Leave a comment Cancel reply

Welcome