This week I found myself having to explain to well-meaning folks the important differences between encryption and a hash function. Simply put, encryption comes with the equal and opposite notion of decryption, while a hash function is designed to be a one-way process (once it's hashed, there is no way back). I am purposely ignoring the highly advanced mathematics that goes into both encryption and hash functions (an engineering approach rather than an academic one).
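To make the distinction concrete, here is a minimal C# sketch (the `EncryptVsHash` class and its method names are made up for illustration). It round-trips a string through AES using the key and IV that `Aes.Create()` generates, then hashes the same string with SHA-256. Decryption recovers the original text, while the hash API offers no inverse at all.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper contrasting a reversible cipher (AES) with a one-way hash (SHA-256).
public static class EncryptVsHash
{
    // Encrypts with AES (defaults: CBC mode, PKCS7 padding) and decrypts again
    // with the same randomly generated key and IV.
    public static string RoundTrip(string plaintext)
    {
        using (Aes aes = Aes.Create())
        {
            byte[] cipher;
            using (ICryptoTransform enc = aes.CreateEncryptor())
            {
                byte[] input = Encoding.UTF8.GetBytes(plaintext);
                cipher = enc.TransformFinalBlock(input, 0, input.Length);
            }
            using (ICryptoTransform dec = aes.CreateDecryptor())
            {
                byte[] output = dec.TransformFinalBlock(cipher, 0, cipher.Length);
                return Encoding.UTF8.GetString(output);
            }
        }
    }

    // Hashes with SHA-256: always 32 bytes out, and there is no "unhash" counterpart.
    public static byte[] Hash(string text)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(Encoding.UTF8.GetBytes(text));
        }
    }
}
```

Anyone holding the key and IV can run the cipher backwards, which is the whole point of encryption. The digest, by contrast, is 32 bytes whether the input is one word or a whole file, and nothing in the API turns it back into text.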

Once the notion of hashing came up, we talked about a variety of techniques, and for some unknown reason MD5 was mentioned and started to be used as a synonym for hashing in general. It was quickly brought to our attention that MD5 is compromised, and badly (emphasis mine):

In order for a software integrity checksum or a digital signature based on a hash value to be of any value, the cryptographic hash function that is used must be collision resistant. That is, it must be practically impossible to find different messages that have the same hash value. Otherwise, a miscreant can use a single hash value to commit to more than a single file.

The cryptographic hash function MD5 was shown to be not collision resistant, by prof. Xiaoyun Wang and her co-authors, in 2004 (see the EuroCrypt 2005 paper "How to break MD5 and other hash functions").
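The integrity check described in the quote boils down to recomputing a digest and comparing it with a published one. A minimal sketch, assuming a SHA-256 checksum published as a hex string (the `IntegrityCheck` class is hypothetical; in practice the bytes would come from something like `File.ReadAllBytes` on the download):

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: verifies data against a published SHA-256 checksum.
public static class IntegrityCheck
{
    // Returns the SHA-256 digest of the data as an upper-case hex string.
    public static string Sha256Hex(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // True when the freshly computed digest equals the published one (hex, any case).
    public static bool Matches(byte[] data, string publishedHex)
    {
        return string.Equals(Sha256Hex(data), publishedHex, StringComparison.OrdinalIgnoreCase);
    }
}
```

The check is only as strong as the hash's collision resistance: if a second file with the same digest can be produced, `Matches` happily accepts it, which is exactly the MD5 problem.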

I am not sure there is anything sadder than a compromised hash function, so MD5 is out! How about SHA-1? Well, technically we have the same issue, and by "technically" I mean that an attack exists but comes at considerable cost in raw computing power:

As of 2012, the most efficient attack against SHA-1 is considered to be the one by Marc Stevens with an estimated cost of $2.77M to break a single hash value by renting CPU power from cloud servers. Stevens developed this attack in a project called HashClash, implementing a differential path attack.
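To put the bit sizes in perspective, here is a small sketch (the `DigestSizes` class is hypothetical) that reads each algorithm's digest length from `HashAlgorithm.HashSize` and halves it to get the exponent of a generic birthday attack, which needs on the order of 2^(n/2) hash evaluations for an n-bit digest.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: digest sizes and the generic birthday-attack exponent.
public static class DigestSizes
{
    // Digest length in bits, as reported by HashAlgorithm.HashSize.
    public static int Bits(HashAlgorithm algorithm)
    {
        using (algorithm)
        {
            return algorithm.HashSize;
        }
    }

    // A generic collision search on an n-bit digest costs roughly 2^(n/2) hashes.
    public static int BirthdayExponent(int bits) => bits / 2;

    public static void Print()
    {
        Report("MD5", MD5.Create());
        Report("SHA-1", SHA1.Create());
        Report("SHA-256", SHA256.Create());
        Report("SHA-512", SHA512.Create());
    }

    static void Report(string name, HashAlgorithm algorithm)
    {
        int bits = Bits(algorithm);
        Console.WriteLine($"{name}: {bits}-bit digest, generic collision search ~2^{BirthdayExponent(bits)} hashes");
    }
}
```

The generic bound is 2^64 for MD5 and 2^80 for SHA-1; the attacks quoted above beat those bounds, which is what makes them breaks. Note that .NET does not ship a SHA-224 class, so in C# SHA-256 is the practical starting point.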

So when selecting a hashing algorithm, most security experts will expect a digest (and internal state) larger than SHA-1's 160 bits. A generic birthday attack on an n-bit hash needs roughly 2^(n/2) work, so moving from 160 to 224 or 256 bits raises that cost from about 2^80 to 2^112 or 2^128 operations, which keeps finding a collision unrealistic even with a massive amount of computing power. To meet that minimum, SHA-224/SHA-256 becomes the baseline; here is a code sample:

using System;
using System.Security.Cryptography;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        //Even though I am using a secure hashing algorithm
        //I still believe you should be using a salt!
        byte[] salt = CreateRandomSalt();

        byte[] somebytes = Encoding.UTF8.GetBytes("THIS IS SOME TEXT TO HASH");
        string hash1 = CreateHash(somebytes, salt);

        byte[] morebytes = Encoding.UTF8.GetBytes("THIS IS SOME TEXT TO HASh");
        string hash2 = CreateHash(morebytes, salt);

        //The inputs differ by one character, so a match would be a collision. Not likely o_O
        if (hash1 == hash2)
            Console.WriteLine("We have a match");
        else
            Console.WriteLine("Does not match");

        Console.ReadLine();
    }

    private static string CreateHash(byte[] textbytes, byte[] saltbytes)
    {
        byte[] textandsalt = new byte[textbytes.Length + saltbytes.Length];

        //copy the text bytes into the start of the buffer
        Buffer.BlockCopy(textbytes, 0, textandsalt, 0, textbytes.Length);

        //append the salt bytes after the text
        Buffer.BlockCopy(saltbytes, 0, textandsalt, textbytes.Length, saltbytes.Length);

        //SHA256.Create() replaces the obsolete SHA256Managed; dispose it when done
        using (SHA256 hash = SHA256.Create())
        {
            byte[] hashedbytes = hash.ComputeHash(textandsalt);
            return Convert.ToBase64String(hashedbytes);
        }
    }

    private static byte[] CreateRandomSalt()
    {
        //Define a fixed-size salt array (16 bytes = 128 random bits)
        byte[] saltbytes = new byte[16];

        //Fill it from a cryptographically secure generator; RandomNumberGenerator.Create()
        //replaces the obsolete RNGCryptoServiceProvider, and GetBytes keeps zero bytes
        //(GetNonZeroBytes would needlessly reduce the randomness)
        using (RandomNumberGenerator rngprovider = RandomNumberGenerator.Create())
        {
            rngprovider.GetBytes(saltbytes);
        }

        return saltbytes;
    }
}
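The sample generates a salt but never keeps it, so the hash could not be checked again later. A minimal sketch of that missing half, assuming the salt is stored next to the digest as a `salt:hash` pair of Base64 strings (the `SaltedHash` class, the 16-byte salt and that record format are all hypothetical, not from the original sample):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: stores the salt alongside the SHA-256 digest so it can be verified later.
public static class SaltedHash
{
    // 16 random bytes from a cryptographically secure generator (size is an assumption).
    public static byte[] NewSalt()
    {
        byte[] salt = new byte[16];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(salt);
        }
        return salt;
    }

    // SHA-256 over the text bytes followed by the salt bytes.
    public static byte[] Compute(string text, byte[] salt)
    {
        byte[] textBytes = Encoding.UTF8.GetBytes(text);
        byte[] input = new byte[textBytes.Length + salt.Length];
        Buffer.BlockCopy(textBytes, 0, input, 0, textBytes.Length);
        Buffer.BlockCopy(salt, 0, input, textBytes.Length, salt.Length);
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(input);
        }
    }

    // Produces "salt:hash"; Base64 never contains ':' so the separator is unambiguous.
    public static string Store(string text)
    {
        byte[] salt = NewSalt();
        return Convert.ToBase64String(salt) + ":" + Convert.ToBase64String(Compute(text, salt));
    }

    // Recomputes with the stored salt and compares the digests in constant time.
    public static bool Verify(string text, string stored)
    {
        string[] parts = stored.Split(':');
        byte[] salt = Convert.FromBase64String(parts[0]);
        byte[] expected = Convert.FromBase64String(parts[1]);
        return CryptographicOperations.FixedTimeEquals(Compute(text, salt), expected);
    }
}
```

`CryptographicOperations.FixedTimeEquals` (available from .NET Core 2.1) is used so the comparison time does not reveal how many leading bytes matched. Because every call to `Store` draws a fresh salt, storing the same text twice yields two different records, yet both still verify.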