I was looking through my photos yesterday and ran into a problem: it looks like I have hundreds of duplicate pictures.  I think this happened when I was trying to merge the files from three or four different picture folders into one single folder.  I knew that calculating an MD5 hash of each file would find the duplicates without resorting to a huge program that compares creation dates, file sizes, etc., but I wasn’t sure how to compute an MD5 hash in VB.NET.  After some work with MSDN I came up with the function below.

An MD5 hash is a 128-bit value computed from a file's contents.  Two files with identical contents always produce the same hash, and two different files are extremely unlikely to share a hash by accident.  The hash is not guaranteed to be unique, though: it has been shown that MD5 collisions can be created deliberately, so keep that in mind.  My duplicate file sorter makes you compare the files before you delete them.
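If you would rather have the program confirm a match than compare the files by eye, a byte-for-byte check is one option.  The following is only a rough sketch of that idea, not the code from my sorter; the name filesAreIdentical and the module around it are made up for illustration.

```vb
Imports System.IO

Module FileCompareSketch
    ' Hypothetical helper: returns True only when both files contain exactly the same bytes.
    Function filesAreIdentical(ByVal pathA As String, ByVal pathB As String) As Boolean
        ' files of different lengths can never be duplicates
        If New FileInfo(pathA).Length <> New FileInfo(pathB).Length Then Return False

        Using streamA As FileStream = File.OpenRead(pathA)
            Using streamB As FileStream = File.OpenRead(pathB)
                Dim a As Integer
                Dim b As Integer
                Do
                    ' ReadByte returns -1 once the end of the file is reached
                    a = streamA.ReadByte()
                    b = streamB.ReadByte()
                    If a <> b Then Return False
                Loop While a <> -1
            End Using
        End Using

        Return True
    End Function
End Module
```

Running this only on pairs whose hashes already match keeps the slow comparison to a handful of files.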

' needs Imports System.Security.Cryptography at the top of the file
Private Function getFileMd5(ByVal filePath As String) As String
    ' get all the file contents
    Dim fileBytes() As Byte = System.IO.File.ReadAllBytes(filePath)

    ' create a new md5 object; Using disposes it when the block ends
    Using md5Hasher As MD5 = MD5.Create()
        ' compute the hash
        Dim byteHash() As Byte = md5Hasher.ComputeHash(fileBytes)

        ' return the value in base 64
        Return Convert.ToBase64String(byteHash)
    End Using
End Function

I returned the value in base64 because it was quick and because it results in a shorter string: 24 characters for the 16-byte hash, versus 32 in hexadecimal.
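To show how the hash string might be put to work, here is a minimal sketch that groups the files in one folder by their hash and keeps only the groups with more than one file.  It is an illustration under assumptions, not my actual sorter: findDuplicateGroups and the module name are hypothetical, it only looks at the top level of the folder, and it repeats a compact version of getFileMd5 so that it stands on its own.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Module DuplicateFinderSketch
    ' compact version of getFileMd5, repeated so the sketch is self-contained
    Function getFileMd5(ByVal filePath As String) As String
        Using md5Hasher As MD5 = MD5.Create()
            Return Convert.ToBase64String(md5Hasher.ComputeHash(File.ReadAllBytes(filePath)))
        End Using
    End Function

    ' Hypothetical helper: maps each hash to the files that produced it,
    ' keeping only the hashes shared by two or more files.
    Function findDuplicateGroups(ByVal folderPath As String) As Dictionary(Of String, List(Of String))
        Dim byHash As New Dictionary(Of String, List(Of String))

        ' hash every file in the folder and bucket the paths by hash
        For Each filePath As String In Directory.GetFiles(folderPath)
            Dim hash As String = getFileMd5(filePath)
            If Not byHash.ContainsKey(hash) Then
                byHash(hash) = New List(Of String)
            End If
            byHash(hash).Add(filePath)
        Next

        ' a bucket with a single file has no duplicate, so drop it
        Dim duplicates As New Dictionary(Of String, List(Of String))
        For Each pair As KeyValuePair(Of String, List(Of String)) In byHash
            If pair.Value.Count > 1 Then duplicates.Add(pair.Key, pair.Value)
        Next

        Return duplicates
    End Function
End Module
```

Each list in the result is a set of candidate duplicates; given the collision caveat, they are worth checking before anything gets deleted.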