Let’s say you have to download a file from the Internet. This file is highly sensitive and it’s important that you receive exactly the file that the sender is trying to send you. What if one of the following occurs:
- A hacker breaks into the site and replaces the original download with their own malicious download?
- There is an error in the file transfer and it’s accidentally modified somehow?
How do you know that the file you downloaded is exactly the same as the one promised by the person sending you the file? The answer lies in a process known as “hashing”. In this article, we’ll take a look at what hashing is, how to hash a file in Linux, and a live example of how it works.
What is Hashing?
“Hashing” a file refers to generating a fixed-length alphanumeric string from its contents that, ideally, isn’t shared by any other file. This string is called the “hash”, or the “digest”. That’s the ideal, at least: different hashing algorithms offer varying levels of security. In principle, we want to use a process that has the following features:
- It’s infeasible to find two files that return the same hash value;
- The computation process is fast – we can do it with everyday hardware;
- Small changes in the file create a big difference in the hash – so that it’s obvious even at a cursory glance that the two hashes are not the same;
- We can’t generate the original file by looking at the hash value.
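The third property is easy to see for yourself. As a minimal sketch, assuming the GNU coreutils md5sum program is installed (the two input strings are made up for illustration), we can hash two inputs that differ by a single character:

```shell
# Hash two inputs that differ only in their last character.
# printf is used instead of echo so that no trailing newline is hashed.
# md5sum prints "<hash>  <name>", so cut keeps only the hash field.
hash_a=$(printf 'hello' | md5sum | cut -d ' ' -f 1)
hash_b=$(printf 'hellp' | md5sum | cut -d ' ' -f 1)

echo "hello -> $hash_a"
echo "hellp -> $hash_b"
```

Both digests are 32 hexadecimal characters long regardless of the input size, and the two should look completely unrelated even though the inputs are nearly identical.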
This “digital fingerprint” of a file is used to verify its identity. The provider will tell you, for example, that the hash of the file is “xxxxxxxx”. Then, when you receive the file, you compute the hash on your own. If you also get “xxxxxxxx”, you know that the file hasn’t been tampered with or corrupted. Let’s take a look at an example.
An Example Using the MD5 Hashing Algorithm
MD5 is a popular algorithm for generating hashes from files. Even though practical collision attacks against it have been known since the mid-2000s, it’s still very commonly used. At the end of this tutorial, we’ll also look at the “SHA256” hash, which is much more secure and is recommended for any real security-conscious application.
Let’s say we want to download the latest server version of Ubuntu from the following website: http://releases.ubuntu.com/xenial/. Alongside the disk images listed there, Canonical also provides a file called “MD5SUMS”.
Opening this file shows a list of hashes with the corresponding file names next to them. So, for example, if I’m interested in the server AMD64 file, the MD5 hash for that is “10fcd20619dce11fe094e960c85ba4a9”.
I choose the file I want to download from the original page and use the familiar wget program to download it onto my Linux server.
Now I use the built-in program called “md5sum”, passing it the name of the file I just downloaded, to generate the hash for that file.
The output exactly matches “10fcd20619dce11fe094e960c85ba4a9”, which is the value listed in the MD5SUMS file provided by Canonical. This means that the two files are the same: there has been no tampering, and nothing has been corrupted in transit.
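The compare step can also be left to md5sum itself. What follows is only a rough sketch under some assumptions: it uses a tiny stand-in file called sample.iso (a placeholder name, not the real Ubuntu image) and builds its own MD5SUMS file locally, whereas in practice that file comes from the publisher. It shows the manual comparison first, then the -c option, which re-hashes every file listed in a checksum file, and finally what happens when the file changes by one byte:

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the downloaded ISO; sample.iso is a placeholder name.
printf 'pretend this is the ISO image\n' > sample.iso

# Stand-in for the publisher's MD5SUMS file ("<hash>  <file name>" per line).
# In real use you download this file; you do not generate it yourself.
md5sum sample.iso > MD5SUMS

# Manual check: compute the hash and compare it with the published one.
expected=$(cut -d ' ' -f 1 MD5SUMS)
actual=$(md5sum sample.iso | cut -d ' ' -f 1)
if [ "$actual" = "$expected" ]; then
    result="match"
else
    result="MISMATCH"
fi
echo "manual check: $result"

# Automatic check: -c verifies each file named in MD5SUMS.
md5sum -c MD5SUMS
check_status=$?

# Simulate corruption: a single extra byte makes the check fail.
printf 'x' >> sample.iso
if md5sum -c MD5SUMS; then
    tampered="undetected"
else
    tampered="detected"
fi
echo "after modification: $tampered"
```

A published checksum file usually lists several images. With a reasonably recent GNU coreutils, adding --ignore-missing to md5sum -c skips the entries for files you have not downloaded.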
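But wait, didn’t I just say that MD5 was insecure? I did! And if you really want to be sure, you should use SHA256 instead, for example with the “sha256sum” program, which is used in the same way as md5sum.
Canonical also provides the SHA256 sums in the same folder, in a file called “SHA256SUMS”.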
So you can compare these as well. Remember, one of the features of a good hashing algorithm is that even a slight variation in the file will produce a drastically different end result. So if the two hashes look the same even at a glance, the files are most likely the same. I say “most likely” because there is a theoretical chance that two different files will have the same hash. But in reality this chance is small enough to be negligible.
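Comparing at a glance works, but letting the shell compare the full strings is safer. Here is a minimal sketch, assuming sha256sum from GNU coreutils. The file is a small stand-in containing the text "hello", and the published value is simply the SHA256 digest of that text, playing the role of the line you would copy from the publisher's checksum file:

```shell
# Stand-in for the downloaded file; in practice this would be the ISO.
file=$(mktemp)
printf 'hello' > "$file"

# Stand-in for the value copied from the publisher's checksum file.
published="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

# sha256sum prints "<hash>  <file name>"; keep only the hash field.
computed=$(sha256sum "$file" | cut -d ' ' -f 1)

# Compare the complete strings instead of eyeballing them.
if [ "$computed" = "$published" ]; then
    verdict="OK"
else
    verdict="FAILED"
fi
echo "$file: $verdict"
```

A SHA256 digest is 64 hexadecimal characters, twice the length of an MD5 digest, so an exact string comparison is less error-prone than reading it by eye.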
So that’s how we go about hashing a file in Linux. Hashing underpins much of modern cryptography in one form or another, and it’s amazing how easily we can verify the integrity of a file with just one command!