Hashing • 0xnhl

Hashing takes arbitrary input and produces a fixed-length string (hash value) that has the following attributes:

The same input will always produce the same output.
Multiple disparate inputs should not produce the same output.
It should not be possible to go from the output to the input.
Any modification of a given input should result in a drastic change to the hash.
Hashing serves the purpose of ensuring integrity, i.e. making it so that if something is changed you can know that it’s changed.
Examples: SHA-1, SHA-256, SHA-3, MD5, etc.
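The first and last attributes above can be checked with a few lines of Python. This is a minimal sketch using the standard `hashlib` module; the helper names `sha256_hex` and `bit_difference` and the sample inputs are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex(b"hello world")
second = sha256_hex(b"hello world")    # same input again
changed = sha256_hex(b"hello world!")  # one extra character

print(first == second)                 # same input, same output
print(len(first))                      # fixed length: 64 hex characters
print(bit_difference(first, changed))  # typically around half of the 256 bits
```

A one-character change flips roughly half of the output bits, which is what "drastic change" means in practice.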

Hashing is used in conjunction with authentication to produce strong evidence that a given message has not been modified.

This is accomplished by taking a given input, hashing it, and then signing the hash with the sender’s private key.
When the recipient opens the message, they validate the signature over the hash with the sender’s public key, hash the message themselves, and compare their result to the hash that was signed by the sender. If the two match, the message is unmodified and was sent by the holder of the private key.

Passwords are securely stored by hashing them using a secure hashing function, so that in case of a database breach, plain-text passwords are not exposed.

Authentication mechanisms only need to confirm that the user knows the password so they can be granted access to the resource, so they just need to store hashes to match with the user input.
There’s just one problem with this. What if two users have the same password? As a hash function will always turn the same input into the same output, you will store the same password hash for each user. That means if someone cracks that hash, they gain access to more than one account. It also means someone can create a Rainbow Table to break the hashes.
A rainbow table is a precomputed table that maps hashes back to plaintexts, so you can quickly find out what password a user had just from the hash. A rainbow table trades the time to crack a hash for hard disk space, but the table itself takes time to create.
Websites like CrackStation and Hashes.com rely on massive precomputed tables to provide fast password cracking for hashes without salts.
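To make the idea concrete, here is a simplified sketch of a precomputed lookup attack on unsalted hashes. It uses a plain dictionary rather than a true rainbow table (which stores compressed hash chains), and the word list, user names, and function names are all invented for the example.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Unsalted SHA-256 of a password, as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Precompute hash -> plaintext once for a (tiny, hypothetical) word list.
wordlist = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(word): word for word in wordlist}


def crack(stolen_hash: str):
    """Return the plaintext if the hash is in the table, else None."""
    return lookup.get(stolen_hash)


# A leaked table of unsalted hashes: alice and bob share a password.
stolen = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("qwerty"),
    "carol": sha256_hex("correct horse battery staple"),
}

for user, stolen_hash in stolen.items():
    print(user, crack(stolen_hash))
```

Because alice and bob have identical hashes, one lookup exposes both accounts. Carol's password is simply not in the table.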

To protect against rainbow tables, we add a salt to the passwords.

The salt is a randomly generated value stored in the database and should be unique to each user. In theory, you could use the same salt for all users, but duplicate passwords would still have the same hash and a rainbow table could still be created for passwords with that salt.
The salt is added to either the start or the end of the password before it’s hashed, and this means that every user will have a different password hash even if they have the same password.
Password-hashing functions like bcrypt and scrypt take a salt as part of their design, and many libraries generate it for you. Salts don’t need to be kept private.
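A salted scheme might look like the sketch below. Since bcrypt is not in the Python standard library, this example swaps in PBKDF2-HMAC-SHA256 from `hashlib`. The iteration count, salt length, and function names are assumptions chosen for illustration, not recommendations from the original text.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Generate a fresh random salt and return (salt, derived_hash)."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


# Two users pick the same password but get different salts and hashes.
salt_a, hash_a = hash_password("qwerty")
salt_b, hash_b = hash_password("qwerty")
print(hash_a != hash_b)
print(verify_password("qwerty", salt_a, hash_a))
print(verify_password("wrong-guess", salt_a, hash_a))
```

The salt is stored next to the hash in the database. It is not secret, but it forces an attacker to build a separate table per user.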

Hash Functions

Hash functions are algorithms that produce a code that can’t be decrypted: hashing is one-way, and no key exists that reverses it.

Hash functions have been around since the early days of computing. They were originally created as a way to quickly search for data. Since the beginning, these algorithms have been designed to represent data of any size as small, fixed-size values, or digests. Using a hash table, which is a data structure that’s used to store and reference hash values, these small values became a more secure and efficient way for computers to reference data.

MD5

One of the earliest hash functions is Message Digest 5, more commonly known as MD5. Professor Ronald Rivest of the Massachusetts Institute of Technology (MIT) developed MD5 in the early 1990s as a way to verify that a file sent over a network matched its source file.
Message Digest 5 (MD5) is a cryptographic hash function that takes any input and produces a 128-bit value, usually written as a 32-character hexadecimal string. The output of an MD5 hash function is called a digest. MD5 digests are often used to verify the integrity of files or data; however, MD5 is no longer considered secure and should not be used for sensitive applications.
Whether it’s used to convert a single email or the source code of an application, MD5 works by converting data into a 128-bit value. You might recall that a bit is the smallest unit of data measurement on a computer. Bits can either be a 0 or 1. In a computer, bits represent user input in a way that computers can interpret. Written in hexadecimal, where each character encodes 4 bits, this 128-bit value appears as a string of 32 characters. Altering anything in the source file generates an entirely new hash value.
Generally, the longer the hash value, the more secure it is. It wasn’t long after MD5’s creation that security practitioners discovered 128-bit digests resulted in a major vulnerability.
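The relationship between 128 bits and 32 characters can be checked directly. This is a short sketch with Python's `hashlib`, using an arbitrary sample input; note that some hardened (FIPS-mode) Python builds may refuse to run MD5 at all.

```python
import hashlib

raw_digest = hashlib.md5(b"hello").digest()     # raw bytes
hex_digest = hashlib.md5(b"hello").hexdigest()  # same value, hex-encoded

print(len(raw_digest) * 8)  # 16 bytes * 8 = 128 bits
print(len(hex_digest))      # 128 bits / 4 bits per hex character = 32
print(hex_digest)
```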

Hash collisions

One of the flaws in MD5 happens to be a characteristic of all hash functions. Hash algorithms map any input, regardless of its length, into a fixed-size value of letters and numbers. What’s the problem with that? Although there are an infinite number of possible inputs, there’s only a finite set of available outputs!
MD5 values are limited to 128 bits, or 32 hexadecimal characters. Due to the limited output size, and to weaknesses later found in its design, the algorithm is vulnerable to hash collisions: instances when different inputs produce the same hash value. Because hashes are used for authentication, a hash collision is similar to copying someone’s identity. Attackers can carry out collision attacks to fraudulently impersonate authentic data.
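The effect of output size is easy to see with a deliberately tiny hash. The sketch below is not an attack on MD5. It truncates SHA-256 to 16 bits (a made-up `tiny_hash` for illustration) and searches for two different messages with the same value.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, hex_chars: int = 4) -> str:
    """A toy hash: the first hex_chars hex characters of SHA-256 (16 bits by default)."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 4):
    """Hash distinct messages until two of them share the same toy hash."""
    seen = {}  # toy hash -> first message that produced it
    for i in count():
        message = f"message-{i}".encode("utf-8")
        value = tiny_hash(message, hex_chars)
        if value in seen:
            return seen[value], message, value
        seen[value] = message


msg_a, msg_b, shared = find_collision()
print(msg_a, msg_b, shared)
```

With only 65,536 possible outputs a collision is guaranteed within 65,537 messages, and because of the birthday effect it usually appears after a few hundred. Each extra output bit doubles the space, which is why longer digests make this brute-force search impractical.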

Next-generation hashing

To avoid the risk of hash collisions, functions that generated longer values were needed. MD5’s shortcomings gave way to a new group of functions known as the Secure Hashing Algorithms, or SHAs.
The National Institute of Standards and Technology (NIST) approves each of these algorithms. Except for SHA-1, the number beside each SHA function indicates the size of its hash value in bits. SHA-1 produces a 160-bit digest and is no longer considered collision-resistant; the other functions listed here are. However, that doesn’t make them invulnerable to other exploits.
Five functions make up the SHA family of algorithms:

SHA-1
SHA-224
SHA-256
SHA-384
SHA-512
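The digest sizes of the five functions listed above can be read straight from Python's `hashlib`. This is a small sketch; the variable name `sizes_in_bits` is arbitrary.

```python
import hashlib

names = ("sha1", "sha224", "sha256", "sha384", "sha512")

# digest_size is reported in bytes, so multiply by 8 to get bits.
sizes_in_bits = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in sizes_in_bits.items():
    print(f"{name}: {bits} bits")
```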

SHA-256

Secure Hash Algorithm 256 bits (SHA-256) is a cryptographic hash function that takes any input and produces a 256-bit digest, usually written as a 64-character hexadecimal string. SHA-256 is often used to verify the integrity of files or data and, together with a signing algorithm, to create digital signatures. SHA-256 is considered very secure and is widely used in applications such as Bitcoin and blockchain technology.
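A typical integrity check compares a file's SHA-256 digest against a published value. The sketch below reads the file in chunks so large files need not fit in memory. The function names, chunk size, and the temporary demo file are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA-256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.lower()


# Demo: a throwaway file containing b"abc", whose SHA-256 is a standard test vector.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

published = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(file_matches(demo_path, published))
os.remove(demo_path)
```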

HMACs

HMAC (Keyed-Hash Message Authentication Code) is a type of message authentication code (MAC) that uses a cryptographic hash function in combination with a secret key to verify the authenticity and integrity of data.

An HMAC can be used to ensure that the person who created the HMAC is who they say they are, i.e., authenticity is confirmed; moreover, it proves that the message hasn’t been modified or corrupted, i.e., integrity is maintained. This is achieved through the use of a secret key to prove authenticity and a hashing algorithm to produce a hash and prove integrity.

The following steps give you a fair idea of how HMAC works.

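1. The secret key is padded with zero bytes to the block size of the hash function. A key longer than the block size is hashed first, then padded.
2. The padded key is XORed with a constant, the inner padding ipad (the byte 0x36 repeated to fill one block).
3. The XORed key is prepended to the message, and the result is hashed using the hash function.
4. The result from Step 3 is then hashed again with the same hash function, this time prefixed by the padded key XORed with another constant, the outer padding opad (the byte 0x5c repeated to fill one block).
5. The final output is the HMAC value, a fixed-size string as long as the hash function's digest.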

HMAC(K,M) = H((K⊕opad)||H((K⊕ipad)||M))
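The formula can be written out by hand and compared against Python's built-in `hmac` module. This is a sketch for understanding, assuming SHA-256 as the hash function H. The function name `hmac_sha256` and the sample key and message are invented, and real code should call `hmac.new` directly.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute H((K xor opad) || H((K xor ipad) || M)) with H = SHA-256."""
    block_size = hashlib.sha256().block_size  # 64 bytes for SHA-256

    # Keys longer than one block are hashed first, then zero-padded.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")

    inner_key = bytes(b ^ 0x36 for b in key)  # K xor ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K xor opad

    inner_hash = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner_hash).digest()


key = b"shared-secret"
message = b"transfer 100 to alice"

manual_tag = hmac_sha256(key, message)
library_tag = hmac.new(key, message, hashlib.sha256).digest()

# Compare tags in constant time, as a receiver would when verifying.
print(hmac.compare_digest(manual_tag, library_tag))
```

A receiver who shares the key recomputes the tag over the received message and accepts it only if the tags match, which covers both integrity and authenticity.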