To authenticate to a website, you need to provide a username and password. The site then checks the authentication details that you have provided by comparing them with the details it has stored in its database. If the details match, access is granted. If the details don’t match, access is denied.
Unfortunately, data breaches are relatively common. They can be a big problem because user data, specifically the list of usernames and passwords, is among the data most commonly targeted. If the passwords are stored as is, in plaintext, then anyone with access to the database can log in to the account of any user. It’s as if they were handed a keyring with the key to every door in an apartment building.
While a lot of effort goes into preventing data breaches in the first place, a strategy of defence in depth is recommended. Specifically, security advice holds that passwords should be hashed, with only the hash of the password ever stored. A hash function is a one-way function that always converts the same input into the same output. Even a minor change in input, however, produces an entirely different output. Critically, there is no way to reverse the function and turn the outputted hash back into the original input. What you can do, however, is hash a new input and see if the output matches the stored hash in the database. If it does, you know the password matched, without the site ever having to store the actual password.
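As a rough illustration of the hash-and-compare check described above, here is a minimal Python sketch. The function names are made up for this example, and SHA-256 is only an assumed stand-in for whatever hash function a given site uses; this is a demonstration of the idea rather than a recommended way to store passwords.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Return the hex SHA-256 digest of a password (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def verify_password(candidate: str, stored_hash: str) -> bool:
    """Hash the candidate and compare the result with the stored hash."""
    return hmac.compare_digest(hash_password(candidate), stored_hash)


# The "database" only ever holds the hash, never the password itself.
stored = hash_password("correct horse")

print(verify_password("correct horse", stored))  # True: same input, same hash
print(verify_password("correct horsf", stored))  # False: one character changed
print(stored == hash_password("correct horsf"))  # False: the hashes differ
```

The same input always produces the same hash, so a match tells the site the right password was entered, while a single changed character produces a hash that no longer matches.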
Helpfully, this also means that if an attacker does breach the database, they don’t get a list of immediately useful passwords; they get hashes instead. To be able to use these hashes, the attacker first needs to crack them.
Cracking password hashes with smarts
Cracking a password hash is the process of working out the original password that the hash represents. Because there’s no way to reverse the hash function and turn the hash back into the password, the only way to crack a hash is to guess the password. One method is to use a brute force attack. This literally involves trying every possible password. That means starting from “a” and trying every letter, in both cases, and every number and symbol. Then the attacker needs to try all two-character combinations, three-character combinations, and so on. The number of possible combinations grows exponentially with length: each extra character multiplies it by the size of the character set. This makes it difficult to efficiently guess long passwords, even when fast hashing algorithms are used with powerful GPU cracking rigs.
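To make the growth concrete, the sketch below counts the possible passwords of a given length and runs a tiny brute force search. It assumes an alphabet of the 94 printable ASCII letters, digits, and symbols, and uses SHA-256 as a stand-in hash; the function names are hypothetical, and real cracking tools are far more optimised than this.

```python
import hashlib
import itertools
import string

# Assumed alphabet: 52 letters + 10 digits + 32 symbols = 94 characters.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def keyspace(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


def brute_force(target_hash: str, max_length: int, alphabet: str = ALPHABET):
    """Try every combination up to max_length; return the match or None."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            guess = "".join(combo)
            if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hash:
                return guess
    return None


# Each extra character multiplies the number of candidates by 94.
for n in (1, 2, 3, 8):
    print(n, keyspace(n))

# A three-letter, lowercase-only password falls almost instantly.
target = hashlib.sha256(b"cab").hexdigest()
print(brute_force(target, max_length=3, alphabet=string.ascii_lowercase))  # cab
```

With this alphabet there are 94 one-character passwords, 8,836 two-character passwords, and roughly 6 quadrillion eight-character passwords, which is why length matters so much.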
Some effort can be saved by looking at the password requirements of the site and not trying passwords that would be too short to be allowed or that don’t feature a number, for example. This saves some time and still counts as a brute force attack, as it tries all allowed passwords. Brute force attacks, while slow, will eventually crack any password if left long enough with a lot of processing power, as all possible combinations will be tried.
The problem with brute force attacks is that they’re not very smart. A dictionary attack is a variant that is much more targeted. Instead of just trying any possible password, it tries a list of specified passwords. The success of this type of attack depends on the quality of that list, which is the dictionary in question.
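In code, a dictionary attack is just a loop over the wordlist instead of over every character combination. The sketch below is a simplified illustration: the tiny wordlist is invented for the example, the function names are hypothetical, and SHA-256 is an assumed stand-in for the hash used by the breached site.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a string (stand-in hash for this example)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, wordlist):
    """Hash each candidate in the wordlist; return the match or None."""
    for candidate in wordlist:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


# A made-up, four-entry dictionary; real ones are vastly larger.
wordlist = ["123456", "password", "qwerty", "letmein"]

print(dictionary_attack(sha256_hex("letmein"), wordlist))     # letmein
print(dictionary_attack(sha256_hex("x7#Lq!09vTp"), wordlist))  # None
```

The second call shows the limitation: a password that isn’t in the dictionary is never guessed, no matter how long the attack runs.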
Making educated guesses
Password dictionaries typically are built from previously cracked passwords from other data breaches. These dictionaries can contain thousands or millions of entries. This builds on the concept that people are bad at creating unique passwords. Evidence from data breaches does show this to be the case too, unfortunately. People still use variations on the word “password”. Other common topics are sports teams, names of pets, place names, company names, hating your job, and passwords based on the date. This last one specifically tends to happen when people are forced to regularly change their passwords.
Using a password dictionary massively reduces the number of guesses that need to be made in comparison to a brute force attack. Password dictionaries also tend to contain both short and longer passwords, meaning that some passwords might be tried that would not be reached even with years of brute force guessing. The approach proves successful as well. Stats vary based on the data breach and the size and quality of the dictionary used, but success rates can exceed 70%.
Success rates can be raised even further with word mangling algorithms. These algorithms take each word in the password dictionary and then modify it a bit. These modifications tend to be standard character replacements and added trailing numbers or symbols. For example, it’s common for people to replace the letter “e” with a “3” or an “s” with a “$”, or to add an exclamation mark on the end. Word mangling algorithms create multiple variants of each entry in the password dictionary, each with a different combination of these modifications. This significantly increases the number of passwords to guess and also increases the success rate, in some cases above 90%.
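A word mangling step might look something like the sketch below. The substitution table uses only the two replacements mentioned above, and the suffix list (nothing, “!”, or “1”) is an assumption chosen for illustration; real mangling rule sets are much larger, and the function name is hypothetical.

```python
import itertools

# Replacements taken from the examples above; the suffixes are assumed.
SUBSTITUTIONS = {"e": "3", "s": "$"}
SUFFIXES = ["", "!", "1"]


def mangle(word: str) -> set:
    """Return every variant of word under the substitutions and suffixes."""
    # For each character, list the forms it may take: itself, plus its
    # replacement if it has one.
    options = [
        (ch, SUBSTITUTIONS[ch]) if ch in SUBSTITUTIONS else (ch,)
        for ch in word
    ]
    variants = set()
    for combo in itertools.product(*options):
        base = "".join(combo)
        for suffix in SUFFIXES:
            variants.add(base + suffix)
    return variants


variants = mangle("secret")
print(len(variants))          # 24: 8 substitution patterns x 3 suffixes
print("$3cr3t!" in variants)  # True
```

One dictionary entry becomes 24 guesses here, which shows both why mangling catches more passwords and why it increases the number of guesses to be made.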
Conclusion
A dictionary attack is a targeted variation of a brute force attack. Rather than attempting all possible character combinations, a subset of character combinations is tested. This subset is a list of passwords that have previously been found and, if necessary, cracked in past data breaches. This massively reduces the number of guesses to be made while covering passwords that have been used before and, in some cases, seen often. In theory, a dictionary attack doesn’t have as high a success rate as a brute force attack, but that comparison assumes unlimited time and processing power. In practice, a dictionary attack tends to reach a decently high success rate much faster than a brute force attack can. This is because it doesn’t waste time on extremely unlikely combinations of characters.
One of the main things you should do when coming up with a password is to make sure that it wouldn’t appear on a wordlist. One way to do that is to make a complex password; another is to make a long password. Generally, the best option is to make a long password made up of a few words. It’s just important that those words don’t make an actual phrase, as that might be guessed. They should be completely unrelated. It’s recommended that you choose a password over 10 characters, with 8 as the absolute bare minimum.
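One way to get words that really are unrelated is to let a random number generator pick them. The sketch below is only an illustration: the eight-word list is invented and far too small for real use, and the function name, word count, and separator are assumptions. It uses Python’s secrets module, which is designed for security-sensitive random choices.

```python
import secrets

# Invented sample list; a real one should contain thousands of words.
WORDS = ["lantern", "gravel", "otter", "velvet",
         "copper", "thistle", "marble", "cactus"]


def make_passphrase(word_count: int = 4, separator: str = "-", words=WORDS) -> str:
    """Join randomly chosen words into a single passphrase."""
    return separator.join(secrets.choice(words) for _ in range(word_count))


# Output differs on every run because the words are picked at random.
print(make_passphrase())
```

Because the words are chosen at random rather than by a person, they are unlikely to form a guessable phrase, and four of them comfortably exceed the 10-character recommendation.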