Linguistic Cracking of Passphrases using Markov Chains

Abstract or Summary:
In order to remember long passwords, it is not uncommon users are recommended to create a sentence which then is assembled to form a long password, a passphrase. However, theoretically a language is very limited and predictable, why a linguistically correct passphrase according to Shannon's definition of information theory should be relatively easy to crack compared to brute-force.

This work focuses on cracking linguistically correct passphrases, partly to determine to what extent it is advisable to base a password policy on such phrases for protection of data, and partly because today, widely available effective methods to crack passwords based on phrases are missing.

Within this work, phrases were generated for further processing by available cracking applications, and the language of the phrases were modeled using a Markov process. In this process, phrases were built up by using the number of observed instances of subsequent characters or words in a source text, known as n-grams, to determine the possible/probable next character/word in the phrases.

The work shows that by creating models of language, linguistically correct passphrases can be broken in a practical way compared to an exhaustive search. In the tests, passphrases consisting of up to 20 characters were broken.crack compared to brute-force.

This work focuses on cracking linguistically correct passphrases, partly to determine to what extent it is advisable to base a password policy on such phrases for protection of data, and partly because today, widely available effective methods to crack passwords based on phrases are missing.

Within this work, phrases were generated for further processing by available cracking applications, and the language of the phrases were modeled using a Markov process. In this process, phrases were built up by using the number of observed instances of subsequent characters or words in a source text, known as n-grams, to determine the possible/probable next character/word in the phrases.

The work shows that by creating models of language, linguistically correct passphrases can be broken in a practical way compared to an exhaustive search. In the tests, passphrases consisting of up to 20 characters were broken.

PasswordResearch.com Note: Video of presentation: https://www.youtube.com/watch?v=6-vrumB0UUw Link to software called Phraser related to this project: https://github.com/sparell/phraser

<-- Back to Authentication Research Paper Index