A Large-Scale Empirical Analysis of Chinese Web Passwords
Author(s): Zhigong Li, Weili Han, Wenyuan Xu

Date: August 2014
Publication: 23rd USENIX Security Symposium, SEC '14
Publisher: USENIX
Source 1: https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-li-zhigong.pdf

Users speaking different languages may prefer different patterns when creating their passwords, so knowledge of English passwords does not transfer well to guessing passwords from other languages. Prior research has suggested that Chinese passwords are among the most difficult to guess. We believe that this conclusion is biased because, to the best of our knowledge, few empirical studies have examined regional differences in passwords on a large scale, especially for Chinese passwords. In this paper, we study the differences between passwords from Chinese-speaking and English-speaking users, leveraging over 100 million leaked and publicly available passwords from Chinese and international websites in recent years. First, we found that Chinese users prefer digits when composing their passwords, while English users prefer letters, especially lowercase letters; however, the strength of both groups' passwords against guessing is similar. Second, we observe that both groups prefer patterns they are familiar with, e.g., Chinese Pinyin for Chinese users and English words for English users. Third, we observe that both Chinese and English users prefer their conventional date format when they use dates to construct passwords. Based on these observations, we improve a PCFG (Probabilistic Context-Free Grammar) based password guessing method by inserting Pinyin (about 2.3% more entries) into the attack dictionary and inserting our observed composition rules into the guessing rule set. As a result, our experiments show that the efficiency of password guessing increases by 34%.
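
As a rough illustration of the composition analysis the abstract describes, the sketch below splits a password into runs of letters (L), digits (D), and other symbols (S), and estimates how often each resulting structure occurs. This is the usual first step of PCFG-style password guessing in general, not the authors' implementation. The function names and the sample passwords are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map a character to L (ASCII letter), D (ASCII digit), or S (anything else)."""
    if ch.isascii() and ch.isalpha():
        return "L"
    if ch.isascii() and ch.isdigit():
        return "D"
    return "S"


def base_structure(password):
    """Return a structure string such as 'L8D3' for 'password123'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )


def structure_probabilities(passwords):
    """Estimate the relative frequency of each structure in a password list."""
    counts = Counter(base_structure(p) for p in passwords)
    total = sum(counts.values())
    return {structure: n / total for structure, n in counts.items()}


# Illustrative input only; these are made-up samples, not the leaked datasets.
samples = ["password123", "woaini1314", "12345678", "abc!12"]
probabilities = structure_probabilities(samples)
```

In a full guesser, each L, D, or S slot would then be filled from a ranked dictionary; adding Pinyin entries to the letter dictionary is the kind of extension the abstract reports.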

PasswordResearch.com Note: Video and audio recordings of the paper presentation are available here: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/li_zhigong

