Savoy, Jacques; Zubaryeva, Olena - In: Computational Management Science 9 (2012) 3, pp. 401-415
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight terms (character n-grams, words, stems, lemmas, or sequences of them) which...
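The abstract is truncated and does not give the exact formula. The following is a minimal sketch that assumes the textbook binomial standardization. A term's probability p is estimated from its relative frequency in the entire corpus. Its count in a subset of n tokens is then compared with the binomial mean n·p and scaled by the standard deviation sqrt(n·p·(1-p)). The helper names and the threshold of 2.0 used to select the "specific vocabulary" are illustrative assumptions, not values taken from the paper.

```python
import math
from collections import Counter


def binomial_z_score(count_in_subset: int, subset_size: int,
                     count_in_corpus: int, corpus_size: int) -> float:
    """Standardized Z score of one term in a subset vs. the whole corpus.

    Assumption (not from the paper): under the null hypothesis, each of the
    subset_size tokens equals the term with probability p, estimated as the
    term's relative frequency in the entire corpus.
    """
    if subset_size <= 0 or corpus_size <= 0:
        raise ValueError("sizes must be positive")
    p = count_in_corpus / corpus_size
    if not 0.0 < p < 1.0:
        raise ValueError("term probability must be strictly between 0 and 1")
    expected = subset_size * p                       # binomial mean n*p
    std_dev = math.sqrt(subset_size * p * (1 - p))   # binomial std dev
    return (count_in_subset - expected) / std_dev


def specific_vocabulary(subset_counts: Counter, corpus_counts: Counter,
                        threshold: float = 2.0) -> list[tuple[str, float]]:
    """Hypothetical helper: terms over-used in the subset, sorted by Z score.

    The threshold of 2.0 is an illustrative choice, not the paper's value.
    """
    subset_size = sum(subset_counts.values())
    corpus_size = sum(corpus_counts.values())
    scored = []
    for term, count in subset_counts.items():
        z = binomial_z_score(count, subset_size,
                             corpus_counts[term], corpus_size)
        if z >= threshold:
            scored.append((term, z))
    # Highest Z first: the most characteristic terms of the subset.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data: a 1,000-token corpus and a 100-token subset of it.
    corpus = Counter({"market": 10, "the": 500, "risk": 490})
    subset = Counter({"market": 5, "the": 50, "risk": 45})
    for term, z in specific_vocabulary(subset, corpus):
        print(f"{term}\t{z:.2f}")
```

In this toy example, "market" is expected about once in the subset (p = 0.01, n = 100) but occurs five times. Its Z score is therefore roughly 4.02, so it is the only term above the threshold. The same scoring applies unchanged whether the counted units are words, stems, lemmas, or character n-grams.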