Testing and Validating the Cosine Similarity Measure for Textual Analysis in Accounting
Textual similarity has drawn considerable attention in the recent literature in accounting and related fields. However, there has been limited work that systematically tests and validates its measures. In this paper, I conduct three incremental studies to comprehensively test and validate the widely used cosine similarity (COS) method. The results suggest that the 5-gram COS measure meets the requirements of reliability and validity and is therefore a viable alternative to the commonly used 1-gram measure for assessing textual similarity.
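To make the 1-gram versus 5-gram distinction concrete, the following is a minimal sketch of an n-gram COS measure. The abstract does not specify the preprocessing or term weighting. This sketch therefore assumes lowercased alphanumeric word tokens and raw n-gram counts, with no stop-word removal or tf-idf weighting. The function names `ngram_counts`, `cosine_similarity`, and `cos_ngram` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count word n-grams.

    Assumed preprocessing: lowercase the text and keep alphanumeric tokens only.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # A text shorter than n tokens yields an empty range, hence an empty Counter.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cos_ngram(doc1: str, doc2: str, n: int = 1) -> float:
    """COS between two documents using word n-grams of length n (hypothetical helper)."""
    return cosine_similarity(ngram_counts(doc1, n), ngram_counts(doc2, n))


if __name__ == "__main__":
    doc_a = "the company expects revenue to grow next year"
    doc_b = "next year the company expects revenue to grow"
    print(cos_ngram(doc_a, doc_b, n=1))  # same words, different order
    print(cos_ngram(doc_a, doc_b, n=5))  # order-sensitive phrases
```

In this toy example, the two sentences contain exactly the same words, so the 1-gram COS is approximately 1.0 despite the reordering. The 5-gram COS is 0.5, because only two of the four 5-grams in each sentence are shared. This illustrates why longer n-grams capture phrase-level overlap that a bag-of-words (1-gram) measure ignores.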