Patent US7386438 - Identifying language attributes through probabilistic analysis
http://www.google.com/patents/US7386438?utm_source=gb-gplus-share

A system and method for identifying language attributes through probabilistic analysis are described. A set of language classes and a plurality of training documents are defined. Each language class identifies a language and a character set encoding. Occurrences of one or more document properties within...
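
To make the idea of language classes and training documents concrete, here is a minimal sketch. The abstract is cut off before it says which document properties are counted or how the probabilities are combined, so everything below is an assumption for illustration rather than the patented method: the property is taken to be overlapping byte trigrams, the scoring is naive Bayes with add-one smoothing, and the names `byte_trigrams`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def byte_trigrams(data: bytes):
    """Yield overlapping 3-byte sequences (an assumed 'document property')."""
    return (data[i:i + 3] for i in range(len(data) - 2))


def train(training_docs):
    """Count trigram occurrences per language class.

    training_docs: iterable of ((language, encoding), text) pairs, so each
    class identifies both a language and a character set encoding.
    """
    counts = defaultdict(Counter)
    for (language, encoding), text in training_docs:
        counts[(language, encoding)].update(byte_trigrams(text.encode(encoding)))
    return counts


def classify(data: bytes, counts):
    """Return the class with the highest smoothed log-likelihood for the bytes."""
    vocab = len({gram for c in counts.values() for gram in c})
    best, best_score = None, -math.inf
    for cls, c in counts.items():
        total = sum(c.values())
        # Add-one smoothing keeps unseen trigrams from zeroing out a class.
        score = sum(math.log((c[gram] + 1) / (total + vocab))
                    for gram in byte_trigrams(data))
        if score > best_score:
            best, best_score = cls, score
    return best


# Tiny, made-up training set: one document per (language, encoding) class.
training = [
    (("en", "ascii"), "the quick brown fox jumps over the lazy dog and the cat"),
    (("fr", "latin-1"), "le café est très près de la forêt où l'été est agréable"),
    (("ru", "koi8-r"), "быстрая коричневая лиса прыгает через ленивую собаку"),
]
model = train(training)
print(classify("the dog and the fox".encode("ascii"), model))
print(classify("ленивая лиса".encode("koi8-r"), model))
```

Working on raw bytes rather than decoded text is what lets a single pass guess the encoding together with the language, since the same characters produce different byte patterns under different encodings. A real system would need far more training data per class than this toy set.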