Data center activity traces form a corpus used for machine learning. The data in the corpus are putatively normal but may be tainted with latent anomalies. There is a statistical likelihood that the corpus represents predominately legitimate activity, and this likelihood is exploited to allow for a targeted...

http://www.google.com/patents/US7690037?utm_source=gb-gplus-share

Patent US7690037 - Filtering training data for machine learning
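
The abstract is cut off before it says how the filtering works, so the following is only a minimal sketch of the general idea it states: if most of the corpus is legitimate, rarely seen patterns are the natural candidates for latent anomalies. It is not the patented method. The function name `filter_rare_traces`, the `min_share` threshold, the representation of a trace as a tuple of event names, and the sample corpus are all hypothetical and chosen for illustration.

```python
from collections import Counter


def filter_rare_traces(traces, min_share=0.05):
    """Split a corpus into (kept, dropped) by how common each trace is.

    Assumption (not from the patent): a trace is a hashable signature, and a
    signature seen in less than `min_share` of the corpus is treated as a
    possible latent anomaly and held out of the training data.
    """
    if not traces:
        return [], []

    counts = Counter(traces)  # occurrences of each distinct signature
    total = len(traces)

    kept, dropped = [], []
    for trace in traces:
        if counts[trace] / total >= min_share:
            kept.append(trace)      # common enough to be presumed legitimate
        else:
            dropped.append(trace)   # rare: exclude from training
    return kept, dropped


# Hypothetical corpus: two common activity patterns and one rare one.
corpus = (
    [("login", "read", "logout")] * 18
    + [("login", "write", "logout")] * 5
    + [("login", "dump_all", "logout")]
)

kept, dropped = filter_rare_traces(corpus, min_share=0.05)
print(len(kept), "traces kept;", len(dropped), "held out")
```

A frequency threshold is the simplest way to act on the "predominately legitimate" assumption. It also discards legitimate but unusual activity, so in practice the held-out traces would normally be reviewed rather than silently deleted.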