Week #17 – Corpus

A corpus is a body of documents used in a text mining task. Some corpora are standard public collections of documents that are commonly used to benchmark and tune new text mining algorithms. More typically, the corpus is a body of documents assembled for a specific text mining task, e.g. a set of maintenance tickets or a group of discovery documents in a legal case, for which a classification model is needed.
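
As a rough illustration of a task-specific corpus, the sketch below represents a handful of maintenance tickets as a list of labeled documents and computes simple word counts over them. The ticket texts, the labels, and the helper names (`tokenize`, `vocabulary_counts`) are all hypothetical and chosen only for illustration; a real corpus would be far larger and would usually be loaded from files or a database.

```python
from collections import Counter

# Hypothetical toy corpus: each document is one maintenance ticket,
# paired with the class label a classification model would predict.
corpus = [
    {"text": "Pump leaking oil near the valve", "label": "mechanical"},
    {"text": "Control panel display not responding", "label": "electrical"},
    {"text": "Valve stuck, pump pressure low", "label": "mechanical"},
]


def tokenize(text):
    """Split on whitespace, strip basic punctuation, and lowercase."""
    tokens = (word.strip(".,;:!?").lower() for word in text.split())
    return [token for token in tokens if token]


def vocabulary_counts(docs):
    """Count how often each token occurs across the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc["text"]))
    return counts


counts = vocabulary_counts(corpus)
label_counts = Counter(doc["label"] for doc in corpus)

print(len(corpus), "documents")
print(counts.most_common(3))
print(dict(label_counts))
```

Keeping each document together with its label in one record is only one possible layout; two parallel lists of texts and labels would work equally well for a classification task.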