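Bag-of-words is a simplified representation used in natural language processing: a document is treated as an unordered collection (a "bag") of its words, keeping how often each word occurs but discarding grammar and word order.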
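As a rough illustration of the idea, the sketch below counts word occurrences with Python's standard library. The function name `bag_of_words`, the tokenizing pattern, and the two sample sentences are assumptions made for this example, not part of any particular library.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return word counts for text, ignoring case, punctuation, and word order."""
    # Hypothetical tokenizer: lowercase, then keep runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

# Two sentences with different meanings but the same words.
bag_a = bag_of_words("The dog chased the cat.")
bag_b = bag_of_words("The cat chased the dog.")

print(bag_a)           # word -> count
print(bag_a == bag_b)  # word order is lost, so the two bags compare equal
```

The two sentences produce identical bags, which shows both the simplicity of the representation and what it throws away.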
In language processing, stemming is the process of taking multiple forms of the same word and reducing them to the same basic core form.
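A deliberately naive sketch of stemming by suffix stripping is shown here. The suffix list, the minimum stem length of 3, and the name `naive_stem` are assumptions chosen for illustration; established algorithms such as the Porter stemmer apply many more rules.

```python
# Illustrative suffixes, checked in this order (an assumption, not a standard list).
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, leaving a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [naive_stem(w) for w in ["walk", "walks", "walked", "walking"]]
print(stems)  # all four forms reduce to the same core form
```

A rule set this small mishandles many words (for example, doubled consonants as in "running"), which is why practical stemmers are more elaborate.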
Structured data is data organized in a form that can be used directly to develop statistical or machine learning models (typically a matrix where rows are records and columns are variables or features).
In predictive modeling, a key step is to turn available data (which may come from varied sources and be messy) into an orderly matrix of rows (records to be predicted) and columns (predictor variables or features).
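The sketch below shows one small version of this step: turning loosely formatted records into a matrix with a fixed column order. The sample records, the column names `age` and `income`, and the helper `to_number` are all made up for the example, and missing values are simply kept as `None`.

```python
# Hypothetical messy input: numbers stored as text, a missing field, varying key order.
raw_records = [
    {"id": 1, "age": "34", "income": "52,000"},
    {"id": 2, "age": "41"},
    {"id": 3, "income": "61,500", "age": "29"},
]

columns = ["age", "income"]  # fixed column order for the matrix

def to_number(value):
    """Convert text such as '52,000' to a float; pass missing values through as None."""
    if value is None:
        return None
    return float(value.replace(",", ""))

# One row per record, one column per predictor variable.
matrix = [[to_number(rec.get(col)) for col in columns] for rec in raw_records]

for row in matrix:
    print(row)
```

How to treat the missing value (drop the record, impute, or keep a marker) is a separate modeling decision that this sketch leaves open.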
A full Bayesian classifier is a supervised learning technique that assigns a class to a record by finding other records whose attributes exactly match its own, and then choosing the most prevalent class among them.
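The definition above can be sketched in a few lines. The function name `exact_bayes_classify`, the tiny weather-style training set, and the choice to return `None` when no record matches are assumptions for this illustration.

```python
from collections import Counter

def exact_bayes_classify(records, labels, new_record):
    """Return the most prevalent class among training records that match
    new_record on every attribute, or None if no record matches."""
    matching = [label for rec, label in zip(records, labels) if rec == new_record]
    if not matching:
        return None
    return Counter(matching).most_common(1)[0][0]

# Hypothetical training data: (outlook, temperature) -> class label.
records = [
    ("sunny", "hot"),
    ("sunny", "hot"),
    ("sunny", "hot"),
    ("rainy", "cool"),
]
labels = ["no", "no", "yes", "yes"]

prediction = exact_bayes_classify(records, labels, ("sunny", "hot"))
unmatched = exact_bayes_classify(records, labels, ("rainy", "hot"))
print(prediction, unmatched)
```

The second call finds no identical record, which hints at the practical limitation of this approach: with many attributes, exact matches become rare.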
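In computer science, MapReduce is a programming model for processing large datasets in parallel on multiple computers: a map step turns input records into key-value pairs, and a reduce step combines the values that share a key.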
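A single-process sketch of the MapReduce pattern, using word counting as the example task, follows. The function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions for illustration; a real framework would run the map and reduce calls on different machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["the cat sat", "the dog sat", "the cat ran"]

# Each document could be mapped on a different machine; here it is a simple loop.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each map call looks at only one document and each reduce call at only one key, the work can be split across machines without the pieces needing to communicate.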