Structured data is data in a form that can be used directly to develop statistical or machine learning models, typically a matrix where rows are records and columns are variables or features. It also includes data that can be extracted and turned into such a matrix fairly easily (e.g. database tables). Unstructured data is data, often text, that is heterogeneous in format and requires considerable pre-processing before it can be used in a model. Examples are tweets, social network profiles and postings, and tech support cases or maintenance requests.
Week #48 – Structured vs. unstructured data
- November 7, 2014, 10:07 pm
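To make the distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the column names, the numeric values, and the two support tickets are made up, and the helper functions `tokenize` and `to_term_matrix` are hypothetical names introduced here. The pre-processing shown (lowercasing and word counting) is one simple way to turn text into a matrix, not the only one.

```python
import re
from collections import Counter

# Structured: each row is a record, each column a variable (made-up values).
columns = ["age", "income", "num_purchases"]
structured = [
    [34, 52000, 3],
    [45, 61000, 7],
]

# Unstructured: free text (made-up support tickets) with no fixed columns.
tickets = [
    "Printer won't connect to WiFi!!",
    "wifi drops every hour, printer offline",
]

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def to_term_matrix(docs):
    """Build a count matrix: rows are documents, columns are distinct words."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[w] for w in vocab] for c in counts]
    return vocab, matrix

vocab, matrix = to_term_matrix(tickets)
print(vocab)
for row in matrix:
    print(row)
```

The structured records are usable as a matrix as they stand, while the tickets only become one after tokenizing and counting, which is the extra pre-processing the definition refers to.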