Week #48 – Structured vs. unstructured data

Structured data is data in a form that can be used directly to develop statistical or machine learning models, typically a matrix in which rows are records and columns are variables (features). The term also covers data that can be extracted and turned into such a matrix fairly easily, such as database tables. Unstructured data is data, often text, that is heterogeneous in format and requires considerable pre-processing before it can be used in a model. Examples include tweets, social network profiles and postings, tech support cases, and maintenance requests.
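
To make the distinction concrete, here is a minimal sketch of one common kind of pre-processing: turning a few short text snippets into a word-count matrix with one row per record and one column per word. The sample messages and the helper names (`tokenize`, `build_term_matrix`) are invented for illustration, and real projects usually add further steps such as stop-word removal or stemming.

```python
import re
from collections import Counter

# Hypothetical support-ticket texts, invented for this illustration.
documents = [
    "Printer won't connect to wifi",
    "Wifi keeps dropping on my laptop",
    "Laptop screen flickers after update",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_term_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per word count."""
    counts = [Counter(tokenize(d)) for d in docs]
    # The vocabulary is every distinct word seen in any document, in sorted order.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for words a document does not contain.
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = build_term_matrix(documents)

if __name__ == "__main__":
    print(vocabulary)
    for row in matrix:
        print(row)
```

Each row of `matrix` is now a record and each column a feature, which is the structured form described above. Most cells are zero, which is typical of text data represented this way.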