Glossary
Each entry includes a definition, why the idea matters, a short example, and links to related terms, guides, and curated external resources.
- Data literacy
The ability to read, interpret, question, and communicate about data in context.
- Dataset
A structured collection of values—often rows and columns—that can be analyzed or visualized.
- Metadata
Data about data: who collected it, when, how, definitions of fields, and limitations.
- Data quality
How well data fits its intended use across dimensions like accuracy, completeness, and timeliness.
- Variable
A measurable attribute that can differ across records or time (for example, age or revenue).
- Observation
A single recorded instance—often one row—representing an entity at a point in time.
- Sample
A subset drawn from a larger population used to estimate characteristics of the whole.
- Population
The full set of individuals, cases, or events a study or dataset aims to describe.
- Bias
Systematic distortion that pushes results away from the truth—through collection, processing, or interpretation.
- Aggregation
Combining many values into summaries such as totals, averages, or rates.
- Distribution
The shape of how values are spread: where they cluster, how long the tails are, and where outliers fall.
- Outlier
A value unusually far from most other values in a dataset or chart.
- Rate
A ratio that compares a count to a baseline such as population size, often expressed per thousand or per hundred thousand.
- Categorical data
Values that represent groups or labels, such as country names or survey responses.
- Numeric data
Quantities measured as numbers, whether discrete counts or continuous measurements.
- Missing data
Fields intentionally or unintentionally left blank or unknown.
- Validation
Checks that data meets expected formats, ranges, and business rules.
- Schema
The planned structure of a dataset: table names, columns, types, and relationships.
- Provenance
The origin and history of a dataset—sources, transformations, and versions.
- Data documentation
Written materials—README files, data dictionaries, methodology notes—that explain how to use data.
- Ethics (data)
Principles for fair, transparent, and privacy-respecting collection and use of data.
- Margin of error
A range around a survey estimate that expresses sampling uncertainty, usually stated at a confidence level such as 95% (for example, plus or minus 3 percentage points).
- Visualization
Graphical representations—charts and maps—that encode data values for quick comparison.
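
A few of the quantitative terms above (rate, outlier, margin of error) can be made concrete with a short Python sketch. This is illustrative only: the function names, the 1.5 × IQR outlier rule, and the normal-approximation formula for a proportion are assumptions chosen for the example, not definitions this glossary prescribes, and the numbers are made up.

```python
import math
import statistics


def rate_per(count, population, per=100_000):
    """Rate: a count scaled to a common baseline (default: per 100,000)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population


def iqr_outliers(values, k=1.5):
    """Outlier: values more than k * IQR beyond the quartiles.

    This is one common rule of thumb (assumed here), not the only definition.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def margin_of_error(proportion, sample_size, z=1.96):
    """Margin of error for a survey proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Made-up numbers, for illustration only.
cases_per_100k = rate_per(25, 50_000)                   # 25 events among 50,000 people
unusual = iqr_outliers([10, 12, 11, 13, 12, 11, 95])    # one value sits far from the rest
moe = margin_of_error(0.5, 1000)                        # 50% of 1,000 respondents

print(cases_per_100k, unusual, round(moe, 3))
```

Scaling counts to a shared baseline is what lets places of different size be compared fairly, and the margin of error shrinks only with the square root of the sample size, so quadrupling the sample halves it.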