
Data and Statistics: Terminology and Definitions

Defines terms relevant to using data and statistics.

Access to Data Planet

Sage Data is available via IP (and proxy server) authentication at https://data.sagepub.com/. The Sage Data interface allows users to browse available datasets by subject and source, manipulate variables to create customized views of the data, and search for statistics of interest.

(EZproxy users, please visit here.)

Sage Data

ATTN SAGE DATA SUBSCRIBERS: Be sure you are logged into your library's network to access the Sage Data database from the links provided in these guides.

Questions, comments, suggestions, technical support requests? Contact us at:

Email: onlinesupport@sagepub.co.uk

Website: https://data.sagepub.com/

ATTN Library Administrators: Please feel free to re-use and customize Sage Data guides for your institution.

Data and Statistics: Terminology and Examples

Terminology Basics 

Below you will find simple definitions of the basic terminology associated with data and statistics. From the examples below you can link into Data Planet Statistical Datasets to explore the millions of datasets available in the repository.

Data Planet publishes aggregated secondary datasets:

Secondary means that the data were collected by source organizations other than Data Planet. Secondary data are contrasted with primary data, which researchers have collected themselves.

Aggregated means simply that the datasets are collections of summary data, as opposed to microdata, which are the individual response items in surveys and other data collection instruments.

Data: Fundamentally, data are information. We typically use the term to refer to numeric files that are created and organized for analysis. There are two types of data: aggregate data and microdata.

  • Aggregate data are statistical summaries of data, meaning that the data have been analyzed in some way. The Data Planet repository is an excellent resource for obtaining aggregated data.
  • Microdata are individual response data obtained in surveys and censuses: data points directly observed or collected from a specific unit of observation. Also known as raw data. ICPSR is an excellent resource for obtaining microdata files.
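The following minimal Python sketch shows the difference using a handful of invented survey records. The field names, values, and the `aggregate_by_state` helper are hypothetical and are not drawn from Data Planet or ICPSR. Each dictionary is one respondent's record, which makes it microdata. The per-state counts and means computed from those records are aggregate data.

```python
from collections import Counter
from statistics import mean

# Hypothetical microdata: one record per survey respondent (invented values).
microdata = [
    {"respondent_id": 1, "state": "AL", "household_size": 3},
    {"respondent_id": 2, "state": "AL", "household_size": 1},
    {"respondent_id": 3, "state": "GA", "household_size": 4},
    {"respondent_id": 4, "state": "GA", "household_size": 2},
    {"respondent_id": 5, "state": "GA", "household_size": 3},
]

def aggregate_by_state(records):
    """Summarize individual responses into per-state counts and means (aggregate data)."""
    counts = Counter(r["state"] for r in records)
    summary = {}
    for state in sorted(counts):
        sizes = [r["household_size"] for r in records if r["state"] == state]
        summary[state] = {"respondents": counts[state], "mean_household_size": mean(sizes)}
    return summary

print(aggregate_by_state(microdata))
```

Once the data have been aggregated, the individual responses can no longer be recovered from the summary. This is why microdata are needed for analyses that the publisher did not tabulate.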

Data point or datum: Singular of data. Refers to a single point of data. Example: 25,114 billion BTU of aviation gasoline was consumed by the transportation sector in the US in 2012

Quantitative data/variables: Information that can be handled numerically. Example: spending by US consumers on personal care products and services

Qualitative data/variables: Information that refers to the quality of something. Ethnographic research, participant observation, open-ended interviews, etc., may collect qualitative data. However, there is often some element of the results obtained via qualitative research that can be handled numerically, eg, how many observations were made, the number of interviews conducted, etc. Qualitative variables correspond to categorical or nominal data, which differentiate responses by classes or categories, eg, by gender.

Indicator: Typically used as a synonym for statistics that describe something about the socioeconomic environment of a society, eg, per capita income, unemployment rate, median years of education.

Statistic: A number that describes some characteristic, or status, of a variable, eg, a count or a percentage. Example: total nonfarm job starts in August 2014

Statistics: Numerical summaries of data that have been analyzed in some way. Example: ranking of airlines by percentage of flights arriving on-time into Huntsville International Airport in Alabama in 2013

Time series data: Any data arranged in chronological order. Example: Gross Domestic Product of Greece, 2000-2013

Variable: Any characteristic that can change or vary. Examples include anything that can be measured, such as the number of logging operations in Alabama.

  • Numerical variable: A variable whose possible values are numbers. Example: Bank Prime Loan Rate
  • Categorical variable: A variable that distinguishes among subjects by putting them in categories (eg, gender). Also called discrete or nominal variables. Example: Female vs Male Infant Mortality Rate of Belarus (the mortality rate is numerical; the age and gender characteristics are categorical)
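The following Python sketch illustrates the distinction with a rough heuristic: values stored as numbers are treated as numerical, and anything else is treated as categorical. The `variable_kind` function, the column names, and the rates are hypothetical. The rates are invented and are not actual Belarus statistics.

```python
from numbers import Number

def variable_kind(values):
    """Hypothetical heuristic: all-numeric values -> 'numerical', otherwise 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"

# Invented rows: "sex" labels a category, "infant_mortality_rate" is a measured number
rows = [
    {"sex": "Female", "infant_mortality_rate": 3.1},
    {"sex": "Male", "infant_mortality_rate": 3.9},
]

kinds = {column: variable_kind([row[column] for row in rows]) for column in rows[0]}
print(kinds)  # {'sex': 'categorical', 'infant_mortality_rate': 'numerical'}
```

A type check like this is only a first guess. Categories are often stored as numeric codes (eg, 1 = Female, 2 = Male), so a dataset's documentation is the better guide to how each variable should be treated.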


Terminology Used with Collections of Data

Data aggregation: A collection of data points and datasets. Example: a search on the broad category "higher education" in Data Planet retrieves results from a collection of sources.

Dataset: A collection of related data items, eg, the responses of survey participants. This term is used very loosely – the entire Census 2010 Summary File 1 can be considered a dataset as can any individual table published in the Census 2010 Summary File 1, eg, Table P20. Households by Presence of People Under 18 Years by Household Type by Age of People Under 18 Years

Database: A collection of data organized for research and retrieval. Example: American Community Survey.

Time series: A set of measures of a single variable recorded over a period of time. Example: Hourly Mean Earnings of Civilian Workers – Mining Management, Professional, and Related Workers
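The following minimal sketch uses a few invented annual values of a single variable, not actual earnings data. In this form, a time series is a set of (period, value) pairs arranged in chronological order. Once the pairs are ordered, the change from one period to the next can be read off directly. The function names are hypothetical.

```python
# Invented annual values of one variable, stored unordered as they might arrive from a source
observations = {2013: 30.5, 2011: 28.9, 2012: 29.6}

def as_time_series(obs):
    """Arrange (period, value) pairs in chronological order."""
    return sorted(obs.items())

def period_changes(series):
    """Change in the value from each period to the next, rounded to 2 decimals."""
    return [(later[0], round(later[1] - earlier[1], 2))
            for earlier, later in zip(series, series[1:])]

series = as_time_series(observations)
print(series)                  # [(2011, 28.9), (2012, 29.6), (2013, 30.5)]
print(period_changes(series))  # [(2012, 0.7), (2013, 0.9)]
```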


"Big Data" Terminology

Big data: A popular term used across academia, industry, and other arenas to describe the increased availability of all types of data. Big data is typically described as being huge in volume, high in velocity (how fast it is created), and diverse in variety.

Data analytics: Generally used to refer to the analytical techniques and tools required to analyze massive amounts of data.

Definition References:

Cramer, D., & Howitt, D. (2004). The SAGE dictionary of statistics. SAGE Publications, Ltd. https://doi.org/10.4135/9780857020123

Herzog, D. (2015). Data literacy: A user’s guide. SAGE Publications, Inc. https://doi.org/10.4135/9781483399966

Kitchin, R. (2014). The data revolution: Big data, open data, data infrastructures & their consequences. SAGE Publications Ltd. https://doi.org/10.4135/9781473909472

Vogt, W. P. (2005). Dictionary of statistics & methodology. SAGE Publications, Inc. https://doi.org/10.4135/9781412983907

What Datasets Are Included in Data Planet Statistical Datasets?

To see which datasets are included in Data Planet, go to the Data Planet home page and view the "Browse Data by" section to see which Subjects and Sources are available to begin your search.

Subject Browse:

Browse By Subject Menu

Source Browse:

Browse By Source Menu

All datasets are also visible within the "Datasets" page using tabbed browsing via the left-hand navigation:

Left-Hand Navigation

If your institution subscribes to any premium datasets, they will appear in the left-hand navigation on the "Featured" tab, as well as within the appropriate Subject and Source tab listings.

For a full listing of datasets available in Data Planet Statistical Datasets, and the sources of these datasets, see Data Planet Datasets and Sources.
