Data Types of Values


last updated 6/23/21

There are many ways to represent values visually. We also need to recognize the differing types of values.

There is an abundance of terms from computer science, statistics, and application areas that have similar meanings.

What is data?

A data set is a collection of objects and their attributes

An attribute is a property or characteristic of an object

A collection of attributes describes an object

Attributes and attribute values

Attribute values are numbers or symbols assigned to an attribute for a particular object

Distinction between attributes and attribute values

Different attributes can be mapped to the same set of values

 

Other considerations

How you measure an attribute may not match the properties of the attribute.


Simple data types

You need to understand your data to be able to determine what type they are.

Nominal:

Discrete values, often non-numeric

Properties: distinctness (= != are meaningful)

You can further characterize nominal attributes:

Examples include eye color, postal codes, id numbers, sex(male|female)

Ordinal:

Quantitative in nature, but a numeric coding is not necessarily ordinal (quantitative); deciding this takes further analysis.

Properties: distinctness and order, (= != < > are meaningful)

You can further characterize ordinal attributes:

Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height {3=tall, 2=medium, 1=short}, street numbers

Interval

Data values are separated by fixed amount(s).

Properties: distinctness, order and differences, (= != < > + - are meaningful)

Examples: calendar dates, temperatures in Celsius or Fahrenheit

Ratio

Properties: distinctness, order, differences and ratios (= != < > + - * / are meaningful)

Examples: temperature in Kelvin, length, time, counts, mass

Differences between intervals and ratios

Is it physically meaningful to say that a temperature of 10 ° is twice that of 5° on

Consider measuring the height above average
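The difference between interval and ratio attributes can be checked numerically. The following is a minimal sketch, assuming the standard temperature conversion formulas; the function names, the heights, and the reference average of 165 cm are made up for illustration.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Standard conversion: K = C + 273.15."""
    return c + 273.15


warm_c, cool_c = 10.0, 5.0

# On an interval scale the ratio depends on where zero happens to sit.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Kelvin has a true zero, so this ratio is the physically meaningful one.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(ratio_celsius)               # 2.0
print(round(ratio_fahrenheit, 3))  # 1.22
print(round(ratio_kelvin, 3))      # 1.018

# Height above average: measuring from an arbitrary reference point
# turns a ratio attribute into an interval one.
reference_average = 165.0          # assumed value, in cm
tall, short = 180.0, 170.0         # made-up heights, in cm
ratio_of_heights = tall / short
ratio_above_average = (tall - reference_average) / (short - reference_average)

print(round(ratio_of_heights, 3))  # 1.059
print(ratio_above_average)         # 3.0
```

The same two temperatures give three different ratios, and only the Kelvin one reflects a physical quantity. Likewise, "three times as far above average" does not mean "three times as tall".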

Attribute Type | Transformation | Comments
Nominal | Any permutation of values | If all employee ID numbers were reassigned, would it make any difference?
Ordinal | An order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonic function | An attribute encompassing the notion of good, better, best can be represented equally well by the values {1, 2, 3} or by {0.5, 1, 10}.
Interval | new_value = a * old_value + b, where a and b are constants | Thus, the Fahrenheit and Celsius temperature scales differ in terms of where their zero value is and the size of a unit (degree).
Ratio | new_value = a * old_value | Length can be measured in meters or feet.
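The transformations in the table above can be tried out directly. This is a rough sketch, not a formal proof: it checks on a few made-up values that each transformation preserves the property its attribute type relies on. The helper names (rank_order, diff_ratio) are hypothetical.

```python
import math


def rank_order(values):
    """Indices of the values from smallest to largest (ties keep input order)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def diff_ratio(v):
    """Ratio of two successive differences in a three-element list."""
    return (v[2] - v[1]) / (v[1] - v[0])


# Ordinal: a monotonic recoding {1, 2, 3} -> {0.5, 1, 10} keeps the order.
recode = {1: 0.5, 2: 1, 3: 10}
ratings = [2, 3, 1, 2]                      # made-up good/better/best codes
recoded = [recode[r] for r in ratings]
order_preserved = rank_order(ratings) == rank_order(recoded)

# Interval: new_value = a * old_value + b keeps ratios of differences,
# but not plain ratios.
a, b = 9 / 5, 32                            # Celsius -> Fahrenheit
celsius = [0.0, 10.0, 30.0]
fahrenheit = [a * c + b for c in celsius]
differences_preserved = math.isclose(diff_ratio(celsius), diff_ratio(fahrenheit))
ratios_preserved = math.isclose(celsius[2] / celsius[1],
                                fahrenheit[2] / fahrenheit[1])

# Ratio: new_value = a * old_value keeps plain ratios.
feet_per_meter = 3.28084
meters = [1.0, 2.0, 4.0]
feet = [feet_per_meter * m for m in meters]
length_ratio_preserved = math.isclose(meters[2] / meters[0], feet[2] / feet[0])

print(order_preserved)          # True
print(differences_preserved)    # True
print(ratios_preserved)         # False
print(length_ratio_preserved)   # True
```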

This categorization of attributes is due to S. S. Stevens

 


Asymmetric Attributes

Only presence (a non-zero attribute value) is regarded as important. Absence carries little information.

If we met a friend in the grocery store would we ever say the following? “I see our purchases are very similar since we didn’t buy most of the same things.”

We need two asymmetric binary attributes to represent one ordinary binary attribute

Asymmetric attributes typically arise from objects that are sets
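The grocery store remark can be made concrete by comparing two similarity measures on shopping baskets. The notes do not name a measure; the simple matching coefficient and the Jaccard coefficient used here are standard choices brought in for illustration, and the catalog and baskets are made up.

```python
catalog = ["milk", "bread", "eggs", "tea", "rice",
           "soap", "salt", "jam", "oats", "corn"]
basket_a = {"milk", "bread"}
basket_b = {"eggs", "tea"}


def to_binary(basket, items):
    """One asymmetric binary attribute per catalog item: 1 = bought."""
    return [1 if item in basket else 0 for item in items]


def simple_matching(x, y):
    """Fraction of positions that agree, counting shared absences (0/0)."""
    return sum(1 for p, q in zip(x, y) if p == q) / len(x)


def jaccard(x, y):
    """Shared presences divided by positions where either is present."""
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(x, y) if p == 1 or q == 1)
    return both / either if either else 0.0


x = to_binary(basket_a, catalog)
y = to_binary(basket_b, catalog)

print(simple_matching(x, y))  # 0.6
print(jaccard(x, y))          # 0.0
```

Simple matching rates the two baskets as 60% similar only because both shoppers skipped the same six items. Jaccard ignores shared absences and reports no overlap, which matches the intuition behind asymmetric attributes.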

 


Other common data types

Geometry or spatial: contains 2 or 3 values (lat, long, alt) that together may be treated as a single dimension. Several mathematical geometries can be considered, such as Cartesian or spherical (GIS).

Timestamp or temporal, chronological types.

Topology or relationship connectivity.

 

 


Other Considerations

Incompleteness

Real data is approximate and noisy

The types of operations you choose should be “meaningful” for the type of data you have
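One common convention for what counts as "meaningful" pairs each scale with a summary statistic: mode for nominal, median for ordinal, mean for interval and ratio. The sketch below uses Python's standard statistics module; all of the values are made up.

```python
import statistics

eye_colors = ["brown", "blue", "brown", "green"]   # nominal
heights = [1, 3, 2, 3, 1]        # ordinal codes: 1=short, 2=medium, 3=tall
temps_celsius = [20.0, 22.0, 27.0]                 # interval
postal_codes = [12901, 10001, 94105]               # nominal, despite digits

most_common_color = statistics.mode(eye_colors)    # equality is enough
middle_height = statistics.median(heights)         # needs order only
average_temp = statistics.mean(temps_celsius)      # needs differences

# Python will compute this without complaint, but the result is not a
# postal code and describes no place: the operation is not meaningful.
average_postal_code = statistics.mean(postal_codes)

print(most_common_color)  # brown
print(middle_height)      # 2
print(average_temp)       # 23.0
print(average_postal_code in postal_codes)  # False
```

The language will not stop you from averaging identifiers; knowing the attribute type is what tells you not to.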

 


Records and structure within data

Data sets are typically organized as sets of records, where a record represents a data observation or data "point".

Records with one value are univariate; with two values, bivariate; with three, trivariate; with more, hypervariate.

A value itself may have structure:

Record

Graph

Ordered


Characteristics of data sets

Dimension

The number of values per record/observation is its dimension. Dimension should be consistent across all records of a data set.

A town center on a map has latitude, longitude, altitude, square miles, population, name, postal code, state, country as elements of its record. Its dimension, in this example, is 9.

A high dimension can pose challenges
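As a small illustration of dimension, the town-center record can be written as a Python dataclass and its fields counted. The class name and the sample values are placeholders invented for this sketch, not real data.

```python
from dataclasses import dataclass, fields, astuple


@dataclass
class TownCenter:
    latitude: float
    longitude: float
    altitude: float
    square_miles: float
    population: int
    name: str
    postal_code: str
    state: str
    country: str


# The dimension is the number of values per record.
dimension = len(fields(TownCenter))

# Placeholder records; every record should have the same dimension.
records = [
    TownCenter(10.0, 20.0, 100.0, 5.0, 1000, "Exampleville", "00000", "XX", "Nowhere"),
    TownCenter(11.0, 21.0, 150.0, 7.5, 2500, "Sampleton", "00001", "XX", "Nowhere"),
]
consistent = all(len(astuple(r)) == dimension for r in records)

print(dimension)   # 9
print(consistent)  # True
```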

Sparsity

Only presence counts.

And then there's the issue of missing data
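A sparse record can be stored by keeping only the entries that are present. The sketch below uses one possible layout, a dictionary from position to value, and tracks missing entries (None here) separately, since "not recorded" is not the same as zero. The row values are made up.

```python
# One record with mostly zero entries and one missing value (None).
row = [0, 0, 3, None, 0, 0, 1, 0]

# Keep only the non-zero, non-missing entries, keyed by position.
present = {i: v for i, v in enumerate(row) if v not in (0, None)}

# Missing entries are recorded separately rather than treated as zero.
missing = [i for i, v in enumerate(row) if v is None]

# Fraction of positions that actually carry a value.
density = len(present) / len(row)

print(present)  # {2: 3, 6: 1}
print(missing)  # [3]
print(density)  # 0.25
```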

Resolution

Patterns depend on the scale.

Have the data already been aggregated?

Size (N of records)

May also drive the type of analysis


Example data sets

Record data

What is the dimension?

What are the types of the attributes?

 

Data Matrix

If data objects have the same fixed set of numeric attributes, then the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute.

Such a data set can be represented by an m by n matrix, where there are m rows, one for each object, and n columns, one for each attribute.

A better example is rainfall data
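A rainfall data matrix might look like the following sketch, with one row per weather station and one column per month. The numbers are invented purely to show the m by n layout; a plain list of lists stands in for a matrix library.

```python
# m = 4 stations (rows), n = 3 months (columns); values are made up.
rainfall = [
    [3.1, 2.4, 2.9],
    [0.5, 0.7, 1.2],
    [1.8, 2.0, 2.6],
    [2.2, 1.9, 3.3],
]

m = len(rainfall)       # number of objects (rows)
n = len(rainfall[0])    # number of attributes (columns)

# Every object must have the same fixed set of numeric attributes.
is_rectangular = all(len(row) == n for row in rainfall)

# Each column is one attribute, so column-wise summaries are natural.
monthly_means = [sum(column) / m for column in zip(*rainfall)]

print(m, n)             # 4 3
print(is_rectangular)   # True
print(len(monthly_means))  # 3
```

Each row can be read as a point in 3-dimensional space, one coordinate per month.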

Document data

Each document becomes a ‘term’ vector
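Turning documents into term vectors can be sketched in a few lines. This version simply counts whitespace-separated words against a shared vocabulary; real systems usually add steps such as stemming or stop-word removal, which are omitted here. The two sample documents are made up.

```python
from collections import Counter

documents = [
    "team coach play ball play",
    "ball score game game",
]

# The vocabulary fixes the meaning of each vector position.
vocabulary = sorted({word for doc in documents for word in doc.split()})


def term_vector(doc, vocab):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


vectors = [term_vector(doc, vocabulary) for doc in documents]

print(vocabulary)   # ['ball', 'coach', 'game', 'play', 'score', 'team']
print(vectors[0])   # [1, 1, 0, 2, 0, 1]
print(vectors[1])   # [1, 0, 2, 0, 1, 0]
```

With a realistic vocabulary most entries of each vector are zero, which is why document data is a typical case of sparse, asymmetric attributes.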

 

Transactional data

A special type of record data, where

Graph data

General directed graph, a molecule

 

Webpage connections
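Webpage connections form a directed graph, which can be stored as an adjacency list. The sketch below assumes a tiny made-up site with four pages; the page names are hypothetical.

```python
# Directed graph: each page maps to the pages it links to.
links = {
    "home": ["about", "courses"],
    "about": ["home"],
    "courses": ["home", "syllabus"],
    "syllabus": [],
}

# Out-degree: how many links leave each page.
out_degree = {page: len(targets) for page, targets in links.items()}

# In-degree: how many links point at each page.
in_degree = {page: 0 for page in links}
for targets in links.values():
    for target in targets:
        in_degree[target] += 1

print(out_degree)  # {'home': 2, 'about': 1, 'courses': 2, 'syllabus': 0}
print(in_degree)   # {'home': 2, 'about': 1, 'courses': 1, 'syllabus': 1}
```

Here the data objects are the pages and the relationships between them are the data of interest, rather than a fixed set of attributes per record.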

Ordered data

Genomic sequence

Spatio-temporal data

Average Monthly Temperature of land and ocean

 


Summary

Types of data

General category | Specific type | Description | Examples | Coding
Nominal ("non-numeric", discrete values) | Categorical | a value selected from a finite, usually short, list of possibilities | color, days of week | enumeration or arbitrary numbers; only equality tests are sensible
Nominal ("non-numeric", discrete values) | Ranked | a categorical type with an implied ordering (can be converted to ordinal and ordinal can be converted to ranked nominal) | small, medium, large | numbers, according to the order
Nominal ("non-numeric", discrete values) | Arbitrary | a value from an infinite range of possibilities with no implied ordering | addresses, names | no coding possible; only equality
Binary | Boolean | two distinct categories | yes/no, true/false | 0/1
Ordinal ("numeric"; interval?/ratio?) | Continuous | any real value between upper and lower limits | weights, lengths | typically a float variable type
Ordinal ("numeric"; interval?/ratio?) | Discrete | values separated by a constant value (1, 10, 0.5) | counts | typically an integer variable type
Ordinal ("numeric"; interval?/ratio?) | Statistical | values calculated from a set of ordinal values | counts, means, medians, modes, st.dev. | typically float, counts may be integer
Spatial | Geographical | location on a map or plane or 3D space | longitude, latitude | pairs of values
Temporal | Chronological | times, dates, numeric sequences | birthdates, daily, hourly observations | integers, time, float
Topological | Connectivity (Relational) | relationship mappings | hierarchies, graphs, di-graphs | foreign keys, cross-referencing values