# Data Types of Values

last updated 6/23/21

There are many ways to represent values visually. We also need to recognize the differing types of values.

There is an abundance of terms from computer science, statistics, and application areas that have similar meanings.

### What is data?

A data set is a collection of objects and their attributes

An attribute is a property or characteristic of an object

• Examples: eye color of a person, temperature, cost, etc.
• Attributes are also known as variables, fields, characteristics, dimensions, or features

A collection of attributes describes an object

• Objects are also known as records, points, cases, samples, entities, or instances

### Attributes and attribute values

Attribute values are numbers or symbols assigned to an attribute for a particular object

Distinction between attributes and attribute values

• The same attribute can be mapped to different attribute values
• Example: height can be measured in feet or meters

Different attributes can be mapped to the same set of values

• Example: attribute values for ID and age are both integers
• But the properties of the values differ: an average age is meaningful, while an average ID is not

### Other considerations

• Data may have structure.
• Objects may have relationships to other objects, of the same or of a different kind
• Data may be incomplete.

How you measure an attribute may not match the properties of the attribute.

## Simple data types

You need to understand your data to be able to determine what type they are.

### Nominal:

Discrete values, often non-numeric

Properties: distinctness (= != are meaningful)

You can further characterize nominal attributes:

• categorical -- a value selected from a finite, usually short, list of possibilities (colors, days of week); can be coded as an enumeration
• ranked -- a categorical type with natural ordering (small, medium, large) so (= != < > are meaningful)
• arbitrary -- a value from an infinite range of possibilities with no implied ordering (addresses, names)

Examples include eye color, postal codes, ID numbers, sex (male|female)
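One way to code the nominal subtypes in Python is with the standard `enum` module. The sketch below is a minimal illustration, assuming a plain `Enum` for a categorical attribute (only `==` and `!=` work) and an `IntEnum` for a ranked one (`<` and `>` also work); the class names `EyeColor` and `ShirtSize` are hypothetical.

```python
from enum import Enum, IntEnum


class EyeColor(Enum):
    """Categorical (nominal): members support only equality tests."""
    BROWN = "brown"
    BLUE = "blue"
    GREEN = "green"


class ShirtSize(IntEnum):
    """Ranked: a categorical type whose members also compare with < and >."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


print(EyeColor.BLUE == EyeColor.GREEN, EyeColor.BLUE != EyeColor.GREEN)  # False True
print(ShirtSize.SMALL < ShirtSize.LARGE)            # True
print(max(ShirtSize.MEDIUM, ShirtSize.SMALL).name)  # MEDIUM

# A plain Enum defines no ordering, so asking for one fails, as it should
# for a nominal attribute.
try:
    EyeColor.BLUE < EyeColor.GREEN
except TypeError:
    print("nominal values have no order")
```

Because `IntEnum` members are also integers, Python will happily compute `ShirtSize.SMALL + ShirtSize.MEDIUM`; that arithmetic is not meaningful for a ranked attribute, so a stricter design would define only the comparison operators.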

### Ordinal:

Quantitative in nature. Note that a numeric coding is not necessarily ordinal (quantitative); deciding whether it is requires further analysis.

Properties: distinctness and order, (= != < > are meaningful)

You can further characterize ordinal attributes:

• continuous: what are the upper and lower limits, and are the limits inclusive?
• binary: only the values 0 and 1 (true/false, yes/no, etc.) -- can also be considered nominal/categorical
• discrete (integer or real): are the values separated by a constant value/interval?
• statistical (counts, means, medians, modes, standard deviations) -- these arise from ordinal data.

Examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height {3=tall, 2=medium, 1=short}, street numbers

### Interval

Data values are separated by fixed amount(s).

Properties: distinctness, order and differences, (= != < > + - are meaningful)

Examples: calendar dates, temperatures in Celsius or Fahrenheit

### Ratio

Properties: distinctness, order, differences and ratios (= != < > + - * / are meaningful)

Examples: temperature in Kelvin, length, time, counts, mass

### Differences between intervals and ratios

Is it physically meaningful to say that a temperature of 10° is twice that of 5° on:

• the Celsius scale?
• the Fahrenheit scale?
• the Kelvin scale?
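One way to answer the question is to convert each pair of readings to an absolute scale (Kelvin, whose zero means no thermal energy) and compare them there. The sketch below does that with the standard conversion formulas; the function names are illustrative.

```python
def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


def fahrenheit_to_kelvin(f: float) -> float:
    return (f - 32) * 5 / 9 + 273.15


# Physical ratio of "10 degrees" to "5 degrees" on each scale, computed in Kelvin.
ratios = {
    "Celsius": celsius_to_kelvin(10) / celsius_to_kelvin(5),
    "Fahrenheit": fahrenheit_to_kelvin(10) / fahrenheit_to_kelvin(5),
    "Kelvin": 10 / 5,  # already an absolute scale
}

for scale, r in ratios.items():
    print(f"10 vs 5 degrees {scale}: {r:.3f}")
# 10 vs 5 degrees Celsius: 1.018
# 10 vs 5 degrees Fahrenheit: 1.011
# 10 vs 5 degrees Kelvin: 2.000
```

Only the Kelvin reading keeps the 2:1 ratio. Celsius and Fahrenheit have arbitrary zero points, so the ratio of two readings on those scales says nothing physical.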

Consider measuring height above average:

• If Bill’s height is three inches above average and Bob’s height is six inches above average, would we say that Bob is twice as tall as Bill?
• Is this situation analogous to that of temperature?

The attribute types can also be distinguished by the transformations that leave an attribute's meaning unchanged:

| Attribute type | Transformation | Comment |
|---|---|---|
| Nominal | Any permutation of values | If all employee ID numbers were reassigned, would it make any difference? |
| Ordinal | An order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonic function | An attribute encompassing the notion of good, better, best can be represented equally well by the values {1, 2, 3} or by {0.5, 1, 10}. |
| Interval | new_value = a * old_value + b, where a and b are constants | The Fahrenheit and Celsius temperature scales differ in where their zero value is and in the size of a unit (degree). |
| Ratio | new_value = a * old_value | Length can be measured in meters or feet. |
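The transformations in the table can be checked numerically. This sketch, with hypothetical helper names `order_preserved` and `diff_ratio`, shows that a monotonic recoding keeps order, an affine map (Celsius to Fahrenheit, a = 9/5, b = 32) keeps ratios of differences but not ratios of values, and a pure scaling (meters to feet, a ≈ 3.28084) keeps ratios of values.

```python
def order_preserved(values, f):
    """True if f keeps the ranking of distinct values (f is increasing on them)."""
    mapped = [f(v) for v in sorted(values)]
    return all(x < y for x, y in zip(mapped, mapped[1:]))


def diff_ratio(x1, x2, x3, x4):
    """(x1 - x2) / (x3 - x4): a comparison of differences, meaningful for interval data."""
    return (x1 - x2) / (x3 - x4)


# Ordinal: good, better, best coded as {1, 2, 3} or recoded as {0.5, 1, 10}.
print(order_preserved([1, 2, 3], {1: 0.5, 2: 1, 3: 10}.get))  # True
print(order_preserved([1, 2, 3], {1: 3, 2: 1, 3: 2}.get))     # False: not monotonic

# Interval: Celsius -> Fahrenheit is new = a * old + b.
c = [30, 20, 15, 10]
f = [x * 9 / 5 + 32 for x in c]
print(diff_ratio(*c), diff_ratio(*f))  # 2.0 2.0  (ratios of differences survive)
print(c[1] / c[3], f[1] / f[3])        # 2.0 1.36 (ratios of values do not)

# Ratio: meters -> feet is new = a * old.
m = [4.0, 2.0]
ft = [x * 3.28084 for x in m]
print(m[0] / m[1], ft[0] / ft[1])      # 2.0 2.0  (ratios of values survive)
```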

## Asymmetric Attributes

Only presence (a non-zero attribute value) is regarded as important. Absence carries little information.

• Words present in documents
• Items present in customer transactions

If we met a friend in the grocery store, would we ever say the following? “I see our purchases are very similar since we didn’t buy most of the same things.”

We need two asymmetric binary attributes to represent one ordinary binary attribute

• Association analysis uses asymmetric attributes

Asymmetric attributes typically arise from objects that are sets
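The grocery-store question can be made concrete by comparing two standard similarity measures for binary vectors. The simple matching coefficient counts 0-0 agreements; the Jaccard coefficient ignores them, which is why it is the usual choice for asymmetric attributes. The store size and item positions below are hypothetical.

```python
def simple_matching(x, y):
    """Fraction of positions where x and y agree; 0-0 matches count as agreement."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard(x, y):
    """f11 / (f01 + f10 + f11): agreement on 1s only, 0-0 matches ignored."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return f11 / either if either else 0.0


n_items = 1000            # products the store sells (assumed)
you = [0] * n_items
friend = [0] * n_items
you[0] = you[1] = 1        # you bought two items
friend[2] = friend[3] = 1  # your friend bought two different items

print(simple_matching(you, friend))  # 0.996 -> "very similar"?
print(jaccard(you, friend))          # 0.0   -> nothing in common
```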

## Other common data types

Geometry or spatial: 2 or 3 values (lat, long, alt) that together may be treated as a single dimension. Different mathematical geometries can apply, such as Cartesian or spherical (GIS).

Timestamp or temporal: chronological types.

Topology or relationship: connectivity between objects.
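Treating latitude and longitude as Cartesian coordinates can mislead, because the earth is approximately a sphere. The sketch below contrasts a naive planar distance in degrees with the haversine great-circle distance, assuming a spherical earth of mean radius 6371 km; real GIS work often uses an ellipsoidal model instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_degrees(lat1, lon1, lat2, lon2):
    """Naive Euclidean distance that treats (lat, lon) as flat (x, y) coordinates."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# One degree of longitude at the equator and at 60 degrees north.
for lat in (0, 60):
    print(lat, planar_degrees(lat, 0, lat, 1), round(haversine_km(lat, 0, lat, 1), 1))
# 0 1.0 111.2
# 60 1.0 55.6
```

The planar measure reports the same 1 degree in both places, while on the sphere a degree of longitude at 60° north is only about half as long as at the equator.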

## Other Considerations

• Incompleteness (missing values)
• Asymmetric binary -- only non-zero values are significant
• Cyclical -- e.g., days of the week
• Multivariate -- each data item has a more complex structure
• Partially ordered
• Partial membership
• Relationships between the data
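A common way to handle a cyclical attribute is to map it onto a circle with sine and cosine, so the last value sits next to the first. This is one technique among several; the sketch assumes days are coded 0 to 6, and the helper names are hypothetical.

```python
import math


def encode_cyclic(value, period):
    """Place value on the unit circle so that value and value + period coincide."""
    angle = 2 * math.pi * value / period
    return (math.cos(angle), math.sin(angle))


def cyclic_distance(a, b, period=7):
    """Euclidean (chord) distance between two encoded values."""
    return math.dist(encode_cyclic(a, period), encode_cyclic(b, period))


# Days coded 0 = Monday ... 6 = Sunday (an assumed convention).
print(abs(6 - 0), abs(1 - 0))  # 6 1: raw codes put Sunday far from Monday
print(round(cyclic_distance(6, 0), 3), round(cyclic_distance(1, 0), 3))  # 0.868 0.868
```

On the circle, Sunday is exactly as close to Monday as Tuesday is, which matches how the week actually wraps around.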

Real data is approximate and noisy

• This can complicate recognition of the proper attribute type
• Treating one attribute type as another may be approximately correct

The types of operations you choose should be “meaningful” for the type of data you have

• Distinctness, order, meaningful intervals, and meaningful ratios are only four properties of data
• The data type you see – often numbers or strings – may not capture all the properties or may suggest properties that are not there
• Analysis may depend on these other properties of the data
• Many statistical analyses depend only on the distribution
• Many times what is meaningful is measured by statistical significance
• But in the end, what is meaningful is measured by the domain

## Records and structure within data

Data sets are organized typically as sets of records, where a record represents a data observation or data "point".

Records with one value are univariate, with two values bivariate, with three trivariate, and with more than three hypervariate.

A value itself may have structure:

• (x,y) point or (x,y,z) point
• latitude-longitude
• vector, or a sub-list of values
• etc.

Common types of data sets:

Record

• Data Matrix
• Document Data
• Transaction Data

Graph

• World Wide Web
• Molecular Structures

Ordered

• Spatial Data
• Temporal Data
• Sequential Data
• Genetic Sequence Data

## Characteristics of data sets

### Dimension

The number of values per record/observation is its dimension. Dimension should be consistent across all records of a data set.

A town center on a map has latitude, longitude, altitude, square miles, population, name, postal code, state, country as elements of its record. Its dimension, in this example, is 9.
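The town example can be written as a record keyed by attribute name, with its dimension being the number of attributes. The sketch below also checks that every record in a data set shares the same attributes; the field names follow the example above, and the values are placeholders made up only to fill the nine fields.

```python
FIELDS = ("latitude", "longitude", "altitude", "square_miles", "population",
          "name", "postal_code", "state", "country")


def dimension(records):
    """Number of attributes shared by all records; raises if the records disagree."""
    attribute_sets = {frozenset(r) for r in records}
    if len(attribute_sets) != 1:
        raise ValueError("records must all have the same attributes")
    return len(attribute_sets.pop())


# Placeholder values (hypothetical), one per field.
town = dict(zip(FIELDS, (0.0, 0.0, 0.0, 1.0, 100, "Exampleville", "00000", "XX", "YY")))
print(dimension([town]))  # 9
```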

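A high dimension can pose challenges: as the number of attributes grows, the records become sparse in the space they occupy and distances between them become less informative (often called the "curse of dimensionality").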

### Sparsity

Only presence counts.

And then there's the issue of missing data
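Sparse data is usually stored by keeping only the entries that are present. The sketch below is a minimal dictionary-based version, assuming `None` marks a missing value; keeping `None` entries means a missing value is not silently confused with a zero. The helper names are hypothetical.

```python
def to_sparse(dense):
    """Keep only entries that are not zero, as {index: value}; None (missing) is kept."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def from_sparse(sparse, length):
    """Rebuild the dense list; any index not stored is a zero."""
    return [sparse.get(i, 0) for i in range(length)]


row = [0, 0, 3, 0, None, 0, 1, 0, 0, 0]
sparse = to_sparse(row)
print(sparse)                                # {2: 3, 4: None, 6: 1}
print(from_sparse(sparse, len(row)) == row)  # True
```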

### Resolution

Patterns depend on the scale.

### Size (N of records)

The number of records may also drive the type of analysis.

## Example data sets

### Record data

What is the dimension?

What are the types of the attributes?

### Data Matrix

If data objects have the same fixed set of numeric attributes, then the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute.

Such a data set can be represented by an m by n matrix, with m rows, one for each object, and n columns, one for each attribute.
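Building the m by n matrix from records takes one row per object and one column per attribute. The sketch below uses plain lists; the attribute names and values are hypothetical.

```python
# Hypothetical objects, each with the same three numeric attributes.
attributes = ["length", "width", "height"]
records = [
    {"length": 2.0, "width": 1.0, "height": 0.5},
    {"length": 3.5, "width": 1.2, "height": 0.8},
    {"length": 1.0, "width": 1.0, "height": 1.0},
    {"length": 4.0, "width": 2.5, "height": 0.3},
]

# One row per object, one column per attribute: an m by n matrix.
matrix = [[r[a] for a in attributes] for r in records]
m, n = len(matrix), len(attributes)


def column(matrix, j):
    """All values of attribute j (one coordinate of every point)."""
    return [row[j] for row in matrix]


print(m, n)               # 4 3
print(column(matrix, 0))  # [2.0, 3.5, 1.0, 4.0]
```

Each row is then a point in n-dimensional space, which is what lets distance-based methods treat the objects geometrically.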

A better example is rainfall data

### Document data

Each document becomes a ‘term’ vector

• Each term is a component (attribute) of the vector
• The value of each component is the number of times the corresponding term occurs in the document
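A term vector can be built by counting words against a shared vocabulary. The sketch below assumes a deliberately simple tokenizer (lowercase alphabetic words only, no stemming or stop-word removal); the two sentences are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic words only: a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text, vocabulary):
    """Count how many times each vocabulary term occurs in text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


docs = ["The team won the game.", "The coach played the team in a season game."]
vocabulary = sorted({t for d in docs for t in tokenize(d)})
vectors = [term_vector(d, vocabulary) for d in docs]

print(vocabulary)  # ['a', 'coach', 'game', 'in', 'played', 'season', 'team', 'the', 'won']
print(vectors)     # [[0, 0, 1, 0, 0, 0, 1, 2, 1], [1, 1, 1, 1, 1, 1, 1, 2, 0]]
```

With a realistic vocabulary of thousands of terms, most components of each vector are zero, so document data is a typical example of sparse, asymmetric data.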

### Transactional data

A special type of record data, where

• Each record (transaction) involves a set of items.
• For example, consider a grocery store. The set of products purchased by a customer during one shopping trip constitutes a transaction, while the individual products that were purchased are the items
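Representing each transaction as a set makes the basic quantity of association analysis, the support of an itemset, a one-line subset test. The sketch below uses a small illustrative set of transactions; the `support` helper is hypothetical rather than a library API.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


# Each transaction is the set of items bought on one shopping trip (illustrative data).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(support({"bread"}, transactions))            # 0.8
print(support({"diapers", "beer"}, transactions))  # 0.6
```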

### Graph data

General directed graph, a molecule

Webpage connections

Genomic sequence
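Webpage connections form a directed graph, which is commonly stored as an adjacency list. The sketch below assumes a handful of hypothetical pages and links and computes each page's out-degree (links it makes) and in-degree (links it receives).

```python
from collections import Counter, defaultdict

# Hypothetical pages and hyperlinks; each pair is a directed edge (from, to).
links = [("home", "about"), ("home", "blog"), ("blog", "post1"), ("post1", "home")]

graph = defaultdict(set)  # adjacency list: page -> set of pages it links to
for src, dst in links:
    graph[src].add(dst)

# Pages with no outgoing links (here "about") do not appear as keys.
out_degree = {page: len(targets) for page, targets in graph.items()}
in_degree = Counter(dst for _, dst in links)

print(sorted(graph["home"]))                  # ['about', 'blog']
print(out_degree["home"], in_degree["home"])  # 2 1
```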

### Spatio-temporal data

Average Monthly Temperature of land and ocean

## Summary

Types of data

| General category | Specific type | Description | Examples | Coding |
|---|---|---|---|---|
| Nominal ("non-numeric", discrete values) | Categorical | a value selected from a finite, usually short, list of possibilities | color, days of week | enumeration or arbitrary numbers; only equality tests are sensible |
| | Ranked | a categorical type with an implied ordering (can be converted to ordinal, and ordinal can be converted to ranked nominal) | small, medium, large | numbers, according to the order |
| | Arbitrary | a value from an infinite range of possibilities with no implied ordering | addresses, names | no coding possible; only equality |
| Binary | Boolean | two distinct categories | yes/no, true/false | 0/1 |
| Ordinal ("numeric"; interval? ratio?) | Continuous | any real value between upper and lower limits | weights, lengths | typically a float variable type |
| | Discrete | values separated by a constant value (1, 10, 0.5) | counts | typically an integer variable type |
| | Statistical | values calculated from a set of ordinal values | counts, means, medians, modes, st. dev. | typically float; counts may be integer |
| Spatial | Geographical | location on a map, plane, or 3D space | longitude, latitude | pairs of values |
| Temporal | Chronological | times, dates, numeric sequences | birthdates; daily, hourly observations | integers, time, float |
| Topological | Connectivity (relational) | relationship mappings | hierarchies, graphs, di-graphs | foreign keys, cross-referencing values |