Introduction to Data Structures
and Arrays

last updated 2/2/10

Data Types Characteristics

  1. A domain is defined - the set of all possible values.
  2. A set of allowable operations on those values is also defined.
  3. A value is atomic within the domain.

Java refers to or implements these as primitive types.

Data Structures Characteristics

  1. Contains component data items, which may be atomic or another data structure (still a domain)
  2. A set of operations on one or more of the component items
  3. Defines rules as to how components relate to each other and to the structure as a whole (assertions)

Java implements data structures as an object.

 

 


Example: Array

Domain:

  1. a collection of a fixed number of component values, each of the same type, atomic or structured.
  2. a set of index (subscript) values that are nonnegative integers (consecutive).

Structure: There is a 1-1 relationship between each index and an array component (element). Index 0 selects the first element, 1 the second, etc.

Operations: [i] references the array element. Values may be assigned (written or stored) or retrieved (read or returned).

An array differs from a class in the following three ways:

  1. An array is a homogenous structure, whereas classes are heterogeneous structures.
  2. A component of an array is accessed by its position in the structure, whereas a component of a class is accessed by an identifier (the name).
  3. Because array components are accessed by numeric position, an array is a structured composite type.

 

 


A Data Structures Taxonomy

Linear vs Non-linear:

Linear- every component has a unique predecessor and successor, except first and last elements.

Sequential vs Direct:

Sequential - to access the n-th element you must access the preceding n-1 elements

Direct (random) - any element can be accessed without accessing its predecessor or successor

 


The Array as an ADT

From the taxonomy this structure is linear - direct access - homogeneous

In Java we define an array with the following syntax:
type arrayname [ ] = new type [n-elements];
or type [ ] arrayname = ...

Formally the array ADT is defined as

Domain:

  1. a collection of a fixed number of component values, each of the same type (homogeneous), can be atomic or structured. This permits arrays of arrays, arrays of records, etc.
  2. a set of index (subscript) values that have a one-to-one correspondence to nonnegative integers (consecutive).

Structure:

Operations:

Other operations such as remove or insert are not available because there are always values in the array.  They really can never be removed.

 


Implementation

The notation array[i] is implemented as follows

index
[i]
address value
0

5e70 (24176)

10
1 5e74 (24180) -1
2 5e78 (24184) 50
3 5e7c (24188) 100
4 5e80 (24192) 30
5 5e84 (24196) 35

 

 

 


Dynamic versus Static

FORTRAN, COBOL, C/C++, Pascal are languages that support only static arrays; size is preset at compile time.

In these languages the programmer must establish at program writing time, the size of the array.  The array is allocated and the size cannot be changed.  The programmer then also has to track the number of elements actually used and maybe has to include code to verify that the index is not ever too large.

Java and other modern languages support dynamic arrays.  Here arrays are created just as any object can be created.  The size can be optimally determined as needed if at all possible.

int array[];
int size = Integer.parseInt(
           JOptionsPane.showInputDialog("Enter size of array: "));
array = new int[size];
...

However, once established the size remains fixes. If you want to extend or shrink the array, you need to create a second array and copy the references over.  This should be done on a limited basis because of the linear complexity nature of the copy operation.  That is, the cost in time of copying may not really benefit the little bit of memory that you are saving by trying to keep arrays small.

 

 


Physical versus Logical Size

In order to avoid the cost of resizing the array, it usually is beneficial to have an array that is larger than needed (if at some point you need to make it larger, you can) but you track the number of elements needed.

public static final int MAX_SIZE = 100;  //physical size (final says no change permitted)
private Object[] array;     //name of array
private int nObjects;       //logical size
public dataModel(){     //default constructor
    array = new Object[MAX_SIZE];
    nObjects = 0;
}
public dataModel(int size){     // constructor
    if(size<0 || size>MAX_SIZE) size = MAX_SIZE;
    array = new Object[size];
    nObjects = 0;
}

 

 

 


Resizing arrays

Increasing an array by one or decreasing by one requires copying the entire array over to the new sized array.  Very inefficient over time.  In fact, the overall algorithm, if it were meant to be linear would then be quadratic, or that the algorithm complexity is increased by a factor of n.

Generally if you need to increase the size of an existing array, consider increasing it by some factor such as doubling or 50%.  This will result in a complexity factor increase to be logarithmic. (So a linear O(n) program becomes O(n log n)).

//want to add an element to array; check if room
if(nObjects == array.length){
    Object temp [] = new Object[array.length*2];  //double the size
    for(int i=0; i<nObjects; i++){     // copy the old array over
        temp[i] = array[i];
    }
    array = temp;   //keep new array and lose the old (garbage)
}
//now add the element by shifting other elements 
//to make room in the right position

Decreasing the array should be done when the array utilization is substantially smaller than the increasing factor.  What could happen if you decrease by the same factor? 

//want to delete an element
//delete by shifting elements
...
//check if logical size has reached a threshold
if(nObjects <= array.length/4 && nObjects>DEFAULT_SIZE){
     Object temp [] = new Object[array.length/2];
     for(int i=0; i<nObjects; i++){     // copy the old array over
        temp[i] = array[i];
     }
     array = temp;   //keep new array and lose the old
}


Composition

Ways of combining built-in types and classes into versatile hierarchies:

 

 

 

 

 


Example of an Aggregate Object

Suppose we define:

public class Point
   { //record structure -- note the public definition
   public int xValue;
   public int yValue;
   }

Then, we could define a new circle class as:

 Public class NewCircle
   {
   public Point location;
   public float radius;
   public boolean solid;
   }

 


Example of an Array of Objects

NewCircle[] allCircles = new NewCircle[10];
allCIrcles[0] = new NewCircle();
allCircles[0] = myNewCircle;

allCircles[1] = new NewCircle();
allCircles[1].location = new Point();
allCircles[1].location.xValue = 6;
allCircles[1].location.yValue = 6;
allCircles[1].radius = 1.3f;
allCircles[1].solid = false;allCircles Memory layout

 

 

 

 

 

 

 

 

 

 

 

 

 


Example of a Two-Dimensional Array

Two Dimension Array

 

 


Arrays of objects

Arrays of objects are denoted with [ ].  The brackets can be placed either after the type/class or after the array variable.  The name of the array is an array variable.

You need to instantiate the array.  BUT you simply get an array of references that are ready to be assigned to instances of the plain objects.

Circle [] circles;  //declare the array variable 
circles = new Circle[10];  //instantiate an array of references
circles[0] = new Circle (1,1,2);  //populate
circles[1] = new Circle (2,2,3);
Shape shapes[] = new Shape[10];  //declare and instantiate
shapes[0] = new Circle(1,2,3);   //populate
shapes[1] = circles[1];
shapes[2] = new Rectangle(2,3,4,4);
shapes[3] = circles[0];
shapes[4] = new Rectange(3,4,5,1);

 

 

 


Dynamic Binding

Here's the real power of an object oriented language.  You use polymorphic names to represent the same concept or operation across several classes in a hierarchy. 

totalArea = 0;
for(i=0; i<10; i++){//scan whole array
   totalArea = shapes[i].area();
} // there's no need to know which area function is used
for(i=0; i<10; i++){
    System.out.println(shape[i]);
}
 

 

 


Class Type Control

There are times you need to

To test for a subclass, use the instanceof operator.

BUT the compiler wants to verify that operations will be legal.  You need to help the compiler know the operation will be legitimate.  Use type casting as in (ClassName)variable

if(shapes[i] instanceof Circle){
    System.out.println("Circle "+(i+1)+"has radius "+
                   ((Circle)shapes[i]).getRadius()+"\n");
}

 

 

 


The Record as an ADT

The Java class/object really implements this implicitly with objects and instance variables.

From the taxonomy this structure is linear - direct access - heterogeneous.

The components of a record are usually called fields each referenced by a fieldname.

Domain:

Each record consists of

  1. a collection of a fixed number of component (field) values; the collection elements are heterogeneous.
  2. a set of identifiers (field names) for selecting each component.

Structure:

Operations: