Distributed Databases

last updated 11/18/12

Distributed Database Systems

Advantages of Distribution --by design

Compared to a single, centralized system that provides remote access, distributed system advantages are

Disadvantages (more likely scenario)

 


Architecture Design Considerations

Factors the designer considers in choosing an architecture for a distributed system

Architecture Alternatives

 

 

 

 


Types of Distributed Systems

Homogeneous
Heterogeneous
  • All nodes use the same hardware and software
  • Nodes have different hardware or software
  • Require translations of codes and word lengths due to hardware differences
  • Translation of data models and data structures due to software differences
  • XML can help here

 


Software Components of DDBMS

Data communications component (DC)

Local database management system (DBMS)

Global data dictionary (GDD)

Distributed database management system component (DDBMS)

Not all sites necessarily have all these components

 

 

 

 


DDBMS Functions

Provide the user interface--needed for location transparency

Locate the data--directs queries to proper site(s)

Process queries--local, remote, compound (global)

Provide network-wide concurrency control and recovery procedures

Provide data translation in heterogeneous systems

 

 


Data Placement Alternatives

Centralized-- all data at one site only

Replicated-- all data duplicated at all sites; read one, write all

Partitioned

Hybrid-- Combination of the others (some centralized, some partitioned with replication)

Factors in data placement decisions

 


Types of Transparency

Data distribution transparency -- user is not aware of how the data is distributed

DBMS heterogeneity transparency -- user is not aware of the different databases and architectures

Transaction transparency

Performance transparency

 

 


Transaction Management for DDBMS

Each site that initiates transactions has a transaction coordinator to manage transactions that originate there

Additional concurrency control problem for distributed databases: multiple-copy inconsistency problem

 


Locking Protocols

Extension of two-phase locking protocol for transaction processing

Single-site lock manager

Distributed lock manager

Primary copy

Majority locking

 


Global Deadlock Detection

Each site has local wait-for graph-detects only local deadlock

Need global wait-for graph

 


Timestamping Protocols

One site could issue all timestamps

Instead, multiple sites could issue them

 


Recovery from failures

Must guarantee atomicity and durability of transactions

Failures include usual types, plus

Network partitioning

Handling Node Failure

 

 


Commit Protocols

Two-phase commit protocol

Implemented as an exchange of messages between the coordinator and the cohorts

Phase 1:
Prepare message
(coordinator to cohort)
  • If cohort wants to abort, it aborts
  • If cohort wants to commit, it moves all update log records to non-volatile store and forces a prepared record to its log
  • Cohort sends a (ready or aborting) vote message to coordinator
Phase 1:
Vote message
(cohort to coordinator): Cohort indicates ready to commit or aborting.
  • If any are aborting, coordinator decides abort
  • If all are ready, coordinator decides commit and forces commit record to its log
  • Coordinator sends commit/abort message to all cohorts that voted ready
Phase 2:
Commit/abort message
(coordinator to cohort):
  • Cohort commits locally by forcing a commit record to its log. Or, if abort message, it aborts
  • Cohort sends done message to coordinator
Phase 2:
Done message
(cohort to coordinator):
  • When coordinator receives done message from all cohorts, it writes a complete record to its log

 

 


Distributed Query Processing

Queries can be

Must consider cost of transferring data between sites

Semijoin operation is sometimes used when a join of data stored at different sites is required

Steps in Distributed Query Processing

  1. Accept user’s request
  2. Check request’s validity
  3. Check user’s authorization
  4. Map external to logical level
  5. Determine request processing strategy
  6. For heterogeneous system, translate query to DML of data node(s)
  7. Encrypt request
  8. Determine routing (job of Data Communications component)
  9. Transmit messages (job of DC component)
  10. Each node decrypts its request
  11. Do update synchronization
  12. Each data node’s local DBMS does its processing
  13. Nodes translate data, if heterogeneous system
  14. Nodes send results to requesting node or other destination node
  15. Consolidate results, edit, format results
  16. Return results to user