database, a collection of related data and a data base management system (DBMS) to be the software that manages and controls access to the database
meta data, a complete definition or descripton of the database structure and constrains
entity, a distinct object in the organization that is to be represented in the database
attribute, a property that describes some aspect of the object
relationship, an association between entities
DBMS, a software system that enables users to define, create, maintain, and control access to the database
- security system
- integrity system
- concurrency control system
- recovery control system
- user-accessible catalog
Data Definition Language (DDL), specify the data types and structures and the contraints on the data to be stored in the database
Data Manipulation Language (DML), insert, update, delete, and retrieve data from the database
Structured Query Language (SQL), the formal standard language for relational DBMSs
application programs, a computer program that interacts with the database by issuing an appropriate request to the DBMS
view, dynamic result of one or more relational operations operating on base relations to produce another relation, allows each user to have his or her own view of the database
- a virtual relation that does not necessarily actually exist in the database but is produced upon request, at time of request
- are defined as a query on one or more base relations
- changes made to base relations that affect view attributes are immediately reflected in the view
- provides a level of security
- provides a mechanism to customize the appearance of the database
- present a consistent, unchanging picture of the structure of the database
- simplify the complex operations on base relations
- all updates to a base relation should be immediately reflected in all views that reference that base relation
- if view is updated, underlying base relation should reflect change
- Updates are allowed if query involves a single base relation and contains a candidate key of base relation
- Updates are not allowed involving multiple base relations
- Updates are not allowed involving aggregation or grouping operations
schema, the structure of the database
data warehouses, store data drawn from several data sources
database integrity, the validity and consistency of stored data
Advantage of database
- control of data redundancy
- data consistency
- more information from the same amount of data
- sharing of data
- improved data integrity
- improved security
- enforcement of standards
- economy of scale
- balance of conflicting requirements
- improved data accessibility and responsiveness
- increase productivity
- improved mantenance through data independence
- increased concurrency
- improved backup and recovery services
Disadvantages of database
- complexity
- size
- cost of DBMSs
- additional hardware costs
- cost of conversion
- performance
- greater impact of a failure
Three-level architecture
- external level, the way users perceive the data, describe that part of the database that is relevant to each user
- conceptual level, describe what data is stored in the database and the relationships among the data
- all entities, attributes, relationships
- constraints
- semantic information about the data
- security and integrity information
- internal level, describe how the data is physically stored in the database
- storage space allocation for data and indexes
- record descriptions for storage
- record placement
- data compression and data encryption techniques
- isolation, each transaction appears to execute in isolation
- atomicity, all the database operations in a transaction are executed or none are
procedural DML, a language that allows the user to tell the system what data is needed and exactly how to retrieve the data
nonprocedural DML, a language that allows the user to state what data is needed rather than how it is to be retrieved
data model, an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization
- structural part, consisting of a set of rules according to which databases can be constructed
- manipulative part, defining the types of operation that are allowed on the data
- a set of integrity constraints, ensures that the data is accurate
Functions of a DBMS
- Data Storage, Retrieval, and Update
- A User-Accessible Catalog
- transaction, an executing program or process including one or more database access
- Concurrency Control Services
- Recovery Services
- Authorization Services
- Support for Data Communication
- Integrity Services
- Services to Promote Data Independence
- Utility Services
relations. tables
- Relation name is distinct from all other relation names in relational schema
- Each cell of relation contains exactly one atomic (single) value
- Each attribute has a distinct name
- Values of an attribute are all from the same domain
- Each tuple is distinct; there are no duplicate tuples
- Order of attributes has no significance
- Order of tuples has no significance, theoretically
attributes, columns
domain, the set of allowed values for one or more attributes
degree, the number of attributes in a relation
cardinality, the number of tuples in a relation
tuple, row, tuples in a relation do not have any particular order
NULL, value unknown, value exists but is not available, or attribute does not apply to this tuple
relational Database, a collection of normalized relations with distinct relation names
key, identify each entity uniquely
- super key, a set of attributes that keep a tuple unique, every relation has at least one default superkey - the set of all its attributes
- key, minimal superkey, two distinct tuples cannot have identical values for the atrributes in the key
- candidate key, there may have more than one key, each of the keys is called a candidate key
- primary key, set one of candidate keys as the primary key
- alternate keys, candidate keys that are not selected to be primary key
- foreign key, attributes in R1 reference the primary key of R2
Integrity constraints
- Entity integrity, no attribute of a primary key can be null
- Referential integrity, if foreign key exists in a relation, either foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null
- General constraints, additional rules specified by users or database administrators that define or constrain some aspect of the enterprise
Relational algebra, how to retrieve it
- Selection, =, <, ≤, >, ≥, ≠, connected by and, or, not
- Projection
- Union, two relations must be union-compatible
- Set difference, two relations must be union-compatible
- Intersection, two relations must be union-compatible
- Cartesian product
- Join
- θ join
- Natural join
- Out join
- Left out join
- Right out join
- Semi join
- Aggregate operations
- Grouping operations
Relational calculus, what to retrieve
SQL operations
- create database and relation structures
- CREATE SCHEMA
- CREATE TABLE tableName (
fieldName fieldType constrain DEFAULT defaultValue CHECK condition,
...,
CONSTRAINT constrainName
PRIMARY KEY (fieldName ...),
CONSTRAINT constrainName
FOREIGN KEY (fieldName) REFERENCES tableName(fieldName) ON DELETE set ... ON UPDATE CASCADE,
CONSTRAINT constrainName
UNIQUE(fieldName))
- CREATE ASSERTION constrainName
CHECK(conditions ...)
- CREATE VIEW viewName
AS ...
- CREATE INDEX
- drop
- DROP SCHEMA schemaName CASCADE, drop a database
- DROP TABLE tableName CASCADE, drop a table
- alter
- ALTER TABLE tableName ADD COLUMN fieldName fieldType
- ALTER TABLE tableName DROP COLUMN fieldName CASCADE
- ALTER TABLE tableName ALTER COLUMN operation ...
- data types
- Numeric
- INTEGER, INT, and SMALLINT
- FLAOT, REAL, and DOUBLE PRECISION
- DECIMAL(i, j), NUMERIC(i, j)
- Character-string
- Bit-string
- Boolean
- DATE, TIME
- timestamp
- INTERVAL
- Large objects
- BLOB, a binary string that does not have a character set or collation association
- CLOB, NCLOB, character strings
- perform insertion, modification, deletion of data from relations
- INSERT INTO tableName VALUES (value, ...)
- UPDATE tableName
SET setup
WHERE condition
- DELETE tableName
WHERE condition
- perform simple and complex queries
- SELECT fieldName ...
FROM tableName
WHERE conditions
GROUP BY fieldName ...
HAVING condition ...
ORDER BY fieldName DESC, fieldName ASC fieldName ...
- DISTINCT, remove duplicates
- AS, rename field name
- BETWEEN, range search
- IN, list
- LIKE
- %, replaces an arbitrary number of zero or more characters
- _, replaces a single character
- IS NULL, IS NOT NULL
- ANY, ALL
- join
- =
- LEFT JOIN, RIGHT JOIN, FULL JOIN
- EXIST and NOT EXIST, are for use only with subqueries
- UNION, EXCEPT, INTERSECT
SQL data
- Required data, NOT NULL
- Domain constraints, CHECK
- Entity, PRIMARY KEY
- Referential integrity, FOREIGN KEY ... REFERENCES
- General constraints
- CREATE ASSERTION, contraintName
CHECK searchCondition
- environment > catalogs > schemas > objects
Privileges
- GRANT, privilegeList
ON objectName
TO authorizationIDList
WITH GRANT OPTION
- privilege list
- SELECT
- DELETE
- INSERT
- UPDATE
- REFERENCES
- USAGE
- WITH GRANT OPTION allows privileges to be passed on
- REVOKE GRANT OPTION FOR privilegeList
ON objectName
FROM authorizationIDLIst
- GRANT OPTION FOR allows privileges passed on via WITH GRANT OPTION of GRANT to be revoked separately from the privileges themselves
SQL programming language
- Declarations
- %TYPE, variable same type as a column
- %ROWTYPE, variable same type as an entire row
- variableName type
- Assignments
- Control statements
- IF ... THEN ... END IF
- LOOP ... END LOOP
- WHILE condition DO ... END WHILE
- FOR indexVariable AS ... DO ... END FOR
Trigger
- CREATE TRIGGER triggerName
BEFORE | AFTER | INSTEAD OF | INSERT | DELETE | UPDATE OF triggerColumnList
ON tableName
REFERENCING OLD|NEW AS new
FOR EACH ROW|STATEMENT
BEGIN ... END
Database System Development Lifecycle
- Database planning
- System definition
- Requirements collection and analysis
- Database design
- DBMS selection
- Application design
- Prototyping
- Implementation
- Data conversion and loading
- Testing
- Operational maintenance
Fact-finding techniques
- The formal process of using techniques such as interviews and questionnaires to collect facts about systems, requirements, and preferences
- techniques
- examining documentation
- interviewing
- observing the organization in operation
- research
- questionnaires
ER model
- entity type, group of objects with same properties, identified by enterprise as having an independent existence
- strong entity type, entity type that is not existence-dependent on some other entity type
- weak entity type, entity type has no key, that is existence-dependent on some other entity type
- relationship type, set of meaningful associations among entity types
- degree, number of participating entities in relationship
- cardinality ratio
- attribute, property of an entity or a relationship type
- single attributes, compose of a single component with an independent existence
- composite attributes, compose of multiple components, each with an independent existence
- single-valued attributes, hold a single value for each occurrence of an entity type
- multi-valued attributes, hold multiple values for each occurrence of an entity type
- stored attributes, pysically stored values
- derived attributes, represent the values that is derivable from value of a related attribute, or set of attributes, not necessarily in the same entity type
- connection traps
- fan traps, where a model represents a relationship between entity types, but pathway between certain entity occurrences is ambiguous, exist two or more 1:N relathionships fan out from the same entity
- chasm traps, where a model suggests the existence of a relationship between entity types, but pathway does not exist between certain entity occurrences, one or more relationships with a minimum multiplicity of zero
Enhanced Entity-Relationship (EER) model
- specialization / generalization
- Superclass, an entity type that includes one or more distinct subgroupings of its occurrences
- Subclass, a distinct subgrouping of occurrences of an entity type
- Specialization, process of maximizing differences between members of an entity by identifying their distinguishing characteristics
- Generalization, process of minimizing differences between entities by identifying their common characteristics
- Constraints
- Participation constraint, determines whether every member in superclass must participate as a member of a subclass, mandatory or optional
- Disjoint constraint, describes relationship between members of the subclasses and indicates whether member of a superclass can be a member of one, or more than one, subclass, disjoint or nondisjoint
ER-to-Relational Mapping
Normalization, a technique for producing a set of suitable relations that support the data requirements of an enterprise
- benefits
- easier for the user to access and maintain the data
- take up minimal storage space on the computer
- update anomalies
- insertion
- deletion
- modification
- Functional dependencies
- X → Y, Y depends on X
- X ⊇ Y, then X → Y, reflexive rule
- X → Y, then XZ → XZ, augmentation rule
- X → Y, Y → Z, then X → Z, transitive rule
- X → YZ, then X → Y, decoposition rule
- X → Y, X → Z, then X → YZ, union rule
- X → Y, WY → Z, then WX → Z, pseudotransitive rule
- normalization is a formal technique based on the primary key and the functional dependencies
- First Normal Form (1NF) is critical for relational data model, to avoid the update anomalies, we proceed to at least Third Normal Form (3NF)
- Unnormalized Form (UNF), a table that contains one or more repeating groups
- 1NF, a relation in which the intersection of each row an dcolumn contains one and only one value
- 2NF, a relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key
- 3NF, a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key
Methodology
- Conceptual datbase design
- Step 1.1 Identify entity types
- Step 1.2 Identify relationship types
- Step 1.3 Identify and associate attributes with entity or relationship types
- Step 1.4 Determine attribute domains
- Step 1.5 Determine candidate, primary, and alternate key attributes
- Step 1.6 Consider use of enhanced modeling concepts
- Step 1.7 Check model for redundancy
- Re-examine one-toone (1:1) relationships
- Remove redundant relationships
- Consider time dimension
- Step 1.8 Validate conceptual model against user transactions
- Step 1.9 Review conceptual data model with user
- Build and validate logical data model
- Step 2.1 Derive relations for logical data model
- Step 2.2 Validate relations using normalization
- Step 2.3 Validate relations against user transactions
- Step 2.4 Define integrity constraints
- Required data, some attributes must always contain a valid value
- Attribute domain constraints, every attribute has a domain, a set of values that are legal
- Multiplicity, represents the constraints that are placed on relationships between data in the database
- Entity integrity
- Referential integrity
- General constraints
- Step 2.5 Review logical data model with user
- Step 2.6 Merge logical data models into global model (optional step)
- Step 2.7 Check for future growth
- Translate logical data model for target DBMS
- Step 3.1 Design base relations
- Step 3.2 Design representation of derived data
- Step 3.3 Design general constraints
- Design file organizations and indexes
- Step 4.1 Analyze transactions
- Step 4.2 Choose file organization
- Step 4.3 Choose indexes
- the attribute that is used most often for join operations
- the attribute that is used most often to access the tuples in a relation in order of that attribute
- if the ordering attribute chosen is a key of the relation, the index is a primary index; if the ordering attribute is not a key, the index will be a clustering index
- Specifying indexes
- CREATE UNIQUE INDEX indexName ON relation
- CREATE INDEX indexName ON relation CLUSTER
- Step 4.4 Estimate disk space requirements
- Step 5 Design user views
- Step 6 Design security mechanisms
- Step 7 Consider the introduction of controlled redundancy
- normalization is a logical database design that has minimal redundancy
- denormalization may improve efficiency
- Step 8 Monitor and tune the operational system
- Transaction throughput: number of transactions processed in given time interval
- Response time: elapsed time for completion of a single transaction
- Disk storage: amount of disk space required to store database files
Transaction management
- Transaction, Action, or series of actions, carried out by user or application, which reads or updates contents of database
- success, transaction commits and database reaches a new consistent state
- failure, transaction aborts, and database must be restored to consistent state before it started, such a transaction is rolled back or undone
- Committed transaction cannot be aborted
- Aborted transaction that is rolled back can be restarted later
- Properties
- Atomicity
- Consistency
- Isolation
- Durability
- Concurrency control
- Process of managing simultaneous operations on the database without having them interfere with one another
- Potential problems
- Lost update problem
- Uncommitted dependency problem
- Inconsistent analysis problem
- Techniques
- Lock
- Two-phase locking
- growing phase
- shrinking phase
- Deadlock
- Timeouts
- Deadlock prevention
- Deadlock detection and recovery
- Timestamping
- A unique identifier created by DBMS that indicates relative starting time of a transaction
- Optimistic
- serializable, find nonserial schedules that are equivalent to some serial schedule
Data warehouse
- A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process