Few Notes on SQL

There are couples of  important concepts in SQL that confused me for a long time. After one afternoon reading in library, I realize a few tricks on SQL. So I decide to take a note as snapshot for future.

1. Index

Index is primary mechanism to get improved performance on relational database design, especially for speeding up queries. Indexes, which are  persistent data structure and stored in database, are created automatically or explicitly by programmer.

Let us see how index works. If we have a table below,

A B
cat 2
dog 5
cow 1
dog 9
cat 2
cat 8
cow 6

Indexes can be created on by ” create index on T.A”, so the database turns into

index A B
1 cat 2
2 dog 5
3 cow 1
4 dog 9
5 cat 2
6 cat 8
7 cow 6

If query T.A = ‘cow’ is sent, ‘cow 1’ and ‘cow 6’ will be returned. Be clear that user don’t access indexes; they are used underneath by the query execution engine.

The mechanism is indexing every tuples of database to speed up retrieve time ( in some sense like hash table).  So it is nature to ask which data structure is index based on? Well, two popular data structures, balanced tree and hash table, are candidates for index. Balanced tree provides O(logn) time for looking-up, which is not bad even for large numbers of data. While hash table provides O(1) for searching if no collisions happen. This seems hash table outweighs balanced tree. However, data stored in balanced tree are ordered which is efficient for inequality query, like query T.A  < ‘d’. Hence, there is trade-off on these two data structure for index and choose appropriate data structure based on type of query.

Downsides of Indexes. Obviously,  indexes are employed to improve performance of query on database. Every object has both sides, no excepts for indexes. The drawbacks of indexes are mainly illustrated as below.

(1) Indexes consume extra space. It is obvious that new column of index is added to database when index is created. Indexes will cost large memory when database consists of large mount of rows.

(2) Indexes creation consumes time. The index creation time seems trivial compared with query time of large database. But it is not case when database is relatively small.

(3) Indexes maintenance becomes huge problem for designers (significant).  Any updates, new tuple insertion or deletion might introduce data integrity problem to database designing. The burden of Indexes maintenance might even offset benefit of indexes itself.

As we talked above, indexes have both benefits and downsides for database design. Generally, whether we use indexes depends on:

(1) Size of table. Indexes are not benefit for small table, for both time cost and space cost.

(2) Data distribution. Table with lots of duplicates is better without indexes, due to less efficiency improvement on query but huge consumption on space and time.

(3) Query vs update load. If this database is designed for lots of queries, indexes will definitely have benefits for performance of database. However, if operations on database are mainly update load, indexes maintenance will become huge burden that benefit of indexes will be offset.

2. Referential Integrity

Integrity of references is a mechanism to keep data integrity between tables, which is implemented by primary key and foreign key.

Let us show this by an example. If we have two tables Student(sID, sName, GPA, HS) and Apply(sID, cName, major, dec).

Student
sID sName GPA HS
123 Mary 3.8
Apply
sID cName major dec
123 Stanford CS y

In this example, Student.sID is primary key while Apply.sID is foreign key.

Definition: Referential integrity is  from R.A to S.B. Each value in column A of Table R must appear in column B of table S. ( elements of  foreign key is subset of primary key)

(1) A is called the foreign key. (Apply.sID)

(2) B is usually required to be the primary key for table S or at least unique. (Student.sID)

If any updates, insertion or deletion on primary key, foreign key will do same to table S, which is mechanism of referential integrity.

 

Design a site like this with WordPress.com
Get started