Instructor: |
Alex Thomo |
Phone: |
(250) 472-5786 |
Office: |
ECS 556 |
Office Hours: |
Monday/Thursday 12:30-1:30 PM |
Email: |
thomo@cs.uvic.ca |
Course Outline: |
Link |
Text:
Database Systems: The Complete Book by Hector Garcia-Molina,
Jeffrey D. Ullman, Jennifer D. Widom, 2nd Edition, Prentice Hall, 2008
Reference:
Readings in Database Systems, 4th Edition by Michael
Stonebraker and Joseph M. Hellerstein (Editors), The MIT Press 2005.
Lecture Notes
- Storage Management
- Intro to Database Management Systems Slides.
- Memory Hiearchy,
Disks mechanics, Computation Model, Using Secondary Storage Effectively.
Two Phase Multiway Merge Sort (2PMMS).
Impact of block size. Sorting very large relations.
Cylindrification. Multiple Disks. Prefetching and large scale buffering.
Adapting tape algorithms.
Slides
- Reliability of Disk Systems. Disk failures. RAID 4, RAID 5, RAID 6. Nested levels.
Slides.
- Index Structures
- Primary Indexes. Dense and Sparse Indexes. Secondary Indexes.
B-Trees. Insertion into B-Trees. Number of Levels in
B-trees. Deletion from B-Trees. Batch Insertion into B-Trees. Inverted Indexes.
Slides
- Secondary-Storage Hash Tables. Dynamic Hashing Framework: Extensible
Hashing, Linear Hashing.
Slides
- Multidimensional Data. Attempts at using B-trees for MD-queries.
Grid files. Partitioned hashing. Multiple-key indexes. KD-Trees. Quad trees.
Slides
- R-Trees. Bitmap Indexes.
Slides
- Signature Files.
Slides
- Query Processing
-
Model of Computation for Physical Operators, Iterators, One pass
algorithms.
Slides
- Block-based nested loops, Two-pass algorithms
based on sorting, Two-pass algorithms based on hashing
Slides
- Index joins. Zigzag Join.
Slides
- Query and parse tree, Algebraic laws, Pushing
selections. Laws for (bag) Projection, duplicate elimination, aggregations.
Slides
- Estimating the Cost of Operations: Estimating the size of joins,
Size estimates for other operations, Incremental computation of statistics.
Slides
- Cost based transformations. Join Trees.
Dynamic Programming for Join Order. Greedy Algorithm for Join Order.
Slides
- Parallel and Distributed Databases
- Parallel Algorithms for Relational Operations:
Model of Parallelism,
Selections and Joins in Parallel.
Slides
-
Google's Map-Reduce Parallelism Framework.
Slides.
-
Peer-to-Peer Distributed Search. The Distributed-Hashing Problem.
Slides.
- Data Mining
-
Frequent-Itemset Mining. The Market-Basket Model. Association Rules.
Slides.
-
Techniques for Frequent Itemsets. The A-Priori Algorithm.
Slides (Adapted from J.D. Ullman's slides).
-
Park-Chen-Yu Algorithm, Multistage Algorithm, Approximate Algorithms.
Slides (Adapted from J.D. Ullman's slides).
-
FP-Tree/FP-Growth Algorithm.
Slides.
-
Finding Similar Items. The Jaccard Measure of Similarity.
Divide-Compute-Merge.
Minhashing.
Locality-Sensitive Hashing.
Slides (ppt).
-
Clustering of Large-Scale Data.
Agglomerative Clustering.
k-Means for Large-Scale Data.
Slides (ppt) (Adapted from J.D. Ullman's slides).
- Database Systems and the Internet
- Ranking documents using TF/IDF and Cosine Similarity.
Slides (ppt).
-
PageRank. Recursive Formulation of PageRank. Spider Traps and Dead Ends.
Slides (ppt) (Adapted from J.D. Ullman's slides).
Reading list for final exam: Link
|