SENG 480D/CSC 485D/CSC 571: Advanced Databases - Fall 2009

Instructor: Alex Thomo
Phone: (250) 472-5786
Office: ECS 556
Office Hours: Monday/Thursday 12:30-1:30 PM
Email: thomo@cs.uvic.ca
Course Outline: Link

Text: Database Systems: The Complete Book
by Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom, 2nd Edition, Prentice Hall, 2008

Reference: Readings in Database Systems, 4th Edition
by Michael Stonebraker and Joseph M. Hellerstein (Editors), The MIT Press 2005.

Lecture Notes

  • Storage Management
    • Intro to Database Management Systems Slides.
    • Memory Hiearchy, Disks mechanics, Computation Model, Using Secondary Storage Effectively. Two Phase Multiway Merge Sort (2PMMS). Impact of block size. Sorting very large relations. Cylindrification. Multiple Disks. Prefetching and large scale buffering. Adapting tape algorithms. Slides
    • Reliability of Disk Systems. Disk failures. RAID 4, RAID 5, RAID 6. Nested levels. Slides.
  • Index Structures
    • Primary Indexes. Dense and Sparse Indexes. Secondary Indexes. B-Trees. Insertion into B-Trees. Number of Levels in B-trees. Deletion from B-Trees. Batch Insertion into B-Trees. Inverted Indexes. Slides
    • Secondary-Storage Hash Tables. Dynamic Hashing Framework: Extensible Hashing, Linear Hashing. Slides
    • Multidimensional Data. Attempts at using B-trees for MD-queries. Grid files. Partitioned hashing. Multiple-key indexes. KD-Trees. Quad trees. Slides
    • R-Trees. Bitmap Indexes. Slides
    • Signature Files. Slides
  • Query Processing
    • Model of Computation for Physical Operators, Iterators, One pass algorithms. Slides
    • Block-based nested loops, Two-pass algorithms based on sorting, Two-pass algorithms based on hashing Slides
    • Index joins. Zigzag Join. Slides
    • Query and parse tree, Algebraic laws, Pushing selections. Laws for (bag) Projection, duplicate elimination, aggregations. Slides
    • Estimating the Cost of Operations: Estimating the size of joins, Size estimates for other operations, Incremental computation of statistics. Slides
    • Cost based transformations. Join Trees. Dynamic Programming for Join Order. Greedy Algorithm for Join Order. Slides
  • Parallel and Distributed Databases
    • Parallel Algorithms for Relational Operations: Model of Parallelism, Selections and Joins in Parallel. Slides
    • Google's Map-Reduce Parallelism Framework. Slides.
    • Peer-to-Peer Distributed Search. The Distributed-Hashing Problem. Slides.
  • Data Mining
    • Frequent-Itemset Mining. The Market-Basket Model. Association Rules. Slides.
    • Techniques for Frequent Itemsets. The A-Priori Algorithm. Slides (Adapted from J.D. Ullman's slides).
    • Park-Chen-Yu Algorithm, Multistage Algorithm, Approximate Algorithms. Slides (Adapted from J.D. Ullman's slides).
    • FP-Tree/FP-Growth Algorithm. Slides.
    • Finding Similar Items. The Jaccard Measure of Similarity. Divide-Compute-Merge. Minhashing. Locality-Sensitive Hashing. Slides (ppt).
    • Clustering of Large-Scale Data. Agglomerative Clustering. k-Means for Large-Scale Data. Slides (ppt) (Adapted from J.D. Ullman's slides).
  • Database Systems and the Internet
    • Ranking documents using TF/IDF and Cosine Similarity. Slides (ppt).
    • PageRank. Recursive Formulation of PageRank. Spider Traps and Dead Ends. Slides (ppt) (Adapted from J.D. Ullman's slides).

Reading list for final exam: Link