C SC 485D/586D/SEng 480D: Topics in Systems/Topics in Computer Systems and Software/Topics in Software Engineering: "Database System Implementation"

Instructor: Alex Thomo
Phone: (250) 721-8659
Office: EOW 315
Office Hours: Tuesday/Thursday 6:00-7:00 PM
Email: thomo@cs.uvic.ca
TA: Marina Barsky
Email: mgbarsky@uvic.ca
Course Outline: Link

Text:

Database Systems: The Complete Book
by Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom, 1st Edition, Prentice Hall, 2002

References:

  1. Readings in Database Systems, 4th Edition
    by Michael Stonebraker and Joseph M. Hellerstein (Editors), The MIT Press 2005.

Reading list: Link

Tentative Schedule and Lecture Handouts:

  • Intro to Database Management Systems Slides.
  • Memory Hiearchy, Disks mechanics, Computation Model Slides
  • Using Secondary Storage Effectively. Two­Phase, Multiway Merge Sort. Analysis of 2PMMS. Impact of block size. Sorting very large relations. Cylindrification. Multiple Disks. Mirror Disks. Prefetching and large scale buffering. Slides
  • Reliability of Disk Systems. Disk failures. RAID 4, RAID 5,RAID 6. Slides
  • Representing Data Elements. Fixed­Length Records. Inside the block. Representing addresses. Pointer swizzling. Pinned records. Variable-Length Data. Varying schema. Clustering relations together. Insertions. Deletions. Slides
  • Primary Indexes. Dense and Sparse Indexes. Secondary Indexes. Inverted Indexes. B­Trees. Insertion into B-Trees. Number of Levels in B-trees. Deletion from B-Trees. Slides
  • Secondary-Storage Hash Tables. Dynamic Hashing Framework: Extensible Hashing, Linear Hashing. Slides
  • Multidimensional Data. Attempts at using B-trees for MD-queries. Grid files. Partitioned hashing. Slides
  • Multiple-key indexes. KD-Trees. Quad trees. R-Trees. Bitmap Indexes. Slides
  • Query Execution I: Introduction to Physical-Query-Plan Operators, The Model of Computation for Physical Operators, Iterators, One pass algorithms. Slides
  • Query Execution II: Block-based nested loops, Two-pass algorithms based on sorting, Two-pass algorithms based on hashing Slides
  • Index joins. Zigzag Join. Slides
  • Midterm review and exercises. Slides
  • Parallel Algorithms for Relational Operations: Models of Parallelism, Tuple-at-a- Time Operations in Parallel, Joins Slides
  • Query Compiler: Query and parse tree, Algebraic laws, Pushing selections. Slides
  • Laws for (bag) Projection, duplicate elimination, aggregations. Estimating the Cost of Operations. Slides
  • Estimating the Cost of Operations: Estimating the size of joins, Containment of value sets, Preservation of set values, Size estimates for other operations, Incremental computation of statistics. Slides
  • Cost based transformations. Heuristics for selecting the physical plan. Dynamic Programming to Select a Join Order and Grouping. Slides
  • Greedy Algorithms for Selecting a Join Order. Pipelining Versus Materialization. Slides
  • Recovery: Undo Logging, Checkpointing, Nonquiescent Checkpointing, Undo Drawback, Redo. Slides
  • Concurrent Transactions. Serial and Serializable Schedules. Conflicts. Serializability/precedence graphs. Why two phase locking works. Shared/Exclusive Locks. Upgrading Locks. Deadlocks. Solution: Update Locks. Compatibility Matrix. Slides
  • Timestamps. Serializability Via Timestamps. Physically unrealizable events. Abort/Update Decisions. Timestamps Versus Locks. Slides

Assignments:
There would be three assignments.