CSC484A/CSC589A/MUS490/MUS590 

(Music Information Retrieval)

Taught by: George Tzanetakis
(http://www.cs.uvic.ca/~gtzan)
gtzan@cs.uvic.ca





Based on our discussions, here are the tentative projects. I am happy to discuss alternatives, details, etc.

Groups/Projects

  1. Real-time pitch detection for monophonic sounds, possibly implemented as a MIDI device or VST plugin - Tony and Lacey Antoniou
  2. World music classification for particular regions/styles, possibly incorporating singer identification information - Dale Lyons and Onat Yazir
  3. Visualization, browsing and retrieval of drum loops, and possibly of music based on rhythmic patterns, using the e-drum as a control interface - David Sprague and Adam Tindale
  4. Error detection in music performance of computer-generated music notation at different levels - Graham Percival and Lucas Longley
  5. Event detection and visualization of trumpet sounds using a psychoacoustically motivated approach - Aaron Hilton and Nathan McDonald
  6. Rotational sensors for music composition and analysis - Ryan Willoghby
  7. Identification of recording style/producer from audio signals - Randy Jones
  8. Music caricatures of audio using drums and chords - Keith Chan and Terence Nathan


Other Ideas


The following are some representative ideas for projects for this class. Feel free to ask me questions, propose your own projects, modify the descriptions, etc. The ordering of the projects has no significance. A variety of existing source code, programs, toolboxes and datasets will help you get started with each project; some of them can be found in the software section of this website. Each project has several unique characteristics, and the projects differ in the skills required, the type of programming, the availability of existing work and many other factors. Feel free to contact me via email or in person to clarify any questions you might have regarding these projects. Groups should preferably consist of 2 or 3 partners. Exceptions to this rule are possible but not recommended. The expectations for each project will be adjusted according to the number of students in the group. Most projects will consist of the following stages:
1) literature review
2) data collection
3) ground-truth annotation
4) implementation
5) debugging
6) evaluation
7) written report

  1. Genre classification on MIDI data
     For some time, genre classification was explored mostly in the audio domain. More recently, algorithms for genre classification based on statistics/features computed over symbolic data such as MIDI files have appeared in the literature. The goal of this project is to recreate some of the existing algorithms and investigate alternatives.
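
     As a rough illustration of the statistics-over-symbolic-data idea, the sketch below computes a 12-bin pitch-class histogram from MIDI note-on events and feeds it to a k-nearest-neighbour classifier. It assumes the mido and scikit-learn packages; train_paths and train_labels stand in for a hypothetical annotated MIDI collection.

        import mido
        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def pitch_histogram(path):
            # Fold all note-on events into a normalized 12-bin pitch-class histogram.
            hist = np.zeros(12)
            for msg in mido.MidiFile(path):
                if msg.type == 'note_on' and msg.velocity > 0:
                    hist[msg.note % 12] += 1
            return hist / hist.sum() if hist.sum() > 0 else hist

        # train_paths / train_labels: a hypothetical annotated MIDI collection
        X = np.array([pitch_histogram(p) for p in train_paths])
        clf = KNeighborsClassifier(n_neighbors=3).fit(X, train_labels)
        print(clf.predict([pitch_histogram('query.mid')]))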

  2. A framework for evaluating similarity retrieval for music
     A variety of similarity-based retrieval algorithms have been proposed for music in audio format. The only way to reliably evaluate content-based similarity retrieval is to conduct user studies. The goal of this project is to build a framework (possibly web-based) that allows different audio similarity algorithms to be plugged in and evaluated by users. The main challenge is designing the framework to be flexible in how the algorithms are evaluated: the similarity measure, the presentation mode, etc.

  3. Sensor-based MIR
     One of the less explored areas in MIR is the interface between MIR systems and the user. As more and more music becomes available on portable digital music players of various forms and sizes, we should envision how MIR can be used on these devices. This project will explore how sensor technology such as piezos, knobs and sliders can be used for browsing music collections, specifying music queries (for example, tapping a query or playing a melody), and annotating data such as onset times and beat locations.
     
  4. Automatic beat detection
     Many algorithms for automatic beat detection have been proposed in the MIR literature, and this research actually predates MIR. However, there is little detailed experimental comparison between the different approaches. A pleasant recent exception was the tempo induction contest at ISMIR 2004 in Barcelona. In this project, students will implement 2-3 of the best-known algorithms published in the literature, collect ground truth and perform an experimental comparison.
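
     One common approach, sketched below, derives an onset-strength envelope from spectral flux and reads the tempo off its autocorrelation; x is a mono signal at sample rate sr, and all parameters are illustrative rather than taken from any particular published system.

        import numpy as np

        def estimate_tempo(x, sr, frame=1024, hop=512):
            # Onset-strength envelope: positive spectral flux between frames.
            win = np.hanning(frame)
            mags = np.array([np.abs(np.fft.rfft(win * x[i:i + frame]))
                             for i in range(0, len(x) - frame, hop)])
            flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
            flux -= flux.mean()
            # Periodicity: strongest autocorrelation lag in the 40-200 BPM range.
            ac = np.correlate(flux, flux, mode='full')[len(flux) - 1:]
            lags = np.arange(len(ac)) * hop / sr        # lag in seconds
            valid = (lags >= 60.0 / 200) & (lags <= 60.0 / 40)
            return 60.0 / lags[np.argmax(np.where(valid, ac, -np.inf))]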

  5. Key finding in polyphonic audio
     There is existing work on key finding in symbolic scores. In addition, pitch-based representations such as chroma vectors or pitch histograms have been shown to be effective for alignment, structural analysis and classification. This project will explore the use of pitch-based representations to identify the key of polyphonic audio recordings.
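
     A classic starting point is the Krumhansl-Schmuckler approach: correlate a summary chroma vector against rotated major and minor key profiles and pick the best-scoring rotation. A minimal sketch, assuming a 12-dimensional chroma vector has already been extracted from the audio:

        import numpy as np

        # Krumhansl-Kessler probe-tone profiles for major and minor keys
        MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
        MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                          2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
        NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

        def find_key(chroma):
            # Score all 24 keys by correlating the chroma vector, rotated so
            # the candidate tonic comes first, against each profile.
            scores = {}
            for tonic in range(12):
                rotated = np.roll(chroma, -tonic)
                scores[NOTES[tonic] + ' major'] = np.corrcoef(rotated, MAJOR)[0, 1]
                scores[NOTES[tonic] + ' minor'] = np.corrcoef(rotated, MINOR)[0, 1]
            return max(scores, key=scores.get)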

  6. Query-by-humming front-end
     The first stage in a QBH system converts a recording of a human singing, humming or whistling into either a pitch contour or a note sequence that can then be used to search a database of musical pieces for a match. A large variety of pitch detection algorithms have been proposed in the literature. This project will explore different pitch detection algorithms as well as note segmentation strategies.
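
     For orientation, the sketch below shows the simplest member of the autocorrelation family of pitch detectors, applied to a single frame of a monophonic recording; real front-ends add voicing decisions, contour smoothing over time and note segmentation on top of this.

        import numpy as np

        def detect_pitch(frame, sr, fmin=80.0, fmax=800.0):
            # Fundamental frequency of one (monophonic) frame: find the lag
            # with the strongest autocorrelation within the plausible range.
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
            lo, hi = int(sr / fmax), int(sr / fmin)
            lag = lo + np.argmax(ac[lo:hi])
            return sr / lag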

  7. Query-by-humming back-end
     Once a pitch contour or a series of notes has been extracted, it can be converted into a representation suitable for searching a database of melodies for approximate matches. In this project, some of the major approaches proposed for representing melodies and searching melodic databases will be implemented.
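
     One widely used combination, sketched below, represents a melody by its interval sequence (which makes matching transposition-invariant) and ranks database melodies by edit distance to the query; database here is a hypothetical list of (name, note sequence) pairs.

        def intervals(notes):
            # Transposition-invariant representation: successive pitch differences.
            return [b - a for a, b in zip(notes, notes[1:])]

        def edit_distance(a, b):
            # Levenshtein distance between two interval sequences.
            m, n = len(a), len(b)
            d = [[0] * (n + 1) for _ in range(m + 1)]
            for i in range(m + 1):
                d[i][0] = i
            for j in range(n + 1):
                d[0][j] = j
            for i in range(1, m + 1):
                for j in range(1, n + 1):
                    d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                                  d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
            return d[m][n]

        # database: a hypothetical list of (name, MIDI note sequence) pairs
        query = intervals([60, 62, 64, 60])
        ranked = sorted(database, key=lambda m: edit_distance(query, intervals(m[1])))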

  8. ThemeFinder
     In order to search for melodic fragments in polyphonic music it is necessary to extract the most important "themes" of a polyphonic recording. This can be done by incorporating knowledge from voice leading, MIDI instrument labels, amount of repetition, melodic shape and many other factors. The goal of this project is to implement a theme finder using techniques described in the literature as well as exploring alternatives.

  9. Structural analysis based on the similarity matrix
     The similarity matrix is a visual representation that reveals the internal structure of a piece of music (chorus/verse, measures, beats). By analyzing this representation it is possible to recover the structural form of a piece, such as AABA.
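
     A minimal sketch of the construction, assuming per-frame feature vectors (e.g., chroma or MFCCs) have already been computed; repeated sections appear as bright off-diagonal stripes when the matrix is displayed as an image.

        import numpy as np

        def similarity_matrix(F):
            # F: (n_frames, n_features). Returns S with S[i, j] equal to the
            # cosine similarity between the feature vectors of frames i and j.
            G = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
            return G @ G.T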

 10. Drum pattern similarity retrieval
     Drums are part of a large number of musical pieces, and many software packages provide large collections of drum loops/patterns for creating music. Typically these collections can only be browsed or searched by filename. The aim of this project is to explore how the actual sonic/structural similarity between drum patterns can be exploited to find drum loops that are "similar". Accurate drum pattern classification/similarity could lead to significant advances in audio MIR, as most recorded music today is characterized by drums and their patterns.

 11. Drum detection in polyphonic audio
     Recently researchers have started looking at the problem of identifying individual drum sounds, such as the hi-hat or bass drum, in polyphonic music recordings. In this project, students will implement some of these new algorithms and explore variations and alternative approaches. A significant part of the project will consist of building tools for obtaining ground-truth annotations as well as evaluating the developed algorithms.

 12. Content-based audio analysis using plugins
     Many existing software music players, such as WinAmp or iTunes, provide an API for writing plugins. Although typically geared toward spectrum visualization, these plugins could potentially be used as a front-end for feature extraction, classification and similarity retrieval. This project will explore this possibility.

 13. Chord detection in polyphonic audio
     Even though polyphonic transcription of general audio is still far from solved, a variety of pitch-based representations such as chroma vectors and pitch histograms have been proposed for audio. There is some limited research on using such representations, potentially with additional knowledge (such as likely chord progressions), to perform chord detection in polyphonic audio signals. The goal of this project is to explore possibilities in that space. Jazz standards or Beatles tunes might be a good starting point for data.
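
     A common baseline, sketched below, matches each chroma frame against binary templates for the 24 major and minor triads; systems in the literature typically add temporal smoothing or chord-progression priors on top.

        import numpy as np

        NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

        def chord_templates():
            # Binary chroma templates for all major and minor triads.
            templates = {}
            for root in range(12):
                for quality, third in (('maj', 4), ('min', 3)):
                    t = np.zeros(12)
                    t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
                    templates[NOTES[root] + quality] = t / np.linalg.norm(t)
            return templates

        def detect_chord(chroma, templates=chord_templates()):
            # Label one chroma frame with the best-matching triad.
            chroma = chroma / max(np.linalg.norm(chroma), 1e-12)
            return max(templates, key=lambda name: templates[name] @ chroma)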

 14. Polyphonic alignment of audio and MIDI
     A symbolic score, even in a "low-level" format such as MIDI, contains a wealth of useful information that is not directly available in the acoustic waveform (beats, measures, chords, etc.). On the other hand, most of the time we are interested in hearing actual music rather than bad-sounding MIDI renditions. In polyphonic audio alignment the idea is to compute features on both the audio and the MIDI data and try to align the two feature sequences. This project will implement some of the existing approaches to this problem and explore alternatives and variations.
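
     The standard tool for this alignment is dynamic time warping over the two feature sequences (for instance, chroma computed from the audio and chroma synthesized from the MIDI). A bare-bones sketch:

        import numpy as np

        def dtw(A, B):
            # Accumulated-cost matrix between feature sequences A and B
            # (one row per frame), using Euclidean frame distance.
            n, m = len(A), len(B)
            cost = np.full((n + 1, m + 1), np.inf)
            cost[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    d = np.linalg.norm(A[i - 1] - B[j - 1])
                    cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                         cost[i - 1, j - 1])
            return cost[1:, 1:]

        def alignment_path(cost):
            # Backtrack from the end to recover (audio frame, MIDI frame) pairs.
            i, j = cost.shape[0] - 1, cost.shape[1] - 1
            path = [(i, j)]
            while i > 0 or j > 0:
                candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
                i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                           key=lambda c: cost[c])
                path.append((i, j))
            return path[::-1]
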
 15. Music caricatures
     Even though we are still a long way from full polyphonic transcription, music information retrieval systems are extracting increasingly high-level information from audio signals. The idea behind this project is to use this information to create musical "caricatures" of the original audio using MIDI. The only constraint is that the resulting "caricature" should match the original music in some, possibly humorous, way.

 16. Comparison of algorithms for audio segmentation
     Audio segmentation refers to the process of detecting changes of audio "texture", such as the change from singing to instrumental background or from an orchestra to a guitar solo. A variety of algorithms have been proposed for audio segmentation. The goal of this project is to implement the main approaches and explore alternatives and variants.
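
     One simple and widely used scheme, sketched below, slides two adjacent windows over a per-frame feature sequence and flags peaks in the distance between their mean feature vectors as candidate texture changes; the window size and threshold are illustrative.

        import numpy as np

        def segment_boundaries(F, w=20):
            # F: (n_frames, n_features), e.g. one MFCC vector per frame.
            # Novelty = distance between the mean features of the w frames
            # before and after each position; peaks suggest texture changes.
            novelty = np.zeros(len(F))
            for t in range(w, len(F) - w):
                novelty[t] = np.linalg.norm(F[t:t + w].mean(axis=0) -
                                            F[t - w:t].mean(axis=0))
            thresh = novelty.mean() + 2 * novelty.std()
            return [t for t in range(1, len(F) - 1)
                    if novelty[t] > thresh
                    and novelty[t] >= novelty[t - 1]
                    and novelty[t] >= novelty[t + 1]]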

 17. Music information retrieval using MPEG-7 low-level descriptors
     The MPEG-7 standard was recently proposed to standardize some of the ways multimedia content is described. Part of it specifies audio descriptors that can be used to characterize audio signals. There has been little evaluation of these descriptors compared to other feature front-ends proposed in the literature. The goal of this project is to implement the MPEG-7 audio descriptors and compare them with other features on a variety of tasks such as similarity retrieval, classification and segmentation.
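
     As a flavour of what these descriptors look like, the sketch below computes a per-frame spectral centroid, loosely in the spirit of MPEG-7's AudioSpectrumCentroid; it is not a conformant implementation, since the standard prescribes specific window parameters and a log-frequency scale.

        import numpy as np

        def spectral_centroid(x, sr, frame=1024, hop=512):
            # Per-frame spectral centroid in Hz, a rough "brightness" measure.
            freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
            win = np.hanning(frame)
            out = []
            for i in range(0, len(x) - frame, hop):
                mag = np.abs(np.fft.rfft(win * x[i:i + frame]))
                out.append((freqs * mag).sum() / max(mag.sum(), 1e-12))
            return np.array(out)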


 18. Instrumentation-based genre/style classification
     The types of instruments used in a song can be a fairly reliable indicator of musical genre. For example, the prominent presence of a saxophone suggests a jazz tune. Even though such rules always have exceptions, they will probably work in many cases. The goal of this project is to explore the use of decision trees for automatically finding and applying such instrumentation-based rules. A significant part of the project will consist of collecting instrumentation annotation data.
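
     To make the decision-tree idea concrete, the sketch below trains a tree on toy binary instrument-presence annotations (entirely made-up data) and prints the learned rules; scikit-learn is assumed.

        import numpy as np
        from sklearn import tree

        instruments = ['saxophone', 'distorted guitar', 'violin', 'drum machine']
        # Toy, made-up annotations: one row of presence flags per song.
        X = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [0, 1, 0, 1]])
        y = ['jazz', 'rock', 'classical', 'electronic']

        clf = tree.DecisionTreeClassifier().fit(X, y)
        print(tree.export_text(clf, feature_names=instruments))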

 19. Template-based detection of instrumentation
     The goal of this project is to detect which (and possibly when) instruments are present in an audio recording. The goal is not source separation or transcription but simply a presence/absence indicator for particular instruments. For example, the output of such a system might be: "from minute 1 to minute 2 a saxophone, piano and drums are playing, after which a singer joins the ensemble". To identify specific instruments, templates will be learned from a large database of examples and then adapted to the particular recording.

 20. Singing-voice detection
     Detecting the segments of a piece of music where there is singing is the first step toward singer identification. This is a classic classification problem made difficult by the large variety of singers and instrumental backgrounds. The goal of this project is to explore the various algorithms and feature front-ends proposed for this task. In particular, the use of phase-vocoder techniques for enhancing the prominent singing voice is a promising area of exploration.
     
 21. Singer identification
     The singer's identity is a major part of how popular music is characterized and identified. Many listeners hearing an unfamiliar piece cannot identify the group until the singer starts singing. The goal of this project is to explore existing approaches to singer identification and to investigate variations and alternatives.

 22. Male/female singer detection
     Automatic male/female voice classification has been explored in the context of the spoken voice. The goal of this project is to first explore male/female singer detection in monophonic recordings of singing and then extend this work to polyphonic recordings.

 23. Direct-manipulation music browsing
     Although MIR has, for historical reasons, focused mostly on retrieval, a large part of music listening involves browsing and exploration. The goal of this project is to explore creative ways of browsing large music collections that are direct and provide constant audio feedback on the user's actions.

 24. Hyperbolic trees for music collection visualization
     Hyperbolic trees are an impressive visualization technique for representing trees/graphs of documents or images. The goal of this project is to explore their potential for visualizing large music collections. Of specific interest is the possibility of adapting the technique to incorporate content-based music similarity.

 25. Playlist summarization using similarity graphs
     Similarity graphs are constructed using content-based distances as edges, with nodes corresponding to musical pieces. The goal of this project is to explore how this model could be used to generate summaries of music playlists, i.e., a short audio representation (say, 3 seconds for each song) that summarizes the playlist.
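
     A minimal sketch of the graph construction, assuming one content-based feature vector per song; each node keeps edges to its k nearest neighbours, and a summarizer could then, for example, walk the graph and splice a short excerpt from each visited song.

        import numpy as np

        def similarity_graph(features, k=3):
            # features: (n_songs, n_features) content-based descriptors.
            # Returns an adjacency list linking each song to its k nearest
            # neighbours under Euclidean distance.
            F = np.asarray(features, dtype=float)
            D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
            np.fill_diagonal(D, np.inf)
            return {i: list(np.argsort(D[i])[:k]) for i in range(len(F))}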