Towards automatic DJ mixing: cue point detection and drum transcription
Umeå University, Faculty of Science and Technology, Department of Computing Science (HPAC). ORCID iD: 0000-0001-5022-1686
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title: Mot automatisk DJ-mixning: cue point-detektering och trumtranskription (Swedish)
Abstract [en]

With this thesis, we aim to automate the creation of DJ mixes. A DJ mix consists of an uninterrupted sequence of music, constructed by playing tracks one after the other, to improve the listening experience for the audience. Thus, to be able to build mixes automatically, we first need to understand the tracks we want to mix. This is done by extracting information from the audio signal. Specifically, we retrieve two pieces of information that are essential for DJs: cue points and drum transcription. In the field of music information retrieval, the two associated tasks are cue point detection and automatic drum transcription.

With cue point detection, we identify the positions in a track that can be used to create pleasant transitions in a mix. DJs have good intuition for finding these positions. However, translating that intuition into a computer program is not straightforward because of the semantic gap between the two. To solve this problem, we propose multiple approaches based on either expert knowledge or machine learning. Further, by interpreting the resulting models, we also reflect on the musical content that is linked to the presence of cue points.
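Both the expert-knowledge and the learned approaches can be pictured as operating on a novelty curve over frame-wise audio features, whose peaks mark candidate cue positions. The sketch below is a minimal illustration of that general idea; the thresholding rule and function names are illustrative assumptions, not the thesis's actual method.

```python
import numpy as np

def novelty_curve(features):
    """Frame-to-frame novelty: distance between consecutive feature vectors
    (features is a frames x dimensions array)."""
    diffs = np.diff(features, axis=0)
    return np.linalg.norm(diffs, axis=1)

def pick_cue_candidates(novelty, threshold=None):
    """Local maxima above a threshold are candidate cue points (frame indices).
    The mean + std threshold is a simplifying assumption for illustration."""
    if threshold is None:
        threshold = novelty.mean() + novelty.std()
    peaks = []
    for i in range(1, len(novelty) - 1):
        if novelty[i] > threshold and novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]:
            peaks.append(i + 1)  # +1: np.diff shifts indices by one frame
    return peaks
```

A feature matrix with an abrupt change at frame 5 yields a single candidate at that frame, which matches the intuition that cue points sit at musical changes.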

With automatic drum transcription, we aim to retrieve the position and instrument of the notes played on the drum kit, to characterize the musical content of the tracks. The most promising transcription method is based on supervised deep learning, that is, models trained on labeled datasets. However, because creating annotations is difficult, the datasets available for training are usually limited in size or diversity. We therefore propose novel methods to create better training data, from either real-world or synthetic music tracks. Further, by thoroughly analyzing the performance of the models resulting from the training data, we deduce the characteristics of a dataset that matter most for training.
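The transcription output itself can be pictured as decoding a model's per-frame, per-instrument activations into timed note events. The sketch below shows that decoding step only; the instrument vocabulary, frame rate, and peak-picking rule are simplified assumptions for illustration.

```python
import numpy as np

INSTRUMENTS = ["kick", "snare", "hihat"]  # simplified vocabulary (assumption)

def decode_activations(activations, fps=100, threshold=0.5):
    """Turn per-frame, per-instrument activations (frames x instruments)
    into a sorted list of (time_in_seconds, instrument) note events by
    picking local maxima above a threshold."""
    events = []
    for k, name in enumerate(INSTRUMENTS):
        act = activations[:, k]
        for i in range(1, len(act) - 1):
            if act[i] > threshold and act[i] >= act[i - 1] and act[i] > act[i + 1]:
                events.append((i / fps, name))
    return sorted(events)
```

A supervised model is trained so that these activations peak exactly where the labeled dataset places a note; the quality of that labeled data is precisely what the methods in this thesis aim to improve.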

The solutions we propose for both tasks, cue point detection and automatic drum transcription, achieve high accuracy. By investigating how they reach this accuracy, we further our understanding of music information retrieval, and by open-sourcing our contributions, we make these findings reproducible. With the software resulting from this research, we created a proof of concept for automatic DJ mixing.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. p. 34
Series
Report / UMINF, ISSN 0348-0542; 24.08
Keywords [en]
Music Information Retrieval, Cue Point Detection, Automatic Drum Transcription
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-228266
ISBN: 9789180704533 (print)
ISBN: 9789180704540 (digital)
OAI: oai:DiVA.org:umu-228266
DiVA, id: diva2:1887409
Public defence
2024-09-02, MIT.C.343, MIT-huset, Umeå, 13:00 (English)
Available from: 2024-08-15 Created: 2024-08-07 Last updated: 2024-08-09 Bibliographically approved
List of papers
1. M-DJCUE: a manually annotated dataset of cue points
2019 (English) Conference paper, Oral presentation only (Other academic)
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228225 (URN)
Conference
20th International Society for Music Information Retrieval Conference: Across the bridge, Delft, The Netherlands, November 4-8, 2019
Note

Session: Late Breaking/Demo

Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2024-08-08 Bibliographically approved
2. Automatic detection of cue points for the emulation of DJ mixing
2022 (English) In: Computer Music Journal, ISSN 0148-9267, E-ISSN 1531-5169, Vol. 46, no. 3, p. 67-82. Article in journal (Refereed) Published
Abstract [en]

The automatic identification of cue points is a central task in applications as diverse as music thumbnailing, generation of mashups, and DJ mixing. Our focus lies in electronic dance music and in a specific kind of cue point, the "switch point," that makes it possible to automatically construct transitions between tracks, mimicking what professional DJs do. We present two approaches for the detection of switch points. One embodies a few general rules we established from interviews with professional DJs; the other models a manually annotated dataset that we curated. Both approaches are based on feature extraction and novelty analysis. From an evaluation conducted on previously unknown tracks, we found that about 90 percent of the points generated can be reliably used in the context of a DJ mix.
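A recurring DJ heuristic in electronic dance music (an assumption for illustration here, not necessarily one of the paper's interviewed rules) is that switch points fall on phrase boundaries, typically every 32 beats. A rule-based stage can therefore snap novelty-derived candidates to that grid:

```python
def snap_to_phrase_grid(candidate_beats, phrase_len=32):
    """Snap each candidate beat index to the nearest phrase boundary
    (multiples of phrase_len beats) and deduplicate the results."""
    return sorted({round(b / phrase_len) * phrase_len for b in candidate_beats})
```

Candidates at beats 30, 33, and 70 collapse to the boundaries at beats 32 and 64, which is the kind of quantization that keeps a transition aligned with the musical phrasing.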

Place, publisher, year, edition, pages
MIT Press, 2022
National subject category
Signal Processing; Computer Sciences
Identifiers
urn:nbn:se:umu:diva-216393 (URN)
10.1162/comj_a_00652 (DOI)
001101195600004 ()
2-s2.0-85177430629 (Scopus ID)
Available from: 2023-11-10 Created: 2023-11-10 Last updated: 2025-04-24 Bibliographically approved
3. Interpretability of methods for switch point detection in electronic dance music
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Sciences; Music
Identifiers
urn:nbn:se:umu:diva-228227 (URN)
Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2025-02-21
4. ADTOF: A large dataset of non-synthetic music for automatic drum transcription
2021 (English) In: Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021, p. 818-824. Conference paper, Published paper (Refereed)
Abstract [en]

The state-of-the-art methods for drum transcription in the presence of melodic instruments (DTM) are machine learning models trained in a supervised manner, which means that they rely on labeled datasets. The problem is that the available public datasets are limited either in size or in realism, and are thus suboptimal for training purposes. Indeed, the best results are currently obtained via a rather convoluted multi-step training process that involves both real and synthetic datasets. To address this issue, starting from the observation that the communities of rhythm game players provide a large amount of annotated data, we curated a new dataset of crowdsourced drum transcriptions. This dataset contains real-world music, is manually annotated, and is about two orders of magnitude larger than any other non-synthetic dataset, making it a prime candidate for training purposes. However, due to crowdsourcing, the initial annotations contain mistakes. We discuss how the quality of the dataset can be improved by automatically correcting different types of mistakes. When used to train a popular DTM model, the dataset yields a performance that matches that of the state-of-the-art for DTM, thus demonstrating the quality of the annotations.
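As one example of the kind of automatic correction described (a sketch of the general idea, not the ADTOF pipeline itself), a constant annotation latency, common when charts are authored against a delayed audio render, can be estimated by comparing each annotated onset with the nearest onset detected in the audio:

```python
import numpy as np

def estimate_global_offset(annotated, detected, max_dev=0.05):
    """Estimate a constant annotation latency (seconds) as the median signed
    deviation between each annotated onset and its nearest detected audio
    onset, ignoring pairs that deviate more than max_dev (likely mismatches).
    Subtracting the returned offset from the annotations re-aligns them."""
    detected = np.asarray(detected)
    devs = []
    for t in annotated:
        nearest = detected[np.argmin(np.abs(detected - t))]
        if abs(nearest - t) <= max_dev:
            devs.append(nearest - t)
    return float(np.median(devs)) if devs else 0.0
```

The median makes the estimate robust to the occasional wrong match, which matters when the crowdsourced annotations themselves contain errors.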

National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-189852 (URN)
10.5281/zenodo.5624527 (DOI)
2-s2.0-85148923089 (Scopus ID)
9781732729902 (ISBN)
Conference
ISMIR 2021, the 22nd International Society for Music Information Retrieval Conference, Online, November 7-12, 2021
Available from: 2021-11-23 Created: 2021-11-23 Last updated: 2024-08-07 Bibliographically approved
5. High-quality and reproducible automatic drum transcription from crowdsourced data
2023 (English) In: Signals, E-ISSN 2624-6120, Vol. 4, no. 4, p. 768-787. Article in journal (Refereed) Published
Abstract [en]

Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).
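The tatum-synchronous idea mentioned above can be sketched as pooling frame-level features between consecutive tatum boundaries, so the network sees one feature vector per tatum instead of a fixed frame rate. The max-pooling choice below is an illustrative assumption; the paper's actual architecture may aggregate differently.

```python
import numpy as np

def tatum_sync_pool(frames, tatum_boundaries):
    """Pool frame-level features (frames x dims) within each tatum interval,
    given frame indices of consecutive tatum boundaries. Returns one pooled
    vector per tatum."""
    pooled = []
    for start, end in zip(tatum_boundaries[:-1], tatum_boundaries[1:]):
        pooled.append(frames[start:end].max(axis=0))  # max-pool within tatum
    return np.stack(pooled)
```

Synchronizing to tatums makes the model's temporal resolution follow the music's own subdivision grid, which is why it pairs naturally with drum transcription.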

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
automatic drum transcription, crowdsourced dataset, self-attention mechanism, tatum
National subject category
Signal Processing; Computer Sciences
Identifiers
urn:nbn:se:umu:diva-216394 (URN)
10.3390/signals4040042 (DOI)
001177003200001 ()
2-s2.0-85180709684 (Scopus ID)
Research funder
Swedish National Infrastructure for Computing (SNIC)
Vetenskapsrådet, 2022-06725
Vetenskapsrådet, 2018-05973
Available from: 2023-11-10 Created: 2023-11-10 Last updated: 2025-04-24 Bibliographically approved
6. In-depth performance analysis of the ADTOF-based algorithm for automatic drum transcription
2024 (English) In: Proceedings of the 25th International Society for Music Information Retrieval Conference, San Francisco: ISMIR, 2024, p. 1060-1067. Conference paper, Published paper (Refereed)
Abstract [en]

The importance of automatic drum transcription lies in the potential to extract useful information from a musical track; however, the low reliability of the models for this task represents a limiting factor. Indeed, even though in the recent literature the quality of the generated transcription has improved thanks to the curation of large training datasets via crowdsourcing, there is still a large margin of improvement before this task can be considered solved. Aiming to steer the development of future models, we identify the most common errors from training and testing on the aforementioned crowdsourced datasets. We perform this study in three steps: First, we detail the quality of the transcription for each class of interest; second, we employ a new metric and a pseudo confusion matrix to quantify different mistakes in the estimations; last, we compute the agreement between different annotators of the same track to estimate the accuracy of the ground truth. Our findings are twofold: On the one hand, we observe that the previously reported issue that less represented instruments (e.g., toms) are less reliably transcribed is now mostly solved. On the other hand, cymbal instruments show a relatively low performance not observed before. We provide intuitive explanations as to why cymbal instruments are difficult to transcribe, and we identify that they represent the main source of disagreement among annotators.
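Per-class transcription quality in studies like this is typically reported as an onset F-measure with a small tolerance window. The sketch below uses a simplified greedy matching for illustration; standard evaluation tooling such as mir_eval uses an optimal bipartite matching instead.

```python
def f_measure(estimated, reference, tol=0.05):
    """Match each estimated onset (seconds) to an unmatched reference onset
    within +/- tol seconds, then compute the F-measure from precision and
    recall. Greedy matching: a simplification, not the standard algorithm."""
    ref = sorted(reference)
    matched = [False] * len(ref)
    tp = 0
    for t in sorted(estimated):
        for j, r in enumerate(ref):
            if not matched[j] and abs(r - t) <= tol:
                matched[j] = True
                tp += 1
                break
    p = tp / len(estimated) if estimated else 0.0
    r = tp / len(ref) if ref else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

Computing this per instrument class is what exposes patterns such as the cymbal classes scoring lower than toms.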

Place, publisher, year, edition, pages
San Francisco: ISMIR, 2024
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228264 (URN)
2-s2.0-85219129262 (Scopus ID)
Conference
25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10-14, 2024
Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2025-04-02 Bibliographically approved
7. Analyzing and reducing the synthetic-to-real transfer gap in music information retrieval: the task of automatic drum transcription
(English) Manuscript (preprint) (Other academic)
National subject category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228228 (URN)
Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2024-08-08

Open Access in DiVA

fulltext (3385 kB) 266 downloads
File information
File name: FULLTEXT01.pdf; File size: 3385 kB; Checksum: SHA-512
4ab261ec3b7b48fb3febb5613fbbcd0d926f45a68310211423ccacde5cdfdda53d11bb63316de36a8755ce9a110456e988f940e3fe481e952d996045e6702240
Type: fulltext; Mimetype: application/pdf
spikblad (197 kB) 59 downloads
File information
File name: SPIKBLAD02.pdf; File size: 197 kB; Checksum: SHA-512
fd621bf7610f06fa99a21da299f3a8d0294ae78ade622406d122982f270df22fa11ec08ba4e3ba2845d4e54a876225c154ff43d1eeea1718e63d139fc68ff60d
Type: spikblad; Mimetype: application/pdf

Person

Zehren, Mickaël

Total: 266 downloads
The number of downloads is the sum of downloads for all full texts. It may include earlier versions that are no longer available.
