High-quality and reproducible automatic drum transcription from crowdsourced data
Umeå University, Faculty of Science and Technology, Department of Computing Science (HPAC). ORCID iD: 0000-0001-5022-1686
Department of Music, Universidad EAFIT, Medellín 050022, Colombia.
Umeå University, Faculty of Science and Technology, Department of Computing Science (HPAC). ORCID iD: 0000-0002-4972-7097
2023 (English) In: Signals, E-ISSN 2624-6120, Vol. 4, no. 4, pp. 768-787. Article in journal (Refereed) Published
Abstract [en]

Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).
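The self-attention mechanism the abstract highlights can be illustrated generically: every spectrogram frame attends to every other frame, so a candidate drum onset can be related to similar events elsewhere in the track. Below is a minimal single-head sketch; the projection matrices and toy frame embeddings are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a frame sequence.

    X: (T, d) array of per-frame features (e.g., spectrogram embeddings).
    Wq, Wk, Wv: (d, d) projection matrices for queries, keys, and values.
    Returns a (T, d) array where each frame is a weighted mix of all frames.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # (T, T) frame-to-frame affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over the time axis
    return weights @ V
```

In a full ADT model a layer like this would sit on top of convolutional features and feed per-instrument onset classifiers; here it only demonstrates the mechanism itself.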

Place, publisher, year, edition, pages
MDPI, 2023. Vol. 4, no. 4, pp. 768-787
Keywords [en]
automatic drum transcription, crowdsourced dataset, self-attention mechanism, tatum
National subject category
Signal Processing; Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-216394
DOI: 10.3390/signals4040042
ISI: 001177003200001
Scopus ID: 2-s2.0-85180709684
OAI: oai:DiVA.org:umu-216394
DiVA id: diva2:1811103
Research funder
Swedish National Infrastructure for Computing (SNIC); Vetenskapsrådet, 2022-06725; Vetenskapsrådet, 2018-05973
Available from: 2023-11-10 Created: 2023-11-10 Last updated: 2025-04-24 Bibliographically approved
Part of thesis
1. Towards automatic DJ mixing: cue point detection and drum transcription
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Mot automatisk DJ-mixning: cue point-detektering och trumtranskription
Abstract [en]

With this thesis, we aim to automate the creation of DJ mixes. A DJ mix consists of an uninterrupted sequence of music, constructed by playing tracks one after the other, to improve the listening experience for the audience. Thus, to be able to build mixes automatically, we first need to understand the tracks we want to mix. This is done by extracting information from the audio signal. Specifically, we retrieve two pieces of information that are essential for DJs: cue points and drum transcription. In the field of music information retrieval, the two associated tasks are cue point detection and automatic drum transcription.

With cue point detection, we identify the positions in the tracks that can be used to create pleasant transitions in the mix. DJs have a good intuition for detecting these positions. However, it is not straightforward to turn their intuition into a computer program because of the semantic gap between the two. To solve this problem, we propose multiple approaches based on either expert knowledge or machine learning. Further, by interpreting the resulting models, we also reflect on the musical content that is linked to the presence of cue points.

With automatic drum transcription, we aim to retrieve the position and the instrument of the notes played on the drum kit, to characterize the musical content of the tracks. The most promising transcription method is supervised deep learning, that is, models trained on labeled datasets. However, because creating the annotations is difficult, the datasets available for training are usually limited in size or diversity. We therefore propose novel methods to create better training data, using either real-world or synthetic music tracks. Further, by thoroughly investigating the performance of the models trained on these data, we identify the dataset characteristics that matter most for training accurate models.
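Model performance in ADT is commonly quantified with an onset-level F-measure: predicted onset times are matched one-to-one to reference annotations within a small tolerance window, 50 ms being a typical choice in benchmarks. The sketch below is a generic illustration of that metric, not necessarily the exact evaluation protocol used in the thesis.

```python
def onset_f_measure(pred, ref, tol=0.05):
    """F-measure from greedy one-to-one matching of predicted to
    reference onset times (in seconds) within a tolerance window."""
    pred, ref = sorted(pred), sorted(ref)
    matched, used = 0, [False] * len(ref)
    for p in pred:
        for i, r in enumerate(ref):
            if not used[i] and abs(p - r) <= tol:
                used[i] = True          # each reference onset matches at most once
                matched += 1
                break
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    return 2 * precision * recall / (precision + recall) if matched else 0.0
```

For example, `onset_f_measure([0.0, 0.52], [0.0, 0.5, 1.0])` matches both predictions (0.52 falls within 50 ms of 0.5), giving precision 1.0, recall 2/3, and thus F = 0.8.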

Our solutions to both tasks, cue point detection and automatic drum transcription, achieve high levels of accuracy. By investigating how they reach this accuracy, we further our understanding of music information retrieval, and by open-sourcing our contributions, we make these findings reproducible. With the software resulting from this research, we created a proof of concept for automatic DJ mixing.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. 34 pp.
Series
Report / UMINF, ISSN 0348-0542 ; 24.08
Keywords
Music Information Retrieval, Cue Point Detection, Automatic Drum Transcription
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-228266
ISBN: 9789180704533
ISBN: 9789180704540
Public defence
2024-09-02, MIT.C.343, MIT-huset, Umeå, 13:00 (English)
Available from: 2024-08-15 Created: 2024-08-07 Last updated: 2024-08-09 Bibliographically approved

Open Access in DiVA

fulltext (686 kB), 199 downloads
File information
File name: FULLTEXT01.pdf
File size: 686 kB
Checksum (SHA-512):
2a9ebbd4e87551c2fbb03c222b12036ca2061b216b51048d580fe065c827eff08003cb17448c28352c3b51c3f94e0902d038e447914dbeba310c87bbe8804159
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text | Scopus

Person

Zehren, Mickaël; Bientinesi, Paolo

Total: 200 downloads
The number of downloads is the sum of downloads for all full texts. It may include earlier versions that are no longer available.

Total: 605 hits