ADTOF: A large dataset of non-synthetic music for automatic drum transcription
Umeå University, Faculty of Science and Technology, Department of Computing Science. (HPAC) ORCID iD: 0000-0001-5022-1686
Universidad EAFIT Medellín.
Umeå University, Faculty of Science and Technology, Department of Computing Science. (HPAC) ORCID iD: 0000-0002-4972-7097
2021 (English). In: Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021, p. 818-824. Conference paper, Published paper (Refereed)
Abstract [en]

The state-of-the-art methods for drum transcription in the presence of melodic instruments (DTM) are machine learning models trained in a supervised manner, which means that they rely on labeled datasets. The problem is that the available public datasets are limited either in size or in realism, and are thus suboptimal for training purposes. Indeed, the best results are currently obtained via a rather convoluted multi-step training process that involves both real and synthetic datasets. To address this issue, starting from the observation that the communities of rhythm games players provide a large amount of annotated data, we curated a new dataset of crowdsourced drum transcriptions. This dataset contains real-world music, is manually annotated, and is about two orders of magnitude larger than any other non-synthetic dataset, making it a prime candidate for training purposes. However, due to crowdsourcing, the initial annotations contain mistakes. We discuss how the quality of the dataset can be improved by automatically correcting different types of mistakes. When used to train a popular DTM model, the dataset yields a performance that matches that of the state-of-the-art for DTM, thus demonstrating the quality of the annotations.

Place, publisher, year, edition, pages
2021. p. 818-824
Identifiers
URN: urn:nbn:se:umu:diva-189852
DOI: 10.5281/zenodo.5624527
Scopus ID: 2-s2.0-85148923089
ISBN: 9781732729902 (print)
OAI: oai:DiVA.org:umu-189852
DiVA, id: diva2:1613668
Conference
ISMIR 2021, the 22nd International Society for Music Information Retrieval Conference, Online, November 7-12, 2021
Available from: 2021-11-23. Created: 2021-11-23. Last updated: 2024-08-07. Bibliographically approved.
In thesis
1. Towards automatic DJ mixing: cue point detection and drum transcription
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Mot automatisk DJ-mixning : cue point-detektering och trumtranskription
Abstract [en]

With this thesis, we aim to automate the creation of DJ mixes. A DJ mix consists of an uninterrupted sequence of music, constructed by playing tracks one after the other, to improve the listening experience for the audience. Thus, to be able to build mixes automatically, we first need to understand the tracks we want to mix. This is done by extracting information from the audio signal. Specifically, we retrieve two pieces of information that are essential for DJs: cue points and drum transcription. In the field of music information retrieval, the two associated tasks are cue point detection and automatic drum transcription.

With cue point detection, we identify the positions in the tracks that can be used to create pleasant transitions in the mix. DJs have a good intuition on how to detect these positions. However, it is not straightforward to transform their intuition into a computer program because of the semantic gap between the two. To solve this problem we propose multiple approaches based on either expert knowledge or machine learning. Further, by interpreting the resulting models from our approaches, we also reflect on the musical content that is linked to the presence of cue points.

With automatic drum transcription, we aim to retrieve the position and the instrument of the notes played on the drumkit, to characterize the musical content of the tracks. To create the transcription, the most promising method is based on supervised deep learning. That is, models trained on labeled datasets. However, because of the difficulty of creating the annotations, the datasets available for training are usually limited in size or diversity. Thus, we propose novel methods to create better training data, either with real-world or synthetic music tracks. Further, by investigating thoroughly the performance of the models resulting from the training data, we deduce the most relevant characteristics of a dataset that help train models.

The solutions we proposed for both tasks of cue point detection and automatic drum transcription achieve high levels of accuracy. By investigating how these tasks reach this accuracy, we further our understanding of music information retrieval. And by open-sourcing our contributions, we make these findings reproducible. With the software resulting from this research, we created a proof of concept for automatic DJ mixing.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. p. 34
Series
Report / UMINF, ISSN 0348-0542 ; 24.08
Keywords
Music Information Retrieval, Cue Point Detection, Automatic Drum Transcription
Identifiers
URN: urn:nbn:se:umu:diva-228266
ISBN: 9789180704533
ISBN: 9789180704540
Public defence
2024-09-02, MIT.C.343, MIT-huset, Umeå, 13:00 (English)
Available from: 2024-08-15. Created: 2024-08-07. Last updated: 2024-08-09. Bibliographically approved.

Open Access in DiVA

fulltext (1084 kB), 236 downloads
File information
File: FULLTEXT01.pdf. File size: 1084 kB. Checksum: SHA-512
62c138d4396b43e1b761d1d57f1890d0bcf83785980de57b7e4271490dea3432127258fb397ab2e75a9397ae00941c8971d95a04546c777fc6a2d3649a1f985a
Type: fulltext. Mimetype: application/pdf
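
The SHA-512 checksum listed above can be used to confirm that a downloaded copy of FULLTEXT01.pdf is intact. The following is a minimal sketch in Python, assuming the file has already been saved locally. The function names and the local path are illustrative and not part of DiVA.

```python
import hashlib
import hmac

# Checksum as published in the file information of this record.
EXPECTED_SHA512 = (
    "62c138d4396b43e1b761d1d57f1890d0bcf83785980de57b7e4271490dea3432"
    "127258fb397ab2e75a9397ae00941c8971d95a04546c777fc6a2d3649a1f985a"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA512):
    """Return True if the file's digest equals the expected hex digest."""
    return hmac.compare_digest(sha512_of_file(path), expected.lower())
```

Calling `matches_expected("FULLTEXT01.pdf")` (a hypothetical local path) returns True only when the local file has the same SHA-512 digest as the one published in this record.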

Person

Zehren, Mickaël; Bientinesi, Paolo

Total: 237 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.