In-depth performance analysis of the ADTOF-based algorithm for automatic drum transcription
Umeå University, Faculty of Science and Technology, Department of Computing Science. (HPAC) ORCID iD: 0000-0001-5022-1686
Department of Music, Universidad EAFIT, Medellín 050022, Colombia.
Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N). (HPAC) ORCID iD: 0000-0002-4972-7097
2024 (English). In: Proceedings of the 25th International Society for Music Information Retrieval Conference, San Francisco: ISMIR, 2024, pp. 1060-1067. Conference paper, published paper (peer-reviewed)
Abstract [en]

The importance of automatic drum transcription lies in its potential to extract useful information from a musical track; however, the low reliability of the models for this task is a limiting factor. Indeed, even though the quality of the generated transcriptions has improved in the recent literature thanks to the curation of large training datasets via crowdsourcing, there is still considerable room for improvement before this task can be considered solved. Aiming to steer the development of future models, we identify the most common errors that arise when training and testing on these crowdsourced datasets. We perform this study in three steps: first, we detail the quality of the transcription for each class of interest; second, we employ a new metric and a pseudo confusion matrix to quantify different mistakes in the estimations; last, we compute the agreement between different annotators of the same track to estimate the accuracy of the ground truth. Our findings are twofold. On the one hand, we observe that the previously reported issue that less represented instruments (e.g., toms) are transcribed less reliably is now mostly solved. On the other hand, cymbal instruments show an unprecedentedly low relative performance. We provide intuitive explanations as to why cymbal instruments are difficult to transcribe, and we identify that they represent the main source of disagreement among annotators.
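Both the per-class evaluation and the annotator-agreement step rest on comparing two lists of drum onsets for each instrument class (a model's output against the ground truth, or one annotator against another). The following is a minimal Python sketch of that comparison. It assumes the onset-based F-measure with a fixed tolerance window that is common in drum transcription evaluation; the 50 ms tolerance, the class labels, and the function names are illustrative assumptions, and the paper's new metric and pseudo confusion matrix are not reproduced here.

```python
def match_onsets(reference, estimate, tolerance=0.05):
    """Count one-to-one matches between two onset lists (in seconds).

    Greedy two-pointer pass (an illustrative assumption, not the paper's code):
    an estimated onset matches a reference onset if |est - ref| <= tolerance,
    and each onset is used at most once.
    """
    ref = sorted(reference)
    est = sorted(estimate)
    i = j = matches = 0
    while i < len(ref) and j < len(est):
        if est[j] < ref[i] - tolerance:
            j += 1  # estimate too early for this or any later reference onset
        elif est[j] > ref[i] + tolerance:
            i += 1  # reference cannot be matched by this or any later estimate
        else:
            matches += 1
            i += 1
            j += 1
    return matches


def f_measure(reference, estimate, tolerance=0.05):
    """Onset F-measure; returns 1.0 when both lists are empty (nothing to find)."""
    if not reference and not estimate:
        return 1.0
    tp = match_onsets(reference, estimate, tolerance)
    precision = tp / len(estimate) if estimate else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_f_measure(reference, estimate, tolerance=0.05):
    """reference/estimate: dicts mapping a (hypothetical) class label to onset times."""
    classes = set(reference) | set(estimate)
    return {c: f_measure(reference.get(c, []), estimate.get(c, []), tolerance)
            for c in sorted(classes)}


# Usage: agreement between two hypothetical annotators of the same track.
annotator_a = {"BD": [0.0, 0.5, 1.0], "SD": [0.25, 0.75], "HH": [0.0, 0.125, 0.25]}
annotator_b = {"BD": [0.01, 0.49, 1.02], "SD": [0.25], "HH": [0.0, 0.4]}
scores = per_class_f_measure(annotator_a, annotator_b)
```

Evaluation libraries such as mir_eval compute the matching as a maximum bipartite matching; the greedy pass above is a simpler stand-in chosen to keep the sketch short.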

Place, publisher, year, edition, pages
San Francisco: ISMIR, 2024. pp. 1060-1067
National category
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-228264; Scopus ID: 2-s2.0-85219129262; OAI: oai:DiVA.org:umu-228264; DiVA id: diva2:1887345
Conference
25th International Society for Music Information Retrieval Conference (ISMIR), San Francisco, USA, November 10-14, 2024.
Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2025-04-02. Bibliographically approved
Part of thesis
1. Towards automatic DJ mixing: cue point detection and drum transcription
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Mot automatisk DJ-mixning : cue point-detektering och trumtranskription
Abstract [en]

With this thesis, we aim to automate the creation of DJ mixes. A DJ mix consists of an uninterrupted sequence of music, constructed by playing tracks one after the other to improve the listening experience for the audience. To build mixes automatically, we first need to understand the tracks we want to mix. This is done by extracting information from the audio signal. Specifically, we retrieve two pieces of information that are essential for DJs: cue points and drum transcription. In the field of music information retrieval, the two associated tasks are cue point detection and automatic drum transcription.

With cue point detection, we identify the positions in the tracks that can be used to create pleasant transitions in the mix. DJs have a good intuition for detecting these positions. However, it is not straightforward to transform their intuition into a computer program because of the semantic gap between the two. To bridge this gap, we propose multiple approaches based on either expert knowledge or machine learning. Further, by interpreting the models resulting from our approaches, we also reflect on the musical content that is linked to the presence of cue points.

With automatic drum transcription, we aim to retrieve the position and the instrument of the notes played on the drum kit, to characterize the musical content of the tracks. The most promising transcription methods are based on supervised deep learning, that is, models trained on labeled datasets. However, because annotations are difficult to create, the datasets available for training are usually limited in size or diversity. Thus, we propose novel methods to create better training data, using either real-world or synthetic music tracks. Further, by thoroughly investigating the performance of the models trained on these data, we deduce which characteristics of a dataset are most relevant for training models.

The solutions we propose for both cue point detection and automatic drum transcription achieve high levels of accuracy. By investigating how our models reach this accuracy, we further our understanding of music information retrieval. By open-sourcing our contributions, we make these findings reproducible. With the software resulting from this research, we created a proof of concept for automatic DJ mixing.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. 34 p.
Series
Report / UMINF, ISSN 0348-0542 ; 24.08
Keywords
Music Information Retrieval, Cue Point Detection, Automatic Drum Transcription
National category
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-228266; ISBN: 9789180704533; ISBN: 9789180704540
Public defence
2024-09-02, MIT.C.343, MIT-huset, Umeå, 13:00 (English)
Available from: 2024-08-15. Created: 2024-08-07. Last updated: 2024-08-09. Bibliographically approved

Open Access in DiVA

Full text (267 kB), 70 downloads
File information
File name: FULLTEXT02.pdf. File size: 267 kB. Checksum (SHA-512):
b32e04c6a8686d4c0b27138714083e9d22f46f365047e355470437f23b1214957f9e50427e43886f55fd0caba12570509317181ca08f7614452f9c9e6d2e833e
Type: full text. MIME type: application/pdf
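To check that a downloaded copy of FULLTEXT02.pdf is intact, its SHA-512 digest can be compared with the checksum listed above. A minimal sketch using Python's standard hashlib module follows; the local file path and the helper names are assumptions for illustration.

```python
import hashlib

# Checksum copied from the file information above.
EXPECTED = (
    "b32e04c6a8686d4c0b27138714083e9d22f46f365047e355470437f23b1214957"
    "f9e50427e43886f55fd0caba12570509317181ca08f7614452f9c9e6d2e833e"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare a file's digest with a published hex checksum (case-insensitive)."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

Usage, assuming the PDF was saved in the current directory: `matches_published_checksum("FULLTEXT02.pdf", EXPECTED)`.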

Other links

Scopus; Conference website

Person

Zehren, Mickaël; Bientinesi, Paolo
