A parallel corpus-based approach to the crime event extraction for low-resource languages
Umeå University, Faculty of Science and Technology, Department of Computing Science. National Technical University, Kharkiv Polytechnic Institute, Department of Intelligent Computer Systems, Kharkiv, Ukraine. ORCID iD: 0000-0002-9826-0286
Institute of Information and Computational Technologies, Almaty, Kazakhstan.
Gdańsk University of Technology, Department of Informatics in Management, Gdańsk, Poland.
Friedrich Schiller University Jena, Institut für Slawistik und Kaukasusstudien, Jena, Germany.
2023 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 54093-54111. Article in journal (Refereed). Published
Abstract [en]

Many crime-related events take place all over the world, and most of them are reported on news portals and social media. Extracting crime-related events from published texts enables monitoring, analysis, and comparison of police or criminal activities across countries or regions. Existing approaches to event extraction mainly process texts in English, French, Chinese, and some other resource-rich and well-annotated languages. This paper presents a parallel corpus-based approach that follows a closed-domain event extraction methodology to extract events from web news articles in low-resource languages. To identify the event, its arguments, and the arguments' roles in the source-language part of the corpus, we use an enhanced pattern-based method that involves a multilingual synonym dictionary with knowledge about crime-related concepts and logic-linguistic equations. Event extraction from the target-language part of the corpus uses a cross-lingual crime-related event extraction transfer technique based on supplementary knowledge about the semantic similarity patterns of the considered pair of languages. The presented approach does not require a previously annotated training corpus, which makes it more attractive for low-resource languages, and it allows extracting TRANSFER, CRIME, and POLICE event types and their seven subtypes simultaneously from news articles on various topics. Applying our approach to a Russian-Kazakh parallel corpus of news portal articles yielded an F1-measure for crime-related event extraction of over 82% for the source language and 63% for the target language.

Place, publisher, year, edition, pages
IEEE, 2023. Vol. 11, p. 54093-54111
Keywords [en]
crime analysis, Cross-lingual transfer, event extraction, low-resource language, natural language processing, parallel corpus, semantic annotation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-211831
DOI: 10.1109/ACCESS.2023.3281680
ISI: 001005528500001
Scopus ID: 2-s2.0-85161064304
OAI: oai:DiVA.org:umu-211831
DiVA, id: diva2:1781820
Available from: 2023-07-11 Created: 2023-07-11 Last updated: 2024-07-22 Bibliographically approved

Open Access in DiVA

fulltext (3275 kB), 295 downloads
File information
File name: FULLTEXT01.pdf
File size: 3275 kB
Checksum SHA-512: 2a26a7970cf710a19ac9c0d4cdd85682f33f7a6e80522c53817dadcfcd4930f205dc981a310c26cd8b68a19f41fbce4b7ef1a973ed4f0469e4b4a789f657a8b0
Type: fulltext
Mimetype: application/pdf

Authority records

Khairova, Nina
