Umeå University Publications (umu.se)
CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases
2019 (English). In: Methods in Ecology and Evolution, E-ISSN 2041-210X, Vol. 10, no. 5, p. 744-751. Journal article (Refereed). Published
Abstract [en]

Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect geo-referencing or dating, can diminish their usefulness. Manual cleaning is time-consuming, error-prone, difficult to reproduce and limited to known geographical areas and taxonomic groups, making it impractical for datasets with thousands or millions of records.

Here, we present CoordinateCleaner, an R package to scan datasets of species occurrence records for geo-referencing and dating imprecisions and data entry errors in a standardized and reproducible way. CoordinateCleaner is tailored to problems common in biological and palaeontological databases and can handle datasets with millions of records. The software includes (a) functions to flag potentially problematic coordinate records based on geographical gazetteers, (b) a global database of 9,691 geo-referenced biodiversity institutions to identify records that are likely from horticulture or captivity, (c) novel algorithms to identify datasets with rasterized data, conversion errors and strong decimal rounding and (d) spatio-temporal tests for fossils.
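To make the idea of flagging coordinate records concrete, here is a minimal Python sketch of three simple checks: coordinates outside the valid range, zero or identical latitude and longitude, and proximity to a known institution. It is not CoordinateCleaner's API or its algorithms; the function names, the flag labels, the 1 km radius and the institution coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def flag_record(lat, lon, institutions=(), radius_km=1.0):
    """Return a list of hypothetical flag labels for one occurrence record.

    institutions: iterable of (lat, lon) pairs; radius_km is an assumed threshold.
    """
    # Coordinates outside the valid range cannot be tested any further.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return ["invalid"]
    flags = []
    if lat == 0.0 and lon == 0.0:
        flags.append("zero")  # often a placeholder for a missing value
    if lat == lon:
        flags.append("equal")  # identical values suggest a data entry error
    if any(haversine_km(lat, lon, ilat, ilon) <= radius_km for ilat, ilon in institutions):
        flags.append("institution")  # possibly a cultivated or captive specimen
    return flags


if __name__ == "__main__":
    # Made-up institution location, for illustration only.
    gardens = [(10.0, 20.0)]
    for lat, lon in [(95.0, 10.0), (0.0, 0.0), (10.0, 20.0), (-23.5, -46.6)]:
        print((lat, lon), flag_record(lat, lon, institutions=gardens))
```

Returning flags instead of dropping records leaves the decision about what to exclude with the analyst, in line with the abstract's wording of "potentially problematic" records.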

We describe the individual functions available in CoordinateCleaner and demonstrate them on more than 90 million occurrences of flowering plants from the Global Biodiversity Information Facility (GBIF) and 19,000 fossil occurrences from the Palaeobiology Database (PBDB). We find that in GBIF more than 3.4 million records (3.7%) are potentially problematic and that 179 of the tested contributing datasets (18.5%) might be biased by rasterized coordinates. In PBDB, 1,205 records (6.3%) are potentially problematic.
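The reported counts and percentages can be cross-checked with a little arithmetic. The Python sketch below back-calculates the totals implied by each pair of figures. The helper name is hypothetical, and because the abstract gives rounded percentages and "more than" counts, the results are rough estimates, not figures from the paper.

```python
def implied_total(flagged, percent):
    """Total implied by a flagged count and the percentage it represents."""
    return flagged / (percent / 100.0)


# Figures quoted in the abstract; the totals computed from them are estimates.
gbif_records = implied_total(3_400_000, 3.7)   # flagged GBIF records
pbdb_records = implied_total(1205, 6.3)        # flagged PBDB records
gbif_datasets = implied_total(179, 18.5)       # datasets flagged as rasterized

print(f"Implied GBIF records:  about {gbif_records / 1e6:.1f} million")
print(f"Implied PBDB records:  about {pbdb_records:.0f}")
print(f"Implied GBIF datasets: about {gbif_datasets:.0f}")
```

The implied record totals agree with the "more than 90 million" GBIF occurrences and the roughly 19,000 PBDB occurrences stated above. The dataset figure suggests that somewhat under a thousand contributing datasets were tested, a number the abstract does not state directly.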

All cleaning functions and the biodiversity institution database are open-source and available within the CoordinateCleaner R package.

Place, publisher, year, edition, pages
Wiley-Blackwell, 2019. Vol. 10, no. 5, p. 744-751
Keywords [en]
biodiversity institutions, data quality, fossils, GBIF, geo-referencing, palaeobiology database (PBDB), R package, species distribution modelling
National subject category
Bioinformatics and Computational Biology; Other Physics Topics
Identifiers
URN: urn:nbn:se:umu:diva-161543
DOI: 10.1111/2041-210X.13152
ISI: 000471332800014
Scopus ID: 2-s2.0-85062375357
OAI: oai:DiVA.org:umu-161543
DiVA, id: diva2:1336848
Research funder
Vetenskapsrådet (Swedish Research Council), 2015-04748; Stiftelsen för strategisk forskning (SSF); Wallenbergstiftelserna
Available from: 2019-07-10. Created: 2019-07-10. Last updated: 2025-02-05. Bibliographically approved

Open Access in DiVA

Full text (1009 kB), 899 downloads
File information
File name: FULLTEXT01.pdf. File size: 1009 kB. Checksum (SHA-512):
9d60128d74abfcd92df8a358ab3c5b812715f60366557c433e8064733bde7afaf2bc56c1668397091be8922cfb2d19c13cb28bd3b8a57759f1cbf9d93ba6b353
Type: fulltext. MIME type: application/pdf


Person

Edler, Daniel

Search further in DiVA

By author/editor
Zizka, Alexander; Silvestro, Daniele; Edler, Daniel; Herdean, Andrei
By organisation
Department of Physics
In the same journal
Methods in Ecology and Evolution
Total: 899 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 634 hits