Automation of Editorial Tasks on the Website Content Central
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2016 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Content Central is a website that allows freelance journalists and photographers to upload their work so that media outlets can buy and publish it. Content Central must moderate the uploaded content to ensure that everything is of high quality and can be published directly. Currently this is done manually by an editor who works at Content Central.

The aim of this thesis is to automate the editorial process on Content Central using natural language processing techniques. The automation focuses on the tasks that consume the most time: spell checking, formatting, and word and sign replacement.

These tasks are automated through the development of prototypes. Spell checking is handled by two prototypes: one uses a dictionary to detect non-word errors, and the other uses probabilities over word trigrams and bigrams to detect real-word errors. Formatting and sign replacement are handled by a rule-based prototype.
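The three techniques named above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption, not the prototypes' actual resources: the lexicon, training corpus, confusion set and formatting rules are hypothetical toy data, and the real-word check uses a simple backoff score over the two preceding words rather than whatever probability model the thesis defines.

```python
import re
from collections import Counter

# --- Non-word errors: dictionary lookup ---------------------------------
# Hypothetical toy lexicon; the prototype's real word list is not given here.
LEXICON = {"a", "the", "editor", "wrote", "piece", "peace", "of", "news",
           "treaty", "was", "signed"}

def non_word_errors(tokens, lexicon=LEXICON):
    """Return tokens that are not in the dictionary."""
    return [t for t in tokens if t.lower() not in lexicon]

# --- Real-word errors: trigram/bigram scores ----------------------------
# Hypothetical training corpus; "<s> <s>" pads each sentence start.
CORPUS = ("<s> <s> the peace treaty was signed . "
          "<s> <s> a piece of news . "
          "<s> <s> the editor wrote a piece of news .").split()

# Hypothetical confusion sets of real words that are easily mixed up.
CONFUSION_SETS = [{"piece", "peace"}]

def train(corpus):
    """Count unigrams, bigrams and trigrams in a tokenized corpus."""
    uni = Counter(corpus)
    bi = Counter(zip(corpus, corpus[1:]))
    tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
    return uni, bi, tri, len(corpus)

def backoff_score(w1, w2, w3, model, alpha=0.4):
    """Score w3 after (w1, w2): trigram estimate, else bigram, else unigram.

    A simple backoff score; the values are not normalized probabilities."""
    uni, bi, tri, total = model
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[w2]
    return alpha * alpha * uni[w3] / total

def real_word_errors(tokens, model, confusion_sets=CONFUSION_SETS):
    """Suggest a confusable word that fits the two preceding words better."""
    padded = ["<s>", "<s>"] + tokens
    suggestions = []
    for i, word in enumerate(tokens):
        w1, w2 = padded[i], padded[i + 1]  # the two words before tokens[i]
        for cset in confusion_sets:
            if word not in cset:
                continue
            current = backoff_score(w1, w2, word, model)
            # sorted() makes tie-breaking deterministic
            best = max(sorted(cset), key=lambda c: backoff_score(w1, w2, c, model))
            if backoff_score(w1, w2, best, model) > current:
                suggestions.append((i, word, best))
    return suggestions

# --- Formatting and sign replacement: ordered regex rules ---------------
# Hypothetical rules; the prototype's actual rule set is not listed here.
RULES = [
    (re.compile(r" {2,}"), " "),                 # collapse repeated spaces
    (re.compile(r"\s+([,.!?])"), r"\1"),         # no space before punctuation
    (re.compile(r"(\d)-(\d)"), "\\1\u2013\\2"),  # en dash in number ranges
]

def apply_rules(text, rules=RULES):
    """Apply each (pattern, replacement) rule in order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

For example, `real_word_errors("the piece treaty was signed".split(), train(CORPUS))` suggests "peace" for "piece" at index 1, because the trigram ("<s>", "the", "peace") occurs in the toy corpus while "piece" in that context only receives a unigram backoff score. The same mechanism hints at one way such checkers can yield false positives: with sparse counts, a correctly used word may lose to an alternative that simply has better context statistics.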

These prototypes are tested on data from Content Central, and their results are compared with those of the editor moderating the same data. The spell checkers produce many false positives and are therefore deemed of limited use. The formatting and sign replacement prototype achieves 52.8% recall and 98.6% precision, which is estimated to reduce the time the editor spends on content containing these errors by at least 51 seconds.
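Assuming the standard definitions, precision is the share of a prototype's corrections that the editor also made, and recall is the share of the editor's corrections that the prototype found. Read this way, 98.6% precision means the rule-based prototype's changes almost always matched the editor's, while 52.8% recall means it caught roughly half of the editor's corrections of those types; many false positives, as reported for the spell checkers, lower precision. A minimal sketch, assuming each correction is represented as a hypothetical (position, original, replacement) tuple and the editor's corrections serve as the reference:

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted corrections against reference corrections."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical toy data: (position, original, replacement) tuples.
editor = {(4, "  ", " "), (17, "10-12", "10\u201312"), (30, "teh", "the"), (42, " .", ".")}
prototype = {(4, "  ", " "), (17, "10-12", "10\u201312"), (55, ",", ";")}

p, r = precision_recall(prototype, editor)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Exact tuple matching is a deliberately strict choice here; a real evaluation might also need to align edits whose positions shift after earlier corrections.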

Place, publisher, year, edition, pages
2016, 41 p.
Series
UMNAD, 1083
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-128577
OAI: oai:DiVA.org:umu-128577
DiVA: diva2:1052703
Educational program
Master of Science Programme in Computing Science and Engineering
Supervisors
Examiners
Available from: 2016-12-07. Created: 2016-12-07. Last updated: 2016-12-07. Bibliographically approved.

Open Access in DiVA

Full text (222 kB)
File information
File name: FULLTEXT01.pdf
File size: 222 kB
Checksum (SHA-512): 24ea85267b1747ad2126b8574280bbf5c11396d2e36ccbbeeb382088abbe89b76c80cfe2397f6c58b94f26c9f3982c8680b9c13d8533d9218cdfc1c09e882a5e
Type: fulltext
MIME type: application/pdf
