A Structural Approach to Authorship Attribution using Dependency Grammars
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2012 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Authorship attribution is an important problem with many practical applications in the real world. One principal constraint in dealing with this problem is the context of a text: the type of text being written and the purpose it is written for. The context of a text affects the stylistics of the resulting text.

This thesis presents an approach to the problem that attempts to avoid the implications of context by analyzing grammatical structures, in practice dependency structures derived by computerized parsing software. For classification, latent semantic indexing is employed. The results are presented as a performance comparison with a similar approach based on phrase structure trees.

The corpus used in these experiments is a subset of the ICWSM2009 corpus, provided by the International Conference on Weblogs and Social Media. The subset contains only blog posts and shows a high degree of variance in a number of aspects, such as the attributes of the authors and the actual textual content.

In conclusion, the approach to the problem of attributing authorship appears to be significantly weaker than its phrase-structure counterpart. The outcome is further discussed, and possible approaches beyond the realm of authorship attribution are identified.

Place, publisher, year, edition, pages
UMNAD, 915
National Category
Engineering and Technology
URN: urn:nbn:se:umu:diva-58258
OAI: diva2:547547
Educational program
Bachelor of Science Programme in Computing Science
Available from: 2012-08-28 Created: 2012-08-28 Last updated: 2012-09-06
Bibliographically approved
