Publication details

  • Characterizing Literature Using Machine Learning Methods (Jan Bilek), Master's Thesis, School: Universität Hamburg, 2016-10-14

Abstract

In this thesis, we explore the classical works by famous authors available in Project Gutenberg, a free online ebook library. Contemporary computational power enables us to analyze thousands of books and find similarities between them. We explore the differences between books and genres with respect to features such as the proportion of stop words, the distribution of part-of-speech classes, and the frequencies of individual words. Using this knowledge, we create a model that predicts book metadata, including author or genre, and compare the performance of different approaches. With a multinomial naive Bayes model, we reached 74.1 % accuracy on the author prediction task with more than 1 400 authors. For other metadata, the random forest classifier achieved the best results. Through the most predictive features, we try to capture what is typical for individual genres or epochs. As part of the analysis, we create a Character Interactions model that enables us to visualize the interactions between characters in a book and identify its main or central character.

BibTeX

@mastersthesis{CLUMLMB16,
	author	 = {Jan Bilek},
	title	 = {{Characterizing Literature Using Machine Learning Methods}},
	advisors	 = {Julian Kunkel},
	year	 = {2016},
	month	 = {10},
	school	 = {Universität Hamburg},
	howpublished	 = {{Online \url{https://wr.informatik.uni-hamburg.de/_media/research:theses:jan_bilek_characterizing_literature_using_machine_learning_methods.pdf}}},
	type	 = {Master's Thesis},
	abstract	 = {In this thesis, we explore the classical works by famous authors available in Project Gutenberg, a free online ebook library. Contemporary computational power enables us to analyze thousands of books and find similarities between them. We explore the differences between books and genres with respect to features such as the proportion of stop words, the distribution of part-of-speech classes, and the frequencies of individual words. Using this knowledge, we create a model that predicts book metadata, including author or genre, and compare the performance of different approaches. With a multinomial naive Bayes model, we reached 74.1 \% accuracy on the author prediction task with more than 1 400 authors. For other metadata, the random forest classifier achieved the best results. Through the most predictive features, we try to capture what is typical for individual genres or epochs. As part of the analysis, we create a Character Interactions model that enables us to visualize the interactions between characters in a book and identify its main or central character.},
}

