Using Topological Data Analysis for Text Classification
1 online resource (44 pages) : PDF
University of North Carolina at Charlotte
I show that applying discourse features derived through topological data analysis (TDA), namely homological persistence, improves classification results on the task of movie genre detection, including identification of overlapping movie genres. On the IMDB dataset I improve on prior results, increasing the Jaccard score by 4.7% over recent results by . I also significantly improve the F-score (by over 15%) and slightly improve the hit rate (by 0.5%, relative to the same baseline). The limitations of my work, mostly due to the smaller data set, are discussed at the end. I see my contribution as threefold: (a) for a general audience of computational linguists, I want to increase awareness of topology as a possible source of semantic features; (b) for researchers using machine learning for NLP tasks, I propose the use of topological features when the number of training examples is small; and (c) for those already aware of computational topology, I see this work as contributing to the discussion about the value of topology for NLP, in view of mixed results reported by others.
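The abstract does not say how the homological persistence features are computed. The sketch below assumes a document is represented as a point cloud (for example, one vector per sentence). It computes only 0-dimensional persistence (connected components) of a Vietoris-Rips filtration, using Kruskal-style union-find over pairwise Euclidean distances. It then collapses the barcode into a fixed-length vector that a classifier could consume. The function names `h0_barcode` and `barcode_features` and the choice of summary statistics are hypothetical illustrations, not the thesis's method. Higher-dimensional features such as loops would need a dedicated TDA library.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Hypothetical helper: 0-dimensional persistence bars (birth, death).

    Every point is born at scale 0. When two connected components merge at
    edge length d, one of them dies, producing the bar (0.0, d). The last
    surviving component never dies and gets death = math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by Euclidean length (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # merge: one component dies at scale d
            bars.append((0.0, d))
            if len(bars) == n - 1:   # everything is connected; stop early
                break
    if n > 0:
        bars.append((0.0, math.inf))  # the component that never dies
    return bars


def barcode_features(bars):
    """Hypothetical summary: [count, total, max, mean] of finite bar lengths."""
    finite = [death - birth for birth, death in bars if death != math.inf]
    return [
        len(finite),
        sum(finite),
        max(finite, default=0.0),
        sum(finite) / len(finite) if finite else 0.0,
    ]
```

For four made-up 2-D sentence vectors, `h0_barcode([(0, 0), (1, 0), (0, 3), (10, 0)])` returns the bars `(0.0, 1.0)`, `(0.0, 3.0)`, `(0.0, 9.0)` and `(0.0, inf)`. The long bar of length 9 reflects the isolated fourth point. Summary statistics are used here only because they give every document a vector of the same length regardless of how many sentences it has.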
BARCODES; MOVIE GENRE CLASSIFICATION; PERSISTENT HOMOLOGY; TDA; TEXT CLASSIFICATION; TOPOLOGICAL DATA ANALYSIS
Akella, Srinivas; Wartell, Zackery
Thesis (M.S.)--University of North Carolina at Charlotte, 2018.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.