Term Network Approach for Transductive Classification

16th International Conference on Intelligent Text Processing and Computational Linguistics - 2015.

Rafael G. Rossi, Solange O. Rezende, Alneu A. Lopes

Abstract Transductive classification is a useful way to classify texts when only a few labeled examples are available. Transductive classification algorithms rely on term frequency to directly classify texts represented in the vector space model or to build networks and perform label propagation. Related terms tend to belong to the same class, and this information can be used to assign relevance scores of terms for classes and, consequently, the labels of documents. In this paper, we propose the use of term networks to model term relations and perform transductive classification. To do so, we propose (i) different ways to generate term networks, (ii) how to assign initial relevance scores for terms, (iii) how to propagate the relevance scores among terms, and (iv) how to use the relevance scores of terms in order to classify documents. We demonstrate that transductive classification based on term networks can surpass the accuracies obtained by transductive classification considering texts represented in other types of networks or in the vector space model, or even the accuracies obtained by inductive classification. We also demonstrate that we can decrease the size of term networks through feature selection while keeping classification accuracy and decreasing computational complexity.

Contacts {ragero,solange,alneu}@icmc.usp.br


Complete results

HERE you can find a directory containing PDFs with all results obtained for all text collections in our evaluation.

Source code

Authors' implementations

- Transductive Classification through Term Networks (TCTN)

- Learning with Local and Global Consistency (LLGC) [REF]

- GNetMine [REF]

- Self-Training with Multinomial Naïve Bayes (MNB-Se) [REF]

- Expectation Maximization [REF]

- Transductive Support Vector Machines (TSVM) maintaining the same class proportion of labeled documents, and TSVM without maintaining the class proportion [REF]

The complete source code, containing all implementations listed above and the classes used by them, is available HERE.

Weka's implementations

We used the Weka library to run Multinomial Naïve Bayes (MNB). The Weka library is available HERE.

Text Collections

HERE you can find a directory containing all text collections used in our experimental evaluation. The text collections are in sparse ARFF format, i.e., only non-zero attributes are stored, each explicitly identified by its attribute number followed by its value.
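
As a rough illustration of the sparse layout described above, the following Python sketch expands a single sparse data row into a dense list. It is not part of the authors' code: the function name parse_sparse_row and the sample row are hypothetical, attribute numbers are assumed to start at 0 (as in Weka's sparse ARFF), omitted attributes are assumed to be 0, and values are assumed not to contain commas or quoted spaces.

```python
def parse_sparse_row(line, num_attributes):
    """Expand one sparse ARFF data row, e.g. "{1 2, 3 1}", into a dense list.

    Assumptions (not taken from the collections themselves): attribute
    numbers are 0-based, omitted attributes are "0", and no value
    contains a comma.
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a sparse ARFF row: " + line)

    # Every attribute that is not listed in the row is zero.
    dense = ["0"] * num_attributes

    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            # Each pair is "<attribute number> <value>".
            index, value = pair.strip().split(None, 1)
            dense[int(index)] = value.strip()
    return dense


# Hypothetical row: term 1 occurs twice, term 3 once, class label in attribute 4.
row = parse_sparse_row("{1 2, 3 1, 4 sports}", 5)
print(row)
```

Values are kept as strings so that the same routine works for numeric term frequencies and for the nominal class attribute; a real loader would also read the @attribute header to learn the number and types of attributes.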