About

Online platforms for publishing and storing digital content have contributed significantly to the growth of many text repositories. Because of the need to extract useful knowledge from these repositories, methods for the automatic and intelligent organization of text collections have received great attention in the literature. Topic hierarchies are one of the most popular approaches to this organization because they allow users to explore a collection interactively through topics that indicate the contents of the available documents.

Torch (Topic Hierarchies) was developed to help users "see hidden topics" in text collections and can be used in a wide variety of applications, such as digital libraries, web directories, social networks, and document engineering. Torch extracts topic hierarchies from text collections using unsupervised and semi-supervised learning methods. In particular, it uses clustering methods such as k-means, bisecting k-means, UPGMA, and incremental single-pass clustering. We are currently investigating consensus clustering, active learning, and semi-supervised clustering to improve the learning of topic hierarchies.
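To give a rough idea of how incremental single-pass clustering works, here is a minimal sketch. It is not Torch's implementation: the class name SinglePassClusterer, the whitespace tokenizer, the raw term-frequency weights, and the 0.3 cosine-similarity threshold are all hypothetical choices made for illustration. Each incoming document is compared once against the existing cluster centroids; it joins the most similar cluster if that similarity reaches the threshold, and otherwise it starts a new cluster.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SinglePassClusterer:
    """Hypothetical single-pass incremental clusterer (illustration only)."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # assumed similarity cut-off
        # Summed term weights per cluster; cosine is scale-invariant,
        # so the sum points in the same direction as the mean centroid.
        self.centroids: list[dict[str, float]] = []
        self.members: list[list[int]] = []

    def add(self, doc_id: int, text: str) -> int:
        """Assign one document in a single pass and return its cluster index."""
        vec = {t: float(c) for t, c in Counter(text.lower().split()).items()}
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Join the most similar cluster and update its centroid in place.
            for t, w in vec.items():
                self.centroids[best][t] = self.centroids[best].get(t, 0.0) + w
            self.members[best].append(doc_id)
            return best
        # No cluster is similar enough: open a new one.
        self.centroids.append(vec)
        self.members.append([doc_id])
        return len(self.centroids) - 1
```

For example, feeding the hypothetical documents "topic hierarchies text clustering", "hierarchical text clustering", "social network analysis", and "network analysis of social media" in that order yields the cluster labels 0, 0, 1, 1. Because each document is seen only once, the result depends on arrival order, which is the usual trade-off that makes single-pass methods cheap enough for growing collections.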

For more details about Torch and to download the software, see the papers listed under Related research.

Additionally, you can try a simple online demo of Torch as a web clustering engine for the DBLP Computer Science Bibliography.


Related research

Marcacini, R. M., Corrêa, N. G. and Rezende, S. O. An Active Learning Approach to Frequent Itemset-Based Text Clustering. In ICPR'2012: 21st International Conference on Pattern Recognition, 2012. Tsukuba Science City, Japan (accepted for publication).

Marcacini, R. M., Cherman, E. A., Metz, J. and Rezende, S. O. A fast dendrogram refinement approach for unsupervised expansion of hierarchies. In ECML/PKDD Discovery Challenge: Third Challenge on Large Scale Hierarchical Text Classification, 2012. Bristol, UK (accepted for publication).

Marcacini, R. M., Hruschka, E. R. and Rezende, S. O. On the Use of Consensus Clustering for Incremental Learning of Topic Hierarchies. In SBIA'2012: 21st Brazilian Symposium on Artificial Intelligence, 2012. Curitiba - PR, Brazil.

Carvalho, D. B. F., Marcacini, R. M., Lucena, C. J. P. and Rezende, S. O. Towards a process to support solving the content selection problem from online community forums. In BraSNAM'2012: Brazilian Workshop on Social Network Analysis and Mining, 2012. Curitiba - PR, Brazil.

Panaggio, B. Z., Marcacini, R. M. and Rezende, S. O. Torch-ETS: análise exploratória de tópicos emergentes com apoio de agrupamento hierárquico de textos [Torch-ETS: exploratory analysis of emerging topics supported by hierarchical text clustering]. In WFA'2011: X Workshop on Tools and Applications, Webmedia'2011: Brazilian Symposium on Multimedia and the Web, 2011. [In Portuguese].

Marcacini, R. M. and Rezende, S. O. Construção Automática de Diretórios Web usando Agrupamento Incremental de Termos [Automatic Construction of Web Directories using Incremental Term Clustering]. In ENIA'2011: VII Encontro Nacional de Inteligência Artificial, Congresso da Sociedade Brasileira de Computação, Natal - RN, Brazil, pages 1-12, 2011. [In Portuguese].

Marcacini, R. M. Unsupervised learning of topic hierarchies from dynamic text collections. Master's dissertation, Institute of Mathematics and Computer Sciences (ICMC), University of São Paulo (USP), São Carlos - SP, Brazil, 2011.

Marcacini, R. M. and Rezende, S. O. Torch: a tool for building topic hierarchies from growing text collections. In WFA'2010: IX Workshop on Tools and Applications, Webmedia'2010: Brazilian Symposium on Multimedia and the Web, Belo Horizonte - MG, Brazil, pages 1-3, 2010.

Marcacini, R. M. and Rezende, S. O. Incremental Construction of Topic Hierarchies using Hierarchical Term Clustering. In SEKE'2010: Proceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering, Redwood City, San Francisco, USA. KSI - Knowledge Systems Institute, pages 553-558, 2010. [Updated version 2011-05-03]