Vargas-Vera, M. (2015). A Framework for Extraction of Relations from Text using Relational Learning and Similarity Measures. J. Univers. Comput. Sci., 21(11), 1482–1495.
Abstract: Named entity recognition (NER) has been widely studied in the Information Extraction community, as it is one step in the construction of an Information Extraction system. However, extracting only names without contextual information is not sufficient if we want to describe facts encountered in documents, in particular academic documents. There is therefore a need to extract relations between entities. This task is accomplished using relational learning algorithms embedded in an Information Extraction framework. In particular, we have extended two relational learning frameworks, RAPIER and FOIL. Our proposed extended frameworks are equipped with DSSim (short for Dempster-Shafer Similarity), our similarity service. Both extended frameworks were tested using an electronic newsletter consisting of news articles describing activities or events happening in an academic institution, as our main application is in education.
|
Villalon, J., & Calvo, R. A. (2013). A Decoupled Architecture for Scalability in Text Mining Applications. J. Univers. Comput. Sci., 19(3), 406–427.
Abstract: Sophisticated Text Mining features such as visualization, summarization, and clustering are becoming increasingly common in software applications. In Text Mining, documents are processed using techniques from different areas, which can be computationally very expensive. This poses a scalability challenge for real-life applications in which users' behavior cannot be entirely predicted. This paper proposes a decoupled architecture for document processing in Text Mining applications that allows applications to scale to large corpora and real-time processing. It contributes a software architecture designed around these requirements and presents TML, a Text Mining Library that implements the architecture. An experimental evaluation of its scalability using a standard corpus is also presented, along with empirical evidence of its performance as part of an automated feedback system for writing tasks used by real students.
|