Description:
Automated text categorization (ATC) is the task of automatically sorting a set of electronic text-based documents into predefined categories based on their content. Applications of ATC include indexing documents or Web pages with a controlled vocabulary, e-mail routing, spam filtering, authorship attribution, and many others. The talk will focus on the machine learning approach to ATC. It will discuss the main phases of text categorization, along with recent models and learning methods used in ATC.
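
As a rough illustration of the machine learning approach mentioned above, the following minimal sketch walks through three commonly described phases (document indexing, classifier learning, and classification) on a toy spam-filtering task. The choice of a multinomial Naive Bayes model with add-one smoothing, the function names, and the four training messages are all assumptions made for this example; the abstract does not say which models or data the talk covers.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Indexing phase: reduce a document to a lowercase bag of words.
    return text.lower().split()


def train(labeled_docs):
    # Learning phase: count documents per category and words per category.
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab


def classify(model, text):
    # Classification phase: pick the category with the highest log-posterior.
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for illustration only.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

model = train(TRAINING)
print(classify(model, "win a cash prize"))
print(classify(model, "monday project meeting"))
```

A realistic system would add feature selection or term weighting between indexing and learning and would evaluate the classifier on held-out documents; those steps are left out here to keep the sketch short.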