Text document classification

Author: Novovičová Jana
Institution: UTIA-B, Ústav teorie informace a automatizace AV ČR, v. v. i.
Source: ERCIM News, No. 62 (2005), pp. 53-54 (2 pages). European Research Consortium for Informatics and Mathematics. ISSN 0926-4981, 1564-0094
Czech title: Klasifikace textových dokumentů
Keywords: document representation, categorization, classification
Grants: IAA2075302 (GA AV ČR), KSK1019101 (GA AV ČR), 1M0572 (GA MŠk)
Research plan: CEZ:AV0Z10750506
Permanent link: http://hdl.handle.net/11104/0131489

Abstract: Over the last twenty years the number of text documents in digital form has grown enormously. As a consequence, the ability to organize and classify documents automatically is of great practical importance. Text classification aims to partition an unstructured set of documents into groups that describe the contents of the documents. There are two main variants of text classification: text clustering (finding hidden groups in a set of documents) and text categorization (assigning documents to predefined groups). A major characteristic of the problem is the high dimensionality of text data.