Informational Categorical Data Clustering

Czech title: Informační shlukování kategoriálních dat
Author: Jan Hora, Ústav teorie informace a automatizace AV ČR, v. v. i. (Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic)
In: Doktorandské dny 2007 (Doctoral Days 2007), pp. 57-66. Praha: Česká technika ČVUT, 2007. ISBN 978-80-01-03913-7.
Keywords: EM algorithm, distribution mixtures, cluster analysis, categorical data
Grants: GA ČR GA102/07/1594; GA MŠk 1M0572; GA MŠk 2C06019; CEZ:AV0Z10750506

The EM algorithm has repeatedly been used to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, these mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters depend on the starting point. For this reason we use the latent class model only to define a set of "elementary" classes, by estimating a mixture with a large number of components. We also use an optimally smoothed kernel estimate as such a mixture. We propose a hierarchical "bottom-up" cluster analysis that unifies the elementary latent classes sequentially. The clustering procedure is controlled by a minimum information loss criterion.

Czech abstract (English translation): Clustering of categorical data is often approached by searching for so-called latent classes with the EM algorithm. This approach, however, depends on the initial solution and runs into the problem of mixture non-identifiability. The described method finds clusters not as individual mixture components, as in the latent class approach, but as sub-mixtures formed by merging several simple classes of an estimated distribution mixture with a larger number of components. An extreme variant of such a mixture is a kernel estimate, whose optimal smoothing is described in the paper. The paper further presents a hierarchical clustering method with a minimum information loss criterion.
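The two-stage procedure in the abstract (fit many elementary latent classes with EM, then unify them bottom-up) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: all function names are hypothetical, and since the abstract does not define the information measure, the sketch assumes it is the empirical mutual information between the cluster label and the observations, computed from the component posteriors. At each step the pair of groups whose union loses the least of this information is merged.

```python
import math
import random


def posteriors(data, w, theta):
    """E-step: posterior class probabilities q(m | x) for every observation."""
    post = []
    for x in data:
        joint = [w[m] * math.prod(theta[m][d][v] for d, v in enumerate(x))
                 for m in range(len(w))]
        s = sum(joint)
        post.append([j / s for j in joint])
    return post


def em_latent_classes(data, n_values, k, n_iter=50, seed=0):
    """EM for a mixture of k product components over categorical variables.

    data: list of tuples of category indices; n_values[d]: categories of variable d.
    Returns weights w[m] and conditional tables theta[m][d][v].
    """
    rng = random.Random(seed)
    n = len(data)
    w = [1.0 / k] * k
    # Random starting point: the estimate depends on it, as the abstract notes.
    theta = []
    for _ in range(k):
        comp = []
        for c in n_values:
            raw = [0.5 + rng.random() for _ in range(c)]
            s = sum(raw)
            comp.append([r / s for r in raw])
        theta.append(comp)
    for _ in range(n_iter):
        post = posteriors(data, w, theta)           # E-step
        for m in range(k):                          # M-step
            w[m] = sum(q[m] for q in post) / n
            for d, c in enumerate(n_values):
                counts = [1e-9] * c                 # tiny floor avoids zero probabilities
                for x, q in zip(data, post):
                    counts[x[d]] += q[m]
                total = sum(counts)
                theta[m][d] = [cnt / total for cnt in counts]
    return w, theta


def information(post, groups):
    """Assumed criterion: empirical mutual information I(G; X) in nats, where
    q(g | x) is the sum of the component posteriors q(m | x) over m in group g."""
    n = len(post)
    qg = [[sum(q[m] for m in g) for g in groups] for q in post]
    pg = [sum(row[j] for row in qg) / n for j in range(len(groups))]
    return sum(v * math.log(v / pg[j])
               for row in qg for j, v in enumerate(row) if v > 0) / n


def merge_min_info_loss(post, n_clusters):
    """Bottom-up: repeatedly unify the two groups whose merge loses least information."""
    groups = [[m] for m in range(len(post[0]))]
    losses = []
    while len(groups) > n_clusters:
        base = information(post, groups)
        best_loss, best_groups = None, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                trial = [g for t, g in enumerate(groups) if t not in (i, j)]
                trial.append(groups[i] + groups[j])
                loss = base - information(post, trial)
                if best_loss is None or loss < best_loss:
                    best_loss, best_groups = loss, trial
        losses.append(best_loss)
        groups = best_groups
    return groups, losses


if __name__ == "__main__":
    # Hypothetical toy data: two latent groups over four binary variables.
    rng = random.Random(1)
    data = [tuple(int(rng.random() < (0.9 if i % 2 else 0.1)) for _ in range(4))
            for i in range(200)]
    w, theta = em_latent_classes(data, [2, 2, 2, 2], k=6)
    groups, losses = merge_min_info_loss(posteriors(data, w, theta), n_clusters=2)
    print("clusters of elementary classes:", groups)
    print("information loss per merge:", [round(l, 4) for l in losses])
```

Because merging groups is a deterministic function of the component label, the assumed information can only decrease, so every recorded loss is non-negative; a sharp jump in the loss sequence is one natural place to stop merging.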
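The kernel estimate mentioned in the abstract is the extreme case of such a mixture: every observation contributes one product component with weight 1/N. The abstract does not state which kernel or smoothing criterion the paper uses, so this sketch assumes the Aitchison-Aitken kernel for categorical variables and chooses the smoothing parameter `lam` by maximizing the leave-one-out log-likelihood over a grid. The function names are hypothetical.

```python
import math


def aa_kernel(x, y, lam, n_values):
    """Product Aitchison-Aitken kernel: one product component centred at point y.
    Each factor sums to 1 over the c categories of its variable (requires c >= 2)."""
    k = 1.0
    for xd, yd, c in zip(x, y, n_values):
        k *= (1.0 - lam) if xd == yd else lam / (c - 1)
    return k


def kernel_estimate(x, data, lam, n_values):
    """Kernel estimate as an equally weighted mixture of N product components."""
    return sum(aa_kernel(x, y, lam, n_values) for y in data) / len(data)


def loo_log_likelihood(data, lam, n_values):
    """Leave-one-out log-likelihood of the kernel estimate (needs len(data) >= 2)."""
    n = len(data)
    total = 0.0
    for i, x in enumerate(data):
        dens = sum(aa_kernel(x, y, lam, n_values)
                   for j, y in enumerate(data) if j != i) / (n - 1)
        total += math.log(dens)
    return total


def select_smoothing(data, n_values, grid=None):
    """Pick lam from a grid in (0, min (c-1)/c), where the kernel still peaks at y."""
    if grid is None:
        lam_max = min((c - 1) / c for c in n_values)
        grid = [lam_max * t / 20 for t in range(1, 20)]
    return max(grid, key=lambda lam: loo_log_likelihood(data, lam, n_values))
```

With `lam` close to 0 each component reproduces its own data point, and at `lam = (c-1)/c` every factor becomes uniform; the leave-one-out criterion picks a value in between that balances the two.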
Conference: Doktorandské dny 2007, Praha, 16 November 2007
Record: http://hdl.handle.net/11104/0157420