DAR Project Website

Research Centre
Data - Algorithms - Decision Making

Established in 2005 with the support of the Ministry of Education, Youth and Sports of the Czech Republic (MŠMT ČR), project 1M0572

Publications

Informational Categorical Data Clustering

Type:

Conference paper

Authors:

Hora J.

Proceedings title:

Doktorandské dny 2007

Publisher:

Česká technika ČVUT

Place of publication:

Prague

Year:

2007

ISBN:

978-80-01-03913-7

Contact person:

prof. Ing. Michal Haindl, DrSc. (ÚTIA - Department of Pattern Recognition)

Keywords:

EM algorithm, distribution mixtures, cluster analysis, cathe

Abstract:

The EM algorithm has been used repeatedly to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, the underlying mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters depend on the starting point. For this reason we use the latent class model only to define a set of "elementary" classes by estimating a mixture with a large number of components. An optimally smoothed kernel estimate can also serve as such a mixture. We propose a hierarchical "bottom-up" cluster analysis based on sequentially unifying the elementary latent classes. The clustering procedure is controlled by a minimum information loss criterion.
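The merging step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the exact criterion, so the sketch assumes that "information loss" means the decrease in the empirical mutual information between the data and the class variable when two classes are unified. The function names (`mutual_information`, `merge_columns`, `agglomerate`) and the input format (one row of posterior probabilities q(m | x) per data vector, e.g. taken from a fitted mixture) are hypothetical.

```python
import math


def mutual_information(post):
    """Empirical I(X; M) = H(M) - H(M | X) from rows of posteriors q(m | x_i)."""
    n = len(post)
    k = len(post[0])
    # Class weights estimated as the average posterior of each class.
    weights = [sum(row[m] for row in post) / n for m in range(k)]
    h_m = -sum(w * math.log(w) for w in weights if w > 0)
    h_m_given_x = -sum(q * math.log(q) for row in post for q in row if q > 0) / n
    return h_m - h_m_given_x


def merge_columns(post, a, b):
    """Unify classes a and b by summing their posteriors (new class goes last)."""
    merged = []
    for row in post:
        new_row = [q for m, q in enumerate(row) if m not in (a, b)]
        new_row.append(row[a] + row[b])
        merged.append(new_row)
    return merged


def agglomerate(post, n_clusters):
    """Greedy bottom-up merging; each step picks the pair with the smallest
    loss of mutual information (an assumed criterion, see the lead-in)."""
    groups = [[m] for m in range(len(post[0]))]
    history = []
    while len(groups) > n_clusters:
        base = mutual_information(post)
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                loss = base - mutual_information(merge_columns(post, a, b))
                if best is None or loss < best[0]:
                    best = (loss, a, b)
        loss, a, b = best
        post = merge_columns(post, a, b)
        merged_group = groups[a] + groups[b]
        # Keep the group list aligned with the column order of `post`.
        groups = [g for m, g in enumerate(groups) if m not in (a, b)]
        groups.append(merged_group)
        history.append((merged_group, loss))
    return groups, history


# Toy posteriors: elementary classes 0 and 1 overlap, class 2 is separate.
toy = [
    [0.5, 0.5, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
clusters, merges = agglomerate(toy, 2)
```

The sketch re-evaluates every pair at each step, which is quadratic in the number of classes per merge; that is adequate for illustration but would need caching for a mixture with many components.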