UTIA - Library

bibtype

K - Conference Paper (Czech conference)

ARLID

0098540

utime

20240103185625.2

mtime

20080124235959.9

title (primary) (eng)

Informational Categorical Data Clustering

specification

page_count	10 s.

serial

ARLID

cav_un_epca*0089863

ISBN

978-80-01-03913-7

title

Doktorandské dny 2007

page_num

57-66

publisher

place	Praha
name	Česká technika ČVUT
year	2007

title (cze)

Informační shlukování kategoriálních dat

keyword

EM algorithm

keyword

distribution mixtures

keyword

cluster analysis

keyword

categorical data

author (primary)

ARLID	cav_un_auth*0230019
name1	Hora
name2	Jan
institution	UTIA-B
fullinstit	Ústav teorie informace a automatizace AV ČR, v. v. i.

cas_special

project

project_id	GA102/07/1594
agency	GA ČR
ARLID	cav_un_auth*0228611

project

project_id	1M0572
agency	GA MŠk
ARLID	cav_un_auth*0001814

project

project_id	2C06019
agency	GA MŠk
country	CZ
ARLID	cav_un_auth*0216518

research

CEZ:AV0Z10750506

abstract (eng)

The EM algorithm has been used repeatedly to identify latent classes in categorical data by estimating finite distribution mixtures of product components. Unfortunately, the underlying mixtures are not uniquely identifiable and, moreover, the estimated mixture parameters depend on the starting point. For this reason, we use the latent class model only to define a set of "elementary" classes by estimating a mixture with a large number of components. As such a mixture, we also use an optimally smoothed kernel estimate. We propose a hierarchical "bottom-up" cluster analysis based on sequentially merging the elementary latent classes. The clustering procedure is controlled by a minimum information loss criterion.
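A minimal Python sketch of the two stages described above, written under our own assumptions rather than taken from the paper: an EM fit of a mixture of product (conditionally independent) categorical components with more components than the desired number of clusters, followed by greedy bottom-up merging of these elementary classes. The information loss is modelled here, as an assumption, by the decrease of the empirical mutual information between observations and clusters (the merge cost used in agglomerative information-bottleneck clustering); the paper's exact criterion may differ. The function names, the pseudo-count `alpha` and the toy data are hypothetical, and the kernel-estimate variant is not covered.

```python
import math
import random


def posteriors(data, weights, theta):
    """E-step: q(m | x) proportional to w_m * prod_d theta[m][d][x_d]."""
    post = []
    for x in data:
        joint = [w * math.prod(comp[d][v] for d, v in enumerate(x))
                 for w, comp in zip(weights, theta)]
        total = sum(joint)
        post.append([j / total for j in joint])
    return post


def fit_product_mixture(data, n_values, n_components, n_iter=200, alpha=1e-6, seed=0):
    """EM for a mixture of product (conditionally independent) categorical components.

    n_values[d] is the number of categories of variable d; values are coded 0..n_values[d]-1.
    alpha is a tiny pseudo-count (hypothetical choice) that keeps all probabilities positive.
    """
    rng = random.Random(seed)  # EM results depend on this random starting point
    weights = [1.0 / n_components] * n_components
    theta = []
    for _ in range(n_components):
        comp = []
        for k in n_values:
            raw = [rng.random() + 0.1 for _ in range(k)]
            total = sum(raw)
            comp.append([r / total for r in raw])
        theta.append(comp)
    for _ in range(n_iter):
        post = posteriors(data, weights, theta)
        # M-step: re-estimate component weights and per-variable category probabilities
        for m in range(n_components):
            mass = sum(q[m] for q in post)
            weights[m] = mass / len(data)
            for d, k in enumerate(n_values):
                counts = [alpha] * k
                for x, q in zip(data, post):
                    counts[x[d]] += q[m]
                theta[m][d] = [c / (mass + k * alpha) for c in counts]
    return weights, theta


def info_loss(qa, qb):
    """Assumed criterion: decrease of empirical I(X; C) when clusters a and b are merged."""
    n = len(qa)
    pa, pb = sum(qa) / n, sum(qb) / n
    loss = 0.0
    for a, b in zip(qa, qb):
        if a > 0:
            loss += a * math.log(a / pa)
        if b > 0:
            loss += b * math.log(b / pb)
        if a + b > 0:
            loss -= (a + b) * math.log((a + b) / (pa + pb))
    return loss / n


def merge_bottom_up(post, n_clusters):
    """Greedily merge elementary classes, always choosing the pair with the smallest loss."""
    # Each cluster: (list of elementary component indices, posterior column over observations)
    clusters = [([m], [q[m] for q in post]) for m in range(len(post[0]))]
    while len(clusters) > n_clusters:
        _, i, j = min((info_loss(clusters[i][1], clusters[j][1]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = (clusters[i][0] + clusters[j][0],
                  [a + b for a, b in zip(clusters[i][1], clusters[j][1])])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    # Hypothetical toy data: three binary variables, three repeated response patterns
    data = [(0, 0, 1)] * 15 + [(1, 1, 0)] * 15 + [(0, 1, 1)] * 10
    weights, theta = fit_product_mixture(data, n_values=[2, 2, 2], n_components=6)
    post = posteriors(data, weights, theta)
    for members, column in merge_bottom_up(post, n_clusters=2):
        print("elementary classes", sorted(members), "weight", round(sum(column) / len(data), 3))
```

Because the EM result depends on `seed`, different starting points can give different elementary classes; the merging step only reorganizes whatever classes EM produced, so each final cluster is a sub-mixture of elementary components.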

abstract (cze)

Clustering of categorical data is often approached by searching for so-called latent classes with the EM algorithm. However, this approach depends on the initial solution and runs into the problem of mixture non-identifiability. The described method finds clusters not as individual mixture components, as in the latent class approach, but as sub-mixtures formed by merging several simple classes of an estimated distribution mixture with a larger number of components. An extreme variant of such a mixture is a kernel estimate, whose optimal smoothing is described in the paper. The paper further presents a hierarchical clustering method with a minimum information loss criterion.

action

ARLID	cav_un_auth*0232814
name	Doktorandské dny 2007
place	Praha
dates	16.11.2007
country	CZ

reportyear

2008

RIV

permalink

http://hdl.handle.net/11104/0157420

arlyear

2007

mrcbU63

cav_un_epca*0089863 Doktorandské dny 2007 978-80-01-03913-7 57 66 Praha Česká technika ČVUT 2007