abstract
(eng) |
This brochure was written with the support of the bilateral Czech-Taiwanese project Compositional models for data mining, financially supported by the Ministry of Science and Technology, Taiwan, and by the Czech Academy of Sciences under Grant No. MOST-18-04. The main output of the project, carried out in 2018 and 2019, is a new supervised web system enabling researchers to learn probabilistic (compositional) models (both causal and stochastic) from data. We opted for a web architecture for two reasons. First, we assume the system will be expanded in subsequent years, and with a web application the system administrator has to keep only one version of the program code up to date. Second, the system is accessible from anywhere in the world, so it can be used not only by the members of the research teams collaborating within the above-mentioned project but also by any other interested researchers. This book should serve as a manual for users of the data mining system. Nevertheless, since the system is based on the theory of compositional models, and no comprehensive text on this theory exists, we decided to divide this text into two parts. The first one describes the theoretical background on which the models constructed from data are based. It also includes chapters showing how compositional models can be applied to data mining tasks. For this reason, the first part summarizes results scattered across a number of journal and conference papers, mainly by R. Jiroušek and his coauthors Vl. Bína and V. Kratochvíl. After introducing the notation from general probability theory, this part puts special emphasis on the notion of stochastic (conditional) independence, without which one cannot distill knowledge from probability models. Chapters 2-5 sum up excerpts from the original conference and journal papers.
The importance of this part lies not only in the fact that these results are surveyed in one comprehensive text for the first time but also in that they are presented using a new unifying notation, without which it might be difficult to see the links connecting the individual parts of this theoretical approach. |