bibtype C - Conference Paper (international conference)
ARLID 0507120
utime 20240103222343.2
mtime 20190731235959.9
SCOPUS 85001945953
WOS 000391051600008
DOI 10.1145/2996758.2996761
title (primary) (eng) Discriminative models for multi-instance problems with tree-structure
specification
page_count 9 s.
media_type P
serial
ARLID cav_un_epca*0507119
ISBN 978-1-4503-4573-6
title Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec'16)
page_num 83-91
publisher
place New York
name ACM
year 2016
keyword big data
keyword learning indicators of compromise
keyword malware detection
keyword neural network
keyword user modeling
author (primary)
ARLID cav_un_auth*0307300
name1 Pevný
name2 T.
country CZ
author
ARLID cav_un_auth*0101197
name1 Somol
name2 Petr
full_dept (cz) Rozpoznávání obrazu
full_dept Department of Pattern Recognition
department (cz) RO
department RO
institution UTIA-B
full_dept Department of Pattern Recognition
fullinstit Ústav teorie informace a automatizace AV ČR, v. v. i.
source
url http://library.utia.cas.cz/separaty/2019/RO/somol-0507120.pdf
cas_special
abstract (eng) Modelling network traffic is gaining importance as a means of countering modern security threats of ever-increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data, owing to the variety and complexity of signals that no human can interpret in full. Obtaining training data with a sufficiently large and varied body of labels can thus be prohibitively costly. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in the traffic directed to individual servers and constructs the final high-level classifier on top of them. We show that the classifier achieves very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from coarse labels, but also automatic learning of the types of servers typically visited by infected computers, together with detectors for them.
action
ARLID cav_un_auth*0377828
name the 2016 ACM Workshop on Artificial Intelligence and Security (AISec'16)
dates 20161028
mrcbC20-s 20161028
place Vienna
country AT
RIV BC
FORD0 20000
FORD1 20200
FORD2 20205
reportyear 2020
num_of_auth 2
presentation_type PR
inst_support RVO:67985556
permalink http://hdl.handle.net/11104/0298524
confidential S
mrcbC86 n.a. Proceedings Paper Computer Science Artificial Intelligence|Computer Science Theory Methods
arlyear 2016
mrcbU14 85001945953 SCOPUS
mrcbU24 PUBMED
mrcbU34 000391051600008 WOS
mrcbU63 cav_un_epca*0507119 Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec'16) 978-1-4503-4573-6 83 91 New York ACM 2016