formatting and background anomaly detection chapter work

Jan Kowalczyk
2025-04-03 13:53:45 +02:00
parent 5f0ece63a0
commit 1bb06395b4
2 changed files with 373 additions and 252 deletions


@@ -233,8 +233,22 @@ LiDAR sensors function by projecting lasers in multiple directions simultaneousl
\newsection{anomaly_detection}{Anomaly Detection}
Anomaly detection refers to the process of detecting unexpected patterns in data: outliers that deviate significantly from the majority of the data, which is implicitly defined as normal by its prevalence. In classical statistical analysis such techniques have been studied since as early as the 19th century~\cite{anomaly_detection_history}. Since then, a multitude of methods and use cases for them have been proposed and studied. Examples of applications include healthcare, where computer vision algorithms detect anomalies in medical images for diagnostics and early detection of diseases~\cite{anomaly_detection_medical}, fraud detection in decentralized financial systems based on blockchain technology~\cite{anomaly_detection_defi}, and fault detection in industrial machinery using acoustic sound data~\cite{anomaly_detection_manufacturing}.
By their very nature, anomalies are rare and often unpredictable occurrences, which makes it hard to define all possible anomalies in any given system. It is therefore very challenging to create an algorithm capable of detecting anomalies that may never have occurred before and may not have been known to exist when the detection algorithm was created. Anomaly detection algorithms take several possible approaches to achieve this feat.
\citeauthor{anomaly_detection_survey} categorize anomaly detection algorithms in~\cite{anomaly_detection_survey} into six distinct categories based on the techniques they use:
\begin{enumerate}
\item \textbf{Classification Based} - Using classification techniques such as SVMs or neural networks to classify samples as either normal or anomalous based on labeled training data. Alternatively, if not enough labeled training data is available, a one-class classification algorithm can be used, which assumes all training samples to be normal and learns a boundary around them to differentiate them from anomalous samples lying outside the learnt boundary.
\item \textbf{Clustering Based} - Using clustering techniques such as K-Means or DBSCAN to group normal data into clusters, under the assumption that anomalies do not belong to any cluster, lie an appreciable distance from their cluster's center, or belong to smaller clusters than the normal data.
\item \textbf{Nearest Neighbor Based} - Similar to clustering-based techniques, these assume that normal data lies in denser neighborhoods than anomalies and therefore judge samples either by the distance to their $k^{th}$ nearest neighbor or by the density of their local neighborhood; a minimal code sketch of this strategy is given after this list.
\item \textbf{Statistical} - Fitting a statistical model of normal behaviour to the data and determining whether samples are anomalous based on their likelihood under the fitted model.
\item \textbf{Information Theoretic} - Using information theoretic measures to detect irregularities in the data's information content, which are assumed to be caused by anomalies.
\item \textbf{Spectral} - Using dimensionality reduction techniques such as PCA to embed the data into a lower-dimensional subspace in which normal data appears significantly different from anomalous data. Spectral techniques may also be used as a pre-processing step, followed by any other anomaly detection algorithm applied in the lower-dimensional subspace.
\end{enumerate}
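As an illustration of the nearest neighbor based strategy, the following minimal Python sketch (a hypothetical example, not taken from~\cite{anomaly_detection_survey}; the function name \texttt{knn\_anomaly\_scores}, the choice of $k$ and the toy data are assumptions) scores each query sample by its distance to its $k^{th}$ nearest neighbor among training samples assumed to be normal:
\begin{verbatim}
import numpy as np

def knn_anomaly_scores(train, test, k=5):
    """Anomaly score = Euclidean distance to the k-th nearest training sample."""
    # Pairwise distances between every test sample and every training sample.
    d = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # The k-th smallest distance per test sample is its anomaly score.
    return np.sort(d, axis=1)[:, k - 1]

# Toy data: two clusters of normal samples, three query points.
rng = np.random.default_rng(0)
normal = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
queries = np.array([[0.5, 0.5], [4.0, 4.0], [20.0, 20.0]])
print(knn_anomaly_scores(normal, queries))  # larger score = more anomalous
\end{verbatim}
The first query lies inside a cluster and receives a small score, while the isolated points receive progressively larger scores; thresholding this score turns it into a binary normal/anomalous decision.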
% strategies of anomaly detection algorithms according to x include classification, nearest neighbor, clustering, spectral, information theoretic, statistical
\todo[inline, color=green!40]{cite exists since X and has been used to find anomalous data in many domains and works with all kinds of data types/structures (visual, audio, numbers). examples healthcare (computer vision diagnostics, early detection), financial anomalies (credit card fraud, maybe other example), security/safety video cameras (public, traffic, factories).}
\todo[inline, color=green!40]{the goal of these algorithms is to differentiate between normal and anomalous data by finding statistically relevant information which separates the two, since these methods learn how normal data typically is distributed they do not have to have prior knowledge of the types of all anomalies, therefore can potentially detect unseen, unclassified anomalies as well. main challenges when implementing are that it is difficult to cleanly separate normal from abnormal data}
@@ -242,7 +256,6 @@ LiDAR sensors function by projecting lasers in multiple directions simultaneousl
\todo[inline, color=green!40]{figure example shows 2d data but anomaly detection methods work with any kind of dimensionality/shape. shows two clusters of normal data with clear boundaries and outside examples of outliers (anomalous data two single points and one cluster), anomaly detection methods learn to draw these boundaries from the training data given to them which can then be used to judge if unseen data is normal or abnormal}
\todo[inline, color=green!40]{as discussed in motivation, and same as in reference paper (rain autonomous driving) we model our problem as an anomaly detection problem where we define that good quality sensor data is normal data and degraded sensor data (in our case due to dust/smoke) is defined as an anomaly. this allows us to quantify the degradation of data by using the anomaly detection method to check how likely new data is an anomaly}
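To make the degradation quantification idea concrete, the following minimal sketch (a hypothetical illustration with assumed per-scan features and helper names, not the method implemented in this work) fits a Gaussian model to features extracted from clean scans and uses the Mahalanobis distance of new scans to that model as an anomaly-based degradation score:
\begin{verbatim}
import numpy as np

def fit_gaussian(clean_features):
    """Fit a Gaussian model of 'normal' (clean-sensor) feature vectors."""
    mu = clean_features.mean(axis=0)
    cov = np.cov(clean_features, rowvar=False)
    # Small ridge term keeps the covariance invertible.
    return mu, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def degradation_score(features, mu, cov_inv):
    """Mahalanobis distance to the clean-data model; higher = more degraded."""
    diff = features - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

# Hypothetical per-scan features, e.g. (returned point count, mean intensity).
rng = np.random.default_rng(1)
clean = rng.normal([1000.0, 0.8], [50.0, 0.05], size=(500, 2))
mu, cov_inv = fit_gaussian(clean)
new_scans = np.array([[1010.0, 0.79],   # clean-looking scan
                      [400.0, 0.30]])   # heavily degraded scan (dust/smoke)
print(degradation_score(new_scans, mu, cov_inv))
\end{verbatim}
The score grows with the distance of a scan's features from the clean-data distribution, so it can be reported directly as a degradation measure or thresholded into a normal/degraded decision.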
\iffalse
Anomaly detection algorithms are designed to detect or quantify the likelihood of a pattern in data deviating significantly from a well-defined expected norm. Deviations such as these are classified as anomalies or outliers and often signify critical or actionable information.
\begin{figure}
\begin{center}


@@ -1,11 +1,45 @@
@article{anomaly_detection_survey,
  author     = {Chandola, Varun and Banerjee, Arindam and Kumar, Vipin},
  title      = {Anomaly detection: A survey},
  journal    = {ACM Comput. Surv.},
  year       = {2009},
  issue_date = {July 2009},
  month      = jul,
  publisher  = {Association for Computing Machinery},
  address    = {New York, NY, USA},
  volume     = {41},
  number     = {3},
  articleno  = {15},
  numpages   = {58},
  pages      = {15:1--15:58},
  issn       = {0360-0300},
  doi        = {10.1145/1541880.1541882},
  url        = {https://doi.org/10.1145/1541880.1541882},
  keywords   = {outlier detection, Anomaly detection},
  abstract   = {Anomaly detection is an important problem that has been researched
                within diverse research areas and application domains. Many anomaly
                detection techniques have been specifically developed for certain
                application domains, while others are more generic. This survey
                tries to provide a structured and comprehensive overview of the
                research on anomaly detection. We have grouped existing techniques
                into different categories based on the underlying approach adopted
                by each technique. For each category we have identified key
                assumptions, which are used by the techniques to differentiate
                between normal and anomalous behavior. When applying a given
                technique to a particular domain, these assumptions can be used as
                guidelines to assess the effectiveness of the technique in that
                domain. For each category, we provide a basic anomaly detection
                technique, and then show how the different existing techniques in
                that category are variants of the basic technique. This template
                provides an easier and more succinct understanding of the
                techniques belonging to each category. Further, for each category,
                we identify the advantages and disadvantages of the techniques in
                that category. We also provide a discussion on the computational
                complexity of the techniques since it is an important issue in real
                application domains. We hope that this survey will provide a better
                understanding of the different directions in which research has
                been done on this topic, and how techniques developed in one area
                can be applied in domains for which they were not intended to begin
                with.},
}
@dataset{alexander_kyuroson_2023_7913307,
author = {Alexander Kyuroson and Niklas Dahlquist and Nikolaos Stathoulopoulos
@@ -114,62 +148,120 @@
detection of adversarial examples of GTSRB stop signs.},
},
@inproceedings{anomaly_detection_medical,
  author    = {Wei, Qi and Ren, Yinhao and Hou, Rui and Shi, Bibo and
               Lo, Joseph Y. and Carin, Lawrence},
  title     = {Anomaly detection for medical images based on a one-class
               classification},
  booktitle = {Medical Imaging 2018: Computer-Aided Diagnosis},
  editor    = {Petrick, Nicholas and Mori, Kensaku},
  series    = {Society of Photo-Optical Instrumentation Engineers (SPIE)
               Conference Series},
  volume    = {10575},
  year      = {2018},
  month     = feb,
  eid       = {105751M},
  pages     = {105751M},
  doi       = {10.1117/12.2293408},
  adsurl    = {https://ui.adsabs.harvard.edu/abs/2018SPIE10575E..1MW},
},
@article{anomaly_detection_defi,
  author   = {Ul Hassan, Muneeb and Rehmani, Mubashir Husain and Chen, Jinjun},
  title    = {Anomaly Detection in Blockchain Networks: A Comprehensive Survey},
  journal  = {IEEE Communications Surveys \& Tutorials},
  year     = {2023},
  volume   = {25},
  number   = {1},
  pages    = {289--318},
  doi      = {10.1109/COMST.2022.3205643},
  keywords = {Blockchains;Anomaly detection;Security;Smart
              contracts;Privacy;Bitcoin;Tutorials;Blockchain;anomaly
              detection;fraud detection},
},
@article{anomaly_detection_manufacturing,
  author         = {Oh, Dong Yul and Yun, Il Dong},
  title          = {Residual Error Based Anomaly Detection Using Auto-Encoder in
                    SMD Machine Sound},
  journal        = {Sensors},
  year           = {2018},
  volume         = {18},
  number         = {5},
  article-number = {1308},
  issn           = {1424-8220},
  doi            = {10.3390/s18051308},
  url            = {https://www.mdpi.com/1424-8220/18/5/1308},
  pubmedid       = {29695084},
  abstract       = {Detecting an anomaly or an abnormal situation from given noise is
                    highly useful in an environment where constantly verifying and
                    monitoring a machine is required. As deep learning algorithms are
                    further developed, current studies have focused on this problem.
                    However, there are too many variables to define anomalies, and the
                    human annotation for a large collection of abnormal data labeled at
                    the class-level is very labor-intensive. In this paper, we propose
                    to detect abnormal operation sounds or outliers in a very complex
                    machine along with reducing the data-driven annotation cost. The
                    architecture of the proposed model is based on an auto-encoder, and
                    it uses the residual error, which stands for its reconstruction
                    quality, to identify the anomaly. We assess our model using
                    Surface-Mounted Device (SMD) machine sound, which is very complex,
                    as experimental data, and state-of-the-art performance is
                    successfully achieved for anomaly detection.},
},
@article{anomaly_detection_history,
  author    = {Edgeworth, Francis Ysidro},
  title     = {XLI. On discordant observations},
  journal   = {The London, Edinburgh, and Dublin Philosophical Magazine and
               Journal of Science},
  year      = {1887},
  volume    = {23},
  number    = {143},
  pages     = {364--375},
  publisher = {Taylor \& Francis},
  doi       = {10.1080/14786448708628471},
  url       = {https://doi.org/10.1080/14786448708628471},
},
@inproceedings{degradation_quantification_rain,
  author    = {Zhang, Chen and Huang, Zefan and Ang, Marcelo H. and Rus, Daniela},
  title     = {LiDAR Degradation Quantification for Autonomous Driving in Rain},
  booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and
               Systems (IROS)},
  year      = {2021},
  pages     = {3458--3464},
  doi       = {10.1109/IROS51168.2021.9636694},
  keywords  = {Degradation;Location awareness;Laser radar;Rain;Codes;System
               performance;Current measurement},
},
@article{deep_learning_overview,
  author   = {Schmidhuber, J{\"u}rgen},
  title    = {Deep learning in neural networks: An overview},
  journal  = {Neural Networks},
  year     = {2015},
  volume   = {61},
  pages    = {85--117},
  issn     = {0893-6080},
  doi      = {10.1016/j.neunet.2014.09.003},
  url      = {https://www.sciencedirect.com/science/article/pii/S0893608014002135},
  keywords = {Deep learning, Supervised learning, Unsupervised learning,
              Reinforcement learning, Evolutionary computation},
  abstract = {In recent years, deep artificial neural networks (including
              recurrent ones) have won numerous contests in pattern recognition
              and machine learning. This historical survey compactly summarizes
              relevant work, much of it from the previous millennium. Shallow and
              Deep Learners are distinguished by the depth of their credit
              assignment paths, which are chains of possibly learnable, causal
              links between actions and effects. I review deep supervised
              learning (also recapitulating the history of backpropagation),
              unsupervised learning, reinforcement learning \& evolutionary
              computation, and indirect search for short programs encoding deep
              and large networks.},
},
@article{autoencoder_survey,
  title   = {A comprehensive survey on design and application of autoencoder in
             deep learning},
  journal = {Applied Soft Computing},
  volume  = {138},
  pages   = {110176},
@@ -178,6 +270,22 @@ issn = {1568-4946},
  doi      = {10.1016/j.asoc.2023.110176},
  url      = {https://www.sciencedirect.com/science/article/pii/S1568494623001941},
  author   = {Pengzhi Li and Yan Pei and Jianqiang Li},
  keywords = {Deep learning, Autoencoder, Unsupervised learning, Feature
              extraction, Autoencoder application},
  abstract = {Autoencoder is an unsupervised learning model, which can
              automatically learn data features from a large number of samples
              and can act as a dimensionality reduction method. With the
              development of deep learning technology, autoencoder has attracted
              the attention of many scholars. Researchers have proposed several
              improved versions of autoencoder based on different application
              fields. First, this paper explains the principle of a conventional
              autoencoder and investigates the primary development process of an
              autoencoder. Second, We proposed a taxonomy of autoencoders
              according to their structures and principles. The related
              autoencoder models are comprehensively analyzed and discussed. This
              paper introduces the application progress of autoencoders in
              different fields, such as image classification and natural language
              processing, etc. Finally, the shortcomings of the current
              autoencoder algorithm are summarized, and prospected for its future
              development directions are addressed.},
}