formatting and background anomaly detection chapter work

Jan Kowalczyk
2025-04-03 13:53:45 +02:00
parent 5f0ece63a0
commit 1bb06395b4
2 changed files with 373 additions and 252 deletions


@@ -233,8 +233,22 @@ LiDAR sensors function by projecting lasers in multiple directions simultaneousl
\newsection{anomaly_detection}{Anomaly Detection}
Anomaly detection refers to the process of detecting unexpected patterns in data: outliers that deviate significantly from the majority of the data, which is implicitly defined as normal by its prevalence. In classical statistical analysis such techniques have been studied since as early as the 19th century~\cite{anomaly_detection_history}. Since then, a multitude of methods and use cases for them have been proposed and studied. Examples of applications include healthcare, where computer vision algorithms detect anomalies in medical images for diagnostics and early detection of diseases~\cite{anomaly_detection_medical}, fraud detection in decentralized financial systems based on blockchain technology~\cite{anomaly_detection_defi}, and fault detection in industrial machinery using acoustic sound data~\cite{anomaly_detection_manufacturing}.
By their very nature, anomalies are rare and often unpredictable occurrences, which makes it hard to define all possible anomalies in any given system. It is therefore very challenging to create an algorithm capable of detecting anomalies that may never have occurred before and may not have been known to exist when the detection algorithm was created. Anomaly detection algorithms take several possible approaches to achieve this feat.
\citeauthor{anomaly_detection_survey} categorize anomaly detection algorithms in~\cite{anomaly_detection_survey} into six distinct categories based on the techniques they use:
\begin{enumerate}
\item \textbf{Classification Based} - Using classification techniques such as SVMs or neural networks to classify samples as either normal or anomalous based on labeled training data. Alternatively, if not enough labeled training data is available, a one-class classification algorithm can be used, which assumes all training samples to be normal and learns a boundary around them to differentiate them from anomalous samples lying outside the learnt boundary.
\item \textbf{Clustering Based} - Using clustering techniques such as K-Means or DBSCAN to group normal data into clusters, under the assumption that anomalies do not belong to any cluster, lie an appreciable distance from their cluster's center, or belong to smaller clusters than the normal data.
\item \textbf{Nearest Neighbor Based} - Similar to clustering-based techniques, these assume that normal data lies in denser neighborhoods than anomalies and therefore judge samples either by the distance to their $k^{th}$ nearest neighbor or by the density of their local neighborhood; a minimal code sketch of this strategy is given after this list.
\item \textbf{Statistical} - Fitting a statistical model of normal behaviour to the data and determining whether samples are anomalous based on their likelihood under the fitted model.
\item \textbf{Information Theoretic} - Using information theoretic measures to detect irregularities in the data's information content, which are assumed to be caused by anomalies.
\item \textbf{Spectral} - Using dimensionality reduction techniques such as PCA to embed the data into a lower-dimensional subspace in which normal data appears significantly different from anomalous data. Spectral techniques may also be used as a pre-processing step, followed by any other anomaly detection algorithm applied in the lower-dimensional subspace.
\end{enumerate}
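As an illustration of the nearest neighbor based strategy, the following minimal Python sketch (a hypothetical example, not taken from~\cite{anomaly_detection_survey}; the function name \texttt{knn\_anomaly\_scores}, the choice of $k$ and the toy data are assumptions) scores each query sample by its distance to its $k^{th}$ nearest neighbor among training samples assumed to be normal:
\begin{verbatim}
import numpy as np

def knn_anomaly_scores(train, test, k=5):
    """Anomaly score = Euclidean distance to the k-th nearest training sample."""
    # Pairwise distances between every test sample and every training sample.
    d = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=-1)
    # The k-th smallest distance per test sample is its anomaly score.
    return np.sort(d, axis=1)[:, k - 1]

# Toy data: two clusters of normal samples, three query points.
rng = np.random.default_rng(0)
normal = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
queries = np.array([[0.5, 0.5], [4.0, 4.0], [20.0, 20.0]])
print(knn_anomaly_scores(normal, queries))  # larger score = more anomalous
\end{verbatim}
The first query lies inside a cluster and receives a small score, while the isolated points receive progressively larger scores; thresholding this score turns it into a binary normal/anomalous decision.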
% strategies of anomaly detection algorithms according to x include classification, nearest neighbor, clustering, spectral, information theoretic, statistical
\todo[inline, color=green!40]{cite exists since X and has been used to find anomalous data in many domains and works with all kinds of data types/structures (visual, audio, numbers). examples healthcare (computer vision diagnostics, early detection), financial anomalies (credit card fraud, maybe other example), security/safety video cameras (public, traffic, factories).}
\todo[inline, color=green!40]{the goal of these algorithms is to differentiate between normal and anomalous data by finding statistically relevant information which separates the two, since these methods learn how normal data typically is distributed they do not have to have prior knowledge of the types of all anomalies, therefore can potentially detect unseen, unclassified anomalies as well. main challenges when implementing are that it is difficult to cleanly separate normal from abnormal data}
@@ -242,7 +256,6 @@ LiDAR sensors function by projecting lasers in multiple directions simultaneousl
\todo[inline, color=green!40]{figure example shows 2d data but anomaly detection methods work with any kind of dimensionality/shape. shows two clusters of normal data with clear boundaries and outside examples of outliers (anomalous data two single points and one cluster), anomaly detection methods learn to draw these boundaries from the training data given to them which can then be used to judge if unseen data is normal or abnormal}
\todo[inline, color=green!40]{as discussed in motivation, and same as in reference paper (rain autonomous driving) we model our problem as an anomaly detection problem where we define that good quality sensor data is normal data and degraded sensor data (in our case due to dust/smoke) is defined as an anomaly. this allows us to quantify the degradation of data by using the anomaly detection method to check how likely new data is an anomaly}
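To make the degradation quantification idea concrete, the following minimal sketch (a hypothetical illustration with assumed per-scan features and helper names, not the method implemented in this work) fits a Gaussian model to features extracted from clean scans and uses the Mahalanobis distance of new scans to that model as an anomaly-based degradation score:
\begin{verbatim}
import numpy as np

def fit_gaussian(clean_features):
    """Fit a Gaussian model of 'normal' (clean-sensor) feature vectors."""
    mu = clean_features.mean(axis=0)
    cov = np.cov(clean_features, rowvar=False)
    # Small ridge term keeps the covariance invertible.
    return mu, np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))

def degradation_score(features, mu, cov_inv):
    """Mahalanobis distance to the clean-data model; higher = more degraded."""
    diff = features - mu
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

# Hypothetical per-scan features, e.g. (returned point count, mean intensity).
rng = np.random.default_rng(1)
clean = rng.normal([1000.0, 0.8], [50.0, 0.05], size=(500, 2))
mu, cov_inv = fit_gaussian(clean)
new_scans = np.array([[1010.0, 0.79],   # clean-looking scan
                      [400.0, 0.30]])   # heavily degraded scan (dust/smoke)
print(degradation_score(new_scans, mu, cov_inv))
\end{verbatim}
The score grows with the distance of a scan's features from the clean-data distribution, so it can be reported directly as a degradation measure or thresholded into a normal/degraded decision.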
\iffalse
Anomaly detection algorithms are designed to detect or quantify the likelihood of a pattern in data deviating significantly from a well-defined expected norm. Deviations such as these are classified as anomalies or outliers and often signify critical or actionable information.
\begin{figure}
\begin{center}


@@ -1,11 +1,45 @@
@article{anomaly_detection_survey,
  author     = {Chandola, Varun and Banerjee, Arindam and Kumar, Vipin},
  title      = {Anomaly detection: A survey},
  journal    = {ACM Comput. Surv.},
  year       = {2009},
  issue_date = {July 2009},
  month      = jul,
  publisher  = {Association for Computing Machinery},
  address    = {New York, NY, USA},
  volume     = {41},
  number     = {3},
  articleno  = {15},
  numpages   = {58},
  pages      = {15:1--15:58},
  issn       = {0360-0300},
  doi        = {10.1145/1541880.1541882},
  url        = {https://doi.org/10.1145/1541880.1541882},
  keywords   = {outlier detection, Anomaly detection},
  abstract   = {Anomaly detection is an important problem that has been researched
                within diverse research areas and application domains. Many anomaly
                detection techniques have been specifically developed for certain
                application domains, while others are more generic. This survey
                tries to provide a structured and comprehensive overview of the
                research on anomaly detection. We have grouped existing techniques
                into different categories based on the underlying approach adopted
                by each technique. For each category we have identified key
                assumptions, which are used by the techniques to differentiate
                between normal and anomalous behavior. When applying a given
                technique to a particular domain, these assumptions can be used as
                guidelines to assess the effectiveness of the technique in that
                domain. For each category, we provide a basic anomaly detection
                technique, and then show how the different existing techniques in
                that category are variants of the basic technique. This template
                provides an easier and more succinct understanding of the
                techniques belonging to each category. Further, for each category,
                we identify the advantages and disadvantages of the techniques in
                that category. We also provide a discussion on the computational
                complexity of the techniques since it is an important issue in real
                application domains. We hope that this survey will provide a better
                understanding of the different directions in which research has
                been done on this topic, and how techniques developed in one area
                can be applied in domains for which they were not intended to begin
                with.},
}
@dataset{alexander_kyuroson_2023_7913307,
author = {Alexander Kyuroson and Niklas Dahlquist and Nikolaos Stathoulopoulos
@@ -114,62 +148,120 @@
detection of adversarial examples of GTSRB stop signs.},
},
@inproceedings{anomaly_detection_medical,
  author    = {Wei, Qi and Ren, Yinhao and Hou, Rui and Shi, Bibo and
               Lo, Joseph Y. and Carin, Lawrence},
  title     = {Anomaly detection for medical images based on a one-class
               classification},
  booktitle = {Medical Imaging 2018: Computer-Aided Diagnosis},
  editor    = {Petrick, Nicholas and Mori, Kensaku},
  series    = {Society of Photo-Optical Instrumentation Engineers (SPIE)
               Conference Series},
  volume    = {10575},
  year      = {2018},
  month     = feb,
  eid       = {105751M},
  pages     = {105751M},
  doi       = {10.1117/12.2293408},
  adsurl    = {https://ui.adsabs.harvard.edu/abs/2018SPIE10575E..1MW},
},
@article{anomaly_detection_defi,
  author   = {Ul Hassan, Muneeb and Rehmani, Mubashir Husain and Chen, Jinjun},
  title    = {Anomaly Detection in Blockchain Networks: A Comprehensive Survey},
  journal  = {IEEE Communications Surveys \& Tutorials},
  year     = {2023},
  volume   = {25},
  number   = {1},
  pages    = {289--318},
  doi      = {10.1109/COMST.2022.3205643},
  keywords = {Blockchains;Anomaly detection;Security;Smart
              contracts;Privacy;Bitcoin;Tutorials;Blockchain;anomaly
              detection;fraud detection},
},
@article{anomaly_detection_manufacturing,
  author         = {Oh, Dong Yul and Yun, Il Dong},
  title          = {Residual Error Based Anomaly Detection Using Auto-Encoder in
                    SMD Machine Sound},
  journal        = {Sensors},
  year           = {2018},
  volume         = {18},
  number         = {5},
  article-number = {1308},
  issn           = {1424-8220},
  doi            = {10.3390/s18051308},
  url            = {https://www.mdpi.com/1424-8220/18/5/1308},
  pubmedid       = {29695084},
  abstract       = {Detecting an anomaly or an abnormal situation from given noise is
                    highly useful in an environment where constantly verifying and
                    monitoring a machine is required. As deep learning algorithms are
                    further developed, current studies have focused on this problem.
                    However, there are too many variables to define anomalies, and the
                    human annotation for a large collection of abnormal data labeled at
                    the class-level is very labor-intensive. In this paper, we propose
                    to detect abnormal operation sounds or outliers in a very complex
                    machine along with reducing the data-driven annotation cost. The
                    architecture of the proposed model is based on an auto-encoder, and
                    it uses the residual error, which stands for its reconstruction
                    quality, to identify the anomaly. We assess our model using
                    Surface-Mounted Device (SMD) machine sound, which is very complex,
                    as experimental data, and state-of-the-art performance is
                    successfully achieved for anomaly detection.},
},
@article{anomaly_detection_history,
  author    = {Edgeworth, Francis Ysidro},
  title     = {XLI. On discordant observations},
  journal   = {The London, Edinburgh, and Dublin Philosophical Magazine and
               Journal of Science},
  year      = {1887},
  volume    = {23},
  number    = {143},
  pages     = {364--375},
  publisher = {Taylor \& Francis},
  doi       = {10.1080/14786448708628471},
  url       = {https://doi.org/10.1080/14786448708628471},
},
@inproceedings{degradation_quantification_rain,
  author    = {Zhang, Chen and Huang, Zefan and Ang, Marcelo H. and Rus, Daniela},
  title     = {LiDAR Degradation Quantification for Autonomous Driving in Rain},
  booktitle = {2021 IEEE/RSJ International Conference on Intelligent Robots and
               Systems (IROS)},
  year      = {2021},
  pages     = {3458--3464},
  doi       = {10.1109/IROS51168.2021.9636694},
  keywords  = {Degradation;Location awareness;Laser radar;Rain;Codes;System
               performance;Current measurement},
},
@article{deep_learning_overview,
  author   = {Schmidhuber, J{\"u}rgen},
  title    = {Deep learning in neural networks: An overview},
  journal  = {Neural Networks},
  year     = {2015},
  volume   = {61},
  pages    = {85--117},
  issn     = {0893-6080},
  doi      = {10.1016/j.neunet.2014.09.003},
  url      = {https://www.sciencedirect.com/science/article/pii/S0893608014002135},
  keywords = {Deep learning, Supervised learning, Unsupervised learning,
              Reinforcement learning, Evolutionary computation},
  abstract = {In recent years, deep artificial neural networks (including
              recurrent ones) have won numerous contests in pattern recognition
              and machine learning. This historical survey compactly summarizes
              relevant work, much of it from the previous millennium. Shallow and
              Deep Learners are distinguished by the depth of their credit
              assignment paths, which are chains of possibly learnable, causal
              links between actions and effects. I review deep supervised
              learning (also recapitulating the history of backpropagation),
              unsupervised learning, reinforcement learning \& evolutionary
              computation, and indirect search for short programs encoding deep
              and large networks.},
},
@article{autoencoder_survey,
  title   = {A comprehensive survey on design and application of autoencoder in
             deep learning},
  journal = {Applied Soft Computing},
  volume  = {138},
  pages   = {110176},
@@ -178,6 +270,22 @@ issn = {1568-4946},
  doi      = {10.1016/j.asoc.2023.110176},
  url      = {https://www.sciencedirect.com/science/article/pii/S1568494623001941},
  author   = {Pengzhi Li and Yan Pei and Jianqiang Li},
  keywords = {Deep learning, Autoencoder, Unsupervised learning, Feature
              extraction, Autoencoder application},
  abstract = {Autoencoder is an unsupervised learning model, which can
              automatically learn data features from a large number of samples
              and can act as a dimensionality reduction method. With the
              development of deep learning technology, autoencoder has attracted
              the attention of many scholars. Researchers have proposed several
              improved versions of autoencoder based on different application
              fields. First, this paper explains the principle of a conventional
              autoencoder and investigates the primary development process of an
              autoencoder. Second, We proposed a taxonomy of autoencoders
              according to their structures and principles. The related
              autoencoder models are comprehensively analyzed and discussed. This
              paper introduces the application progress of autoencoders in
              different fields, such as image classification and natural language
              processing, etc. Finally, the shortcomings of the current
              autoencoder algorithm are summarized, and prospected for its future
              development directions are addressed.},
}