deepsad added figure, started with algo details

Jan Kowalczyk
2025-03-12 15:41:20 +01:00
parent 08b25879f6
commit e02c1cedb3
3 changed files with 2922 additions and 7 deletions


@@ -253,7 +253,7 @@
%In this chapter we explore the method \emph{Deep Semi-Supervised Anomaly Detection}~\cite{deepsad} which we employed during our experiments to quantify the degradation of lidar scans caused by artificially introduced water vapor from a theater smoke machine. The same approach of modeling a degradation quantification problem as an anomaly detection task was successfully used in \cite{degradation_quantification_rain} to quantify the degradation caused to lidar scans by bad weather conditions such as rain, fog and snow for autonomous driving tasks. Deep SAD is characterized by being a deep-learning approach to anomaly detection, which enables it to learn more complex anomalous data patterns than more classic statistical approaches, and by its capability of employing hand-labeled data samples (both normal and anomalous) during its training step to better teach the model to differentiate between known anomalies and normal data than if only an unsupervised approach was used, which essentially just learns the most common patterns in the implicitly more common normal data and differentiates anything from that.
In this chapter, we explore the method \emph{Deep Semi-Supervised Anomaly Detection} (Deep SAD)~\cite{deepsad}, which we employ to quantify the degradation of LiDAR scans caused by airborne particles in the form of artificially introduced water vapor from a theater smoke machine. A similar approach—modeling degradation quantification as an anomaly detection task—was successfully applied in \cite{degradation_quantification_rain} to assess the impact of adverse weather conditions on LiDAR data for autonomous driving applications. Deep SAD leverages deep learning to capture complex anomalous patterns that classical statistical methods might miss. Furthermore, by incorporating a limited amount of hand-labeled data (both normal and anomalous), it can more effectively differentiate between known anomalies and normal data compared to purely unsupervised methods, which typically learn only the most prevalent patterns in the dataset~\cite{deepsad}.
%Deep Semi-Supervised Anomaly Detection~\cite{deepsad} is a deep-learning based anomaly detection method whose performance with regard to sensor degradation quantification we explore in this thesis. It is a semi-supervised method which allows the introduction of manually labeled samples in addition to the unlabeled training data to improve the algorithm's performance over its unsupervised predecessor Deep One-Class Classification~\cite{deepsvdd}. The working principle of the method is to encode the input data onto a latent space and train the network to cluster normal data close together while anomalies get mapped further away in that latent space.
@@ -263,24 +263,30 @@ In this chapter, we explore the method \emph{Deep Semi-Supervised Anomaly Detect
%Deep SAD is a typical clustering based anomaly detection technique which is described in \cite{anomaly_detection_survey} to generally have a two step approach to anomaly detection. First a clustering algorithm is used to cluster data closely together around a centroid and secondly the distance from data to that centroid is calculated and interpreted as an anomaly score. This general idea can also be found in the definition of the Deep SAD algorithm, which uses the encoder part of an autoencoder architecture which is trained to cluster data around a centroid in the latent space of its output. The data's geometric distance to that centroid in the latent space is defined as an anomaly score. Deep SAD is a semi-supervised training based method which can work completely unsupervised (no labeled data available), in which case it falls back to its predecessor method Deep SVDD, but additionally allows the introduction of labeled data samples during training to more accurately map known normal samples near the centroid and known anomalous samples further away from it.
Deep SAD is an anomaly detection algorithm that belongs to the category of clustering-based methods, which according to~\cite{anomaly_detection_survey} typically follow a two-step approach. First, a clustering algorithm groups data points around a centroid; then, the distances of individual data points from this centroid are calculated and used as an anomaly score. In Deep SAD, this concept is implemented by employing the encoder part of an autoencoder architecture, which is jointly trained to map data into a latent space and to minimize the volume of a data-encompassing hypersphere whose center is the aforementioned centroid. The geometric distance of a sample's latent representation to the hypersphere center is used as the anomaly score: a higher score corresponds to a higher probability of the sample being anomalous, since normal samples cluster more closely around the hypersphere center than anomalies. This general working principle is depicted in figure~\ref{fig:deep_svdd_transformation}.
\fig{deep_svdd_transformation}{figures/deep_svdd_transformation}{Deep SAD trains a neural network to transform data into a latent space and to minimize the volume of a data-encompassing hypersphere centered around a predetermined centroid $\textbf{c}$. \\Reproduced from~\cite{deepsvdd}.}
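Expressed in the notation of~\cite{deepsad}, where $\phi(\cdot\,;\mathcal{W})$ denotes the encoder network with weights $\mathcal{W}$ and $\mathbf{c}$ the fixed hypersphere center, this anomaly score takes the form
\begin{equation}
	s(x) = \left\lVert \phi(x;\mathcal{W}) - \mathbf{c} \right\rVert^2,
\end{equation}
i.e., the squared Euclidean distance of a sample's latent representation to the center.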
%As a pre-training step an autoencoder architecture is trained and its weights are used to initialize its encoder part before training of the method itself begins. \citeauthor{deepsad} argue in~\cite{deepsad} that this pre-training step, which was already present in~\cite{deepsvdd}, allows them to not only interpret the method in geometric terms as minimum volume estimation but also in probabilistic terms as entropy minimization over the latent distribution, since the autoencoding objective implicitly maximizes the mutual information between the data and its latent space representation. This insight, that the method follows the Infomax principle with the additional objective of the latent distribution having minimal entropy, allowed \citeauthor{deepsad} to introduce an additional term in Deep SAD's objective over Deep SVDD's, which incorporates labeled data to better model the nature of normal and anomalous data. They show that Deep SAD's objective can be interpreted as the normal data's distribution in the latent space being modeled to have low entropy and the anomalous data's distribution in that latent space being modeled as having high entropy, which they argue captures the nature of the difference between normal and anomalous data by interpreting anomalies ``as being generated from an infinite mixture of distributions that are different from normal data distribution''~\cite{deepsad}.
As a pre-training step, an autoencoder is trained and its encoder weights are used to initialize the model before beginning the main training phase. \citeauthor{deepsad} argue in \cite{deepsad} that this pre-training step, originally introduced in \cite{deepsvdd}, not only allows a geometric interpretation of the method as minimum volume estimation but also a probabilistic one as entropy minimization over the latent distribution. The autoencoding objective implicitly maximizes the mutual information between the data and its latent representation, aligning the approach with the Infomax principle while encouraging a latent space with minimal entropy. This insight enabled \citeauthor{deepsad} to introduce an additional term in Deep SAD's objective, beyond that of its predecessor Deep SVDD~\cite{deepsvdd}, which incorporates labeled data to better capture the characteristics of normal and anomalous data. They demonstrate that Deep SAD's objective effectively models the latent distribution of normal data as having low entropy, while that of anomalous data is characterized by higher entropy. In this framework, anomalies are interpreted as being generated from an infinite mixture of distributions that differ from the normal data distribution.
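For concreteness, this pre-training can be written as minimizing a standard reconstruction loss; the mean squared error formulation below is one common choice rather than a requirement of the method, and the decoder $\psi$ with weights $\mathcal{W}'$ is introduced here only for illustration, since it is discarded after pre-training:
\begin{equation}
	\mathcal{L}_{\mathrm{AE}} = \frac{1}{n} \sum_{i=1}^{n} \left\lVert x_i - \psi\!\left(\phi(x_i;\mathcal{W});\mathcal{W}'\right) \right\rVert^2 .
\end{equation}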
The introduction of the aforementioned term in Deep SAD's objective makes its training semi-supervised, though the method can still operate in a fully unsupervised mode, effectively reverting to its predecessor Deep SVDD~\cite{deepsvdd}, when no labeled data are available. When labeled samples are available during training, this additional supervision helps the model position known normal samples near the hypersphere center and push known anomalies farther away, thereby enhancing its ability to differentiate between normal and anomalous data.
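For reference, the unsupervised objective that Deep SAD falls back to in this case is the One-Class Deep SVDD objective of~\cite{deepsvdd}, stated here in the notation introduced above, with $L$ network layers, layer weights $\mathbf{W}^{\ell}$, and weight decay hyperparameter $\lambda$:
\begin{equation}
	\min_{\mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \left\lVert \phi(x_i;\mathcal{W}) - \mathbf{c} \right\rVert^2 + \frac{\lambda}{2} \sum_{\ell=1}^{L} \left\lVert \mathbf{W}^{\ell} \right\rVert_F^2 .
\end{equation}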
\newsection{algorithm_details}{Algorithm Details and Hyperparameters}
\todo[inline]{backpropagation optimization formula, hyperparameters explanation}
The neural network architecture of Deep SAD is not fixed but depends on the type of data the algorithm is supposed to operate on. This is due to the way it employs an autoencoder for pre-training and the encoder part of that network for its main training step. An autoencoder architecture suitable for the specific application must therefore be chosen, which at the same time allows flexibility in picking a fitting architecture depending on the application's requirements. For this reason, the specific architecture employed may be considered a hyperparameter of the Deep SAD algorithm. During the pre-training step, as is typical for autoencoders, no labels are necessary, since the optimization objective of an autoencoder is generally to reproduce its input, as the architecture's name indicates.
\todo[inline, color=green!40]{Core idea of the algorithm is to learn a transformation that maps input data into a latent space where normal data clusters close together and anomalous data gets mapped further away. To achieve this, the method first includes a pre-training step of an auto-encoder to extract the most relevant information, second it fixes a hypersphere center in the auto-encoder's latent space as a target point for normal data, and third it trains the network to map normal data closer to that hypersphere center. Fourth, the resulting network can map new data into this latent space and interpret its distance from the hypersphere center as an anomaly score which is larger the more anomalous the datapoint is.}
\todo[inline, color=green!40]{Explanation pre-training step: the architecture of the autoencoder is dependent on the input data shape, but any data shape is generally permissible. For the autoencoder we do not need any labels since the optimization target is always the input itself. The latent space dimensionality can be chosen based on the input data's complexity (search citations). Generally a higher-dimensional latent space has more learning capacity but tends to overfit more easily (find cite). The pre-training step is used to find weights for the encoder which generally extract robust and critical information from the input because TODO read deepsad paper (cite deepsad). As training data, typically all data (normal and anomalous) is used during this step.}
\todo[inline, color=green!40]{Explanation hypersphere center step: an additional positive ramification of the pre-training is that the mean of all latent representations produced during pre-training can be used as the hypersphere center around which normal data is supposed to cluster. This is advantageous because it allows the main training to converge faster than choosing a random point in the latent space as hypersphere center. From this point onward the center $\mathbf{c}$ is fixed for the main training and inference and does not change anymore.}
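Following this initialization strategy, and writing $\mathcal{W}_0$ for the pre-trained encoder weights (a label introduced here for clarity), the center would be set to the mean latent representation of the $N$ training samples:
\begin{equation}
	\mathbf{c} = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i;\mathcal{W}_0) .
\end{equation}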
\todo[inline, color=green!40]{Explanation training step: during the main training step the method starts with the pre-trained weights of the encoder but removes the decoder from the architecture, since it optimizes the output in the latent space and does not need to reproduce the input data format. It does so by minimizing the geometric distance of each input's latent space representation to the previously defined hypersphere center $\mathbf{c}$. Due to normal data being more common in the inputs, this results in normal data clustering closely around $\mathbf{c}$ and anomalous data being pushed away from it. Additionally, during this step the labeled data is used to more correctly map normal and anomalous data.}
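The way labeled samples enter this training step can be made explicit. In the objective of~\cite{deepsad}, a labeled sample $\tilde{x}_j$ with label $\tilde{y}_j \in \{-1,+1\}$ ($+1$ for known normal, $-1$ for known anomalous) contributes a term proportional to
\begin{equation}
	\left( \left\lVert \phi(\tilde{x}_j;\mathcal{W}) - \mathbf{c} \right\rVert^2 \right)^{\tilde{y}_j},
\end{equation}
so that the latent distance to $\mathbf{c}$ is minimized for known normal samples, while for known anomalies the inverse of the distance is minimized, pushing them away from the center.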
\todo[inline, color=green!40]{Explanation inference step: with the trained network we can transform new input data into the latent space and calculate its distance from the hypersphere center, which will be smaller the more confident the network is in the data being normal and larger the more likely the data is anomalous. This output score is a continuous value dependent on multiple factors like the latent space dimensionality, encoder architecture and ??? and has to be interpreted further to be used (for example via thresholding).}
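As a minimal sketch of such an interpretation, a binary decision could be obtained by thresholding the score, where the threshold $\tau$ is a hypothetical, application-dependent parameter rather than part of the method itself:
\begin{equation}
	\hat{y}(x) =
	\begin{cases}
		\text{anomalous} & \text{if } s(x) > \tau, \\
		\text{normal} & \text{otherwise.}
	\end{cases}
\end{equation}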
\todo[inline, color=green!40]{In formula X we see the optimization target of the algorithm. Explain in one paragraph the variables in the optimization formula.}
\todo[inline, color=green!40]{Explain the three terms (unlabeled, labeled, regularization).}
\begin{equation}