general rework and start preprocessing common thread

This commit is contained in:
Jan Kowalczyk
2025-05-20 14:26:39 +02:00
parent 341285a10a
commit 98d95edad6


@@ -70,7 +70,7 @@
%\usepackage[disable]{todonotes}
\DeclareRobustCommand{\threadtodo}[4]{%
\todo[inline,
\todo[disable,
backgroundcolor=red!20,
bordercolor=red!50,
textcolor=black!80,
@@ -279,9 +279,9 @@ This thesis tackles a broad, interdisciplinary challenge at the intersection of
Because anomalies are, by nature, often unpredictable in form and structure, unsupervised learning methods are widely used since they do not require pre-assigned labels—a significant advantage when dealing with unforeseen data patterns. However, these methods can be further refined through the integration of a small amount of labeled data, giving rise to semi-supervised approaches. The method evaluated in this thesis, DeepSAD, is a semi-supervised deep learning approach that also leverages an autoencoder architecture in its design. Autoencoders have gained widespread adoption in deep learning for their ability to extract features from unlabeled data, which is particularly useful for handling complex data types such as LiDAR scans.
LiDAR sensors function by projecting lasers in multiple directions near-simultaneously, measuring the time it takes for each reflected ray to return. Using the angles and travel times, the sensor constructs a point cloud that is often accurate enough to map the sensor's surroundings. In the following sections, we will delve into these technologies, review how they work, how they are generally used and describe how they are employed in this thesis. We will also explore research from these backgrounds related to our thesis.
LiDAR sensors function by projecting lasers in multiple directions near-simultaneously, measuring the time it takes for each reflected ray to return. Using the angles and travel times, the sensor constructs a point cloud that is often accurate enough to map the sensor's surroundings. In the following sections, we will delve into these technologies, review how they work and how they are generally used, describe how we employ them in this thesis, and explore related work from these backgrounds.
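To make the sensing geometry concrete, the following is a minimal, illustrative sketch (not the sensor's actual firmware pipeline) of how a single round-trip time-of-flight measurement and the beam's angles yield one 3D point:
\begin{verbatim}
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_to_point(t, azimuth, elevation):
    """Convert round-trip time-of-flight (s) and beam angles (rad)
    to an XYZ point in the sensor frame."""
    r = SPEED_OF_LIGHT * t / 2.0  # the ray travels out and back
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return (x, y, z)

# Example: a return after ~66.7 ns corresponds to a range of ~10 m.
print(tof_to_point(66.7e-9, math.radians(30), math.radians(5)))
\end{verbatim}
Repeating this for every beam direction yields the point cloud described above.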
\todo[inline, color=green!40]{mention related work + transition to anomaly detection}
%\todo[inline, color=green!40]{mention related work + transition to anomaly detection}
\newsection{anomaly_detection}{Anomaly Detection}
@@ -316,13 +316,13 @@ By their very nature anomalies are rare occurences and oftentimes unpredictable
\end{enumerate}
In this thesis we used an anomaly detection method, namely Deep Semi-Supervised Anomaly Detection~\cite{deepsad} to model our problem -how to quantify the degradation of lidar sensor data- as an anomaly detection problem. We do this by classifying good quality data as normal and degraded data as anomalous and rely on a method which can express each samples likelihood of being anomalous as an analog anomaly score, which enables us to interpret it as the datas degradation quantification value.
In this thesis we use an anomaly detection method, namely \citetitle{deepsad}~\cite{deepsad}, to model our problem (how to quantify the degradation of lidar sensor data) as an anomaly detection problem. We do this by classifying good-quality data as normal and degraded data as anomalous, and rely on a method which can express each sample's likelihood of being anomalous as an analog anomaly score, which enables us to interpret that score as a quantification of the data's degradation.
Chapter~\ref{chp:deepsad} describes DeepSAD in more detail and shows that it is a clustering-based approach with a spectral pre-processing component, in that it uses a neural network to reduce the input's dimensionality while simultaneously clustering normal data closely around a given centroid. It then produces an anomaly score by calculating the geometric distance between a data sample and the aforementioned cluster centroid, assuming the distance is shorter for normal than for anomalous data. Since our data is high-dimensional, it makes sense to use a spectral method to reduce its dimensionality; moreover, an approach which yields an analog value rather than a binary classification is useful for our use-case, since we want to quantify, not merely classify, the data degradation.
%\todo[inline, color=green!40]{data availability leading into semi-supervised learning algorithms}
There is a wide array of problems in domains similar to the one we research in this paper, for which modeling them as anomaly detection problems has been proven successful. The degradation of pointclouds, produced by an industrial 3D sensor, has been modeled as an anomaly detection task in~\cite{bg_ad_pointclouds_scans}. \citeauthor{bg_ad_pointclouds_scans} propose a student-teacher model capable of infering a pointwise anomaly score for degradation in point clouds. The teacher network is trained on an anomaly-free dataset to extract dense features of the point clouds' local geometries, after which an identical student network is trained to emulate the teacher networks' outputs. For degraded pointclouds the regression between the teacher's and student's outputs is calculated and interpreted as the anomaly score, with the rationalization that the student network has not observed features produced by anomalous geometries during training, leaving it incapable of producing a similar output as the teacher for those regions. Another example would be~\cite{bg_ad_pointclouds_poles}, which proposes a method to detect and classify pole-like objects in urban point cloud data, to differentiate between natural and man-made objects such as street signs, for autonomous driving purposes. An anomaly detection method was used to identify the vertical pole-like objects in the point clouds and then the preprocessed objects were grouped by similarity using a clustering algorithm to then classify them as either trees or man-made poles.
There is a wide array of problems in domains similar to the one we research in this thesis, for which modeling them as anomaly detection problems has proven successful. The degradation of point clouds produced by an industrial 3D sensor has been modeled as an anomaly detection task in \citetitle{bg_ad_pointclouds_scans}~\cite{bg_ad_pointclouds_scans}. \citeauthor{bg_ad_pointclouds_scans} propose a student-teacher model capable of inferring a pointwise anomaly score for degradation in point clouds. The teacher network is trained on an anomaly-free dataset to extract dense features of the point clouds' local geometries, after which an identical student network is trained to emulate the teacher network's outputs. For degraded point clouds, the regression error between the teacher's and student's outputs is calculated and interpreted as the anomaly score, with the rationale that the student network has not observed features produced by anomalous geometries during training, leaving it incapable of producing an output similar to the teacher's for those regions. Another example is \citetitle{bg_ad_pointclouds_poles}~\cite{bg_ad_pointclouds_poles}, which proposes a method to detect and classify pole-like objects in urban point cloud data for autonomous driving purposes, differentiating between natural and man-made objects such as street signs. An anomaly detection method was used to identify the vertical pole-like objects in the point clouds, and the preprocessed objects were then grouped by similarity using a clustering algorithm to classify them as either trees or man-made poles.
As briefly mentioned at the beginning of this section, anomaly detection methods and their usage are oftentimes challenged by the limited availability of anomalous data, owing to the very nature of anomalies as rare occurrences. Oftentimes the intended use-case is even to find unknown anomalies in a given dataset which have not yet been identified. In addition, it can be challenging to classify anomalies correctly for complex data, since the very definition of an anomaly depends on many factors such as the type of data, the intended use-case or even how the data evolves over time. For these reasons most types of anomaly detection approaches limit their reliance on anomalous data during training, and many of them do not differentiate between normal and anomalous data at all. DeepSAD is a semi-supervised method, which is characterized by using a mixture of labeled and unlabeled data.
@@ -415,7 +415,7 @@ Autoencoders are a type of neural network architecture, whose main goal is learn
One key use case of autoencoders is to employ them as a dimensionality reduction technique. In that case, the latent space between the encoder and decoder is of a lower dimensionality than the input data itself. Due to the aforementioned reconstruction goal, the shared information between the input data and its latent space representation is maximized, which is known as following the infomax principle. After training such an autoencoder, it may be used to generate lower-dimensional representations of the given data type, enabling more performant computations which may have been infeasible on the original data. DeepSAD uses an autoencoder in a pre-training step to achieve this goal, among others.
Autoencoders have been shown to be useful in the anomaly detection domain by assuming that autoencoders trained on more normal than anomalous data are better at reconstructing normal behaviour than anomalous one. This assumption allows methods to utilize the reconstruction error as an anomaly score. Examples for this are the outlier detection method in~\cite{bg_autoencoder_ad} or the anomaly detection method in~\cite{bg_autoencoder_ad_2} which both employ an autoencoder and the aforementioned assumption. Autoencoders have also been shown to be a suitable dimensionality reduction technique for lidar data, which is oftentimes high-dimensional and sparse, making feature extraction and dimensionality reduction popular preprocessing steps. As an example,~\cite{bg_autoencoder_lidar} shows the feasibility and advantages of using an autoencoder architecture to reduce lidar-orthophoto fused feature's dimensionality for their building detection method, which can recognize buildings in visual data taken from an airplane. Similarly, we can make use of the dimensionality reduction in DeepSAD's pre-training step, since our method is intended to work with high-dimensional lidar data.
Autoencoders have been shown to be useful in the anomaly detection domain by assuming that autoencoders trained on more normal than anomalous data are better at reconstructing normal behaviour than anomalous behaviour. This assumption allows methods to utilize the reconstruction error as an anomaly score. Examples of this are the method in \citetitle{bg_autoencoder_ad}~\cite{bg_autoencoder_ad} or the one in \citetitle{bg_autoencoder_ad_2}~\cite{bg_autoencoder_ad_2}, which both employ an autoencoder and the aforementioned assumption. Autoencoders have also been shown to be a suitable dimensionality reduction technique for lidar data, which is oftentimes high-dimensional and sparse, making feature extraction and dimensionality reduction popular preprocessing steps. As an example, \citetitle{bg_autoencoder_lidar}~\cite{bg_autoencoder_lidar} shows the feasibility and advantages of using an autoencoder architecture to reduce the dimensionality of fused lidar-orthophoto features for their building detection method, which can recognize buildings in visual data taken from an airplane. Similarly, we can make use of the dimensionality reduction in DeepSAD's pre-training step, since our method is intended to work with high-dimensional lidar data.
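To illustrate this reconstruction-error scoring, a minimal PyTorch sketch follows; the network shape and the \texttt{anomaly\_score} helper are our own illustrative choices, not the architectures of the cited methods:
\begin{verbatim}
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    def __init__(self, in_dim=64, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_score(model, x):
    # Trained mostly on normal data, the model reconstructs normal
    # samples well; a high per-sample reconstruction error therefore
    # serves as the anomaly score.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)
\end{verbatim}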
%Another way to employ autoencoders is to use them as a generative technique. The decoder in autoencoders is trained to reproduce the input state from its encoded representation, which can also be interpreted as the decoder being able to generate data of the input type, from an encoded representation. A classic autoencoder trains the encoder to map its input to a single point in the latent space-a distriminative modeling approach, which can succesfully learn a predictor given enough data. In generative modeling on the other hand, the goal is to learn the distribution the data originates from, which is the idea behind variational autoencoders (VAE). VAEs have the encoder produce an distribution instead of a point representation, samples from which are then fed to the decoder to reconstruct the original input. The result is the encoder learning to model the generative distribution of the input data, which enables new usecases, due to the latent representation
@@ -443,11 +443,11 @@ LiDARs high accuracy, long range, and full-circle field of view make it indis
In subterranean and rescue domain scenarios, the dominant challenge is airborne particles: dust kicked up by debris or smoke from fires. These aerosols create early returns that can mask real obstacles and cause missing data behind particle clouds, undermining SLAM and perception algorithms designed for cleaner data. This degradation is a type of atmospheric scattering, which can be caused by any kind of airborne particulates (e.g., snowflakes) or liquids (e.g., water droplets). Other kinds of environmental noise exist as well, such as specular reflections caused by smooth surfaces, beam occlusion due to close objects blocking the sensor's field of view, or thermal drift, i.e., temperature changes affecting the sensor's circuits and mechanics and introducing biases in the measurements.
All of these may create unwanted noise in the point cloud created by the lidar, making this domain an important research topic. \cite{lidar_denoising_survey} gives an overview about the current state of research into denoising methods for lidar in adverse environments, categorizes them according to their approach (distance-, intensity- or learning-based) and concludes that all approaches have merits but also open challenges to solve, for autonomous systems to safely navigate these adverse environments. The current research is heavily focused on the automotive domain, which can be observed by the vastly higher number of methods filtering noise from adverse weather effects-environmental scattering from rain, snow and fog-than from dust, smoke or other particles occuring rarely in the automotive domain.
All of these may create unwanted noise in the point cloud produced by the lidar, making this domain an important research topic. \citetitle{lidar_denoising_survey}~\cite{lidar_denoising_survey} gives an overview of the current state of research into denoising methods for lidar in adverse environments, categorizes them according to their approach (distance-, intensity- or learning-based) and concludes that all approaches have merits but also open challenges to solve before autonomous systems can safely navigate these adverse environments. The current research is heavily focused on the automotive domain, which can be observed in the vastly higher number of methods filtering noise from adverse weather effects (environmental scattering from rain, snow and fog) than from dust, smoke or other particles that occur rarely in the automotive domain.
A learning-based method to filter dust-caused degradation from lidar is introduced in~\cite{lidar_denoising_dust}. The authors employ a convultional neural network to classify dust particles in lidar point clouds as such, enabling the filtering of those points and compare their methods to more conservative approaches, such as various outlier removal algorithms. Another relevant example would be the filtering method proposed in~\cite{lidar_subt_dust_removal}, which enables the filtration of pointclouds degraded by smoke or dust in subterranean environments, with a focus on the search and rescue domain. To achieve this, they formulated a filtration framework that relies on dynamic onboard statistical cluster outlier removal, to classify and remove dust particles in point clouds.
A learning-based method to filter dust-caused degradation from lidar is introduced in \citetitle{lidar_denoising_dust}~\cite{lidar_denoising_dust}. The authors employ a convolutional neural network to classify dust particles in lidar point clouds as such, enabling the filtering of those points, and compare their method to more conservative approaches, such as various outlier removal algorithms. Another relevant example is the filtering method proposed in \citetitle{lidar_subt_dust_removal}~\cite{lidar_subt_dust_removal}, which enables the filtration of point clouds degraded by smoke or dust in subterranean environments, with a focus on the search and rescue domain. To achieve this, the authors formulated a filtration framework that relies on dynamic onboard statistical cluster outlier removal to classify and remove dust particles in point clouds.
Our method does not aim to remove the noise or degraded points in the lidar data, but quantify its degradation to inform other systems of the autonomous robot about the data's quality, enabling more informed decisions. One such approach, though from the autonomous driving and not from the search and rescue domain can be found in~\cite{degradation_quantification_rain}. A learning-based method to quantify the lidar's sensor data degradation caused by adverse weather-effects was proposed, implemented by posing the problem as an anomaly detection task and utilizing DeepSAD to learn degraded data to be an anomaly and high quality data to be normal behaviour. DeepSAD's anomaly score was used as the degradation's quantification score. From this example we decided to imitate this method and adapt it for the search and rescue domain, although this proved challenging due to the more limited data availability. Since it was effective for the closely related~\cite{degradation_quantification_rain}, we also employed DeepSAD, whose detailed workings we present in the following chapter.
Our method does not aim to remove the noise or degraded points in the lidar data, but to quantify its degradation in order to inform other systems of the autonomous robot about the data's quality, enabling more informed decisions. One such approach, though from the autonomous driving rather than the search and rescue domain, can be found in \citetitle{degradation_quantification_rain}~\cite{degradation_quantification_rain}. The authors propose a learning-based method to quantify the lidar sensor's data degradation caused by adverse weather effects, implemented by posing the problem as an anomaly detection task and utilizing DeepSAD to learn that degraded data is anomalous and high-quality data is normal behaviour. DeepSAD's anomaly score was used as the degradation's quantification score. We decided to imitate this method and adapt it for the search and rescue domain, although this proved challenging due to the more limited data availability. Since it was effective for the closely related \citetitle{degradation_quantification_rain}~\cite{degradation_quantification_rain}, we also employed DeepSAD, whose detailed workings we present in the following chapter.
%\todo[inline]{related work, survey on lidar denoising, noise removal in subt - quantifying same as us in rain, also used deepsad - transition}
@@ -476,7 +476,7 @@ Our method does not aim to remove the noise or degraded points in the lidar data
{explain use-case, similar use-case worked, allude to core features}
{interest/curiosity created $\rightarrow$ wants to learn about DeepSAD}
In this chapter, we explore the method \emph{Deep Semi-Supervised Anomaly Detection} (Deep SAD)~\cite{deepsad}, which we employ to quantify the degradation of LiDAR scans caused by airborne particles in the form of artificially introduced water vapor from a theater smoke machine. A similar approach—modeling degradation quantification as an anomaly detection task—was successfully applied in \cite{degradation_quantification_rain} to assess the impact of adverse weather conditions on LiDAR data for autonomous driving applications. Deep SAD leverages deep learning to capture complex anomalous patterns that classical statistical methods might miss. Furthermore, by incorporating a limited amount of hand-labeled data (both normal and anomalous), it can more effectively differentiate between known anomalies and normal data compared to purely unsupervised methods, which typically learn only the most prevalent patterns in the dataset~\cite{deepsad}.
In this chapter, we explore the method \citetitle{deepsad}~(Deep SAD)~\cite{deepsad}, which we employ to quantify the degradation of LiDAR scans caused by airborne particles in the form of artificially introduced water vapor from a theater smoke machine. A similar approach—modeling degradation quantification as an anomaly detection task—was successfully applied in \citetitle{degradation_quantification_rain}~\cite{degradation_quantification_rain} to assess the impact of adverse weather conditions on LiDAR data for autonomous driving applications. Deep SAD leverages deep learning to capture complex anomalous patterns that classical statistical methods might miss. Furthermore, by incorporating a limited amount of hand-labeled data (both normal and anomalous), it can more effectively differentiate between known anomalies and normal data compared to purely unsupervised methods, which typically learn only the most prevalent patterns in the dataset~\cite{deepsad}.
%Deep Semi-Supervised Anomaly Detection~\cite{deepsad} is a deep-learning based anomaly detection method whose performance in regards to sensor degradation quantification we explore in this thesis. It is a semi-supervised method which allows the introduction of manually labeled samples in addition to the unlabeled training data to improve the algorithm's performance over its unsupervised predecessor Deep One-Class Classification~\cite{deepsvdd}. The working principle of the method is to encode the input data onto a latent space and train the network to cluster normal data close together while anomalies get mapped further away in that latent space.
@@ -492,7 +492,7 @@ In this chapter, we explore the method \emph{Deep Semi-Supervised Anomaly Detect
{how clustering AD generally works, how it does in DeepSAD}
{since the reader knows the general idea $\rightarrow$ what is the step-by-step?}
Deep SAD's overall mechanics are similar to clustering-based anomaly detection methods, which according to~\cite{anomaly_detection_survey} typically follow a two-step approach. First, a clustering algorithm groups data points around a centroid; then, the distances of individual data points from this centroid are calculated and used as an anomaly score. In Deep SAD, these concepts are implemented by employing a neural network, which is jointly trained to map input data onto a latent space and to minimize the volume of an data-encompassing hypersphere, whose center is the aforementioned centroid. The data's geometric distance in the latent space to the hypersphere center is used as the anomaly score, where a larger distance between data and centroid corresponds to a higher probability of a sample being anomalous. This is achieved by shrinking the data-encompassing hypersphere during training, proportionally to all training data, of which is required that there is significantly more normal than anomalous data present. The outcome of this approach is that normal data gets clustered more closely around the centroid, while anomalies appear further away from it as can be seen in the toy example depicted in figure~\ref{fig:deep_svdd_transformation}.
Deep SAD's overall mechanics are similar to clustering-based anomaly detection methods, which according to \citetitle{anomaly_detection_survey}~\cite{anomaly_detection_survey} typically follow a two-step approach. First, a clustering algorithm groups data points around a centroid; then, the distances of individual data points from this centroid are calculated and used as an anomaly score. In Deep SAD, these concepts are implemented by employing a neural network, which is jointly trained to map input data onto a latent space and to minimize the volume of a data-encompassing hypersphere, whose center is the aforementioned centroid. The data's geometric distance in the latent space to the hypersphere center is used as the anomaly score, where a larger distance between data and centroid corresponds to a higher probability of a sample being anomalous. This is achieved by shrinking the data-encompassing hypersphere during training proportionally over all training data, which is required to contain significantly more normal than anomalous samples. The outcome of this approach is that normal data gets clustered more closely around the centroid, while anomalies appear further away from it, as can be seen in the toy example depicted in figure~\ref{fig:deep_svdd_transformation}.
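The generic two-step recipe can be sketched in a few lines of numpy; here the mean of the (mostly normal) training data stands in for the clustering step, which is an illustrative simplification:
\begin{verbatim}
import numpy as np

def fit_centroid(train_data):
    # Step 1: cluster the training data; a single centroid given
    # by the mean is the simplest possible stand-in.
    return train_data.mean(axis=0)

def anomaly_scores(samples, centroid):
    # Step 2: the distance to the centroid is the anomaly score.
    return np.linalg.norm(samples - centroid, axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 2))
centroid = fit_centroid(normal)
print(anomaly_scores(np.array([[8.0, 8.0]]), centroid))  # clearly high
\end{verbatim}
Deep SAD replaces the explicit clustering step with a learned mapping, as described in the following.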
%Deep SAD is an anomaly detection algorithm that belongs to the category of clustering-based methods, which according to~\cite{anomaly_detection_survey} typically follow a two-step approach. First, a clustering algorithm groups data points around a centroid; then, the distances of individual data points from this centroid are calculated and used as an anomaly score. In addition to that, DeepSAD also utilizes a spectral component by mapping the input data onto a lower-dimensional space, which enables it to detect anomalies in high-dimensional complex data types. In Deep SAD, these concepts are implemented by employing a neural network, which is jointly trained to map data into a latent space and to minimize the volume of an data-encompassing hypersphere whose center is the aforementioned centroid. The geometric distance in the latent space to the hypersphere center is used as the anomaly score, where a larger distance between data and centroid corresponds to a higher probability of a sample being anomalous. This is achieved by shrinking the data-encompassing hypersphere during training, proportionally to all training data, of which is required that there is significantly more normal than anomalous data present. The outcome of this approach is that normal data gets clustered more closely around the centroid, while anomalies appear further away from it as can be seen in the toy example depicted in figure~\ref{fig:deep_svdd_transformation}.
@@ -514,7 +514,7 @@ Before DeepSAD's training can begin, a pre-training step is required, during whi
{pre-training weights used to init main network, c is mean of forward pass, collapse}
{network built and initialized, centroid fixed $\rightarrow$ start main training}
The pre-training results are used in two more key ways. First, the encoder weights obtained from the autoencoder pre-training initialize DeepSADs network for the main training phase. Second, we perform an initial forward pass through the encoder on all training samples, and the mean of these latent representations is set as the hypersphere center, $\mathbf{c}$. According to \cite{deepsad}, this initialization method leads to faster convergence during the main training phase compared to using a randomly selected centroid. An alternative would be to compute $\mathbf{c}$ using only the labeled normal examples, which would prevent the center from being influenced by anomalous samples; however, this requires a sufficient number of labeled normal samples. Once defined, the hypersphere center $\mathbf{c}$ remains fixed, as allowing it to be optimized freely could in the unsupervised case lead to a hypersphere collapse-a trivial solution where the network learns to map all inputs directly onto the centroid $\mathbf{c}$.
The pre-training results are used in two more key ways. First, the encoder weights obtained from the autoencoder pre-training initialize DeepSAD's network for the main training phase. Second, we perform an initial forward pass through the encoder on all training samples, and the mean of these latent representations is set as the hypersphere center, $\mathbf{c}$. According to \citeauthor{deepsad}, this initialization method leads to faster convergence during the main training phase compared to using a randomly selected centroid. An alternative would be to compute $\mathbf{c}$ using only the labeled normal examples, which would prevent the center from being influenced by anomalous samples; however, this requires a sufficient number of labeled normal samples. Once defined, the hypersphere center $\mathbf{c}$ remains fixed, as allowing it to be optimized freely could, in the unsupervised case, lead to a hypersphere collapse: a trivial solution where the network learns to map all inputs directly onto the centroid $\mathbf{c}$.
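A sketch of this centroid initialization, assuming \texttt{encoder} is the pre-trained encoder and \texttt{loader} yields batches of the training samples (the near-zero guard mirrors a common safeguard against trivial solutions and is an assumption here):
\begin{verbatim}
import torch

@torch.no_grad()
def init_center(encoder, loader, eps=0.1):
    # c is the mean latent representation over one forward pass
    # of all training samples; c stays fixed afterwards.
    latents = torch.cat([encoder(x) for x in loader], dim=0)
    c = latents.mean(dim=0)
    # Guard: push near-zero coordinates away from zero so the
    # network cannot trivially map everything onto c.
    c[(c.abs() < eps) & (c < 0)] = -eps
    c[(c.abs() < eps) & (c > 0)] = eps
    return c
\end{verbatim}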
\threadtodo
{how does the main training work, what data is used, what is the optimization target}
@@ -534,12 +534,14 @@ In the main training step, DeepSAD's network is trained using SGD backpropagatio
To infer whether a previously unknown data sample is normal or anomalous, the sample is fed in a forward pass through the fully trained network. During inference, the centroid $\mathbf{c}$ needs to be known in order to calculate the geometric distance of the sample's latent representation to $\mathbf{c}$. This distance is tantamount to an anomaly score, which correlates with the likelihood of the sample being anomalous. Due to differences in input data type, training success and latent space dimensionality, the anomaly score's magnitude has to be judged on an individual basis for each trained network. This means that scores which signify normal data for one network may very well clearly indicate an anomaly for another. Since the geometric distance between two points in space is a scalar analog value, post-processing of the score is necessary if a binary classification into normal and anomalous is desired.
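A sketch of this inference step; the optional \texttt{threshold} is one simple post-processing choice and, as argued above, would have to be chosen per trained network:
\begin{verbatim}
import torch

@torch.no_grad()
def infer(network, c, x, threshold=None):
    # Anomaly score = squared distance of the sample's latent
    # representation to the fixed hypersphere center c.
    z = network(x)
    score = torch.sum((z - c) ** 2, dim=1)
    if threshold is None:
        return score  # analog score, e.g. a degradation measure
    return score, score > threshold  # optional binary decision
\end{verbatim}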
DeepSAD's full training and inference procedure is visualized in figure~\ref{fig:deepsad_procedure}, which gives a comprehensive overview of the dataflows, tuneable hyperparameters and individual steps involved.
\newsection{algorithm_details}{Algorithm Details and Hyperparameters}
%\todo[inline]{backpropagation optimization formula, hyperaparameters explanation}
%As a pre-training step an autoencoder architecture is trained and its weights are used to initialize its encoder part before training of the method itself begins. \citeauthor{deepsad} argue in~\cite{deepsad} that this pre-training step which was already present in~\cite{deepsvdd}, allows them to not only interpret the method in geometric terms as minimum volume estimation but also in probalistic terms as entropy minimization over the latent distribution, since the autoencoding objective implicitely maximizes the mutual information between the data and its latent space represenation. This insight-that the method follows the Infomax principle with the additional objective of the latent distribution having mininmal entropy-allowed \citeauthor{deepsad} to introduce an additional term in Deep SAD's - over Deep SVDD's objective, which encorporates labeled data to better model the nature of normal and anomalous data. They show that Deep SAD's objective can be interpreted as normal data's distribution in the latent space being modeled to have low entropy and anomalous data's distribution in that latent space being modeled as having high entropy, which they argue captures the nature of the difference between normal and anomalous data by interpreting anomalies ``as being generated from an infinite mixture of distributions that are different from normal data distribution''~\cite{deepsad}.
Since Deep SAD is heavily based on its predecessor Deep SVDD it is helpful to first understand Deep SVDD's optimization objective, so we start with explaining it here. For input space $\mathcal{X} \subseteq \mathbb{R}^D$, output space $\mathcal{Z} \subseteq \mathbb{R}^d$ and a neural network $\phi(\wc; \mathcal{W}) : \mathcal{X} \to \mathcal{Z}$ where $\mathcal{W}$ depicts the neural networks' weights with $L$ layers $\{\mathbf{W}_1, \dots, \mathbf{W}_L\}$, $n$ the number of unlabeled training samples $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$, $\mathbf{c}$ the center of the hypersphere in the latent space, Deep SVDD teaches the neural network to cluster normal data closely together in the latent space by defining its optimization objective as seen in~\ref{eq:deepsvdd_optimization_objective}.
Since Deep SAD is heavily based on its predecessor \citetitle{deepsvdd}~(Deep SVDD)~\cite{deepsvdd}, it is helpful to first understand Deep SVDD's optimization objective, so we explain it here. For input space $\mathcal{X} \subseteq \mathbb{R}^D$, output space $\mathcal{Z} \subseteq \mathbb{R}^d$ and a neural network $\phi(\wc; \mathcal{W}) : \mathcal{X} \to \mathcal{Z}$, where $\mathcal{W}$ denotes the neural network's weights with $L$ layers $\{\mathbf{W}_1, \dots, \mathbf{W}_L\}$, $n$ the number of unlabeled training samples $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ and $\mathbf{c}$ the center of the hypersphere in the latent space, Deep SVDD teaches the neural network to cluster normal data closely together in the latent space by defining its optimization objective as seen in~\ref{eq:deepsvdd_optimization_objective}.
\begin{equation}
\label{eq:deepsvdd_optimization_objective}
@@ -550,9 +552,9 @@ Since Deep SAD is heavily based on its predecessor Deep SVDD it is helpful to fi
As can be seen from \ref{eq:deepsvdd_optimization_objective}, Deep SVDD is an unsupervised method which does not rely on labeled data to train the network to differentiate between normal and anomalous data. The first term of the optimization objective depicts the shrinking of the data-encompassing hypersphere around the given center $\mathbf{c}$: for each data sample $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$, its geometric distance to $\mathbf{c}$ in the latent space produced by the neural network $\phi(\wc; \mathcal{W})$ is minimized, proportionally to the number of data samples $n$. The second term is a standard L2 regularization term which prevents overfitting, with hyperparameter $\lambda > 0$ and $\|\wc\|_F$ denoting the Frobenius norm.
\citeauthor{deepsad} argue in \cite{deepsad} that the pre-training step employing an autoencoder—originally introduced in \cite{deepsvdd}—not only allows a geometric interpretation of the method as minimum volume estimation i.e., the shrinking of the data encompassing hypersphere but also a probabilistic one as entropy minimization over the latent distribution. The autoencoding objective during pre-training implicitly maximizes the mutual information between the data and its latent representation, aligning the approach with the Infomax principle while encouraging a latent space with minimal entropy. This insight enabled \citeauthor{deepsad} to introduce an additional term in DeepSADs objective, beyond that of its predecessor Deep SVDD~\cite{deepsvdd}, which incorporates labeled data to better capture the characteristics of normal and anomalous data. They demonstrate that DeepSADs objective effectively models the latent distribution of normal data as having low entropy, while that of anomalous data is characterized by higher entropy. In this framework, anomalies are interpreted as being generated from an infinite mixture of distributions that differ from the normal data distribution.
\citeauthor{deepsad} argue that the pre-training step employing an autoencoder—originally introduced in Deep SVDD—not only allows a geometric interpretation of the method as minimum volume estimation, i.e., the shrinking of the data-encompassing hypersphere, but also a probabilistic one as entropy minimization over the latent distribution. The autoencoding objective during pre-training implicitly maximizes the mutual information between the data and its latent representation, aligning the approach with the Infomax principle while encouraging a latent space with minimal entropy. This insight enabled \citeauthor{deepsad} to introduce an additional term in DeepSAD's objective, beyond that of its predecessor Deep SVDD, which incorporates labeled data to better capture the characteristics of normal and anomalous data. They demonstrate that DeepSAD's objective effectively models the latent distribution of normal data as having low entropy, while that of anomalous data is characterized by higher entropy. In this framework, anomalies are interpreted as being generated from an infinite mixture of distributions that differ from the normal data distribution.
The introduction of the aforementioned term in Deep SAD's objective allows it to learn in a semi-supervised way, though it can operate in a fully unsupervised mode—effectively reverting to its predecessor, Deep SVDD~\cite{deepsvdd}—when no labeled data are available. This additional supervision helps the model better position known normal samples near the hypersphere center and push known anomalies farther away, thereby enhancing its ability to differentiate between normal and anomalous data.
The introduction of the aforementioned term in Deep SAD's objective allows it to learn in a semi-supervised way, though it can operate in a fully unsupervised mode—effectively reverting to its predecessor, Deep SVDD—when no labeled data are available. This additional supervision helps the model better position known normal samples near the hypersphere center and push known anomalies farther away, thereby enhancing its ability to differentiate between normal and anomalous data.
From \ref{eq:deepsvdd_optimization_objective} it is easy to understand Deep SAD's optimization objective seen in \ref{eq:deepsad_optimization_objective}, which additionally defines $m$ labeled data samples $\{(\mathbf{\tilde{x}}_1, \tilde{y}_1), \dots, (\mathbf{\tilde{x}}_m, \tilde{y}_m)\} \in \mathcal{X} \times \mathcal{Y}$ with $\mathcal{Y} = \{-1,+1\}$, where $\tilde{y} = +1$ denotes normal and $\tilde{y} = -1$ anomalous samples, as well as a new hyperparameter $\eta > 0$ which can be used to balance the strength with which labeled and unlabeled samples contribute to the training.
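As a minimal sketch, this objective (without the weight decay term, which an optimizer typically handles) can be written as a loss over latent representations; the small \texttt{eps} guarding the reciprocal term is our own addition:
\begin{verbatim}
import torch

def deepsad_loss(z_unlabeled, z_labeled, y_labeled, c, eta=1.0,
                 eps=1e-6):
    # Unlabeled term: pull all samples towards the center c.
    d_u = torch.sum((z_unlabeled - c) ** 2, dim=1)
    # Labeled term: distances are raised to the power of the label,
    # pulling normal samples (y = +1) in and pushing anomalies
    # (y = -1) away via the reciprocal distance.
    d_l = torch.sum((z_labeled - c) ** 2, dim=1)
    labeled = (d_l + eps) ** y_labeled.float()
    n, m = d_u.numel(), d_l.numel()
    return (d_u.sum() + eta * labeled.sum()) / (n + m)
\end{verbatim}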
@@ -668,10 +670,16 @@ To mitigate the aforementioned risks we adopt a human-centric, binary labelling
\newsection{data_dataset}{Chosen Dataset}
\threadtodo
{give a comprehensive overview about chosen dataset}
{all requirements/challenges are clear, now reader wants to know about subter dataset}
{overview, domain, sensors, lidar, experiments, volume, statistics}
{statistics about degradation, not full picture $\rightarrow$ how did we preprocess and label}
%\todo[inline, color=green!40]{list sensors on the platform}
%Based on the previously discussed requirements and labeling difficulties we decided to train and evaluate the methods on \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter}. The dataset is comprised of data from multiple sensors on a moving sensor platform which was driven through tunnels and rooms in a subterranean setting. What makes it especially fitting for our use case is that during some of the experiments, an artifical smoke machine was employed to simulate aerosol particles.
%The sensors employed during capture of the dataset include:
Based on the previously discussed requirements and the challenges of obtaining reliable labels, we selected the \citetitle{subter}~\cite{subter} for training and evaluation. This dataset comprises multimodal sensor data collected from a robotic platform navigating tunnels and rooms in a subterranean environment, an underground tunnel in Luleå, Sweden. Notably, some experiments incorporated an artificial smoke machine to simulate heavy degradation from aerosol particles, making the dataset particularly well-suited to our use case. The sensors used during data capture include:\todo[inline, color=green!40]{refer to sketch with numbers}
Based on the previously discussed requirements and the challenges of obtaining reliable labels, we selected the \citetitle{subter}~\cite{subter} for training and evaluation. This dataset comprises multimodal sensor data collected from a robotic platform navigating tunnels and rooms in a subterranean environment, an underground tunnel in Luleå, Sweden. Notably, some experiments incorporated an artificial smoke machine to simulate heavy degradation from aerosol particles, making the dataset particularly well-suited to our use case. A Pioneer 3-AT2 robotic platform, which can be seen in figure~\ref{fig:subter_platform_photo}, was used to mount a multitude of sensors that are described in table~\ref{tab:subter-sensors} and whose mounting locations are depicted in figure~\ref{fig:subter_platform_sketch}.
% \begin{itemize}
% \item Lidar - Ouster OS1-32
@@ -684,10 +692,11 @@ Based on the previously discussed requirements and the challenges of obtaining r
%-------------------------------------------------
% Compact sensor overview (row numbers follow Fig.~\ref{fig:subter_platform})
%-------------------------------------------------
\todo[inline]{todo: check table for accuracy/errors}
\begin{table}[htbp]
\centering
\caption{Onboard sensors recorded in the \citetitle{subter} dataset. Numbers match the labels in Fig.~\ref{fig:subter_platform}; only the most salient details are shown for quick reference.\todo[inline]{check errors}}
\label{tab:sensor-suite-compact}
\caption{Onboard sensors recorded in the \citetitle{subter} dataset. Numbers match the labels in Fig.~\ref{fig:subter_platform}; only the most salient details are shown for quick reference.}
\label{tab:subter-sensors}
\setlength{\tabcolsep}{4pt}
\renewcommand{\arraystretch}{1.25}
\rowcolors{2}{gray!08}{white}
@@ -698,7 +707,7 @@ Based on the previously discussed requirements and the challenges of obtaining r
2 & \sensorcell{mm-wave RADAR (×4)}{TI IWR6843AoP} & 4 × 60° RADAR point clouds & 30 Hz, 60 GHz, 9 m max, 0.05 m res. \\
3 & \sensorcell{Solid-state LiDAR}{Velodyne Velarray M1600} & Forward LiDAR cloud & 10 Hz, 160 ch, 120° × 32°, 0.130 m \\
4 & \sensorcell{RGB-D / stereo cam}{Luxonis OAK-D Pro} & RGB image, depth map, point cloud & 15 fps, 75 mm baseline, active IR 930 nm \\
5 & \sensorcell{LED flood-light}{RS PRO WL28R} & Scene illumination only & 7 W, 650 lm (no data stream) \\
5 & \sensorcell{LED flood-light}{RS PRO WL28R} & Illumination for stereo cam & 7 W, 650 lm (no data stream) \\
6 & \sensorcell{IMU}{Pixhawk 2.1 Cube Orange} & Accel, gyro, mag, baro & 190 Hz, 9-DoF, vibration-damped \\
7 & \sensorcell{On-board PC}{Intel NUC i7} & Time-synced logging & Quad-core i7, 16 GB RAM, 500 GB SSD \\
\end{tabular}
@@ -731,7 +740,7 @@ Based on the previously discussed requirements and the challenges of obtaining r
%We mainly utilize the data from the \emph{Ouster OS1-32} lidar sensor, which produces 10 frames per second with a resolution of 32 vertical channels by 2048 measurements per channel, both equiangularly spaced over the vertical and horizontal fields of view of 42.4° and 360° respectively. Every measurement of the lidar therefore results in a point cloud with a maximum of 65536 points. Every point contains the \emph{X}, \emph{Y} and \emph{Z} coordinates in meters with the sensor location as origin, as well as values for the \emph{range}, \emph{intensity} and \emph{reflectivity} which are typical data measured by lidar sensors. The data is dense, meaning missing measurements are still present in the data of each point cloud with zero values for most fields.
\todo[inline, color=green!40]{short description of sensor platform and refer to photo}
%\todo[inline, color=green!40]{short description of sensor platform and refer to photo}
We use data from the \emph{Ouster OS1-32} LiDAR sensor, which was configured to capture 10 frames per second with a resolution of 32 vertical channels and 2048 measurements per channel. These settings yield equiangular measurements across a vertical field of view of 42.4° and a complete 360° horizontal field of view. Consequently, every LiDAR scan can generate up to 65,536 points. Each point contains the \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters, with the sensor location as the origin) along with values for \emph{range}, \emph{intensity}, and \emph{reflectivity}—typical metrics measured by LiDAR sensors. The dataset's point clouds are saved in a dense format, meaning each of the 65,536 measurements is present in the data, although fields for missing measurements contain zeroes.
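Since the dense format fixes the scan layout, each scan can be reshaped into a 2D range image, for example to compute the share of missing measurements; a sketch under the assumption that the range values arrive as a flat, row-major array:
\begin{verbatim}
import numpy as np

H, W = 32, 2048  # vertical channels x measurements per revolution

def missing_point_ratio(ranges_flat):
    # In the dense format, missing measurements are stored as zeros,
    # so the zero fraction of the range image is the missing share.
    range_image = ranges_flat.reshape(H, W)
    return float((range_image == 0).mean())

scan = np.zeros(H * W)
scan[: H * W // 2] = 5.0  # pretend half the beams returned at 5 m
print(missing_point_ratio(scan))  # 0.5
\end{verbatim}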
@@ -748,18 +757,11 @@ We use data from the \emph{Ouster OS1-32} LiDAR sensor, which was configured to
%-------------------------------------------------
\begin{figure}[htbp]
\centering
\subfigure[Pioneer 3-AT2 mobile base carrying the sensor tower.
The four-wheel, skid-steered platform supports up to 30 kg
payload and can negotiate rough terrain—providing the
mobility required for subterranean data collection.]
\subfigure[Pioneer 3-AT2 mobile base carrying the sensor tower. The four-wheel, skid-steered platform supports up to 30 kg payload and can negotiate rough terrain—providing the mobility required for subterranean data collection.]
{\includegraphics[width=0.45\textwidth]{figures/data_subter_platform_photo.jpg}
\label{fig:subter_platform_photo}}
\hfill
\subfigure[Sensor layout and numbering.
Components: 1 OS1-32 LiDAR, 2 mm-wave RADARs, 3 M1600 LiDAR,
4 OAK-D Pro camera, 5 LED flood-light, 6 IMU, 7 Intel NUC.
See Table~\ref{tab:sensor-suite-compact} for detailed
specifications.]
\subfigure[Sensor layout and numbering. Components: 1 OS1-32 LiDAR, 2 mm-wave RADARs, 3 M1600 LiDAR, 4 OAK-D Pro camera, 5 LED flood-light, 6 IMU, 7 Intel NUC. See Table~\ref{tab:subter-sensors} for detailed specifications.]
{\includegraphics[width=0.45\textwidth]{figures/data_subter_platform_sketch.png}
\label{fig:subter_platform_sketch}}
\caption{Robotic platform and sensor configuration used to record the dataset.}
@@ -817,6 +819,12 @@ Taken together, the percentage of missing points and the proportion of near-sens
\newsection{preprocessing}{Preprocessing Steps and Labeling}
\threadtodo
{explain preprocessing and rationale behind labeling techniques}
{raw dataset has been explained, how did we change the data before use}
{projection because easier autoencoder, manual and experiment-based labeling explained}
{method and prepared data known $\rightarrow$ next explain experimental setup}
\newsubsubsectionNoTOC{Preprocessing}
%\todo{describe how 3d lidar data was preprocessed (2d projection), labeling}
%\todo[inline]{screenshots of 2d projections?}
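As a preview of the projection idea, the following is a minimal sketch of a spherical projection from 3D points onto a 2D range image; the $32 \times 2048$ resolution and the 42.4° vertical field of view match the OS1-32 configuration described earlier, while everything else (a symmetric field of view, keeping the last return per cell) is an illustrative assumption:
\begin{verbatim}
import numpy as np

def project_to_range_image(points, h=32, w=2048, v_fov_deg=42.4):
    # points: (N, 3) array of XYZ coordinates in the sensor frame.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    yaw = np.arctan2(y, x)  # horizontal angle in [-pi, pi]
    pitch = np.arcsin(np.divide(z, r, out=np.zeros_like(r),
                                where=r > 0))
    v_fov = np.radians(v_fov_deg)
    # Map angles to pixel coordinates (row 0 = highest beam).
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((v_fov / 2 - pitch) / v_fov * (h - 1))
    v = v.clip(0, h - 1).astype(int)
    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r  # last return per cell wins
    return image
\end{verbatim}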