Autonomous robots have become increasingly prevalent in search and rescue missions because they do not endanger additional human lives while still being able to fulfil the difficult tasks of navigating hazardous environments such as collapsed structures, identifying and locating victims, and assessing whether the environment is safe for human rescue teams. To understand their environment, robots employ multiple sensor systems such as lidar, radar, ToF, ultrasound, optical cameras or infrared cameras, of which lidar is the most prominently used due to its accuracy. The robots use the sensors' data to map their environment, navigate their surroundings and make decisions such as which paths to prioritize. Many of the algorithms involved are deep learning-based and are trained on large amounts of data whose characteristics the models learn.
Search and rescue environments provide challenging conditions for sensor systems to produce reliable data. One of the most prominent examples are aerosol particles from smoke and dust, which can obstruct the view and lead sensors to produce erroneous data. If such degraded data is not present in the training data of the robots' algorithms, these errors may lead to unexpected outputs and potentially endanger the robot or even the human rescue targets. This is especially important for autonomous robots, whose decisions are based entirely on their sensor data without any human intervention. To safeguard against these problems, robots need a way to assess the trustworthiness of their sensor systems' data.
For remote-controlled robots a human operator can make these decisions, but many search and rescue missions do not allow remote control due to environmental factors such as radio signal attenuation or the size of the search area, and therefore demand autonomous robots. When designing such robots we thus arrive at the following critical question:
\begin{quote} Can autonomous robots quantify the reliability of lidar sensor data in hazardous environments to make more informed decisions? \end{quote}
In this thesis we aim to answer this question by assessing a deep learning-based anomaly detection method and its performance in quantifying the degradation of sensor data. The employed algorithm is a semi-supervised anomaly detection algorithm which uses manually labeled training data to improve its performance over unsupervised methods. We show how much the introduction of these labeled samples improves the method's performance. The model's output is an anomaly score which quantifies the data's reliability and can be used by algorithms that rely on the sensor data. These algorithms may, for example, decide to slow down the robot to collect more data, choose alternative routes, signal for help, or weight the input of other sensors more heavily.
\todo[inline]{discuss results (we showed X)}
%\todo[inline, color=green!40]{autonomous robots have many sensors for understanding the world around them, especially visual sensors (lidar, radar, ToF, ultrasound, optical cameras, infrared cameras), they use that data for navigation mapping, SLAM algorithms, and decision making. these are often deep learning algorithms, oftentimes only trained on good data}
%\todo[inline, color=green!40]{difficult environments for sensors to produce good data quality (earthquakes, rescue robots), produced data may be unreliable, we don't know how trustworthy that data is (no quantification, confidence), since all navigation and decision making is based on input data, this makes the whole pipeline untrustworthy/problematic}
%\todo[inline, color=green!40]{contribution/idea of this thesis is to calculate a confidence score which describes how trustworthy input data is. algorithms further down the pipeline (slam, navigation, decision) can use this to make more informed decisions - examples: collect more data by reducing speed, find alternative routes, signal for help, do not attempt navigation, more heavily weight input from other sensors}
\todo[inline]{output is score, thresholding (yes/no), maybe confidence in sensor/data? NOT how this score is used in navigation/other decisions further down the line}
\todo[inline]{Sensor degradation due to dust/smoke not rain/fog/...}
\todo[inline, color=green!40]{we look at domain of rescue robots which save buried people after earthquakes, or in dangerous conditions (after fires, collapsed buildings) which means we are mostly working with indoors or subterranean environments which oftentimes are polluted by smoke and a lot of dust, ideally works for any kind of sensor data degradation but we only explore this domain}
\todo[inline, color=green!40]{mostly use lidar (state of the art) since they are very accurate in 3d mapping environments, so we focus on quantifying how trustworthy the lidar data is by itself. we do not look at other sensor data (tof, ultrasound, optical)}
\todo[inline, color=green!40]{intended output is confidence score which simply means higher score = worse data quality, lower score = trustworthy data. this score can be interpreted by algorithms in pipeline. we do not look at how this is implemented in the algorithms, no binary classifier but analog value, if this is wished followup algorithm has to decide (example by threshold or other methods)}
\todo[inline, color=green!40]{in section x we discuss anomaly detection, semi-supervised learning since such an algorithm was used as the chosen method, we also discuss how lidar works and the data it produces. then in we discuss in detail the chosen method DeepSAD in section X, in section 4 we discuss the traing and evaluation data, in sec 5 we describe our setup for training and evaluation (whole pipeline). results are presented and discussed in section 6. section 7 contains a conclusion and discusses future work}
\todo[inline, color=green!40]{in this section we will discuss necessary background knowledge for our chosen method and the sensor data we work with. related work exists mostly from autonomous driving which does not include subter data and mostly looks at precipitation as source of degradation, we modeled after one such paper and try to adapt the same method for the domain of rescue robots, this method is a semi-supervised deep learning approach to anomaly detection which we describe in more detail in sections 2.1 and 2.2. in the last subsection 2.3 we discuss lidar sensors and the data they produce}
\todo[inline, color=green!40]{cite exists since X and has been used to find anomalous data in many domains and works with all kinds of data types/structures (visual, audio, numbers). examples healthcare (computer vision diagnostics, early detection), financial anomalies (credit card fraud, maybe other example), security/safety video cameras (public, traffic, factories).}
\todo[inline, color=green!40]{the goal of these algorithms is to differentiate between normal and anomalous data by finding statistically relevant information which separates the two, since these methods learn how normal data typically is distributed they do not have to have prior knowledge of the types of all anomalies, therefore can potentially detect unseen, unclassified anomalies as well. main challenges when implementing are that its difficult to cleanly separate normal from anormal data}
\todo[inline, color=green!40]{typically no or very little labeled data is available and oftentimes the kinds of possible anomalies are unknown and therefore its not possible to label all of them. due to these circumstances anomaly detection methods oftentimes do not rely on labeled data but on the fact that normal circumstances make up the majority of training data (quasi per defintion)}
\todo[inline, color=green!40]{figure example shows 2d data but anomaly detection methods work with any kind of dimensionality/shape. shows two clusters of normal data with clear boundaries and outside examples of outliers (anomalous data two single points and one cluster), anomaly detection methods learn to draw these boundaries from the training data given to them which can then be used to judge if unseen data is normal or anormal}
\todo[inline, color=green!40]{as discussed in motivation, and same as in reference paper (rain autonomous driving) we model our problem as an anomaly detection problem where we define that good quality sensor data is normal data and degraded sensor data (in our case due to dust/smoke) is defined as an anomaly. this allows us to quantify the degradation of data by using the anomaly detection method to check how likely new data is an anomaly}
\iffalse
Anomaly detection algorithms are designed to detect or quantify the likelihood of a pattern in data deviating significantly from a well-defined expected norm. Deviations such as these are classified as anomalies or outliers and often signify critical or actionable information.
\caption{An example of a 2-dimensional data set with anomalies. Reproduced from~\cite{Chandola2009AnomalyDA}}\label{fig:anomaly_detection_overview}
\end{figure}
\todo[inline]{Figure example normal data boundaries, single outliers o1, o2, cluster of outliers o3. difficult to define boundaries so that all normal data inside and anomalies outside }
\todo[inline]{Quick overview of the DeepSAD metho}
\todo[inline, color=green!40]{deep learning based (neural network with hidden layers), neural networks which get trained using backpropagation, to learn to solve a novel task by defining some target}
\todo[inline, color=green!40]{data labels decide training setting (supervised, non-supervised, semi-supervised incl explanation), supervised often classification based, but not possible if no labels available, un-supervised has no well-defined target, often used to fined common hidden factors in data (distribution). semi-supervised more like a sub method of unsupervised which additionally uses little (often handlabelled) data to improve method performance}
\todo[inline, color=green!40]{find easy illustrative example with figure of semi-supervised learning and include + explain here}
\todo[inline, color=green!40]{our chosen method DeepSAD is a semi-supervised deep learning method whose workings will be discussed in more detail in secion X}
\todo[inline, color=green!40]{autoencoders are a neural network architecture archetype (words) whose training target is to reproduce the input data itself - hence the name. the architecture is most commonly a mirrored one consisting of an encoder which transforms input data into a hyperspace represantation in a latent space and a decoder which transforms the latent space into the same data format as the input data (phrasing), this method typically results in the encoder learning to extract the most robust and critical information of the data and the (todo maybe something about the decoder + citation for both). it is used in many domains translations, LLMs, something with images (search example + citations)}
\todo[inline, color=green!40]{our chosen method DeepSAD uses an autoencoder to translate input data into a latent space, in which it can more easily differentiate between normal and anomalous data}
\todo[inline, color=green!40]{the older more commonly known radar works by sending out an electromagnetic wave in the radiofrequency and detecting the time it takes to return (if it returns at all) signalling a reflective object in the path of the radiowave. lidar works on the same principle but sends out a lightray produced by a laser (citation needed) and measuring the time it takes for the ray to return to the sensor. since the speed of light is constant in air the system can calculate the distance between the sensor and the measured point. modern lidar systems send out multiple, often millions of measurement rays per second which results in a three dimensional point cloud, constructed from the information in which direction the ray was cast and the distance that was measured}
\todo[inline, color=green!40]{lidar is used in most domains reliant on accurate 3d representations of the world like autonomous driving, robot navigation, (+ maybe quickly look up two other domains), its main advantage is high measurement accuracy, precision (use correct term), and high resolution (possible due to single point measurements instead of cones like radar, ToF, Ultrasonic) which enables more detailed mappings of the environment}
\todo[inline, color=green!40]{due to point precision, lidar is sensitive to noise/degradation of airborne particles, which may produce early returns, deflections, errrors of light rays, this results in noise in the 3d point cloud and possibly missing data of the measurement behind the aerosol particle.}
\todo[inline, color=green!40]{because of the given advantages of lidar it is most commonly used nowadays on robot platforms for environment mapping and navigiation - so we chose to demonstrate our method based on degraded data collected by a lidar sensor as discussed in more dtail in section (data section)}
Deep Semi-Supervised Anomaly Detection (DeepSAD)~\cite{deepsad} is the deep learning-based anomaly detection method whose performance in quantifying sensor data degradation we explore in this thesis. It is a semi-supervised method, which means that a small number of manually labeled samples can be introduced in addition to the unlabeled training data to improve the algorithm's performance over its unsupervised predecessor Deep One-Class Classification~\cite{deepsvdd}. The working principle of the method is to encode the input data into a latent space and to train the network such that normal data clusters close together while anomalies are mapped further away in that latent space.
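To illustrate this principle, the following minimal sketch (assuming a PyTorch-style encoder network and an already determined hypersphere center \texttt{c}; the function and variable names are purely illustrative and not the implementation used in this thesis) shows how such an anomaly score can be computed as the squared distance of an encoded sample to the center:
\begin{verbatim}
import torch

def anomaly_score(encoder, x, c):
    # Encode the input into the latent space and use the squared
    # Euclidean distance to the fixed hypersphere center c as score:
    # small distance -> likely normal, large distance -> likely anomalous.
    with torch.no_grad():
        z = encoder(x)                      # shape: (batch, latent_dim)
    return torch.sum((z - c) ** 2, dim=1)   # shape: (batch,)
\end{verbatim}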
%\todo[inline, color=green!40]{DeepSAD is a semi-supervised anomaly detection method proposed in cite, which is based on an unsupervised method (DeepSVDD) and additionally allows for providing some labeled data which is used during the training phase to improve the method's performance}
\todo[inline, color=green!40]{Core idea of the algorithm is to learn a transformation to map input data into a latent space where normal data clusters close together and anomalous data gets mapped further away. to achieve this the methods first includes a pretraining step of an auto-encoder to extract the most relevant information, second it fixes a hypersphere center in the auto-encoders latent space as a target point for normal data and third it traings the network to map normal data closer to that hypersphere center. Fourth The resulting network can map new data into this latent space and interpret its distance from the hypersphere center as an anomaly score which is larger the more anomalous the datapoint is}
\todo[inline, color=green!40]{explanation pre-training step: architecture of the autoencoder is dependent on the input data shape, but any data shape is generally permissible. for the autoencoder we do not need any labels since the optimization target is always the input itself. the latent space dimensionality can be chosen based on the input datas complexity (search citations). generally a higher dimensional latent space has more learning capacity but tends to overfit more easily (find cite). the pre-training step is used to find weights for the encoder which genereally extract robust and critical data from the input because TODO read deepsad paper (cite deepsad). as training data typically all data (normal and anomalous) is used during this step.}
\todo[inline, color=green!40]{explanation hypersphere center step: an additional positive ramification of the pretraining is that the mean of all pre-training's latent spaces can be used as the hypersphere target around which normal data is supposed to cluster. this is advantageous because it allows the main training to converge faster than choosing a random point in the latent space as hypersphere center. from this point onward the center C is fixed for the main training and inference and does not change anymore.}
\todo[inline, color=green!40]{explanation training step: during the main training step the method starts with the pre-trained weights of the encoder but removes the decoder from the architecture since it optimizes the output in the latent space and does not need to reproduce the input data format. it does so by minimizing the geometric distance of each input data's latent space represenation to the previously defined hypersphere center c. Due to normal data being more common in the inputs this results in normal data clustering closely to C and anormal data being pushed away from it. additionally during this step the labeled data is used to more correctly map normal and anormal data}
\todo[inline, color=green!40]{explanation inference step: with the trained network we can transform new input data into the latent space and calculate its distance from the hypersphere center which will be smaller the more confident the network is in the data being normal and larger the more likely the data is anomalous. This output score is an analog value dependent on multiple factors like the latent space dimensionality, encoder architecture and ??? and has to be interpreted further to be used (for example thresholding)}
\todo[inline, color=green!40]{in formula X we see the optimization target of the algorithm. explain in one paragraph the variables in the optimization formula}
\todo[inline, color=green!40]{explain the three terms (unlabeled, labeled, regularization)}
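For reference, the optimization objective of DeepSAD as given in~\cite{deepsad} can be written as
\begin{equation}
\min_{\mathcal{W}} \;\; \frac{1}{n+m}\sum_{i=1}^{n}\big\lVert \varphi(x_i;\mathcal{W})-\mathbf{c}\big\rVert^{2}
\;+\; \frac{\eta}{n+m}\sum_{j=1}^{m}\Big(\big\lVert \varphi(\tilde{x}_j;\mathcal{W})-\mathbf{c}\big\rVert^{2}\Big)^{\tilde{y}_j}
\;+\; \frac{\lambda}{2}\sum_{\ell=1}^{L}\big\lVert \mathcal{W}^{\ell}\big\rVert_{F}^{2},
\label{eq:deepsad_objective}
\end{equation}
where $\varphi(\cdot;\mathcal{W})$ denotes the encoder network with weights $\mathcal{W}$, $\mathbf{c}$ the fixed hypersphere center, $x_1,\dots,x_n$ the unlabeled training samples, $(\tilde{x}_1,\tilde{y}_1),\dots,(\tilde{x}_m,\tilde{y}_m)$ the labeled samples with $\tilde{y}_j \in \{-1,+1\}$, $\eta > 0$ a hyperparameter weighting the labeled term, and $\lambda$ the weight decay factor over the $L$ network layers. The first term pulls unlabeled data towards the center, the second term pulls labeled normal samples ($\tilde{y}_j=+1$) towards the center while pushing labeled anomalies ($\tilde{y}_j=-1$) away from it via the inverted distance, and the third term is a standard weight decay regularizer.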
\todo[inline]{semi supervised, learns normality by amount of data (no labeling/ground truth required), very few labels for better training to specific situation}
%\todo[inline, color=green!40]{good data important for learning based methods and for evaluation. in this chapter we talk about the requirements we have for our data and the difficulties that come with them and will then give some information about the dataset that was used as well as how the data was preprocessed for the experiments (sec 4.2)}
%Fortunately situations like earthquakes, structural failures and other circumstances where rescue robots need to be employed are uncommon occurences. When such an operation is conducted, the main focus lies on the fast and safe rescue of any survivors from the hazardous environment, therefore it makes sense that data collection is not a priority. Paired with the rare occurences this leads to a lack of publicly available data of such situations. To improve any method, a large enough, diversified and high quality dataset is always necessary to provide a comprehensive evaluation. Additionally, in this work we evaluate a training based method, which increases the requirements on the data manifold, which makes it all the more complex to find a suitable dataset. In this chapter we will state the requirements we defined for the data, talk about the dataset that was chosen for this task, including some statistics and points of interest, as well as how it was preprocessed for the training and evaluation of the methods.
Situations such as earthquakes, structural failures, and other emergencies that require rescue robots are fortunately rare. When these operations do occur, the primary focus is on the rapid and safe rescue of survivors rather than on data collection. Consequently, there is a scarcity of publicly available data from such scenarios. To improve any method, however, a large, diverse, and high-quality dataset is essential for comprehensive evaluation. This challenge is further compounded in our work, as we evaluate a training-based approach that imposes even higher requirements on the data to enable training, making it difficult to find a suitable dataset.
In this chapter, we outline the specific requirements we established for the data, describe the dataset selected for this task—including key statistics and notable features—and explain the preprocessing steps applied for training and evaluating the methods.
%\todo[inline, color=green!40]{we require lidar sensor data that was collected in a domain as closely related to our target domain (rescue robots indoors, cave-ins, ) as possible which also includes some kind of appreciable degradation for which we have some kind of labeling possibility. ideally the degradation should be from smoke/dust/aerosol particles. most data should be without degradation (since we require more normal than anormal data to train the method as described in X) but we need enough anormal data so we can confidently evaluate the methods performance}
%Our main requirement for the data was for it to be as closely related to the target domain of rescue operations as possible. Since autonomous robots get largely used in situations where a structural failures occured we require of the data to be subterranean. This provides the additional benefit, that data from this domain oftentimes already has some amount of airborne particles like dust due to limited ventilation and oftentimes exposed rock, which is to be expected to also be present in rescue situations. The second and by far more limiting requirement on the data, was that there has to be appreciable degradation due to airborne particles as would occur during a fire from smoke. The type of data has to at least include lidar but for better understanding other types of visual data e.g., visual camera images would be benefical. The amount of data has to be sufficient for training the learning based methods while containing mostly good quality data without degradation, since the semi-supervised method implicitely requires a larger amount of normal than anomalous training for successful training. Nonetheless, the number of anomalous data samples has to be large enough that a comprehensive evaluation of the methods' performance is possible.
Our primary requirement for the dataset was that it closely reflects the target domain of rescue operations. Because autonomous robots are predominantly deployed in scenarios involving structural failures, the data should be collected in subterranean environments. Such environments not only match the operational context but also inherently contain an elevated amount of airborne particles (e.g., dust) due to limited ventilation and exposed rock surfaces, conditions that are also to be expected during rescue missions.
A second, more challenging requirement is that the dataset must exhibit significant degradation due to airborne particles, as would be expected in scenarios involving smoke from fires. The dataset should at minimum include LiDAR data, and ideally also incorporate other visual modalities (e.g., camera images) to provide a more comprehensive understanding of the environment.
Additionally, the dataset must be sufficiently large for training learning-based methods. Since the semi-supervised approach we utilize relies on a predominance of normal data over anomalous data, it is critical that the dataset predominantly consists of high-quality, degradation-free samples. At the same time, there must be enough anomalous samples to allow for a thorough evaluation of the method’s performance.
\newsubsubsectionNoTOC{Labeling Challenges}
%\todo[inline, color=green!40]{labeling is an especially problematic topic since ideally we would want an analog value which corresponds with the amount of smoke present for evaluation. for training we only require the possibility to provide labels in the form of normal or anormal targets (binary classification) and these labels do not have to be present for all data, only for some of the data (since semi-supervised only uses some labeled data as discussed in X)}
%To evaluate how proficiently any method can quantify the degradation of lidar data we require some kind of degradation label per scan. Ideally we would want an analog value per scan which somehow correlates to the degradation, but even a binary label of either degraded or not degraded would be useful. To find out which options are available for this task, we first have to figure out what degradation means in the context of lidar scans and especially the point clouds in which they result. Lidar sensors combine multiple range measurements which are executed near simultaneously into a point cloud whose reference point is the sensor location at the time of measurement. Ideally for each attempted measurement during a scan one point is produced, albeit in reality there are many factors why a fraction of the measurements cannot be completed and therefore there will be missing points even in good conditions. Additionally, there are also measurements which result in an incorrect range, like for example when an aerosol particle is hit by the measurement ray and a smaller range than was intended to be measured (to the next solid object) was returned. The sum of missing and erroneous measurements makes up the degradation, although it can be alleged that the term also includes the type or structure of errors or missing points and the resulting difficulties when further utilizing the resulting point cloud. For example, if aerosol particles are dense enough in a small portion of the frame, they could produce a point cloud where the particles are interpreted as a solid object even though the amount of erroneous measurements is smaller than for another scan where aerosol particles are evenly distributed around the sensor. In the latter case the erroneous measurements may be identified by outlier detection algorithms and after removal do not hinder further processing of the point cloud. For these reasons it is not simple to define data degradation for lidar scans.
%Another option would be to try to find an objective measurement of degradation. As the degradation in our use case mostly stems from airborne particles, it stands to reason that measuring the amount of them would enable us to label each frame with an analog score which correlates to the amount of degradation. This approach turns out to be difficult to implement in real life, since sensors capable of measuring the amount and size of airborne particles typically do so at the location of the sensor while the lidar sensor also sends measurement rays into all geometries visible to it. This localized measurement could be useful if the aerosol particle distribution is uniform enough but would not allow the system to anticipate degradation in other parts of the point cloud. We are not aware of any public dataset fit for our requirements which also includes data on aerosol particle density and size.
To evaluate how effectively a method can quantify LiDAR data degradation, we require a degradation label for each scan. Ideally, each scan would be assigned an analog value that correlates with the degree of degradation, but even a binary label—indicating whether a scan is degraded or not—would be useful.
Before identifying available options for labeling, it is essential to define what “degradation” means in the context of LiDAR scans and the resulting point clouds. LiDAR sensors combine multiple range measurements, taken nearly simultaneously, into a single point cloud with the sensor’s location as the reference point. In an ideal scenario, each measurement produces one point; however, in practice, various factors cause some measurements to be incomplete, resulting in missing points even under good conditions. Additionally, some measurements may return incorrect ranges. For example, when a measurement ray strikes an aerosol particle, it may register a shorter range than the distance to the next solid object. The combined effect of missing and erroneous measurements constitutes degradation. One could also argue that degradation includes the type or structure of errors and missing points, which in turn affects how the point cloud can be further processed. For instance, if aerosol particles are densely concentrated in a small region, they might be interpreted as a solid object which could indicate a high level of degradation, even if the overall number of erroneous measurements is lower when compared to a scan where aerosol particles are evenly distributed. In the latter case, outlier detection algorithms might easily remove the erroneous points, minimizing their impact on subsequent processing. Thus, defining data degradation for LiDAR scans is not straightforward.
An alternative approach would be to establish an objective measurement of degradation. Since the degradation in our use case primarily arises from airborne particles, one might assume that directly measuring their concentration would allow us to assign an analog score that correlates with degradation. However, this approach is challenging to implement in practice. Sensors that measure airborne particle concentration and size typically do so only at the sensor’s immediate location, whereas the LiDAR emits measurement rays that traverse a wide field of view. This localized measurement might be sufficient if the aerosol distribution is uniform, but it does not capture variations in degradation across the entire point cloud. To our knowledge, no public dataset exists that meets our requirements while also including detailed data on aerosol particle density and size.
%For training purposes we generally do not require labels since the semi-supervised method may fall back to a unsupervised one if no labels are provided. To improve the method's performance it is possible to provide binary labels i.e., normal and anomalous-correlating to non-degraded and degraded respectively-but the amount of the provided training labels does not have to be large and can be handlabelled as is typical for semi-supervised methods, since they often work on mostly unlabeled data which is difficult or even impossible to fully label.
For training, explicit labels are generally not required because the semi-supervised method we employ can operate in an unsupervised manner when labels are absent. However, incorporating binary labels—normal for non-degraded and anomalous for degraded conditions—can enhance the method's performance. Importantly, only a small number of labels is needed, and these can be hand-labeled, which is typical in semi-supervised learning where the majority of the data remains unlabeled due to the difficulty or impracticality of fully annotating the dataset.
%\todo[inline, color=green!40]{We chose to evaulate the method on the dataset "Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration"~\cite{alexander_kyuroson_2023_7913307} which is a public dataset collected by X in a sub-terranean environment and includes data from multiple sensors on a moving sensor platform as well as experiments where sensor data is explicitely degraded by aerosol particles produced by a smoke machine.}
%\todo[inline, color=green!40]{list sensors on the platform}
%Based on the previously discussed requirements and labeling difficulties we decided to train and evaluate the methods on \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter}. The dataset is comprised of data from multiple sensors on a moving sensor platform which was driven through tunnels and rooms in a subterranean setting. What makes it especially fitting for our use case is that during some of the experiments, an artifical smoke machine was employed to simulate aerosol particles.
%The sensors employed during capture of the dataset include:
Based on the previously discussed requirements and the challenges of obtaining reliable labels, we selected the \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter} for training and evaluation. This dataset comprises multimodal sensor data collected from a moving platform navigating tunnels and rooms in a subterranean environment. Notably, some experiments incorporated an artificial smoke machine to simulate aerosol particles, making the dataset particularly well-suited to our use case. The sensors used during data capture include:
%\todo[inline, color=green!40]{lidar data of 360° sensor is captured at 10 frames per second. each sensor output consists of point cloud which resulted from measurement of 32 vertical channels for each of which 2048 measurement points are taken during each measurement equiangular distributed around the whole horizontal 360°, so the sensor measures 32 * 2048 = 65536 measurements 10 times a second for which ideally every one produces a point in the point cloud consisting of x,y,z coordinates (relative to sensor platform) as well as some other values per measurement (reflectivity, intensity originally measured range value)}
%We mainly utilize the data from the \emph{Ouster OS1-32} lidar sensor, which produces 10 frames per second with a resolution of 32 vertical channels by 2048 measurements per channel, both equiangularly spaced over the vertical and horizontal fields of view of 42.4° and 360° respectively. Every measurement of the lidar therefore results in a point cloud with a maximum of 65536 points. Every point contains the \emph{X}, \emph{Y} and \emph{Z} coordinates in meters with the sensor location as origin, as well as values for the \emph{range}, \emph{intensity} and \emph{reflectivity} which are typical data measured by lidar sensors. The data is dense, meaning missing measurements are still present in the data of each point cloud with zero values for most fields.
We use data from the \emph{Ouster OS1-32} LiDAR sensor, which was configured to capture 10 frames per second with a resolution of 32 vertical channels and 2048 measurements per channel. These settings yield equiangular measurements across a vertical field of view of 42.4° and a complete 360° horizontal field of view. Consequently, every LiDAR scan can generate up to 65,536 points. Each point contains the \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters, with the sensor location as the origin) along with values for \emph{range}, \emph{intensity}, and \emph{reflectivity}, which are typical metrics measured by LiDAR sensors. The point clouds are stored densely, meaning that missing measurements are not omitted but remain present in each point cloud as points whose fields are set to zero.
%During the measurement campaign 14 experiments were conducted, of which 10 did not contain the utilization of the artifical smoke machine and 4 which did contain the artifical degradation, henceforth refered to as normal and anomalous experiments respectively. During 13 of the experiments the sensor platform was in near constant movement (sometimes translation - sometimes rotation) with only 1 anomalous experiment having the sensor platform stationary. This means we do not have 2 stationary experiments to directly compare the data from a normal and an anomalous experiment, where the sensor platform was not moved, nonetheless the genereal experiments are similar enough for direct comparisons. During anomalous experiments the artifical smoke machine appears to have been running for some time before data collection, since in camera images and lidar data alike, the water vapor appears to be distributed quite evenly throughout the closer perimeter of the smoke machine. The stationary experiment is also unique in that the smoke machine is quite close to the sensor platform and actively produces new smoke, which is dense enough for the lidar data to see the surface of the newly produced water vapor as a solid object.
During the measurement campaign, 14 experiments were conducted—10 without the artificial smoke machine (hereafter referred to as normal experiments) and 4 with it (anomalous experiments). In 13 of these experiments, the sensor platform was in near-constant motion (either translating or rotating), with only one anomalous experiment conducted while the platform remained stationary. Although this means we do not have two stationary experiments for a direct comparison between normal and anomalous conditions, the overall experiments are similar enough to allow for meaningful comparisons.
In the anomalous experiments, the artificial smoke machine appears to have been running for some time before data collection began, as evidenced by both camera images and LiDAR data showing an even distribution of water vapor around the machine. The stationary experiment is particularly unique: the smoke machine was positioned very close to the sensor platform and was actively generating new, dense smoke, to the extent that the LiDAR registered the surface of the fresh water vapor as if it were a solid object.
%The 14 experiments differ regarding the available illumination, the presence of humans-traversing the measurement grounds- or additional static objects as artifcats and of course regarding the presence of the water vapor from the smoke machine. Aside from the artifical smoke which is essential for our use case, the other differences during the individual experiments are of no interestet to us and do not affect it in any way. Regardless of illumination, the lidar sensor produces indistinguishable point clouds and any static objects do not factor into our quantification of the point clouds' degradation.
The 14 experiments varied in illumination conditions, the presence of humans on the measurement grounds, and additional static artifacts, as well as in the presence of water vapor from the smoke machine. For our purposes, only the artificial smoke is relevant; differences in lighting or incidental static objects do not affect our analysis. Regardless of illumination, the LiDAR sensor consistently produces comparable point clouds, and the presence of static objects does not influence our quantification of point cloud degradation.
Figures~\ref{fig:data_screenshot_pointcloud} and~\ref{fig:data_screenshot_camera} show a representative depiction of the experiments' environment: an image from the IR camera and the point cloud created by the OS1 lidar sensor, captured at practically the same time.
\fig{data_screenshot_pointcloud}{figures/data_screenshot_pointcloud.png}{Screenshot of a 3D rendering of an experiment without smoke and with illumination (same frame and roughly the same alignment as figure~\ref{fig:data_screenshot_camera}). Point color corresponds to measurement range; the axis in the center of the figure marks the lidar's position.}
\fig{data_screenshot_camera}{figures/data_screenshot_camera.png}{Screenshot of IR camera output of an experiment without smoke and with illumination (same frame and roughly same alignment as figure~\ref{fig:data_screenshot_pointcloud})}
%\todo[inline, color=green!40]{talk about how much data is available (maybe a plot about data?), number of experiments with/without degradation, other factors in these experiments which do not concern our use-case of them}
%Regarding the amount of data, of the 10 normal experiments the shortest was 88.7 seconds and the longest 363.1 seconds with a mean of 157.65 seconds between all 10 experiments, which results in 15765 non-degraded point clouds. Of the 4 anomalous experiments, the shortest was the stationary one with 11.7 seconds and the longest was 62.1 seconds, having a mean of 47.325 seconds, resulting in 1893 degraded point clouds. This gives us 17658 point clouds alltogether with 89.28\% of them being non-degraded/normal samples and the other 10.72\% of them begin degraded/anomalous samples.
Regarding the dataset volume, the 10 normal experiments ranged from 88.7 to 363.1 seconds, with an average duration of 157.65 seconds. At a capture rate of 10 frames per second, these experiments yield 15,765 non-degraded point clouds. In contrast, the 4 anomalous experiments, including one stationary experiment lasting 11.7 seconds and another extending to 62.1 seconds, averaged 47.33 seconds, resulting in 1,893 degraded point clouds. In total, the dataset comprises 17,658 point clouds, with approximately 89.28\% classified as non-degraded (normal) and 10.72\% as degraded (anomalous). The distribution of experimental data is visualized in figure~\ref{fig:data_points_pie}.
\fig{data_points_pie}{figures/data_points_pie.png}{Pie chart visualizing the amount and distribution of normal and anomalous point clouds in \cite{subter}}
As we can see in figure~\ref{fig:data_missing_points}, the artificial smoke introduced as explicit degradation during some experiments results in more missing measurements per scan, which can be explained by measurement rays hitting airborne particles but not being reflected back to the sensor in a way it can measure.
\fig{data_missing_points}{figures/data_missing_points.png}{Density histogram showing the percentage of missing measurements per scan for normal experiments without degradation and anomalous experiments with artificial smoke introduced as degradation.}
% In experiments with artifical smoke present, we observe many points in the point cloud very close to the sensor where there are no solid objects and therefore the points have to be produced by airborne particles from the artifical smoke. The phenomenon can be explained, in that the closer to the sensor an airborne particle is hit, the higher the chance of it reflecting the ray in a way the lidar can measure. In \ref{fig:particles_near_sensor} we see a box diagram depicting how significantly more measurements of the anomaly expirements produce a range smaller than 50 centimeters. Due to the sensor platform's setup and its paths taken during experiments we can conclude that any measurement with a range smaller than 50 centimeters has to be erroneous. While the amount of these returns near the sensor could most likely be used to estimate the sensor data quality while the sensor itself is located inside an environment containing airborne particles, this method would not allow to anticipate sensor data degradation before the sensor itself enters the affected area. Since lidar is used to sense the visible geometry from a distance, it would be desireable to quantify the data degradation of an area before the sensor itself enters it. Due to these reasons we did not use this phenomenon in our work.
In experiments with artificial smoke, we observe numerous points in the point cloud very close to the sensor, even though no solid objects exist at that range. These points must therefore be generated by airborne particles from the artificial smoke. This phenomenon occurs because the closer an airborne particle is to the sensor, the higher the probability that it reflects the laser beam in a measurable way. The box diagram in Figure~\ref{fig:particles_near_sensor} shows that significantly more measurements during these experiments report ranges shorter than 50 centimeters. Given the sensor platform's setup and its trajectories during the experiments, we conclude that any measurement with a range under 50 centimeters must be erroneous.
While the density of these near-sensor returns might be used to estimate data quality when the sensor is already in an environment with airborne particles, this method cannot anticipate data degradation before the sensor enters such an area. Since LiDAR is intended to capture visible geometry from a distance, it is preferable to quantify potential degradation of an area in advance. For these reasons, we did not incorporate this phenomenon into our subsequent analysis.
\fig{particles_near_sensor}{figures/particles_near_sensor_boxplot_zoomed_500.png}{Box diagram depicting the percentage of measurements closer than 50 centimeters to the sensor for normal and anomalous experiments}
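Both per-scan statistics shown in figures~\ref{fig:data_missing_points} and~\ref{fig:particles_near_sensor} can be derived directly from the dense scans. The following minimal sketch (assuming the ranges of one scan are available as a NumPy array of shape $32 \times 2048$ with missing measurements encoded as zero; function and variable names are illustrative) outlines one way to compute them:
\begin{verbatim}
import numpy as np

def scan_statistics(ranges, near_threshold=0.5):
    # ranges: (32, 2048) array of per-measurement ranges in meters,
    # with missing measurements encoded as zero.
    total = ranges.size                                # 32 * 2048 = 65536
    missing = np.count_nonzero(ranges == 0.0)          # no return detected
    near = np.count_nonzero((ranges > 0.0) & (ranges < near_threshold))
    return {
        "missing_pct": 100.0 * missing / total,        # cf. data_missing_points
        "near_sensor_pct": 100.0 * near / total,       # cf. particles_near_sensor
    }
\end{verbatim}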
\newsection{preprocessing}{Preprocessing Steps and Labeling}
%\todo{describe how 3d lidar data was preprocessed (2d projection), labeling}
%\todo[inline]{screenshots of 2d projections?}
%\todo[inline, color=green!40]{while as described in sec X the method DeepSAD is not dependend on any specific type/structure of data it requires to train an auto encoder in the pretraining step. such autoencoders are better understood in the image domain since there are many uses cases for this such as X (TODO citation needed), there are also 3d data auto encoders such as X (todo find example). same as the reference paper (rain cite) we chose to transform the 3d data to 2d by using a spherical spherical projection to map each of the 3d points onto a 2d plane where the range of each measurement can be expressed as the brightness of a single pixel. this leaves us with a 2d image of resolution 32x2048 (channels by horizontal measurements), which is helpful for visualization as well as for choosing a simpler architecture for the autoencoder of deepsad, the data in the rosbag is sparse meaning that measurements of the lidar which did not produce any value (no return ray detected before sensor specific timeout) are simply not present in the lidar scan. meaning we have at most 65xxx measurements per scan but mostly fewer than this, (maybe statistic about this? could aslo be interesting to show smoke experiment stuff)}
%As described in section~\ref{sec:algorithm_description} the method we want to evaluate is datatype agnostic and can be adjusted to work with any kind of data. The data from~\cite{subter} that we will train on is a point cloud per scan created by the lidar sensor which contains up to 65536 points with \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters) per point. To adjust the architecture of DeepSAD to work with a specific datatype, we have to define an autoencoder architecture that works for the given datatype. While autoencoders can be created for any datatype, as~\cite{autoencoder_survey} points out over 60\% of research papers pertaining autoencoders in recent years look at image classification and reconstruction, so we have a better understanding of their architectures for two dimensional images than for three dimensional point clouds.
As described in Section~\ref{sec:algorithm_description}, the method under evaluation is data type agnostic and can be adapted to work with any kind of data. In our case, we train on point clouds from~\cite{subter}, where each scan produced by the LiDAR sensor contains up to 65,536 points, with each point represented by its \emph{X}, \emph{Y}, and \emph{Z} coordinates. To tailor the DeepSAD architecture to this specific data type, we must design an autoencoder suitable for processing three-dimensional point clouds. Although autoencoders can be developed for various data types, as noted in~\cite{autoencoder_survey}, over 60\% of recent research on autoencoders focuses on two-dimensional image classification and reconstruction. Consequently, there is a more established understanding of architectures for images compared to those for three-dimensional point clouds.
%\todo[inline, color=green!40]{to achieve this transformation we used the helpful measurement index and channel present in each measurement point of the dataset which allowed a perfect reconstruction of the 2d projection without calculating the pixel position in the projection of each measurement via angles which in our experience typically leads to some ambiguity in the projection (multiple measurements mapping to the same pixel due to precision loss/other errors) the measurement index increases even for unavailable measurements (no ray return) so we can simply create the 2d projection by mapping the normalized range (FIXME really normalized) value to the pixel position y = channel, x = measurement index. by initalizing the array to NaN values originally we have a 2d data structure with the range values and NaN on pixel positions where originally no measurement took place (missing measurements in scans due to no ray return)}
%For this reason we decided to preprocess the point clouds by converting them to two dimensional grayscale images using spherical projection. Additionally, \cite{degradation_quantification_rain}-which we modeled our approach after-successfully chose this approach. In the projected image each measurement is encoded to a single pixel, whose grayscale value $v$ is the normalized range of the measurement $v = \sqrt{\emph{X}^2 + \emph{Y}^2 + \emph{Z}^2}$. Due to the settings of the datasets' lidar, this results in images with the resolution of 2048 pixels wide by 32 pixels tall. Missing measurements of the point cloud are mapped to pixels with a brightness of 0. To create the mapping we used the measurements indices and channels which are available since the dataset contains dense point clouds and which can be used since the point indices are ordered from 0 to 65535 horizontally ascending channel by channel. For point clouds without indices which can be directly mapped, as is often the case for sparse ones, it would be necessary to use the pitch and yaw angles to the sensor origin to map each point to a pixel on the projection.
To simplify further processing, we converted the point clouds into two-dimensional grayscale images using a spherical projection. This approach, also employed successfully in \cite{degradation_quantification_rain}, encodes each LiDAR measurement as a single pixel whose grayscale value $v$ is the measurement's range, $v = \sqrt{\emph{X}^2+\emph{Y}^2+\emph{Z}^2}$, normalized to the available brightness range. Given the LiDAR sensor's configuration, the resulting images have a resolution of 2048 pixels in width and 32 pixels in height. Missing measurements in the point cloud are mapped to pixels with a brightness value of 0.
To create this mapping, we leveraged the available measurement indices and channel information inherent in the dense point clouds, which are ordered from 0 to 65,535 in a horizontally ascending, channel-by-channel manner. For sparser point clouds without such indices, one would need to rely on the pitch and yaw angles relative to the sensor's origin to correctly map each point to its corresponding pixel.
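A minimal sketch of this mapping, assuming the per-point range values of one dense scan are available as a NumPy array ordered from index 0 to 65,535 horizontally ascending, channel by channel (the array layout and the \texttt{max\_range} normalization parameter are illustrative assumptions), could look as follows:
\begin{verbatim}
import numpy as np

H, W = 32, 2048  # vertical channels x measurements per channel

def project_scan(ranges, max_range):
    # ranges: flat array of 65536 per-point ranges in meters, ordered
    # from index 0 to 65535 channel by channel; missing measurements
    # are encoded as zero.
    image = ranges.reshape(H, W).astype(np.float32)
    image = np.clip(image / max_range, 0.0, 1.0)  # normalize brightness to [0, 1]
    # missing measurements (range 0) naturally keep a brightness of 0
    return image
\end{verbatim}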
\todo[inline, color=green!40]{add two projections one with one without smoke to }
\todo[inline, color=green!40]{another important preprocessing step is labeling of the lidar frames as normal/anormal. this is one hand used during training (experiments with zero labeled up to most of the data being labeled) and on the other hand is important for evaluation of the method performance. originally we do not have any labels on the data regarding degradation and no analog values from another sensor which measures current smoke particles in the air. our simple approach was to label all frames from experiments which included artifical degradation by fog machine smoke as anomalous and all frames from experiments without artifical degradation as normal.}
\todo[inline, color=green!40]{this simple labeling method is quite flawed since we do not label based on the actual degradation of the scan (not by some kind of threshold of analog measurement threshold, statistical info about scan) since (TODO FIXME) this would result in training which only learns this given metric (example missing measurement points) which would make this methodology useless since we could simply use that same measurement as an more simple way to quantify the scan's degradation. }
\todo[inline]{TODO maybe evaluate based on different thresholds? missing datapoints, number of detected outliers, number of particles in phantom circle around sensor?}