data requirements labeling challenges section
@@ -66,6 +66,7 @@
\usepackage{xcolor}
\usepackage[colorinlistoftodos]{todonotes}
%\usepackage[disable]{todonotes}
\DeclareRobustCommand{\threadtodo}[4]{%
\todo[inline,
@@ -441,7 +442,7 @@ All of these may create unwanted noise in the point cloud created by the lidar,
A learning-based method to filter dust-caused degradation from lidar data is introduced in~\cite{lidar_denoising_dust}. The authors employ a convolutional neural network to classify dust particles in lidar point clouds, enabling the filtering of those points, and compare their method to more conservative approaches such as various outlier removal algorithms. Another relevant example is the filtering method proposed in~\cite{lidar_subt_dust_removal}, which filters point clouds degraded by smoke or dust in subterranean environments, with a focus on the search and rescue domain. To achieve this, the authors formulated a filtration framework that relies on dynamic onboard statistical cluster outlier removal to classify and remove dust particles in point clouds.
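To make the contrast with such conservative baselines concrete, the sketch below shows a simple statistical outlier removal filter of the kind these works compare against. The function name, neighbourhood size, and threshold are illustrative assumptions rather than the exact procedures used in the cited works.
\begin{verbatim}
# Minimal sketch of a statistical outlier removal baseline for a lidar
# point cloud given as an (N, 3) numpy array. Parameters are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, k=16, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours
    deviates strongly from the global average of that statistic."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)  # first column is the point itself
    mean_dists = dists[:, 1:].mean(axis=1)
    threshold = mean_dists.mean() + std_ratio * mean_dists.std()
    keep = mean_dists <= threshold
    return points[keep], keep
\end{verbatim}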
Our method does not aim to remove the noise or degraded points in the lidar data, but to quantify the degradation in order to inform other systems of the autonomous robot about the data's quality, enabling more informed decisions. One such approach, though from the autonomous driving rather than the search and rescue domain, can be found in~\cite{degradation_quantification_rain}. The authors proposed a learning-based method to quantify lidar sensor data degradation caused by adverse weather effects, posing the problem as an anomaly detection task and utilizing DeepSAD to learn degraded data as anomalous and high-quality data as normal behaviour. DeepSAD's anomaly score was then used as the degradation quantification score. We decided to imitate this method and adapt it for the search and rescue domain, although this proved challenging due to the more limited data availability. Since it was effective for the closely related~\cite{degradation_quantification_rain}, we also employed DeepSAD, whose detailed workings we present in the following chapter.
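To preview the idea before the detailed treatment in the next chapter, the sketch below shows how a DeepSAD-style anomaly score can serve as a degradation score: an encoder maps a scan representation to a latent vector, and the squared distance to a fixed center is the score. The encoder, center, and tensor shapes are placeholders, not the architecture used in~\cite{degradation_quantification_rain}.
\begin{verbatim}
# Minimal sketch: distance to a learned hypersphere center as a degradation
# score, in the style of DeepSAD. `encoder` and `center` are placeholders
# for a trained network and its fixed center vector.
import torch

def degradation_score(encoder, center, scan):
    """Return a scalar score; larger means more degraded / anomalous."""
    with torch.no_grad():
        z = encoder(scan.unsqueeze(0))  # latent representation, shape (1, d)
        return torch.sum((z - center) ** 2).item()
\end{verbatim}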
%\todo[inline]{related work, survey on lidar denoising, noise removal in subt - quantifying same as us in rain, also used deepsad - transition}
@@ -600,7 +601,6 @@ Situations such as earthquakes, structural failures, and other emergencies that
In this chapter, we outline the specific requirements we established for the data, describe the dataset selected for this task—including key statistics and notable features—and explain the preprocessing steps applied for training and evaluating the methods.
\newsection{data}{Data}
%\todo[inline]{describe data sources, limitations}
@@ -610,7 +610,7 @@ In this chapter, we outline the specific requirements we established for the dat
%\todo[inline, color=green!40]{we require lidar sensor data that was collected in a domain as closely related to our target domain (rescue robots indoors, cave-ins, ) as possible which also includes some kind of appreciable degradation for which we have some kind of labeling possibility. ideally the degradation should be from smoke/dust/aerosol particles. most data should be without degradation (since we require more normal than anormal data to train the method as described in X) but we need enough anormal data so we can confidently evaluate the methods performance}
%Our main requirement for the data was for it to be as closely related to the target domain of rescue operations as possible. Since autonomous robots get largely used in situations where a structural failures occured we require of the data to be subterranean. This provides the additional benefit, that data from this domain oftentimes already has some amount of airborne particles like dust due to limited ventilation and oftentimes exposed rock, which is to be expected to also be present in rescue situations. The second and by far more limiting requirement on the data, was that there has to be appreciable degradation due to airborne particles as would occur during a fire from smoke. The type of data has to at least include lidar but for better understanding other types of visual data e.g., visual camera images would be benefical. The amount of data has to be sufficient for training the learning based methods while containing mostly good quality data without degradation, since the semi-supervised method implicitely requires a larger amount of normal than anomalous training for successful training. Nonetheless, the number of anomalous data samples has to be large enough that a comprehensive evaluation of the methods' performance is possible.
\newsection{data_req}{Data Requirements and Challenges}
\threadtodo
{list requirements we had for data}
@@ -645,28 +645,23 @@ To ensure our chosen dataset meets the needs of reliable degradation quantificat
\end{enumerate}
%\newsubsubsectionNoTOC{Labeling Challenges}
%\todo[inline, color=green!40]{labeling is an especially problematic topic since ideally we would want an analog value which corresponds with the amount of smoke present for evaluation. for training we only require the possibility to provide labels in the form of normal or anormal targets (binary classification) and these labels do not have to be present for all data, only for some of the data (since semi-supervised only uses some labeled data as discussed in X)}
\threadtodo
{What are the challenges for correctly labeling the data}
{we alluded to labels being challenging before this}
{difficult to define degradation, difficult to capture, objective leads to puppet}
{with all requirements and challenges known $\rightarrow$ what dataset did we choose}
%To evaluate how proficiently any method can quantify the degradation of lidar data we require some kind of degradation label per scan. Ideally we would want an analog value per scan which somehow correlates to the degradation, but even a binary label of either degraded or not degraded would be useful. To find out which options are available for this task, we first have to figure out what degradation means in the context of lidar scans and especially the point clouds in which they result. Lidar sensors combine multiple range measurements which are executed near simultaneously into a point cloud whose reference point is the sensor location at the time of measurement. Ideally for each attempted measurement during a scan one point is produced, albeit in reality there are many factors why a fraction of the measurements cannot be completed and therefore there will be missing points even in good conditions. Additionally, there are also measurements which result in an incorrect range, like for example when an aerosol particle is hit by the measurement ray and a smaller range than was intended to be measured (to the next solid object) was returned. The sum of missing and erroneous measurements makes up the degradation, although it can be alleged that the term also includes the type or structure of errors or missing points and the resulting difficulties when further utilizing the resulting point cloud. For example, if aerosol particles are dense enough in a small portion of the frame, they could produce a point cloud where the particles are interpreted as a solid object even though the amount of erroneous measurements is smaller than for another scan where aerosol particles are evenly distributed around the sensor. In the latter case the erroneous measurements may be identified by outlier detection algorithms and after removal do not hinder further processing of the point cloud. For these reasons it is not simple to define data degradation for lidar scans.
Quantitative benchmarking of degradation quantification requires a degradation label for every scan. Ideally, that label would be a continuous degradation score, although a binary label would still enable meaningful comparison. As the rest of this section shows, producing any reliable label is already challenging, and assigning meaningful analog scores may not be feasible at all. Compounding the problem, no public search-and-rescue (SAR) LiDAR dataset offers such ground truth, as far as we know. To understand the challenges around labeling LiDAR data degradation, we first look at what constitutes degradation in this context.
%Another option would be to try to find an objective measurement of degradation. As the degradation in our use case mostly stems from airborne particles, it stands to reason that measuring the amount of them would enable us to label each frame with an analog score which correlates to the amount of degradation. This approach turns out to be difficult to implement in real life, since sensors capable of measuring the amount and size of airborne particles typically do so at the location of the sensor while the lidar sensor also sends measurement rays into all geometries visible to it. This localized measurement could be useful if the aerosol particle distribution is uniform enough but would not allow the system to anticipate degradation in other parts of the point cloud. We are not aware of any public dataset fit for our requirements which also includes data on aerosol particle density and size.
In Section~\ref{sec:lidar_related_work} we discussed internal and environmental error causes of lidar sensors, such as multi-return ambiguities and atmospheric scattering, respectively. While we are aware of research into singular failure modes, such as \citetitle{lidar_errormodel_particles}~\cite{lidar_errormodel_particles}, and of research trying to model the totality of error sources occurring in other domains, such as \citetitle{lidar_errormodel_automotive}~\cite{lidar_errormodel_automotive}, there appears to be no such model for the search and rescue domain and its unique environmental circumstances. Although the scientific consensus appears to be that airborne particles are the biggest contributor to degradation in SAR~\cite{lidar_errormodel_consensus}, we think that a more versatile definition is required to ensure confidence during critical SAR missions, which are often of a volatile nature. We are thus left with an ambiguous definition of what constitutes lidar point cloud degradation in the SAR domain.
We considered which objective measurements might be available to produce ground-truth labels, such as particulate matter sensors or inherent properties of the lidar point clouds like the range-dropout rate. However, we fear that using purely objective measures to label the data would limit our learning-based method to imitating the labels' sources instead of distinguishing all possible degradation patterns from high-quality data. Due to the incomplete error model in this domain, there may be novel or compound error sources that such an approach would not capture. As an example, we observed dense smoke reflecting enough rays to produce phantom objects, which may fool SLAM algorithms; such a case could even be labeled incorrectly as normal by one of the aforementioned objective labeling options if the surroundings do not yet exhibit enough dispersed smoke particles.
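For illustration, the kind of objective, per-scan proxy we mean could look like the sketch below, assuming the scan is available as a fixed-size range image with missing returns encoded as zero; relying on such a single statistic as the label source is precisely what we want to avoid.
\begin{verbatim}
# Illustrative objective degradation proxies for a single scan, assuming a
# range image of shape (rows, cols) in metres with missing returns as 0.
import numpy as np

def dropout_rate(range_image):
    """Fraction of beams that returned no measurement."""
    return float(np.mean(range_image == 0))

def near_return_rate(range_image, near=1.0):
    """Fraction of valid returns closer than `near` metres, a crude
    indicator of rays terminating on airborne particles."""
    valid = range_image > 0
    if not valid.any():
        return 0.0
    return float(np.mean(range_image[valid] < near))
\end{verbatim}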
To mitigate the aforementioned risks we adopt a human-centric, binary labeling strategy. We judged analog and multi-level discrete rating scales to be too subjective for human annotators, which left us with the simplistic but hopefully more reliable binary choice. We used two labeling approaches, producing two evaluation sets, whose motivation and details will be discussed in Section~\ref{sec:preprocessing}. The rationale for the exact labeling procedures requires knowledge of the dataset we ultimately chose, which we present in the next section.
%For training purposes we generally do not require labels since the semi-supervised method may fall back to a unsupervised one if no labels are provided. To improve the method's performance it is possible to provide binary labels i.e., normal and anomalous-correlating to non-degraded and degraded respectively-but the amount of the provided training labels does not have to be large and can be handlabelled as is typical for semi-supervised methods, since they often work on mostly unlabeled data which is difficult or even impossible to fully label.
For training, explicit labels are generally not required because the semi-supervised method we employ can operate in an unsupervised manner when labels are absent. However, incorporating binary labels—normal for non-degraded and anomalous for degraded conditions—can enhance the method's performance. Importantly, only a small number of labels is needed, and these can be hand-labeled, which is typical in semi-supervised learning where the majority of the data remains unlabeled due to the difficulty or impracticality of fully annotating the dataset.
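Concretely, a common encoding for such partially labeled training data, and the one used by DeepSAD, assigns $+1$ to scans labeled normal, $-1$ to scans labeled anomalous, and $0$ to the unlabeled majority. The helper below sketches this; the argument names are hypothetical.
\begin{verbatim}
# Sketch of the semi-supervised target encoding used by DeepSAD-style
# methods: +1 labeled normal, -1 labeled degraded, 0 unlabeled (default).
def make_targets(n_scans, normal_ids, degraded_ids):
    targets = [0] * n_scans          # most scans stay unlabeled
    for i in normal_ids:
        targets[i] = 1               # known non-degraded scans
    for i in degraded_ids:
        targets[i] = -1              # known degraded (anomalous) scans
    return targets
\end{verbatim}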
%\todo[inline, color=green!40]{We chose to evaulate the method on the dataset "Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration"~\cite{alexander_kyuroson_2023_7913307} which is a public dataset collected by X in a sub-terranean environment and includes data from multiple sensors on a moving sensor platform as well as experiments where sensor data is explicitely degraded by aerosol particles produced by a smoke machine.}
\newsection{data_dataset}{Chosen Dataset}
%\todo[inline, color=green!40]{list sensors on the platform}
%Based on the previously discussed requirements and labeling difficulties we decided to train and evaluate the methods on \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter}. The dataset is comprised of data from multiple sensors on a moving sensor platform which was driven through tunnels and rooms in a subterranean setting. What makes it especially fitting for our use case is that during some of the experiments, an artifical smoke machine was employed to simulate aerosol particles.
@@ -773,7 +768,7 @@ Figure~\ref{fig:data_projections} displays two examples of LiDAR point cloud pro
%We discussed the requirements to data labels in section~\ref{sec:data}, where we mentioned the challenges but also importance of correctly labeled data, especially for evaluation. Since to our knowledege no public dataset with objective labels regarding dataset degradation of lidar data in subterranean environments is available and the dataset chosen for evaluation in this thesis \cite{subter} does not contain any explicit data or measurements about the dedata degradation, we had to choose a method of how we would label the data ourselves for evaluation. After considering multiple avenues, we decided to simply label all point clouds created during experiments with artifical smoke present as anomalies and all point clouds from other experiments as normal data.
We discussed the challenges and importance of obtaining correctly labeled data in Section~\ref{sec:data_req}, particularly for evaluation purposes. Since, to our knowledge, no public dataset provides objective labels for LiDAR data degradation in subterranean environments—and the dataset selected for this thesis \cite{subter} lacks explicit measurements of degradation—we had to develop our own labeling approach. After considering several options, we decided to label all point clouds from experiments with artificial smoke as anomalies, while point clouds from experiments without smoke were labeled as normal data.
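A minimal sketch of this per-experiment labeling rule is shown below; the experiment metadata structure and field names are hypothetical placeholders, not part of the dataset's actual format.
\begin{verbatim}
# Sketch of the per-experiment labeling rule: every scan from an experiment
# with artificial smoke is marked anomalous, all other scans normal.
def label_scans(experiments):
    labels = {}
    for exp in experiments:          # exp: {"smoke": bool, "scan_ids": [...]}
        label = "anomalous" if exp["smoke"] else "normal"
        for scan_id in exp["scan_ids"]:
            labels[scan_id] = label
    return labels
\end{verbatim}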
%\todo[inline, color=green!40]{this simple labeling method is quite flawed since we do not label based on the actual degradation of the scan (not by some kind of threshold of analog measurement threshold, statistical info about scan) since (TODO FIXME) this would result in training which only learns this given metric (example missing measurement points) which would make this methodology useless since we could simply use that same measurement as an more simple way to quantify the scan's degradation. }