more data chapter text and one figure

2025-03-07 11:37:50 +01:00
parent 0debdc55dd
commit 65cce288dc
3 changed files with 47 additions and 8 deletions
@@ -328,8 +328,11 @@ For training, explicit labels are generally not required because the semi-superv

 \newsubsubsectionNoTOC{Chosen Dataset}

-Based on the previously discussed requirements and labeling difficulties we decided to train and evaluate the methods on \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter}. The dataset is comprised of data from multiple sensors on a moving sensor platform which was driven through tunnels and rooms in a subterranean setting. What makes it especially fitting for our use case is that during some of the experiments where data was captured, an artifical smoke machine was employed to simulate aerosol particles. 
-The sensors employed during capture of the dataset include:
+  %\todo[inline, color=green!40]{list sensors on the platform}
+%Based on the previously discussed requirements and labeling difficulties we decided to train and evaluate the methods on \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter}. The dataset is comprised of data from multiple sensors on a moving sensor platform which was driven through tunnels and rooms in a subterranean setting. What makes it especially fitting for our use case is that during some of the experiments, an artifical smoke machine was employed to simulate aerosol particles. 
+%The sensors employed during capture of the dataset include:
+Based on the previously discussed requirements and the challenges of obtaining reliable labels, we selected the \emph{Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration}~\cite{subter} for training and evaluation. This dataset comprises multimodal sensor data collected from a moving platform navigating tunnels and rooms in a subterranean environment. Notably, some experiments incorporated an artificial smoke machine to simulate aerosol particles, making the dataset particularly well-suited to our use case. The sensors used during data capture include:
+
 \begin{itemize}
  \item Lidar - Ouster OS1-32
  \item mmWave RADARs - 4 IWR6843AoP ES2.0 based radar models
@@ -337,11 +340,33 @@ The sensors employed during capture of the dataset include:
  \item IR-enabled RGB-D Camera - OAK-D Pro
  \item IMU - Pixhawk 2.1 Cube Orange
 \end{itemize}
-We mainly utilize the data from the \emph{Ouster OS1-32} lidar sensor, which produces 10 frames per second with a resolution of 32 vertical channels by 2048 measurements per channel, both equiangularly spaced over the vertical and horizontal fields of view of 42.4° and 360° respectively.
+    %\todo[inline, color=green!40]{lidar data of 360° sensor is captured at 10 frames per second. each sensor output consists of pointcloud which resulted from measurement of 32 vertical channels for each of which 2048 measurement points are taken during each measurement equiangular distributed around the whole horizontal 360°, so the sensor measures 32 * 2048 = 65536 measurements 10 times a second for which ideally every one produces a point in the pointcloud consisting of x,y,z coordinates (relative to sensor platform) as well as some other values per measurement (reflectivity, intensity originally measured range value)}

-    \todo[inline, color=green!40]{list sensors on the platform}
-    \todo[inline, color=green!40]{talk about how much data is available (maybe a plot about data?), number of experiments with/without degradation, other factors in these experiments which do not concern our use-case of them}
-    \todo[inline, color=green!40]{lidar data of 360° sensor is captured at 10 frames per second. each sensor output consists of pointcloud which resulted from measurement of 32 vertical channels for each of which 2048 measurement points are taken during each measurement equiangular distributed around the whole horizontal 360°, so the sensor measures 32 * 2048 = 65536 measurements 10 times a second for which ideally every one produces a point in the pointcloud consisting of x,y,z coordinates (relative to sensor platform) as well as some other values per measurement (reflectivity, intensity originally measured range value)}
+%We mainly utilize the data from the \emph{Ouster OS1-32} lidar sensor, which produces 10 frames per second with a resolution of 32 vertical channels by 2048 measurements per channel, both equiangularly spaced over the vertical and horizontal fields of view of 42.4° and 360° respectively. Every measurement of the lidar therefore results in a pointcloud with a maximum of 65536 points. Every point contains the \emph{X}, \emph{Y} and \emph{Z} coordinates in meters with the sensor location as origin, as well as values for the \emph{range}, \emph{intensity} and \emph{reflectivity} which are typical data measured by lidar sensors. The data is dense, meaning missing measurements are still present in the data of each pointcloud with zero values for most fields.
+
+We use data from the \emph{Ouster OS1-32} LiDAR sensor, which was configured to capture 10 frames per second at a resolution of 32 vertical channels with 2048 measurements per channel. The measurements are equiangularly spaced across a vertical field of view of 42.4° and the full 360° horizontal field of view. Consequently, every LiDAR scan yields up to 65,536 points. Each point contains the \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters, with the sensor location as the origin) along with values for \emph{range}, \emph{intensity}, and \emph{reflectivity}, which are typical quantities measured by LiDAR sensors. The point clouds are dense in the sense that missing measurements are not dropped: they remain present in each point cloud, with most of their fields set to zero.
+
+%During the measurement campaign 14 experiments were conducted, of which 10 did not contain the utilization of the artifical smoke machine and 4 which did contain the artifical degradation, henceforth refered to as normal and anomalous experiments respectively. During 13 of the experiments the sensor platform was in near constant movement (sometimes translation - sometimes rotation) with only 1 anomalous experiment having the sensor platform stationary. This means we do not have 2 stationary experiments to directly compare the data from a normal and an anomalous experiment, where the sensor platform was not moved, nonetheless the genereal experiments are similar enough for direct comparisons. During anomalous experiments the artifical smoke machine appears to have been running for some time before data collection, since in camera images and lidar data alike, the water vapor appears to be distributed quite evenly throughout the closer perimeter of the smoke machine. The stationary experiment is also unique in that the smoke machine is quite close to the sensor platform and actively produces new smoke, which is dense enough for the lidar data to see the surface of the newly produced water vapor as a solid object. 
+
+During the measurement campaign, 14 experiments were conducted: 10 without the artificial smoke machine (hereafter referred to as normal experiments) and 4 with it (anomalous experiments). In 13 of these experiments, the sensor platform was in near-constant motion (either translating or rotating); only one anomalous experiment was conducted with the platform stationary. Although this means we have no stationary normal experiment to compare directly against the stationary anomalous one, the experiments are otherwise similar enough to allow meaningful comparisons.
+
+In the anomalous experiments, the artificial smoke machine appears to have been running for some time before data collection began: both the camera images and the LiDAR data show water vapor distributed fairly evenly in the vicinity of the machine. The stationary experiment is unique in that the smoke machine was positioned very close to the sensor platform and was actively generating new smoke, which was dense enough for the LiDAR to register the surface of the fresh water vapor as if it were a solid object.
+
+
+\todo[inline, color=green!40]{shortly mention the differences in conditions for these experiments and why they do not matter for us}
+\todo[inline, color=green!40]{include representative image of pointcloud and camera image}
+
+    %\todo[inline, color=green!40]{talk about how much data is available (maybe a plot about data?), number of experiments with/without degradation, other factors in these experiments which do not concern our use-case of them}
+%Regarding the amount of data, of the 10 normal experiments the shortest was 88.7 seconds and the longest 363.1 seconds with a mean of 157.65 seconds between all 10 experiments, which results in 15765 non-degraded pointclouds. Of the 4 anomalous experiments, the shortest was the stationary one with 11.7 seconds and the longest was 62.1 seconds, having a mean of 47.325 seconds, resulting in 1893 degraded pointclouds. This gives us 17658 pointclouds alltogether with 89.28\% of them being non-degraded/normal samples and the other 10.72\% of them begin degraded/anomalous samples. 
+
+Regarding the dataset volume, the 10 normal experiments ranged from 88.7 to 363.1 seconds, with a mean duration of 157.65 seconds. At a capture rate of 10 frames per second, these experiments yield 15,765 non-degraded point clouds. The 4 anomalous experiments ranged from 11.7 seconds (the stationary experiment) to 62.1 seconds, with a mean duration of 47.325 seconds, resulting in 1,893 degraded point clouds. In total, the dataset comprises 17,658 point clouds, of which approximately 89.28\% are non-degraded (normal) and 10.72\% are degraded (anomalous). The distribution of experimental data is visualized in figure~\ref{fig:data_points_pie}.
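+
+As a rough consistency check, and assuming that every experiment was recorded at the nominal 10 frames per second without dropped frames, the point cloud counts follow directly from the number of experiments and their mean durations. The symbols $N_{\mathrm{normal}}$ and $N_{\mathrm{anomalous}}$ are notation introduced only for this check:
+\[
+  \begin{array}{rcl}
+    N_{\mathrm{normal}} &=& 10 \cdot 157.65\,\mathrm{s} \cdot 10\,\mathrm{s}^{-1} = 15765, \\
+    N_{\mathrm{anomalous}} &=& 4 \cdot 47.325\,\mathrm{s} \cdot 10\,\mathrm{s}^{-1} = 1893, \\
+    N_{\mathrm{normal}} + N_{\mathrm{anomalous}} &=& 17658, \\
+    N_{\mathrm{normal}} / 17658 &\approx& 89.28\,\%, \qquad N_{\mathrm{anomalous}} / 17658 \approx 10.72\,\%.
+  \end{array}
+\]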
+
+    \begin{figure}
+      \begin{center}
+        \includegraphics[width=0.9\textwidth]{figures/data_points_pie.png}
+      \end{center}
+      \caption{Pie chart visualizing the number and distribution of normal and anomalous point clouds in \cite{subter}.}\label{fig:data_points_pie}
+    \end{figure}

    %BEGIN missing points
    As we can see in figure~\ref{fig:data_missing_points}, the artificial smoke introduced as explicit degradation during some experiments results in more missing measurements during scans, which can be explained by measurement rays hitting airborne particles but not being reflected back to the sensor in a way it can measure.
@@ -370,7 +395,7 @@ While the density of these near-sensor returns might be used to estimate data qu
    \end{figure}
    %END early returns

-    \newsection{Preprocessing Steps}{sec:preprocessing}
+    \newsection{Preprocessing Steps and Labeling}{sec:preprocessing}
    \todo[inline]{describe how 3d lidar data was preprocessed (2d projection), labeling}
    \todo[inline]{screenshots of 2d projections?}