experiment chapter dataloading wip
thesis/Main.tex
@@ -66,8 +66,8 @@
\usepackage{xcolor}
\usepackage[colorinlistoftodos]{todonotes}
%\usepackage[disable]{todonotes}
\usepackage{makecell}

\DeclareRobustCommand{\threadtodo}[4]{%
\todo[inline,
@@ -840,7 +840,7 @@ As described in Section~\ref{sec:algorithm_description}, the method under evalua
%For this reason we decided to preprocess the point clouds by converting them to two dimensional grayscale images using spherical projection. Additionally, \cite{degradation_quantification_rain}-which we modeled our approach after-successfully chose this approach. In the projected image each measurement is encoded to a single pixel, whose grayscale value $v$ is the normalized range of the measurement $v = \sqrt{\emph{X}^2 + \emph{Y}^2 + \emph{Z}^2}$. Due to the settings of the datasets' lidar, this results in images with the resolution of 2048 pixels wide by 32 pixels tall. Missing measurements of the point cloud are mapped to pixels with a brightness of 0. To create the mapping we used the measurements indices and channels which are available since the dataset contains dense point clouds and which can be used since the point indices are ordered from 0 to 65535 horizontally ascending channel by channel. For point clouds without indices which can be directly mapped, as is often the case for sparse ones, it would be necessary to use the pitch and yaw angles to the sensor origin to map each point to a pixel on the projection.

For this reason and to simplify the architecture, we converted the point clouds into two-dimensional grayscale images using a spherical projection. This approach, which proved successful in related work~\cite{degradation_quantification_rain}, encodes each LiDAR measurement as a single pixel whose grayscale value is the reciprocal range, calculated as $v = \frac{1}{\sqrt{X^2 + Y^2 + Z^2}}$. Given the LiDAR sensor's configuration, the resulting images have a resolution of 2048 pixels in width and 32 pixels in height. Missing measurements in the point cloud are mapped to pixels with a brightness value of $v = 0$.

To create this mapping, we leveraged the available measurement indices and channel information inherent in the dense point clouds, which are ordered from 0 to 65,535 in a horizontally ascending, channel-by-channel manner. For sparser point clouds without such indices, one would need to rely on the pitch and yaw angles relative to the sensor's origin to correctly map each point to its corresponding pixel.
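For illustration, a minimal sketch of such an angle-based mapping is shown below; the function name, the NumPy-based formulation, and the assumed vertical field of view of $\pm22.5^\circ$ are illustrative assumptions rather than values taken from the dataset specification or our preprocessing scripts.

\begin{verbatim}
import numpy as np

def angle_based_projection(points, width=2048, height=32,
                           fov_up_deg=22.5, fov_down_deg=-22.5):
    """Map each point of a (possibly sparse) cloud to a pixel via yaw/pitch.

    points -- (N, 3) array of x, y, z coordinates relative to the sensor,
    fov_up_deg / fov_down_deg -- assumed vertical field of view in degrees.
    """
    ranges = np.linalg.norm(points, axis=1)
    valid = ranges > 0
    x, y, z = points[valid, 0], points[valid, 1], points[valid, 2]
    r = ranges[valid]

    yaw = np.arctan2(y, x)                 # horizontal angle in [-pi, pi]
    pitch = np.arcsin(z / r)               # vertical angle

    cols = ((0.5 * (1.0 - yaw / np.pi)) * width).astype(int) % width
    fov = np.radians(fov_up_deg - fov_down_deg)
    rows = (np.radians(fov_up_deg) - pitch) / fov * (height - 1)
    rows = np.clip(np.round(rows), 0, height - 1).astype(int)

    image = np.zeros((height, width), dtype=np.float32)
    image[rows, cols] = 1.0 / r            # reciprocal range value
    return image
\end{verbatim}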
@@ -850,7 +850,7 @@ Figure~\ref{fig:data_projections} displays two examples of LiDAR point cloud pro
\todo[inline, color=green!40]{add same projections as they are used in training? grayscale without vertical scaling}

\fig{data_projections}{figures/data_2d_projections.png}{Two-dimensional projections of two point clouds, one from an experiment without degradation and one from an experiment with artificial smoke as degradation. To aid the reader's perception, the images are vertically stretched and a colormap has been applied to the pixels' reciprocal range values, while the actual training data is grayscale.}
@@ -912,11 +912,11 @@ In the following sections, we detail our adaptations to this framework:
\end{itemize}

\section{Framework \& Data Preparation}
% Combines: Framework Initialization + Data Integration
% Goals: introduce codebase, how you adapted it, dataset loading/preprocessing, labeling

\newsubsubsectionNoTOC{DeepSAD PyTorch codebase and our adaptations}

\threadtodo
{Explain deepsad codebase as starting point}
{what is the starting point?}
@@ -925,37 +925,79 @@ In the following sections, we detail our adaptations to this framework:
The PyTorch implementation of the DeepSAD framework includes the MNIST, Fashion-MNIST, and CIFAR-10 datasets, the arrhythmia, cardio, satellite, satimage-2, shuttle, and thyroid datasets from \citetitle{odds}~\cite{odds}, and suitable autoencoder and DeepSAD network architectures for the corresponding data types. The framework can train and test DeepSAD as well as a number of baseline algorithms, namely SSAD, OCSVM, Isolation Forest, KDE, and SemiDGM, on the loaded data and evaluate their performance by calculating the area under the ROC curve for each algorithm. We adapted this implementation, originally developed for Python 3.7, to work with Python 3.12, added data loading for our chosen dataset, added DeepSAD models that work with the LiDAR projection data type, and implemented additional evaluation methods and an inference module.
\newsubsubsectionNoTOC{SubTERR dataset preprocessing, train/test splits, and label strategy}

\threadtodo
{explain how dataloading was adapted}
{loading data first point step to new training}
{preprocessed numpy (script), load, labels/meta, split, k-fold}
{k-fold $\rightarrow$ also adapted in training/testing}

%dataset in rosbag format (one bag file per experiment) was preprocessed as mentioned in chapter X by projecting the 3d lidar data (xzy pointcloud) using a spherical projection in a python script and saved as a npy araray of dimensions frames by height by width with value normalized distance (1 over sqrt(distance)) using numpy save method for simplicity while loading and to avoid having to do this preprocessing during each experiment. the projection was done using the meta information in the bag which includes the channel (height/row) and the index which is available since the data is non-sparse/dense, which means that for each possible measurement a data is available in the original rosbag even if the sensor did not record a return ray for this measurement, which means there is no data and it could be left out in a sparse array saving file size. this is very helpful since it allows the direct mapping of all measurements to the spherical projection using channel as the height index and measurement index modulo (measurements / channel) as the width index for each measurement. the reason that this is useful is that otherwise the projection would have to be calculated, meaning the angles between the origin and each point from the point cloud would have to be used to reconstruct the mapping between each measurement and a pixel in the projection. we also tried this method originally which lead to many ambiguities in the mappings were sometimes multiple measurements were erroneously mapped to the same pixel with no clear way to differentiate between which of them was mapped incorrectly. this is most likely due to quantification errors, systematic and sporadic measurement errors and other unforseen problems. for these reasons the index based mapping is a boon to us in this dataset. it should also be mentioned that lidar sensors originally calculate the distance to an object by measuring the time it takes for an emitted ray to return (bg chapter lidar ref) and the point cloud point is only calculated using this data and the known measurement angles. for this reason it is typically possible to configure lidar sensors to provide this original data which is basically the same as the 2d projection directly, without having to calculate it from the pointcloud.
%\todo[inline]{why normalize range?}
The raw SubTERR dataset is provided as ROS bag files—one per experiment—each containing a dense 3D point cloud from the Ouster OS1-32 LiDAR. To streamline training and avoid repeated heavy computation, we project these point clouds offline into 2D “range images” and save them as NumPy arrays. We apply a spherical projection that maps each LiDAR measurement to a pixel in a 2D image of size Height × Width, where Height is the number of vertical channels (32) and Width is the number of measurements per rotation (2048). Instead of computing per-point azimuth and elevation angles at runtime, we exploit the sensor’s metadata:

\begin{itemize}
\item \textbf{Channel index:} directly gives the row (vertical position) of each measurement.
\item \textbf{Measurement index:} by taking the measurement index modulo Width, we obtain the column (horizontal position) in the 360° sweep.
\end{itemize}

Because the SubTERR data is dense—every possible channel × measurement pair appears in the bag, even if the LiDAR did not record a return—we can perform a direct 1:1 mapping without collisions or missing entries. This avoids the ambiguities we previously encountered when reconstructing the projection via angle computations alone, which sometimes mapped multiple points to the same pixel due to numerical errors in angle estimation.

For each projected pixel, we compute the reciprocal range $v_i = \frac{1}{\sqrt{x_i^2 + y_i^2 + z_i^2}}$, where $v_i$ is the value assigned to the pixel and $x_i$, $y_i$, and $z_i$ are the corresponding measurement's 3D coordinates. This transformation both compresses the dynamic range and emphasizes close-range returns—critical for detecting near-sensor degradation. We then save the resulting tensor of shape (Number of Frames, Height, Width) using NumPy’s save function. Storing precomputed projections allows rapid data loading during training and evaluation.
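A condensed sketch of this preprocessing step is shown below; the function and variable names (\texttt{project\_frame}, \texttt{channels}, \texttt{indices}, \texttt{frame\_iterator}) are illustrative only, and the snippet omits the deserialization of the ROS bag messages that the actual script performs.

\begin{verbatim}
import numpy as np

HEIGHT, WIDTH = 32, 2048   # vertical channels, measurements per rotation

def project_frame(points, channels, indices):
    """Index-based spherical projection of one dense LiDAR frame.

    points   -- (N, 3) array of x, y, z coordinates,
    channels -- (N,) array of per-measurement channel numbers (image rows),
    indices  -- (N,) array of measurement indices from 0 to 65535.
    """
    image = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
    ranges = np.linalg.norm(points, axis=1)   # Euclidean range per measurement
    valid = ranges > 0                        # missing returns keep value 0
    rows = channels[valid]
    cols = indices[valid] % WIDTH             # column within the 360 degree sweep
    image[rows, cols] = 1.0 / ranges[valid]   # reciprocal range value
    return image

# frames = np.stack([project_frame(p, c, i) for p, c, i in frame_iterator])
# np.save("experiment_name.npy", frames)      # shape: (number of frames, 32, 2048)
\end{verbatim}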
Many modern LiDARs can be configured to output range images directly—bypassing the need for post-hoc projection—since they already compute per-beam azimuth and elevation internally. When available, such native range-image streams can further simplify preprocessing or even allow skipping this step completely.

\newsubsubsectionNoTOC{Any implementation challenges or custom data loaders}

%the original code base utilized pytorch's dataloaders, so we added a new one which used the framework's existing structure but loaded the aforementioned numpy files. in addition to loading these we assign the two kinds of evaluation labels discussed in section~\ref{sec:preprocessing} by using the original experiment's name which either contained the word smoke or not as the deciding factor for correspdondingly anomalous and normal labels (which are -1 and +1 respectively) as the first type of evaluation labels called henceforth "experiment-based labels" and loaded a JSON file which contained manually chosen start and end frames for the 4 experiments containing smoke degradation and used these to only assign -1 (the anomalous label) to these frames which were manually selected to be definitely degraded and +1 to all experiments which once again didn't have the word smoke in their file names and therefore did not contain artifical degradation by smoke machine. the frames before the manually chosen start frame and the ones after the manually chosen end frame were labeled as "unknown" with a 0 value and not used in the evaluation. this second type of evaluation label method is henceforth called "manually defined" evaluation labels.

We extended the DeepSAD framework’s PyTorch \texttt{DataLoader} by implementing a custom \texttt{Dataset} class that ingests our precomputed NumPy range-image files and attaches appropriate evaluation labels.
Each experiment’s frames are stored as a single \texttt{.npy} file of shape \((\text{Number of Frames}, H, W)\), containing the reciprocal range values described in Section~\ref{sec:preprocessing}. Our \texttt{Dataset} initializer scans a directory of these files, loads the NumPy arrays into memory, converts them into PyTorch tensors, and assigns evaluation and training labels accordingly.

The first labeling scheme, called \emph{experiment-based labels}, assigns
\[
y_{\mathrm{exp}} =
\begin{cases}
-1 & \text{if the filename contains “smoke”, signifying anomalous/degraded data,} \\
+1 & \text{otherwise, signifying normal data.}
\end{cases}
\]
At load time, any file with “smoke” in its name is treated as anomalous (label \(-1\)), and all others (normal experiments) are labeled \(+1\).
To obtain a second source of ground truth, we also support \emph{manually defined labels}. A companion JSON file specifies a start and end frame index for each of the four smoke experiments, defining the interval of unequivocal degradation. During loading:
\[
y_{\mathrm{man}} =
\begin{cases}
-1 & \text{frames within the manually selected window of a smoke experiment,} \\
+1 & \text{all frames of non-smoke experiments,} \\
0 & \text{frames outside the manually selected window of a smoke experiment.}
\end{cases}
\]
Frames labeled \(0\) are treated as unknown and excluded from the evaluation.

We pass instances of this \texttt{Dataset} to PyTorch’s \texttt{DataLoader}, enabling batch sampling, shuffling, and multi-worker loading. The dataloader returns the preprocessed LiDAR projection, both evaluation labels, and a semi-supervised training label. This modular design lets us train and evaluate DeepSAD under both labeling regimes without duplicating data-handling code.
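A simplified sketch of such a \texttt{Dataset} is given below. The class name \texttt{RangeImageDataset}, the JSON layout, and the eager loading of all frames are illustrative assumptions; the semi-supervised training labels and the k-fold splitting are omitted for brevity.

\begin{verbatim}
import json
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class RangeImageDataset(Dataset):
    """Loads precomputed range-image .npy files and attaches both label types."""

    def __init__(self, root, windows_json):
        self.frames, self.exp_labels, self.man_labels = [], [], []
        windows = json.loads(Path(windows_json).read_text())
        for npy_file in sorted(Path(root).glob("*.npy")):
            data = np.load(npy_file)              # shape: (frames, H, W)
            is_smoke = "smoke" in npy_file.stem
            start, end = windows.get(npy_file.stem, (None, None))
            for idx, frame in enumerate(data):
                self.frames.append(torch.from_numpy(frame).float())
                self.exp_labels.append(-1 if is_smoke else +1)
                if not is_smoke:
                    self.man_labels.append(+1)    # normal experiment
                elif start is not None and start <= idx <= end:
                    self.man_labels.append(-1)    # definitely degraded
                else:
                    self.man_labels.append(0)     # unknown, excluded in evaluation

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, i):
        # unsqueeze adds the single channel dimension expected by the networks
        return self.frames[i].unsqueeze(0), self.exp_labels[i], self.man_labels[i]

# dataset = RangeImageDataset("projections/", "smoke_windows.json")
# loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)
\end{verbatim}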
%since deepsad is a semi-supervised method which allows for optional training labels to improve performance over the fully unsupervised case, the pytorch dataset is passed parameters which define the number of samples which should have training labels during training, individually for anomalies and normal data samples. this allows for experiments that compare unsupervised to semi-supervised cases and how the introduction of different numbers of labeled data during training affects the model's performance. for the semi-supervised training labels the manually defined evaluation labels are used as an initial source of which randomized labels are removed (by replacing their value with 0 which indicates no known status of either anomalous or normal) until the desired number of training labels per class (normal / anomalous) which was passed as a configuration parameter is reached.

%because the amount of data is not too large, we also implemented k-fold training and evaluation to improve confidence in the evaluation results, which means an integer is passed as the number of desired folds, which in turn will define the split between training and evaluation data. for our trainings we always chose 5 folds, which results in an 80/20 split between training and evaluation data. for the implementation of the k-fold cross-validation data loading we utilized the KFold class included in sklearn's model\_selection module.

%additionally we implemented another pytorch dataset which loads a single experiment from the corresponding numpy file for inference, which does not use k-fold cross-validation nor shuffles the frames around, to allow for sequential calculations of forward passes on fully trained models to validate their functionality and produce sequential frame-by-frame inference results (anomaly scores) for one complete experiment from start to finish.

%\todo[inline]{semi-supervised training labels, k_fold, inference dataloader}
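For the k-fold cross-validation used during training and evaluation (five folds, i.e.\ an 80/20 training/evaluation split per fold), the split itself can be produced with scikit-learn's \texttt{KFold}; the helper below is an illustrative sketch, not our actual implementation.

\begin{verbatim}
import numpy as np
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset

def k_fold_loaders(dataset, n_splits=5, batch_size=64, seed=0):
    """Yield one (train_loader, eval_loader) pair per fold."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, eval_idx in kfold.split(np.arange(len(dataset))):
        train_loader = DataLoader(Subset(dataset, train_idx.tolist()),
                                  batch_size=batch_size, shuffle=True)
        eval_loader = DataLoader(Subset(dataset, eval_idx.tolist()),
                                 batch_size=batch_size, shuffle=False)
        yield train_loader, eval_loader

# for fold, (train_loader, eval_loader) in enumerate(k_fold_loaders(dataset)):
#     ...  # train and evaluate one model per fold
\end{verbatim}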
\section{Model Configuration \& Evaluation Protocol}

\newsubsubsectionNoTOC{Network architectures (LeNet variant, custom encoder) and how they suit the point‑cloud input}

\threadtodo
{how was training/testing adapted (networks overview), inference, ae tuning}
{data has been loaded, how is it processed}
{networks defined, training/testing k-fold, more metrics, inference + ae tuning implemented}
{training procedure known $\rightarrow$ what methods were evaluated}

\threadtodo
{custom arch necessary, first lenet then second arch to evaluate importance of arch}
{training process understood, but what networks were actually trained}
@@ -974,32 +1016,38 @@ dataset in rosbag format (one bag file per experiment) was preprocessed as menti
{LR, eta, epochs, latent space size (hyper param search), semi labels}
{everything that goes into training known $\rightarrow$ what experiments were actually done?}

\newsubsubsectionNoTOC{Baseline methods (Isolation Forest, one-class SVM) and feature extraction via the encoder}
\threadtodo
{what methods were evaluated}
{we know what testing/training was implemented for deepsad, but what is it compared to}
{isoforest, ocsvm adapted, for ocsvm only dim reduced feasible (ae from deepsad)}
{compared methods known $\rightarrow$ what methods were used}

\newsubsubsectionNoTOC{Training procedure (k‑fold cross‑validation, semi‑supervised loss) and hyperparameter choices}
\newsubsubsectionNoTOC{Evaluation metrics (ROC, PRC, AUC, F1) and inference protocol}
\threadtodo
{what evaluation methods were used}
{we know what is compared but want to know exactly how}
{explain roc, prc, inference with experiment left out of training}
{experiment overview given $\rightarrow$ details to deepsad during training?}

\section{Experiment Matrix \& Computational Environment}
% Combines: Experiment Matrix + Hardware & Runtimes
% Goals: clearly enumerate each experiment configuration and give practical runtime details

\newsubsubsectionNoTOC{Table of experiment variants (architectures, hyperparameters, data splits)}
\threadtodo
{give overview of experiments and their motivations}
{training setup clear, but not what was trained/tested}
{explanation of what was searched for (ae latent space first), other hyperparams and why}
{all experiments known $\rightarrow$ how long do they take to train}

\newsubsubsectionNoTOC{Hardware specifications (GPU/CPU, memory), software versions, typical training/inference runtimes}
\threadtodo
{give overview about hardware setup and how long things take to train}
{we know what we trained but not how long that takes}
{table of hardware and of how long different trainings took}
{experiment setup understood $\rightarrow$ what were the experiments' results}

\newchapter{results_discussion}{Results and Discussion}
\newsection{results}{Results}
\todo[inline]{some results, ROC curves, for both global and local}