results ae section

This commit is contained in:
Jan Kowalczyk
2025-09-18 11:58:28 +02:00
parent 8f36bd2e07
commit a20a4a0832
4 changed files with 393 additions and 145 deletions



\newchapter{results_discussion}{Results and Discussion}
\threadtodo
{Introduce the structure and scope of the results chapter}
{The reader knows the experiments from the previous chapter, but not the outcomes}
{State that we will first analyze autoencoder results, then anomaly detection performance, and finally inference experiments}
{Clear roadmap $\rightarrow$ prepares reader for detailed sections}
The results of the experiments described in Chapter~\ref{chp:experimental_setup} are presented in this chapter. We begin in Section~\ref{sec:results_pretraining} with the pretraining stage, where the two autoencoder architectures were trained across multiple latent space dimensionalities. These results provide insight into the representational capacity of each architecture. In Section~\ref{sec:results_deepsad}, we turn to the main experiments: training DeepSAD models and benchmarking them against baseline algorithms (Isolation Forest and One-Class SVM). Finally, in Section~\ref{sec:results_inference}, we present inference results on experiments that were held out during training. These plots illustrate how the algorithms behave when applied sequentially to unseen traversals, offering a more practical perspective on their potential for real-world rescue robotics applications.
% --- Section: Autoencoder Pretraining Results ---
\newsection{results_pretraining}{Autoencoder Pretraining Results}
\threadtodo
{Present autoencoder reconstruction performance across architectures and latent sizes}
{Important because latent size and architecture determine representation quality, which may affect DeepSAD later}
{Show reconstruction losses over latent dimensions, compare Efficient vs LeNet}
{Understanding representation capacity $\rightarrow$ motivates analyzing if AE results transfer to DeepSAD}
The results of pretraining the two autoencoder architectures are summarized in Table~\ref{tab:pretraining_loss}. Reconstruction performance is reported as mean squared error (MSE), with trends visualized in Figure~\ref{fig:ae_loss_overall}. The results show that the modified Efficient architecture consistently outperforms the LeNet-inspired baseline across all latent space dimensionalities. The improvement is most pronounced at lower-dimensional bottlenecks (e.g., 32 or 64 dimensions) but remains observable up to 1024 dimensions, although the gap narrows.
%\fig{ae_loss_overall}{figures/ae_loss_overall.png}{Reconstruction loss across latent dimensions for LeNet-inspired and Efficient architectures.}
\begin{table}[t]
\centering
\begin{tabularx}{\textwidth}{c*{2}{Y}|*{2}{Y}}
\toprule
& \multicolumn{2}{c}{Overall loss} & \multicolumn{2}{c}{Anomaly loss} \\
\cmidrule(lr){2-3} \cmidrule(lr){4-5}
Latent Dim. & LeNet & Efficient & LeNet & Efficient \\
\midrule
32 & 0.0223 & \textbf{0.0136} & 0.0701 & \textbf{0.0554} \\
64 & 0.0168 & \textbf{0.0117} & 0.0613 & \textbf{0.0518} \\
128 & 0.0140 & \textbf{0.0110} & 0.0564 & \textbf{0.0506} \\
256 & 0.0121 & \textbf{0.0106} & 0.0529 & \textbf{0.0498} \\
512 & 0.0112 & \textbf{0.0103} & 0.0514 & \textbf{0.0491} \\
768 & 0.0109 & \textbf{0.0102} & 0.0505 & \textbf{0.0490} \\
1024 & 0.0106 & \textbf{0.0101} & 0.0500 & \textbf{0.0489} \\
\bottomrule
\end{tabularx}
\caption{Autoencoder pre-training MSE losses across latent dimensions. Left: overall loss; right: anomaly-only loss. Cells show means across folds (no $\pm$std). Maximum observed standard deviation across all cells (not shown): 0.0067.}
\label{tab:pretraining_loss}
\end{table}
\threadtodo
{Analyze anomaly reconstruction performance specifically}
{Critical because degraded inputs may reconstruct differently, showing whether networks capture degradation structure}
{Show reconstruction losses on anomalous-only data subset}
{This analysis $\rightarrow$ motivates testing whether better AE reconstructions imply better anomaly detection}
\fig{ae_loss_overall}{figures/ae_elbow_test_loss_overall.png}{Reconstruction loss across latent dimensions for LeNet-inspired and Efficient architectures.}
%\fig{ae_loss_degraded}{figures/ae_loss_degraded.png}{Reconstruction loss on degraded-only subsets.}
Because overall reconstruction loss might obscure how well encoders represent anomalous samples, we additionally evaluate reconstruction errors only on degraded samples from hand-labeled smoke segments (Figure~\ref{fig:ae_loss_degraded}). As expected, reconstruction losses are about 0.05 higher on these challenging samples than in the overall evaluation. However, the relative advantage of the Efficient architecture remains, suggesting that its improvements extend to anomalous inputs as well.
\fig{ae_loss_degraded}{figures/ae_elbow_test_loss_anomaly.png}{Reconstruction loss across latent dimensions for LeNet-inspired and Efficient architectures, evaluated only on degraded data from hand-labeled smoke experiments.}
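The overall and anomaly-only losses reported in Table~\ref{tab:pretraining_loss} both derive from per-sample reconstruction errors; the anomaly-only column simply restricts the average to the hand-labeled degraded samples. A minimal NumPy sketch of this evaluation (function names are illustrative, not taken from the training code):

```python
import numpy as np

def mse_per_sample(x, x_hat):
    # Squared error averaged over every non-batch dimension,
    # yielding one reconstruction loss per input sample.
    diff = (np.asarray(x, float) - np.asarray(x_hat, float)) ** 2
    return diff.reshape(diff.shape[0], -1).mean(axis=1)

def overall_and_anomaly_loss(x, x_hat, anomaly_mask):
    # Mirrors the two table columns: mean loss over all samples,
    # and mean loss restricted to hand-labeled anomalous samples.
    per_sample = mse_per_sample(x, x_hat)
    mask = np.asarray(anomaly_mask, bool)
    return float(per_sample.mean()), float(per_sample[mask].mean())
```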
% It is important to note that absolute MSE values are difficult to interpret in isolation, as their magnitude depends on the data scaling and chosen reconstruction target. More detailed evaluations in terms of error in meters, relative error, or distance-binned metrics could provide richer insights into encoder quality. However, since the downstream anomaly detection results (Section~\ref{sec:results_deepsad}) do not reveal significant differences between pretraining regimes, such detailed pretraining evaluation was not pursued here. Instead, these metrics are left as promising directions for future work, particularly if pretraining were to play a larger role in final detection performance.
%In the following, we therefore focus on the main question of whether improved reconstruction performance during pretraining translates into measurable benefits for anomaly detection in DeepSAD.
It is important to note that absolute MSE values are difficult to interpret in isolation, as their magnitude depends on the data scaling and chosen reconstruction target. More detailed evaluations in terms of error in meters, relative error, or distance-binned metrics could provide richer insights into encoder quality. However, since the downstream anomaly detection results (Section~\ref{sec:results_deepsad}) do not reveal significant differences between pretraining regimes, such detailed pretraining evaluation was not pursued here. Instead, we restrict ourselves to reporting the reconstruction trends and leave more in-depth pretraining analysis as future work.
% --- Section: DeepSAD Training Results ---
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 0 normal samples, 0 anomalous samples.}
\label{tab:auc_exp_based_semi_0_0}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{0.801 \textpm 0.019} & 0.791 \textpm 0.011 & 0.717 \textpm 0.006 & 0.752 \textpm 0.045 \\
64 & 0.776 \textpm 0.009 & \textbf{0.786 \textpm 0.012} & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & \textbf{0.784 \textpm 0.024} & 0.784 \textpm 0.017 & 0.719 \textpm 0.017 & 0.775 \textpm 0.009 \\
256 & 0.762 \textpm 0.028 & 0.772 \textpm 0.016 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.759 \textpm 0.020 & 0.784 \textpm 0.021 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.749 \textpm 0.041 & 0.754 \textpm 0.024 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.757 \textpm 0.020 & 0.750 \textpm 0.017 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 50 normal samples, 10 anomalous samples.}
\label{tab:auc_exp_based_semi_50_10}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.741 \textpm 0.013 & 0.747 \textpm 0.015 & 0.717 \textpm 0.006 & \textbf{0.752 \textpm 0.045} \\
64 & \textbf{0.757 \textpm 0.011} & 0.750 \textpm 0.017 & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & 0.746 \textpm 0.019 & 0.751 \textpm 0.016 & 0.719 \textpm 0.017 & \textbf{0.775 \textpm 0.009} \\
256 & 0.746 \textpm 0.015 & 0.750 \textpm 0.015 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.760 \textpm 0.057 & 0.763 \textpm 0.027 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.749 \textpm 0.016 & 0.747 \textpm 0.036 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.748 \textpm 0.021 & 0.732 \textpm 0.015 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 500 normal samples, 100 anomalous samples.}
\label{tab:auc_exp_based_semi_500_100}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.765 \textpm 0.005 & \textbf{0.775 \textpm 0.010} & 0.717 \textpm 0.006 & 0.752 \textpm 0.045 \\
64 & 0.754 \textpm 0.013 & \textbf{0.773 \textpm 0.020} & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & 0.758 \textpm 0.009 & 0.769 \textpm 0.014 & 0.719 \textpm 0.017 & \textbf{0.775 \textpm 0.009} \\
256 & 0.749 \textpm 0.016 & 0.768 \textpm 0.021 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.766 \textpm 0.043 & 0.770 \textpm 0.026 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.746 \textpm 0.016 & 0.750 \textpm 0.027 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.743 \textpm 0.023 & 0.739 \textpm 0.016 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 0 normal samples, 0 anomalous samples.}
\label{tab:auc_manual_based_semi_0_0}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 50 normal samples, 10 anomalous samples.}
\label{tab:auc_manual_based_semi_50_10}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.990 \textpm 0.019 & \textbf{0.998 \textpm 0.001} & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & 0.998 \textpm 0.003 & \textbf{0.999 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & 0.991 \textpm 0.018 & \textbf{0.999 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & 0.999 \textpm 0.002 & \textbf{0.999 \textpm 0.001} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & 0.972 \textpm 0.060 & \textbf{0.999 \textpm 0.001} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & \textbf{1.000 \textpm 0.000} & 0.998 \textpm 0.001 & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & \textbf{0.999 \textpm 0.001} & 0.998 \textpm 0.001 & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}
\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 500 normal samples, 100 anomalous samples.}
\label{tab:auc_manual_based_semi_500_100}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & 0.999 \textpm 0.001 & \textbf{1.000 \textpm 0.000} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & 0.989 \textpm 0.025 & \textbf{1.000 \textpm 0.000} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}
\newsection{results_deepsad}{DeepSAD Detection Performance}
\threadtodo
{Introduce DeepSAD anomaly detection results compared to baselines}
1024 & 0.743 & 0.739 & 0.716 & \textbf{0.821} & \textbf{1.000} & \textbf{1.000} & 0.919 & 0.956 \\
\bottomrule
\end{tabularx}
\caption{ROC AUC means across 5 folds for both evaluations, grouped by labeling regime. Maximum observed standard deviation across all cells (not shown in table): 0.06.}
\end{table}
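The ROC AUC values reported in these tables can be understood without reference to any particular curve implementation: ROC AUC equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal sample (the Mann--Whitney identity), with ties counted as one half. A small pure-NumPy sketch of this statistic and the per-fold aggregation (helper names are ours, not from the evaluation code):

```python
import numpy as np

def roc_auc(scores, labels):
    # Rank-based ROC AUC: fraction of (anomalous, normal) pairs
    # where the anomalous sample scores higher; ties count 0.5.
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def fold_mean_std(per_fold_scores, per_fold_labels):
    # Mean and standard deviation of AUC across folds,
    # matching the "mean +/- std across 5 folds" table cells.
    aucs = [roc_auc(s, y) for s, y in zip(per_fold_scores, per_fold_labels)]
    return float(np.mean(aucs)), float(np.std(aucs))
```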
%\fig{roc_prc_unsup}{figures/roc_prc_unsup.png}{ROC and PRC curves for DeepSAD, Isolation Forest, and OCSVM (unsupervised, all latent dimensions).}
{This discussion $\rightarrow$ motivates looking at model behavior over time via inference}
% --- Section: Inference Experiments ---
\newsection{results_inference}{Inference on Held-Out Experiments}
\threadtodo
{Introduce inference evaluation on unseen experiments}
%\fig{inference_clean_vs_smoke}{figures/inference_clean_vs_smoke.png}{Normalized anomaly scores for a clean vs degraded experiment. Clear amplitude separation is visible.}
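Raw anomaly scores from different detectors live on different scales, so traversal plots like the clean-versus-smoke comparison are easier to read once scores are normalized to a common range. One simple scheme (a sketch under our own assumptions, not necessarily the exact normalization used for the figures) rescales by reference statistics from training data and smooths out single-frame spikes:

```python
import numpy as np

def normalize_scores(raw, ref):
    # Map raw per-frame anomaly scores into [0, 1] using the min/max
    # of a reference score distribution (e.g. training-set scores),
    # clipping values outside the reference range.
    lo, hi = float(np.min(ref)), float(np.max(ref))
    return np.clip((np.asarray(raw, float) - lo) / (hi - lo), 0.0, 1.0)

def smooth(scores, k=5):
    # Centered moving average over k frames to suppress single-frame
    # spikes before plotting a traversal's score timeline.
    kernel = np.ones(k) / k
    return np.convolve(np.asarray(scores, float), kernel, mode="same")
```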
% --- Section: Results Summary ---
\section{Summary of Results}
\threadtodo
{Summarize main findings across all results}
{Reader should leave with a compact understanding of what was learned}
{State that Efficient autoencoder reconstructs better, DeepSAD beats baselines, semi-supervision shows tradeoffs, inference confirms degradation quantification works}
{Clear closure $\rightarrow$ prepares transition to discussion, limitations, and future work}
% \todo[inline]{introductory paragraph results}
% \todo[inline]{autoencoder results, compare lenet to efficient, shows that efficient is better and especially at lower latent dims, interesting to see in future exps if autencoder results appear to transfer to deepsad training results, therefore not a single latent dim in later exps, but rather all so it can be compared. also interesting to see if efficient better than lenet since reconstruction loss is better for efficient}
%