results ae section
BIN  thesis/Main.pdf (binary file not shown)
229  thesis/Main.tex
@@ -1436,151 +1436,103 @@ Together, these results provide a comprehensive overview of the computational re
\newchapter{results_discussion}{Results and Discussion}

\threadtodo
{Introduce the structure and scope of the results chapter}
{The reader knows the experiments from the previous chapter, but not the outcomes}
{State that we will first analyze autoencoder results, then anomaly detection performance, and finally inference experiments}
{Clear roadmap $\rightarrow$ prepares reader for detailed sections}

The experiments described in Chapter~\ref{chp:experimental_setup} are presented in this chapter. We begin in Section~\ref{sec:results_pretraining} with the pretraining stage, where the two autoencoder architectures were trained across multiple latent space dimensionalities. These results provide insight into the representational capacity of each architecture. In Section~\ref{sec:results_deepsad}, we turn to the main experiments: training DeepSAD models and benchmarking them against baseline algorithms (Isolation Forest and One-Class SVM). Finally, in Section~\ref{sec:results_inference}, we present inference results on experiments that were held out during training. These plots illustrate how the algorithms behave when applied sequentially to unseen traversals, offering a more practical perspective on their potential for real-world rescue robotics applications.

% --- Section: Autoencoder Pretraining Results ---
\newsection{results_pretraining}{Autoencoder Pretraining Results}

\threadtodo
{Present autoencoder reconstruction performance across architectures and latent sizes}
{Important because latent size and architecture determine representation quality, which may affect DeepSAD later}
{Show reconstruction losses over latent dimensions, compare Efficient vs LeNet}
{Understanding representation capacity $\rightarrow$ motivates analyzing if AE results transfer to DeepSAD}

The results of pretraining the two autoencoder architectures are summarized in Table~\ref{tab:pretraining_loss}. Reconstruction performance is reported as mean squared error (MSE), with trends visualized in Figure~\ref{fig:ae_loss_overall}. The results show that the modified Efficient architecture consistently outperforms the LeNet-inspired baseline across all latent space dimensionalities. The improvement is most pronounced at lower-dimensional bottlenecks (e.g., 32 or 64 dimensions) but remains observable up to 1024 dimensions, although the gap narrows.
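For reference, the reconstruction error reported throughout this section is the standard per-sample mean squared error between an input $x \in \mathbb{R}^{D}$ and its reconstruction $\hat{x}$; the normalization by the number of input elements $D$ per sample is assumed here, and batch-wise averaging during logging is noted with the table:
\[
\mathrm{MSE}(x, \hat{x}) = \frac{1}{D} \sum_{i=1}^{D} \left( x_i - \hat{x}_i \right)^2 .
\]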
\begin{table}[t]
\centering
\begin{tabularx}{\textwidth}{c*{2}{Y}|*{2}{Y}}
\toprule
 & \multicolumn{2}{c}{Overall loss} & \multicolumn{2}{c}{Anomaly loss} \\
\cmidrule(lr){2-3} \cmidrule(lr){4-5}
Latent Dim. & LeNet & Efficient & LeNet & Efficient \\
\midrule
32 & 0.0223 & \textbf{0.0136} & 0.0701 & \textbf{0.0554} \\
64 & 0.0168 & \textbf{0.0117} & 0.0613 & \textbf{0.0518} \\
128 & 0.0140 & \textbf{0.0110} & 0.0564 & \textbf{0.0506} \\
256 & 0.0121 & \textbf{0.0106} & 0.0529 & \textbf{0.0498} \\
512 & 0.0112 & \textbf{0.0103} & 0.0514 & \textbf{0.0491} \\
768 & 0.0109 & \textbf{0.0102} & 0.0505 & \textbf{0.0490} \\
1024 & 0.0106 & \textbf{0.0101} & 0.0500 & \textbf{0.0489} \\
\bottomrule
\end{tabularx}
\caption{Autoencoder pre-training MSE losses across latent dimensions. Left: overall loss; Right: anomaly-only loss. Cells show means across folds (no $\pm$std). Maximum observed standard deviation across all cells (not shown): 0.0067.}
\label{tab:pretraining_loss}
\end{table}

\fig{ae_loss_overall}{figures/ae_elbow_test_loss_overall.png}{Reconstruction loss across latent dimensions for LeNet-inspired and Efficient architectures.}

\threadtodo
{Analyze anomaly reconstruction performance specifically}
{Critical because degraded inputs may reconstruct differently, showing whether networks capture degradation structure}
{Show reconstruction losses on anomalous-only data subset}
{This analysis $\rightarrow$ motivates testing whether better AE reconstructions imply better anomaly detection}

Because overall reconstruction loss might obscure how well encoders represent anomalous samples, we additionally evaluate reconstruction errors only on degraded samples from hand-labeled smoke segments (Figure~\ref{fig:ae_loss_degraded}). As expected, reconstruction losses are about 0.05 higher on these challenging samples than in the overall evaluation. However, the relative advantage of the Efficient architecture remains, suggesting that its improvements extend to anomalous inputs as well.
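The anomaly-only numbers can be recomputed directly from the per-sample reconstruction errors and the hand-labeled smoke segments; a minimal sketch in the spirit of the accompanying results_ae_table.py script, where -1 marks anomalous samples (the export file names below are assumptions, not files produced by this work):

import numpy as np

def batch_mean_loss(scores: np.ndarray, batch_size: int = 256) -> float:
    # Mean of per-batch means, mirroring how the pretraining evaluation aggregates MSE.
    if scores.size == 0:
        return float("nan")
    batch_means = [scores[i : i + batch_size].mean() for i in range(0, scores.size, batch_size)]
    return float(np.mean(batch_means))

scores = np.load("ae_test_scores.npy")    # hypothetical export: per-sample reconstruction MSE
labels = np.load("labels_exp_based.npy")  # hypothetical export: -1 = anomalous sample

print("overall loss:     ", batch_mean_loss(scores))
print("anomaly-only loss:", batch_mean_loss(scores[labels == -1]))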
\fig{ae_loss_degraded}{figures/ae_elbow_test_loss_anomaly.png}{Reconstruction loss across latent dimensions for LeNet-inspired and Efficient architectures, evaluated only on degraded data from hand-labeled smoke experiments.}

%In the following, we therefore focus on the main question of whether improved reconstruction performance during pretraining translates into measurable benefits for anomaly detection in DeepSAD.

It is important to note that absolute MSE values are difficult to interpret in isolation, as their magnitude depends on the data scaling and chosen reconstruction target. More detailed evaluations in terms of error in meters, relative error, or distance-binned metrics could provide richer insights into encoder quality. However, since the downstream anomaly detection results (Section~\ref{sec:results_deepsad}) do not reveal significant differences between pretraining regimes, such detailed pretraining evaluation was not pursued here. Instead, we restrict ourselves to reporting the reconstruction trends and leave more in-depth pretraining analysis as future work.
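As an illustration only of the distance-binned idea mentioned above (not an evaluation performed in this work), such a metric could convert reconstructions back to metric range and report the mean absolute error per ground-truth distance bin; the normalization constant, bin edges, and file names in this sketch are all assumptions:

import numpy as np

MAX_RANGE_M = 50.0                                      # assumed range normalization constant
bin_edges = np.array([0, 5, 10, 20, 50], dtype=float)   # assumed distance bins in meters

gt = np.load("targets.npy") * MAX_RANGE_M               # hypothetical ground-truth ranges
pred = np.load("reconstructions.npy") * MAX_RANGE_M     # hypothetical reconstructed ranges

abs_err = np.abs(pred - gt)
bin_idx = np.digitize(gt, bin_edges) - 1
for b in range(len(bin_edges) - 1):
    mask = bin_idx == b
    if mask.any():
        print(f"{bin_edges[b]:>4.0f}-{bin_edges[b + 1]:<4.0f} m: MAE = {abs_err[mask].mean():.2f} m")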

% --- Section: DeepSAD Training Results ---
\newsection{results_deepsad}{DeepSAD Detection Performance}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 0 normal and 0 anomalous labeled samples.}
\label{tab:auc_exp_based_semi_0_0}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{0.801 \textpm 0.019} & 0.791 \textpm 0.011 & 0.717 \textpm 0.006 & 0.752 \textpm 0.045 \\
64 & 0.776 \textpm 0.009 & \textbf{0.786 \textpm 0.012} & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & \textbf{0.784 \textpm 0.024} & 0.784 \textpm 0.017 & 0.719 \textpm 0.017 & 0.775 \textpm 0.009 \\
256 & 0.762 \textpm 0.028 & 0.772 \textpm 0.016 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.759 \textpm 0.020 & 0.784 \textpm 0.021 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.749 \textpm 0.041 & 0.754 \textpm 0.024 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.757 \textpm 0.020 & 0.750 \textpm 0.017 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 50 normal and 10 anomalous labeled samples.}
\label{tab:auc_exp_based_semi_50_10}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.741 \textpm 0.013 & 0.747 \textpm 0.015 & 0.717 \textpm 0.006 & \textbf{0.752 \textpm 0.045} \\
64 & \textbf{0.757 \textpm 0.011} & 0.750 \textpm 0.017 & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & 0.746 \textpm 0.019 & 0.751 \textpm 0.016 & 0.719 \textpm 0.017 & \textbf{0.775 \textpm 0.009} \\
256 & 0.746 \textpm 0.015 & 0.750 \textpm 0.015 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.760 \textpm 0.057 & 0.763 \textpm 0.027 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.749 \textpm 0.016 & 0.747 \textpm 0.036 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.748 \textpm 0.021 & 0.732 \textpm 0.015 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{experiment-based evaluation}, semi-labeling regime: 500 normal and 100 anomalous labeled samples.}
\label{tab:auc_exp_based_semi_500_100}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.765 \textpm 0.005 & \textbf{0.775 \textpm 0.010} & 0.717 \textpm 0.006 & 0.752 \textpm 0.045 \\
64 & 0.754 \textpm 0.013 & \textbf{0.773 \textpm 0.020} & 0.718 \textpm 0.010 & 0.742 \textpm 0.018 \\
128 & 0.758 \textpm 0.009 & 0.769 \textpm 0.014 & 0.719 \textpm 0.017 & \textbf{0.775 \textpm 0.009} \\
256 & 0.749 \textpm 0.016 & 0.768 \textpm 0.021 & 0.712 \textpm 0.006 & \textbf{0.793 \textpm 0.022} \\
512 & 0.766 \textpm 0.043 & 0.770 \textpm 0.026 & 0.712 \textpm 0.007 & \textbf{0.804 \textpm 0.027} \\
768 & 0.746 \textpm 0.016 & 0.750 \textpm 0.027 & 0.713 \textpm 0.011 & \textbf{0.812 \textpm 0.023} \\
1024 & 0.743 \textpm 0.023 & 0.739 \textpm 0.016 & 0.716 \textpm 0.012 & \textbf{0.821 \textpm 0.019} \\
\bottomrule
\end{tabularx}
\end{table}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 0 normal and 0 anomalous labeled samples.}
\label{tab:auc_manual_based_semi_0_0}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & \textbf{1.000 \textpm 0.000} & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 50 normal and 10 anomalous labeled samples.}
\label{tab:auc_manual_based_semi_50_10}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & 0.990 \textpm 0.019 & \textbf{0.998 \textpm 0.001} & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & 0.998 \textpm 0.003 & \textbf{0.999 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & 0.991 \textpm 0.018 & \textbf{0.999 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & 0.999 \textpm 0.002 & \textbf{0.999 \textpm 0.001} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & 0.972 \textpm 0.060 & \textbf{0.999 \textpm 0.001} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & \textbf{1.000 \textpm 0.000} & 0.998 \textpm 0.001 & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & \textbf{0.999 \textpm 0.001} & 0.998 \textpm 0.001 & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}

\begin{table}[t]
\centering
\caption{ROC AUC (mean \textpm std) across 5 folds for \texttt{handlabeling-based evaluation}, semi-labeling regime: 500 normal and 100 anomalous labeled samples.}
\label{tab:auc_manual_based_semi_500_100}
\begin{tabularx}{\textwidth}{cYYYY}
\toprule
\textbf{Latent Dim.} & \textbf{DeepSAD (LeNet)} & \textbf{DeepSAD (Efficient)} & \textbf{IsolationForest} & \textbf{OC\text{-}SVM} \\
\midrule
32 & \textbf{1.000 \textpm 0.000} & 1.000 \textpm 0.000 & 0.921 \textpm 0.010 & 0.917 \textpm 0.014 \\
64 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.917 \textpm 0.007 & 0.931 \textpm 0.023 \\
128 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.921 \textpm 0.008 & 0.967 \textpm 0.029 \\
256 & 0.999 \textpm 0.001 & \textbf{1.000 \textpm 0.000} & 0.918 \textpm 0.009 & 0.966 \textpm 0.016 \\
512 & 0.989 \textpm 0.025 & \textbf{1.000 \textpm 0.000} & 0.920 \textpm 0.010 & 0.949 \textpm 0.021 \\
768 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.923 \textpm 0.007 & 0.960 \textpm 0.024 \\
1024 & 1.000 \textpm 0.000 & \textbf{1.000 \textpm 0.000} & 0.919 \textpm 0.005 & 0.956 \textpm 0.011 \\
\bottomrule
\end{tabularx}
\end{table}

\threadtodo
{Introduce DeepSAD anomaly detection results compared to baselines}
@@ -1629,7 +1581,7 @@ Together, these results provide a comprehensive overview of the computational re
1024 & 0.743 & 0.739 & 0.716 & \textbf{0.821} & \textbf{1.000} & \textbf{1.000} & 0.919 & 0.956 \\
\bottomrule
\end{tabularx}
\caption{ROC AUC means across 5 folds for both evaluations, grouped by labeling regime. Maximum observed standard deviation across all cells (not shown in table): 0.06.}
\end{table}

%\fig{roc_prc_unsup}{figures/roc_prc_unsup.png}{ROC and PRC curves for DeepSAD, Isolation Forest, and OCSVM (unsupervised, all latent dimensions).}
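As a note on reproducibility, the per-fold ROC AUC values aggregated in the tables above are the standard threshold-free ranking metric; a minimal sketch using scikit-learn, assuming per-sample anomaly scores and binary ground-truth labels (1 = anomalous) are exported per fold (the file names are illustrative, not outputs of this work):

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-fold exports: higher score = more anomalous.
fold_aucs = []
for fold in range(5):
    scores = np.load(f"fold{fold}_scores.npy")
    labels = np.load(f"fold{fold}_labels.npy")
    fold_aucs.append(roc_auc_score(labels, scores))

print(f"ROC AUC: {np.mean(fold_aucs):.3f} +/- {np.std(fold_aucs, ddof=1):.3f}")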
@@ -1655,7 +1607,7 @@ Together, these results provide a comprehensive overview of the computational re
{This discussion $\rightarrow$ motivates looking at model behavior over time via inference}

% --- Section: Inference Experiments ---
\newsection{results_inference}{Inference on Held-Out Experiments}

\threadtodo
{Introduce inference evaluation on unseen experiments}
@@ -1679,15 +1631,6 @@ Together, these results provide a comprehensive overview of the computational re

%\fig{inference_clean_vs_smoke}{figures/inference_clean_vs_smoke.png}{Normalized anomaly scores for a clean vs degraded experiment. Clear amplitude separation is visible.}

% --- Section: Results Summary ---
\section{Summary of Results}

\threadtodo
{Summarize main findings across all results}
{Reader should leave with a compact understanding of what was learned}
{State that Efficient autoencoder reconstructs better, DeepSAD beats baselines, semi-supervision shows tradeoffs, inference confirms degradation quantification works}
{Clear closure $\rightarrow$ prepares transition to discussion, limitations, and future work}

% \todo[inline]{introductory paragraph results}
% \todo[inline]{autoencoder results, compare lenet to efficient, shows that efficient is better and especially at lower latent dims, interesting to see in future exps if autoencoder results appear to transfer to deepsad training results, therefore not a single latent dim in later exps, but rather all so it can be compared. also interesting to see if efficient is better than lenet since reconstruction loss is better for efficient}
%
@@ -75,7 +75,6 @@ PRETRAIN_SCHEMA = {
    "semi_anomalous": pl.Int32,
    "model": pl.Utf8,  # always "ae"
    "fold": pl.Int32,
-   "split": pl.Utf8,  # "train" | "test"
    # timings and optimization
    "train_time": pl.Float64,
    "test_time": pl.Float64,
@@ -577,7 +576,7 @@ def load_pretraining_results_dataframe(
    # Cast/optimize a bit (categoricals, ints, floats)
    df = df.with_columns(
-       pl.col("network", "model", "split").cast(pl.Categorical),
+       pl.col("network", "model").cast(pl.Categorical),
        pl.col(
            "latent_dim", "semi_normals", "semi_anomalous", "fold", "k_fold_num"
        ).cast(pl.Int32),
306  tools/plot_scripts/results_ae_table.py (new file)
@@ -0,0 +1,306 @@
# ae_losses_table_from_df.py

from __future__ import annotations

import shutil
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Tuple

import numpy as np
import polars as pl

# CHANGE THIS IMPORT IF YOUR LOADER MODULE IS NAMED DIFFERENTLY
from load_results import load_pretraining_results_dataframe

# ----------------------------
# Config
# ----------------------------
ROOT = Path("/home/fedex/mt/results/copy")  # experiments root you pass to the loader
OUTPUT_DIR = Path("/home/fedex/mt/plots/results_ae_table")

# Which label field to use from the DF; "labels_exp_based" or "labels_manual_based"
LABEL_FIELD = "labels_exp_based"

# Which architectures to include (labels must match canonicalize_network)
WANTED_NETS = {"LeNet", "Efficient"}

# Formatting
DECIMALS = 4  # how many decimals to display for losses
BOLD_BEST = False  # set True to bold per-group best (lower is better)
LOWER_IS_BETTER = True  # for losses we want the minimum


# ----------------------------
# Helpers (ported/minified from your plotting script)
# ----------------------------
def canonicalize_network(name: str) -> str:
    low = (name or "").lower()
    if "lenet" in low:
        return "LeNet"
    if "efficient" in low:
        return "Efficient"
    return name or "unknown"


def calculate_batch_mean_loss(scores: np.ndarray, batch_size: int) -> float:
    n = len(scores)
    if n == 0:
        return np.nan
    if batch_size <= 0:
        batch_size = n
    n_batches = (n + batch_size - 1) // batch_size
    acc = 0.0
    for i in range(0, n, batch_size):
        acc += float(np.mean(scores[i : i + batch_size]))
    return acc / n_batches


def extract_batch_size(cfg_json: str) -> int:
    import json

    try:
        cfg = json.loads(cfg_json) if cfg_json else {}
    except Exception:
        cfg = {}
    return int(cfg.get("ae_batch_size") or cfg.get("batch_size") or 256)


@dataclass(frozen=True)
class Cell:
    mean: float | None
    std: float | None


def _fmt(mean: float | None) -> str:
    return "--" if (mean is None or not (mean == mean)) else f"{mean:.{DECIMALS}f}"


def _bold_mask_display(
    values: List[float | None], decimals: int, lower_is_better: bool
) -> List[bool]:
    """
    Tie-aware bolding mask based on *displayed* precision.
    For losses, lower is better (min). For metrics where higher is better, set lower_is_better=False.
    """

    def disp(v: float | None) -> float | None:
        if v is None or not (v == v):
            return None
        # use string → float to match display rounding exactly
        return float(f"{v:.{decimals}f}")

    rounded = [disp(v) for v in values]
    finite = [v for v in rounded if v is not None]
    if not finite:
        return [False] * len(values)
    target = min(finite) if lower_is_better else max(finite)
    return [(v is not None and v == target) for v in rounded]


# ----------------------------
# Core
# ----------------------------
def build_losses_table_from_df(
    df: pl.DataFrame, label_field: str
) -> Tuple[str, float | None]:
    """
    Build a LaTeX table showing Overall loss (LeNet, Efficient) and Anomaly loss (LeNet, Efficient)
    with one row per latent dimension. Returns (latex_table_string, max_std_overall).
    """
    # Basic validation
    required_cols = {"scores", "network", "latent_dim"}
    missing = required_cols - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns in AE dataframe: {missing}")
    if label_field not in df.columns:
        raise ValueError(f"Expected '{label_field}' column in AE dataframe.")

    # Canonicalize nets, compute per-row overall/anomaly losses
    rows: List[dict] = []
    for row in df.iter_rows(named=True):
        net = canonicalize_network(row["network"])
        if WANTED_NETS and net not in WANTED_NETS:
            continue
        dim = int(row["latent_dim"])
        batch_size = extract_batch_size(row.get("config_json"))
        scores = np.asarray(row["scores"] or [], dtype=float)

        labels = row.get(label_field)
        labels = np.asarray(labels, dtype=int) if labels is not None else None

        overall_loss = calculate_batch_mean_loss(scores, batch_size)

        anomaly_loss = np.nan
        if labels is not None and labels.size == scores.size:
            anomaly_scores = scores[labels == -1]
            if anomaly_scores.size > 0:
                anomaly_loss = calculate_batch_mean_loss(anomaly_scores, batch_size)

        rows.append(
            {
                "net": net,
                "latent_dim": dim,
                "overall": overall_loss,
                "anomaly": anomaly_loss,
            }
        )

    if not rows:
        raise ValueError(
            "No rows available after filtering; check WANTED_NETS or input data."
        )

    df2 = pl.DataFrame(rows)

    # Aggregate across folds per (net, latent_dim)
    agg = df2.group_by(["net", "latent_dim"]).agg(
        pl.col("overall").mean().alias("overall_mean"),
        pl.col("overall").std().alias("overall_std"),
        pl.col("anomaly").mean().alias("anomaly_mean"),
        pl.col("anomaly").std().alias("anomaly_std"),
    )

    # Collect union of dims across both nets
    dims = sorted(set(agg.get_column("latent_dim").to_list()))

    # Build lookup
    keymap: Dict[Tuple[str, int], Cell] = {}
    keymap_anom: Dict[Tuple[str, int], Cell] = {}

    max_std: float | None = None

    def push_std(v: float | None):
        nonlocal max_std
        if v is None or not (v == v):
            return
        if max_std is None or v > max_std:
            max_std = v

    for r in agg.iter_rows(named=True):
        k = (r["net"], int(r["latent_dim"]))
        keymap[k] = Cell(r.get("overall_mean"), r.get("overall_std"))
        keymap_anom[k] = Cell(r.get("anomaly_mean"), r.get("anomaly_std"))
        push_std(r.get("overall_std"))
        push_std(r.get("anomaly_std"))

    # Ensure nets order consistent
    nets_order = ["LeNet", "Efficient"]
    nets_present = [n for n in nets_order if any(k[0] == n for k in keymap.keys())]
    if not nets_present:
        nets_present = sorted({k[0] for k in keymap.keys()})

    # Build LaTeX table
    header_left = [r"LeNet", r"Efficient"]
    header_right = [r"LeNet", r"Efficient"]

    lines: List[str] = []
    lines.append(r"\begin{table}[t]")
    lines.append(r"\centering")
    lines.append(r"\setlength{\tabcolsep}{4pt}")
    lines.append(r"\renewcommand{\arraystretch}{1.2}")
    # vertical bar between the two groups
    lines.append(r"\begin{tabularx}{\textwidth}{c*{2}{Y}|*{2}{Y}}")
    lines.append(r"\toprule")
    lines.append(
        r" & \multicolumn{2}{c}{Overall loss} & \multicolumn{2}{c}{Anomaly loss} \\"
    )
    lines.append(r"\cmidrule(lr){2-3} \cmidrule(lr){4-5}")
    lines.append(
        r"Latent Dim. & "
        + " & ".join(header_left)
        + " & "
        + " & ".join(header_right)
        + r" \\"
    )
    lines.append(r"\midrule")

    for d in dims:
        # Gather values in order: Overall (LeNet, Efficient), Anomaly (LeNet, Efficient)
        overall_vals = [keymap.get((n, d), Cell(None, None)).mean for n in nets_present]
        anomaly_vals = [
            keymap_anom.get((n, d), Cell(None, None)).mean for n in nets_present
        ]
        overall_strs = [_fmt(v) for v in overall_vals]
        anomaly_strs = [_fmt(v) for v in anomaly_vals]

        if BOLD_BEST:
            mask_overall = _bold_mask_display(overall_vals, DECIMALS, LOWER_IS_BETTER)
            mask_anom = _bold_mask_display(anomaly_vals, DECIMALS, LOWER_IS_BETTER)
            overall_strs = [
                (r"\textbf{" + s + "}") if (m and s != "--") else s
                for s, m in zip(overall_strs, mask_overall)
            ]
            anomaly_strs = [
                (r"\textbf{" + s + "}") if (m and s != "--") else s
                for s, m in zip(anomaly_strs, mask_anom)
            ]

        lines.append(
            f"{d} & "
            + " & ".join(overall_strs)
            + " & "
            + " & ".join(anomaly_strs)
            + r" \\"
        )

    lines.append(r"\bottomrule")
    lines.append(r"\end{tabularx}")

    max_std_str = "n/a" if max_std is None else f"{max_std:.{DECIMALS}f}"
    lines.append(
        rf"\caption{{Autoencoder pre-training MSE losses (test split) across latent dimensions. "
        rf"Left: overall loss; Right: anomaly-only loss. "
        rf"Cells show means across folds (no $\pm$std). "
        rf"Maximum observed standard deviation across all cells (not shown): {max_std_str}.}}"
    )
    lines.append(r"\end{table}")

    return "\n".join(lines), max_std


# ----------------------------
# Entry
# ----------------------------
def main():
    df = load_pretraining_results_dataframe(ROOT, allow_cache=True)

    # Build LaTeX table
    tex, max_std = build_losses_table_from_df(df, LABEL_FIELD)

    # Output dirs
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    ts_dir = OUTPUT_DIR / "archive" / datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    ts_dir.mkdir(parents=True, exist_ok=True)

    out_name = "ae_pretraining_losses_table.tex"
    out_path = ts_dir / out_name
    out_path.write_text(tex, encoding="utf-8")

    # Save a copy of this script
    script_path = Path(__file__)
    try:
        shutil.copy2(script_path, ts_dir / script_path.name)
    except Exception:
        pass

    # Mirror latest
    latest = OUTPUT_DIR / "latest"
    latest.mkdir(parents=True, exist_ok=True)
    # Clear
    for f in latest.iterdir():
        if f.is_file():
            f.unlink()
    # Copy
    for f in ts_dir.iterdir():
        if f.is_file():
            shutil.copy2(f, latest / f.name)

    print(f"Saved table to: {ts_dir}")
    print(f"Also updated: {latest}")
    print(f" - {out_name}")


if __name__ == "__main__":
    main()