autoencoder work, placeholder images

This commit is contained in:
Jan Kowalczyk
2025-05-01 13:23:54 +02:00
parent 09b474fa2d
commit eaef47a517
3 changed files with 19 additions and 1 deletions


@@ -353,13 +353,21 @@ Machine learning defines types of algorithms capable of learning from existing d
For supervised learning, each data sample is augmented with a label depicting the ideal output the algorithm should produce for the given input. During the learning step these algorithms can compare their generated output with the one provided by an expert and calculate the error between them, minimizing this error to improve performance. Such labels are typically either categorical or continuous targets, which are most commonly used for classification and regression tasks respectively.
\fig{ml_supervised_learning}{figures/ml_supervised_learning_placeholder.jpg}{PLACEHOLDER - An illustration of supervised learning: the training data is augmented to include the algorithm's optimal output for each data sample, called a label.}
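As a minimal illustration of this label-comparison loop (a sketch for this section only, not code used elsewhere in this work), the following Python snippet fits a linear model by repeatedly comparing its predictions against expert-provided labels and minimizing the resulting error:
\begin{verbatim}
# Minimal sketch of supervised learning: a linear model fit by
# gradient descent on labeled data (illustrative example only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # input samples
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # expert-provided labels

w = np.zeros(3)                        # model parameters
for _ in range(500):
    y_pred = X @ w                     # generated output
    error = y_pred - y                 # compare with the labels
    grad = X.T @ error / len(y)        # gradient of the mean squared error
    w -= 0.1 * grad                    # minimize the error
print("learned weights:", w)
\end{verbatim}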
Unsupervised learning algorithms use raw data without a target label that can be used during the learning process. These types of algorithms are often utilized to identify underlying patterns in data which may be hard to discover using classical data analysis, for example due to large data size or high data complexity. Cluster analysis is one common use case, in which data is grouped into clusters such that data from one cluster resembles other data from the same cluster more closely than data from other clusters, according to some predesignated criteria. Another important use case is dimensionality reduction, which transforms high-dimensional data into a lower-dimensional subspace while retaining meaningful information from the original data.
\todo[inline, color=green!40]{illustration unsupervised learning}
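Both unsupervised use cases can be sketched in a few lines (an illustrative toy example; the dataset and library choices are stand-ins): the snippet below clusters unlabeled data with k-means and reduces its dimensionality with PCA.
\begin{verbatim}
# Minimal sketch of two unsupervised tasks: clustering and
# dimensionality reduction (illustrative example only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# unlabeled data: three blobs in a 10-dimensional space
X = np.vstack([rng.normal(loc=c, size=(50, 10)) for c in (-4, 0, 4)])

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)   # keep dominant structure
print(clusters[:10], X_2d.shape)
\end{verbatim}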
A more interactive approach to learning is taken by reinforcement learning, which provides the algorithm with an environment and an interpreter of the environment's state. During training the algorithm explores new possible actions and their impact on the provided environment. The interpreter can then reward or punish the algorithm based on the outcome of its actions. To improve its capability, the algorithm tries to maximize the rewards received from the interpreter, while retaining some randomness to enable the exploration of different actions and their outcomes. Reinforcement learning is usually used for cases where an algorithm has to make sequences of decisions in complex environments, e.g., autonomous driving tasks.
\todo[inline, color=green!40]{illustration reinforcement learning}
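The reward-driven loop described above can be sketched with tabular Q-learning (an illustrative toy example, not part of this work): an agent on a small chain of states explores actions with some randomness and updates its value estimates from the interpreter's rewards.
\begin{verbatim}
# Minimal sketch of tabular Q-learning with an epsilon-greedy policy
# on a toy 5-state chain: reaching the final state is rewarded.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.2     # learning rate, discount, exploration

for _ in range(2000):
    s = 0
    while s != n_states - 1:
        # retain some randomness to keep exploring new actions
        a = rng.integers(n_actions) if rng.random() < eps \
            else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0  # interpreter's reward
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
print(Q.round(2))
\end{verbatim}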
Semi-supervised learning algorithms are an in-between category of supervised and unsupervised algorithms, in that they use a mixture of labeled and unlabeled data. Typically, vastly more unlabeled data is used during training of such algorithms than labeled data, due to the effort and expertise required to label large quantities of data correctly. Semi-supervised methods are often an effort to improve a machine learning algorithm belonging to either the supervised or unsupervised category. Supervised methods such as classification tasks are enhanced by using large amounts of unlabeled data to augment the supervised training without additional labeling work. Alternatively, unsupervised methods like clustering algorithms may not only use unlabeled data but improve their performance by considering some hand-labeled data during training.
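As a minimal sketch of this idea (illustrative only; LabelSpreading is just one example of a semi-supervised method), a handful of expert labels guide the classification of a much larger unlabeled set:
\begin{verbatim}
# Minimal sketch of semi-supervised learning with scikit-learn's
# LabelSpreading: unlabeled samples are marked with -1.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=300, noise=0.1, random_state=0)
y = np.full_like(y_true, -1)          # -1 marks unlabeled samples
labeled = np.random.default_rng(0).choice(len(y), size=10, replace=False)
y[labeled] = y_true[labeled]          # only 10 expert labels

model = LabelSpreading().fit(X, y)
print("accuracy:", (model.transduction_ == y_true).mean())
\end{verbatim}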
\todo[inline, color=green!40]{Talk about deep learning, backwards propagation, optimization goals, iterative process?}
For anomaly detection methods, the underlying techniques can belong to any of these or other categories of machine learning algorithms. As described in section~\ref{sec:anomaly_detection}, they may not even use any machine learning at all. While supervised anomaly detection methods exist, their suitability depends mostly on the availability of labeled training data and on a reasonable proportionality between normal and anomalous data. Both requirements can be challenging due to labeling often being labour intensive and the anomalies' intrinsic property of occurring rarely compared to normal data. DeepSAD is a semi-supervised method which extends its unsupervised predecessor Deep SVDD by including some labeled samples during training with the intention of improving the algorithm's performance. Both DeepSAD and Deep SVDD include the training of an autoencoder as a pre-training step. Autoencoders are a machine learning architecture frequently grouped with unsupervised algorithms, even though that categorization can be contested when scrutinized in more detail, which we will look at next.
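To make the role of the labeled samples concrete, the DeepSAD training objective (following Ruff et al.) roughly takes the form
\[
\min_{\mathcal{W}}\;\frac{1}{n+m}\sum_{i=1}^{n}\left\|\phi(\mathbf{x}_i;\mathcal{W})-\mathbf{c}\right\|^2
+\frac{\eta}{n+m}\sum_{j=1}^{m}\left(\left\|\phi(\tilde{\mathbf{x}}_j;\mathcal{W})-\mathbf{c}\right\|^2\right)^{\tilde{y}_j}
+\frac{\lambda}{2}\sum_{\ell=1}^{L}\left\|\mathbf{W}^{\ell}\right\|_F^2
\]
where the first term draws the $n$ unlabeled samples towards a fixed hypersphere center $\mathbf{c}$ in the output space of the network $\phi$, the second term pulls the $m$ labeled samples towards the center if labeled normal ($\tilde{y}_j=+1$) and pushes them away if labeled anomalous ($\tilde{y}_j=-1$), weighted by the hyperparameter $\eta$, and the last term is standard weight decay.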
\newsection{autoencoder}{Autoencoder}
@@ -370,7 +378,17 @@ For anomaly detection methods, the underlying techniques can belong to any of th
{explain basic idea, unfixed architecture, infomax, mention usecases, dimension reduction}
{dimensionality reduction, useful for high dim data $\rightarrow$ pointclouds from lidar}
Autoencoders are a type of neural network architecture whose main goal is learning to encode input data into a representative state from which the same input can be reconstructed, hence the name. They typically consist of two parts, an encoder and a decoder. The encoder learns to extract the most significant features from the input and convert them into another representation. The reconstruction goal ensures that the most prominent features of the input are retained during the encoding phase, since reconstruction is impossible without the relevant information. The decoder learns to reconstruct the original input from its encoded representation by minimizing the error between the original input data and the autoencoder's output. This optimization goal creates uncertainty when categorizing autoencoders as an unsupervised method, although literature commonly defines them as such. While they do not require any labeling of the input data, their optimization objective can still calculate the error between the output and an optimal target, which is typically not available for unsupervised methods. For this reason, they are sometimes proposed to be a case of self-supervised learning, a type of machine learning where the data itself is used to generate a supervisory signal without the need for a domain expert to provide one.
\fig{autoencoder_general}{figures/autoencoder_principle_placeholder.png}{PLACEHOLDER - An illustration of autoencoders' general architecture and reconstruction task.}
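The following minimal PyTorch sketch illustrates this encoder-decoder structure and the reconstruction objective (layer sizes and data are arbitrary stand-ins, not the architecture used by DeepSAD):
\begin{verbatim}
# Minimal sketch of an autoencoder trained on its reconstruction
# error (illustrative example only).
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        # encoder: compress the input into a latent representation
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # decoder: reconstruct the input from the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # stand-in batch of input data
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
    loss.backward()
    opt.step()
\end{verbatim}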
One key use case of autoencoders is to employ them as a dimensionality reduction technique. In that case, the latent space in between the encoder and decoder is of a lower dimensionality than the input data itself. Due to the aforementioned reconstruction goal, the shared information between the input data and its latent space representation is maximized, which is known as following the infomax principle. After training such an autoencoder, it may be used to generate lower-dimensional representations of the given data type, enabling more performant computations which may have been infeasible on the original data. DeepSAD, which we employ in this paper, uses an autoencoder in a pre-training step to achieve this goal among others.
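Continuing the sketch above, the trained encoder can then be reused on its own as a learned dimensionality reduction, analogous to the role of the pre-trained encoder in DeepSAD:
\begin{verbatim}
# After training, the encoder alone acts as a dimensionality reduction.
with torch.no_grad():
    codes = model.encoder(x)   # (64, 784) inputs become (64, 32) codes
print(codes.shape)
\end{verbatim}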
\todo[inline, color=green!40]{VAEs?}
%Another way to employ autoencoders is to use them as a generative technique. The decoder in autoencoders is trained to reproduce the input state from its encoded representation, which can also be interpreted as the decoder being able to generate data of the input type from an encoded representation. A classic autoencoder trains the encoder to map its input to a single point in the latent space, a discriminative modeling approach, which can successfully learn a predictor given enough data. In generative modeling on the other hand, the goal is to learn the distribution the data originates from, which is the idea behind variational autoencoders (VAE). VAEs have the encoder produce a distribution instead of a point representation, samples from which are then fed to the decoder to reconstruct the original input. The result is the encoder learning to model the generative distribution of the input data, which enables new use cases, due to the latent representation
\todo[inline]{autoencoder explanation}
\todo[inline, color=green!40]{autoencoders are a neural network architecture archetype (words) whose training target is to reproduce the input data itself - hence the name. the architecture is most commonly a mirrored one consisting of an encoder which transforms input data into a hyperspace representation in a latent space and a decoder which transforms the latent space into the same data format as the input data (phrasing), this method typically results in the encoder learning to extract the most robust and critical information of the data and the (todo maybe something about the decoder + citation for both). it is used in many domains translations, LLMs, something with images (search example + citations)}