section about preprocessing, TODO plot projection

Jan Kowalczyk
2025-03-10 14:21:44 +01:00
parent de31d4fe38
commit 0458cd8c83
2 changed files with 39 additions and 14 deletions


@@ -243,9 +243,9 @@
\newsection{lidar_related_work}{Lidar - Light Detection and Ranging}
\todo[inline]{related work in lidar}
\todo[inline, color=green!40]{the older more commonly known radar works by sending out an electromagnetic wave in the radiofrequency and detecting the time it takes to return (if it returns at all) signalling a reflective object in the path of the radiowave. lidar works on the same principle but sends out a lightray produced by a laser (citation needed) and measures the time it takes for the ray to return to the sensor. since the speed of light is constant in air the system can calculate the distance between the sensor and the measured point. modern lidar systems send out multiple, often millions of measurement rays per second which results in a three dimensional point cloud, constructed from the information in which direction the ray was cast and the distance that was measured}
\todo[inline, color=green!40]{lidar is used in most domains reliant on accurate 3d representations of the world like autonomous driving, robot navigation, (+ maybe quickly look up two other domains), its main advantage is high measurement accuracy, precision (use correct term), and high resolution (possible due to single point measurements instead of cones like radar, ToF, Ultrasonic) which enables more detailed mappings of the environment}
\todo[inline, color=green!40]{due to point precision, lidar is sensitive to noise/degradation of airborne particles, which may produce early returns, deflections, errors of light rays, this results in noise in the 3d point cloud and possibly missing data of the measurement behind the aerosol particle.}
\todo[inline, color=green!40]{because of the given advantages of lidar it is most commonly used nowadays on robot platforms for environment mapping and navigation - so we chose to demonstrate our method based on degraded data collected by a lidar sensor as discussed in more detail in section (data section)}
@@ -309,9 +309,9 @@ Additionally, the dataset must be sufficiently large for training learning-based
%\todo[inline, color=green!40]{labeling is an especially problematic topic since ideally we would want an analog value which corresponds with the amount of smoke present for evaluation. for training we only require the possibility to provide labels in the form of normal or anormal targets (binary classification) and these labels do not have to be present for all data, only for some of the data (since semi-supervised only uses some labeled data as discussed in X)}
%To evaluate how proficiently any method can quantify the degradation of lidar data we require some kind of degradation label per scan. Ideally we would want an analog value per scan which somehow correlates to the degradation, but even a binary label of either degraded or not degraded would be useful. To find out which options are available for this task, we first have to figure out what degradation means in the context of lidar scans and especially the point clouds in which they result. Lidar sensors combine multiple range measurements which are executed near simultaneously into a point cloud whose reference point is the sensor location at the time of measurement. Ideally for each attempted measurement during a scan one point is produced, albeit in reality there are many factors why a fraction of the measurements cannot be completed and therefore there will be missing points even in good conditions. Additionally, there are also measurements which result in an incorrect range, like for example when an aerosol particle is hit by the measurement ray and a smaller range than was intended to be measured (to the next solid object) was returned. The sum of missing and erroneous measurements makes up the degradation, although it can be alleged that the term also includes the type or structure of errors or missing points and the resulting difficulties when further utilizing the resulting point cloud.
%For example, if aerosol particles are dense enough in a small portion of the frame, they could produce a point cloud where the particles are interpreted as a solid object even though the amount of erroneous measurements is smaller than for another scan where aerosol particles are evenly distributed around the sensor. In the latter case the erroneous measurements may be identified by outlier detection algorithms and after removal do not hinder further processing of the point cloud. For these reasons it is not simple to define data degradation for lidar scans.
%Another option would be to try to find an objective measurement of degradation. As the degradation in our use case mostly stems from airborne particles, it stands to reason that measuring the amount of them would enable us to label each frame with an analog score which correlates to the amount of degradation. This approach turns out to be difficult to implement in real life, since sensors capable of measuring the amount and size of airborne particles typically do so at the location of the sensor while the lidar sensor also sends measurement rays into all geometries visible to it. This localized measurement could be useful if the aerosol particle distribution is uniform enough but would not allow the system to anticipate degradation in other parts of the point cloud. We are not aware of any public dataset fit for our requirements which also includes data on aerosol particle density and size.
To evaluate how effectively a method can quantify LiDAR data degradation, we require a degradation label for each scan. Ideally, each scan would be assigned an analog value that correlates with the degree of degradation, but even a binary label—indicating whether a scan is degraded or not—would be useful.
@@ -340,9 +340,9 @@ Based on the previously discussed requirements and the challenges of obtaining r
\item IR-enabled RGB-D Camera - OAK-D Pro
\item IMU - Pixhawk 2.1 Cube Orange
\end{itemize}
%\todo[inline, color=green!40]{lidar data of 360° sensor is captured at 10 frames per second. each sensor output consists of point cloud which resulted from measurement of 32 vertical channels for each of which 2048 measurement points are taken during each measurement equiangular distributed around the whole horizontal 360°, so the sensor measures 32 * 2048 = 65536 measurements 10 times a second for which ideally every one produces a point in the point cloud consisting of x,y,z coordinates (relative to sensor platform) as well as some other values per measurement (reflectivity, intensity originally measured range value)}
%We mainly utilize the data from the \emph{Ouster OS1-32} lidar sensor, which produces 10 frames per second with a resolution of 32 vertical channels by 2048 measurements per channel, both equiangularly spaced over the vertical and horizontal fields of view of 42.4° and 360° respectively. Every measurement of the lidar therefore results in a point cloud with a maximum of 65536 points. Every point contains the \emph{X}, \emph{Y} and \emph{Z} coordinates in meters with the sensor location as origin, as well as values for the \emph{range}, \emph{intensity} and \emph{reflectivity} which are typical data measured by lidar sensors. The data is dense, meaning missing measurements are still present in the data of each point cloud with zero values for most fields.
We use data from the \emph{Ouster OS1-32} LiDAR sensor, which was configured to capture 10 frames per second with a resolution of 32 vertical channels and 2048 measurements per channel. These settings yield equiangular measurements across a vertical field of view of 42.4° and a complete 360° horizontal field of view. Consequently, every LiDAR scan can generate up to 65,536 points. Each point contains the \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters, with the sensor location as the origin) along with values for \emph{range}, \emph{intensity}, and \emph{reflectivity}—typical metrics measured by LiDAR sensors. The dataset is stored densely: missing measurements remain present in each point cloud, with their fields simply set to zero.
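To make the layout of a single scan concrete, the following sketch models it as a dense NumPy structured array; the field names and dtype are illustrative assumptions, not the exact record format of the recordings.
\begin{verbatim}
import numpy as np

# Assumed per-point record; the actual field names and types in the
# recorded data may differ.
point_dtype = np.dtype([
    ("x", np.float32), ("y", np.float32), ("z", np.float32),
    ("range", np.float32), ("intensity", np.float32),
    ("reflectivity", np.float32),
])

CHANNELS, COLUMNS = 32, 2048   # vertical channels x horizontal measurements

# One dense scan: every attempted measurement keeps its slot, and
# missing returns simply hold zero values in their fields.
scan = np.zeros((CHANNELS, COLUMNS), dtype=point_dtype)
assert scan.size == 65536      # 32 * 2048 measurements per frame
\end{verbatim}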
@@ -354,23 +354,23 @@ In the anomalous experiments, the artificial smoke machine appears to have been
%\todo[inline, color=green!40]{shortly mention the differences in conditions for these experiments and why they do not matter for us}
%The 14 experiments differ regarding the available illumination, the presence of humans-traversing the measurement grounds- or additional static objects as artifcats and of course regarding the presence of the water vapor from the smoke machine. Aside from the artifical smoke which is essential for our use case, the other differences during the individual experiments are of no interestet to us and do not affect it in any way. Regardless of illumination, the lidar sensor produces indistinguishable point clouds and any static objects do not factor into our quantification of the point clouds' degradation.
The 14 experiments varied in illumination conditions, the presence of humans on the measurement grounds, and additional static artifacts, as well as in the presence of water vapor from the smoke machine. For our purposes, only the artificial smoke is relevant; differences in lighting or incidental static objects do not affect our analysis. Regardless of illumination, the LiDAR sensor consistently produces comparable point clouds, and the presence of static objects does not influence our quantification of point cloud degradation.
%\todo[inline, color=green!40]{include representative image of point cloud and camera image}
Figures~\ref{fig:data_screenshot_pointcloud} and~\ref{fig:data_screenshot_camera} show a representative depiction of the experiment environment: an image from the IR camera and the point cloud created by the OS1 lidar sensor at practically the same time.
\fig{data_screenshot_pointcloud}{figures/data_screenshot_pointcloud.png}{Screenshot of a 3D rendering of an experiment without smoke and with illumination (same frame and roughly same alignment as figure~\ref{fig:data_screenshot_camera}). Point color corresponds to measurement range; the axis in the center of the figure marks the lidar's position.}
\fig{data_screenshot_camera}{figures/data_screenshot_camera.png}{Screenshot of IR camera output of an experiment without smoke and with illumination (same frame and roughly same alignment as figure~\ref{fig:data_screenshot_pointcloud})}
%\todo[inline, color=green!40]{talk about how much data is available (maybe a plot about data?), number of experiments with/without degradation, other factors in these experiments which do not concern our use-case of them}
%Regarding the amount of data, of the 10 normal experiments the shortest was 88.7 seconds and the longest 363.1 seconds with a mean of 157.65 seconds between all 10 experiments, which results in 15765 non-degraded point clouds. Of the 4 anomalous experiments, the shortest was the stationary one with 11.7 seconds and the longest was 62.1 seconds, having a mean of 47.325 seconds, resulting in 1893 degraded point clouds. This gives us 17658 point clouds alltogether with 89.28\% of them being non-degraded/normal samples and the other 10.72\% of them begin degraded/anomalous samples.
Regarding the dataset volume, the 10 normal experiments ranged from 88.7 to 363.1 seconds, with an average duration of 157.65 seconds. At a capture rate of 10 frames per second, these experiments yield 15,765 non-degraded point clouds. In contrast, the 4 anomalous experiments, including one stationary experiment lasting 11.7 seconds and another extending to 62.1 seconds, averaged 47.33 seconds, resulting in 1,893 degraded point clouds. In total, the dataset comprises 17,658 point clouds, with approximately 89.28\% classified as non-degraded (normal) and 10.72\% as degraded (anomalous). The distribution of experimental data is visualized in figure~\ref{fig:data_points_pie}.
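As a sanity check, the frame counts follow directly from the mean experiment durations and the capture rate of 10 frames per second:
\begin{align*}
10 \cdot 157.65\,\mathrm{s} \cdot 10\,\mathrm{fps} &\approx 15\,765 \text{ normal point clouds},\\
4 \cdot 47.33\,\mathrm{s} \cdot 10\,\mathrm{fps} &\approx 1\,893 \text{ anomalous point clouds},\\
\frac{15\,765}{15\,765 + 1\,893} &\approx 89.28\,\%.
\end{align*}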
\fig{data_points_pie}{figures/data_points_pie.png}{Pie chart visualizing the amount and distribution of normal and anomalous point clouds in \cite{subter}}
%BEGIN missing points
As we can see in figure~\ref{fig:data_missing_points}, the artificial smoke introduced as explicit degradation during some experiments results in more missing measurements during scans, which can be explained by measurement rays hitting airborne particles but not being reflected back to the sensor in a way the sensor can measure.
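Counting these missing measurements is straightforward in the dense representation; the following sketch assumes the scan's ranges are available as a 32 by 2048 NumPy array in which missing returns are stored as zeros (the variable names are hypothetical):
\begin{verbatim}
import numpy as np

def missing_fraction(ranges: np.ndarray) -> float:
    """Fraction of attempted measurements that produced no return.

    ranges: dense (32, 2048) array of measured ranges in meters,
            with 0 marking a missing return.
    """
    return float(np.count_nonzero(ranges == 0)) / ranges.size
\end{verbatim}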
@@ -380,7 +380,7 @@ Regarding the dataset volume, the 10 normal experiments ranged from 88.7 to 363.
%BEGIN early returns
% In experiments with artifical smoke present, we observe many points in the point cloud very close to the sensor where there are no solid objects and therefore the points have to be produced by airborne particles from the artifical smoke. The phenomenon can be explained, in that the closer to the sensor an airborne particle is hit, the higher the chance of it reflecting the ray in a way the lidar can measure. In \ref{fig:particles_near_sensor} we see a box diagram depicting how significantly more measurements of the anomaly expirements produce a range smaller than 50 centimeters. Due to the sensor platform's setup and its paths taken during experiments we can conclude that any measurement with a range smaller than 50 centimeters has to be erroneous. While the amount of these returns near the sensor could most likely be used to estimate the sensor data quality while the sensor itself is located inside an environment containing airborne particles, this method would not allow to anticipate sensor data degradation before the sensor itself enters the affected area. Since lidar is used to sense the visible geometry from a distance, it would be desireable to quantify the data degradation of an area before the sensor itself enters it. Due to these reasons we did not use this phenomenon in our work.
In experiments with artificial smoke, we observe numerous points in the point cloud very close to the sensor, even though no solid objects exist at that range. These points are therefore generated by airborne particles in the artificial smoke. This phenomenon occurs because the closer an airborne particle is to the sensor, the higher the probability it reflects the laser beam in a measurable way. As shown in Figure~\ref{fig:particles_near_sensor}, a box diagram illustrates that significantly more measurements during these experiments report ranges shorter than 50 centimeters. Given the sensor platform's setup and its experimental trajectory, we conclude that any measurement with a range under 50 centimeters is erroneous.
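The near-sensor statistic visualized in figure~\ref{fig:particles_near_sensor} can be computed in the same spirit; the sketch below reuses the assumptions of the previous snippet and the 50 centimeter threshold discussed above:
\begin{verbatim}
import numpy as np

def near_sensor_fraction(ranges: np.ndarray,
                         threshold_m: float = 0.5) -> float:
    """Fraction of valid returns measured closer than the threshold.

    In these experiments no solid object can be within 0.5 m of the
    sensor, so such returns are treated as erroneous (aerosol hits).
    """
    valid = ranges > 0                    # ignore missing returns
    near = valid & (ranges < threshold_m)
    return float(near.sum()) / max(int(valid.sum()), 1)
\end{verbatim}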
@@ -395,9 +395,21 @@ While the density of these near-sensor returns might be used to estimate data qu
%\todo[inline, color=green!40]{while as described in sec X the method DeepSAD is not dependend on any specific type/structure of data it requires to train an auto encoder in the pretraining step. such autoencoders are better understood in the image domain since there are many uses cases for this such as X (TODO citation needed), there are also 3d data auto encoders such as X (todo find example). same as the reference paper (rain cite) we chose to transform the 3d data to 2d by using a spherical spherical projection to map each of the 3d points onto a 2d plane where the range of each measurement can be expressed as the brightness of a single pixel. this leaves us with a 2d image of resolution 32x2048 (channels by horizontal measurements), which is helpful for visualization as well as for choosing a simpler architecture for the autoencoder of deepsad, the data in the rosbag is sparse meaning that measurements of the lidar which did not produce any value (no return ray detected before sensor specific timeout) are simply not present in the lidar scan. meaning we have at most 65xxx measurements per scan but mostly fewer than this, (maybe statistic about this? could aslo be interesting to show smoke experiment stuff)}
%As described in section~\ref{sec:algorithm_description} the method we want to evaluate is datatype agnostic and can be adjusted to work with any kind of data. The data from~\cite{subter} that we will train on is a point cloud per scan created by the lidar sensor which contains up to 65536 points with \emph{X}, \emph{Y}, and \emph{Z} coordinates (in meters) per point. To adjust the architecture of DeepSAD to work with a specific datatype, we have to define an autoencoder architecture that works for the given datatype. While autoencoders can be created for any datatype, as~\cite{autoencoder_survey} points out over 60\% of research papers pertaining autoencoders in recent years look at image classification and reconstruction, so we have a better understanding of their architectures for two dimensional images than for three dimensional point clouds.
As described in Section~\ref{sec:algorithm_description}, the method under evaluation is data type agnostic and can be adapted to work with any kind of data. In our case, we train on point clouds from~\cite{subter}, where each scan produced by the LiDAR sensor contains up to 65,536 points, with each point represented by its \emph{X}, \emph{Y}, and \emph{Z} coordinates. To tailor the DeepSAD architecture to this specific data type, we must design an autoencoder suitable for processing three-dimensional point clouds. Although autoencoders can be developed for various data types, as noted in~\cite{autoencoder_survey}, over 60\% of recent research on autoencoders focuses on two-dimensional image classification and reconstruction. Consequently, there is a more established understanding of architectures for images compared to those for three-dimensional point clouds.
%\todo[inline, color=green!40]{to achieve this transformation we used the helpful measurement index and channel present in each measurement point of the dataset which allowed a perfect reconstruction of the 2d projection without calculating the pixel position in the projection of each measurement via angles which in our experience typically leads to some ambiguity in the projection (multiple measurements mapping to the same pixel due to precision loss/other errors) the measurement index increases even for unavailable measurements (no ray return) so we can simply create the 2d projection by mapping the normalized range (FIXME really normalized) value to the pixel position y = channel, x = measurement index. by initalizing the array to NaN values originally we have a 2d data structure with the range values and NaN on pixel positions where originally no measurement took place (missing measurements in scans due to no ray return)}
%For this reason we decided to preprocess the point clouds by converting them to two dimensional grayscale images using spherical projection. Additionally, \cite{degradation_quantification_rain}-which we modeled our approach after-successfully chose this approach. In the projected image each measurement is encoded to a single pixel, whose grayscale value $v$ is the normalized range of the measurement $v = \sqrt{\emph{X}^2 + \emph{Y}^2 + \emph{Z}^2}$. Due to the settings of the datasets' lidar, this results in images with the resolution of 2048 pixels wide by 32 pixels tall. Missing measurements of the point cloud are mapped to pixels with a brightness of 0. To create the mapping we used the measurements indices and channels which are available since the dataset contains dense point clouds and which can be used since the point indices are ordered from 0 to 65535 horizontally ascending channel by channel. For point clouds without indices which can be directly mapped, as is often the case for sparse ones, it would be necessary to use the pitch and yaw angles to the sensor origin to map each point to a pixel on the projection.
To simplify further processing, we converted the point clouds into two-dimensional grayscale images using a spherical projection. This approach, also employed successfully in \cite{degradation_quantification_rain}, encodes each LiDAR measurement as a single pixel whose grayscale value is the measurement range $v = \sqrt{X^2 + Y^2 + Z^2}$, normalized to the available grayscale range. Given the LiDAR sensor's configuration, the resulting images have a resolution of 2048 pixels in width and 32 pixels in height. Missing measurements in the point cloud are mapped to pixels with a brightness value of 0.
To create this mapping, we leveraged the measurement indices and channel numbers available in the dense point clouds, whose point indices run from 0 to 65,535, ascending horizontally channel by channel. For sparser point clouds without such indices, one would need to rely on the pitch and yaw angles relative to the sensor's origin to map each point to its corresponding pixel.
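A minimal sketch of this index-based projection is given below; it assumes each point of a dense scan carries its channel number and its horizontal measurement index, and that ranges are normalized by a nominal maximum range (all names and the normalization constant are illustrative assumptions, not the exact dataset schema):
\begin{verbatim}
import numpy as np

def project_scan(channel: np.ndarray, index: np.ndarray, xyz: np.ndarray,
                 max_range: float = 120.0) -> np.ndarray:
    """Project one dense lidar scan onto a 32 x 2048 grayscale image.

    channel: (N,) vertical channel of each measurement, 0..31
    index:   (N,) horizontal measurement index within its channel, 0..2047
    xyz:     (N, 3) point coordinates in meters, zeros for missing returns
    """
    image = np.zeros((32, 2048), dtype=np.float32)  # missing returns stay 0
    rng = np.linalg.norm(xyz, axis=1)               # sqrt(x^2 + y^2 + z^2)
    hit = rng > 0                                    # keep only actual returns
    image[channel[hit], index[hit]] = np.clip(rng[hit] / max_range, 0.0, 1.0)
    return image
\end{verbatim}
Initializing the image with NaN instead of zero, as mentioned in the note below, would make missing returns distinguishable from genuine zero values during preprocessing; the NaN pixels can then be set to 0 before training.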
\todo[inline, color=green!40]{add two projections one with one without smoke to }
\todo[inline, color=green!40]{another important preprocessing step is labeling of the lidar frames as normal/anormal. this is one hand used during training (experiments with zero labeled up to most of the data being labeled) and on the other hand is important for evaluation of the method performance. originally we do not have any labels on the data regarding degradation and no analog values from another sensor which measures current smoke particles in the air. our simple approach was to label all frames from experiments which included artifical degradation by fog machine smoke as anomalous and all frames from experiments without artifical degradation as normal.}
\todo[inline, color=green!40]{this simple labeling method is quite flawed since we do not label based on the actual degradation of the scan (not by some kind of threshold of analog measurement threshold, statistical info about scan) since (TODO FIXME) this would result in training which only learns this given metric (example missing measurement points) which would make this methodology useless since we could simply use that same measurement as an more simple way to quantify the scan's degradation. }
\todo[inline]{TODO maybe evaluate based on different thresholds? missing datapoints, number of detected outliers, number of particles in phantom circle around sensor?}


@@ -167,4 +167,17 @@
volume = {61},
pages = {85--117},
url = {https://api.semanticscholar.org/CorpusID:11715509},
},
@article{autoencoder_survey,
title = {A comprehensive survey on design and application of autoencoder in deep learning},
journal = {Applied Soft Computing},
volume = {138},
pages = {110176},
year = {2023},
issn = {1568-4946},
doi = {10.1016/j.asoc.2023.110176},
url = {https://www.sciencedirect.com/science/article/pii/S1568494623001941},
author = {Pengzhi Li and Yan Pei and Jianqiang Li},
keywords = {Deep learning, Autoencoder, Unsupervised learning, Feature extraction, Autoencoder application},
abstract = {Autoencoder is an unsupervised learning model, which can automatically learn data features from a large number of samples and can act as a dimensionality reduction method. With the development of deep learning technology, autoencoder has attracted the attention of many scholars. Researchers have proposed several improved versions of autoencoder based on different application fields. First, this paper explains the principle of a conventional autoencoder and investigates the primary development process of an autoencoder. Second, We proposed a taxonomy of autoencoders according to their structures and principles. The related autoencoder models are comprehensively analyzed and discussed. This paper introduces the application progress of autoencoders in different fields, such as image classification and natural language processing, etc. Finally, the shortcomings of the current autoencoder algorithm are summarized, and prospected for its future development directions are addressed.}
}