\newsection{Algorithm Details and Hyperparameters}{sec:algorithm_details}
\todo[inline]{backpropagation optimization formula, hyperparameters explanation}
\todo[inline, color=green!40]{In formula X we see the optimization target of the algorithm. Explain in one paragraph the variables in the optimization formula.}
\todo[inline, color=green!40]{Explain the three terms (unlabeled, labeled, regularization).}
\begin{equation}
    \min_{\mathcal{W}} \quad
    \frac{1}{n+m} \sum_{i=1}^{n} \|\phi(\mathbf{x}_{i};\mathcal{W})-\mathbf{c}\|^{2}
    +\frac{\eta}{n+m} \sum_{j=1}^{m} \left(\|\phi(\tilde{\mathbf{x}}_{j};\mathcal{W})-\mathbf{c}\|^{2}\right)^{\tilde{y}_{j}}
    +\frac{\lambda}{2} \sum_{\ell=1}^{L} \|\mathbf{W}^{\ell}\|_{F}^{2}.
\end{equation}
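Here $\mathbf{x}_{1},\dots,\mathbf{x}_{n}$ are the $n$ unlabeled training samples and $\tilde{\mathbf{x}}_{1},\dots,\tilde{\mathbf{x}}_{m}$ the $m$ labeled samples with labels $\tilde{y}_{j}\in\{-1,+1\}$, where $\tilde{y}_{j}=+1$ marks normal and $\tilde{y}_{j}=-1$ anomalous data; $\phi(\cdot;\mathcal{W})$ denotes the network with weights $\mathcal{W}=\{\mathbf{W}^{1},\dots,\mathbf{W}^{L}\}$ and $\mathbf{c}$ the fixed hypersphere center. The first term pulls the representations of unlabeled samples towards $\mathbf{c}$, relying on the assumption that most of them are normal. The second term, weighted by the hyperparameter $\eta>0$, does the same for labeled normal samples, while the exponent $\tilde{y}_{j}=-1$ inverts the distance for labeled anomalies and thereby pushes their representations away from $\mathbf{c}$. The third term is a standard weight decay regularizer with hyperparameter $\lambda>0$.

As an illustration of how this objective translates into a per-batch loss, the following is a minimal PyTorch sketch (not the reference implementation; the function and variable names are our own, and the small epsilon guarding the inverted distances is an assumption borrowed from common practice):

\begin{verbatim}
import torch

def deepsad_objective(z_unlabeled, z_labeled, y_labeled, c,
                      weights, eta, lam, eps=1e-6):
    """z_* are network outputs phi(x; W); y_labeled holds +1/-1 labels."""
    n, m = z_unlabeled.shape[0], z_labeled.shape[0]
    d_unlabeled = ((z_unlabeled - c) ** 2).sum(dim=1)
    d_labeled = ((z_labeled - c) ** 2).sum(dim=1)
    # Unlabeled term: squared distances to the center c.
    loss = d_unlabeled.sum() / (n + m)
    # Labeled term: the exponent y inverts distances for anomalies (y = -1);
    # eps avoids division by zero for representations that collapse onto c.
    loss = loss + eta / (n + m) * ((d_labeled + eps) ** y_labeled.float()).sum()
    # Weight decay over all layer weights W^1 ... W^L.
    loss = loss + lam / 2 * sum((W ** 2).sum() for W in weights)
    return loss
\end{verbatim}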
\newsection{Advantages and Limitations}{sec:advantages_limitations}
\todo[inline]{Semi-supervised: learns normality from the sheer amount of data (no labeling/ground truth required); very few labels suffice to adapt training to the specific situation.}
\newchapter{Data and Preprocessing}{chap:data_preprocessing}
\newsection{Data Sources}{sec:data_collection}
Dataset~\cite{alexander_kyuroson_2023_7913307}
\todo[inline, color=green!40]{Good data is important both for learning-based methods and for evaluation. In this chapter we discuss the requirements we have for our data and the difficulties that come with them, then give some information about the dataset that was used as well as how the data was preprocessed for the experiments (sec 4.2).}
\newsection{Data}{sec:data}
\todo[inline]{describe data sources, limitations}
\todo[inline]{screenshots of camera/3d data?}
\todo[inline]{difficulties: no ground truth, different lidar sensors/settings, different data shapes, available metadata, ...}
\todo[inline, color=green!40]{We require lidar sensor data collected in a domain as closely related to our target domain (rescue robots indoors, cave-ins, ) as possible, which also includes some kind of appreciable degradation for which we have some labeling possibility. Ideally the degradation should stem from smoke/dust/aerosol particles. Most data should be without degradation (since we require more normal than anomalous data to train the method, as described in X), but we need enough anomalous data to confidently evaluate the method's performance.}
\todo[inline, color=green!40]{Labeling is an especially problematic topic since, ideally, we would want an analog value that corresponds to the amount of smoke present for evaluation. For training we only require the possibility to provide labels in the form of normal or anomalous targets (binary classification), and these labels do not have to be present for all data, only for some of it (since semi-supervised learning only uses some labeled data, as discussed in X).}
\todo[inline, color=green!40]{We chose to evaluate the method on the dataset "Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration"~\cite{alexander_kyuroson_2023_7913307}, a public dataset collected by X in a subterranean environment. It includes data from multiple sensors on a moving sensor platform as well as experiments where the sensor data is explicitly degraded by aerosol particles produced by a smoke machine.}
\todo[inline, color=green!40]{list sensors on the platform}
\todo[inline, color=green!40]{Talk about how much data is available (maybe a plot?), the number of experiments with/without degradation, and other factors in these experiments which do not concern our use case.}
\todo[inline, color=green!40]{Lidar data of the 360° sensor is captured at 10 frames per second. Each sensor output is a point cloud resulting from 32 vertical channels, for each of which 2048 measurement points are taken per revolution, distributed equiangularly around the whole horizontal 360°. The sensor thus takes 32 * 2048 = 65536 measurements 10 times a second, and ideally every one produces a point in the point cloud consisting of x, y, z coordinates (relative to the sensor platform) as well as some other values per measurement (reflectivity, intensity, originally measured range value).}
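To make this layout concrete, the following is a small NumPy sketch of the dense scan grid described above (the field names are assumptions for illustration, not the dataset's actual schema):

\begin{verbatim}
import numpy as np

CHANNELS, COLUMNS = 32, 2048      # vertical channels x horizontal steps
SCAN_RATE_HZ = 10                 # full 360-degree scans per second

# Each measurement ideally yields a point with x, y, z (sensor frame)
# plus additional per-measurement values such as range, reflectivity
# and intensity; real scans omit measurements without a return.
point_dtype = np.dtype([
    ("x", np.float32), ("y", np.float32), ("z", np.float32),
    ("range", np.float32), ("reflectivity", np.float32),
    ("intensity", np.float32),
])

dense_scan = np.zeros(CHANNELS * COLUMNS, dtype=point_dtype)
assert dense_scan.size == 65536   # 32 * 2048 potential returns per scan
\end{verbatim}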
\newsection{Preprocessing Steps}{sec:preprocessing}
\todo[inline]{describe how 3d lidar data was preprocessed (2d projection), labeling}
\todo[inline]{screenshots of 2d projections?}
\todo[inline, color=green!40]{While, as described in sec X, the method DeepSAD does not depend on any specific type/structure of data, it requires training an autoencoder in the pretraining step. Such autoencoders are better understood in the image domain, since there are many use cases for them such as X (TODO citation needed); there are also autoencoders for 3d data such as X (todo find example). Like the reference paper (rain cite), we chose to transform the 3d data to 2d by using a spherical projection that maps each of the 3d points onto a 2d plane, where the range of each measurement is expressed as the brightness of a single pixel. This leaves us with a 2d image of resolution 32x2048 (channels by horizontal measurements), which is helpful for visualization as well as for choosing a simpler architecture for the DeepSAD autoencoder. The data in the rosbag is sparse, meaning that lidar measurements which did not produce any value (no return ray detected before the sensor-specific timeout) are simply not present in the scan; we therefore have at most 65536 measurements per scan but mostly fewer (maybe a statistic about this? could also be interesting to show smoke experiment stuff).}
\todo[inline, color=green!40]{To achieve this transformation we used the measurement index and channel present in each measurement point of the dataset, which allowed a perfect reconstruction of the 2d projection without calculating each measurement's pixel position via angles; in our experience the angle-based approach typically leads to some ambiguity in the projection (multiple measurements mapping to the same pixel due to precision loss/other errors). The measurement index increases even for unavailable measurements (no ray return), so we can simply create the 2d projection by mapping the normalized range (FIXME really normalized) value to the pixel position y = channel, x = measurement index. By initializing the array to NaN values, we obtain a 2d data structure holding the range values, with NaN at pixel positions where no measurement took place (missing measurements due to no ray return).}
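The index-based projection described in the note above could be sketched as follows (a minimal version assuming a structured point array; the field names channel, measurement_id and range are placeholders for the actual dataset fields):

\begin{verbatim}
import numpy as np

def project_scan(points, channels=32, columns=2048):
    """Map a sparse lidar scan to a 2d range image.

    Pixels without a return stay NaN, since the array is initialized
    to NaN and only valid measurements are written into it.
    """
    image = np.full((channels, columns), np.nan, dtype=np.float32)
    rows = points["channel"].astype(np.intp)
    cols = points["measurement_id"].astype(np.intp)
    image[rows, cols] = points["range"]
    return image

# Usage: fraction of pixels with a valid return, which could serve as
# the per-scan sparsity statistic mentioned in the note above.
# density = np.isfinite(project_scan(points)).mean()
\end{verbatim}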
\newchapter{Experimental Setup}{chap:experimental_setup}
\newsection{DeepSAD Autoencoder Architecture}{sec:autoencoder_architecture}
\newsection{Training/Evaluation Data Distribution}{sec:data_setup}
\todo[inline]{describe which data was used, and how, in training/evaluation}
\todo[inline]{explain concept of global/local application for global-/window quantification}