Jan Kowalczyk
2025-08-18 13:51:43 +02:00
parent 891b51b923
commit d170b4f9b7


@@ -1041,6 +1041,8 @@ Even though the LeNet-inspired encoder proved capable of achieving our degradati
The receptive field (RF) of a convolutional neural network describes the region of the input that influences a single output activation. Its size and aspect ratio determine which structures the network can effectively capture: if the RF is too small, larger patterns cannot be detected, while an excessively large RF may blur fine details. For standard image data, the RF is often expressed as a symmetric $n \times n$ region \todo[inline]{add schematic of square RF}, but in principle it can be computed independently per axis.
\fig{setup_rf_concept}{diagrams/rf_figure}{UNFINISHED - rf concept}
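As a sketch of the per-axis computation, the RF along one axis can be accumulated layer by layer with the standard recurrence for stacked convolution and pooling layers. The symbols $r_\ell$, $k_\ell$, and $s_i$ are introduced here for illustration only, and dilation is assumed to be $1$:
\[
    r_0 = 1, \qquad r_\ell = r_{\ell-1} + (k_\ell - 1) \prod_{i=1}^{\ell-1} s_i ,
\]
where $r_\ell$ is the RF size after layer $\ell$, $k_\ell$ is the kernel size of layer $\ell$ along the considered axis, and $s_i$ is the stride of layer $i$ along that axis. Evaluating the recurrence once with the vertical kernel sizes and strides and once with the horizontal ones yields the two side lengths of the RF, which differ whenever the kernels or strides are asymmetric.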
In the case of spherical LiDAR projections, the input has a highly unbalanced resolution due to the sensor geometry. A fixed number of vertical channels (typically 32--128) sweeps across the horizontal axis, producing thousands of measurements per channel. This results in an angular resolution of approximately $0.99^{\circ}$/pixel vertically and $0.18^{\circ}$/pixel horizontally \todo[inline]{double-check with calculation graphic/table}. Consequently, the LeNet-inspired encoder's calculated receptive field of $16 \times 16$ pixels translates to an angular size of $15.88^{\circ} \times 2.81^{\circ}$ (vertical $\times$ horizontal), which is highly rectangular in angular space. Such a mismatch risks limiting the network's ability to capture degradation patterns that extend differently across the two axes. \todo[inline]{add schematic showing rectangular angular RF overlaid on LiDAR projection}
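The conversion from pixel to angular extent used above can be written compactly. The symbols $\theta$, $r$, and $\Delta$ are introduced here for illustration, with subscripts $v$ and $h$ denoting the vertical and horizontal axis:
\[
    \theta_v = r_v \, \Delta_v, \qquad \theta_h = r_h \, \Delta_h ,
\]
where $r$ is the RF size in pixels and $\Delta$ is the angular resolution in degrees per pixel. For a square RF with $r_v = r_h$, the angular aspect ratio therefore reduces to the ratio of the two resolutions; using the values stated above,
\[
    \frac{\theta_v}{\theta_h} = \frac{\Delta_v}{\Delta_h} \approx \frac{15.88^{\circ}}{2.81^{\circ}} \approx 5.65 .
\]
The stated angular sizes appear to be based on unrounded resolutions, so multiplying the rounded per-pixel values by the RF size reproduces them only approximately.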
To address this, we developed an efficient network architecture with asymmetric convolution kernels, resulting in a receptive field of $10 \times 52$ pixels (vertical $\times$ horizontal). In angular terms, this corresponds to $9.93^{\circ} \times 9.14^{\circ}$, which reduces the angular aspect ratio from roughly $5.7\!:\!1$ to roughly $1.1\!:\!1$ and is therefore far more balanced between the vertical and horizontal directions. This adjustment increases the likelihood of capturing a broader variety of degradation patterns. Additional design improvements were incorporated as well and are described in the following section.