Review of methodological considerations and recommendations for mapping remote glaciers from aerial photography surveys in suboptimal conditions
Abstract. Structure from motion (SfM) photogrammetry coupled with multiview stereo (MVS) techniques is widely used for generating topographic data to monitor change in surface elevation. However, study sites on remote glaciers and ice caps often offer suboptimal conditions, including large survey areas, complex topography, changing weather and light conditions, poor contrast over ice and snow, and reduced satellite positioning performance. Here, we provide a review of methodological considerations for conducting aerial photography surveys under challenging field conditions. We generate topographic reconstructions, outlining the entire workflow from data acquisition to SfM-MVS processing, using case studies focused on two small glaciers in Arctic Canada. We provide recommendations for the selection of photographic and positioning hardware, and guidelines for flexible survey design using direct measurements of camera positions, thereby removing the need for ground control points. The focus is on maximising hardware performance despite inherent limitations, with the aim of optimising the quality and quantity of the source data, including image information and control measurements, under suboptimal conditions.
Status: open (until 02 Dec 2024)
RC1: 'Comment on gi-2024-10', Anonymous Referee #1, 11 Nov 2024
The paper presents many important aspects for achieving optimal results in photogrammetric surveys, from the planning stage to the final results. These aspects are mostly reviewed from a theoretical point of view. The paper ends with two practical examples presenting the work and outcome for two flights at high latitude (~ 80° North).

The paper is a bit of a mixed bag. The first part (approx. 11 pages) summarizes many aspects of modern-day photogrammetry, including MTF, diffraction, motion blur, rolling shutter, GNSS specifics, etc. Actually, everything is true for any photogrammetric endeavor -- irrespective of the terrain type (glacier or not). The second part (also 11 pages) presents the work and outcome for two flights at high latitude (~ 80° North).

Following the very detailed description in the first part, one might expect that the practical part then showcases how the theory of the first part is considered in the planning stage of the two flights, and/or that it is demonstrated that neglecting certain aspects from the theory section produces certain errors in the results. But that is not the case. Both flights appear to have been executed without having had many options. E.g. what could have been investigated:
- different apertures
- effect of flight speed on blur
- shooting not in raw; or bad Adobe Lightroom settings when "developing" the raw images
- impact of these settings on the different surface types: glacier (which is also part of the title) and non-glacier
- impact of changing the oblique viewing angle
- different flight patterns
- etc.

Of course, the possibilities of changing the parameters and investigations are almost endless, but in the present form the photogrammetric results are obtained in <<one>> way. Apparently, the outcome fulfilled the requirements, but it remains a bit unclear if less strict settings would have led to similarly acceptable results.

Still, the paper is very well written and understandable. The theoretical part gives a great summary of many aspects. And the description of the two flights shows the reality of conducting photogrammetric data acquisitions at such high latitudes.

In order to accept the paper I would ask the authors to provide a few more details in the theoretical and practical parts:

row 78
"Along with focal length, sensor size also defines the ground sampling distance (GSD) and therefore"
-->
This is not correct: Eq. (1) for the GSD has no sensor size, just the pixel size (pixel pitch).
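For reference, the usual relation reads as follows (a minimal sketch with illustrative numbers, using generic symbols rather than the manuscript's notation):

```latex
% GSD from pixel pitch p, flying height H and focal length f (the sensor size does not enter)
\mathrm{GSD} = p \, \frac{H}{f}
% e.g. p = 4.3\,\mu\mathrm{m},\ H = 500\,\mathrm{m},\ f = 35\,\mathrm{mm} \;\Rightarrow\; \mathrm{GSD} \approx 6.1\,\mathrm{cm}
```

The sensor size only affects the image footprint (swath width), not the GSD.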
"cy per in" should be "cy per mm". How is the Nyquist limit calculated?Fig. 1+2+3
Fig. 1+2+3
The paper should include the formulae that went into creating these figures.
"the diffraction limit decreases with smaller apertures"
-->
"the diffraction limit decreases with smaller apertures (f-numbers)"row 149
"1/lambad N" should be "1/(lambad*N)".row 175-180
"more visible as the size of the Airy pattern"
"the distance between the centers of two disks is equal to their radius"
"when the circle of confusion reaches a size of twice the pixel pitch"
--> I would welcome it if it is clearly specified every time whether you mean diameter or radius (like in the 2nd quote).
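To make the diameter/radius point concrete (standard optics relations, not quoted from the manuscript):

```latex
r_{\mathrm{Airy}} = 1.22\,\lambda N, \qquad d_{\mathrm{Airy}} = 2.44\,\lambda N
% Rayleigh criterion: two point sources are just resolved when their centres are separated by the radius r_Airy
```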
"GSD", which is not the resolution of the image, is a well defined term (your equation 1).
Thus I object against introducing the term "diffraction limited GSD" in equation 2,
because any later mentioning of GSD creates confusion whether you mean really "GSD" or use it as a short for "diffraction limited GSD".
And, indeed, in the caption of Fig. 3 you write "Diffraction limited ground sampling distance (GSD)" which is ambiguous in the
way that there GSD might refer only to "ground sampling distance" or the whole term "diffraction limited ground sampling distance".
In the end of that caption you also mention another term: "theoretical GSD".
Furthermore, later, you refer to equ.2 as circle of confusion (which I think is better).
Any term that really refers to the resolution (instead of the sampling = GSD) should by clearly discernable. Some authors
use the term "GRD" (ground resolved resolution). Although, that term itself leaves open what effects are considered (just
diffraction, or even the full point spread function), it still means obviously something else than GSD.
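One way to keep the two notions apart (a sketch using the Airy diameter; not necessarily what the manuscript's Eq. 2 denotes): the sampling follows from the pixel pitch, whereas a diffraction-based resolved distance follows from projecting the Airy disk to the ground via the image scale:

```latex
\mathrm{GSD} = p\,\frac{H}{f}, \qquad \mathrm{GRD}_{\mathrm{diff}} \approx 2.44\,\lambda N\,\frac{H}{f}
% sampling (GSD) vs. a diffraction-based resolved distance (GRD); here the Airy *diameter* is used
```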
row 202
"Wide angle lenses exhibit negative (barrel) distortions which present as decreasing
image magnification from the center of the frame towards the edges, while positive (pincushion)
distortions are characteristic of telephoto lenses (70 mm or above)."
--> This is opposite to my experience, which is:
1. That the sign of the radial distortion is not predictable from the angle of the camera.
2. That the longer the focal length, the less distortion the lens tends to show.
Admittedly, this is my personal experience, but currently the quote is without reference. So, please, either provide a reference, or rephrase.

(On row 197 you write "the downside being that short focal lengths are more prone to distortions"; if this is inverted to "longer focal lengths are likely to have less distortion", it fits my experience.)
Fig 2.
I am not sure if (a) and (b) are derived from the very same RAW image, or only (a), with (b) being a direct JPG from the camera (and thus in principle a different photograph than the RAW for (a))?

row 250
"Including the affinity and non-orthogonality coefficients
in the camera calibration matrix at the image alignment stage should partially compensate for this effect."
-->
This works only (or works better) if the flight speed is constant and the terrain is flat. You may wish to add/clarify that.
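A rough image-space model shows why (a back-of-the-envelope sketch with illustrative numbers, not the compensation actually applied in the manuscript): during the sensor readout the camera advances over the ground, and the resulting image shift grows linearly with the row index, so a single affine (shear) term can only absorb it if the shift is the same for every image and every part of the block, i.e. constant speed and flat terrain.

```latex
d = v\,t_{\mathrm{read}}, \qquad \Delta x_{\mathrm{img}}(r) \approx \frac{d}{\mathrm{GSD}}\cdot\frac{r}{R}
% e.g. v = 50\,\mathrm{m/s},\ t_{\mathrm{read}} = 30\,\mathrm{ms} \;\Rightarrow\; d = 1.5\,\mathrm{m};
% with GSD = 6\,\mathrm{cm} this is about 25\,\mathrm{px} at the last row (r = R)
```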
Remark 1:
(Again from my experience) affine parameters need to be introduced per image (and not per camera).
Remark 2:
The developers of Pix4D have a paper about their method on rolling shutter compensation, which is better than the affinity in
image space as it directly works on the change of the exterior orientation parameters per image:
https://s3.amazonaws.com/mics.pix4d.com/KB/documents/isprs_rolling_shutter_paper_final_2016.pdf

row 254
"including" --> "included" (?)
row 266
"0.43" --> "4.3"row 280
"The direct georeferencing method ... similar precision to the ground-based approach where
camera position information is acquired with multi-frequency survey-grade GNSS equipment"
-->
Does this last part ("where camera position ...") refer to the ground-based approach?

Remark:
A photogrammetric survey that fully relies on direct georeferencing using GNSS, and thus
without a single GCP, is prone to deliver results with a large height bias. Because if the camera
calibration is considered unknown and thus is estimated during the bundle block adjustment, then any small
bias in the estimated focal length causes a large height offset. This is especially true for vertical
images, and may be mitigated using oblique images.
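To make the mechanism explicit (a back-of-the-envelope sketch, not taken from the manuscript): with the camera positions fixed by GNSS, the reconstructed depth scales with the focal length, so a relative bias in the estimated focal length maps almost directly into a relative height bias over flat terrain:

```latex
\frac{\Delta h}{H} \approx \frac{\Delta f}{f}
% e.g. f = 35\,\mathrm{mm},\ \Delta f = 0.02\,\mathrm{mm},\ H = 500\,\mathrm{m} \;\Rightarrow\; \Delta h \approx 0.3\,\mathrm{m}
```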
General comment to the theory (section 2):
Interestingly, the problem of depth of field is completely neglected.
Although its importance increases with smaller viewing (focus) distance, it belongs in this theory part.
Actually, you refer to defocus in row 591.
In particular, the hyperfocal distance would be an interesting feature, maybe not known to everybody in the target audience.
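For completeness, the hyperfocal distance is usually written as follows (illustrative values; c is the adopted circle-of-confusion diameter):

```latex
H_{\mathrm{hyp}} = \frac{f^2}{N\,c} + f
% e.g. f = 35\,\mathrm{mm},\ N = 8,\ c = 0.01\,\mathrm{mm} \;\Rightarrow\; H_{\mathrm{hyp}} \approx 15\,\mathrm{m};
% focusing at H_hyp renders everything from H_hyp/2 to infinity acceptably sharp
```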
page 14/15
I am a bit confused regarding your "off-nadir" images. On row 365 you say ">5°" and on row 372 "30-50°".
Why use these rather different definition thresholds, and not just provide some information on the off-nadir
angles of your two sites; e.g. the 5th and 95th percentile, and add that info to Table 2.

Additionally, it is not clear in which direction the off-nadir angle is applied; as pitch or as roll, or something in between?
Here it would be super helpful to include the viewing direction in Fig. 5 for every 50th image or so.

Further, some info on the GSD in Table 2 would be interesting to get some idea. I know that with oblique angles
and variable terrain there is no straightforward way to come up with a representative value, but currently
there is no mention of it at all.
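As an illustration of the kind of summary statistic meant here (a hypothetical sketch; the file and column names are placeholders, not the authors' actual export, and the omega/phi convention is assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical export of estimated camera attitudes (omega = roll, phi = pitch, in degrees).
cams = pd.read_csv("camera_orientations.csv")

# Off-nadir angle of the optical axis, combining roll and pitch:
# cos(theta) = cos(omega) * cos(phi) for a nominally nadir-pointing camera.
omega = np.radians(cams["omega"])
phi = np.radians(cams["phi"])
off_nadir = np.degrees(np.arccos(np.cos(omega) * np.cos(phi)))

print("off-nadir 5th / 50th / 95th percentile [deg]:",
      np.percentile(off_nadir, [5, 50, 95]).round(1))
```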
row 392
You took pictures through the front passenger window. It would be interesting to know how the camera calibration
was affected by that (in comparison to the other data set).

row 413
"PPP" is mentioned here the first time, please, add a reference.row 443
Here and elsewhere you mention an aperture of "f/5". Although such a value is possible in principle, it would be very
unusual, because the usual f-stop numbers are powers of sqrt(2), the closest thus being 4 and 5.6, which you also
have in Fig. 2+3.
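For reference, the full-stop series referred to here is:

```latex
N_k = \left(\sqrt{2}\right)^{k}: \quad 1,\ 1.4,\ 2,\ 2.8,\ 4,\ 5.6,\ 8,\ 11,\ 16,\ \dots
```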
"at or below the size of the circle resulting from diffraction"
-->
you mean "diameter"? Also add a reference to the equation in the theoretical part.row 478
"Lastly, a variable exposure gain was automatically applied to all
images to brighten underexposed areas and match total exposure of successive images."
-->
Can you provide a bit more information on how this exposure gain is controlled?

row 486
"the most time-intensive task in post-production is masking extensive swaths of sky and any terrain beyond the area of interest."
-->
What would have happened if you had not done this masking? Would the bundle block adjustment have completely failed for
all images, or just for the affected images? Or would the adjustment have worked, but the subsequent dense point cloud extraction
would have failed (if so, why not simply define a bounding box prior to deriving the dense point cloud)?

General comment on the bundle block adjustment in Metashape:
- What accuracies were assumed for the GNSS image positions?
- What GNSS residuals were obtained after the adjustment?
- What reprojection error was obtained?

row 575
"Ideally, horizontal accuracy should be higher or equivalent to the spatial resolution of the final gridded products.
Here, both DEMs and orthomosaics were gridded at 0.5 m resolution and horizontal checkpoint misalignment errors remain
below that level for both reconstructions."
-->
I have never heard of this rule and, actually, do not subscribe to it. The gridding of the results (DEM from dense image
matching and orthomosaic) should fit the image resolution, i.e. the finest details in the image should also be included in these results, independently of the spatial
accuracy of the georeferencing. Even if accuracy and grid width do not match, as in your case, the results provide valuable
information about these fine details; however, one is only able to derive the location with a certain limited accuracy. For
that reason, the accuracy should always be communicated together with the results. Furthermore, the error in the georeferencing
is usually a global one, meaning that your result could be improved simply by shifting in 3D to obtain a much better georeferencing.
The latter can be done even on the results themselves; e.g. in case of time series where one epoch (with the best georeferencing
quality) serves as reference (and enough corresponding stable areas are present, of course).
Thus it would be a waste of potential to set the gridding of the result to the obtained georeferencing accuracy.
Last but not least, photogrammetry works even without GCPs and GNSS (one only needs a known distance for scaling). In this case
no meaningful (absolute) georeferencing accuracy can be obtained, but the resolution of the images still serves as a guide.

As you outlined in the theory, the actual resolution of the images is not easily determinable, because it depends on so many
factors. However, the GSD is well defined and easily obtainable (at least in case of nadir images over "flat" terrain). So the
usual way in photogrammetry is to adopt the GSD as gridding value for the DEM and the orthophoto.
In your case I am not sure what the GSD is, but from the given point density values of around 15 pts/m², we see that the average
distance between these points is 1/sqrt(15) ≈ 26 cm. Thus you could at least create your DEM and the orthomosaic with a 25 cm grid width
and should thus be able to get a bit more out of your results compared to the chosen 50 cm (provided the images were not dramatically
affected by blur).

I could imagine that the orthomosaic could be created with an even smaller pixel size, because with dense image matching only in
optimal cases does one really get a 3D point per image pixel. Furthermore, in your case, you will have a high variability of the GSD, and
in order to get the details even in the images with the smallest GSD, one could thus base the orthomosaic pixel size not on the average
GSD but some smaller statistical value, like the 10th percentile.
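A compact way to express this rule of thumb (a small sketch mirroring the arithmetic above; the density value is the one quoted, not a new result):

```python
import math

def grid_width_from_density(pts_per_m2: float) -> float:
    """Average point spacing, usable as a guide for the DEM grid width."""
    return 1.0 / math.sqrt(pts_per_m2)

print(round(grid_width_from_density(15.0), 2))  # ~0.26 m, i.e. roughly a 25 cm grid width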
row 665
"Due to data gaps, 28 cameras from the 10 Hz EF survey were disabled (∼5 %), compared to 129 cameras (or
13 %) from the 15 s TF survey."
-->
Here you use the wrong terminology from Metashape. You mean 28 and 129 "images", not "cameras".
Actually, do you mean here that the entire images were disabled, or that the GNSS locations were disabled (due
to a big interpolation error)?
(Actually, Metashape should be able to link images without GNSS location information to their neighboring
images (with GNSS location), provided the image content allows for enough feature points.)
It would be interesting to list the numbers of images: originally taken vs. disabled (classified by whatever
reason, e.g. blur).
Finally, a general comment:
If you cite a book of several hundred pages, then please include the page number in the quotes; e.g. Rowlands, 2017.

Citation: https://doi.org/10.5194/gi-2024-10-RC1