Architecture of Solution for Panoramic Image Blurring in GIS projects Application

Panoramic images captured using laser scanning technologies, which principally produce point clouds, are readily applicable to point cloud colorization, detailed visual inspection, road defect detection, extraction of spatial entities, creation of diverse maps, etc. This paper underlines the importance of images in modern surveying technologies and different GIS projects, while having regard to their anonymization in accordance with the GDPR. Namely, it is a legislative requirement that the faces of persons and the license plates of vehicles in the collected data are blurred. The objective of this paper is to present a novel architecture of a solution for blurring these particular objects. The methodology was tested on four data sets containing 5000, 10 000, 15 000 and 20 000 panoramic images respectively. The percentage of accuracy, i.e. of successfully detected and blurred objects of interest, was higher than 97% for each data set.

and different maps creation. With image usage, visual inspection is facilitated and various types of damage can be detected.
According to (Lahoti et al., 2019), maps that graphically present particular data of interest about an urban area can be generated. This has proven very helpful for planners and architects in positioning future objects, regulating green areas, traffic management, forest management (Kuzmić et al., 2017), and many other tasks. Since images play a vital role in many different disciplines, they have inevitably become part of the collected data. These images most often contain cars and pedestrians.
Under the GDPR (the European Union's General Data Protection Regulation), the faces of people and car license plates in those pictures must be blurred. In order to obtain legally compliant panoramic images, the detection and blurring of the above-mentioned features has to be conducted. Whilst it is clear that there are justifiable reasons for sharing multimedia data acquired in such ways (e.g. for law enforcement, forensics, bioterrorism surveillance, disaster prediction), there is also a strong need to protect the privacy of innocent individuals who are inexorably "captured" in the recordings (Ribaric et al., 2016). For instance, the average citizen in London is caught on CCTV cameras about 300 times a day (Cavalaro 2007).
Moreover, the necessity for object detection has significantly increased. The reasons for this include a growing demand for automatic vehicle identification required for traffic control, border control, access control, calculation of parking time and payment, search for stolen cars or unpaid fees, along with the requirement for reliable identification considering a complex diversity of circumstances, e.g. different lighting conditions, presence of random or structured noise in the plate, its size and type of characters as well as nationality-specific features (Kasaei et al., 2010).
This study focuses on automatic object detection from panoramic images, obtained by mobile mapping technology, which is followed by blurring of those objects.
The remainder of the paper is divided into four sections. Section Related works provides a descriptive summary of certain methods that have been implemented and tested in the area of automatic object detection and blurring. Section Materials and Methods offers an insight into the proposed methodology. Experimental results are discussed in Section Results and Discussion. Conclusions and further work are presented in the last section.

Related works
Some studies related to the topic of this paper are presented in this section. Several authors define object detection and its significance (Demir 2014, Božić-Štulić et al., 2018, Radović et al., 2017). Some authors deal with object detection in general, while others focus on the detection of particular spatial entities. In recent years, deep learning approaches using features extracted by convolutional neural networks (CNN) have significantly improved detection accuracy. The paper (Sommer et al., 2017) proposes a deep neural network derived from the Faster R-CNN approach for multi-category object detection in aerial images. The detection accuracy was shown to improve when the network architecture was replaced with one specially designed for handling small objects. The Faster R-CNN approach for medium-sized objects was elaborated in (Zhang et al., 2016). The authors of (Božić-Štulić et al., 2018) used a pre-trained Faster R-CNN model for the detection of minor deformations in images obtained by UAV surveying technology. The article (Radović et al., 2017) details the procedure and parameters used for the training of CNNs on a set of aerial images for object recognition. The results show that, with a properly selected set of parameters, a CNN can detect and classify objects with a high level of accuracy (97.5%) and computational efficiency.
https://doi.org/10.5194/gi-2021-23 Preprint. Discussion started: 20 July 2021 © Author(s) 2021. CC BY 4.0 License.
According to (Deb and Jo, 2009), license plate detection from vehicle images is a challenging task due to multi-style plate formats, viewpoint changes and non-uniform outdoor illumination conditions during image acquisition. A real-time multiple license plate detection algorithm is described in (Asif et al., 2016). The authors used color components to identify license plate regions. Experimental results show that the proposed method accurately detects 93.86% of these elements. The edge feature-based method uses edge detection and morphological operations to find a rectangular candidate plate and then applies the aspect ratio to filter the candidate regions. While this approach can work in many cases, the authors of (Chuang et al., 2014) showed that skewed plates and small plates cannot be detected. Another study (Hamid and Shayegh, 2013) advises the use of edge detection and morphological operations to identify potential license plate regions, followed by a connected components operation to identify the license plate location. Although the correct recognition rate was reported to be 98.66%, this method requires multiple steps. Namely, the obtained image first needs to be converted into binary form, and only then is the algorithm applied. Also, the processing time from input to final output is not reported. The authors of (Wang and Lee, 2003) detected probable license plate regions from the gradients of the input car images. Each candidate was then separated into several adjacent regions and the one with the largest value was chosen. Experimental results show that the rotation-free character recognition method can achieve an accuracy rate of 98.6%. The flaw of the suggested algorithm was the manual detection of character features that are insensitive to rotation variations. A region-based license plate detection method was described in (Jia et al., 2007), where a mean shift procedure is applied in a spatial-range domain to segment a color vehicle image in order to obtain candidate regions. License plates adhere to a unique feature combination of rectangularity, aspect ratio and edge density. These three features are defined and extracted in order to decide whether a candidate region contains a license plate. The drawback of this method is the difficulty of detecting license plates when vehicles and their respective plates are of similar or the same color.
Human body detection presents a number of challenges, such as extracting meaningful features to capture a wide range of poses of human appearance (Deb and Jo, 2009). Most current work on human detection in color images encompasses a variety of feature descriptors and classifiers. The most notable people detection method is based on histogram of oriented gradients (HOG) feature descriptors (Dalal and Triggs, 2005). Dense descriptors comprising blocks with multiple histograms of image gradients are classified as human/non-human using a linear SVM. The histogram of image gradients is constructed and scaled for each cell. The final search window descriptor is a vector of concatenated block histograms. Although this method has proven reasonably efficient, there is still room for optimization and further speed-up in detection. The paper (Miezianko and Pokrajac, 2008) documented a method for detecting people in low-resolution infrared videos. The suggested method is based on extracting gradient histograms from recursively generated patches and, subsequently, computing histogram ratios between the patches. Each set of patches was defined in terms of relative position within the search window, and each set was then recursively applied to extract smaller patches. The major objective of this study was to incorporate motion detection and tracking into the existing system and to limit the search in the future. Another publication (Breckon et al., 2012) gave an account of a combined autonomous system for surveillance and human detection, which can also be applied to vehicle detection, using optical and thermal images. This approach first detects the initial segments within the scene that might contain an object. Afterwards, isolated segments are extracted, supplying a basis for secondary object classification. As this method does not run in real time, the authors of (Gilmore et al., 2011) suggested an almost real-time detection algorithm based on digital infrared thermal imagery. The objective was to achieve pedestrian candidate selection and detection. The focus of the article (Vu et al., 2006) is an event recognition system employing face detection and tracking combined with audio analysis. Three-dimensional contexts, such as zones of interest and static objects, were recorded in a knowledge base, and 3D positions were calculated for mobile objects using calibration matrices. The major flaw of this mechanism is that substantial changes in lighting conditions occasionally prevent the system from detecting people correctly.
Nowadays, there is a growing trend in image blurring, so various authors have paid considerable attention to research in this area. The authors of (Farid et al., 2018) described content-adaptive blurring (CAB). In CAB, a multi-focus image is iteratively blurred in such a manner that only the focused regions get blurred, whereas the defocused regions receive little or no blur at all. If blurring degrades the quality of a local image region beyond an allowable limit, that region is not blurred and is exempted from further blurring. Thus, the defocused regions are preserved while the focused regions get blurred. In (Chiang et al., 2016), different image types were exploited for multiple object recognition by means of focusing and blurring. The focusing and blurring step applies image processing techniques to focus on the most important objects and blur out the rest of the image with either vignette, blur, or bokeh, using the identified object bounding boxes. A method named inhomogeneous principal component blur (IPCB) was proposed in (Du et al., 2011). It adaptively blurs different pixels of a license plate by taking into account the prior distribution of sensitive information. The blurring is based on the Principal Component Analysis (PCA) approach: the original plate's area is substituted by a reconstructed area that is obtained by applying a smaller number of eigenvectors. The detection of faces and license plates in Google Street View footage was demonstrated in (From et al., 2009), where de-identification was simply done by blurring the detected locations. A simplified version of the face detector, based on a fast sliding-window approach over a range of window sizes, was used for the detection of license plates. These detectors belong to a large family of sliding window detectors, such as the Schneiderman and Kanade (2001) and Viola-Jones (2004) detectors. The authors reported that a completely automatic system detected and sufficiently blurred 94-96% of the total number of license plates and more than 86% of faces in evaluation sets sampled from Google Street View imagery.

Materials and Methods
The solution proposed in this paper is composed of four software components:

Project management module;
Vehicle and people detection module;
License plate detection module;
Blurring module.

The project management module is a web application that provides a user interface (UI) necessary to coordinate the blurring process. Through this interface, it is possible to set up and monitor the blurring process and manually correct the obtained results. Besides the UI, the project management module is responsible for the distribution of configurations and tasks to the rest of the modules. It is also the main collection point for the results produced by the aforementioned modules and the final processing unit, which encompasses the blurring module. The backend of the project management module was implemented using the Flask web application framework (The Pallets Projects), ZeroMQ and the OpenCV library (ZeroMQ, Open Source Computer Vision Library). The backend exposes two ZeroMQ TCP client sockets which are utilized to send processing requests to and receive responses from the other modules. The first socket is used to communicate with the vehicle and people detection module, whilst the second one is intended for communication with the license plate detection module. The front-end of the project management module was implemented using the Angular framework and the OpenLayers library (Angular, OpenLayers).
The vehicle and people detection module and the license plate detection module are responsible for the extraction of the bounding boxes from the raw images supplied by the project management module. These modules can contain one or more processing nodes. Each processing node exposes a TCP server ZeroMQ socket. The server sockets accept requests from the project management module. Each request contains the actual image that has to be processed. After the image has been processed, the same socket is used to reply to the project management module with a message containing the bounding boxes extracted from the input image.
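The request and reply cycle described above can be illustrated with a minimal sketch. The paper does not specify the wire format, so the JSON payload, the function names (`encode_reply`, `decode_reply`, `handle_request`) and the `fake_detector` stand-in are all assumptions made for illustration; in the actual system the bytes would travel over the ZeroMQ sockets rather than through a direct function call.

```python
import json
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def encode_reply(boxes: List[Box]) -> bytes:
    """Serialize bounding boxes into a reply payload (JSON is an assumption)."""
    return json.dumps({"boxes": [list(b) for b in boxes]}).encode("utf-8")


def decode_reply(payload: bytes) -> List[Box]:
    """Parse a reply payload back into a list of box tuples."""
    data = json.loads(payload.decode("utf-8"))
    return [tuple(b) for b in data["boxes"]]


def handle_request(image_bytes: bytes,
                   detector: Callable[[bytes], List[Box]]) -> bytes:
    """Per-request work of a processing node: run detection, build the reply."""
    return encode_reply(detector(image_bytes))


def fake_detector(image_bytes: bytes) -> List[Box]:
    """Hypothetical stand-in for the neural network; returns one fixed box."""
    return [(10, 20, 110, 80)] if image_bytes else []


# Project management side: send an image, read back the bounding boxes.
reply = handle_request(b"raw image bytes", fake_detector)
boxes = decode_reply(reply)
```

Keeping the detector behind a plain callable means the same handler could serve both detection modules, which differ only in the model they run.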
The processing nodes of the vehicle and people detection module have been implemented as a neural network trained for object detection. This network was implemented based on the TensorFlow Object Detection API (GitHub). The model used from the API was the Faster R-CNN ResNet-101 trained on the COCO data set. This model was trained to detect all of the 80 classes in the COCO data set, which include vehicles and people. The output of this neural network consists of the bounding boxes and classes of the detected objects. Although the input into the processing node is the whole image, the input into the network comprises overlapping patches of the original image scaled down to a resolution of 1000 by 600 pixels. The minimum overlap of the patches is 50%, which ensures that no objects are skipped or only partially detected. The main advantage of splitting the image into smaller patches is the reduction of the required computational resources. The downside of this approach is the creation of multiple overlapping bounding boxes for the same detected object. Once returned to the project management module, these overlapping boxes have to be merged into a single bounding box. The merging was done using the non-maximum suppression algorithm (Neubeck and Van Gool, 2006). The processing nodes of the license plate detection module were implemented by means of the same model as those of the vehicle and people detection module.
The blurring module is an integral part of the project management module and is responsible for blurring the areas designated by both the vehicle and people detection module and the license plate detection module. The input into this module consists of the raw images and the bounding boxes obtained by processing the images in the aforementioned modules. For each input image, the set of bounding boxes detected in that image was used to determine the area of the image that has to be blurred.
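The patch splitting and box merging described for the detection nodes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the source patch size of 2000 by 1200 pixels, the IoU threshold of 0.5, the confidence scores and all function names are assumptions, since the paper states only the 1000 by 600 network input and the 50% minimum overlap.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def axis_starts(length: int, patch: int, min_overlap: float = 0.5) -> List[int]:
    """Patch start offsets along one axis with at least `min_overlap` overlap."""
    if patch >= length:
        return [0]
    stride = max(1, int(patch * (1.0 - min_overlap)))  # largest allowed step
    last = length - patch                              # final patch touches the edge
    steps = -(-last // stride)                         # ceiling division
    return [min(i * stride, last) for i in range(steps + 1)]


def to_image_coords(box: Box, origin_x: int, origin_y: int, scale: float) -> Box:
    """Map a box from scaled patch coordinates back to full-image coordinates."""
    x0, y0, x1, y1 = box
    return (origin_x + x0 * scale, origin_y + y0 * scale,
            origin_x + x1 * scale, origin_y + y1 * scale)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: List[Tuple[Box, float]],
                        iou_threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Patch layout for an 8000 x 4000 panorama (hypothetical 2000 x 1200 patches).
xs = axis_starts(8000, 2000)
ys = axis_starts(4000, 1200)
scale = 2000 / 1000  # source patch width divided by network input width

# The same object seen in two neighbouring patches, plus one unrelated object.
detections = [
    (to_image_coords((600, 100, 700, 150), 0, 0, scale), 0.90),
    (to_image_coords((102, 100, 200, 150), 1000, 0, scale), 0.80),
    (to_image_coords((300, 300, 350, 340), 3000, 600, scale), 0.70),
]
merged = non_max_suppression(detections)
```

Clamping the last offset to the image edge keeps every patch inside the image at the cost of a slightly larger overlap for the final row and column.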
If the bounding box corresponds to a license plate, the area of the image covered by the bounding box is immediately blurred using the Gaussian blur. However, if the bounding box corresponds to a person, the blurring module first detects the face of that person and then blurs the area of the detected face using the Gaussian blur. The idea was to avoid unnecessary blurring of faces on billboards and similar public displays, i.e. those that are recognized as faces but do not belong to the group prone to the risk of identity misuse or are not considered sensitive personal information. This procedure also facilitates the processing performed by the algorithm, since it does not require searching entire images, but only the detected areas. This makes the methodology described in this paper faster and more precise.
The face detection was done using the Haar classifier provided by the OpenCV library. The classifier was trained to detect faces from the frontal view. The output of the blurring module consists of the blurred images and the bounding boxes of the blurred areas. These two elements are stored in the output folder defined in the project management module.
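The routing rule of the blurring module (plates are blurred directly, persons only in the detected face area) can be sketched as a region selection step. The label strings, the dictionary layout and the `fake_face_finder` stand-in are assumptions for illustration; in the described system the face search is performed by the OpenCV Haar classifier and the selected regions are then passed to the Gaussian blur, both of which are left out here.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def regions_to_blur(detections: List[Dict],
                    find_faces: Callable[[Box], List[Box]]) -> List[Box]:
    """Select the image regions that should be blurred."""
    regions: List[Box] = []
    for det in detections:
        if det["label"] == "license_plate":
            # A plate is blurred over its whole bounding box.
            regions.append(det["box"])
        elif det["label"] == "person":
            # For a person, only faces found inside the box are blurred.
            regions.extend(find_faces(det["box"]))
        # Other classes (e.g. vehicles) are not blurred themselves.
    return regions


def fake_face_finder(person_box: Box) -> List[Box]:
    """Hypothetical stand-in for the Haar classifier: top fifth, centred."""
    x0, y0, x1, y1 = person_box
    w, h = x1 - x0, y1 - y0
    return [(x0 + w // 4, y0, x1 - w // 4, y0 + h // 5)]


detections = [
    {"label": "license_plate", "box": (100, 400, 220, 440)},
    {"label": "person", "box": (500, 200, 580, 440)},
    {"label": "car", "box": (50, 300, 400, 500)},
]
regions = regions_to_blur(detections, fake_face_finder)
```

Because the face search is restricted to person boxes, a face printed on a billboard is never passed to the face finder unless the object detector has also reported a person there.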
An overview of the proposed innovative architecture of the solution, as well as the data flow between components, is shown in Figure 1. The main advantage of the architecture elaborated in this paper is the ease with which it can be scaled up in order to reduce the computation time necessary to process large numbers of images. This scaling can be done by deploying additional processing nodes. Each node can be deployed on a dedicated computer or it can share a single computer with other nodes. If the nodes are deployed on dedicated computers, those computers have to be connected in a computer network to allow them to share data. Due to the amount of data that is shared between the nodes during processing, a network infrastructure with a bandwidth of no less than 1 Gbit/s is needed to achieve optimal performance of the solution.
Also, the nodes were implemented to utilize CUDA-capable GPUs to increase the speed of both training and use of the model. Thus, each node requires a dedicated CUDA-capable GPU in order to be used successfully.
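The effect of adding processing nodes can be illustrated with a small sketch in which a thread pool stands in for the networked nodes. The round-robin assignment, the file names and the function names are assumptions, as the paper does not state how tasks are distributed among nodes; `process_on_node` is a placeholder that returns an empty result instead of running detection on a GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def process_on_node(task: Tuple[int, str]) -> Tuple[str, Dict]:
    """Placeholder for one node processing one image."""
    node_id, image_name = task
    return image_name, {"node": node_id, "boxes": []}


def distribute(image_names: List[str], node_count: int) -> Dict[str, Dict]:
    """Assign images to nodes round-robin and collect results by image name."""
    tasks = [(i % node_count, name) for i, name in enumerate(image_names)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        return dict(pool.map(process_on_node, tasks))


images = [f"pano_{i:05d}.jpg" for i in range(8)]
results = distribute(images, node_count=3)
```

Raising `node_count` is the only change needed to spread the same work over more workers, which mirrors the scaling property described above.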

Results and Discussion
As mentioned in the Introduction, the need for images in mobile mapping projects is being increasingly recognized over time. Nowadays, these projects place high demands on precision, level of detail and accuracy. Therefore, producing a non-colorized point cloud frequently cannot satisfy all these requirements. Colorization of the point cloud is performed with the help of the collected images, which enables more efficient extraction and recognition of elements. Additionally, road defects such as cracking, potholes, patching, surface chip loss and others can easily be managed with the use of images. Often, the results of mobile mapping projects are published on public servers. Matching the photographs to the point cloud and publishing them on a web platform offers a significant advantage: while looking at the point cloud, photographs can be observed at the same time, allowing any possible doubts about the particular terrain situation to be resolved (Batilović et al., 2019). In order to use images for the above-discussed purposes, they have to be GDPR-compliant, i.e. blurring of faces and license plates is required to ensure privacy protection. Consequently, the suggested mechanism for object detection and blurring has a substantial impact on diverse projects.
This section focuses on testing the methodology explained in the previous section. In order to verify the validity and accuracy of the suggested solution architecture, several experiments were carried out. Four data sets containing 5000, 10 000, 15 000 and 20 000 images respectively were used, one in each trial. They represent a collection of panoramic images with a resolution of 8000 x 4000 pixels, obtained by mobile mapping scanning with the Trimble MX9 system. The fact that people's faces and license plates are captured at different angles and have varying sizes and positions in panoramic images made this experiment more challenging. Additionally, the vehicle types in the project are not uniform but diverse: cars, trucks, vans, motorcycles, etc. The performance characteristics of the computers used for conducting these tests are presented in Table 1, while the test results are given in Table 2. The conducted experiment showed that this methodology is highly precise and is able to perform detection and blurring more efficiently than some other commercially developed software. The software used in this study was tested four times, once for each of the four independent panoramic image data sets.
In the first data set, containing 5000 panoramic images, there were 13 478 automatically detected objects. During the manual blurring process, i.e. quality control, 117 objects were additionally detected and blurred. Taking this into account, the percentage of successfully blurred objects was 99.14%. This figure was obtained by dividing the number of automatically detected objects by the total number of objects (automatically plus manually detected). In this example, that is 13 478 / (13 478 + 117) x 100% = 99.14%. Figure 2 presents the ratio of the number of objects in the images to the total number of images. Here, it can be seen that some images had no objects at all. In this data set, there were 1143 such images. These include parts of rural areas, e.g. forest roads, where there were no cars or pedestrians. On the other hand, there are many images containing multiple objects. The maximum number of objects in one image was 28.
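The success percentage defined above can be reproduced with a few lines of Python. The function name `success_rate` is illustrative; the counts are those reported for the first data set.

```python
def success_rate(auto_detected: int, manually_added: int) -> float:
    """Share of objects found automatically, as a percentage of all objects."""
    total = auto_detected + manually_added
    if total == 0:
        raise ValueError("no objects were detected in the data set")
    return 100.0 * auto_detected / total


# First data set: 13 478 automatic detections, 117 added during quality control.
rate = success_rate(13478, 117)
print(f"{rate:.2f}%")  # prints 99.14%
```

The same function applies to the other data sets by substituting their automatic and manual counts.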

The second data set, with 10 000 panoramic images, contains 26 152 objects detected by the proposed algorithm and 729 more objects found manually during quality control. The success percentage equals 97.29%. Figure 3 illustrates a distribution chart of detected objects in this data set. It can be seen that the maximum number of detected objects in one image was 29, while the number of images with no bounding boxes was 2371.
The third data set consists of 15 000 panoramic images. There were 39 688 objects detected and blurred, while 546 objects were found manually. The percentage of successfully blurred objects was 98.64%. Figure 4 represents a distribution chart of detected objects in the third data set.
The last tested data set consists of 20 000 panoramic images. The proposed approach detected and blurred 52 855 objects, while 845 objects were found manually. Figure 5 shows a distribution chart of detected objects in the fourth image data set, where the maximum number of detected objects in one image was 29, while 4320 images had no objects detected.
The main limitation of the suggested architecture of the solution for image processing was identified when detecting license plates on white cars, because the majority of license plates are white as well. The matching color makes it more difficult for the algorithm to execute detection successfully.

Conclusions
The increasing need for images in mobile mapping projects is highlighted in this paper. Since images are very often used, blurring of faces and license plates is required to comply with data protection laws. According to (Ribaric et al., 2016), privacy is one of the most important social and political issues in contemporary information society, characterized by a growing range of enabling and supporting technologies and services. Amongst these are communications, multimedia, biometrics, big data, rapid development of cloud storage (Wang et al., 2021), data mining, internet, social networks, and audio-video surveillance. Each of these can potentially provide the means for privacy intrusion. Therefore, this article suggests a reliable method for detection and blurring of these particular objects.
The experiment evaluating the applicability of the proposed software was conducted on four different data sets containing panoramic images of urban and rural areas. The total number of tested panoramic images was 50 000. Those images were obtained from laser scanning of roads with the Trimble MX9 device. The success rates for the individual data sets were as follows: the data set with 5000 panoramic images reached an accuracy of 99.14%, the one with 10 000 panoramic images 97.29%, the one comprising 15 000 panoramic images 98.64%, and the data set with 20 000 images 98.40%.
Several major advantages of this approach were identified. Not only is the percentage of correctly detected and blurred elements high, but the presented algorithm also proved remarkably effective in cases where the aforementioned objects of interest had different angles, positions, colors and sizes. Moreover, usage is very simple, since the user only needs to put the images that should be blurred into a defined folder and start the blurring process. Next, the solution has great potential because it can handle images of different sizes, resolutions, file formats, etc. Finally, its processing speed is very convenient, as it takes less than one second to detect and blur all faces and license plates in one panoramic image.
Future work will involve three focal points. To begin with, the software will be improved in terms of detecting license plates of the same color as the vehicle they are attached to. Also, since bounding boxes are upright rectangles covering a slightly larger area than the object of interest, working on more precise bounding boxes will be the next stage of further research. These boxes will mark only the license plate area, taking into account its angle and tilt. In that way, unnecessary parts of the image will not be blurred, and the overall quality of the output image will be better. Eventually, the plan is to upgrade this methodology with traffic sign detection. Automatic detection of traffic signs with assigned attributes such as