1551-3203 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2017.2773646, IEEE Transactions on Industrial Informatics
Abstract—In research on location privacy protection, existing methods are mostly based on traditional anonymization, fuzzification, and cryptography techniques, and have had little success in the big data environment. For example, sensor networks contain sensitive information that must be appropriately protected. Current trends such as “Industrie 4.0” and the Internet of Things (IoT) generate, process, and exchange vast amounts of security-critical and privacy-sensitive data, which makes them attractive targets of attacks. However, previous methods overlooked the privacy protection issue, leading to privacy violations. In this paper, we propose a location privacy protection method that satisfies the differential privacy constraint to protect location data privacy and maximize the utility of data and algorithm in the Industrial Internet of Things. In view of the high value and low density of location data, we combine utility with privacy and build a multilevel location information tree model. Furthermore, the exponential mechanism of differential privacy is used to select data according to the access frequency of the tree nodes. Finally, the Laplace mechanism is used to add noise to the access frequency of the selected data. As shown by the theoretical analysis and the experimental results, the proposed strategy achieves significant improvements in terms of security, privacy, and applicability.
Index Terms—Differential privacy, location privacy protection, location privacy tree, Internet of Things.
I. INTRODUCTION
The development of information technology has accumulated a great deal of data for today’s digital systems. Big data is a very important research and development resource, and the demand for data publishing, sharing, and analysis is also growing rapidly. People pay more and more attention to the security of data protection, and governments, enterprises, and individuals are also improving their understanding of privacy protection.
This work was funded by the National Natural Science Foundation of China (61772282, 61373134, 61772454, and 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). (Corresponding author: Jin Wang.)
Chunyong Yin is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: yinchunyong@hotmail.com).
Jinwen Xi is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: javenxi@yeah.net).
Ruxia Sun is with the School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China (e-mail: src@nuist.edu.cn).
Jin Wang is with the College of Information Engineering, Yangzhou University, Yangzhou 225127, China (e-mail: jinwang@yzu.edu.cn).
Body sensor networks (BSNs), as a special application of wireless sensor networks (WSNs) [1], are deployed on the surface of bodies to periodically monitor physical conditions. For a typical social participatory sensing application, it is important to motivate participation; at the same time, the participatory sensing process should not disclose the private information of any participating party (the private data) or of the community (patterns, distribution, etc.). Therefore, it is essential to develop a privacy-preserving data strategy that protects both the users and the community [2].
Devices in the Internet of Things (IoT) generate, process, and exchange vast amounts of security and safety-critical data as well as privacy-sensitive information, and hence are appealing targets of various attacks. To ensure the correct and safe operation of IoT systems, it is crucial to assure the integrity of the underlying devices, especially their code and data privacy, against malicious modification [3].
The privacy threats of the Industrial Internet of Things (IIoT) [4] can be divided into two categories: privacy threats based on data and privacy threats based on location. Data privacy issues mainly refer to the leakage of secret information during data acquisition and transmission in the Internet of Things.
Location privacy is an important part of privacy protection in the Internet of Things. It mainly refers to the location privacy of each node in the Internet of Things and the location privacy involved when the Internet of Things provides various location services, including RFID reader location privacy, RFID user location privacy, sensor node location privacy, and privacy issues arising from location-based services [3], [5].
Usually, data collected, aggregated, and transmitted in sensor networks contain personal and sensitive information, which directly or indirectly reveals the condition of a person. If such data are not properly protected, privacy is destroyed once they are exposed to the public. Therefore, protecting the privacy of sensitive data is of great importance [6], [7].
Location Privacy Protection based on Differential Privacy Strategy for Big Data in Industrial Internet-of-Things
Chunyong Yin, Jinwen Xi, Ruxia Sun, Jin Wang, Member, IEEE
Location data involves moving objects, spatial coordinates, and the current time, and has unique features that distinguish it from other data: it is discrete and of high value. Before the concept of big data emerged, most privacy protection methods focused on a small amount of non-positional data, and they have limitations when protecting location data privacy in big data. There are two main reasons: (1) multiple data fusion in big data makes traditional anonymity [8], [9], [10] and fuzzification techniques ineffective for location privacy protection [11]; (2) traditional cryptography techniques are poorly suited to the real-time analysis required by big data.
In summary, location privacy protection faces great challenges for big data in IIoT. Therefore, we propose a more rigorous LPT-DP-k location privacy protection method based on differential privacy strategy for big data in sensor networks.
Our specific contributions mainly include:
(I) We introduce a tree structure, called the LPT (location privacy tree), to represent the location data in sensor networks according to the characteristics and retrieval difficulty of location data.
(II) The differential privacy strategy is suitable for location privacy protection because it is insensitive to background knowledge, and the DP-k (differential privacy-k) model offers better protection. The Laplace and exponential mechanisms are the main implementation mechanisms of differential privacy; they express the degree of privacy protection through the allocation of the privacy budget and are relatively reliable and rigorous.
(III) We conduct extensive experiments on real-world datasets, which show that the proposed location privacy protection method can protect users’ location privacy without significantly affecting availability, security, or effectiveness.
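As a rough illustration of contribution (I), the following minimal Python sketch shows one way a multilevel location tree with per-node access frequencies could be represented. The class name `LocationNode`, the coarse-to-fine path layout, and the example level names are assumptions made for illustration only; they are not the LPT construction that the paper details later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LocationNode:
    """Hypothetical tree node: one region at one level of the hierarchy."""
    name: str
    access_count: int = 0
    children: Dict[str, "LocationNode"] = field(default_factory=dict)

    def record_access(self, path: List[str]) -> None:
        """Count one access along a coarse-to-fine path, creating nodes as needed."""
        self.access_count += 1
        if path:
            child = self.children.setdefault(path[0], LocationNode(path[0]))
            child.record_access(path[1:])

    def frequencies_at_depth(
        self, depth: int, prefix: Tuple[str, ...] = ()
    ) -> Dict[Tuple[str, ...], int]:
        """Return the access frequency of every node exactly `depth` levels below this one."""
        if depth == 0:
            return {prefix: self.access_count}
        result: Dict[Tuple[str, ...], int] = {}
        for child in self.children.values():
            result.update(child.frequencies_at_depth(depth - 1, prefix + (child.name,)))
        return result


# Illustrative usage with made-up level names (city -> district -> cell).
root = LocationNode("root")
root.record_access(["city_a", "district_1", "cell_3"])
root.record_access(["city_a", "district_1", "cell_4"])
root.record_access(["city_b", "district_2", "cell_1"])
city_level = root.frequencies_at_depth(1)
```

Keeping a counter on every level means that coarse regions aggregate the accesses of their descendants, so frequencies can be read at whichever granularity a later selection step needs.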
Section II presents necessary background and related work. Section III introduces the preliminary knowledge. Section IV details the proposed location privacy protection method. The real-world datasets and experimental results are presented in Section V and conclusions in Section VI.
II. RELATED WORK
In the process of data mining and data publishing, the privacy protection of location data mainly involves two aspects: the privacy model and utility.
A. Privacy Model
In recent years, with the widespread application of location services and of data mining and publishing, location privacy protection models in location services can be divided into two categories: one is the traditional anonymous model based on grouping; the other is the differential privacy model, which is independent of the attacker’s background knowledge.
(1) Anonymous model based on grouping
The traditional anonymous model based on grouping plays an important role in location privacy protection. Samarati et al. [12] first proposed the k-anonymity method, and a large number of methods based on k-anonymity followed [13], [14], [15]. Previous research [15], [16] shows that anonymization alone does not provide good protection for a wide range of data. An encryption-based privacy protection method is proposed in [17]; it can completely protect the privacy of data and prevent data leakage during the location service, but the availability of the data is insufficient. The development of traditional location privacy protection technology has gone through three stages: from the “informed and consent” method proposed in [18], to anonymization of a single location, and then to anonymization of users’ trajectory data. Heuristic privacy measurement, probability deduction, and private information retrieval based technologies are common methods to protect location privacy. Heuristic privacy measurement methods, such as k-anonymity, t-closeness [19], m-invariance [20], and l-diversity, mainly protect users who are not in a strict privacy protection environment. However, these three kinds of methods all assume a unified attack model and protect location data only under the premise that the attacker has accumulated a certain amount of background knowledge. As attackers grasp more background knowledge, these methods can no longer effectively protect the user’s location data privacy; a relational privacy protection method addressing this lack of protection is proposed in [21].
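To make the grouping idea behind k-anonymity concrete, here is a minimal Python sketch that checks whether a set of already-generalized location records is k-anonymous, that is, whether every combination of quasi-identifier values is shared by at least k records. The function name `is_k_anonymous` and the (grid cell, hour) record layout are assumptions chosen for illustration and do not come from [12].

```python
from collections import Counter
from typing import Iterable, Tuple


def is_k_anonymous(records: Iterable[Tuple[str, ...]], k: int) -> bool:
    """Return True if every quasi-identifier tuple occurs at least k times."""
    group_sizes = Counter(records)
    return all(size >= k for size in group_sizes.values())


# Illustrative records: (generalized grid cell, generalized hour of day).
records = [
    ("cell_12", "09h"), ("cell_12", "09h"), ("cell_12", "09h"),
    ("cell_07", "09h"), ("cell_07", "09h"),
]
satisfies_2 = is_k_anonymous(records, 2)
satisfies_3 = is_k_anonymous(records, 3)
```

The check only counts group sizes. It says nothing about what an attacker with extra background knowledge can infer inside a group, which is the weakness the paragraph above attributes to this family of methods.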
(2) Differential privacy model
The differential privacy model is a strong privacy concept that is completely independent of the attacker’s background knowledge and computing ability, and it has become a research hotspot in recent years. Compared with traditional privacy protection models, differential privacy has unique advantages. First, the model assumes that the attacker has the maximum background knowledge. Second, differential privacy has a solid mathematical foundation, a strict definition of privacy protection, and a reliable quantitative evaluation method. In recent years, the differential privacy model has been widely applied in privacy protection. In particular, the FM (functional mechanism) is introduced in [22], which protects privacy by perturbing the objective function of an optimization problem so as to satisfy $\varepsilon$-differential privacy. The Diff-FPM (Privacy-Preserving Mining) algorithm proposed in [23], [24], [25] is combined with MCMC (Markov chain Monte Carlo) and provides privacy protection while maintaining high data availability under $(\varepsilon, \delta)$-differential privacy. The PrivBasis and SmartTrunc methods proposed in [26] adopt the differential privacy model in the mining of frequent itemsets, ensuring the privacy and utility of data analysis and anonymity. The DiffP-C4.5 and DiffGen algorithms introduced in [27] combine differential privacy with decision trees and other data structures to maintain a balance between data privacy and availability.
In short, differential privacy is an effective privacy protection mechanism that protects the user’s location privacy while keeping enough useful information for data analysis.
B. Utility Maximization
In location services, data mining and data publishing need to protect location privacy while providing enough information for data analysis. Therefore, data utility is the core issue that requires attention. Reference [28] uses compressive sensing theory to propose a perception mechanism for publishing statistical results, which effectively addresses insufficient data utility but destroys the links among data. The DP-topk method proposed in [29] has a more rigorous definition of effectiveness. However, it ignores the relations among transaction data, and because it processes data items individually and inefficiently, the availability of the algorithm is not high. To maximize the utility of the results and meet the requirement of location privacy protection, this paper proposes the LPT-DP-k algorithm, a more rigorous differential privacy protection method that not only guarantees high data availability but also enhances the usability of the algorithm.
III. PRELIMINARY KNOWLEDGE
Differential privacy is a common privacy protection framework that is supported by solid mathematical theory and can provide privacy protection for data even when the attacker has the maximum background knowledge.
Definition 1 (Differential privacy)
Suppose there is a randomized algorithm $M$, and $P_M$ is the set of all possible outputs of $M$. Let $D$ and $D'$ be any pair of adjacent data sets (there is at most one different record between them), $|D \,\Delta\, D'| \le 1$, and let $S_M$ be any subset of $P_M$. If the algorithm $M$ satisfies the following inequality, then $M$ satisfies $\varepsilon$-differential privacy protection.
$$P_r[M(D) \in S_M] \le \exp(\varepsilon) \cdot P_r[M(D') \in S_M] \tag{1}$$
in which $P_r[\cdot]$ is the probability taken over the randomness of the algorithm $M$ on the data sets $D$ and $D'$.
Definition 2 (Sensitivity)
The differential privacy protection method defines two kinds of sensitivity, called global sensitivity and local sensitivity. Suppose there is a query function $f: D \to R^d$ whose input is a data set and whose output is a $d$-dimensional real vector. For any adjacent data sets $D$ and $D'$,
$$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_1 \tag{2}$$
is called the global sensitivity of the function $f$, and $\Delta f$ represents the maximum change in the output results. $\| f(D) - f(D') \|_1$ is the 1-norm distance between $f(D)$ and $f(D')$.
Definition 3 (Implementation mechanism)
The Laplace mechanism and the exponential mechanism [30] are two of the most basic implementation mechanisms of differential privacy protection. In this paper, the Laplace mechanism is used to add noise that obeys the Laplace distribution to realize differential privacy. For a privacy protection algorithm $f$ based on the Laplace mechanism, the noise follows the Laplace distribution with scale parameter $\lambda = \Delta f / \varepsilon$ and mean 0. Its probability density function is:
$$P_r[x, \lambda] = \frac{1}{2\lambda} \exp\left(-\frac{|x|}{\lambda}\right) \tag{3}$$
in which $x$ represents the specific variable and $\lambda = \Delta f / \varepsilon$.
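The Laplace mechanism can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the query is a simple count (so its global sensitivity is taken to be 1), and the helper names `laplace_pdf`, `sample_laplace`, and `noisy_count` are hypothetical rather than taken from the paper. The noise is drawn as the difference of two exponential variates, which follows a Laplace distribution with mean 0 and scale $\lambda$.

```python
import math
import random


def laplace_pdf(x: float, scale: float) -> float:
    """Density of Eq. (3): exp(-|x| / scale) / (2 * scale)."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    sensitivity = 1.0  # assumption: one record changes a count by at most 1
    scale = sensitivity / epsilon
    return true_count + sample_laplace(scale, rng)


# Illustrative usage: perturb an access frequency of 10 with privacy budget 0.5.
rng = random.Random(0)
released = noisy_count(10, epsilon=0.5, rng=rng)
```

For two adjacent data sets whose true counts are 10 and 11, the ratio of the output densities at any point is bounded by $\exp(\varepsilon)$, which is exactly inequality (1). A smaller $\varepsilon$ gives a larger scale and therefore noisier, more private answers.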
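For the exponential mechanism, if the algorithm $A$ selects and outputs an item $r$ from the output domain $O$ with probability proportional to $\exp\left(\frac{\varepsilon\, q(D, r)}{2 \Delta q}\right)$, where $q(D, r)$ is the utility function of $r$ on the data set $D$ and $\Delta q$ is its sensitivity, then the algorithm provides $\varepsilon$-differential privacy protection. The concrete formula is as follows:
$$A(D, q) = \left\{ r : P_r[r \in O] \propto \exp\left(\frac{\varepsilon\, q(D, r)}{2 \Delta q}\right) \right\}$$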
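The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the paper's selection procedure: the function names `exponential_mechanism_probs` and `exponential_mechanism` are hypothetical, the utility of each candidate is assumed to be a node access frequency, and the sensitivity of that utility is assumed to be 1.

```python
import math
import random
from typing import Dict


def exponential_mechanism_probs(
    utilities: Dict[str, float], epsilon: float, sensitivity: float
) -> Dict[str, float]:
    """Selection probabilities proportional to exp(epsilon * q / (2 * sensitivity))."""
    top = max(utilities.values())  # shift by the maximum for numerical stability
    weights = {
        r: math.exp(epsilon * (q - top) / (2.0 * sensitivity))
        for r, q in utilities.items()
    }
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def exponential_mechanism(
    utilities: Dict[str, float], epsilon: float, sensitivity: float, rng: random.Random
) -> str:
    """Sample one candidate according to the exponential-mechanism probabilities."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    items = list(probs)
    return rng.choices(items, weights=[probs[r] for r in items], k=1)[0]


# Illustrative usage: candidates scored by made-up access frequencies.
frequencies = {"node_1": 30.0, "node_2": 10.0, "node_3": 5.0}
probs = exponential_mechanism_probs(frequencies, epsilon=0.1, sensitivity=1.0)
chosen = exponential_mechanism(frequencies, 0.1, 1.0, random.Random(1))
```

Subtracting the maximum utility before exponentiating leaves the normalized probabilities unchanged but avoids overflow for large scores. Candidates with higher utility are exponentially more likely to be chosen, yet every candidate keeps a nonzero probability.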