CN116709392B

CN116709392B - Large-scale wireless sensor network data fusion method

Info

Publication number: CN116709392B
Application number: CN202310991312.2A
Authority: CN
Inventors: 曾建祥; 欧阳路; 何海鱼; 邓群; 邓林海; 王军; 吴稳; 薛学科; 胡雅琴; 刘孟夫
Original assignee: Hunan Tianlian City Data Control Co ltd
Current assignee: Hunan Tianlian City Data Control Co ltd
Priority date: 2023-08-08
Filing date: 2023-08-08
Publication date: 2023-11-14
Anticipated expiration: 2043-08-08
Also published as: CN116709392A

Abstract

The application discloses a large-scale wireless sensor network data fusion method comprising the following steps: designing a wireless sensor network comprising various heterogeneous sensors and optimizing the deployment positions of the sensor nodes; transmitting the raw data acquired by each sensor to a data center through a wireless network for storage; performing data cleaning and conversion at the data center, standardizing all raw data into a unified data format; and performing fusion operations of association analysis and pattern mining on the stored data at the data center. In practical application the method yields accurate and reliable results at both the single-sensor level and the overall network level.

Description

Large-scale wireless sensor network data fusion method

Technical Field

The application relates to the technical fields of sensors and the Internet of things, in particular to a large-scale wireless sensor network data fusion method.

Background

Wireless sensor networks (WSNs) are widely used in environmental monitoring, intelligent transportation, smart homes, health monitoring and similar fields. A wireless sensor network is composed of a group of small, low-cost sensor nodes that sense environmental information, such as temperature, humidity and illumination intensity, and transmit the data to a data center via wireless communication. However, because both the sensors and the data they produce are heterogeneous, performing data fusion efficiently in a large-scale wireless sensor network is an important and challenging problem.

The energy of wireless sensors is typically limited, so the design of a wireless sensor network must consider how to optimize the location and operating time of each sensor to minimize energy consumption and maximize coverage. In addition, since the communication distance of the sensors is limited, communication between sensors must also be considered. Because wireless sensor networks are typically composed of multiple types of sensors, the collected data is heterogeneous in both structure and nature. For example, the data collected by a temperature sensor and the data collected by a humidity sensor may differ in data type, unit and range. Data cleaning and conversion are therefore important steps in wireless sensor network data fusion. The data collected by a wireless sensor network typically exhibits spatiotemporal correlation: data collected at different times at the same location, or at the same time at different locations, may be correlated. Techniques such as time-series knowledge graphs can be used to analyze this correlation and mine valuable patterns. Data privacy and security are also important concerns in large-scale wireless sensor network data fusion, and performing effective data fusion while ensuring data privacy remains a challenging problem.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. To this end, the application discloses a large-scale wireless sensor network data fusion method. It provides a data fusion method for wireless energy-carrying sensor networks that can effectively process data that is heterogeneous in both structure and nature, and offers a new option for secure, privacy-preserving data processing in large-scale wireless sensor networks.

The object of the application is achieved by the following technical scheme, a large-scale wireless sensor network data fusion method comprising the following steps:

step 1, designing a wireless sensor network comprising various heterogeneous sensors, and optimizing the deployment positions of various sensor nodes;

step 2, transmitting the original data acquired by each sensor to a data center through a wireless network for storage;

step 3, implementing a data cleaning and conversion process on the data center, and standardizing all original data into a unified data format;

step 4, performing fusion operations of association analysis and pattern mining on the stored data at the data center.

The optimizing the deployment position of each sensor node comprises the following steps:

let the sensor set be $S = \{s_1, s_2, \ldots, s_n\}$. Each sensor $s_i$ has a predetermined total energy $E_i$, an energy consumption per unit time $e_i$, a total working time $T_i$ and a working time $t_i$. Define $d_{ij}$ as the distance between sensors $s_i$ and $s_j$, $S_d$ as the set of areas that can be covered, and $D$ as the communication range. If sensor $s_i$ can cover area $j$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$;

Establishing a deployment position optimization model, with the objective function set as follows: on the premise of meeting the data acquisition requirement, use the fewest sensors, i.e. minimize the number of sensors, expressed as $\min \sum_{i=1}^{n} x_i$. The constraints are set as follows:

all functional areas must be covered; the energy used by each sensor must not exceed its preset total energy; the working time of each sensor must not exceed its preset total working time; the communication distance between each pair of sensors must not exceed the maximum communication distance $C$; and the data of sensor $s_i$ can be collected only if $s_i$ is selected and at least one other sensor $s_j$ is in direct communication with $s_i$;

wherein $x_i$ is a binary sensor variable: $x_i = 1$ when sensor $s_i$ is selected, otherwise $x_i = 0$; and $y_{ij}$ is a binary variable: $y_{ij} = 1$ when a direct communication link exists between sensors $s_i$ and $s_j$, otherwise $y_{ij} = 0$.
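The constraint formula itself is not legible in this text, so the block below is only one plausible formalization of the verbal constraints, written as a 0-1 program. The exact inequalities, the index ranges, and the linking constraint between $x_i$ and $y_{ij}$ are assumptions made for illustration and may differ from the formula in the original filing.

```latex
\begin{aligned}
\min\quad        & \sum_{i=1}^{n} x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} a_{ij}\, x_i \ge 1 && \text{for every area } j
                   && \text{(area coverage)} \\
                 & e_i\, t_i \le E_i                 && \text{for every sensor } i
                   && \text{(energy)} \\
                 & t_i \le T_i                       && \text{for every sensor } i
                   && \text{(working time)} \\
                 & d_{ij}\, y_{ij} \le C             && \text{for every pair } i \ne j
                   && \text{(communication distance)} \\
                 & x_i \le \sum_{j \ne i} y_{ij}     && \text{for every sensor } i
                   && \text{(selected sensors need a link)} \\
                 & x_i,\ y_{ij} \in \{0, 1\}
\end{aligned}
```

Under this reading, a sensor can only count as selected if it has at least one active link, and a link can only be active if the two sensors are within distance $C$ of each other.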

The process for solving the deployment position optimization model comprises the following steps:

step 101, initializing: setting the upper bound UB to $+\infty$; initializing the lower bound LB to the optimal solution of the unconstrained problem;

step 102, creating a search tree: initializing the root node of the search tree, with the whole problem as the sub-problem of the root node; creating a set of variables, including the sensor variables $x_i$ and the communication variables $y_{ij}$; creating a set of constraints, including the area coverage constraint, the energy constraint, the working time constraint, the communication limit constraint, and the sensor selection and intermediary sensor constraint; and defining the objective function, namely minimizing the number of sensors;

step 103, selecting the branch variable: in the current sub-problem, selecting a variable that has not yet been branched on, heuristically choosing the sensor variable $x_i$ with the greatest number of active constraints; for the current sub-problem, calculating for each sensor variable $x_i$ its number of active constraints, i.e. the number of constraints in the current sub-problem that involve that variable; selecting the sensor variable $x_i$ with the maximum number of active constraints as the branch variable, which means branching on the sensor variable with the greatest influence or relevance and helps the search converge more quickly to the optimal solution; if several sensor variables share the same maximum number of active constraints, branching on the one whose estimated computation time is shortest, where the estimated computation time of each sensor variable $x_i$ is obtained by accumulating the computation times of the operations related to that variable;

step 104, for the selected branch variable $x_i$, creating two sub-problems: sub-problem one, setting $x_i$ to 1 and updating the problem according to the constraints; sub-problem two, setting $x_i$ to 0 and updating the problem according to the constraints;

step 105, for each sub-problem, solving the mixed integer programming problem to obtain a lower bound LB and a feasible solution, and discarding the sub-problem if the lower bound LB is greater than the current upper bound UB;

step 106, if the optimal solution of a certain sub-problem is smaller than the current upper bound UB, updating the upper bound UB to the optimal solution;

step 107, for each sub-problem, discarding the sub-problem if the optimal solution is greater than the current upper bound UB;

step 108, terminating the search if all the sub-problems in the search tree are discarded or if all the sub-problems in the search tree have been solved and an optimal solution is found;

step 109, if the termination condition is not satisfied, returning to step 103, and selecting the next variable which is not branched to perform branching operation;

step 110, returning the optimal sensor selection scheme found in the search tree, which meets the data acquisition requirements and communication limitations while minimizing the number of sensors.
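Steps 101 to 110 can be illustrated with a minimal branch-and-bound sketch. It is a simplification under stated assumptions, not the filing's solver: only the area coverage constraint is modelled, the number of active constraints of a sensor is approximated by the number of still-uncovered areas it can cover, and the lower bound is a simple counting bound rather than the solution of a relaxed mixed integer program. The names `min_sensors` and `cover` are hypothetical.

```python
import math


def min_sensors(cover, n_areas):
    """Smallest set of sensors covering areas 0..n_areas-1.

    cover[i] is the set of area indices that sensor i can cover (a_ij = 1).
    Returns (number_of_sensors, sorted_list_of_selected_sensor_indices).
    """
    best = {"ub": math.inf, "chosen": None}  # step 101: UB starts at +infinity

    def branch(chosen, undecided, uncovered):
        if not uncovered:  # feasible: every area is covered
            if len(chosen) < best["ub"]:  # step 106: tighten the upper bound
                best["ub"], best["chosen"] = len(chosen), sorted(chosen)
            return
        reachable = set()
        for i in undecided:
            reachable |= cover[i]
        if not uncovered <= reachable:  # infeasible sub-problem, discard
            return
        # Assumed counting bound: each extra sensor covers at most `gain` areas.
        gain = max(len(cover[i] & uncovered) for i in undecided)
        lb = len(chosen) + math.ceil(len(uncovered) / gain)
        if lb >= best["ub"]:  # steps 105/107: prune (ties cannot improve UB)
            return
        # Step 103: branch on the sensor touching the most open constraints.
        i = max(undecided, key=lambda s: (len(cover[s] & uncovered), -s))
        rest = undecided - {i}
        branch(chosen | {i}, rest, uncovered - cover[i])  # step 104: x_i = 1
        branch(chosen, rest, uncovered)                   # step 104: x_i = 0

    branch(frozenset(), frozenset(range(len(cover))), frozenset(range(n_areas)))
    return best["ub"], best["chosen"]


# Five candidate sensors, four areas; sensors 0 and 2 together cover everything.
result = min_sensors([{0, 1}, {1, 2}, {2, 3}, {0, 3}, {1}], 4)
print(result)
```

The recursion plays the role of the search tree: each call is one sub-problem, and returning early corresponds to discarding it. Energy, working-time and communication constraints would enter as additional feasibility checks before branching.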

Transmitting the raw data through the wireless network comprises the following steps:

each sensor $s_i$ acquires raw data $d_i$;

noise $n_i$ satisfying a Laplace distribution or a Gaussian distribution is added to the data $d_i$ of each sensor $s_i$ to obtain the noisy data $d'_i = d_i + n_i$; for Laplace-distributed noise, the probability density function is:

$P(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$;

wherein $\mu$ is the location parameter and $b$ is the scale parameter; to satisfy $\varepsilon$-differential privacy, the scale parameter is $b = \Delta f / \varepsilon$, wherein $\Delta f$ is the sensitivity of the function $f$ on adjacent databases and $\varepsilon$ is the privacy budget;

the noisy data $d'_i$ is homomorphically encrypted to obtain the encrypted data $c_i = \mathrm{Enc}(d'_i)$;

each sensor transmits its encrypted data $c_i$ to the data center.
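The noise-adding step can be sketched as follows, assuming zero-mean Laplace noise (location parameter 0) with scale b = Δf/ε. The function names `laplace_noise` and `privatize` are hypothetical, and the sampler relies on the fact that the difference of two independent unit-rate exponential variables is Laplace distributed; the filing does not say how the noise is generated.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables follows Laplace(0, 1),
    so scaling that difference by `scale` gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize(value, sensitivity, epsilon, rng=random):
    """Return value + n with n ~ Laplace(0, b), where b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    return value + laplace_noise(b, rng)


# A temperature reading of 21.5 with assumed sensitivity 1.0 and budget 0.5.
noisy = privatize(21.5, sensitivity=1.0, epsilon=0.5)
print(noisy)
```

A smaller privacy budget ε gives a larger scale b and therefore noisier readings, which is the usual privacy versus accuracy trade-off.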

After the data center has collected all the encrypted data, it performs various forms of data fusion on the encrypted data. Let $F$ be the fusion function and $C$ the set of all encrypted data; the fused encrypted data is $c' = F(C)$;

when the data center needs to analyze the data in depth, it uses the key to decrypt the fused encrypted data. Let the decryption function be $\mathrm{Dec}$; the decryption result is $x' = \mathrm{Dec}(c')$. Owing to the properties of homomorphic encryption and differential privacy, the decryption result corresponds to the fused result of the noisy data.

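Specifically, the encryption adopts the Paillier encryption algorithm. For a plaintext $x$ and the corresponding ciphertext $c$, the encryption and decryption functions are defined as follows:

encryption function: $\mathrm{Enc}(x) = g^{x} \cdot r^{n} \bmod n^{2}$, wherein $g$ and $n$ form the public key and $r$ is a random number;

decryption function: $\mathrm{Dec}(c) = L(c^{\lambda} \bmod n^{2}) / L(g^{\lambda} \bmod n^{2})$, wherein $\lambda$ is the private key and $L(u) = (u - 1)/n$.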
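The encryption, fusion and decryption functions can be tried out with a toy Paillier sketch. Several things here are assumptions for illustration only: the primes are tiny and insecure, $g = n + 1$ is chosen as the generator, the division in the decryption formula is carried out as a modular inverse modulo $n$ (as in the standard Paillier scheme), the fusion function $F$ is taken to be a sum, and measurements are assumed to be non-negative integers (for example fixed-point readings). All function names are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy key generation from two small primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
    g = n + 1                                          # assumed generator
    return (n, g), lam


def L(u, n):
    """L(u) = (u - 1) / n, exact because u is congruent to 1 modulo n."""
    return (u - 1) // n


def enc(pub, m, rng=random):
    """Enc(m) = g^m * r^n mod n^2 with random r coprime to n."""
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def dec(pub, lam, c):
    """Dec(c) = L(c^lam mod n^2) * L(g^lam mod n^2)^(-1) mod n."""
    n, g = pub
    n2 = n * n
    mu = pow(L(pow(g, lam, n2), n), -1, n)  # modular inverse, Python 3.8+
    return (L(pow(c, lam, n2), n) * mu) % n


def fuse_sum(pub, ciphertexts):
    """Fusion on ciphertexts: their product decrypts to the plaintext sum."""
    n, _ = pub
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc


pub, lam = keygen(61, 53)
readings = [12, 30, 7]                      # noisy integer readings d'_i
fused = fuse_sum(pub, [enc(pub, m) for m in readings])
print(dec(pub, lam, fused))                 # decrypts to the sum of readings
```

The data center only ever multiplies ciphertexts; the individual readings are never decrypted, and the result is correct as long as the true sum stays below $n$.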

Specifically, the data cleaning and conversion process comprises the following steps:

collecting raw data from the different types of sensors;

after data collection is complete, cleaning the data, including deleting redundant data and identifying and handling missing values and outliers;

aligning the data, since heterogeneous sensors may have different data acquisition frequencies and timestamps;

converting the data to achieve consistency, since data formats and units may differ among the sensors;

and finally, merging all the processed sensor data into a data set with a unified data format.

The unified data format includes the following elements:

timestamp: representing the time of data acquisition;

sensor identification: represents the source of the data, i.e. which sensor acquired it;

location: representing the geographic location of the sensor;

measurement value: representing actual data collected by the sensor;

units: units representing measured values;

data quality: represents the quality of the data, for example signal strength or another indicator.
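The cleaning, conversion and unified format described above can be sketched as a small record type plus a cleaning function. The field names, the raw dictionary keys (`sensor_id`, `epoch_s`, `location`, `value`, `unit`, `quality`), the unit labels and the Fahrenheit-to-Celsius rule are all assumptions chosen for illustration; the filing lists only the six elements, not a concrete schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """One record in the unified data format (six elements)."""
    timestamp: datetime   # time of data acquisition (UTC)
    sensor_id: str        # which sensor acquired the data
    location: tuple       # geographic location, e.g. (lat, lon)
    value: float          # measured value
    unit: str             # unit of the measured value
    quality: float        # data quality indicator, e.g. signal strength


def to_unified(raw):
    """Convert one raw dict (hypothetical keys) into a unified Reading."""
    value, unit = float(raw["value"]), raw["unit"]
    if unit == "degF":  # example unit conversion to a common unit
        value, unit = (value - 32.0) * 5.0 / 9.0, "degC"
    ts = datetime.fromtimestamp(raw["epoch_s"], tz=timezone.utc)
    return Reading(ts, raw["sensor_id"], tuple(raw["location"]),
                   value, unit, raw.get("quality", 1.0))


def clean(raws):
    """Drop missing values and duplicates, convert, and order by timestamp."""
    seen, out = set(), []
    for raw in raws:
        if raw.get("value") is None:            # missing value
            continue
        key = (raw["sensor_id"], raw["epoch_s"])
        if key in seen:                         # redundant (duplicate) record
            continue
        seen.add(key)
        out.append(to_unified(raw))
    return sorted(out, key=lambda r: r.timestamp)  # simple time alignment
```

Sorting by timestamp is only the simplest form of alignment; resampling sensors with different acquisition frequencies onto a common time grid would be a natural next step.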

The association analysis of the stored data comprises the following steps:

constructing a time-series knowledge graph: a time-series knowledge graph is constructed from the collected sensor data, wherein each sensor is treated as an entity, each measurement as another entity, and the "measurement" relationship between a sensor and a measurement and the "on" relationship between a measurement and its timestamp are treated as edges;

identifying correlations: for any two sensors, their measurements at different times are analyzed to identify their correlation by calculating the correlation coefficient of their measurements, defined as:

$r_{ij} = \frac{\mathrm{cov}(D_i, D_j)}{\sigma_i \sigma_j}$;

wherein $\mathrm{cov}$ denotes covariance, $\sigma$ denotes standard deviation, and $D_i$ and $D_j$ denote the measurement sequences of sensors $s_i$ and $s_j$;

adding correlation edges: for each sensor pair whose correlation coefficient is larger than a given threshold, a correlation edge is added between the two sensors, with the correlation coefficient as the weight of the edge;

correlation analysis: finding the important patterns in the sensor network by analyzing the attribute values and the correlation edges.

Finding the important patterns includes: using a community detection algorithm to find closely related groups of sensors, or using a path analysis method to identify the key factors affecting the measurements of a given sensor.
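The correlation-coefficient and correlation-edge steps can be sketched with the standard library alone. The sketch assumes equal-length, already aligned measurement sequences and represents the graph as a plain dictionary of weighted edges; the names `pearson` and `correlation_edges`, the sample series and the threshold value are hypothetical.

```python
import math


def pearson(xs, ys):
    """r = cov(X, Y) / (sigma_X * sigma_Y) for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    if sx == 0 or sy == 0:   # a constant sequence has no defined correlation
        return 0.0
    return cov / (sx * sy)


def correlation_edges(series, threshold):
    """Return {(sensor_a, sensor_b): r} for every pair with r > threshold."""
    ids = sorted(series)
    edges = {}
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            r = pearson(series[a], series[b])
            if r > threshold:
                edges[(a, b)] = r   # correlation coefficient is the edge weight
    return edges


series = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "soil": [2.0, 4.0, 6.0, 8.0],
    "humidity": [4.0, 3.0, 2.0, 1.0],
}
print(correlation_edges(series, threshold=0.8))
```

Only the positively correlated pair receives an edge here; if strong negative correlations should also be linked, the comparison could use the absolute value of the coefficient instead.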

Pattern mining covers the following patterns:

periodic pattern: much sensor data exhibits periodic variation. Analysis step: for a given sensor $s_i$, the measurement sequence $D_i$ is converted to the frequency domain using the Fourier transform and the dominant frequency components are identified. Formula: $X_k = \sum_{j=0}^{N-1} D_i(j)\, e^{-2\pi \mathrm{i}\, jk/N}$, wherein $N$ is the length of the sequence, $j$ runs from $0$ to $N-1$, $k$ runs from $0$ to $N-1$, and $\mathrm{i}$ is the imaginary unit;

anomaly pattern: the sensor data may contain outliers. Analysis step: for a given sensor $s_i$, outliers in the measurement sequence $D_i$ are detected using statistical or machine learning methods;

cluster pattern: the sensor network contains groups of closely related sensors whose measurement data follow similar patterns of change. Analysis step: the measurement sequences of all sensors are clustered using a clustering algorithm, and the cluster groups are then identified.
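The periodic and anomaly analyses can be sketched in a few lines. The sketch makes several assumptions: the discrete Fourier transform is evaluated directly (quadratic time, fine for short sequences), the mean is removed before the transform so the constant component does not dominate, and outliers are flagged with a simple z-score rule, which is only one of the statistical methods the text allows. The function names and the threshold of 3 are hypothetical; clustering is not covered here.

```python
import cmath
import math
import statistics


def dominant_frequency(seq):
    """Index k (cycles per sequence length) of the largest DFT magnitude."""
    n = len(seq)
    mean = sum(seq) / n
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):      # skip k = 0 and mirrored frequencies
        xk = sum((seq[j] - mean) * cmath.exp(-2j * cmath.pi * j * k / n)
                 for j in range(n))
        if abs(xk) > best_mag:
            best_k, best_mag = k, abs(xk)
    return best_k


def zscore_outliers(seq, z=3.0):
    """Indices whose value lies more than z standard deviations from the mean."""
    mu = statistics.mean(seq)
    sd = statistics.pstdev(seq)
    if sd == 0:
        return []
    return [j for j, v in enumerate(seq) if abs(v - mu) / sd > z]


# A signal with three full cycles over 48 samples, and a series with one spike.
wave = [math.sin(2 * math.pi * 3 * j / 48) for j in range(48)]
spiky = [10.0] * 20 + [100.0]
print(dominant_frequency(wave), zscore_outliers(spiky))
```

For a sequence sampled every Δt seconds, index k corresponds to a period of N·Δt/k seconds, so a daily temperature cycle would show up at k equal to the number of days spanned by the sequence.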

Compared with the prior art, the method has the following advantages. The technical scheme provides a large-scale wireless sensor network data fusion method in which sensor resources are effectively managed and allocated by means of an optimization model. The model is optimized according to the energy, communication capacity and working time of the sensors, so that sensor utilization is maximized and the total number of sensors is kept as small as possible. To address the heterogeneity of the sensor network and of the sensed data, the method cleans and converts the data and can process data of various types and formats so as to unify and normalize it. The time-series knowledge graph technique gives a better understanding of the correlations between data, allowing the data to be understood from different angles and at different levels and deeper patterns and trends to be mined. Data security is enhanced: the sensor data is protected during both transmission and fusion. In practical application the method yields accurate and reliable results at both the single-sensor level and the overall network level.

Drawings

Fig. 1 shows a schematic flow chart of an embodiment of the application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

In this example, we assume that we are running a large farm and use a wireless sensor network to collect data to optimize planting results. In farms we may use various types of sensors including temperature sensors, humidity sensors, illumination sensors, soil pH sensors etc.

All the sensors start working according to the set time and position, and various data are collected. Because of the heterogeneity of the sensors, after data acquisition is completed, the raw data needs to be cleaned and converted, and all the data is standardized into a uniform data format.

After data standardization, a time-series knowledge graph is used to perform association analysis on the various data. For example, by analyzing the joint distribution of temperature, humidity, light and pH, certain patterns can be found, such as light having a greater effect on plant growth under specific humidity and temperature conditions. We can then use these patterns to optimize the management decisions of the farm.

To protect the privacy of the data, we will encrypt the data using homomorphic encryption and add noise to the data using differential privacy methods before the sensor uploads the data. This allows us to perform data fusion and analysis on the data while ensuring data privacy.

Thus, a method for fusing large-scale wireless sensor network data, the method comprising:

let the sensor set be $S = \{s_1, s_2, \ldots, s_n\}$. Each sensor $s_i$ has a predetermined total energy $E_i$, an energy consumption per unit time $e_i$, a total working time $T_i$ and a working time $t_i$. Define $d_{ij}$ as the distance between sensors $s_i$ and $s_j$, $S_d$ as the set of areas that can be covered, and $D$ as the communication range. If sensor $s_i$ can cover area $j$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$;

Establishing a deployment position optimization model, with the objective function set as follows: on the premise of meeting the data acquisition requirement, use the fewest sensors, i.e. minimize the number of sensors, expressed as $\min \sum_{i=1}^{n} x_i$. The constraints are set as follows:

all functional areas must be covered; the energy used by each sensor must not exceed its preset total energy; the working time of each sensor must not exceed its preset total working time; the communication distance between each pair of sensors must not exceed the maximum communication distance $C$; and the data of sensor $s_i$ can be collected only if $s_i$ is selected and at least one other sensor $s_j$ is in direct communication with $s_i$;

step 102, creating a search tree: initializing the root node of the search tree, with the whole problem as the sub-problem of the root node; creating a set of variables, including the sensor variables $x_i$ and the communication variables $y_{ij}$; creating a set of constraints, including the area coverage constraint, the energy constraint, the working time constraint, the communication limit constraint, and the sensor selection and intermediary sensor constraint; and defining the objective function, namely minimizing the number of sensors;

each sensor $s_i$ acquires raw data $d_i$;

noise $n_i$ satisfying a Laplace distribution or a Gaussian distribution is added to the data $d_i$ of each sensor $s_i$ to obtain the noisy data $d'_i = d_i + n_i$; for Laplace-distributed noise, the probability density function is:

$P(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$;

the noisy data $d'_i$ is homomorphically encrypted to obtain the encrypted data $c_i = \mathrm{Enc}(d'_i)$, and each sensor transmits its encrypted data $c_i$ to the data center.

After the data center has collected all the encrypted data, it performs various forms of data fusion on the encrypted data. Let $F$ be the fusion function and $C$ the set of all encrypted data; the fused encrypted data is $c' = F(C)$. When the data center needs to analyze the data in depth, it uses the key to decrypt the fused encrypted data. Let the decryption function be $\mathrm{Dec}$; the decryption result is $x' = \mathrm{Dec}(c')$. Owing to the properties of homomorphic encryption and differential privacy, the decryption result corresponds to the fused result of the noisy data.

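encryption function: $\mathrm{Enc}(x) = g^{x} \cdot r^{n} \bmod n^{2}$, wherein $g$ and $n$ form the public key and $r$ is a random number;

decryption function: $\mathrm{Dec}(c) = L(c^{\lambda} \bmod n^{2}) / L(g^{\lambda} \bmod n^{2})$, wherein $\lambda$ is the private key and $L(u) = (u - 1)/n$.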

collecting raw data from different types of sensors;

The unified data format includes the following elements:

timestamp: representing the time of data acquisition;

location: representing the geographic location of the sensor;

measurement value: representing actual data collected by the sensor;

units: units representing measured values;

The association analysis of the stored data comprises the following steps:

$r_{ij} = \frac{\mathrm{cov}(D_i, D_j)}{\sigma_i \sigma_j}$;

wherein $\mathrm{cov}$ denotes covariance, $\sigma$ denotes standard deviation, and $D_i$ and $D_j$ denote the measurement sequences of sensors $s_i$ and $s_j$; adding correlation edges: for each sensor pair whose correlation coefficient is larger than a given threshold, a correlation edge is added between the two sensors, with the correlation coefficient as the weight of the edge;

Pattern mining covers the following patterns:

periodic pattern: much sensor data exhibits periodic variation. Analysis step: for a given sensor $s_i$, the measurement sequence $D_i$ is converted to the frequency domain using the Fourier transform and the dominant frequency components are identified. Formula: $X_k = \sum_{j=0}^{N-1} D_i(j)\, e^{-2\pi \mathrm{i}\, jk/N}$, wherein $N$ is the length of the sequence, $j$ runs from $0$ to $N-1$, $k$ runs from $0$ to $N-1$, and $\mathrm{i}$ is the imaginary unit;

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. The data fusion method for the large-scale wireless sensor network is characterized by comprising the following steps of:

step 4, carrying out association analysis and pattern mining fusion operation on the stored data on a data center;

let the sensor set be $S = \{s_1, s_2, \ldots, s_n\}$; each sensor $s_i$ has a predetermined total energy $E_i$, an energy consumption per unit time $e_i$, a total working time $T_i$ and a working time $t_i$; define $d_{ij}$ as the distance between sensors $s_i$ and $s_j$, $S_d$ as the set of areas that can be covered, and $D$ as the communication range; if sensor $s_i$ can cover area $j$, then $a_{ij} = 1$, otherwise $a_{ij} = 0$;

Establishing a deployment position optimization model, with the objective function set as follows: on the premise of meeting the data acquisition requirement, use the fewest sensors, i.e. minimize the number of sensors, expressed as $\min \sum_{i=1}^{n} x_i$; the constraints are set as follows:

all functional areas must be covered; the energy used by each sensor must not exceed its preset total energy; the working time of each sensor must not exceed its preset total working time; the communication distance between each pair of sensors must not exceed the maximum communication distance $C$; the data of sensor $s_i$ can be collected only if $s_i$ is selected and at least one other sensor $s_j$ is in direct communication with $s_i$; $x_i$ is a binary sensor variable, with $x_i = 1$ when sensor $s_i$ is selected and $x_i = 0$ otherwise; $y_{ij}$ is a binary variable, with $y_{ij} = 1$ when a direct communication link exists between sensors $s_i$ and $s_j$ and $y_{ij} = 0$ otherwise;

wherein transmitting the raw data through the wireless network comprises the following steps:

each sensor $s_i$ acquires raw data $d_i$;

noise $n_i$ satisfying a Laplace distribution or a Gaussian distribution is added to the data $d_i$ of each sensor $s_i$ to obtain the noisy data $d'_i = d_i + n_i$; for Laplace-distributed noise, the probability density function is:

$P(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$;

where $\mu$ is the location parameter and $b$ is the scale parameter; to satisfy $\varepsilon$-differential privacy, the scale parameter is $b = \Delta f / \varepsilon$, where $\Delta f$ is the sensitivity of the function $f$ on adjacent databases and $\varepsilon$ is the privacy budget; the noisy data $d'_i$ is homomorphically encrypted to obtain the encrypted data $c_i = \mathrm{Enc}(d'_i)$;

each sensor transmits its encrypted data $c_i$ to the data center;

after the data center has collected all the encrypted data, it performs various forms of data fusion on the encrypted data; let $F$ be the fusion function and $C$ the set of all encrypted data, so that the fused encrypted data is $c' = F(C)$; when the data center needs to analyze the data in depth, it uses the key to decrypt the fused encrypted data; let the decryption function be $\mathrm{Dec}$, so that the decryption result is $x' = \mathrm{Dec}(c')$.

2. The method for fusing large-scale wireless sensor network data according to claim 1, wherein the process of solving the deployment location optimization model comprises the following steps:

step 101, initializing: setting the upper bound UB to $+\infty$; initializing the lower bound LB to the optimal solution of the unconstrained problem;

step 102, creating a search tree: initializing the root node of the search tree, with the whole problem as the sub-problem of the root node; creating a set of variables, including the sensor variables $x_i$ and the communication variables $y_{ij}$; creating a set of constraints, including the area coverage constraint, the energy constraint, the working time constraint, the communication limit constraint, and the sensor selection and intermediary sensor constraint; and defining the objective function, namely minimizing the number of sensors;

step 103, selecting the branch variable: in the current sub-problem, selecting a variable that has not yet been branched on, heuristically choosing the sensor variable $x_i$ with the greatest number of active constraints; for the current sub-problem, calculating for each sensor variable $x_i$ its number of active constraints, i.e. the number of constraints in the current sub-problem that involve that variable; selecting the sensor variable $x_i$ with the maximum number of active constraints as the branch variable, which means branching on the sensor variable with the greatest influence or relevance and helps the search converge more quickly to the optimal solution; if several sensor variables share the same maximum number of active constraints, branching on the one whose estimated computation time is shortest, where the estimated computation time of each sensor variable $x_i$ is obtained by accumulating the computation times of the operations related to that variable;

step 104, for the selected branch variable $x_i$, creating two sub-problems: sub-problem one, setting $x_i$ to 1 and updating the problem according to the constraints; sub-problem two, setting $x_i$ to 0 and updating the problem according to the constraints;

step 107, for each sub-problem, discarding the sub-problem if the optimal solution is greater than the current upper bound UB; step 108, terminating the search if all the sub-problems in the search tree are discarded or if all the sub-problems in the search tree have been solved and an optimal solution is found;

3. The method for fusing large-scale wireless sensor network data according to claim 2, wherein in the encryption process, for a plaintext $x$ and the corresponding ciphertext $c$, the encryption and decryption functions are defined as follows:

encryption function: $\mathrm{Enc}(x) = g^{x} \cdot r^{n} \bmod n^{2}$, wherein $g$ and $n$ form the public key and $r$ is a random number;

decryption function: $\mathrm{Dec}(c) = L(c^{\lambda} \bmod n^{2}) / L(g^{\lambda} \bmod n^{2})$, where $\lambda$ is the private key and $L(u) = (u - 1)/n$.

4. The method for fusing data of a large-scale wireless sensor network according to claim 1, wherein the data cleaning and converting process comprises the steps of:

collecting raw data from different types of sensors;

cleaning the data after data collection is complete, including deleting redundant data and identifying and handling missing values and outliers; aligning the data of the heterogeneous sensors;

converting the sensor data;

5. The method for fusing large-scale wireless sensor network data according to claim 4, wherein the unified data format comprises the following elements:

timestamp: representing the time of data acquisition;

location: representing the geographic location of the sensor;

measurement value: representing actual data collected by the sensor;

units: units representing measured values;

6. The method for fusing large-scale wireless sensor network data according to claim 1, wherein the step of performing correlation analysis on the stored data is as follows:

$r_{ij} = \frac{\mathrm{cov}(D_i, D_j)}{\sigma_i \sigma_j}$;

wherein $\mathrm{cov}$ denotes covariance, $\sigma$ denotes standard deviation, and $D_i$ and $D_j$ denote the measurement sequences of sensors $s_i$ and $s_j$;

7. The method for fusing data of a large-scale wireless sensor network according to claim 6, wherein finding the important patterns comprises: using a community detection algorithm to find closely related groups of sensors, or using a path analysis method to identify the key factors affecting the measurements of a given sensor.

8. The method for fusing data of a large-scale wireless sensor network according to claim 1, wherein the pattern mining comprises:

periodic pattern: much sensor data exhibits periodic variation; analysis step: for a given sensor $s_i$, the measurement sequence $D_i$ is converted to the frequency domain using the Fourier transform and the dominant frequency components are identified; formula: $X_k = \sum_{j=0}^{N-1} D_i(j)\, e^{-2\pi \mathrm{i}\, jk/N}$, where $N$ is the length of the sequence, $j$ runs from $0$ to $N-1$, $k$ runs from $0$ to $N-1$, and $\mathrm{i}$ is the imaginary unit;

anomaly pattern: the sensor data contains outliers; analysis step: for a given sensor $s_i$, outliers in the measurement sequence $D_i$ are detected using statistical or machine learning methods;