CN111274349A - Public security data hierarchical indexing method and device based on information entropy - Google Patents
- Publication number
- CN111274349A (CN202010072369.9A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- data
- keywords
- hierarchical
- root node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a hierarchical indexing method and device for public safety data based on information entropy. The method includes: obtaining a keyword of the public safety data to be queried; and indexing the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords. The device executes the method. By indexing the keywords of public safety data through a hierarchical index structure determined by entropy-based mutual information, the method and device can improve indexing speed for public safety data.
Description
Technical Field
The present invention relates to the technical field of data indexing, and in particular to a hierarchical indexing method and device for public safety data based on information entropy.
Background
Internet social networks and public websites contain a large amount of public safety data. However, the collected data is disorganized, and querying it consumes excessive manpower and material resources. Existing data file systems face bottlenecks in both storing and retrieving large volumes of multi-source heterogeneous data: the data is stored and indexed centrally by a data center, so data processing efficiency is low when large amounts of data are stored. It is therefore important to provide a hierarchical indexing method suited to fast retrieval of public safety data.
Summary of the Invention
In view of the problems in the prior art, embodiments of the present invention provide a hierarchical indexing method and device for public safety data based on information entropy.
An embodiment of the present invention provides a hierarchical indexing method for public safety data based on information entropy, including:
obtaining a keyword of the public safety data to be queried;
indexing the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords.
Indexing the keyword according to the preset index structure includes:
traversing the keywords level by level, starting from the root node of the preset index structure, and indexing the keyword.
The preset index structure is constructed in advance, which specifically includes:
obtaining the word frequency tables represented by each root node keyword, where each word frequency table records the word frequency of every data keyword belonging to the same root node keyword;
determining the information entropy of each data keyword from the number of word frequency tables corresponding to each root node keyword and the word frequencies of the data keywords belonging to the same root node keyword;
determining the mutual information from the information entropy of each data keyword;
taking each root node keyword as a root node of the preset index structure, taking the data keywords corresponding to the mutual information as hierarchical nodes under the root node according to the mutual information, and constructing the preset index structure from all hierarchical nodes.
Determining the information entropy of each data keyword from the number of word frequency tables corresponding to each root node keyword and the word frequencies of the data keywords belonging to the same root node keyword includes:
determining, from the number of word frequency tables corresponding to each root node keyword, the weight shared by all data keywords belonging to the same root node keyword;
determining, from the word frequencies of the data keywords belonging to the same root node keyword, the probability that each data keyword appears in the word frequency table represented by its root node keyword;
determining the information entropy of each data keyword from the weight and the probability.
Determining the information entropy of each data keyword from the weight and the probability includes:
The information entropy of each data keyword is calculated according to the following formula:
$$H(X) = -\lambda \sum_{x \in X} P(x) \log_2 P(x)$$
where H(X) is the information entropy corresponding to keyword x, λ is the weight, P(x) is the probability, x is the keyword, and X is the keyword set containing the word frequency tables.
Determining the mutual information from the information entropy of each data keyword includes:
determining the mutual information according to the following formula:
$$I(X;Y) = H(Y) - H(Y \mid X)$$
where I(X;Y) is the mutual information, H(Y) is the information entropy of keyword y, which is associated with keyword x, and H(Y|X) is calculated according to the following formula:
$$H(Y \mid X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log_2 p(y \mid x)$$
where H(Y|X) is the conditional entropy of Y given X (the expectation, taken over x, of the uncertainty of y), p(x,y) is the probability that keyword x and keyword y appear together in the word frequency table represented by their root node keyword, and p(y|x) is the probability that keyword y appears given that keyword x appears in the word frequency table represented by its root node keyword.
Taking the data keywords corresponding to the mutual information as hierarchical nodes under the root node according to the mutual information, and constructing the preset index structure from all hierarchical nodes, includes:
sorting the mutual information values by magnitude, and taking the keywords corresponding to the first n values as the next-level hierarchical nodes of the root node;
taking the keywords corresponding to the m mutual information values that follow the first n as the next-level hierarchical nodes below those nodes, and repeating this step until the keywords corresponding to all mutual information values have been traversed.
An embodiment of the present invention provides a hierarchical indexing device for public safety data based on information entropy, including:
an obtaining unit, configured to obtain a keyword of the public safety data to be queried;
an indexing unit, configured to index the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords.
An embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where
the processor implements the following method steps when executing the program:
obtaining a keyword of the public safety data to be queried;
indexing the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords.
An embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the following method steps:
obtaining a keyword of the public safety data to be queried;
indexing the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords.
By indexing the keywords of public safety data through a hierarchical index structure determined by entropy-based mutual information, the method and device provided by the embodiments of the present invention can improve indexing speed for public safety data.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an embodiment of the hierarchical indexing method for public safety data based on information entropy according to the present invention;
FIG. 2 is a schematic diagram of the preset index structure according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an embodiment of the hierarchical indexing device for public safety data based on information entropy according to the present invention;
FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of the hierarchical indexing method for public safety data based on information entropy according to the present invention. As shown in FIG. 1, the method provided by this embodiment includes the following steps:
S101: Obtain a keyword of the public safety data to be queried.
Specifically, a keyword of the public safety data to be queried is obtained. The method may be executed by a computer device, for example a server. Public safety data includes, but is not limited to, natural disaster data, such as earthquake data or typhoon data for a given region. In that example the keywords may include "earthquake" and "typhoon"; they may be more specific, such as "natural earthquake" and "artificial earthquake"; and they may be more specific still, such as the natural earthquake types "tectonic earthquake", "volcanic earthquake" and "collapse earthquake".
S102: Index the keyword according to a preset index structure, where the preset index structure is a hierarchical index structure determined from mutual information that is expressed in terms of information entropy and characterizes the degree of association between keywords.
Specifically, the keyword is indexed according to the preset index structure described in S102. FIG. 2 is a schematic diagram of the preset index structure according to an embodiment of the present invention. As shown in FIG. 2, N0 is the root node of the preset index structure and contains k5 and k30. Continuing the example above, k5 may correspond to earthquake and k30 to typhoon; these form the first level. Taking earthquake as an example, natural earthquake and artificial earthquake are more strongly associated with earthquake, so k14 may correspond to natural earthquake and k21 to artificial earthquake, and k5, k14 and k21 form the second level. Tectonic, volcanic and collapse earthquakes are more strongly associated with natural earthquake, so k16 may correspond to tectonic earthquake, k19 to volcanic earthquake and k15 to collapse earthquake, and k16, k19, k15 and so on may form the third level. k5 and k21 are not described further here.
That is, the preset index structure above is a three-level index structure, in which N0 corresponds to the first level, N1 and N2 to the second level, and N3 to N8 to the third level.
Because information entropy provides a quantitative measure of information, the preset index structure in this embodiment captures the amount of information carried by the data. It also captures the degree of association between pieces of information, as characterized by the mutual information.
Indexing the keyword according to the preset index structure may specifically include: traversing the keywords level by level, starting from the root node of the preset index structure, and indexing the keyword. That is, when a user queries a keyword, the algorithm searches the preset index structure starting from the root node: beginning at N0, it proceeds to N1 and checks the keywords in that node. If the required keyword is present, all data under that keyword's child nodes is returned, and the results may be displayed in order of mutual information value. If N1 does not contain the required keyword, the keywords in N3 are searched; if the keyword is present there, all data under its child nodes is returned, and so on.
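The lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `IndexNode` class, its `entries` and `children` fields, and the record names are hypothetical, and the visiting order (a node first, then its children, so N0, N1, N3, ...) is an assumption read from the N0, N1, N3 sequence in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexNode:
    """Hypothetical index node: keywords held by the node, plus child nodes."""
    name: str
    # keyword -> list of (mutual information value, record) stored beneath it
    entries: Dict[str, List[Tuple[float, str]]] = field(default_factory=dict)
    children: List["IndexNode"] = field(default_factory=list)


def search(node: IndexNode, keyword: str) -> Optional[List[str]]:
    """Visit the node, then its children in order; return records on first hit."""
    if keyword in node.entries:
        # Present results in descending order of mutual information value.
        ranked = sorted(node.entries[keyword], key=lambda hit: hit[0], reverse=True)
        return [record for _, record in ranked]
    for child in node.children:
        found = search(child, keyword)
        if found is not None:
            return found
    return None  # keyword is not indexed anywhere below this node


# Tiny illustrative tree (contents are made up for the sketch).
n3 = IndexNode("N3", {"tectonic earthquake": [(0.10, "record-c")]})
n1 = IndexNode("N1", {"natural earthquake": [(0.20, "record-a"), (0.50, "record-b")]}, [n3])
n0 = IndexNode("N0", {"earthquake": [], "typhoon": []}, [n1])

print(search(n0, "natural earthquake"))
```

Because the search stops at the first node that holds the keyword, keywords placed near the root are found after visiting fewer nodes.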
By indexing the keywords of public safety data through a hierarchical index structure determined by entropy-based mutual information, the method provided by this embodiment can improve indexing speed for public safety data.
On the basis of the above embodiment, indexing the keyword according to the preset index structure includes:
Specifically, traversing the keywords level by level, starting from the root node of the preset index structure, and indexing the keyword. See the description above; details are not repeated here.
The method provided by this embodiment can further improve indexing speed for public safety data.
On the basis of the above embodiment, the method further includes constructing the preset index structure in advance, which specifically includes:
Specifically, the word frequency tables represented by each root node keyword are obtained; each word frequency table records the word frequency of every data keyword belonging to the same root node keyword. Continuing the example above, "earthquake" and "typhoon" are two different root node keywords. Taking earthquake as an example, the word frequency table represented by "earthquake" records the word frequencies of "natural earthquake" and "artificial earthquake", and may also include those of "tectonic earthquake", "volcanic earthquake" and "collapse earthquake". One root node keyword may correspond to one or more word frequency tables, with one table per data source: if there are two data sources for earthquakes, there are two earthquake word frequency tables, and if there are three data sources for typhoons, there are three typhoon word frequency tables.
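As a rough illustration of "one word frequency table per data source", the sketch below counts how often each data keyword occurs in the text records of each source. The function name, the source names and the sample records are all hypothetical, and plain substring counting is an assumption; the patent does not say how the frequencies are extracted.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_frequency_tables(
    records_by_source: Dict[str, List[str]],
    data_keywords: Iterable[str],
) -> Dict[str, Dict[str, int]]:
    """Return one word frequency table (keyword -> count) per data source."""
    keywords = list(data_keywords)
    tables = {}
    for source, records in records_by_source.items():
        counts = Counter()
        for text in records:
            for keyword in keywords:
                # Assumption: frequency = number of substring occurrences.
                counts[keyword] += text.count(keyword)
        tables[source] = dict(counts)
    return tables


# Two made-up sources for the root node keyword "earthquake".
earthquake_sources = {
    "source_a": ["natural earthquake reported", "artificial earthquake test, artificial earthquake drill"],
    "source_b": ["natural earthquake"],
}
tables = build_frequency_tables(earthquake_sources, ["natural earthquake", "artificial earthquake"])
print(tables)
```

The number of tables produced for a root node keyword (two here) is what the weighting step uses later.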
Specifically, the information entropy of each data keyword is determined from the number of word frequency tables corresponding to each root node keyword and the word frequencies of the data keywords belonging to the same root node keyword. This may specifically include:
Determining, from the number of word frequency tables corresponding to each root node keyword, the weight shared by all data keywords belonging to the same root node keyword. Continuing the example above, earthquake corresponds to two word frequency tables and typhoon to three, so the file system contains five word frequency tables in total. The weight of all data keywords belonging to the root node keyword "earthquake" is therefore 2/5, and likewise the weight of all data keywords belonging to the root node keyword "typhoon" is 3/5.
Determining, from the word frequencies of the data keywords belonging to the same root node keyword, the probability that each data keyword appears in the word frequency table represented by its root node keyword. Continuing the example above, denote the two earthquake word frequency tables as Table 1 and Table 2. In Table 1, natural earthquake and artificial earthquake occur 40 and 60 times respectively, so the probability of the data keyword "natural earthquake" is 0.4 and that of "artificial earthquake" is 0.6. In Table 2, they occur 30 and 70 times respectively, so the probability of "natural earthquake" is 0.3 and that of "artificial earthquake" is 0.7.
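The two calculations above are simple ratios, sketched below with the numbers from the example. The function names are illustrative only; `Fraction` is used for the weights so that 2/5 and 3/5 stay exact.

```python
from fractions import Fraction
from typing import Dict


def root_weights(table_counts: Dict[str, int]) -> Dict[str, Fraction]:
    """Weight of a root keyword = its number of tables / total number of tables."""
    total = sum(table_counts.values())
    return {root: Fraction(count, total) for root, count in table_counts.items()}


def keyword_probabilities(table: Dict[str, int]) -> Dict[str, float]:
    """Probability of a data keyword = its frequency / total frequency in the table."""
    total = sum(table.values())
    return {keyword: count / total for keyword, count in table.items()}


# Numbers from the example: 2 earthquake tables, 3 typhoon tables.
weights = root_weights({"earthquake": 2, "typhoon": 3})

# Table 1 and Table 2 for the root node keyword "earthquake".
table_1 = {"natural earthquake": 40, "artificial earthquake": 60}
table_2 = {"natural earthquake": 30, "artificial earthquake": 70}
p1 = keyword_probabilities(table_1)
p2 = keyword_probabilities(table_2)

print(weights["earthquake"], weights["typhoon"])  # 2/5 3/5
print(p1, p2)
```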
Determining the information entropy of each data keyword from the weight and the probability may specifically include:
calculating the information entropy of each data keyword according to the following formula:
$$H(X) = -\lambda \sum_{x \in X} P(x) \log_2 P(x)$$
where H(X) is the information entropy corresponding to keyword x, λ is the weight, P(x) is the probability, x is the keyword, and X is the keyword set containing the word frequency tables. Continuing the example above, for "natural earthquake" under "earthquake":
$$H(X) = -0.4 \times (0.4 \log_2 0.4 + 0.3 \log_2 0.3)$$
For "artificial earthquake" under "earthquake":
$$H(Y) = -0.6 \times (0.6 \log_2 0.6 + 0.7 \log_2 0.7)$$
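To make the arithmetic concrete, the sketch below evaluates the two expressions above numerically. The function name `weighted_entropy` is hypothetical; the inputs are exactly the leading factor and the per-table probabilities that appear in the two expressions.

```python
import math
from typing import Iterable


def weighted_entropy(weight: float, probabilities: Iterable[float]) -> float:
    """Compute -weight * sum(p * log2(p)) over the given probabilities."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -weight * sum(p * math.log2(p) for p in probabilities if p > 0)


# "natural earthquake": factor 0.4, probabilities 0.4 (Table 1) and 0.3 (Table 2).
h_natural = weighted_entropy(0.4, [0.4, 0.3])

# "artificial earthquake": factor 0.6, probabilities 0.6 (Table 1) and 0.7 (Table 2).
h_artificial = weighted_entropy(0.6, [0.6, 0.7])

print(round(h_natural, 3), round(h_artificial, 3))  # 0.42 0.481
```

With these inputs "natural earthquake" has the smaller entropy of the two keywords.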
It should be noted that, in general, the smaller the information entropy H(X) of a data keyword, the more information that keyword provides, the lower its uncertainty, and the more important its role in the data set. Conversely, the larger H(X) is, the less information the keyword provides, the greater its uncertainty, and the smaller its role in the data set.
Specifically, the mutual information is determined from the information entropy of each data keyword. This may specifically include:
determining the mutual information according to the following formula:
$$I(X;Y) = H(Y) - H(Y \mid X)$$
where I(X;Y) is the mutual information, H(Y) is the information entropy of keyword y, which is associated with keyword x, and H(Y|X) is calculated according to the following formula:
$$H(Y \mid X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log_2 p(y \mid x)$$
where H(Y|X) is the conditional entropy of Y given X (the expectation, taken over x, of the uncertainty of y), p(x,y) is the probability that keyword x and keyword y appear together in the word frequency table represented by their root node keyword, and p(y|x) is the probability that keyword y appears given that keyword x appears in the word frequency table represented by its root node keyword. The calculation of p(x,y) and p(y|x) is well established in the art and is not described here.
The expression I(X;Y) = H(Y) - H(Y|X) can be obtained by transforming the following formula:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$
where P(x,y) is the probability that keywords x and y appear together, and P(x) and P(y) are the probabilities that keyword x and keyword y appear, respectively. The transformation itself is well established in the art and is not described here.
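For readers who want the intermediate steps, the transformation can be written out as below. This is the standard textbook identity, shown here under the assumption of unweighted entropies (λ = 1) and using only the marginal P(y) = Σ_x P(x,y) and the definition P(y|x) = P(x,y)/P(x).

```latex
\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
       &= -\sum_{y} P(y)\log_2 P(y) + \sum_{x}\sum_{y} P(x,y)\log_2 P(y \mid x) \\
       &= -\sum_{x}\sum_{y} P(x,y)\log_2 P(y) + \sum_{x}\sum_{y} P(x,y)\log_2 \frac{P(x,y)}{P(x)} \\
       &= \sum_{x}\sum_{y} P(x,y)\log_2 \frac{P(x,y)}{P(x)\,P(y)}
\end{aligned}
```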
Mutual information indicates how strongly two keywords are associated. I(X;Y) is the amount by which the uncertainty of Y is reduced by the presence of X. The larger I is, the smaller the uncertainty about Y once X has appeared, meaning that Y is very likely to appear as well and that X and Y are more closely related. Therefore, a user viewing data that contains keyword X is likely to also need the data that contains keyword Y.
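A small numerical sketch of this interpretation follows. It computes I(X;Y) = H(Y) - H(Y|X) from a joint distribution over whether two keywords appear (1) or not (0). The function name and the three joint distributions are made up for illustration, and unweighted entropies are assumed.

```python
import math
from typing import Dict, Tuple


def mutual_information(joint: Dict[Tuple[int, int], float]) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for a joint distribution {(x, y): probability}."""
    p_x: Dict[int, float] = {}
    p_y: Dict[int, float] = {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p  # marginal of X
        p_y[y] = p_y.get(y, 0.0) + p  # marginal of Y
    h_y = -sum(p * math.log2(p) for p in p_y.values() if p > 0)
    # H(Y|X) = -sum p(x,y) * log2 p(y|x), with p(y|x) = p(x,y) / p(x)
    h_y_given_x = -sum(
        p * math.log2(p / p_x[x]) for (x, _), p in joint.items() if p > 0
    )
    return h_y - h_y_given_x


independent = {(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.25, (0, 0): 0.25}
correlated = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.4}
always_together = {(1, 1): 0.5, (0, 0): 0.5}

for name, joint in [("independent", independent),
                    ("correlated", correlated),
                    ("always together", always_together)]:
    print(name, round(mutual_information(joint), 3))
```

Keywords that appear independently score zero, and the score grows as their co-occurrence becomes more predictable, which is the ordering the index construction relies on.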
Specifically, each root node keyword is taken as a root node of the preset index structure, the data keywords corresponding to the mutual information are taken as hierarchical nodes under the root node according to the mutual information, and the preset index structure is constructed from all hierarchical nodes. This may specifically include:
Sorting the mutual information values by magnitude, and taking the keywords corresponding to the first n values as the next-level hierarchical nodes of the root node. That is, the mutual information values are arranged in descending order. Referring to FIG. 2, n is 3.
Taking the keywords corresponding to the m mutual information values that follow the first n as the next-level hierarchical nodes below those nodes, and repeating this step until the keywords corresponding to all mutual information values have been traversed. Referring to FIG. 2, m is 6, at which point the keywords corresponding to all mutual information values have been traversed, so the step is not repeated. If some keywords remain, the keywords corresponding to the r mutual information values that follow those m values are taken as the hierarchical nodes of the level after that, and so on, until the keywords corresponding to all mutual information values have been traversed.
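The level assignment can be sketched as a sort followed by slicing. The function name, the keyword labels and the mutual information scores below are hypothetical; the level sizes (3, 6) follow the n and m values given for FIG. 2. Placing any leftover keywords into one extra level is an assumption made to keep the sketch short.

```python
from typing import Dict, List, Sequence


def assign_levels(mi_by_keyword: Dict[str, float], level_sizes: Sequence[int]) -> List[List[str]]:
    """Sort keywords by mutual information (descending) and cut into levels."""
    ranked = sorted(mi_by_keyword, key=mi_by_keyword.get, reverse=True)
    levels: List[List[str]] = []
    start = 0
    for size in level_sizes:
        if start >= len(ranked):
            break  # every keyword has already been placed
        levels.append(ranked[start:start + size])
        start += size
    if start < len(ranked):
        # Assumption: keywords beyond the listed sizes form one further level.
        levels.append(ranked[start:])
    return levels


# Nine keywords with made-up, strictly decreasing scores: a=9, b=8, ..., i=1.
scores = {name: float(9 - i) for i, name in enumerate("abcdefghi")}
levels = assign_levels(scores, level_sizes=(3, 6))
print(levels)
```

With n = 3 and m = 6 all nine keywords are placed after two slices, matching the case in FIG. 2 where no further repetition is needed.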
The purpose of the multi-level hierarchical index is to avoid full table scans; it is an effective way to improve the management and query efficiency of public safety risk data. In the multi-level hierarchical index structure, local data nodes are not mapped one-to-one to the global index. Instead, the global index locates a specific data node, and the data is then operated on through that data node's local index.
This embodiment of the present invention uses a B+ tree as the data index structure. The leaf nodes of the B+ tree store the relevant public safety file information, and the internal nodes store the keywords of the files, which guide the indexing process. Each node in the tree stores an information entropy H, and the structure of a node N is:
N={num,children[m],H}
where num is the number of node N; children[m] holds the pointers to the child nodes, m being the order of the B+ tree; and H is a vector storing the node's information entropy. Note that for the root node H stores the information entropy, whereas for a hierarchical node H stores both the information entropy and the mutual information.
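The node layout N={num,children[m],H} maps naturally onto a small record type. The sketch below is one possible rendering; the field name `order` (for m) and the use of `None` for unused child slots are assumptions, and the numeric values are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    num: int    # node number
    order: int  # m, the order of the B+ tree (assumed field name)
    # H: [entropy] for the root node, [entropy, mutual information] otherwise.
    H: List[float] = field(default_factory=list)
    children: List[Optional["Node"]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.children:
            # children[m]: one slot per possible child, empty until linked.
            self.children = [None] * self.order


root = Node(num=0, order=4, H=[2.1])          # root stores entropy only
child = Node(num=1, order=4, H=[1.3, 0.42])   # entropy and mutual information
root.children[0] = child
```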
The B+-tree-based index is built by inserting data. Each insertion places the data in a leaf node, and the data is stored entry by entry in the order of the mutual information values calculated in the previous step. All pointers to files in the B+ tree are stored in the leaf nodes, and the key of a node in the level above is the keyword shared by several leaf nodes.
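A minimal sketch of ordered insertion into a single leaf is given below. It is deliberately not a full B+ tree: node splitting and parent keys are omitted, the class name `Leaf` is hypothetical, and descending order of mutual information is an assumption.

```python
import bisect
from typing import List, Tuple


class Leaf:
    """A single leaf holding (mutual information, file pointer) entries."""

    def __init__(self) -> None:
        self._keys: List[float] = []               # negated MI, kept ascending
        self.entries: List[Tuple[float, str]] = []

    def insert(self, mi: float, file_ref: str) -> None:
        # Negating the value lets bisect keep entries in descending MI order.
        pos = bisect.bisect_right(self._keys, -mi)
        self._keys.insert(pos, -mi)
        self.entries.insert(pos, (mi, file_ref))


leaf = Leaf()
leaf.insert(0.2, "file-a")
leaf.insert(0.9, "file-b")
leaf.insert(0.5, "file-c")
```

After these three insertions the leaf lists `file-b`, `file-c`, `file-a`, regardless of the order in which they arrived.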
The information-entropy-based hierarchical indexing method for public safety data provided by this embodiment of the present invention helps to increase indexing speed for public safety data by constructing the preset index structure.
On the basis of the above embodiment, determining the information entropy corresponding to each data keyword according to the root node keywords and the word frequencies of the data keywords belonging to the same root node keyword includes:
Specifically, the weight of all data keywords belonging to the same root node keyword is determined according to the number of word frequency tables corresponding to each root node keyword. See the description above; details are not repeated here.
Specifically, the probability that each data keyword appears in the word frequency table represented by its root node keyword is determined according to the word frequencies of the data keywords belonging to the same root node keyword. See the description above; details are not repeated here.
Specifically, the information entropy corresponding to each data keyword is determined according to the weight and the probability. See the description above; details are not repeated here.
The information-entropy-based hierarchical indexing method for public safety data provided by this embodiment of the present invention further helps to increase indexing speed for public safety data.
On the basis of the above embodiment, determining the information entropy corresponding to each data keyword according to the weight and the probability includes:
Specifically, the information entropy corresponding to each data keyword is calculated according to the following formula:
where H(X) is the information entropy corresponding to keyword x, λ is the weight, P(x) is the probability, x is the keyword, and X is the keyword set containing the word frequency table. See the description above; details are not repeated here.
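As an illustration of how the weight λ and the probability P(x) could combine, the sketch below assumes a weighted Shannon entropy of the form H(X) = -λ Σ P(x) log2 P(x). That form, the base-2 logarithm, and the function name `keyword_entropies` are assumptions chosen to match the symbol definitions above, not a quotation of the patent's formula.

```python
import math
from typing import Dict


def keyword_entropies(term_counts: Dict[str, int], weight: float) -> Dict[str, float]:
    """Return each keyword's term -weight * P(x) * log2 P(x) for one frequency table."""
    total = sum(term_counts.values())
    result: Dict[str, float] = {}
    for keyword, count in term_counts.items():
        if total == 0 or count == 0:
            result[keyword] = 0.0  # a keyword that never occurs contributes nothing
            continue
        p = count / total          # P(x): relative frequency of keyword x
        result[keyword] = -weight * p * math.log2(p)
    return result


per_keyword = keyword_entropies({"a": 1, "b": 1}, weight=0.5)
table_entropy = sum(per_keyword.values())  # weighted entropy of the whole table
```

Summing the per-keyword terms gives the weighted entropy of the whole keyword set; keeping them separate supports the reading in which each data keyword has its own entropy value.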
The information-entropy-based hierarchical indexing method for public safety data provided by this embodiment of the present invention further helps to increase indexing speed for public safety data.
On the basis of the above embodiment, determining the mutual information according to the information entropy corresponding to each data keyword includes:
Specifically, the mutual information is determined according to the following formula:
I(X;Y)=H(Y)-H(Y|X)
where I(X;Y) is the mutual information, H(Y) is the information entropy corresponding to the keyword y associated with keyword x, and H(Y|X) is calculated according to the following formula:
H(Y|X)=-Σ_{x∈X} Σ_{y∈Y} p(x,y) log p(y|x)
where H(Y|X) is the expectation of y with respect to x, that is, the conditional entropy of Y given X; p(x,y) is the probability that keyword x and keyword y appear together in the word frequency table represented by their root node keyword; and p(y|x) is the probability that keyword y appears given that keyword x appears in the word frequency table represented by its root node keyword. See the description above; details are not repeated here.
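The calculation I(X;Y)=H(Y)-H(Y|X) can be sketched from raw co-occurrence counts. The example uses the standard conditional entropy, H(Y|X) = -Σ p(x,y) log2 p(y|x), and estimates all probabilities as relative frequencies. The function name, the base-2 logarithm, and the input format (a list of observed keyword pairs) are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def mutual_information(pairs: Iterable[Tuple[str, str]]) -> float:
    """Compute I(X;Y) = H(Y) - H(Y|X) from observed (x, y) co-occurrences."""
    joint = Counter(pairs)
    total = sum(joint.values())
    x_counts: Counter = Counter()
    y_counts: Counter = Counter()
    for (x, y), c in joint.items():
        x_counts[x] += c
        y_counts[y] += c
    # H(Y) = -sum p(y) * log2 p(y)
    h_y = -sum((c / total) * math.log2(c / total) for c in y_counts.values())
    # H(Y|X) = -sum p(x,y) * log2 p(y|x), with p(y|x) = count(x,y) / count(x).
    h_y_given_x = -sum((c / total) * math.log2(c / x_counts[x])
                       for (x, _), c in joint.items())
    return h_y - h_y_given_x


dependent = mutual_information([("a", "1"), ("b", "2")])
independent = mutual_information(
    [("a", "1"), ("a", "2"), ("b", "1"), ("b", "2")]
)
```

When y is fully determined by x the result equals H(Y); when the two keywords occur independently it is zero, which is why a larger value indicates a closer association.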
The information-entropy-based hierarchical indexing method for public safety data provided by this embodiment of the present invention further helps to increase indexing speed for public safety data.
On the basis of the above embodiment, determining, according to the mutual information, that the data keywords corresponding to the mutual information are hierarchical nodes under the root node, and constructing the preset index structure from all hierarchical nodes, includes:
Specifically, the mutual information values are sorted by magnitude, and the keywords corresponding to the first n values are taken as the next-level hierarchical nodes of the root node. See the description above; details are not repeated here.
Specifically, the keywords corresponding to the m mutual information values that follow the first n values are taken as the next-level hierarchical nodes beneath those next-level hierarchical nodes, and this step is repeated until the keywords corresponding to all mutual information values have been traversed. See the description above; details are not repeated here.
The information-entropy-based hierarchical indexing method for public safety data provided by this embodiment of the present invention further helps to increase indexing speed for public safety data.
Storing a public safety data set with the method provided by the embodiments of the present invention has the following advantages:
1. The weight of each root node keyword in the file system is taken into account, and files are ranked by information importance according to keyword information entropy, which emphasizes key information and reduces the processing of useless information;
2. The mutual information between keywords is calculated, which quantifies the degree of association between keywords and makes clear how closely two pieces of information are related;
3. The index structure is a B+ tree. Compared with traditional storage schemes, a B+ tree makes fuller use of node space and gives more stable query times; in addition, traversing the whole tree only requires traversing all the leaf nodes, which benefits full file scans.
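The full-scan advantage in item 3 rests on a common B+ tree property: leaves are chained to their right-hand siblings. The sketch below assumes such sibling pointers (the text does not describe them explicitly) and uses the hypothetical names `LeafNode` and `scan_all`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class LeafNode:
    records: List[str] = field(default_factory=list)
    next: Optional["LeafNode"] = None  # assumed sibling pointer to the next leaf


def scan_all(first_leaf: Optional[LeafNode]) -> Iterator[str]:
    """Yield every record by walking the leaf chain, never visiting internal nodes."""
    leaf = first_leaf
    while leaf is not None:
        yield from leaf.records
        leaf = leaf.next


third = LeafNode(["f5"])
second = LeafNode(["f3", "f4"], third)
first = LeafNode(["f1", "f2"], second)
all_files = list(scan_all(first))
```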
FIG. 3 is a schematic structural diagram of an embodiment of the information-entropy-based hierarchical indexing apparatus for public safety data of the present invention. As shown in FIG. 3, an embodiment of the present invention provides an information-entropy-based hierarchical indexing apparatus for public safety data, comprising an acquisition unit 301 and an index unit 302, wherein:
The acquisition unit 301 is configured to acquire keywords of the public safety data to be queried; the index unit 302 is configured to index the keywords according to a preset index structure, where the preset index structure is a hierarchical index structure determined according to mutual information that is expressed by information entropy and characterizes the degree of association between keywords.
Specifically, the acquisition unit 301 is configured to acquire keywords of the public safety data to be queried; the index unit 302 is configured to index the keywords according to a preset index structure, where the preset index structure is a hierarchical index structure determined according to mutual information that is expressed by information entropy and characterizes the degree of association between keywords.
The information-entropy-based hierarchical indexing apparatus for public safety data provided by this embodiment of the present invention indexes the keywords of public safety data through a hierarchical index structure determined by mutual information expressed by information entropy, and can thereby increase indexing speed for public safety data.
The information-entropy-based hierarchical indexing apparatus for public safety data provided by this embodiment of the present invention can be used to execute the processing flows of the above method embodiments. Its functions are not repeated here; see the detailed description of the above method embodiments.
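The division of work between the two units can be sketched as two small classes. This is only an illustration of the interface: the class and method names, the whitespace tokenization, and the flat dictionary standing in for the hierarchical index are all assumptions.

```python
from typing import Dict, List


class AcquisitionUnit:
    """Stands in for unit 301: extracts query keywords from a request."""

    def acquire(self, query: str) -> List[str]:
        # Hypothetical tokenization: lower-case, whitespace-separated terms.
        return query.lower().split()


class IndexUnit:
    """Stands in for unit 302: resolves keywords against a prebuilt index."""

    def __init__(self, index: Dict[str, List[str]]) -> None:
        # A flat dict replaces the hierarchical index purely for brevity.
        self._index = index

    def search(self, keywords: List[str]) -> List[str]:
        hits: List[str] = []
        for kw in keywords:
            for ref in self._index.get(kw, []):
                if ref not in hits:  # keep first-seen order, no duplicates
                    hits.append(ref)
        return hits


unit_301 = AcquisitionUnit()
unit_302 = IndexUnit({"fire": ["doc-3"], "alarm": ["doc-3", "doc-7"]})
results = unit_302.search(unit_301.acquire("Fire alarm"))
```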
FIG. 4 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in FIG. 4, the electronic device includes a processor 401, a memory 402, and a bus 403;
where the processor 401 and the memory 402 communicate with each other through the bus 403;
The processor 401 is configured to call the program instructions in the memory 402 to execute the methods provided by the above method embodiments, for example including: acquiring keywords of the public safety data to be queried; and indexing the keywords according to a preset index structure, where the preset index structure is a hierarchical index structure determined according to mutual information that is expressed by information entropy and characterizes the degree of association between keywords.
This embodiment discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example including: acquiring keywords of the public safety data to be queried; and indexing the keywords according to a preset index structure, where the preset index structure is a hierarchical index structure determined according to mutual information that is expressed by information entropy and characterizes the degree of association between keywords.
This embodiment provides a non-transitory computer-readable storage medium that stores computer instructions. The computer instructions cause a computer to execute the methods provided by the above method embodiments, for example including: acquiring keywords of the public safety data to be queried; and indexing the keywords according to a preset index structure, where the preset index structure is a hierarchical index structure determined according to mutual information that is expressed by information entropy and characterizes the degree of association between keywords.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solutions, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010072369.9A CN111274349B (en) | 2020-01-21 | 2020-01-21 | A method and device for hierarchical indexing of public safety data based on information entropy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274349A (en) | 2020-06-12 |
CN111274349B (en) | 2020-12-15 |
Family
ID=71002820
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010072369.9A (Active, CN111274349B) | A method and device for hierarchical indexing of public safety data based on information entropy | 2020-01-21 | 2020-01-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274349B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060282455A1 (en) * | 2005-06-13 | 2006-12-14 | It Interactive Services Inc. | System and method for ranking web content |
US20070233649A1 (en) * | 2006-03-31 | 2007-10-04 | Microsoft Corporation | Hybrid location and keyword index |
CN101163032A (en) * | 2006-10-11 | 2008-04-16 | 中兴通讯股份有限公司 | Method of managing alarm inquiry |
CN101236615A (en) * | 2008-01-22 | 2008-08-06 | 安徽科大讯飞信息科技股份有限公司 | Intelligent pronunciation learning material creation method |
CN101236550A (en) * | 2007-02-01 | 2008-08-06 | 阿里巴巴公司 | Method and system for processing tree -type structure data |
CN102402602A (en) * | 2011-11-18 | 2012-04-04 | 航天科工深圳(集团)有限公司 | B + tree indexing method and device for real-time database |
CN103745008A (en) * | 2014-01-28 | 2014-04-23 | 河海大学 | Sorting method for big data indexing |
CN106021524A (en) * | 2016-05-24 | 2016-10-12 | 成都希盟泰克科技发展有限公司 | Working method for tree-augmented Navie Bayes classifier used for large data mining based on second-order dependence |
CN107170020A (en) * | 2017-06-06 | 2017-09-15 | 西北工业大学 | Dictionary learning still image compression method based on minimum quantization error criterion |
CN107341165A (en) * | 2016-04-29 | 2017-11-10 | 上海京东到家元信信息技术有限公司 | The method and apparatus for prompting display are carried out at search box |
CN108733781A (en) * | 2018-05-08 | 2018-11-02 | 安徽工业大学 | The cluster temporal data indexing means calculated based on memory |
CN106649597B (en) * | 2016-11-22 | 2019-10-01 | 浙江大学 | Method for auto constructing is indexed after a kind of books book based on book content |
Also Published As
Publication number | Publication date |
---|---|
CN111274349B (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||