US20040158560A1 - Systems and methods for query expansion - Google Patents
Systems and methods for query expansion
- Publication number
- US20040158560A1 (application US10/365,294)
- Authority
- US
- United States
- Prior art keywords
- terms
- query
- document
- expansion
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99932—Access augmentation or optimizing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Definitions
- FIG. 1 illustrates an example of a suitable computing environment 120 on which the subsequently described systems, apparatuses and methods to expand queries may be implemented.
- Exemplary computing environment 120 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the systems and methods described herein. Neither should computing environment 120 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in computing environment 120.
- The methods and systems described herein are operational with numerous other general purpose or special purpose computing system environments or configurations.
- Examples of well known computing systems, environments, and/or configurations that may be suitable include, but are not limited to, hand-held devices, multi-processor systems, microprocessor based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, portable communication devices, and the like.
- the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
- program modules may be located in both local and remote memory storage devices.
- computing environment 120 includes a general-purpose computing device in the form of a computer 130 .
- the components of computer 130 may include one or more processors or processing units 132 , a system memory 134 , and a bus 136 that couples various system components including system memory 134 to processor 132 .
- Bus 136 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
- bus architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus also known as Mezzanine bus.
- Computer 130 typically includes a variety of computer readable media. Such media may be any available media that is accessible by computer 130 , and it includes both volatile and non-volatile media, removable and non-removable media.
- system memory 134 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 140 , and/or non-volatile memory, such as read only memory (ROM) 138 .
- a basic input/output system (BIOS) 142 containing the basic routines that help to transfer information between elements within computer 130 , such as during start-up, is stored in ROM 138 .
- RAM 140 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 132 .
- Computer 130 may further include other removable/non-removable, volatile/non-volatile computer storage media.
- FIG. 1 illustrates a hard disk drive 144 for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”), a magnetic disk drive 146 for reading from and writing to a removable, non-volatile magnetic disk 148 (e.g., a “floppy disk”), and an optical disk drive 150 for reading from or writing to a removable, non-volatile optical disk 152 such as a CD-ROM/R/RW, DVD-ROM/R/RW/+R/RAM or other optical media.
- Hard disk drive 144 , magnetic disk drive 146 and optical disk drive 150 are each connected to bus 136 by one or more interfaces 154 .
- the drives and associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules, and other data for computer 130 .
- Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 148 and a removable optical disk 152, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROMs), and the like, may also be used in the exemplary operating environment.
- a number of program modules may be stored on the hard disk, magnetic disk 148 , optical disk 152 , ROM 138 , or RAM 140 , including, e.g., an operating system 158 , one or more application programs 160 , other program modules 162 , and program data 164 .
- a user may provide commands and information into computer 130 through input devices such as keyboard 166 and pointing device 168 (such as a “mouse”).
- Other input devices may include a microphone, joystick, game pad, satellite dish, serial port, scanner, camera, etc.
- Input devices are connected through a user input interface 170 that is coupled to bus 136, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
- a monitor 172 or other type of display device is also connected to bus 136 via an interface, such as a video adapter 174 .
- personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through output peripheral interface 175 .
- Computer 130 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 182 .
- Remote computer 182 may include many or all of the elements and features described herein relative to computer 130 .
- Logical connections shown in FIG. 1 are a local area network (LAN) 177 and a general wide area network (WAN) 179 .
- When used in a LAN networking environment, computer 130 is connected to LAN 177 via network interface or adapter 186.
- When used in a WAN networking environment, the computer typically includes a modem 178 or other means for establishing communications over WAN 179.
- Modem 178, which may be internal or external, may be connected to system bus 136 via the user input interface 170 or other appropriate mechanism.
- Depicted in FIG. 1 is a specific implementation of a WAN via the Internet.
- computer 130 employs modem 178 to establish communications with at least one remote computer 182 via the Internet 180 .
- program modules depicted relative to computer 130 may be stored in a remote memory storage device.
- remote application programs 189 may reside on a memory device of remote computer 182 . It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers may be used.
- FIG. 2 is a block diagram that shows further exemplary aspects of application programs 160 and program data 164 of FIG. 1 used to expand query terms.
- System memory 134 is shown to include a number of application programs including, for example, query expansion module 202 and other modules 204 .
- Other modules include, for example, an operating system to provide a run-time environment, a search engine to generate lists of documents from submitted queries, an embedded Web server to provide search engine services to Web users, and so on.
- the query expansion module identifies one or more query expansion terms 206 from analysis of query session(s) 208 stored in query log(s) 210 .
- a query session is represented, for example, as follows:
- query session := <keyword term(s)> [document identifier(s)].
- Each query session is associated with one or more keyword terms or “terms” from one (1) query and corresponding identifier(s) for the one or more documents that were selected by the user from a document list.
- Each document ID substantially uniquely identifies a particular document that was selected by the user from a document list.
- One or more of the document IDs are Uniform Resource Locators (URLs).
- The document list was returned to the user by a search engine (see the search engine of “other modules” 204) responsive to searching for information that includes keywords indicated by the term(s) of the query. Through daily use, the search engine accumulates a substantially large number of such query sessions in its query logs.
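- As an illustration only, a query session of this shape could be held in a small record type. The sketch below is not taken from the source: the class name `QuerySession`, the function `parse_session`, and the tab-separated log line format are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySession:
    """One logged query plus the documents the user selected for it."""
    terms: list[str]                                  # keyword terms of one query
    doc_ids: list[str] = field(default_factory=list)  # selected document IDs, e.g. URLs


def parse_session(line: str) -> QuerySession:
    """Parse a hypothetical log line: 'term term ...<TAB>id id ...'."""
    query_part, _, docs_part = line.rstrip("\n").partition("\t")
    return QuerySession(terms=query_part.lower().split(), doc_ids=docs_part.split())


# Example usage with made-up data.
session = parse_session(
    "Java island\thttp://example.com/travel/java http://example.com/indonesia"
)
```

A session with no clicked documents simply yields an empty `doc_ids` list, so it contributes nothing to any later click statistics.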
- the query expansion module 202 generates a database of probabilistic correlations 212 between previous query terms and document terms. These probabilistic correlations are made between each pair of a previous query term and a document term, as a function of statistics of the whole query logs.
- the document terms are terms in the documents selected by a system responsive to search engine queries. These documents are identified in the query log(s) 210 .
- The probabilistic correlations indicate the conditional probability of the appearance of a document term when a query term is used. For instance, if a document has been selected by a user more than once for a query consisting of the same terms, then the document is correlated to the terms in the query.
- the probabilistic correlations are based on an assumption that each document that is returned to a user in response to a query and that is also selected or “clicked” by the user will be “relevant” to the particular query.
- Although user selection information is not as accurate as an explicit relevance indication from a user, as often used in traditional information retrieval, each document returned in response to a query submission that is selected by a user does suggest implied relevance of that document to the user's information need. Even if some erroneous user document clicks/selections are made, users do not typically select documents presented in response to a search engine query at random.
- Each keyword term that is not a stop term is extracted. Stop terms are those terms that appear frequently in documents and do not provide any ability to discriminate one document from another. Such terms include, for example, “the”, “this” or “and”.
- For every extracted term, the query expansion module generates selected document terms 216. Selected document terms represent corresponding ones of the terms selected from the probabilistic correlation database 212.
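- A minimal sketch of stop-term filtering follows. Only “the”, “this” and “and” come from the text; the remaining entries of `STOP_TERMS`, the tokenization rule, and the function name `extract_terms` are assumptions made for illustration.

```python
import re

# "the", "this", "and" are the examples given in the text; the rest are assumed.
STOP_TERMS = frozenset({"the", "this", "and", "a", "an", "of", "to", "in", "for"})


def extract_terms(query: str) -> list[str]:
    """Return the distinct non-stop terms of a query, in first-seen order."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())  # assumed tokenization
    seen: set[str] = set()
    terms: list[str] = []
    for token in tokens:
        if token not in STOP_TERMS and token not in seen:
            seen.add(token)
            terms.append(token)
    return terms


example_terms = extract_terms("The history of the Java programming language and this")
```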
- the query expansion module determines the joint probability 206 for every selected document term as a function of at least the combined probabilities from all terms of a newly submitted query 214 , corresponding ones of the session queries 208 , and/or from all conditional probabilities of document term given query term. The joint probabilities are ranked.
- The query expansion module selects one or more expansion terms 206 from the top-ranked selected document terms.
- a top-ranked expansion term is a term with a higher calculated joint probability than the joint probability corresponding to another term.
- the selected expansion terms are added to the terms of the newly submitted query. In this manner, high-quality expansion terms are added to the terms (i.e., expanding) of a newly submitted query before sending it to the search engine (see, the search engine of “other modules” 204 ).
- the query log(s) 210 are a very valuable resource containing abundant implied relevance feedback data.
- Such implied feedback is used to overcome the problems that are often endemic in traditional relevance feedback techniques, such as the lack of sufficient explicit relevance judgment information from a user, the mismatch problems of conventional global analysis techniques, or the problems associated with top-ranked documents that are not of informational interest to the user or even relevant to the submitted query.
- FIG. 3 shows that correlations between terms of newly submitted queries and document terms can be established via information maintained in query sessions from a query log.
- Paths between respective terms 302 of a newly submitted query 214 (FIG. 2) and particular terms from a selected document (identified via a query session 208 of FIG. 2) are represented in FIG. 3 via directional arrows.
- such paths map new query terms 302 via previous query session terms 306 and selected documents 308 to selected document terms 216 . If there is at least one path between a newly submitted term and one document term, the query expansion module 202 establishes a probabilistic link between the two terms.
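- The path idea can be sketched as a two-hop lookup: from a new query term to the logged sessions that contain it, then from the documents selected in those sessions to the terms of those documents. The function name, the tuple layout of `sessions`, and the `doc_terms` mapping are hypothetical and only serve the example.

```python
def linked_document_terms(query_term, sessions, doc_terms):
    """Collect every document term reachable from query_term via a logged session.

    sessions:  iterable of (session_terms, selected_doc_ids) pairs
    doc_terms: mapping doc_id -> set of terms occurring in that document
    """
    linked = set()
    for session_terms, doc_ids in sessions:
        if query_term in session_terms:          # hop 1: new term -> session term
            for doc_id in doc_ids:               # hop 2: session -> selected document
                linked |= doc_terms.get(doc_id, set())  # hop 3: document -> its terms
    return linked


# Made-up data for illustration.
sessions = [
    (["java", "island"], ["d1"]),
    (["java", "language"], ["d2"]),
    (["python"], ["d3"]),
]
doc_terms = {
    "d1": {"indonesia", "travel"},
    "d2": {"programming", "sun"},
    "d3": {"snake"},
}
java_links = linked_document_terms("java", sessions, doc_terms)
```

A term that never appears in the log has no path to any document term, so no probabilistic link is established for it.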
- FIG. 4 shows exemplary probabilistic correlations between query terms 302 (newly submitted or logged session terms) and document terms 310 .
- the degree of correlation between a query term and a document term is the conditional probability of a document term's appearance on condition that the query term is used.
- The query expansion module calculates such degrees of correlation between respective ones of the terms 302 and 216 as follows. Let $w_j^{(d)}$ and $w_i^{(q)}$ be an arbitrary document term and query term, respectively, wherein the query terms for purposes of equations 4-7 are respective ones of the query terms recorded in the query log(s) 210 (FIG. 2).
- The conditional probability $P(w_j^{(d)} \mid w_i^{(q)})$ is defined as follows.
- $S$ is a set of documents.
- A document is added to the set if and only if its document identification (ID) and the query term $w_i^{(q)}$ co-occur in at least one query session (that is, at least one user who used the query term $w_i^{(q)}$ has selected/clicked on the document).
- $P(D_k \mid w_i^{(q)})$ is the conditional probability of the document $D_k$ being clicked when $w_i^{(q)}$ appears in the user query.
- $P(w_j^{(d)} \mid D_k)$ is the conditional probability of occurrence of $w_j^{(d)}$ if the document $D_k$ is selected.
- It is noted that $P(D_k \mid w_i^{(q)})$ can be statistically obtained from the query logs.
- $P(w_j^{(d)} \mid D_k)$ depends on the frequency of occurrence of $w_j^{(d)}$ in the document $D_k$, as well as the occurrence of the term $w_j^{(d)}$ in the whole document collection.
- $f_{ik}^{(q)}(w_i^{(q)}, D_k)$ is the number of the query sessions in which the query word $w_i^{(q)}$ and the document $D_k$ appear together.
- $f^{(q)}(w_i^{(q)})$ is the number of the query sessions that contain the term $w_i^{(q)}$.
- $W_{jk}^{(d)}$ is the normalized weight of the term $w_j^{(d)}$ in the document $D_k$, which is divided by the maximum value of term weights in the document $D_k$.
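- The equations themselves are not reproduced in this text. One reconstruction that is consistent with the definitions above is sketched here. It assumes that a document term depends on a query term only through the selected document, and that $P(w_j^{(d)} \mid D_k)$ is taken to be the normalized weight $W_{jk}^{(d)}$; both are assumptions of this sketch, and the equation numbers of the original are not known.

```latex
\begin{align*}
P\bigl(w_j^{(d)} \mid w_i^{(q)}\bigr)
  &= \sum_{D_k \in S} P\bigl(w_j^{(d)} \mid D_k\bigr)\, P\bigl(D_k \mid w_i^{(q)}\bigr) \\
P\bigl(D_k \mid w_i^{(q)}\bigr)
  &= \frac{f_{ik}^{(q)}\bigl(w_i^{(q)}, D_k\bigr)}{f^{(q)}\bigl(w_i^{(q)}\bigr)} \\
P\bigl(w_j^{(d)} \mid D_k\bigr)
  &= W_{jk}^{(d)} \\
\Rightarrow\quad
P\bigl(w_j^{(d)} \mid w_i^{(q)}\bigr)
  &= \sum_{D_k \in S} W_{jk}^{(d)}\,
     \frac{f_{ik}^{(q)}\bigl(w_i^{(q)}, D_k\bigr)}{f^{(q)}\bigl(w_i^{(q)}\bigr)}
\end{align*}
```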
- probabilistic correlations between query terms and document terms from the query logs 210 are pre-computed offline prior to evaluating terms of a newly submitted query 214 for expansion.
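- A minimal sketch of such an offline pass is shown below, assuming the click-through ratio and max-normalized term weight described above are simply multiplied and summed over the selected documents. The function name `build_correlations`, the input layouts, and the raw weights in `doc_weights` (which could, for example, be TF-IDF values) are hypothetical.

```python
from collections import Counter, defaultdict


def build_correlations(sessions, doc_weights):
    """Estimate P(document term | query term) from logged sessions.

    sessions:    iterable of (query_terms, selected_doc_ids) pairs
    doc_weights: mapping doc_id -> {term: raw (unnormalized) weight}
    Returns {query_term: {document_term: correlation}}.
    """
    term_sessions = Counter()      # sessions containing the query term
    term_doc_sessions = Counter()  # sessions containing the term AND the document
    for terms, doc_ids in sessions:
        for term in set(terms):
            term_sessions[term] += 1
            for doc_id in set(doc_ids):
                term_doc_sessions[(term, doc_id)] += 1

    correlations = defaultdict(lambda: defaultdict(float))
    for (term, doc_id), together in term_doc_sessions.items():
        p_doc_given_term = together / term_sessions[term]
        weights = doc_weights.get(doc_id, {})
        max_weight = max(weights.values(), default=0.0)
        if max_weight <= 0:
            continue  # no usable term weights for this document
        for doc_term, weight in weights.items():
            # normalized weight (weight / max weight in the document) times click ratio
            correlations[term][doc_term] += (weight / max_weight) * p_doc_given_term
    return {term: dict(doc_terms) for term, doc_terms in correlations.items()}


# Made-up data for illustration.
sessions = [(["java"], ["d1"]), (["java"], ["d2"]), (["java", "island"], ["d1"])]
doc_weights = {"d1": {"indonesia": 2.0, "travel": 1.0}, "d2": {"programming": 4.0}}
corr = build_correlations(sessions, doc_weights)
```

Because the table depends only on the log and the document weights, it can be rebuilt periodically without touching the query-time path.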
- the terms in the query (with stop words being removed) are extracted.
- all correlated document terms are selected based on the conditional probability in the formula (7).
- the joint probability for every document term is obtained according to the following:
- Q stands for the terms extracted from the newly submitted query 214 .
- This yields a list of candidate expansion terms as well as the conditional probabilities between each term and the query.
- Top-ranked terms are selected as expansion terms, which are then added to the terms of the newly submitted query for submitting to the search engine.
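- The ranking step can be sketched as follows. The combining formula is not reproduced in this text, so the sketch assumes one plausible choice: summing `log(1 + p)` over the query terms, which rewards document terms correlated with several query terms. The function name `expand_query`, the `top_k` parameter, and the sample correlation values are hypothetical.

```python
import math


def expand_query(query_terms, correlations, top_k=3):
    """Append the top_k highest-scoring correlated document terms to the query.

    correlations: {query_term: {document_term: P(document_term | query_term)}}
    The score (sum of log(1 + p) over query terms) is an assumed combination rule.
    """
    scores = {}
    for query_term in query_terms:
        for doc_term, p in correlations.get(query_term, {}).items():
            scores[doc_term] = scores.get(doc_term, 0.0) + math.log1p(p)

    # Do not "expand" with terms the query already contains.
    candidates = [(t, s) for t, s in scores.items() if t not in query_terms]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in candidates[:top_k]]


# Made-up correlation table for illustration.
sample_correlations = {
    "java": {"programming": 0.6, "indonesia": 0.3, "java": 0.9},
    "language": {"programming": 0.5, "grammar": 0.4},
}
expanded = expand_query(["java", "language"], sample_correlations, top_k=2)
```

In this example “programming” ranks first because it is correlated with both query terms, which is the behavior a joint score is meant to capture.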
- FIG. 5 shows an exemplary procedure 500 for query expansion.
- The query expansion module 202 analyzes one or more query logs 210 (FIG. 2) to generate a database of probabilistic correlations 212 (FIG. 2) between query terms and document terms from logged session(s) 208 (FIG. 2). These correlations represent the conditional probability of a document term's appearance on condition that a query term was used in a session query.
- Responsive to receiving a newly submitted query 214 from a client computing device (e.g., a remote device 182 of FIG. 1), the query expansion module extracts every term that is not a stop term from the newly submitted query.
- the query expansion module 202 selects one or more document terms from the probabilistic correlation database 212 (FIG. 2). Each selected document term 216 (FIG. 2) has at least one correlation with a particular one of the extracted query terms.
- the query expansion module combines the probabilities from all terms of a newly submitted query 214 (FIG. 2) and session queries 208 (FIG. 2) to obtain the joint probability 206 (FIG. 2) for every selected document term 216 (FIG. 2).
- The query expansion module compares the joint probabilities to select top-ranked expansion term(s) 206 for adding to the terms of the newly submitted query.
- a top-ranked expansion term is a term with a higher calculated joint probability than the joint probability corresponding to another term. In this manner, high-quality expansion terms are identified for adding to the terms of the newly submitted query.
- the query expansion module submits terms of the newly submitted query 214 (FIG. 2) and the expansion terms 206 (FIG. 2) to a search engine (see, other modules 204 of FIG. 2) for a list of relevant documents (see, the relevant documents portion of other data 220 of FIG. 2).
- the list of relevant documents is communicated to the client computing device (e.g., the remote device 182 of FIG. 1 that communicated the newly submitted query).
- Responsive to an indication of user selection of a document from the list of relevant documents, the query expansion module generates a new query session 208 (FIG. 2), or updates a previous one, in a query log 210 (FIG. 2).
- a Web browser application executing at a client computing device automatically communicates the identity of a document to the computing device 130 (FIG. 1) hosting the query expansion module 202 (FIG. 2), wherein the document was selected by the user from the relevant document list provided to the client device.
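- The log update on a reported selection might look like the sketch below. The in-memory dictionary, the `session_id` key, and the function name `record_click` are assumptions for illustration; the source does not say how sessions are keyed or stored.

```python
def record_click(query_log, session_id, query_terms, doc_id):
    """Create the session on the first click, or add the document to an existing one."""
    session = query_log.setdefault(
        session_id, {"terms": list(query_terms), "doc_ids": []}
    )
    if doc_id not in session["doc_ids"]:  # ignore repeated clicks on the same document
        session["doc_ids"].append(doc_id)
    return session


# Example usage with made-up identifiers.
log = {}
record_click(log, "s1", ["java", "island"], "d1")
record_click(log, "s1", ["java", "island"], "d2")
record_click(log, "s1", ["java", "island"], "d1")
```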
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- The invention pertains to information retrieval.
- People increasingly rely on the World Wide Web (“Web”) to satisfy diverse information needs. To meet these needs, existing search engine technology allows users to input a query consisting of one or more keywords to search for Web documents containing the keywords. Users typically select such keywords because they are thought to be related to the information being sought. Often, however, the selected keywords are not good descriptors of relevant document contents.
- One reason for this is that most words in natural language have inherent ambiguity. Such ambiguity often results in search engine keyword/document term mismatch problems. Very short queries amplify such mismatch problems. Additionally, vocabularies used by Web content authors can vary greatly. In light of this, generating a search engine query that will result in return of a document list of relevance to a user is a difficult problem. In efforts to address this problem, search engine services typically expand queries (i.e., add terms/keywords). Unfortunately, existing query expansion techniques are considerably limited for numerous reasons.
- One limitation, for example, is that global analysis query expansion techniques do not typically address term mismatch. Global analysis techniques are based on the analysis of a corpus of data to generate statistical similarity matrixes of term pair co-occurrences. Such corpus-wide analysis is typically resource intensive, requiring substantial computer processing, memory, and data storage resources. The similarity matrixes are used to expand a query with additional terms that are most similar to the terms already in the query. By only adding “similar” terms to the query, and by not addressing the ambiguities that are inherent between words in language, this global analysis approach to query expansion does not address term mismatch, which is one of the most significant problems in query expansion.
- In another example, some query expansion techniques require explicit relevance information from the user, which can only be obtained by interrupting the task that the user is currently performing. To obtain this information, after submitting a query to a search engine and receiving a list of documents, rather than browsing the documents in the document list or submitting a new query, the user is asked to manually rank the relevance of the documents in the list. This may be accomplished by check-box selection, enumeration, or otherwise indicating that particular ones of the documents in the list are more relevant than others.
- If the user volunteers and manually ranks the documents in the list, subsequent queries submitted to the search engine are then expanded with term(s) extracted from the documents that the user specifically marked as being relevant. Unfortunately, users are often reluctant to interrupt their immediate activities to provide such explicit relevance feedback. Thus, the search engine has no idea whether or not the user considered one document to be more relevant than another. This means that the search engine has no indication of any term that can be considered more relevant than another to a particular query. For this reason, explicit relevance feedback techniques are seldom used to expand queries.
- In another example, some query expansion techniques automatically assume that the top-ranked document(s) that are returned to the user in response to a query are relevant. The original queries from the user are then expanded with term(s) extracted from such top-ranked document(s). This technique becomes substantially problematic when a large fraction of the top-ranked documents are actually not relevant to the user's information need. In this situation, words drawn from such documents and added to the query are often unrelated to the information being sought and the quality of the documents retrieved using such an expanded query is typically poor.
- In another example, some query expansion techniques extract noun groups or “concepts” from a set of top-ranked documents. These noun groups are extracted based on co-occurrences with query terms and not based on the frequencies with which the term(s) appear in the top-ranked documents. This technique is based on the hypothesis that a common term from the top-ranked documents will tend to co-occur with all query terms within the top-ranked documents. This hypothesis is not always true and often leads to improper query expansion. In other words, this technique is conducted in the document space only, without considering any judgments from users. It requires a distinctive difference between the cluster of relevant documents and that of non-relevant documents in the retrieval result. This is true in many cases but does not always hold, especially for inherently ambiguous queries.
- In light of the above, further innovation to select relevant terms for query expansion is greatly desired.
- Systems and methods for query expansion are described. In one aspect, new terms are extracted from a newly submitted query. Terms with which to expand the new terms are then identified, so that a relevant document list can be generated. The expansion terms are identified based at least in part on the new terms and on probabilistic correlations derived from information in a query log. The query log information includes one or more query terms and a corresponding set of document identifiers (IDs). The query terms were previously submitted to a search engine. Each document ID represents a document selected from a list generated by the search engine in response to searching for information relevant to corresponding ones of the query terms.
- The following detailed description is described with reference to the accompanying figures. In the figures, the left-most digit of a component reference number identifies the particular figure in which the component first appears.
- FIG. 1 is a block diagram of an exemplary computing environment within which systems and methods for query expansion may be implemented.
- FIG. 2 is a block diagram that shows further exemplary aspects of application programs and program data of the exemplary computing device of FIG. 1.
- FIG. 3 shows that correlations between terms of newly submitted queries and document terms can be established via information maintained in query sessions from a query log.
- FIG. 4 shows exemplary probabilistic correlations between query terms and document terms.
- FIG. 5 shows an exemplary procedure for query expansion.
- An Exemplary Operating Environment
- Turning to the drawings, wherein like reference numerals refer to like elements, the invention is illustrated as being implemented in a suitable computing environment. Although not required, the invention is described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Program modules generally include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types.
- FIG. 1 illustrates an example of a suitable computing environment 120 on which the subsequently described systems, apparatuses and methods to expand queries may be implemented. Exemplary computing environment 120 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the systems and methods described herein. Neither should computing environment 120 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in computing environment 120.
- The methods and systems described herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable include, but are not limited to, hand-held devices, multi-processor systems, microprocessor based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, portable communication devices, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
- As shown in FIG. 1, computing environment 120 includes a general-purpose computing device in the form of a computer 130. The components of computer 130 may include one or more processors or processing units 132, a system memory 134, and a bus 136 that couples various system components including system memory 134 to processor 132.
- Bus 136 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnects (PCI) bus also known as Mezzanine bus.
- Computer 130 typically includes a variety of computer readable media. Such media may be any available media that is accessible by computer 130, and it includes both volatile and non-volatile media, removable and non-removable media. In FIG. 1, system memory 134 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 140, and/or non-volatile memory, such as read only memory (ROM) 138. A basic input/output system (BIOS) 142, containing the basic routines that help to transfer information between elements within computer 130, such as during start-up, is stored in ROM 138. RAM 140 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processor 132.
- Computer 130 may further include other removable/non-removable, volatile/non-volatile computer storage media. For example, FIG. 1 illustrates a hard disk drive 144 for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”), a magnetic disk drive 146 for reading from and writing to a removable, non-volatile magnetic disk 148 (e.g., a “floppy disk”), and an optical disk drive 150 for reading from or writing to a removable, non-volatile optical disk 152 such as a CD-ROM/R/RW, DVD-ROM/R/RW/+R/RAM or other optical media. Hard disk drive 144, magnetic disk drive 146 and optical disk drive 150 are each connected to bus 136 by one or more interfaces 154.
- The drives and associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules, and other data for computer 130. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 148 and a removable optical disk 152, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.
- A number of program modules may be stored on the hard disk, magnetic disk 148, optical disk 152, ROM 138, or RAM 140, including, e.g., an operating system 158, one or more application programs 160, other program modules 162, and program data 164.
- A user may provide commands and information into computer 130 through input devices such as keyboard 166 and pointing device 168 (such as a “mouse”). Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, camera, etc. These and other input devices are connected to the processing unit 132 through a user input interface 170 that is coupled to bus 136, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
- A monitor 172 or other type of display device is also connected to bus 136 via an interface, such as a video adapter 174. In addition to monitor 172, personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through output peripheral interface 175.
- Computer 130 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 182. Remote computer 182 may include many or all of the elements and features described herein relative to computer 130. Logical connections shown in FIG. 1 are a local area network (LAN) 177 and a general wide area network (WAN) 179. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
- When used in a LAN networking environment, computer 130 is connected to LAN 177 via network interface or adapter 186. When used in a WAN networking environment, the computer typically includes a modem 178 or other means for establishing communications over WAN 179. Modem 178, which may be internal or external, may be connected to system bus 136 via the user input interface 170 or other appropriate mechanism.
- Depicted in FIG. 1 is a specific implementation of a WAN via the Internet. Here, computer 130 employs modem 178 to establish communications with at least one remote computer 182 via the Internet 180.
- In a networked environment, program modules depicted relative to computer 130, or portions thereof, may be stored in a remote memory storage device. Thus, e.g., as depicted in FIG. 1, remote application programs 189 may reside on a memory device of remote computer 182. It will be appreciated that the network connections shown and described are exemplary and other means of establishing a communications link between the computers may be used.
- FIG. 2 is a block diagram that shows further exemplary aspects of application programs 160 and program data 164 of FIG. 1 used to expand query terms. System memory 134 is shown to include a number of application programs including, for example, query expansion module 202 and other modules 204. Other modules 204 include, for example, an operating system to provide a run-time environment, a search engine to generate lists of documents from submitted queries, an embedded Web server to provide search engine services to Web users, and so on.
- The query expansion module identifies one or more query expansion terms 206 from analysis of query session(s) 208 stored in query log(s) 210. A query session is represented, for example, as follows:
- query session=<keyword term(s)>[document identifier(s)].
- Each query session is associated with one or more keyword terms or “terms” from one (1) query and corresponding identifier(s) for the one or more documents that were selected by the user from a document list. Each document ID substantially uniquely identifies a particular document that was selected by the user from a document list. In one implementation, one or more of the document IDs are Universal Resource Locators (URLs). The document list was returned to the user by a search engine (i.e., see the search engine 204 of “other modules”) responsive to searching for information that includes keywords indicated by the term(s) of the query. Through daily use, the search engine accumulates a substantially large number of such query logs.
- The query expansion module 202 generates a database of probabilistic correlations 212 between previous query terms and document terms. These probabilistic correlations are made between each pair of a previous query term and a document term, as a function of statistics of the whole query logs. The document terms are terms in the documents selected by a system responsive to search engine queries. These documents are identified in the query log(s) 210. The probabilistic correlations indicate the conditional probability of the appearance of a document term when a query term is used. For instance, if a document has been selected by a user more than once for a query consisting of the same terms, then the document is correlated to the terms in the query.
- The probabilistic correlations are based on an assumption that each document that is returned to a user in response to a query and that is also selected or “clicked” by the user will be “relevant” to the particular query. Although such user selection information is not as accurate as an explicit relevance indication from a user, as often used in traditional information retrieval, each document returned in response to a query submission that is selected by a user does suggest implied relevance of that document to the user's information need. Even if some erroneous user document clicks/selections are made, users do not typically select documents presented in response to a search engine query at random.
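- The query session format and the click statistics gathered from it can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `QuerySession` class, the `click_counts` helper, and the sample terms and document IDs are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class QuerySession:
    """Hypothetical container for one logged session:
    <keyword term(s)> [document identifier(s)]."""
    terms: Tuple[str, ...]
    doc_ids: Tuple[str, ...]


def click_counts(log: Iterable[QuerySession]):
    """Count sessions per query term and per (query term, document ID) pair."""
    term_counts = Counter()
    pair_counts = Counter()
    for session in log:
        for term in set(session.terms):
            term_counts[term] += 1
            for doc_id in set(session.doc_ids):
                pair_counts[(term, doc_id)] += 1
    return term_counts, pair_counts


# Made-up log entries, for illustration only.
log = [
    QuerySession(("java", "island"), ("doc_travel",)),
    QuerySession(("java",), ("doc_lang", "doc_travel")),
    QuerySession(("java", "compiler"), ("doc_lang",)),
]
term_counts, pair_counts = click_counts(log)
```

Counting sessions rather than raw clicks keeps a single session with repeated clicks on one document from dominating the statistics; that choice is an assumption of this sketch.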
- When the query expansion module 202 receives a newly submitted query 214, each keyword term that is not a stop term is extracted. Stop terms are those terms that appear frequently in documents and do not provide any ability to discriminate one document from another. Such terms include, for example, “the”, “this” or “and”. For every extracted term, the query expansion module generates selected document terms 216. Selected document terms represent corresponding ones of the terms selected from the probabilistic correlation database 212. The query expansion module then determines the joint probability 206 for every selected document term as a function of at least the combined probabilities from all terms of a newly submitted query 214, corresponding ones of the session queries 208, and/or from all conditional probabilities of document term given query term. The joint probabilities are ranked.
- The query expansion module selects one or more expansion terms 206 from the top-ranked selected document terms. A top-ranked expansion term is a term with a higher calculated joint probability than the joint probability corresponding to another term. The selected expansion terms are added to the terms of the newly submitted query. In this manner, high-quality expansion terms are added to the terms of a newly submitted query (i.e., the query is expanded) before it is sent to the search engine (see the search engine of “other modules” 204).
- In light of the above, the query log(s) 210 are a very valuable resource containing abundant implied relevance feedback data. Such implied feedback is used to overcome the problems that are often endemic in traditional relevance feedback techniques, such as the lack of sufficient explicit relevance judgment information from a user, the mismatch problems of conventional global analysis techniques, or the problems associated with top-ranked documents that are not of informational interest to the user or even relevant to the submitted query.
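- The extraction and selection steps described above can be sketched as follows. The function names, the whitespace tokenizer, the cutoff `k`, and the sample scores are assumptions made for illustration; the stop-term set contains only the three examples given in the text, and excluding terms already present in the query is a choice of this sketch.

```python
STOP_TERMS = frozenset({"the", "this", "and"})  # only the examples named in the text


def extract_terms(query, stop_terms=STOP_TERMS):
    """Split a query on whitespace and drop stop terms (assumed tokenizer)."""
    return [t for t in query.lower().split() if t not in stop_terms]


def select_expansion_terms(scores, query_terms, k=2):
    """Return the k highest-scoring candidate terms not already in the query."""
    candidates = {t: s for t, s in scores.items() if t not in query_terms}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(candidates, key=lambda t: (-candidates[t], t))
    return ranked[:k]


terms = extract_terms("the island and this volcano")
# Hypothetical joint scores for candidate document terms.
scores = {"java": 0.9, "island": 0.8, "lava": 0.5, "beach": 0.1}
expansion = select_expansion_terms(scores, terms, k=2)
expanded_query = terms + expansion
```

The expanded term list, rather than the original query, is what would then be submitted to the search engine.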
- FIG. 3 shows that correlations between terms of newly submitted queries and document terms can be established via information maintained in query sessions from a query log. Paths between respective terms 302 of a newly submitted query 214 (FIG. 2) and particular terms from a selected document (identified via a query session 208 of FIG. 2) are represented in FIG. 3 via directional arrows. In particular, such paths map new query terms 302 via previous query session terms 306 and selected documents 308 to selected document terms 216. If there is at least one path between a newly submitted term and one document term, the query expansion module 202 establishes a probabilistic link between the two terms.
- FIG. 4 shows exemplary probabilistic correlations between query terms 302 (newly submitted or logged session terms) and document terms 310. The degree of correlation between a query term and a document term is the conditional probability of a document term's appearance on condition that the query term is used. The query expansion module calculates such degrees of correlation between respective ones of the terms.
- S is a set of documents. A document is added into the set if and only if its document identification (ID) and the query term $w_i^{(q)}$ co-occur in at least one query session (that is, at least one user who used the query term $w_i^{(q)}$ selected/clicked on the document).
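- The membership rule for the set S can be sketched directly. The session layout (a pair of term tuple and document-ID tuple), the function name `clicked_document_set`, and the sample data are hypothetical.

```python
def clicked_document_set(sessions, query_term):
    """Return S: every document ID that co-occurs with query_term
    in at least one logged session."""
    return {
        doc_id
        for terms, doc_ids in sessions
        if query_term in terms
        for doc_id in doc_ids
    }


# Each session is (query terms, clicked document IDs); values are made up.
sessions = [
    (("java", "island"), ("doc_travel",)),
    (("java",), ("doc_lang", "doc_travel")),
    (("python",), ("doc_snake",)),
]
S_java = clicked_document_set(sessions, "java")
```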
- $P(D_k \mid w_i^{(q)})$ is the conditional probability of the document $D_k$ being clicked given that $w_i^{(q)}$ appears in the user query. $P(w_j^{(d)} \mid D_k)$ is the conditional probability of occurrence of $w_j^{(d)}$ if the document $D_k$ is selected. It is noted that
- $P(w_j^{(d)} \mid w_i^{(q)}, D_k) = P(w_j^{(d)} \mid D_k)$.
- This is because the document $D_k$ separates the query term $w_i^{(q)}$ from the document term $w_j^{(d)}$.
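- The equation for the degree of correlation is not reproduced in this text. One reading consistent with the definitions of S, $P(D_k \mid w_i^{(q)})$ and $P(w_j^{(d)} \mid D_k)$ given here is a marginalization over the clicked documents. The following is offered as a reconstruction under that assumption, not as the original equation; it treats the documents in S as the only routes from the query term to the document term.

```latex
\begin{aligned}
P\big(w_j^{(d)} \mid w_i^{(q)}\big)
  &= \sum_{D_k \in S} P\big(w_j^{(d)}, D_k \mid w_i^{(q)}\big) \\
  &= \sum_{D_k \in S} P\big(w_j^{(d)} \mid w_i^{(q)}, D_k\big)\, P\big(D_k \mid w_i^{(q)}\big) \\
  &= \sum_{D_k \in S} P\big(w_j^{(d)} \mid D_k\big)\, P\big(D_k \mid w_i^{(q)}\big)
\end{aligned}
```

The last step uses the noted identity $P(w_j^{(d)} \mid w_i^{(q)}, D_k) = P(w_j^{(d)} \mid D_k)$.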
-
- Where $f_{ik}^{(q)}(w_i^{(q)}, D_k)$ is the number of query sessions in which the query word $w_i^{(q)}$ and the document $D_k$ appear together, $f^{(q)}(w_i^{(q)})$ is the number of query sessions that contain the term $w_i^{(q)}$, and $W_{jk}^{(d)}$ is the normalized weight of the term $w_j^{(d)}$ in the document $D_k$, that is, its weight divided by the maximum value of term weights in the document $D_k$.
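- The equations that these definitions belong to are not reproduced in this text. A plausible reading, stated here as an assumption, is that the two conditional probabilities are estimated from session counts and normalized term weights. The symbol $\tilde{W}_{jk}^{(d)}$ for the unnormalized weight is introduced only for this sketch.

```latex
P\big(D_k \mid w_i^{(q)}\big) \approx
  \frac{f_{ik}^{(q)}\big(w_i^{(q)}, D_k\big)}{f^{(q)}\big(w_i^{(q)}\big)},
\qquad
P\big(w_j^{(d)} \mid D_k\big) \approx W_{jk}^{(d)}
  = \frac{\tilde{W}_{jk}^{(d)}}{\max_{t}\, \tilde{W}_{tk}^{(d)}}
```

Substituting both estimates into a sum over the documents in S gives

```latex
P\big(w_j^{(d)} \mid w_i^{(q)}\big) \approx
  \sum_{D_k \in S} W_{jk}^{(d)} \cdot
  \frac{f_{ik}^{(q)}\big(w_i^{(q)}, D_k\big)}{f^{(q)}\big(w_i^{(q)}\big)}
```

which may correspond to the formula (7) that the text refers to, although that cannot be confirmed from the surviving text.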
-
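- A minimal sketch of computing the query-term to document-term correlations from session counts and term weights follows. It assumes the estimate is a sum, over clicked documents, of the normalized term weight times the fraction of the query term's sessions that clicked the document. The function name, data layout, and sample values are hypothetical, and the raw term weights could come from any weighting scheme.

```python
from collections import Counter, defaultdict


def term_correlations(sessions, doc_weights):
    """Estimate P(doc_term | query_term) for every linked pair.

    sessions:    iterable of (query terms, clicked document IDs)
    doc_weights: {doc_id: {doc_term: raw_weight}} (weighting scheme assumed)
    """
    f_term = Counter()  # sessions containing each query term
    f_pair = Counter()  # sessions containing the term and a click on the doc
    for terms, doc_ids in sessions:
        for w_q in set(terms):
            f_term[w_q] += 1
            for doc_id in set(doc_ids):
                f_pair[(w_q, doc_id)] += 1

    corr = defaultdict(dict)
    for (w_q, doc_id), f_ik in f_pair.items():
        weights = doc_weights.get(doc_id, {})
        max_w = max(weights.values(), default=0.0)
        if max_w <= 0:
            continue  # no usable term weights for this document
        p_doc = f_ik / f_term[w_q]
        for w_d, raw in weights.items():
            # Normalize by the largest term weight in the document.
            corr[w_q][w_d] = corr[w_q].get(w_d, 0.0) + (raw / max_w) * p_doc
    return corr


sessions = [
    (("java",), ("d1",)),
    (("java",), ("d2",)),
    (("java", "island"), ("d2",)),
]
doc_weights = {
    "d1": {"compiler": 2.0, "code": 1.0},
    "d2": {"island": 4.0, "coffee": 2.0},
}
corr = term_correlations(sessions, doc_weights)
```

Because these correlations depend only on the log and the documents, they can be computed ahead of time and looked up when a new query arrives.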
- In one implementation, probabilistic correlations between query terms and document terms from the query logs 210 are pre-computed offline prior to evaluating terms of a newly submitted query 214 for expansion. When a new query arrives, the terms in the query (with stop words removed) are extracted. Then, for every extracted term, all correlated document terms are selected based on the conditional probability in the formula (7). By combining the probabilities of all query terms, the joint probability for every document term is obtained according to the following:
- $P(w_j^{(d)} \mid Q) = \ln\Big(\prod_{w_i^{(q)} \in Q} \big(P(w_j^{(d)} \mid w_i^{(q)}) + 1\big)\Big)$ (8).
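- Formula (8) can be evaluated as in the following sketch, where Q is the list of terms extracted from the new query. The function name `joint_score`, the nested-dictionary layout of the correlations, and the sample values are assumptions. The code uses the identity ln Π(p + 1) = Σ ln(1 + p), which avoids forming a large product.

```python
import math


def joint_score(doc_term, query_terms, corr):
    """ln( product over query terms of (P(doc_term | query_term) + 1) ).

    corr maps query_term -> {doc_term: conditional probability}; a missing
    pair contributes probability 0, i.e. a factor of 1.
    """
    return sum(
        math.log1p(corr.get(w_q, {}).get(doc_term, 0.0)) for w_q in query_terms
    )


# Hypothetical pre-computed correlations.
corr = {
    "java": {"island": 0.5, "compiler": 0.25},
    "travel": {"island": 0.5},
}
Q = ["java", "travel"]
scores = {t: joint_score(t, Q, corr) for t in ("island", "compiler")}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Adding 1 inside the product means a document term that is unrelated to one query term is not zeroed out, while a term correlated with every query term still scores highest.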
- Q stands for the terms extracted from the newly submitted query 214. Thus, for every query, we get a list of candidate expansion terms as well as the conditional probabilities between each term and the query. The top-ranked terms are then selected as expansion terms and added to the terms of the newly submitted query for submission to the search engine.
- An Exemplary Procedure
- FIG. 5 shows an exemplary procedure 500 for query expansion. The operations of this procedure are described in reference to the program module and data components of FIGS. 1 and 2. At block 502, the query expansion module 202 (FIG. 2) analyzes one or more query logs 210 (FIG. 2) to generate a database of probabilistic correlations 212 (FIG. 2) between query terms and document terms from logged session(s) 208 (FIG. 2). These correlations represent the conditional probability of a document term's appearance on condition that a query term was used in a session query. At block 504, responsive to receiving a newly submitted query 214 from a client computing device (e.g., a remote device 182 of FIG. 1), the query expansion module extracts every term that is not a stop term from the newly submitted query.
- At block 506, the query expansion module 202 (FIG. 2) selects one or more document terms from the probabilistic correlation database 212 (FIG. 2). Each selected document term 216 (FIG. 2) has at least one correlation with a particular one of the extracted query terms. At block 508, the query expansion module combines the probabilities from all terms of a newly submitted query 214 (FIG. 2) and session queries 208 (FIG. 2) to obtain the joint probability 206 (FIG. 2) for every selected document term 216 (FIG. 2). At block 510, the query expansion module compares the joint probabilities to select top-ranked expansion term(s) 206 for adding to the terms of the newly submitted query. A top-ranked expansion term is a term with a higher calculated joint probability than the joint probability corresponding to another term. In this manner, high-quality expansion terms are identified for adding to the terms of the newly submitted query.
- At block 512, the query expansion module submits terms of the newly submitted query 214 (FIG. 2) and the expansion terms 206 (FIG. 2) to a search engine (see other modules 204 of FIG. 2) for a list of relevant documents (see the relevant documents portion of other data 220 of FIG. 2). The list of relevant documents is communicated to the client computing device (e.g., the remote device 182 of FIG. 1 that communicated the newly submitted query). At block 514, the query expansion module, responsive to an indication of user selection of a document from the list of relevant documents, generates a new query session 208 (FIG. 2), or updates a previous one, in a query log 210 (FIG. 2).
- In one implementation, a Web browser application executing at a client computing device (e.g., the remote device 182 of FIG. 1) automatically communicates the identity of a document to the computing device 130 (FIG. 1) hosting the query expansion module 202 (FIG. 2), wherein the document was selected by the user from the relevant document list provided to the client device.
- The described systems and methods expand queries. Although the systems and methods have been described in language specific to structural features and methodological operations, the subject matter as defined in the appended claims is not necessarily limited to the specific features or operations described. Rather, the specific features and operations are disclosed as exemplary forms of implementing the claimed subject matter.
Claims (21)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/365,294 US7287025B2 (en) | 2003-02-12 | 2003-02-12 | Systems and methods for query expansion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/365,294 US7287025B2 (en) | 2003-02-12 | 2003-02-12 | Systems and methods for query expansion |
Publications (2)
Publication Number | Publication Date |
---|---|
US20040158560A1 (en) | 2004-08-12 |
US7287025B2 (en) | 2007-10-23 |
Family
ID=32824609
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/365,294 Expired - Fee Related US7287025B2 (en) | 2003-02-12 | 2003-02-12 | Systems and methods for query expansion |
Country Status (1)
Country | Link |
---|---|
US (1) | US7287025B2 (en) |
Cited By (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040243561A1 (en) * | 2003-05-30 | 2004-12-02 | Cody William F. | Text explanation for on-line analytic processing events |
US20050027691A1 (en) * | 2003-07-28 | 2005-02-03 | Sergey Brin | System and method for providing a user interface with search query broadening |
US20050125440A1 (en) * | 2003-12-05 | 2005-06-09 | Roy Hirst | Systems and methods for improving information discovery |
US20050289102A1 (en) * | 2004-06-29 | 2005-12-29 | Microsoft Corporation | Ranking database query results |
US20060036593A1 (en) * | 2004-08-13 | 2006-02-16 | Dean Jeffrey A | Multi-stage query processing system and method for use with tokenspace repository |
US20060190439A1 (en) * | 2005-01-28 | 2006-08-24 | Chowdhury Abdur R | Web query classification |
US20060230036A1 (en) * | 2005-03-31 | 2006-10-12 | Kei Tateno | Information processing apparatus, information processing method and program |
WO2006121702A1 (en) * | 2005-05-04 | 2006-11-16 | Google, Inc. | Suggesting and refining user input based on original user input |
US20070220023A1 (en) * | 2004-08-13 | 2007-09-20 | Jeffrey Dean | Document compression system and method for use with tokenspace repository |
US20070299836A1 (en) * | 2006-06-23 | 2007-12-27 | Xue Qiao Hou | Database query language transformation method, transformation apparatus and database query system |
EP1952287A2 (en) * | 2005-11-22 | 2008-08-06 | Google, Inc. | Inferring search category synonyms from user logs |
US20080189262A1 (en) * | 2007-02-01 | 2008-08-07 | Yahoo! Inc. | Word pluralization handling in query for web search |
US20090055380A1 (en) * | 2007-08-22 | 2009-02-26 | Fuchun Peng | Predictive Stemming for Web Search with Statistical Machine Translation Models |
US7685195B2 (en) * | 2005-03-24 | 2010-03-23 | Sas Institute Inc. | Systems and methods for analyzing web site search terms |
US20100280989A1 (en) * | 2009-04-29 | 2010-11-04 | Pankaj Mehra | Ontology creation by reference to a knowledge corpus |
US20100287034A1 (en) * | 2009-05-08 | 2010-11-11 | David Carter Pope | Computer-Implemented Systems and Methods for Determining Future Profitability |
WO2011045699A1 (en) * | 2009-10-14 | 2011-04-21 | Koninklijke Philips Electronics N.V. | Method and system for facilitating data entry for an information system |
WO2011079414A1 (en) * | 2009-12-30 | 2011-07-07 | Google Inc. | Custom search query suggestion tools |
US8019748B1 (en) | 2007-11-14 | 2011-09-13 | Google Inc. | Web search refinement |
US8055669B1 (en) * | 2003-03-03 | 2011-11-08 | Google Inc. | Search queries improved based on query semantic information |
US20110314001A1 (en) * | 2010-06-18 | 2011-12-22 | Microsoft Corporation | Performing query expansion based upon statistical analysis of structured data |
US8380705B2 (en) | 2003-09-12 | 2013-02-19 | Google Inc. | Methods and systems for improving a search ranking using related queries |
US8396865B1 (en) | 2008-12-10 | 2013-03-12 | Google Inc. | Sharing search engine relevance data between corpora |
US8498974B1 (en) | 2009-08-31 | 2013-07-30 | Google Inc. | Refining search results |
US20130232147A1 (en) * | 2010-10-29 | 2013-09-05 | Pankaj Mehra | Generating a taxonomy from unstructured information |
US8572096B1 (en) * | 2011-08-05 | 2013-10-29 | Google Inc. | Selecting keywords using co-visitation information |
CN103425727A (en) * | 2012-05-14 | 2013-12-04 | 国际商业机器公司 | Contextual voice query dilation |
US8615514B1 (en) | 2010-02-03 | 2013-12-24 | Google Inc. | Evaluating website properties by partitioning user feedback |
US8661029B1 (en) | 2006-11-02 | 2014-02-25 | Google Inc. | Modifying search result ranking based on implicit user feedback |
US8694511B1 (en) | 2007-08-20 | 2014-04-08 | Google Inc. | Modifying search result ranking based on populations |
US8694374B1 (en) | 2007-03-14 | 2014-04-08 | Google Inc. | Detecting click spam |
US8725756B1 (en) | 2007-11-12 | 2014-05-13 | Google Inc. | Session-based query suggestions |
US8832083B1 (en) | 2010-07-23 | 2014-09-09 | Google Inc. | Combining user feedback |
US8874555B1 (en) | 2009-11-20 | 2014-10-28 | Google Inc. | Modifying scoring data based on historical changes |
US8909655B1 (en) | 2007-10-11 | 2014-12-09 | Google Inc. | Time based ranking |
US8924379B1 (en) | 2010-03-05 | 2014-12-30 | Google Inc. | Temporal-based score adjustments |
US8938463B1 (en) | 2007-03-12 | 2015-01-20 | Google Inc. | Modifying search result ranking based on implicit user feedback and a model of presentation bias |
US8959093B1 (en) | 2010-03-15 | 2015-02-17 | Google Inc. | Ranking search results based on anchors |
US8972391B1 (en) | 2009-10-02 | 2015-03-03 | Google Inc. | Recent interest based relevance scoring |
US8972394B1 (en) | 2009-07-20 | 2015-03-03 | Google Inc. | Generating a related set of documents for an initial set of documents |
US9002867B1 (en) | 2010-12-30 | 2015-04-07 | Google Inc. | Modifying ranking data based on document changes |
US9009146B1 (en) | 2009-04-08 | 2015-04-14 | Google Inc. | Ranking search results based on similar queries |
CN104750819A (en) * | 2015-03-31 | 2015-07-01 | 大连理工大学 | A Biomedical Literature Retrieval Method and System Based on Word Grouping Algorithm |
US9092510B1 (en) | 2007-04-30 | 2015-07-28 | Google Inc. | Modifying search result ranking based on a temporal element of user feedback |
US20150213041A1 (en) * | 2013-03-15 | 2015-07-30 | Google Inc. | Search suggestion rankings |
US9110975B1 (en) | 2006-11-02 | 2015-08-18 | Google Inc. | Search result inputs using variant generalized queries |
US9183499B1 (en) | 2013-04-19 | 2015-11-10 | Google Inc. | Evaluating quality based on neighbor features |
US9317550B2 (en) | 2012-07-20 | 2016-04-19 | Alibaba Group Holding Limited | Query expansion |
CN105956010A (en) * | 2016-04-20 | 2016-09-21 | 浙江大学 | Distributed information retrieval set selection method based on distributed representation and local ordering |
US9529868B1 (en) | 2011-08-25 | 2016-12-27 | Infotech International Llc | Document processing system and method |
CN106547864A (en) * | 2016-10-24 | 2017-03-29 | 湖南科技大学 | A kind of Personalized search based on query expansion |
US9623119B1 (en) | 2010-06-29 | 2017-04-18 | Google Inc. | Accentuating search results |
US9633012B1 (en) | 2011-08-25 | 2017-04-25 | Infotech International Llc | Construction permit processing system and method |
US9785638B1 (en) | 2011-08-25 | 2017-10-10 | Infotech International Llc | Document display system and method |
US20190179946A1 (en) * | 2017-12-13 | 2019-06-13 | Microsoft Technology Licensing, Llc | Contextual Data Transformation of Image Content |
US11182435B2 (en) * | 2016-11-25 | 2021-11-23 | Nippon Telegraph And Telephone Corporation | Model generation device, text search device, model generation method, text search method, data structure, and program |
US11475501B2 (en) * | 2009-04-30 | 2022-10-18 | Paypal, Inc. | Recommendations based on branding |
US20240135098A1 (en) * | 2013-07-12 | 2024-04-25 | Microsoft Technology Licensing, Llc | Interactive concept editing in computer-human interactive learning |
US12014387B1 (en) | 2021-07-23 | 2024-06-18 | Apttus Corporation | System, method, and computer program for providing a pricing platform for performing different types of pricing calculations for different customers |
US12067037B1 (en) * | 2022-02-28 | 2024-08-20 | Apttus Corporation | System, method, and computer program for performing natural language searches for documents in a database using alternate search suggestions |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8630984B1 (en) | 2003-01-17 | 2014-01-14 | Renew Data Corp. | System and method for data extraction from email files |
US8065277B1 (en) | 2003-01-17 | 2011-11-22 | Daniel John Gardner | System and method for a data extraction and backup database |
US8943024B1 (en) | 2003-01-17 | 2015-01-27 | Daniel John Gardner | System and method for data de-duplication |
US8375008B1 (en) | 2003-01-17 | 2013-02-12 | Robert Gomes | Method and system for enterprise-wide retention of digital or electronic data |
US7844589B2 (en) * | 2003-11-18 | 2010-11-30 | Yahoo! Inc. | Method and apparatus for performing a search |
US7840547B1 (en) * | 2004-03-31 | 2010-11-23 | Google Inc. | Methods and systems for efficient query rewriting |
US20080077570A1 (en) * | 2004-10-25 | 2008-03-27 | Infovell, Inc. | Full Text Query and Search Systems and Method of Use |
US20060212441A1 (en) * | 2004-10-25 | 2006-09-21 | Yuanhua Tang | Full text query and search systems and methods of use |
US8069151B1 (en) | 2004-12-08 | 2011-11-29 | Chris Crafford | System and method for detecting incongruous or incorrect media in a data recovery process |
US8527468B1 (en) | 2005-02-08 | 2013-09-03 | Renew Data Corp. | System and method for management of retention periods for content in a computing system |
US8150846B2 (en) | 2005-02-17 | 2012-04-03 | Microsoft Corporation | Content searching and configuration of search results |
CN101366024B (en) * | 2005-05-16 | 2014-07-30 | 电子湾有限公司 | Method and system for processing data searching request |
US8195683B2 (en) | 2006-02-28 | 2012-06-05 | Ebay Inc. | Expansion of database search queries |
US20070271255A1 (en) * | 2006-05-17 | 2007-11-22 | Nicky Pappo | Reverse search-engine |
US8150827B2 (en) * | 2006-06-07 | 2012-04-03 | Renew Data Corp. | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
US7747607B2 (en) * | 2006-09-21 | 2010-06-29 | Yahoo! Inc. | Determining logically-related sub-strings of a string |
US7636712B2 (en) * | 2006-11-14 | 2009-12-22 | Microsoft Corporation | Batching document identifiers for result trimming |
EP2122506A4 (en) * | 2007-01-10 | 2011-11-30 | Sysomos Inc | Method and system for information discovery and text analysis |
US20080208820A1 (en) * | 2007-02-28 | 2008-08-28 | Psydex Corporation | Systems and methods for performing semantic analysis of information over time and space |
CN101842787A (en) * | 2007-09-14 | 2010-09-22 | 谷歌公司 | Suggesting alterntive queries in query results |
US7814108B2 (en) * | 2007-12-21 | 2010-10-12 | Microsoft Corporation | Search engine platform |
US8615490B1 (en) | 2008-01-31 | 2013-12-24 | Renew Data Corp. | Method and system for restoring information from backup storage media |
US20100332491A1 (en) * | 2009-06-25 | 2010-12-30 | Yahoo!, Inc., a Delaware corporation | Method and system for utilizing user selection data to determine relevance of a web document for a search query |
US9430521B2 (en) * | 2009-09-30 | 2016-08-30 | Microsoft Technology Licensing, Llc | Query expansion through searching content identifiers |
WO2011075610A1 (en) | 2009-12-16 | 2011-06-23 | Renew Data Corp. | System and method for creating a de-duplicated data set |
US8161073B2 (en) | 2010-05-05 | 2012-04-17 | Holovisions, LLC | Context-driven search |
US8583669B2 (en) * | 2011-05-30 | 2013-11-12 | Google Inc. | Query suggestion for efficient legal E-discovery |
US9619046B2 (en) | 2013-02-27 | 2017-04-11 | Facebook, Inc. | Determining phrase objects based on received user input context information |
US9582543B2 (en) | 2014-04-24 | 2017-02-28 | International Business Machines Corporation | Temporal proximity query expansion |
US9996527B1 (en) | 2017-03-30 | 2018-06-12 | International Business Machines Corporation | Supporting interactive text mining process with natural language and dialog |
US10678822B2 (en) | 2018-06-29 | 2020-06-09 | International Business Machines Corporation | Query expansion using a graph of question and answer vocabulary |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787422A (en) * | 1996-01-11 | 1998-07-28 | Xerox Corporation | Method and apparatus for information access employing overlapping clusters |
US5864845A (en) * | 1996-06-28 | 1999-01-26 | Siemens Corporate Research, Inc. | Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy |
US6128613A (en) * | 1997-06-26 | 2000-10-03 | The Chinese University Of Hong Kong | Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words |
US6169986B1 (en) * | 1998-06-15 | 2001-01-02 | Amazon.Com, Inc. | System and method for refining search queries |
US20020099701A1 (en) * | 2001-01-24 | 2002-07-25 | Andreas Rippich | Method and apparatus for displaying database search results |
US20020133726A1 (en) * | 2001-01-18 | 2002-09-19 | Noriaki Kawamae | Information retrieval support method and information retrieval support system |
US20030004968A1 (en) * | 2000-08-28 | 2003-01-02 | Emotion Inc. | Method and apparatus for digital media management, retrieval, and collaboration |
US6701309B1 (en) * | 2000-04-21 | 2004-03-02 | Lycos, Inc. | Method and system for collecting related queries |
US6772150B1 (en) * | 1999-12-10 | 2004-08-03 | Amazon.Com, Inc. | Search query refinement using related search phrases |
US20040220925A1 (en) * | 2001-11-30 | 2004-11-04 | Microsoft Corporation | Media agent |
US20040243568A1 (en) * | 2000-08-24 | 2004-12-02 | Hai-Feng Wang | Search engine with natural language-based robust parsing of user query and relevance feedback learning |
US6856957B1 (en) * | 2001-02-07 | 2005-02-15 | Nuance Communications | Query expansion and weighting based on results of automatic speech recognition |
US6886010B2 (en) * | 2002-09-30 | 2005-04-26 | The United States Of America As Represented By The Secretary Of The Navy | Method for data and text mining and literature-based discovery |
US6925433B2 (en) * | 2001-05-09 | 2005-08-02 | International Business Machines Corporation | System and method for context-dependent probabilistic modeling of words and documents |
- 2003-02-12: US application US10/365,294, patent US7287025B2 (en), status: not active (Expired - Fee Related)
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787422A (en) * | 1996-01-11 | 1998-07-28 | Xerox Corporation | Method and apparatus for information access employing overlapping clusters |
US5864845A (en) * | 1996-06-28 | 1999-01-26 | Siemens Corporate Research, Inc. | Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy |
US6128613A (en) * | 1997-06-26 | 2000-10-03 | The Chinese University Of Hong Kong | Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words |
US6169986B1 (en) * | 1998-06-15 | 2001-01-02 | Amazon.Com, Inc. | System and method for refining search queries |
US6772150B1 (en) * | 1999-12-10 | 2004-08-03 | Amazon.Com, Inc. | Search query refinement using related search phrases |
US6701309B1 (en) * | 2000-04-21 | 2004-03-02 | Lycos, Inc. | Method and system for collecting related queries |
US20040243568A1 (en) * | 2000-08-24 | 2004-12-02 | Hai-Feng Wang | Search engine with natural language-based robust parsing of user query and relevance feedback learning |
US20030004968A1 (en) * | 2000-08-28 | 2003-01-02 | Emotion Inc. | Method and apparatus for digital media management, retrieval, and collaboration |
US20020133726A1 (en) * | 2001-01-18 | 2002-09-19 | Noriaki Kawamae | Information retrieval support method and information retrieval support system |
US20020099701A1 (en) * | 2001-01-24 | 2002-07-25 | Andreas Rippich | Method and apparatus for displaying database search results |
US6856957B1 (en) * | 2001-02-07 | 2005-02-15 | Nuance Communications | Query expansion and weighting based on results of automatic speech recognition |
US6925433B2 (en) * | 2001-05-09 | 2005-08-02 | International Business Machines Corporation | System and method for context-dependent probabilistic modeling of words and documents |
US20040220925A1 (en) * | 2001-11-30 | 2004-11-04 | Microsoft Corporation | Media agent |
US6886010B2 (en) * | 2002-09-30 | 2005-04-26 | The United States Of America As Represented By The Secretary Of The Navy | Method for data and text mining and literature-based discovery |
Cited By (112)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8577907B1 (en) | 2003-03-03 | 2013-11-05 | Google Inc. | Search queries improved based on query semantic information |
US8055669B1 (en) * | 2003-03-03 | 2011-11-08 | Google Inc. | Search queries improved based on query semantic information |
US20070282830A1 (en) * | 2003-05-30 | 2007-12-06 | Cody William F | Text explanation for on-line analytic processing events |
US7383257B2 (en) * | 2003-05-30 | 2008-06-03 | International Business Machines Corporation | Text explanation for on-line analytic processing events |
US20040243561A1 (en) * | 2003-05-30 | 2004-12-02 | Cody William F. | Text explanation for on-line analytic processing events |
US7822704B2 (en) | 2003-05-30 | 2010-10-26 | International Business Machines Corporation | Text explanation for on-line analytic processing events |
US20050027691A1 (en) * | 2003-07-28 | 2005-02-03 | Sergey Brin | System and method for providing a user interface with search query broadening |
US8856163B2 (en) * | 2003-07-28 | 2014-10-07 | Google Inc. | System and method for providing a user interface with search query broadening |
US8380705B2 (en) | 2003-09-12 | 2013-02-19 | Google Inc. | Methods and systems for improving a search ranking using related queries |
US8452758B2 (en) | 2003-09-12 | 2013-05-28 | Google Inc. | Methods and systems for improving a search ranking using related queries |
US7472118B2 (en) * | 2003-12-05 | 2008-12-30 | Microsoft Corporation | Systems and methods for improving information discovery |
US7987170B2 (en) | 2003-12-05 | 2011-07-26 | Microsoft Corporation | Systems and methods for improving information discovery |
US20050125440A1 (en) * | 2003-12-05 | 2005-06-09 | Roy Hirst | Systems and methods for improving information discovery |
US20050289102A1 (en) * | 2004-06-29 | 2005-12-29 | Microsoft Corporation | Ranking database query results |
US7383262B2 (en) * | 2004-06-29 | 2008-06-03 | Microsoft Corporation | Ranking database query results using probabilistic models from information retrieval |
US8321445B2 (en) | 2004-08-13 | 2012-11-27 | Google Inc. | Generating content snippets using a tokenspace repository |
US7917480B2 (en) | 2004-08-13 | 2011-03-29 | Google Inc. | Document compression system and method for use with tokenspace repository |
US20060036593A1 (en) * | 2004-08-13 | 2006-02-16 | Dean Jeffrey A | Multi-stage query processing system and method for use with tokenspace repository |
US8407239B2 (en) | 2004-08-13 | 2013-03-26 | Google Inc. | Multi-stage query processing system and method for use with tokenspace repository |
US20110153577A1 (en) * | 2004-08-13 | 2011-06-23 | Jeffrey Dean | Query Processing System and Method for Use with Tokenspace Repository |
US9619565B1 (en) | 2004-08-13 | 2017-04-11 | Google Inc. | Generating content snippets using a tokenspace repository |
US9098501B2 (en) | 2004-08-13 | 2015-08-04 | Google Inc. | Generating content snippets using a tokenspace repository |
US20070220023A1 (en) * | 2004-08-13 | 2007-09-20 | Jeffrey Dean | Document compression system and method for use with tokenspace repository |
US9146967B2 (en) | 2004-08-13 | 2015-09-29 | Google Inc. | Multi-stage query processing system and method for use with tokenspace repository |
WO2006020595A1 (en) * | 2004-08-13 | 2006-02-23 | Google, Inc. | Multi-stage query processing system and method for use with tokenspace repository |
US7779009B2 (en) * | 2005-01-28 | 2010-08-17 | Aol Inc. | Web query classification |
US20060190439A1 (en) * | 2005-01-28 | 2006-08-24 | Chowdhury Abdur R | Web query classification |
US7685195B2 (en) * | 2005-03-24 | 2010-03-23 | Sas Institute Inc. | Systems and methods for analyzing web site search terms |
US20060230036A1 (en) * | 2005-03-31 | 2006-10-12 | Kei Tateno | Information processing apparatus, information processing method and program |
WO2006121702A1 (en) * | 2005-05-04 | 2006-11-16 | Google, Inc. | Suggesting and refining user input based on original user input |
US9020924B2 (en) | 2005-05-04 | 2015-04-28 | Google Inc. | Suggesting and refining user input based on original user input |
US9411906B2 (en) | 2005-05-04 | 2016-08-09 | Google Inc. | Suggesting and refining user input based on original user input |
US8438142B2 (en) | 2005-05-04 | 2013-05-07 | Google Inc. | Suggesting and refining user input based on original user input |
US8156102B2 (en) | 2005-11-22 | 2012-04-10 | Google Inc. | Inferring search category synonyms |
JP4809441B2 (en) * | 2005-11-22 | 2011-11-09 | グーグル インコーポレイテッド | Estimating search category synonyms from user logs |
EP1952287A2 (en) * | 2005-11-22 | 2008-08-06 | Google, Inc. | Inferring search category synonyms from user logs |
EP1952287A4 (en) * | 2005-11-22 | 2010-01-20 | Google Inc | Inferring search category synonyms from user logs |
US20100036822A1 (en) * | 2005-11-22 | 2010-02-11 | Google Inc. | Inferring search category synonyms from user logs |
US7668818B2 (en) * | 2006-06-23 | 2010-02-23 | International Business Machines Corporation | Database query language transformation method, transformation apparatus and database query system |
US20070299836A1 (en) * | 2006-06-23 | 2007-12-27 | Xue Qiao Hou | Database query language transformation method, transformation apparatus and database query system |
US9223827B2 (en) | 2006-06-23 | 2015-12-29 | International Business Machines Corporation | Database query language transformation method, transformation apparatus and database query system |
US20090094216A1 (en) * | 2006-06-23 | 2009-04-09 | International Business Machines Corporation | Database query language transformation method, transformation apparatus and database query system |
US9811566B1 (en) | 2006-11-02 | 2017-11-07 | Google Inc. | Modifying search result ranking based on implicit user feedback |
US11188544B1 (en) | 2006-11-02 | 2021-11-30 | Google Llc | Modifying search result ranking based on implicit user feedback |
US9110975B1 (en) | 2006-11-02 | 2015-08-18 | Google Inc. | Search result inputs using variant generalized queries |
US10229166B1 (en) | 2006-11-02 | 2019-03-12 | Google Llc | Modifying search result ranking based on implicit user feedback |
US11816114B1 (en) | 2006-11-02 | 2023-11-14 | Google Llc | Modifying search result ranking based on implicit user feedback |
US9235627B1 (en) | 2006-11-02 | 2016-01-12 | Google Inc. | Modifying search result ranking based on implicit user feedback |
US8661029B1 (en) | 2006-11-02 | 2014-02-25 | Google Inc. | Modifying search result ranking based on implicit user feedback |
US20080189262A1 (en) * | 2007-02-01 | 2008-08-07 | Yahoo! Inc. | Word pluralization handling in query for web search |
US7996410B2 (en) | 2007-02-01 | 2011-08-09 | Yahoo! Inc. | Word pluralization handling in query for web search |
US8938463B1 (en) | 2007-03-12 | 2015-01-20 | Google Inc. | Modifying search result ranking based on implicit user feedback and a model of presentation bias |
US8694374B1 (en) | 2007-03-14 | 2014-04-08 | Google Inc. | Detecting click spam |
US9092510B1 (en) | 2007-04-30 | 2015-07-28 | Google Inc. | Modifying search result ranking based on a temporal element of user feedback |
US8694511B1 (en) | 2007-08-20 | 2014-04-08 | Google Inc. | Modifying search result ranking based on populations |
US7788276B2 (en) * | 2007-08-22 | 2010-08-31 | Yahoo! Inc. | Predictive stemming for web search with statistical machine translation models |
US20090055380A1 (en) * | 2007-08-22 | 2009-02-26 | Fuchun Peng | Predictive Stemming for Web Search with Statistical Machine Translation Models |
US8909655B1 (en) | 2007-10-11 | 2014-12-09 | Google Inc. | Time based ranking |
US9152678B1 (en) | 2007-10-11 | 2015-10-06 | Google Inc. | Time based ranking |
US8725756B1 (en) | 2007-11-12 | 2014-05-13 | Google Inc. | Session-based query suggestions |
US9104764B1 (en) | 2007-11-12 | 2015-08-11 | Google Inc. | Session-based query suggestions |
US9858358B1 (en) | 2007-11-12 | 2018-01-02 | Google Inc. | Session-based query suggestions |
US8321403B1 (en) | 2007-11-14 | 2012-11-27 | Google Inc. | Web search refinement |
US8019748B1 (en) | 2007-11-14 | 2011-09-13 | Google Inc. | Web search refinement |
US8898152B1 (en) | 2008-12-10 | 2014-11-25 | Google Inc. | Sharing search engine relevance data |
US8396865B1 (en) | 2008-12-10 | 2013-03-12 | Google Inc. | Sharing search engine relevance data between corpora |
US9009146B1 (en) | 2009-04-08 | 2015-04-14 | Google Inc. | Ranking search results based on similar queries |
US20100280989A1 (en) * | 2009-04-29 | 2010-11-04 | Pankaj Mehra | Ontology creation by reference to a knowledge corpus |
US11475501B2 (en) * | 2009-04-30 | 2022-10-18 | Paypal, Inc. | Recommendations based on branding |
US20100287034A1 (en) * | 2009-05-08 | 2010-11-11 | David Carter Pope | Computer-Implemented Systems and Methods for Determining Future Profitability |
US8473331B2 (en) | 2009-05-08 | 2013-06-25 | Sas Institute Inc. | Computer-implemented systems and methods for determining future profitability |
US8185432B2 (en) | 2009-05-08 | 2012-05-22 | Sas Institute Inc. | Computer-implemented systems and methods for determining future profitability |
US8977612B1 (en) | 2009-07-20 | 2015-03-10 | Google Inc. | Generating a related set of documents for an initial set of documents |
US8972394B1 (en) | 2009-07-20 | 2015-03-03 | Google Inc. | Generating a related set of documents for an initial set of documents |
US9418104B1 (en) | 2009-08-31 | 2016-08-16 | Google Inc. | Refining search results |
US9697259B1 (en) | 2009-08-31 | 2017-07-04 | Google Inc. | Refining search results |
US8498974B1 (en) | 2009-08-31 | 2013-07-30 | Google Inc. | Refining search results |
US8738596B1 (en) | 2009-08-31 | 2014-05-27 | Google Inc. | Refining search results |
US8972391B1 (en) | 2009-10-02 | 2015-03-03 | Google Inc. | Recent interest based relevance scoring |
US9390143B2 (en) | 2009-10-02 | 2016-07-12 | Google Inc. | Recent interest based relevance scoring |
WO2011045699A1 (en) * | 2009-10-14 | 2011-04-21 | Koninklijke Philips Electronics N.V. | Method and system for facilitating data entry for an information system |
CN102549589A (en) * | 2009-10-14 | 2012-07-04 | 皇家飞利浦电子股份有限公司 | Method and system for facilitating data entry for an information system |
US8898153B1 (en) | 2009-11-20 | 2014-11-25 | Google Inc. | Modifying scoring data based on historical changes |
US8874555B1 (en) | 2009-11-20 | 2014-10-28 | Google Inc. | Modifying scoring data based on historical changes |
WO2011079414A1 (en) * | 2009-12-30 | 2011-07-07 | Google Inc. | Custom search query suggestion tools |
US8615514B1 (en) | 2010-02-03 | 2013-12-24 | Google Inc. | Evaluating website properties by partitioning user feedback |
US8924379B1 (en) | 2010-03-05 | 2014-12-30 | Google Inc. | Temporal-based score adjustments |
US8959093B1 (en) | 2010-03-15 | 2015-02-17 | Google Inc. | Ranking search results based on anchors |
US20110314001A1 (en) * | 2010-06-18 | 2011-12-22 | Microsoft Corporation | Performing query expansion based upon statistical analysis of structured data |
US9623119B1 (en) | 2010-06-29 | 2017-04-18 | Google Inc. | Accentuating search results |
US8832083B1 (en) | 2010-07-23 | 2014-09-09 | Google Inc. | Combining user feedback |
US20130232147A1 (en) * | 2010-10-29 | 2013-09-05 | Pankaj Mehra | Generating a taxonomy from unstructured information |
US9002867B1 (en) | 2010-12-30 | 2015-04-07 | Google Inc. | Modifying ranking data based on document changes |
US8572096B1 (en) * | 2011-08-05 | 2013-10-29 | Google Inc. | Selecting keywords using co-visitation information |
US10540401B1 (en) | 2011-08-25 | 2020-01-21 | Isqft, Inc. | Construction permit processing system and method |
US9633012B1 (en) | 2011-08-25 | 2017-04-25 | Infotech International Llc | Construction permit processing system and method |
US9785638B1 (en) | 2011-08-25 | 2017-10-10 | Infotech International Llc | Document display system and method |
US9529868B1 (en) | 2011-08-25 | 2016-12-27 | Infotech International Llc | Document processing system and method |
US9946715B1 (en) | 2011-08-25 | 2018-04-17 | Infotech International Llc | Document processing system and method |
CN103425727A (en) * | 2012-05-14 | 2013-12-04 | 国际商业机器公司 | Contextual voice query dilation |
US9317550B2 (en) | 2012-07-20 | 2016-04-19 | Alibaba Group Holding Limited | Query expansion |
US20150213041A1 (en) * | 2013-03-15 | 2015-07-30 | Google Inc. | Search suggestion rankings |
US9183499B1 (en) | 2013-04-19 | 2015-11-10 | Google Inc. | Evaluating quality based on neighbor features |
US20240135098A1 (en) * | 2013-07-12 | 2024-04-25 | Microsoft Technology Licensing, Llc | Interactive concept editing in computer-human interactive learning |
CN104750819A (en) * | 2015-03-31 | 2015-07-01 | 大连理工大学 | A Biomedical Literature Retrieval Method and System Based on Word Grouping Algorithm |
CN105956010A (en) * | 2016-04-20 | 2016-09-21 | 浙江大学 | Distributed information retrieval set selection method based on distributed representation and local ordering |
CN106547864A (en) * | 2016-10-24 | 2017-03-29 | 湖南科技大学 | A kind of Personalized search based on query expansion |
US11182435B2 (en) * | 2016-11-25 | 2021-11-23 | Nippon Telegraph And Telephone Corporation | Model generation device, text search device, model generation method, text search method, data structure, and program |
US20190179946A1 (en) * | 2017-12-13 | 2019-06-13 | Microsoft Technology Licensing, Llc | Contextual Data Transformation of Image Content |
US11030205B2 (en) * | 2017-12-13 | 2021-06-08 | Microsoft Technology Licensing, Llc | Contextual data transformation of image content |
US12014387B1 (en) | 2021-07-23 | 2024-06-18 | Apttus Corporation | System, method, and computer program for providing a pricing platform for performing different types of pricing calculations for different customers |
US12067037B1 (en) * | 2022-02-28 | 2024-08-20 | Apttus Corporation | System, method, and computer program for performing natural language searches for documents in a database using alternate search suggestions |
Also Published As
Publication number | Publication date |
---|---|
US7287025B2 (en) | 2007-10-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7287025B2 (en) | | Systems and methods for query expansion |
US7099860B1 (en) | | Image retrieval systems and methods with semantic and feature based relevance feedback |
KR101109236B1 (en) | | Suggest related terms for multi-meaning queries |
KR101721338B1 (en) | | Search engine and implementation method thereof |
US9697249B1 (en) | | Estimating confidence for query revision models |
US8051080B2 (en) | | Contextual ranking of keywords using click data |
US8065310B2 (en) | | Topics in relevance ranking model for web search |
US7519588B2 (en) | | Keyword characterization and application |
US7523105B2 (en) | | Clustering web queries |
EP1396799B1 (en) | | Content management system |
US7958110B2 (en) | | Performing an ordered search of different databases in response to receiving a search query and without receiving any additional user input |
US7725451B2 (en) | | Generating clusters of images for search results |
US8645407B2 (en) | | System and method for providing search query refinements |
US20070185859A1 (en) | | Novel systems and methods for performing contextual information retrieval |
US20050149504A1 (en) | | System and method for blending the results of a classifier and a search engine |
EP2405370A1 (en) | | Integration of multiple query revision models |
EP1587010A2 (en) | | Verifying relevance between keywords and web site contents |
US20040002849A1 (en) | | System and method for automatic retrieval of example sentences based upon weighted editing distance |
US20100318531A1 (en) | | Smoothing clickthrough data for web search ranking |
US20060059132A1 (en) | | Searching hypertext based multilingual web information |
JP2005302043A (en) | | Reinforced clustering of multi-type data object for search term suggestion |
US20060230005A1 (en) | | Empirical validation of suggested alternative queries |
JP2003067397A (en) | | Content control system |
JP2002024262A (en) | | Method and device for estimating information source location and storage medium stored with information source location estimating program |
JP2006529044A (en) | | Definition system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: MICROSOFT CORPORATION, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WEN, JI-RONG;CUI, HANG;MA, WEI-YING;REEL/FRAME:013776/0008;SIGNING DATES FROM 20030130 TO 20030131 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034541/0477. Effective date: 20141014 |
| | FPAY | Fee payment | Year of fee payment: 8 |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Lapsed due to failure to pay maintenance fee | Effective date: 20191023 |