US12174839B2 - Relevant passage retrieval system - Google Patents
- Publication number
- US12174839B2 (application US16/303,274)
- Authority
- US
- United States
- Prior art keywords
- query
- passage
- passages
- ranked
- electronic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- Users of computing systems run searches for electronic objects using queries. For example, users run searches on the Internet, email systems, video archives, and other databases. As the number of electronic objects being stored grows, it is becoming increasingly challenging to search a corpus of electronic objects and return relevant results to a user. Further, users increasingly expect to quickly access information relevant to the query without having to access the various electronic objects that may be returned by the query. This is particularly true for queries that are submitted using a mobile device. The small form factor of mobile devices makes it more difficult for users to sift through the electronic objects returned in response to a query. Because of this, it is beneficial to provide a mechanism in which an answer to a query is provided directly to the user without requiring the user to actually access an electronic object to find the answer.
- aspects of the present technology relate to returning results from user queries.
- a user may provide a query, such as a natural language query, seeking information into a web browser, an email search interface, or cloud search interface, a file system search interface, or any other type of search interface.
- aspects of the technology described herein provide systems and methods to identify highly relevant passages from a corpus of electronic objects (such as the web pages, word processing documents, spreadsheets, videos, etc.) and return the most relevant passage(s) that answers the user query.
- the passage may be obtained directly from the electronic object or may be generated from multiple, highly-ranked passages from one or more electronic objects.
- Other information may be returned such as the location of the electronic object from where the passage(s) was obtained. This may be a URL link, an email link, or other object link.
- Examples are implemented as a computer process, a computing system, or as an article of manufacture such as a device, computer program product, or computer readable medium.
- the computer program product is a computer storage medium readable by a computer system and encoding a computer program comprising instructions for executing a computer process.
- FIG. 1 illustrates an exemplary networked-computing environment for retrieving relevant passages from a corpus of electronic objects.
- FIG. 2 illustrates an exemplary method for providing a passage.
- FIG. 3 illustrates an exemplary passage retrieval system for returning relevant passages based on a query.
- FIG. 4 illustrates an example output of a relevant passage and a link to an exemplar electronic object.
- FIG. 5 illustrates an exemplary method for identifying a query dependent passage.
- FIG. 6 illustrates a method for identifying a query independent passage in a document.
- FIG. 7 illustrates an exemplary method for ranking a passage based on a semantic translation model.
- FIG. 8 illustrates an example output resulting from identifying features of a passage based on a query and a sample target electronic object.
- FIG. 9 illustrates an exemplary method of matching a passage type to a query type.
- FIG. 10 illustrates an example output generated using a method for matching a passage type to a query type.
- FIG. 11 illustrates an exemplary method of matching an electronic object to a query based on contextual meaning.
- FIG. 12 provides an example of matching an electronic object to a query based on a heading of the electronic object.
- FIG. 13 illustrates an exemplary method of ranking passages using an aggregation model.
- FIG. 14 is an example of applying the aggregation method on the passage.
- FIG. 15 illustrates an exemplary method of matching answer patterns in passages to query answer patterns for queries.
- FIG. 16 provides an example of the results generated using a method of matching answer patterns to queries.
- FIG. 17 is a block diagram illustrating example physical components of a computing device with which aspects of the disclosure may be practiced.
- FIGS. 18 A and 18 B are simplified block diagrams of a mobile computing device with which aspects of the present disclosure may be practiced.
- FIG. 19 illustrates one example of the architecture of a system for web-scale passage retrieval as described above.
- aspects of the disclosure are described more fully below with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary aspects.
- different aspects of the disclosure may be implemented in many different forms and should not be construed as limited to the aspects set forth herein; rather, these aspects are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the aspects to those skilled in the art.
- aspects may be practiced as methods, systems or devices. Accordingly, aspects may take the form of a hardware implementation, an entirely software implementation or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
- Existing search systems retrieve electronic objects, or links to electronic objects, as results.
- a web search system may retrieve URL links to web pages in response to a query while a file system search interface may retrieve one or more files.
- the results returned from existing search systems are not the “precise information” that the user who submitted the query is looking for. For example, in the case of a web search system, the user must click one of the returned URL links to access an electronic object which actually contains the information that the user is looking for.
- A passage is a piece of information or content that provides more direct fulfillment of a user's search intent. For example, in response to an Internet search query, rather than just returning a list of URLs, the content of a group of URLs is searched to identify and extract the relevant information to answer the query. A passage, either generated or extracted from the URLs, will be returned to the user. It will be appreciated that, while this disclosure often refers to URLs and returning text based passages, this disclosure is not so limited. A passage is not limited to text information. In various aspects a passage can be an image, video, and/or a combination of different information.
- Semantic meaning may be interpreted using keywords and entities between a query and a passage. Keywords are words that may be helpful in determining intents and domains of sentences, phrases, and passages in an electronic object as well as a query. Entities are similar to keywords, but additionally are words or phrases that have, as used in the context of the natural language query or passage, a meaning other than the literal definition. For example, the words “super bowl” typically would not literally mean an excellent stadium, but normally refer to the championship game of the National Football League.
- aspects of the present technology include the use of machine-learned models.
- the machine learned models will use a set of training data to learn how to recognize semantic patterns in natural language.
- the training data will be tagged in such a way as to train the model for a particular task. For example, when identifying key words in a sentence, the model will be trained to recognize California king as one entity related to mattresses, and not necessarily a monarch of the state of California.
- machine learning approaches and models can be employed by the technology disclosed herein including, but not limited to, decision tree learning, association rule learning, artificial neural networks, deep learning, support vector machines, Bayesian networks, etc.
- a user may submit the following query to an Internet search engine: “Will a California king fit on a king frame?”
- a traditional search engine will provide a list of links to web pages that were identified as related to the query.
- Aspects of the technology described herein return an answer such as “A standard king-sized bed frame measures 76 inches wide by 80 inches long.”
- The technology described herein may be applied to identify/generate a passage that provides a more narrowly tailored answer to this and other queries, thereby making the precise information available to the user faster than traditional search systems.
- Aspects disclosed herein relate to a retrieval system for finding the relevant passages from across an entire web of electronic objects, such as the Internet. Additionally, electronic object ranking and passage ranking algorithms and techniques are disclosed. As such, in addition to providing other benefits, aspects of the technology disclosed herein not only provide improvements to the coverage of existing answer retrieval systems but also provide more precise passages by using web ranking algorithms and techniques. For ease of illustration, various aspects of the disclosure are described with respect to providing passages identified or synthesized using web content. However, one of skill in the art will appreciate that the technologies described herein can be employed across various different types of content stores such as, but not limited to, databases, file systems, email mailboxes and/or archives, social media networks, etc.
- FIG. 1 illustrates a networked-computing environment 100 for retrieving relevant passages from a corpus of electronic objects.
- the corpus of electronic objects is defined by the electronic objects that are accessible by a search application.
- the size of the corpus of electronic objects may vary depending on the type of search application. For example, the corpus of documents available to a file system search application is limited to the electronic objects stored on the file system.
- the corpus of electronic objects available to a web search application may include every electronic object that is accessible on the Internet.
- FIG. 1 includes a computing device 102 , a networked-database 104 , and a server 106 , each of which is communicatively coupled to each other via a network 108 .
- the computing device 102 may be any suitable type of computing device.
- the computing device 102 may be one of a desktop computer, a laptop computer, a tablet, a mobile telephone, a smart phone, a wearable computing device, or the like.
- aspects of the current technology include the computing device storing a browser 110 and a search application 112 .
- The browser 110 may be an Internet search browser.
- a user may input search queries to the browser 110 through a variety of means including text, touch, gesture, or spoken language.
- the browser 110 may be configured to receive the query and transmit the query to a search engine stored on one or more servers, such as search engine 114 stored on server 106 .
- the computing device 102 includes a search application 112 .
- the search application 112 may be a search application programmed to receive queries for searching electronic objects such as emails, video files, word processing documents, etc.
- the received input is a natural language query.
- the received query may be sent to the server 106 .
- Aspects of the technology include the search application 112 having the ability to receive queries through text, touch, and/or speech input.
- The search application 112 may be configured to receive the query and transmit the query to a search engine stored on one or more servers, such as search engine 114 stored on server 106 .
- The system 100 may also include a database 104 .
- the database 104 may be used to store a variety of information to aid in retrieving relevant passages from a corpus of electronic objects.
- a Deep Neural Network 118 (“DNN”) and a machine learned model 120 may be provided to the server 106 to aid in retrieving relevant passages from a corpus of electronic objects.
- a search engine 114 and a passage extraction engine 116 may reside on a remote device, such as server 106 . In other examples, however, the search engine 114 and a passage extraction engine 116 may be stored on another computing device, such as computing device 102 .
- One or both of the search engine 114 and the passage extraction engine 116 receives queries from computing devices such as computing device 102 .
- the search engine 114 receives a query and, using the query, identifies a corpus of potential electronic objects that may satisfy the query.
- the passage extraction engine 116 identifies, analyzes, and ranks the passages within a corpus of electronic objects, such as those objects identified by the search engine 114 .
- Network 108 facilitates communication between devices, such as computing device 102 , database 104 , and server 106 .
- the network 108 may include the Internet and/or any other type of local or wide area networks. Communication between devices allows for the exchange of queries, information related to the corpus of electronic objects, relevant passages, and other information.
- FIG. 2 illustrates a method 200 for providing a passage.
- method 200 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 200 is not limited to such examples. In other examples, method 200 may be performed on an application or service for providing the storage and/or manipulation of user input. In at least one example, method 200 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 200 begins with receive query operation 202 .
- In receive query operation 202 , a query is received.
- the query may be a natural language query received by a search application, a web browser, a personal digital assistant, etc.
- the query is a structured query that conforms to a specific syntax.
- Method 200 proceeds to identify corpus operation 204 .
- a corpus of electronic objects such as a database storing emails or an Internet domain is identified.
- a conventional search engine may be employed to identify the corpus of electronic objects relevant to the query. For example, a list of top URLs generated by a search engine may constitute the corpus of electronic objects, with the electronic objects being the page represented by the URLs.
- The corpus may be identified based upon the query. Alternatively, the corpus may be identified by means that do not use a query, such as by designating a specific database, a web domain, an email server, etc. In further aspects, the corpus may be identified based upon the networks and/or data stores that are accessible to the application that received the query.
- additional processing may be performed to identify the corpus.
- filtering and/or ranking may be utilized to identify a corpus of electronic objects.
- A search engine, such as an Internet search engine, may receive a query and, based upon the query, produce a ranked list of top candidate URLs that include information likely to satisfy the query.
- The top candidate URLs, as opposed to all related URLs, may be identified as the corpus of electronic objects.
- The query used by the search engine may be the same as or similar to the query received in operation 202 . That is, the search engine may process or otherwise alter the query prior to identifying the corpus of electronic objects.
- Method 200 then proceeds to parse content operation 206 .
- each electronic object that is part of the corpus may be parsed.
- Alternatively, only a number of highly ranked electronic objects (e.g., 5, 10, 15, 20 . . . 200, etc.) may be parsed.
- Because parsing may be computationally expensive and time consuming, it may be beneficial to only parse a set of highly ranked electronic objects in order to provide a timely response to the query.
- pre-parsing of the corpus of electronic objects may be performed before receiving the query.
- The content of the electronic objects is parsed.
- the content of each web page is parsed. Parsing may determine and delineate the content into passages. This may be a rule based system, such as delineating content into passages by identifying carriage returns in a document, the presence of a body, chapters in a video, a digital mark in an audio clip, etc. Additionally, or alternatively, content with similar semantic meaning may be grouped together as a passage.
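- The rule-based delineation described above can be sketched as follows. This is a minimal illustration, assuming plain-text content and assuming that any run of line breaks (including carriage returns) marks a passage boundary; the function name `split_into_passages` and the sample text are hypothetical and do not come from the disclosure.

```python
import re


def split_into_passages(text: str) -> list[str]:
    """Split raw text into passages at line breaks (an assumed rule)."""
    # Normalize Windows and bare carriage returns to "\n" first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Treat each run of one or more line breaks as a passage boundary.
    parts = re.split(r"\n+", normalized)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


page = "Heading\r\nFirst paragraph.\n\nSecond paragraph.\rThird paragraph."
passages = split_into_passages(page)
```

A semantic grouping step, as the text also contemplates, would replace or follow this rule rather than change it; it is not shown here.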
- a semantic meaning of each passage may be determined during the parsing operation 206 .
- the semantic meaning may be determined using a machine learned model.
- Method 200 then proceeds to rank passages in each electronic object (or a subset of electronic objects identified by rank) in the corpus at operation 208 .
- Ranking the passages of each electronic object may be performed by comparing the semantic meaning of the passage with the semantic intent of the query.
- Key entities of a query may be compared to entities in the passage by building a semantic translation model between the passage and the query.
- the query answer type may be compared to the information presented in the passage.
- the context of the query and the passage may be compared.
- key-features between the query and the passage may be analyzed.
- the result of performing one or more of these techniques on a passage is a numerical score indicating the likelihood that the passage directly answers the received query.
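- As a rough illustration of producing a numerical score per passage, the sketch below uses simple term overlap between the query and each passage. This is a lexical stand-in chosen for brevity, not the semantic translation model or machine learned comparison the disclosure describes; the names `tokenize`, `overlap_score`, and `rank_passages` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Return the fraction of distinct query terms found in the passage."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(passage)) / len(query_terms)


def rank_passages(query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Score every passage and sort from highest to lowest score."""
    scored = [(overlap_score(query, passage), passage) for passage in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


query = "Will a California king fit on a king frame?"
candidates = [
    "Our store opens at nine.",
    "A standard king-sized bed frame measures 76 inches wide by 80 inches long.",
]
ranked = rank_passages(query, candidates)
```

In this toy case the bed-frame passage ranks first because it shares the terms "a", "king", and "frame" with the query, which also shows why a purely lexical score is a weak proxy for the likelihood that a passage answers the query.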
- Method 200 then proceeds to rank aggregate passages operation 210 .
- the method 200 may identify each ranked passage from each top candidate document.
- Each ranked passage may then be aggregated with the ranked passages from other electronic objects.
- The aggregated passages are ranked against one another.
- Ranking occurs by comparing the numerical scores of the passages, which were determined in operation 208 .
- an alternate model may be used to generate a new score for the aggregated passages.
- Method 200 then proceeds to select passage operation 212 .
- In select passage operation 212 , the one or more top ranked passages from the ranked aggregated passages are chosen.
- the selected passage(s) is then returned at operation 214 .
- FIG. 3 illustrates an exemplary passage retrieval system 300 for returning relevant passages based on a query.
- the passage retrieval system 300 may be integrated with a web search infrastructure.
- Passage retrieval system 300 interprets a query at interpretation service 302 .
- The query may be interpreted to determine the intent of the query as well as an expected answer type. For example, a user may query a system with “what drinks have the least amount of calories.”
- The intent of the user may be identified as wanting to search the Internet to determine a list of beverages along with the number of calories associated with each beverage, rank ordered from lowest to highest.
- The intent of the query may be determined using machine learning, a deep neural network, or any other mechanism known to the art without departing from the scope of this disclosure.
- Query interpretation service 302 may also perform additional operations such as parsing the query, transforming the query, determining whether additional information is needed for the query, prompting the user for additional information, etc.
- The query, along with any interpretations, is sent to selection service 304 .
- the extraction service may analyze a corpus of electronic documents prior to any query being received.
- the extraction service may analyze electronic objects by identifying signals within the object. The signals suggest whether the electronic object has content that may be useful.
- The content is parsed into individual passages. Where the passage has a signal that is associated with likely relevance, the passage is flagged as potentially relevant.
- the passage extraction service identifies and formulates passage candidates from electronic objects that may be relevant (or will not be relevant) prior to receiving a query (offline extraction) or after receiving a query (online extraction).
- the offline extraction allows for faster processing of a query, in aspects.
- A web domain may be searched and the content may be parsed. The content may be identified as containing mostly advertisements or empty pages; in such an example, the web pages may be marked as not relevant.
- the flagged passages and/or electronic objects are sent to passage ranking service 310 .
- candidate electronic objects are determined for further processing.
- The system may identify a list of web pages that are sorted by relevant keywords such as calorie, drink, beverage, etc. Other methods of identifying relevant webpages in the URL may be used.
- the resulting corpus of electronic documents is sent to the document ranking service 306 .
- the returned electronic objects may then be ranked by keyword or semantic relevance at document ranking service 306 .
- the top-most candidates determined at document ranking service 306 are passed to pre-caption ranking service 308 .
- Pre-caption ranking service 308 further determines a relevance of the electronic objects.
- a numerical score, such as a probability, may be associated with each candidate document in order to determine relevance.
- the result of pre-caption ranking service 308 is a list of pre-caption ranked electronic objects (such as URLs linking relevant web pages).
- the results of pre-caption ranking service 308 are passed to both caption generator 311 and passage ranking service 310 .
- At passage ranking service 310 , individual passages of each of the results of pre-caption ranking service 308 are ranked.
- the identification of passages is performed with passage extraction service 303 .
- the pre-caption ranking service may have returned a list of 10 URLs that are relevant to the intent of the user.
- Each passage within the electronic objects identified by each URL is ranked.
- the different electronic objects identified by the different URLs may contain a varying number of passages. That is, the electronic object identified by URL 1 (hereinafter referred to as “URL 1”) may have 3 passages, the electronic object identified by URL 2 (hereinafter referred to as “URL 2”) may have 7 passages, etc.
- The passages within each identified electronic object may be ranked against one another. As an example, the passages in URL 1 will be ranked against the other passages of URL 1, the passages in URL 2 will be ranked against other passages in URL 2, and so on. Ranking may be performed using keywords, relevance to the user intent, etc.
- The passage ranking service 310 identifies and ranks passages from the electronic objects included in the pre-caption results. Ranking the passages of each page may be performed by comparing the semantic meaning of the passage with the semantic intent of the query. Key entities of a query may be compared to entities in the passage by building a semantic translation model between the passage and the query. Also, the query answer type may be compared to the information presented in the passage. Further, the context of the query and the passage may be compared. Additionally, cross-document aggregation may be used. Further, key-features between the query and the passage may be analyzed. In aspects of the technology, the result of performing one or more of these techniques on a passage is a numerical score indicating the likelihood that the passage directly answers the received query.
- The n-top ranked passages of each electronic object ranked at passage ranking service 310 are passed to aggregate rank service 312 .
- At aggregate rank service 312 , all of the ranked passages are ranked against each other. For example, the top three passages from URL 1, the top three passages from URL 2, and the top three passages from URL 3 may be received, for a total of 9 passages. These top passages, known as the aggregated passages, are then ranked against each other. Ranking may occur using the numerical scores assigned to each passage at the passage ranking service. Additionally, or alternatively, the numerical scores of each passage may be recalculated. The resulting top n-most passage is passed to merge service 316 .
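- The two-stage ranking described above (top passages per electronic object, then all retained passages ranked against each other) can be sketched as follows. This is a minimal illustration that reuses the per-passage scores rather than recalculating them; the `ScoredPassage` type, the function names, and the score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPassage:
    source: str   # identifier of the electronic object, e.g. "URL 1"
    text: str
    score: float  # numerical score assigned during passage ranking


def top_n_per_object(passages: list[ScoredPassage], n: int) -> list[ScoredPassage]:
    """Keep only the n highest-scoring passages from each electronic object."""
    by_source: dict[str, list[ScoredPassage]] = {}
    for passage in passages:
        by_source.setdefault(passage.source, []).append(passage)
    kept: list[ScoredPassage] = []
    for group in by_source.values():
        kept.extend(sorted(group, key=lambda p: p.score, reverse=True)[:n])
    return kept


def aggregate_rank(passages: list[ScoredPassage], n_per_object: int = 3,
                   n_final: int = 1) -> list[ScoredPassage]:
    """Rank the retained passages from all objects against each other."""
    pool = top_n_per_object(passages, n_per_object)
    return sorted(pool, key=lambda p: p.score, reverse=True)[:n_final]


# Made-up scores: three objects with four passages each.
scores = {
    "URL 1": [0.9, 0.2, 0.4, 0.1],
    "URL 2": [0.3, 0.95, 0.5, 0.6],
    "URL 3": [0.7, 0.05, 0.65, 0.8],
}
candidates = [
    ScoredPassage(source, f"{source} passage {i}", score)
    for source, values in scores.items()
    for i, score in enumerate(values, start=1)
]
pool = top_n_per_object(candidates, 3)
best = aggregate_rank(candidates)
```

With three passages kept from each of three objects, the pool holds nine aggregated passages, matching the example in the text.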
- caption generator generates one caption for each of the pre-caption ranked electronic objects resulting from the pre-caption ranking service 308 .
- the caption generator parses the content of each pre-caption ranked electronic object (or portion thereof) and generates a caption.
- the caption may be generated using a machine learned model.
- The caption is applied to the electronic objects to form captioned electronic objects.
- The captioned electronic objects are then sent to the post-caption re-ranking service 314 .
- The post-caption re-ranking service 314 re-ranks the ranked electronic objects received from the caption generator 311 .
- the re-ranking includes an analysis of the summary generated in caption generator 311 .
- the analysis may be performed by a machine learned model. This results in post-caption re-ranked electronic objects.
- the results are sent to merge service 316 .
- the top n-most passage(s) are formatted for presentation along with one or more links to post-caption re-ranked electronic objects.
- the top most passage from an electronic object (such as a web page) may be formatted in such a way so as to display in a mobile device, and a link to the top post-caption re-ranked electronic objects may be included.
- FIG. 4 illustrates an example output of a relevant passage and a link to an exemplar electronic object.
- example output 400 includes a passage 402 and a link 404 .
- the passage 402 is a passage that answers a query, such as a user entering a query into a browser, based on the interpretation of the intent of the query.
- The summary may be an actual passage from one or more top electronic objects, such as the web page linked by link 404 , or may be generated using an amalgamation of various top-ranked electronic objects.
- the link 404 shown in example output 400 is a URL, though it need not be.
- multiple links are provided.
- the multiple links may be the top-n most relevant links as determined by the systems and methods described herein.
- FIG. 5 illustrates a method 500 for identifying a query dependent passage.
- Method 500 may be implemented to find relevant passages based on comparing a passage to the query.
- method 500 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 500 is not limited to such examples.
- method 500 may be performed on an application or service for providing the storage and/or manipulation of user input.
- method 500 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 500 begins with receive query operation 502 .
- In receive query operation 502 , a query is received.
- the query may be received from a client device, such as a client device 102 .
- the query may be a natural language query entered into a web browser.
- the query is then parsed to determine information about the query. For example, the semantic meaning of the query may be determined, keywords and entities may be identified, and answer type may be determined. This may be performed using a machine learned model.
- Method 500 then proceeds to receive electronic object operation 504 .
- a list of electronic objects (such as web pages, emails, word documents) is received based on the received query. For example, where the query is an Internet search query entered into a browser, a search engine will run a search using the query. In such a case, a list of links will be received.
- Method 500 then proceeds to parse electronic object operation 506 .
- the content of the electronic object is parsed.
- the parsing may be done by extracting the content (textual, audio, video, etc.) of the electronic object, such as a web page, email, or word processing documents.
- The parsing may be done using a machine learned model. Parsing may identify passages in the content, such as paragraphs or sentences in an electronic object that have similar semantic meaning. From such parsing, the semantic meaning of the passages within the electronic object is identified.
- the semantic meaning and other information derived at operation 506 may be compared to information related to the query at the compare operation 508 .
- The keywords, entities, and answer type derived in operation 502 may be compared to the information derived at operation 506 .
- the comparison may identify passages that are highly relevant to the query. Each passage may be ranked according to relevance of the passage.
- the method 500 then proceeds to identify passage based on comparison operation 510 .
- the top n-most passages are identified as highly relevant. Relevancy may be determined in a variety of ways. For example, key entities of a query may be compared to entities in the passage by building a semantic translation model between the passage and the query. Also, the query answer type may be compared to the information presented in the passage. Further, the context of the query and the passage may be compared. Additionally, cross-document aggregation may be used. Further, key-features between the query and the passage may be analyzed. In aspects of the technology, the result of performing one or more of these techniques on a passage is a numerical score indicating the likelihood that the passage directly answers the received query.
- One or more highly relevant passages may be identified in identify operation 510 . Identifying may be performed by choosing the top most relevant passage. The passages may have been assigned a numerical score. In such cases, identification may be any passage that exceeds a threshold numerical value.
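The identification step described for operation 510 can be sketched as follows. This is a minimal illustration, assuming each passage already carries a numerical relevance score; the names `ScoredPassage`, `select_passages`, `top_n`, and `threshold` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredPassage:
    text: str
    score: float  # assumed: likelihood that the passage answers the query


def select_passages(passages: List[ScoredPassage],
                    top_n: Optional[int] = None,
                    threshold: Optional[float] = None) -> List[ScoredPassage]:
    """Rank passages by score, then keep those above a threshold and/or the top n."""
    ranked = sorted(passages, key=lambda p: p.score, reverse=True)
    if threshold is not None:
        # Keep only passages whose score exceeds the threshold value.
        ranked = [p for p in ranked if p.score > threshold]
    if top_n is not None:
        # Keep only the n highest-scoring passages.
        ranked = ranked[:top_n]
    return ranked
```

Either selection rule may be used alone, or both may be combined, which mirrors the "top most relevant" and "exceeds a threshold" alternatives in the text.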
- FIG. 6 illustrates a method 600 for identifying a query independent passage in a document.
- method 600 may be used to identify passages as potentially relevant irrespective of any query. That is, no query is necessarily used to determine potential relevance.
- method 600 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 600 is not limited to such examples. In other examples, method 600 may be performed on an application or service for providing the storage and/or manipulation of user input. In at least one example, method 600 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 600 begins with identify electronic objects operation 602 .
- various electronic objects may be identified. Identification may be based on a set list (such as all web pages in a popular URL, all emails from a particular user, all documents created on or before a certain date).
- Method 600 then proceeds with parse content operation 604 .
- the electronic objects are parsed to determine the attributes of the electronic object. Parsing may include identifying metadata for the corpus of electronic objects. In other aspects, the content may be parsed. For example, where the electronic object is a web page, it may be determined that the URL does not include any discernable information, is mostly dedicated to advertisements, or is associated with a domain.
- Static signals may include a variety of signals.
- the static signal may be a threshold length of the document, a threshold number of grammatical errors in a document, a list of trusted/untrusted authors, etc.
- the static signal may be untrusted/trusted URLs, the number of pop-ups, the presence of a paywall, etc.
- Electronic objects or passages within the electronic object may be flagged or ranked as useful/not useful based on the presence or absence of a static signal.
- Method 600 then proceeds to generate or identify passage 608 .
- passages of the electronic object that are flagged as not useful may be excluded. Where the static signal highlights highly relevant passages, those passages may be identified as useful and flagged as much.
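The query-independent filtering of method 600 might look like the sketch below. It assumes the static signals have already been extracted into simple fields; the class `ElectronicObject`, the threshold constants, and the untrusted-domain list are all hypothetical, since the disclosure does not give concrete values.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class ElectronicObject:
    url: str
    word_count: int
    grammar_errors: int
    popups: int
    has_paywall: bool


# Hypothetical thresholds and lists; the disclosure does not specify values.
MIN_WORDS = 50
MAX_GRAMMAR_ERRORS = 20
MAX_POPUPS = 2
UNTRUSTED_DOMAINS = {"spam.example"}


def is_useful(obj: ElectronicObject) -> bool:
    """Return False when any static signal marks the object as not useful."""
    domain = urlparse(obj.url).netloc
    return (obj.word_count >= MIN_WORDS
            and obj.grammar_errors <= MAX_GRAMMAR_ERRORS
            and obj.popups <= MAX_POPUPS
            and not obj.has_paywall
            and domain not in UNTRUSTED_DOMAINS)


def filter_useful(objects: List[ElectronicObject]) -> List[ElectronicObject]:
    """Exclude objects flagged as not useful; no query is consulted."""
    return [obj for obj in objects if is_useful(obj)]
```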
- FIG. 7 illustrates a method 700 for ranking a passage based on a semantic translation model.
- method 700 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 700 is not limited to such examples. In other examples, method 700 may be performed on an application or service for providing the storage and/or manipulation of user input. In at least one example, method 700 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 700 begins with receive query 702 .
- In receive query operation 702 , a query is received.
- the query may be received from a client device, such as a client device 102 , and sent to a server for processing.
- the query may be a natural language query entered into a web browser.
- Method 700 then proceeds to identify semantic units in a query operation 704 .
- Semantic units in the query may be identified using a machine learned model. For example, for the query “Will a California king fit on a king frame?” the entities in the query may be identified as ⁇ California_king ⁇ and ⁇ king_frame ⁇ .
- the method 700 then proceeds to identify semantic units in a passage 706 .
- the semantic units of a passage may be identified using a machine learned model. Following the above example, a passage may be: “Insta-lock Queen, King, Cal-king Bedframe It's a bed frame We purchased this bed frame for a mattress & box springs. The instructions were easy to read & the frame went together without any problems.”
- Another passage may be “A California king bed measures 72 inches wide and 84 inches long, and a standard king mattress measures 76 inches wide and 80 inches long, so while a California king mattress is four inches longer, it is also about four inches thinner than a standard king bed.” Each passage is parsed to identify entities.
- Method 700 then proceeds to generate semantic translation model (“TM”) operation 708 .
- a TM model is generated by building a semantic unit relationship table with translation probabilities between the query and the passage.
- the entity relationship table, in aspects, provides a probability that the entity identified in the query is present in the passage.
- FIG. 8 illustrates experimental output resulting from identifying features of a passage based on a query and a sample target electronic object.
- the semantic translation model is generated with a machine learning method.
- column one 802 includes semantic units of source query
- column two 804 includes target passage semantic units
- column three 806 is the probability that the semantic unit in the passage is the same as the entity used in the query
- column 4 808 is the probability that the semantic unit in the query is the same as the semantic unit in the passage. The higher the probability, the more likely the source will match the target. Note that P(target|source) normally is not the same as P(source|target).
- column one 802 includes a semantic unit of source query
- column two 804 includes target passage semantic unit
- column three 806 is the probability that the semantic unit in the passage is the same as the semantic unit used in the query
- column 4 808 is the probability that the semantic unit in the query is the same as the semantic unit in the passage.
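To illustrate why the two probability columns generally differ, the sketch below estimates both directions from counts of aligned (query unit, passage unit) pairs. Relative-frequency estimation is a simple stand-in chosen for this example; the disclosure states only that the model is machine learned. The function name `translation_table` and the sample pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (source query unit, target passage unit)


def translation_table(pairs: Iterable[Pair]) -> Dict[Pair, Tuple[float, float]]:
    """Map each (source, target) pair to (P(target|source), P(source|target))."""
    pairs = list(pairs)
    pair_counts = Counter(pairs)
    source_counts = Counter(source for source, _ in pairs)
    target_counts = Counter(target for _, target in pairs)
    return {
        (source, target): (count / source_counts[source],
                           count / target_counts[target])
        for (source, target), count in pair_counts.items()
    }


# Hypothetical aligned pairs, loosely based on the bed frame example.
observed = [
    ("california_king", "cal_king"),
    ("california_king", "cal_king"),
    ("california_king", "king_bed"),
    ("king_frame", "king_bed"),
]
table = translation_table(observed)
```

In this toy data, "california_king" aligns with "cal_king" in two of its three occurrences, while "cal_king" aligns only with "california_king", so the two directions give different values for the same pair.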
- FIG. 9 is a method 900 of matching a passage type to a query type.
- method 900 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 900 is not limited to such examples. In other examples, method 900 may be performed on an application or service for providing the storage and/or manipulation of user input. In at least one example, method 900 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 900 begins with extract answer type from query operation 902 .
- the answer type of a query may be numerical, a location, a time, a commercial center, etc.
- a query may ask “how many teaspoons in a cup.” In such an instance, the answer type of the query would be numerical.
- a query may be “where can I buy sushi grade fish near me.” In such a case, the answer type would be a commercial center, such as a fish store.
- a query is parsed using a machine learned model to determine the answer-type of the query.
- An entity type may be identified and extracted using a machine learned model.
- the entity type may be one that is numerical, a location, a time, a commercial center, etc. For example, if the passage includes the phrase “There are 48 teaspoons in a cup,” the entity 48 would be identified as numerical.
- Another passage may include the phrase “teaspoon sizes originated in England, where the price of tea caused tea cups and spoons to shrink.” The entity England would be identified as a location.
- An entity may have multiple entity types. For example, consider the following text from a passage: “Von Miller was the MVP of Super Bowl 50.” The following types may be associated with Von Miller: Person, Athlete, American Football Player.
- the method 900 then proceeds to rank passages 906 .
- rank passage operation 906 the passages are ranked by comparing the answer type to the entity type identified in operation 904 .
- for example, where the answer type is numerical (e.g., for the query “how many teaspoons in a cup”), passages with a numerical entity (e.g., “there are 48 teaspoons in a cup”) may be ranked higher.
- Other passages where the entity does not match the answer type (e.g., “teaspoon sizes originated in England, where the price of tea caused tea cups and spoons to shrink”) would be flagged as likely not relevant.
- the ranking of the passages is based on a machine learned model.
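The comparison in rank passage operation 906 can be sketched as below. The example assumes the answer type and the entity types of each passage were already extracted; the function name `rank_by_answer_type` and the type labels are hypothetical, and a plain sort stands in for the machine learned ranking model mentioned in the text.

```python
from typing import Dict, List, Set


def rank_by_answer_type(answer_type: str,
                        passage_entity_types: Dict[str, Set[str]]) -> List[str]:
    """Order passages so those containing an entity of the answer type come first.

    passage_entity_types maps passage text to the entity types found in it.
    The sort is stable, so passages within each group keep their input order.
    """
    return sorted(
        passage_entity_types,
        key=lambda text: answer_type not in passage_entity_types[text],
    )


passages = {
    "teaspoon sizes originated in England": {"location"},
    "there are 48 teaspoons in a cup": {"numerical"},
}
ranked = rank_by_answer_type("numerical", passages)
```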
- FIG. 10 is an example output using a method of matching a passage type to a query type.
- a query 1002 “what are the drinks with the least amount of calories” has been parsed to identify a semantic intent 1004 “alcoholic drinks with least calories.”
- FIG. 10 also includes the resolution of the answer type 1006 sought by the query 1002 as numeric and has identified key words as “calories.” Identification of semantic intent, answer types, and keywords may be performed by using a machine learned model.
- FIG. 10 also includes a passage 1008 that has been parsed.
- the passage is parsed using a machine learned model.
- the parsed passage includes the identified entity “calories,” which is the same as the keyword identified in the query. Further, the entity “calories” includes a modifier “100,” which matches the numeric answer type of the query. Accordingly, the passage will be given a relatively high score.
- the score is calculated by adding the number of like elements of a passage (e.g., like answer types/entities, and keywords). Scores may be calculated in other ways.
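The additive scoring described for FIG. 10 can be sketched as a count of shared elements. This is one possible reading, assuming keywords and entity types are available as sets; the function name `like_element_score` and its parameters are hypothetical.

```python
from typing import Set


def like_element_score(query_answer_type: str,
                       query_keywords: Set[str],
                       passage_entity_types: Set[str],
                       passage_keywords: Set[str]) -> int:
    """Count like elements: shared keywords plus one for a matching answer type."""
    score = len(query_keywords & passage_keywords)
    if query_answer_type in passage_entity_types:
        score += 1
    return score


# Mirrors the FIG. 10 discussion: keyword "calories" and a numeric modifier.
score = like_element_score("numeric", {"calories"}, {"numeric"}, {"calories", "drinks"})
```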
- FIG. 11 is a method 1100 of matching an electronic object to a query based on contextual meaning.
- method 1100 may be executed on a device comprising at least one processor configured to store and execute operations, programs or instructions. However, method 1100 is not limited to such examples. In other examples, method 1100 may be performed on an application or service for providing the storage and/or manipulation of user input. In at least one example, method 1100 may be executed (e.g., computer-implemented operations) by one or more components of a distributed network, such as a web service/distributed network service (e.g. cloud service).
- Method 1100 begins with extract nearest heading to passage operation 1102 .
- heading may refer to any marker in an electronic object that indicates a section of the electronic object. This may include title, caption, legend, subtitle, subheading, rubric, headline, etc.
- the nearest heading to a passage may be extracted. This may be done by identifying tagged content in an electronic object (such as a web page that identifies a phrase as a heading), the title of a video, the subject line of an email, etc. Identification may be performed using a machine learned model.
- Method 1100 then proceeds to identify semantic meaning of query 1104 .
- the semantic meaning of a query may be identified using a machine learned model.
- Method 1100 then proceeds to compare heading to query operation 1106 .
- In operation 1106 , the contextual meaning of each heading is compared to the semantic meaning of the query. Where the heading is similar to the query, the heading is marked as relevant.
- Method 1100 then proceeds to flag passage operation 1108 .
- passages associated with a heading that has been marked as relevant are flagged as likely relevant.
- a numerical score associated with the relevance of the passage is increased based on a flagged header.
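The heading comparison and score increase of method 1100 can be sketched as follows. Token overlap (Jaccard similarity) is used here only as a simple stand-in for the semantic comparison, which the text says may rely on a machine learned model. The names `heading_similarity` and `boost_scores`, the `threshold`, and the `boost` amount are hypothetical.

```python
import re
from typing import Dict, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a heading or query."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def heading_similarity(heading: str, query: str) -> float:
    """Jaccard overlap between heading tokens and query tokens."""
    h, q = _tokens(heading), _tokens(query)
    return len(h & q) / len(h | q) if (h | q) else 0.0


def boost_scores(query: str,
                 nearest_heading: Dict[str, str],
                 scores: Dict[str, float],
                 threshold: float = 0.2,
                 boost: float = 1.0) -> Dict[str, float]:
    """Raise the score of each passage whose nearest heading resembles the query.

    nearest_heading maps a passage id to its nearest heading text.
    """
    boosted = dict(scores)
    for passage_id, heading in nearest_heading.items():
        if heading_similarity(heading, query) >= threshold:
            boosted[passage_id] = boosted.get(passage_id, 0.0) + boost
    return boosted
```

A lexical measure like this cannot tell apart headings that share most words with the query but differ in the key term, which is one reason a semantic model would be preferred in practice.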
- FIG. 12 is an example of matching an electronic object to a query based on the heading of the electronic object.
- the query 1202 includes “how to change the excel cell width.”
- the headings of a web page are analyzed to determine potentially relevant passages.
- the heading and query may be analyzed using the methods described herein, including method 1100 .
- the title of the heading of the web page 1204 is “Changing Column Widths and Row Heights in Excel.” Further sub-headings within the web page are analyzed.
- the sub heading “how to change column widths” 1206 and sub-heading “how to change row heights” 1208 are analyzed.
- the passage 1210 associated with the sub-heading “how to change column widths” 1206 is flagged as potentially relevant.
- This relevant passage 1210 may then be processed further, given a numerical score, or have the numerical score increased.
- FIG. 13 is a method 1300 of ranking passages.
- Method 1300 begins with select top passages operation 1302 .
- top candidate passages are identified.
- the method 1300 then proceeds to select top terms operation 1304 . In operation 1304 , the terms that are the same as or similar to terms of the query are identified in each passage selected in operation 1302 . Identification of top terms may be done by using a machine learned model.
- the method 1300 then proceeds to identify entities operation 1306 .
- entities of the passage that are similar to, the same as, or related to those of the query are identified. Same, similar, or related entities may be determined using a machine learned model.
- each passage is ranked.
- the passage is ranked by summing the number of keywords and entities identified in operations 1304 and 1306 .
- the passage with the highest total may be ranked first, in aspects.
- FIG. 14 is an example 1400 of applying the method 1300 on the passage 1402 : “The Fifteenth Amendment (Amendment XV) to the United States Constitution prohibits the federal and state governments from denying a citizen the right to vote based on that citizen's ‘race, color, or previous condition of servitude.’”
- the query 1404 is “15th amendment definition.”
- in top terms identification 1406 , the top terms, such as “amendment”, “constitution”, “citizens”, “fifteenth”, and “prohibits”, are identified. This is five terms.
- the example 1400 also includes identifying similar, same, or related top entities.
- the entities “condition_of_servitude,” “previous_condition,” “the_15th_amendment_to_the_united_states_the_united_states_constitution,” “the_u_s_constitution voting_rights an_amendment the_fifteenth_amendment,” and “the_right_to_vote southern_states the_13th” are identified. This is a total of 5 entities. In embodiments, the passage would be given a rank of 10, the sum of the five terms and five entities.
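The arithmetic behind the FIG. 14 example can be checked with a short sketch. It assumes the matched terms and entities have already been identified; the function name `passage_rank_value` is hypothetical, and the entity strings are copied as they are grouped in the text.

```python
from typing import Iterable


def passage_rank_value(matched_terms: Iterable[str],
                       matched_entities: Iterable[str]) -> int:
    """Sum the distinct matched terms and the distinct matched entities."""
    return len(set(matched_terms)) + len(set(matched_entities))


# Values taken from the FIG. 14 discussion.
terms = ["amendment", "constitution", "citizens", "fifteenth", "prohibits"]
entities = [
    "condition_of_servitude",
    "previous_condition",
    "the_15th_amendment_to_the_united_states_the_united_states_constitution",
    "the_u_s_constitution voting_rights an_amendment the_fifteenth_amendment",
    "the_right_to_vote southern_states the_13th",
]
rank_value = passage_rank_value(terms, entities)  # 5 terms + 5 entities
```

Candidate passages could then be ordered by this value, highest first, as the text suggests.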
- FIG. 15 illustrates a method 1500 of matching answer patterns in passages to answer types related to queries.
- Method 1500 begins with determine query answer pattern operation 1502 .
- a query answer pattern is determined based on the query pattern.
- a query may be in a pattern such as who is noun, where is noun, etc.
- a translation table between the query pattern and the pattern of the expected answer (i.e., the query answer pattern) may be established.
- Method 1500 then proceeds to determine passage pattern operation 1504 .
- In determine passage pattern operation 1504 , the pattern of the passage is determined.
- the semantic pattern of the passage may be determined using a machine learned model.
- Method 1500 proceeds to compare passage pattern with query answer pattern operation 1506 .
- passages whose patterns are similar to or the same as the query answer pattern are identified.
- Method 1500 proceeds to score passage operation 1508 . Based on the similarity, the passage is scored and/or moved to a higher rank. Various different probabilistic models may be employed to score the passage based upon the similarity.
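The mapping from a query pattern to an expected answer pattern in method 1500 can be sketched with regular expressions. This is a deliberately small illustration: the two entries in `PATTERN_TABLE`, the template syntax, and the function name `matches_answer_pattern` are hypothetical, and regular expressions stand in for the machine learned and probabilistic models the text mentions.

```python
import re

# Hypothetical translation table: each query pattern maps to a template for
# the expected answer pattern. Neither the patterns nor the templates come
# from the disclosure.
PATTERN_TABLE = [
    (re.compile(r"^who (?:is|was) (?P<noun>.+?)\??$", re.IGNORECASE),
     r"(?P<answer>\w[\w .]*?) (?:is|was) {noun}"),
    (re.compile(r"^where is (?P<noun>.+?)\??$", re.IGNORECASE),
     r"{noun} is (?:in|at|on) (?P<answer>\w[\w ]*)"),
]


def matches_answer_pattern(query: str, passage: str) -> bool:
    """Return True when the passage contains the answer pattern expected for the query."""
    for query_re, template in PATTERN_TABLE:
        match = query_re.match(query.strip())
        if match:
            # Escape the noun so it is matched literally inside the template.
            noun = re.escape(match.group("noun"))
            answer_re = re.compile(template.format(noun=noun), re.IGNORECASE)
            return answer_re.search(passage) is not None
    return False  # no known query pattern, so no pattern-based evidence
```

A passage for which this returns True could have its score raised or be moved to a higher rank, in the manner of score passage operation 1508.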
- FIG. 16 is an example 1600 of the results of performing method 1500 on a passage.
- the query 1602 is “who was the king of England after Queen Elizabeth 1.”
- the query answer pattern is passages with answer pattern 1608 “<answer> was king of king <answer>, who father, <answer>.”
- Top answer 1610 is discovered by analyzing the top N passages 1606 .
- the answer includes the entities “the virgin queen anne_boleyn the daughter king_henry_viii,” “when_she queen_elizabeth_i prince_james,” “james_vi_of_scotland,” and “44_years_of rule.”
- the most relevant passage 1612 associated with the answer is “After the death of Queen Elizabeth I without issue, in 1603 , the crowns of England and Scotland were joined in personal union under King James VI of Scotland, who became James I of England.” This may be provided to a user as a direct answer to the query.
- FIGS. 17 - 19 and the associated descriptions provide a discussion of a variety of operating environments in which examples are practiced.
- the devices and systems illustrated and discussed with respect to FIGS. 17 - 19 are for purposes of example and illustration and are not limiting of a vast number of computing device configurations that are utilized for practicing aspects described herein.
- FIG. 17 is a block diagram illustrating physical components (i.e., hardware) of a computing device 1700 with which examples of the present disclosure may be practiced.
- the computing device 1700 includes at least one processing unit 1702 and a system memory 1704 .
- the system memory 1704 comprises, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories.
- the system memory 1704 includes an operating system 1705 and one or more program modules 1706 suitable for running software applications 1750 .
- the system memory 1704 includes the system 100 .
- the operating system 1705 is suitable for controlling the operation of the computing device 1700 .
- aspects are practiced in conjunction with a graphics library, other operating systems, or any other application program, and are not limited to any particular application or system.
- This basic configuration is illustrated in FIG. 17 by those components within a dashed line 1708 .
- the computing device 1700 has additional features or functionality.
- the computing device 1700 includes additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 17 by a removable storage device 1709 and a non-removable storage device 1710 .
- a number of program modules and data files are stored in the system memory 1704 .
- the program engines 1706 (e.g., the engines of system 1700 ) perform processes including, but not limited to, one or more of the stages of the methods used for aggregating and modeling illustrated in FIGS. 2 , 3 , 5 , 6 , 7 , 9 , 11 , 13 , and 15 .
- other program modules are used in accordance with examples and include applications such as electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
- aspects are practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors.
- aspects are practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in FIG. 17 are integrated onto a single integrated circuit.
- such an SOC device includes one or more processing units, graphics units, communications units, system virtualization units and various application functionality all of which are integrated (or “burned”) onto the chip substrate as a single integrated circuit.
- the functionality described herein is operated via application-specific logic integrated with other components of the computing device 1700 on the single integrated circuit (chip).
- aspects of the present disclosure are practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies.
- aspects are practiced within a general purpose computer or in any other circuits or systems.
- the computing device 1700 has one or more input device(s) 1712 such as a keyboard, a mouse, a pen, a sound input device, a touch input device, etc.
- the output device(s) 1714 such as a display, speakers, a printer, etc. are also included according to an aspect.
- the aforementioned devices are examples and others may be used.
- the computing device 1700 includes one or more communication connections 1716 allowing communications with other computing devices 1718 . Examples of suitable communication connections 1716 include, but are not limited to, radio frequency (RF) transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, and/or serial ports.
- Computer readable media includes computer storage media.
- Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules.
- the system memory 1704 , the removable storage device 1709 , and the non-removable storage device 1710 are all computer storage media examples (i.e., memory storage).
- computer storage media include RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the computing device 1700 .
- any such computer storage media is part of the computing device 1700 .
- Computer storage media do not include a carrier wave or other propagated data signal.
- communication media are embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media.
- modulated data signal describes a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal.
- communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
- FIGS. 18 A and 18 B illustrate a mobile computing device 1800 , for example, a mobile telephone, a smart phone, a tablet personal computer, a laptop computer, and the like, with which aspects may be practiced.
- a mobile computing device 1800 for implementing the aspects is illustrated.
- the mobile computing device 1800 is a handheld computer having both input elements and output elements.
- the mobile computing device 1800 typically includes a display 1805 and one or more input buttons 1810 that allow the user to enter information into the mobile computing device 1800 .
- the display 1805 of the mobile computing device 1800 functions as an input device (e.g., a touch screen display). If included, an optional side input element 1815 allows further user input.
- the side input element 1815 is a rotary switch, a button, or any other type of manual input element.
- mobile computing device 1800 incorporates more or fewer input elements.
- the display 1805 may not be a touch screen in some examples.
- the mobile computing device 1800 is a portable phone system, such as a cellular phone.
- the mobile computing device 1800 includes an optional keypad 1835 .
- the optional keypad 1835 is a physical keypad.
- the optional keypad 1835 is a “soft” keypad generated on the touch screen display.
- the output elements include the display 1805 for showing a graphical user interface (GUI), a visual indicator 1820 (e.g., a light emitting diode), and/or an audio transducer 1825 (e.g., a speaker).
- the mobile computing device 1800 incorporates a vibration transducer for providing the user with tactile feedback.
- the mobile computing device 1800 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
- the mobile computing device 1800 incorporates a peripheral device port 1840 , such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
- FIG. 18 B is a block diagram illustrating the architecture of one example of a mobile computing device. That is, the mobile computing device 1800 incorporates a system (i.e., an architecture) 1802 to implement some examples.
- the system 1802 is implemented as a “smart phone” capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players).
- the system 1802 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.
- one or more application programs 1850 are loaded into the memory 1862 and run on or in association with the operating system 1864 .
- Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth.
- system 100 is loaded into memory 1862 .
- the system 1802 also includes a non-volatile storage area 1868 within the memory 1862 .
- the non-volatile storage area 1868 is used to store persistent information that should not be lost if the system 1802 is powered down.
- the application programs 1850 may use and store information in the non-volatile storage area 1868 , such as e-mail or other messages used by an e-mail application, and the like.
- a synchronization application (not shown) also resides on the system 1802 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 1868 synchronized with corresponding information stored at the host computer.
- other applications may be loaded into the memory 1862 and run on the mobile computing device 1800 .
- the system 1802 has a power supply 1870 , which is implemented as one or more batteries.
- the power supply 1870 further includes an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
- the system 1802 includes a radio 1872 that performs the function of transmitting and receiving radio frequency communications.
- the radio 1872 facilitates wireless connectivity between the system 1802 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio 1872 are conducted under control of the operating system 1864 . In other words, communications received by the radio 1872 may be disseminated to the application programs 1850 via the operating system 1864 , and vice versa.
- the visual indicator 1820 is used to provide visual notifications and/or an audio interface 1874 is used for producing audible notifications via the audio transducer 1825 .
- the visual indicator 1820 is a light emitting diode (LED) and the audio transducer 1825 is a speaker.
- the LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device.
- the audio interface 1874 is used to provide audible signals to and receive audible signals from the user.
- the audio interface 1874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation.
- the system 1802 further includes a video interface 1876 that enables an operation of an on-board camera 1830 to record still images, video stream, and the like.
- a mobile computing device 1800 implementing the system 1802 has additional features or functionality.
- the mobile computing device 1800 includes additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape.
- additional storage is illustrated in FIG. 18 B by the non-volatile storage area 1868 .
- data/information generated or captured by the mobile computing device 1800 and stored via the system 1802 are stored locally on the mobile computing device 1800 , as described above.
- the data are stored on any number of storage media that are accessible by the device via the radio 1872 or via a wired connection between the mobile computing device 1800 and a separate computing device associated with the mobile computing device 1800 , for example, a server computer in a distributed computing network, such as the Internet.
- data/information are accessible via the mobile computing device 1800 via the radio 1872 or via a distributed computing network.
- such data/information are readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
- FIG. 19 illustrates one example of the architecture of a system for web-scale passage retrieval as described above.
- Content developed, interacted with, or edited in association with the system 100 is enabled to be stored in different communication channels or other storage types.
- various documents may be stored using a directory service 1922 , a web portal 1924 , a mailbox service 1926 , an instant messaging store 1928 , or a social networking site 1930 .
- the web-scale passage retrieval system 100 is operative to use any of these types of systems or the like for improving search efficiency, as described herein.
- a server 1920 provides the web-scale passage retrieval system 100 to clients 1905 a,b,c .
- the server 1920 is a web server providing the system 100 over the web.
- the server 1920 provides the system 100 over the web to clients 1905 through a network 1940 .
- the client computing device is implemented and embodied in a personal computer 1905 a , a tablet computing device 1905 b or a mobile computing device 1905 c (e.g., a smart phone), or other computing device. Any of these examples of the client computing device are operable to obtain content from the store 1916 .
Landscapes
- Engineering & Computer Science
- Theoretical Computer Science
- Databases & Information Systems
- Data Mining & Analysis
- Physics & Mathematics
- General Engineering & Computer Science
- General Physics & Mathematics
- Software Systems
- Artificial Intelligence
- Computational Linguistics
- Computer Vision & Pattern Recognition
- Evolutionary Computation
- Medical Informatics
- Computing Systems
- Mathematical Physics
- Information Retrieval, Db Structures And Fs Structures Therefor
- Information Transfer Between Computers
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/082980 WO2017201647A1 (en) | 2016-05-23 | 2016-05-23 | Relevant passage retrieval system |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190303375A1 (en) | 2019-10-03 |
US12174839B2 (en) | 2024-12-24 |
Family
ID=60411031
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/303,274 (Active, expires 2038-03-29), US12174839B2 (en) | 2016-05-23 | 2016-05-23 | Relevant passage retrieval system |
Country Status (4)
Country | Link |
---|---|
US (1) | US12174839B2 (en) |
EP (1) | EP3465464A4 (en) |
CN (1) | CN109219811B (en) |
WO (1) | WO2017201647A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10812343B2 (en) * | 2017-08-03 | 2020-10-20 | Microsoft Technology Licensing, Llc | Bot network orchestration to provide enriched service request responses |
CN110659346B (en) * | 2019-08-23 | 2024-04-12 | 平安科技(深圳)有限公司 | Form extraction method, form extraction device, terminal and computer readable storage medium |
US11409626B2 (en) | 2019-08-29 | 2022-08-09 | Snowflake Inc. | Decoupling internal and external tasks in a database environment |
CN112825088A (en) * | 2019-11-21 | 2021-05-21 | 阿里巴巴集团控股有限公司 | Information display method, device, equipment and storage medium |
US11526557B2 (en) | 2019-11-27 | 2022-12-13 | Amazon Technologies, Inc. | Systems, apparatuses, and methods for providing emphasis in query results |
US11475067B2 (en) * | 2019-11-27 | 2022-10-18 | Amazon Technologies, Inc. | Systems, apparatuses, and methods to generate synthetic queries from customer data for training of document querying machine learning models |
EP4062295A1 (en) * | 2019-11-27 | 2022-09-28 | Amazon Technologies Inc. | Systems, apparatuses, and methods for document querying |
US11366855B2 (en) | 2019-11-27 | 2022-06-21 | Amazon Technologies, Inc. | Systems, apparatuses, and methods for document querying |
NL2025417B1 (en) * | 2020-04-24 | 2021-11-02 | Microsoft Technology Licensing Llc | Intelligent Content Identification and Transformation |
CN113641783B (en) * | 2020-04-27 | 2024-07-19 | 北京庖丁科技有限公司 | Content block retrieval method, device, equipment and medium based on key sentences |
KR102197945B1 (en) * | 2020-05-01 | 2021-01-05 | 호서대학교 산학협력단 | Method for training information retrieval model based on weak-supervision and method for providing search result using such model |
US11887011B2 (en) * | 2021-02-08 | 2024-01-30 | Microsoft Technology Licensing, Llc | Schema augmentation system for exploratory research |
US11615154B2 (en) * | 2021-02-17 | 2023-03-28 | International Business Machines Corporation | Unsupervised corpus expansion using domain-specific terms |
WO2022255807A1 (en) * | 2021-06-02 | 2022-12-08 | 호서대학교 산학협력단 | Method for providing improved search result by combining two or more information searches |
KR102324571B1 (en) * | 2021-06-02 | 2021-11-11 | 호서대학교 산학협력단 | Method for providing enhanced search result in passage-based information retrieval |
KR102325249B1 (en) * | 2021-06-02 | 2021-11-12 | 호서대학교 산학협력단 | Method for providing enhanced search result by fusioning passage-based and document-based information retrievals |
US20230325420A1 (en) * | 2022-03-29 | 2023-10-12 | Open Text Holdings, Inc. | System and method for document analysis to determine diverse and relevant passages of documents |
Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010000356A1 (en) | 1995-07-07 | 2001-04-19 | Woods William A. | Method and apparatus for generating query responses in a computer-based document retrieval system |
US20050187768A1 (en) * | 2004-02-24 | 2005-08-25 | Godden Kurt S. | Dynamic N-best algorithm to reduce recognition errors |
JP2005301444A (en) | 2004-04-07 | 2005-10-27 | Nippon Telegr & Teleph Corp <Ntt> | Passage retrieving method, passage retrieving device, passage retrieving program and recoding medium with passage retrieving program recorded |
US20050256700A1 (en) | 2004-05-11 | 2005-11-17 | Moldovan Dan I | Natural language question answering system and method utilizing a logic prover |
US20060190258A1 (en) * | 2004-12-16 | 2006-08-24 | Jan Verhasselt | N-Best list rescoring in speech recognition |
US20060195440A1 (en) * | 2005-02-25 | 2006-08-31 | Microsoft Corporation | Ranking results using multiple nested ranking |
CN101246484A (en) | 2007-02-15 | 2008-08-20 | 刘二中 | Electric text similarity processing method and system convenient for query |
US20080301174A1 (en) | 2007-03-30 | 2008-12-04 | Albert Mons | Data structure, system and method for knowledge navigation and discovery |
US20090070322A1 (en) * | 2007-08-31 | 2009-03-12 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
WO2009035871A1 (en) | 2007-09-10 | 2009-03-19 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
US20090089277A1 (en) * | 2007-10-01 | 2009-04-02 | Cheslow Robert D | System and method for semantic search |
CN101523338A (en) | 2005-03-18 | 2009-09-02 | 搜索引擎科技有限责任公司 | Search engine that applies feedback from users to improve search results |
US7856350B2 (en) | 2006-08-11 | 2010-12-21 | Microsoft Corporation | Reranking QA answers using language modeling |
US20110246181A1 (en) * | 2010-03-30 | 2011-10-06 | Jisheng Liang | Nlp-based systems and methods for providing quotations |
CN102323955A (en) | 2011-09-16 | 2012-01-18 | 邹春城 | Private cloud searching system and implement method thereof |
US8275803B2 (en) | 2008-05-14 | 2012-09-25 | International Business Machines Corporation | System and method for providing answers to questions |
US20130124534A1 (en) * | 2011-11-15 | 2013-05-16 | Long Van Dinh | Apparatus and method for information access, search, rank and retrieval |
US20140297571A1 (en) * | 2013-03-29 | 2014-10-02 | International Business Machines Corporation | Justifying Passage Machine Learning for Question and Answer Systems |
US20140330819A1 (en) * | 2013-05-03 | 2014-11-06 | Rajat Raina | Search Query Interactions on Online Social Networks |
US20150340033A1 (en) * | 2014-05-20 | 2015-11-26 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
CN105247517A (en) | 2013-04-23 | 2016-01-13 | 谷歌公司 | Ranking signals in mixed corpora environments |
WO2016015267A1 (en) | 2014-07-31 | 2016-02-04 | Hewlett-Packard Development Company, L.P. | Rank aggregation based on markov model |
US20160078102A1 (en) | 2014-09-12 | 2016-03-17 | Nuance Communications, Inc. | Text indexing and passage retrieval |
US9323831B2 (en) | 2010-09-28 | 2016-04-26 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
US20160147871A1 (en) * | 2014-11-20 | 2016-05-26 | International Business Machines Corporation | Entity-relation based passage scoring in a question answering computer system |
US9940367B1 (en) * | 2014-08-13 | 2018-04-10 | Google Llc | Scoring candidate answer passages |
US10180964B1 (en) * | 2014-08-13 | 2019-01-15 | Google Llc | Candidate answer passages |
US20200162531A1 (en) * | 2014-08-15 | 2020-05-21 | Groupon, Inc. | Enforcing diversity in ranked relevance results returned from a universal relevance service framework |
US20200184532A1 (en) * | 2014-08-15 | 2020-06-11 | Groupon, Inc. | Universal Relevance Service Framework |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100185623A1 (en) * | 2009-01-15 | 2010-07-22 | Yumao Lu | Topical ranking in information retrieval |
2016
- 2016-05-23: US application US16/303,274, published as US12174839B2 (en), status: Active
- 2016-05-23: EP application EP16902637.4A, published as EP3465464A4 (en), status: Withdrawn
- 2016-05-23: WO application PCT/CN2016/082980, published as WO2017201647A1 (en), status: unknown
- 2016-05-23: CN application CN201680086072.XA, published as CN109219811B (en), status: Active
Patent Citations (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010000356A1 (en) | 1995-07-07 | 2001-04-19 | Woods William A. | Method and apparatus for generating query responses in a computer-based document retrieval system |
US20050187768A1 (en) * | 2004-02-24 | 2005-08-25 | Godden Kurt S. | Dynamic N-best algorithm to reduce recognition errors |
JP2005301444A (en) | 2004-04-07 | 2005-10-27 | Nippon Telegr & Teleph Corp <Ntt> | Passage retrieving method, passage retrieving device, passage retrieving program and recoding medium with passage retrieving program recorded |
US20050256700A1 (en) | 2004-05-11 | 2005-11-17 | Moldovan Dan I | Natural language question answering system and method utilizing a logic prover |
US20060190258A1 (en) * | 2004-12-16 | 2006-08-24 | Jan Verhasselt | N-Best list rescoring in speech recognition |
US20060195440A1 (en) * | 2005-02-25 | 2006-08-31 | Microsoft Corporation | Ranking results using multiple nested ranking |
CN101523338A (en) | 2005-03-18 | 2009-09-02 | 搜索引擎科技有限责任公司 | Search engine that applies feedback from users to improve search results |
US7856350B2 (en) | 2006-08-11 | 2010-12-21 | Microsoft Corporation | Reranking QA answers using language modeling |
CN101246484A (en) | 2007-02-15 | 2008-08-20 | 刘二中 | Electric text similarity processing method and system convenient for query |
US20080301174A1 (en) | 2007-03-30 | 2008-12-04 | Albert Mons | Data structure, system and method for knowledge navigation and discovery |
US20090070322A1 (en) * | 2007-08-31 | 2009-03-12 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
WO2009035871A1 (en) | 2007-09-10 | 2009-03-19 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
US20090089277A1 (en) * | 2007-10-01 | 2009-04-02 | Cheslow Robert D | System and method for semantic search |
US8275803B2 (en) | 2008-05-14 | 2012-09-25 | International Business Machines Corporation | System and method for providing answers to questions |
US20110246181A1 (en) * | 2010-03-30 | 2011-10-06 | Jisheng Liang | Nlp-based systems and methods for providing quotations |
US9323831B2 (en) | 2010-09-28 | 2016-04-26 | International Business Machines Corporation | Providing answers to questions using hypothesis pruning |
CN102323955A (en) | 2011-09-16 | 2012-01-18 | 邹春城 | Private cloud searching system and implement method thereof |
US20130124534A1 (en) * | 2011-11-15 | 2013-05-16 | Long Van Dinh | Apparatus and method for information access, search, rank and retrieval |
US20140297571A1 (en) * | 2013-03-29 | 2014-10-02 | International Business Machines Corporation | Justifying Passage Machine Learning for Question and Answer Systems |
US20170140304A1 (en) * | 2013-03-29 | 2017-05-18 | International Business Machines Corporation | Justifying Passage Machine Learning for Question and Answer Systems |
CN105247517A (en) | 2013-04-23 | 2016-01-13 | 谷歌公司 | Ranking signals in mixed corpora environments |
US20140330819A1 (en) * | 2013-05-03 | 2014-11-06 | Rajat Raina | Search Query Interactions on Online Social Networks |
US20150340033A1 (en) * | 2014-05-20 | 2015-11-26 | Amazon Technologies, Inc. | Context interpretation in natural language processing using previous dialog acts |
WO2016015267A1 (en) | 2014-07-31 | 2016-02-04 | Hewlett-Packard Development Company, L.P. | Rank aggregation based on markov model |
US9940367B1 (en) * | 2014-08-13 | 2018-04-10 | Google Llc | Scoring candidate answer passages |
US10180964B1 (en) * | 2014-08-13 | 2019-01-15 | Google Llc | Candidate answer passages |
US10783156B1 (en) * | 2014-08-13 | 2020-09-22 | Google Llc | Scoring candidate answer passages |
US20200162531A1 (en) * | 2014-08-15 | 2020-05-21 | Groupon, Inc. | Enforcing diversity in ranked relevance results returned from a universal relevance service framework |
US20200184532A1 (en) * | 2014-08-15 | 2020-06-11 | Groupon, Inc. | Universal Relevance Service Framework |
US20160078102A1 (en) | 2014-09-12 | 2016-03-17 | Nuance Communications, Inc. | Text indexing and passage retrieval |
US20160147871A1 (en) * | 2014-11-20 | 2016-05-26 | International Business Machines Corporation | Entity-relation based passage scoring in a question answering computer system |
Non-Patent Citations (13)
Title |
---|
"Extended Search Report Issued in European Patent Application No. 16902637.4", Mailed Date: Dec. 3, 2019, 9 Pages. |
"First Office Action and Search Report Issued in Chinese Patent Application No. 201680086072.X", Mailed Date: Mar. 3, 2021, 13 Pages. |
"Second Office Action and Search Report Issued in Chinese Patent Application No. 201680086072.X", Mailed Date: Aug. 18, 2021, 11 Pages. |
International Search Report and Written Opinion for PCT/CN2016/082980, mailed Feb. 24, 2017. |
Keikha, et al., "Retrieving Passages and Finding Answers", In Proceedings of Australasian Document Computing Symposium, Nov. 27, 2014, 4 pages. |
Li, Rongmei, "The State-of-the-arts in Focused Search", Retrieved on: May 31, 2016, 16 pages, http://doc.utwente.nl/67516/1/focusedSearch.pdf. |
Llopis et al., "Passage Selection to Improve Question Answering", In Proceedings of conference on multilingual summarization and question answering, vol. 19, Aug. 2002, 6 pages. |
Ofoghi, Bahadorreza, John Yearwood, and Ranadhir Ghosh. "A semantic approach to boost passage retrieval effectiveness for question answering." In Proceedings of the 29th Australasian Computer Science Conference—vol. 48, pp. 95-101. 2006. (Year: 2006). * |
Sankepally, et al., "An Aggregate Search Model for Web Search Engines: An Empirical Study", In IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol. 3, Nov. 17, 2013, pp. 288-289. |
Tellex et al., "Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering", In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul. 28, 2003, 7 pages. |
Wan et al., "Passage Retrieval with Vector Space and Query-Level Aspect Models", In Proceedings of the Sixteenth Text Retrieval Conference, Nov. 6, 2007, 10 pages. |
Xu et al., "Passage Retrieval for Information Extraction using Distant Supervision", In Proceedings of the 5th International Joint Conference on Natural Language Processing, Nov. 8, 2011, 9 pages. |
Zhao, Feiyu, "A Random Walk Based Win/Loss Graph Aggregation Algorithm for News Metasearch Engine", In Dissertation of Master Degree of Xihua University, Apr. 2013, 68 Pages. |
Also Published As
Publication number | Publication date |
---|---|
EP3465464A1 (en) | 2019-04-10 |
WO2017201647A1 (en) | 2017-11-30 |
CN109219811A (en) | 2019-01-15 |
US20190303375A1 (en) | 2019-10-03 |
EP3465464A4 (en) | 2020-01-01 |
CN109219811B (en) | 2022-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12174839B2 (en) | Relevant passage retrieval system | |
Venetis et al. | Recovering semantics of tables on the web | |
US10169453B2 (en) | Automatic document summarization using search engine intelligence | |
Cappallo et al. | New modality: Emoji challenges in prediction, anticipation, and retrieval | |
Nie et al. | Beyond text QA: multimedia answer generation by harvesting web information | |
US8335787B2 (en) | Topic word generation method and system | |
CA2774278C (en) | Methods and systems for extracting keyphrases from natural text for search engine indexing | |
US20100287162A1 (en) | method and system for text summarization and summary based query answering | |
US9645987B2 (en) | Topic extraction and video association | |
US10572528B2 (en) | System and method for automatic detection and clustering of articles using multimedia information | |
WO2016054301A1 (en) | Distant supervision relationship extractor | |
US20210103622A1 (en) | Information search method, device, apparatus and computer-readable medium | |
US20180032608A1 (en) | Flexible summarization of textual content | |
Krishnaveni et al. | Automatic text summarization by local scoring and ranking for improving coherence | |
US20110213763A1 (en) | Web content mining of pair-based data | |
Man | Feature extension for short text categorization using frequent term sets | |
Chirigati et al. | Knowledge exploration using tables on the web | |
Feldman | The answer machine | |
JP4931114B2 (en) | Data display device, data display method, and data display program | |
CN111753052A (en) | Providing intellectual answers to knowledge intent questions | |
Sariki et al. | A book recommendation system based on named entities | |
CN114416914B (en) | Processing method based on picture question and answer | |
US11803583B2 (en) | Concept discovery from text via knowledge transfer | |
Welch | Addressing the challenges of underspecification in web search | |
Hoonlor | Sequential patterns and temporal patterns for text mining |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAI, JING;LIU, YUE-SHENG;PEDERSEN, JAN O.;AND OTHERS;SIGNING DATES FROM 20160524 TO 20170117;REEL/FRAME:047552/0185 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED |
| | STCV | Information on status: appeal procedure | Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
| | STCV | Information on status: appeal procedure | Free format text: BOARD OF APPEALS DECISION RENDERED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |