US12223002B2 - Semantics-aware hybrid encoder for improved related conversations - Google Patents
Semantics-aware hybrid encoder for improved related conversations
- Publication number
- US12223002B2 (application US17/454,445)
- Authority
- US
- United States
- Prior art keywords
- post
- conversing
- query
- posts
- subject
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/107—Computer-aided management of electronic mailing [e-mailing]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
Definitions
- Embodiments of the present disclosure are directed to a method of identifying user posts on an online forum relevant to a query post.
- the Internet is a great resource for users to find solutions to problems.
- Online forums where users post queries answered by other users are widely available for users seeking answers or solutions to their problems. Users who post queries may receive answers or other posts that may be highly relevant or in most cases not relevant at all to the queries.
- Not every query that is posted online is novel and there may be related conversations previously posted online, and the users might be able to refer to such conversations that have already been answered to resolve their issue. It is important for companies to invest in keeping community engagement active, positive, and organic.
- the right set of related conversations can ensure that users find what they are looking for and give them the opportunity to explore their area of interest further through related conversations. In addition, the quicker the questions are resolved, the more likely customers are to be retained.
- the challenge is to suggest related conversations for a given query post based on the similarity in the context in the body, i.e., the content of the post, as well as matching subject lines.
- Embodiments of the disclosure provide a method to encode the context of the conversations from the body of a post along with the subject of the post, thus improving the overall quality.
- the results retrieved from a model according to an embodiment better encode the context of the post and thus provide better quality recommendations.
- a computer-implemented system according to an embodiment retrieves the most relevant online documents given an online query document using a 3-level hierarchical ranking mechanism.
- Embodiments of the disclosure include a computer-implemented semantics-oriented hybrid-search technique for encoding the context of the online documents, resulting in better retrieval performance.
- a computer-implemented boosting technique captures multiple metrics and provides a hierarchical ranking criterion.
- the boosting technique can boost rankings for documents involving the same board, the same product, the same OS Version and the same app version, etc.
- a method for recommending online conversations relevant to a given query post can be implemented as a computer application that is incorporated into the software that supports the online forum, and would be automatically invoked when a user posts a query.
- a computer-implemented method of finding online relevant conversing posts including: receiving, by a web server serving an online forum, a query post from an inquirer using the online forum, wherein the online forum facilitates conversing posts from users on subjects that are relevant and irrelevant to the query post; computing, by a contextual similarity scoring module, a contextual similarity score between each conversing post of a set of conversing posts in the online forum with the query post, wherein the query post and each conversing post of the set of conversing posts includes a subject and a body, wherein the contextual similarity score is computed between the body of each of the set of conversing posts and the body of the query post, wherein N1 conversing posts of the set of conversing posts with a highest contextual similarity score are selected; computing, by a fine grained similarity scoring module, a fine grained similarity score between an embedding of the subject of the query post and an embedding of the subject of each of the N1 selected conversing posts, wherein N
- a computer-implemented system for finding, in an online forum, conversing posts relevant to a query post including: a subject encoding module that calculates a subject embedding vector of a subject of a query post received by a web server serving an online forum and subject embedding vectors of a set of conversing posts previously posted to the online forum, wherein each of the query post and the set of conversing posts includes a subject and a body, wherein a user wants to find other conversing posts in the online forum that are relevant to the query post; a fine grained relevance scoring module that calculates a fine grain similarity score between the subject embedding vectors of the query post and the set of conversing posts, and that selects N2 conversing posts from the set of conversing posts with a highest fine grained relevance score with respect to the query post; a boosting module that boosts the fine grain similarity score of at least some of the N2 conversing posts based on one or more relevance metrics and selects N3
- a computer-implemented method of retrieving online relevant conversing posts including receiving, by a web server serving an online forum, a query post from an inquirer using the online forum, wherein the online forum facilitates conversing posts from users on subjects that are relevant and irrelevant to the query post; computing, by a contextual similarity scoring module, a contextual similarity score between each conversing post of a set of conversing posts in the online forum with the query post, wherein the query post and each conversing post of the set of conversing posts includes a subject and a body, wherein the contextual similarity is computed between the body of each of the set of conversing posts and the body of the query post, wherein N1 conversing posts of the set of conversing posts with a highest contextual similarity score are selected; and computing, by a fine grained similarity scoring module, a fine grained similarity score between an embedding of a subject of the query post and embeddings of a subject of each of the N 1 selected conversing posts by applying
- FIG. 1 A is a block diagram of an overall system that illustrates the different stages of a method according to an embodiment.
- FIG. 1 B is a flow diagram of a method of retrieving posts relevant to a query post, according to an embodiment.
- FIG. 2 is a table of results of an expert comparison of a conversation recommendation system according to an embodiment to other language processing models.
- FIG. 3 is a table of key performance indicator results of an A/B test of a conversation recommendation system according to an embodiment against an out-of-the-box (OOTB) language model.
- FIGS. 4 A-B illustrate conversation recommendations of a previous system and those generated by a conversation recommendation system according to an embodiment.
- FIG. 5 illustrates a block diagram of an exemplary computing device that implements a conversation recommendation system according to an embodiment.
- Embodiments of the disclosure provide a semantics-oriented hybrid search technique that combines page-ranking with the generalizability of neural networks to retrieve related online conversations for a given query post essentially instantaneously. Results show significant improvement in retrieved results over the existing techniques.
- semantic-searching can be used to improve searches in various products that utilize textual content. These uses include semantic searches, analyzing textual reviews on a product page on e-commerce websites to cluster similar reviews, natural language understanding and answering questions posted online.
- At least one embodiment of the disclosure uses a computer system that encodes the context of the conversations from the body of the post along with the subject of the post, thus improving the overall quality.
- An online based semantics-oriented hybrid search is used with a hierarchical ranking technique that recommends the best related conversations and that utilizes the context of the query post along with the generalizability of neural networks.
- a computer-implemented system according to an embodiment first shortlists conversations that have a similar context, based on the bodies of the posts, and then recommends the highly relevant posts based on the subject of the post.
- the hierarchical ranking used by systems according to embodiments of the disclosure ensures the recommended conversations are relevant, increasing the probability of a user engaging with those posts.
- Other benefits expected from use of computer-implemented online systems according to embodiments of the disclosure include increased page views, member entrances, time remaining onsite, posts submitted, accepted solutions, and liked posts.
- query refers to a document submitted by a user to an Internet forum or message board, which is a computer implemented online discussion site where people can hold conversations in the form of posted messages or documents.
- a query typically includes a header or title that identifies the subject of the query, and the body, which contains the substance of the query.
- online forum refers to a computer-implemented Internet forum, discussion group or community in which the subject of the posts is directed to a particular subject matter, such as a technology.
- semantic search refers to a computer-implemented online search with meaning, as opposed to a lexical search in which the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query. Semantic search seeks to improve search accuracy and generate more relevant results by understanding the contextual meaning of terms as they appear in a searchable dataspace.
- embedding refers to the representation of text, typically in the form of a real-valued vector that encodes the meaning of the text such that documents that are closer in the vector space are expected to be similar in meaning.
- A/B testing refers to a randomized experiment with two variants, A and B, to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective.
- TF-IDF: term frequency-inverse document frequency
- tf-idf refers to a numerical statistic that reflects how important a word is to a document in a collection of documents.
- the tf-idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, to adjust for the fact that some words appear more frequently in general.
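- As an illustration of the tf-idf definition above, the following minimal sketch computes tf-idf weights for a toy corpus. It assumes whitespace tokenization, raw term counts, and an unsmoothed logarithmic inverse document frequency; the function name `tf_idf_vectors` and the sample posts are hypothetical, and library implementations typically add smoothing and normalization that are omitted here.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document.

    weight = (count of the term in the document)
             * log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical post bodies, used only for illustration.
docs = [
    "scans disappeared after the update",
    "the update removed the scans",
    "cannot save the document",
]
vectors = tf_idf_vectors(docs)
```

- In this toy corpus the word "the" occurs in every document, so its weight is zero, while "disappeared", which occurs in only one document, receives a larger weight than "scans", which occurs in two.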
- cosine similarity refers to a measure of similarity between two non-zero vectors and is defined by the cosine of the angle between them, which is the same as the inner product of the same vectors normalized to unit length.
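- The definition of cosine similarity can be sketched in a few lines. The helper names `cosine_similarity` and `normalize` and the sample vectors are assumptions made for illustration; dense Python lists are used for simplicity, whereas real TF-IDF vectors are usually sparse.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a
c = [0.0, 0.0, 3.0]  # orthogonal to a
d = [3.0, 1.0, 2.0]

# Inner product of the unit-length versions of a and d.
inner_of_normalized = sum(x * y for x, y in zip(normalize(a), normalize(d)))
```

- Vectors pointing in the same direction score 1 regardless of their lengths, orthogonal vectors score 0, and `inner_of_normalized` agrees with `cosine_similarity(a, d)` up to floating-point error.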
- BERT refers to a computer-implemented transformer language model known as Bidirectional Encoder Representations from Transformers, a transformer-based machine learning technique for natural language processing (NLP).
- BERT has a variable number of encoder layers and self-attention heads, and was pretrained on two tasks: language modeling to predict tokens from context, and next sentence prediction to predict whether a chosen next sentence was probable given the first sentence. After pretraining, BERT can be fine tuned to optimize its performance on specific tasks.
- transformer refers to a computer-implemented deep learning model that uses the attention mechanism to differentially weigh the significance of each part of the input data.
- the attention mechanism identifies the context for each word in a sentence without necessarily processing the data in order.
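- To make the attention mechanism concrete, the sketch below implements scaled dot-product self-attention, the standard formulation used in transformers. It is a simplification, not the patented system: the query, key, and value projections are assumed to be identity mappings, there is a single head, and the token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(token_vectors):
    """Scaled dot-product self-attention with identity projections.

    Every token attends to every token at once, so no sequential
    left-to-right processing is needed.
    """
    dim = len(token_vectors[0])
    outputs, all_weights = [], []
    for query in token_vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in token_vectors
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of all token vectors.
        output = [
            sum(w * value[d] for w, value in zip(weights, token_vectors))
            for d in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
```

- Each row of `weights` sums to 1, and the first token assigns more weight to the similar second token than to the orthogonal third token.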
- boosting refers to increasing a similarity score between two online posts or documents based on shared references of the two posts or documents.
- FIG. 1 A is a block diagram of an overall computer-implemented system that illustrates the different stages of a method according to an embodiment.
- a query posted to an online forum server serves as a reference post 111, which is split into a body 113 and a subject 114.
- a body encoding module 125 calculates a body embedding 127 of the body 113, and
- a subject encoding module 126 calculates a subject embedding 128 of the subject 114.
- An exemplary, non-limiting body encoding module 125 is a tf-idf encoding module.
- the subject encoding module 126 is a computer-implemented multilingual model 121 , which is pre-trained at block 122 and fine tuned for semantic searching at block 123 .
- An exemplary, non-limiting computer-implemented multilingual model 121 is a BERT model.
- a contextual similarity scoring module 140 compares the body embedding 127 of the reference post 111 against a body embedding 137 of the body 133 of a post 131 in the complete corpus 130 of posts in the online forum to calculate a contextual relevance score, as will be described below.
- a first number N 1 of posts with a highest contextual relevance score are selected for further processing.
- a fine-grained similarity scoring module 141 compares the subject embedding 128 of the reference post 111 against a subject embedding 138 of the subject 134 of the first number N 1 of posts 131 of the complete corpus 130 of posts in the online forum to calculate the fine-grained relevance score, as will be described below.
- a second number N 2 of posts with a highest fine grained relevance score, where N 2 < N 1, will be selected for further processing.
- a boosting module 150 uses various metrics 142 to boost the fine-grained relevance scores of the second number N 2 of posts based on various other relevance measures, as will be described below.
- a final score for each of the second number N 2 of posts is calculated by the boosting module 150 as a weighted sum over the various other relevance measures, from which a third number N 3 with the highest final scores are selected as the top N 3 recommendations 151 .
- the N 3 selected online posts are then displayed to the user by a display device.
- FIG. 1 B is a flow diagram of a computer-implemented method of retrieving online posts relevant to a query post, according to an embodiment.
- a method begins by receiving, at step 10 , by an online forum server, a query post from a user of an online forum.
- the query post may be a question from the user that the user wants answered by finding other posts from a set of online posts 11 in the online forum that are relevant to the query post.
- the query post and each post of the set of online posts 11 include a subject and a body.
- a body encoding module calculates an embedding from the body of each post of the set of online posts and an embedding from the body of the query post.
- the embeddings of the body of the query post and each of the posts of the set of online posts are used by a contextual similarity scoring module at step 13 to calculate a contextual similarity score between the query post and each post of a set of online posts, and the N1 posts of the set of online posts with the highest contextual similarity score are selected for further processing.
- a computer-implemented pre-trained multi-lingual model is fine tuned for determining semantic similarities.
- the fine tuned computer-implemented multi-lingual model is used at step 15 by a subject encoding module to calculate embeddings of the subject of the query post and the subject of each of the N1 selected online posts.
- a fine grained similarity scoring module calculates a fine grained similarity score between the embedding of the subject of the query post and the embeddings of the subject of each of the N1 selected online posts, and N2 posts of the set of N1 online posts with a highest fine grained similarity score are selected for further processing, wherein N2 < N1.
- a boosting module performs boosting on the N2 selected online posts based on one or more relevance metrics 17 , in which the fine grained similarity score of at least some of the N2 selected online posts is boosted by a weighted sum of the relevance metrics 17 of the N2 selected online posts.
- the boosting module selects the N3 posts with the highest boosted fine grained similarity scores from the N2 selected online posts as a list of online posts relevant to the query post, where N3 < N2. These N3 selected online posts are displayed to the user by a display device at step 19 as the most relevant online posts for answering the query posted by the user.
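- The three-level flow described above can be sketched as a single function that takes the scoring steps as parameters. Everything named here is hypothetical: `recommend`, `word_overlap`, and the sample posts are illustrative only, a simple word-overlap score stands in for both the TF-IDF body similarity and the fine-tuned subject-embedding similarity, and adding the boost to the fine grained score is an assumption about how the two are combined.

```python
def word_overlap(a, b):
    """Jaccard overlap of the word sets of two strings (toy stand-in scorer)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def recommend(query, posts, body_score, subject_score, boost, n1, n2, n3):
    """Return the top n3 posts after coarse, fine grained, and boosted ranking."""
    # Level 1: shortlist the n1 posts whose bodies best match the query body.
    shortlist = sorted(posts, key=lambda p: body_score(query, p), reverse=True)[:n1]
    # Level 2: keep the n2 shortlisted posts whose subjects best match.
    scored = [(subject_score(query, p), p) for p in shortlist]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scored = scored[:n2]
    # Level 3: boost the fine grained scores and keep the top n3.
    boosted = [(score + boost(query, p), p) for score, p in scored]
    boosted.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in boosted[:n3]]


query = {"subject": "scans disappeared",
         "body": "all my scans are gone after update", "board": "scan"}
posts = [
    {"id": 1, "subject": "scans disappeared after update",
     "body": "my scans are gone after the update", "board": "scan"},
    {"id": 2, "subject": "missing scans",
     "body": "scans gone after update today", "board": "pdf"},
    {"id": 3, "subject": "cannot save document",
     "body": "error when saving a document", "board": "pdf"},
    {"id": 4, "subject": "scans disappeared",
     "body": "billing question about my invoice", "board": "scan"},
]

results = recommend(
    query, posts,
    body_score=lambda q, p: word_overlap(q["body"], p["body"]),
    subject_score=lambda q, p: word_overlap(q["subject"], p["subject"]),
    boost=lambda q, p: 0.1 if p["board"] == q["board"] else 0.0,
    n1=2, n2=2, n3=2,
)
print([p["id"] for p in results])
```

- In this toy data, post 4 has a subject identical to the query but an unrelated body, so the body-based shortlist removes it before the subject comparison runs. This is the design point of the hierarchy: subject matching is applied only to posts that already share the query's context.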
- Contextual Coarse Ranking: For a first level, the relevance between the bodies of the conversations of an online forum and the body of the query post for which a user wants to find similar conversations is measured. For this, the cosine similarity between the TF-IDF vector of each of the online forum posts and a TF-IDF vector of the query post is computed by a contextual similarity scoring module, and a shortlist of the N 1 online forum posts with maximum similarity between the body context of the query post and the body contexts of the online forum posts is selected by the contextual similarity scoring module. Alternatively, an L2 distance is used to determine the similarity between the TF-IDF vectors. This ensures that posts that have similar contexts appear in the shortlisted posts.
- In an embodiment, N 1 equals 200, but embodiments are not limited thereto. In an embodiment, the value of N 1 is based on a predetermined threshold, but embodiments are not limited thereto, and in other embodiments, the value of N 1 is determined without reference to a threshold.
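- The two similarity choices mentioned for the coarse ranking, cosine similarity and L2 distance, are closely related. As a short worked identity, assume the TF-IDF vectors u and v are normalized to unit length (an assumption; the text does not state whether the vectors are normalized), with θ the angle between them:

```latex
\|u - v\|_2^2
  = \|u\|_2^2 + \|v\|_2^2 - 2\,u \cdot v
  = 2 - 2\cos\theta,
\qquad \text{when } \|u\|_2 = \|v\|_2 = 1 .
```

- Under that assumption, sorting posts by increasing L2 distance yields the same shortlist as sorting by decreasing cosine similarity. Without normalization the two orderings can differ, because L2 distance is also sensitive to vector length.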
- a pre-trained computer-implemented classifier was used that was fine-tuned for the task of semantic search. This ensures that the classifier encodes similar posts with embeddings that lie closer in the vector space.
- An exemplary, non-limiting trained computer-implemented multi-lingual classifier is the BERT classifier, which is pre-trained on unlabeled data over different tasks.
- the BERT model is initialized with the pre-trained parameters, and then is fine-tuned using labeled data for the semantic search tasks. Methods of fine tuning are known in the art.
- Boosting: To ensure that the related online forum conversations are from the same online board as the query post while at the same time not completely excluding posts from other online boards, boosting is performed by a boosting module to ensure that posts from the same online board as the query post are given higher preference.
- the N 2 related conversations are ranked based on the value of their distance metrics, and then the ranking of individual conversations is boosted based on one or more of a plurality of metrics.
- the boosting depends only on the fine grained relevance scores and does not need to refer to the actual text.
- These metrics include, but are not limited to: a board relevance score, mentioned above, in which the rank of conversations from the same board as the query post is boosted; a product preference score, in which the rank of conversations about a particular product discussed in the user's post is boosted; an OS relevance score, in which the rank of conversations that reference the same operating system version as the user's query post is boosted; and an application version relevance score, in which the rank of conversations that reference the same version of an application as the user's query post is boosted.
- the boosting is based on a weighted sum of one or more of these metrics, as represented by the following equation:
- $$\text{final score} = \sum_{i} \text{weight}_i \times \text{metric}_i$$
- where metric i and weight i are the metric and its associated weight, respectively, and the weights are determined based on an evaluation of each of the metrics with respect to the N 2 related conversations.
- the top N 3 posts are selected out of the N 2 selected online posts that constitute the final list of recommendations.
- In an embodiment, N 3 = 9, but embodiments are not limited thereto.
- the top N3 selected online posts are displayed to the user.
- the value of N 3 is based on a predetermined threshold, but embodiments are not limited thereto, and in other embodiments, the value of N 3 is determined without reference to a threshold.
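- A minimal sketch of the boosting step follows. It is not taken from the source beyond the idea of a weighted sum over the four named metrics: the binary match indicators, the weight values, the field names, the sample data, and the choice to add the weighted sum to the fine grained score are all assumptions made for illustration.

```python
# Hypothetical field names, one per relevance metric named in the text.
METRIC_FIELDS = ("board", "product", "os_version", "app_version")


def boost_score(query, post, weights):
    """Weighted sum of binary relevance metrics (1.0 when the field matches)."""
    return sum(
        weights[field] * (1.0 if post.get(field) == query.get(field) else 0.0)
        for field in METRIC_FIELDS
    )


def top_recommendations(query, scored_posts, weights, n3):
    """Boost each (fine_grained_score, post) pair and return the top n3 posts."""
    boosted = [
        (score + boost_score(query, post, weights), post)
        for score, post in scored_posts
    ]
    boosted.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in boosted[:n3]]


# Hypothetical weights and data.
weights = {"board": 0.10, "product": 0.05, "os_version": 0.03, "app_version": 0.02}
query = {"board": "scan", "product": "ScanApp",
         "os_version": "iOS 15", "app_version": "21.1"}
scored_posts = [
    (0.80, {"id": "A", "board": "pdf", "product": "Reader",
            "os_version": "Android 12", "app_version": "20.0"}),
    (0.75, {"id": "B", "board": "scan", "product": "ScanApp",
            "os_version": "Android 12", "app_version": "20.0"}),
    (0.65, {"id": "C", "board": "scan", "product": "ScanApp",
            "os_version": "iOS 15", "app_version": "21.1"}),
]
top = top_recommendations(query, scored_posts, weights, n3=2)
```

- Post A has the highest fine grained score but matches none of the metrics, so posts B and C overtake it after boosting. Only scores and metadata are consulted, which is consistent with the statement that boosting does not need to refer to the actual text.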
- a recommendation system has a variety of relevant use cases.
- Semantic Search: A popular query is a semantic search on a search engine. Searching for content on the web is akin to finding a needle in a haystack, but search engines provide results to search queries in milliseconds.
- An approach according to an embodiment generates textual embeddings that capture the “context” of the text. This is used to search through billions of posts to identify relevant text content. This is useful for searching for similar text in scanned documents, etc.
- NLU Natural Language Understanding
- Question Answering Retrieval: In some forums, users post questions and either experts or users of the forum post answers to them. Usually, it takes a few hours for the dedicated experts to identify the new post and answer the query. Since an approach according to an embodiment searches for semantically similar posts, it identifies similar posts that were answered by experts and then recommends similar solutions when an expert is unavailable.
- An evaluation of a method according to an embodiment is performed using A/B testing in production and the results are compared with the existing state-of-the-art methods.
- the quality of recommendations is better than earlier models and the increases in various performance indices indicate an increased business value.
- An experimental conversation recommendation system was tested on a marketing community platform, to find more relevant content for its members, along with driving improved community engagement and product adoption for the users.
- Recommendations are currently based on a user's viewed posts or submitted comments and threads. Members are recommended between 6 and 10 articles daily depending on their historical community activity.
- results have been summarized in the table of FIG. 2 along with the other methods that were compared against.
- the table indicates, for each model, the percentage of results that are better than an out of the box (OOTB) implementation currently implemented on a community support platform, the percentage of results that are equally good as the OOTB, the percentage of results that are worse than OOTB, the percentage where both are not good, and the percentage that are inconclusive.
- FIG. 2 shows that 91% of the conversations recommended by a model according to an embodiment were rated as being better than or as good as recommendations from the OOTB.
- An A/B test was carried out for a selected (US-region) audience to evaluate user engagement with the related conversations component, i.e., the click through rate, for recommendations generated by a model according to an embodiment versus those generated by an OOTB model.
- FIG. 3 shows the different performance metrics that were tracked for the A/B testing experiment along with the results. As can be seen, there was a 6.8% increase in click through rate and a 20.2% decrease in the visit-level JCR rate, which are statistically significant differences.
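- The text does not say which statistical test was used to establish significance. As one common choice for comparing click through rates between two variants, the sketch below applies a two-sided two-proportion z-test. The function name and the click and view counts are hypothetical and are not the figures from the A/B test reported here.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A is the baseline, variant B is the new model.
z, p_value = two_proportion_z_test(clicks_a=500, views_a=10_000,
                                   clicks_b=600, views_b=10_000)
```

- With these made-up counts the z statistic exceeds 3 and the p-value falls below 0.01, so the difference would be called statistically significant at the usual 5% level.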
- FIGS. 4 A-B illustrate some qualitative results as compared with previous models.
- the subject of the post is “All my scans have disappeared”, and the body of the post is below the subject.
- the related conversations returned by a conventional model are listed on the left, and the 9 related conversations returned by a model according to an embodiment are shown on the right.
- the subject of the post is ‘“This document could not be saved. There is a problem reading this document (110)” HELP!’, and the body of the post is below the subject.
- the related conversations returned by a conventional model are listed on the left, and the 9 related conversations returned by a model according to an embodiment are shown on the right.
- the related conversations shown on the right include more information and the information is more relevant to the posted query than the conversations shown on the left.
- FIG. 5 illustrates a block diagram of an exemplary computing device 500 that may be configured to perform one or more of the processes described above.
- one or more computing devices such as the computing device 500
- the computing device 500 may represent the computing system described above, such as the system of FIG. 1 .
- the computing device 500 may be a mobile device, such as a mobile telephone, a smartphone, a PDA, a tablet, a laptop, a camera, a tracker, a watch, a wearable device, etc.
- the computing device 500 may be a non-mobile device, such as a desktop computer or another type of client device.
- the computing device 500 may be a server device that includes cloud-based processing and storage capabilities.
- the computing device 500 can include one or more processor(s) 502 , memory 504 , a storage device 506 , input/output interfaces 508 (or “I/O interfaces 508 ”), and a communication interface 510 , which may be communicatively coupled by way of a communication infrastructure, such as bus 512 .
- While the computing device 500 is shown in FIG. 5, the components illustrated in FIG. 5 are not intended to be limiting. Additional or alternative components may be used in other embodiments.
- the computing device 500 includes fewer components than those shown in FIG. 5 . Components of the computing device 500 shown in FIG. 5 will now be described in additional detail.
- the processor(s) 502 includes hardware for executing instructions, such as those making up a computer program.
- the processor(s) 502 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 504 , or a storage device 506 and decode and execute them.
- the computing device 500 includes memory 504 , which is coupled to the processor(s) 502 .
- the memory 504 may be used for storing data, metadata, and programs for execution by the processor(s).
- the memory 504 may include one or more of volatile and non-volatile memories, such as Random-Access Memory (“RAM”), Read-Only Memory (“ROM”), a solid-state disk (“SSD”), Flash, Phase Change Memory (“PCM”), or other types of data storage.
- the memory 504 may be internal or distributed memory.
- the computing device 500 includes a storage device 506 for storing data or instructions.
- the storage device 506 can include a non-transitory storage medium described above.
- the storage device 506 may include a hard disk drive (HDD), flash memory, a Universal Serial Bus (USB) drive, or a combination of these or other storage devices.
- the computing device 500 includes one or more I/O interfaces 508 , which are provided to allow a user to provide input to (such as user strokes), receive output from, and otherwise transfer data to and from the computing device 500 .
- I/O interfaces 508 may include a mouse, keypad or a keyboard, a touch screen, camera, optical scanner, network interface, modem, other known I/O devices or a combination of such I/O interfaces 508 .
- the touch screen may be activated with a stylus or a finger.
- the I/O interfaces 508 may include one or more devices for presenting output to a user, including, but not limited to, a graphics engine, a display (e.g., a display screen), one or more output drivers (e.g., display drivers), one or more audio speakers, and one or more audio drivers.
- I/O interfaces 508 are configured to provide graphical data to a display for presentation to a user.
- the graphical data may be representative of one or more graphical user interfaces or any other graphical content as may serve a particular implementation.
- the computing device 500 can further include a communication interface 510 .
- the communication interface 510 can include hardware, software, or both.
- the communication interface 510 provides one or more interfaces for communication (such as, for example, packet-based communication) between the computing device and one or more other computing devices or one or more networks.
- communication interface 510 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network, or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network.
- the computing device 500 can further include a bus 512 .
- the bus 512 can include hardware, software, or both that connects components of computing device 500 to each other.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Entrepreneurship & Innovation (AREA)
- Tourism & Hospitality (AREA)
- Probability & Statistics with Applications (AREA)
- Computing Systems (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Computer Hardware Design (AREA)
- Primary Health Care (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- 12% increase in page views;
- 5% increase in member entrances;
- 6% increase in minutes online;
- 44% increase in posts submitted;
- 101% increase in accepted solutions; and
- 19% increase in liked posts.
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/454,445 US12223002B2 (en) | 2021-11-10 | 2021-11-10 | Semantics-aware hybrid encoder for improved related conversations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/454,445 US12223002B2 (en) | 2021-11-10 | 2021-11-10 | Semantics-aware hybrid encoder for improved related conversations |
Publications (2)
Publication Number | Publication Date |
---|---|
US20230143777A1 US20230143777A1 (en) | 2023-05-11 |
US12223002B2 true US12223002B2 (en) | 2025-02-11 |
Family
ID=86229866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/454,445 Active 2042-03-28 US12223002B2 (en) | 2021-11-10 | 2021-11-10 | Semantics-aware hybrid encoder for improved related conversations |
Country Status (1)
Country | Link |
---|---|
US (1) | US12223002B2 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024259860A1 (en) * | 2023-06-21 | 2024-12-26 | Huawei Technologies Co., Ltd. | Method, apparatus, and system for semantic communications |
CN116542257B (en) * | 2023-07-07 | 2023-09-22 | 长沙市智为信息技术有限公司 | Rumor detection method based on conversation context awareness |
US12235864B1 (en) * | 2023-07-31 | 2025-02-25 | Jpmorgan Chase Bank, N.A. | Method and system for automated classification of natural language data |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110270830A1 (en) * | 2010-04-30 | 2011-11-03 | Palo Alto Research Center Incorporated | System And Method For Providing Multi-Core And Multi-Level Topical Organization In Social Indexes |
US20130046771A1 (en) * | 2011-08-15 | 2013-02-21 | Lockheed Martin Corporation | Systems and methods for facilitating the gathering of open source intelligence |
US20140019118A1 (en) * | 2012-07-12 | 2014-01-16 | Insite Innovations And Properties B.V. | Computer arrangement for and computer implemented method of detecting polarity in a message |
US20160140643A1 (en) * | 2014-11-18 | 2016-05-19 | Microsoft Technology Licensing | Multilingual Content Based Recommendation System |
US20170068906A1 (en) * | 2015-09-09 | 2017-03-09 | Microsoft Technology Licensing, Llc | Determining the Destination of a Communication |
US20180260416A1 (en) * | 2015-09-01 | 2018-09-13 | Dream It Get It Limited | Unit retrieval and related processes |
US20200097544A1 (en) * | 2018-09-21 | 2020-03-26 | Salesforce.Com, Inc. | Response recommendation system |
US20200334486A1 (en) * | 2019-04-16 | 2020-10-22 | Cognizant Technology Solutions India Pvt. Ltd. | System and a method for semantic level image retrieval |
US20210173857A1 (en) * | 2019-12-09 | 2021-06-10 | Kabushiki Kaisha Toshiba | Data generation device and data generation method |
US20210182287A1 (en) * | 2019-12-12 | 2021-06-17 | The Yes Platform | Dynamic Filter Recommendations |
US20210232613A1 (en) * | 2020-01-24 | 2021-07-29 | Accenture Global Solutions Limited | Automatically generating natural language responses to users' questions |
US20210342785A1 (en) * | 2020-05-01 | 2021-11-04 | Monday.com Ltd. | Digital processing systems and methods for virtual file-based electronic white board in collaborative work systems |
Non-Patent Citations (10)
Title |
---|
Alexis Conneau, et al., "Unsupervised Cross-Lingual Representation Learning at Scale," Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8440-8451. |
Andrei Z. Broder, "On the Resemblance and Containment of Documents," In Proceedings, Compression and Complexity of Sequences 1997 (Cat. No. 97TB100171) (pp. 21-29). IEEE. |
David M. Blei, et al., "Latent Dirichlet Allocation," The Journal of Machine Learning Research 3 (2003): 993-1022. |
Jacob Devlin, et al., "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding," Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. vol. 1 (Long and Short Papers), pp. 4171-4186. |
Jeffrey Pennington, et al., "GloVe: Global Vectors for Word Representation," Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, Oct. 25-29, 2014. |
Juan Ramos, "Using TF-IDF to Determine Word Relevance in Document Queries." In Proceedings of the first instructional conference on machine learning (vol. 242, No. 1, pp. 29-48). |
Ron Kohavi, et al., "Online Controlled Experiments and A/B Testing," Encyclopedia of Machine Learning and Data Mining, DOI 10.1007/978-1-4899-7502-7 891-1. |
Sakata, et al., "FAQ Retrieval Using Query-Question Similarity and BERT-Based Query-Answer Relevance," In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1113-1116). |
Tomas Mikolov et al., "Efficient Estimation of Word Representations in Vector Space," arXiv preprint arXiv:1301.3781 (2013). |
Wee Chung Gan, et al., "Improving the Robustness of Question Answering Systems to Question Paraphrasing," Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6065-6075 Florence, Italy, Jul. 28-Aug. 2, 2019. |
Also Published As
Publication number | Publication date |
---|---|
US20230143777A1 (en) | 2023-05-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9348900B2 (en) | Generating an answer from multiple pipelines using clustering | |
US10387437B2 (en) | Query rewriting using session information | |
US10489399B2 (en) | Query language identification | |
US9146987B2 (en) | Clustering based question set generation for training and testing of a question and answer system | |
US10769552B2 (en) | Justifying passage machine learning for question and answer systems | |
US12223002B2 (en) | Semantics-aware hybrid encoder for improved related conversations | |
CN105989040B (en) | Intelligent question and answer method, device and system | |
US9230009B2 (en) | Routing of questions to appropriately trained question and answer system pipelines using clustering | |
US9621601B2 (en) | User collaboration for answer generation in question and answer system | |
US20180196881A1 (en) | Domain review system for identifying entity relationships and corresponding insights | |
US7565345B2 (en) | Integration of multiple query revision models | |
US10565265B2 (en) | Accounting for positional bias in a document retrieval system using machine learning | |
US9251185B2 (en) | Classifying results of search queries | |
US20170270159A1 (en) | Determining query results in response to natural language queries | |
US9411886B2 (en) | Ranking advertisements with pseudo-relevance feedback and translation models | |
US20150161242A1 (en) | Identifying and Displaying Relationships Between Candidate Answers | |
US20150261859A1 (en) | Answer Confidence Output Mechanism for Question and Answer Systems | |
US11734322B2 (en) | Enhanced intent matching using keyword-based word mover's distance | |
US20130198192A1 (en) | Author disambiguation | |
CA3119416C (en) | Combining statistical methods with a knowledge graph | |
CN110147494B (en) | Information searching method and device, storage medium and electronic equipment | |
US10528576B1 (en) | Automated search recipe generation | |
US8364672B2 (en) | Concept disambiguation via search engine search results | |
CN110990533A (en) | Method and device for determining standard text corresponding to query text | |
KR20140109729A (en) | System for searching semantic and searching method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: ADOBE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BADJATIYA, PINKESH;ANAND, TANAY;SHAHID, SIMRA;AND OTHERS;SIGNING DATES FROM 20211108 TO 20211110;REEL/FRAME:058077/0574 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| | ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |