US7681147B2 - System for determining probable meanings of inputted words - Google Patents
- Publication number
- US7681147B2 (application US 11/314,956)
- Authority
- US
- United States
- Prior art keywords
- word
- probability
- context
- meaning
- probable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present embodiments relate generally to document classification, and more particularly to identifying the meaning of words.
- In document classification, a document may be assigned to one or more categories, based on its contents.
- a recent use of document classification techniques has been spam filtering which tries to discern E-mail spam messages from legitimate emails.
- Document classification tasks can be supervised, where some external mechanism, such as human feedback, provides information on the correct classification for the documents, and unsupervised, where the classification is done without reference to external information.
- Document classification techniques include naive Bayes classifier, latent semantic indexing, support vector machines, and approaches based on natural language processing.
- the embodiments described below include a system for determining probable meanings of words.
- In a first aspect, an input of a word is obtained.
- Probable meanings of the word are determined in accordance with a prior probability of probable meanings of the word and a context frequency probability of probable meanings of the word.
- In another aspect, a client/server network is used for determining probable meanings of words.
- a client may be used to enter at least one word on the network.
- a server may be used to obtain an input of at least one word over the network.
- a processor may determine a probable meaning of the word in accordance with a prior probability of probable meanings of the word and a context frequency probability of probable meanings of the word.
- FIG. 1 provides a simplified view of a network environment including a natural language server.
- FIG. 2 is a flow chart illustrating a use of the natural language server.
- FIG. 3 is an exemplary screen shot of a web page that may be displayed by a service provider.
- FIG. 4 is a flowchart illustrating exemplary operations of the natural language server.
- FIG. 5 is a flow chart illustrating the generation of a training set.
- FIG. 6 is a flow chart of a process that may be used with double-sided contexts.
- FIG. 7 is a diagram showing a relationship of the processes for determining the probability of the meanings of a word.
- FIG. 8 is a block diagram illustrating an exemplary attribute hierarchy.
- FIG. 9 is a flowchart illustrating an exemplary implementation of risk assessment and statistical example hunting.
- FIG. 10 is a block diagram of an exemplary general computer system.
- the principles described herein may be embodied in many different forms.
- the system may enable a better distinction or disambiguation between different possible meanings of words.
- the system is described in terms of determining potential place names within text documents. At least a portion of the system may utilize a reduced amount of human input to generate statistics that enable the system to better distinguish the meanings of words.
- the system may also use statistics to identify which words have a meaning that may be more uncertain than others and therefore may merit further investigation.
- the system is described as used in a network environment, but the system may also operate outside of the network environment.
- FIG. 1 provides a simplified view of a network environment 100 in which the system may operate. Not all of the depicted components may be required and some embodiments may include additional components not shown in the figure. Variations in the arrangement and type of the components may be made without departing from the spirit or scope of the claims as set forth herein.
- environment 100 includes a natural language server 114 , which may provide for classification of documents and words within documents.
- the natural language server 114 may also be used outside of the environment 100 for other implementations.
- the environment 100 may also include an advertisement services server 110 , which may provide a platform for selection, optimization, and/or distribution of advertisements for inclusion in pages, such as web pages. Additionally or alternatively, the natural language server 114 and the advertisements services server 110 may be implemented together with the same physical server. Web pages may be provided to the natural language server 114 , the advertisement services server 110 and other users by a portal server 104 and/or a third-party server 102 .
- Some or all of the natural language server 114 , the advertisement services server 110 , portal server 104 , and third-party server 102 may be in communication with each other by way of a network 108 .
- the advertisement services server 110 and portal server 104 may each represent multiple linked computing devices, and multiple third-party servers, such as third-party server 102 , may be included in environment 100 .
- Network 108 may be regarded as a public or private network connection and may include, for example, a virtual private network or an encryption or other security mechanism employed over the public Internet, or the like.
- Environment 100 may also include a user device 106, depicted as a conventional personal computer, and/or another device such as a mobile user device 112, which may be a network-enabled mobile phone, personal digital assistant (PDA), pager, network-enabled television, digital video recorder such as TIVO, and/or automobile.
- User device 106 and mobile user device 112 are represented by user-interactive devices that typically run browser applications, and the like, to display requested pages received over a network.
- the user may be a consumer of goods or services that is searching for a business such as a business of the advertiser.
- Such devices are in communication with portal server 104 and/or third-party server 102 by way of network 109 .
- Network 109 may include the Internet and may include all or part of network 108 ; network 108 may include all or part of network 109 .
- Portal server 104 , third-party server 102 , advertisement services server 110 , user device 106 , and mobile user device 112 represent computing devices of various kinds.
- Such computing devices may generally include any device that is configured to perform computation and that is capable of sending and receiving data communications by way of one or more wired and/or wireless communication interfaces.
- Such devices may be configured to communicate in accordance with any of a variety of network protocols, including but not limited to protocols within the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite.
- user device 106 may be configured to execute a browser application that employs HTTP to request information, such as a web page, from a web server, which may be a process executing on portal server 104 or third-party server 102 .
- Networks 108 , 109 may be configured to couple one computing device to another computing device to enable communication of data between the devices.
- Networks 108 , 109 may generally be enabled to employ any form of machine-readable media for communicating information from one device to another.
- Each of networks 108 , 109 may include one or more of a wireless network, a wired network, a local area network (LAN), a wide area network (WAN), a direct connection such as through a Universal Serial Bus (USB) port, and the like, and may include the set of interconnected networks that make up the Internet.
- Networks 108 , 109 may include any communication method by which information may travel between computing devices.
- FIG. 2 is a flow chart illustrating a use of the natural language server 114 .
- a user 106 , 112 may enter a search query into a page of the browser.
- the embodiments are not limited to the analysis of search queries, however; the natural language server 114 may also analyze words input in other ways.
- the words may be inputted with a keypad of a computer or in other ways, such as voice recognition of a computer or other input devices such as a voice recognition module of an automobile.
- the input may be entered into one or more fields on a page of a search provider (e.g. FIG. 3 ), such as a search provider of the advertisement services server 110 or third-party server 104 .
- the natural language server 114 may analyze the words of the search query to provide information about one or more probable meanings of the words. For example, if the word is Gary, the natural language server 114 may give the possible meaning of the word being the name of a person or a place. If the word is ‘apple’, the natural language server 114 may be used to determine if the word relates to a fruit, an APPLE computer or an APPLE IPOD.
- the natural language server 114 may be used to determine if the word relates to music or a lock. If the word is ‘orange’, the natural language server 114 may be used to determine if the word relates to a county, a fruit, or a name of a restaurant, etc. If the word ‘Gary’ in the search query is meant by the user to be the name of a place, the natural language server 114 may determine the probability that the place is a city or a county, and which state the city or county may be located.
- Web pages may be indexed to a search.
- The locations of news stories may be plotted on a map.
- Geographically relevant advertisements may be placed on a web page.
- Enhanced statistics may be calculated for use in query analysis.
- Search result listings may be presented to the user in accordance with the probabilities. For example, a result that corresponds to a meaning having the highest probability may be listed first.
- the natural language server 114 may also be used with other implementations, such as to present ads for pay-for-placement, cost-per-click, pay-per-call and pay-per-act type services.
- the advertisement server 220 may use the information to send relevant ads to the user. For example, if the word is meant to be the place “Gary, Ind.”, it may not be relevant to send ads related to “Gary's ice cream” in California.
- FIG. 3 is an exemplary screen shot of a web page 300 that may be displayed by a service provider.
- the web page 300 may include a field 310 for a user to enter a search query.
- the field 310 may be divided into one or more fields, such as having a separate field for a user to enter a location.
- Other parts of the web page 300 may include news highlights 320, ads 330, and links 340 to other features provided by the service provider.
- the natural language server 114 may parse the words entered into the field 310 to analyze the words separately and in the context of the other words of the search query, as described in more detail below.
- FIG. 4 is a flowchart illustrating exemplary operations of the natural language server 114 , which are also described in more detail below.
- the natural language server 114 may calculate prior probabilities of the meanings for the words.
- the prior probability may include the likelihood that the word refers to a predetermined meaning, such as a determined person, place or thing, regardless of the context in which the word is used.
- the prior probabilities may be determined from previous analysis of documents in which the words were analyzed to determine the probability that each word pertained to each possible meaning.
- the natural language server 114 may also calculate the context frequency probabilities.
- the context includes the word or words that appear before and/or after the word for which the meaning is being determined.
- the average probability may be determined of each attribute that any of the words in these contexts possessed.
- the average context frequency may be compared to the average prior probabilities to determine if the average is higher or lower than the prior probabilities. If the average for the context frequency is higher, the conditional probability for each attribute (e.g. meaning) is calculated.
- the natural language server 114 may also calculate specific disambiguators for the word.
- the specific disambiguators relate to whether particular words or phrases increase the likelihood of a given meaning of another word or phrase. This may be calculated in a similar way to the immediate context, such as by determining whether the probability of a given outcome is significantly higher or lower than expected in the presence of a particular word or phrase. For example, YAHOO! may imply the place Sunnyvale, Calif., but not another place called Sunnyvale. These probabilities may be considered conditional upon the combination of prior probabilities and immediate context values, as described in more detail below.
- general disambiguators may be calculated. If the specific disambiguators are aggregated across all related attributes (e.g., Sunnyvale is a U.S. town in California), it is determined whether there are any values that are significant across a whole attribute or meaning of the word.
- the prior probabilities (block 400 ), such as a location related probability, are determined for a word.
- the location related probability includes the probability that a word or phrase refers to a location. For example, if nine times out of ten the phrase “Washington” refers to a location then the location related probability for “Washington” is 90%.
- the context frequency (block 410 ) may then be considered.
- the context frequency relates to the ways the word is used in the context of other words surrounding the word. If a word or phrase precedes or follows a potential place name, the conditional probability that the potential place name refers to a location is calculated. For example: the phrase “a resident of Gary” implies that “Gary” refers to a place, but the phrase “George Washington” implies that “Washington” does not refer to a place.
- the natural language server 114 may determine, for example, that “a resident of <?>” had a context frequency value of 80% and that the context “George <?>” had a context frequency value of 0.1%.
- the natural language server 114 may next utilize a combine function to analyze the results of the prior probability and the context frequency.
- the combine function is a mathematical function which can be used in the implementation of natural language disambiguation. This function is described herein with regard to the example, however other functions could be used, such as the “likelihood function” used in Bayes Theorem and other machine learning techniques. Given a prior probability and one or more context frequency values, the combine function may be determined as:—
- the values from small samples may be moderated by adding one to the numerator and two to the denominator:—
- Another way of achieving a similar effect may be to wait for a significant sample amount to appear. Given the richness in language constructs possible, however, this may not always be desirable as the resulting data set (in terms of context frequency and prior probabilities) may be too small to make the actual language recognition work. The amount of data which is needed to be tagged by human or other input to obtain desirable results from the natural language server 114 may be too large to be practical. While small data samples may be avoided because they often produce extreme values that may not be representative of general trends, moderation may allow for a small number of occurrences to have a statistical meaning.
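The small-sample moderation described above (add one to the numerator and two to the denominator) can be sketched as follows. This is a minimal illustration; the function name `moderated_probability` and its parameters are assumptions made for this example and do not appear in the original text.

```python
def moderated_probability(hits: int, total: int) -> float:
    """Moderate an observed proportion hits/total for small samples.

    Adds one to the numerator and two to the denominator, which pulls
    values from small samples toward 0.5 and avoids extreme 0% or 100%.
    (Function and parameter names are hypothetical.)
    """
    if hits < 0 or total < 0 or hits > total:
        raise ValueError("require 0 <= hits <= total")
    return (hits + 1) / (total + 2)


# "Washington" refers to a location nine times out of ten.
raw = 9 / 10
moderated = moderated_probability(9, 10)
print(f"raw={raw:.3f} moderated={moderated:.3f}")

# A single observation no longer yields an extreme value of 100%.
print(f"one of one: {moderated_probability(1, 1):.3f}")

# With no observations at all the value is exactly 0.5.
print(moderated_probability(0, 0))
```

With many observations the moderated value approaches the raw proportion, so the adjustment mainly affects rarely seen names.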
- conditional probability may be calculated that a given context implies a location reference. For example, considering the preceding context “travel to” in the following cases:—
- An average prior probability or expected value for the context “travel to” in these cases would be 68.75%. If another input, such as human input, determines that all the potential place names in the above examples referred to locations except the last case (i.e. “travel to Charles”), then the actual probability for this context may be 75%, e.g., three out of the four names refer to places in the context of “travel to” before the word. Because the actual probability is higher than the expected, it may be concluded that this context implies that the potential place name is more likely to be a location when used in this context. Values may be divided into two sets: “before” and “after” contexts. A number of words, such as four words, may be allowed in either direction (e.g. “now he lives in <?>” is a four word “before” context). Other context terms or symbols may be used, such as ‘in’, ‘near’, ‘around’ and the hyphen symbol.
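The “before” and “after” contexts of up to four words can be enumerated as in the sketch below. It assumes simple whitespace tokenization and writes the placeholder for the word being disambiguated as `<?>`; the function names and the sample sentence are hypothetical.

```python
PLACEHOLDER = "<?>"  # stands for the word whose meaning is being determined


def before_contexts(tokens, index, max_words=4):
    """Return the contexts made of 1..max_words words preceding tokens[index]."""
    contexts = []
    for k in range(1, max_words + 1):
        start = index - k
        if start < 0:
            break  # ran out of words on the left
        contexts.append(" ".join(tokens[start:index] + [PLACEHOLDER]))
    return contexts


def after_contexts(tokens, index, max_words=4):
    """Return the contexts made of 1..max_words words following tokens[index]."""
    contexts = []
    for k in range(1, max_words + 1):
        end = index + 1 + k
        if end > len(tokens):
            break  # ran out of words on the right
        contexts.append(" ".join([PLACEHOLDER] + tokens[index + 1:end]))
    return contexts


tokens = "now he lives in Gary with his family".split()
i = tokens.index("Gary")
for context in before_contexts(tokens, i) + after_contexts(tokens, i):
    print(context)
```

Each generated string can then serve as a key under which a context frequency value is accumulated.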
- the value 0.5 may be referred to as the neutral context frequency as there is no effect when combined with the prior probability. If a generated context frequency value is too close to the neutral context frequency, such as between 0.45 and 0.55, it may be ignored as having too small an effect.
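The combine formula itself is not reproduced in this text. The sketch below therefore assumes a commonly used odds-product form, chosen only because it has the stated property that 0.5 is neutral; it should not be read as the patent's exact definition. The names `combine`, `inverse_combine` and `is_informative` are hypothetical.

```python
def combine(a: float, b: float) -> float:
    """Assumed combine function: multiply the odds of a and b.

    combine(p, 0.5) == p, so 0.5 acts as the neutral context frequency.
    """
    numerator = a * b
    denominator = numerator + (1.0 - a) * (1.0 - b)
    if denominator == 0.0:
        raise ValueError("inputs are contradictory certainties (0 and 1)")
    return numerator / denominator


def inverse_combine(combined: float, a: float) -> float:
    """Recover b such that combine(a, b) == combined, under the same assumption."""
    return combine(combined, 1.0 - a)


def is_informative(context_frequency: float, low=0.45, high=0.55) -> bool:
    """Ignore context frequency values too close to the neutral value 0.5."""
    return not (low <= context_frequency <= high)


prior = 0.90  # "Washington" as a location, from the example above
for label, cf in [("a resident of <?>", 0.80), ("George <?>", 0.001)]:
    if is_informative(cf):
        print(label, round(combine(prior, cf), 4))
```

Under this assumed form, a context value above 0.5 raises the prior and a value below 0.5 lowers it, which matches the qualitative behaviour the text describes.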
- the context frequency values may be moderated for small samples. Because a given context may occur only a very small number of times, it may also be necessary to moderate the context frequency values, such as in a similar way to the prior probability values described above. Because the prior probability values may have already been moderated, the average prior probability may not be an extreme value but the actual probability may be extreme if not moderated.
- the following formula may be used to moderate context values:—
- $$\mathrm{ModerateContext}(c, a, n) = \frac{c \cdot n + a}{n + 1} \qquad \text{(Equation 16)}$$
- the actual probability may be moderated towards 0.5.
- the modified actual probability may be lower than the expected probability in cases where the original actual probability was 100%.
- Equation 16 modifies the actual probability to be more similar to the expected probability and thus may avoid this problem.
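A minimal sketch of Equation 16 follows. The symbol meanings are inferred from the surrounding text and from the later Equation 18 rather than stated explicitly here: `c` is taken to be the actual probability observed for the context, `a` the expected (average prior) probability, and `n` the number of times the context was seen.

```python
def moderate_context(c: float, a: float, n: int) -> float:
    """Equation 16: pull the actual probability c toward the expected value a.

    Assumed meanings: c = actual probability, a = expected probability,
    n = number of occurrences of the context.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return (c * n + a) / (n + 1)


expected = 0.6875  # average prior probability for "travel to" in the example

# Three of four names were locations: actual probability 0.75.
print(moderate_context(0.75, expected, 4))

# A context seen once, and correct once (actual probability 100%).
toward_half = (1 + 1) / (1 + 2)                 # moderating toward 0.5 instead
toward_expected = moderate_context(1.0, expected, 1)
print(toward_half < expected, toward_expected > expected)
```

The last two lines illustrate the problem mentioned above: moderating a single 100% observation toward 0.5 gives a value below the expected probability, whereas moderating toward the expected probability keeps it above.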
- the moderated value may then be used in the inverse combine function instead of the original, actual probability value. For example, an event may be more significant if it contradicts the general expectation than if it merely corroborates it. As an example, “he lives in Florida” is a location reference and “Richard Florida” is not a location reference. Because the word “Florida” is generally considered to refer to a location, more significance may be given to the context (“Richard <?>”) as it contradicts the expectation.
- the context frequency values may also be used to indicate more than this.
- Specific prior probabilities may be determined by examining, for example, an input generated training set (e.g. a set of documents that have been tagged with the precise meaning of each potential place name) and calculating what proportion of the uses of each potential place name refer to a given meaning. The training may be accomplished with human inputs or automatically, such as with a processor.
- FIG. 5 is a flow chart illustrating the generation of a training set.
- one or more lists of names to train may be created. This is a list of words to be disambiguated with each of their possible meanings and the attributes that relate to each meaning. For example, in the phrase “the capital of Denmark”, the context “the capital of <?>” may indicate that this is a country in Europe or a country or state in the U.S. By assigning specific attributes to contexts and to places, these contexts may be applied when the attribute of the place in question matches the attribute of the context it is found in. The context may help to disambiguate between places with the same name but different attributes.
- the sets of attributes that may be used include:
- a. Place Type (Country, State, County, Town, etc.)
- b. ISO Country Code (US, GB, FR, DE, etc.)
- c. Administrative Area Level 1, i.e. State/Province/Region (US/Florida, US/California, CA/Ontario, etc.)
- d. Is a place
- e. Is not a place
- As another example, in the phrase “the French town of Nancy”, the context “the French town of <?>” may indicate that this is likely to be a town in France.
- the context “<?>, Illinois” may indicate this is a place in the state of Illinois.
- the context “<?> Crown Court” may indicate this is a place in England.
- text pieces are collected.
- the text pieces may include publications, such as articles, that may be collected on the Internet or from other sources.
- a search of the text may be conducted for the selected names as they appear in the text.
- the names with highly ambiguous meanings are identified, such as the names with many different meanings. To determine whether a name is highly ambiguous, when training a given place name, for example, the number of different places with that name may be counted. Supposing ten examples of the use of a potential place name were selected and disambiguated, if the same answer was given in all ten examples, then it may be likely that the potential place name is relatively unambiguous. If the answer given to every single example was different, then this name may be considered highly ambiguous.
- an input, such as from a human, may be obtained regarding the name. Questions may be presented to the human about the name, such as whether the name appears to be a place or not. An attribute may be associated with the name in accordance with the input, such as to indicate whether the name is or is not a place, and where the place is located.
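One way to turn the ten-example check into a number is to count the distinct answers among the sampled uses of a name. This is a minimal sketch; the function name and the meaning labels are made up for illustration and are not data from the original text.

```python
def ambiguity_ratio(answers):
    """Fraction of distinct answers among the sampled uses of a name.

    1 / len(answers) means every example received the same answer;
    1.0 means every example received a different answer.
    """
    if not answers:
        raise ValueError("at least one disambiguated example is required")
    return len(set(answers)) / len(answers)


# Hypothetical labels for ten sampled uses of two different names.
same_every_time = ["place_A"] * 10
all_different = [f"place_{k}" for k in range(10)]

print(ambiguity_ratio(same_every_time))
print(ambiguity_ratio(all_different))
```

Names whose ratio is high could then be prioritized for further human input.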
- the results from blocks 500 to 540 may be analyzed to generate statistics. Table 2, for example, illustrates a proportion of place names that refer to a determined meaning.
- the meaning of the word may be identified by a unique number rather than a description, as described in more detail below.
- the calculated statistics may be used to re-analyze each document and/or new documents.
- the context may appear before and/or after the word to be disambiguated.
- FIG. 6 is a flow chart of a process that may be used to decide whether the result from a double-sided context taken as a whole differs significantly from the result of analyzing its two sides separately and combining them.
- the combine function is applied to the “before” and “after” context values.
- the inverse combine function is applied to the double-sided context value and the combined value from the step 600 .
- the resulting value for the double-sided context is stored for later use. Table 3 shows exemplary results:
- For rare words, it may sometimes be useful to allow for wildcard words within a context. For example, when analyzing the phrase “John Byalistock of Washington”, it may be unlikely that a context value will occur for “John Byalistock of”. Rare words such as “Byalistock” may be ignored and context values for the phrase “John <*> of <?>” may be generated, which gives more information than the context “of <?>”.
- the rarity skipping contexts may be generated in different ways such as by taking a first pass of all the text in a training set and performing a word count. When contexts are generated, words that are too rare may be passed over. Once values for all the contexts found within the text are collected, those words that appeared only once may be determined and a set of variants may be created of each one with each word skipped over in turn. Both approaches may be used simultaneously.
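Both approaches to rarity-skipping contexts can be sketched briefly. The wildcard is written as `<*>`; the word counts, the `min_count` threshold and the function names are assumptions for illustration only.

```python
from collections import Counter

WILDCARD = "<*>"  # stands for a skipped (rare) word


def skip_rare_words(context_words, word_counts, min_count=2):
    """First approach: replace words seen fewer than min_count times."""
    return [w if word_counts[w] >= min_count else WILDCARD for w in context_words]


def skip_variants(context_words):
    """Second approach: one variant per word, with that word skipped in turn."""
    variants = []
    for position in range(len(context_words)):
        variant = list(context_words)
        variant[position] = WILDCARD
        variants.append(variant)
    return variants


# Hypothetical counts from a first pass over a training set.
word_counts = Counter({"John": 5, "of": 40, "Byalistock": 1})
context = ["John", "Byalistock", "of"]

print(" ".join(skip_rare_words(context, word_counts) + ["<?>"]))
for variant in skip_variants(context):
    print(" ".join(variant + ["<?>"]))
```

A `Counter` returns zero for unseen words, so words absent from the first pass are also treated as rare.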
- tokens may be used to represent the beginning and the end of a piece of text. Identification of the beginning and end of the text may be useful because documents may often start with the name of the place that the document is about. By using beginning and end tokens, the system may make use of this fact. This may be particularly true if the document is a search engine query.
- the system may calculate specific disambiguators that help to disambiguate between different meanings of specific names.
- the probabilities may be calculated as being dependent on the combination of a prior probability value and the context frequency. A risk is taken in assuming the independence of the statistics being calculated, since there may be too many different values to be able to reasonably calculate the statistical relation between all of them.
- the values may be formed into groups and those groups may be placed in a determined order when calculating the relationship between the statistics in one group and those in the next group.
- the word “Californian” anywhere in a piece of text makes it 83% more likely that a place in California, USA is being referred to; the phrase “Gov. Arnold Schwarzenegger” may make it 78% more likely that a place in California, USA is being referred to and the word “Aussie” may make it 81% more likely that a town in Australia is being referred to.
- a list of attributes may be used for the general disambiguators. Values may be calculated by aggregating together all the specific disambiguators that relate to the places with a given attribute and storing the significant values for later use.
- Non-text disambiguators that are not within the text of a web-page or news article may also be used, such as the source or general category (e.g. Sports news, current affairs etc.). In the context of internet search queries the IP Address location of the searcher, or user registration information, could be used.
- FIG. 7 is a diagram showing a relationship of the processes for determining the probability of the meanings of a word. Probability values in each layer may be calculated after the layer immediately inside of it. To generate statistics for prior probabilities and context frequencies, certain assumptions may be made about statistical independence. This may be necessary if a vast number of variables are involved. In the calculation of context frequencies, the calculations are not conditional on the expected prior probability, but may be used to help further determine the results that most likely match the meaning of the word of the user. Specific and general disambiguators may be considered to help further achieve the desired result of correctly determining the meaning of the word.
- the prior probability may be calculated to determine results that give the best match to the desired result of the searcher.
- the context frequency values may then be calculated that modify the predicted prior probability values to further determine a best match to the actual results.
- the disambiguator values may then be calculated to modify the combined prior probability and context frequency predicted values to give the best match to the desired results.
- a normalization function may be used because, although the prior probabilities of all the distinct meanings of a potential place name sum to one, there is no guarantee that the probabilities will still sum to one once they have been modified using various context frequencies and disambiguators.
- a set of probabilities may be normalized by dividing each member of the set by the sum of the whole set. Supposing the following prior probabilities were determined for the name “Garfield”:—
- the combine function may be used to adjust all the probabilities by the same amount such that they are summed to one. Combining each value with 6%, for example, gives the following values:—
- $$\mathrm{AdjustmentValue}(P) = \frac{1}{1 + 1/R} \qquad \text{(Equation 17)}$$
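The simpler of the two normalizations mentioned above, dividing each member of the set by the sum of the whole set, can be sketched as follows. The meaning labels and values are placeholders invented for this example, not figures from the original text.

```python
def normalize(probabilities):
    """Divide each probability by the sum of the set so the result sums to one."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return {meaning: value / total for meaning, value in probabilities.items()}


# Hypothetical values after context frequencies and disambiguators were applied;
# they no longer sum to one.
adjusted = {"meaning_a": 0.30, "meaning_b": 0.15, "meaning_c": 0.05}
normalized = normalize(adjusted)

for meaning, value in normalized.items():
    print(meaning, round(value, 3))
```

Division preserves the ratios between meanings, whereas the combine-based adjustment described above shifts every value by a common adjustment value instead.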
- Dummy entries may also be used to calculate prior probabilities.
- When calculated, the prior probabilities may be moderated, such as in the way location related probability values were moderated. Moderation may be accomplished by creating a dummy entry for each possible meaning of a word to start with a uniform distribution of probabilities across all possible meanings of the word. The distributions may then diverge from being uniform as more information becomes available.
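The dummy-entry idea can be sketched as a count table that starts with one entry per possible meaning. The function name and the meaning labels are hypothetical, and a weight of one per dummy entry is an assumption.

```python
def prior_probabilities(meanings, observed):
    """Estimate prior probabilities, seeded with one dummy entry per meaning.

    With no observations the distribution is uniform; it diverges from
    uniform as tagged examples accumulate.
    """
    counts = {meaning: 1 for meaning in meanings}  # one dummy entry each
    for meaning in observed:
        if meaning not in counts:
            raise KeyError(f"unknown meaning: {meaning}")
        counts[meaning] += 1
    total = sum(counts.values())
    return {meaning: count / total for meaning, count in counts.items()}


meanings = ["city", "county", "person"]  # hypothetical meanings of one name

print(prior_probabilities(meanings, []))
print(prior_probabilities(meanings, ["city", "city", "city", "person"]))
```

Because every meaning keeps its dummy entry, no meaning is ever assigned a probability of exactly zero.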
- context inertia may be used, such as the partial inheritance of values by longer contexts from their shorter relatives.
- Longer contexts are rarer than their shorter relatives.
- the context “lived in <?>” may be rarer than “in <?>”.
- Longer contexts often have similar meanings to their shorter relatives, but this is not always the case.
- “President of <?>” may seem likely to imply a reference to a country, and “Vice President of <?>” might also, but “Senior Vice President of <?>” may be more likely to refer to a role in a company than in a country.
- $$\mathrm{ModerateLongerContext}(c, a, n, i) = \frac{c \cdot n + a \cdot i}{n + i} \qquad \text{(Equation 18)}$$
- c is the longer context probability
- a is the shorter context probability
- n is the number of times the longer context was found
- i is the “context inertia” value.
- the context inertia value may be a determined constant, such as 4, but the value is implementation dependent and may be adjusted to make the system more or less sensitive to distinct meanings for longer contexts.
- the probability value for the longer context may be stored for future reference if it is sufficiently different from that of its shorter relative. The difference may be determined using the inverse combine function. It should be noted that double-sided contexts may have two immediate shorter relatives. The two values may be averaged to produce a single value that can then be used.
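Equation 18 and the averaging of the two shorter relatives of a double-sided context can be sketched as below. The symbols follow the definitions given above; the numeric inputs are invented for illustration, and the default inertia of 4 is the example constant mentioned in the text.

```python
def moderate_longer_context(c: float, a: float, n: int, i: float = 4) -> float:
    """Equation 18: blend the longer-context probability c with the
    shorter-context probability a, weighting a by the context inertia i."""
    if n < 0 or i < 0 or n + i == 0:
        raise ValueError("require n >= 0, i >= 0 and n + i > 0")
    return (c * n + a * i) / (n + i)


def shorter_relative_for_double_sided(before_value: float, after_value: float) -> float:
    """A double-sided context has two shorter relatives; average their values."""
    return (before_value + after_value) / 2


# Hypothetical values: the longer context looks very different from the shorter one.
longer, shorter = 0.2, 0.8
print(moderate_longer_context(longer, shorter, n=1))   # seen once: stays near 0.8
print(moderate_longer_context(longer, shorter, n=20))  # seen often: moves toward 0.2

a = shorter_relative_for_double_sided(0.9, 0.7)
print(moderate_longer_context(0.5, a, n=4))
```

A larger inertia value makes the system slower to accept a distinct meaning for the longer context, which matches the sensitivity trade-off described above.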
- Contexts that occur rarely may be used or purged from the system as having only limited value.
- the inclusion of rare contexts may be detrimental as they use up space in memory while possibly offering relatively little benefit in terms of disambiguation. How often a context needs to appear before it becomes common enough may be decided on a performance versus accuracy basis, and this may become an operational consideration to optimize the memory space and execution time needed to achieve the task of analyzing documents.
- FIG. 8 is a block diagram illustrating an exemplary attribute hierarchy.
- a feature correlation threshold may be set. For example, when generating conditional probabilities for each feature, all features may be excluded whose occurrence has a statistical correlation of less than a determined percentage, such as 5%, with any particular meaning of a potential place name. As such, common words like “the”, “a” and “of” may be eliminated to reduce the amount of time and memory that may be needed to calculate a full set of conditional probabilities.
- training-set statistics may be generated.
- risk assessment and statistical example hunting may be used.
- In the place name example, if each potential place name were considered separately, the number of examples needed before enough information is gathered may be determined.
- One factor to consider may be the volume of uses typically found in text. For example:—
- FIG. 9 is a flowchart illustrating an exemplary implementation of risk assessment and statistical example hunting.
- a document classifier may be created that has no context frequency values, or specific and general disambiguators yet generated.
- the data may include the potential place names with a uniform distribution of probabilities across all possible meanings, such as based on the dummy entries for each one.
- the document classifier may be applied across a large number of documents, such as a million or two documents to start with.
- a determined value such as 0.25
- a summary may be created, such as grouped by potential place name, of the uncertainty values with the following fields:—
- the list may be sorted, for example, by decreasing total uncertainty. This list may be referred to as the “candidate list”.
- the candidate list may be sent to an application, such as a web application.
- the application may present each document from the candidate list to a user with the potential place name highlighted.
- An optional list of possible meanings (including the extra options of: “Ambiguous Place”, “Other place” [i.e. a place that is not currently present in the data], “Don't know”) may also be passed.
- the user makes a best guess as to the meaning of the name in the context of the document and submits a response.
- the response may be stored in a file.
- the file may contain, for example, the following columns:—
- the candidate list may be large, such as one entry for every potential place name that appears in any of the documents. It may not be essential that the user gives a response to every single case in the list. But once enough responses have been gathered from which to calculate the first set of location related probabilities, context frequencies and specific and general disambiguators, a second run of the document classifier may be started. Because the system may now have some knowledge on which to base its classification, at block 930 , the system may begin to generate different uncertainty values for the cases that are similar to the ones that the user has already disambiguated. For the sake of efficiency, potential place name/Document ID combinations that have already been disambiguated by the user may be excluded from the generation of the candidate list.
- the user may keep giving their response to the previous list while the classifier is running for the second time. Once the classifier has finished and a second candidate list has been produced the user can start responding to this new list and a third run can start, and so on. After a few iterations the system may have gained enough knowledge that it will stop presenting the user with the more obvious cases and start presenting more complex examples. In this way the system may leverage the knowledge gained to date in such a way that much of the tedium of manually disambiguating thousands of extremely similar cases is eliminated.
- Word and phrase abstraction may be applied to context frequency values and disambiguators. Instead of allowing for rarity tokens and beginning and end of document tokens, a large array of abstract tokens may be developed that represent different possibilities.
- “Tompkins” has no immediate context that strongly implies that it is a county. It may be determined that the word “counties” after “Chemung” strongly implies that this is a county.
- a context may be created that represents the idea that if “<Y>” is a county in the phrase “<X> and <Y>” then “<X>” is more likely to be a county.
- a context may also be created that represents the idea that if “<Y>” is a county in the phrase “<X>, <Y>” then this strongly implies that “<X>” is a county. Then the whole list may be traversed and it may be deduced that the names are all references to counties.
- the system may also use atomized statistical layering. While the location related probability may generally be a more significant value than the other values such as the context, the layers may overlap. For example, if the phrase “of New York” is encountered at random in text then it may be assumed that “New York” is a location reference because it is a known phrase that has a high location related probability. The context “of <?>” seems consistent with this assumption but of secondary importance. On the other hand, if the phrase “John Rochester” is encountered in a piece of text it may be assumed that “Rochester” is not a location reference primarily based on the context “John <?>”. The following example shows how the distinction may be important. For example, a training set of documents contains the following cases:
- a first pass of the training set may be conducted and statistically independent values may be calculated for all possible disambiguators.
- a second pass may then be accomplished and the strongest disambiguator may be selected in each case, i.e. the value furthest from the neutral value of 0.5. The value may be accumulated for the remaining disambiguators relative to the first, or even a sequence of disambiguators, as dependent on the next strongest.
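The selection step of the second pass may be sketched as follows. The sketch only illustrates ranking disambiguators by their distance from the neutral value of 0.5; the accumulation of the remaining values relative to the strongest one is not shown. The function name and dictionary layout are assumptions, and the two values are borrowed from Table 1 ("New York") and Table 3 ("of <?>") purely for illustration.

```python
NEUTRAL = 0.5  # a value of 0.5 carries no evidence either way

def rank_disambiguators(values):
    """Order disambiguator names from strongest to weakest.

    Strength is taken to be the distance of the value from NEUTRAL.
    """
    return sorted(values, key=lambda name: abs(values[name] - NEUTRAL), reverse=True)

# Hypothetical case for the phrase "of New York".
case = {
    "LRP of 'New York'": 0.94,
    "context 'of <?>'": 0.64,
}
ranked = rank_disambiguators(case)
strongest = ranked[0]
```

Here the location related probability is selected as the strongest disambiguator, and the context would then be treated as dependent on it.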
- the system may identify missing place names from the documents in an automated way. Once a reasonably strong set of context values is developed, the set may be used to search for plausible place names, e.g., starting with a capital letter, and the system may evaluate the probability of them actually being place names from the strength of the contexts that they are found in. In this way, it is also possible to extrapolate location related probability values for place names that have not been researched, if the average probability of the context that they are found in is considered.
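One way this search might look is sketched below, under several simplifying assumptions: a context is reduced to the single preceding word, a plausible place name is any capitalized token that is not already known, and the extrapolated value is the plain average of the context values found. The function name and all context values are hypothetical.

```python
import re
from collections import defaultdict

def extrapolate_lrp(text, context_values, known_names):
    """Estimate a location related probability for unknown capitalized words.

    context_values maps a context such as "nearby <?>" to its probability.
    Returns {word: average probability of the known contexts it appeared in}.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    seen = defaultdict(list)
    for i in range(1, len(tokens)):
        word = tokens[i]
        if not word[0].isupper() or word in known_names:
            continue
        context = f"{tokens[i - 1].lower()} <?>"
        if context in context_values:
            seen[word].append(context_values[context])
    return {word: sum(vals) / len(vals) for word, vals in seen.items()}

# Hypothetical context values and text.
contexts = {"nearby <?>": 0.9, "john <?>": 0.01}
text = "They travelled in nearby Hincesti before John Bertacchi arrived."
estimates = extrapolate_lrp(text, contexts, known_names={"London"})
```

In this toy run "Hincesti" inherits the high value of the "nearby <?>" context, while "Bertacchi" inherits the low value of "john <?>".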
- FIG. 10 is an illustrative embodiment of a general computer system 1000 , such as the computer systems used for the natural language server 114 and other components of the environment 100 .
- the computer system 1000 can include a set of instructions that can be executed to cause the computer system 1000 to perform any one or more of the computer based functions disclosed herein.
- the computer system 1000 may operate as a standalone device or may be connected, e.g., using a network, to other computer systems or peripheral devices.
- Suitable operating systems include any of the MICROSOFT WINDOWS suite including XP, NT and DOS. Other operating systems may be used such as UNIX or LINUX, and the program may be invoked from another program such as an Application Program Interface (API).
- alternative software implementations including, but not limited to, distributed processing, component/object distributed processing, parallel processing, or virtual machine processing can also be constructed to implement the tools described herein.
- the computer system may operate in the capacity of a server or as a client user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment.
- the computer system 1000 can also be implemented as or incorporated into various devices, such as a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile device, a palmtop computer, a laptop computer, a desktop computer, a communications device, a wireless telephone, a land-line telephone, a control system, a camera, a scanner, a facsimile machine, a printer, a pager, a personal trusted device, a web appliance, a television, a network router, switch or bridge, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the computer system 1000 can be implemented using electronic devices that provide voice, video or data communication. Further, while a single computer system 1000 is illustrated, the term “system” shall also be taken to include any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
- a router may act as a link between LANs, enabling messages to be sent from one to another.
- Communication links within LANs typically include twisted wire pair or coaxial cable.
- Communication links between networks may generally use analog telephone lines, full or fractional dedicated digital lines including T1, T2, T3, and T4, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communication links.
- Remote computers and other network-enabled electronic devices may be remotely connected to LANs or WANs by way of a modem and temporary telephone link.
- the computer system 1000 may include a processor 1002 , e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both. Moreover, the computer system 1000 can include a main memory 1004 and a static memory 1006 that can communicate with each other via a bus 1008 .
- the computer system 1000 may further include a video display unit 1010 , such as a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, or a cathode ray tube (CRT). Additionally, the computer system 1000 may include an input device 1012 , such as a keyboard, and a cursor control device 1014 , such as a mouse.
- the computer system 1000 can also include a disk drive unit 1016 , a signal generation device 1018 , such as a speaker or remote control, and a network interface device 1020 .
- the disk drive unit 1016 may include a computer-readable medium 1022 in which one or more sets of instructions 1024 , e.g. software, can be embedded. Further, the instructions 1024 may embody one or more of the methods or logic as described herein. The instructions 1024 may reside completely, or at least partially, within the main memory 1004 , the static memory 1006 , and/or within the processor 1002 during execution by the computer system 1000 . The main memory 1004 and the processor 1002 also may include computer-readable media.
- Dedicated hardware implementations such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the methods described herein.
- Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems.
- One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Accordingly, the present system encompasses software, firmware, and hardware implementations.
- the methods described herein may be implemented by software programs executable by a computer system.
- implementations can include distributed processing, component/object distributed processing, and parallel processing.
- virtual computer system processing can be constructed to implement one or more of the methods or functionality as described herein.
- a computer-readable medium includes instructions 1024 or receives and executes instructions 1024 responsive to a propagated signal, so that a device connected to a network 1026 can communicate voice, video or data over the network 1026 . Further, the instructions 1024 may be transmitted or received over the network 1026 via the network interface device 1020 .
- While the computer-readable medium is shown to be a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions.
- the term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein.
- the computer-readable medium may include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium may be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium may include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. A digital file attachment to an e-mail or other self-contained information archive or set of archives may be considered a distribution medium that is equivalent to a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a computer-readable medium or a distribution medium and other equivalents and successor media, in which data or instructions may be stored.
- the phrases “at least one of <A>, <B>, . . . and <N>” or “at least one of <A>, <B>, <N>, or combinations thereof” are defined by the Applicant in the broadest sense, superseding any other implied definitions herebefore or hereinafter unless expressly asserted to the contrary, to mean one or more elements selected from the group comprising A, B, . . . and N, that is to say, any combination of one or more of the elements A, B, . . . or N including any one element alone or in combination with one or more of the other elements which may also include, in combination, additional elements not listed.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Description
Cmb(a,b,c)=Cmb(a,Cmb(b,c))=Cmb(Cmb(a,b),c) (Equation 2)
Cmb(a,b)=Cmb(b,a) (Equation 3)
Cmb(1,x)=1 (Equation 5)
Cmb(0,x)=0 (Equation 6)
1/1=1=100%. (Equation 7)
-
- “travel to New York”
- “travel to London”
- “travel to Austin”
- “travel to Charles”
TABLE 1
Name | LRP Value |
---|---|
New York | 94% |
London | 93% |
Austin | 71% |
Charles | 17% |
Cmb(a,0.5)=a (Equation 15)
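The definition of the combine function is not reproduced alongside the properties listed here. As an illustration only, one closed form that satisfies Equations 2, 3, 5, 6 and 15 is the product-of-odds combination below; treating it as the form of Cmb is an assumption.

```latex
% Assumed candidate form of the combine function (not stated in this section)
\[
  \mathrm{Cmb}(a,b) \;=\; \frac{ab}{ab + (1-a)(1-b)}
\]
% Equivalent statement in odds, which makes associativity (Equation 2)
% and commutativity (Equation 3) immediate:
\[
  \frac{\mathrm{Cmb}(a,b)}{1-\mathrm{Cmb}(a,b)}
  \;=\; \frac{a}{1-a}\cdot\frac{b}{1-b}
\]
% Checks against the remaining properties:
\[
  \mathrm{Cmb}(1,x) = \frac{x}{x+0} = 1 \quad (x \neq 0), \qquad
  \mathrm{Cmb}(0,x) = \frac{0}{0+(1-x)} = 0 \quad (x \neq 1),
\]
\[
  \mathrm{Cmb}(a,0.5) = \frac{0.5\,a}{0.5\,a + 0.5\,(1-a)} = a .
\]
```

Under this form the value 0.5 acts as a neutral element, and the combination is undefined only when one argument is 1 and the other is 0.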
TABLE 2
Name | Meaning | Probability |
---|---|---|
“Springfield” | Springfield, Illinois | 46% |
“Springfield” | Springfield, Missouri | 36% |
“Springfield” | Springfield, Massachusetts | 11% |
“Springfield” | Not a place | 7% |
TABLE 3
Context | Value |
---|---|
“of <?>” | 64% |
“<?> died” | 70% |
“of <?> died” | 99% |
TABLE 4
Text | Attribute | Probability |
---|---|---|
“Californian” | State = US/California | 83% |
“Gov. Arnold Schwarzenegger” | State = US/California | 78% |
“Aussie” | Type = AU/Town | 81% |
TABLE 5
Meaning | Probability |
---|---|
Town in New York | 25% |
Town in Alberta | 25% |
County in Utah | 25% |
Not a place | 25% |
TABLE 6
Meaning | Probability |
---|---|
Town in New York | 50% |
Town in Alberta | 50% |
County in Utah | 99% |
Not a place | 25% |
TABLE 7
Meaning | Probability |
---|---|
Town in New York | 22% |
Town in Alberta | 22% |
County in Utah | 44% |
Not a place | 11% |
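The values in Table 7 appear consistent with dividing each value in Table 6 by the sum of all Table 6 values, so that the meanings again share a total of roughly 100%. The short sketch below reproduces that arithmetic; the function name is illustrative, and the reading of Table 7 as a normalization of Table 6 is an inference from the numbers.

```python
def normalize(values):
    """Scale the values so that they sum to 1."""
    total = sum(values.values())
    return {meaning: v / total for meaning, v in values.items()}

# Values copied from Table 6.
table6 = {
    "Town in New York": 0.50,
    "Town in Alberta": 0.50,
    "County in Utah": 0.99,
    "Not a place": 0.25,
}
table7 = normalize(table6)

# Rounded to whole percentages for comparison with Table 7.
percent = {meaning: round(100 * v) for meaning, v in table7.items()}
```

Rounding each entry separately gives 22%, 22%, 44% and 11%, which add up to 99% rather than 100% because of rounding.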
TABLE 8
Meaning | Probability |
---|---|
Town in New York | 6% |
Town in Alberta | 6% |
County in Utah | 86% |
Not a place | 2% |
Total | 100% |
TABLE 9
Name | No. of results from Yahoo! Search | Priority |
---|---|---|
New York | 1,000,000,000 | High |
London | 560,000,000 | High |
John | 1,000,000,000 | High |
Jaszarokszallas | 16,000 | Low |
Hincesti | 33,000 | Low |
Bertacchi | 64,000 | Low |
U=1−max(p_1, p_2, . . . , p_n) (Equation 19)
U=1−1/n (Equation 20)
-
- Potential place name
- Total uncertainty for this name
- Max uncertainty for this name
- Document ID of document with max uncertainty
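The uncertainty measure of Equation 19 and the summary fields listed above may be sketched as follows. The record layout, the function names, and the sample probabilities for documents other than the "Springfield" distribution of Table 2 are assumptions made for illustration; the optional `answered` argument models the exclusion of name and document combinations that a user has already disambiguated.

```python
def uncertainty(probs):
    """Equation 19: one minus the largest probability among the meanings."""
    return 1.0 - max(probs)

def candidate_list(classified, answered=frozenset()):
    """Summarise uncertainty per potential place name, most uncertain first.

    classified: iterable of (place_name, document_id, probabilities).
    answered:   set of (place_name, document_id) pairs to skip.
    """
    summary = {}
    for name, doc_id, probs in classified:
        if (name, doc_id) in answered:
            continue
        u = uncertainty(probs)
        row = summary.setdefault(
            name, {"name": name, "total": 0.0, "max": -1.0, "max_doc": None}
        )
        row["total"] += u
        if u > row["max"]:
            row["max"], row["max_doc"] = u, doc_id
    # Sort by decreasing total uncertainty.
    return sorted(summary.values(), key=lambda r: r["total"], reverse=True)

runs = [
    ("Springfield", "doc1", [0.46, 0.36, 0.11, 0.07]),  # distribution from Table 2
    ("Springfield", "doc2", [0.25, 0.25, 0.25, 0.25]),  # uniform: U = 1 - 1/n
    ("London", "doc3", [0.93, 0.07]),
]
candidates = candidate_list(runs)
second_run = candidate_list(runs, answered={("Springfield", "doc2")})
```

For a uniform distribution over n meanings the function returns 1 − 1/n, in agreement with Equation 20.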
-
- Document ID
- Potential place name
- Selected meaning (i.e. unique place ID or “Not a place” etc.)
- User name
TABLE 10
Phrase | Potential Place Name | No. of Occurrences | Location Reference |
---|---|---|---|
“David Beckham” | Beckham | 1000 | No |
“David Bunbury” | Bunbury | 100 | No |
“David Overton” | Overton | 100 | No |
“in nearby Bunbury” | Bunbury | 1 | Yes |
TABLE 11
Potential Place Name | LRP |
---|---|
Beckham | 1/1002 = 0.001 |
Bunbury | 2/103 = 0.02 |
Overton | 1/102 = 0.01 |
TABLE 12
Context | Context Frequency Value |
---|---|
“David <?>” | 1/1202 = 0.0008 |
“in nearby <?>” | 2/3 = 0.667 |
Actual location related probability=1/101=0.0099 (Equation 22)
CmbInv(0.0099,0.0074)=0.572=57.2% (Equation 24)
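The fractions in Tables 11 and 12 appear to follow the pattern (location references + 1) / (occurrences + 2), which would correspond to adding one dummy location entry and one dummy non-location entry to each count. The sketch below reproduces those fractions from the counts in Table 10 and evaluates Equation 24. The closed forms of `cmb` and `cmb_inv` are assumptions (a product-of-odds combination and its inverse); they are not defined in this section, and all function names are illustrative.

```python
def smoothed(location_refs, occurrences):
    """Frequency with one dummy location hit and one dummy non-location hit."""
    return (location_refs + 1) / (occurrences + 2)

def cmb(a, b):
    """Assumed combine function; undefined when one input is 1 and the other 0."""
    return a * b / (a * b + (1 - a) * (1 - b))

def cmb_inv(c, b):
    """Assumed inverse combine: the value a for which cmb(a, b) == c."""
    return c * (1 - b) / (c * (1 - b) + (1 - c) * b)

# Counts taken from Table 10.
lrp = {
    "Beckham": smoothed(0, 1000),  # 1/1002
    "Bunbury": smoothed(1, 101),   # 2/103
    "Overton": smoothed(0, 100),   # 1/102
}
context = {
    "David <?>": smoothed(0, 1200),   # 1/1202
    "in nearby <?>": smoothed(1, 1),  # 2/3
}

# Equation 22: unsmoothed share of "Bunbury" occurrences that are locations.
actual_bunbury = 1 / 101

# Equation 24, using the rounded inputs as printed.
residual = cmb_inv(0.0099, 0.0074)
```

With these assumed forms the residual evaluates to roughly 0.573, close to the 0.572 given in Equation 24, and combining the residual with 0.0074 returns 0.0099.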
Claims (31)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/314,956 US7681147B2 (en) | 2005-12-13 | 2005-12-13 | System for determining probable meanings of inputted words |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/314,956 US7681147B2 (en) | 2005-12-13 | 2005-12-13 | System for determining probable meanings of inputted words |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070136689A1 US20070136689A1 (en) | 2007-06-14 |
US7681147B2 true US7681147B2 (en) | 2010-03-16 |
Family
ID=38140943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/314,956 Expired - Fee Related US7681147B2 (en) | 2005-12-13 | 2005-12-13 | System for determining probable meanings of inputted words |
Country Status (1)
Country | Link |
---|---|
US (1) | US7681147B2 (en) |
Cited By (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080117202A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US20080117201A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US20080120308A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US20080126961A1 (en) * | 2006-11-06 | 2008-05-29 | Yahoo! Inc. | Context server for associating information based on context |
US20080162686A1 (en) * | 2006-12-28 | 2008-07-03 | Yahoo! Inc. | Methods and systems for pre-caching information on a mobile computing device |
US20090150501A1 (en) * | 2007-12-10 | 2009-06-11 | Marc Eliot Davis | System and method for conditional delivery of messages |
US20090150514A1 (en) * | 2007-12-10 | 2009-06-11 | Yahoo! Inc. | System and method for contextual addressing of communications on a network |
US20090165022A1 (en) * | 2007-12-19 | 2009-06-25 | Mark Hunter Madsen | System and method for scheduling electronic events |
US20090177644A1 (en) * | 2008-01-04 | 2009-07-09 | Ronald Martinez | Systems and methods of mapping attention |
US20090176509A1 (en) * | 2008-01-04 | 2009-07-09 | Davis Marc E | Interest mapping system |
US20090177484A1 (en) * | 2008-01-06 | 2009-07-09 | Marc Eliot Davis | System and method for message clustering |
US20090182631A1 (en) * | 2008-01-16 | 2009-07-16 | Yahoo! Inc. | System and method for word-of-mouth advertising |
US20090198488A1 (en) * | 2008-02-05 | 2009-08-06 | Eric Arno Vigen | System and method for analyzing communications using multi-placement hierarchical structures |
US20090222304A1 (en) * | 2008-03-03 | 2009-09-03 | Yahoo! Inc. | Method and Apparatus for Social Network Marketing with Advocate Referral |
US20090248738A1 (en) * | 2008-03-31 | 2009-10-01 | Ronald Martinez | System and method for modeling relationships between entities |
US20090328087A1 (en) * | 2008-06-27 | 2009-12-31 | Yahoo! Inc. | System and method for location based media delivery |
US20090326800A1 (en) * | 2008-06-27 | 2009-12-31 | Yahoo! Inc. | System and method for determination and display of personalized distance |
US20100030870A1 (en) * | 2008-07-29 | 2010-02-04 | Yahoo! Inc. | Region and duration uniform resource identifiers (uri) for media objects |
US20100027527A1 (en) * | 2008-07-30 | 2010-02-04 | Yahoo! Inc. | System and method for improved mapping and routing |
US20100049702A1 (en) * | 2008-08-21 | 2010-02-25 | Yahoo! Inc. | System and method for context enhanced messaging |
US20100063993A1 (en) * | 2008-09-08 | 2010-03-11 | Yahoo! Inc. | System and method for socially aware identity manager |
US20100077017A1 (en) * | 2008-09-19 | 2010-03-25 | Yahoo! Inc. | System and method for distributing media related to a location |
US20100083169A1 (en) * | 2008-09-30 | 2010-04-01 | Athellina Athsani | System and method for context enhanced mapping within a user interface |
US20100082688A1 (en) * | 2008-09-30 | 2010-04-01 | Yahoo! Inc. | System and method for reporting and analysis of media consumption data |
US20100125604A1 (en) * | 2008-11-18 | 2010-05-20 | Yahoo, Inc. | System and method for url based query for retrieving data related to a context |
US20100161600A1 (en) * | 2008-12-19 | 2010-06-24 | Yahoo! Inc. | System and method for automated service recommendations |
US20100228582A1 (en) * | 2009-03-06 | 2010-09-09 | Yahoo! Inc. | System and method for contextual advertising based on status messages |
US20100280879A1 (en) * | 2009-05-01 | 2010-11-04 | Yahoo! Inc. | Gift incentive engine |
US20110035265A1 (en) * | 2009-08-06 | 2011-02-10 | Yahoo! Inc. | System and method for verified monetization of commercial campaigns |
US8024317B2 (en) | 2008-11-18 | 2011-09-20 | Yahoo! Inc. | System and method for deriving income from URL based context queries |
US8055675B2 (en) | 2008-12-05 | 2011-11-08 | Yahoo! Inc. | System and method for context based query augmentation |
US8060492B2 (en) | 2008-11-18 | 2011-11-15 | Yahoo! Inc. | System and method for generation of URL based context queries |
US8069142B2 (en) | 2007-12-06 | 2011-11-29 | Yahoo! Inc. | System and method for synchronizing data on a network |
US8150967B2 (en) | 2009-03-24 | 2012-04-03 | Yahoo! Inc. | System and method for verified presence tracking |
US8166168B2 (en) | 2007-12-17 | 2012-04-24 | Yahoo! Inc. | System and method for disambiguating non-unique identifiers using information obtained from disparate communication channels |
US8364611B2 (en) | 2009-08-13 | 2013-01-29 | Yahoo! Inc. | System and method for precaching information on a mobile device |
US8452855B2 (en) | 2008-06-27 | 2013-05-28 | Yahoo! Inc. | System and method for presentation of media related to a context |
US8554623B2 (en) | 2008-03-03 | 2013-10-08 | Yahoo! Inc. | Method and apparatus for social network marketing with consumer referral |
US8560390B2 (en) | 2008-03-03 | 2013-10-15 | Yahoo! Inc. | Method and apparatus for social network marketing with brand referral |
US8583668B2 (en) | 2008-07-30 | 2013-11-12 | Yahoo! Inc. | System and method for context enhanced mapping |
US8589486B2 (en) | 2008-03-28 | 2013-11-19 | Yahoo! Inc. | System and method for addressing communications |
US8745133B2 (en) | 2008-03-28 | 2014-06-03 | Yahoo! Inc. | System and method for optimizing the storage of data |
WO2014104943A1 (en) * | 2012-12-27 | 2014-07-03 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US8892495B2 (en) | 1991-12-23 | 2014-11-18 | Blanding Hovenweep, Llc | Adaptive pattern recognition based controller apparatus and method and human-interface therefore |
US8914342B2 (en) | 2009-08-12 | 2014-12-16 | Yahoo! Inc. | Personal data platform |
US20150331852A1 (en) * | 2012-12-27 | 2015-11-19 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US9224172B2 (en) | 2008-12-02 | 2015-12-29 | Yahoo! Inc. | Customizable content for distribution in social networks |
US9507778B2 (en) | 2006-05-19 | 2016-11-29 | Yahoo! Inc. | Summarization of media object collections |
US9535563B2 (en) | 1999-02-01 | 2017-01-03 | Blanding Hovenweep, Llc | Internet appliance system and method |
US9805123B2 (en) | 2008-11-18 | 2017-10-31 | Excalibur Ip, Llc | System and method for data privacy in URL based context queries |
US20230025964A1 (en) * | 2021-05-17 | 2023-01-26 | Verantos, Inc. | System and method for term disambiguation |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100920442B1 (en) * | 2006-09-21 | 2009-10-08 | 삼성전자주식회사 | How to Retrieve Information from Your Mobile Device |
WO2008148012A1 (en) * | 2007-05-25 | 2008-12-04 | Veveo, Inc. | System and method for text disambiguation and context designation in incremental search |
US20090313101A1 (en) * | 2008-06-13 | 2009-12-17 | Microsoft Corporation | Processing receipt received in set of communications |
US8788350B2 (en) * | 2008-06-13 | 2014-07-22 | Microsoft Corporation | Handling payment receipts with a receipt store |
WO2010061507A1 (en) * | 2008-11-28 | 2010-06-03 | 日本電気株式会社 | Language model creation device |
US8700630B2 (en) * | 2009-02-24 | 2014-04-15 | Yahoo! Inc. | Algorithmically generated topic pages with interactive advertisements |
US9269353B1 (en) * | 2011-12-07 | 2016-02-23 | Manu Rehani | Methods and systems for measuring semantics in communications |
US20130151997A1 (en) * | 2011-12-07 | 2013-06-13 | Globant, Llc | Method and system for interacting with a web site |
US9582490B2 (en) | 2013-07-12 | 2017-02-28 | Microsoft Technolog Licensing, LLC | Active labeling for computer-human interactive learning |
JP2016162163A (en) * | 2015-03-02 | 2016-09-05 | 富士ゼロックス株式会社 | Information processor and information processing program |
JP6495124B2 (en) * | 2015-07-09 | 2019-04-03 | 日本電信電話株式会社 | Term semantic code determination device, term semantic code determination model learning device, method, and program |
CN108875810B (en) * | 2018-06-01 | 2020-04-28 | 阿里巴巴集团控股有限公司 | Method and device for sampling negative examples from word frequency table aiming at training corpus |
US11237713B2 (en) * | 2019-01-21 | 2022-02-01 | International Business Machines Corporation | Graphical user interface based feature extraction application for machine learning and cognitive models |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5873056A (en) * | 1993-10-12 | 1999-02-16 | The Syracuse University | Natural language processing system for semantic vector representation which accounts for lexical ambiguity |
US20040117173A1 (en) * | 2002-12-18 | 2004-06-17 | Ford Daniel Alexander | Graphical feedback for semantic interpretation of text and images |
US20050080613A1 (en) * | 2003-08-21 | 2005-04-14 | Matthew Colledge | System and method for processing text utilizing a suite of disambiguation techniques |
US20060294067A1 (en) * | 2005-06-23 | 2006-12-28 | Halcrow Michael A | Dynamic language checking |
US7162468B2 (en) * | 1998-07-31 | 2007-01-09 | Schwartz Richard M | Information retrieval system |
US20070112511A1 (en) * | 2005-11-17 | 2007-05-17 | Digital Cyclone, Inc. | Mobile geo-temporal information manager |
US7269546B2 (en) * | 2001-05-09 | 2007-09-11 | International Business Machines Corporation | System and method of finding documents related to other documents and of finding related words in response to a query to refine a search |
US7302426B2 (en) * | 2004-06-29 | 2007-11-27 | Xerox Corporation | Expanding a partially-correct list of category elements using an indexed document collection |
Cited By (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8892495B2 (en) | 1991-12-23 | 2014-11-18 | Blanding Hovenweep, Llc | Adaptive pattern recognition based controller apparatus and method and human-interface therefore |
US9535563B2 (en) | 1999-02-01 | 2017-01-03 | Blanding Hovenweep, Llc | Internet appliance system and method |
US9507778B2 (en) | 2006-05-19 | 2016-11-29 | Yahoo! Inc. | Summarization of media object collections |
US20080126961A1 (en) * | 2006-11-06 | 2008-05-29 | Yahoo! Inc. | Context server for associating information based on context |
US8594702B2 (en) | 2006-11-06 | 2013-11-26 | Yahoo! Inc. | Context server for associating information based on context |
US8402356B2 (en) | 2006-11-22 | 2013-03-19 | Yahoo! Inc. | Methods, systems and apparatus for delivery of media |
US20080117201A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US20080120308A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US20090024452A1 (en) * | 2006-11-22 | 2009-01-22 | Ronald Martinez | Methods, systems and apparatus for delivery of media |
US20080117202A1 (en) * | 2006-11-22 | 2008-05-22 | Ronald Martinez | Methods, Systems and Apparatus for Delivery of Media |
US9110903B2 (en) | 2006-11-22 | 2015-08-18 | Yahoo! Inc. | Method, system and apparatus for using user profile electronic device data in media delivery |
US8769099B2 (en) | 2006-12-28 | 2014-07-01 | Yahoo! Inc. | Methods and systems for pre-caching information on a mobile computing device |
US20080162686A1 (en) * | 2006-12-28 | 2008-07-03 | Yahoo! Inc. | Methods and systems for pre-caching information on a mobile computing device |
US8069142B2 (en) | 2007-12-06 | 2011-11-29 | Yahoo! Inc. | System and method for synchronizing data on a network |
US8307029B2 (en) | 2007-12-10 | 2012-11-06 | Yahoo! Inc. | System and method for conditional delivery of messages |
US8671154B2 (en) | 2007-12-10 | 2014-03-11 | Yahoo! Inc. | System and method for contextual addressing of communications on a network |
US8799371B2 (en) | 2007-12-10 | 2014-08-05 | Yahoo! Inc. | System and method for conditional delivery of messages |
US20090150514A1 (en) * | 2007-12-10 | 2009-06-11 | Yahoo! Inc. | System and method for contextual addressing of communications on a network |
US20090150501A1 (en) * | 2007-12-10 | 2009-06-11 | Marc Eliot Davis | System and method for conditional delivery of messages |
US8166168B2 (en) | 2007-12-17 | 2012-04-24 | Yahoo! Inc. | System and method for disambiguating non-unique identifiers using information obtained from disparate communication channels |
US20090165022A1 (en) * | 2007-12-19 | 2009-06-25 | Mark Hunter Madsen | System and method for scheduling electronic events |
US9626685B2 (en) | 2008-01-04 | 2017-04-18 | Excalibur Ip, Llc | Systems and methods of mapping attention |
US9706345B2 (en) | 2008-01-04 | 2017-07-11 | Excalibur Ip, Llc | Interest mapping system |
US20090177644A1 (en) * | 2008-01-04 | 2009-07-09 | Ronald Martinez | Systems and methods of mapping attention |
US20090176509A1 (en) * | 2008-01-04 | 2009-07-09 | Davis Marc E | Interest mapping system |
US20090177484A1 (en) * | 2008-01-06 | 2009-07-09 | Marc Eliot Davis | System and method for message clustering |
US8762285B2 (en) | 2008-01-06 | 2014-06-24 | Yahoo! Inc. | System and method for message clustering |
US20090182631A1 (en) * | 2008-01-16 | 2009-07-16 | Yahoo! Inc. | System and method for word-of-mouth advertising |
US10074093B2 (en) | 2008-01-16 | 2018-09-11 | Excalibur Ip, Llc | System and method for word-of-mouth advertising |
US20090198488A1 (en) * | 2008-02-05 | 2009-08-06 | Eric Arno Vigen | System and method for analyzing communications using multi-placement hierarchical structures |
US20090222304A1 (en) * | 2008-03-03 | 2009-09-03 | Yahoo! Inc. | Method and Apparatus for Social Network Marketing with Advocate Referral |
US8560390B2 (en) | 2008-03-03 | 2013-10-15 | Yahoo! Inc. | Method and apparatus for social network marketing with brand referral |
US8554623B2 (en) | 2008-03-03 | 2013-10-08 | Yahoo! Inc. | Method and apparatus for social network marketing with consumer referral |
US8538811B2 (en) | 2008-03-03 | 2013-09-17 | Yahoo! Inc. | Method and apparatus for social network marketing with advocate referral |
US8589486B2 (en) | 2008-03-28 | 2013-11-19 | Yahoo! Inc. | System and method for addressing communications |
US8745133B2 (en) | 2008-03-28 | 2014-06-03 | Yahoo! Inc. | System and method for optimizing the storage of data |
US8271506B2 (en) | 2008-03-31 | 2012-09-18 | Yahoo! Inc. | System and method for modeling relationships between entities |
US20090248738A1 (en) * | 2008-03-31 | 2009-10-01 | Ronald Martinez | System and method for modeling relationships between entities |
US8813107B2 (en) | 2008-06-27 | 2014-08-19 | Yahoo! Inc. | System and method for location based media delivery |
US8706406B2 (en) | 2008-06-27 | 2014-04-22 | Yahoo! Inc. | System and method for determination and display of personalized distance |
US20090328087A1 (en) * | 2008-06-27 | 2009-12-31 | Yahoo! Inc. | System and method for location based media delivery |
US9858348B1 (en) | 2008-06-27 | 2018-01-02 | Google Inc. | System and method for presentation of media related to a context |
US20090326800A1 (en) * | 2008-06-27 | 2009-12-31 | Yahoo! Inc. | System and method for determination and display of personalized distance |
US9158794B2 (en) | 2008-06-27 | 2015-10-13 | Google Inc. | System and method for presentation of media related to a context |
US8452855B2 (en) | 2008-06-27 | 2013-05-28 | Yahoo! Inc. | System and method for presentation of media related to a context |
US20100030870A1 (en) * | 2008-07-29 | 2010-02-04 | Yahoo! Inc. | Region and duration uniform resource identifiers (uri) for media objects |
US8583668B2 (en) | 2008-07-30 | 2013-11-12 | Yahoo! Inc. | System and method for context enhanced mapping |
US10230803B2 (en) | 2008-07-30 | 2019-03-12 | Excalibur Ip, Llc | System and method for improved mapping and routing |
US20100027527A1 (en) * | 2008-07-30 | 2010-02-04 | Yahoo! Inc. | System and method for improved mapping and routing |
US8386506B2 (en) | 2008-08-21 | 2013-02-26 | Yahoo! Inc. | System and method for context enhanced messaging |
US20100049702A1 (en) * | 2008-08-21 | 2010-02-25 | Yahoo! Inc. | System and method for context enhanced messaging |
US20100063993A1 (en) * | 2008-09-08 | 2010-03-11 | Yahoo! Inc. | System and method for socially aware identity manager |
US8281027B2 (en) | 2008-09-19 | 2012-10-02 | Yahoo! Inc. | System and method for distributing media related to a location |
US20100077017A1 (en) * | 2008-09-19 | 2010-03-25 | Yahoo! Inc. | System and method for distributing media related to a location |
US20100083169A1 (en) * | 2008-09-30 | 2010-04-01 | Athellina Athsani | System and method for context enhanced mapping within a user interface |
US9600484B2 (en) | 2008-09-30 | 2017-03-21 | Excalibur Ip, Llc | System and method for reporting and analysis of media consumption data |
US20100082688A1 (en) * | 2008-09-30 | 2010-04-01 | Yahoo! Inc. | System and method for reporting and analysis of media consumption data |
US8108778B2 (en) | 2008-09-30 | 2012-01-31 | Yahoo! Inc. | System and method for context enhanced mapping within a user interface |
US20100125604A1 (en) * | 2008-11-18 | 2010-05-20 | Yahoo, Inc. | System and method for url based query for retrieving data related to a context |
US8060492B2 (en) | 2008-11-18 | 2011-11-15 | Yahoo! Inc. | System and method for generation of URL based context queries |
US9805123B2 (en) | 2008-11-18 | 2017-10-31 | Excalibur Ip, Llc | System and method for data privacy in URL based context queries |
US8024317B2 (en) | 2008-11-18 | 2011-09-20 | Yahoo! Inc. | System and method for deriving income from URL based context queries |
US8032508B2 (en) * | 2008-11-18 | 2011-10-04 | Yahoo! Inc. | System and method for URL based query for retrieving data related to a context |
US9224172B2 (en) | 2008-12-02 | 2015-12-29 | Yahoo! Inc. | Customizable content for distribution in social networks |
US8055675B2 (en) | 2008-12-05 | 2011-11-08 | Yahoo! Inc. | System and method for context based query augmentation |
US20100161600A1 (en) * | 2008-12-19 | 2010-06-24 | Yahoo! Inc. | System and method for automated service recommendations |
US8166016B2 (en) | 2008-12-19 | 2012-04-24 | Yahoo! Inc. | System and method for automated service recommendations |
US20100228582A1 (en) * | 2009-03-06 | 2010-09-09 | Yahoo! Inc. | System and method for contextual advertising based on status messages |
US8150967B2 (en) | 2009-03-24 | 2012-04-03 | Yahoo! Inc. | System and method for verified presence tracking |
US20100280879A1 (en) * | 2009-05-01 | 2010-11-04 | Yahoo! Inc. | Gift incentive engine |
US20110035265A1 (en) * | 2009-08-06 | 2011-02-10 | Yahoo! Inc. | System and method for verified monetization of commercial campaigns |
US10223701B2 (en) | 2009-08-06 | 2019-03-05 | Excalibur Ip, Llc | System and method for verified monetization of commercial campaigns |
US8914342B2 (en) | 2009-08-12 | 2014-12-16 | Yahoo! Inc. | Personal data platform |
US8364611B2 (en) | 2009-08-13 | 2013-01-29 | Yahoo! Inc. | System and method for precaching information on a mobile device |
US9772995B2 (en) * | 2012-12-27 | 2017-09-26 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US20150331852A1 (en) * | 2012-12-27 | 2015-11-19 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
WO2014104943A1 (en) * | 2012-12-27 | 2014-07-03 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
US20230025964A1 (en) * | 2021-05-17 | 2023-01-26 | Verantos, Inc. | System and method for term disambiguation |
US11727208B2 (en) * | 2021-05-17 | 2023-08-15 | Verantos, Inc. | System and method for term disambiguation |
US11989511B2 (en) | 2021-05-17 | 2024-05-21 | Verantos, Inc. | System and method for term disambiguation |
Also Published As
Publication number | Publication date |
---|---|
US20070136689A1 (en) | 2007-06-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7681147B2 (en) | System for determining probable meanings of inputted words | |
US7729901B2 (en) | System for classifying words | |
US11868375B2 (en) | Method, medium, and system for personalized content delivery | |
US11023513B2 (en) | Method and apparatus for searching using an active ontology | |
CN107992514B (en) | Structured information card search and retrieval | |
US7519588B2 (en) | Keyword characterization and application | |
RU2692045C1 (en) | Method and system for recommending fresh suggest search requests in a search engine | |
US10049099B2 (en) | Spell correction with hidden markov models on online social networks | |
US20190121850A1 (en) | Computerized system and method for automatically transforming and providing domain specific chatbot responses | |
US9116948B2 (en) | Method and system for semantic search against a document collection | |
US10223464B2 (en) | Suggesting filters for search on online social networks | |
CN106372060B (en) | Search for the mask method and device of text | |
US8156129B2 (en) | Substantially similar queries | |
US9262438B2 (en) | Geotagging unstructured text | |
US20130159277A1 (en) | Target based indexing of micro-blog content | |
US20160299882A1 (en) | Contextual speller models on online social networks | |
US20190042580A1 (en) | Categorizing Objects for Queries on Online Social Networks | |
US20160275196A1 (en) | Semantic search apparatus and method using mobile terminal | |
CN110069698B (en) | Information pushing method and device | |
US10102246B2 (en) | Natural language consumer segmentation | |
US20190079934A1 (en) | Snippet Generation for Content Search on Online Social Networks | |
US8731930B2 (en) | Contextual voice query dilation to improve spoken web searching | |
US10592514B2 (en) | Location-sensitive ranking for search and related techniques | |
CN116681801A (en) | Poster generation method, poster generation device, server and storage medium | |
RU2589856C2 (en) | Method of processing target message, method of processing new target message and server (versions) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: YAHOO! INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RICHARDSON-BUNBURY, DAVID;RIISE, SOREN;PATEL, DEVESH;AND OTHERS;REEL/FRAME:022622/0703 Effective date: 20060227 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: EXCALIBUR IP, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:038383/0466 Effective date: 20160418 |
|
AS | Assignment |
Owner name: YAHOO! INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EXCALIBUR IP, LLC;REEL/FRAME:038951/0295 Effective date: 20160531 |
|
AS | Assignment |
Owner name: EXCALIBUR IP, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:038950/0592 Effective date: 20160531 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552) Year of fee payment: 8 |
|
AS | Assignment |
Owner name: STARBOARD VALUE INTERMEDIATE FUND LP, AS COLLATERAL AGENT, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:ACACIA RESEARCH GROUP LLC;AMERICAN VEHICULAR SCIENCES LLC;BONUTTI SKELETAL INNOVATIONS LLC;AND OTHERS;REEL/FRAME:052853/0153 Effective date: 20200604 |
|
AS | Assignment |
Owner name: R2 SOLUTIONS LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EXCALIBUR IP, LLC;REEL/FRAME:053459/0059 Effective date: 20200428 |
|
AS | Assignment |
Owner names: STINGRAY IP SOLUTIONS LLC, TEXAS; LIFEPORT SCIENCES LLC, TEXAS; AMERICAN VEHICULAR SCIENCES LLC, TEXAS; SAINT LAWRENCE COMMUNICATIONS LLC, TEXAS; MONARCH NETWORKING SOLUTIONS LLC, CALIFORNIA; NEXUS DISPLAY TECHNOLOGIES LLC, TEXAS; BONUTTI SKELETAL INNOVATIONS LLC, TEXAS; CELLULAR COMMUNICATIONS EQUIPMENT LLC, TEXAS; R2 SOLUTIONS LLC, TEXAS; TELECONFERENCE SYSTEMS LLC, TEXAS; LIMESTONE MEMORY SYSTEMS LLC, CALIFORNIA; UNIFICATION TECHNOLOGIES LLC, TEXAS; PARTHENON UNIFIED MEMORY ARCHITECTURE LLC, TEXAS; INNOVATIVE DISPLAY TECHNOLOGIES LLC, TEXAS; MOBILE ENHANCEMENT SOLUTIONS LLC, TEXAS; ACACIA RESEARCH GROUP LLC, NEW YORK; SUPER INTERCONNECT TECHNOLOGIES LLC, TEXAS Free format text (same for each owner): RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 |
|
AS | Assignment |
Owner name: R2 SOLUTIONS LLC, TEXAS Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE NAME PREVIOUSLY RECORDED ON REEL 053654 FRAME 0254. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST GRANTED PURSUANT TO THE PATENT SECURITY AGREEMENT PREVIOUSLY RECORDED;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:054981/0377 Effective date: 20200630 |
|
AS | Assignment |
Owner name: STARBOARD VALUE INTERMEDIATE FUND LP, AS COLLATERAL AGENT, NEW YORK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE ASSIGNOR NAME PREVIOUSLY RECORDED AT REEL: 052853 FRAME: 0153. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:R2 SOLUTIONS LLC;REEL/FRAME:056832/0001 Effective date: 20200604 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20220316 |