US20050262050A1 - System, method and service for ranking search results using a modular scoring system - Google Patents
- Publication number
- US20050262050A1 (application number US10/841,391)
- Authority
- US
- United States
- Prior art keywords
- pages
- scoring
- query
- scored
- ranking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9532—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99937—Sorting
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
Definitions
- the present invention generally relates to scoring or ranking documents with respect to a query term.
- the present invention relates to a method for scoring or ranking documents by aggregating many rankings into one using a rank aggregation method.
- the present invention allows customization of the rank aggregation method for specific intranets, the WWW, and subsets of the WWW.
- the World Wide Web is comprised of an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as web pages. Users navigate these web pages by means of computer software programs commonly known as Internet browsers. Due to the vast number of WWW sites, many web pages have a redundancy of information or share a strong likeness in either function or title. The vastness of the unstructured WWW causes users to rely primarily on Internet search engines to retrieve information or to locate businesses. These search engines use various means to determine the relevance of a user-defined search to the information retrieved.
- the authors of web pages provide information known as metadata within the body of the document that defines the web pages.
- This document is typically written in, for example, hypertext markup language (HTML).
- a computer software product known as a web crawler systematically accesses web pages by sequentially following hypertext links (hyperlinks) from web page to web page.
- the crawler indexes the web pages for use by the search engines using information about a web page as provided by its address or Uniform Resource Locator (URL), metadata, and other criteria found within the web page.
- the crawler is run periodically to update previously stored data and to append information about newly created web pages.
- the information compiled by the crawler is stored in a metadata repository or database.
- the search engines search this repository to identify matches for the user-defined search rather than attempt to find matches in real time.
- a typical search engine has an interface with a search window where the user enters an alphanumeric search expression or keywords.
- the search engine sifts through available web sites for the search terms, and returns the search results in the form of web pages in, for example, HTML.
- Each search result comprises a list of individual entries that have been identified by the search engine as satisfying the search expression.
- Each entry or “hit” comprises a hyperlink that points to a Uniform Resource Locator (URL) location or web page.
- An exemplary search engine is the Google® search engine.
- An important aspect of the Google® search engine is the ability to rank web pages according to the authority of the web pages with respect to a search.
- the ranking technique used by the Google® search engine is the PageRank algorithm. Reference is made to L. Page, et al., “The PageRank citation ranking: Bringing order to the web”, Technical report, Stanford Digital Library Technologies Project, 1998. Paper SIDL-WP-1999-0120.
- the PageRank algorithm calculates a stationary distribution of a Markov chain induced by hyperlink connectivity on the WWW. This same technique used by the PageRank algorithm applies to intranets or subsets of the WWW. Although the PageRank algorithm has proven to be useful, it would be desirable to present additional improvements.
- Search engines typically face a problem with having too many results that contain the query terms. For example, the query “db2” appears in over 180,000 different URLs on one company intranet. The problem of indexing on a large corpus of text such as an intranet or the World Wide Web becomes one of ranking the many results by their importance and relevance to the query, so that the user need not peruse all of the results to satisfy an informational need.
- Intranet search is different from Internet search for several reasons: the queries asked on the intranet are different, the notion of a “good answer” is different, and the social processes that create the intranet are different from those that create the Internet. Queries on an intranet tend to be jargon-heavy and use various acronyms and abbreviations that reflect the structure of the organization or company that uses that intranet. In addition, the correct answer to a query is often specific to a site, geographic location, or an organizational division, but the user often does not make this intent explicit in the query. Context-sensitive search is a common problem for many intranets and the Internet.
- employees tend to fulfill a role that is consistent with their job description.
- an employee of the marketing department may have a different need than an employee of a research division, and a lawyer may have a different need than a programmer. This suggests that ranking methods for search engines should provide personalization of ranking functions.
- a simple approach to mixing different features is to apply numerical weights to the features and use a mixing function to combine these numerical weights into a single score for ranking documents.
- the scales and distribution of scores from different features can be incomparable, and it is difficult to arrive at an optimal mixing function.
- One approach to addressing this problem that has been suggested previously uses Bayesian probabilistic models for retrieval, treating the different scores given to documents as probabilities and merging them according to a probabilistic model.
- W. Croft “Combining approaches to information retrieval”, Advances in Information Retrieval . Kluwer Academic Publishers, 2000; D. Hiemstra. “ Using Language Models for Information Retrieval ”, PhD thesis, University of Twente, Twente, The Netherlands, 2001; W.
- the solution should be customizable to meet the needs and characteristics of a specific network, intranet, or client. The need for such a solution has heretofore remained unsatisfied.
- the present invention satisfies this need, and presents a system, a computer program product, and an associated method (collectively referred to herein as “the system” or “the present system”) for ranking search results using a rank aggregation method to allow flexible implementation of text search processors.
- the present system introduces a means by which many different features of documents can be merged to give an ordered list of results.
- the ranking functions of the present system can easily be customized to the needs of a particular corpus or collection of users such as, for example, an intranet.
- rank aggregation is independent of the underlying score distributions between the different factors and can be applied to merge any set of ranking functions. While discussed in terms of an intranet or the Internet, the present system is applicable as well to, for example, indexing email repositories, newsgroups, instant messaging logs, and the like.
- rank aggregation holds the advantage of combining the influence of many different heuristic factors in a robust way to produce high-quality results for queries.
- a rank aggregation processor takes several ranked lists, each of which ranks part of a collection of candidates (web pages), and produces an ordering of all the candidates in the union of the lists. Reference is made to C. Dwork, et al., “Rank aggregation methods for the web”, Proc. 10th WWW, pages 613-622, 2001. The ordering produced by the rank aggregation processor is aimed at minimizing the total number of inversions, that is, the total number of “upsets” caused by the output ordering with respect to the initial orderings.
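The notion of "inversions" can be made concrete with a minimal sketch. It assumes that an inversion is a pair of documents that a candidate ordering and an input ordering place in opposite order, and that pairs not present in both lists are ignored. The function names and the sample lists are hypothetical and do not come from the specification.

```python
from itertools import combinations


def kendall_distance(order_a, order_b):
    """Count pairs of shared documents that the two orderings rank in opposite order."""
    pos_a = {doc: i for i, doc in enumerate(order_a)}
    pos_b = {doc: i for i, doc in enumerate(order_b)}
    # Assumption: only documents present in both lists contribute pairs.
    shared = [doc for doc in order_a if doc in pos_b]
    return sum(
        1
        for x, y in combinations(shared, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def total_inversions(candidate, input_orders):
    """Total number of 'upsets' the candidate ordering causes across all inputs."""
    return sum(kendall_distance(candidate, order) for order in input_orders)


# Three hypothetical ranked lists produced by three scoring modules.
inputs = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(total_inversions(["a", "b", "c"], inputs))  # 2
print(total_inversions(["c", "b", "a"], inputs))  # 7
```

Under these assumptions, the ordering a, b, c disagrees with the inputs on fewer pairs than its reverse, so it is the better aggregate of the two.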
- Rank aggregation is employed as a tool to combine initial rankings of pages or documents (produced by various ranking functions) into an aggregate ranking that is improved over any one of the constituent ranking functions.
- the present system uses rank aggregation to combine many ranked lists into a single ranking of results.
- the present system combines factors such as indegree, page ranking, URL length, proximity to the root server of an intranet, etc, to form a single ordering on web pages that closely obeys the individual orderings, but also mediates between the collective wisdom of individual heuristics.
- Rank aggregation is resilient to spam; specifically, if one ranking is manipulated to influence the results, the aggregation of different rankings still reflects the collective wisdom of the majority of the rankings. Consequently, the present system produces robust rankings that are not as subject to manipulation as conventional techniques.
- the present system provides the opportunity to customize and personalize search results within a single software architecture.
- the ranking function can be tailored to represent the set of users and the structure of the organization. For example, a client may require a particular server to supply all answers to human resource questions.
- the present system incorporates this requirement into the ranking function without the need to change the underlying text index software.
- a “best” answer to a query may depend on the location of a user asking the question. For example, the query “retirement” can have different answers if the query is submitted in Zurich than if it is submitted in Raleigh.
- the present system can incorporate the geographical location of a user without changing the underlying indexing software that returns ranked lists of results.
- a text search application takes a set of documents in and produces a graded set of documents. More generally, a text search application typically comprises the following features: a collection of documents, a set of users, a set of scoring modules, and a rank aggregation processor.
- In the case of the web, information about the collection of documents is often gathered into one place via a web crawler, but the construction of an index can also be done in a distributed fashion where the documents reside.
- the set of users issue queries on the documents and receive lists of results from the system that suggest which documents are relevant to the query.
- the set of scoring modules and the rank aggregation processor comprise a set of heuristics and scoring methods that take as input the set of documents and produce a graded set of documents, based on heuristics for relevance or authoritativeness.
- the heuristics comprise several categories: static orderings of the documents, dynamic orderings that depend on the particular query, and dynamic orderings that depend on the particular user. Dynamic orderings that depend on the particular query and dynamic orderings that depend on the particular user are typically used to decide which documents are relevant to a query. Static orderings of the documents are used to discriminate between which of the relevant documents to rank first.
- the static orderings of the documents are based on features of the documents themselves but independent of the query or the user.
- Examples of static orderings of documents comprise PageRank, document length, media type, position in a hierarchy, rankings by authors, rankings by an independent authority, etc.
- Dynamic orderings that depend on the particular query comprise, for example, the number of times that a query term appears in the document and other documents (TF*IDF), whether the query terms appear in a title, whether the query terms appear in anchor text for a document, whether the query terms appear near each other in the document (e.g., lexical affinities), the placement of query terms within a document, etc. For example, documents that contain terms early in their text or within a title may be more relevant than documents that contain the query term in a footnote or appendix. Dynamic orderings that depend on the particular query may be implemented within a traditional inverted keyword index or may be implemented outside the index to allow customization without the need to modify the index software.
- Dynamic orderings that depend on the particular user comprise, for example, geographical proximity scores for the user, role or job title of the user within an organization, educational level, history of previous queries by the user, etc.
- the present system may be embodied in a utility program such as a modular scoring utility program.
- the modular scoring utility program is customized for a client's intranet and particular needs.
- the present system provides means for the user to identify an intranet, the Internet, a database, or other set of data as input data from which query results may be scored by the present system.
- the present system also provides means for the user to specify a query in the form of text input. A user specifies the input data and the query and then invokes the modular scoring utility program to produce a scored set of documents.
- FIG. 1 is a schematic illustration of an exemplary operating environment in which a modular scoring system of the present invention can be used;
- FIG. 2 is a block diagram of the high-level architecture of the modular scoring system of FIG. 1 ;
- FIG. 3 is a process flow chart illustrating a method of configuring the modular scoring system of FIGS. 1 and 2 ;
- FIG. 4 is a process flow chart illustrating a method of operation of the modular scoring system of FIGS. 1 and 2 ;
- FIG. 5 is a block diagram of the high-level architecture of one embodiment of the modular scoring system of FIGS. 1 and 2 ;
- FIG. 6 is a block diagram of the high-level architecture of one embodiment of the modular scoring system of FIGS. 1 and 2 .
- Internet A collection of interconnected public and private computer networks that are linked together with routers by a set of standard protocols to form a global, distributed network.
- Inlink Links coming into a web page or document such as an HTML document from another web page or document.
- Intranet A network that is internal to a company or organization that may not be connected to the Internet, but that uses standard Internet protocols and has some similar functions.
- Link A pointer in a web page or in a document such as an HTML document that leads to another web page or to another place within the same document; also called a hyperlink.
- Rank An index assigned to a document/web page having value 1 through the number of documents, with the highest rank corresponding to 1.
- Score A numeric value (usually a fractional real number) assigned to a document/page by the scoring technique (e.g., PageRank) from which a ranking can be obtained.
- URL Uniform Resource Locator
- WWW World Wide Web, an Internet client-server hypertext distributed information retrieval system.
- FIG. 1 portrays an exemplary overall environment in which a system, a service, a computer program product, and an associated method (the “system 10 ”) for ranking search results using a modular scoring system according to the present invention may be used.
- System 10 comprises a software programming code or a computer program product that is typically embedded within, or installed on a host server 15 .
- system 10 can be saved on a suitable storage medium such as a diskette, a CD, a hard drive, or like devices.
- Users such as remote Internet users, are represented by a variety of computers such as computers 20 , 25 , 30 , and can access the host server 15 through a network 35 .
- system 10 scores results obtained by a search engine operated on host server 15 .
- the results are accessed by system 10 from database (dB) 40 .
- Users access the results of the scoring when performing key word searches on the Internet.
- users provide an index or other hierarchical structure to system 10 ; system 10 then scores the index or other hierarchical structure for the user.
- Output from system 10 is stored on dB 40 or on a storage repository of the user.
- Computers 20 , 25 , 30 each comprise software that allows the user to interface securely with the host server 15 .
- the host server 15 is connected to network 35 via a communications link 45 such as a telephone, cable, or satellite link.
- Computers 20 , 25 , 30 can be connected to network 35 via communications links 50 , 55 , 60 , respectively. While system 10 is described in terms of network 35 , computers 20 , 25 , 30 may also access system 10 locally rather than remotely. Computers 20 , 25 , 30 may access system 10 either manually, or automatically through the use of an application.
- A graded set is a set of documents in which each document carries a numerical score.
- the numerical score is a relevance score as applied to a search term.
- the numerical scores may all be equal, in which case the graded set can be thought of as just a set.
- Graded sets are also known as “fuzzy sets”.
- FIG. 2 illustrates a high-level hierarchy of system 10 .
- System 10 comprises a set of scoring modules 205 , a duplication module 210 , and a rank aggregation processor 215 .
- Each of the scoring modules 205 takes as input one or more graded sets of documents, an auxiliary information module 225 , and (optionally) a query 220 .
- Output from each of the scoring modules to the rank aggregation processor 215 is a ranked set of documents.
- In one embodiment, the rank aggregation processor 215 weights the outputs from each of the scoring modules 205 equally.
- In another embodiment, the rank aggregation processor 215 weights the outputs from each of the scoring modules 205 differently to meet scoring requirements of a specific client, user, intranet, or network.
- An auxiliary information module 225 comprises data that may be used to customize the output of system 10 such as, for example, a user ID, a history of queries made by a user, a history of documents read by a user, a history of click-through results by a user, the geographic location of a user, the security classification of a user, a language set for a user, a set of documents for comparison uses during scoring, etc. Additional data may be used by the auxiliary information module 225 as needed to customize the output of system 10 to a client, intranet, or other network such as, for example, the Internet. In one embodiment, system 10 excludes auxiliary information module 225 from the process of scoring an aggregated set of graded documents.
- the duplication module 210 takes as input one graded set of documents and produces two or more identical copies of the graded set of documents for use as needed by the scoring modules 205 and the rank aggregation processor 215 .
- Input to the rank aggregation processor 215 comprises two or more ranked sets of documents produced by the scoring modules 205 and optional weights on the ranked sets of documents.
- the rank aggregation processor 215 produces a scored set of documents using rank aggregation to merge the outputs of some or all of the scoring modules 205 and produce a single scored set of documents.
- Scoring modules 205 comprise a set of indices 230 such as, for example, a content index 235 , a title index 240 , and an anchortext index 245 . Additional indices may be used as desired.
- the content index 235 , the title index 240 , and the anchortext index 245 take as input query 220 and find a set of documents in dB 40 that match the text of input query 220 .
- the indices (e.g., content index, title index, anchortext index) provide pointers into the set of documents in dB 40 containing the query terms, and pass them to the union module 250 and to the rank aggregation processor 215 .
- the content index 235 comprises an inverted keyword index on document content.
- the title index 240 comprises an inverted keyword index on titles and metadata about the set of found documents.
- the anchortext index 245 comprises an inverted keyword index on anchortext for the set of found documents.
- the anchortext for a document typically comprises the highlighted text that a user clicks on to navigate to the document.
- the anchortext for a document may further comprise text surrounding the highlighted text.
- the title index 240 and anchortext index 245 comprise virtual documents.
- Indices 230 provide graded lists of found documents that are scored using any suitable scoring analysis such as, for example, TF*IDF (Term Frequency Times Inverse Document Frequency).
- TF*IDF scores a document based on the number of times a query term appears in a document: the higher the term frequency, the more relevant the document. Further, TF*IDF determines the relevance of a query term by the number of documents comprising the query term. TF*IDF places more weight on a less common term than a more common term as determined by the number of documents found with each term. Consequently, documents with the highest number of least common terms in the search query receive the highest score.
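A minimal sketch of TF*IDF scoring follows. It assumes whitespace tokenization, raw term counts for the term frequency, and log(N / df) for the inverse document frequency. These are common textbook choices, not details given in the specification, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Return {doc_id: score}, summing tf * idf over the query terms."""
    n_docs = len(docs)
    # Assumption: lowercase, whitespace-separated tokens.
    tokens = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    scores = {doc_id: 0.0 for doc_id in docs}
    for term in query_terms:
        # Document frequency: how many documents contain the term at all.
        df = sum(1 for counts in tokens.values() if counts[term] > 0)
        if df == 0:
            continue  # a term found nowhere contributes nothing
        idf = math.log(n_docs / df)  # rarer terms receive a larger weight
        for doc_id, counts in tokens.items():
            scores[doc_id] += counts[term] * idf
    return scores


docs = {
    "d1": "db2 install guide db2",
    "d2": "install guide",
    "d3": "holiday schedule",
}
scores = tf_idf_scores(["db2", "install"], docs)
graded = sorted(scores.items(), key=lambda item: -item[1])
print(graded)
```

In this toy corpus, "db2" appears in only one document, so it carries more weight than "install", and d1 receives the highest score.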
- the outputs of indices 230 are combined in a union module 250 to form a single graded set of documents.
- the duplication module 210 duplicates the single graded set of documents as needed to provide inputs to the scoring modules 205 .
- scoring modules 205 may also utilize query 220 and auxiliary information module 225 as input.
- Scoring modules 205 further comprise ranking or scoring processors such as, for example, a page ranking processor 255 , an indegree processor 260 , a discovery date processor 265 , a URL words processor 270 , a URL depth processor 275 , a URL length processor 280 , a geography processor 285 , a discriminator processor 290 , etc.
- This list of the scoring modules 205 is illustrative of the use of various scoring techniques by system 10 . Any type of processor that produces a ranked set of documents from a graded set of documents may be used as one of the scoring modules 205 .
- the scoring modules 205 may be selected or deselected by selection module 295 as needed for a query, a user, a client, an intranet, etc.
- the page ranking processor 255 ranks a graded set of documents utilizing, for example, the PageRank algorithm. The computation of page ranking depends primarily on the link structure between web pages and is therefore useful for HTML content. Any type of page ranking processor may be used by system 10 such as, for example, any variation on the PageRank algorithm.
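As an illustration of link-based page ranking, the following sketch runs a basic power iteration in the style of PageRank. The damping factor of 0.85, the fixed iteration count, the uniform handling of pages without outgoing links, and the sample link graph are all assumptions made for the example. The specification does not prescribe them.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (random jump).
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank uniformly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical intranet link graph.
links = {"home": ["hr", "it"], "hr": ["home"], "it": ["home"], "orphan": ["home"]}
ranks = pagerank(links)
ranked_pages = sorted(ranks, key=lambda page: -ranks[page])
print(ranked_pages)
```

The resulting values approximate the stationary distribution of a random walk over the link graph, so the heavily linked "home" page ranks first and the page with no inlinks ranks last.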
- the indegree processor 260 ranks a graded set of documents based on the number of inlinks to the document.
- the indegree processor accords a rank to a document or web page that is proportional to the number of links into the document or web page.
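The indegree ranking can be sketched in a few lines. The sketch assumes that duplicate links and self-links are not counted and that ties are broken by document identifier. Those choices, along with the names used, are illustrative only.

```python
from collections import Counter


def rank_by_indegree(doc_ids, links):
    """Order documents by number of inlinks; links holds (source, target) pairs."""
    # Assumption: repeated links count once and self-links are ignored.
    indegree = Counter(target for source, target in set(links) if source != target)
    # Most inlinks first; ties broken alphabetically for a deterministic result.
    return sorted(doc_ids, key=lambda doc: (-indegree[doc], doc))


links = [("a", "c"), ("b", "c"), ("c", "a"), ("a", "a"), ("b", "c")]
print(rank_by_indegree(["a", "b", "c"], links))  # ['c', 'a', 'b']
```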
- the discovery date processor 265 ranks a graded set of documents based on the time that a crawler discovers a web page or a document. If a crawl is started from a single seed, then the order in which pages are crawled tends to be similar to a breadth first search through the link graph [reference is made to M. Najork, et al., “Breadth-first search crawling yields high-quality pages”, In Proc. 10th WWW, pages 114-118, 2001]. A sequence of times that a page is discovered by a hyperlink provides an approximation to the hyperlink graph distance of the page from the root seed of the network.
- the discovery date processor 265 accords a rank to a document or a web page that is inversely proportional to the distance of the document or web page from the root seed of the network.
- a document or a web page that is close to the root seed of the network receives a higher rank than a document or a web page that is further away from the root seed of the network.
- the URL words processor 270 compares the text of the query term with a URL of a document or web page in the graded set. Input to the URL words processor 270 comprises query 220 and a graded set of documents. The URL words processor 270 accords a higher rank to a document that comprises a query term as a substring in the URL corresponding to that document.
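A minimal sketch of URL word matching follows. It assumes case-insensitive substring matching and ranks by the number of distinct query terms found in the URL. The counting rule, the function name, and the example URLs are hypothetical.

```python
def rank_by_url_words(query, urls):
    """Rank URLs by how many query terms occur in them as substrings."""
    terms = query.lower().split()

    def matches(url):
        lowered = url.lower()
        return sum(1 for term in terms if term in lowered)

    # sorted() is stable, so URLs with equal match counts keep their input order.
    return sorted(urls, key=lambda url: -matches(url))


urls = [
    "http://w3.example.com/news/",
    "http://w3.example.com/db2/install.html",
    "http://w3.example.com/db2/",
]
ranked_urls = rank_by_url_words("db2 install", urls)
print(ranked_urls)
```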
- the URL depth processor 275 accords a rank to a document or a web page that is inversely proportional to the number of delimiters such as a slash character (“/”) that appear in the path component of the URL corresponding to that document or web page.
- the number of delimiters in the path component of a URL indicates the relative position of a document or a web page in a directory hierarchy; fewer delimiters indicate a higher position in the hierarchy.
- the URL depth processor 275 favors a document or web page near the top of a directory hierarchy. Documents or web pages near the top of a directory hierarchy tend to be more general and have links to pages or documents lower in the hierarchy. Consequently, documents or web pages near the top of a directory hierarchy (having fewer delimiters) tend to be more authoritative than those documents or web pages at the bottom of the hierarchy (having more delimiters).
- the URL length processor 280 accords a rank to a document or a web page that is inversely proportional to a length of the URL corresponding to that document or web page. When comparing documents comprising comparable content, the URL length processor 280 considers documents with shorter URL strings as more authoritative.
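The URL depth and URL length heuristics can be sketched as follows, assuming the slash character is the only delimiter counted and that the depth is measured on the path component returned by Python's `urllib.parse.urlsplit`. The function names and URLs are illustrative.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Number of '/' delimiters in the path component of the URL."""
    return urlsplit(url).path.count("/")


def rank_by_url_depth(urls):
    """Fewer delimiters (higher in the directory hierarchy) rank first."""
    return sorted(urls, key=url_depth)


def rank_by_url_length(urls):
    """Shorter URL strings rank first."""
    return sorted(urls, key=len)


urls = [
    "http://example.com/hr/benefits/japan.html",
    "http://example.com/",
    "http://example.com/hr/",
]
print([url_depth(u) for u in urls])  # [3, 1, 2]
print(rank_by_url_depth(urls))
print(rank_by_url_length(urls))
```

Both functions return a ranked list, so either can serve as one input among several to a rank aggregation step.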
- the geography processor 285 accords rank to a web page or a document based on a geographical location associated with that document or web page compared to the geographical location of a client or a user.
- the geography processor 285 is especially useful for worldwide organizations that have employees in many different countries. For example, a user queries a company intranet in Japan regarding company benefits.
- the geography processor 285 accords a high rank to documents about company benefits that correspond to Japan as opposed to documents that correspond to Sweden.
- the discriminator 290 accords rank to a web page or a document in favor of certain classes of URLs over others.
- the favored URLs comprise, for example, those that end in a slash character (“/”) or “index.html”.
- the favored URLs further comprise those URLs that comprise a tilde character (“~”). These URLs are typically the main page of a site.
- the discriminator 290 further discriminates, for example, against certain classes of dynamic URLs containing a question mark character (“?”).
- the discriminator 290 is neutral on all other URLs and is easily customized to knowledge of a specific intranet or other network.
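The URL classes described for the discriminator can be sketched as a three-valued score. The numeric values +1, 0, and -1, and the choice to test for a question mark before testing the favored patterns, are assumptions made for this example.

```python
def discriminator_score(url):
    """Return +1 for favored URL classes, -1 for disfavored ones, 0 otherwise."""
    # Assumption: a dynamic URL is penalized even if it also matches a favored pattern.
    if "?" in url:
        return -1
    if url.endswith("/") or url.endswith("index.html") or "~" in url:
        return 1
    return 0  # neutral on all other URLs


def rank_by_discriminator(urls):
    """Favored URLs first, neutral next, disfavored last (input order kept within a class)."""
    return sorted(urls, key=lambda url: -discriminator_score(url))


urls = [
    "http://example.com/search?q=db2",
    "http://example.com/docs/a.html",
    "http://example.com/~alice/",
    "http://example.com/index.html",
]
print([discriminator_score(u) for u in urls])  # [-1, 0, 1, 1]
print(rank_by_discriminator(urls))
```

Additional patterns specific to a given intranet could be added as further tests in `discriminator_score`.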
- the rank aggregation processor 215 utilizes a variety of methods to aggregate the outputs of the scoring modules 205 such as, for example, positional methods, graph methods, or Markov chain methods. When using positional methods, the rank aggregation processor 215 gives each document an output score that is computed as a function of the various ranks received by a document or a web page from the scoring modules 205 . The output score assignment may be determined by, for example, the mean rank or the median rank. A document or a web page is then scored by the output rank received.
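- The positional method can be sketched as follows, under the simplifying assumption that every scoring module ranks the same set of documents. The function name, the tie-breaking by document identifier, and the sample ranks are illustrative only.

```python
from statistics import mean, median


def positional_aggregate(rankings, combine=median):
    """Order documents by a function (median or mean) of the ranks they received.

    rankings: list of dicts mapping document id -> rank, where rank 1 is best.
    Assumes every dict ranks the same documents.
    """
    docs = rankings[0].keys()
    output_score = {d: combine([r[d] for r in rankings]) for d in docs}
    # Lower combined rank is better; ties are broken by document id.
    return sorted(docs, key=lambda d: (output_score[d], d))


# Hypothetical ranks from three scoring modules.
rankings = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "b": 3, "c": 2},
]

by_median = positional_aggregate(rankings)
by_mean = positional_aggregate(rankings, combine=mean)
```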
- the graph method creates a graph of 2n vertices $D_1, \ldots, D_n, P_1, \ldots, P_n$, where n is the number of documents.
- Edges of the graph are of the form $(D_i, P_j)$.
- a “cost” associated with the edge $(D_i, P_j)$ reflects the badness of scoring document i as the j-th best document, where “badness” corresponds to divergence from an ideal.
- the rank aggregation processor 215 determines the costs from the input graded sets. As an example, if document i receives ranks $R_1, \ldots, R_k$,
- the rank aggregation processor 215 may define the cost of $(D_i, P_j)$ as the sum of the quantities
- , for $t = 1, \ldots, k$. Once these costs have been defined, a “minimum-cost perfect matching” is computed.
- a perfect matching assigns a unique position (score) j to each document i. The unique position j places document i in a scored aggregate list of documents.
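- The graph method can be sketched as follows. The per-rank cost used here, the absolute difference between a received rank and the candidate position j, is an assumption of this sketch (it is the cost used in footrule-style aggregation) and is not taken from the text above. The exhaustive search over permutations is only practical for a handful of documents; a real implementation would use an assignment algorithm such as the Hungarian method.

```python
from itertools import permutations


def edge_cost(ranks, position):
    """Assumed cost of placing a document at `position`: sum of |R_t - j|."""
    return sum(abs(r - position) for r in ranks)


def min_cost_matching(doc_ranks):
    """Return (ordering, cost) of a minimum-cost perfect matching.

    doc_ranks: dict mapping document id -> list of ranks R_1..R_k.
    Brute force over all assignments of documents to positions 1..n.
    """
    docs = sorted(doc_ranks)
    best_order, best_cost = None, None
    for order in permutations(docs):
        cost = sum(edge_cost(doc_ranks[d], j)
                   for j, d in enumerate(order, start=1))
        if best_cost is None or cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost


# Hypothetical ranks received by three documents from three scoring modules.
doc_ranks = {"a": [1, 2, 1], "b": [2, 1, 3], "c": [3, 3, 2]}
order, cost = min_cost_matching(doc_ranks)
```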
- the Markov chain method creates a graph of n vertices $D_1, \ldots, D_n$, where n is the number of documents.
- the edges are of the form $(D_i, D_j)$, and the weight of the edge $(D_i, D_j)$ reflects perceived improvement of $D_j$ over $D_i$ by the input graded sets. For example, if a majority of the input ranked sets rank $D_j$ above $D_i$, then the weight is 1, else it is 0.
- the weights of the out-going edges of each vertex $D_i$ are normalized so that their sum is exactly 1.
- the rank aggregation processor 215 determines the stationary probability distribution of a random walk on this graph via an eigenvector computation.
- the rank aggregation processor 215 then sorts the web pages or documents as represented by the vertices in decreasing order of stationary probability, yielding the final score of the documents [reference is made to J. Allan, et. al., “INQUERY and TREC-9”. In Proc. 9th TREC, pages 551-562, 2000].
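- A minimal sketch of the Markov chain method follows. Two details are assumptions of this sketch and are not stated in the text: a vertex with no out-going edges is given a self-loop, and a small uniform "teleport" probability is mixed in so that the stationary distribution is unique. The stationary distribution is approximated by power iteration rather than by an explicit eigenvector routine.

```python
def markov_chain_aggregate(rankings, teleport=0.15, iterations=100):
    """Order documents by stationary probability of a majority-preference walk.

    rankings: list of dicts mapping document id -> rank (1 is best);
    every dict is assumed to rank the same documents.
    """
    docs = sorted(rankings[0])
    n = len(docs)

    # Edge i -> j has weight 1 when a majority of rankings place j above i;
    # out-going weights are then normalized to sum to 1.
    transition = {}
    for i in docs:
        better = [j for j in docs if j != i
                  and 2 * sum(r[j] < r[i] for r in rankings) > len(rankings)]
        if better:
            transition[i] = {j: 1.0 / len(better) for j in better}
        else:
            transition[i] = {i: 1.0}  # assumed self-loop for a sink vertex

    # Power iteration with uniform teleportation (an assumption of this sketch).
    prob = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        nxt = {d: teleport / n for d in docs}
        for i, row in transition.items():
            for j, weight in row.items():
                nxt[j] += (1.0 - teleport) * prob[i] * weight
        prob = nxt

    return sorted(docs, key=lambda d: -prob[d]), prob


rankings = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "b": 3, "c": 2},
]
order, stationary = markov_chain_aggregate(rankings)
```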
- FIG. 3 illustrates a method 300 of configuring system 10 for a specific client.
- Customer requirements are determined (step 305 ) and required auxiliary information is identified (step 310 ).
- Customer requirements and required auxiliary information are used to configure the auxiliary information module 225 (step 315 ) and configure scoring modules 205 for use by the client (step 320 ).
- a client having a homogeneous workforce that operates in one geographic area does not require a geography processor 285 .
- a user may wish to use in-degree or pagerank, but not both.
- FIG. 4 illustrates a method 400 of operation of system 10 .
- a user enters a query 220 ( FIG. 2 ).
- the query is transmitted to indices 230 and optionally to other scoring modules 205 (step 410 ).
- Each of the indices 230 produces a graded set of documents from the query (step 415 ).
- the union module 250 combines outputs of indices 230 into a single graded set of documents (step 420 ).
- the duplication module 210 duplicates the single graded set of documents as needed for selected scoring modules 205 (step 425 ).
- the scoring modules 205 are selected for a specific configuration as required by the client.
- the scoring modules 205 score their copy of the graded set of documents (step 430 ).
- the rank aggregation processor 215 forms a single scored set of documents by merging and scoring the outputs of the scoring modules 205 according to predetermined criteria selected to meet client requirements.
- the scoring modules 205 , the duplication module 210 , and the rank aggregation processor 215 can be configured within system 10 in a variety of combinations as necessary to refine the scoring process and achieve performance desired by the client.
- the selection of a configuration may be achieved by mathematical considerations or by comparison on the basis of human trials.
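- The flow of method 400 can be sketched end to end as follows. All function names, the rule of keeping the maximum score when the union module sees a document in several indices, the two example scoring modules, and the use of median rank for the final aggregation are assumptions made for illustration.

```python
from statistics import median


def union_graded(graded_sets):
    """Merge graded sets (doc -> score) into one; keeps the maximum score (assumed)."""
    merged = {}
    for graded in graded_sets:
        for doc, score in graded.items():
            merged[doc] = max(score, merged.get(doc, score))
    return merged


def rank_by_score(graded):
    """Scoring module: higher index score receives a better (lower) rank."""
    ordered = sorted(graded, key=lambda d: (-graded[d], d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def rank_by_url_length(graded):
    """Scoring module: shorter URL string receives a better (lower) rank."""
    ordered = sorted(graded, key=lambda d: (len(d), d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def run_query(index_outputs, scoring_modules):
    graded = union_graded(index_outputs)                   # step 420
    copies = [dict(graded) for _ in scoring_modules]       # step 425
    ranked = [module(copy)                                 # step 430
              for module, copy in zip(scoring_modules, copies)]
    # Rank aggregation by median rank, ties broken by document id.
    return sorted(graded, key=lambda d: (median(r[d] for r in ranked), d))


# Hypothetical outputs of two indices for one query (step 415).
index_outputs = [
    {"http://x.example/": 0.2, "http://x.example/a/b/c.html": 0.9},
    {"http://x.example/a/": 0.5, "http://x.example/": 0.8},
]
result = run_query(index_outputs, [rank_by_score, rank_by_url_length])
```

- Selecting or deselecting a scoring module for a client then amounts to changing the list passed as `scoring_modules`, without touching the index code.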
- FIG. 5 illustrates a high-level hierarchy of one embodiment of system 10 in which two levels of aggregation are used. These two levels of aggregation can result in different scores by giving different weights to different features in the scoring process.
- a rank aggregation processor 505 processes outputs of indices 230 to create an initial scored set for analysis by selected scoring modules 205 and the rank aggregation processor 215 .
- FIG. 6 illustrates a high-level hierarchy of another embodiment of system 10 in which two levels of aggregation are used in a different configuration.
- a rank aggregation processor processes outputs of similar scoring modules 205 prior to an overall rank aggregation. For example, outputs from the page ranking processor 255 and the indegree processor 260 are aggregated together and scored by a rank aggregation processor 605 . Outputs from the URL length processor 280 , the URL words processor 270 , and the URL depth processor 275 are aggregated together and scored by a rank aggregation processor 610 . Outputs from the discovery date processor 265 , the geography processor 285 , the discriminator 290 , etc., are aggregated together and scored by a rank aggregation processor 615 .
- the outputs of the rank aggregation processor 605 , the rank aggregation processor 610 , and the rank aggregation processor 615 are intermediate scored sets of documents that are then processed by a rank aggregation processor 620 along with output from indices 230 .
- the rank aggregation processor 605 , the rank aggregation processor 610 , and the rank aggregation processor 615 each process ranked sets of documents that have some elements that do not appear in the other sets. In other terms, the more commensurate scores are aggregated first, and then these ranks are used as input to the final aggregation.
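- The two-level arrangement of FIG. 6 can be sketched as follows. Median-rank aggregation is used at both levels purely for illustration, every ranking is assumed to cover the same documents, and the group contents and rank values are hypothetical.

```python
from statistics import median


def aggregate(rankings):
    """Median-rank aggregation returning a new doc -> rank mapping (1 is best)."""
    docs = sorted(rankings[0])
    ordered = sorted(docs, key=lambda d: (median(r[d] for r in rankings), d))
    return {d: i + 1 for i, d in enumerate(ordered)}


def two_level(groups):
    """Aggregate each group of similar rankings first, then aggregate the results."""
    intermediate = [aggregate(group) for group in groups]
    return aggregate(intermediate)


# Hypothetical link-based rankings (e.g. page rank, indegree).
link_group = [{"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 1, "c": 3}]
# Hypothetical URL-based rankings (e.g. URL length, URL depth).
url_group = [{"a": 3, "b": 1, "c": 2}, {"a": 3, "b": 2, "c": 1}]

final_ranks = two_level([link_group, url_group])
```

- Grouping this way gives each family of heuristics one vote in the final aggregation, regardless of how many individual modules the family contains.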
- While the present invention is described for illustration purposes only in relation to intranets, it should be clear that the invention is applicable as well to, for example, the WWW and to subsets of the WWW in addition to data derived from any source stored in any format that is accessible by the present invention.
- While the present invention is described in terms of the PageRank algorithm, it should be clear that the present invention is applicable as well to, for example, other search applications and ranking techniques without departing from the scope of the present invention.
- While the present invention has been described in terms of web search engines, it should be clear that the present invention is applicable as well to, for example, indexing email repositories, newsgroups, instant messaging logs, and the like.
Description
- The present invention generally relates to scoring or ranking documents with respect to a query term. In particular, the present invention relates to a method for scoring or ranking documents by aggregating many rankings into one using a rank aggregation method. Specifically, the present invention allows customization of the rank aggregation method for specific intranets, the WWW, and subsets of the WWW.
- The World Wide Web (WWW) is comprised of an expansive network of interconnected computers upon which businesses, governments, groups, and individuals throughout the world maintain inter-linked computer files known as web pages. Users navigate these web pages by means of computer software programs commonly known as Internet browsers. Due to the vast number of WWW sites, many web pages have a redundancy of information or share a strong likeness in either function or title. The vastness of the unstructured WWW causes users to rely primarily on Internet search engines to retrieve information or to locate businesses. These search engines use various means to determine the relevance of a user-defined search to the information retrieved.
- The authors of web pages provide information known as metadata within the body of the document that defines the web pages. This document is typically written in, for example, hypertext markup language (HTML). A computer software product known as a web crawler systematically accesses web pages by sequentially following hypertext links (hyperlinks) from web page to web page. The crawler indexes the web pages for use by the search engines using information about a web page as provided by its address or Uniform Resource Locator (URL), metadata, and other criteria found within the web page. The crawler is run periodically to update previously stored data and to append information about newly created web pages. The information compiled by the crawler is stored in a metadata repository or database. The search engines search this repository to identify matches for the user-defined search rather than attempt to find matches in real time.
- A typical search engine has an interface with a search window where the user enters an alphanumeric search expression or keywords. The search engine sifts through available web sites for the search terms, and returns the search results in the form of web pages in, for example, HTML. Each search result comprises a list of individual entries that have been identified by the search engine as satisfying the search expression. Each entry or “hit” comprises a hyperlink that points to a Uniform Resource Locator (URL) location or web page.
- An exemplary search engine is the Google® search engine. An important aspect of the Google® search engine is the ability to rank web pages according to the authority of the web pages with respect to a search. The ranking technique used by the Google® search engine is the PageRank algorithm. Reference is made to L. Page, et. al., “The PageRank citation ranking: Bringing order to the web”, Technical report, Stanford Digital Library Technologies Project, 1998. Paper SIDL-WP-1999-0120. The PageRank algorithm calculates a stationary distribution of a Markov chain induced by hyperlink connectivity on the WWW. This same technique used by the PageRank algorithm applies to intranets or subsets of the WWW. Although the PageRank algorithm has proven to be useful, it would be desirable to present additional improvements.
- Search engines typically face a problem with having too many results that contain the query terms. For example, the query “db2” appears in over 180,000 different URLs on one company intranet. The problem of indexing on a large corpus of text such as an intranet or the World Wide Web becomes one of ranking the many results by their importance and relevance to the query, so that the user need not peruse all of the results to satisfy an informational need.
- Many different features can be used to determine the relevance or authority of a document to a given query. In the case of the World Wide Web, the most successful techniques (as exemplified by Google) are a combination of indexing the content, indexing of anchortext, and use of PageRank to provide a static ordering of authority. Many techniques have been suggested for producing good results to queries, including considering the indegree in the weblink graph, TF*IDF and lexical affinity scoring techniques, and heavier weighting for terms that appear in titles or larger fonts. Some of these ranking techniques (e.g., ranking by frequency of terms in anchortext) are query-dependent, and can only be computed in response to a query. Others (e.g., PageRank) are static, and do not depend on the query that has been submitted.
- There is a conflict between the desire to have a good searchable intranet and the inherent diversification of the way that information is presented using web technology. In many ways, this conflict mirrors the tensions that exist on the Internet. People want their Internet pages to be seen, and Internet implementers want their information to be discoverable. At the same time, myriad other factors such as social forces, technology limitations, and a lack of understanding of search by web developers can lead to decisions that conflict with good search results.
- Intranet search is different from Internet search for several reasons: the queries asked on the intranet are different, the notion of a “good answer” is different, and the social processes that create the intranet are different from those that create the Internet. Queries on an intranet tend to be jargon-heavy and use various acronyms and abbreviations that reflect the structure of the organization or company that uses that intranet. In addition, the correct answer to a query is often specific to a site, geographic location, or an organizational division, but the user often does not make this intent explicit in the query. Context-sensitive search is a common problem for many intranets and the Internet.
- A great deal of work has been done over the years to assess the effectiveness of different search techniques, but their effectiveness tends to be a function of the underlying corpus being searched and the characterization of the queries and users that are accessing the data. Each intranet is an island unto itself, reflecting the character of the organization that it represents. For this reason, what works well for the Internet may not work well for an intranet, and what works for one intranet may not work well for another. Part of this is derived from the nature of the organization. In a university intranet, desirable searching features may comprise free speech and diversity of opinion. In a corporation, desirable searching features may comprise hierarchical distribution of authority and focus upon the mission. Consequently, ranking functions need to reflect the particular value system of the organization whose data is being indexed. This suggests that customization is an important feature of an intranet search engine.
- Within an organization, employees tend to fulfill a role that is consistent with their job description. Thus an employee of the marketing department may have a different need than an employee of a research division, and a lawyer may have a different need than a programmer. This suggests that ranking methods for search engines should provide personalization of ranking functions.
- A simple approach to mixing different features is to apply numerical weights to the features and use a mixing function to combine these numerical weights into a single score for ranking documents. However, the scales and distribution of scores from different features can be incomparable, and it is difficult to arrive at an optimal mixing function. One approach to addressing this problem that has been suggested previously uses Bayesian probabilistic models for retrieval, treating the different scores given to documents as probabilities and merging them according to a probabilistic model. Reference is made to W. Croft, “Combining approaches to information retrieval”, Advances in Information Retrieval. Kluwer Academic Publishers, 2000; D. Hiemstra. “Using Language Models for Information Retrieval”, PhD thesis, University of Twente, Twente, The Netherlands, 2001; W. Kraaij, et. al., “The importance of prior probabilities for entry page search”, In Proc. 25th SIGIR, pages 27-34, 2002; T. Westerveld, et. al., “Retrieving web pages using content, links, URLs and anchors”, In Proc. 10th TREC, pages 663-672, 2001. Although this approach has proven to be useful, it would be desirable to present additional improvements.
- What is therefore needed is a system, a service, a computer program product, and an associated method for ranking scales and distributions of scores from different ranking systems based on different ranking features. The solution should be customizable to meet the needs and characteristics of a specific network, intranet, or client. The need for such a solution has heretofore remained unsatisfied.
- The present invention satisfies this need, and presents a system, a computer program product, and an associated method (collectively referred to herein as “the system” or “the present system”) for ranking search results using a rank aggregation method to allow flexible implementation of text search processors. The present system introduces a means by which many different features of documents can be merged to give an ordered list of results. Further, the ranking functions of the present system can easily be customized to the needs of a particular corpus or collection of users such as, for example, an intranet. Advantageously, rank aggregation is independent of the underlying score distributions between the different factors and can be applied to merge any set of ranking functions. While discussed in terms of an intranet or the Internet, the present system is applicable as well to, for example, indexing email repositories, newsgroups, instant messaging logs, and the like.
- Rank aggregation holds the advantage of combining the influence of many different heuristic factors in a robust way to produce high-quality results for queries. A rank aggregation processor takes several ranked lists, each of which ranks part of a collection of candidates (web pages), and produces an ordering of all the candidates in the union of the lists. Reference is made to C. Dwork, et. al. “Rank aggregation methods for the web”, Proc. 10th WWW, pages 613-622, 2001. Ordering produced by the rank aggregation processor is aimed at minimizing the total number of inversions, that is, the total number of “upsets” caused by the output ordering with respect to initial orderings. Rank aggregation is employed as a tool to combine initial rankings of pages or documents (produced by various ranking functions) into an aggregate ranking that is improved over any one of the constituent ranking functions.
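- The notion of “inversions” can be made concrete with a short sketch. It counts, for a candidate output ordering, the pairs of documents whose relative order disagrees with each input ordering. The function names and sample lists are illustrative, and the sketch assumes that every input list ranks the same documents, whereas the text allows lists that rank only part of the collection.

```python
from itertools import combinations


def inversions(order_a, order_b):
    """Count document pairs that appear in opposite relative order in the two lists."""
    position_in_b = {doc: i for i, doc in enumerate(order_b)}
    # combinations() yields pairs (x, y) with x ahead of y in order_a.
    return sum(1 for x, y in combinations(order_a, 2)
               if position_in_b[x] > position_in_b[y])


def total_inversions(candidate, input_orderings):
    """Total number of "upsets" the candidate causes with respect to all inputs."""
    return sum(inversions(candidate, ordering) for ordering in input_orderings)


# Hypothetical initial orderings from three ranking functions.
inputs = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]

good = total_inversions(["a", "b", "c"], inputs)
bad = total_inversions(["c", "b", "a"], inputs)
```

- An aggregation method that aims to minimize this total would prefer the first candidate ordering over the second.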
- There are numerous factors that affect the ranking of search results. The present system uses rank aggregation to combine many ranked lists into a single ranking of results. The present system combines factors such as indegree, page ranking, URL length, proximity to the root server of an intranet, etc., to form a single ordering on web pages that closely obeys the individual orderings, but also mediates between the collective wisdom of individual heuristics. Rank aggregation is resilient to spam; specifically, if one ranking is manipulated to influence the results, the aggregation of different rankings still reflects the collective wisdom of the majority of the rankings. Consequently, the present system produces robust rankings that are not as subject to manipulation as conventional techniques.
- The present system provides the opportunity to customize and personalize search results within a single software architecture. When deploying a search engine in many different intranets, the ranking function can be tailored to represent the set of users and the structure of the organization. For example, a client may require a particular server to supply all answers to human resource questions. The present system incorporates this requirement into the ranking function without the need to change the underlying text index software. Moreover, a “best” answer to a query may depend on the location of a user asking the question. For example, the query “retirement” can have different answers if the query is submitted in Zurich than if it is submitted in Raleigh. The present system can incorporate the geographical location of a user without changing the underlying indexing software that returns ranked lists of results.
- At the top level, a text search application takes a set of documents in and produces a graded set of documents. More generally, a text search application typically comprises the following features: a collection of documents, a set of users, a set of scoring modules, and a rank aggregation processor. In the case of the web, information about the collection of documents is often gathered into one place via a web crawler, but the construction of an index can also be done in a distributed fashion where the documents reside. The set of users issue queries on the documents and receive lists of results from the system that suggest which documents are relevant to the query.
- The set of scoring modules and the rank aggregation processor comprise a set of heuristics and scoring methods that take as input the set of documents and produce a graded set of documents, based on heuristics for relevance or authoritativeness. The heuristics comprise several categories: static orderings of the documents, dynamic orderings that depend on the particular query, and dynamic orderings that depend on the particular user. Dynamic orderings that depend on the particular query and dynamic orderings that depend on the particular user are typically used to decide which documents are relevant to a query. Static orderings of the documents are used to discriminate between which of the relevant documents to rank first.
- The static orderings of the documents are based on features of the documents themselves but independent of the query or the user. Examples of static orderings of documents comprise PageRank, document length, media type, position in a hierarchy, rankings by authors, rankings by an independent authority, etc.
- Dynamic orderings that depend on the particular query comprise, for example, the number of times that a query term appears in the document and other documents (TF*IDF), whether the query terms appear in a title, whether the query terms appear in anchor text for a document, whether the query terms appear near each other in the document (e.g., lexical affinities), the placement of query terms within a document, etc. For example, documents that contain terms early in their text or within a title may be more relevant than documents that contain the query term in a footnote or appendix. Dynamic orderings that depend on the particular query may be implemented within a traditional inverted keyword index or may be implemented outside the index to allow customization without the need to modify the index software.
- Dynamic orderings that depend on the particular user comprise, for example, geographical proximity scores for the user, role or job title of the user within an organization, educational level, history of previous queries by the user, etc.
- The present system may be embodied in a utility program such as a modular scoring utility program. The modular scoring utility program is customized for a client's intranet and particular needs. The present system provides means for the user to identify an intranet, the Internet, a database, or other set of data as input data from which query results may be scored by the present system. The present system also provides means for the user to specify a query in the form of text input. A user specifies the input data and the query and then invokes the modular scoring utility program to produce a scored set of documents.
- The various features of the present invention and the manner of attaining them will be described in greater detail with reference to the following description, claims, and drawings, wherein reference numerals are reused, where appropriate, to indicate a correspondence between the referenced items, and wherein:
- FIG. 1 is a schematic illustration of an exemplary operating environment in which a modular scoring system of the present invention can be used;
- FIG. 2 is a block diagram of the high-level architecture of the modular scoring system of FIG. 1;
- FIG. 3 is a process flow chart illustrating a method of configuring the modular scoring system of FIGS. 1 and 2;
- FIG. 4 is a process flow chart illustrating a method of operation of the modular scoring system of FIGS. 1 and 2;
- FIG. 5 is a block diagram of the high-level architecture of one embodiment of the modular scoring system of FIGS. 1 and 2; and
- FIG. 6 is a block diagram of the high-level architecture of one embodiment of the modular scoring system of FIGS. 1 and 2.
- The following definitions and explanations provide background information pertaining to the technical field of the present invention, and are intended to facilitate the understanding of the present invention without limiting its scope:
- Internet: A collection of interconnected public and private computer networks that are linked together with routers by a set of standard protocols to form a global, distributed network.
- Inlink: Links coming into a web page or document such as an HTML document from another web page or document.
- Intranet: A network that is internal to a company or organization that may not be connected to the Internet, but that uses standard Internet protocols and has some similar functions.
- Link: A pointer in a web page or in a document such as an HTML document that leads to another web page or to another place within the same document; also called a hyperlink.
- Rank: An index assigned to a document/web page having value 1 through the number of documents, with the highest rank corresponding to 1.
- Score: A numeric value (usually a fractional real number) assigned to a document/page by the scoring technique (e.g., pagerank) from which a ranking can be obtained.
- URL (Uniform Resource Locator): A unique address that fully specifies the location of a content object on the Internet. The general format of a URL is protocol://server-address/path/filename, where the server-address is referenced as the host rank.
- World Wide Web (WWW, also Web): An Internet client-server hypertext distributed information retrieval system.
-
FIG. 1 portrays an exemplary overall environment in which a system, a service, a computer program product, and an associated method (the “system 10”) for ranking search results using a modular scoring system according to the present invention may be used.System 10 comprises a software programming code or a computer program product that is typically embedded within, or installed on ahost server 15. Alternatively,system 10 can be saved on a suitable storage medium such as a diskette, a CD, a hard drive, or like devices. - Users, such as remote Internet users, are represented by a variety of computers such as
computers host server 15 through anetwork 35. In one embodiment,system 10 scores results obtained by a search engine operated onhost server 15. The results are accessed bysystem 10 from database (dB) 40. Users access the results of the scoring when performing key word searches on the Internet. In another embodiment, users provide an index or other hierarchical structure tosystem 10;system 10 then scores the index or other hierarchical structure for the user. Output fromsystem 10 is stored ondB 40 or on a storage repository of the user. -
Computers host server 15. Thehost server 15 is connected to network 35 via acommunications link 45 such as a telephone, cable, or satellite link.Computers communications links system 10 is described in terms ofnetwork 35,computers system 10 locally rather than remotely.Computers system 10 either manually, or automatically through the use of an application. -
System 10 uses the term “graded set” to refer to a set of documents in which each document in the set of documents in the graded set comprises a numerical score. The numerical score is a relevance score as applied to a search term. In one embodiment, the numerical scores may all be equal in which case the graded set can be thought of as just a set. Graded sets are known as “fuzzy sets”. -
FIG. 2 illustrates a high-level hierarchy ofsystem 10.System 10 comprises a set of scoringmodules 205, aduplication module 210, and arank aggregation processor 215. Each of the scoringmodules 205 takes as input one or more graded sets of documents, anauxiliary information module 225, and (optionally) aquery 220. Output from each of the scoring modules to therank aggregation processor 215 is a ranked set of documents. In one embodiment, therank aggregation processor 215 weights the outputs from each of the scoringmodules 205 equally. In another embodiment, therank aggregation processor 215 weights the outputs from each of the scoringmodules 205 differently to meet scoring requirements of a specific client, user, intranet, or network. - An
auxiliary information module 225 comprises data that may be used to customize the output ofsystem 10 such as, for example, a user ID, a history of queries made by a user, a history of documents read by a user, a history of click-through results by a user, the geographic location of a user, the security classification of a user, a language set for a user, a set of documents for comparison uses during scoring, etc. Additional data may be used by theauxiliary information module 225 as needed to customize the output ofsystem 10 to a client, intranet, or other network such as, for example, the Internet. In one embodiment,system 10 excludesauxiliary information module 225 from the process of scoring an aggregated set of graded documents. - The
duplication module 210 takes as input one graded set of documents and produces two or more identical copies of the graded set of documents for use as needed by the scoringmodules 205 and therank aggregation processor 215. Input to therank aggregation processor 215 comprises two or more ranked sets of documents produced by the scoringmodules 205 and optional weights on the ranked sets of documents. Therank aggregation processor 215 produces a scored set of documents using rank aggregation to merge the outputs of some or all of the scoringmodules 205 and produce a single scored set of documents. - Scoring
modules 205 comprise a set ofindices 230 such as, for example, acontent index 235, atitle index 240, and ananchortext index 245. Additional indices may be used as desired. Thecontent index 235, thetitle index 240, and theanchortext index 245 take asinput query 220 and find a set of documents indB 40 that match the text ofinput query 220. The indices (e.g., content index, title index, anchortext index) provide pointers into the set of documents indB 40 containing the query terms, and pass them to theunion module 250 and to therank aggregation processor 215. - The
content index 235 comprises an inverted keyword index on document content. Thetitle index 240 comprises an inverted keyword index on titles and metadata about the set of found documents. Theanchortext index 245 comprises an inverted keyword index on anchortext for the set of found documents. The anchortext for a document typically comprises the highlighted text that a user clicks on to navigate to the document. The anchortext for a document may further comprise text surrounding the highlighted text. Thetitle index 240 andanchortext index 245 comprise virtual documents. -
Indices 230 provide graded lists of found documents that are scored using any suitable scoring analysis such as, for example, TF*IDF (Term Frequency Times Inverse Document Frequency). TF*IDF scores a document based on the number of terms a query term appears in a document: the higher the term frequency, the more relevant the document. Further, TF*IDF determines the relevance of a query term by the number of documents comprising the query term. TF*IDF places more weight on a less common term than a more common term as determined by the number of documents found with each term. Consequently, documents with the highest number of least common terms in the search query receive the highest score. - The outputs of
indices 230 are combined in a union module 250 to form a single graded set of documents. The duplication module 210 duplicates the single graded set of documents as needed to provide inputs to the scoring modules 205. As needed, scoring modules 205 may also utilize query 220 and auxiliary information module 225 as input. Scoring modules 205 further comprise ranking or scoring processors such as, for example, a page ranking processor 255, an indegree processor 260, a discovery date processor 265, a URL words processor 270, a URL depth processor 275, a URL length processor 280, a geography processor 285, a discriminator processor 290, etc. This list of the scoring modules 205 is illustrative of the use of various scoring techniques by system 10. Any type of processor that produces a ranked set of documents from a graded set of documents may be used as one of the scoring modules 205.

The scoring
modules 205 may be selected or deselected by selection module 295 as needed for a query, a user, a client, an intranet, etc. The page ranking processor 255 ranks a graded set of documents utilizing, for example, the PageRank algorithm. The computation of page ranking depends primarily on the link structure between web pages and is therefore useful for HTML content. Any type of page ranking processor may be used by system 10 such as, for example, any variation on the PageRank algorithm.

The
indegree processor 260 ranks a graded set of documents based on the number of inlinks to each document. The indegree processor accords a rank to a document or web page that is proportional to the number of links into the document or web page.

The
discovery date processor 265 ranks a graded set of documents based on the time that a crawler discovers a web page or a document. If a crawl is started from a single seed, then the order in which pages are crawled tends to be similar to a breadth first search through the link graph [reference is made to M. Najork, et al., “Breadth-first search crawling yields high-quality pages”, In Proc. 10th WWW, pages 114-118, 2001]. The time at which a page is discovered by following a hyperlink therefore provides an approximation to the hyperlink graph distance of the page from the root seed of the network. The discovery date processor 265 accords a rank to a document or a web page that is inversely proportional to the distance of the document or web page from the root seed of the network. A document or a web page that is close to the root seed of the network receives a higher rank than a document or a web page that is further away from the root seed of the network.

The
URL words processor 270 compares the text of the query term with a URL of a document or web page in the graded set. Input to the URL words processor 270 comprises query 220 and a graded set of documents. The URL words processor 270 accords a higher rank to a document whose URL contains a query term as a substring.

The
URL depth processor 275 accords a rank to a document or a web page that is inversely proportional to the number of delimiters such as a slash character (“/”) that appear in the path component of the URL corresponding to that document or web page. The number of delimiters in the path component of a URL indicates the relative position of a document or a web page in a directory hierarchy; fewer delimiters indicate a higher position in the hierarchy. Between two pages relevant to a query on the same host, the URL depth processor 275 favors a document or web page near the top of a directory hierarchy. Documents or web pages near the top of a directory hierarchy tend to be more general and have links to pages or documents lower in the hierarchy. Consequently, documents or web pages near the top of a directory hierarchy (having fewer delimiters) tend to be more authoritative than those documents or web pages at the bottom of the hierarchy (having more delimiters).

The
URL length processor 280 accords a rank to a document or a web page that is inversely proportional to a length of the URL corresponding to that document or web page. When comparing documents comprising comparable content, the URL length processor 280 considers documents with shorter URL strings as more authoritative.

The
geography processor 285 accords rank to a web page or a document based on a geographical location associated with that document or web page compared to the geographical location of a client or a user. The geography processor 285 is especially useful for worldwide organizations that have employees in many different countries. For example, a user queries a company intranet in Japan regarding company benefits. The geography processor 285 accords a high rank to documents about company benefits that correspond to Japan as opposed to documents that correspond to Sweden.

The
discriminator 290 accords rank to a web page or a document in favor of certain classes of URLs over others. The favored URLs comprise, for example, those that end in a slash character (“/”) or “index.html”. The favored URLs further comprise those URLs that comprise a tilde character (“~”). These URLs are typically the main page of a site. The discriminator 290 further discriminates, for example, against certain classes of dynamic URLs containing a question mark character (“?”). The discriminator 290 is neutral on all other URLs and is easily customized to knowledge of a specific intranet or other network.

The
rank aggregation processor 215 utilizes a variety of methods to aggregate the outputs of the scoring modules 205 such as, for example, positional methods, graph methods, or Markov chain methods. When using positional methods, the rank aggregation processor 215 gives each document an output score that is computed as a function of the various ranks received by a document or a web page from the scoring modules 205. The output score assignment may be determined by, for example, the mean rank or the median rank. A document or a web page is then scored by the output rank received.

The graph method creates a graph of 2n vertices D1, . . . , Dn, P1, . . . , Pn, where n is the number of documents. Edges of the graph are of the form (Di, Pj). A “cost” associated with the edge (Di, Pj) reflects the badness of scoring document i as the jth best document, where “badness” corresponds to divergence from an ideal. The
rank aggregation processor 215 determines the costs from the input graded sets. As an example, if document i receives ranks R1, . . . , Rk from the k input graded sets, the rank aggregation processor 215 may define the cost of (Di, Pj) as the sum of the quantities |j − Rt|, for t = 1, . . . , k. Once these costs have been defined, a “minimum-cost perfect matching” is computed. A perfect matching assigns a unique position (score) j to each document i. The unique position j places document i in a scored aggregate list of documents.

The Markov chain method creates a graph of n vertices D1, . . . , Dn, where n is the number of documents. The edges are of the form (Di, Dj), and the weight of the edge (Di, Dj) reflects perceived improvement of Dj over Di by the input graded sets. For example, if a majority of the input ranked sets rank Dj above Di, then the weight is 1, else it is 0. The weights of the out-going edges of each vertex Di are normalized so that their sum is exactly 1. The
rank aggregation processor 215 then determines the stationary probability distribution of a random walk on this graph via an eigenvector computation. The rank aggregation processor 215 then sorts the web pages or documents as represented by the vertices in decreasing order of stationary probability, yielding the final score of the documents [reference is made to J. Allan, et al., “INQUERY and TREC-9”. In Proc. 9th TREC, pages 551-562, 2000].

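The three aggregation families described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the rank aggregation processor 215: every input ranking is assumed to list the same documents, the minimum-cost perfect matching is found by brute force (practical only for a handful of documents), and the Markov chain variant adds a self-loop for a vertex with no out-going edges plus a small teleport probability so that power iteration, used here in place of a direct eigenvector computation, converges to a unique distribution. All function and parameter names are hypothetical.

```python
from itertools import permutations
from statistics import median


def positional_median(rankings):
    """Positional method: order documents by the median of their ranks."""
    docs = rankings[0]
    score = {d: median([r.index(d) + 1 for r in rankings]) for d in docs}
    return sorted(docs, key=lambda d: score[d])


def min_cost_matching(rankings):
    """Graph method: cost(Di, Pj) = sum over t of |j - Rt|, minimized by brute force."""
    docs = rankings[0]

    def cost(doc, j):
        return sum(abs(j - (r.index(doc) + 1)) for r in rankings)

    best = min(
        permutations(docs),
        key=lambda order: sum(cost(d, j) for j, d in enumerate(order, start=1)),
    )
    return list(best)


def markov_chain(rankings, teleport=0.15, iterations=200):
    """Markov chain method: sort by stationary probability of a random walk."""
    docs = rankings[0]
    n, k = len(docs), len(rankings)
    out = {}
    for di in docs:
        # Edge (Di, Dj) has weight 1 when a majority of rankings place Dj above Di.
        better = [dj for dj in docs if dj != di
                  and 2 * sum(r.index(dj) < r.index(di) for r in rankings) > k]
        out[di] = better or [di]  # assumption: self-loop when nothing beats Di
    pi = {d: 1.0 / n for d in docs}
    for _ in range(iterations):  # power iteration toward the stationary distribution
        nxt = {d: teleport / n for d in docs}  # assumption: uniform teleport
        for di in docs:
            share = (1 - teleport) * pi[di] / len(out[di])  # normalized out-weights
            for dj in out[di]:
                nxt[dj] += share
        pi = nxt
    return sorted(docs, key=lambda d: -pi[d])


# Hypothetical input: three scoring modules ranking documents "a", "b", "c".
rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
```

On this small input all three methods agree on the order a, b, c; on larger or more contradictory inputs they can differ, which is one reason the description leaves the choice of method open.
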
System 10 can be configured to meet the specific needs of a client. FIG. 3 illustrates a method 300 of configuring system 10 for a specific client. Customer requirements are determined (step 305) and required auxiliary information is identified (step 310). Customer requirements and required auxiliary information are used to configure the auxiliary information module 225 (step 315) and configure scoring modules 205 for use by the client (step 320). For example, a client having a homogeneous workforce that operates in one geographic area does not require a geography processor 285. As another example, a user may wish to use indegree or PageRank, but not both.

FIG. 4 illustrates a method 400 of operation of system 10. A user enters a query 220 (FIG. 2). The query is transmitted to indices 230 and optionally to other scoring modules 205 (step 410). Each of the indices 230 produces a graded set of documents from the query (step 415).

The
union module 250 combines outputs of indices 230 into a single graded set of documents (step 420). The duplication module 210 duplicates the single graded set of documents as needed for selected scoring modules 205 (step 425). The scoring modules 205 are selected for a specific configuration as required by the client. The scoring modules 205 each score their copy of the graded set of documents (step 430). The rank aggregation processor 215 forms a single scored set of documents by merging and scoring the outputs of the scoring modules 205 according to predetermined criteria selected to meet client requirements.

The scoring
modules 205, the duplication module 210, and the rank aggregation processor 215 can be configured within system 10 in a variety of combinations as necessary to refine the scoring process and achieve performance desired by the client. The selection of a configuration may be achieved by mathematical considerations or by comparison on the basis of human trials.

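The flow of method 400 (union, duplication, scoring by selected modules, aggregation) can be sketched in Python as below. This is an illustrative sketch only: the function names, the sample intranet URLs, the three-class ordering used for the discriminator, and the use of median-rank aggregation are assumptions, and only three of the URL-based scoring modules described earlier are shown.

```python
from statistics import median
from urllib.parse import urlparse


def url_depth_rank(urls):
    """Fewer "/" delimiters in the URL path ranks higher."""
    return sorted(urls, key=lambda u: urlparse(u).path.count("/"))


def url_length_rank(urls):
    """Shorter URL strings rank higher."""
    return sorted(urls, key=len)


def discriminator_rank(urls):
    """Favor main-page style URLs, disfavor dynamic URLs, stay neutral otherwise."""
    def url_class(u):
        if u.endswith("/") or u.endswith("index.html") or "~" in u:
            return 0  # favored
        if "?" in u:
            return 2  # discriminated against
        return 1      # neutral
    return sorted(urls, key=url_class)


def union(graded_sets):
    """Step 420: merge index outputs into one set, keeping first-seen order."""
    return list(dict.fromkeys(u for graded in graded_sets for u in graded))


def aggregate_median(rankings):
    """Positional aggregation by median rank (one of several possible methods)."""
    docs = rankings[0]
    score = {d: median([r.index(d) + 1 for r in rankings]) for d in docs}
    return sorted(docs, key=lambda d: score[d])


def run_query(index_outputs, selected_modules):
    graded = union(index_outputs)                           # step 420
    copies = [list(graded) for _ in selected_modules]       # step 425
    rankings = [module(copy)                                # step 430
                for module, copy in zip(selected_modules, copies)]
    return aggregate_median(rankings)                       # final aggregation


# Hypothetical outputs of a content index and a title index for one query.
A = "http://w3.example.com/hr/benefits/2004/plan.html?id=7"
B = "http://w3.example.com/hr/"
C = "http://w3.example.com/hr/benefits/index.html"
result = run_query([[A, B], [B, C]],
                   [url_depth_rank, url_length_rank, discriminator_rank])
```

Passing a different list of modules to `run_query` plays the role of selecting or deselecting scoring modules for a given client configuration.
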
FIG. 5 illustrates a high-level hierarchy of one embodiment of system 10 in which two levels of aggregation are used. These two levels of aggregation can result in different scores by giving different weights to different features in the scoring process. A rank aggregation processor 505 processes outputs of indices 230 to create an initial scored set for analysis by selected scoring modules 205 and the rank aggregation processor 215.

FIG. 6 illustrates a high-level hierarchy of another embodiment of system 10 in which two levels of aggregation are used in a different configuration. A rank aggregation processor processes outputs of similar scoring modules 205 prior to an overall rank aggregation. For example, outputs from the page ranking processor 255 and the indegree processor 260 are aggregated together and scored by a rank aggregation processor 605. Outputs from the URL length processor 280, the URL words processor 270, and the URL depth processor 275 are aggregated together and scored by a rank aggregation processor 610. Outputs from the discovery date processor 265, the geography processor 285, the discriminator 290, etc., are aggregated together and scored by a rank aggregation processor 615.

The outputs of the
rank aggregation processor 605, the rank aggregation processor 610, and the rank aggregation processor 615 are intermediate scored sets of documents that are then processed by a rank aggregation processor 620 along with output from indices 230. In this embodiment, the rank aggregation processor 605, the rank aggregation processor 610, and the rank aggregation processor 615 each process ranked sets of documents that have some elements that do not appear in the other sets. In other terms, the more commensurate scores are aggregated first, and then these ranks are used as input to the final aggregation.

It is to be understood that the specific embodiments of the invention that have been described are merely illustrative of certain applications of the principle of the present invention. Numerous modifications may be made to a system and method for ranking search results using a modular scoring system described herein without departing from the spirit and scope of the present invention.

Moreover, while the present invention is described for illustration purposes only in relation to intranets, it should be clear that the invention is applicable as well to, for example, the WWW and to subsets of the WWW in addition to data derived from any source stored in any format that is accessible by the present invention. Furthermore, although the present invention is described in terms of the PageRank algorithm, it should be clear that the present invention is applicable as well to, for example, other search applications and ranking techniques without departing from the scope of the present invention. While the present invention has been described in terms of web search engines, it should be clear that the present invention is applicable as well to, for example, indexing email repositories, newsgroups, instant messaging logs, and the like.
Claims (26)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/841,391 US7257577B2 (en) | 2004-05-07 | 2004-05-07 | System, method and service for ranking search results using a modular scoring system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/841,391 US7257577B2 (en) | 2004-05-07 | 2004-05-07 | System, method and service for ranking search results using a modular scoring system |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050262050A1 (en) | 2005-11-24 |
US7257577B2 (en) | 2007-08-14 |
Family
ID=35376419
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/841,391 Expired - Lifetime US7257577B2 (en) | 2004-05-07 | 2004-05-07 | System, method and service for ranking search results using a modular scoring system |
Country Status (1)
Country | Link |
---|---|
US (1) | US7257577B2 (en) |
Cited By (106)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060085427A1 (en) * | 2004-10-15 | 2006-04-20 | Microsoft Corporation | Method and apparatus for intranet searching |
US20060242137A1 (en) * | 2005-04-21 | 2006-10-26 | Microsoft Corporation | Full text search of schematized data |
US20060248066A1 (en) * | 2005-04-28 | 2006-11-02 | Microsoft Corporation | System and method for optimizing search results through equivalent results collapsing |
US20070118521A1 (en) * | 2005-11-18 | 2007-05-24 | Adam Jatowt | Page reranking system and page reranking program to improve search result |
US20070130205A1 (en) * | 2005-12-05 | 2007-06-07 | Microsoft Corporation | Metadata driven user interface |
US20070143437A1 (en) * | 2005-12-16 | 2007-06-21 | Microsoft Corporation | Global and local entity naming |
WO2007091896A1 (en) * | 2006-02-08 | 2007-08-16 | Telenor Asa | Document similarity scoring and ranking method, device and computer program product |
US20070198504A1 (en) * | 2006-02-23 | 2007-08-23 | Microsoft Corporation | Calculating level-based importance of a web page |
US20070208730A1 (en) * | 2006-03-02 | 2007-09-06 | Microsoft Corporation | Mining web search user behavior to enhance web search relevance |
US20070266025A1 (en) * | 2006-05-12 | 2007-11-15 | Microsoft Corporation | Implicit tokenized result ranking |
US20070288498A1 (en) * | 2006-06-07 | 2007-12-13 | Microsoft Corporation | Interface for managing search term importance relationships |
US20080071766A1 (en) * | 2006-03-01 | 2008-03-20 | Semdirector, Inc. | Centralized web-based software solutions for search engine optimization |
US20080071767A1 (en) * | 2006-08-25 | 2008-03-20 | Semdirector, Inc. | System and method for measuring the effectiveness of an on-line advertisement campaign |
US20080071929A1 (en) * | 2006-09-18 | 2008-03-20 | Yann Emmanuel Motte | Methods and apparatus for selection of information and web page generation |
US20080077583A1 (en) * | 2006-09-22 | 2008-03-27 | Pluggd Inc. | Visual interface for identifying positions of interest within a sequentially ordered information encoding |
US20080104113A1 (en) * | 2006-10-26 | 2008-05-01 | Microsoft Corporation | Uniform resource locator scoring for targeted web crawling |
US20080126331A1 (en) * | 2006-08-25 | 2008-05-29 | Xerox Corporation | System and method for ranking reference documents |
US20080147640A1 (en) * | 2006-12-19 | 2008-06-19 | Schachter Joshua E | Techniques for including collection items in search results |
US20080147578A1 (en) * | 2006-12-14 | 2008-06-19 | Dean Leffingwell | System for prioritizing search results retrieved in response to a computerized search query |
US20080147641A1 (en) * | 2006-12-14 | 2008-06-19 | Dean Leffingwell | Method for prioritizing search results retrieved in response to a computerized search query |
WO2008074152A1 (en) * | 2006-12-20 | 2008-06-26 | Ma, Gary, Manchoir | Method of displaying a subjective score with search engine results |
US20080228719A1 (en) * | 2007-03-13 | 2008-09-18 | Fatdoor, Inc. | People and business search result optimization |
US20080243812A1 (en) * | 2007-03-30 | 2008-10-02 | Microsoft Corporation | Ranking method using hyperlinks in blogs |
WO2008121144A1 (en) * | 2007-04-02 | 2008-10-09 | University Of Washington | Open information extraction from the web |
US20080275865A1 (en) * | 2007-05-04 | 2008-11-06 | Sony Ericsson Mobile Communications Ab | Searching and ranking contacts in contact database |
US20080313167A1 (en) * | 2007-06-15 | 2008-12-18 | Jim Anderson | System And Method For Intelligently Indexing Internet Resources |
US20090063462A1 (en) * | 2007-09-04 | 2009-03-05 | Google Inc. | Word decompounder |
US20090083248A1 (en) * | 2007-09-21 | 2009-03-26 | Microsoft Corporation | Multi-Ranker For Search |
US20090083257A1 (en) * | 2007-09-21 | 2009-03-26 | Pluggd, Inc | Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system |
US20090083262A1 (en) * | 2007-09-21 | 2009-03-26 | Kevin Chen-Chuan Chang | System for entity search and a method for entity scoring in a linked document database |
US20090083256A1 (en) * | 2007-09-21 | 2009-03-26 | Pluggd, Inc | Method and subsystem for searching media content within a content-search-service system |
US20090157643A1 (en) * | 2007-12-12 | 2009-06-18 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US7584221B2 (en) | 2004-03-18 | 2009-09-01 | Microsoft Corporation | Field weighting in text searching |
US20090234824A1 (en) * | 2008-03-14 | 2009-09-17 | International Business Machines Corporation | Browser Use of Directory Listing for Predictive Type-Ahead |
US7599917B2 (en) * | 2005-08-15 | 2009-10-06 | Microsoft Corporation | Ranking search results using biased click distance |
US20090254543A1 (en) * | 2008-04-03 | 2009-10-08 | Ofer Ber | System and method for matching search requests and relevant data |
US7603616B2 (en) | 2000-01-28 | 2009-10-13 | Microsoft Corporation | Proxy server using a statistical model |
US20100022752A1 (en) * | 2002-10-29 | 2010-01-28 | Young Malcolm P | Identifying components of a network having high importance for network integrity |
US7676464B2 (en) | 2006-03-17 | 2010-03-09 | International Business Machines Corporation | Page-ranking via user expertise and content relevance |
US20100070495A1 (en) * | 2008-09-12 | 2010-03-18 | International Business Machines Corporation | Fast-approximate tfidf |
EP2169568A1 (en) | 2008-09-17 | 2010-03-31 | OGS Search Limited | Method and apparatus for generating a ranked index of web pages |
US20100114862A1 (en) * | 2002-10-29 | 2010-05-06 | Ogs Limited | Method and apparatus for generating a ranked index of web pages |
US7716198B2 (en) | 2004-12-21 | 2010-05-11 | Microsoft Corporation | Ranking search results using feature extraction |
US7739277B2 (en) | 2004-09-30 | 2010-06-15 | Microsoft Corporation | System and method for incorporating anchor text into ranking search results |
US20100162093A1 (en) * | 2008-12-18 | 2010-06-24 | Google Inc. | Identifying comments to show in connection with a document |
US7761448B2 (en) * | 2004-09-30 | 2010-07-20 | Microsoft Corporation | System and method for ranking search results using click distance |
US20100205183A1 (en) * | 2009-02-12 | 2010-08-12 | Yahoo!, Inc., a Delaware corporation | Method and system for performing selective decoding of search result messages |
US7792833B2 (en) | 2005-03-03 | 2010-09-07 | Microsoft Corporation | Ranking search results using language types |
US7827181B2 (en) * | 2004-09-30 | 2010-11-02 | Microsoft Corporation | Click distance determination |
US7840522B2 (en) | 2007-03-07 | 2010-11-23 | Microsoft Corporation | Supervised rank aggregation based on rankings |
US7840569B2 (en) | 2007-10-18 | 2010-11-23 | Microsoft Corporation | Enterprise relevancy ranking using a neural network |
US7877343B2 (en) | 2007-04-02 | 2011-01-25 | University Of Washington Through Its Center For Commercialization | Open information extraction from the Web |
US20110191687A1 (en) * | 2010-01-29 | 2011-08-04 | Kabushiki Kaisha Toshiba | Mobile terminal |
US20110258184A1 (en) * | 2007-06-27 | 2011-10-20 | Oracle International Corporation | Changing ranking algorithms based on customer settings |
WO2011129993A1 (en) * | 2010-04-14 | 2011-10-20 | Raytheon Company | Relevance-based open source intelligence (osint) collection |
US20110282888A1 (en) * | 2010-03-01 | 2011-11-17 | Evri, Inc. | Content recommendation based on collections of entities |
US20110282816A1 (en) * | 2007-05-04 | 2011-11-17 | Microsoft Corporation | Link spam detection using smooth classification function |
US8117200B1 (en) * | 2005-01-14 | 2012-02-14 | Wal-Mart Stores, Inc. | Parallelizing graph computations |
US8150860B1 (en) * | 2009-08-12 | 2012-04-03 | Google Inc. | Ranking authors and their content in the same framework |
US20120173544A1 (en) * | 2004-12-30 | 2012-07-05 | Google Inc. | Authoritative document identification |
US20120197879A1 (en) * | 2009-07-20 | 2012-08-02 | Lexisnexis | Fuzzy proximity boosting and influence kernels |
US8316007B2 (en) | 2007-06-28 | 2012-11-20 | Oracle International Corporation | Automatically finding acronyms and synonyms in a corpus |
US8332259B1 (en) * | 2008-11-03 | 2012-12-11 | Intuit Inc. | Method and system for evaluating expansion of a business |
US8352475B2 (en) | 2006-03-01 | 2013-01-08 | Oracle International Corporation | Suggested content with attribute parameterization |
US8396878B2 (en) | 2006-09-22 | 2013-03-12 | Limelight Networks, Inc. | Methods and systems for generating automated tags for video files |
US8396742B1 (en) | 2008-12-05 | 2013-03-12 | Covario, Inc. | System and method for optimizing paid search advertising campaigns based on natural search traffic |
US8433712B2 (en) | 2006-03-01 | 2013-04-30 | Oracle International Corporation | Link analysis for enterprise environment |
US20130110815A1 (en) * | 2011-10-28 | 2013-05-02 | Microsoft Corporation | Generating and presenting deep links |
US8504547B1 (en) * | 2008-04-23 | 2013-08-06 | Google Inc. | Customizing image search for user attributes |
US8595255B2 (en) | 2006-03-01 | 2013-11-26 | Oracle International Corporation | Propagating user identities in a secure federated search system |
US8601028B2 (en) | 2006-03-01 | 2013-12-03 | Oracle International Corporation | Crawling secure data sources |
US8707451B2 (en) | 2006-03-01 | 2014-04-22 | Oracle International Corporation | Search hit URL modification for secure application integration |
US8725770B2 (en) | 2006-03-01 | 2014-05-13 | Oracle International Corporation | Secure search performance improvement |
US8738635B2 (en) | 2010-06-01 | 2014-05-27 | Microsoft Corporation | Detection of junk in search result ranking |
US8793706B2 (en) | 2010-12-16 | 2014-07-29 | Microsoft Corporation | Metadata-based eventing supporting operations on data |
US20140214814A1 (en) * | 2013-01-29 | 2014-07-31 | Sriram Sankar | Ranking search results using diversity groups |
US20140222792A1 (en) * | 2008-06-18 | 2014-08-07 | Dirk H. Groeneveld | Name search using a ranking function |
US8812493B2 (en) | 2008-04-11 | 2014-08-19 | Microsoft Corporation | Search results ranking using editing distance and document information |
US8843486B2 (en) | 2004-09-27 | 2014-09-23 | Microsoft Corporation | System and method for scoping searches using index keys |
US8868540B2 (en) | 2006-03-01 | 2014-10-21 | Oracle International Corporation | Method for suggesting web links and alternate terms for matching search queries |
US8875249B2 (en) | 2006-03-01 | 2014-10-28 | Oracle International Corporation | Minimum lifespan credentials for crawling data repositories |
US8943039B1 (en) | 2006-08-25 | 2015-01-27 | Riosoft Holdings, Inc. | Centralized web-based software solution for search engine optimization |
US8949216B2 (en) | 2012-12-07 | 2015-02-03 | International Business Machines Corporation | Determining characteristic parameters for web pages |
US8972379B1 (en) | 2006-08-25 | 2015-03-03 | Riosoft Holdings, Inc. | Centralized web-based software solution for search engine optimization |
WO2014184785A3 (en) * | 2013-05-16 | 2015-04-16 | Yandex Europe Ag | Method and system for presenting image information to a user of a client device |
US9015172B2 (en) | 2006-09-22 | 2015-04-21 | Limelight Networks, Inc. | Method and subsystem for searching media content within a content-search service system |
US20150142969A1 (en) * | 2004-07-09 | 2015-05-21 | Mercury Kingdom Assets Limited | Web page performance scoring |
US9165040B1 (en) * | 2006-10-12 | 2015-10-20 | Google Inc. | Producing a ranking for pages using distances in a web-link graph |
US20150379140A1 (en) * | 2014-06-30 | 2015-12-31 | Google Inc. | Surfacing in-depth articles in search results |
US20160094676A1 (en) * | 2012-07-25 | 2016-03-31 | Oracle International Corporation | Heuristic caching to personalize applications |
US9348912B2 (en) | 2007-10-18 | 2016-05-24 | Microsoft Technology Licensing, Llc | Document length as a static relevance feature for ranking search results |
US9454582B1 (en) * | 2011-10-31 | 2016-09-27 | Google Inc. | Ranking search results |
US9471670B2 (en) | 2007-10-17 | 2016-10-18 | Vcvc Iii Llc | NLP-based content recommender |
US20160321250A1 (en) * | 2015-05-01 | 2016-11-03 | Microsoft Technology Licensing, Llc | Dynamic content suggestion in sparse traffic environment |
US9495462B2 (en) | 2012-01-27 | 2016-11-15 | Microsoft Technology Licensing, Llc | Re-ranking search results |
US9613004B2 (en) | 2007-10-17 | 2017-04-04 | Vcvc Iii Llc | NLP-based entity recognition and disambiguation |
US20170169055A1 (en) * | 2010-12-30 | 2017-06-15 | Google Inc. | Semantic geotokens |
US9934313B2 (en) | 2007-03-14 | 2018-04-03 | Fiver Llc | Query templates and labeled search tip system, methods and techniques |
US10049150B2 (en) | 2010-11-01 | 2018-08-14 | Fiver Llc | Category-based content recommendation |
US10331783B2 (en) | 2010-03-30 | 2019-06-25 | Fiver Llc | NLP-based systems and methods for providing quotations |
US10606851B1 (en) * | 2018-09-10 | 2020-03-31 | Palantir Technologies Inc. | Intelligent compute request scoring and routing |
CN111522533A (en) * | 2020-04-24 | 2020-08-11 | 中国标准化研究院 | Product modular design method and device recommended based on user's personalized needs |
US11068943B2 (en) | 2018-10-23 | 2021-07-20 | International Business Machines Corporation | Generating collaborative orderings of information pertaining to products to present to target users |
US11263209B2 (en) * | 2019-04-25 | 2022-03-01 | Chevron U.S.A. Inc. | Context-sensitive feature score generation |
US20220076810A1 (en) * | 2017-05-25 | 2022-03-10 | Enlitic, Inc. | Medical scan interface feature evaluating system and methods for use therewith |
WO2023023099A1 (en) * | 2021-08-16 | 2023-02-23 | Elasticsearch B.V. | Search query refinement using generated keyword triggers |
Families Citing this family (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7725463B2 (en) * | 2004-06-30 | 2010-05-25 | Microsoft Corporation | System and method for generating normalized relevance measure for analysis of search results |
US7779001B2 (en) * | 2004-10-29 | 2010-08-17 | Microsoft Corporation | Web page ranking with hierarchical considerations |
US8131647B2 (en) | 2005-01-19 | 2012-03-06 | Amazon Technologies, Inc. | Method and system for providing annotations of a digital work |
US9275052B2 (en) | 2005-01-19 | 2016-03-01 | Amazon Technologies, Inc. | Providing annotations of a digital work |
US7574530B2 (en) | 2005-03-10 | 2009-08-11 | Microsoft Corporation | Method and system for web resource location classification and detection |
US8606781B2 (en) * | 2005-04-29 | 2013-12-10 | Palo Alto Research Center Incorporated | Systems and methods for personalized search |
US7627564B2 (en) * | 2005-06-21 | 2009-12-01 | Microsoft Corporation | High scale adaptive search systems and methods |
US20070005588A1 (en) * | 2005-07-01 | 2007-01-04 | Microsoft Corporation | Determining relevance using queries as surrogate content |
US7630964B2 (en) * | 2005-11-14 | 2009-12-08 | Microsoft Corporation | Determining relevance of documents to a query based on identifier distance |
US8005816B2 (en) * | 2006-03-01 | 2011-08-23 | Oracle International Corporation | Auto generation of suggested links in a search system |
US8027982B2 (en) | 2006-03-01 | 2011-09-27 | Oracle International Corporation | Self-service sources for secure search |
US7606875B2 (en) * | 2006-03-28 | 2009-10-20 | Microsoft Corporation | Detecting serving area of a web resource |
US8352449B1 (en) * | 2006-03-29 | 2013-01-08 | Amazon Technologies, Inc. | Reader device content indexing |
US8666821B2 (en) | 2006-08-28 | 2014-03-04 | Microsoft Corporation | Selecting advertisements based on serving area and map area |
US7650431B2 (en) * | 2006-08-28 | 2010-01-19 | Microsoft Corporation | Serving locally relevant advertisements |
US9672533B1 (en) | 2006-09-29 | 2017-06-06 | Amazon Technologies, Inc. | Acquisition of an item based on a catalog presentation of items |
US8725565B1 (en) | 2006-09-29 | 2014-05-13 | Amazon Technologies, Inc. | Expedited acquisition of a digital item following a sample presentation of the item |
US7865817B2 (en) * | 2006-12-29 | 2011-01-04 | Amazon Technologies, Inc. | Invariant referencing in digital works |
US8024400B2 (en) | 2007-09-26 | 2011-09-20 | Oomble, Inc. | Method and system for transferring content from the web to mobile devices |
US7751807B2 (en) | 2007-02-12 | 2010-07-06 | Oomble, Inc. | Method and system for a hosted mobile management service architecture |
US9665529B1 (en) | 2007-03-29 | 2017-05-30 | Amazon Technologies, Inc. | Relative progress and event indicators |
US7716224B2 (en) * | 2007-03-29 | 2010-05-11 | Amazon Technologies, Inc. | Search and indexing on a user device |
US8234282B2 (en) | 2007-05-21 | 2012-07-31 | Amazon Technologies, Inc. | Managing status of search index generation |
US8179915B2 (en) * | 2007-06-28 | 2012-05-15 | Lantiq Deutschland Gmbh | System and method for transmitting and retransmitting data |
US7979321B2 (en) | 2007-07-25 | 2011-07-12 | Ebay Inc. | Merchandising items of topical interest |
WO2009052534A1 (en) * | 2007-10-15 | 2009-04-23 | Chacha Search, Inc | Method and system of promoting human-assisted search |
US8332411B2 (en) * | 2007-10-19 | 2012-12-11 | Microsoft Corporation | Boosting a ranker for improved ranking accuracy |
US7779019B2 (en) * | 2007-10-19 | 2010-08-17 | Microsoft Corporation | Linear combination of rankers |
US8370372B2 (en) * | 2007-11-05 | 2013-02-05 | Jones Scott A | Method and system of promoting human-assisted search |
US8271357B2 (en) * | 2007-12-11 | 2012-09-18 | Ebay Inc. | Presenting items based on activity rates |
US8117060B2 (en) * | 2007-12-20 | 2012-02-14 | Ebay Inc. | Geographic demand distribution and forecast |
US20090164929A1 (en) * | 2007-12-20 | 2009-06-25 | Microsoft Corporation | Customizing Search Results |
US8010535B2 (en) * | 2008-03-07 | 2011-08-30 | Microsoft Corporation | Optimization of discontinuous rank metrics |
US8417694B2 (en) * | 2008-03-31 | 2013-04-09 | International Business Machines Corporation | System and method for constructing targeted ranking from multiple information sources |
US20090259620A1 (en) * | 2008-04-11 | 2009-10-15 | Ahene Nii A | Method and system for real-time data searches |
US8423889B1 (en) | 2008-06-05 | 2013-04-16 | Amazon Technologies, Inc. | Device specific presentation control for electronic book reader devices |
US20100131489A1 (en) * | 2008-11-24 | 2010-05-27 | Samsung Electronics Co., Ltd. | Personalized mobile search |
US8392429B1 (en) | 2008-11-26 | 2013-03-05 | Google Inc. | Informational book query |
US20100174719A1 (en) * | 2009-01-06 | 2010-07-08 | Jorge Alegre Vilches | System, method, and program product for personalization of an open network search engine |
US9087032B1 (en) | 2009-01-26 | 2015-07-21 | Amazon Technologies, Inc. | Aggregation of highlights |
US8378979B2 (en) | 2009-01-27 | 2013-02-19 | Amazon Technologies, Inc. | Electronic device with haptic feedback |
US8832584B1 (en) | 2009-03-31 | 2014-09-09 | Amazon Technologies, Inc. | Questions on highlighted passages |
US8692763B1 (en) | 2009-09-28 | 2014-04-08 | John T. Kim | Last screen rendering for electronic book reader |
US9495322B1 (en) | 2010-09-21 | 2016-11-15 | Amazon Technologies, Inc. | Cover display |
US9158741B1 (en) | 2011-10-28 | 2015-10-13 | Amazon Technologies, Inc. | Indicators for navigating digital works |
US11397745B1 (en) | 2019-03-26 | 2022-07-26 | Grant Carter Hemingway | System and method for determining rankings, searching, and generating reports of profiles and personal information |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5983227A (en) * | 1997-06-12 | 1999-11-09 | Yahoo, Inc. | Dynamic page generator |
US6112203A (en) * | 1998-04-09 | 2000-08-29 | Altavista Company | Method for ranking documents in a hyperlinked environment using connectivity and selective content analysis |
US6169986B1 (en) * | 1998-06-15 | 2001-01-02 | Amazon.Com, Inc. | System and method for refining search queries |
US6285999B1 (en) * | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US20020078045A1 (en) * | 2000-12-14 | 2002-06-20 | Rabindranath Dutta | System, method, and program for ranking search results using user category weighting |
US20020198869A1 (en) * | 2001-06-20 | 2002-12-26 | Barnett Russell Clark | Metasearch technique that ranks documents obtained from multiple collections |
US6553364B1 (en) * | 1997-11-03 | 2003-04-22 | Yahoo! Inc. | Information retrieval from hierarchical compound documents |
US20030088545A1 (en) * | 2001-06-18 | 2003-05-08 | Pavitra Subramaniam | System and method to implement a persistent and dismissible search center frame |
US20030177116A1 (en) * | 2002-02-28 | 2003-09-18 | Yasushi Ogawa | System and method for document retrieval |
US6701312B2 (en) * | 2001-09-12 | 2004-03-02 | Science Applications International Corporation | Data ranking with a Lorentzian fuzzy score |
US6738764B2 (en) * | 2001-05-08 | 2004-05-18 | Verity, Inc. | Apparatus and method for adaptively ranking search results |
US20040103075A1 (en) * | 2002-11-22 | 2004-05-27 | International Business Machines Corporation | International information search and delivery system providing search results personalized to a particular natural language |
US6772150B1 (en) * | 1999-12-10 | 2004-08-03 | Amazon.Com, Inc. | Search query refinement using related search phrases |
US6778997B2 (en) * | 2001-01-05 | 2004-08-17 | International Business Machines Corporation | XML: finding authoritative pages for mining communities based on page structure criteria |
US6785671B1 (en) * | 1999-12-08 | 2004-08-31 | Amazon.Com, Inc. | System and method for locating web-based product offerings |
US20050038894A1 (en) * | 2003-08-15 | 2005-02-17 | Hsu Frederick Weider | Internet domain keyword optimization |
US20050071776A1 (en) * | 2002-01-31 | 2005-03-31 | Mansfield Steven M | Multifunction hyperlink and methods of producing multifunction hyperlinks |
US20050071465A1 (en) * | 2003-09-30 | 2005-03-31 | Microsoft Corporation | Implicit links search enhancement system and method for search engines using implicit links generated by mining user access patterns |
US6901399B1 (en) * | 1997-07-22 | 2005-05-31 | Microsoft Corporation | System for processing textual inputs using natural language processing techniques |
US6911475B1 (en) * | 1999-09-02 | 2005-06-28 | Assistance Publique-Hopitaux De Paris | Use of nicotine or its derivatives in a drug for treating neurological disease, in particular Parkinson's disease |
US20050149538A1 (en) * | 2003-11-20 | 2005-07-07 | Sadanand Singh | Systems and methods for creating and publishing relational data bases |
US7039631B1 (en) * | 2002-05-24 | 2006-05-02 | Microsoft Corporation | System and method for providing search results with configurable scoring formula |
- 2004-05-07: US application US10/841,391, patent US7257577B2, status: not active (Expired - Lifetime)
Cited By (186)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7603616B2 (en) | 2000-01-28 | 2009-10-13 | Microsoft Corporation | Proxy server using a statistical model |
US20100114862A1 (en) * | 2002-10-29 | 2010-05-06 | Ogs Limited | Method and apparatus for generating a ranked index of web pages |
US7990878B2 (en) | 2002-10-29 | 2011-08-02 | E-Therapeutics Plc | Identifying components of a network having high importance for network integrity |
US9002658B2 (en) | 2002-10-29 | 2015-04-07 | E-Therapeutics Plc | Identifying components of a network having high importance for network integrity |
US20100022752A1 (en) * | 2002-10-29 | 2010-01-28 | Young Malcolm P | Identifying components of a network having high importance for network integrity |
US8125922B2 (en) | 2002-10-29 | 2012-02-28 | Searchbolt Limited | Method and apparatus for generating a ranked index of web pages |
US20100048870A1 (en) * | 2002-10-29 | 2010-02-25 | Young Malcolm P | Identifying components of a network having high importance for network integrity |
US8301391B2 (en) | 2002-10-29 | 2012-10-30 | E-Therapeutics Plc | Identifying components of a network having high importance for network integrity |
US7584221B2 (en) | 2004-03-18 | 2009-09-01 | Microsoft Corporation | Field weighting in text searching |
US20150142969A1 (en) * | 2004-07-09 | 2015-05-21 | Mercury Kingdom Assets Limited | Web page performance scoring |
US9374284B2 (en) * | 2004-07-09 | 2016-06-21 | Mercury Kingdom Assets Limited | Web page performance scoring |
US8843486B2 (en) | 2004-09-27 | 2014-09-23 | Microsoft Corporation | System and method for scoping searches using index keys |
US20100268707A1 (en) * | 2004-09-30 | 2010-10-21 | Microsoft Corporation | System and method for ranking search results using click distance |
US7739277B2 (en) | 2004-09-30 | 2010-06-15 | Microsoft Corporation | System and method for incorporating anchor text into ranking search results |
US7761448B2 (en) * | 2004-09-30 | 2010-07-20 | Microsoft Corporation | System and method for ranking search results using click distance |
US7827181B2 (en) * | 2004-09-30 | 2010-11-02 | Microsoft Corporation | Click distance determination |
US8082246B2 (en) * | 2004-09-30 | 2011-12-20 | Microsoft Corporation | System and method for ranking search results using click distance |
US7779012B2 (en) * | 2004-10-15 | 2010-08-17 | Microsoft Corporation | Method and apparatus for intranet searching |
US9507828B2 (en) * | 2004-10-15 | 2016-11-29 | Microsoft Technology Licensing, Llc | Method and apparatus for intranet searching |
US8595223B2 (en) * | 2004-10-15 | 2013-11-26 | Microsoft Corporation | Method and apparatus for intranet searching |
US20140081947A1 (en) * | 2004-10-15 | 2014-03-20 | Microsoft Corporation | Method and apparatus for intranet searching |
US20060085397A1 (en) * | 2004-10-15 | 2006-04-20 | Microsoft Corporation | Method and apparatus for intranet searching |
US20060085427A1 (en) * | 2004-10-15 | 2006-04-20 | Microsoft Corporation | Method and apparatus for intranet searching |
US20060085447A1 (en) * | 2004-10-15 | 2006-04-20 | Microsoft Corporation | Method and apparatus for intranet searching |
US7716198B2 (en) | 2004-12-21 | 2010-05-11 | Microsoft Corporation | Ranking search results using feature extraction |
US8650197B2 (en) * | 2004-12-30 | 2014-02-11 | Google Inc. | Authoritative document identification |
US20120173544A1 (en) * | 2004-12-30 | 2012-07-05 | Google Inc. | Authoritative document identification |
US8117200B1 (en) * | 2005-01-14 | 2012-02-14 | Wal-Mart Stores, Inc. | Parallelizing graph computations |
US7792833B2 (en) | 2005-03-03 | 2010-09-07 | Microsoft Corporation | Ranking search results using language types |
US20060242137A1 (en) * | 2005-04-21 | 2006-10-26 | Microsoft Corporation | Full text search of schematized data |
US20060248066A1 (en) * | 2005-04-28 | 2006-11-02 | Microsoft Corporation | System and method for optimizing search results through equivalent results collapsing |
US7599917B2 (en) * | 2005-08-15 | 2009-10-06 | Microsoft Corporation | Ranking search results using biased click distance |
US20070118521A1 (en) * | 2005-11-18 | 2007-05-24 | Adam Jatowt | Page reranking system and page reranking program to improve search result |
US20070130205A1 (en) * | 2005-12-05 | 2007-06-07 | Microsoft Corporation | Metadata driven user interface |
US8095565B2 (en) | 2005-12-05 | 2012-01-10 | Microsoft Corporation | Metadata driven user interface |
US20070143437A1 (en) * | 2005-12-16 | 2007-06-21 | Microsoft Corporation | Global and local entity naming |
US7739356B2 (en) * | 2005-12-16 | 2010-06-15 | Microsoft Corporation | Global and local entity naming |
US7689559B2 (en) | 2006-02-08 | 2010-03-30 | Telenor Asa | Document similarity scoring and ranking method, device and computer program product |
WO2007091896A1 (en) * | 2006-02-08 | 2007-08-16 | Telenor Asa | Document similarity scoring and ranking method, device and computer program product |
US7844595B2 (en) | 2006-02-08 | 2010-11-30 | Telenor Asa | Document similarity scoring and ranking method, device and computer program product |
US20070198504A1 (en) * | 2006-02-23 | 2007-08-23 | Microsoft Corporation | Calculating level-based importance of a web page |
US8433712B2 (en) | 2006-03-01 | 2013-04-30 | Oracle International Corporation | Link analysis for enterprise environment |
US9479494B2 (en) | 2006-03-01 | 2016-10-25 | Oracle International Corporation | Flexible authentication framework |
US11038867B2 (en) | 2006-03-01 | 2021-06-15 | Oracle International Corporation | Flexible framework for secure search |
US9081816B2 (en) | 2006-03-01 | 2015-07-14 | Oracle International Corporation | Propagating user identities in a secure federated search system |
US8875249B2 (en) | 2006-03-01 | 2014-10-28 | Oracle International Corporation | Minimum lifespan credentials for crawling data repositories |
US8352475B2 (en) | 2006-03-01 | 2013-01-08 | Oracle International Corporation | Suggested content with attribute parameterization |
US9853962B2 (en) | 2006-03-01 | 2017-12-26 | Oracle International Corporation | Flexible authentication framework |
US20080071766A1 (en) * | 2006-03-01 | 2008-03-20 | Semdirector, Inc. | Centralized web-based software solutions for search engine optimization |
US8868540B2 (en) | 2006-03-01 | 2014-10-21 | Oracle International Corporation | Method for suggesting web links and alternate terms for matching search queries |
US9177124B2 (en) | 2006-03-01 | 2015-11-03 | Oracle International Corporation | Flexible authentication framework |
US8725770B2 (en) | 2006-03-01 | 2014-05-13 | Oracle International Corporation | Secure search performance improvement |
US10382421B2 (en) | 2006-03-01 | 2019-08-13 | Oracle International Corporation | Flexible framework for secure search |
US8707451B2 (en) | 2006-03-01 | 2014-04-22 | Oracle International Corporation | Search hit URL modification for secure application integration |
US9467437B2 (en) | 2006-03-01 | 2016-10-11 | Oracle International Corporation | Flexible authentication framework |
US8595255B2 (en) | 2006-03-01 | 2013-11-26 | Oracle International Corporation | Propagating user identities in a secure federated search system |
US8601028B2 (en) | 2006-03-01 | 2013-12-03 | Oracle International Corporation | Crawling secure data sources |
US8626794B2 (en) | 2006-03-01 | 2014-01-07 | Oracle International Corporation | Indexing secure enterprise documents using generic references |
US7877392B2 (en) | 2006-03-01 | 2011-01-25 | Covario, Inc. | Centralized web-based software solutions for search engine optimization |
US9251364B2 (en) | 2006-03-01 | 2016-02-02 | Oracle International Corporation | Search hit URL modification for secure application integration |
US20070208730A1 (en) * | 2006-03-02 | 2007-09-06 | Microsoft Corporation | Mining web search user behavior to enhance web search relevance |
US7676464B2 (en) | 2006-03-17 | 2010-03-09 | International Business Machines Corporation | Page-ranking via user expertise and content relevance |
US20070266025A1 (en) * | 2006-05-12 | 2007-11-15 | Microsoft Corporation | Implicit tokenized result ranking |
US8555182B2 (en) | 2006-06-07 | 2013-10-08 | Microsoft Corporation | Interface for managing search term importance relationships |
US20070288498A1 (en) * | 2006-06-07 | 2007-12-13 | Microsoft Corporation | Interface for managing search term importance relationships |
US8972379B1 (en) | 2006-08-25 | 2015-03-03 | Riosoft Holdings, Inc. | Centralized web-based software solution for search engine optimization |
US8838560B2 (en) | 2006-08-25 | 2014-09-16 | Covario, Inc. | System and method for measuring the effectiveness of an on-line advertisement campaign |
US8473495B2 (en) | 2006-08-25 | 2013-06-25 | Covario, Inc. | Centralized web-based software solution for search engine optimization |
US8943039B1 (en) | 2006-08-25 | 2015-01-27 | Riosoft Holdings, Inc. | Centralized web-based software solution for search engine optimization |
US20080071767A1 (en) * | 2006-08-25 | 2008-03-20 | Semdirector, Inc. | System and method for measuring the effectiveness of an on-line advertisement campaign |
US20080126331A1 (en) * | 2006-08-25 | 2008-05-29 | Xerox Corporation | System and method for ranking reference documents |
US20080071929A1 (en) * | 2006-09-18 | 2008-03-20 | Yann Emmanuel Motte | Methods and apparatus for selection of information and web page generation |
US8966389B2 (en) | 2006-09-22 | 2015-02-24 | Limelight Networks, Inc. | Visual interface for identifying positions of interest within a sequentially ordered information encoding |
US8396878B2 (en) | 2006-09-22 | 2013-03-12 | Limelight Networks, Inc. | Methods and systems for generating automated tags for video files |
US20080077583A1 (en) * | 2006-09-22 | 2008-03-27 | Pluggd Inc. | Visual interface for identifying positions of interest within a sequentially ordered information encoding |
US9015172B2 (en) | 2006-09-22 | 2015-04-21 | Limelight Networks, Inc. | Method and subsystem for searching media content within a content-search service system |
US9953049B1 (en) | 2006-10-12 | 2018-04-24 | Google Llc | Producing a ranking for pages using distances in a web-link graph |
US9165040B1 (en) * | 2006-10-12 | 2015-10-20 | Google Inc. | Producing a ranking for pages using distances in a web-link graph |
US20080104113A1 (en) * | 2006-10-26 | 2008-05-01 | Microsoft Corporation | Uniform resource locator scoring for targeted web crawling |
US7672943B2 (en) | 2006-10-26 | 2010-03-02 | Microsoft Corporation | Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling |
WO2008070744A3 (en) * | 2006-12-05 | 2009-01-15 | Covario Inc | Centralized web-based software solution for search engine optimization |
WO2008070744A2 (en) * | 2006-12-05 | 2008-06-12 | Covario, Inc. | Centralized web-based software solution for search engine optimization |
US20080147641A1 (en) * | 2006-12-14 | 2008-06-19 | Dean Leffingwell | Method for prioritizing search results retrieved in response to a computerized search query |
US20080147578A1 (en) * | 2006-12-14 | 2008-06-19 | Dean Leffingwell | System for prioritizing search results retrieved in response to a computerized search query |
KR101298334B1 (en) * | 2006-12-19 | 2013-08-20 | Yahoo! Inc. | Techniques for including collection items in search results |
US20080147640A1 (en) * | 2006-12-19 | 2008-06-19 | Schachter Joshua E | Techniques for including collection items in search results |
US9009164B2 (en) | 2006-12-19 | 2015-04-14 | Yahoo! Inc. | Techniques for including collection items in search results |
US7958126B2 (en) * | 2006-12-19 | 2011-06-07 | Yahoo! Inc. | Techniques for including collection items in search results |
US9576055B2 (en) | 2006-12-19 | 2017-02-21 | Yahoo! | Techniques for including collection items in search results |
US20110238675A1 (en) * | 2006-12-19 | 2011-09-29 | Schachter Joshua E | Techniques for including collection items in search results |
US20100299317A1 (en) * | 2006-12-20 | 2010-11-25 | Victor David Uy | Method of displaying a subjective score with search engine results |
WO2008074152A1 (en) * | 2006-12-20 | 2008-06-26 | Ma, Gary, Manchoir | Method of displaying a subjective score with search engine results |
US9311401B2 (en) | 2006-12-20 | 2016-04-12 | Victor David Uy | Method of displaying a subjective score with search engine results |
US8005784B2 (en) | 2007-03-07 | 2011-08-23 | Microsoft Corporation | Supervised rank aggregation based on rankings |
US7840522B2 (en) | 2007-03-07 | 2010-11-23 | Microsoft Corporation | Supervised rank aggregation based on rankings |
US20110029466A1 (en) * | 2007-03-07 | 2011-02-03 | Microsoft Corporation | Supervised rank aggregation based on rankings |
US20080228719A1 (en) * | 2007-03-13 | 2008-09-18 | Fatdoor, Inc. | People and business search result optimization |
US9934313B2 (en) | 2007-03-14 | 2018-04-03 | Fiver Llc | Query templates and labeled search tip system, methods and techniques |
US20080243812A1 (en) * | 2007-03-30 | 2008-10-02 | Microsoft Corporation | Ranking method using hyperlinks in blogs |
US8346763B2 (en) | 2007-03-30 | 2013-01-01 | Microsoft Corporation | Ranking method using hyperlinks in blogs |
WO2008121144A1 (en) * | 2007-04-02 | 2008-10-09 | University Of Washington | Open information extraction from the web |
US8938410B2 (en) | 2007-04-02 | 2015-01-20 | University Of Washington Through Its Center For Commercialization | Open information extraction from the web |
US20110191276A1 (en) * | 2007-04-02 | 2011-08-04 | University Of Washington Through Its Center For Commercialization | Open information extraction from the web |
US7877343B2 (en) | 2007-04-02 | 2011-01-25 | University Of Washington Through Its Center For Commercialization | Open information extraction from the Web |
US8234272B2 (en) * | 2007-05-04 | 2012-07-31 | Sony Mobile Communications Ab | Searching and ranking contacts in contact database |
US8494998B2 (en) * | 2007-05-04 | 2013-07-23 | Microsoft Corporation | Link spam detection using smooth classification function |
US20080275865A1 (en) * | 2007-05-04 | 2008-11-06 | Sony Ericsson Mobile Communications Ab | Searching and ranking contacts in contact database |
US20110282816A1 (en) * | 2007-05-04 | 2011-11-17 | Microsoft Corporation | Link spam detection using smooth classification function |
US8805754B2 (en) | 2007-05-04 | 2014-08-12 | Microsoft Corporation | Link spam detection using smooth classification function |
US20080313167A1 (en) * | 2007-06-15 | 2008-12-18 | Jim Anderson | System And Method For Intelligently Indexing Internet Resources |
US20110258184A1 (en) * | 2007-06-27 | 2011-10-20 | Oracle International Corporation | Changing ranking algorithms based on customer settings |
US8412717B2 (en) * | 2007-06-27 | 2013-04-02 | Oracle International Corporation | Changing ranking algorithms based on customer settings |
US8316007B2 (en) | 2007-06-28 | 2012-11-20 | Oracle International Corporation | Automatically finding acronyms and synonyms in a corpus |
US8380734B2 (en) | 2007-09-04 | 2013-02-19 | Google Inc. | Word decompounder |
US8046355B2 (en) * | 2007-09-04 | 2011-10-25 | Google Inc. | Word decompounder |
US20090063462A1 (en) * | 2007-09-04 | 2009-03-05 | Google Inc. | Word decompounder |
US8117208B2 (en) | 2007-09-21 | 2012-02-14 | The Board Of Trustees Of The University Of Illinois | System for entity search and a method for entity scoring in a linked document database |
US8204891B2 (en) | 2007-09-21 | 2012-06-19 | Limelight Networks, Inc. | Method and subsystem for searching media content within a content-search-service system |
US20090083248A1 (en) * | 2007-09-21 | 2009-03-26 | Microsoft Corporation | Multi-Ranker For Search |
US20090083262A1 (en) * | 2007-09-21 | 2009-03-26 | Kevin Chen-Chuan Chang | System for entity search and a method for entity scoring in a linked document database |
US20090083256A1 (en) * | 2007-09-21 | 2009-03-26 | Pluggd, Inc | Method and subsystem for searching media content within a content-search-service system |
WO2009039392A1 (en) * | 2007-09-21 | 2009-03-26 | The Board Of Trustees Of The University Of Illinois | A system for entity search and a method for entity scoring in a linked document database |
US7917492B2 (en) * | 2007-09-21 | 2011-03-29 | Limelight Networks, Inc. | Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system |
US8122015B2 (en) | 2007-09-21 | 2012-02-21 | Microsoft Corporation | Multi-ranker for search |
US20090083257A1 (en) * | 2007-09-21 | 2009-03-26 | Pluggd, Inc | Method and subsystem for information acquisition and aggregation to facilitate ontology and language-model generation within a content-search-service system |
US9471670B2 (en) | 2007-10-17 | 2016-10-18 | Vcvc Iii Llc | NLP-based content recommender |
US9613004B2 (en) | 2007-10-17 | 2017-04-04 | Vcvc Iii Llc | NLP-based entity recognition and disambiguation |
US10282389B2 (en) | 2007-10-17 | 2019-05-07 | Fiver Llc | NLP-based entity recognition and disambiguation |
US9348912B2 (en) | 2007-10-18 | 2016-05-24 | Microsoft Technology Licensing, Llc | Document length as a static relevance feature for ranking search results |
US7840569B2 (en) | 2007-10-18 | 2010-11-23 | Microsoft Corporation | Enterprise relevancy ranking using a neural network |
US8099417B2 (en) | 2007-12-12 | 2012-01-17 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US20090157643A1 (en) * | 2007-12-12 | 2009-06-18 | Microsoft Corporation | Semi-supervised part-of-speech tagging |
US20090234824A1 (en) * | 2008-03-14 | 2009-09-17 | International Business Machines Corporation | Browser Use of Directory Listing for Predictive Type-Ahead |
US8306987B2 (en) * | 2008-04-03 | 2012-11-06 | Ofer Ber | System and method for matching search requests and relevant data |
US20090254543A1 (en) * | 2008-04-03 | 2009-10-08 | Ofer Ber | System and method for matching search requests and relevant data |
US8812493B2 (en) | 2008-04-11 | 2014-08-19 | Microsoft Corporation | Search results ranking using editing distance and document information |
US8504547B1 (en) * | 2008-04-23 | 2013-08-06 | Google Inc. | Customizing image search for user attributes |
US8782029B1 (en) | 2008-04-23 | 2014-07-15 | Google Inc. | Customizing image search for user attributes |
US9146997B2 (en) | 2008-04-23 | 2015-09-29 | Google Inc. | Customizing image search for user attributes |
US20140222792A1 (en) * | 2008-06-18 | 2014-08-07 | Dirk H. Groeneveld | Name search using a ranking function |
US9727639B2 (en) * | 2008-06-18 | 2017-08-08 | Microsoft Technology Licensing, Llc | Name search using a ranking function |
US20100070495A1 (en) * | 2008-09-12 | 2010-03-18 | International Business Machines Corporation | Fast-approximate tfidf |
US7730061B2 (en) * | 2008-09-12 | 2010-06-01 | International Business Machines Corporation | Fast-approximate TFIDF |
EP2169568A1 (en) | 2008-09-17 | 2010-03-31 | OGS Search Limited | Method and apparatus for generating a ranked index of web pages |
US8332259B1 (en) * | 2008-11-03 | 2012-12-11 | Intuit Inc. | Method and system for evaluating expansion of a business |
US8396742B1 (en) | 2008-12-05 | 2013-03-12 | Covario, Inc. | System and method for optimizing paid search advertising campaigns based on natural search traffic |
US8706548B1 (en) | 2008-12-05 | 2014-04-22 | Covario, Inc. | System and method for optimizing paid search advertising campaigns based on natural search traffic |
US20100162093A1 (en) * | 2008-12-18 | 2010-06-24 | Google Inc. | Identifying comments to show in connection with a document |
US8656266B2 (en) * | 2008-12-18 | 2014-02-18 | Google Inc. | Identifying comments to show in connection with a document |
US20100205183A1 (en) * | 2009-02-12 | 2010-08-12 | Yahoo!, Inc., a Delaware corporation | Method and system for performing selective decoding of search result messages |
US20120197879A1 (en) * | 2009-07-20 | 2012-08-02 | Lexisnexis | Fuzzy proximity boosting and influence kernels |
US8818999B2 (en) * | 2009-07-20 | 2014-08-26 | Lexisnexis | Fuzzy proximity boosting and influence kernels |
US8838619B1 (en) | 2009-08-12 | 2014-09-16 | Google Inc. | Ranking authors and their content in the same framework |
US8396879B1 (en) | 2009-08-12 | 2013-03-12 | Google Inc. | Ranking authors and their content in the same framework |
US9875313B1 (en) | 2009-08-12 | 2018-01-23 | Google Llc | Ranking authors and their content in the same framework |
US8150860B1 (en) * | 2009-08-12 | 2012-04-03 | Google Inc. | Ranking authors and their content in the same framework |
US20110191687A1 (en) * | 2010-01-29 | 2011-08-04 | Kabushiki Kaisha Toshiba | Mobile terminal |
US20110282888A1 (en) * | 2010-03-01 | 2011-11-17 | Evri, Inc. | Content recommendation based on collections of entities |
US9710556B2 (en) * | 2010-03-01 | 2017-07-18 | Vcvc Iii Llc | Content recommendation based on collections of entities |
US10331783B2 (en) | 2010-03-30 | 2019-06-25 | Fiver Llc | NLP-based systems and methods for providing quotations |
WO2011129993A1 (en) * | 2010-04-14 | 2011-10-20 | Raytheon Company | Relevance-based open source intelligence (osint) collection |
US8738635B2 (en) | 2010-06-01 | 2014-05-27 | Microsoft Corporation | Detection of junk in search result ranking |
US10049150B2 (en) | 2010-11-01 | 2018-08-14 | Fiver Llc | Category-based content recommendation |
US8793706B2 (en) | 2010-12-16 | 2014-07-29 | Microsoft Corporation | Metadata-based eventing supporting operations on data |
US10102222B2 (en) * | 2010-12-30 | 2018-10-16 | Google Llc | Semantic geotokens |
US20170169055A1 (en) * | 2010-12-30 | 2017-06-15 | Google Inc. | Semantic geotokens |
US20130110815A1 (en) * | 2011-10-28 | 2013-05-02 | Microsoft Corporation | Generating and presenting deep links |
US9454582B1 (en) * | 2011-10-31 | 2016-09-27 | Google Inc. | Ranking search results |
US9495462B2 (en) | 2012-01-27 | 2016-11-15 | Microsoft Technology Licensing, Llc | Re-ranking search results |
US20160094676A1 (en) * | 2012-07-25 | 2016-03-31 | Oracle International Corporation | Heuristic caching to personalize applications |
US10372781B2 (en) * | 2012-07-25 | 2019-08-06 | Oracle International Corporation | Heuristic caching to personalize applications |
US8949216B2 (en) | 2012-12-07 | 2015-02-03 | International Business Machines Corporation | Determining characteristic parameters for web pages |
US20140214814A1 (en) * | 2013-01-29 | 2014-07-31 | Sriram Sankar | Ranking search results using diversity groups |
US10032234B2 (en) * | 2013-01-29 | 2018-07-24 | Facebook, Inc. | Ranking search results using diversity groups |
WO2014184785A3 (en) * | 2013-05-16 | 2015-04-16 | Yandex Europe Ag | Method and system for presenting image information to a user of a client device |
US20150379140A1 (en) * | 2014-06-30 | 2015-12-31 | Google Inc. | Surfacing in-depth articles in search results |
US9996624B2 (en) * | 2014-06-30 | 2018-06-12 | Google Llc | Surfacing in-depth articles in search results |
US10127230B2 (en) * | 2015-05-01 | 2018-11-13 | Microsoft Technology Licensing, Llc | Dynamic content suggestion in sparse traffic environment |
US20160321250A1 (en) * | 2015-05-01 | 2016-11-03 | Microsoft Technology Licensing, Llc | Dynamic content suggestion in sparse traffic environment |
US20220076810A1 (en) * | 2017-05-25 | 2022-03-10 | Enlitic, Inc. | Medical scan interface feature evaluating system and methods for use therewith |
US10606851B1 (en) * | 2018-09-10 | 2020-03-31 | Palantir Technologies Inc. | Intelligent compute request scoring and routing |
US12229150B2 (en) | 2018-09-10 | 2025-02-18 | Palantir Technologies Inc. | Intelligent compute request scoring and routing |
US11068943B2 (en) | 2018-10-23 | 2021-07-20 | International Business Machines Corporation | Generating collaborative orderings of information pertaining to products to present to target users |
US11263209B2 (en) * | 2019-04-25 | 2022-03-01 | Chevron U.S.A. Inc. | Context-sensitive feature score generation |
CN111522533A (en) * | 2020-04-24 | 2020-08-11 | 中国标准化研究院 | Product modular design method and device recommended based on user's personalized needs |
WO2023023099A1 (en) * | 2021-08-16 | 2023-02-23 | Elasticsearch B.V. | Search query refinement using generated keyword triggers |
Also Published As
Publication number | Publication date |
---|---|
US7257577B2 (en) | 2007-08-14 |
Similar Documents
Publication | Title |
---|---|
US7257577B2 (en) | System, method and service for ranking search results using a modular scoring system |
US6560600B1 (en) | Method and apparatus for ranking Web page search results |
US8280918B2 (en) | Using link structure for suggesting related queries |
US5920859A (en) | Hypertext document retrieval system and method |
US10423668B2 (en) | System, method, and user interface for organization and searching information |
US7809716B2 (en) | Method and apparatus for establishing relationship between documents |
Glance | Community search assistant |
CN1192320C (en) | Cooperative topical servers with automatic prefiltering and routing |
US8224847B2 (en) | Relevant individual searching using managed property and ranking features |
JP4249726B2 (en) | Method and system for indexing and searching database groups |
US7552109B2 (en) | System, method, and service for collaborative focused crawling of documents on a network |
Mauldin | Lycos: Design choices in an internet search service |
US20070055680A1 (en) | Method and system for creating a taxonomy from business-oriented metadata content |
US20040024752A1 (en) | Method and apparatus for search ranking using human input and automated ranking |
US20050234877A1 (en) | System and method for searching using a temporal dimension |
US20060129538A1 (en) | Text search quality by exploiting organizational information |
US7809736B2 (en) | Importance ranking for a hierarchical collection of objects |
CA2516852A1 (en) | System and method for controlling ranking of pages returned by a search engine |
US8589391B1 (en) | Method and system for generating web site ratings for a user |
WO1997049048A1 (en) | Hypertext document retrieval system and method |
Knoblock | Searching the world wide web |
US7490082B2 (en) | System and method for searching internet domains |
Gavankar et al. | Explicit query interpretation and diversification for context-driven concept search across ontologies |
Maiellaro et al. | Sustainable building on the WWW |
Grivell | Seek and you shall find? |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FAGIN, RONALD;MCCURLEY, KEVIN SNOW;NOVAK, JASMINE;AND OTHERS;REEL/FRAME:015321/0599;SIGNING DATES FROM 20040502 TO 20040504 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
REMI | Maintenance fee reminder mailed | |
FPAY | Fee payment | Year of fee payment: 4 |
SULP | Surcharge for late payment | |
AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:026664/0866 Effective date: 20110503 |
FPAY | Fee payment | Year of fee payment: 8 |
AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044127/0735 Effective date: 20170929 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |