US7302645B1 - Methods and systems for identifying manipulated articles - Google Patents
Methods and systems for identifying manipulated articles
- Publication number
- US7302645B1
- Authority
- US
- United States
- Prior art keywords
- documents
- document
- cluster
- articles
- manipulated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires 2025-09-12
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
Definitions
- The invention generally relates to manipulated articles. More particularly, the invention relates to methods and systems for identifying manipulated articles.
- A search engine or search engine program is a widely used mechanism for allowing users to search vast numbers of documents for information.
- Automated search engines locate websites by matching terms from a user-entered search query to an indexed corpus of web pages.
- A conventional network search engine, such as the Google™ search engine, returns a result set in response to the search query submitted by the user.
- The search engine performs the search based on a conventional search method. For example, one known method, described in an article entitled “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” by Sergey Brin and Lawrence Page, assigns a degree of importance to a document, such as a web page, based on the link structure of the web page.
- The processor 110 executes computer-executable program instructions stored in memory 108.
- Such processors may include a microprocessor, an ASIC, and state machines.
- Such processors include, or may be in communication with, media, for example computer-readable media, which store instructions that, when executed by the processor, cause the processor to perform the steps described herein.
- Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor, such as the processor 110 of client 102a, with computer-readable instructions.
- Server device 104 may comprise a single physical or logical server, and the cluster processor 130 and manipulation processor 132 may be located external to the search engine 120.
- The system 100 shown in FIG. 1 is merely exemplary, and is used to explain the exemplary methods shown in FIG. 2.
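The link-based importance method mentioned in the list above (the Brin and Page article) can be illustrated with a small power-iteration sketch. This is a simplified illustration and not the method claimed in this patent: the function name `link_importance`, the damping value 0.85, the fixed iteration count, and the toy link graph are all assumptions chosen for the example.

```python
def link_importance(links, damping=0.85, iterations=50):
    """Estimate link-based importance scores by power iteration.

    links: dict mapping a page name to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over pages.
    score = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Pages without outgoing links spread their score evenly.
        dangling = sum(score[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its damped score among its link targets.
        for src, targets in links.items():
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        score = new
    return score


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, value in sorted(link_importance(toy_graph).items()):
        print(page, round(value, 4))
```

In the toy graph, page "c" receives links from three pages and ends up with the highest score, while "d", which nothing links to, ends up with the lowest.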
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- The text of the document—whether the text appears to be normal English (or other language) text or text generated by a computer, such as containing a large number of keywords and not containing any sentences;
- Meta tags—whether the document has meta tags and whether the meta tags contain a large number of repeated keywords;
- Redirect—whether there is any script in the document such as JavaScript or HTML script that redirects a user to another document upon access;
- Similarly colored text and background—whether there is a large amount of text in the document that is the same color as the background of the document (Systems and methods for detecting hidden text and links in articles are described in U.S. patent application Ser. No. 10/726,483, filed Dec. 4, 2003, which is hereby incorporated by this reference);
- A large number of random links—whether the document contains a large number of unrelated links;
- History of the document—whether the text of the document, the link structure of the document, or the ownership of the website on which the document appears has changed recently (Systems and methods for using historical information in information retrieval are described in U.S. patent application Ser. No. 60/507,617, filed Sep. 30, 2003, which is hereby incorporated by this reference);
- Anchor text—whether there are a lot of links on the page and there is little or no text that is not anchor text.
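A few of the signals listed above (meta tags, redirects, link counts, and anchor text) can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions, not the patent's implementation: the class and function names, the regular expression used to spot script redirects, and every threshold are hypothetical, and the remaining signals (text naturalness, hidden text, document history) are not covered.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _SignalParser(HTMLParser):
    """Collects raw counts needed by the heuristics (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.has_redirect = False
        self.link_count = 0
        self.anchor_chars = 0
        self.other_chars = 0
        self._in_anchor = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            content = attrs.get("content", "")
            if attrs.get("name", "").lower() == "keywords":
                self.meta_keywords += [
                    w.strip().lower() for w in content.split(",") if w.strip()
                ]
            # A meta refresh that names a URL sends the user elsewhere.
            if attrs.get("http-equiv", "").lower() == "refresh" and "url=" in content.lower():
                self.has_redirect = True
        elif tag == "a":
            self._in_anchor += 1
            self.link_count += 1
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self._in_anchor -= 1
        elif tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            # Rough check for scripts that assign or replace the location.
            if re.search(r"location\s*(\.href)?\s*=|location\.replace\(", data):
                self.has_redirect = True
            return
        n = len(data.strip())
        if self._in_anchor:
            self.anchor_chars += n
        else:
            self.other_chars += n


def manipulation_signals(html, max_keyword_repeats=3, max_links=50,
                         min_plain_text_ratio=0.2):
    """Return a dict of boolean signals; all thresholds are assumptions."""
    parser = _SignalParser()
    parser.feed(html)
    parser.close()
    repeats = Counter(parser.meta_keywords)
    total = parser.anchor_chars + parser.other_chars
    plain_ratio = parser.other_chars / total if total else 1.0
    return {
        "repeated_meta_keywords": any(c > max_keyword_repeats for c in repeats.values()),
        "redirect": parser.has_redirect,
        "many_links": parser.link_count > max_links,
        "mostly_anchor_text": parser.link_count > 0 and plain_ratio < min_plain_text_ratio,
    }
```

Each signal is reported separately rather than merged into one verdict, since the list above presents them as independent indicators; how they would be weighted or combined is left open here.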
Claims (22)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/732,048 US7302645B1 (en) | 2003-12-10 | 2003-12-10 | Methods and systems for identifying manipulated articles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/732,048 US7302645B1 (en) | 2003-12-10 | 2003-12-10 | Methods and systems for identifying manipulated articles |
Publications (1)
Publication Number | Publication Date |
---|---|
US7302645B1 (en) | 2007-11-27 |
Family
ID=38722084
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/732,048 (US7302645B1 (en); Active, expires 2025-09-12) | 2003-12-10 | 2003-12-10 | Methods and systems for identifying manipulated articles |
Country Status (1)
Country | Link |
---|---|
US (1) | US7302645B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11269812B2 (en) * | 2019-05-10 | 2022-03-08 | International Business Machines Corporation | Derived relationship for collaboration documents |
Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6006222A (en) | 1997-04-25 | 1999-12-21 | Culliss; Gary | Method for organizing information |
US6014665A (en) | 1997-08-01 | 2000-01-11 | Culliss; Gary | Method for organizing information |
US6038561A (en) * | 1996-10-15 | 2000-03-14 | Manning & Napier Information Services | Management and analysis of document information text |
US6078916A (en) | 1997-08-01 | 2000-06-20 | Culliss; Gary | Method for organizing information |
US6182068B1 (en) | 1997-08-01 | 2001-01-30 | Ask Jeeves, Inc. | Personalized search methods |
US6185559B1 (en) | 1997-05-09 | 2001-02-06 | Hitachi America, Ltd. | Method and apparatus for dynamically counting large itemsets |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US20020042791A1 (en) | 2000-07-06 | 2002-04-11 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
US20020059708A1 (en) | 2000-07-28 | 2002-05-23 | The Penn State Research Foundation | Process for fabricating hollow electroactive devices |
US20020069114A1 (en) | 2000-12-01 | 2002-06-06 | Charette Phillip Carl | Method and system for placing a purchase order over a communications network |
US20020123988A1 (en) | 2001-03-02 | 2002-09-05 | Google, Inc. | Methods and apparatus for employing usage statistics in document retrieval |
US20020133481A1 (en) | 2000-07-06 | 2002-09-19 | Google, Inc. | Methods and apparatus for providing search results in response to an ambiguous search query |
US6460036B1 (en) * | 1994-11-29 | 2002-10-01 | Pinpoint Incorporated | System and method for providing customized electronic newspapers and target advertisements |
US6526440B1 (en) | 2001-01-30 | 2003-02-25 | Google, Inc. | Ranking search results by reranking the results based on local inter-connectivity |
US6598054B2 (en) * | 1999-01-26 | 2003-07-22 | Xerox Corporation | System and method for clustering data objects in a collection |
US6615209B1 (en) | 2000-02-22 | 2003-09-02 | Google, Inc. | Detecting query-specific duplicate documents |
US6658423B1 (en) | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US6678681B1 (en) | 1999-03-10 | 2004-01-13 | Google Inc. | Information extraction from a database |
US20040024752A1 (en) | 2002-08-05 | 2004-02-05 | Yahoo! Inc. | Method and apparatus for search ranking using human input and automated ranking |
US6754873B1 (en) | 1999-09-20 | 2004-06-22 | Google Inc. | Techniques for finding related hyperlinked documents using link-based analysis |
US20040122811A1 (en) | 1997-01-10 | 2004-06-24 | Google, Inc. | Method for searching media |
US20040119740A1 (en) | 2002-12-24 | 2004-06-24 | Google, Inc., A Corporation Of The State Of California | Methods and apparatus for displaying and replying to electronic messages |
US20040267725A1 (en) | 2003-06-30 | 2004-12-30 | Harik Georges R | Serving advertisements using a search of advertiser Web information |
US6868525B1 (en) * | 2000-02-01 | 2005-03-15 | Alberti Anemometer Llc | Computer graphic display visualization system and method |
US20050065959A1 (en) | 2003-09-22 | 2005-03-24 | Adam Smith | Systems and methods for clustering search results |
US20050071224A1 (en) | 2003-09-30 | 2005-03-31 | Andrew Fikes | System and method for automatically targeting web-based advertisements |
US20050071741A1 (en) | 2003-09-30 | 2005-03-31 | Anurag Acharya | Information retrieval based on historical data |
US20050114198A1 (en) | 2003-11-24 | 2005-05-26 | Ross Koningstein | Using concepts for ad targeting |
Patent Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6460036B1 (en) * | 1994-11-29 | 2002-10-01 | Pinpoint Incorporated | System and method for providing customized electronic newspapers and target advertisements |
US6038561A (en) * | 1996-10-15 | 2000-03-14 | Manning & Napier Information Services | Management and analysis of document information text |
US20040122811A1 (en) | 1997-01-10 | 2004-06-24 | Google, Inc. | Method for searching media |
US6285999B1 (en) | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US6006222A (en) | 1997-04-25 | 1999-12-21 | Culliss; Gary | Method for organizing information |
US6185559B1 (en) | 1997-05-09 | 2001-02-06 | Hitachi America, Ltd. | Method and apparatus for dynamically counting large itemsets |
US6078916A (en) | 1997-08-01 | 2000-06-20 | Culliss; Gary | Method for organizing information |
US6182068B1 (en) | 1997-08-01 | 2001-01-30 | Ask Jeeves, Inc. | Personalized search methods |
US6014665A (en) | 1997-08-01 | 2000-01-11 | Culliss; Gary | Method for organizing information |
US6598054B2 (en) * | 1999-01-26 | 2003-07-22 | Xerox Corporation | System and method for clustering data objects in a collection |
US6678681B1 (en) | 1999-03-10 | 2004-01-13 | Google Inc. | Information extraction from a database |
US6754873B1 (en) | 1999-09-20 | 2004-06-22 | Google Inc. | Techniques for finding related hyperlinked documents using link-based analysis |
US6868525B1 (en) * | 2000-02-01 | 2005-03-15 | Alberti Anemometer Llc | Computer graphic display visualization system and method |
US6615209B1 (en) | 2000-02-22 | 2003-09-02 | Google, Inc. | Detecting query-specific duplicate documents |
US20020042791A1 (en) | 2000-07-06 | 2002-04-11 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
US20020133481A1 (en) | 2000-07-06 | 2002-09-19 | Google, Inc. | Methods and apparatus for providing search results in response to an ambiguous search query |
US6529903B2 (en) | 2000-07-06 | 2003-03-04 | Google, Inc. | Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query |
US20020059708A1 (en) | 2000-07-28 | 2002-05-23 | The Penn State Research Foundation | Process for fabricating hollow electroactive devices |
US20020069114A1 (en) | 2000-12-01 | 2002-06-06 | Charette Phillip Carl | Method and system for placing a purchase order over a communications network |
US6658423B1 (en) | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US6526440B1 (en) | 2001-01-30 | 2003-02-25 | Google, Inc. | Ranking search results by reranking the results based on local inter-connectivity |
US6725259B1 (en) | 2001-01-30 | 2004-04-20 | Google Inc. | Ranking search results by reranking the results based on local inter-connectivity |
US20020123988A1 (en) | 2001-03-02 | 2002-09-05 | Google, Inc. | Methods and apparatus for employing usage statistics in document retrieval |
US20040024752A1 (en) | 2002-08-05 | 2004-02-05 | Yahoo! Inc. | Method and apparatus for search ranking using human input and automated ranking |
US20040119740A1 (en) | 2002-12-24 | 2004-06-24 | Google, Inc., A Corporation Of The State Of California | Methods and apparatus for displaying and replying to electronic messages |
US20040267725A1 (en) | 2003-06-30 | 2004-12-30 | Harik Georges R | Serving advertisements using a search of advertiser Web information |
US20050065959A1 (en) | 2003-09-22 | 2005-03-24 | Adam Smith | Systems and methods for clustering search results |
US20050071224A1 (en) | 2003-09-30 | 2005-03-31 | Andrew Fikes | System and method for automatically targeting web-based advertisements |
US20050071741A1 (en) | 2003-09-30 | 2005-03-31 | Anurag Acharya | Information retrieval based on historical data |
US20050114198A1 (en) | 2003-11-24 | 2005-05-26 | Ross Koningstein | Using concepts for ad targeting |
Non-Patent Citations (9)
Title |
---|
Brin, Sergey et al., "The Anatomy of a Large-Scale Hypertextual Web Search Engine," 1998, Computer Science Department, Stanford University, Stanford, CA. |
Bryan, Kurt and Leise, Tanya, "The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google," Society for Industrial and Applied Mathematics, vol. 48, No. 3, pp. 569-581, 13 pages. |
Dourisboure, et al., "Extraction and Classification of Dense Communities in the Web," WWW 2007, May 8-12, 2007, Banff, Alberta, Canada, 10 pages. |
Fetterly, et al., "Spam, Damn Spam, and Statistics," Seventh Int'l Workshop on the Web and Databases, (WebDB 2004) Jun. 17-18, 2004, Paris, France. 6 pages. |
Gibson, et al., "Discovering Large Dense Subgraphs in Massive Graphs," Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005, 12 pages. |
Henzinger, et al., "Challenges in Web Search Engines," Oct. 17, 2002, 14 pages. |
Wikipedia-Bipartite Graph, [online] [retrieved May 15, 2007] Retrieved from http://en.wikipedia.org/w/index.php?title=Bipartite_graph&printable=yes, 3 pages. |
Wikipedia-Link Farm, [online] [retrieved May 15, 2007] Retrieved from http://en.wikipedia.org/w/index.php?title=Link_farm&printable=yes, 3 pages. |
Wikipedia-Spamdexing, [online] [retrieved May 15, 2007] Retrieved from http://en.wikipedia.org/w/index.php?title=Spamdexing&printable=yes, 5 pages. |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8135725B2 (en) | System and method for providing tag-based relevance recommendations of bookmarks in a bookmark and tag database | |
US6442606B1 (en) | Method and apparatus for identifying spoof documents | |
JP5148278B2 (en) | Method and system for selecting a language for text segmentation | |
US8271486B2 (en) | System and method for searching a bookmark and tag database for relevant bookmarks | |
US9110985B2 (en) | Generating a conceptual association graph from large-scale loosely-grouped content | |
US8849852B2 (en) | Text segmentation | |
US9218397B1 (en) | Systems and methods for improved searching | |
US7627571B2 (en) | Extraction of anchor explanatory text by mining repeated patterns | |
US8838567B1 (en) | Customization of search results for search queries received from third party sites | |
US8051080B2 (en) | Contextual ranking of keywords using click data | |
US20080120276A1 (en) | Systems and Methods Using Query Patterns to Disambiguate Query Intent | |
US20080244428A1 (en) | Visually Emphasizing Query Results Based on Relevance Feedback | |
US8661035B2 (en) | Content management system and method | |
US20080306968A1 (en) | Method and system for extracting, analyzing, storing, comparing and reporting on data stored in web and/or other network repositories and apparatus to detect, prevent and obfuscate information removal from information servers | |
EP1766507A2 (en) | Results based personalization of advertisements in a search engine | |
US20120023127A1 (en) | Method and system for processing a uniform resource locator | |
CA2490202A1 (en) | Query recognizer | |
KR100485321B1 (en) | A method of managing web sites registered in search engine and a system thereof | |
CN101305371A (en) | Ranking blog documents | |
US20100094855A1 (en) | System for transforming queries using object identification | |
US7698329B2 (en) | Method for improving quality of search results by avoiding indexing sections of pages | |
Roy et al. | Discovering and understanding word level user intent in web search queries | |
CN116830099A (en) | Inferring information about a web page based on a uniform resource locator of the web page | |
CN114282097A (en) | Information identification method and device | |
US20050076000A1 (en) | Determination of table of content links for a hyperlinked document |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HENZINGER, MONIKA; FRANZ, ALEXANDER MARK; REEL/FRAME: 019485/0332. Effective date: 20040708 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
FPAY | Fee payment | Year of fee payment: 4 |
FPAY | Fee payment | Year of fee payment: 8 |
AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: GOOGLE INC.; REEL/FRAME: 044127/0735. Effective date: 20170929 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |