CA2418568A1 - Method and system for classifying content and prioritizing web site content issues - Google Patents

Method and system for classifying content and prioritizing web site content issues

Info

Publication number
CA2418568A1
Authority
CA
Canada
Prior art keywords
traffic data
web
server
content
web site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002418568A
Other languages
French (fr)
Other versions
CA2418568C (en)
Inventor
Emad Abdel Bary
Ruth Milling
D. Gordon Smith
Gerard Torenvliet
Jozsef Horvath
Kari Simpson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
Watchfire Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2003-02-10
Filing date
2003-02-10
Publication date
2004-08-10
Application filed by Watchfire Corp
Priority to US10/361,948 (patent US7624173B2)
Priority to CA2418568A (patent CA2418568C)
Publication of CA2418568A1
Application granted
Publication of CA2418568C
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99943 Generating database or data structure, e.g. via user interface

Abstract

A method of analysing a Web page comprising the steps of: analysing said Web page and identifying content issues; obtaining traffic data for said Web page; correlating said content issues with said traffic data; and producing a report on said correlated data.
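
The correlation step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `correlate`, the dictionary layout of the content issues, the one-entry-per-hit traffic list, and the sample URLs are all assumptions made for illustration.

```python
from collections import Counter

def correlate(issues, hits):
    """Join per-page content issues with per-page hit counts (illustrative only)."""
    traffic = Counter(hits)  # assumption: one list entry per hit, keyed by URL
    report = []
    for url, page_issues in issues.items():
        report.append({
            "url": url,
            "issues": sorted(page_issues),
            "issue_count": len(page_issues),
            "hits": traffic.get(url, 0),  # pages with no recorded traffic get 0
        })
    # Highest-traffic pages first; ties are broken by number of issues
    report.sort(key=lambda row: (row["hits"], row["issue_count"]), reverse=True)
    return report

# Hypothetical sample data
issues = {
    "/index.html": ["broken link", "missing alt text"],
    "/about.html": ["spelling error"],
    "/old.html": ["broken link"],
}
hits = ["/index.html", "/about.html", "/index.html", "/index.html"]

report = correlate(issues, hits)
for row in report:
    print(row["url"], row["hits"], row["issue_count"])
```

Sorting by traffic puts issues on heavily visited pages at the top of the report, which is one plausible way to prioritize them; sorting by issue count instead would only require changing the sort key.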

Claims (24)

WHAT IS CLAIMED IS:

  1. A method of analysing a Web page comprising the steps of:
    analysing said Web page and identifying content issues;
    obtaining traffic data for said Web page;
    correlating said content issues with said traffic data; and
    producing a report on said correlated data.
  2. The method of claim 1 further comprising the step of:
    performing URL normalization of traffic data.
  3. The method of claim 2 wherein said step of performing URL normalization comprises the step of:
    removing session identifiers from URLs in traffic data.
  4. The method of claim 2 wherein said step of performing URL normalization comprises the step of:
    correlating URLs of mirror sites with corresponding URLs of the main server.
  5. The method of claim 2 wherein said step of performing URL normalization comprises the step of:
    switching upper case characters in URLs of traffic data to corresponding lower case format.
  6. The method of claim 1 wherein said content issues are selected from the group consisting of: broken links, broken anchors, slow pages, missing Alt text, spelling errors, forms, compliance with accessibility guidelines, cookie handling, third-party links and P3P compact policies.
  7. The method of claim 1 wherein said step of analysing is done in response to parameters set by the Web administrator.
  8. The method of claim 7 further comprising the step of:
    querying the Web administrator to input parameters for said analysis.
  9. The method of claim 7 wherein said step of producing comprises the step of:
    collecting traffic data records within a certain time range thereby allowing the production of historical trend reports.
  10. The method of claim 9 further comprising the step of:
    compiling data for time periods before and after a Web site was changed, allowing the Web administrator to consider the impact of changes.
  11. The method of claim 7 wherein said steps of analysing, obtaining, correlating and producing are performed on multiple Web pages within a Web site.
  12. The method of claim 11 wherein said step of producing comprises the step of sorting said Web pages in order from greatest number of content issues to least number of content issues.
  13. The method of claim 11 wherein said step of producing comprises the step of sorting said Web pages in order from greatest traffic flow to least traffic flow.
  14. The method of claim 11 wherein software code for effecting said method comprises an analysis module and a reporting module.
  15. The method of claim 11 further comprising the step of identifying Web pages which exceed a certain threshold level for certain content issues.
  16. The method of claim 11 wherein said step of identifying content issues further comprises the step of indexing said content issues by Web page.
  17. The method of claim 16 wherein said step of correlating comprises the step of correlating said content issues with said traffic by Web page.
  18. The method of claim 11 further comprising the step of:
    importing traffic data from a remote traffic server.
  19. The method of claim 18 further comprising the step of:
    converting said imported traffic data to a universal format.
  20. The method of claim 18 wherein said traffic data comprises a separate data record for each hit.
  21. The method of claim 18 wherein said traffic data includes date, time, URL, and user identification.
  22. The method of claim 18 wherein said traffic data includes date, time, URL, and user identification.
  23. A method of analysing a Web site comprising the steps of:
    analysing said Web site and identifying content issues for each Web page of said Web site;
    obtaining traffic data for each said Web page of said Web site;
    correlating said content issues with said traffic data; and
    producing reports on said correlated data.
  24. A system for analysing a Web site, said system comprising:
    a Web server;
    a Content Analysis server;
    a Traffic Data server; and
    a communication network for interconnecting said Web server, said Content Analysis server and said Traffic Data server;
    said Web server supporting said Web site, and being operable to:
    accumulate traffic data for said Web site;
    said Traffic Data server being operable to:
    aggregate said traffic data; and
    said Content Analysis server being operable to:
    analyse said Web site and compile a list of content issues for each page of said Web site, said content issues being indexed by Web page;
    obtain traffic data for said Web pages from said Traffic Data Server;
    correlate said list of content issues with said Traffic Data; and
    produce reports on said correlated data.
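
The URL normalization steps recited in claims 2 to 5 (removing session identifiers, mapping mirror-site URLs to the main server, and lower-casing) might look roughly like the Python sketch below. The claims do not say how these steps are carried out; the parameter names in `SESSION_PARAMS`, the hosts in `MIRRORS`, and the function `normalize_url` are hypothetical, and treating everything after a `;` in the path as a session identifier is a simplifying assumption.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical configuration: real session parameter names and mirror hosts
# would depend on the Web site being analysed.
SESSION_PARAMS = {"sessionid", "sid", "jsessionid", "phpsessid"}
MIRRORS = {"mirror1.example.com": "www.example.com"}

def normalize_url(url):
    """Normalize one traffic-data URL so hits on the same page share a key."""
    parts = urlsplit(url.lower())                    # lower-case the whole URL
    host = MIRRORS.get(parts.netloc, parts.netloc)   # map mirror host to main server
    path = parts.path.split(";", 1)[0]               # drop ";jsessionid=..." style suffix
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in SESSION_PARAMS]              # drop session query parameters
    # The fragment is discarded as well, since it does not identify a separate page
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))

print(normalize_url(
    "HTTP://Mirror1.Example.com/Products/Index.HTML;jsessionid=ABC123?SID=42&Page=2"
))
```

Normalizing before counting means that hits recorded under different session identifiers, letter cases, or mirror hosts collapse onto a single page entry, so traffic totals can be matched against the content issues indexed for that page.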
CA2418568A 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues Expired - Lifetime CA2418568C (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/361,948 US7624173B2 (en) 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues
CA2418568A CA2418568C (en) 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/361,948 US7624173B2 (en) 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues
CA2418568A CA2418568C (en) 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues

Publications (2)

Publication Number Publication Date
CA2418568A1 (en) 2004-08-10
CA2418568C (en) 2011-10-11

Family

ID=33160287

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2418568A Expired - Lifetime CA2418568C (en) 2003-02-10 2003-02-10 Method and system for classifying content and prioritizing web site content issues

Country Status (2)

Country Link
US (1) US7624173B2 (en)
CA (1) CA2418568C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114167801A (en) * 2021-12-06 2022-03-11 中成卓越(北京)厨房设备有限公司 Kitchen equipment management system based on linkage control

Families Citing this family (69)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243704A1 (en) * 2003-04-14 2004-12-02 Alfredo Botelho System and method for determining the unique web users and calculating the reach, frequency and effective reach of user web access
US7908248B2 (en) * 2003-07-22 2011-03-15 Sap Ag Dynamic meta data
US7886217B1 (en) 2003-09-29 2011-02-08 Google Inc. Identification of web sites that contain session identifiers
US7594018B2 (en) * 2003-10-10 2009-09-22 Citrix Systems, Inc. Methods and apparatus for providing access to persistent application sessions
FI20031758A (en) * 2003-12-02 2005-06-03 Nokia Corp Editing character strings on a touch screen
US7886032B1 (en) * 2003-12-23 2011-02-08 Google Inc. Content retrieval from sites that use session identifiers
US20050262063A1 (en) * 2004-04-26 2005-11-24 Watchfire Corporation Method and system for website analysis
US7600027B2 (en) * 2004-09-16 2009-10-06 International Business Machines Corporation Managing multiple sessions for a user of a portal
US8613048B2 (en) 2004-09-30 2013-12-17 Citrix Systems, Inc. Method and apparatus for providing authorized remote access to application sessions
US7748032B2 (en) 2004-09-30 2010-06-29 Citrix Systems, Inc. Method and apparatus for associating tickets in a ticket hierarchy
US7711835B2 (en) 2004-09-30 2010-05-04 Citrix Systems, Inc. Method and apparatus for reducing disclosure of proprietary data in a networked environment
US20060074714A1 (en) * 2004-10-01 2006-04-06 Microsoft Corporation Workflow tracking based on profiles
US8024568B2 (en) * 2005-01-28 2011-09-20 Citrix Systems, Inc. Method and system for verification of an endpoint security scan
US20060190433A1 (en) * 2005-02-23 2006-08-24 Microsoft Corporation Distributed navigation business activities data
US7774359B2 (en) * 2005-04-26 2010-08-10 Microsoft Corporation Business alerts on process instances based on defined conditions
US7627544B2 (en) * 2005-05-20 2009-12-01 Microsoft Corporation Recognizing event patterns from event streams
US7689913B2 (en) * 2005-06-02 2010-03-30 Us Tax Relief, Llc Managing internet pornography effectively
US8196104B2 (en) * 2005-08-31 2012-06-05 Sap Ag Systems and methods for testing application accessibility
EP2021995A4 (en) * 2005-12-06 2011-06-01 Berman Joel Method and system for scoring quality of traffic to network sites
US20070150585A1 (en) * 2005-12-28 2007-06-28 Microsoft Corporation Multi-dimensional aggregation on event streams
US7761558B1 (en) * 2006-06-30 2010-07-20 Google Inc. Determining a number of users behind a set of one or more internet protocol (IP) addresses
WO2008058262A2 (en) * 2006-11-08 2008-05-15 Social Media Networks, Inc. Methods and systems for storing, processing and managing internet user click information
US8533846B2 (en) 2006-11-08 2013-09-10 Citrix Systems, Inc. Method and system for dynamically associating access rights with a resource
US8255873B2 (en) * 2006-11-20 2012-08-28 Microsoft Corporation Handling external content in web applications
US8291108B2 (en) * 2007-03-12 2012-10-16 Citrix Systems, Inc. Systems and methods for load balancing based on user selected metrics
KR100755468B1 (en) * 2007-05-29 2007-09-04 (주)이즈포유 Method for grasping information of web site through analyzing structure of web page
WO2009005004A1 (en) * 2007-06-29 2009-01-08 Nec Corporation Session control system, session control method, and session control program
US9779173B2 (en) * 2007-07-12 2017-10-03 Go Daddy Operating Company, LLC Recording and transmitting a network user's network session
US7676461B2 (en) 2007-07-18 2010-03-09 Microsoft Corporation Implementation of stream algebra over class instances
US7925694B2 (en) * 2007-10-19 2011-04-12 Citrix Systems, Inc. Systems and methods for managing cookies via HTTP content layer
US20090178084A1 (en) * 2008-01-04 2009-07-09 Visteon Global Technologies, Inc. System and method for affinity marketing to mobile devices
WO2009094657A1 (en) 2008-01-26 2009-07-30 Citrix Systems, Inc. Systems and methods for fine grain policy driven cookie proxying
US8463821B2 (en) * 2008-04-15 2013-06-11 Oracle International Corporation Automatic generation and publication of online documentation
US8621635B2 (en) * 2008-08-18 2013-12-31 Microsoft Corporation Web page privacy risk detection
US20100088325A1 (en) 2008-10-07 2010-04-08 Microsoft Corporation Streaming Queries
US8041710B2 (en) * 2008-11-13 2011-10-18 Microsoft Corporation Automatic diagnosis of search relevance failures
CN101739433B (en) * 2008-11-14 2012-12-19 鸿富锦精密工业(深圳)有限公司 System and method for correcting webpage download error
EP2224351A1 (en) * 2009-02-26 2010-09-01 Telefonaktiebolaget L M Ericsson (publ) method for use in association with a multi-tab interpretation and rendering function
US9871811B2 (en) * 2009-05-26 2018-01-16 Microsoft Technology Licensing, Llc Identifying security properties of systems from application crash traffic
US9021401B2 (en) * 2009-09-21 2015-04-28 International Business Machines Corporation Methods and systems involving browser nodes
US9158816B2 (en) 2009-10-21 2015-10-13 Microsoft Technology Licensing, Llc Event processing with XML query based on reusable XML query template
KR101103401B1 (en) * 2009-11-23 2012-01-05 쏠스펙트럼(주) Statistical analysis apparatus for web service
US8538817B2 (en) * 2010-03-08 2013-09-17 Aol Inc. Systems and methods for protecting consumer privacy in online advertising environments
JP2011197801A (en) * 2010-03-17 2011-10-06 Sony Corp Apparatus and method for processing information, program, server apparatus, and information processing system
US9177321B2 (en) * 2010-12-21 2015-11-03 Sitecore A/S Method and a system for analysing traffic on a website by means of path analysis
US9645990B2 (en) * 2012-08-02 2017-05-09 Adobe Systems Incorporated Dynamic report building using a heterogeneous combination of filtering criteria
CN102915327A (en) * 2012-09-05 2013-02-06 吴小军 Double track synchronous synergy website system with dynamic/static webpage
US8949216B2 (en) * 2012-12-07 2015-02-03 International Business Machines Corporation Determining characteristic parameters for web pages
US8925099B1 (en) 2013-03-14 2014-12-30 Reputation.Com, Inc. Privacy scoring
US9825984B1 (en) 2014-08-27 2017-11-21 Shape Security, Inc. Background analysis of web content
US10853843B2 (en) 2014-10-14 2020-12-01 Postalytics, Inc. Pay as you go marketing campaign
US11544736B2 (en) 2014-10-14 2023-01-03 Postalytics, Inc. Pay as you go marketing campaign
US10444934B2 (en) 2016-03-18 2019-10-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10896286B2 (en) 2016-03-18 2021-01-19 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10867120B1 (en) 2016-03-18 2020-12-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US11727195B2 (en) 2016-03-18 2023-08-15 Audioeye, Inc. Modular systems and methods for selectively enabling cloud-based assistive technologies
US10423709B1 (en) 2018-08-16 2019-09-24 Audioeye, Inc. Systems, devices, and methods for automated and programmatic creation and deployment of remediations to non-compliant web pages or user interfaces
US10262348B2 (en) 2016-05-09 2019-04-16 Microsoft Technology Licensing, Llc Catalog quality management model
WO2018072060A1 (en) 2016-10-17 2018-04-26 Google Llc Machine learning based identification of broken network connections
US10880261B2 (en) * 2017-04-11 2020-12-29 Postalytics, Inc. Personal web address management system
US10789315B1 (en) * 2017-07-19 2020-09-29 United Services Automobile Association (Usaa) Content curation application and graphical user interface
US10853697B2 (en) * 2018-08-28 2020-12-01 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monitoring online retail platform using artificial intelligence and fixing malfunction
US10834214B2 (en) 2018-09-04 2020-11-10 At&T Intellectual Property I, L.P. Separating intended and non-intended browsing traffic in browsing history
WO2020214658A1 (en) * 2019-04-16 2020-10-22 Litmus Software, Inc. Methods and systems for converting text to audio to improve electronic mail message design
CN110336790B (en) * 2019-05-29 2021-05-25 网宿科技股份有限公司 Website detection method and system
US11544653B2 (en) * 2019-06-24 2023-01-03 Overstock.Com, Inc. System and method for improving product catalog representations based on product catalog adherence scores
US11328241B2 (en) 2020-07-02 2022-05-10 Content Square SAS Identifying script errors in an online retail platform and quantifying such errors
EP4176352A1 (en) * 2020-07-02 2023-05-10 Content Square SAS Identifying script errors in an online retail platform and quantifying such errors
CN113315670B (en) * 2021-07-28 2021-10-01 深圳市华球通网络有限公司 Network flow analysis method, device and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999929A (en) * 1997-09-29 1999-12-07 Continuum Software, Inc World wide web link referral system and method for generating and providing related links for links identified in web pages
US6253204B1 (en) * 1997-12-17 2001-06-26 Sun Microsystems, Inc. Restoring broken links utilizing a spider process
US6393479B1 (en) * 1999-06-04 2002-05-21 Webside Story, Inc. Internet website traffic flow analysis
US6697969B1 (en) * 1999-09-01 2004-02-24 International Business Machines Corporation Method, system, and program for diagnosing a computer in a network system
US6792458B1 (en) * 1999-10-04 2004-09-14 Urchin Software Corporation System and method for monitoring and analyzing internet traffic
US20020013782A1 (en) * 2000-02-18 2002-01-31 Daniel Ostroff Software program for internet information retrieval, analysis and presentation
WO2001063486A2 (en) * 2000-02-24 2001-08-30 Findbase, L.L.C. Method and system for extracting, analyzing, storing, comparing and reporting on data stored in web and/or other network repositories and apparatus to detect, prevent and obfuscate information removal from information servers
US6591266B1 (en) * 2000-07-14 2003-07-08 Nec Corporation System and method for intelligent caching and refresh of dynamically generated and static web content
AU2002321795A1 (en) * 2001-07-27 2003-02-17 Quigo Technologies Inc. System and method for automated tracking and analysis of document usage
US7047291B2 (en) * 2002-04-11 2006-05-16 International Business Machines Corporation System for correlating events generated by application and component probes when performance problems are identified
US20030208594A1 (en) * 2002-05-06 2003-11-06 Urchin Software Corporation. System and method for tracking unique visitors to a website

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114167801A (en) * 2021-12-06 2022-03-11 中成卓越(北京)厨房设备有限公司 Kitchen equipment management system based on linkage control
CN114167801B (en) * 2021-12-06 2022-07-08 中成卓越(北京)厨房设备有限公司 Kitchen equipment management system based on linkage control

Also Published As

Publication number Publication date
US20040158429A1 (en) 2004-08-12
US7624173B2 (en) 2009-11-24
CA2418568C (en) 2011-10-11

Similar Documents

Publication Publication Date Title
CA2418568A1 (en) Method and system for classifying content and prioritizing web site content issues
CN107040863B (en) Real-time service recommendation method and system
US6741990B2 (en) System and method for efficient and adaptive web accesses filtering
US6317787B1 (en) System and method for analyzing web-server log files
CN107087001B (en) distributed internet important address space retrieval system
US6360261B1 (en) System and method for analyzing remote traffic data in distributed computing environment
CN100456286C (en) Universal file search system and method
CN105159964A (en) Log monitoring method and system
CN102918534A (en) Query pipeline
CN103297435A (en) Abnormal access behavior detection method and system on basis of WEB logs
CN102208991A (en) Blog processing method, device and system
CN111882367B (en) Method for monitoring and tracking online advertisements through analysis of user surfing behavior
CN102710795A (en) Hotspot collecting method and device
CN111740868B (en) Alarm data processing method and device and storage medium
EP1785841A3 (en) Database for multiple implementation of http to obtain information from devices
CN110061931B (en) Industrial control protocol clustering method, device and system and computer storage medium
CN112181931A (en) Big data system link tracking method and electronic equipment
KR20210083936A (en) System for collecting cyber threat information
CN112261645A (en) Mobile application fingerprint automatic extraction method and system based on grouping and domain division
CN112822153A (en) Method and system for discovering suspicious threats based on DNS log
CN104660438A (en) Problem positioning processing method and device
CN102347930A (en) Method and system for obtaining webpage content
CN108965011A (en) One kind being based on intelligent gateway deep packet inspection system and analysis method
CN110011860A (en) Android application and identification method based on network traffic analysis
CN111031025B (en) Method and device for automatically detecting and verifying Webshell

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20230210