|Publication number||US7574382 B1|
|Application number||US 10/910,457|
|Publication date||Aug 11, 2009|
|Filing date||Aug 3, 2004|
|Priority date||Aug 3, 2004|
|Publication number||10910457, 910457, US 7574382 B1, US 7574382B1, US-B1-7574382, US7574382 B1, US7574382B1|
|Inventors||Zachary T. Hubert|
|Original Assignee||Amazon Technologies, Inc.|
1. Field of the Invention
The present invention relates to computer-implemented processes for efficiently detecting anomalous user activity associated with specific items, such as items in an electronic catalog. The detected anomalies may, for example, be attributable to, and may be used to correct, errors in an electronic catalog.
2. Description of the Related Art
It has become common for businesses to set up web sites, and other types of interactive computer systems, to automate the process of accepting orders from users. Information about the items that can be ordered via such a system is typically disseminated to users via a browsable electronic catalog. While browsing the electronic catalog, users can typically select one or more items to purchase, rent, or otherwise acquire, and then place an order for these items. The ordered items may, for example, be shipped to the user from a distribution center, made available for local pick-up, or transmitted to the user electronically.
One problem with this type of system is that a large number of users can rely on, or take advantage of, a typographical or other error in the electronic catalog before the error is detected and corrected by authorized personnel. As a result, a single error, such as an error in the price of an item, can result in a significant loss of revenue to an online merchant. One potential solution to this problem is to set up a computer system that analyzes each order to evaluate whether it represents a significant departure from current trends. Due to the computational burden associated with this approach, however, it is not well suited for systems that process large numbers of orders (e.g., hundreds or thousands of orders per minute) placed from a catalog that includes a large number of items (e.g., millions of items).
The present invention comprises a system that detects anomalous user activity associated with specific items in an electronic catalog. The system may, for example, be implemented using a computer system, such as a general-purpose computer, that passively monitors orders placed by users of the electronic catalog. The system is suitable for use in an electronic catalog system that, for example, receives thousands of orders per minute from a catalog that includes millions of items.
In one embodiment, the system includes a data repository that stores aggregated data about orders placed from an electronic catalog. The aggregated data may be arranged by time period, where each time period may, for example, have a duration of one hour. To analyze the aggregated data associated with a current time period (e.g., the last hour), an analyzer selects, from a set of items ordered during the current time period, a subset of items for which to conduct an anomaly analysis. The subset may, for example, be selected based on the quantity of each item ordered during the current time period and/or other criteria. By limiting the analysis to a selected subset of items, the analyzer controls the processing load associated with the anomaly detection process.
For each item in the subset, the analyzer uses order volume data from prior time periods to generate a forecasted or expected order volume for the current time period. An exponential smoothing algorithm may be used for this purpose. In one embodiment, the order volume for each item is specified in terms of the total quantity of the item ordered in the relevant time period, although other metrics reflective of the demand for the item, such as total number of distinct users that order the item, or total number of orders received for one or more units of the item, may additionally or alternatively be used. To determine whether an item's order activity or demand during the current time period is anomalous, the actual order volume associated with the item is compared to the item's forecasted order volume. Other criteria, such as the number of distinct users that ordered the item during the current time period, may also be taken into consideration.
If the analyzer determines that an anomaly exists in the order activity data for a given item, an alert message is generated and sent to an associated catalog administrator, such as an administrator responsible for a corresponding product category. The alert message may include a hyperlink to an associated catalog page to enable the administrator to efficiently evaluate whether the detected anomaly is attributable to an erroneous catalog description of the item. The alert message may also provide an option (e.g., a set of buttons or links) for the message recipient to provide feedback on whether the anomaly was properly detected. In embodiments that provide such a feedback option, the feedback may be used, on an item-by-item or other basis, to adaptively adjust the sensitivity of an anomaly detection algorithm used by the analyzer.
The invention may also be used where some or all of the orders are placed without the use of an electronic catalog. For example, the invention is applicable to systems that accept orders from recipients of a paper catalog that describes items that can be purchased.
One aspect of the invention is thus a system for detecting anomalous user activity associated with items in a catalog. The system comprises a data repository that stores aggregated data descriptive of orders placed by users from a catalog of items, with the aggregated data arranged by time period. A forecasting module analyzes item demand levels in prior time periods on an item-by-item basis, as indicated by the aggregated data, to predict demand levels for respective items in a current time period. The item demand levels may, for example, be measured and predicted in terms of total quantity of item ordered per time period. An anomaly detection module detects anomalies associated with specific items in the catalog, at least in part, by comparing the demand levels predicted by the forecasting module to corresponding observed demand levels. A reporting module generates alert messages to notify catalog administrators of items for which anomalies are detected by the anomaly detection module.
Neither this summary nor the following detailed description purports to define the invention. The invention is defined by the claims.
As depicted in
Some or all of the information stored in the items database 46 for a given item is disseminated to users as part of the electronic catalog, such as on item detail pages of a web site. Updates to the catalog are made by updating the items database 46. The updates may include item additions and deletions, and changes to various item attributes (price, availability, description, photo, average customer review, etc.). The updates may come from various sources, such as catalog administrators, suppliers, merchants that sell items via the electronic catalog, or an inventory management system.
Errors in the item information supplied by any of the sources of item information may result in an error in the catalog. Examples of the types of errors that can occur include erroneous price information, erroneous availability information (e.g., a not-yet-released item is listed as being available), and erroneous descriptions of product features (e.g., a 2-megapixel camera is listed as a 4-megapixel camera). As discussed below, the anomaly detection engine 32 rapidly identifies anomalous user behavior suggestive of these and other types of catalog errors. The anomaly detection engine 32 may also be used to detect fraudulent user activity.
The order acquisition system 34 also includes a users database 50 that stores information about users that have registered with the system 30. The information stored for a given user may include, for example, a username and password, shipping information, payment information, and a history of orders placed by the user.
As illustrated in
The order processing pipeline 52 is responsible for collecting payments from users, such as by charging a user's credit card upon shipment of a set of ordered items. In the case of physical products, the order processing pipeline 52 may also select one or more distribution centers from which to ship the ordered items, and may provide associated messaging and order tracking for purposes of order fulfillment. In some embodiments, some or all of the orders may be fulfilled by a business entity other than the entity that operates the electronic catalog system 30. For instance, the electronic catalog system 30 may acquire orders and collect payments for many different merchants.
The primary components of the anomaly detection engine 32, in the illustrated embodiment, are a cache 60 that stores and aggregates information about recently placed orders, a listener 62 that populates the cache 60 as orders are placed by users, and an analyzer 64 that analyzes aggregated data stored in the cache to detect anomalous user behavior associated with specific catalog items. The anomaly detection engine 32 also includes an anomalies database 68 that stores information about detected anomalies. In addition, the anomaly detection engine 32 includes a reporting component 70 that sends alert messages to catalog administrators (represented by block 74, which depicts the computers of the administrators). The reporting component 70 may also provide functionality for administrators to interactively generate charts and reports of information stored in the anomalies database 68. The cache 60 and the anomalies database 68 may be implemented using any type of data repository.
In one embodiment, the anomaly detection engine 32 is implemented entirely within software executed by a single, general-purpose computer. Because the anomaly detection engine 32 uses highly efficient data processing algorithms, this single computer is capable of detecting anomalies substantially in real time with a sustained order rate of over $10^3$ orders per minute and a catalog size of over $10^8$ items. Although a single computer may be used, the anomaly detection engine 32 may alternatively be implemented using two or more computers.
The operation of the anomaly detection engine 32 will now be described with reference to
As depicted in
Each aggregation table 82 stores aggregated information about orders placed during a respective, one-hour time period, such that the orders placed during a single day are effectively divided among twenty-four one-hour “buckets.” Aggregation tables that represent smaller or larger time periods may alternatively be used. For example, time periods falling in the range of one minute to six hours, and more typically in the range of twenty minutes to three hours, may be used. Although multiple aggregation tables 82 are shown in
Each aggregation table 82 includes one entry (row) for each item ordered during the corresponding constituent time period. As illustrated, each such entry contains the ID of the item, the total quantity of that item ordered over the corresponding one-hour time period, and the number of distinct users that ordered the item during that time period. In one embodiment, aggregation tables 82 are maintained in the cache 60 for user activity occurring over the preceding thirty days. As depicted in
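The hourly aggregation described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Order` record (user ID, item ID, quantity, timestamp) and represents each aggregation table as a dictionary mapping item ID to (total quantity, number of distinct users). These names and the in-memory layout are illustrative assumptions, not the format actually used by the cache 60.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Order:
    """Hypothetical order line: one item, one user, one timestamp."""
    user_id: str
    item_id: str
    quantity: int
    placed_at: datetime


def bucket_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its one-hour bucket."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_aggregation_tables(orders):
    """Return {bucket_start: {item_id: (total_quantity, distinct_users)}}."""
    quantities = defaultdict(lambda: defaultdict(int))
    buyers = defaultdict(lambda: defaultdict(set))
    for order in orders:
        bucket = bucket_start(order.placed_at)
        quantities[bucket][order.item_id] += order.quantity
        buyers[bucket][order.item_id].add(order.user_id)
    # One row per item ordered during each one-hour period.
    return {
        bucket: {item: (qty, len(buyers[bucket][item]))
                 for item, qty in items.items()}
        for bucket, items in quantities.items()
    }
```

Truncating timestamps to the hour yields the twenty-four one-hour buckets per day. A different `bucket_start` function would give the shorter or longer time periods mentioned above.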
In some embodiments of the invention, the analyzer 64 takes item prices into consideration for purposes of detecting anomalies. In these embodiments, the cache manager 86 may also use the data read from the recent orders table to maintain an item price histories table 88. The item price histories table 88 may, for example, store a history of up to the last X (e.g., 3) price changes detected for each item in the catalog. Information about recent item prices, if used, may alternatively be obtained from another source.
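A bounded per-item history is one simple way to realize the item price histories table 88. The sketch below assumes that only price changes are recorded (repeated observations of the same price are ignored) and keeps the last X = 3 prices by default. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class PriceHistories:
    """Hypothetical price-history table: last `max_changes` prices per item."""

    def __init__(self, max_changes: int = 3):
        self.max_changes = max_changes
        # Each deque silently drops its oldest entry once full.
        self._hist = defaultdict(lambda: deque(maxlen=self.max_changes))

    def observe(self, item_id: str, price: float) -> None:
        """Record a price seen in a recent order, if it differs from the last one."""
        history = self._hist[item_id]
        if not history or history[-1] != price:
            history.append(price)

    def history(self, item_id: str) -> list:
        """Oldest-to-newest prices for the item (empty if never seen)."""
        return list(self._hist.get(item_id, ()))
```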
The analyzer 64 may be invoked each time a new aggregation table 82 is generated in order to search for anomalies in order activity data recorded therein. As illustrated in
The problem space reduction module 92 is responsible for selecting, from the set of items ordered during the current time period, a relatively small subset of items for which to conduct a forecasting and anomaly detection analysis. The purpose of the problem space reduction phase is to reduce the processing burden associated with the forecasting and anomaly detection phases to an acceptable level, such as a level which permits the analysis of a one-hour bucket to be completed in less than one hour. In one embodiment, the problem space reduction module 92 selects a total of N items from one or both of the following groups, where N is a selected integer such as 200 or 500:
Group 1 is based primarily on the assumption that the items for which the most serious catalog errors exist, such as severe pricing errors that are favorable to customers, will likely experience the highest levels of order activity. Group 2, on the other hand, focuses on relatively high cost, low volume items, since catalog errors associated with these items can be very costly even at relatively low volumes. Because the current price in the catalog may be erroneous, a recent item price is used in the calculation for group 2. The recent item price may be obtained from the item price histories table 88 or some other source of price information.
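The following Python sketch of the problem space reduction step rests on an assumption drawn from the description of the two groups above. Group 1 is taken to be the items with the highest quantity ordered in the current period. Group 2 is taken to be the items with the highest quantity multiplied by a recent price, so that expensive items can qualify at modest volumes. The function name, the default split of the N slots, and the deduplication rule are also hypothetical.

```python
import heapq


def select_items(current, recent_price, n=200, n_volume=None):
    """Pick up to n item IDs for forecasting and anomaly detection.

    current:      {item_id: total quantity ordered in the current period}
    recent_price: {item_id: a recent price (may differ from the catalog price)}
    n_volume:     slots reserved for the volume-ranked group (assumed n // 2)
    """
    if n_volume is None:
        n_volume = n // 2
    # Assumed group 1: highest order quantity this period.
    chosen = heapq.nlargest(n_volume, current, key=current.get)
    seen = set(chosen)

    # Assumed group 2: highest quantity x recent price (monetary exposure).
    def exposure(item):
        return current[item] * recent_price.get(item, 0.0)

    for item in heapq.nlargest(n, current, key=exposure):
        if len(chosen) >= n:
            break
        if item not in seen:
            chosen.append(item)
            seen.add(item)
    return chosen
```

With quantities `{"A": 50, "B": 40, "C": 2, "D": 1}` and prices `{"A": 5.0, "B": 4.0, "C": 900.0, "D": 20.0}`, choosing three items with two volume slots selects A and B by volume and then C, whose two units at 900 carry the largest exposure.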
In embodiments in which order volumes are sufficiently low, and/or computing resources are sufficiently high, the anomaly analysis may be performed in connection with all items ordered during the current time period. In such embodiments, the problem space reduction module 92 may be omitted or disabled.
As depicted in
In one embodiment, the forecasting module 94 uses an exponential smoothing algorithm, such as a single, double or triple exponential smoothing algorithm, to generate the forecasted item quantities. Exponential smoothing algorithms give exponentially decreasing weight to data values from progressively earlier time periods. Thus, for example, to predict an item's order quantity for the current time period, or “t,” the greatest weight would be given to the item's quantity value from the immediately preceding time period, t−1, and exponentially decreasing weight would be given to the quantity values from time periods t−2, t−3, and so on. Although an exponential smoothing algorithm is used in the illustrated embodiment, other types of time series forecasting algorithms may be used, such as single and double moving average, Holt-Winters, and multiple linear regression algorithms.
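The exponentially decreasing weights can be seen in a minimal single exponential smoothing sketch. The initialization (the first observed quantity) and the default α are assumptions, since neither is specified above.

```python
def ses_forecast(history, alpha=0.5):
    """Forecast the current period from prior-period quantities (oldest first)."""
    if not history:
        raise ValueError("need at least one prior period")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(history[0])  # assumed initialization
    for y in history[1:]:
        # New level = alpha * latest observation + (1 - alpha) * old level.
        # An observation k periods before the most recent one therefore ends
        # up with weight alpha * (1 - alpha) ** k; the first observation
        # keeps the remaining weight.
        level = alpha * y + (1 - alpha) * level
    return level
```

With α = 0.5, a single unit of demand contributes 0.5, 0.25, or 0.125 to the forecast depending on whether it occurred in period t−1, t−2, or t−3. Each step back halves its influence.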
Referring again to
In one embodiment, the anomaly detection module 96 uses a set of one or more thresholds to determine, for each selected item, whether an anomaly exists. By way of example and not limitation, an anomaly may be deemed to exist if and only if the following three conditions are met:
1. actual quantity / forecasted quantity > 1.2;
2. actual quantity > 5; and
3. actual quantity × recent price > $1000.
The second of these three conditions filters out those items for which the low volume of orders is likely to produce statistically inaccurate forecasting results. The third condition filters out those items for which the potential monetary loss over the current time period falls below a selected threshold. The actual threshold values used for these and other conditions may vary by type or category of product. In addition, different thresholds may be used based on the time of day (e.g., greater variations may be permitted during peak periods).
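The three example conditions translate directly into a single predicate. In this Python sketch the default thresholds are the example values given above, and they are exposed as parameters so that they could vary by product category or time of day. The handling of a zero forecast is an assumption, since the text does not address it.

```python
def is_anomalous(actual_qty, forecast_qty, recent_price,
                 ratio_min=1.2, qty_min=5, exposure_min=1000.0):
    """Return True only if all three example conditions hold."""
    if forecast_qty <= 0:
        # Assumption: with no forecasted demand, any actual demand counts as
        # exceeding the ratio; the other two conditions still apply.
        ratio_ok = actual_qty > 0
    else:
        ratio_ok = actual_qty / forecast_qty > ratio_min   # condition 1
    return (ratio_ok
            and actual_qty > qty_min                       # condition 2
            and actual_qty * recent_price > exposure_min)  # condition 3
```

For example, 30 units against a forecast of 10 at a recent price of $50 passes all three tests. The same 30 units against a forecast of 28 fail the ratio test, and 10 units at $50 fail the $1000 exposure test.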
In another embodiment, a scoring algorithm is used to generate a respective score for each of the N selected catalog items. By way of example and not limitation, a score may be generated for each item according to the following equation:
$$\text{score} = 10 \times \frac{\text{actual quantity}}{\text{forecasted quantity}} + 10 \times (\text{no. of distinct users who order the item}) + 100 \times (\text{avg. order size}) \tag{1}$$
The score may be compared to one or more thresholds to evaluate whether, or the extent to which, the associated user activity is anomalous. For example, scores from 0 to 500 may be treated as normal, scores above 500 and up to 1000 as revealing a medium-risk anomaly, and scores above 1000 as revealing a high-risk anomaly.
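Equation 1 and the example risk bands can be sketched in Python as follows. "Avg. order size" is assumed here to mean the average number of units per order. Rejecting a non-positive forecast is also an assumption, since the equation is undefined in that case.

```python
def anomaly_score(actual_qty, forecast_qty, distinct_users, avg_order_size):
    """Equation 1; avg_order_size is assumed to be units per order."""
    if forecast_qty <= 0:
        raise ValueError("forecasted quantity must be positive")
    return (10 * (actual_qty / forecast_qty)
            + 10 * distinct_users
            + 100 * avg_order_size)


def risk_level(score):
    """Example bands: 0-500 normal, above 500 up to 1000 medium, above 1000 high."""
    if score <= 500:
        return "normal"
    if score <= 1000:
        return "medium"
    return "high"
```

As a worked example, 40 units ordered against a forecast of 10 by 20 distinct users averaging 2 units per order scores 40 + 200 + 200 = 440, which falls in the normal band. With the weights in Equation 1, the number of distinct users and the order size can outweigh a fourfold rise over the forecast.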
As discussed below, the anomaly detection module 96 may also use a relevance feedback algorithm to adapt to the feedback provided by human operators.
As further illustrated in
As depicted in
Upon receiving an alert message, the catalog administrator can determine whether an error exists in the item's catalog description, such as by viewing the item's detail page. If an error is found, the administrator can take an appropriate corrective action, such as correcting the error in the catalog, and possibly blocking pending orders for the relevant item from being fulfilled. (Assuming one-hour time intervals are used, the anomaly is typically reported within one hour of its occurrence, allowing pending orders placed at the time of the anomaly to be blocked.) In some embodiments, the task of checking for and correcting the associated catalog error may be partially or fully automated.
In the example shown in
In step 100 of
In step 102, which corresponds to the problem space reduction block in
In step 106, an exponential smoothing algorithm is applied to the current item's aggregation table data (quantity values) from prior time periods to calculate the forecasted quantity for the current time period. This step may optionally be performed before the end of the current time period because it relies solely on data from prior time periods. For example, before the end of the current time period, forecasted quantities may be calculated for those items that, based on the activity that has already occurred during the current time period, are predicted to be included in the set of N items. Forecasts for any additional items that end up being selected in step 102 can then be generated at the end of the current time period.
If a double exponential smoothing algorithm is used in step 106, the forecast may be made using the following equations, where $F_{t+1}$ is the forecast for time period $t+1$, $y_t$ is the actual observation for time period $t$, $S_t$ and $b_t$ are the smoothed level and trend estimates for time period $t$, and α and γ are smoothing constants between 0 and 1.

$$F_{t+1} = S_t + b_t \tag{2}$$

$$S_t = \alpha y_t + (1-\alpha)(S_{t-1} + b_{t-1}) \tag{3}$$

$$b_t = \gamma (S_t - S_{t-1}) + (1-\gamma) b_{t-1} \tag{4}$$
In one embodiment, a value of 0.8 is used for each of α and γ. In another embodiment, the forecasting module 94 iteratively searches, for each item, for the α and γ values that produce a “best match” between the double exponential smoothing curve and the associated time series of observed quantity values. The values yielding the best match (lowest error) are then used to generate the forecasted quantity for that item.
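A minimal Python sketch of Equations 2 through 4, together with the per-item “best match” search, follows. Several details are assumptions because the text does not specify them: the initialization $S_1 = y_1$, $b_1 = y_2 - y_1$; the use of the sum of squared one-step-ahead errors as the error measure; and the 0.1-step grid of candidate α and γ values.

```python
from itertools import product


def holt_forecast(y, alpha=0.8, gamma=0.8):
    """Double exponential smoothing per Equations 2-4.

    y: observed quantities y_1..y_t, oldest first (at least two values).
    Returns (F_{t+1}, sse), where sse is the sum of squared one-step-ahead
    errors over y_2..y_t.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: S_1 = y_1 and b_1 = y_2 - y_1.
    s, b = float(y[0]), float(y[1] - y[0])
    sse = 0.0
    for obs in y[1:]:
        forecast = s + b                               # Equation 2
        sse += (obs - forecast) ** 2
        s_prev = s
        s = alpha * obs + (1 - alpha) * (s_prev + b)   # Equation 3
        b = gamma * (s - s_prev) + (1 - gamma) * b     # Equation 4
    return s + b, sse


GRID = tuple(i / 10 for i in range(1, 10))  # hypothetical grid: 0.1 ... 0.9


def fit_holt(y, grid=GRID):
    """Choose (alpha, gamma) with the lowest in-sample error, then forecast."""
    best = min(product(grid, grid), key=lambda ag: holt_forecast(y, *ag)[1])
    return best, holt_forecast(y, *best)[0]
```

On a perfectly linear series such as 2, 4, 6, 8, every (α, γ) pair reproduces the trend and forecasts 10. The search only matters for noisy histories, where different smoothing constants trade responsiveness against stability.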
In step 108, the forecasted and actual quantity values, and optionally other types of data, are used to evaluate whether an anomaly exists in the current item's order data. This evaluation may be performed using one of the methods described above or another appropriate method, and may optionally take into consideration prior feedback provided by catalog administrators. If an anomaly is detected in step 108, it is recorded in the anomalies database 68 as depicted in step 110, and an alert message is generated and sent to a catalog administrator.
As will be apparent, steps 106 (forecasting) and 108 (anomaly detection) may, in practice, be combined. For example, the two steps may be embodied within a single formula or function that generates a yes/no response based on the item's actual quantity values for the current and prior time periods.
As mentioned above, one possible variation to the illustrated embodiment is to forecast and compare the number of distinct users that order the item, rather than (or in addition to) forecasting and comparing the total item quantity. Specifically, in step 106, the number of distinct users that acquired the current item in prior time periods can be used to predict the number of distinct users for the current period. This number can then be compared, in step 108, to the actual number of distinct users that acquired the item during the current time period. With this variation, all of the components depicted in
As depicted by the loop that includes step 114, steps 106-108 are repeated for each additional item in the set of N items until the last item is reached in step 112. The order data stored in the recent items table 80 for the current time period may then be purged, as shown in step 116.
Numerous variations to the approach shown in
As will be appreciated from the foregoing, the disclosed architecture can easily be scaled by adding computers. For example, assuming a single computer is initially used to implement the anomaly detection engine 32, the number of items for which an anomaly analysis is conducted each time period can be approximately doubled by adding a second computer. This second computer can be a replicated version of the first computer (i.e., can include all of the components and modules shown in block 62 of
The invention may also be applied where some or all of the orders are placed without the use of an electronic catalog. For example, the invention is applicable to systems that accept orders from recipients of a paper catalog that describes items that can be ordered. To select an item to order in such a system, the user may, for example, scan a corresponding bar code label from the paper catalog using a PDA or a digital pen, or may specify a product identifier using a computer keyboard, a telephone keypad, or automated voice recognition. The components and algorithms used in such paper-catalog-based embodiments may be substantially the same as those shown in the drawings and described above. The invention may also be used in systems that accept orders placed from electronic catalogs that are distributed by CD, DVD, disk, tape, or other types of information storage medium.
Although this invention has been described in terms of certain specific embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments that do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is defined only by the appended claims.
|Cooperative Classification||G06Q30/0623, G06Q10/00|
|European Classification||G06Q30/0623, G06Q10/00|
|Dec 8, 2008||AS||Assignment|
Owner name: AMAZON TECHNOLOGIES, INC., NEVADA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HUBERT, ZACHARY T.;REEL/FRAME:021941/0028
Effective date: 20040803
|Sep 7, 2010||CC||Certificate of correction|
|Feb 11, 2013||FPAY||Fee payment|
Year of fee payment: 4