US 20080126303 A1
A method comprising receiving a search query to generate a search result of one or more media items and providing a personalized search rank of the one or more media items on the basis of a user profile and an item relevance for a given media item with regard to the query. Metadata associated with the search result is identified and used to identify at least one related media item. The at least one related media item is ranked on the basis of the user profile and the metadata.
1. A method comprising:
receiving a search query to generate a search result of one or more media items;
providing a personalized search rank of the one or more media items on the basis of a user profile and an item relevance for a given media item with regard to the query;
identifying metadata associated with the search result;
using the metadata to identify at least one related media item; and
ranking the at least one related media item on the basis of the user profile and the metadata.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
9. The method of
10. The method of
11. A system comprising:
a search engine for receiving a search query and generating a search result of one or more media items, the search result organized in accordance with a personalized search rank on the basis of a user profile and an item relevance for a given media item with regard to the query;
a user preference agent coupled to the search engine for identifying metadata associated with the search result and for using the metadata to identify at least one related media item; and
a ranking component coupled to the user preference agent for ranking the search result and for ranking the at least one related media item in accordance with the user profile and the metadata.
12. The system of
13. The system of
14. The system of
15. The system of
16. The system of
17. The system of
18. The system of
19. The system of
20. The system of
21. A system comprising:
means for receiving a search query to generate a search result of one or more media items;
means for providing a personalized search rank of the one or more media items on the basis of a user profile and an item relevance for a given media item with regard to the query;
means for identifying metadata associated with the search result;
means for using the metadata to identify at least one related media item; and
means for ranking the at least one related media item on the basis of the user profile and the metadata.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
The technical field generally relates to indexing and ranking media content items and providing a search engine for locating media content items and related media content items.
Search engines are available that allow users to search through specialized sets of electronic documents/files, such as electronic movie content. These search engines that provide search functionality of specialized media documents are referred to herein as media content search engines. Media content search engines, such as the Internet Movie Database and Yahoo! Movies, provide databases of movies and movie information that are searchable using search terms, such as keywords found in a movie title, actor name, or director name. Such media content search engines are useful when a user knows something about the media content that they are attempting to locate.
When a keyword is known, a user may enter the keyword in the search engine, which will match media content items in the media content database and return search results. In conventional media content search engines, the search results are typically ranked by search matching score. If a searched keyword is a general term or a term marginally related to the media content, the user may be presented with a list of largely irrelevant search results.
Some conventional media search engines, including the Internet Movie Database, present search results together with lists of links to related media content. Such related media content includes, for example, lists of links to information about actors appearing in a movie and to other recommended movies. However, the determination of such lists of links does not account for the relevance of the related media content to the user's search, and the lists are not ranked in a meaningful way. A list of links may therefore include links to media content that the user deems irrelevant. In this scenario, a user may have to navigate through many unhelpful links, or may be unable to locate desired media content.
Systems and methods search for media content items, such as movies, television programs or other audio or video content, as well as information regarding the media content items and related media content items. According to some embodiments, a user using a client device such as a computer, telephone, personal digital assistant, television set top box, or other client device, submits search terms using a user interface that the client provides for supplying one or more search terms. For example, a user may enter search terms into a web page or other interface that a search engine provides. The client device communicates the search terms over a network to a server. The server is communicatively coupled to a database containing media content items, as well as metadata relating to the media content items. Metadata relating to media content may include, for example, data identifying the media content, such as title, characters, plot, genre, crew, actors, ratings, reviews and other media content metadata.
The server searches the database for media content items on the basis of the search query. The server may rank results from the search query in accordance with various parameters, such as item relevance against the search query, an explicit rating (a rating specifically assigned to a media content item or some aspect of media content by the searcher), a predicted rating (an expected preference of the searcher to the media content item or some aspect of media content, which is calculated by the server on the basis of preference information in one or more user profiles), a global, community or third party rating, other parameters, or combinations of these parameters. The ranked search results are returned to the client device together with links to related media content items, which may alternatively or additionally include information regarding related media content items. The links to related media content items may also be ranked in accordance with the various parameters.
According to one embodiment, a list of search results is provided in ranked order and the search results are presented with links to one or more related or recommended media content items, such as movies connected by common collaborators. Examples of movies with common collaborators include movies having at least one common actor, director, or other cast or crewmember with the selected media content item. In some embodiments, a user's use of the search results may be tracked and incorporated into the user preference profile.
The invention is illustrated in the figures of the accompanying drawings which are meant to be exemplary and not limiting, in which like references are intended to refer to like or corresponding parts, and in which:
The client device 50, which may include one or more client devices in communication with the server 70 over the network 60, may be any device capable of receiving data from a user and transmitting the data to the server 70, as well as capable of receiving data from the server 70 and displaying or otherwise communicating the received data to the user. A client device 50 may include a personal computer, a television set top box, a mobile telephone, personal digital assistant (PDA), or other device.
The network 60 may be any network capable of communicating data from the client device 50 to the server 70, such as the Internet, a local area network (LAN), a wide area network (WAN), another network, or combinations thereof.
Possibly offline, the metadata agent 80 parses information describing media content, which the metadata agent 80 may receive from one or more content sources 42 and 44. According to one embodiment, the content sources 42 and 44 may include the Yahoo! Movies database and/or the IMDB movies database, each of which contains information describing a given media content item, including title, casting information, explicit user ratings, etc. The metadata agent 80 associates media content with metadata relating to the media content. For example, the metadata agent 80 may store data relating to a movie in association with its metadata such as its title, cast, characters, director, producer, genre, popularity, plots, general ratings, MPAA ratings, award information, release date, the summary of reviews from critics and users, theater information, captions, ranking information or other metadata relating to or describing the media content. In addition to obtaining metadata regarding media content items from one or more content sources 42 and 44, the metadata agent 80 may obtain metadata from other sources including, for example, from the given media content item itself, public and private databases, online resources, or other sources.
The metadata agent 80 stores the metadata in the data store 100. The data store 100 may be any data structure capable of storing data in an organized and structured fashion. According to some embodiments, the data store 100 includes data and metadata relating to media content items, such as movies, television programs, or other audio and/or video media content. The data in the data store 100 may be obtained from one or more data sources, such as publishers, recording companies, production companies, reviewers or other sources of media content. The data store 100 may be a relational database in which data relating to each media content item is stored in association with related data, such as data indicating another related media content item. The data store may be a tab-delimited data store, a comma delimited data store, an object database, a hybrid object-relational database and/or other data stores known to those of skill in the art.
The rating agent 90 generates non-personalized overall ratings for classes of items that the data store 100 contains: movies, directors and actors. The rating agent 90 may also aggregate a community profile for a media content item relating to ratings from a community of users, such as an explicit rating provided by one or more users within a community of users. In one embodiment, a community preference profile is an aggregation of one or more user preference profiles 45. The rating agent 90 stores these non-personalized ratings in the data store 100 for use by other components of the system 10 when an explicit or predicted rating is unavailable for use in ranking a given media content item.
In one embodiment, the graph generator 110 is operative to generate two types of graphs from metadata and rating information. The first graph is referred to as the “movie-casting graph”, which relates actors, directors, cast members, etc. and one or more movies. The second graph is referred to as the “movie-user” graph, which relates a plurality of movies on the basis of preference and rating information that other components of the system 10 generate, such that given a first movie, the system 10 may determine one or more movies that relate to the first movie. According to one embodiment of the invention, the graph generator 110 generates the movie-user graph on a per-user basis whereby graphs for various users may be related, e.g., when two users rate the same movie, actor, director, etc. Alternatively or additionally, the graph generator 110 may generate one or more aggregate user-movie graphs for one or more communities or groups of users. The graph generator 110 stores the graphs in the data store 100.
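By way of illustration only, the movie-casting graph may be sketched in Python as a bipartite mapping between movies and people. The function names, the `(movie, person)` input format, and the placeholder titles below are assumptions made for this sketch and are not taken from the disclosure; the sketch only shows how such a graph allows movies connected by common collaborators to be located.

```python
from collections import defaultdict


def build_casting_graph(credits):
    """Build a bipartite movie<->person graph.

    `credits` is a hypothetical input format: an iterable of
    (movie_title, person_name) pairs.
    """
    movie_to_people = defaultdict(set)
    person_to_movies = defaultdict(set)
    for movie, person in credits:
        movie_to_people[movie].add(person)
        person_to_movies[person].add(movie)
    return movie_to_people, person_to_movies


def connected_movies(movie, movie_to_people, person_to_movies):
    """Map each movie sharing a collaborator with `movie` to the shared people."""
    shared = defaultdict(set)
    for person in movie_to_people.get(movie, ()):
        for other in person_to_movies[person]:
            if other != movie:
                shared[other].add(person)
    return dict(shared)


# Placeholder data, not drawn from any real database.
credits = [
    ("Movie A", "Actor X"),
    ("Movie A", "Director Y"),
    ("Movie B", "Actor X"),
    ("Movie C", "Director Y"),
    ("Movie C", "Actor X"),
    ("Movie D", "Actor Z"),
]
m2p, p2m = build_casting_graph(credits)
related = connected_movies("Movie A", m2p, p2m)
```

Keeping both directions of the mapping makes the lookup a two-hop traversal (movie to people, people to movies), which is one simple way to realize the relation described above.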
The user preference agent 85 tracks, maintains, stores and otherwise utilizes user preferences in one or more preference profiles 45 that the profile data store 95 maintains, including maintenance of explicit ratings for media content items. In some embodiments, the user preference agent 85 tracks data relating to a user, such as a user's search query history, a user's explicit preferences, user-defined ratings and/or other information relating to media content items, including ratings for or indications of interest in movies, actors, or genres of movies, and movies viewed or visited by a user. Preference information may be stored in one or more profiles 45 in the profile data store 95.
The user preference agent 85 stores data relating to each of a user's prior search queries. In some embodiments, user search queries are stored in one or more profiles 45 in association with data relating to a user's subsequent use of the search query results. For example, data relating to a selection of a particular item in a result set generated based on a user search query may be stored in association with the search terms of the query. Such selection information may provide, for example, an indication of user interest in the particular result (e.g., a rating), which may be used when ranking results. In addition to maintaining profiles 45 for individual users, the profile data store 95 may provide persistent storage for community or group profiles 45 indicating the preferences for a group or community of users.
The user preference agent 85 also provides other personalization features. The user preference agent 85 analyzes query and interaction information (e.g., how the user interacts with items in a result set) in the user profiles 45 to provide personal information to a user. For example, the user preference agent 85 may report which search terms appear most frequently in user queries, which movies and people the user visited most, either directly or indirectly, the user's favorite genres, etc. The user preference agent 85 may also provide a user with personalized top recommendations, for example, the top ten rated movies, directors, etc. for a given user. Alternatively or in conjunction, top recommendations may be provided across a community or group of users. According to one embodiment, the user preference agent 85 may generate recommendations for movies using an item-based collaborative filtering algorithm, whereas actor/director recommendations may be produced on the basis of the number of “A” rated movies, either explicit or predicted, in which the actor has starred.
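The actor/director recommendation described above may be sketched as a simple count. This is a minimal illustration under stated assumptions: ratings are taken to be letter grades, the set of grades treated as top rated is a parameter, and the function name and data layout are hypothetical rather than part of the disclosure.

```python
from collections import Counter


def recommend_people(movie_grades, movie_cast, top_grades=frozenset({"A"}), top_n=10):
    """Rank people by the number of top-rated movies they are credited in.

    movie_grades: {movie: grade}, explicit or predicted (assumed letter grades).
    movie_cast:   {movie: iterable of person names}.
    """
    counts = Counter()
    for movie, grade in movie_grades.items():
        if grade in top_grades:
            # set() ensures a person is counted once per movie.
            counts.update(set(movie_cast.get(movie, ())))
    # Highest count first; names break ties for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Placeholder data for illustration.
movie_grades = {"Movie A": "A", "Movie B": "A", "Movie C": "B", "Movie D": "A"}
movie_cast = {
    "Movie A": ["Actor X", "Actor Y"],
    "Movie B": ["Actor X"],
    "Movie C": ["Actor Y"],
    "Movie D": ["Actor Z"],
}
top_people = recommend_people(movie_grades, movie_cast)
```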
The client device 50 communicates data, such as search terms or criteria in a search query, to the server 70 via the network 60. The server 70 may be any device capable of performing a search in response to a query and supplying search results. In some embodiments, a user may enter search criteria into a user interface that one or more interface components 40 provide for display on the client device 50. The server 70 searches the data store 100 containing data related to media content for data satisfying the search criteria. As discussed in detail below, the ranking component 64 may rank the search results automatically in accordance with parameters, e.g., item relevance against the search query, a user preference profile, a community preference profile, other preferences, ratings or other parameters. The search results may be forwarded by the interface components 40 to the client device 50 via the network 60 together with links to related media content items or metadata regarding the same.
The server 70 provides one or more interface components 40 that provide controls to allow the client device 50 to interact with the server 70. The interface components 40 handle searches from clients 50 for media content items and/or metadata regarding the same, and presentation of media content items and/or metadata regarding the same. When the interface components 40 receive a query from a given client 50, the interface components 40 execute the query against information that the data store 100 maintains. The data store 100 retrieves matching items that it sends to the personalizer component 62 for rating, possibly only in the absence of explicit rating information. According to one embodiment, the data store 100 performs a matching AND search for movie data (returning matches that contain all the terms in a given query), and a matching OR search for person data (returning matches that contain any query terms in a given query), which may be an actor, director, other cast member, etc.
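The difference between the matching AND search for movie data and the matching OR search for person data may be illustrated as follows. The sketch assumes whitespace tokenization and in-memory lists of names; both are simplifications chosen for this example, and the function names are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def match_movies(query, titles):
    """AND search: keep titles that contain every query term."""
    terms = set(tokenize(query))
    if not terms:
        return []
    return [t for t in titles if terms <= set(tokenize(t))]


def match_people(query, names):
    """OR search: keep names that contain at least one query term."""
    terms = set(tokenize(query))
    return [n for n in names if terms & set(tokenize(n))]


# Placeholder data for illustration.
titles = ["The Red Planet", "Red Dawn", "Blue Planet"]
names = ["Jane Red", "John Planet", "Ann Blue"]
movie_hits = match_movies("red planet", titles)
person_hits = match_people("red planet", names)
```

With the query "red planet", only the title containing both terms survives the AND search, while any person name containing either term survives the OR search.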
The personalizer component 62 may utilize a user preference profile 45, which may be an individual preference profile, a community profile or other rating data, to provide actual or predicted rating information for a media content item that the data store 100 returns. The personalizer component 62 may associate rating information with media content and perform other rating operations.
The personalizer component 62 may generate predicted ratings for a given media content item and metadata associated with the given media content item, possibly based on a user preference profile or other parameters. In general, the personalizer component 62 may generate predicted ratings using an item-based collaborative filtering algorithm, a heuristic rating algorithm or another algorithm. See Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, “Item-based collaborative filtering recommendation algorithms,” Proceedings of the 10th International World Wide Web Conference, pp. 285-295, 2001, Hong Kong, China. In one embodiment, if no user preference profile data is available, the personalizer component 62 generates a predicted rating on the basis of an average user rating, the number of users who supplied a rating, an award score and an average critic rating. The personalizer component 62 may generate output indicating the rating data for each media content item rated by the personalizer component 62, as well as ratings for other metadata items (e.g., actors, directors, etc.).
The ranking component 64 gathers all rating information for one or more given media content items. Available rating information includes global rating information from the rating agent 90, explicit rating information that profiles 45 maintain and/or predicted rating information from the personalizer component 62. On the basis of rating information, the ranking component 64 dynamically generates a personalized ranking of the result set.
The ranking component 64 may perform ranking on the basis of a relevance judgment algorithm, which is a combination of authorities (e.g., ratings) and proximities (e.g., similarities between specific metadata for a given media content item and terms in a given query) of the items that the data store 100 returns. One example of the proximities of the returned items is the relevance or similarity between media content items (or data associated with a media content item, such as a title or character) and terms comprising a given search query. In some embodiments, proximity is measured in a logical manner. For example, proximity may be obtained by counting the number of bigrams and unigrams in a search query that appear or match data in a metadata field associated with a media content item, such as a title, actor name, crew name, or other metadata.
The system 10 may calculate a proximity score by combining bigram and unigram scores with a score for a match in other metadata fields, such as plot summary, character names or other metadata. In general, title and bigram matches may have greater weight than other matches. Some ranked search results, recommended media content, or other media content, are obtained using the bigram score of the matching search results. In some embodiments, if there is a tie in the bigram score, the system 10 uses the unigram score. If there is also a tie in the unigram score, the authorities or ratings of the matching search results are used to break the tie.
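The bigram/unigram proximity measure and its tie-breaking order may be sketched as follows. The sketch assumes that only the title field is matched and that each item carries a single numeric rating as its authority; the function names and sample data are hypothetical.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def proximity(query, field):
    """Count query bigrams and unigrams that appear in a metadata field."""
    q = query.lower().split()
    f = field.lower().split()
    field_bigrams = set(ngrams(f, 2))
    field_unigrams = set(f)
    bigram_score = sum(1 for b in ngrams(q, 2) if b in field_bigrams)
    unigram_score = sum(1 for u in q if u in field_unigrams)
    return bigram_score, unigram_score


def rank_by_proximity(query, items):
    """Sort (title, rating) pairs by bigram score, then unigram score,
    then rating, each in descending order."""
    def key(item):
        title, rating = item
        bigram_score, unigram_score = proximity(query, title)
        return (-bigram_score, -unigram_score, -rating)
    return sorted(items, key=key)


# Placeholder titles and ratings for illustration.
items = [
    ("Planet of the Red", 4.0),
    ("Red Planet", 3.0),
    ("Red Dawn", 5.0),
    ("Red Planet Rising", 4.5),
]
ranked_titles = [title for title, _ in rank_by_proximity("red planet", items)]
```

The two titles containing the bigram "red planet" rank first and are separated by rating; the title matching both unigrams but not the bigram comes next, and the title matching a single unigram comes last despite having the highest rating.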
In other embodiments, search results are ranked according to a combination of proximities (or relevance) and authorities (or ratings). For example, a search result ranking can be obtained using the formula presented in Table 1:
The authority value that the ranking component 64 uses in the formula of Table 1 may be calculated according to one or more types of authority, including an explicit rating, a predicted rating, and a global popularity. Explicit ratings are actual ratings entered by a user, which the user preference agent 85 captures for persistent storage in a user profile 45 in the profile data store 95. The personalizer component 62 calculates predicted ratings on the basis of a preference profile 45 for the user that the profile data store 95 maintains. For example, the personalizer component 62 may calculate predicted ratings of unrated movies according to an item-based collaborative filtering algorithm. When the ranking component 64 does not have an explicit rating for a media content item and cannot provide a predicted rating due to a lack of user preference information, the ranking component 64 may apply the global popularity that the rating agent 90 calculates on the basis of the cumulative preferences of all users of the system 10. The global popularity of a given media content item, “i”, may be the sum of one or more factors, including: an average user rating (avg(i)), the number of users who rated item i (n(i)), an award score, and an average critic rating. Thus, in some embodiments, global popularity may be calculated according to the formula of Table 2:
Users may also rate actors, directors, other cast members, etc., and the ranking component 64 may generate global popularity ratings regarding the same. In some embodiments, the system 10 provides global popularity of actors and directors, using factors including a normalized sum of the top movies for a given actor, top movies for a given director, etc. According to one embodiment, an individual who participates in a movie as both an actor and a director is counted only once. To avoid overemphasizing some actors, a credit line limitation may be used such that the movies counted for an actor are limited to those in which the actor played a major role, e.g., was credited as one of the top five actors in the movie.
The ranking component 64 may produce a ranked result set comprising links to media content items or information regarding the items. The interface components 40 may generate or receive controls for presentation to the user, e.g., other user interface elements known to those of skill in the art. The interface components 40 transmit the ranked result set and any additional user interface or client side components over the network 60 to the client 50. The client 50 receives the data and renders them on a display device (not pictured) for the user.
One embodiment of a method for searching media content items is presented by the flowchart of
The system receives the results of the search query and determines whether the user has a preference profile, step 202. In some embodiments, a preference profile may be obtained by querying a user preference agent or other system component for user preference information, which may be stored on a persistent storage device according to one or more user or group preference profiles. If the user does not have a preference profile, the system may calculate a global popularity for a given item in the result set, which may also be pre-computed and maintained with the given media content item, step 204. For example, in the case when a user does not have a preference profile, the search results are ranked according to a global popularity for a given media content item, which may include a consideration of a number of factors such as a community preference profile, general rankings, global ratings, relevance of terms, proximity of returned items, link characteristics of items in the results set, or other ranking or weighting factors. The result set is ranked according to the global popularity of items contained therein, step 206.
Where the check performed at step 202 evaluates to true and the user has a preference profile, a check is performed to determine if the user's preference profile contains explicit ratings for media content items in the result set, step 208. Where the user's preference profile does not contain an explicit rating for a media content item, step 208, the system calculates a predicted rating for the media content item, step 210. A predicted rating for a media content item may be calculated on the basis of one or more items of information contained in the user's preference profile. For example, where the user's preference profile indicates a high preference for science fiction movies, as well as a high preference for movies starring Keanu Reeves, the system predicts that the user will highly rate movies in which both of these preferences are present. The system ranks the result set according to the predicted ratings of items contained therein, step 212.
Where the check performed at step 208 evaluates to true, indicating that the user's preference profile contains explicit ratings for media content items in the result set, the result set is ranked according to the explicit ratings, step 214. The result set may be transmitted over the network from the system to a client device and displayed on the client device, step 216. In some embodiments, the search results may be transmitted together with data or links to data indicating other relevant media content, such as recommended movies, recommended actors, similar movies, or other related media content.
The system may track the user's search query, the user's use of the result set, such as items viewed, ordered, or otherwise utilized by the user, in addition to other user activity, step 218. The system stores such tracking information, e.g., using a user preferences agent, or other system component, in the user preference profile.
According to a variation of the method of
If a user has a preference profile, the system may weight or rank items in the result set in accordance with the user's preference profile. According to some embodiments, the system weights or ranks items in the result set in accordance with the preference profile as presented in the flow diagram illustrated in
The rating data may comprise explicit rating data assigned by a user, e.g., in a user preference profile, or rating data from other users, such as users in a community, or ratings from other entities, such as ratings of a rating body, a critic's ratings or other rating data. In addition, rating data may comprise predicted rating data, such as a user's predicted rating for a particular media content item, which may be based on past ratings or ratings for fields of metadata, e.g., genre. The predicted rating may be generated when a user has not explicitly rated a media content item, and may be based on parameters that include media content type, media content metadata, a user preference profile, a community preference profile, or other rating data.
The system receives a result set responsive to a search, step 320, and the search results are weighted in accordance with a ranking algorithm, step 330. In some embodiments, the search result ranking is automatically or partially automatically generated on the basis of parameters using a relevance judgment algorithm, or other weighting algorithm, which includes, for example, weighting a result on the basis of a general popularity of the media content item (e.g., based on a number of user votes for a particular media content item, or other indicator of popularity), relational inferences, personal ratings (explicit and predicted based on, e.g., a user preference profile), global ratings, community preference profile, and proximities of the returned items.
In some embodiments, the ranking algorithm of Table 3 may be used to rank media content items:
The system identifies metadata associated with each of the media content items included in the ranked search results, step 340. This may include, for example, identifying metadata associated with each media content item in a given result set, such as the actor, genre, plot summary, character or director of a movie included in the result set. This may further include identifying metadata associated with the media content item by querying a system data store or other system component. The system searches the data store for other media content items that match the identified metadata, step 350. The metadata search may provide results which include, for example, media content items associated with a particular actor, director, genre, plot summary, character or other metadata that is associated with the media content item in the ranked search results.
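Steps 340 and 350 may be sketched as a two-pass lookup: first collect the metadata values of the items in the result set, then scan the data store for other items sharing any of those values. The dictionary-based catalog, the field names, and the identifiers below are assumptions made for this example only.

```python
def related_items(result_ids, catalog, fields=("actors", "director", "genre")):
    """Find catalog items that share a metadata value with any result item.

    catalog: {item_id: {field: list of values}} (a hypothetical layout).
    Returns {item_id: {field: set of shared values}}.
    """
    result_ids = set(result_ids)

    # Step 340: identify metadata associated with the result set.
    wanted = {field: set() for field in fields}
    for item_id in result_ids:
        for field in fields:
            wanted[field].update(catalog[item_id].get(field, ()))

    # Step 350: search for other items matching the identified metadata.
    related = {}
    for item_id, meta in catalog.items():
        if item_id in result_ids:
            continue
        overlap = {}
        for field in fields:
            shared = wanted[field] & set(meta.get(field, ()))
            if shared:
                overlap[field] = shared
        if overlap:
            related[item_id] = overlap
    return related


# Placeholder catalog for illustration.
catalog = {
    "m1": {"actors": ["Actor X"], "director": ["Director Y"], "genre": ["sci-fi"]},
    "m2": {"actors": ["Actor X"], "director": ["Director Q"], "genre": ["drama"]},
    "m3": {"actors": ["Actor W"], "director": ["Director Q"], "genre": ["sci-fi"]},
    "m4": {"actors": ["Actor W"], "director": ["Director R"], "genre": ["comedy"]},
}
related_to_m1 = related_items(["m1"], catalog)
```

Returning the shared values alongside each related item keeps the reason for the relation available, which a later ranking step could use as one of its inputs.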
In addition, the metadata search may identify recommended or similar media content items based on media content items that have metadata related to metadata associated with the media content item included in a result set. Thus, the system provides a second set of search results, which includes the one or more media content items with metadata related to or matching the metadata associated with the media content included in the result set. In some embodiments, similar items of media content are obtained by analyzing metadata for media content items and a user preference profile using adjusted cosine similarity to calculate similarity, e.g., the similarity between movies i and j:
U_{i,j} and U_i denote a set of users who vote or rate movies i and j and a set of users who vote or rate movie i, respectively. v_{u,i} and
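The formula itself is not reproduced in this text. As a hedged illustration, the following Python sketch implements the adjusted cosine similarity as it is commonly stated in the item-based collaborative filtering literature (Sarwar et al.), in which each rating is offset by the rating user's own average and all sums run over users who rated both movies. The disclosed formula may differ, for example in the user sets used in the denominator. The function name and the nested-dictionary rating layout are assumptions.

```python
from math import sqrt


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    ratings: {user: {item: rating}} (a hypothetical layout).
    Sums run over users who rated both i and j; each rating is centered
    on that user's mean rating across all items the user rated.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean
            dj = user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        # No co-rating users, or no variation around the user means.
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))


# Placeholder ratings for illustration.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "c": 5},
}
sim_ab = adjusted_cosine(ratings, "a", "b")
sim_ac = adjusted_cosine(ratings, "a", "c")
```

In the sample data, movies "a" and "b" are rated above each user's average by the same users and so receive a positive similarity, while "a" and "c" are rated on opposite sides of each user's average and receive a negative one.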
The metadata search results are received by a system component, such as the metadata agent or other system component, step 360, and are weighted according to a ranking algorithm, step 370. The ranking algorithm may be one of the ranking algorithms discussed herein or another algorithm, and may include factors such as proximity, relevance or similarity, user preference profile, community preference profile, explicit ratings, predicted ratings, community ratings, global ratings, or other ratings. The ranking algorithm may request user input to select between or identify criteria for ranking the search results. For example, the user may value global popularity more than his or her previous explicit ratings.
As discussed herein, the system transmits the result set to a client device, which may include the transmission of media content items, links to media content items, metadata for media content items, etc. Examples of the list of search results and related metadata are presented to a user as presented in the user interfaces illustrated in
One exemplary user preference profile is presented in the screen diagram illustrated at
The user preference profile also includes a list of recommended media content items on the basis of user preferences. Recommended media content items 680 and/or recommended actors, directors, etc. 690 are presented as links to web pages with information regarding the selected media content item or person. In general, the recommended media content items and people (e.g., directors, producers, cast and crew members) are generated in accordance with parameters such as a user preference profile, community preference profile, ranking or rating data, or other parameters. For example, cast and crew recommendations are generated based on the number of “A−” or otherwise highly rated movies, whether explicitly rated or predicted to be so rated, with which the cast or crew member is associated.
The related media content items listed in connection with selected items from a result set, e.g., movies connected by common cast and crew member(s), similar movies, recommended movies, recommended cast/crew, etc., make it possible for a user to search for one media content item and quickly navigate through related media content, e.g., by selecting a web link. In this way, the user may obtain information about an unknown media content item, or about a media content item that the user knew of but had no search terms with which to locate.
An example of a result set generated in response to a search query and provided to a user is depicted in
According to the interface illustrated at
The similar movies list 670 may be obtained by analyzing metadata regarding media content items, a user preference profile or other rating information. The similar movies list 670 presents media content items that are similar to a selected or viewed item of media content based on one or more user profiles. A user is presented with media content items that are similar to the media content item selected or viewed, which provides an easy way to browse similar media content options to locate an item of interest.
As stated above, some embodiments may enable a user to choose how the search results may be ranked, e.g., whether to apply personalized ranking or global ranking. If personalized ranking is selected, the system may search for the user's explicit ratings for content items. If the user has explicitly rated a content item, that explicit rating becomes the rating of the content item. If not, the system may calculate a predicted rating of the content item for the given user, which becomes the rating of the content item for the given user. If the system fails to or cannot generate a predicted rating, then the system may use a non-personalized global rating of the content item as the rating for the given user.
The system may enable the user to select global ranking alone. If global ranking is selected, the system may use only the global ratings, e.g., not the explicit and/or predicted ratings, of content items to rank the result set.
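The selection between explicit, predicted, and global ratings described above may be sketched as follows. The function and parameter names are hypothetical, and the predictor is assumed to return None when it cannot produce a rating.

```python
def resolve_rating(item, explicit, predict, global_ratings, personalized=True):
    """Pick the rating used to rank an item for a given user.

    explicit: item id -> the user's explicit rating.
    predict: callable returning a predicted rating, or None on failure (assumed).
    global_ratings: item id -> non-personalized global rating.
    """
    if personalized:
        if item in explicit:
            return explicit[item]          # explicit rating takes priority
        predicted = predict(item)
        if predicted is not None:
            return predicted               # otherwise use the predicted rating
    # Global ranking selected, or no personalized rating is available.
    return global_ratings[item]


explicit = {"a": 5.0}
global_ratings = {"a": 2.0, "b": 2.5, "c": 4.0}
predict = lambda item: 3.5 if item == "b" else None

personalized = [resolve_rating(i, explicit, predict, global_ratings) for i in "abc"]
global_only = [resolve_rating(i, explicit, predict, global_ratings, personalized=False) for i in "abc"]
```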
Other embodiments may enable the user to select parameters to use for the ranking function. For example, the system may provide four ranking options, e.g., Web Relevance, DB Relevance, Rating, and MADRank.
If Web relevance is selected, the system may score an item based on the item's relative position in the web search results within, for example, movies.yahoo.com.
If DB relevance is selected, the system may score an item based on the database relevance score returned by MySQL. To improve DB relevance, another programming library, such as Apache Lucene (http://lucene.apache.org/java/docs/), which is a high-performance, full-featured text search engine library, can be used to calculate a better relevance between a query and an item.
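The description does not state how a relative position is converted into a score. Purely as an assumption, the sketch below maps the position linearly so that the top result scores 1.0 and an item absent from the web results scores 0.0; the function name and the mapping are hypothetical.

```python
def web_relevance(item_id, web_results):
    """Score an item by its position in an ordered list of web search results.

    The linear mapping used here is an assumed example, not taken from the
    description: the first result scores 1.0 and later results score less.
    """
    try:
        position = web_results.index(item_id)  # 0-based rank in the web results
    except ValueError:
        return 0.0                             # item not found in the web results
    return 1.0 - position / len(web_results)


results = ["x", "y", "z", "w"]
scores = [web_relevance(i, results) for i in ("x", "z", "q")]
```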
It will be appreciated that Web relevance and DB relevance may have the same result in both personalized and global ranking, since there are no personalized features in either.
If “Rating” is selected, the system may score an item based on the personalized ratings technique described above.
If MADRank is selected, then the system may score a given item based on the following equation: MADRank = (Rating + argmax(DB Relevance, Web Relevance))/2. When a user chooses “personalized ranking” and MADRank, then the system may compute the MADRank value where Rating is based on the personalized rating algorithm described above. If the user chooses global ranking and MADRank, then the system may compute the MADRank value where Rating is based on the global ranking algorithm discussed above.
Although not shown, the system may have a “Web Analyzer”. The web analyzer may be operative to improve user experience. When a query comes from a user, the web analyzer may submit the query to the search technology, e.g., Yahoo Search Technology, to check for typos and get query suggestions. An example of spelling correction is shown in
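A minimal sketch of the MADRank computation follows. It reads the argmax in the equation above as taking the larger of the two relevance scores, and it assumes that the rating and both relevance scores have been normalized to a common scale; both points are interpretations rather than statements of this description, and the function names are hypothetical.

```python
def mad_rank(rating, db_relevance, web_relevance):
    """Average the rating with the larger of the two relevance scores."""
    return (rating + max(db_relevance, web_relevance)) / 2


def rank_by_madrank(items):
    """Order items by descending MADRank.

    items: iterable of (item_id, rating, db_relevance, web_relevance), where
    rating is the personalized or global rating depending on the user's choice.
    """
    scored = [(mad_rank(r, db, web), item_id) for item_id, r, db, web in items]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [item_id for _, item_id in scored]


order = rank_by_madrank([
    ("a", 0.5, 0.25, 0.75),
    ("b", 1.0, 0.5, 0.25),
    ("c", 0.25, 0.25, 0.25),
])
```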
Several example screen shots of various embodiments follow.
While the invention has been described and illustrated in connection with preferred embodiments, many variations and modifications as will be evident to those skilled in this art may be made without departing from the spirit and scope of the invention, and the invention is thus not to be limited to the precise details of methodology or construction set forth above as such variations and modification are intended to be included within the scope of the invention.