|Publication number||US7271849 B2|
|Application number||US 11/171,871|
|Publication date||Sep 18, 2007|
|Filing date||Jun 30, 2005|
|Priority date||Oct 19, 2000|
|Also published as||US7023492, US7274407, US7286189, US20020047918, US20050078220, US20050253968, US20050253969|
|Inventors||Gary J. Sullivan|
|Original Assignee||Microsoft Corporation|
This application is a Continuation of U.S. patent application Ser. No. 09/982,127, filed Oct. 18, 2001, entitled “Method and Apparatus for Encoding Video Content,” now U.S. Pat. No. 7,023,492, and incorporated herein by reference. That application claims the benefit of U.S. Provisional Application No. 60/241,296, filed Oct. 19, 2000, the disclosure of which is incorporated herein by reference.
The present invention relates to video encoding systems and, more particularly, to an encoding system and method that provides multiple display regions for a particular display type. The present invention further provides a flexible mechanism for specifying the portions of an image to display.
Video content may be encoded using various encoding techniques. Today there are many different types of display devices that may eventually display encoded video content. These different display devices include televisions having different aspect ratios (e.g., a 16:9 aspect ratio or a 4:3 aspect ratio) or different screen resolutions (e.g., 480 vertical lines of resolution or 1080 vertical lines of resolution), computer monitors having different aspect ratios and/or picture resolutions, and portable video players with various aspect ratios and different screen resolutions. The use of the display may also include using only a sub-region of such a display device rather than filling the entire display with a single video content stream (e.g., “picture-in-picture” or display in a “window” on a computer screen). At the time the video content is encoded, it may not be known which display types may eventually display the video content, and the same encoded content may be used in a wide variety of display environments.
The encoded video content is provided to a transmitter 108, which transmits the encoded video content to one or more receivers 110 across a communication link 112. Communication link 112 may be, for example, a physical cable, a satellite link, a terrestrial broadcast, an Internet connection, a physical medium (such as a digital versatile disc (DVD)) or a combination thereof. A video decoder 114 decodes the signal received by receiver 110 using the appropriate decoding technique. The decoded video content is then displayed on a video display 116, such as a television or a computer monitor. Receiver 110 may be a separate component (such as a set top box) or may be integrated into video display 116. Similarly, video decoder 114 may be a separate component or may be integrated into the receiver 110 or the video display 116.
Video content may be captured and encoded into a format having a particular aspect ratio (such as 16:9) and later displayed on a video display having a different aspect ratio (such as 4:3). Various methods are available for displaying an image on a video display having a different aspect ratio.
Another alternative for displaying a 4:3 image on a 16:9 video display is shown in
In existing video encoding systems, the encoded video content includes an indication of how to display the encoded video image on different types of displays. For example, if a video image has a 16:9 aspect ratio, the encoded video image includes information regarding how to display the video image on a video display having a 4:3 aspect ratio or a 2.3:1 aspect ratio. However, the information for each different type of video display (e.g., different aspect ratios) has a single option for displaying the image on that type of video display. Existing systems do not support multiple different image display regions that are specified in the video content stream and associated with a particular type of video display. For example, these different image display regions may focus on different characters appearing in the video content. Although existing systems allow a user to select among different pre-defined display formats (such as letterboxing or overscanning), these systems do not support multiple different encoder-specified image display regions, as described herein. Typically, if multiple display regions are specified, they are pre-defined display formats that are defined in the decoder rather than being transmitted with the video content stream.
Additionally, if a portion of an image is to be deleted after decoding (for example, because the output of the decoder is not a multiple of the fundamental “macroblock” dimensions used to represent the video in a compressed domain), existing video encoding systems typically delete content along the right edge of the image and/or along the bottom edge of the image. These systems may not provide the ability to specify which portion of the image to discard if a portion of the image needs to be deleted. Instead, these systems can only delete the portion of the image along the right edge or the bottom edge. Existing systems do not provide support for multiple different image framings for a given display type combined with the ability to delete specific portions of the image.
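To make the macroblock constraint concrete, the following minimal sketch assumes 16x16 macroblocks (a size used by several common codecs; the text above does not fix one) and computes how many coded columns and rows fall outside the intended picture. The function name and constant are illustrative only.

```python
MACROBLOCK = 16  # assumed macroblock size in samples; not specified in the text


def coded_size_and_excess(width, height, mb=MACROBLOCK):
    """Return the coded (padded) size and the excess columns/rows to discard."""
    coded_w = -(-width // mb) * mb   # round up to the next multiple of mb
    coded_h = -(-height // mb) * mb
    return (coded_w, coded_h), (coded_w - width, coded_h - height)


# A 1920x1080 picture: the width is already a multiple of 16, the height is not.
coded, excess = coded_size_and_excess(1920, 1080)
print(coded, excess)
```

A system of the kind described above as existing practice would remove the excess rows from the bottom edge only; it has no way to state that the unwanted samples lie somewhere else.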
The systems and methods described herein address the above limitations by providing a system that encodes video content such that a user of a video display can select among multiple image framings for displaying the encoded image on the video display. Moreover, the video encoding systems and methods described herein are capable of specifying the particular portion of an image to be deleted if a portion of the image needs to be discarded.
The systems and methods described herein allow video content to define (e.g., within the video content stream) several different display regions for each type of video display device, thereby allowing the user of the video display device to determine, based on the user's preferences, the manner in which the video content is displayed. Thus, the user of the video display device is not limited to a single display region for a given display aspect ratio or to pre-defined display formats (such as letterboxing or overscanning). Additionally, the systems and methods described herein allow for the identification of an active region, which defines the portion of the image that has meaningful information. This active region may be located anywhere within the image, thereby providing flexibility in determining which portions of the image to discard and which portions of the image to display, based on framing the identified active region area in relation to the chosen defined display region.
In one embodiment, video data to be encoded is identified. Additionally, multiple display regions associated with each particular video display type are identified. Each of the multiple display regions is associated with a different portion of an image associated with the video data. The video data is encoded such that the encoded video data includes information regarding the multiple display regions.
In another embodiment, the encoded video data is stored using a storage device.
In a described embodiment, the encoded video data is transmitted to multiple destinations.
In a particular implementation, each display region has an associated display region identifier.
Another embodiment identifies the area of the video content containing valid material that may be suitable for display. An active region of the video data to be encoded is identified. The active region may be located anywhere within an image associated with the video data. Multiple display regions associated with the video data are also identified. The video data is encoded such that the encoded video data includes an indication of the active region and includes information sufficient to specify the intersection of the multiple display regions with that active region.
The systems and methods described herein allow video data to define multiple display regions for each type of video display device on which the video data may be displayed. This permits each user to select among the various display regions based on that user's viewing preferences. For example, one user may choose to view all of the video data, which may create blank bands along the top and bottom edges of the video display device. Another user may choose to have the display device's screen filled with the video data, which may cause some portions of the left and right edges of the video image to be cropped to fill the display device. Another user may choose to have the displayed video image focus on a particular actor or actress in the program or movie being displayed. Thus, portions of the image may be reduced and/or cropped depending on the particular display region selected by the user, and the encoded representation of the video data includes identification of the display regions that can be selected for display (for example, identification of which region focuses on a particular actor).
The systems and methods described herein also allow an active region to be identified. The active region defines the portion of the image that has meaningful information. The active region typically excludes portions of the image that contain artifacts or other undesirable data. The active region may be located anywhere within the image, thus providing flexibility in defining which portions of the image should be discarded and which portions of the image should be displayed. In combination with the identification of a display region, the area to be shown on the display would consist of the intersection of the active region with the chosen display region.
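The intersection described above can be sketched as a plain rectangle intersection. This is an illustrative sketch, not the patent's syntax: the `Rect` type, the exclusive right and bottom edges, and the sample coordinates are all assumptions.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle in image coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float   # exclusive edge (assumed convention)
    bottom: float  # exclusive edge (assumed convention)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None if they do not overlap."""
    left = max(a.left, b.left)
    top = max(a.top, b.top)
    right = min(a.right, b.right)
    bottom = min(a.bottom, b.bottom)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right, bottom)


# Made-up numbers: an active region that trims noisy edges of a 720x480 image,
# and a display region chosen by the viewer.
active = Rect(8, 4, 712, 476)
display = Rect(0, 60, 720, 420)
shown = intersect(active, display)
print(shown)
```

Only the samples inside `shown` would be rendered; the remainder of the chosen display region would be left blank.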
Active region locator 308 identifies an active region associated with particular video content. Video encoder 304 also includes a video encoding engine 310, which encodes video content and other data (such as display region information and active region information). The output of video encoder 304 is communicated to a transmitter 312, which transmits the encoded video signal to one or more receivers. Alternatively, transmitter 312 may be a storage device that stores the encoded video signal (e.g., on a DVD or other memory device).
Receiver 320 receives an encoded video signal and communicates the received signal to a video decoder 322. Alternatively, receiver 320 may be a device (such as a DVD player) capable of reading stored encoded video content (e.g., stored on a DVD). Video decoder 322 includes a display region locator 324 and an active region locator 326. Display region locator 324 identifies one or more display regions encoded in the video signal (or transmitted along with the video signal). Active region locator 326 identifies an active region encoded in the video signal. Video decoder 322 also includes a video decoding engine 328 which decodes the encoded video signal, including the various display regions associated with the video signal. After decoding the video signal, video decoder 322 communicates the decoded video content to a video display 330 which renders the image defined by the decoded video content. Video decoder 322 may be a separate device or may be incorporated into another device, such as a television or a DVD player.
As mentioned above, a video image may be captured using one aspect ratio (such as 16:9) and displayed on video display devices having different aspect ratios (such as 4:3). Providing multiple display regions for each type of video display device allows a user to choose how the image is displayed based on the user's viewing preferences. Further, the multiple display regions allow a user to focus the display on a particular character or feature of the video content.
The original video image 400 is identified by a solid line. A first display region 402 aligns the top and bottom edges of the video display with the top and bottom edges of the original video image 400. Display region 402 has a 4:3 aspect ratio, which matches the video display. In this situation, the entire 4:3 video display is filled, but the right and left edges of the original video image 400 are deleted (i.e., the portions between broken lines 402 and the sides of the original video image 400). As shown in
A second display region 406 aligns the right and left edges of the video display with the right and left edges of the original video image 400. Display region 406 has a 4:3 aspect ratio, which matches the video display. In this situation, the 4:3 video display is filled to the left and right edges, but blank bands are located along the top and bottom edges of the video display. The blank bands extend from the top of the original video image 400 to the top of display region 406, and from the bottom of the original video image 400 to the bottom of display region 406.
A third display region 404 represents a compromise between display regions 402 and 406. Display region 404 also has a 4:3 aspect ratio, which matches the video display. In this situation, a portion of the original video image 400 is deleted along the left and right edges (i.e., the portion between broken line 404 and the right and left edges of the original video image). The portion of the original video image 400 that is deleted is approximately one-half the amount that is deleted by display region 402. Additionally, black bands are located along the top and bottom edges of the video display. The black bands extend from the top of the original video image 400 to the top of display region 404, and from the bottom of the original video image 400 to the bottom of display region 404. The black bands created by third display region 404 are approximately one-half the size of the black bands created by display region 406.
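The geometry of these three display regions can be checked with a little arithmetic. The sketch below assumes a 1920x1080 (16:9) original image, a 4:3 display, regions centered on the image, and a compromise region placed exactly midway between the other two (the text says only "approximately one-half"). The labels `fill`, `letterbox`, and `compromise` are illustrative names for regions 402, 406, and 404.

```python
from fractions import Fraction


def centered_region(img_w, img_h, region_w, region_h):
    """Return (left, top, right, bottom) of a region centered on the image.

    Values outside the image bounds mean the region extends past the image,
    which shows up as blank bands on the display.
    """
    left = Fraction(img_w - region_w) / 2
    top = Fraction(img_h - region_h) / 2
    return (left, top, left + region_w, top + region_h)


def regions_for_display(img_w, img_h, display_aspect):
    """Compute three example regions for a display narrower than the image."""
    fill_w = img_h * display_aspect        # full image height, sides cropped
    letterbox_h = img_w / display_aspect   # full image width, bands above and below
    mid_w = (img_w + fill_w) / 2           # assumed: exactly halfway between the two
    mid_h = mid_w / display_aspect
    return {
        "fill": centered_region(img_w, img_h, fill_w, img_h),
        "letterbox": centered_region(img_w, img_h, img_w, letterbox_h),
        "compromise": centered_region(img_w, img_h, mid_w, mid_h),
    }


regions = regions_for_display(1920, 1080, Fraction(4, 3))
for name, rect in regions.items():
    print(name, [int(v) for v in rect])
```

With these assumptions the fill region crops 240 columns per side, the letterbox region adds 180 blank rows at the top and at the bottom, and the compromise region crops 120 columns per side while adding 90 blank rows at each edge, i.e., half of each.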
A fourth display region 408 focuses on a particular character or object in the original video image 400 (e.g., the viewer's favorite actor or actress). Display region 408 typically moves around the original video image 400 to follow the particular character or object. In another implementation, display region 408 may be enlarged to fill all (or a majority) of the screen of the display device (e.g., zoom in on the particular character or object). Alternative embodiments may include any number of different display regions associated with a particular video display type (e.g., a 4:3 aspect ratio television).
If the actor or object being highlighted by the display region is not present in a particular scene, then there may be no region identifier identified with that actor or object for that scene. In this situation, the system may switch to a different display region that shows other characters and objects in the scene. When the preferred actor or object returns to the scene, the system can switch back to the display region that highlights that actor or object.
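A minimal sketch of this switching behavior follows. It assumes each scene (or frame) carries a mapping from region identifier to rectangle, and that a fallback region is always present; the identifiers `actor_a` and `full` and the rectangles are made up for illustration.

```python
def choose_region(frame_regions, preferred_id, fallback_id):
    """Pick the preferred region if this frame defines it, else the fallback.

    frame_regions maps a region identifier to a (left, top, right, bottom)
    rectangle. The fallback identifier is assumed to be present in every frame.
    """
    if preferred_id in frame_regions:
        return preferred_id, frame_regions[preferred_id]
    return fallback_id, frame_regions[fallback_id]


frames = [
    {"full": (0, 0, 1920, 1080), "actor_a": (600, 200, 1320, 740)},
    {"full": (0, 0, 1920, 1080)},  # the actor is absent from this scene
    {"full": (0, 0, 1920, 1080), "actor_a": (900, 180, 1620, 720)},
]

chosen = [choose_region(f, "actor_a", "full")[0] for f in frames]
print(chosen)
```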
Thus, the user of the video display can select among the four different display regions, depending on their viewing preferences. The first display region 402 fills the 4:3 video display, but deletes the greatest portion of the original video image 400. The second display region 406 contains all of the original video image 400, but has the largest blank bars along the top and bottom edges of the 4:3 video display. The third display region 404 reduces the size of the blank bars along the top and bottom edges of the 4:3 video display, but also deletes a portion of the original video image 400. The fourth display region 408 focuses on a particular character or object and deletes most of the remaining portions of the image. This allows the user to select the appropriate display region based on their viewing preference. Further, a user may watch the same video content at different times selecting different display regions.
The location of each display region can be defined by using four parameters: 1) the offset from the top of the image rectangle to the top of the display region, 2) the offset from the left side of the image rectangle to the left side of the display region, 3) the offset from the right side of the image rectangle to the right side of the display region, and 4) the offset from the bottom of the image rectangle to the bottom of the display region. Alternatively, two parameters could identify the four corners of the display region (e.g., the (x,y) location of the upper left corner of the display region and the (x,y) location of the lower right corner of the display region). These numbers may be integer numbers based on the row and column (i.e., line) address of a digital sample, or could have greater precision, such as 1/16th pixel accuracy.
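The two parameterizations are interchangeable, as the following sketch shows. The class name, the sign convention (positive offsets point inward from each image edge), and the helper that expresses a coordinate in 1/16th-sample units are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionOffsets:
    """Four-offset form: distance from each image edge to the matching region edge.

    Assumed convention: positive values move inward; a negative value would
    place the region edge outside the image.
    """
    top: float
    left: float
    right: float
    bottom: float

    def to_corners(self, img_w, img_h):
        """Convert to the two-corner form: (upper-left), (lower-right)."""
        upper_left = (self.left, self.top)
        lower_right = (img_w - self.right, img_h - self.bottom)
        return upper_left, lower_right


def to_sixteenths(value):
    """Express a coordinate as an integer count of 1/16th-sample units."""
    return round(value * 16)


offsets = RegionOffsets(top=0, left=240, right=240, bottom=0)
corners = offsets.to_corners(1920, 1080)
print(corners, to_sixteenths(240.5))
```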
Each display region identifier is included with the definition of the display region. At block 508, the procedure determines whether additional video display types are to be supported. If so, the procedure identifies the next video display type (block 510) and returns to block 508 to identify display regions associated with the next video display type.
If no additional video display types are to be supported at block 508, the procedure branches to block 512, which encodes the video content including all display types and all display regions associated with each display type. Finally, the encoded video content is communicated to a destination (block 514). A destination may be, for example, a transmitter that transmits the encoded video content to one or more receivers, or a recording device that records the encoded video content for future transmission or playback.
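The overall flow (collect display regions for each display type, then encode them together with the video content) might be sketched as follows. This is a toy container, not the bitstream syntax of the patent or of any codec: the JSON header, the 4-byte length prefix, and all function names are assumptions made for illustration.

```python
import json


def build_region_metadata(regions_by_display_type):
    """Flatten {display type: {region id: rect}} into a list of records."""
    metadata = []
    for display_type, regions in regions_by_display_type.items():  # loop over display types
        for region_id, rect in regions.items():
            metadata.append(
                {"display_type": display_type, "region_id": region_id, "rect": list(rect)}
            )
    return metadata


def encode(video_payload, regions_by_display_type):
    """Prefix the (already compressed) payload with a JSON region header."""
    header = json.dumps(build_region_metadata(regions_by_display_type)).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + video_payload


def decode_regions(blob):
    """Split a blob produced by encode() into (region records, payload)."""
    n = int.from_bytes(blob[:4], "big")
    return json.loads(blob[4:4 + n].decode("utf-8")), blob[4 + n:]


regions = {
    "4:3": {"fill": (240, 0, 1680, 1080), "letterbox": (0, -180, 1920, 1260)},
    "16:9": {"full": (0, 0, 1920, 1080)},
}
blob = encode(b"\x00\x01", regions)
records, payload = decode_regions(blob)
print(len(records), payload)
```

Carrying the region records alongside the payload reflects the point that display regions travel with the content rather than being defined in the decoder.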
The active region, defined by broken lines 704, defines the portion of the image that has meaningful information. For example, an active region may exclude portions of an image that contain artifacts or other data that should be discarded. Certain image capture devices may introduce distortion or other undesirable data along the edges of the image. By specifying an active region that does not include the edges of the image, such distortion and other undesirable data is not displayed to the user of a video display device. The active region may be located anywhere within the original video image.
As shown in
The intersection of the active region and the display region is used to determine the actual image displayed on the display device. In the example of
The example active region illustrated in
The exemplary computing environment 900 is only one example of a computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the computer and network architectures. Neither should the computing environment 900 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary computing environment 900.
The video encoding and decoding systems and methods described herein may be implemented with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, multiprocessor systems, microprocessor-based systems, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on. Compact or subset versions may also be implemented in clients of limited resources.
The computing environment 900 includes a general-purpose computing device in the form of a computer 902. The components of computer 902 can include, but are not limited to, one or more processors or processing units 904, a system memory 906, and a system bus 908 that couples various system components including the processor 904 to the system memory 906.
The system bus 908 represents one or more of several possible types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnects (PCI) bus also known as a Mezzanine bus.
Computer 902 typically includes a variety of computer readable media. Such media can be any available media that is accessible by computer 902 and includes both volatile and non-volatile media, removable and non-removable media.
The system memory 906 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 910, and/or non-volatile memory, such as read only memory (ROM) 912. A basic input/output system (BIOS) 914, containing the basic routines that help to transfer information between elements within computer 902, such as during start-up, is stored in ROM 912. RAM 910 typically contains data and/or program modules that are immediately accessible to and/or presently operated on by the processing unit 904.
Computer 902 may also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example,
The disk drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for computer 902. Although the example illustrates a hard disk 916, a removable magnetic disk 920, and a removable optical disk 924, it is to be appreciated that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like, can also be utilized to implement the exemplary computing system and environment.
Any number of program modules can be stored on the hard disk 916, magnetic disk 920, optical disk 924, ROM 912, and/or RAM 910, including by way of example, an operating system 926, one or more application programs 928, other program modules 930, and program data 932. Each of the operating system 926, one or more application programs 928, other program modules 930, and program data 932 (or some combination thereof) may include elements of the video encoding and/or decoding algorithms and systems.
A user can enter commands and information into computer 902 via input devices such as a keyboard 934 and a pointing device 936 (e.g., a “mouse”). Other input devices 938 (not shown specifically) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are connected to the processing unit 904 via input/output interfaces 940 that are coupled to the system bus 908, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
A monitor 942 or other type of display device can also be connected to the system bus 908 via an interface, such as a video adapter 944. In addition to the monitor 942, other output peripheral devices can include components such as speakers (not shown) and a printer 946 which can be connected to computer 902 via the input/output interfaces 940.
Computer 902 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computing device 948. By way of example, the remote computing device 948 can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and so on. The remote computing device 948 is illustrated as a portable computer that can include many or all of the elements and features described herein relative to computer 902.
Logical connections between computer 902 and the remote computer 948 are depicted as a local area network (LAN) 950 and a general wide area network (WAN) 952. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet.
When implemented in a LAN networking environment, the computer 902 is connected to a local network 950 via a network interface or adapter 954. When implemented in a WAN networking environment, the computer 902 typically includes a modem 956 or other means for establishing communications over the wide network 952. The modem 956, which can be internal or external to computer 902, can be connected to the system bus 908 via the input/output interfaces 940 or other appropriate mechanisms. It is to be appreciated that the illustrated network connections are exemplary and that other means of establishing communication link(s) between the computers 902 and 948 can be employed.
In a networked environment, such as that illustrated with computing environment 900, program modules depicted relative to the computer 902, or portions thereof, may be stored in a remote memory storage device. By way of example, remote application programs 958 reside on a memory device of remote computer 948. For purposes of illustration, application programs and other executable program components such as the operating system are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 902, and are executed by the data processor(s) of the computer.
An implementation of the system and methods described herein may result in the storage or transmission of data, instructions, or other information across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise “computer storage media” and “communications media.” “Computer storage media” include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
“Communication media” typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Communication media also includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.
Alternatively, portions of the systems and methods described herein may be implemented in hardware or a combination of hardware, software, and/or firmware. For example, one or more application specific integrated circuits (ASICs) or programmable logic devices (PLDs) could be designed or programmed to implement one or more portions of the video encoding or video decoding systems and procedures.
Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the invention.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5134697 *||Dec 10, 1990||Jul 28, 1992||Prime Computer||Remote memory-mapped display with interactivity determination|
|US5623308||Jul 7, 1995||Apr 22, 1997||Lucent Technologies Inc.||Multiple resolution, multi-stream video system using a single standard coder|
|US5936616||Aug 7, 1996||Aug 10, 1999||Microsoft Corporation||Method and system for accessing and displaying a compressed display image in a computer system|
|US5995146||Jan 24, 1997||Nov 30, 1999||Pathway, Inc.||Multiple video screen display system|
|US6020931 *||Apr 24, 1997||Feb 1, 2000||George S. Sheng||Video composition and position system and media signal communication system|
|US6049333||Sep 3, 1996||Apr 11, 2000||Time Warner Entertainment Company, L.P.||System and method for providing an event database in a telecasting system|
|US6078403||Oct 21, 1996||Jun 20, 2000||International Business Machines Corporation||Method and system for specifying format parameters of a variable data area within a presentation document|
|US6088045||Jul 22, 1991||Jul 11, 2000||International Business Machines Corporation||High definition multimedia display|
|US6141442||Jul 21, 1999||Oct 31, 2000||At&T Corp||Method and apparatus for coding segmented regions which may be transparent in video sequences for content-based scalability|
|US6256785||Dec 23, 1996||Jul 3, 2001||Corporate Media Patners||Method and system for providing interactive look-and-feel in a digital broadcast via an X-Y protocol|
|US6263313 *||Nov 30, 1998||Jul 17, 2001||International Business Machines Corporation||Method and apparatus to create encoded digital content|
|US6456335||Oct 6, 1998||Sep 24, 2002||Fujitsu Limited||Multiple picture composing method and multiple picture composing apparatus|
|US6493008 *||Feb 16, 2000||Dec 10, 2002||Canon Kabushiki Kaisha||Multi-screen display system and method|
|US6510177 *||Mar 24, 2000||Jan 21, 2003||Microsoft Corporation||System and method for layered video coding enhancement|
|US6724434||Mar 9, 2000||Apr 20, 2004||Nokia Corporation||Inserting one or more video pictures by combining encoded video data before decoding|
|US7023492 *||Oct 18, 2001||Apr 4, 2006||Microsoft Corporation||Method and apparatus for encoding video content|
|US7120924 *||Nov 17, 2000||Oct 10, 2006||Goldpocket Interactive, Inc.||Method and apparatus for receiving a hyperlinked television broadcast|
|1||ISO/IEC, "Coding of Audio-Visual Objects: Visual, ISO/IEC 14496-2," 14496-2, 330 pp. (1998).|
|2||ISO/IEC, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up to About 1.5 Mbit/s, Part 2: Video," 1117-2, 112 pp. (1993).|
|3||ITU-T Recommendation H.261, "Line of Transmission of Non-Telephone Signals," International Telecommunications Union, pp. i, 1-25 (Mar. 1993).|
|4||ITU-T Recommendation H.262, "Transmission of Non-Telephone Signals," International Telecommunications Union, 204 pp. (Jul. 1995).|
|5||ITU-T Recommendation H.263, "Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services-Coding of Moving Video," International Telecommunications Union, 155 pp. (Feb. 1998).|
|6||Muranoi et al., "Video Retrieval Method using ShotID for Copyright Protection Systems," Proc. SPIE vol. 3527, pp. 245-252 (Nov. 1998).|
|7||R. Schumeyer and K. Barner, "A Color-Based Classifier for Region Identification in Video," SPIE vol. 3309, pp. 189-200.|
|8||Reader, "History of MPEG Video Compression-Ver. 4.0," 99 pp. [Document marked Dec. 16, 2003].|
|9||Sullivan et al., "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions," 21 pp. (Aug. 2004).|
|10||U. Ladebusch, "Co-ordination of Identification Numbers for DVB (Digital Video Broadcasting) Programs," Fernseh- und Kino-Technik, vol. 54, No. 7, Jul. 2000, pp. 410-414.|
|11||Wiegand, "Joint Model No. 1, Revision 1 (JM1-r1)," JVT-A003r1, 80 pp. (Document marked "Generated: Jan. 18, 2002").|
|12||Z. Miao-Ling and W. Dong-Hui, "Video Stream Segmentation Method Based on Video Page," Journal of Computer Aided Design & Computer Graphics, vol. 12, No. 8, Aug. 2000, pp. 585-589.|
|U.S. Classification||348/588, 348/E05.111, 348/E07.024|
|International Classification||H04N5/44, H04N7/08, H04N7/18|
|Cooperative Classification||H04N7/08, H04N7/0122|
|Feb 22, 2011||FPAY||Fee payment|
Year of fee payment: 4
|Dec 9, 2014||AS||Assignment|
Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034543/0001
Effective date: 20141014
|Feb 25, 2015||FPAY||Fee payment|
Year of fee payment: 8