US 20080022343 A1
A system and method for providing multiple audio streams for a video over a network such as the Internet. The system comprises a server that includes an encryption unit and a slicing unit, a plurality of boxes, and an ordering box. The server encodes multiple audio streams and a single video to be sliced into segments. The sliced segments of multiple audio streams and the single video are seeded to a number of the plurality of boxes. This may be repeated for other videos with multiple streams. When the ordering box makes a request for a single video with a single audio, then the number of boxes with segments of the requested video and multiple audio streams, filters and sends the requested video and requested single audio stream to the ordering box. Similarly, multiple closed caption streams can be handled and provided like the audio streams.
1. A system comprising:
a plurality of units coupled to a network, wherein one of the units initiates a request for video and audio streams in response to a server to determine one or more units provided with segments of the requested video and multiple audios for the requested video, each of the one or more units selects the requested audio and supplies segments of the requested video and audio to the ordering box.
2. The system of
3. The system of
4. The system of
5. The system of
6. The system of
7. The system of
8. The system of
9. The system of
10. The system of
11. A method of providing media services comprising:
providing segments of a video and associated multiple audios to one or more units;
receiving a request from one of a plurality of units, the request including an order for video and audio;
identifying one or more units other than the requesting unit to provide segments pertaining to the requested video and audio to the requesting unit;
each of the identified one or more units, selecting the requested audio from the multiple audios associated with the requested video; and
streaming the requested video and audio from each of the identified one or more units to the requesting unit.
12. The method of
13. The method of 12, further comprising multiplexing the segments of the requested video and audio for transfer to the requesting unit.
14. The method of
15. The method of
16. The method of
17. The method of
18. The method of
19. The method of
20. The method of
21. A method of providing media services comprising:
providing segments of a program including a single video and multiple audios to a number of units in a plurality of units;
receiving a request for the program from one of the units in the plurality of units;
selecting a single video and a single audio from the program residing at the number of units; and
sending segments of the single video and single audio to the ordering unit.
22. The method of
23. The method of
24. The method of
This application is related to U.S. application Ser. No. 11/388,613, entitled “System and Method for Trick Play of Highly Compressed Video Data,” filed Mar. 23, 2006, and U.S. application Ser. No. 11/331,113, entitled “Access Control of Media Services Over an Open Network” and filed Jan. 10, 2006, and U.S. application Ser. No. 11/075,573, entitled “Continuous Data Feeding in a Distributed Environment” and filed Mar. 9, 2005, which patent applications are incorporated herein by reference in their entirety for all purposes.
1. Technical Field
The present invention is generally related to multimedia delivery over the Internet. Particularly, the present invention is related to techniques providing media services including movies with multiple audio streams on an open network, such as the Internet.
2. Description of the Related Art
Continuous or on-demand media data such as video and audio programs have been broadcast over data networks (e.g., the Internet). Broadcast of such media information over data networks by digital broadcasting systems provides many advantages and benefits that cannot be matched by current television cable systems or over-the-air broadcasting.
With the media-over-network systems, service providers are often able to draw viewers into an exciting, interactive and enhanced television or viewing experience. Video-On-Demand (VOD) or Near Video-On-Demand (NVOD) collectively referred to herein as VOD programs are examples of the interactive television programs typically provided by a service provider to its subscribers. VOD programs are video sessions that subscribers can order whenever they want or per NVOD schedules.
To ensure quality of service (QoS), the bandwidth requirement of the network path (e.g., 108-1, 108-2, . . . 108-n) to each of the client machines 106-1, 106-2, . . . 106-n has to be sufficient. However, as the number of the subscribers continues to increase, the demand on the bandwidth of the backbone network path 110 increases linearly, and the overall cost of the system 100 increases considerably at the same time. If the server has a fixed bandwidth limit and system support capability, an increase in the number of subscribers beyond a certain threshold will result in slower transfer of data to clients. In other words, the transmission of the video data over the network 104 to the subscribers via the client machines 106-1, 106-2, . . . 106-n is no longer guaranteed. When the video data is not received in a client machine on time, the display of the video data may fail or at least become jittery.
To alleviate such a loading problem on the video server 102, a video delivery system often employs multiple video servers as rendering farms, perhaps in multiple locations. Each of the video servers, similar to the video server 102, is configured to support a limited number of subscribers. Whenever the number of subscribers goes beyond the capacity of a video server or the bandwidth thereof, an additional video server needs to be deployed or additional bandwidth needs to be allocated. Consequently, overall costs go up considerably when more subscribers sign up with the video delivery system 100.
Although more servers may be added to accommodate more subscribers, the implementation of the video server 102 presents many challenges to consider in delivering programs over an open network. In general, movies come with a number of different audio tracks. Typically, a movie may include respective audio tracks in English, Spanish, French, Chinese, or other languages. Streaming multiple audio streams for each video increases the bandwidth requirements of the network. Increasing the bandwidth requirement for multiple audio tracks makes the system 100 too costly and may not be practically possible.
There have been various efforts towards providing multiple audio tracks for a movie. One approach is to treat a video with different audio tracks as different movies. For example, a video with an English audio track is treated as a different movie from the same video with a Spanish audio track. However, such an approach wastes storage space and bandwidth by duplicating videos for different audio tracks.
Thus, there is a need for cost-effective techniques that allow service providers to deliver programs with multiple audios to subscribers over an open network.
This section is for the purpose of summarizing some aspects of embodiments of the present invention and to briefly introduce some preferred embodiments. Simplifications or omissions in this section as well as the title and the abstract of this disclosure may be made to avoid obscuring the purpose of the section, the title and the abstract. Such simplifications or omissions are not intended to limit the scope of the present invention.
In general, the present invention relates to techniques for providing media services over an open network. To ensure that multiple audio streams are available for a video, the present invention provides techniques for encoding and sending multiple audio streams and a single video stream in one transport, that is, seeding a “fat” transport. A transport with a single audio and a single video, with the other audios filtered out, is then streamed to a subscriber, that is, streaming a “lean” transport.
According to another embodiment, the embodiment comprises encoding and sending each video and audio stream separately, then streaming a transport with requested single audio and single video in real-time.
According to one aspect of the present invention, data pertaining to a title is divided or organized into several segments that are distributed among boxes in service. General orders of titles being offered in a library are fulfilled by a group of selected client devices (e.g., boxes) delivering respective segments to an ordering box. Special orders of certain programs (e.g., a live event or a rare title not included in the library) are fulfilled directly by a server. In addition, the server is configured to supply some of the segments to an ordering box or back up any one of the selected boxes designated to supply the needed data to an ordering box. Because of its inherent superior computing power and more bandwidth, the server may deliver more than one segment at a time. The architecture contemplated in the present invention offers the flexibilities of being relatively independent from the number of users while, at the same time, offering centralized management or services to the users. The present invention inherently distributes load among client devices in service by using the computing power and bandwidth collectively available at any time in the client devices. Furthermore, much of the traditional server functionality now gets distributed among the client devices in service.
Embodiments of the invention may be implemented in numerous ways, including a method, system, device, or a computer readable medium. Several embodiments of the invention are discussed below. In one embodiment, the invention provides a method of providing media services over a network, the method comprises: receiving a request from one of a plurality of boxes (hereinafter “ordering box”), the request including an order of a title and an audio. The embodiments further comprise identifying one or more of the boxes other than the ordering box to provide distributed segments pertaining to the title to the ordering box, wherein the ordering box proceeds with downloading the distributed segments, and a playback of the title based on the distributed segments together with residing segments, if any, is started or continued.
According to another embodiment, the invention provides a system for providing media services, the system comprises a server coupled to a network and configured to manage the media services, and a plurality of boxes coupled to the network, wherein one of the boxes (hereinafter “ordering box”) initiating a request including an order of a title communicates directly with the server configured to proceed with identifying one or more of the boxes other than the ordering box to provide distributed segments pertaining to the title to the ordering box, wherein the ordering box proceeds with downloading the distributed segments, and a playback of the title based on the distributed segments together with residing segments, if any, is started or continued. One of the objects, features, and advantages of the present invention is to provide various techniques related to streaming multiple audio tracks based on a distributed architecture, a client-server architecture, and a hybrid architecture taking the benefits, features, and advantages of both distributed architecture and client-server architecture.
It should be understood that each technique so described herein has its own distinctive features, and all techniques in combination yield an equally independently novel combination as well, even if combined in their broadest sense; i.e., with less than the specific manner in which each of the techniques has been reduced to practice.
Other objects, features, and advantages of the present invention will become apparent upon examining the following detailed description of an embodiment thereof, taken in conjunction with the attached drawings.
The invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:
The present invention is related to techniques of providing multiple audio streams of media services based on a distributed architecture or a hybrid architecture taking the benefits, features, and advantages of both distributed architecture and client-server architecture. Unlike a prior art system in which one video with different audio tracks is treated as different movies, multiple audio streams and a single video are encoded together and sent to a number of boxes, and a single audio and a single video are streamed to an ordering box with the other audio tracks filtered out by the boxes acting as media content providers. Alternatively, each video and audio is encoded and sent separately to a number of boxes, and a single audio and a single video are encoded at a number of boxes to be streamed to the ordering box. As a result, multiple audio streams are provided for a single video without using increased bandwidth.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The present invention may be practiced without these specific details. The description and representation herein are the means used by those experienced or skilled in the art to effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail since they are already well understood and to avoid unnecessarily obscuring aspects of the present invention.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one implementation of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in processes, flowcharts or functional diagrams representing one or more embodiments does not inherently indicate any particular order nor imply limitations in the invention.
Embodiments of the present invention are discussed herein with reference to
According to one embodiment, when fulfilling a request from a local machine or a box (e.g., 206-1), communication between the server 202 and the box 206-1 over the network paths 208-1 and 210 may be limited to small-scale requests and responses (e.g., of small size and very infrequent). A server response to a request from a box may include source information (e.g., identifiers), authorization information and security information. Using the response from the server 202, the box may be activated to begin playback of a title (e.g., 207-1). Substantially at the same time, the box may initiate one or more requests to other boxes (e.g., 206-2 and 206-n) in accordance with the source identifiers to request subsequent portions of the title (e.g., 207-2 and 207-n). Assuming proper authorization, the requesting box receives the subsequent portions of the data concurrently from the other boxes. Because of box-to-box communication of content over the path 209, the bandwidth requirement for box-to-server communications over the network paths 208-1 and 210 is kept low and typically short in duration. In the event there are a large number of user boxes issuing playback requests substantially at the same time, the bandwidth of the backbone path 210 should be sufficient to avoid noticeable or burdensome delay.
The contents available in a library being offered in any of the boxes 206-1, 206-2, . . . 206-n are originally provided by one or more content providers. Examples of the content providers include service satellite receivers, television relay stations, analog or digital broadcasting stations, movie studios and Internet sites. Depending on implementation, the contents may be initially received or originated in the server 202. Instead of maintaining and managing the content in a large storage device, the server 202 is configured to distribute the content or files to a plurality of local machines registered with the server 202. The boxes 206-1, 206-2, . . . 206-n shown in
For convenience, it is assumed herein that a file pertaining to a title is played back when the title is selected and ordered by a user. When an order for a title is placed, a corresponding file must be available for playback. One of the features in the system 200 is that a file, or at least a portion thereof, regardless of its size, can be accessed instantaneously, thereby realizing instantaneous VOD. According to one embodiment, where a file is 840 Mbytes on average and a box includes a storage capacity of 300 Gbytes, a system may offer a large library of titles (e.g., 5000) for access at any time instantly. In the prior art, if the files for the titles must be stored in advance to offer instantaneous playback, the local storage of a box would have to have a capacity of 4,000 Gbytes, consequently, rendering instantaneous VOD economically impractical.
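The storage figures in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is only illustrative: the constant and function names are hypothetical, and decimal units (1 Gbyte = 1000 Mbytes) are an assumption not stated in the text. Under that assumption the full library comes to 4,200 Gbytes, the same order as the figure quoted above, while a 300-Gbyte box leaves an average of only 60 Mbytes per title for locally cached data.

```python
# Figures taken from the text; decimal units (1 Gbyte = 1000 Mbytes) are an assumption.
AVG_FILE_MBYTES = 840
LIBRARY_TITLES = 5000
BOX_CAPACITY_GBYTES = 300


def full_library_gbytes(titles: int, avg_mbytes: int) -> float:
    """Storage needed to hold every title in full on a single box."""
    return titles * avg_mbytes / 1000


def per_title_budget_mbytes(capacity_gbytes: int, titles: int) -> float:
    """Average local storage available per title if the whole library is offered."""
    return capacity_gbytes * 1000 / titles


needed = full_library_gbytes(LIBRARY_TITLES, AVG_FILE_MBYTES)
budget = per_title_budget_mbytes(BOX_CAPACITY_GBYTES, LIBRARY_TITLES)
print(f"full library: {needed:.0f} Gbytes; per-title budget: {budget:.0f} Mbytes")
```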
According to one aspect of the present invention, only a beginning portion (referred to as a “head”) and possibly one or more tail segments of a file are locally cached in a box. Such locally cached segments are referred to as residing objects or segments, while segments not residing locally are referred to as distributed objects or segments. When a title is selected, the head of the corresponding file is instantly played back. During the time the head is being played, the distributed objects corresponding to the title are retrieved simultaneously from other boxes. When the head is finished, the received parts of the distributed segments being streamed in from other boxes are combined with residing segments for the title, if any, to enable a continuous playback. Depending on the popularity and concurrent demand for a particular title, the number of residing objects may be increased or decreased to control the dependency of each box on other boxes for playback. Typically, the more residing objects for a title a box has, the more distributed copies of the title there are in the entire system and thus the less dependency of the ordering box on the other boxes.
In one embodiment, the head is always played first to ensure an instant playback. In another embodiment, the head size is reduced to zero, in which case, a time-fill program is played first to provide a time frame that is sufficient to fetch and assemble the beginning data portion of the segments either locally available or from other boxes. Depending on implementation, the time-fill program may include one or more trailers related to the title being ordered, various notifications/updates or commercial programs. The time-fill program may be locally configured. In one embodiment, the time-fill program is provided to give a time frame in which data being fetched from one or more other devices can be stabilized. In another embodiment, the time-fill program provides a platform for sponsors that hope to display their respective programs to an audience. Orders or slot positions for these programs in a time-fill program may be auctioned.
Various content streams include IP packets that are directed to appropriate channels for delivery over the data network. The IP packets include IP data representing the content of the programs. Prior to transmission over the data network, the IP packets are encrypted by a conditional access encryption unit 710. Once the IP data is encrypted, the slicing unit 712 slices the data stream into segments, as described further below.
Regardless whether a head is used or not, a file or a majority of a file will be fragmented and the segments are distributed among the boxes in service. According to one embodiment, given a required transmission rate (e.g., 1 megabit per second or 1 Mbps), the minimum uploading and downloading speeds of a network are considered to determine a number that defines the segmentation, and thus the dependency on other boxes and the support for concurrent demands of a particular title.
It is assumed that a minimum uploading speed is U and a required transmission rate is D, and D/U=K<k, where k is the smallest integer greater than K. In one embodiment, a file or a majority of a file is preferably divided into k segments to optimally utilize the uploading speed of U, assuming that the downloading speed is at least k times faster than the uploading speed. For example, in a POTS-based DSL network for residential areas, the required transmission may be about 1.0 Mbps while the uploading speed may be about 300 kbps. Hence, k=4. Assuming that an ordering box has a downloading speed four times the uploading speed of the other boxes, up to four segments in other boxes can be downloaded concurrently across the network as streaming into the ordering box without interruption.
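The choice of k above can be sketched as a small calculation. This is a minimal illustration, not the patented procedure: the function names are hypothetical, and the rule is read literally, so that k is strictly greater than K = D/U even when D/U happens to be a whole number.

```python
import math


def segment_count(required_rate_kbps: float, min_upload_kbps: float) -> int:
    """Return k, the smallest integer strictly greater than K = D / U."""
    if required_rate_kbps <= 0 or min_upload_kbps <= 0:
        raise ValueError("rates must be positive")
    ratio = required_rate_kbps / min_upload_kbps  # K in the text
    return math.floor(ratio) + 1


def per_supplier_rate_kbps(required_rate_kbps: float, k: int) -> float:
    """Rate each of the k supplying boxes must sustain to reach D in total."""
    return required_rate_kbps / k


# Example from the text: D is about 1.0 Mbps and U is about 300 kbps.
k = segment_count(1000, 300)
print(k, per_supplier_rate_kbps(1000, k))
```

With the example figures, each of the four supplying boxes needs to upload at only 250 kbps, which stays within the assumed 300 kbps uploading speed.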
As shown in
where b stands for “data block”, and the numerals after “b” are mere reference numbers. As used above, the data blocks b11, b21, b31, b41, b12, b22, b32, b42, b13, b23, b33, b43, . . . b1n, b2n, b3n, b4n are sequential while, for example, data blocks b11, b12, b13, b14 . . . b1n in Segment 1 are not sequential.
Because multiple audios are encoded, the fragmentation of the file is difficult:
Segmentation is performed to the point where there is no cutting into the middle of an audio.
It should be noted, however, a head, if used, includes data blocks that are consecutive so that an instantaneous playback of the head is possible. It is evident that the data blocks in the segments are non-consecutive, interlaced or interleaved.
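The interleaving described above can be sketched as a round-robin assignment of sequential data blocks to k segments. The helper names below are hypothetical, blocks are represented by their labels only, and the constraint about not cutting into the middle of an audio is not modeled.

```python
def interleave_into_segments(blocks, k):
    """Deal sequential blocks round-robin into k segments."""
    segments = [[] for _ in range(k)]
    for index, block in enumerate(blocks):
        segments[index % k].append(block)
    return segments


def reassemble(segments):
    """Restore the original order by reading one block from each segment in turn."""
    restored = []
    longest = max((len(segment) for segment in segments), default=0)
    for row in range(longest):
        for segment in segments:
            if row < len(segment):
                restored.append(segment[row])
    return restored


# Labels follow the text: b<segment><position>, listed in playback order.
blocks = [f"b{seg}{pos}" for pos in range(1, 4) for seg in range(1, 5)]
segments = interleave_into_segments(blocks, 4)
print(segments[0])
print(reassemble(segments) == blocks)
```

Segment 1 ends up holding b11, b12, b13, which are not adjacent in playback order, while reading across the four segments in turn recovers the sequential stream.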
In one embodiment, the data stream 240-2 includes one single video 260 and multiple audios 262 associated with the video. For example, the video 260 may be a movie and the multiple audios may be audios in different languages such as English, French, Spanish, Italian, Chinese, etc. To ease the bandwidth requirement, the data stream may be sliced into smaller segments and distributed to the boxes. When a movie (i.e., a video and a particular audio stream) is requested by an ordering box, the boxes with different segments select the requested audio to be sent to the ordering box. For example, if the ordering box requests the video 260 with audio 262-3, then the boxes with segments for the requested video 260 filter out all other audios 262-1, 262-2 . . . 262-n, so that only audio 262-3 is sent with the video 260. Once the filtering is done, the requested data stream 240-8, which includes only the video 260 and audio 262-3, can be sent to the ordering box in real time without wasting bandwidth because it has only one video and one audio; hence, lean streaming.
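The filtering step can be pictured with a simplified packet model. The sketch below is an assumption-laden illustration: the Packet type, the string stream identifiers, and the lean_stream function are hypothetical and stand in for whatever container format a supplying box actually holds.

```python
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    stream_id: str   # hypothetical label for the video or one of the audios
    payload: bytes


def lean_stream(fat_segment: Iterable[Packet], video_id: str, audio_id: str) -> Iterator[Packet]:
    """Yield only packets of the requested video and audio, preserving order."""
    keep = {video_id, audio_id}
    for packet in fat_segment:
        if packet.stream_id in keep:
            yield packet


# A fat segment holding one video and three audios.
fat = [
    Packet("video-260", b"v0"),
    Packet("audio-262-1", b"a1-0"),
    Packet("audio-262-2", b"a2-0"),
    Packet("audio-262-3", b"a3-0"),
    Packet("video-260", b"v1"),
    Packet("audio-262-3", b"a3-1"),
]
lean = list(lean_stream(fat, "video-260", "audio-262-3"))
print([p.stream_id for p in lean])
```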
Furthermore, the data stream 240-2 may include multiple audio and multiple closed captioning.
In another embodiment, fat seeding may be achieved by distributing video and audio streams 240-4 and 240-6 separately. For example, at the server 202, a video stream 240-4 is encrypted, sliced, and distributed to a number of boxes. An audio stream 240-6 is encrypted, sliced, and distributed to a number of boxes. This is repeated for each audio stream until all audios associated with the video are distributed to the boxes. At the boxes, the video and audio streams reside until an ordering box requests the video and audio. The ordering box requests a program by a video ID and an audio ID. Once the request is received by the server, the server instructs the boxes with segments to multiplex the requested video and audio. This is done by looking at the video ID and audio ID, and at the presentation and decoding time stamps of the individual packets. Once segments of the requested video and audio are multiplexed at the sending boxes, the segments of the data stream 240-8, which includes only one video and one audio, are forwarded to the ordering box over the path 209.
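Multiplexing separately stored video and audio by time stamp can be sketched as a merge of two streams that are each already in time-stamp order. The TimedPacket type and the multiplex function are hypothetical, and ordering purely by decoding time stamp is an assumption made for illustration; the text only says that the IDs and the presentation and decoding time stamps are consulted.

```python
import heapq
from typing import List, NamedTuple


class TimedPacket(NamedTuple):
    dts: int         # decoding time stamp, in arbitrary ticks
    stream_id: str   # hypothetical video ID or audio ID
    payload: bytes


def multiplex(video: List[TimedPacket], audio: List[TimedPacket]) -> List[TimedPacket]:
    """Merge two DTS-ordered streams into one DTS-ordered lean stream.

    heapq.merge is stable, so on equal time stamps the video packet comes first.
    """
    return list(heapq.merge(video, audio, key=lambda packet: packet.dts))


video = [TimedPacket(t, "video", b"v") for t in (0, 40, 80)]
audio = [TimedPacket(t, "audio-3", b"a") for t in (0, 24, 48, 72)]
muxed = multiplex(video, audio)
print([(p.dts, p.stream_id) for p in muxed])
```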
In both embodiments, distribution of one single video and multiple audios may be done gradually over the paths 210 and 208 from the server 202 to a number of boxes 206—fat seed. Once an ordering box makes a request, the number of boxes forwards a lean stream over the path 209 to the ordering box.
Referring now to
For example, the architecture 300 may be configured to deliver non-prerecorded programs such as live broadcasts by a multicasting protocol. The server 302 receives orders from some of the subscribers (e.g., for boxes 306-1 and 306-n) for a broadcasting event. When the event comes, the server 302 receives a streaming feed from a source (e.g., a televised site). The streaming is then delivered by the server 302 via the network path 310 to 308-1 and 308-n to the ordering boxes 306-1 and 306-n. As the subscriber for the box 306-2 did not order the event, the box 306-2 will not receive the streaming from the server 302. It can be appreciated that the number of recipients for the program does not affect the performance of the server 302 or demand higher bandwidth because the program is being multicast to the ordering boxes.
The architecture 300, at the same time, allows non-interrupted media services among the boxes. Similar to the description for
Referring now back to
In one embodiment, when a server is designed to be one of the suppliers to service an ordering box, the server is not necessarily the one that provides the designation information. A service provider may deploy several servers, each designated to cover a specific area in accordance with one or more specifications (e.g., popularity, geography, demographics, and/or like criteria).
According to one embodiment, the server 302 is configured to provide titles that are not widely distributed among the boxes in service. It is understood that the distributed architecture as described in
Referring now to
To facilitate the continuation of a data stream, each of the pointers 482 and 484 is used to remember where the data block of a segment is being fed or about to be fed to the buffer 470. In the event that the segment being fetched from a box is interrupted and a backup box needs to step in, the ordering box knows from the pointer exactly where to resume fetching the segment. Likewise, similar pointers (not shown) may be provided to remember where the data block of the locally cached segment is being fed or about to be fed to the buffer 470. In the event that the ordering box needs to be reset or is suddenly powered off and back on, these pointers can facilitate the continuation of the playback of the ordered movie.
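The role of such a pointer can be sketched as follows. Everything here is hypothetical scaffolding: a supplier is modeled as a callable that returns the block at a given index (or None past the end), an interruption is modeled as a ConnectionError, and the SegmentFetch class is not drawn from the source.

```python
class SegmentFetch:
    """Fetch one segment block by block, resuming from a pointer on failure."""

    def __init__(self, suppliers):
        self.suppliers = list(suppliers)  # designated box first, then backups
        self.pointer = 0                  # index of the next block to fetch
        self.buffer = []

    def run(self):
        for supplier in self.suppliers:
            try:
                while True:
                    block = supplier(self.pointer)
                    if block is None:     # end of segment reached
                        return self.buffer
                    self.buffer.append(block)
                    self.pointer += 1
            except ConnectionError:
                continue                  # backup resumes at self.pointer
        raise RuntimeError("all suppliers failed")


def make_supplier(blocks, fail_at=None, log=None):
    """Build a fake supplier; it drops the connection from index fail_at onward."""
    def supplier(index):
        if log is not None:
            log.append(index)
        if fail_at is not None and index >= fail_at:
            raise ConnectionError("supplier dropped")
        return blocks[index] if index < len(blocks) else None
    return supplier


segment = ["b11", "b12", "b13", "b14"]
backup_log = []
fetch = SegmentFetch([make_supplier(segment, fail_at=2),
                      make_supplier(segment, log=backup_log)])
result = fetch.run()
print(result, backup_log)
```

The backup box is first asked for block index 2, the position recorded by the pointer when the designated box was interrupted, so no block is fetched twice or skipped.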
It should be readily understood to those skilled in the art that the above description may be equally applied to cases in which instantaneous VOD services are desired. Instead of playing back the time-fill program, a head of a movie title can be played back first, during which the remaining segments, if not locally available, can be fetched from other designated boxes.
Referring now to
According to one embodiment, any of the boxes 506-1, 506-2, 506-3, . . . 506-n and 508 may receive compressed data from the server 504 that centrally stores all video data and delivers required video data pertaining to an ordered title upon receiving a request. According to another embodiment, the server 504 is configured to identify one or more other boxes to supply pieces of compressed data to a box requesting the data. In other words, all video data is distributed among all boxes in service and the server 504 is not required to deliver all the data in response to a request, and instead is configured to provide source information as to where and how to retrieve some or all of the data from other boxes. As shown in
The operation of distributing segments of a single video and multiple audios—fat seeding—is described in a flow chart or process 750 shown in
At 752, the server is configured to distribute programs to a plurality of boxes for future access depending on popularity. The server also takes into account the programs that are already residing at boxes because they have been either viewed or distributed. Once the server determines that a program needs to be distributed, at 754, the server encodes the program. Typically, the program includes a single video and multiple audios. However, the data stream may include multiple videos, audios, and closed captioning streams. The server may encode the single video and multiple audios as a single data stream or as separate data streams. For example, referring back to
In the case of one data stream that includes a video and multiple audios, at 756, the data stream is sliced into segments and distributed to a number of boxes at 758. The number and location of boxes depend on the popularity of a program, available bandwidth, and other factors.
In the case of separate data streams for a video and each audio, at 756, each data stream is sliced into segments and distributed to a number of boxes at 758. This is repeated until all audios have been distributed to the boxes. Whether a data stream includes a single video and multiple audios, or a single video or audio, once the process is completed, the selected boxes hold, for each video, the multiple audios associated with that video (fat seed). Typically, a data stream is sent from a server to some clients, and then from those clients to other clients.
The operation of fetching segments of requested single video and single audio—lean streaming is described in a flow chart or process 780 shown in
At 786, depending on how the audio streams reside at the box (either as separate streams or in a single data stream along with the video), the requested audio is selected. In the case where multiple audios are in separate audio streams, the requested audio is identified by its audio ID number. Then, the selected audio is multiplexed with the requested video. In the case where multiple audios are in a single data stream together, all audios are filtered out except for the requested audio. Thus, after the filtering, the data stream only includes the requested video and audio. In either case, a lean stream including one video and one audio is constructed. An MPEG-2 transport stream (TS) uses 188-byte packets, each tagged with an individual stream ID known as a packet identifier (PID). The video and each audio stream have their own PIDs.
At 788, segments of the lean stream of the requested video and audio are sent to the ordering box. At 789, the ordering box receives the segments from the different boxes and multiplexes them into a buffer. At 790, the ordering box plays the assembled data in the buffer.
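For the single-data-stream case, filtering by stream ID can be sketched directly on MPEG-2 transport stream packets, where the 13-bit PID sits in the low five bits of the second header byte and all of the third. The function names and the PID values below are hypothetical, and the sketch keeps only elementary-stream packets; a practical filter would presumably also need to keep or rewrite the program tables (for example, the program association table on PID 0), which is not shown.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from one 188-byte transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pids(ts: bytes, keep) -> bytes:
    """Return only the packets whose PID is in keep, in their original order."""
    out = bytearray()
    for offset in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) in keep:
            out += packet
    return bytes(out)


def make_packet(pid: int, fill: int = 0xFF) -> bytes:
    """Build a dummy payload-only packet for the demonstration."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes([fill]) * (TS_PACKET_SIZE - 4)


# Hypothetical PIDs: 0x100 for the video, 0x101 and 0x102 for two audios.
fat = make_packet(0x100) + make_packet(0x101) + make_packet(0x102) + make_packet(0x100)
lean = filter_pids(fat, {0x100, 0x102})
print(len(fat) // TS_PACKET_SIZE, len(lean) // TS_PACKET_SIZE)
```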
As described above, the architecture of
Similarly, in the event where a video is associated with multiple subtitle tracks, responsive boxes can filter out the irrelevant subtitles before streaming. If the user does not choose any subtitles, then all subtitle packets are filtered out. In cases where there are multiple video tracks, e.g., at different bit rates, one video track may be streamed depending on the bandwidth available at the receiver.
In addition, one embodiment of the present invention dynamically determines what fragments to stream and what to filter out. For example, the receiver may tell the sender exactly what to send and what to filter out. Such dynamic switching is useful to enable the following features:
In another embodiment, the above-described architecture can be used to deal with associated closed caption streams. Each closed caption stream of data is multiplexed into the media stream with its own ID and its own presentation time information. There can be none, one or more closed caption streams in the media stream. Each closed caption stream could correspond to one language, or there could be multiple closed caption streams in a single language with different content (e.g., actual dialog, commentary, etc.). Segments of the complete media stream with multiple closed caption streams get distributed to client boxes during seeding (fat seed). During playback, the user of the requesting box selects which closed caption stream, if any, he or she wants to see. The serving boxes filter out all closed caption streams other than the requested stream based on the ID of the requested stream.
The foregoing description of embodiments is illustrative of various aspects/embodiments of the present invention. Various modifications to the present invention can be made to the preferred embodiments by those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description of embodiments.