WO2012094916A1

WO2012094916A1 - Method for encapsulating and transmitting streaming media packet, and device for processing streaming media

Info

Publication number: WO2012094916A1
Application number: PCT/CN2011/081273
Authority: WO
Inventors: 刘继年; 李竹平; 王芳; 孙健; 赵宇
Original assignee: 中兴通讯股份有限公司
Priority date: 2011-01-11
Filing date: 2011-10-25
Publication date: 2012-07-19
Also published as: CN102595199A

Abstract

Disclosed are a method for encapsulating and transmitting a streaming media packet, and a device for processing the streaming media. In the present invention, a sending end encapsulates a packet, including for each packet of the media code stream: writing a media code stream identifier into the packet header of the packet according to the media type of the packet; writing a packet serial number into the packet header of the packet according to the serial number of the packet in the media code stream corresponding to the media code stream identifier; and writing a packet format load length identifier into the packet header of the packet according to the total length of the packet; and the sending end sending each encapsulated packet to one or more receiving ends via the same UDP channel. The present invention can enable multiple paths of media sources to be transmitted in the same UDP/IP network channel, and can provide functions such as extraction, combination, and lost packet compensation for the program media code stream, realizing the objects of encapsulating and transmitting multi-media packets highly efficiently and with high quality.

Description

Method for packaging and transmitting streaming media data packet and streaming media processing device

Technical field

The present invention relates to the field of multimedia data transmission technologies, and in particular, to a method for packaging and transmitting a streaming media data packet and a streaming media processing device.

Background technique

With the growth of 3G and 4G wireless data bandwidth and of high-bandwidth fixed-network access, multimedia transmission technology has come into wide use, and streaming media transmission technology for the real-time transmission of multimedia content such as audio, video, subtitles or animation has emerged in response.

The streaming media server packs media content such as audio, video, subtitles or animation into a media stream, which is transmitted over the network continuously and in real time as packets. The client does not have to wait for the media content to be downloaded completely; after only a short delay it can start playing, and it continues to receive the media stream while playing until the media content finishes or the client aborts. Streaming media technology can significantly reduce media playback latency and reduce client cache requirements.

However, existing streaming media transmission protocols have some shortcomings. For example, RTP (Real-time Transport Protocol) of the standards organization IETF (Internet Engineering Task Force) cannot directly provide good multi-media synchronization or media information tagging capabilities, and the MPEG-2 (Moving Picture Experts Group) TS (Transport Stream) standard of the standards organization ISO (International Organization for Standardization) cannot achieve efficient network transmission.

Summary of the invention

The object of the present invention is to provide an encapsulation method and a transmission method for streaming media data packets, and a streaming media processing device, so as to ensure the efficiency of multimedia transmission and improve the quality of user experience.

In order to solve the above problem, the present invention provides a method for encapsulating a streaming media data packet, including, for each data packet of a media code stream: writing a media code stream identifier into the data packet header of the data packet according to the media type of the data packet; writing a data packet sequence number into the data packet header of the data packet according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier; and writing a data packet format payload length identifier into the data packet header of the data packet according to the total length of the data packet.

The above packaging method further includes:

For each data packet of the media code stream: writing a clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet.

The above packaging method further includes:

For each data packet of the media code stream: writing a media code stream key information identifier into the data packet header of the data packet according to the media code stream key information of the data packet.

Wherein, for each data packet of the media code stream:

If the data packet is a data packet of a video media stream, the media stream key information includes one or more of the following: scalable video coding (SVC) attributes and dependencies, multi-view video coding (MVC) angle information, and relevant key information of the video frame;

If the data packet is a data packet of an audio stream, the media stream key information includes language information of the media stream;

If the data packet is a data packet of a subtitle stream, the media stream key information includes subtitle language information of the media stream.

The relevant key information of the video frame includes related frame information of the video frame and a frame data boundary.

In order to solve the above problem, the present invention provides a method for streaming media transmission, including: a sending end encapsulating data packets, wherein, for each data packet of the media code stream: the media code stream identifier is written into the data packet header of the data packet according to the media type of the data packet; the data packet sequence number is written into the data packet header of the data packet according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier; and the data packet format payload length identifier is written into the data packet header of the data packet according to the total length of the data packet;

The sender sends each encapsulated data packet to one or more receiving ends through the same User Datagram Protocol (UDP) channel.

In the step of encapsulating the data packet, the sending end further, for each data packet of the media code stream, writes the clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet.

In the step of encapsulating the data packet, the sending end further, for each data packet of the media code stream, writes the media code stream key information identifier into the data packet header of the data packet according to the media code stream key information of the data packet.

For each data packet of the media code stream: if the data packet is a data packet of a video media stream, the media stream key information includes one or more of the following: SVC attributes and dependencies, MVC angle information, and relevant key information of the video frame;

The relevant key information of the video frame includes related frame information of the video frame and a frame data boundary.

The sending end sending each encapsulated data packet to the one or more receiving ends through the same UDP channel includes:

The transmitting end combines one or more encapsulated data packets into a large data packet and sends the data packet to the receiving end.

The above transmission method further includes:

After receiving the data packet sent by the sending end, the receiving end sorts the media code streams according to the media code stream identifier and the media code stream key information identifier in the data packet header of the received data packet, and selects the required media code stream for forwarding or playing.

The above transmission method further includes:

After receiving the data packet directly or indirectly from the transmitting end, the receiving end determines whether there is packet loss according to the media code stream identifier and the data packet sequence number in the data packet header of the received data packet, and if there is packet loss, acquires the lost data packet from a server that holds the media code stream corresponding to the lost data packet.

The above transmission method further includes: after receiving the data packet directly or indirectly from the transmitting end, the receiving end synchronizes the multiple media code streams according to the clock reference identifier in the data packet header of the received data packet.

In order to solve the above problems, the present invention provides a streaming media processing device, including an encapsulating module, wherein

The encapsulating module is configured to encapsulate data packets, wherein, for each data packet of the media code stream: the media code stream identifier is written into the data packet header of the data packet according to the media type of the data packet; the data packet sequence number is written into the data packet header of the data packet according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier; and the data packet format payload length identifier is written into the data packet header of the data packet according to the total length of the data packet.

The encapsulating module is further configured to, for each data packet of the media code stream: write a clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet; and write the media code stream key information identifier into the data packet header of the data packet according to the media code stream key information of the data packet.

The above device also includes a transmitting module,

The sending module is configured to send each encapsulated data packet output by the encapsulating module to one or more receiving ends through the same UDP channel.

The present invention proposes an efficient, high quality method of packaging and transmitting multimedia data packets, which has the following advantages:

1. The encapsulated data packet can be directly transmitted on a transport layer protocol such as UDP (User Datagram Protocol) or TCP (Transmission Control Protocol);

2. The encapsulated data packet can also be transmitted on an application layer protocol such as RTP or HTTP (Hypertext Transfer Protocol).

3. In any transmission bearer mode, the encapsulated data packet can obtain efficient data transmission;

4. In any transmission bearer mode, the encapsulated data packet can provide packet loss compensation information and can also ensure a better multi-channel code stream synchronization effect;

5. In any transmission bearer mode, the encapsulated data packet contains key information of the media code stream, on the basis of which applications can be extended at the application level and the device level.

Brief description of the drawings

FIG. 1 is a schematic structural diagram of a data packet according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of data packet transmission and application according to an application example of the present invention.

Preferred embodiment of the invention

In the face of the explosive increase of current multimedia transmission applications, in order to ensure the efficiency of multimedia transmission and increase the quality of user experience, the present invention proposes an efficient, high quality method of packaging and transmitting multimedia data packets.

Specifically, the following steps may be included:

Step 1: The sender divides each media code stream into more than one data packet (the media stream data is placed in the data payload portion, as shown in Figure 1).

Step 2: For each data packet: according to the media type of the data packet, write the media code stream identifier into the data packet header of the data packet; according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier, write the data packet sequence number into the data packet header of the data packet; and, according to the total length of the data packet, write the data packet format payload length identifier into the data packet header of the data packet.

Media stream identification fields can be used to distinguish between different types of media packets, such as video, audio, subtitles, different angles of multi-angle video (MVC), and different base layer and extension layer (SVC) of scalable video.

In addition, the sending end may, for each data packet, write the clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet, and write the media code stream key information identifier into the data packet header of the data packet according to the media code stream key information of the data packet.

If the data packet is a data packet of a video media stream, the media stream key information may include one or more of the following information: SVC (Scalable Video Coding) attributes and dependencies, MVC (multi-view coding) angle information, And relevant key information of the video frame;

The related key information of the video frame may include related frame information and frame data boundary of the video frame.
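The header-writing step described above can be sketched as follows. This is a minimal illustration, not the format defined by the invention: the field order and the widths chosen here (16-bit total length, 8-bit stream identifier, 16-bit sequence number, 32-bit clock reference, 8-bit key information) are assumptions that merely satisfy the minimum widths given in this description, and the names `HEADER`, `encapsulate` and `parse` are hypothetical.

```python
import struct

# Assumed layout in network byte order (not specified by the source):
# total length (16 bits), stream id (8 bits), sequence number (16 bits),
# clock reference (32 bits), key information (8 bits).
HEADER = struct.Struct("!HBHIB")


def encapsulate(stream_id, seq, clock_ref, key_info, payload):
    """Prefix the payload with a header; the total length covers header + payload."""
    total_len = HEADER.size + len(payload)
    if total_len > 0xFFFF:
        raise ValueError("packet too long for the assumed 16-bit length field")
    header = HEADER.pack(
        total_len, stream_id, seq & 0xFFFF, clock_ref & 0xFFFFFFFF, key_info
    )
    return header + payload


def parse(packet):
    """Return (stream_id, seq, clock_ref, key_info, payload) for one packet."""
    total_len, stream_id, seq, clock_ref, key_info = HEADER.unpack_from(packet)
    return stream_id, seq, clock_ref, key_info, packet[HEADER.size:total_len]


pkt = encapsulate(stream_id=1, seq=7, clock_ref=90000, key_info=0,
                  payload=b"video-bytes")
```

Because the length field counts the header as well as the payload, a receiver can locate the end of the packet from the first field alone.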

Step 3: The sender sends each encapsulated data packet to one or more receiving ends through the same UDP/IP channel.

Since the data packets of each media type can be distinguished by the media code stream identification field, a single port resource can be used for transmission in the UDP transmission channel, which ensures: (a) economical use of network-layer port resources; and (b) synchronization of media information.

In addition, some media, such as subtitles and audio, carry only a small amount of data and may therefore be encapsulated as smaller data packets. When such smaller data packets are transmitted over the IP network, multiple smaller packets can be combined into one large packet for transmission.
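One way to picture the combination of small packets is plain concatenation, with the receiver using each packet's total-length field to find the next boundary. The sketch below assumes a reduced header (16-bit total length followed by an 8-bit stream identifier); that layout and the function names are illustrative assumptions only.

```python
import struct

# Assumed minimal header: total length (16 bits), then stream id (8 bits).
HEADER = struct.Struct("!HB")


def make_packet(stream_id, payload):
    """Build one encapsulated packet whose length field covers the header too."""
    return HEADER.pack(HEADER.size + len(payload), stream_id) + payload


def combine(packets):
    """Concatenate several encapsulated packets into one UDP payload."""
    return b"".join(packets)


def split(datagram):
    """Walk the datagram, using each total-length field to find the next packet."""
    packets, offset = [], 0
    while offset < len(datagram):
        total_len, _stream_id = HEADER.unpack_from(datagram, offset)
        if total_len < HEADER.size or offset + total_len > len(datagram):
            raise ValueError("corrupt or truncated packet")
        packets.append(datagram[offset:offset + total_len])
        offset += total_len
    return packets


audio = make_packet(4, b"aac")
subtitle = make_packet(6, b"hello")
datagram = combine([audio, subtitle])
```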

Step 4: After receiving the data packet sent by the sending end, the receiving end may sort the media code streams according to the media code stream identifier and the media code stream key information identifier in the data packet header of the data packet, and select the required media code stream for forwarding or playing.

For example, according to different applications, after receiving the data packet, the media server may perform selective transmission according to the identification field of different media types.

In addition, after receiving the data packet directly or indirectly from the sending end, the receiving end may determine whether there is packet loss according to the media code stream identifier and the data packet sequence number in the data packet header of the data packet, and if there is packet loss, acquire the lost data packet from the server that holds the media code stream corresponding to the lost data packet. The multiple media code streams can also be synchronized according to the clock reference identifier in the data packet header of the data packet.

The following further explains the package structure of the data packet:

As shown in Figure 1, the encapsulated packet includes the packet header and data payload. Among them, the packet header can be extended in length and contains multiple fields. In this article, some of the main key fields are defined. However, it is not limited to this and can be further expanded.

1. Packet format payload length identifier

This field identifies the total length of the data packet, including the length of the data packet header. This representation lets the application layer obtain the actual length of the whole packet directly, rather than only the length of the payload. The packet length defined by this field is variable; the application layer may define a set of empirical or commonly used values to simplify application and transmission equipment. In multimedia applications, the main advantage of this field is that it improves transmission efficiency and allows the packet length to be configured flexibly. This field can be defined using 8 or more bits.

2. Media code stream identifier

This field identifies different media source types. It can be defined using 4 or more bits. For example, 8 bits allow 256 different media types to be defined, and 16 bits allow 65536 different media types. In multimedia applications, the main advantage of this field is that, because it distinguishes media source types, multiple media types of one program can be transmitted through the same port resource of the same IP channel, which gives a better synchronization effect.

3. Data packet sequence number

This field identifies the sequence number of the data packet within a given media type; sequence numbers are assigned in order within the source of that media type. The sequence number is defined using 8 or more bits. For example, 8 bits allow cyclic numbering over 256 packet sequence numbers, and 16 bits allow cyclic numbering over 65536 packet sequence numbers. In multimedia applications, the main advantage of this field is that the continuity of the packet sequence numbers makes it relatively easy to detect network packet loss on the terminal side, and therefore easier to implement retransmission and improve the quality of experience.
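Because the sequence number cycles, a gap check has to work modulo the field size. The following sketch shows one such check for two consecutively received packets of the same media code stream; it assumes in-order arrival (a reordered packet would look like a very large gap), and the function name and the `bits` parameter are hypothetical.

```python
def missing_sequence_numbers(prev_seq, new_seq, bits=16):
    """Return the sequence numbers skipped between two consecutively
    received packets of one stream, counting modulo 2**bits."""
    modulus = 1 << bits
    gap = (new_seq - prev_seq) % modulus
    # gap == 1 means the packets are adjacent; gap == 0 is a duplicate.
    return [(prev_seq + i) % modulus for i in range(1, gap)]


no_loss = missing_sequence_numbers(5, 6)
two_lost = missing_sequence_numbers(5, 8)
across_wrap = missing_sequence_numbers(65534, 1)
across_wrap_8bit = missing_sequence_numbers(254, 1, bits=8)
```

Here `two_lost` is `[6, 7]`, and `across_wrap` is `[65535, 0]`, showing that a loss spanning the wrap-around point is still detected.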

4. Clock reference identifier

This field carries an accurate codec clock that can serve as an accurate clock reference for synchronizing the individual code streams; the clock reference identifier is defined using at least 32 bits. In multimedia applications, the main advantage of this field is that it can carry accurate clock references of the various media sources to ensure accurate decoding and synchronization at the client.

5. Media code stream key information identifier

This field can be represented using 8 or more bits. It describes the key information of the media code stream. See Figure 1 for the description of this field.

For a video media stream, the SVC attributes and dependencies describing the code stream and the MVC angle information may be defined, and a field extension carrying the relevant key information of the video frame may be added;

For an audio stream, language information describing the stream can be defined;

For the subtitle stream, subtitle language information describing the stream may be defined;

The relevant key information of the video frame can further describe the related frame information and the frame data boundary of the video frame, so as to better extract the key information of the media into the main transmission field, so as to more easily achieve a more complete multimedia application.

For video media frames, this field identifies the following information: IDR_Start (start of an IDR frame); IDR_Middle (middle of an IDR frame); IDR_End (end of an IDR frame); ODR_Start (start of an ODR frame); ODR_Middle (middle of an ODR frame); ODR_End (end of an ODR frame); GDR_Start (start of a GDR frame); GDR_Middle (middle of a GDR frame); GDR_End (end of a GDR frame); B_Start (start of a B frame); B_Middle (middle of a B frame); B_End (end of a B frame); P_Start (start of a P frame); P_Middle (middle of a P frame); P_End (end of a P frame).
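The fifteen frame markers can be pictured as an enumeration small enough to fit the 8-bit key information field. The numeric values below are assumptions chosen only for illustration (the description names the markers but does not assign codes), and the helper names are hypothetical.

```python
from enum import IntEnum


class FrameMark(IntEnum):
    # Numeric codes are assumed; the source only names the markers.
    IDR_START = 1
    IDR_MIDDLE = 2
    IDR_END = 3
    ODR_START = 4
    ODR_MIDDLE = 5
    ODR_END = 6
    GDR_START = 7
    GDR_MIDDLE = 8
    GDR_END = 9
    B_START = 10
    B_MIDDLE = 11
    B_END = 12
    P_START = 13
    P_MIDDLE = 14
    P_END = 15


def frame_type(mark):
    """Return 'IDR', 'ODR', 'GDR', 'B' or 'P' for a marker value."""
    return FrameMark(mark).name.split("_")[0]


def is_frame_end(mark):
    """True if the packet carries the last piece of a frame."""
    return FrameMark(mark).name.endswith("_END")


kind = frame_type(FrameMark.GDR_MIDDLE)
```

The start and end markers give the frame data boundary, so a device can reassemble or drop a whole frame without parsing the video payload.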

In multimedia applications, the main advantages of this field are: 1. for client random access, the relevant video frame can be found conveniently; 2. key frames can be conveniently separated out for multimedia applications; 3. network devices can use the information to buffer data packets, enabling more applications.

As shown in FIG. 2, the present invention is further described below through a specific application example covering a complete main service flow that uses this packet format.

First, the multimedia encoder captures the video source and the audio source and, after some processing, forms a program code stream. A relatively complex model is used here to describe the program code stream. Assume that the program code stream contains video code streams of two angles. One of them is encoded with SVC and is called the A video stream; it contains a base layer code stream, called the A video base layer code stream, and an enhancement layer code stream, called the A video enhancement layer code stream. The other video code stream is called the B video code stream. The program code stream also includes an English audio code stream, a Chinese audio code stream and a subtitle layer code stream. In summary, the program code stream includes a total of six media streams: the A video base layer stream, the A video enhancement layer stream, the B video stream, the English audio stream, the Chinese audio stream, and the subtitle layer stream. The six media streams are encapsulated as described above, with the necessary fields filled in accordingly, and are output by the multimedia encoder to the IP network; this process is indicated by arrow A in the drawings.

The following describes the functions or applications implemented in this example:

First, multiple media sources are transmitted in the same UDP/IP network channel (the same network port resource). This function mainly depends on the media code stream identification field in the encapsulation format; the steps of encapsulation and transmission are as follows:

In the first step, the multimedia encoder collects the corresponding video, audio and subtitle media and encapsulates them as follows:

Fill in the media code stream identification field according to the media type of the data packet, fill in the data packet sequence number field according to the sequence number of the data packet, fill in the data packet format payload length identifier according to the total length of the data packet, fill in the clock reference identification field according to the encoding time of each media code stream, and fill in the media code stream key information identification field according to the key information of each media code stream.

In the second step, each media stream packet (the embodiment includes six media streams) is sent from a determined sending port of the UDP protocol to a determined receiving port according to a determined sequence and UDP port resources.

The program stream is thus transmitted over the network in such a manner that the multiple media of the program are carried within the port resources determined by one UDP channel. This is indicated by arrow B in Figure 2.
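The two steps above can be sketched for the six streams of this example. The identifier values, the reduced three-field header and the `Encapsulator` class are assumptions made for illustration; the point shown is that each media code stream keeps its own sequence counter while all packets share one outgoing queue, which would then be handed to a single UDP socket and port.

```python
import struct
from itertools import count

# Assumed reduced header: total length (16 bits), stream id (8 bits),
# sequence number (16 bits).
HEADER = struct.Struct("!HBH")

# Hypothetical identifier values for the six streams of the example.
STREAMS = {
    "A video base layer": 1,
    "A video enhancement layer": 2,
    "B video": 3,
    "English audio": 4,
    "Chinese audio": 5,
    "subtitle": 6,
}


class Encapsulator:
    """Keeps one independent sequence counter per media code stream."""

    def __init__(self):
        self._counters = {sid: count() for sid in STREAMS.values()}

    def encapsulate(self, stream_name, payload):
        sid = STREAMS[stream_name]
        seq = next(self._counters[sid]) & 0xFFFF  # cyclic 16-bit numbering
        return HEADER.pack(HEADER.size + len(payload), sid, seq) + payload


enc = Encapsulator()
outgoing = [
    enc.encapsulate("A video base layer", b"v0"),
    enc.encapsulate("English audio", b"a0"),
    enc.encapsulate("A video base layer", b"v1"),
]
headers = [HEADER.unpack_from(p) for p in outgoing]
```

In `headers`, the two video packets carry sequence numbers 0 and 1 while the interleaved audio packet starts its own numbering at 0.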

Second, the program media stream extraction and combination function

This function mainly relies on the media code stream identification field and the media code stream key information identification field; it is explained below using the media server in Figure 2:

As indicated by arrow B in Figure 2, the multiple media streams of the program are output to the media server. The media server can sort, store or buffer them as the application requires, based on the above packet format.

For example, the media server may sort the code streams according to the media code stream identification field and the media code stream key information identification field, and then combine the code streams required by the user and send them to media player 2 for decoding and playing; this code stream transmission is indicated by arrows F and G in Figure 2. If the application needs to switch a media stream, for example to replace the A video base layer code stream with the B video code stream, or to add the B video code stream, feedback can be sent as indicated by arrow H, and the media server replaces the corresponding code stream or additionally sends the corresponding code stream.
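The extraction and switching behaviour of the media server can be sketched as a filter over the stream identifier. The class name `StreamSelector`, the identifier values and the two-field header are hypothetical; the feedback of arrow H is modelled simply as a method call.

```python
import struct

# Assumed minimal header: total length (16 bits), stream id (8 bits).
HEADER = struct.Struct("!HB")


def make_packet(stream_id, payload):
    return HEADER.pack(HEADER.size + len(payload), stream_id) + payload


def stream_id_of(packet):
    return HEADER.unpack_from(packet)[1]


class StreamSelector:
    """Forwards only the media code streams a player has asked for."""

    def __init__(self, wanted):
        self.wanted = set(wanted)

    def add(self, stream_id):
        self.wanted.add(stream_id)

    def replace(self, old_id, new_id):
        self.wanted.discard(old_id)
        self.wanted.add(new_id)

    def forward(self, packets):
        return [p for p in packets if stream_id_of(p) in self.wanted]


A_BASE, B_VIDEO, ENGLISH_AUDIO = 1, 3, 4  # hypothetical identifier values
incoming = [make_packet(A_BASE, b"a"), make_packet(B_VIDEO, b"b"),
            make_packet(ENGLISH_AUDIO, b"e")]

selector = StreamSelector({A_BASE, ENGLISH_AUDIO})
before = [stream_id_of(p) for p in selector.forward(incoming)]
selector.replace(A_BASE, B_VIDEO)  # player feedback: switch to the B video
after = [stream_id_of(p) for p in selector.forward(incoming)]
```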

Third, the lost packet compensation function

This function mainly relies on the media code stream identification field and the data packet sequence number field; it is explained below using media player 1 and the media server in Figure 2:

In the first step, media player 1 can receive the original code stream output by the multimedia encoder (as indicated by arrow C), and can select the code streams the user actually needs according to the media code stream identification field and the media code stream key information identification field in the media stream.

In the second step, media player 1 can detect network packet loss according to the media code stream identification field and the data packet sequence number in the received media stream. When a packet is lost, it sends a message to the media server, as indicated by arrow D in the figure; the media server then sends the buffered data packet as compensation, as indicated by arrow E in the figure, thereby improving the quality of the user experience.
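The exchange between media player 1 and the media server can be sketched with a bounded buffer on the server side, keyed by stream identifier and sequence number. The buffer capacity, the class name `RetransmissionBuffer` and the function `find_losses` are assumptions for illustration; the request and reply messages of arrows D and E are reduced to a direct method call.

```python
from collections import OrderedDict


class RetransmissionBuffer:
    """Server-side cache of recent packets, keyed by (stream id, sequence number)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._packets = OrderedDict()

    def store(self, stream_id, seq, packet):
        self._packets[(stream_id, seq)] = packet
        while len(self._packets) > self.capacity:
            self._packets.popitem(last=False)  # evict the oldest entry

    def fetch(self, stream_id, seq):
        return self._packets.get((stream_id, seq))


def find_losses(received, bits=16):
    """Given the sequence numbers of one stream in arrival order,
    return the numbers that were skipped (modulo 2**bits)."""
    modulus = 1 << bits
    missing = []
    for prev, cur in zip(received, received[1:]):
        gap = (cur - prev) % modulus
        missing.extend((prev + i) % modulus for i in range(1, gap))
    return missing


server = RetransmissionBuffer(capacity=4)
for seq in range(5):
    server.store(1, seq, b"pkt%d" % seq)

missing = find_losses([1, 2, 4])          # the player never saw packet 3
recovered = [server.fetch(1, seq) for seq in missing]
```

A packet that has already been evicted from the buffer (sequence number 0 here) can no longer be compensated, so the buffer depth bounds how late a request may arrive.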

Fourth, the full use of transmission resources

This function is mainly implemented by the packet format payload length identification field. The field is defined as variable, that is, the length of each data packet may differ.

Generally, for a video code stream, because there is more video data, the data packet length can be set close to the MTU (Maximum Transmission Unit) value of the network device; for an audio stream, a subtitle stream or a similar media stream, there is less data, so the data packet length can take a smaller value;

Packets of different lengths can be combined and carried in a UDP transport packet for transmission. This method can save network resources and make full use of transmission resources.
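One simple way to combine packets of different lengths is to fill each UDP payload greedily, in order, up to a size budget. The sketch below assumes a budget of 1472 bytes, which corresponds to a 1500-byte Ethernet MTU minus a 20-byte IPv4 header and an 8-byte UDP header; the actual budget depends on the network, and the function name is hypothetical.

```python
def batch_for_mtu(packets, budget=1472):
    """Group packets, in order, into UDP payloads of at most `budget` bytes."""
    batches, current, size = [], [], 0
    for pkt in packets:
        if len(pkt) > budget:
            raise ValueError("single packet exceeds the datagram budget")
        if current and size + len(pkt) > budget:
            batches.append(b"".join(current))  # flush the full datagram
            current, size = [], 0
        current.append(pkt)
        size += len(pkt)
    if current:
        batches.append(b"".join(current))
    return batches


# One large video packet, two small audio/subtitle packets, another video packet.
packets = [b"v" * 1400, b"a" * 40, b"s" * 20, b"v" * 1400]
sizes = [len(b) for b in batch_for_mtu(packets)]
```

In this example the audio and subtitle packets ride along with the first video packet, so four packets are sent in two datagrams instead of four.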

Fifth, application of the relevant key information of video frames in the media code stream key information identification field

The following specific functions mainly rely on the relevant key information of video frames in the media code stream key information identification field. The application functions are described in detail below:

In the first application, the code stream transmission data packet carries an I-frame related identifier (IDR (Instantaneous Decoding Refresh) frame, ODR (Open-GOP Decoding Refresh) frame, or GDR (Gradual Decoding Refresh) frame), so that the relevant application-layer device can recognize the I frame and process it. For example, the I frame can be buffered for fast channel switching (fast I-frame transmission or playback);

In the second application, the code stream transmission data packet carries an I-frame related identifier (IDR, ODR, GDR), so that the relevant application-layer device can recognize the I frame and process it. For example, an index can be built from the I-frame identifiers and used to implement trick modes;

In the third application, the code stream transmission data packet carries an I-frame related identifier (IDR, ODR, GDR), so that the relevant application-layer device can recognize the I frame and process it. For example, key frame data can be separated out and stored independently;

In the fourth application, the code stream transmission data packet carries a B-frame related identifier, so that a network device can simply not forward B frames when network resources are insufficient, thereby saving network resources and enabling more diverse applications.

The applications that rely on the relevant key information of the video frame in the media code stream key information identification field are not listed exhaustively here.
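Two of the applications above, indexing I frames and withholding B frames under congestion, can be sketched directly on the frame markers. The records below stand in for parsed packet headers, and the function names are hypothetical.

```python
# Each record stands in for a parsed header: (sequence number, frame marker).
I_FRAME_STARTS = {"IDR_Start", "ODR_Start", "GDR_Start"}


def build_i_frame_index(records):
    """Sequence numbers at which an I frame begins (entry points for
    random access, trick modes or fast channel switching)."""
    return [seq for seq, marker in records if marker in I_FRAME_STARTS]


def drop_b_frames(records):
    """What a congested network device would still forward."""
    return [(seq, marker) for seq, marker in records
            if not marker.startswith("B_")]


records = [
    (0, "IDR_Start"), (1, "IDR_End"),
    (2, "P_Start"),
    (3, "B_Start"), (4, "B_End"),
    (5, "GDR_Start"),
]
index = build_i_frame_index(records)
forwarded = [seq for seq, _ in drop_b_frames(records)]
```

Both decisions are made from the header alone, without decoding the video payload.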

Sixth, clock reference application

The following specific functions are mainly implemented by using the clock reference field in the packet format in combination with other fields. The following describes the application functions in detail:

The first application, based on the clock reference and parsing in the packet, can complete the strict synchronization of multiple media types on the client.

In the second application, the clock reference can be used as a decoding recovery clock reference for the player.
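Synchronization by clock reference can be pictured as merging the per-stream packet queues into one timeline ordered by clock value. The sketch below ignores wrap-around of the 32-bit clock field and uses made-up tick values; the function name is hypothetical.

```python
import heapq


def playout_order(*streams):
    """Merge per-stream lists of (clock_ref, label), each already sorted
    by clock reference, into a single timeline."""
    return list(heapq.merge(*streams, key=lambda item: item[0]))


video = [(0, "V0"), (3600, "V1"), (7200, "V2")]
audio = [(1920, "A0"), (3840, "A1")]
subtitle = [(5000, "S0")]

timeline = playout_order(video, audio, subtitle)
labels = [label for _, label in timeline]
```

Because every stream carries clock references from the same encoder clock, the client can interleave video, audio and subtitles without any per-protocol timestamp mapping.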

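Correspondingly, the streaming media processing device of the embodiment of the present invention includes an encapsulating module and a sending module, wherein the encapsulating module is configured to encapsulate data packets as described above, and the sending module is configured to send each encapsulated data packet output by the encapsulating module to one or more receiving ends through the same UDP channel.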

Optionally, the encapsulating module is further configured to, for each data packet of the media code stream: write a clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet; and write the media code stream key information identifier into the data packet header of the data packet according to the media code stream key information of the data packet.

One of ordinary skill in the art will appreciate that all or a portion of the above steps may be accomplished by a program instructing the associated hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk. Optionally, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits. Accordingly, each module/unit in the foregoing embodiment may be implemented in the form of hardware or in the form of a software functional module. The invention is not limited to any specific form of combination of hardware and software.

The above description is only the preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.

Industrial applicability

Compared with the prior art, in the present invention the encapsulated data packet may be carried directly on a transport layer protocol such as UDP or TCP, or on an application layer protocol such as RTP or HTTP. In any transmission bearer mode, the encapsulated data packet achieves efficient data transmission, can provide packet loss compensation information, and can also ensure a better multi-channel code stream synchronization effect. In addition, the encapsulated data packet includes key information of the media code stream, on the basis of which applications can be extended at the application level and the device level.

Claims

1. A method for encapsulating a streaming media data packet, comprising:

For each data packet of the media code stream: writing the media code stream identifier into the data packet header of the data packet according to the media type of the data packet; writing the data packet sequence number into the data packet header of the data packet according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier; and writing the data packet format payload length identifier into the data packet header of the data packet according to the total length of the data packet.

2. The method of claim 1 further comprising:

3. The method of claim 1 or 2, further comprising:

4. The method according to claim 3, wherein, for each data packet of the media code stream: if the data packet is a data packet of a video media stream, the media stream key information includes one or more of the following: scalable video coding (SVC) attributes and dependencies, multi-view video coding (MVC) angle information, and relevant key information of video frames;

5. The method of claim 4, wherein

The relevant key information of the video frame includes related frame information of the video frame and a frame data boundary.

6. A method of streaming media transmission, comprising:

The sending end encapsulates data packets, wherein, for each data packet of the media code stream: the media code stream identifier is written into the data packet header of the data packet according to the media type of the data packet; the data packet sequence number is written into the data packet header of the data packet according to the sequence number of the data packet in the media code stream corresponding to the media code stream identifier; and the data packet format payload length identifier is written into the data packet header of the data packet according to the total length of the data packet; and

7. The method of claim 6, wherein

In the step of encapsulating the data packet, the transmitting end further writes the clock reference identifier into the data packet header of the data packet according to the media code stream encoding time of the data packet for each data packet of the media code stream.

8. The method according to claim 6 or 7, wherein

9. The method according to claim 8, wherein, for each data packet of the media code stream: if the data packet is a data packet of a video media stream, the media stream key information includes one or more of the following: scalable video coding (SVC) attributes and dependencies, multi-view video coding (MVC) angle information, and relevant key information of video frames;

10. The method of claim 9, wherein

11. The method of claim 6, wherein the sending end sending each encapsulated data packet to the one or more receiving ends through the same UDP channel includes:

12. The method of claim 8 further comprising:

After receiving the data packet sent by the transmitting end, the receiving end sorts the media code streams according to the media code stream identifier and the media code stream key information identifier in the data packet header of the received data packet, and selects the required media code stream for forwarding or playing.

13. The method of claim 6 further comprising:

14. The method of claim 7 further comprising:

After receiving the data packet directly or indirectly from the transmitting end, the receiving end synchronizes the multiple media code streams according to the clock reference identifier in the data packet header of the received data packet.

15. A streaming media processing device, comprising a package module,