Sunday, September 29, 2019

Fraunhofer: MPEG-H 3D Audio, the next-generation audio codec for the 4K broadcasting era

https://webcache.googleusercontent.com/search?q=cache:73U9TKvrCq0J:https://pro.miroc.co.jp/headline/fraunhofer-mpeg-h/+&cd=2&hl=en&ct=clnk&gl=jp
I shared it.

September 04, 2018



In the spring of last year, broadcasting of MPEG-H audio began in Korea on 4K terrestrial services using ATSC 3.0. Compatible TVs have also been released, and the 4K broadcasting era has entered a new phase, keeping pace with the milestone of the Olympics. MPEG-H LC profile Level 3 packages up to 16 tracks of audio data; how is this format, which is about to come into use around the world, actually being used? In this article, we look at the current state of MPEG-H, which gives content new flexibility under two keywords: personalization and immersive.
Fraunhofer, Europe's largest applied research institute

Before taking up MPEG-H, the main theme of this article, I would like to introduce its developer, Fraunhofer. Fraunhofer, based in Germany, is the largest applied research institute in Europe. It receives funding from the German federal government and conducts practical, applied research and development that serves society. Because its research is focused on developing practical technology, a high proportion of its work is commissioned by private companies, which sets it apart from what people generally think of as a research institute. Fraunhofer has over 25,000 staff members, most of whom are researchers. Its total research and development spending amounts to 2.3 billion euros, of which 1.9 billion euros is covered by research commissioned by companies. These figures show how far Fraunhofer's technology is put to use in society through private companies; in short, it develops applied technology that can stand up as a business. Apart from Fraunhofer, Germany also has institutions that specialize in developing fundamental technologies. Industry-academia collaboration has been much talked about recently, and Fraunhofer is both a pioneer and a success story in that area.

Fraunhofer has 72 institutes in Germany, each of which basically pursues its own research. The fields are diverse, covering almost everything from life sciences to nanotechnology, materials, and defense technology. Institutes that were originally separate have gathered under the Fraunhofer banner and grown into a huge research organization with 72 sites. The research on MPEG-H introduced here is carried out by the Audio business group at Fraunhofer IIS (Institute for Integrated Circuits).
Personalization and immersive: the MPEG-H keywords

The best-known result from Fraunhofer IIS is mp3, which is already the standard audio codec for distribution; hardly anyone has never used it. AAC is also a technology that originated at this institute. More than 10 billion devices supporting mp3 and AAC have already shipped. MPEG-H, which descends from this line of technology, is highly versatile and is being developed as the codec for the next generation. The phrase “The New Standard for Personalized and Immersive Audio” at the beginning of the presentation material sums up MPEG-H: it is a new codec built around two keywords, personalized and immersive, for which user demand is expected to grow.

MPEG-H has two defining features: personalization and immersive audio. We will discuss each in turn. The two are closely related, but they are designed so that they can be used flexibly depending on the source material and the need. To realize them, the current MPEG-H LC profile Level 3 packages up to 16 tracks of audio data. One of these tracks carries metadata, so 15 tracks are actually available for audio. Those tracks can hold conventional channel-based audio (stereo, 5.1ch, etc.), object audio, and scene-based audio (HOA), in any combination. The number of channels is expected to be expanded in the future, in step with the infrastructure.
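The track budget described above can be sketched in a few lines of Python. This is only an illustration, not part of any MPEG-H tool: the `Element` class, the example layout, and the function names are hypothetical, and the figures of 16 tracks with one reserved for metadata are taken from this article. The HOA helper uses the general fact that a full-sphere Ambisonics signal of order N has (N + 1)² components; how HOA is actually carried inside an MPEG-H stream is not covered here.

```python
from dataclasses import dataclass

# Figures from the article: 16 tracks in the package, one of them metadata.
TOTAL_TRACKS = 16
METADATA_TRACKS = 1
MAX_AUDIO_TRACKS = TOTAL_TRACKS - METADATA_TRACKS  # 15


@dataclass(frozen=True)
class Element:
    """One element of a hypothetical program layout (illustrative only)."""
    name: str
    kind: str    # "channels", "object", or "hoa"
    tracks: int  # number of audio tracks the element occupies


def hoa_tracks(order: int) -> int:
    """Component count of a full-sphere Ambisonics signal of this order."""
    return (order + 1) ** 2


def audio_tracks_used(elements) -> int:
    """Total number of audio tracks occupied by the layout."""
    return sum(e.tracks for e in elements)


def fits_budget(elements) -> bool:
    """True if the layout fits in the 15 tracks available for audio."""
    return audio_tracks_used(elements) <= MAX_AUDIO_TRACKS


# Assumed example: a 5.1 bed, two commentary objects, first-order HOA ambience.
layout = [
    Element("5.1 bed", "channels", 6),
    Element("commentary (Korean)", "object", 1),
    Element("commentary (English)", "object", 1),
    Element("ambience (1st-order HOA)", "hoa", hoa_tracks(1)),
]

print(audio_tracks_used(layout), fits_budget(layout))
```

With this assumed layout, channels, objects, and HOA share one package and still leave a few tracks free; adding a second first-order HOA element would push the total past the 15 audio tracks.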

By handling channels, objects, and scenes together, MPEG-H supports both interactive use (personalization) and immersive use (objects, HOA), and it is designed to adapt flexibly to any device, whatever the number of speakers in the viewing environment. I can't help but be impressed by the R&D capability of an organization that has developed market-leading codecs. Technology that secures as much versatility as possible, so that users can enjoy it in any situation: this is the true value of Fraunhofer.

How to use personalized MPEG-H
Let us now look at how MPEG-H is applied, using actual examples. Personalization is already in operation in Korea on 4K terrestrial broadcasts using ATSC 3.0. In broadcasting, personalization makes it possible, for example, to make the announcer's voice easier to hear during a sports broadcast (Dialogue Enhancement), to turn it off, or to choose among multiple languages. In other words, TV viewers can adjust the balance of the audio delivered over the air themselves.

This is realized with object audio. Object audio as introduced so far has meant audio with position information, but MPEG-H adds to this. In multilingual broadcasting, for example, each audio object carries information about which language it contains as metadata, so that the user can select it. Rather than position information, such an object carries metadata describing what kind of content it holds. For the channel-based audio that serves as the base, balance changes, exclusive selection between channels, and the default balance are likewise transmitted as presets. Various presets can be provided; for example, a Dialogue Enhancement preset can be created with a balance that makes speech easier to hear.

The TV builds these selection screens from the information in the metadata track. An extended menu lets the user freely adjust the balance of individual object tracks, while ordinary users can get a suitable balance simply by switching presets. It is an interactive, personalized audio technology for next-generation TV.
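To make the idea of presets and exclusive language selection more concrete, here is a minimal Python sketch. It is not the MPEG-H metadata format or any receiver API: the class names, field names, object labels, and gain values are all assumptions chosen for illustration. It only shows how content labels, a language tag, and presets could combine into the mix a viewer hears.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AudioObject:
    """Hypothetical description of one audio element and its metadata."""
    label: str                      # unique name of the element
    language: Optional[str] = None  # language tag, if the element has one
    group: Optional[str] = None     # members of a group are mutually exclusive


@dataclass(frozen=True)
class Preset:
    """Hypothetical preset: gain in dB per element or group; absent = muted."""
    name: str
    gains_db: Dict[str, float]


def resolve_mix(objects, preset, language):
    """Return {label: gain_db} for the elements that are actually played."""
    mix = {}
    for obj in objects:
        key = obj.group or obj.label
        if key not in preset.gains_db:
            continue  # this preset mutes the element
        if obj.group is not None and obj.language != language:
            continue  # exclusive selection: only the chosen language plays
        mix[obj.label] = preset.gains_db[key]
    return mix


# Assumed example content: a channel-based bed plus two commentary languages.
OBJECTS = [
    AudioObject("bed"),
    AudioObject("commentary_ko", language="ko", group="commentary"),
    AudioObject("commentary_en", language="en", group="commentary"),
]

# Assumed presets; the gain values are made up for illustration.
PRESETS = {
    "Default": Preset("Default", {"bed": 0.0, "commentary": 0.0}),
    "Dialogue Enhancement": Preset(
        "Dialogue Enhancement", {"bed": -6.0, "commentary": 3.0}
    ),
    "No commentary": Preset("No commentary", {"bed": 0.0}),
}

print(resolve_mix(OBJECTS, PRESETS["Dialogue Enhancement"], "en"))
```

In this sketch, switching presets changes the whole balance in one step, which corresponds to the simple operation offered to ordinary users, while the per-element gains stand in for what the extended menu would expose.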


Immersive support with high flexibility
As for immersive, the other keyword, MPEG-H object tracks carry position information and can be set up as object audio that is localized in 3D space. HOA (Higher Order Ambisonics) can also be combined with channel-based bed tracks. At present there is a limit of 16 tracks (15 audio tracks), but this is to be expanded in the future, making MPEG-H a flexible, versatile codec that is independent of the production format. Many presets are provided for speaker layouts, so this part is also highly flexible, with great care taken to ensure versatility.

At NAB2018, a reference model of a sound bar was exhibited at the Fraunhofer booth, offering an answer to speaker placement, which is the biggest barrier to immersive audio playback. Systems that reproduce immersive audio from a sound bar placed in front of the TV have existed before, but, as one would expect from Fraunhofer, this one reproduced a surround field with a beautifully wide spread. The developers commented that they had focused on enlarging the listening area, and I could indeed hear sound from the surround directions even after moving out of the sweet spot. Fraunhofer does not intend to sell the sound bar as a product; it is a reference design to show to manufacturers so that they can design and build products using the technology. The NAB2018 venue also had a sound bar prototype made by SENNHEISER, although unfortunately I was not able to listen to it.

Manufacturers are also moving ahead with commercialization
As we have seen, MPEG-H is a technology that covers a very wide range of next-generation content. So what does MPEG-H production look like? Broadcast systems aimed at personalization need to generate metadata in real time, and several companies have already announced products that add metadata to SDI embedded audio signals in real time. A representative example comes from Jünger Audio, a manufacturer of many signal processors that, like Fraunhofer, is based in Germany. The SDI signal with the added metadata is converted to MPEG-TS at the transmission stage and sent out over the air. The metadata can be set easily in a GUI, which gives a real sense of a standard that is already in operation.

For immersive audio, the other use case, the system is built around production. Immersive sound in live production is possible with a fixed format, but content that is carefully crafted calls for a production system. It has been demonstrated that MPEG-H metadata can be output from the SPATIAL AUDIO DESIGNER (SAD) plug-in from NEW AUDIO TECHNOLOGY, which already runs on Pro Tools. At the time of writing this appears to be a beta version, since MPEG-H support is not mentioned on the company's website, but it shows that the production system is steadily approaching completion. SAD provides an Offline Export Tool that can export multichannel WAV with metadata, or the metadata alone. Complete package data can be created by merging the exported WAV with a video file. The file uses .mp4 as its video container, and MPEG-TS is used at the sending stage. These choices also seem intended to minimize changes so that existing system facilities can be used effectively.

An example of MPEG-H metadata settings for multilingual content that uses objects.
Adoption in broadcasting standards such as ATSC 3.0 and DVB has also been decided, and audio using MPEG-H has been on air in Korea since May 31, 2017.
