Ideas for the Use of Loudness Metering in Game Audio

Last week, I put together a short post to try and help clear up some of the basic confusion surrounding “loudness metering”…specifically, what the primary differences are between ITU-R BS.1770, ATSC-RP A/85 and EBU-R128. At the end of the post I mentioned that, instead of trying to implement standards from the broadcast industry (since EBU-R128 has been such a common topic of conversation), I think the game audio community needs to find its own application of loudness metering. I wanted to put forward some ideas as to how loudness metering could be employed in games in a manner that better suits the unique situations housed within the medium.

Before I begin, I should throw out a disclaimer. I work in audio for television and film, not games. Because of friends I have in the industry, and personal experimentation with some of the common tools used in game audio, I have some understanding of the production environment and workflow associated with games. I don’t claim to be an expert. This article is posed merely as a “what if” to the audio professionals within that industry. These are just some musings that I’ve toyed with over the last year and a half. I expect there to be problems I haven’t considered with the following ideas, but I do think they are a step in the right direction…an approach designed exclusively for game audio.

There are two key points that I’d like to explore:

  1. Dynamic range and measurement duration
  2. Loudness normalization

Let’s start with the idea of a “dialnorm” for games. In the last article, I argued that EBU-R128 and ATSC-RP A/85 are inappropriate for games because they specify a precise infinite loudness measurement over the duration of the program (they are broadcast specifications, after all). This is relatively easy to meet in broadcast, as programs have defined durations (half hour, hour, two hours, etc.) and locked content. No matter when the program is aired, it is always the same program. This is not the case in games, where players can be engaged anywhere from fifteen minutes to hours on end. The content of the game is in flux as well. While there may be some standardized points of action or story elements that do not change, the time it takes to complete them can vary dramatically based on an individual player’s play-style. The actions that occur may also be drastically different, even by the same player across multiple play-throughs. If we were to measure the infinite loudness (perceived volume over total duration) experienced by two players who are playing the same section of a game, it is unlikely we would obtain the same loudness measurements.

Personally, I have another gripe about the idea of a “dialnorm” or target infinite measurement. Aesthetically, I feel that games should, as in film, have the option to use a wider dynamic range than is employed within broadcast. At the very least, the option should exist for players (Battlefield: Bad Company 2 is an excellent example of a game allowing the player to choose the dynamic range they prefer). Because of the previously mentioned mutability, achieving this target measurement would require a very narrow dynamic range.

My suggestion here is twofold. First, rather than focusing on the idea of infinite measurement, game audio would be better served by implementing a system surrounding short-term measurement…perhaps in the 5 to 10 second time window. Second, rather than targeting one specific measurement, the target should be to fall within a range of loudness measurements. One method of mixing in linear media is to mix around an “anchor” element. If you know that there is one moment that has to be louder than all others, that becomes the bar by which you judge subjective loudness in other situations. The same could be said about quiet moments. Establishing maximum and minimum loudness thresholds would afford more flexibility in the mix. Simultaneously, evaluating in smaller time windows would yield data that better reflects the unpredictable nature of a game’s content.
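To make that concrete, here is a minimal sketch (in Python, assuming the third-party soundfile and pyloudnorm packages) of how a captured gameplay mix could be checked in 10-second windows against a hypothetical loudness range. The window length, the -28 to -16 LKFS bounds, and the capture filename are illustrative placeholders, not proposed values.

```python
import soundfile as sf          # third-party: reads audio files into NumPy arrays
import pyloudnorm as pyln       # third-party: ITU-R BS.1770 loudness measurement

# Hypothetical loudness range for the default mix -- placeholder numbers only.
LOUDNESS_MIN = -28.0   # LKFS, quiet-moment floor
LOUDNESS_MAX = -16.0   # LKFS, anchor-moment ceiling
WINDOW_SECONDS = 10.0  # short-term window, per the 5-10 second idea above

def check_loudness_range(path):
    """Measure a captured game mix in consecutive windows and flag outliers."""
    data, rate = sf.read(path)
    meter = pyln.Meter(rate)                 # BS.1770 meter
    window = int(WINDOW_SECONDS * rate)

    for start in range(0, len(data) - window + 1, window):
        block = data[start:start + window]
        loudness = meter.integrated_loudness(block)  # LKFS for this window
        status = "ok"
        if loudness > LOUDNESS_MAX:
            status = "above ceiling"
        elif loudness < LOUDNESS_MIN:
            status = "below floor"
        print(f"{start / rate:7.1f}s  {loudness:6.1f} LKFS  {status}")

if __name__ == "__main__":
    check_loudness_range("gameplay_capture.wav")  # hypothetical capture file
```

Reviewing a playtest capture this way gives a per-window picture of where the mix drifts outside the intended range, rather than a single number for the whole session.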

Having a defined loudness range for the game’s base/default mix would also make it relatively simple to implement something similar to the “War Tapes” mode from Battlefield: Bad Company 2. A bit of compression in the parent mixer of a game’s audio engine could easily narrow the dynamic range in a predictable manner.
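As a rough illustration of that idea, the sketch below runs a basic feed-forward compressor over a master-bus buffer (pure Python/NumPy; the threshold, ratio, and time constants are arbitrary placeholders, not recommended settings). A game engine would do this in real time per mix preset, but the math is the same.

```python
import numpy as np

def compress_bus(samples, rate, threshold_db=-20.0, ratio=3.0,
                 attack_ms=10.0, release_ms=200.0):
    """Feed-forward compressor sketch for narrowing a mix's dynamic range.

    `samples` is a mono float array in the -1..1 range; the parameters are
    placeholders, not recommended settings.
    """
    attack = np.exp(-1.0 / (rate * attack_ms / 1000.0))
    release = np.exp(-1.0 / (rate * release_ms / 1000.0))

    envelope = 0.0
    out = np.empty_like(samples)
    for i, x in enumerate(samples):
        level = abs(x)
        # One-pole envelope follower: fast rise, slow fall.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level

        level_db = 20.0 * np.log10(max(envelope, 1e-10))
        over_db = level_db - threshold_db
        # Above the threshold, reduce gain so the output follows the ratio.
        gain_db = 0.0 if over_db <= 0.0 else -over_db * (1.0 - 1.0 / ratio)
        out[i] = x * (10.0 ** (gain_db / 20.0))
    return out
```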

I understand that mixing, and evaluating said mix, under these short measurement windows could seem like a very labor-intensive process. My next point addresses this concern…loudness normalization of audio elements.

One of the things that ITU-R BS.1770 does better than its metering predecessors is provide predictability of perceived volume. Two radically different sounds (here referring to spectral/frequency content) can be perceived at different volumes once in the acoustic medium, even though their electrical representations (as metered via PPM) appear equivalent. This is not the case when the sounds are metered via ITU-R BS.1770. If we were to take those same two radically different sounds and adjust levels until they “metered” to the same loudness measurement, they would be perceived as the same volume. This provides an opportunity on the mixing side.
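A quick way to see this is to meter two spectrally different test signals that share the same peak level. The sketch below (again assuming the third-party pyloudnorm package) generates a low-frequency sine and broadband noise, both peaking at -6 dBFS, and prints their BS.1770 loudness; matching their LKFS values, rather than their peaks, is what makes them track perceptually.

```python
import numpy as np
import pyloudnorm as pyln  # third-party BS.1770 implementation

RATE = 48000
t = np.arange(RATE * 5) / RATE  # five seconds of each test signal

# Two spectrally different signals, both peaking at the same level (-6 dBFS).
peak = 10 ** (-6 / 20)
low_sine = peak * np.sin(2 * np.pi * 80 * t)          # 80 Hz tone
rng = np.random.default_rng(0)
noise = rng.uniform(-1.0, 1.0, t.size)
noise *= peak / np.max(np.abs(noise))                 # match peak levels

meter = pyln.Meter(RATE)
sine_lkfs = meter.integrated_loudness(low_sine)
noise_lkfs = meter.integrated_loudness(noise)
print(f"80 Hz sine : {sine_lkfs:6.1f} LKFS")   # same peak...
print(f"noise      : {noise_lkfs:6.1f} LKFS")  # ...very different loudness

# Adjusting the sine by the measured difference makes both signals meter --
# and, subjectively, sound -- at the same volume.
matched_sine = low_sine * 10 ** ((noise_lkfs - sine_lkfs) / 20)
print(f"matched    : {meter.integrated_loudness(matched_sine):6.1f} LKFS")
```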

If each family of sound effects were normalized to its own specific loudness value (measured in LKFS…remember, we’re talking ITU-R BS.1770 here, though LUFS is virtually equivalent), you could begin your mix early on with a very small set of representative sounds implemented into the engine. Conceivably, one could spend a great deal of time perfecting the mix engine using this small and easily managed set. Implementing additional sounds later on in development could have minimal impact on the performance of existing mixer snapshots, game states, etc. (provided system resources and memory are properly planned, of course). A standardized peak “limit” may also be required for all sounds, as PPM (peak) and LKFS (loudness) are different scales…clipping or distortion could occur otherwise. While due diligence would require testing and monitoring of the new sounds, it could be a very efficient method for acquiring highly detailed mixes of large and complex soundbanks. There are even loudness normalization batch processing solutions taking hold in the market (NuGen’s Loudness Management Batch Processor being a very elegant solution for network-storage based facilities).
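As a sketch of what that per-family normalization could look like as an offline batch process (the family names, target values, folder layout, and use of the third-party soundfile and pyloudnorm packages are all assumptions for illustration, not a reference to any particular tool’s workflow): each asset is measured, gain-adjusted to its family’s target loudness, and pulled down if it would exceed a standardized peak ceiling.

```python
from pathlib import Path
import numpy as np
import soundfile as sf       # third-party: audio file I/O
import pyloudnorm as pyln    # third-party: BS.1770 loudness measurement

# Hypothetical per-family targets -- placeholder numbers, not recommendations.
TARGET_LKFS = {"footsteps": -30.0, "weapons": -18.0, "ui": -26.0}
PEAK_CEILING_DBFS = -1.0     # standardized peak "limit" for every asset

def normalize_family(folder, family):
    target = TARGET_LKFS[family]
    ceiling = 10 ** (PEAK_CEILING_DBFS / 20)

    for path in sorted(Path(folder).glob("*.wav")):
        data, rate = sf.read(path)
        meter = pyln.Meter(rate)
        # Note: BS.1770 gating needs a minimum duration; very short one-shots
        # may need padding before measurement.
        loudness = meter.integrated_loudness(data)

        gain = 10 ** ((target - loudness) / 20)   # linear gain to hit target
        out = data * gain

        peak = np.max(np.abs(out))
        if peak > ceiling:
            # Simple safety: pull the whole file down under the ceiling.
            # (A limiter, or a quieter family target, may be the better fix.)
            out *= ceiling / peak
            print(f"{path.name}: target would clip; reduced by "
                  f"{20 * np.log10(peak / ceiling):.1f} dB")

        sf.write(path.with_name(path.stem + "_norm.wav"), out, rate)

if __name__ == "__main__":
    normalize_family("assets/sfx/footsteps", "footsteps")  # hypothetical paths
```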

As previously admitted, I am not a game audio professional. I have tried to keep these suggestions broad and general. Loudness normalization of individual elements could provide the flexibility needed to mix with small measurement windows, and targeting a loudness range while mixing is an approach better suited to the unpredictable nature of a population’s individual experiences. I feel this is an approach better suited to the game audio industry than the infinite-window, precise target measurement prescribed in the North American and European broadcast specifications.

Please point out any flaws in my arguments. There is an appropriate implementation of loudness metering for games somewhere out there. This is merely a starting point. [...and one I wouldn’t be surprised to find someone has already suggested elsewhere.]


5 Responses to Ideas for the Use of Loudness Metering in Game Audio

  1. Steven Googe says:

    This is a great post. The idea of creating a reference mix and sounds to base the rest of the mix on throughout the production is a great one. As the rest of the sounds are created, they can be leveled against this premix, reducing the amount of mixing that has to be handled in the middleware and engine and making the final mix much easier once the dreaded final crunch is reached. The only issue is still the not-so-uncommon occurrence of studios bringing in sound people way too late in the production, but that’s a whole other problem. With the appropriate amount of preproduction time, I can see this working out very well for an audio team.

  2. Rich Aitken says:

    A generally good post. With respect to loudness, these are always windowed functions, so I don’t see why we can’t look at doing a (better; let’s not go advert crazy) version of TV loudness. The ITU-R BS.1770 reference is already being adopted by a great many involved in sound mixing for games. We do it at Nimrod (a game audio post-production studio), and I’ve heard many others doing the same. Secondly, mix referencing to a standardised track for the game is something most of us have already been doing since the 90s! In fact it’s really the only way to go once you start mixing many cues used either in game, in menus or within cut scenes. Thirdly, our delivery mechanism does have more in common with broadcast TV than it does with film. For a start, we are delivering through the same audio systems and display systems. To adopt the wider dynamic range used in films would be a mistake; it’s just too broad for the targeted home systems. To improve game mixing (and yes, it does need improving) ITU-R BS.1770 is a good step forward, and I for one absolutely support this as the basis for game audio standards.

    • Shaun says:

      All good points, Rich. Yes, even PPMs have a time function to them. I simply don’t think that the “infinite” window specified in the broadcast specs applies very well to the type of interactive media we’re talking about here. As to your second point, I do realize that people have been mixing to a “standardized track.” The point I was making is that a specific mixer snapshot can be very predictable later down the road when assets are normalized using ITU-R BS.1770 measurements. It’s just a higher degree of accuracy than afforded by dBFS or Leq(A), which, I assume, people have been using for that purpose in the past. And that broadcast vs. film idea is valid, but I think there are some caveats surrounding how games are consumed. As far as delivery mechanism/reproduction environment goes, I whole-heartedly agree with you. On the other hand, I’d argue that consumers devote a higher degree of attention to games than they do to any other medium. Personally, I do…it’s just the nature of interactive media/entertainment. There are fewer distractions that can compete with a game than with television or film viewing (since those are such passive activities). For television broadcasts, the goal is to keep you watching the channel. Deviations in volume leave openings for distractions (or can be distractions themselves). The focus invested in games, that requirement of active participation, gives you a bit more wiggle room. I agree that the sometimes extremely wide dynamic range found in films will not be appropriate in all situations, but the narrow range of broadcast isn’t the answer either. There’s a middle ground in there somewhere that will work, and it’s probably something that will have to be determined on a game-by-game basis. I know the game audio community is capable of coming up with some truly unique solutions; I just saw too many people getting hung up on the EBU-R128 spec. I want more people looking for the logic holes in the ideas I presented, because I think that will keep the conversation moving forward in a positive direction. So, thanks for your contributions to the discussion.

      • Rich Aitken says:

        You’re absolutely right, of course. It really needs to be judged on a game-by-game basis. A highly immersive FPS could absolutely do a more filmic mixing style than, say, a handheld Mario game. It’s as game-specific as mastering can be music-genre-specific.

  3. Pingback: Meters, Mixing Levels, Loudness (and the new ITU spec) | ARCH11008 Sound Design Media
