Last week, I put together a short post to try to help clear up some of the basic confusion surrounding “loudness metering”…specifically, the primary differences between ITU-R BS.1770, ATSC-RP A/85 and EBU-R128. At the end of that post I mentioned that, rather than adopting standards from the broadcast industry wholesale (EBU-R128 being such a common topic of conversation), the game audio community needs to find its own application of loudness metering. Here, I want to put forward some ideas as to how loudness metering could be employed in games in a manner that better suits the unique situations housed within the medium.
Before I begin, I should offer a disclaimer. I work in audio for television and film, not games. Through friends in the industry, and personal experimentation with some of the common tools used in game audio, I have some understanding of the production environment and workflow associated with games, but I don’t claim to be an expert. This article is posed merely as a “what if” to the audio professionals within that industry. These are just some musings that I’ve toyed with over the last year and a half. I expect there are problems I haven’t considered with the following ideas, but I do think they are a step in the right direction…an approach designed exclusively for game audio. There are two key points that I’d like to explore:
- Dynamic range and measurement duration
- Loudness normalization
Let’s start with the idea of a “dialnorm” for games. In the last article, I argued that EBU-R128 and ATSC-RP A/85 are inappropriate for games because they specify a precise infinite loudness measurement over the duration of the program (they are broadcast specifications, after all). This is relatively easy to meet in broadcast, where programs have defined durations (half hour, hour, two hours, etc.) and locked content. No matter when a program is aired, it is always the same program. This is not the case in games, where players can be engaged anywhere from fifteen minutes to hours on end, and where the content itself is in flux. While there may be some standardized points of action or story elements that do not change, the time it takes to complete them can vary dramatically with an individual player’s play-style. The actions that occur may also be drastically different, even for the same player across multiple play-throughs. If we were to measure the infinite loudness (perceived volume over total duration) experienced by two players playing the same section of a game, it is unlikely we would obtain the same loudness measurements.
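To make that concrete, here’s a minimal sketch of the problem. Everything here is illustrative…made-up numbers, and a plain unweighted mean-square measure standing in for a true BS.1770 measurement (which would add K-weighting and gating). The point is simply that a longer quiet stretch dilutes an integrated measurement, even when the loud content is identical:

```python
import math

def integrated_loudness_db(samples):
    """Simplified "infinite" (integrated) measure: mean-square power of the
    whole signal, expressed in dB. A real ITU-R BS.1770 measurement adds
    K-weighting and gating, omitted here for clarity."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# Two hypothetical playthroughs of the same section: identical combat audio,
# but player B lingers in quiet exploration four times as long.
combat = [0.5] * 1000           # the loud content both players hear
quiet = [0.05] * 1000           # quiet ambience
player_a = quiet + combat
player_b = quiet * 4 + combat   # the longer quiet stretch dilutes the average

print(round(integrated_loudness_db(player_a), 1))  # ≈ -9.0 dB
print(round(integrated_loudness_db(player_b), 1))  # ≈ -12.8 dB
```

Same section, same “anchor” content, yet the two players’ integrated measurements land several dB apart purely because of pacing.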
Personally, I have another gripe about the idea of a “dialnorm” or target infinite measurement. Aesthetically, I feel that games, like film, should have the option to use a wider dynamic range than is employed in broadcast. At the very least, the option should exist for players (Battlefield: Bad Company 2 is an excellent example of a game allowing the player to choose the dynamic range they prefer). Because of the previously mentioned mutability, hitting a single target measurement would require a very narrow dynamic range.
My suggestion here is twofold. First, rather than focusing on the idea of infinite measurement, game audio would be better served by a system built around short-term measurement…perhaps in a 5 to 10 second time window. Second, rather than targeting one specific measurement, the target should be to fall within a range of loudness measurements. One method of mixing in linear media is to mix around an “anchor” element. If you know there is one moment that has to be louder than all others, that becomes the bar by which you judge subjective loudness in every other situation. The same could be said of quiet moments. Establishing maximum and minimum loudness thresholds would afford more flexibility in the mix, while evaluating in smaller time windows would yield data that better reflects the unpredictable nature of a game’s content.
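A short-term meter of this kind is not complicated. Here’s a rough sketch, again with an unweighted power measure standing in for a proper K-weighted BS.1770 measurement; the class name, window length, and the min/max band values are all hypothetical:

```python
import math
from collections import deque

class ShortTermMeter:
    """Sliding-window sketch: keep the last `window_s` seconds of per-block
    signal power and report a short-term level in dB. Illustrative only; a
    real meter would apply BS.1770 K-weighting before averaging."""

    def __init__(self, window_s=10.0, block_s=0.1):
        self.blocks = deque(maxlen=int(window_s / block_s))

    def push_block(self, samples):
        # One call per audio block rendered by the engine.
        self.blocks.append(sum(s * s for s in samples) / len(samples))

    def short_term_db(self):
        return 10 * math.log10(sum(self.blocks) / len(self.blocks))

def in_range(level_db, floor_db=-31.0, ceiling_db=-15.0):
    """Evaluate against a hypothetical min/max loudness band rather than
    a single target value."""
    return floor_db <= level_db <= ceiling_db
```

During a test session, `push_block` would be fed each rendered buffer, and `in_range` would flag the moments where the mix drifts outside the agreed band…rather than failing the mix for missing one precise number.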
Having a defined loudness range for a game’s base/default mix would also make it relatively simple to implement something similar to the “War Tapes” mode from Battlefield: Bad Company 2. A bit of compression in the parent mixer of a game’s audio engine could narrow the dynamic range in a predictable manner.
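The gain math behind that kind of parent-bus compression is straightforward. This sketch shows only the static compression curve (a real bus compressor would add attack/release smoothing, and the threshold and ratio values here are made up):

```python
import math

def bus_compressor(sample, threshold_db=-20.0, ratio=4.0):
    """Static compression curve for a parent/master bus: level above the
    threshold is scaled down by `ratio`. Attack/release smoothing and
    sidechain detection are omitted; values are illustrative."""
    level_db = 20 * math.log10(max(abs(sample), 1e-10))
    if level_db <= threshold_db:
        return sample  # below threshold: pass through untouched
    compressed_db = threshold_db + (level_db - threshold_db) / ratio
    return sample * 10 ** ((compressed_db - level_db) / 20)
```

With the default mix held inside a known loudness range, one fixed curve like this narrows the range predictably…no per-title guesswork about how hot the material might get.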
I understand that mixing, and evaluating said mix, under these short measurement windows could seem like a very labor-intensive process. My next point addresses this concern…loudness normalization of audio elements.
One of the things that ITU-R BS.1770 does better than its metering predecessors is provide predictability of perceived volume. Two radically different sounds (in terms of spectral/frequency content) can be subjectively different in volume once in the acoustic medium, even though their electrical representations (as metered via PPM) appear equivalent. Meter those same two sounds via ITU-R BS.1770, however, and the story changes: adjust their levels until they “meter” to the same loudness measurement, and they will subjectively sound to be the same volume. This provides an opportunity on the mixing side.
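The level-matching step itself reduces to a simple gain calculation. Here’s a sketch, with the usual caveat that the unweighted mean-square measure below is only a stand-in for a real BS.1770 measurement (which K-weights and gates the signal before averaging):

```python
import math

def mean_square(samples):
    # Stand-in for a BS.1770 loudness measurement; the real thing
    # K-weights and gates the signal before averaging.
    return sum(s * s for s in samples) / len(samples)

def match_loudness(reference, target):
    """Scale `target` so that its (simplified) measured loudness equals
    that of `reference`."""
    gain = math.sqrt(mean_square(reference) / mean_square(target))
    return [s * gain for s in target]
```

After matching, the two sounds measure identically…and, measured the BS.1770 way rather than this simplified way, they would also be perceived as equally loud.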
If each family of sound effects were normalized to its own specific loudness value (in LKFS…remember, we’re talking ITU-R BS.1770 here, though LUFS is virtually equivalent), you could begin your mix early on with a very small set of representative sounds implemented in the engine. Conceivably, one could spend a great deal of time perfecting the mix engine using this small and easily managed set. Implementing additional sounds later in development could then have minimal impact on the performance of existing mixer snapshots, game states, etc. (provided system resources and memory are properly planned, of course). A standardized peak “limit” may also be required for all sounds, as PPM and LKFS are different scales…clipping or distortion could occur otherwise. While due diligence would still require testing and monitoring of the new sounds, this could be a very efficient method for acquiring highly detailed mixes of large and complex soundbanks. There are even loudness normalization batch processing solutions taking hold in the market (NuGen’s Loudness Management Batch Processor being a very elegant solution for network-storage based facilities).
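By way of illustration, here is what such a batch pass boils down to per file…this is not NuGen’s tool, just a hypothetical Python stand-in, with an unweighted measure in place of true LKFS and a made-up peak ceiling, showing both the loudness target and the peak “limit” described above:

```python
import math

def normalize_to_loudness(samples, target_db, peak_ceiling=0.98):
    """Batch-normalization sketch: compute the gain that brings a sound's
    simplified loudness (unweighted mean-square in dB, standing in for a
    true LKFS measurement) to `target_db`, then cap that gain so no sample
    exceeds the peak ceiling. Names and numbers are illustrative."""
    current_db = 10 * math.log10(sum(s * s for s in samples) / len(samples))
    gain = 10 ** ((target_db - current_db) / 20)
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_ceiling:   # the standardized peak "limit"
        gain = peak_ceiling / peak
    return [s * gain for s in samples]
```

Run over a soundbank with one target per sound-effect family, every asset arrives in the engine pre-leveled, and the mix snapshots only have to express creative intent.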
As previously admitted, I am not a game audio professional, and I have tried to keep these suggestions broad and general. Loudness normalization of individual elements could provide the flexibility needed to mix with small measurement windows, and targeting a loudness range while mixing is better suited to the unpredictable nature of each player’s individual experience. I feel this approach fits the game audio industry better than the infinite-window, precise target measurement prescribed in the North American and European broadcast specifications.
Please point out any flaws in my arguments. There is an appropriate implementation of loudness metering for games somewhere out there; this is merely a starting point. [...and one that I wouldn't be surprised to find someone has already suggested elsewhere.]