Vocal attenuation via AUSoundIsolation

QuietNow

Apple Music Sing utilizes a built-in sound isolation audio unit. This is a basic application allowing configurable isolation of vocals.

Usage

Build and run on macOS, iPadOS, or iOS. Open any audio file, and playback will begin immediately. Adjust the slider accordingly, and enjoy!

You can also export and save an M4A of the current song with the current vocal attenuation level applied (one possible approach is sketched below).

(Screenshot of the application playing “Drugstore Perfume” by Gerard Way)

(I did say it was basic. Hopefully it serves as an example of how to utilize audio units 🙂)
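For the export mentioned above, AVAudioEngine’s offline manual rendering mode is a natural fit: pull rendered buffers through the effect chain yourself and write them to an AAC file. A minimal sketch, assuming `effect` is an already-instantiated audio unit (instantiating AUSoundIsolation itself is sketched below); the function name and paths are illustrative, not the app’s actual code:

```swift
import AVFoundation

// Render a source file through an effect offline, writing AAC into an .m4a.
func exportAttenuated(source: URL, destination: URL, effect: AVAudioUnit) throws {
    let file = try AVAudioFile(forReading: source)
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()

    engine.attach(player)
    engine.attach(effect)
    engine.connect(player, to: effect, format: file.processingFormat)
    engine.connect(effect, to: engine.mainMixerNode, format: file.processingFormat)

    // Offline manual rendering: we pull buffers ourselves instead of
    // letting the engine play to hardware in realtime.
    try engine.enableManualRenderingMode(.offline,
                                         format: file.processingFormat,
                                         maximumFrameCount: 4096)
    try engine.start()
    player.scheduleFile(file, at: nil)
    player.play()

    let output = try AVAudioFile(forWriting: destination, settings: [
        AVFormatIDKey: kAudioFormatMPEG4AAC,
        AVSampleRateKey: file.processingFormat.sampleRate,
        AVNumberOfChannelsKey: file.processingFormat.channelCount,
    ])
    let buffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                  frameCapacity: engine.manualRenderingMaximumFrameCount)!

    while engine.manualRenderingSampleTime < file.length {
        let remaining = file.length - engine.manualRenderingSampleTime
        let frames = AVAudioFrameCount(min(Int64(buffer.frameCapacity), remaining))
        guard try engine.renderOffline(frames, to: buffer) == .success else { break }
        try output.write(from: buffer)
    }

    player.stop()
    engine.stop()
}
```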

Vocal Isolation?

Beginning in iOS 16.0 and macOS 13.0, a new audio unit named AUSoundIsolation silently appeared. As of writing:

  • If you search for the unit’s name on GitHub, there are 9 results.
  • If you search this on Google/Bing/[…], the majority of results are Audacity users complaining about it being incompatible.

(…in other words, another typical unannounced and undocumented addition. Thanks, Apple!)

This unit is silently the backbone of Apple Music Sing. The isolation audio unit applies a custom neural network, separating and reducing vocals. This is done entirely on-device, hence Apple’s mention of:

Apple Music Sing is available on iPhone 11 and later and iPhone SE (3rd generation) using iOS 16.2 or later.

This aligns with their second-generation neural engine (present in A13 and above).
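Despite being undocumented, the unit can be discovered and instantiated like any other system audio unit. A hedged sketch: searching by component name is my own approach, since the exact component description isn’t published (though the aufx-nnet-appl plist name mentioned below hints at type 'aufx', subtype 'nnet', manufacturer 'appl'):

```swift
import AVFoundation

// Locate the isolation unit by name rather than hard-coding a component
// description, since the latter is undocumented.
let matches = AVAudioUnitComponentManager.shared().components { component, _ in
    component.name.contains("SoundIsolation")
}

guard let component = matches.first else {
    fatalError("AUSoundIsolation is unavailable on this OS version/device")
}

// Instantiate asynchronously, then wire it up like any other effect.
AVAudioUnit.instantiate(with: component.audioComponentDescription,
                        options: []) { audioUnit, error in
    guard let isolationUnit = audioUnit else {
        print("Instantiation failed: \(String(describing: error))")
        return
    }
    // e.g., player -> isolationUnit -> main mixer:
    // engine.attach(isolationUnit)
    // engine.connect(player, to: isolationUnit, format: format)
    // engine.connect(isolationUnit, to: engine.mainMixerNode, format: format)
}
```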

Audio Unit Parameters

| Parameter ID | Name | Value | Description |
| --- | --- | --- | --- |
| 95782 | UseTuningMode | 1.0 | The default is 1.0, used as a boolean. |
| 95783 | TuningMode | 1.0 | Similarly, this is set to 1.0. |
| 0 | kAUSoundIsolationParam_WetDryMixPercent | 85.0 | The amount to remove vocals by. Should be 0.0 to 100.0 – any value above increases vocal volumes significantly. (Try 1000.0 with your volume set to 1%.) |
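Once instantiated, these can be set through the classic AudioToolbox C API (the unit also exposes them via its AUParameterTree). A sketch, reusing the hypothetical `isolationUnit` from the earlier snippet:

```swift
import AVFoundation
import AudioToolbox

// Parameter IDs from the table above.
let wetDryMix: AudioUnitParameterID = 0        // kAUSoundIsolationParam_WetDryMixPercent
let useTuningMode: AudioUnitParameterID = 95782
let tuningMode: AudioUnitParameterID = 95783

// Remove 85% of vocals. Stay within 0.0 to 100.0; values beyond that
// boost vocal volume dramatically instead of attenuating it.
AudioUnitSetParameter(isolationUnit.audioUnit, wetDryMix,
                      kAudioUnitScope_Global, 0, 85.0, 0)

// Both tuning-mode flags default to 1.0; set them explicitly for clarity.
AudioUnitSetParameter(isolationUnit.audioUnit, useTuningMode,
                      kAudioUnitScope_Global, 0, 1.0, 0)
AudioUnitSetParameter(isolationUnit.audioUnit, tuningMode,
                      kAudioUnitScope_Global, 0, 1.0, 0)
```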

Audio Unit Properties

| Property ID | Name | Description |
| --- | --- | --- |
| 7000 | CoreAudioReporterTimePeriod | ? |
| 30000 | NeuralNetPlistPathOverride | The path to load the neural network’s property list from, named aufx-nnet-appl.plist. |
| 40000 | NeuralNetModelNetPathBaseOverride | The directory to load weights and so forth from. If not specified, the ModelNetPathBase within its property list is utilized. |
| 50000 | DeverbPresetPathOverride | ? |
| 60000 | DenoisePresetPath | ? |

(If you’re a lone soul frantically searching for what these are in the near future, pull requests with their descriptions would be much appreciated.)
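Setting these follows the usual AudioUnitSetProperty pattern. One caveat: the expected value types are unverified, so the sketch below assumes the path-valued properties take a CFString, and the path itself is a placeholder:

```swift
import AVFoundation
import AudioToolbox

// Property ID from the table above.
let neuralNetPlistPathOverride: AudioUnitPropertyID = 30000

// Assumption: path-valued properties accept a CFString. If the unit
// actually expects a C string or CFURL instead, adjust accordingly.
var plistPath = "/path/to/aufx-nnet-appl.plist" as CFString
let status = AudioUnitSetProperty(isolationUnit.audioUnit,
                                  neuralNetPlistPathOverride,
                                  kAudioUnitScope_Global, 0,
                                  &plistPath,
                                  UInt32(MemoryLayout<CFString>.size))
assert(status == noErr, "failed to override the neural network plist path")
```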

MediaPlaybackCore.framework (providing this functionality) appears to only set “NeuralNetModelNetPathBase” and “NeuralNetModelNetPathBaseOverride”, and by default sets “DereverbPresetPathOverride” to null (thus disabling it).

Despite how Apple Music applies it, lyrics (whether timed by line or by word) are not a factor whatsoever in the model. This is especially apparent if you listen to any song where vocals are distorted, or where background instrumentals drown out the vocals. I imagine Apple simply does not provide the option for non-timed songs because karaoke wouldn’t be nearly as fun.

GitHub

The source code for this project is available on GitHub.