FFmpeg.ApiWrapper


Low level, mostly safe FFmpeg API wrappers built on top of FFmpeg.AutoGen.


Using the FFmpeg API directly tends to be tedious and error-prone due to the amount of struct setup, manual memory management, and vague error codes.
This library aims to provide safe, idiomatic wrappers and utility functions around several of these APIs, to facilitate the development of media applications.

Examples

The easiest way to get started may be to look at the code samples in the repository. The wrappers do not diverge much from their native counterparts, so familiarity with the native API may help with more complicated use cases.

On Windows, FFmpeg binaries must be manually copied to the build directory, or their location specified through ffmpeg.RootPath (see the FFmpeg.AutoGen documentation).
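
For example, the library path can be set once at startup (a minimal sketch; the directory below is an arbitrary example):

```csharp
using FFmpeg.AutoGen;

// Point the bindings at the directory containing the FFmpeg shared libraries
// (avcodec, avformat, swscale, ...) before making any other API calls.
ffmpeg.RootPath = @"C:\ffmpeg\bin\x64";
```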

Introduction

A write-up on a few key FFmpeg concepts and how the wrappers expose them.

Codecs and Containers

TODO

Frames and Formats

Uncompressed audio and video frames can be represented in numerous different ways. Video frames are defined by resolution, colorspace, and pixel format, while audio frames are defined by sample rate, channel layout, and sample format.

Formats can differ in how channel values are arranged in memory: in either interleaved or planar layouts. Interleaved (or packed) formats store the values for all channels immediately next to each other in a single array, while planar formats store each channel in a separate array.

Planar formats can make the implementation of many codecs simpler and more efficient, as each channel can easily be processed independently.
They also facilitate chroma subsampling, a scheme where the chroma channels (U and V) are stored at a fraction of the resolution of the luma channel (Y).
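
For example, with 4:2:0 subsampling the chroma planes are stored at half resolution in both dimensions, so an 8-bit frame takes 1.5 bytes per pixel instead of the 3 needed for packed RGB24 (plain arithmetic, no library calls):

```csharp
int width = 1920, height = 1080;

int lumaBytes = width * height;                   // Y: full resolution
int chromaBytes = (width / 2) * (height / 2) * 2; // U + V: half resolution in both axes

int totalBytes = lumaBytes + chromaBytes; // ~3.1 MB per frame, versus ~6.2 MB for RGB24
```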

Audio formats are a lot simpler than video (though perhaps not as intuitive), as they always represent the same amplitude levels[^1], encoded in one of a handful of sample formats.

Interleaved:

```csharp
var frame = new VideoFrame(1024, 1024, PixelFormats.RGBA);
var row = frame.GetRowSpan<byte>(y);

row[x * 4 + 0] = 64;  //R
row[x * 4 + 1] = 128; //G
row[x * 4 + 2] = 255; //B
row[x * 4 + 3] = 255; //A
```

Planar:

```csharp
var frame = new VideoFrame(1024, 1024, PixelFormats.YUV444);
var rowY = frame.GetRowSpan<byte>(y, plane: 0);
var rowU = frame.GetRowSpan<byte>(y, plane: 1);
var rowV = frame.GetRowSpan<byte>(y, plane: 2);

rowY[x] = 255;
rowU[x] = 32;
rowV[x] = 255;
```

Since codecs generally support only a handful of formats, converting between them is an essential operation. FFmpeg provides optimized software-based video scaling and format conversion via libswscale, which is exposed in the SwScaler wrapper. Likewise, libswresample provides audio conversion, resampling, and mixing, exposed similarly in SwResampler.
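
A minimal conversion sketch; the SwScaler(srcFormat, dstFormat) constructor, Convert(src, dst) method, and PixelFormats.YUV420P member are assumptions based on the wrapper names above, not confirmed signatures:

```csharp
var srcFormat = new PictureFormat(1920, 1080, PixelFormats.RGBA);
var dstFormat = new PictureFormat(1280, 720, PixelFormats.YUV420P);

using var scaler = new SwScaler(srcFormat, dstFormat);
using var srcFrame = new VideoFrame(1920, 1080, PixelFormats.RGBA);
using var dstFrame = new VideoFrame(1280, 720, PixelFormats.YUV420P);

// ...fill `srcFrame` with pixel data...

// Rescales to 1280x720 and converts RGBA -> YUV 4:2:0 in one call.
scaler.Convert(srcFrame, dstFrame);
```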

Timestamps and Time Bases

Because streams can have arbitrary frame and sample rates, representing timestamps as floating-point numbers or on a single fixed scale could lead to small rounding errors that accumulate over time. Instead, timestamps are represented as fixed-point numbers relative to an arbitrary time base - a rational number denoting the duration of one timestamp unit, in seconds.

For video, the time base is normally set to 1/FrameRate, and for audio, it is often 1/SampleRate. When decoding, it's best not to make assumptions and to properly convert between bases where necessary.
The encoder wrappers automatically set these values in the constructor, so they only need to be changed when encoding variable frame-rate video, or if required by the codec.

Converting between time bases is still often necessary for ease of interpretation. This can be done via the Rational.Rescale() (av_rescale_q) function, or one of the TimeSpan helpers: Rational.GetTimeSpan(), MediaStream.GetTimestamp(), and MediaEncoder.GetFramePts().
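
For example, rescaling a timestamp from a 1/90000 stream time base to milliseconds - a sketch assuming Rescale() mirrors av_rescale_q's (value, srcBase, dstBase) argument order and that Rational has a (num, den) constructor:

```csharp
var streamBase = new Rational(1, 90000); // common for MPEG-TS streams
long pts = 900_000;                      // 10 seconds, expressed in streamBase units

long ptsMs = Rational.Rescale(pts, streamBase, new Rational(1, 1000)); // 10000 ms
```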

TODO: PTS/DTS/BestEffortTimestamps

Hardware Acceleration

Many video-related tasks can be offloaded to dedicated hardware for better performance and power efficiency. FFmpeg provides support for several platform-specific APIs that can be used for encoding, decoding, and filtering[^2].

The bulk of the work in enabling hardware-accelerated encoding/decoding typically involves enumerating the possible hardware configurations (which may or may not be supported by the platform), and trying to instantiate applicable devices. Once a valid device is found, the codec setup is fairly simple:

Decoding:

```csharp
var decoder = (VideoDecoder)demuxer.CreateStreamDecoder(stream, open: false);
var config = decoder.GetHardwareConfigs()
                    .FirstOrDefault(c => c.DeviceType == HWDeviceTypes.DXVA2);
using var device = HardwareDevice.Create(config.DeviceType);

if (device != null) {
    decoder.SetupHardwareAccelerator(config, device);
}
decoder.Open();
```

Encoding:

```csharp
var format = new PictureFormat(1920, 1080, PixelFormats.NV12);
using var device = VideoEncoder.CreateCompatibleHardwareDevice(CodecIds.HEVC, format, out var config);

if (device != null) {
    encoder = new VideoEncoder(config, format, frameRate: 30, device);
} else {
    //Software fallback
    encoder = new VideoEncoder(MediaCodec.GetEncoder("libx265"), format, frameRate: 30);
}
```

For decoding, SetupHardwareAccelerator() sets up a negotiation callback via AVCodecContext.get_format, which can still reject the device if e.g. the codec settings aren't supported. In that case, it will silently fall back to software decoding.

Hardware frames need special handling, as they generally refer to data outside of main memory. Their data needs to be copied via transfer calls, or accessed through memory mappings (rarely supported).

```csharp
using var decodedFrame = new VideoFrame();
using var swFrame = new VideoFrame();

while (decoder.ReceiveFrame(decodedFrame)) {
    if (decodedFrame.IsHardwareFrame) {
        decodedFrame.TransferTo(swFrame);
        //Use `swFrame`
    } else {
        //Use `decodedFrame`
    }
}
```

Most encoders can take normal software frames without any additional ceremony, but when that is not possible, hardware frames must be allocated via HardwareFramePool.
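
A rough sketch of that path; the pool variable and the Get()/SendFrame() names here are hypothetical stand-ins, not the library's confirmed API:

```csharp
// `framePool` is assumed to be a HardwareFramePool tied to the encoder's device.
using var hwFrame = framePool.Get(); // hypothetical: allocate a device-memory frame
swFrame.TransferTo(hwFrame);         // upload the software frame to the device
encoder.SendFrame(hwFrame);          // hypothetical: submit the hardware frame
```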

Filters

TODO?

GC and Unmanaged Memory Lifetimes

Most wrappers use GC finalizers to free unmanaged memory in case Dispose() is never called. When accessing unmanaged state through the wrappers (e.g. Handle properties, and span-returning methods such as Frame.GetSpan()), users should ensure that the wrapper objects are not collected by the GC while the unmanaged memory is still in use, otherwise it could be freed by the finalizer.
This can be done by keeping external references (e.g. in a field), wrapping in using blocks, or calling GC.KeepAlive() after the code accessing the unmanaged memory.
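
For example, a sketch using the GetRowSpan() accessor shown earlier (the Height property is assumed):

```csharp
static void FillWhite(VideoFrame frame)
{
    for (int y = 0; y < frame.Height; y++) {
        frame.GetRowSpan<byte>(y).Fill(255);
    }
    // Keep `frame` reachable until this point; otherwise the GC could finalize
    // it mid-loop and free the memory backing the spans while they are in use.
    GC.KeepAlive(frame);
}
```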

Footnotes

[^1]: https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Audio_concepts

[^2]: https://trac.ffmpeg.org/wiki/HWAccelIntro
