FFmpeg: Exploring Hardware Encoding...

When I recently built a new computer system, one thing I was keen to buy with it was a GPU capable of hardware based video encoding. While I was not prepared to pay over $1,000 for a really powerful, front line GPU, I was prepared to pay just under half of this amount, which in mid 2019 purchased an MSI GeForce GTX 1660 Ti. It is quite a capable GPU in its own right and more than capable of being used for hardware encoding as, excitingly, it is an entry level GPU built on the NVIDIA Turing architecture. With this GPU I then started an exploration of hardware GPU encoding, where the often considerable burden of the encoding process is offloaded from the CPU and placed almost exclusively on the GPU. It is this exploration that I have placed on this page...

I do not represent this page as anything approaching definitive; for that there is the FFmpeg Hwaccel trac document. However it does track my own explorations of this comparatively new technology, with some real life examples as well as some real life mistakes! In this exploration I am hoping to provide a document that is of some use to others exploring the same area, so that they will find their own path perhaps a little easier...

Meeting the GPU...

First to meet the GPU. For this purpose I have created the table below, which shows the performance levels available for the card when using the latest NVIDIA Long Lived Branch driver, which at the time of creating this page is version 430.34. Under Linux there is only a limited ability to select the levels from 0 to 4, unlike Windows where I believe selection is a little easier. Nevertheless there is the possibility of selecting a 'Preferred Mode' of Auto, Adaptive or Prefer Maximum Performance, and to tell the truth I have found that the best option is simply to select Auto and allow the driver to shuttle up and down. My aim in testing GPU acceleration with FFmpeg is to get as close as possible to the maximum settings in Performance Level 4! Anyway here are the possible levels:

MSI - GeForce GTX 1660 Ti
'Official' Specifications of the GPU here...

         Graphics Clock          Memory Transfer Rate
Level    Min        Max          Min          Max
0        300 MHz    645 MHz      810 MHz      810 MHz
1        300 MHz    2100 MHz     1620 MHz     1620 MHz
2        300 MHz    2100 MHz     10002 MHz    10002 MHz
3        300 MHz    2100 MHz     11502 MHz    11502 MHz
4        300 MHz    2100 MHz     12002 MHz    12002 MHz
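
To see how close an encode actually gets to Performance Level 4, one option is to log the clocks with nvidia-smi while FFmpeg is running and summarise the log afterwards. What follows is only a minimal sketch: the function name peak_graphics_clock and the log file name gpu.log are hypothetical names chosen for illustration, and the sample log line in the comment is made up rather than measured.

```shell
# Capture one sample per second while an encode is running, for example:
#   nvidia-smi --query-gpu=clocks.gr,clocks.mem,temperature.gpu,fan.speed \
#              --format=csv,noheader,nounits -l 1 > gpu.log
# Each line of gpu.log then has four comma separated fields, e.g. "1995, 5000, 61, 32"
# (graphics clock in MHz, memory clock in MHz, temperature in C, fan speed in %).

# Print the highest graphics clock (first CSV field) found on standard input.
peak_graphics_clock() {
    awk -F', *' '$1 + 0 > max { max = $1 + 0 } END { print max + 0 }'
}

# Usage, once an encode has finished:
#   peak_graphics_clock < gpu.log
```

The result can then be compared directly with the 2100 MHz maximum in the table above.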

An interesting point with this GPU is the manner in which the cooling runs. It is a reasonably bulky card with some big heat pipes and a couple of big fans, but the interesting part is the so-called 'Zero Frozr' technology, which activates these fans only when the GPU reaches a certain temperature (spoiler alert: the fans start up at about 60-65°C). It is designed to reduce fan noise and is a feature I will comment on in this page, noting when the fans kick in and what sort of speeds they are pushed up to. But now to get started with transcoding!

Transcoding...

Some preparation is required when setting out to explore hardware GPU acceleration. I am particularly blessed in running the development version of Slackware, which gives an environment where it is relatively easy to compile the latest versions (as of July 2019) of the following applications:

To answer a 'Frequently Asked Question': I did not install the NVIDIA CUDA Toolkit, as this is not required for simple usage of either nvenc or nvdec for GPU based encoding and decoding, which is all that I am really interested in pursuing. Mind you, I would like one day to experiment with GPU based scaling, which I believe is gained with libnpp from the CUDA Toolkit, but this is a job for another day.

To really give the GPU some work to do I have used the 4K lossless version of Blender's 'Sintel' short movie and I give fair warning that this is not only a reasonably epic download at 53GiB but the expanded archive will then take up a mind altering 212GiB of your HDD real estate. A highly useful source file though and details are here:

wget https://media.xiph.org/sintel/sintel-4k.y4m.xz
wget https://media.xiph.org/sintel/sintel-master-st.wav
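
Before expanding the archive it is worth confirming that the target disk really does have room for the 212GiB mentioned above. The following is a minimal sketch, assuming a POSIX shell with df and awk available; the helper name has_room_for_gib is hypothetical and not part of any tool used on this page.

```shell
# Succeed only if the filesystem holding directory $2 (default: the current
# directory) has at least $1 GiB of space available.
has_room_for_gib() {
    need_kib=$(( $1 * 1024 * 1024 ))
    # With -P and -k, df prints one line per filesystem and the fourth
    # column is the available space in KiB.
    avail_kib=$(df -Pk "${2:-.}" | awk 'NR == 2 { print $4 }')
    [ "$avail_kib" -ge "$need_kib" ]
}

# Usage: only decompress (-d), keeping the archive (-k), if there is room:
#   has_room_for_gib 212 . && xz -dk sintel-4k.y4m.xz
```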

And now to come to grips with transcoding this source file, working with both H.264 and HEVC encoding...

H.264 Hardware Encoding...

There is a labyrinth of conflicting advice online as to the best command line for encoding with h264_nvenc so for those of us who are not gurus of the multimedia world there is the preset. As of July 2019 the following presets are available from within FFmpeg:

  -preset            <int>        E..V..... Set the encoding preset (from 0 to 11) (default medium)
     default                      E..V..... 
     slow                         E..V..... hq 2 passes
     medium                       E..V..... hq 1 pass
     fast                         E..V..... hp 1 pass
     hp                           E..V..... 
     hq                           E..V..... 
     bd                           E..V..... 
     ll                           E..V..... low latency
     llhq                         E..V..... low latency hq
     llhp                         E..V..... low latency hp
     lossless                     E..V..... 
     losslesshp                   E..V..... 
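
A listing in this format can be printed by FFmpeg itself with 'ffmpeg -hide_banner -h encoder=h264_nvenc', which is handy for checking what a particular build offers. As a rough sketch, the small filter below pulls just the preset names out of that help text; the function name nvenc_preset_names is hypothetical, and the filter assumes the help layout shown above, where option lines start with a dash and their named values are indented underneath.

```shell
# Read `ffmpeg -h encoder=...` help text on standard input and print the
# named values listed under the -preset option, one per line.
nvenc_preset_names() {
    awk '
        /^ *-preset /        { in_preset = 1; next }  # the -preset option line itself
        in_preset && /^ *-/  { in_preset = 0 }        # the next option ends the list
        in_preset && NF      { print $1 }             # first word is the preset name
    '
}

# Usage:
#   ffmpeg -hide_banner -h encoder=h264_nvenc | nvenc_preset_names
```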

And certainly for a guaranteed and painless result one of these presets will produce a perfectly usable output file. Can I mention here that one of the great unknowns at the moment is which options are actually set by these presets? I experimented a little with most of them and went more than a little conservative by ending up with the 'medium' preset and a few trimmings. The command line is here:

ffmpeg -hwaccel nvdec -i sintel-4k.y4m -i sintel-master-st.wav \
       -c:v h264_nvenc -preset medium -rc:v vbr_hq -qmin 0 -cq 19 \
       -b:v 4M -minrate:v 2M -maxrate:v 6M -bufsize:v 8M \
       -profile:v high -bf 3 -rc-lookahead:v 32 \
       -c:a libfdk_aac -b:a 128k \
       4k_h264_sintel.mp4

A few notes on this syntax, most of which I will need to return to and rework with the benefit of a little more reflection:

  1. -rc:v vbr_hq -qmin 0 -cq 19: This allows an override of the rate control set by the preset and its replacement with an appropriate quality level. The setting 19 seems to work well enough with this material.
  2. -b:v 4M -minrate:v 2M -maxrate:v 6M -bufsize:v 8M: Even though I am using a variable bitrate method of encoding I believe that the encoder performs best with a target bitrate to operate around. If this is not set the encoder works at quite a low bitrate and the quality is very ordinary. I am unsure if the buffer settings and max / min settings are optimal or even required so this is homework for me...
  3. -rc-lookahead:v 32: Is this required? I am a little unsure so again this is homework for me.
  4. Audio settings: It seems that these audio settings are about as basic as they could be so a small project for me is to spice things up a little and bring the sound to life. It is a great soundtrack and it would be nice to do it justice!

Utilising absolutely zero processing power from the CPU, this encode ran at 50-55fps and pushed the GPU frequency to a steady 1995MHz. GPU temperature went up to just above 60° Celsius, which is enough to kick the GPU's cooling fans into action at 32% of possible speed and 1070rpm; this is with default fan curves. Very satisfying but still some room to push the GPU even further!

The output file itself looks fantastic, with good colour, and the fast action scenes are smooth as silk. The eventual size of the output file was 400MiB and processing time was 6 minutes and 52 seconds. Mind you, although the source is a monster file at 212GiB the play time is only 15 minutes, so it is pretty much the perfect test file for my purposes!

HEVC Hardware Encoding...

If there is a virtual labyrinth of conflicting advice available concerning GPU based H.264 encoding there is in contrast a veritable wasteland of advice concerning GPU based HEVC encoding: not very much at all! Faced with some widespread confusion over what the nvenc developers term '2 pass encoding' I have experimented with a single pass, VBR, quality based encode using a significantly lower target bitrate for the encoder to aim for. The 'working version' command line is here:

ffmpeg -hwaccel nvdec -i sintel-4k.y4m -i sintel-master-st.wav \
       -c:v hevc_nvenc -preset medium -rc:v vbr_hq -qmin 0 -cq 19 \
       -b:v 2M -minrate:v 1M -maxrate:v 4M -bufsize 4M  \
       -profile:v main -bf 3 -rc-lookahead:v 32 \
       -c:a libfdk_aac -b:a 128k \
       4k_hevc_sintel.mp4

After significant experimentation I have landed on syntax which pretty much duplicates the syntax I settled on for h264_nvenc. And again the GPU settled on 1980MHz, with GPU temperature settling at about 60° Celsius and intermittent activation of the cooling fans at a speed of 1032rpm or so (32%). The encoding ran at a steady 48fps and produced an encode of reasonable quality.

A few notes on this syntax, most of which I will need to return to and rework with the benefit of a little more reflection:

  1. Nvenc presets?: So I really need to make an effort to find out what are the options included in these presets! If these options are unknown any additional command line options may be either ignored or may very well be counter productive.
  2. Best bitrate?: I have been seduced by the promise of HEVC encoding: great output quality with lower bitrate. What is the best bitrate for this particular material?
  3. -bf 3: I have set B-Frames for 3 and I am not sure if this is the best number. I need also to investigate the '-b_ref_mode' setting which has a choice of disabled / each / middle.
  4. libnpp scaling: The input file is reasonably majestic in terms of its physical dimensions so my plan is to experiment a little with CUDA based scaling with libnpp. This will mean installing the gargantuan CUDA SDK and investigating it in some depth. Lots of fun to come!

So still some work to do here but I am satisfied with this initial exploration. The output file is 190MiB and of more than reasonable quality, with a total time for this encode of 7 minutes and 25 seconds. There is still some room here to push the GPU a little harder, as I should be able to achieve a sustained frequency of 2100MHz, and I should really be recording the memory transfer rate, which has the ability to reach a maximum of 12002MHz.
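
One quick sanity check on both encodes is to work out the overall bitrate from the output size and the play time, and compare it with the -b:v target. The sketch below uses the figures from this page and assumes a play time of roughly 15 minutes (900 seconds); the function name avg_kbps is hypothetical, and the result covers the whole file, so it includes the 128k audio track.

```shell
# Overall bitrate in kbit/s from a file size in MiB ($1) and a duration in
# seconds ($2). Integer arithmetic only, so the result is truncated.
avg_kbps() {
    echo $(( $1 * 1024 * 1024 * 8 / $2 / 1000 ))
}

avg_kbps 400 900   # H.264 file: 400MiB over roughly 15 minutes
avg_kbps 190 900   # HEVC file: 190MiB over roughly 15 minutes
```

With these inputs the two results are 3728 and 1770 kbit/s, so both encodes landed a little under their -b:v targets of 4M and 2M respectively.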

And in conclusion...

This page represents my own initial exploration of FFmpeg and hardware acceleration. My initial thoughts are that it will not replace CPU based encoding any time soon, and certainly what I miss most is the access to multithreaded encoding possible with a suitable CPU. Not to forget as well the plethora of information concerning CPU based encoding compared with the paucity of information concerning GPU based encoding. However if I have perhaps made your own usage of both a little easier, let me know by using the email link below. Also let me know of any errors on this page, or indeed if there are better ways to accomplish the various tasks I have outlined here. And above all "Have fun!".