Exploring Hardware Encoding...

When I built my latest computer system, one thing I was keen to include was a GPU capable of hardware-based video encoding. While I was not prepared to pay over $1,000 for a really powerful, front-line GPU, I was prepared to pay just under half of this amount, which in mid 2019 purchased an MSI GeForce GTX 1660 Ti. It is quite a capable GPU in its own right and more than capable of hardware encoding as, excitingly, it is an entry-level GPU built on the NVIDIA Turing architecture.

With this GPU I then started an exploration of hardware GPU encoding where the often considerable burden of the encoding process is offloaded from the CPU and placed almost exclusively on the GPU. It is this exploration that I have placed on this page...

I do not represent this page as anything approaching definitive; however, it does track my own explorations of this comparatively new technology, with some real life examples as well as some real life mistakes! In this exploration I am hoping to provide a document that is of some use to others exploring the same area, so that they will find their own path perhaps a little easier...

Meeting the GPU...

First, to meet the GPU. For this purpose I have created the table below, which shows the various performance levels possible for the card when using the latest NVIDIA Long Lived Branch driver, which at the time of creating this page is version 525.78.01. Under Linux these levels can be seen under 'PowerMizer Information' in the NVIDIA Settings utility. I run the preferred mode as 'Auto', but there is also a choice of either 'Adaptive' or 'Prefer Maximum Performance'.

MSI - GeForce GTX 1660 Ti

         Graphics Clock           Memory Transfer Rate
Level    Min        Max           Min          Max
0        300 MHz    645 MHz       810 MHz      810 MHz
1        300 MHz    2100 MHz      1620 MHz     1620 MHz
2        300 MHz    2100 MHz      10002 MHz    10002 MHz
3        300 MHz    2100 MHz      11502 MHz    11502 MHz
4        300 MHz    2100 MHz      12002 MHz    12002 MHz

'Official' Specifications of the GPU here...

An interesting point with this GPU is the manner in which the cooling runs. It is a reasonably bulky card with some big heat pipes and a couple of big fans, but the interesting point is the so-called 'Zero Frozr' technology, which activates these fans only when the GPU reaches a certain temperature level (spoiler alert: the fans start up at about 60-65°C). It is designed to reduce fan noise and is a feature I will comment on in this page, noting when the fans kick in and what sort of levels they are pushed up to. But now to get started with transcoding!

Transcoding...

Some preparation is required when setting out to explore hardware GPU acceleration. I am particularly blessed in running the development version of Slackware, and this gives an environment where it is relatively easy to compile the latest versions (as of March 2023) of the following applications:

To answer a frequently asked question: I did not install the NVIDIA CUDA Toolkit, as this is not required for simple usage of either nvenc or nvdec for GPU-based encoding and decoding, which is all that I am really interested in pursuing. Mind you, I would like one day to experiment with GPU-based scaling, which I believe is gained with libnpp from the CUDA Toolkit, but this is a job for another day.
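Before going any further it may be worth confirming that the FFmpeg build in use really was compiled with the NVENC encoders. What follows is only a minimal sketch, assuming ffmpeg is on the PATH; the helper name has_encoder is made up for illustration and is not part of FFmpeg:

```shell
#!/bin/sh
# has_encoder NAME: reads an 'ffmpeg -encoders' listing on stdin and
# succeeds if NAME appears in the encoder-name column (second field).
# (Hypothetical helper for illustration, not part of FFmpeg.)
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Only query ffmpeg if it is actually installed.
if command -v ffmpeg >/dev/null 2>&1; then
    for enc in h264_nvenc hevc_nvenc; do
        if ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder "$enc"; then
            echo "$enc: available"
        else
            echo "$enc: not compiled in"
        fi
    done
fi
```

In the same spirit, 'ffmpeg -hide_banner -hwaccels' lists the hardware acceleration methods that the build knows about.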

To really give the GPU some work to do I have used the 4K lossless version of Blender's 'Sintel' short movie and I give fair warning that the video is an epic download at 53GiB and the expanded archive will then take up a mind altering 212GiB of your HDD real estate. A highly useful source file though, download here along with the matching, lossless audio:

wget https://media.xiph.org/sintel/sintel-4k.y4m.xz
wget https://media.xiph.org/sintel/sintel-master-st.wav
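Given the size of the expanded file it is probably sensible to check free disk space before unpacking. A small sketch, assuming the archive sits in the current directory and that xz is installed; the helper name fits is hypothetical:

```shell
#!/bin/sh
# fits NEED_GIB AVAIL_KIB: succeeds if AVAIL_KIB (as reported by df -k)
# covers NEED_GIB gibibytes. (Hypothetical helper for illustration.)
fits() {
    [ "$2" -ge $(( $1 * 1024 * 1024 )) ]
}

# Free space, in KiB, on the filesystem holding the current directory.
avail_kib=$(df -Pk . | awk 'NR == 2 { print $4 }')

if fits 212 "$avail_kib"; then
    echo "Enough room for the expanded sintel-4k.y4m"
    if [ -f sintel-4k.y4m.xz ]; then
        # -d decompresses, -k keeps the original .xz archive
        xz -dk sintel-4k.y4m.xz
    fi
else
    echo "Less than 212 GiB free, the expanded file will not fit"
fi
```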

And now to come to grips with transcoding this source file, working with both H.264 and HEVC encoding...

H.264 Hardware Encoding...

There is a labyrinth of conflicting advice online as to the best command line for encoding with h264_nvenc, so for those of us who are not gurus of the multimedia world there are always the presets to select from! As of March 2023 the following presets are available for h264_nvenc:

  -preset            <int>        E..V....... Set the encoding preset (from 0 to 18) (default p4)
     default         0            E..V....... 
     slow            1            E..V....... hq 2 passes
     medium          2            E..V....... hq 1 pass
     fast            3            E..V....... hp 1 pass
     hp              4            E..V....... 
     hq              5            E..V....... 
     bd              6            E..V....... 
     ll              7            E..V....... low latency
     llhq            8            E..V....... low latency hq
     llhp            9            E..V....... low latency hp
     lossless        10           E..V....... 
     losslesshp      11           E..V....... 
     p1              12           E..V....... fastest (lowest quality)
     p2              13           E..V....... faster (lower quality)
     p3              14           E..V....... fast (low quality)
     p4              15           E..V....... medium (default)
     p5              16           E..V....... slow (good quality)
     p6              17           E..V....... slower (better quality)
     p7              18           E..V....... slowest (best quality)

And certainly for a guaranteed and painless result one of these presets will produce a perfectly usable output file. Can I mention here that one of the great unknowns at the moment is what options are used in these presets? I experimented a little with most of these presets and ended up using the 'p7' preset with a few trimmings, the command line is here:

time ffmpeg -hwaccel nvdec -i sintel-4k.y4m -i sintel-master-st.wav \
     -c:v h264_nvenc -preset p7 -tune hq \
     -b:v 5M -minrate:v 2M -maxrate 10M -bufsize 5M \
     -profile:v high -bf 3 -rc-lookahead:v 32 \
     -c:a libfdk_aac -af aresample=resampler=soxr -ar 44100 -b:a 128k -ac 2 \
     4k_h264_sintel.mp4

Another option that I am quite keen on is the 'constant quality mode in VBR rate control', which I tested as '-cq 19'. This gave much better video quality but doubled the final file size and extended the transcoding time. Still, it is a good option to experiment with. You can examine all of the options available for this encoder by running the following:

ffmpeg -h encoder=h264_nvenc -hide_banner
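The exact command line used for the '-cq 19' test is not recorded on this page, so the following is only a sketch of how such a run might look. It assumes the commonly suggested combination of '-rc vbr' with '-b:v 0', so that the quality target rather than a bitrate drives the encoder; the helper name cq_args and the output file name are made up for illustration:

```shell
#!/bin/sh
# cq_args ENCODER CQ: prints video options for constant-quality VBR.
# (Hypothetical helper; '-rc vbr' and '-b:v 0' are assumptions here.)
cq_args() {
    printf '%s\n' "-c:v $1 -preset p7 -tune hq -rc vbr -cq $2 -b:v 0"
}

# Only run the encode when ffmpeg and the source file are present.
if command -v ffmpeg >/dev/null 2>&1 && [ -f sintel-4k.y4m ]; then
    # $(cq_args ...) is left unquoted on purpose so it splits into words.
    ffmpeg -i sintel-4k.y4m -i sintel-master-st.wav \
        $(cq_args h264_nvenc 19) \
        -c:a libfdk_aac -b:a 128k -ac 2 \
        4k_h264_sintel_cq19.mp4
fi
```

Lower '-cq' values ask for higher quality and larger files, so it is worth trying a few values either side of 19.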

Using my own selected, and pretty conservative, options I saw absolutely zero utilisation of the CPU's processing power, and the encode ran at 40-45fps while pushing the GPU frequency to a steady 1995MHz. GPU temperature went up to just above 60° Celsius, which is enough to kick the GPU's cooling fans into action at 32% of possible speed and 1070rpm; this is with the default fan curves. Very satisfying, but still some room to push the GPU even further!
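Figures like these can be watched live from a second terminal while the encode runs. A minimal sketch, assuming nvidia-smi from the NVIDIA driver package is installed; the function names watch_gpu and label_sample are hypothetical, and note that nvidia-smi reports fan speed as a percentage rather than in rpm:

```shell
#!/bin/sh
# Standard nvidia-smi query fields: graphics clock, temperature, fan speed.
FIELDS="clocks.gr,temperature.gpu,fan.speed"

# label_sample "CLOCK, TEMP, FAN": annotate one CSV sample with fan state.
# (Hypothetical helper for illustration.)
label_sample() {
    echo "$1" | awk -F', *' '{
        state = ($3 > 0) ? "fans on" : "fans off"
        printf "%s MHz, %s C, fan %s%% (%s)\n", $1, $2, $3, state
    }'
}

# watch_gpu [SECONDS]: sample every SECONDS (default 2) until interrupted.
watch_gpu() {
    nvidia-smi --query-gpu="$FIELDS" --format=csv,noheader,nounits \
        -l "${1:-2}" |
    while IFS= read -r line; do
        label_sample "$line"
    done
}
```

After sourcing the file, 'watch_gpu 2' prints one labelled line every two seconds; Ctrl-C stops it.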

The output file itself looks fantastic with good colour and the fast action scenes are smooth as silk. The eventual output size of the file was 547MiB and processing time was 7 minutes and 45 seconds. Perfect!
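Since what the presets actually contain remains something of an unknown, one way to get a feel for them is to encode the same short clip with each of p1 to p7 and compare the elapsed time and output size. This is only a rough sketch: the 60 second clip length, the output file names and the helper to_mib are my own assumptions, and at a fixed bitrate the differences will show up mostly in speed and visual quality rather than in size:

```shell
#!/bin/sh
# to_mib BYTES: convert a byte count to whole MiB (rounded down).
# (Hypothetical helper for illustration.)
to_mib() {
    echo $(( $1 / 1048576 ))
}

SRC=sintel-4k.y4m

# Only run the encodes when ffmpeg and the source file are present.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$SRC" ]; then
    for p in p1 p2 p3 p4 p5 p6 p7; do
        out="preset_$p.mp4"
        start=$(date +%s)
        # -t 60 before -i reads only the first 60 seconds; -an drops audio.
        ffmpeg -hide_banner -loglevel error -y -t 60 -i "$SRC" -an \
            -c:v h264_nvenc -preset "$p" -tune hq -b:v 5M "$out"
        end=$(date +%s)
        echo "$p: $(( end - start )) s, $(to_mib "$(wc -c < "$out")") MiB"
    done
fi
```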

HEVC Hardware Encoding...

If there is a virtual labyrinth of conflicting advice available concerning GPU-based H.264 encoding, there is in contrast a veritable wasteland of advice concerning GPU-based HEVC encoding: not very much at all! And so I again chose from the available presets:

  -preset            <int>        E..V....... Set the encoding preset (from 0 to 18) (default p4)
     default         0            E..V....... 
     slow            1            E..V....... hq 2 passes
     medium          2            E..V....... hq 1 pass
     fast            3            E..V....... hp 1 pass
     hp              4            E..V....... 
     hq              5            E..V....... 
     bd              6            E..V....... 
     ll              7            E..V....... low latency
     llhq            8            E..V....... low latency hq
     llhp            9            E..V....... low latency hp
     lossless        10           E..V....... 
     losslesshp      11           E..V....... 
     p1              12           E..V....... fastest (lowest quality)
     p2              13           E..V....... faster (lower quality)
     p3              14           E..V....... fast (low quality)
     p4              15           E..V....... medium (default)
     p5              16           E..V....... slow (good quality)
     p6              17           E..V....... slower (better quality)
     p7              18           E..V....... slowest (best quality)

And again I have intense curiosity about the settings contained within these presets! Anyway, I have used a command line that is not that different from the H.264 command line, but I have taken the promise of HEVC encoding to heart and lowered the bitrate in the expectation of comparable quality from a smaller file:

time ffmpeg -hwaccel nvdec -i sintel-4k.y4m -i sintel-master-st.wav \
       -c:v hevc_nvenc -preset p7 -tune hq \
       -b:v 2M -minrate:v 1M -maxrate:v 4M -bufsize 4M \
       -profile:v 2 -bf 3 -rc-lookahead:v 32 \
       -c:a libfdk_aac -af aresample=resampler=soxr -ar 44100 -b:a 128k -ac 2 \
       4k_hevc_sintel.mp4

And again, another option that I am quite keen on is the 'constant quality mode in VBR rate control', which I tested as '-cq 19'. This gave much better video quality but increased the final file size and extended the transcoding time. Still, it is a good option to experiment with. You can examine all of the options available for this encoder by running the following:

ffmpeg -h encoder=hevc_nvenc -hide_banner

The GPU settled on 1980MHz with the GPU temperature settling at about 60° Celsius, with intermittent activation of the cooling fans at a speed of 1032rpm or so (32%). The encoding ran at a steady 48fps and produced a more than reasonable quality encode with an eventual file size of 242MiB. Perfect!
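To put the two results side by side: the HEVC target bitrate of 2M is 40% of the 5M used for H.264, and with the same 128k audio track in both files the HEVC output should land a little above 40% of the H.264 size. A short sketch of the arithmetic using the file sizes reported on this page; the helper name percent_of is made up for illustration, and the ffprobe step only runs if the output files are actually present:

```shell
#!/bin/sh
# percent_of A B: prints A as a whole-number percentage of B.
# (Hypothetical helper for illustration.)
percent_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b * 100 }'
}

# File sizes reported on this page, in MiB.
echo "HEVC output is $(percent_of 242 547)% of the H.264 output"

# With the files to hand, ffprobe can report duration, size and bitrate.
for f in 4k_h264_sintel.mp4 4k_hevc_sintel.mp4; do
    if command -v ffprobe >/dev/null 2>&1 && [ -f "$f" ]; then
        ffprobe -v error -show_entries format=duration,size,bit_rate \
            -of default=noprint_wrappers=1 "$f"
    fi
done
```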

And in conclusion...

This page represents my own initial exploration of FFmpeg and hardware acceleration. My initial thoughts are that it will not replace CPU-based encoding any time soon, and certainly what I miss most is the access to multithreaded encoding possible with a suitable CPU. Not to forget, as well, the plethora of information concerning CPU-based encoding compared with the paucity of information concerning GPU-based encoding.

However if I have perhaps made your own usage of both a little easier let me know by using the email link here. Also let me know of any errors on this page or indeed if there are better ways to accomplish the various tasks I have outlined here. And above all "Have fun!".