NVEnc & 16-core Results - Finally got HEVC to use all my 6-core/12-HT CPU... OUCH!

jmc

Active member
My normal 1.3 Mbps MP4, if tested with HEVC, only uses the CPU in the 30% range.
Was very annoyed.
x264 uses 60-80% and does 110 fps, about 4.5x real time (PAL).

Finally got HEVC to peg my CPU at 100%. But it takes 4K, 10 Mbps, and the Very Slow preset!
Was amused when it finally finished the 3-second MPG > MP4 file at 0.13 fps!
Almost 4 minutes per second of video... that's just crazy!
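(Back-of-the-envelope check on those speeds - a quick sketch; the source frame rates are my assumptions, 25 fps PAL and roughly 30 fps for the 4K test:)
Code:
# Quick check of the speed numbers above (source frame rates assumed: 25 fps PAL, ~30 fps 4K).
def realtime_factor(encode_fps, source_fps):
    # seconds of video encoded per second of wall-clock time
    return encode_fps / source_fps

print(round(realtime_factor(110, 25), 1))              # x264: ~4.4x real time
print(round((1 / realtime_factor(0.13, 30)) / 60, 1))  # HEVC Very Slow: ~3.8 minutes per second of video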

Will have to test it on my 16-core Threadripper (Gen 1).
Got VRD5 Pro over there; will have to install VRD6 and test.

Love to see that get a real workout without two outputs going... 80% (75C on an air cooler)
with contrast, color, contour, Video Denoise (a real CPU hog) and x264.
(Fixing old fuzzy/faded/off-color .mpg digitized tapes.)

Should process each "scene" separately and rejoin them, but I did that one time.
Probably never, never again!
 

Otter

Member
Software encoding isn't a single process. Many steps must be done in order, and short of doing 2 different encodes at once, it would be surprising to hit "100%".
As your example suggests, 100% was only achieved by setting ridiculously high parameters, and at the expense of any reasonable encode time.

I've gone through many CPUs over the years (4-core, 6-core, 8-core Ryzen, and now the 12-core Ryzen 3900X).
I saw too many quality issues with files encoded quickly with "Speed" settings, and I see a big quality improvement from doing a 2-pass encode with Slow or Slower settings.
What I needed was to decrease encode time without losing quality.
Each successive CPU increase in clock speed and core count did speed up the encode times, but nothing earthshaking (12-core is 50% faster than 8-core, as expected).
The main advantage is that I can now run 2 software encodes at once and still have CPU cycles to do everyday tasks at the same time.

I don't game, so I had always used lower-end video cards. THEN, VRD v6 came out with GPU encoding.
The latest Nvidia 6th-gen GPU/NVEnc encoding engine is called Turing.
The results are amazing. I've encoded the same raw 1080i recording with:

VRD H264, 2-Pass, Slower
VRD NVEnc 1-Pass CRF19
X264 encoder, 2-Pass, Slower

When I compare screen shots and transition sections, it is very hard to see any difference. If anything, the VRD-NVEnc has less blocking at transitions than Main Concept H264.
I also notice less mottling or banding in large backgrounds like night skies.

The kicker is that it is 5-7 times faster than SW encoding, even with the mighty 12-core 3900X.
A 60-min 1080i recording with the adverts clipped, resized to 720p and recoded at CRF19 will finish in about 300 seconds.
The CPU still does some of the work (decoding etc.), but encoding is all in the Turing GPU. VRD doesn't even get the Turing working hard.
A typical encode only uses the GPU at about 40%, and GPU temps stay under 40C - the fans don't even spin.
Sadly, it's not possible to do simultaneous GPU encodes, but stack the Batch Queue and I can have 6 files done in 30 minutes.
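Rough math on that 300-second figure, with my own assumptions (about 44 minutes of actual video left after the adverts are clipped, 29.97 fps source):
Code:
# Rough throughput estimate for the 60-minute 1080i example (assumptions are mine:
# ~44 minutes of video left after clipping adverts, 29.97 fps source).
minutes_of_video = 44
source_fps = 29.97
encode_seconds = 300

frames = minutes_of_video * 60 * source_fps
print(f"~{frames:,.0f} frames in {encode_seconds}s -> ~{frames / encode_seconds:.0f} fps, "
      f"about {minutes_of_video * 60 / encode_seconds:.0f}x real time")
# ~79,000 frames -> roughly 265 fps, about 9x real time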

Encoding with HEVC/H265 gave similarly impressive results in the earlier betas, but the combo of NVEnc & HEVC just crashes in the last several versions & betas.
Now on v6-808 and I still get "NVEnc video encoder error: Error with NVidia encoder initialization".
Hope that gets fixed soon.
 

Danr

Administrator
Staff member
Encoding with HEVC/H265 gave similarly impressive results in the earlier betas, but the combo of NVEnc & HEVC just crashes in the last several versions & betas.
Now on v6-808 and I still get "NVEnc video encoder error: Error with NVidia encoder initialization".
Hope that gets fixed soon.
It's working here - I'm encoding HEVC w/NVEnc now, and we don't have any open issues regarding this. Can you send us your log file and we will try to duplicate it here?

Sadly, it's not possible to do simultaneous GPU encodes, but stack the Batch Queue and I can have 6 files done in 30 minutes.
You could have 2 interactive sessions encoding at one time, or one interactive along with batch.
 

jmc

Active member
When I compare screen shots and transition sections, it is very hard to see any difference. If anything, the VRD-NVEnc has less blocking at transitions than Main Concept H264.
I also notice less mottling or banding in large backgrounds like night skies.
"less mottling or banding in large backgrounds like night skies"
This is what I really, really dislike!

The problem is I have no idea why the banding/mottling happens.
Is it my monitor or my video card?... I'll spend real money to get rid of banding if possible.
If I have to buy an Nvidia card, I'll do it.
(You said "less", not "no", mottling, but I'm tempted to buy one just to see for myself.)

Now, if the banding is in the original video... is there "any" kind of processing that will remove it?
Or does banding happen on Blu-rays? (I'm a DVD-level guy.)
I'm thinking it is probably a bitrate problem, but I don't know.

"Video streaming in low light levels" Amazon or Netflix the mottling is just absolutely horrible!!

I have a Vega 56, but AMD's market share is too small for companies to spend time and resources
on supporting AMD GPU processing... hence no VRD support, etc.

I think I saw something on NVEnc encoding saying that as long as the Turing
hardware encoder unit is in there, it does not matter...
The $1100 top of the line or the bottom of the line, the NVEnc encoding is all the same... (?)

--------------------------
EDIT...
The chart seems to indicate that they are all the same, except that the GeForce GTX 1650
does not support HEVC B-frames.
You are also limited to two sessions unless you go to the Quadro line, which is unrestricted.

https://developer.nvidia.com/video-encode-decode-gpu-support-matrix
--------------------------
(Another aspect... "GPU hardware" limits the encode options. CPU software... no limits.)

Sounds like maybe Hardware encoding is catching up to Software in quality...?

And thank you for all your input!
 

Otter

Member
"less mottling or banding in large backgrounds like night skies"
This is what I really, really dislike!

The problem is I have no idea why the banding/mottling happens.
Is it my monitor or my video card?... I'll spend real money to get rid of banding if possible.
If I have to buy a Nvida card I'll do it.
(you said "less" not "none" mottling but I'm tempted to buy one just to see for myself)

Now if banding is in the original video... is there "any" kind of processing that will remove it?
Or does banding happen in BluRays? (I'm a dvd level guy).
I'm thinking that it is probably a bit rate problem but don't know.

"Video streaming in low light levels" Amazon or Netflix the mottling is just absolutely horrible!!

I have a Vega 56 but AMD's market share is too small for companies to spend time and resources
on supporting AMD GPU processing...per no VRD support etc..

I'm thinking that I saw something on NVEnc encoding that as long as the Turing
hardware GPU unit is in there, it does not matter...
The $1100 top of the line or the bottom, The NVEnc encoding is all the same...(?)
--------------------------
EDIT...
Chart seems to indicate that they are all the same except that the GeForce GTX 1650
does not support HEVC B Frame.
And limited to two sessions unless you go to the Quadro line-Unrestricted.

""https://developer.nvidia.com/video-encode-decode-gpu-support-matrix""
--------------------------

Sounds like maybe Hardware encoding is catching up to Software in quality...?

And thank you for all your input!
My point was that I had spent $500 for a Ryzen 3900X on Day One, plus more for DDR4 3600 memory to get the Infinity Fabric clock at max.
All to chase lower encode times without dropping quality below what I like.
Going solely by my subjective comparison of screen shots and playback with MPC-HC on my 65", I could have saved a lot of $$$ by getting a Gen 6 video card.
I think HW encoding has finally come of age. Nvidia's Gen 6 combo of the TU11x GPUs and the Turing Engine does the job for me.

My research also indicates that there is no advantage for video encoding in going to the very expensive "gaming" cards.
I'm using a Gigabyte 1650 OC 4GB card ($160). If I monitor with HWiNFO64 when doing HW encoding, the GPU engine, encoder & memory are all at less than 50% usage.
GPU temp never hits 40C and the 2 fans never turn on - this thing is loafing AND pumping out very acceptable quality in amazing times.
Since it is not a gaming card, it does not need its own power cable, doesn't add more heat to my case and is totally silent.

I'm an Nvidia fan, but it seems like there are also new AMD cards that will perform AMD's version of HW encoding at a similar level.
If I were buying today, I'd probably wait for one of the new Nvidia GeForce 1650 Super cards coming the first week of Nov.
I hear costs are $10-$20 more, and they upgrade to GDDR6, more CUDA cores and 60% more bandwidth.

BANDING:
Color banding IS annoying and is a result of encoding choices. Looking at the sky, a blank wall or any large colored area, YOU see millions of distinct shades. The whole idea of encoding is to reduce the number of bits and keep the file to a reasonable size. Scene variance means some frames must have more bits, some need less. The encoder and its various settings determine what is allocated where and how good a file results. If too few bits are allocated to a "sky" frame, it will be encoded with fewer individual shades, and banding or mottling will result.

When you set a target bitrate, target file size or CRF value, you limit what the encoder has to work with. That is why 2-pass encoding has always given better quality and less banding than 1-pass: the first pass checks every frame to identify the frames that need more bits and those that can do with less while keeping to the target. If a particular file has too much complexity, the remedy was to throw more bits at it via a higher overall bitrate. The Turing engine seems to do a good job deciding where the bits should go without a lot of "setting magic".
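If anyone wants to see that trade-off outside of VRD, here is roughly how the two approaches look when driving ffmpeg from a short Python script - just a sketch, the file names are mine and exact flags vary by build:
Code:
import os
import subprocess

SRC = "recording.ts"   # hypothetical input file

# 2-pass software x264 at an average bitrate: pass 1 writes the complexity log,
# pass 2 spends the bit budget according to that log.
x264_common = ["-c:v", "libx264", "-preset", "slower", "-b:v", "2300k"]
subprocess.run(["ffmpeg", "-y", "-i", SRC, *x264_common,
                "-pass", "1", "-an", "-f", "null", os.devnull], check=True)
subprocess.run(["ffmpeg", "-y", "-i", SRC, *x264_common,
                "-pass", "2", "-c:a", "copy", "out_x264_2pass.mp4"], check=True)

# 1-pass constant-quality NVEnc: no bitrate target, the encoder picks the bits frame by frame.
subprocess.run(["ffmpeg", "-y", "-i", SRC, "-c:v", "h264_nvenc", "-preset", "slow",
                "-rc", "vbr", "-cq", "19", "-b:v", "0",
                "-c:a", "copy", "out_nvenc_cq19.mp4"], check=True)
The first pair of calls is the classic bitrate-targeted 2-pass; the last call leaves the per-frame bit decisions to the NVEnc hardware at a fixed quality level.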

That is why I'm chuffed with the new VRD6 / NVEnc Turing combo. What I see from 1-pass CRF19 HW encoding is as good as what I'm getting from 2-pass 2300 kbps "Slower" SW encodes.
And I can complete 6-7 HW files in the time it takes to do 1 SW encode.

EDIT to your EDIT:
Right about the HEVC B-frames, but I haven't been able to get VRD6 to do NVEnc HEVC since the first "804 Stable" version. It did amazing things with HEVC in the previous betas, but it crashes immediately since then, as I said above.
The 1650 GPU does have 2 engines (GPU Compute_0 & GPU Compute_1), but software like VRD6 would have to be written to use both of them, which VRD6 does not do at this point - only Compute_0.

There are encoding filters that reduce banding, but every one I tried did some sort of blending and softened everything. Try playing with the settings and filters in your software player or TV settings - there might be something there that will help.
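As an example of the kind of filter I mean (not VRD - ffmpeg's deband filter, with made-up file names), something like this will reduce banding, and you can see the softening trade-off for yourself:
Code:
import subprocess

# Hypothetical debanding pass using ffmpeg's deband filter.
# It averages/dithers across the banded gradient - that is the softening trade-off mentioned above.
subprocess.run([
    "ffmpeg", "-y", "-i", "banded_input.mp4",
    "-vf", "deband=1thr=0.02:2thr=0.02:3thr=0.02:range=16",
    "-c:v", "libx264", "-preset", "slower", "-crf", "18",
    "-c:a", "copy",
    "debanded_output.mp4",
], check=True)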

The whole idea of Blu-ray is LOTS OF BITS!, so there's no banding when you watch from the disc. All bets are off when it is a file "encoded from Blu-ray"; that will depend on how much it was compressed by the encode.
Same with streaming - Amazon or otherwise - everything is compressed as much as they can get away with. Sometimes it seems they are only targeting 6" cellphones as display devices.
 

jmc

Active member
Well, I ran the crazy HEVC/Very Slow 4K 10 Mbps encode test on my
16-core/32-thread Threadripper, and WOW, it still ate it up.

0.35 fps instead of 0.13 fps - 2.7 times faster than the old 3930 6-core/12-thread 2012 CPU.

The first half used 50-60% of the 32 threads.
But the second half used 70, 80, 90, up to 99% of all 32 threads. It kept getting higher the longer it went.

Never did see 100% though, so I don't need to buy the new 3rd-gen 24-core/48-thread CPU due on Nov 25th.
The new 3950 16-core/32-thread CPU for the old AM4 socket is coming out then also.

Heh, maybe that would actually get up to a full 1fps!
 

Otter

Member
I'm getting ~300 fps on my rig with a Ryzen 3900X (12-core/24-thread) and Nvidia Volta GPU/NVEnc/HEVC/HD encoding.
Using settings of CRF18, 1-pass, Slower for the GPU encode.
Both the Ryzen and the Volta are loafing during encodes - I could easily do 2 at once.
Don't know how it compares to your CPU encode for image quality, since I didn't try 4K, but mine looks real good on a 65" HDTV and upscales well to a 4K TV.
Spent $500 on the Ryzen - less than half that on the video card.
I think I wasted my money on the Ryzen 3900X.
You might want to investigate switching to GPU encoding.
 

jmc

Active member
EDIT... Do you know if the "Slower" preset actually has any effect on NVEnc? I'm not seeing any... (still testing/learning).
I only notice bitrate and resolution having any effect.

GPU encoding... Yep, I'm working on that now.
I have a 4K/60, 50 Mbps HEVC .ts file I'm trying to get NVEnc and VRD6 BOTH to work with so I can compare.
I'm going to have to reprocess the file to something a good bit lower to get both VRD6 and NVEnc to handle it.

NVEnc seems to have no problem with an HEVC encode, but with x264 it runs into an "error 8" x264-won't-initialize problem.
VRD6 runs out of memory with the same file. (HEVC is not widely supported enough for me - not plug & play... yet.)
From what I've read about that error, NVEnc has a hardware Mbps limit built in, and you have to find it yourself.

I've got TMPGEnc 6 on the Threadripper, and it is the only other program I have that will handle HEVC.
Going to reprocess the 50 Mbps file down to 10 Mbps and see if both programs will accept the test file so I
can finally do some screen captures/compares.

Don't like the "one pass" limit on NVEnc, so CRF is what I'm pinning my hopes on.

Reading up on CRF encoding...
It's supposed to give good quality, but you won't know the file size you will get.
No idea what kind of size spans they are talking about. Will find out the hard way.
Just hope it is not too wild... I like my half-GB one-hour shows.
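(Quick math on the "half GB for one hour" target - a sketch assuming a 128 kbps audio track, which is just my guess:)
Code:
# What average bitrate does "half a GB for a one-hour show" actually imply?
# (The audio bitrate below is an assumption - adjust for your tracks.)
def video_budget_kbps(size_gb, duration_min, audio_kbps=128):
    total_kbps = size_gb * 1024**3 * 8 / (duration_min * 60) / 1000
    return total_kbps - audio_kbps

print(f"~{video_budget_kbps(0.5, 60):.0f} kbps left for video")   # roughly 1065 kbps
So the CRF output needs to land around 1 Mbps of video for files to stay in that range.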

Thanks,
jmc
 

Otter

Member
With Nvidia's HW encoding, you are offloading most decisions to the Nvidia API and the GPU engine circuitry. Nvidia has updated the API along with the new Volta GPU engine.
Most encoding functions are moved into the chip. This means a choice like "2-pass" has to have a call in the Nvidia API to trigger the function in the GPU. Software like VideoReDo has to be written to make that specific API call using the specific Volta syntax. Just because software like VideoReDo has an option for "2-pass" doesn't mean it works - it may just be left over from the software H264 options menu.

Slower preset: Tried various presets - they didn't make any difference in encoding speed/time, and I don't see any big differences in the output files. Still, it's there and doesn't slow things down, so why not use Slower or Slowest?

CRF encoding: Currently, Nvidia seems to be 1-pass only. There was a "2-pass" call in the older versions of the Nvidia API I've read, but I've been told the 2-pass option in the Volta GPU API has been "deprecated".
In 2-pass encoding, the first pass analyzes the entire file for complexity, creates a map as a temp file and uses that to allocate bits based on the needs of each frame while still hitting the target limits.
NVEnc works right in the GPU, so getting it to create a complexity map on a 1st pass, store it on a drive and then recall the info from the slow drive while the 2nd pass is running through the GPU at light speed wouldn't work. Likewise, building a large enough cache into the GPU to store such a map is also silly.
When I've done software encodes, I see an improvement in quality for "average bitrate" and "target size" 2-pass, but not much difference with 2-pass CRF.
With the NVEnc HW I always use CRF18 or CRF19 and get good files. Usually the problems cropped up when I tried CRF21 or worse.
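A toy illustration of why that first-pass map matters (this is NOT NVEnc or VRD internals, just made-up numbers to show the idea):
Code:
# Toy illustration of 2-pass bit allocation: pass 1 scores every frame's complexity,
# pass 2 splits a fixed bit budget in proportion to those scores.
frame_complexity = [1.0, 0.4, 0.4, 3.0, 2.2, 0.5]   # made-up "cost" scores from a first pass
target_avg_bits = 10_000                             # bits per frame if spread evenly

budget = target_avg_bits * len(frame_complexity)
total = sum(frame_complexity)
for i, c in enumerate(frame_complexity):
    print(f"frame {i}: {budget * c / total:,.0f} bits")   # easy frames get few bits, busy frames get many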

HEVC 4K 50Mb/s: Can't give any help there. I don't have any need to work with such high resolutions or bitrates. The highest I go is 1080p @ ~10-12 Mb/s raw recorded files.
Anything above that bitrate seems to trigger errors when recoding in VRD HW - still investigating what's up.

VRD errors: Yup... NVEnc and HEVC are new to VRD v6, along with a bunch of other features. Despite the Stable releases, VRD is still very much a work in progress. At least the developers are making progress stamping out the bugs.
 

jmc

Active member
Finally got my test file down to where both NVEnc and VRD6 can encode it (1080p @ 10 Mbps)...

VRD6 can handle the 50 Mbps but not 4K (1440p and 1080p are OK).

NVEnc x264 cannot handle 50 Mbps; I "think" HEVC had no problem. x264 via NVEnc is limited on upper bitrate and fails.
(I think it is error code 8.)

Here are some CRF results...
Bits_0 is "Bits/(Pixel*Frame)" from the MediaInfo program.

VRD6-X264---740x480-2Pass-Bits_0.126-MediaInfo-1.30Mbps-FROM_1080p30-6sec-x264@10MbpsCBR --1108KB
(this size seems to be sort of CRF16.5)

NVEnc-CRF18-X264-740x480, Bits_0.104-MediaInfo-1.07Mbps-FROM_1080p30-6sec-x264@10MbpsCBR----908KB
NVEnc-CRF17-X264-740x480, Bits_0.117-MediaInfo-1.21Mbps-FROM_1080p30-6sec-x264@10MbpsCBR--1013KB
NVEnc-CRF16-X264-740x480, Bits_0.150-MediaInfo-1.55Mbps-FROM_1080p30-6sec-x264@10MbpsCBR--1281KB
NVEnc-CRF15-X264-740x480, Bits_0.190-MediaInfo-1.97Mbps-FROM_1080p30-6sec-x264@10MbpsCBR--1609KB

A quick look... CRF18 has less-smooth shadows. More fine detail in the skin as you go up in bitrate.
No frame capture and zooming in yet.

I think I would most likely notice any splotchy shadows...hate those.
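(Side note: MediaInfo's Bits/(Pixel*Frame) is just bitrate divided by width x height x frame rate. Quick sketch below - and the listed Bits_0 values line up if I plug in 720x480 at 29.97 fps:)
Code:
# MediaInfo's Bits/(Pixel*Frame) is: bitrate / (width * height * frame_rate).
# Assumptions below (mine): 720x480 frame size and 29.97 fps output.
def bits_per_pixel_frame(bitrate_bps, width=720, height=480, fps=29.97):
    return bitrate_bps / (width * height * fps)

print(round(bits_per_pixel_frame(1_070_000), 3))   # CRF18 file, 1.07 Mbps -> ~0.103 (MediaInfo lists 0.104)
print(round(bits_per_pixel_frame(1_550_000), 3))   # CRF16 file, 1.55 Mbps -> ~0.150 (MediaInfo lists 0.150)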
 