Discrete HTPC GPU Shootout
by Ganesh T S on June 12, 2011 10:30 PM EST
LAV CUVID can be benchmarked using GraphStudio's built-in benchmark to check video decoder performance. Unfortunately, GraphStudio can't use madVR in this process. Since our intent was to determine the performance of the GPU both with and without madVR enabled, it was essential that madVR be part of the benchmark. The developer of madVR, Mathias Rauen, created a special benchmarking build, which was used to generate the figures in this section.
The picture below shows the madVR benchmark build working in the decode-only mode on the GT 430 for a 1080i60 H264 clip.
LAV CUVID is doing the actual decoding (though that is not visible in the picture) and sending frames over to the madVR filter, but the filter just keeps track of the decode frame rate and doesn't render the frames. All the driver post processing steps are enabled. The interlaced clip being played back uses around 76% of the VPU. Decoding proceeds at 91 fps, well above the clip's 60 fps rate. The GPU load is 79%, because deinterlacing is being performed on the shaders. This shows there is some headroom available in the GPU for further post processing. Is there enough for madVR? The picture below shows the benchmark build working in the decode + post processing mode.
Note that the frame rate falls below the real-time requirement. At 52 fps, the renderer drops approximately 8 frames every second. The VPU load falls to 38% because the process is now limited by how fast the madVR processing steps can execute. GPU-Z shows that madVR has pushed the GPU load up to 97%, making it the bottleneck in the chain.
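As a sanity check on these numbers, here is a trivial sketch (our own arithmetic helper, not part of GraphStudio or the madVR benchmark build) that turns a measured throughput into an approximate dropped-frame figure:

```python
def realtime_shortfall(measured_fps: float, clip_fps: float) -> float:
    """Return the approximate number of frames dropped per second,
    i.e. how far the render chain falls short of real time."""
    return max(0.0, clip_fps - measured_fps)

# Readings from the GT 430 / 1080i60 H.264 run described above:
decode_only_fps = 91.0   # LAV CUVID + driver post processing, madVR not rendering
with_madvr_fps = 52.0    # LAV CUVID + madVR post processing

print(realtime_shortfall(decode_only_fps, 60.0))  # 0.0 -> plenty of headroom
print(realtime_shortfall(with_madvr_fps, 60.0))   # 8.0 -> ~8 dropped frames/s
```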
Another interesting aspect to note in the GPU-Z screenshots above is that madVR increases the load on the GPU's memory controller from 23% to 36%. This is to be expected, as madVR makes multiple passes over the frame and needs to move data back and forth between the shaders and the GPU's DRAM.
The extent of the drop in frame rate (and whether playback fails to meet real-time requirements) depends on the options enabled in the madVR settings. We ran the benchmarks with various madVR configurations and various codecs to get an idea of the performance of LAV CUVID, madVR and, of course, the GPUs.
Before moving on to the benchmarking results, we have some more notes about the upsampling algorithms in madVR. Human eyes are much less sensitive to chroma resolution than to luma resolution. This is the reason why chroma is stored at a lower resolution in 4:2:0 compression. Because of the low chroma resolution, chroma often looks blocky, with visible aliasing (especially when you have, say, red text on a black background). Usually, the best way to upsample chroma is to use a very soft interpolator to remove all the aliasing. However, that comes at the cost of chroma sharpness. A less soft chroma upsampling algorithm retains more sharpness but leaves more of the aliasing behind. Basically, one can't have one's cake and eat it too. So it is a matter of taste whether one prefers removal of aliasing or a sharper picture.
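To put the 4:2:0 discussion in concrete terms, here is a small NumPy sketch (our own illustration, not madVR code) showing that each chroma plane carries only a quarter of the luma samples, and how the crudest possible upsampling produces exactly the kind of blocky chroma described above:

```python
import numpy as np

# A 1080p frame in 4:2:0: full-resolution luma, quarter-resolution chroma planes.
height, width = 1080, 1920
luma = np.zeros((height, width), dtype=np.uint8)
chroma_u = np.zeros((height // 2, width // 2), dtype=np.uint8)  # 540 x 960
chroma_v = np.zeros((height // 2, width // 2), dtype=np.uint8)

print(luma.size / (chroma_u.size + chroma_v.size))  # 2.0 -> both chroma planes together hold half the data of luma

def upsample_chroma_nearest(plane: np.ndarray) -> np.ndarray:
    """Crudest possible chroma upsampling: replicate each sample into a 2x2 block.
    Sharp, but this kind of replication is what produces blocky, aliased chroma
    edges; softer interpolators blur across the blocks to hide them."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)

full_res_u = upsample_chroma_nearest(chroma_u)
print(full_res_u.shape)  # (1080, 1920)
```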
The default luma algorithm used by madVR is Lanczos. The default chroma algorithm is SoftCubic 100 (which is very soft). Setting chroma upsampling to Lanczos or Spline is not recommended, as they are very sharp and the performance cost is too big to be worth the gain for chroma. SoftCubic, Bicubic or Mitchell-Netravali are suggested for chroma upsampling, as they are all 2-tap and need fewer GPU resources. In any case, it is hard to spot differences between the various chroma algorithms in most real-life images.
For luma upsampling, the situation is very different. Most people prefer sharp results, and the luma algorithm has a much bigger impact on overall image quality than the chroma upsampling algorithm. For luma upscaling, some users prefer the nicely sharp Lanczos 4 or Spline 4. Some prefer SoftCubic 50 because it does a better job of hiding source artifacts. Others prefer Mitchell-Netravali or Bicubic as a more all-round solution. There is no hard recommendation here.
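Since the whole sharp-versus-soft discussion comes down to the interpolation kernel, here is a sketch of the textbook Lanczos kernel (the generic formula, not madVR's actual shader implementation, and the window size in the example is just an assumption for illustration):

```python
import math

def lanczos(x: float, a: int) -> float:
    """Textbook Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, else 0.
    A larger 'a' means a wider window, a sharper result, and more GPU work."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# Weights for an output sample that falls halfway between two source pixels,
# using a narrow window (a = 2) purely as an example:
offsets = [-1.5, -0.5, 0.5, 1.5]
weights = [lanczos(o, a=2) for o in offsets]
print([round(w, 3) for w in weights])
# The outer weights are negative: that is where the extra sharpness (and the
# potential ringing) comes from, unlike an always-positive soft cubic kernel.
```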
The madVR settings used for benchmarking were classified broadly into three categories:
- Low Quality: Bilinear luma and chroma scaling
- Medium Quality: Bicubic (sharpness 50) luma scaling and Bilinear chroma scaling
- High Quality: Lanczos (4-tap) luma scaling and SoftCubic (softness 70) chroma scaling
Scaling is one of the core functions in madVR, but it is not needed if the display resolution matches that of the video. For the 1080p and 1080i videos presented below, there is no luma scaling, though chroma still needs to be upsampled. The 'trade quality for performance' madVR options didn't seem to improve performance much, and all of them were kept unchecked for benchmarking.
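As a quick illustration of when madVR's scalers actually have work to do, the sketch below (our own logic, not anything madVR exposes) checks a clip against the display: luma scaling is only needed when the resolutions differ, while 4:2:0 chroma always needs upsampling.

```python
def scaling_work(video_res: tuple[int, int], display_res: tuple[int, int],
                 chroma_subsampled: bool = True) -> dict[str, bool]:
    """Work out which scaling passes a clip requires on a given display.
    With 4:2:0 sources, chroma upsampling is always needed, even at native
    resolution; luma scaling only kicks in on a resolution mismatch."""
    return {
        "luma_scaling": video_res != display_res,
        "chroma_upsampling": chroma_subsampled,
    }

display = (1920, 1080)
print(scaling_work((1920, 1080), display))  # only chroma upsampling
print(scaling_work((1280, 720),  display))  # luma scaling + chroma upsampling
```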
In the graphs below, 'Full VPP' refers to all the video post processing options as set in the NVIDIA Control Panel. The other entries refer to the madVR settings described above. The top row in each graph indicates the performance of the LAV CUVID decoder. When compared with the benchmarks of the DXVA2 decoders (presented in an earlier section), we see that the LAV CUVID decoder has almost no performance penalty.
In the graphs below, we try to identify what causes the throughput to fall below 60 fps. First, let us take a look at the 1080p H.264 clip.
In the above graph, we see that the lack of shaders in the GT 520 affects the madVR performance. The madVR steps become the bottleneck in this case. On the GT 430, the VPU remains the bottleneck until the more complicated scaling algorithms (of mostly theoretical interest, and not presented in the graph above) are enabled.
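These bottleneck calls, and the GPU-Z readings discussed earlier, boil down to a simple rule of thumb: whichever unit sits near full utilization while the frame rate drops is the limiter. A rough sketch of that reasoning (the 90% threshold is our own guess, not anything GPU-Z or the drivers report):

```python
def likely_bottleneck(vpu_load: float, shader_load: float,
                      saturated: float = 0.90) -> str:
    """Guess which unit limits throughput from utilization fractions (0-1).
    This mirrors the reasoning in the text: 97% shader load with the VPU
    idling at 38% points at the madVR/shader side, and vice versa."""
    if shader_load >= saturated and vpu_load < saturated:
        return "shaders (madVR / deinterlacing)"
    if vpu_load >= saturated and shader_load < saturated:
        return "VPU (fixed-function decoder)"
    return "unclear - both or neither unit is saturated"

print(likely_bottleneck(vpu_load=0.38, shader_load=0.97))  # the GT 430 + madVR run above
```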
We see the same trends continue for MPEG-2 and VC-1. Now, we move on to get a first glimpse at the extent of hardware acceleration available for MPEG-4 streams.
As expected, we get decent hardware acceleration for MPEG-4 and the post processing impact is the same as that for the other codecs.
Interlaced streams don't seem to alter the trend. The maximum decode frame rates are slightly lower in the high-stress cases due to the overhead from deinterlacing. The GT 430's efficiency is now limited by shader power rather than the VPU.
How do things change when we try to upscale the non-1080p content onto a 1080p display? This is probably where madVR's algorithms are needed most. To test this out, we put some non-1080i/p H.264 clips through the same benchmark.
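For a sense of how much work the scalers face here, a quick back-of-the-envelope calculation (ours; it only uses the 480-line and 720-line source heights that appear in the results below) of the upscale ratios to a 1080-line display:

```python
# Upscaling ratios madVR has to cover when targeting a 1080-line display
# (vertical axis only, as a rough illustration):
for source_lines in (480, 720):
    print(f"{source_lines} lines -> 1080p: {1080 / source_lines:.2f}x per axis")
# 480 -> 2.25x, 720 -> 1.50x: the larger the ratio, the more output pixels the
# scaler has to synthesize, and the more the choice of algorithm matters.
```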
An interesting result in the above benchmark is that, with madVR disabled, the 480i H.264 stream is processed faster on the GT 430 than on the GT 520. It is quite obvious here that deinterlacing on the GT 520's shaders becomes the bottleneck once the VPU hits 300 fps.
In all of the above non-1080i/p benchmarks, the lack of shaders in the GT 520 really hurts it. At 720p60, the High Quality frame rate is very close to 60 fps, so that setting can't be recommended. The GT 430 holds up pretty decently in all the cases.
The takeaway from this section is that the GT 520 is not entirely suitable for madVR processing if you deal with a lot of SD material. The GT 430 is quite suitable for madVR processing as long as you keep the settings sane.
madVR is still an advanced HTPC user's tool. However, it should gain further traction with support for integrated hardware decoding and other driver-supplied post processing options. We have covered a solution for NVIDIA GPU-based HTPCs in this section. Let us see how this plays out for the AMD and Intel GPU platforms in the future.