AMD Processors and SSE4 Instruction Set: SSE4.1 vs. SSE4a
With the Deneb/Heka core, AMD has added the SSE4 instruction set, so we would have expected to see the SSE4 mode enabled in VirtualDub instead of the processors being forced to fall back to SSE2. However, even the latest version of DivX does NOT enable SSE4 in the CODEC configuration panel unless the CPU it is running on is an Intel processor. To understand why, we need to take a look at the SSE4 instruction set:
SSE4 consists of a total of 54 instructions, 47 of which were introduced with the Penryn design as SSE4.1. The Nehalem design added another 7 instructions, referred to as SSE4.2, to complete the full spectrum of SSE4. AMD's implementation, referred to as SSE4a, currently consists of only four instructions (EXTRQ, INSERTQ, MOVNTSD and MOVNTSS), none of which is part of Intel's SSE4.1 or SSE4.2. At the same time, AMD added two bit-manipulation instructions (LZCNT and POPCNT) and lifted the 16-byte alignment requirement that SSE load-operation instructions formerly had. Of all these, only POPCNT is also available on Intel's Nehalem, so the AMD and Intel subsets are for practical purposes mutually exclusive. In other words, SSE4 is not necessarily SSE4.
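The distinction between the two flavors is visible to software through CPUID, which may help explain why a codec that looks for Intel's SSE4.1 never lights up on an AMD part. The following is a minimal sketch in C, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`. The names `sse4_features`, `decode_sse4`, `detect_sse4` and `pick_codec_path` are hypothetical and exist only for this illustration. The selection rule in `pick_codec_path` is an assumption about how an encoder might choose its code path, not DivX's actual logic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* The three feature flags relevant to the SSE4 discussion. */
typedef struct {
    bool sse4_1;  /* Intel SSE4.1 (Penryn)  */
    bool sse4_2;  /* Intel SSE4.2 (Nehalem) */
    bool sse4a;   /* AMD SSE4a              */
} sse4_features;

/* Decode the flags from raw register values:
 * std_ecx is ECX of CPUID leaf 1, ext_ecx is ECX of leaf 0x80000001. */
sse4_features decode_sse4(uint32_t std_ecx, uint32_t ext_ecx)
{
    sse4_features f;
    f.sse4_1 = (std_ecx >> 19) & 1u;  /* leaf 1, ECX bit 19 */
    f.sse4_2 = (std_ecx >> 20) & 1u;  /* leaf 1, ECX bit 20 */
    f.sse4a  = (ext_ecx >> 6) & 1u;   /* leaf 0x80000001, ECX bit 6 */
    return f;
}

/* Query the running CPU. Reports no features where CPUID is unavailable. */
sse4_features detect_sse4(void)
{
    uint32_t std_ecx = 0, ext_ecx = 0;
#if HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1u, &eax, &ebx, &ecx, &edx))
        std_ecx = ecx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        ext_ecx = ecx;
#endif
    return decode_sse4(std_ecx, ext_ecx);
}

/* Assumed policy for illustration: only SSE4.1 unlocks the "SSE4" path,
 * so a CPU that offers SSE4a alone ends up on the SSE2 path. */
const char *pick_codec_path(sse4_features f)
{
    return f.sse4_1 ? "SSE4" : "SSE2";
}

/* Print what the running CPU reports and which path would be chosen. */
void print_sse4_report(void)
{
    sse4_features f = detect_sse4();
    printf("SSE4.1=%d SSE4.2=%d SSE4a=%d -> %s path\n",
           f.sse4_1, f.sse4_2, f.sse4a, pick_codec_path(f));
}
```

Calling `print_sse4_report()` from a `main` function prints the flags for the machine at hand. Under the assumed policy, a processor that sets only the SSE4a bit is sent down the SSE2 path, which matches the behavior described above.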
On a side note, we have never actually been able to see a quality difference between standard encoding and SSE4 full search. Please keep in mind that the difference may be academic when looking at the two different sets of benchmark results. The bottom line for this benchmark is that, while we are showing it, everybody looking at the results needs to be aware that DivX does not support SSE4a at this moment. We are still waiting for the next CODEC update, which may implement AMD-oriented optimizations as well.