Thursday, July 13, 2017

Boy Howdy!

Hello World! (obligatory I know)
I've finally gotten around to creating a blog where I can vent my frustration about all the horrors of day-to-day stupidity, or things I find enjoyable or interesting. That's about as much introduction as you're gonna get, though. Oh, and I won't bother using child-friendly language in the heat of my rants, so watch out for that if it bothers you.

Rant #1: Intel
I'll jump right into this one: Intel is throwing shit at AMD as damage control.

So it surfaced that Intel has been trying to spell out the disadvantages of choosing the new AMD EPYC lineup of processors over their tried-and-true Xeons. The TL;DR of what they were trying to say is "EPYC sucks epically pls don't abandon our server market".

Time to dig into this full force.

Intel published some slides during their recent Press Workshop (event?). First of all, some of what the slides say isn't technically wrong, for reasons I will discuss, but the issues are painted as unsolvable and as dealbreakers nobody should ever even consider.

The first slide is a generalization of the differences, laid out as a three-column checkbox list. Fair enough, let's see what they say, shall we?

OK, I'll break down what these things mean for anyone only here for a good rant: proven performance is a good thing, datacenter-focused servers are good for datacenters, and a robust hardware ecosystem is very important for mission-critical apps. So yes, Intel checks all these boxes.

The first problem here is the counterarguments themselves against Naples (the codename for the shiny new AMD server processors): "proven performance" and "innovation" don't counter "inconsistent" performance, especially when the inconsistency is blamed on a multi-CPU approach Intel themselves can't seem to remember offering (I'm talking about 8-socket motherboards). I'll dig deeper into this in a bit.

"Poor track record" and "Inconsistent Supplier" is something Intel shouldn't even be allowed to mention just based on their own horrid mess of problems they've had in the past and still. If we go o the desktop side Intel seems to be using as a hammer to AMD we can see reports of the 7700K overheating and Intel thusly responging users shouldn't overclock their unlocked processors. I wouldn't really care about this normally because overclocking voids warranties anyways, but because a lot of regular users are hit with thermal issues we can't overlook this fluke and inconsistency. Not to mention many of these problems could simply be solved by Intel gimping their CPU's by using economically stupid amounts of thermal compounds between the CPU die and the Integrated Heat Spreader.

"Lack of Ecosystem" is the only real argument one could impose on AMD from the current list, so I don't disagree, for sensitive tasks sticking with something you already know can save you millions in the long run for companies.

Back to that previous point about EPYC being four jerry-rigged desktop processors stuck together with technological Elmer's glue: bullshit.

Intel on multiple occasions tried to frame the Zeppelin die (that's the 8-core building block) as "desktop". Not only is this very unproductive, it's in essence hypocritical considering Intel's own business practices. The Zeppelin die is simply a server-grade CPU tuned to be scalable through a generic interface (called Infinity Fabric); this is almost exactly what Intel does, minus the generic interconnect. If you own a HEDT (and in some cases a normal desktop) processor from Intel, chances are you are using a server-grade die. What Intel does to save costs is develop one circuit and reuse that single design as much as viably possible, for everything from server parts with low core counts and ginormous amounts of cache down to the dinkiest i3s that can barely manage web browsing. The very fact that Intel chose to badmouth AMD's approach shows how far they're willing to go to keep their practical monopoly on certain workloads (which AMD isn't even directing potential users toward in the first place).

So what to do? AMD decided "hey, let's NOT make an immensely, incredibly expensive CPU architecture and just make the best thing we can with the funding we have!", arguably a great choice for some people. Intel tried their best to badmouth AMD here as well, but I'll make you wait a bit on what Intel got wrong when calculating the disadvantage of the Zeppelin die. First, let's look at Intel's grand analysis of what EPYC supposedly can't do.

The entire thing is littered with HPC (High Performance Computing) treated as the only valuable thing a datacenter processor should be able to do. This is telling: if that's the only big negative they could find, Intel could be in for a ride. The blatant disregard for workloads such as compiling or big renders (things commonly done on raw-power processors) only amplifies the horrid situation Intel placed themselves in by simplifying the Xeon line, confusing all potential buyers with the gold/platinum grading system, and forcing higher costs on anyone needing one feature that used to exist as its own niche product at a more compelling price.

Back to my promise of discussing how Intel seems to badmouth Zeppelin.

Intel, as any company would, showed off new technology they developed that supersedes their previous efforts, saving money by making something more usable. What they've done is create a "mesh" of CPU cores that can quite quickly send messages to neighbouring cores, using a coherent generic interconnect to handle the fuss between DRAM, L3 cache, PCIe lanes, etc. This should sound familiar: it's a lot like something I mentioned earlier, namely AMD's Infinity Fabric. They work differently and similarly at the same time; I'll explain further.

Infinity Fabric is a VERY generic interface; for simplicity's sake, it works the same way as DDR: basically, send a pulse and receive a pulse. The way this is structured on the Zeppelin die, a value in RAM takes a defined number of "hops" before it can land in an L3 cache slice. Because Naples uses multiple physical dies, there's an additional hop before data can arrive. For most applications this could be a huge disadvantage, but because applications that need more than 8 threads need coherency anyway, it usually isn't a huge problem. In the future AMD might make a bigger server chip with 16 cores on a single die; that could still be small enough to give good yields, and it would make the real worst case something that's already a problem today: datasets bigger than the effective "biggest" L3 slice of 8 MB, yet small enough that DDR4 bandwidth isn't the limiting factor.
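To make that "awkward zone" concrete, here's a back-of-envelope sketch. The 8 MB L3 slice figure is from above; the 1 GB cutoff is purely my own assumption for where raw DRAM bandwidth starts to dominate anyway:

```python
# Rough sketch of the "awkward zone" described above. The 8 MB L3 slice
# comes from the post; the 1 GB cutoff is a made-up guess, not a spec.

L3_SLICE_MB = 8      # effective L3 visible to one CCX
DDR_BOUND_MB = 1024  # assumption: beyond this, you're bandwidth-bound regardless

def working_set_zone(mb):
    """Classify a working set by which memory behaviour dominates."""
    if mb <= L3_SLICE_MB:
        return "fits a local L3 slice: cross-die hops rarely matter"
    if mb < DDR_BOUND_MB:
        return "awkward zone: too big for L3, small enough that fabric hops show"
    return "DRAM-bound: brute-force channel bandwidth wins, hops wash out"

for mb in (4, 64, 4096):
    print(f"{mb:>5} MB -> {working_set_zone(mb)}")
```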

If we throw regular database crunching with datasets in the tens of gigabytes at the extra DDR channels on Naples, it will start to show good results again, thanks to the brute-force bandwidth that large datasets usually need.
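To put a number on "extra DDR channels", a quick peak-bandwidth calculation (theoretical maxima, assuming DDR4-2666 on both platforms; real-world numbers land lower):

```python
# Peak theoretical DRAM bandwidth per socket: Naples has 8 DDR4 channels,
# Xeon Scalable has 6. Assumes DDR4-2666 on both sides.

def peak_gb_per_s(channels, mt_per_s=2666e6, bytes_per_transfer=8):
    """channels * transfers/s * bytes/transfer, in GB/s."""
    return channels * mt_per_s * bytes_per_transfer / 1e9

print(f"Naples (8 channels):        {peak_gb_per_s(8):.0f} GB/s")  # ~171
print(f"Xeon Scalable (6 channels): {peak_gb_per_s(6):.0f} GB/s")  # ~128
```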

Meanwhile, on Intel's side, they've also taken a mesh/fabric-like approach to the most cores they can handle. What they forgot to mention is that their own worst case (a core wanting to talk to a core on the other side of the CPU) isn't ideal either, even as they used exactly that as a big argument against "gluing things together". The nitty-gritty comes when L3 slices have to know where to snoop for data that's needed quickly: it's a battle between burning a couple of DDR cycles and either waiting for snooped data to arrive or manually finding where the data lives, and both approaches have their own sets of advantages and disadvantages.
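A hedged sketch of why the mesh has its own worst case: on an XY-routed 2D mesh, the longest trip is corner to corner. The grid size and per-hop cost below are illustrative guesses, not Intel's published figures:

```python
# Worst-case hop count on a 2D mesh interconnect. Grid dimensions and
# per-hop latency here are assumptions for illustration only.

def worst_case_hops(cols, rows):
    """Corner-to-corner: (cols - 1) horizontal + (rows - 1) vertical hops."""
    return (cols - 1) + (rows - 1)

HOP_NS = 1.5  # assumed per-hop latency, purely to show the shape of the math

hops = worst_case_hops(6, 6)  # a 6x6 grid, roughly big-die scale
print(f"worst case: {hops} hops, ~{hops * HOP_NS:.1f} ns on top of the L3 lookup")
```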

Where Intel's argument falls flat on its face is when they start comparing latencies between the architectures, using the most extreme data mangling to arrive at a 73% improvement. A favourable placement for data in a Zeppelin L3 is on the same CCX; this gives Zeppelin a latency advantage of around 30%. Not bad at all. When data ends up in the "wrong" L3 cache of that same Zeppelin die, it has to shuffle over Infinity Fabric, which in the worst case makes it about as slow as accessing DDR memory (this is what Intel uses for its "we're 73% better" line). They didn't comment on Zeppelin-to-Zeppelin interconnect speeds, but some rough estimates make me think a really unholy interconnect shuffle could cost up towards 200 ns. That's a real potential problem, but it needs testing; for all we know it could just as well be around 100 ns, like a RAM access or another CCX-to-CCX hop.
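To show how both headline numbers can be "true" at once, here's a toy calculation. The latencies are invented solely to reproduce the 30% and 73% figures from above; they are not measurements:

```python
# Made-up latencies chosen only to reproduce the percentages in the text,
# demonstrating how endpoint cherry-picking works.

same_ccx_l3   = 19   # ns, Zeppelin best case: hit in the local CCX's L3
cross_ccx_l3  = 100  # ns, Zeppelin worst case on-die: L3 behind Infinity Fabric
intel_mesh_l3 = 27   # ns, a middling mesh L3 hit

# "73% better": compare your typical case against the rival's worst case.
print(f"mesh vs cross-CCX: {(1 - intel_mesh_l3 / cross_ccx_l3) * 100:.0f}% 'better'")
# "30% advantage": compare the rival's best case against your typical case.
print(f"same-CCX vs mesh:  {(1 - same_ccx_l3 / intel_mesh_l3) * 100:.0f}% better for Zeppelin")
```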

Then there's the part where they compare Xeon SMT to desktop Ryzen SMT. It's no secret that Intel offers better SMT in practice, but that's partly because they've had six years to tweak and improve an already solidly defined platform, all while blaming AMD for not producing anything new in those same six years (another argument they used).

The last argument is the one genuinely big argument Intel has: there's no ecosystem. But even here Intel manages to find something to bash AMD for that most people have already retracted: Ryzen being bad at gaming. The long story short is this: six years of Intel optimization made Intel-optimized games shit on AMD, but because people actually care (unlike what Intel would like you to believe), game developers have been very keen on finding and fixing performance issues across the board for Ryzen. The biggest examples, with improvements of up to 20 and in some cases 30%, turned out to be general fixes that benefited Intel as well. We're already in July, and they're quoting something said five months ago; had they quoted someone saying it last month, AMD would have a problem, but the fact is the biggest issues have been fixed, and Intel has earned my just-made-up "hardon for being arseholes" badge.

I'll edit this and add more images/references when I wake up; hell, I might even fix any spelling mistakes if I find them.