Part of the story behind the Xeon Scalable platform, built upon server-level Skylake processing cores with AVX-512 and a new mesh topology, was that the CPU was designed to be partnered with additional silicon in the same package. Available right out of the gate were versions bundled with Intel’s OmniPath controller, allowing for networking fabric connections. There has always been an expectation that Intel would launch a Xeon Scalable processor with an integrated Intel Altera FPGA on the same package, and now that expectation has become reality. Intel is now shipping its Xeon Gold 6138P processor with a built-in Altera Arria 10 GX 1150 FPGA.

Back at Supercomputing 2016, Intel demonstrated what was supposed to be a Broadwell-based Xeon system with an FPGA built into the same package; however, no real details were given and the chip itself was not on display. This year, at Mobile World Congress (of all places), Intel had a demonstration system showing a Xeon Scalable processor with an FPGA built into the same package, but again the chip was not on display, only a system that supposedly had the chip inside. I was not allowed to use my screwdriver to open the system up. The Intel attendant next to the system explained that the platform would help accelerate Edge Computing for data used by 5G networks, although questions about the finer details, such as how many SKUs there would be and the size of the FPGA, were met with a refusal to answer. As a result, I didn’t post anything at MWC; I could not confirm anything that was being said and Intel was not prepared to say any more.

Lisa Spellman showing Intel Xeon + FPGA during Intel's Presentation at the Fujitsu Forum, Tokyo
Source: PC-Watch

Fast forward a couple of months, and PC-Watch is reporting that Intel has announced via its itpeernetwork hub (rather than its traditional PR outreach) the mass production of the Xeon Gold 6138P with an integrated Arria 10 GX 1150 FPGA, with some select customers already being sampled. The announcement states that Fujitsu is one of the Intel partners planning a system around this processor.

Intel Xeon Gold: Adding an FPGA

| AnandTech | Xeon Gold 6138 | Xeon Gold 6138P with Arria 10 FPGA |
|---|---|---|
| Socket | Socket P, LGA 3647 | Socket P, LGA 3647 |
| Cores / Threads | 20 / 40 | 20 / 40 ? |
| Base Frequency | 2000 MHz | 2000 MHz ? |
| Turbo Frequency | 3700 MHz | 3700 MHz ? |
| PCIe Lanes | 48 | 32 |
| DRAM | Six Channels | Six Channels |
| On-Package FPGA | - | Arria 10 GX 1150 |
| Logic Elements | - | 1150K (1.15M) |
| Embedded Memory | - | 53 Mb |
| UPI Links | Three | Two |
| TDP | 125 W | 125 W CPU + 60 - 70 W FPGA = 195 W Total ? |
| Price | $2612 | Arm, Leg |

Intel is connecting the Xeon processor to the FPGA with 160 Gbps of bandwidth per socket (the announcement doesn’t state whether this is bi-directional) using a cache coherent interconnect. Based on what we know about how the Intel OmniPath Fabric connects in package to a Xeon, this connection likely implements a different protocol over the PCIe x16 interface reserved for in-package components, but also takes advantage of Intel’s Ultra-Path Interconnect (UPI) for cache coherency and access to data across the platform. If one UPI link from the processor is in use for the FPGA, this may limit Xeon+FPGA setups to dual socket at best; however, Intel did not provide briefings on the new parts to confirm this. We can confirm from an old Intel slide that the platform should be using a High Speed Serial Interface (HSSI) for connectivity. That slide also states that the new processors have different power specifications to standard Skylake-SP sockets, and as such the Xeon Gold 6138P is unlikely to be a drop-in processor for current systems.
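To put the 160 Gbps figure in context, here is a rough back-of-the-envelope sketch comparing it against one direction of a PCIe 3.0 x16 link. Only the 160 Gbps number comes from Intel's announcement. The PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding) are the standard published values, and the function and variable names are purely illustrative. This says nothing definitive about how Intel actually reaches its number, which is not disclosed.

```python
# Back-of-the-envelope link arithmetic. Only the 160 Gbps figure is from
# Intel's announcement; everything else is standard PCIe 3.0 math.
PCIE3_GT_PER_LANE = 8.0     # PCIe 3.0 raw signalling rate per lane, in GT/s
PCIE3_ENCODING = 128 / 130  # 128b/130b line-encoding efficiency


def gbps_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def pcie3_gbps(lanes: int) -> float:
    """Usable bandwidth of a PCIe 3.0 link in one direction, in Gbps."""
    return PCIE3_GT_PER_LANE * PCIE3_ENCODING * lanes


quoted_gbps = 160.0         # per-socket CPU-to-FPGA figure quoted by Intel
x16_gbps = pcie3_gbps(16)   # one direction of an x16 link

print(f"Quoted CPU-FPGA link: {quoted_gbps:.0f} Gbps = {gbps_to_gbytes(quoted_gbps):.1f} GB/s")
print(f"PCIe 3.0 x16 (one way): {x16_gbps:.1f} Gbps = {gbps_to_gbytes(x16_gbps):.2f} GB/s")
print(f"Difference: {quoted_gbps - x16_gbps:.1f} Gbps")
```

Under these assumptions, the quoted figure is higher than what a single x16 PCIe 3.0 link can carry in one direction, which fits either a bi-directional total or a second link (such as UPI) contributing to the number.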

For this launch, Intel has built a virtual switching reference design, which uses the FPGA for infrastructure dataplane switching with virtual machines on the CPU implementing direct compute on the dataplane. Intel states that its reference design offers 3.2x better throughput and half the latency compared to a CPU-only solution when running the Open vSwitch (OVS) framework. This was measured through its DPDK forwarding performance. Intel also stated that an OVS system with additional performance monitoring was on display at the Fujitsu Forum in Tokyo this week.

The system under test was a 2P server using two of the new ‘Intel Xeon Gold 6138P with Integrated Arria 10 GX 1150 FPGA’ processors, 12x16 GB of DDR4-2666 (one DIMM per channel), and a 100G Alaska network card from Marvell. Amusingly, it says the system also had a PCIe 3.0 x10 slot alongside a PCIe 3.0 x8 slot; x10 is not one of the usual PCIe link widths.
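As a quick sanity check on that memory configuration, the sketch below works through the arithmetic. The DIMM count, DIMM size, socket count, channel count and DDR4-2666 speed come from the test description and the table above. The 8 bytes per transfer assumes a standard 64-bit DDR4 channel, the nominal 2666 MT/s rate is used as-is, and the resulting bandwidth is a theoretical peak, not something Intel quoted.

```python
# Memory configuration arithmetic for the 2P test system described above.
DIMMS = 12                 # 12x16 GB, from the test description
DIMM_GB = 16
SOCKETS = 2                # 2P server
CHANNELS_PER_SOCKET = 6    # six memory channels per Xeon Gold 6138P
DDR4_MT_PER_S = 2666       # nominal DDR4-2666 transfer rate
BYTES_PER_TRANSFER = 8     # assumption: standard 64-bit data bus per channel

total_gb = DIMMS * DIMM_GB
dimms_per_channel = DIMMS / (SOCKETS * CHANNELS_PER_SOCKET)

# Theoretical peak bandwidth, in GB/s (1 GB = 10^9 bytes)
per_channel_gbs = DDR4_MT_PER_S * BYTES_PER_TRANSFER / 1000
per_socket_gbs = per_channel_gbs * CHANNELS_PER_SOCKET

print(f"Total memory: {total_gb} GB")
print(f"DIMMs per channel: {dimms_per_channel:.0f}")
print(f"Peak bandwidth per channel: {per_channel_gbs:.2f} GB/s")
print(f"Peak bandwidth per socket: {per_socket_gbs:.1f} GB/s")
```

Twelve DIMMs across two six-channel sockets does work out to one DIMM per channel, matching the description.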

Also in the announcement was a mention of Intel’s desire to offer a discrete FPGA solution with a faster, high-bandwidth coherent connection, although details of this interconnect were not provided (it could be UPI through a physical discrete add-in card slot?). These discrete FPGA solutions will support migration of code developed on the Xeon+FPGA system in this announcement, as well as on Altera’s Arria 10 GX acceleration cards.

One of Intel's current Arria 10 GX 1150 Programmable Acceleration Cards

Wider availability of the Xeon Gold 6138P with Arria 10 is not yet known. Interested parties are expected to get in contact with their Intel representative or OEM partner.

Source: Intel's ITPeerNetwork, PC-Watch (main image)

  • GreenReaper - Sunday, May 20, 2018

    Better yet, it has a whole new standalone processing unit with built-in cache-coherency protocols with the CPU just waiting for a juicy new compromise!
  • Tiggun - Saturday, May 19, 2018

    I'm curious about which workloads favor FPGA acceleration over GPU acceleration. Deep learning comes to mind, but I don't know. The FPGA is 14nm, but I wonder if it's the same process the Xeon is using. I'd really like to see under the IHS. Does it look like that Intel+Vega underneath?
