
Meta develops its own MTIA v2 AI accelerator with up to 708 TFLOPS





It is no secret that artificial intelligence is the big trend of the moment, spreading into every kind of service. As a result, demand for dedicated hardware to run the different phases of AI workloads has grown exponentially, and the main beneficiaries have been NVIDIA and AMD with their accelerator cards.

However, several companies, such as Google, are already investing resources and effort in designing their own chips to reduce their dependence on third parties. Meta is one of the technology companies betting most heavily on artificial intelligence, and it has just announced its next-generation MTIA (Meta Training and Inference Accelerator).


Manufactured on TSMC's 5 nm process, it is the evolution of Meta's first in-house designs and can reach 708 TFLOPS in INT8 GEMM (general matrix multiplication) operations with sparsity.

It does all this with a TDP of 90 W and an operating frequency of 1.35 GHz. Each MTIA v2 chip is built around an 8×8 grid of processing elements (PEs) and delivers 3.5 times the dense compute performance of the first-generation MTIA. Memory bandwidth also improves, while the LPDDR5 memory capacity doubles.
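
As a rough illustration of what the 8×8 grid implies, the short Python sketch below divides the chip's quoted dense INT8 GEMM peak (354 TFLOPS, half of the 708 TFLOPS sparse figure) by the number of PEs and the 1.35 GHz clock. It assumes, without confirmation from Meta, that the peak is spread evenly across all 64 PEs and that every PE does useful work on every cycle; the constant and function names are purely illustrative.

```python
# Back-of-envelope check of MTIA v2 GEMM throughput from the article's figures.
# Assumption (not stated by Meta): the dense INT8 peak is spread evenly over
# all PEs, and each PE does useful work on every clock cycle.

PE_GRID = (8, 8)            # 8x8 grid of processing elements
CLOCK_HZ = 1.35e9           # 1.35 GHz operating frequency
DENSE_INT8_OPS = 354e12     # 354 TFLOPS dense INT8 GEMM
SPARSE_INT8_OPS = 708e12    # 708 TFLOPS INT8 GEMM with sparsity


def ops_per_pe_per_cycle(peak_ops: float, grid: tuple[int, int], clock_hz: float) -> float:
    """Operations each PE must complete per clock cycle to reach peak_ops."""
    num_pes = grid[0] * grid[1]
    return peak_ops / (num_pes * clock_hz)


if __name__ == "__main__":
    dense = ops_per_pe_per_cycle(DENSE_INT8_OPS, PE_GRID, CLOCK_HZ)
    print(f"PEs: {PE_GRID[0] * PE_GRID[1]}")
    print(f"Dense INT8 ops per PE per cycle: {dense:.1f}")
    # Hypothetical convention: one multiply-accumulate counted as two operations
    print(f"Implied INT8 MACs per PE per cycle: {dense / 2:.1f}")
    print(f"Sparse/dense ratio: {SPARSE_INT8_OPS / DENSE_INT8_OPS:.1f}x")
```

Under these assumptions each PE would need to complete roughly 4,097 INT8 operations per cycle, very close to 4,096, or about 2,048 multiply-accumulates if one MAC counts as two operations. This is only a consistency check on the published numbers, not a description of the PE microarchitecture.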


For comparison, the first-generation MTIA was manufactured on a 7 nm process, ran at 800 MHz and delivered 102.4 TFLOPS of INT8 GEMM performance. That said, its TDP was only 25 W.
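
To put the generational jump in perspective, this Python sketch recomputes the ratios from the figures quoted in the article. It assumes the first generation's 102.4 TFLOPS is a dense INT8 number, directly comparable with MTIA v2's 354 TFLOPS dense INT8 figure, and it uses TDP as a stand-in for power draw, which real workloads may not match. The `Chip` class is a hypothetical helper for illustration only.

```python
# Generation-over-generation comparison using only figures quoted in the article.
# Caveats: peak numbers are vendor figures, and TDP is not measured power draw.
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    node_nm: int
    clock_ghz: float
    gemm_tops: float  # peak INT8 GEMM throughput in TFLOPS
    tdp_w: float

    @property
    def tops_per_watt(self) -> float:
        """Peak throughput divided by TDP (a rough efficiency proxy)."""
        return self.gemm_tops / self.tdp_w


MTIA_V1 = Chip("MTIA v1", 7, 0.80, 102.4, 25)           # assumed dense INT8
MTIA_V2_DENSE = Chip("MTIA v2 (dense)", 5, 1.35, 354, 90)
MTIA_V2_SPARSE = Chip("MTIA v2 (sparse)", 5, 1.35, 708, 90)

if __name__ == "__main__":
    for chip in (MTIA_V2_DENSE, MTIA_V2_SPARSE):
        speedup = chip.gemm_tops / MTIA_V1.gemm_tops
        print(f"{chip.name}: {speedup:.1f}x compute, "
              f"{chip.tops_per_watt:.2f} TOPS/W vs {MTIA_V1.tops_per_watt:.2f} TOPS/W")
    print(f"Clock: {MTIA_V2_DENSE.clock_ghz / MTIA_V1.clock_ghz:.2f}x, "
          f"TDP: {MTIA_V2_DENSE.tdp_w / MTIA_V1.tdp_w:.1f}x")
```

With those inputs the dense ratio comes out at about 3.5x, in line with the improvement cited above, while dense performance per watt stays roughly flat (about 3.9 versus 4.1 TOPS/W). The efficiency gain only shows up when sparsity can be exploited (about 7.9 TOPS/W).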


Technical specifications of Meta's second-generation MTIA chip

  • Fabrication process: TSMC 5 nm
  • Frequency: 1.35 GHz
  • Instances: 2.35B gates, 103M flip-flops
  • Size: 25.6 mm x 16.4 mm, 421 mm²
  • Packaging: 50 mm x 40 mm
  • Voltage: 0.85 V
  • TDP: 90 W
  • Host connectivity: PCIe Gen5 x8 (32 GB/s)
  • GEMM TOPS:

    • 708 TFLOPS (INT8, with sparsity)
    • 354 TFLOPS (INT8)
    • 354 TFLOPS (FP16/BF16, with sparsity)
    • 177 TFLOPS (FP16/BF16)

  • SIMD TOPS:

    • Vector core:

      • 11.06 TFLOPS (INT8)
      • 5.53 TFLOPS (FP16/BF16)
      • 2.76 TFLOPS (FP32)

    • SIMD:

      • 5.53 TFLOPS (INT8/FP16/BF16)
      • 2.76 TFLOPS (FP32)

  • Memory capacity:

    • Local: 384 KB per PE
    • On-chip: 256 MB
    • External LPDDR5: 128 GB

  • Memory bandwidth:

    • Local: 1 TB/s per PE
    • On-chip: 2.7 TB/s
    • External LPDDR5: 204.8 GB/s
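
The spec list also lets us derive a few aggregate figures. The Python sketch below totals the PE-local memory across the 8×8 grid and computes a simple roofline ridge point: how many operations per byte a kernel must perform before the dense INT8 GEMM peak, rather than memory bandwidth, becomes the limit. It assumes per-PE bandwidths simply add up, uses binary units for capacity and decimal units for bandwidth, and ignores caching, sparsity and data reuse between memory levels; all names are illustrative.

```python
# Derived memory figures for MTIA v2, computed from the published spec list.
# Assumptions (not from Meta): per-PE local bandwidth adds up linearly across
# all 64 PEs; capacity uses 1 MB = 1024 KB, bandwidth uses 1 TB/s = 1000 GB/s.

NUM_PES = 8 * 8
LOCAL_KB_PER_PE = 384
LOCAL_BW_TBPS_PER_PE = 1.0
ONCHIP_BW_TBPS = 2.7        # integrated (on-chip) memory bandwidth
LPDDR5_BW_GBPS = 204.8      # external LPDDR5 bandwidth
DENSE_INT8_TOPS = 354.0     # dense INT8 GEMM peak


def total_local_mb() -> float:
    """Sum of PE-local memory across the whole grid, in MB."""
    return NUM_PES * LOCAL_KB_PER_PE / 1024


def ridge_point(peak_tops: float, bandwidth_gbps: float) -> float:
    """Roofline ridge point: ops per byte a kernel needs to be compute-bound
    rather than limited by the given memory bandwidth."""
    return (peak_tops * 1e12) / (bandwidth_gbps * 1e9)


if __name__ == "__main__":
    print(f"Total PE-local memory: {total_local_mb():.0f} MB")
    print(f"Aggregate local bandwidth: {NUM_PES * LOCAL_BW_TBPS_PER_PE:.0f} TB/s")
    levels = (("integrated", ONCHIP_BW_TBPS * 1000), ("LPDDR5", LPDDR5_BW_GBPS))
    for name, bw_gbps in levels:
        print(f"Ridge point vs {name}: {ridge_point(DENSE_INT8_TOPS, bw_gbps):.0f} ops/byte")
```

Under these assumptions about 24 MB of memory sits right next to the PEs, and a kernel streaming its data from LPDDR5 would need on the order of 1,700 operations per byte to stay compute-bound, compared with roughly 130 when the data comes from the 256 MB integrated memory.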


Article Editor: Antonio Delgado


Computer engineer by training, and editor and hardware analyst at Geeknetic since 2011. I love taking apart everything that passes through my hands, especially the latest hardware we receive for review. In my free time I tinker with 3D printers, drones and other gadgets. If you need anything, here I am.
