Tesla has unveiled the latest version of its Dojo supercomputer, and it’s apparently so powerful that it tripped the power grid in Palo Alto.
Dojo is Tesla’s own custom supercomputer platform, built from the ground up for AI machine learning and, more specifically, for video training using the video data coming from its fleet of vehicles.
The automaker already has a large NVIDIA GPU-based supercomputer that is one of the most powerful in the world, but the new Dojo custom-built computer is using chips and an entire infrastructure designed by Tesla.
The custom-built supercomputer is expected to elevate Tesla’s capacity to train neural nets using video data, which is critical to its computer vision technology powering its self-driving effort.
Last year, at Tesla’s AI Day, the company unveiled its Dojo supercomputer, but the company was still ramping up its effort at the time. It only had its first chip and training tiles, and it was still working on building a full Dojo cabinet and cluster or “Exapod.”
Last night, at AI Day 2022, Tesla showed the progress the Dojo program has made over the last year.
The company confirmed that it has gone from a chip and a tile to a system tray and a full cabinet.
Tesla claims that a single Dojo tile can replace 6 GPU boxes while costing less than one GPU box. There are 6 of those tiles per tray.
Tesla says that a single tray is the equivalent of “3 to 4 fully-loaded supercomputer racks”.
The company is integrating its host interface directly on the system tray to create one complete host assembly.
Tesla can fit two of these system trays with host assembly into a single Dojo cabinet.
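To get a rough sense of scale, the ratios Tesla quoted can be chained together. The snippet below is only a back-of-the-envelope sketch: it takes Tesla’s claimed figures at face value (6 GPU boxes per tile, 6 tiles per tray, 2 trays per cabinet), and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic using the figures Tesla quoted at AI Day 2022.
# These are Tesla's claims, not independent measurements; names are illustrative.

GPU_BOXES_PER_TILE = 6   # "replace 6 GPU boxes with a single Dojo tile"
TILES_PER_TRAY = 6       # "6 of those tiles per tray"
TRAYS_PER_CABINET = 2    # two system trays fit into a single Dojo cabinet


def gpu_box_equivalent(tiles: int) -> int:
    """Return how many GPU boxes Tesla claims the given number of tiles replaces."""
    return tiles * GPU_BOXES_PER_TILE


tiles_per_cabinet = TILES_PER_TRAY * TRAYS_PER_CABINET

print("GPU boxes per tray:   ", gpu_box_equivalent(TILES_PER_TRAY))
print("Tiles per cabinet:    ", tiles_per_cabinet)
print("GPU boxes per cabinet:", gpu_box_equivalent(tiles_per_cabinet))
```

By Tesla’s own numbers, that works out to 36 GPU boxes per tray and 72 per cabinet, assuming the per-tile claim holds across an entire cabinet.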
Here’s what the Dojo cabinet looks like closed and opened:
That’s pretty much where Tesla is right now as the automaker is still developing and testing the infrastructure needed to put a few cabinets together to create the first “Dojo Exapod”.
Bill Chang, Tesla’s Principal System Engineer for Dojo, said during the presentation:
“We knew that we had to reexamine every aspect of the data center infrastructure in order to support our unprecedented cooling and power density.”
Tesla had to develop its own high-powered cooling and power systems to run the Dojo cabinets.
Chang said that Tesla tripped its local electric grid’s substation when testing the infrastructure earlier this year:
“Earlier this year, we started load testing our power and cooling infrastructure and we were able to push it over 2 MW before we tripped our substation and got a call from the city.”
Here’s what the Tesla Dojo Exapod looks like opened and closed:
Tesla released the main specs of a Dojo Exapod: 1.1 EFLOP, 1.3 TB SRAM, and 13 TB high-bandwidth DRAM.
The company used the event to try to recruit more talent, but it also shared that it is on schedule to have its first full cluster, or Exapod, in Q1 2023.
It is currently planning to have 7 Dojo Exapods in Palo Alto.
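For a rough idea of what seven Exapods would add up to, the published per-Exapod specs can simply be multiplied out. This is a naive sketch that assumes every Exapod matches the stated figures; the dictionary keys and the `scale_specs` helper are hypothetical names, and real-world usable performance would not necessarily scale linearly.

```python
# Naive aggregate of Tesla's stated per-Exapod specs across the planned
# Palo Alto installation. Assumes all 7 Exapods match the published figures.

EXAPOD_SPECS = {
    "compute_eflop": 1.1,  # 1.1 EFLOP
    "sram_tb": 1.3,        # 1.3 TB SRAM
    "dram_tb": 13.0,       # 13 TB high-bandwidth DRAM
}
PLANNED_EXAPODS = 7


def scale_specs(specs: dict[str, float], count: int) -> dict[str, float]:
    """Multiply each per-Exapod figure by the number of Exapods."""
    return {name: value * count for name, value in specs.items()}


total = scale_specs(EXAPOD_SPECS, PLANNED_EXAPODS)
for name, value in total.items():
    print(f"{name}: {value:.1f}")
```

On paper, that comes to roughly 7.7 EFLOP, 9.1 TB of SRAM, and 91 TB of high-bandwidth DRAM.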
Why does Tesla need the Dojo supercomputer?
It’s a fair question. Why is an automaker developing the world’s most powerful supercomputer? Well, Tesla would tell you that it’s not just an automaker, but a technology company developing products to accelerate the transition to a sustainable economy.
But more specifically, Tesla needs Dojo to auto-label training videos from its fleet and train its neural nets to build its self-driving system.
Tesla realized that its approach to developing a self-driving system, which relies on neural nets trained on millions of videos coming from its customer fleet, requires a lot of computing power, and it decided to develop its own supercomputer to deliver that power.
That’s the short-term goal, but Tesla will have plenty of use for the supercomputer going forward as it has big ambitions to develop other artificial intelligence programs.