Google’s EdgeTPU benchmarked: >10x speedup vs Intel’s Movidius

Frederik Bode
ML6team
Mar 13, 2019

The first benchmark of Google’s EdgeTPU Dev Board is in. Here a comparison is made against Intel’s (first generation) Movidius Neural Compute Stick, and Google is the clear winner regarding inference time:

Table 1: Comparison for Inception v1

Benchmark explained

The columns in Table 1 represent the different setups tested. There are two parents: a laptop (a Dell XPS 13 with an 8th-gen Intel i7) and the Dev Board. Each interfaces with a child ‘Neural Compute Unit’, either Google’s EdgeTPU or Intel’s Movidius Neural Compute Stick (v1). Another option as a parent is the Raspberry Pi (for which the results are a work in progress). For more info about the EdgeTPU, take a look at https://blog.ml6.eu/googles-edge-tpu-what-how-why-945b32413cde.

* The Edge TPU is also available as a separate device, in which case it also uses a USB connection.

In Table 1, total time is the time to load an image from disk, pass it to the Neural Compute Unit (EdgeTPU or Movidius), and get a prediction back. The inference time covers only the inference part, i.e. the time that the single line of code responsible for the inference (marked ‘Inference call’ below) takes to return:

# EdgeTPU code to classify an image
img = Image.open(image_name) # Read from disk
result = engine.ClassifyWithImage(img, top_k=3) # Inference call

# Movidius code to classify an image
img = read_image(image_filenames[i]) # Read and rescale
graph.queue_inference_with_fifo_elem(fifoIn, fifoOut, img, id) # Queue
output, userobj = fifoOut.read_elem() # Inference call
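To make the total-vs-inference split concrete, here is a minimal sketch of how such a timing harness could look (the load_fn/infer_fn placeholders and timer placement are my illustration, not the original benchmark script):

```python
import time

def benchmark(image_names, load_fn, infer_fn):
    """Average total (load + infer) and inference-only time per image."""
    totals, infers = [], []
    for name in image_names:
        t0 = time.perf_counter()
        img = load_fn(name)       # Read image from disk
        t1 = time.perf_counter()
        _ = infer_fn(img)         # Inference call
        t2 = time.perf_counter()
        totals.append(t2 - t0)    # Total time for this image
        infers.append(t2 - t1)    # Inference-only time
    # Average over all images, as in Table 1
    return sum(totals) / len(totals), sum(infers) / len(infers)

# Dummy load/infer functions just to show usage
avg_total, avg_infer = benchmark(
    ["a.jpg", "b.jpg"],
    load_fn=lambda name: "img",
    infer_fn=lambda img: "pred",
)
```

By construction, the average total time always includes the average inference time, so the gap between the two columns is the per-image I/O and preprocessing overhead.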

The speed test was done on 1000 images from CIFAR-10 (rescaled to (224, 224, 3)), with the averages shown in Table 1. Variance-wise, only the first image inferred took (unsurprisingly) significantly longer than the rest, with the EdgeTPU taking 0.06 seconds and the Movidius 0.1 seconds longer than their average (total) time. The model used for the Movidius Stick is the one included in the SDK, and the one used for the EdgeTPU is the one specified at https://coral.withgoogle.com/models/ .
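CIFAR-10 images are only (32, 32, 3), so they must be upscaled to the (224, 224, 3) input that Inception v1 expects. A minimal nearest-neighbour sketch of that rescaling (the article doesn’t say which resizing method was actually used, so this is just one illustrative option):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) image array."""
    h, w, _ = img.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]

cifar_img = np.zeros((32, 32, 3), dtype=np.uint8)  # dummy CIFAR-10-sized image
resized = resize_nearest(cifar_img, 224, 224)
print(resized.shape)  # (224, 224, 3)
```

In practice a library call (e.g. PIL’s Image.resize) would do the same job; the point is only that the rescaling happens on the parent before the image is handed to the Neural Compute Unit.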

Note that even though the Movidius Stick was used in combination with a laptop, the speedup generated by the EdgeTPU Dev Board is ~13x.

To give you an impression of what this means, see how fast the bottom terminal, which uses the EdgeTPU, processes its 100 images compared to the top one.

Plug-and-play?

Usability-wise, both the EdgeTPU and the Movidius Stick had their issues. To use the Movidius Stick, you need to download and install the SDK, which (officially) requires either Ubuntu 16.04 or Raspbian Stretch. Definitely install the SDK into a virtual environment (by specifying it in the ‘ncsdk.conf’ file), which should spare you a lot of headaches. The EdgeTPU, on the other hand, requires you to complete the Quickstart at https://coral.withgoogle.com/tutorials/devboard/ , which is quite experimental. For example, the following command needs to be preceded by sudo, which the tutorial does not mention; without it I got a cryptic error:

screen /dev/ttyUSB0 115200 # -> sudo screen /dev/ttyUSB0 115200

Also, for the EdgeTPU, make sure your cables are up to the task (none are included with the Dev Board): use an OTG cable for data transfer, and use a powerful power supply, as the EdgeTPU can draw 2–3 A.

Conclusion

In conclusion, the EdgeTPU far outperforms the first-generation Movidius Compute Stick, and confidently awaits its next contender, Intel’s Movidius Neural Compute Stick 2. What other parents (besides the Raspberry Pi) should be used for benchmarking? What other children? Leave a comment!
