Intel Edge AI Scholarship Foundation Course, OpenVINO Fundamentals

OpenVINO Fundamentals is a course from Intel that prepares students for the full Edge AI Nanodegree program on Udacity. I am glad that I was one of the students selected for this course. I have completed it, and here is a summary of what I learned.

There are five lessons in this course: Introduction to AI at the Edge, Leveraging Pre-Trained Models, The Model Optimizer, The Inference Engine, and Deploying an Edge App. There is also a Slack community where students can ask for help. The course started on December 16, 2019, and will end on March 3, 2020. Udacity and Intel will select 750 students for a full scholarship to the Edge AI Nanodegree program.

Minimum Requirements

At a minimum, you need to be able to program in Python to follow the course. Knowing C++ is a plus. You should also have a basic understanding of, or some experience with, machine learning, deep learning, and Linux.

Bingo Challenge

There was a challenge called the Bingo Challenge, which was both challenging and fun. It asked students to socialize with the community and to complete the lessons up to Lesson 2.

Lesson 1: Introduction to AI at the Edge

The edge means local processing, as opposed to processing in the cloud. An edge device is a local device located as close as possible to the data source. Because there is no need to send data to the cloud, edge applications can achieve low latency. Algorithms may be trained in the cloud, but they run at the edge. Examples of edge applications are self-driving cars, remote nature cameras, and heartbeat detectors.

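Here are a few reasons why the edge is important:

  1. Network communication can be expensive (bandwidth, power consumption) and is sometimes impossible, for example in remote locations.
  2. Real-time applications, such as self-driving cars, cannot tolerate the latency of a round trip to the cloud.
  3. Sensitive data, such as personal data, stays on the device instead of being sent to the cloud.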

Lesson 2: Leveraging Pre-Trained Models

For AI at the edge, you can use the OpenVINO (Open Visual Inferencing and Neural Network Optimization) Toolkit. It is an open-source library that is useful for edge deployment thanks to its performance optimizations and pre-trained models. Pre-trained models are models that have already been trained, often to high or even state-of-the-art accuracy. Using pre-trained models avoids the need for large-scale data collection and the cost of training.

There are three main types of computer vision models:

  1. Classification: Determines what an object in a given image is.
  2. Detection: Determines the location of an object using a marker such as a bounding box.
  3. Segmentation: Determines the location of an object in an image on a pixel-by-pixel basis.
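The three model types differ mostly in the shape of what they output. The sketch below shows how each output might be turned into an answer with NumPy. The shapes are assumptions for illustration: a 1-D score vector for classification, an SSD-style `[1, 1, N, 7]` array for detection (the layout used by many OpenVINO detection models, where each row is `[image_id, label, confidence, xmin, ymin, xmax, ymax]`), and a `[C, H, W]` score array for segmentation. Check the documentation of the specific model you use.

```python
import numpy as np

def classify(scores, labels):
    """Classification: return the label with the highest score."""
    return labels[int(np.argmax(scores))]

def detect(output, conf_threshold=0.5):
    """Detection: keep boxes from an assumed SSD-style [1, 1, N, 7] output.

    Each row is [image_id, label, confidence, xmin, ymin, xmax, ymax],
    with box coordinates normalized to [0, 1].
    """
    boxes = []
    for row in output[0, 0]:
        if row[0] < 0:  # an image_id of -1 marks the end of valid detections
            break
        if row[2] >= conf_threshold:
            boxes.append((int(row[1]), float(row[2]), tuple(float(v) for v in row[3:7])))
    return boxes

def segment(class_scores):
    """Segmentation: pick the most likely class for every pixel of a [C, H, W] array."""
    return np.argmax(class_scores, axis=0)
```

For example, `classify(np.array([0.1, 0.7, 0.2]), ["cat", "dog", "bird"])` returns `"dog"`, while `segment` returns an `[H, W]` map of class indices, one per pixel.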

The Intel Distribution of OpenVINO comes with a full list of pre-trained models. For example, for a traffic light optimization application, the best models are those that detect people, vehicles, and bikes. For an app that monitors your form while working out, the best model is a human pose estimation model.

Lesson 3: The Model Optimizer

The Model Optimizer converts models from multiple frameworks into an Intermediate Representation (IR) that the Inference Engine can use. It can also reduce model size and improve inference speed.

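The lesson covers three optimization techniques:

  1. Quantization: Reduces the precision of weights and biases (for example, from 32-bit floating point to a lower precision), which reduces compute time and model size at the cost of some accuracy.
  2. Freezing: In training, freezing means keeping some layers fixed while fine-tuning the others. For the Model Optimizer, freezing a TensorFlow model removes operations and metadata that are only needed for training, such as those related to backpropagation.
  3. Fusion: Combines certain operations into a single operation, which reduces computational overhead.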
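To make quantization concrete, here is a minimal sketch of symmetric linear quantization from FP32 to INT8 with NumPy. It is a generic textbook scheme chosen for illustration, not the procedure any OpenVINO tool actually uses. It shows the two effects described above: storage drops to a quarter, and each weight picks up a small rounding error of at most half a quantization step.

```python
import numpy as np

def quantize_int8(weights):
    """Map FP32 weights to INT8 using one scale factor for the whole tensor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximately recover the original FP32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)
error = float(np.max(np.abs(w - dequantize(q, scale))))

print(w.nbytes, q.nbytes)  # 16384 4096
print(f"max abs error: {error:.4f} (step size {scale:.4f})")
```

Using a single scale per tensor keeps the sketch short; finer-grained schemes (for example, one scale per output channel) usually lose less accuracy.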

The Model Optimizer supports models from the following frameworks: Caffe, TensorFlow, MXNet, ONNX, and Kaldi.

Intermediate Representations (IRs) are the OpenVINO Toolkit’s standard structure and naming for neural network architectures. For example, a Conv2D layer in TensorFlow, a Convolution layer in Caffe, and a Conv layer in ONNX are all converted into a Convolution layer in an IR. In other words, an IR is a model in which the layers of a supported deep learning framework are replaced with layers in the “dialect” of the Inference Engine.
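The renaming can be pictured as a lookup from each framework's layer type to one IR layer type. The table below is a hypothetical, hand-written illustration that covers only the three convolution examples above; the real Model Optimizer does much more than rename layers.

```python
# Hypothetical lookup table: (framework, framework layer type) -> IR layer type.
FRAMEWORK_TO_IR = {
    ("tensorflow", "Conv2D"): "Convolution",
    ("caffe", "Convolution"): "Convolution",
    ("onnx", "Conv"): "Convolution",
}

def to_ir_type(framework, layer_type):
    """Return the IR layer type for a framework-specific layer type."""
    try:
        return FRAMEWORK_TO_IR[(framework.lower(), layer_type)]
    except KeyError:
        raise ValueError(f"no IR mapping for {framework} layer {layer_type!r}") from None
```

For instance, `to_ir_type("TensorFlow", "Conv2D")` and `to_ir_type("ONNX", "Conv")` both return `"Convolution"`.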

The IR can be loaded directly into the Inference Engine and consists of two files output by the Model Optimizer: an XML file and a binary file. The XML file holds the model architecture and other important metadata, while the binary file holds the weights and biases in binary format.
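Because the architecture lives in plain XML, you can inspect an IR with Python's standard library. The sketch below assumes a simplified, hand-written IR fragment (real Model Optimizer output also contains ports, shapes, and offsets into the `.bin` file) and assumes the usual convention that the `.bin` file shares the `.xml` file's base name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def summarize_ir(xml_text):
    """Return (id, name, type) for every <layer> element in an IR .xml string."""
    root = ET.fromstring(xml_text)
    return [(layer.get("id"), layer.get("name"), layer.get("type"))
            for layer in root.iter("layer")]

def weights_path(xml_path):
    """Assume the weights file has the same base name with a .bin suffix."""
    return Path(xml_path).with_suffix(".bin")

# Simplified toy IR; not real Model Optimizer output.
SAMPLE_IR = """<?xml version="1.0"?>
<net name="toy">
  <layers>
    <layer id="0" name="data" type="Parameter"/>
    <layer id="1" name="conv1" type="Convolution"/>
    <layer id="2" name="out" type="Result"/>
  </layers>
  <edges>
    <edge from-layer="0" from-port="0" to-layer="1" to-port="0"/>
    <edge from-layer="1" from-port="1" to-layer="2" to-port="0"/>
  </edges>
</net>"""

for layer_id, name, layer_type in summarize_ir(SAMPLE_IR):
    print(layer_id, name, layer_type)
```

The `<edges>` section records how layers connect, while the numeric weights stay in the binary file, which keeps the XML small and readable.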

Lesson 4: The Inference Engine

The Inference Engine runs the actual inference on a model. It only works with Intermediate Representations, either those produced by the Model Optimizer or the Intel pre-trained models in OpenVINO that are already in IR format. It has a straightforward API that allows easy integration into edge applications. It is built in C++, which leads to faster overall operation, and it also provides a library of computer vision functions.
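Before a frame reaches the Inference Engine, it usually has to match the network's input layout. Many OpenVINO vision models expect an NCHW tensor (batch, channels, height, width), while images are typically stored as HWC. The sketch below assumes such a model and uses a simple NumPy nearest-neighbor resize in place of `cv2.resize`; the target height and width would normally be read from the loaded network.

```python
import numpy as np

def preprocess(frame, height, width):
    """Resize an HxWxC frame to height x width and reshape it to 1xCxHxW.

    Nearest-neighbor resizing in NumPy stands in for cv2.resize here.
    """
    h, w = frame.shape[:2]
    rows = np.arange(height) * h // height  # source row for each output row
    cols = np.arange(width) * w // width    # source column for each output column
    resized = frame[rows[:, None], cols]    # shape: (height, width, C)
    chw = resized.transpose((2, 0, 1))      # HWC -> CHW
    return chw[np.newaxis, ...]             # add the batch dimension -> NCHW

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a camera frame
print(preprocess(frame, 300, 300).shape)  # (1, 3, 300, 300)
```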

While the Model Optimizer reduces the size and complexity of models to improve memory usage and computation time, the Inference Engine provides hardware-based optimizations to get even further improvements from a model.

The Inference Engine supports a variety of Intel hardware: CPU (Central Processing Unit), GPU (Graphics Processing Unit), FPGA (Field-Programmable Gate Array), and VPU (Vision Processing Unit).

One example of a VPU is the Intel Neural Compute Stick. It is a small but powerful device that can be plugged into other hardware for the specific purpose of accelerating computer vision tasks.
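The Inference Engine selects hardware by device name, for example `"CPU"`, `"GPU"`, `"FPGA"`, or `"MYRIAD"` for the Neural Compute Stick, and its HETERO plugin accepts a priority list such as `"HETERO:FPGA,CPU"` so that layers unsupported on the first device fall back to the next. The helper below is a hypothetical sketch that builds such a string; in a real app the list of available devices would come from the Inference Engine rather than being passed in by hand.

```python
def choose_device(preferred, available):
    """Build an Inference Engine device string from a preference list.

    A single match is returned as-is; several matches become a HETERO
    string, so unsupported layers fall back to later devices in the list.
    """
    picks = [device for device in preferred if device in available]
    if not picks:
        raise RuntimeError("none of the preferred devices is available")
    if len(picks) == 1:
        return picks[0]
    return "HETERO:" + ",".join(picks)

print(choose_device(["MYRIAD", "CPU"], ["CPU", "GPU", "MYRIAD"]))  # HETERO:MYRIAD,CPU
print(choose_device(["FPGA", "CPU"], ["CPU", "GPU"]))              # CPU
```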

Lesson 5: Deploying an Edge App

OpenCV is an open-source library for various image processing and computer vision techniques that runs on a highly optimized C++ back-end, although it is available for use with Python and Java as well. It is often helpful as a part of edge applications.

MQTT (MQ Telemetry Transport) is a lightweight publish/subscribe messaging protocol designed for resource-constrained devices and low-bandwidth networks. It is widely used for Internet of Things (IoT) devices and other machine-to-machine communication, and it has been around since 1999.

In the publish/subscribe architecture, there is a broker, or hub, that receives messages published to it by different clients. The broker then routes the messages to any clients subscribing to those particular messages.
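The routing idea can be shown with a toy, in-memory broker. This is only an illustration of the publish/subscribe pattern, not an MQTT implementation: real MQTT adds network transport, topic wildcards, and quality-of-service levels, and a Python app would normally talk to a real broker through a client library such as paho-mqtt. The topic names below are hypothetical.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a publish/subscribe broker (not real MQTT)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a client callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        """Route the message only to clients subscribed to this topic."""
        for callback in self._subscribers.get(topic, []):
            callback(topic, payload)

received = []
broker = ToyBroker()
broker.subscribe("person", lambda topic, payload: received.append(payload))
broker.publish("person", '{"count": 2}')
broker.publish("duration", "ignored")  # no subscriber for this topic
print(received)  # ['{"count": 2}']
```

Publishers and subscribers never reference each other directly; they only agree on topic names, which is what makes the pattern convenient for many small devices.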

To stream live video, you may also need to set up a Linux server for it. You can use Flask with OpenCV to stream video to a web browser. You can also use a Node server to handle the data coming in from the MQTT and FFmpeg servers and render that content in a web page user interface.
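One common way to hand frames to FFmpeg is to write them as raw bytes to standard output and pipe that into `ffmpeg` reading raw video from stdin, for example with `-f rawvideo -pixel_format bgr24 -video_size 768x432 -i -`, where the size must match the frames. The sketch below assumes BGR `uint8` frames (OpenCV's default layout) and only covers the Python side; the FFmpeg output destination depends on your server setup.

```python
import sys

import numpy as np

def write_frames(frames, stream):
    """Write BGR uint8 frames to a binary stream as raw, back-to-back bytes."""
    for frame in frames:
        stream.write(np.ascontiguousarray(frame, dtype=np.uint8).tobytes())
        stream.flush()  # push each frame out promptly for live streaming

def stream_to_stdout(frames):
    """Send frames to stdout, e.g. `python app.py | ffmpeg -f rawvideo ... -i - ...`."""
    write_frames(frames, sys.stdout.buffer)
```

Each 768x432 BGR frame is 768 × 432 × 3 = 995,328 bytes, so FFmpeg relies on the `-video_size` and `-pixel_format` options to know where one frame ends and the next begins.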

Conclusion

The course has many coding exercises that I don't cover here. Udacity provides a workspace for them, which is helpful because I don't need to clutter my local environment.

It’s very important to consider user needs. Knowing their needs can inform the various trade-offs to make regarding model decisions, what information to send to servers, etc.

Speed, size, and network considerations remain very important for AI at the edge. Faster models can free up computation, which leads to lower power usage or allows the use of cheaper hardware. Smaller models can free up memory or allow the use of devices with less memory.

Visit Intel DevMesh to see more awesome projects that others have built, join existing projects, or even post some of your own.
