Course Introduction

SOPHON Academy provides an environment for self-paced learning and development experiments. You can select course content in the fields that interest you, and SOPHON Academy also offers customized solutions. We encourage you or your team to use personal time flexibly for online learning of digital technologies, gaining valuable development experience and SOPHON training certificates. SOPHON Academy offers free videos, documents, and code, so you can watch courses multiple times and repeat experiments as often as needed.

Course Overview

The courses offered at SOPHON Academy Online include: BM16 Development Board Series, CV18 Development Board Series, Computer Vision, Large Language Models, AI Compiler, and Professional Skills Certification. The development board courses primarily cover deployment and usage of boards such as the Milk-V Duo, Shaolin Pi, Huashan Pi, and SE5. The Computer Vision course covers both the theoretical and practical aspects of multimedia programming, including hands-on segments from the development board courses. The AI Compiler course provides a comprehensive overview of TPU-MLIR, covering theoretical knowledge, environment setup, and programming interfaces. The certification course equips you with the knowledge required of IT operations engineers, so you can choose a suitable course based on your needs.


All courses

BM16 Series Development Board
CV18 Series Development Board
Computer Vision
Large Language Model
Vocational Skills Certification
Compiler Development

As a bridge between frameworks and hardware, a deep learning compiler achieves the goal of writing code once and reusing it across various computing processors. SOPHGO has open-sourced its self-developed TPU compiler tool, TPU-MLIR (Multi-Level Intermediate Representation). TPU-MLIR is an open-source TPU compiler for deep learning processors. The project provides a complete toolchain that converts neural networks pre-trained in various frameworks into a binary bmodel file that runs efficiently on the TPU, achieving more efficient inference. This course is driven by hands-on practice, leading you to intuitively understand, practice, and master the TPU compiler framework for intelligent deep learning processors.

At present, the TPU-MLIR project has been applied to BM1684X, the latest-generation deep learning processor developed by SOPHGO. Combined with the processor's high-performance ARM core and the corresponding SDK, it enables rapid deployment of deep learning algorithms. The course will cover the basic syntax of MLIR and the implementation details of various optimizations in the compiler, such as graph optimization, INT8 quantization, operator splitting, and address allocation.

TPU-MLIR has several advantages over other compilation tools:

1. Simple and convenient

By reading the development manual and the samples included in the project, users can understand the model conversion process and principles and quickly get started. Moreover, because TPU-MLIR is built on MLIR, the current mainstream compiler infrastructure, users can also learn how MLIR is applied through it. The project provides a complete toolchain, so users can complete model conversion directly through the existing interfaces without adapting to each network individually.

2. General-purpose

At present, TPU-MLIR supports the TFLite and ONNX formats; models in these two formats can be converted directly into a bmodel usable by the TPU. What if a model is in neither format? ONNX provides conversion tools that can convert models written in today's major deep learning frameworks into ONNX format, after which the conversion to bmodel can proceed.

3. Precision and efficiency coexist

Accuracy is sometimes lost during model conversion. TPU-MLIR supports INT8 symmetric and asymmetric quantization, which greatly improves performance while, combined with SOPHGO's Calibration and Tune technology, preserving high model accuracy. In addition, TPU-MLIR applies extensive graph optimization and operator-splitting techniques to ensure that models run efficiently.
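To make the two quantization modes concrete, here is an illustrative sketch (not TPU-MLIR's actual implementation): symmetric quantization maps values to signed INT8 with a single scale, while asymmetric quantization adds a zero point so the full range of the data is used.

```python
def quantize_symmetric(values):
    """Symmetric INT8: map [-max|x|, max|x|] onto [-127, 127] with one scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def quantize_asymmetric(values):
    """Asymmetric UINT8: map [min(x), max(x)] onto [0, 255] with scale + zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Recover approximate real values from quantized integers."""
    return [(v - zero_point) * scale for v in q]

if __name__ == "__main__":
    x = [-1.0, 0.5, 2.0]
    print(quantize_symmetric(x))
    print(quantize_asymmetric(x))
```

Calibration, in this picture, is the process of choosing good ranges (min/max or max|x|) from sample data so that rounding error is minimized.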

4. Ultimate cost-performance, building the next generation of deep learning compilers

To support GPU computation, each operator in a neural network model needs a GPU implementation; to adapt to the TPU, each operator likewise needs a TPU version. In addition, some scenarios require adapting to different models of the same computing processor, and performing this compilation manually each time is very time-consuming. The deep learning compiler is designed to solve these problems. TPU-MLIR's suite of automatic optimization tools saves a great deal of manual optimization time, so that models developed on RISC-V can be ported smoothly to the TPU for the best price-performance ratio.

5. Complete information

The course includes Chinese and English video lectures, documentation, and code scripts: rich video materials, detailed application guidance, and clear code scripts. TPU-MLIR is built on the shoulders of the MLIR giant, and all code of the project is now open source and freely available to all users.

Code Download Link:

TPU-MLIR Development Reference Manual:

The Overall Design Ideas Paper:

Video Tutorials:

Course catalog


No.   Course Name                        Category
1.1   Deep Learning Compiler Basics      TPU-MLIR Basics
1.3   MLIR Basic Structure               TPU-MLIR Basics
1.4   Op Definition in MLIR              TPU-MLIR Basics
1.5   Introduction to TPU-MLIR (1)       TPU-MLIR Basics
1.6   Introduction to TPU-MLIR (2)       TPU-MLIR Basics
1.7   Introduction to TPU-MLIR (3)       TPU-MLIR Basics
1.8   Quantization Overview              TPU-MLIR Basics
1.9   Quantization Derivation            TPU-MLIR Basics
1.10  Quantization Calibration           TPU-MLIR Basics
1.11  Quantization-Aware Training (1)    TPU-MLIR Basics
1.12  Quantization-Aware Training (2)    TPU-MLIR Basics
2.1   Pattern Rewriting                  TPU-MLIR in Practice
2.2   Dialect Conversion                 TPU-MLIR in Practice
2.3   Frontend Conversion                TPU-MLIR in Practice
2.4   Lowering in TPU-MLIR               TPU-MLIR in Practice
2.5   Adding a New Operator              TPU-MLIR in Practice
2.8   TPU Principles (1)                 TPU-MLIR in Practice
2.9   TPU Principles (2)                 TPU-MLIR in Practice
2.10  Backend Operator Implementation    TPU-MLIR in Practice
2.11  TPU Layer Optimization             TPU-MLIR in Practice
2.12  bmodel Generation                  TPU-MLIR in Practice
2.13  To ONNX Format                     TPU-MLIR in Practice
2.14  Add a New Operator                 TPU-MLIR in Practice
2.15  TPU-MLIR Model Adaptation          TPU-MLIR in Practice
2.16  Fuse Preprocess                    TPU-MLIR in Practice
2.17  Accuracy Verification              TPU-MLIR in Practice

Each course provides video, documentation, and code materials.
Advanced | duration 3.7 hours
SE5 Development Series Course

Deep neural network models can be trained and tested quickly, then deployed by industry to perform real-world tasks effectively. Deploying such systems on small, low-power deep learning edge computing platforms is highly favored by industry. This course takes a practice-driven approach, leading you to intuitively learn, practice, and master the knowledge and technology of deep neural networks.

The SOPHON deep learning microserver SE5 is a high-performance, low-power edge computing product equipped with BM1684, the third-generation TPU processor developed independently by SOPHGO. With INT8 computing power of up to 17.6 TOPS, it supports 32-channel Full HD video hardware decoding and 2-channel encoding. This course will quickly guide you through the powerful features of the SE5 server, helping you understand the basics of deep learning and master its basic applications.

Course Features

1. One-stop service 

All common problems encountered in SE5 applications can be found here.

 • Provides a full-stack solution for deep learning microservers

 • Breaks down the development process step by step, in detail and clearly

 • Supports all mainstream frameworks, with easy-to-use products

2. Systematic teaching 

It covers everything from environment setup, application development, and model conversion to product deployment, and includes an image-based hands-on environment.

• How is the environment built? 

• How is the model compiled? 

• How is the application developed? 

• How are scenarios deployed?

3. Complete materials

The course includes video tutorials, document guides, code scripts, and other comprehensive materials. 

• Rich video materials 

• Detailed application guidance 

• Clear code scripts 

Code download link:

4. Free cloud development resources 

Apply online for free use of the SE5-16 microserver cloud testing space

• SE5-16 microserver cloud testing space can be used for online development and testing, supporting user data retention and export 

• SE5-16 microserver cloud testing space has the same resource performance as the physical machine environment 

Cloud platform application link:

Cloud platform usage instructions:


Elementary | duration 5.7 hours
Shaolin Pi Development Board Practical Course

This course introduces the hardware circuit design and peripheral resource usage of the Shaolin Pi, and provides tutorials on using the deep learning hardware acceleration interfaces along with some basic deep learning examples.

"Shaolin Pi" is a development platform based on BM1684 with about 20 TOPS computing power. It has good hardware scalability based on the Mini-PCIe interface, a rich ecosystem, and various connectable peripherals.

  • Scalability: The Mini-PCIe of the "Shaolin Pi" core board can be converted into various interfaces such as WiFi, 4G, Bluetooth, GPIO, M2 interface, USB, RJ45, SATA, SFP, HDMI, and CAN.
  • Diverse connectable peripherals: The "Shaolin Pi" core board can be expanded with various devices such as portable screens, keyboards, mice, cameras, headphones, and VR. Users can DIY a full-scenario Linux workstation on the "Shaolin Pi" and practice various deep learning experiments to their heart's content.

Course features:

  1. The content materials are rich and complete, including development board hardware design, peripheral interface instructions, development board upgrade process, and sample code scripts.

  2. The learning path is scientifically reasonable, starting from the introduction and basic usage of the development board, deepening the understanding of the development details through the learning of the internal system architecture and code, and finally leading to practical projects to fully utilize the development board and provide reference for users' own development.

  3. The practical projects are rich, and the course provides many examples of practical code usage and function demonstrations. Different functions can be implemented by simply modifying and combining the code.

Code download link:

Note: The model conversion part can refer to the SE5 development series courses.

Elementary | duration 1.6 hours
RISC-V+TPU Development Board Practical Course

This course introduces the hardware circuit design and peripheral resource operation methods of the CV1812H development board from the "Huashan Pi" series. It also provides tutorials on using Deep learning hardware acceleration interfaces and some basic Deep learning examples.

Huashan Pi (the CV1812H development board) is an open-source ecosystem development board jointly launched by SOPHGO and its ecosystem partners. It provides an open-source development environment based on RISC-V and implements functions for vision and deep learning scenarios. The processor integrates the second-generation self-developed deep learning tensor processor (TPU), a self-developed intelligent image processing engine (Smart ISP), a hardware-level high-security data protection architecture (Security), a speech processing engine, and H.264/265 intelligent encoding and decoding technology. It also comes with a matching multimedia software platform and IVE hardware acceleration interface, making deep learning deployment and execution more efficient, fast, and convenient. Mainstream deep learning frameworks such as Caffe, PyTorch, ONNX, MXNet, and TensorFlow (Lite) can be easily ported to the platform.

Course Features

1. Rich and complete content materials, including hardware design of the development board, SDK usage documents, platform development guides, and sample code scripts.

2. Scientific and reasonable learning path. The course introduces the development board and basic routines, and then delves into the internal system architecture and code learning to understand the development details. Finally, practical projects are introduced to fully utilize the development board, which can also serve as a reference for users to develop on their own. 

3. Suitable for different audiences. For users who want to quickly use the development functions, the course provides many code samples for use and function display, which can be easily modified and combined to achieve different functions. For enthusiasts or developers in related industries, the course also provides detailed SDK development usage guidelines and code sample analysis documents, which can help users to gain in-depth understanding. 

4. Long-term maintenance of the course. More development courses will be launched in the future so that we can communicate and grow together with developers.


Link to the open-source code for the Huashan Pi development board:

Elementary | duration 2.2 hours
Intelligent Car Programming Practical Course

There are many types of intelligent robots, and the most widely used ones are wheeled mobile robots, mainly used for indoor or warehouse patrol, planet exploration, teaching, scientific research, and civilian transportation. In this course, the intelligent car obtains video information through the built-in camera (visual sensor), recognizes the surrounding environment, and realizes autonomous navigation and obstacle avoidance in a small space based on sensors such as lidar and inertial measurement unit (IMU). This course takes a practical approach to guide you to intuitively learn robot operating system (ROS) and use Shaolin Pi development board to build an intelligent car vision application platform. Through programming the intelligent car in practical exercises, you will master the basic knowledge and application of Deep learning.

The Shaolin Pi development board is a high-performance, low-power edge computing product equipped with BM1684, the third-generation TPU processor independently developed by SOPHGO, with INT8 computing power of up to 17.6 TOPS. It supports 32-channel Full HD video hardware decoding and 2-channel encoding. The board has a flexible peripheral configuration, supporting 3 Mini-PCIe and 4 USB interfaces, as well as DC and Type-C power supply. Depending on the needs of different scenarios, the board can achieve optimal configuration, reasonable cost, optimal energy consumption, and optimal function selection. This course will help you quickly master the powerful features of the Shaolin Pi development board. Through it, you will master the basics of the Robot Operating System (ROS) and deep learning, and understand deep learning's basic applications.

Course Features

1. One-stop Service

All common issues related to KT001 intelligent car can be found here.

  • Provides a full-stack solution for KT001 intelligent car.
  • Comprehensively explains the basic concepts and practical applications of ROS.
  • With practical application as the core, it explains a large number of computer vision case studies, such as image processing based on OpenCV, object detection based on YOLOv5, multi-object tracking based on DeepSort, face detection based on RetinaFace and face recognition based on ResNet, as well as the implementation principles and methods of action recognition based on TSM.

2. Systematic Teaching

From product introduction to environment building, and then to visual application.

  • What is the composition of the intelligent car?
  • How is the intelligent car assembled?
  • How is the environment built?
  • How is the application developed?

3. Complete Materials

The course includes video tutorials, document guides, code scripts, etc., which are detailed and rich.

  • Abundant video materials.
  • Detailed application guidance.
  • Clear code scripts.

Code download link:

Course Catalogue

Intermediate | duration 1.2 hours
Algorithm Experiment Box Application Development

The SOPHON SE5 deep learning computing box is a high-performance, low-power edge computing product built from processors and modules, targeting a wider range of scenarios than module-form products. It is equipped with BM1684, SOPHGO's independently developed third-generation TPU processor, delivering INT8 computing power of up to 17.6 TOPS and simultaneously processing 16 channels of high-definition video, providing intelligent computing power for projects in security, comprehensive governance, education, finance, and security inspection.


The SE5 Deep Learning Computing Box is a small-scale server based on edge computing, supporting algorithms from various industries. With a complete ecosystem, it facilitates users in porting well-trained models. It not only supports facial recognition algorithm models but also supports dozens of auxiliary models, making it highly versatile for different scenarios. It can be applied in indoor and outdoor environments such as parks, communities, commercial buildings, and semi-enclosed integrated outdoor scenarios, without relying on X86 architecture servers. It fully utilizes its internal ARM resources, enabling independent integrated application development.


This computing box offers high computing power and strong market competitiveness while preserving a portion of high-precision computing power, an advantage in scenarios that require it, such as dynamic-vision unmanned retail cabinets and product recognition in smart refrigerator systems. Practical applications of the SE5 include deployment as an edge facial-recognition server in parks for entrance identification and monitoring, facial payment in smart canteens, student face recognition in home-school systems, access management in school dormitories, and dish recognition for billing in catering systems. It can also replace traditional security personnel in image recognition, offering higher accuracy in machine judgment, reduced training costs, faster passage, and intelligent assistance in security checks. The diverse range of deployable algorithm models enables diversified application scenarios.


This course will explain the SE5 computing box and its application processes. By taking this course, you'll gain a clear understanding of this experimental box and become familiar with applying it to specific scenarios.


Course Highlights

Systematic teaching: From product introduction to environment setup and application processes.

  • What is the SE5 Experiment Box?
  • How to set up the application environment?
  • How are applications developed?


Comprehensive materials: The course includes instructional videos, documentation, code scripts, etc., providing detailed and rich information.

  • Abundant video materials
  • Comprehensive application guidance
  • Clear code scripts
Elementary | duration 1.8 hours
The Concept and Practice of LLM

Welcome to the large model course! This course will take you deep into the realm of large models and help you master the skills to apply these powerful models. Whether you're interested in deep learning or looking to apply large models in real-world projects, this course will provide you with valuable knowledge and hands-on experience.


Large models are deep learning models with enormous numbers of parameters and complex structures. These models perform exceptionally well on large-scale datasets and complex tasks like image recognition, natural language processing, speech recognition, and more. The emergence of large models has sparked significant changes in the field of deep learning, leading to breakthroughs in various domains.


In this course, you'll learn the fundamental concepts and principles of large models. We'll delve into the foundational theory, developmental history, commonly used large models, and evolving techniques like prompting and in-context learning within LLMs (Large Language Models). As the course progresses, we'll dive into practical applications. You'll learn how to deploy highly regarded models such as Stable Diffusion and ChatGLM2-6B onto SOPHON's latest-generation deep learning processor, the SOPHON BM1684X. The SOPHON BM1684X is the fourth-generation tensor processor introduced by SOPHON specifically for deep learning, capable of 32 TOPS of computing power and supporting 32-channel HD hardware decoding and 12-channel HD hardware encoding, applicable to deep learning, computer vision, high-performance computing, and more.


Whether you're inclined toward in-depth academic research on large models or their industrial applications, this course will provide you with a solid foundation and practical skills. Are you ready to take on the challenge of large models? Let's delve into this fascinating field together!

Advanced | duration 2.4 hours
Compiler: TPU-MLIR Environment Construction and Usage Guide

TPU-MLIR is a compiler dedicated to SOPHGO TPU processors. The project offers a complete toolchain that converts pre-trained neural network models from various deep learning frameworks (PyTorch, ONNX, TFLite, and Caffe) into efficient model files (bmodel/cvimodel) that run on the SOPHON TPU. By quantizing bmodel/cvimodel files to different precisions, models are optimized for acceleration and performance on the SOPHON TPU. This enables models for object detection, semantic segmentation, and object tracking to be deployed onto the underlying hardware for acceleration.

This course is mainly divided into three parts:

  1. Building and configuring a local development environment, understanding related SOPHON SDK, TPU-MLIR compiler core theories, and relevant acceleration interfaces.
  2. Converting and quantizing example deep learning models from ONNX, TFLite, Caffe, and PyTorch, along with methods for converting other deep learning framework models into the intermediate ONNX format.
  3. Guiding participants through the practical porting of example algorithms (detection, recognition, and tracking): compilation, conversion, quantization, and final deployment onto the TPU of the SOPHON BM1684X tensor processor for performance testing.

This course aims to comprehensively and visually demonstrate the usage of the TPU-MLIR compiler through practical demonstrations, enabling you to quickly understand how to convert and quantize various deep learning models and test their deployment on the SOPHGO TPU. Currently, TPU-MLIR has been applied to BM168X and CV18XX, the latest-generation deep learning processors developed by SOPHGO; combined with each processor's high-performance ARM core and the corresponding SDK, it enables rapid deployment of deep learning algorithms.

Advantages of this course in model porting and deployment:

1. Supports multiple deep learning frameworks

Currently supported frameworks include PyTorch, ONNX, TFLite, and Caffe. Models from other frameworks need to be converted into ONNX models. For guidance on converting network models from other deep learning architectures into ONNX, please refer to the ONNX official website:

2. User-friendly operation

Understanding the principles and operational steps of TPU-MLIR through the development manual and related deployment cases allows for model deployment from scratch. Familiarity with Linux commands and model compilation quantization commands is sufficient for hands-on practice.

3. Simplified quantization deployment steps

Model conversion is performed inside the Docker image provided by SOPHGO and involves two main steps: converting the original model into an MLIR file, then converting the MLIR file into bmodel format. The bmodel is the model file format that can be accelerated on SOPHGO TPU hardware.
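As a hedged illustration of this two-step flow, the sketch below assembles the command lines as they appear in TPU-MLIR's quick-start documentation. The tool names (`model_transform.py`, `model_deploy.py`) and flags are assumptions to verify against your SDK version's manual; model names and paths are illustrative.

```python
def transform_cmd(model_name, onnx_path, input_shape, mlir_out):
    """Step 1: original model -> MLIR file (assumed tool: model_transform.py)."""
    return " ".join([
        "model_transform.py",
        f"--model_name {model_name}",
        f"--model_def {onnx_path}",
        f"--input_shapes [[{','.join(map(str, input_shape))}]]",
        f"--mlir {mlir_out}",
    ])

def deploy_cmd(mlir_path, chip, quantize, bmodel_out):
    """Step 2: MLIR file -> bmodel (assumed tool: model_deploy.py)."""
    return " ".join([
        "model_deploy.py",
        f"--mlir {mlir_path}",
        f"--chip {chip}",
        f"--quantize {quantize}",
        f"--model {bmodel_out}",
    ])

if __name__ == "__main__":
    # Hypothetical YOLOv5 example: run these inside the SOPHGO Docker container.
    print(transform_cmd("yolov5s", "yolov5s.onnx", [1, 3, 640, 640], "yolov5s.mlir"))
    print(deploy_cmd("yolov5s.mlir", "bm1684x", "INT8", "yolov5s_int8.bmodel"))
```

The key point is the separation of concerns: step 1 is framework-specific (reads ONNX/TFLite/Caffe), while step 2 is hardware-specific (target chip and quantization precision).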

4. Adaptable to multiple architectures and modes of hardware

Quantized bmodel models can be run on TPU in PCIe and SOC modes for performance testing.

5. Comprehensive documentation

Rich instructional videos with detailed theoretical explanations and practical operations, along with ample guidance and standardized code scripts, are open-sourced within the course for all users to learn from.

SOPHON-SDK Development Guide
TPU-MLIR Quick Start Manual
Example model repository
TPU-MLIR Official Repository
SOPHON-SDK Development Manual
Intermediate | duration 3.6 hours
Intelligent Multimedia and TPU Programming Practical Course

Multimedia, commonly understood as the combination of "multi" and "media," refers to the integration of media forms such as text, sound, images, and videos. In recent years, there has been a surge in emerging multimedia applications and services, such as 4K ultra-high-definition, VR, holographic projection, and 5G live streaming.

Multimedia and Artificial Intelligence

Deep learning builds on multimedia technologies such as image processing and recognition, and audio processing and speech recognition. This course is based on the BM1684 deep learning processor, which has a peak performance of 17.6 TOPS INT8 and 2.2 TFLOPS FP32 and supports 32-channel HD hardware decoding. It demonstrates the two core capabilities of a processor: computing power and multimedia processing power.

Key Technologies and Indicators for Intelligent Multimedia

Key technologies include coding and decoding technology, image processing technology, and media communication technology. Key indicators include the number of decoding channels, frame rate, resolution, level of richness of the image processing interface, latency, and protocol support.

This course will focus on introducing the three aspects of image processing technology, coding and decoding technology, and media communication technology. Through a combination of theory and practice, students will learn about intelligent multimedia related theories for artificial intelligence and quickly master basic practical methods.

Related GitHub links



Intermediate | duration 10.8 hours
SOPHON Vocational Skills Certification Exam - Junior IT Operations Engineer

This course aims to familiarize learners with SOPHON products, understand their basic usage, and grasp their application scenarios to achieve a preliminary understanding of SOPHON products. The course covers product introductions, SE5 server development environment setup, product deployment, and application examples. Completing all the content of this course enables you to qualify for the 'Junior IT Operations Engineer' certification exam.


Course Features

  1. Abundant and Comprehensive Materials: The course includes video tutorials and instructional documents, providing detailed and rich information. Code download link:
  2. Systematic Teaching: From basic introductions to setting up the development environment, and practical deployment of the product, the course systematically covers the entire development process, providing readers with a complete knowledge system.
  3. Free Cloud Development Resources: You can apply for free usage of the SE5-16 micro server cloud test space online:

Course catalog

Admission requirements/Recommendations

This course is the study course corresponding to the "Junior IT Operations Engineer" certification exam, designed to provide learners with basic product knowledge and skills. Although the course assumes no programming background, to help you better grasp the content we recommend the following prerequisites:

• Basic Linux operations: Most development is done in a Linux environment and involves basic Linux operations, including file management, network configuration, the Vim text editor, and more.
• Basic Docker usage: Pulling images, creating containers, running/deleting containers, etc.
• Programming languages: The tutorials cover Python and C++, and the SOPHGO toolchain provides APIs in both languages for developers to call.
Despite these prerequisites, inexperienced learners are welcome to join. The course uses a simple, easy-to-understand teaching method, with examples and exercises to help students gradually acquire programming skills. Inexperienced learners can quickly cover the prerequisites through Chapter 2, "Common Commands"; those with development experience can skip Chapter 2 and proceed directly to deployment in Chapters 3 and 4. Developers who are up for a challenge can also try porting and deploying a new model on the device.

Elementary | duration 2.2 hours

Why Choose SOPHON Academy Online Courses?


Self-paced Learning

Study and practice experiments flexibly to suit your schedule, needs, and technical background.

Professional Skill Learning

Learn the latest technologies, experiment practically, and boost technical skills.

Industry-standard Tools and Frameworks

Supporting major frameworks like PyTorch, TensorFlow, Caffe, PaddlePaddle, ONNX, and using industry-standard tools and software.

SOPHON Technical Skills Certification

SOPHON Technical Proficiency Certification validates your achievements in relevant fields, serving as proof of your personal skill enhancement.

SOPHON.NET Cloud Development Environment

Providing cloud spaces for course development, offering convenient resources for algorithm testing and development, liberating algorithms from hardware limitations.

Industry Application Cases

Learn intelligent accelerated computing applications for industries such as drones, robotics, autonomous driving, manufacturing, and more.
SOPHON Academy Training Inquiries
For technical queries, please visit the SOPHGO Developer Forum.