TPU processor, 16 channels HD video intelligent analysis, 16 channels of full HD video decoding, 10 channels of full HD video encoding
TPU processor, 32 channels HD video intelligent analysis, 32 channels of full HD video decoding, 12 channels of full HD video encoding
RISC-V + ARM intelligent deep learning processor
Based on the RISC-V core, operating at a frequency of 2GHz, the processor features a single SOC with 64 cores and 64MB shared L3 cache.
SRC1-10 is an excellent performance server cluster based on RISC-V arch. It has both computing and storage capabilities, and the full stack of software and hardware is domestically produced.
The RISC-V Fusion Server, supports dual-processor interconnection and enabled intelligent computing acceleration.
SRB1-20 is an excellent performance storage server based on RISC-V arch. It supports CCIX, 128-core concurrent, multi-disk large-capacity secure storage, and the full stack of software and hardware is domestically produced.
SRA1-20 is an excellent performance computing server based on RISC-V arch. It supports CCIX, 128-core concurrent, both software and hardware are open source and controllable.
SRA3-40 is a RISC-V server for high-performance computing, domestic main processor,excellent performance,fusion of intelligent computing, support powerful codec.
SRB3-40 is a high-performance RISC-V storage server with multiple disk slots and large-capacity secure storage.
Intelligent computing server SGM7-40, adapted to mainstream LLM, a single card can run a 70B large language model
SOM1684, BM1684, 16-Channel HD Video Analysis
Core-1684-JD4,BM1684, 16-Channel HD Video Analysis
SBC-6841,BM1684, 16-Channel HD Video Analysis
iCore-1684XQ,BM1684X,32-Channel HD Video Analysis
Core-1684XJD4,BM1684X,32-Channel HD Video Analysis
Shaolin PI SLKY01,BM1684, 16-Channel HD Video Analysis
QY-AIM16T-M,BM1684, 16-Channel HD Video Analysis
QY-AIM16T-M-G,BM1684, 16-Channel HD Video Analysis
QY-AIM16T-W,BM1684, 16-Channel HD Video Analysis
AIV02T,1684*2,Half-Height Half-Length Accelerator Card
AIO-1684JD4,BM1684, 16-Channel HD Video Analysis
AIO-1684XJD4,BM1684X,32-Channel HD Video Analysis
AIO-1684XQ,BM1684X,32-Channel HD Video Analysis
IVP03X,BM1684X,32-Channel HD Video Analysis
IVP03A,Microserver, passive cooling, 12GB RAM
Coeus-3550T,BM1684, 16-Channel HD Video Analysis
EC-1684JD4,BM1684, 16-Channel HD Video Analysis
CSA1-N8S1684,BM1684*8,1U Cluster Server
DZFT-ZDFX,BM1684X,Electronic Seal Analyzer,ARM+DSP architecture
ZNFX-32,BM1684, 16-Channel HD Video Analysis
ZNFX-8,BM1684X,ARM+DSP architecture,Flameproof and Intrinsic Safety Analysis Device
EC-A1684JD4,Microserver with active cooling, 16GB RAM, 32GB eMMC
EC-A1684JD4 FD,BM1684, 16-Channel HD Video Analysis,6GB of RAM, 32GB eMMC
EC-A1684XJD4 FD,BM1684X,32-Channel HD Video Analysis
ECE-S01, BM1684, 16-Channel HD Video Analysis
IOEHM-AIRC01,BM1684,Microserver Active Cooling,16-Channel HD Video Analysis
IOEHM-VCAE01, BM1684, 16-Channel HD Video Analysis
CSA1-N8S1684X,BM1684*8,1U Cluster Server
QY-S1U-16, BM1684, 1U Server
QY-S1U-192, BM1684*12, 1U Cluster Server
QY-S1X-384, BM1684*12, 1U Cluster Server
Deep learning intelligent analysis helps make city management more efficient and precise
Using deep learning video technology to analyze sources of dust generation and dust events, contributing to ecological environmental protection
Using deep learning intelligent analysis to monitor scenarios such as safety production, urban firefighting, and unexpected incidents for emergency regulation.
Using deep learning technology to detect and analyze individuals, vehicles, and security incidents in grassroots governance
Empowering the problems of traffic congestion, driving safety, vehicle violations, and road pollution control
Utilizing domestically developed computational power to support the structured analysis of massive volumes of videos, catering to practical applications in law enforcement
Build a "smart, collaborative, efficient, innovative" gait recognition big data analysis system centered around data
Effectively resolving incidents of objects thrown from height, achieving real-time monitoring of such incidents, pinpointing the location of the thrown object, triggering alerts, and effectively safeguarding the safety of the public from falling objects
Using edge computing architecture to timely and accurately monitor community emergencies and safety hazards
SOPHGO with SOPHON.TEAM ecosystem partners to build a deep learning supervision solution for smart hospitals, enhancing safety management efficiency in hospitals
SOPHGO with SOPHON.TEAM ecosystem partners to build a smart safe campus solution
Using a combination of cloud-edge deep learning methods to address food safety supervision requirements across multiple restaurant establishments, creating a closed-loop supervision system for government and enterprise-level stakeholders
SOPHON's self-developed computing hardware devices, such as SG6/SE5/SE6, equipped with SOPHON.TEAM video analysis algorithms, are used to make industrial safety production become smarter
Combining deep learning, edge computing and other technologies, it has the ability to intelligently identify people, objects, things and their specific behaviors in the refueling area and unloading area. It also automatically detects and captures illegal incidents at gas stations to facilitate effective traceability afterwards and provide data for safety management.
SOPHGO, in collaboration with SOPHON.TEAM and its ecosystem partners, is focusing on three major scene requirements: "Production Safety Supervision," "Comprehensive Park Management," and "Personnel Safety & Behavioral Standard Supervision." Together, they are developing a comprehensive deep learning scenario solution, integrating "algorithm + computing power + platform."
SOPHGO, cooperates with SOPHON.TEAM ecological partners to build a deep learning monitoring solution for safety risks in chemical industry parks
SOPHGO with SOPHON.TEAM ecosystem partners to build a Smart Computing Center solution, establishing a unified management and scheduling cloud-edge collaborative smart computing center
SOPHGO, in collaboration with SOPHON.TEAM ecosystem, have jointly developed a set of hardware leveraging domestically-produced deep learning computational power products. This is based on an AutoML zero-code automated deep learning training platform, enabling rapid and efficient implementation of deep learning engineering solutions
比赛所使用的算丰TPU的架构如下所示:
其中host memory为计算机内存,system memory为算丰TPU的内存,local memory为NPU内存。Execution Units(EU)为计算单元。CDMA用于host memory与system memory数据搬运,GDMA用于system momery与local memory之间的数据搬运,BDC用于local memory中数据的计算。
算丰的运算是由指令完成,具体的计算指令由host端发射,device端收到指令执行。其中host memory与system momery的交互由host端代码完成,system memory与local memory的交互由device端代码完成。
比赛需要编写device的代码,也就是关注的是GDMA部分以及BDC的部分,即在local memory和system memory中搬运数据,和使用EU计算数据。
比赛所使用的TPU数据如下:
global memory: 128GB
NPU number: 64
local memory:512KB * 64 = 32MB
EU number: 32bit的EU数量为16
数据在host memory和system memory中的搬运赛事已经提供且无法修改。因此需要选手做的事情简化起来就是:数据从system memory搬入local memory->使用bdc计算->将结果从local memory搬运到system memory。
其中,数据搬运使用的是gdma操作,比赛使用的都是浮点数,因此,搬运一般使用的就是:
void okk_gdma_32bit_cpy_S2L(local_addr_t dst_addr, system_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride); //tensor搬入
void okk_gdma_32bit_cpy_L2S(system_addr_t dst_addr, local_addr_t src_addr, const dim4 *shape, const dim4 *dst_stride, const dim4 *src_stride); //tensor搬出
void okk_gdma_32bit_matrix_S2L(local_addr_t dst_addr, system_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride); //矩阵搬入
void okk_gdma_32bit_matrix_L2S(system_addr_t dst_addr, local_addr_t src_addr, int rows, int cols, int cols_per_channel, int row_stride); //矩阵搬出
其中local_addr_t表示local memory中的地址,所有local memory为统一寻址。
bdc的相关操作较多,就不一一列举了,详见二元函数,一元函数,神经网络函数。
流程中选手要完成比赛的题目主要分为两部分:数据切分和实现新算子。比赛中的tensor输入有多种大小,因此对特别大的数据需要使用数据切分,每次搬入一部分以完成计算。实现新算子是由于基础的神经网络算子库中提供的算子并不多,例如softmax函数需要自己结合exp算子、池化算子以及除法算子完成。