TPU processor, 16-channel HD video intelligent analysis, 16-channel full HD video decoding, 10-channel full HD video encoding
TPU processor, 32-channel HD video intelligent analysis, 32-channel full HD video decoding, 12-channel full HD video encoding
RISC-V + ARM intelligent deep learning processor
Based on RISC-V cores running at 2GHz, the processor integrates 64 cores and a 64MB shared L3 cache on a single SoC.
SRC1-10 is a high-performance server cluster based on the RISC-V architecture. It provides both computing and storage capabilities, and the full software and hardware stack is domestically produced.
The RISC-V fusion server supports dual-processor interconnection and enables intelligent computing acceleration.
SRB1-20 is a high-performance storage server based on the RISC-V architecture. It supports CCIX and 128-core concurrency, offers multi-disk large-capacity secure storage, and the full software and hardware stack is domestically produced.
SRA1-20 is a high-performance computing server based on the RISC-V architecture. It supports CCIX and 128-core concurrency, and both its software and hardware are open source and controllable.
SRA3-40 is a RISC-V server for high-performance computing, featuring a domestically produced main processor, excellent performance, integrated intelligent computing, and powerful codec support.
SRB3-40 is a high-performance RISC-V storage server with multiple disk slots and large-capacity secure storage.
Intelligent computing server SGM7-40, adapted to mainstream LLMs; a single card can run a 70B large language model
SOM1684, BM1684, 16-Channel HD Video Analysis
Core-1684-JD4, BM1684, 16-Channel HD Video Analysis
SBC-6841, BM1684, 16-Channel HD Video Analysis
iCore-1684XQ, BM1684X, 32-Channel HD Video Analysis
Core-1684XJD4, BM1684X, 32-Channel HD Video Analysis
Shaolin PI SLKY01, BM1684, 16-Channel HD Video Analysis
QY-AIM16T-M, BM1684, 16-Channel HD Video Analysis
QY-AIM16T-M-G, BM1684, 16-Channel HD Video Analysis
QY-AIM16T-W, BM1684, 16-Channel HD Video Analysis
AIV02T, BM1684*2, Half-Height Half-Length Accelerator Card
AIO-1684JD4, BM1684, 16-Channel HD Video Analysis
AIO-1684XJD4, BM1684X, 32-Channel HD Video Analysis
AIO-1684XQ, BM1684X, 32-Channel HD Video Analysis
IVP03X, BM1684X, 32-Channel HD Video Analysis
IVP03A, Microserver, passive cooling, 12GB RAM
Coeus-3550T, BM1684, 16-Channel HD Video Analysis
EC-1684JD4, BM1684, 16-Channel HD Video Analysis
CSA1-N8S1684, BM1684*8, 1U Cluster Server
DZFT-ZDFX, BM1684X, Electronic Seal Analyzer, ARM+DSP architecture
ZNFX-32, BM1684, 16-Channel HD Video Analysis
ZNFX-8, BM1684X, ARM+DSP architecture, Flameproof and Intrinsic Safety Analysis Device
EC-A1684JD4, Microserver with active cooling, 16GB RAM, 32GB eMMC
EC-A1684JD4 FD, BM1684, 16-Channel HD Video Analysis, 6GB RAM, 32GB eMMC
EC-A1684XJD4 FD, BM1684X, 32-Channel HD Video Analysis
ECE-S01, BM1684, 16-Channel HD Video Analysis
IOEHM-AIRC01, BM1684, Microserver Active Cooling, 16-Channel HD Video Analysis
IOEHM-VCAE01, BM1684, 16-Channel HD Video Analysis
CSA1-N8S1684X, BM1684*8, 1U Cluster Server
QY-S1U-16, BM1684, 1U Server
QY-S1U-192, BM1684*12, 1U Cluster Server
QY-S1X-384, BM1684*12, 1U Cluster Server
Deep learning intelligent analysis helps make city management more efficient and precise
Using deep learning video technology to analyze sources of dust generation and dust events, contributing to ecological environmental protection
Using deep learning intelligent analysis to monitor scenarios such as safety production, urban firefighting, and unexpected incidents for emergency regulation.
Using deep learning technology to detect and analyze individuals, vehicles, and security incidents in grassroots governance
Empowering solutions to traffic congestion, driving safety, vehicle violations, and road pollution control
Utilizing domestically developed computational power to support the structured analysis of massive volumes of videos, catering to practical applications in law enforcement
Building a "smart, collaborative, efficient, innovative" gait recognition big data analysis system centered on data
Effectively addressing incidents of objects thrown from height: real-time monitoring of such incidents, pinpointing the location of the thrown object, triggering alerts, and safeguarding the public from falling objects
Using edge computing architecture to timely and accurately monitor community emergencies and safety hazards
SOPHGO works with SOPHON.TEAM ecosystem partners to build a deep learning supervision solution for smart hospitals, enhancing safety management efficiency in hospitals
SOPHGO works with SOPHON.TEAM ecosystem partners to build a smart safe campus solution
Using a combination of cloud-edge deep learning methods to address food safety supervision requirements across multiple restaurant establishments, creating a closed-loop supervision system for government and enterprise-level stakeholders
SOPHON's self-developed computing hardware, such as the SG6/SE5/SE6, equipped with SOPHON.TEAM video analysis algorithms, is used to make industrial safety production smarter
Combining deep learning, edge computing, and other technologies, the solution can intelligently identify people, objects, and their specific behaviors in the refueling and unloading areas. It also automatically detects and captures violations at gas stations, facilitating effective traceability afterwards and providing data for safety management.
SOPHGO, in collaboration with SOPHON.TEAM and its ecosystem partners, is focusing on three major scene requirements: "Production Safety Supervision," "Comprehensive Park Management," and "Personnel Safety & Behavioral Standard Supervision." Together, they are developing a comprehensive deep learning scenario solution, integrating "algorithm + computing power + platform."
SOPHGO cooperates with SOPHON.TEAM ecosystem partners to build a deep learning monitoring solution for safety risks in chemical industry parks
SOPHGO works with SOPHON.TEAM ecosystem partners to build a Smart Computing Center solution, establishing a cloud-edge collaborative smart computing center with unified management and scheduling
SOPHGO, in collaboration with the SOPHON.TEAM ecosystem, has developed a hardware suite leveraging domestically produced deep learning computing power products, built on an AutoML zero-code automated deep learning training platform, enabling rapid and efficient implementation of deep learning engineering solutions
typedef struct {
    int N, C, H, W;                  // tensor dimensions (NCHW layout)
    unsigned long long output_addr;  // global memory address of the output
    unsigned long long input_addr;   // global memory address of the input
} __attribute__((packed)) param_t;
Softmax is computed along the C dimension by default here.
Test case parameters:
param_t params[] = {
{.N = 1, .C = 370, .H = 13, .W = 13 }, // 0
{.N = 1, .C = 1000, .H = 1, .W = 1 }, // 1
{.N = 4, .C = 2, .H = 157, .W = 283}, // 2
{.N = 79, .C = 4090, .H = 1, .W = 1 }, // 3
{.N = 6132, .C = 21, .H = 1, .W = 1 }, // 4
};
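For reference, the fp32 footprint of each case follows directly from these parameters. The host-side sketch below (which reuses the param_t definition and params[] array above and assumes fp32 data) prints N*C*H*W*sizeof(float) per case; this is what determines whether a case fits into local memory in one piece or must be split as described later:

#include <stdio.h>

// Host-side helper: fp32 data size of each test case.
// Reuses param_t and params[] as defined above.
int main(void)
{
    int cases = (int)(sizeof(params) / sizeof(params[0]));
    for (int i = 0; i < cases; i++) {
        size_t bytes = (size_t)params[i].N * params[i].C *
                       params[i].H * params[i].W * sizeof(float);
        printf("case %d: %zu bytes\n", i, bytes);
    }
    return 0;
}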
General approach
The SOPHGO operator library does not provide a softmax operator, so softmax has to be built from basic operators.
The softmax expression is: softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
As can be seen, this requires the exp operator, the div operator, and a cross-channel summation.
The SOPHGO operator library also does not provide a cross-channel summation, so two approaches are suggested here. The first is a 1x1 convolution with all weights set to 1.0, which requires allocating extra space for the convolution kernel parameters. The second exploits the fact that, when computing softmax along the C dimension, the sizes of H and W have no effect on the result: the C dimension can be moved into the H position and the original H and W merged into the W position, i.e. [N,C,H,W] -> [N,1,C,H*W]. The sum can then be obtained by running avgpool over the new H dimension and multiplying the result by the number of elements.
local_addr_t input_addr, output_addr, sum_addr;
S2L(input_addr, param->input_addr);     // copy input from global (system) memory to local memory
cal_exp(input_addr);                    // input = exp(input)
cal_sum(sum_addr, input_addr);          // sum = input.sum() along C (via avgpool, see above)
div(output_addr, input_addr, sum_addr); // output = input / sum
L2S(param->output_addr, output_addr);   // copy the result back to global memory
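As a minimal sketch of the avgpool-based summation described above, cal_sum could look roughly like the following; avgpool_2d and scale_by are placeholder names for whatever pooling and elementwise-scale primitives the operator library actually provides (not real API calls), and the shape arguments are passed explicitly here for clarity:

// Hypothetical cal_sum built on the reshape + avgpool trick (placeholder primitives).
void cal_sum(local_addr_t sum_addr, local_addr_t input_addr, int C, int HW)
{
    // The [N, C, H, W] buffer is viewed as [N, 1, C, H*W]; the linear NCHW
    // layout is unchanged, so no data movement is needed for the reshape.
    // A [C, 1] pooling window over the new H axis (length C) produces one
    // mean per (n, h*W + w) position, i.e. an output of shape [N, 1, 1, H*W].
    avgpool_2d(sum_addr, input_addr, /*kernel_h=*/C, /*kernel_w=*/1);
    // Multiply the mean by the element count C to recover the sum.
    scale_by(sum_addr, sum_addr, (float)C);
}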
Splitting along N is straightforward here; for example, for test case 4, the computation can be completed simply by splitting N into suitably sized blocks:
local_addr_t input_addr, output_addr, sum_addr;
for (int i = 0; i < blocks; i++)
{
    S2L(input_addr, param->input_addr + input_skip_bytes);    // load the i-th block of N
    cal_exp(input_addr);                    // input = exp(input)
    cal_sum(sum_addr, input_addr);          // sum = input.sum()
    div(output_addr, input_addr, sum_addr); // output = input / sum
    L2S(param->output_addr + output_skip_bytes, output_addr); // store the i-th block
}
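The skip variables are plain byte offsets into the global input and output buffers. A minimal sketch of the offset arithmetic, assuming fp32 data and block_n samples per block (both names illustrative):

// Byte offsets for the i-th block of the N-split loop (fp32 assumed).
size_t per_sample_bytes  = (size_t)param->C * param->H * param->W * sizeof(float);
size_t input_skip_bytes  = (size_t)i * block_n * per_sample_bytes;
size_t output_skip_bytes = input_skip_bytes;   // input and output have the same shape
// The last block may hold fewer than block_n samples when N % block_n != 0.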
When avgpool is used to compute the sum in softmax, the C dimension has been moved to the H dimension, so the new C dimension is 1.
However, the contest hardware has 64 NPUs, and when C is 1 only the first NPU participates in the computation. Therefore, N can be distributed across both the N and C dimensions, so that the data still fits via N-splitting while making maximum use of the NPUs' compute power.
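For example, for test case 4 ([6132, 21, 1, 1]) the batch can be folded into the channel dimension so that all 64 NPUs have work. The bookkeeping below is only a sketch; npu_num, n_outer, and n_tail are illustrative names:

// After the [N, C, H, W] -> [N, 1, C, H*W] reshape the channel count is 1,
// so only one NPU computes. Regrouping the same contiguous data as
// [N / npu_num, npu_num, C, H*W] spreads the batch across npu_num channels.
int npu_num = 64;
int n_outer = param->N / npu_num;   // full [npu_num, C, H*W] tiles
int n_tail  = param->N % npu_num;   // leftover samples handled separately
// Test case 4: N = 6132 -> 95 full [64, 21, 1] tiles plus a tail of 52 samples.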
For test case 3, since H and W are both 1, the 4090 elements of the C dimension can be spread across the H and W dimensions, changing the pooling window from [4090, 1] to [409, 10], which speeds up the NPU computation.
In addition, for test case 2, the C dimension stays small no matter how the shape is rearranged. In that case the stride used when moving data in can be adjusted so that the C and H dimensions are transposed, [4,2,157,283] -> [4,157,2,283], which avoids both under-utilizing the NPUs and running out of local memory.
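A minimal sketch of the address mapping behind this stride trick, assuming a contiguous NCHW source and fp32 data (the actual DMA/stride configuration depends on the S2L interface and is not shown):

// Transposing C and H on the way in, purely by addressing:
//   global layout [N, C, H, W]  ->  local layout [N, H, C, W]
// so that H = 157 becomes the channel dimension on the NPU side.
for (int n = 0; n < N; n++)
  for (int h = 0; h < H; h++)          // local channel axis
    for (int c = 0; c < C; c++)        // local height axis
      for (int w = 0; w < W; w++) {
        size_t src_off = ((((size_t)n * C + c) * H + h) * W + w) * sizeof(float);
        size_t dst_off = ((((size_t)n * H + h) * C + c) * W + w) * sizeof(float);
        // dst_off walks the local [N, H, C, W] buffer sequentially, while
        // src_off steps through the global buffer with a stride of
        // W*sizeof(float) along h and H*W*sizeof(float) along c; the DMA
        // engine realizes this mapping with strides instead of explicit loops.
      }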