TOP Network Biweekly Report: May 15, 2024-May 28, 2024

TOP AI Network Official, published in TOP AI Network, May 28, 2024

Testing and Optimization of Multi-Node Multi-GPU Distributed Training Environment

After completing the Dockerization of computing resources, the team focused on testing and optimizing the multi-node multi-GPU distributed training environment. The team first established a single-node multi-GPU benchmark. All tests trained the k2 model, which has billions of parameters. Next, the team validated the multi-node multi-GPU logic in the PyTorch framework, confirming that distributed training works as intended, and scaled the cluster up to 16 GPUs. The tests confirmed that adding GPUs and nodes improves training efficiency. They also showed that cluster parameters need tuning and that the testing methodology should be refined.
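In a multi-node multi-GPU job, each GPU typically runs one worker process, and every worker needs a unique global rank. The sketch below enumerates that layout using the conventional mapping for homogeneous nodes, where global rank = node rank × GPUs per node + local rank. It is illustrative only. The split of the 16 GPUs into 2 nodes × 8 GPUs is a hypothetical assumption (the report does not state the node layout), and `layout_ranks` and `WorkerSlot` are hypothetical helpers, not part of PyTorch. In practice, launchers such as `torchrun` provide these values to each worker through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkerSlot:
    node_rank: int    # index of the machine in the cluster
    local_rank: int   # GPU index within that machine
    global_rank: int  # unique rank across the whole job


def layout_ranks(num_nodes: int, gpus_per_node: int) -> List[WorkerSlot]:
    """Enumerate one worker per GPU, assuming every node has the same GPU count."""
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must be positive")
    slots = []
    for node_rank in range(num_nodes):
        for local_rank in range(gpus_per_node):
            # Conventional mapping for homogeneous nodes.
            global_rank = node_rank * gpus_per_node + local_rank
            slots.append(WorkerSlot(node_rank, local_rank, global_rank))
    return slots


if __name__ == "__main__":
    # Hypothetical 16-GPU layout: 2 nodes x 8 GPUs (not stated in the report).
    slots = layout_ranks(num_nodes=2, gpus_per_node=8)
    print("world size:", len(slots))
    for slot in slots[6:10]:  # the boundary between node 0 and node 1
        print(slot)
```

The same function covers other hypothetical splits of a 16-GPU cluster, such as `layout_ranks(4, 4)`. Only the node boundaries change, and the global ranks still run from 0 to 15.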

Model Training Performance Analysis and Future Cluster Adjustment Strategies

On the performance side, the team found that adding GPUs and nodes shortens training time, but with diminishing returns: each further increase yields a smaller reduction. This indicates room to improve GPU utilization within the cluster. The team also ran performance tests with other workloads, including resnet18, resnet50, and deepspeed, in multi-node multi-GPU cluster mode and observed similar training-time behavior, which supports the generality of the results. The current small cluster can handle training and tuning for most AI models, with the exception of some large language models. The next phase will focus on improving the cluster's computing efficiency, coordinating work across clusters in different regions, and ultimately building a decentralized cluster.
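Diminishing returns are usually quantified with speedup, S(N) = T_base / T(N), and parallel efficiency, E(N) = S(N) / (N / N_base), where T(N) is the training time on N GPUs. The sketch below computes both from a table of measured times. The timings in the usage block are hypothetical placeholders, not results from the report. `scaling_report` is a hypothetical helper, and taking the smallest measured GPU count as the baseline is an assumption.

```python
from typing import Dict, Tuple


def scaling_report(timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map GPU count -> (speedup, parallel efficiency), relative to the smallest GPU count measured."""
    if not timings:
        raise ValueError("timings must not be empty")
    if any(n < 1 or t <= 0 for n, t in timings.items()):
        raise ValueError("GPU counts must be >= 1 and timings must be positive")
    base_n = min(timings)
    base_t = timings[base_n]
    report = {}
    for n in sorted(timings):
        speedup = base_t / timings[n]
        # Ideal (linear) speedup would equal n / base_n, so efficiency 1.0 means perfect scaling.
        efficiency = speedup / (n / base_n)
        report[n] = (speedup, efficiency)
    return report


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds; NOT measurements from the report.
    example = {1: 100.0, 2: 55.0, 4: 32.0, 8: 20.0, 16: 14.0}
    for n, (s, e) in scaling_report(example).items():
        print(f"{n:>2} GPUs: speedup {s:.2f}x, efficiency {e:.0%}")
```

Speedup that keeps rising while efficiency falls is the pattern of diminishing returns. In data-parallel training, gradient synchronization traffic, especially between nodes, is a common contributor, although the report does not identify the cause for this cluster.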

Open Source: Cross-Chain Bridge

The TOP cross-chain bridge enables asset bridging between the TOP chain, Ethereum, and other EVM-compatible chains, and it plays a central role in the TOP chain ecosystem. The entire cross-chain bridge component is now open source.

The code is available at:
https://github.com/telosprotocol/TOP-crosschain-front

Find More

Official Website | Telegram | Twitter | Medium | Reddit | Email

TOP AI Network is a public blockchain that employs sharding technology and a three-layer network to support an AI model service market. >>> www.topnetwork.org