The objective of this study is to build an OMNeT++-based simulation platform for identifying potential performance bottlenecks in large-scale compute system networks. Networks composed of thousands of interconnected nodes may encounter unforeseen issues. When validation is performed in a real environment, hardware fluctuations can cause measured results to deviate from expected ones. In addition, some servers may be leased and running continuously, which makes it impossible to run performance tests on them and locate potential bottlenecks. A simulation platform is therefore needed to reproduce and uncover these underlying issues. To meet these requirements, a conversion tool was first developed so that the simulation platform can recognize the network's topology. Through integration with OpenSM, the simulation platform acquired routing capabilities identical to those of a real network. The parameters of the simulation platform were then tuned to match the hardware capabilities of real network devices, and the platform successfully reproduced the latency and bandwidth of point-to-point communication as well as the bandwidth under congested communication. Finally, the collective communication latency of the network was simulated. A comparison of the simulation results with data collected from the real environment showed that the error between simulated and measured results was within 10%, which validates the accuracy of the simulation. The simulated collective communication results were then analyzed to identify network performance bottlenecks that may arise when large-scale jobs run on a real system.
The InfiniBand interconnect network is widely used in parallel computing, and the network adapter (host channel adapter, HCA) is one of the hardware components required to deploy it. As the number of nodes in parallel computing systems grows, the InfiniBand network may face higher performance requirements. The current InfiniBand network uses deterministic routing, and adaptive routing can be introduced to meet these requirements. Because the original InfiniBand protocol strictly preserves packet order, adding adaptive routing improves performance but requires the network adapter to reorder packets that arrive out of order, which in turn requires a certain amount of buffering. In this paper, a functional extension of the InfiniBand protocol is designed to implement adaptive routing, and the extended protocol is modeled and simulated. Two metrics related to out-of-order delivery are also designed, and the direct influence of adaptive routing on packet reordering and on network adapter buffer depth is studied from several aspects. In the simulated system, All-to-All communication between model nodes shows that under the Greedy and RoundRobin adaptive port selection algorithms, the buffer depth required by the network adapter is 61 and 62 packet slots of 2 KB each, respectively. This provides an estimate of the buffer that the network adapter needs in order to handle out-of-order packets.
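To make the notion of reorder buffer depth concrete, the following is a minimal sketch of one way such a depth could be measured from a packet arrival trace. It is not the paper's model or either of its two metrics, which the abstract does not define. The function name `reorder_buffer_depth`, the assumption that sequence numbers start at 0 and are unique, and the 2 KB slot size used in the usage lines are illustrative assumptions.

```python
def reorder_buffer_depth(arrivals):
    """Return the peak number of packets held while waiting for a gap to fill.

    `arrivals` is a list of packet sequence numbers in arrival order.
    Assumption (hypothetical): numbering starts at 0 and has no duplicates.
    """
    expected = 0      # next sequence number that can be delivered in order
    held = set()      # packets that arrived early and must be buffered
    max_depth = 0
    for psn in arrivals:
        if psn == expected:
            # In-order packet: deliver it, then drain any buffered successors.
            expected += 1
            while expected in held:
                held.remove(expected)
                expected += 1
        else:
            # Early packet: it has to wait in the reorder buffer.
            held.add(psn)
            max_depth = max(max_depth, len(held))
    return max_depth


SLOT_BYTES = 2048  # assumed slot size, matching the 2 KB packet space above

in_order = reorder_buffer_depth([0, 1, 2, 3])
reversed_head = reorder_buffer_depth([2, 1, 0, 3])
buffer_bytes = reversed_head * SLOT_BYTES
```

With an in-order trace nothing is buffered, while the trace `[2, 1, 0, 3]` holds two packets at its peak, so under the assumed slot size the adapter would need 4096 bytes for that trace.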
MPI (Message Passing Interface) plays a crucial role in parallel computing. The Allreduce implementation in the OpenMPI communication library has shortcomings when the number of processes is not a power of two. The two existing algorithms, recursive_doubling and reduce_scatter_allgather, handle this case by excluding some processes so that the remaining process count is a power of two. However, the selection of excluded processes considers too few factors, which leaves the participating processes unevenly distributed across nodes and greatly reduces communication efficiency. To address this problem, the layout of processes on nodes is taken into account and the range of excluded processes is redefined. Both algorithms receive generic load-balancing optimizations and adaptations for domestic architectures, which improves load balance. Experimental results show that, at a communication scale of 16 nodes, the recursive_doubling algorithm achieves performance improvements of up to 30%, while the reduce_scatter_allgather algorithm achieves performance improvements of up to 21%.
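The imbalance described above can be illustrated with a small sketch. It is not the paper's algorithm, whose exact exclusion rule the abstract does not give. It assumes that ranks are placed on nodes in consecutive blocks, that the baseline excludes the even ranks among the first 2 × extra ranks (a common way to fold a non-power-of-two count), and that a layout-aware alternative spreads the excluded ranks round-robin over nodes. All function names are hypothetical.

```python
from collections import Counter


def largest_power_of_two_leq(n):
    """Largest power of two that does not exceed n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def excluded_first_even(nprocs):
    """Assumed baseline: drop the even ranks among the first 2*extra ranks."""
    extra = nprocs - largest_power_of_two_leq(nprocs)
    return [r for r in range(2 * extra) if r % 2 == 0]


def excluded_balanced(nprocs, procs_per_node):
    """Hypothetical layout-aware choice: take excluded ranks round-robin by node."""
    extra = nprocs - largest_power_of_two_leq(nprocs)
    nnodes = -(-nprocs // procs_per_node)  # ceiling division
    excluded = []
    slot = 0  # local rank index within each node
    while len(excluded) < extra:
        for node in range(nnodes):
            rank = node * procs_per_node + slot
            if rank < nprocs and len(excluded) < extra:
                excluded.append(rank)
        slot += 1
    return sorted(excluded)


def active_per_node(nprocs, procs_per_node, excluded):
    """Count the ranks that still take part in the power-of-two phase, per node."""
    skip = set(excluded)
    counts = Counter(r // procs_per_node for r in range(nprocs) if r not in skip)
    nnodes = -(-nprocs // procs_per_node)
    return [counts[node] for node in range(nnodes)]


# 4 nodes with 6 processes each: 24 ranks, so 8 must be excluded to reach 16.
baseline = active_per_node(24, 6, excluded_first_even(24))
balanced = active_per_node(24, 6, excluded_balanced(24, 6))
```

Under these assumptions the baseline leaves 3, 3, 4 and 6 active ranks on the four nodes, whereas the round-robin choice leaves 4 on each, which is the kind of per-node balance the abstract refers to.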