Burst Buffer was proposed to bridge the growing performance gap between computation and I/O in high-performance computing (HPC) systems. However, its introduction brings new challenges in system resource management and job scheduling. Resource management systems on HPC platforms lack efficient management of multiple resource types, and computing resources and Burst Buffer resources are allocated independently of each other, which can lead to underutilization of system resources and job blocking. To address these issues, this paper proposes GABB, a plan-based job scheduling strategy for shared Burst Buffer that is optimized with a genetic algorithm. GABB schedules all jobs in the waiting queue together to generate an execution plan, comprehensively considering changes in system resources and each job's demand for computing and Burst Buffer resources during scheduling. Finally, GABB applies an improved genetic algorithm to optimize the scheduling scheme. We simulated a shared Burst Buffer system and implemented the plan-based job scheduling algorithm. The experimental results show that GABB reduces the mean job waiting time by over 20% and the mean bounded slowdown by over 25% compared with the Shortest Job First (SJF) algorithm.
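The core idea of plan-based scheduling with a genetic algorithm can be sketched as searching over job orderings under joint node and Burst Buffer capacity constraints. The sketch below is illustrative only: the job parameters, capacities, fitness function, and genetic operators are assumptions for demonstration, not the paper's actual encoding.

```python
import random

# Hypothetical jobs: (id, nodes, burst-buffer GB, runtime). Illustrative values.
JOBS = [("j1", 4, 100, 10), ("j2", 2, 50, 5), ("j3", 8, 200, 20),
        ("j4", 1, 10, 2), ("j5", 4, 150, 8)]
TOTAL_NODES, TOTAL_BB = 10, 300  # assumed system capacities


def mean_wait(order):
    """Simulate list scheduling in the given order; fitness = mean waiting time."""
    free_nodes, free_bb, t = TOTAL_NODES, TOTAL_BB, 0
    running, waits = [], []  # running: (end_time, nodes, bb)
    for jid in order:
        _, n, bb, rt = next(j for j in JOBS if j[0] == jid)
        # A job starts only when BOTH compute nodes and Burst Buffer space fit.
        while n > free_nodes or bb > free_bb:
            running.sort()
            end, rn, rbb = running.pop(0)  # wait for the earliest finisher
            t = max(t, end)
            free_nodes += rn
            free_bb += rbb
        waits.append(t)  # all jobs arrive at t=0, so start time = waiting time
        free_nodes -= n
        free_bb -= bb
        running.append((t + rt, n, bb))
    return sum(waits) / len(waits)


def evolve(generations=50, pop_size=20, seed=0):
    """Genetic search over job permutations, minimizing mean waiting time."""
    rng = random.Random(seed)
    ids = [j[0] for j in JOBS]
    # Seed the population with the queue order plus random permutations.
    pop = [list(ids)] + [rng.sample(ids, len(ids)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=mean_wait)
        survivors = pop[: pop_size // 2]  # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(ids))
            # Order crossover: prefix of a, remainder in b's relative order.
            child = a[:cut] + [x for x in b if x not in a[:cut]]
            if rng.random() < 0.2:  # swap mutation
                i, j = rng.sample(range(len(ids)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=mean_wait)
```

Because the original queue order is seeded into the population and the best half survives each generation, the returned plan is never worse than the naive queue order under this toy model.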
NVMe over TCP is a key technology for building large-scale high-performance storage systems: it realizes an NVMeoF (NVMe over Fabrics) storage network on top of existing data center network infrastructure and the standard TCP/IP software protocol stack. In this paper, we design and implement LANoT (Load-aware NVMe over TCP), a load-aware NVMeoF message processing mechanism. First, interrupt coalescing based on PDU aggregation alleviates the interrupt storm problem and achieves high throughput. Second, a dedicated message processing mechanism matches processing to the I/O characteristics of each dedicated queue, effectively improving key performance indicators. We implement a LANoT prototype in the Linux kernel. Performance tests show that, compared with the NVMe over TCP implementation in the standard Linux kernel, LANoT significantly improves IOPS and reduces CPU resource consumption by more than 50%.
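The interrupt-coalescing idea can be illustrated with a small user-space model: instead of raising one completion event per PDU, arriving PDUs are batched and a single event fires when the batch fills or a timeout expires. This is only a toy sketch of the general technique; LANoT itself operates inside the Linux kernel NVMe/TCP path, and the class, thresholds, and handler below are invented for illustration.

```python
import time


class PduCoalescer:
    """Toy model of interrupt coalescing via PDU aggregation (illustrative)."""

    def __init__(self, batch_size=8, max_delay=0.01, handler=None):
        self.batch_size = batch_size      # PDUs per batch before flushing
        self.max_delay = max_delay        # seconds before a partial batch flushes
        self.handler = handler or (lambda pdus: None)
        self.pending = []
        self.first_ts = None              # arrival time of the oldest pending PDU
        self.interrupts = 0               # completion events actually raised

    def on_pdu(self, pdu, now=None):
        """Queue one arriving PDU; flush when the batch is full or too old."""
        now = time.monotonic() if now is None else now
        if not self.pending:
            self.first_ts = now
        self.pending.append(pdu)
        if len(self.pending) >= self.batch_size or now - self.first_ts >= self.max_delay:
            self.flush()

    def flush(self):
        """Deliver all pending PDUs in one 'interrupt'."""
        if self.pending:
            self.interrupts += 1
            self.handler(self.pending)
            self.pending = []
```

Feeding 32 PDUs through a coalescer with `batch_size=8` raises only 4 completion events instead of 32, which is the throughput/CPU trade-off the abstract describes.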