Paper

Application of GPU On-Orbit and Self-Adaptive Scheduling by Its Internal Thermal Sensor

Paper number

IAC-18,D1,3,8,x46977

Author

Mr. Nan Li, China, University of Chinese Academy of Sciences; Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences

Coauthor

Mr. Aimin Xiao, China, Chinese Academy of Sciences

Coauthor

Mr. Mengxi Yu, China, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences

Coauthor

Dr. Jianquan Zhang, China, Chinese Academy of Sciences

Coauthor

Dr. Wenbo Dong, China, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences

Year

2018

Abstract

High-performance COTS components such as GPUs and FPGAs have been widely used on the ground in big data and artificial intelligence applications. In on-orbit systems for high-performance applications where GPUs must be used, commercial components have to be chosen, and their strict power and thermal requirements must be considered, including power supply capability and thermal convection or conduction conditions.

This paper presents a method running on GPU processors that achieves acceptable computing performance while keeping the temperature sampled from the GPU's internal thermal sensor within a reasonable range. A feedback control strategy is designed with the sampled GPU temperature as input and the recommended amount of parallel computing resources to occupy as output. In other words, the heat generated by the running GPU is reduced by deliberately degrading GPU performance to an appropriate degree. Furthermore, multiple GPU workloads are scheduled on a single processor, and the number of concurrent threads is adjusted accordingly in a fine-grained manner.
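The feedback strategy maps a sampled temperature to a recommended degree of parallelism, and the abstract names fuzzy logic control for that mapping. The Python sketch below shows one possible shape of such a controller. The three fuzzy sets (COOL, WARM, HOT), their breakpoints in °C, the output fractions, the weighted-average defuzzification and the function name `thread_budget` are all hypothetical choices for illustration, not values or code from the paper.

```python
"""Minimal fuzzy-logic sketch: sampled GPU temperature (deg C) -> thread budget.

Assumptions (not from the paper): shoulder/triangular membership sets
COOL, WARM and HOT with the breakpoints below, one output fraction per set,
and weighted-average (Sugeno-style) defuzzification.
"""


def falling(x: float, a: float, b: float) -> float:
    """Membership 1 below a, 0 above b, linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def rising(x: float, a: float, b: float) -> float:
    """Membership 0 below a, 1 above b, linear in between."""
    return 1.0 - falling(x, a, b)


def triangle(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical rule base: (membership function, fraction of max concurrency).
RULES = [
    (lambda t: falling(t, 55.0, 65.0), 1.00),          # COOL -> full parallelism
    (lambda t: triangle(t, 55.0, 65.0, 75.0), 0.60),   # WARM -> moderate
    (lambda t: rising(t, 65.0, 75.0), 0.20),           # HOT  -> heavily reduced
]


def thread_budget(temp_c: float, max_threads: int, min_threads: int = 1) -> int:
    """Map a sampled temperature to a recommended number of concurrent threads."""
    weights = [(mu(temp_c), level) for mu, level in RULES]
    total = sum(w for w, _ in weights)  # never zero: the sets cover every temperature
    fraction = sum(w * level for w, level in weights) / total
    return max(min_threads, round(fraction * max_threads))
```

With these assumed settings, `thread_budget(60.0, 1024)` returns 819, and the budget falls smoothly as the temperature rises instead of switching abruptly between levels; a scheduler would re-evaluate it once per sampling period.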

Fuzzy logic control theory is used to estimate the correlation between the sampled GPU temperature and the number of typical concurrent thread operations. For a single workload, a dependency graph is applied to preserve the computing order of the workload's internal parts when its degree of parallelism has to be limited. For multiple independent workloads, different priorities are assigned so as to guarantee the performance of high-priority workloads.
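A minimal sketch of the two scheduling ideas, using Python's standard `graphlib` module (Python 3.9+). `run_in_waves` releases the parts of a single workload in dependency order while capping how many run at once, and `allocate_by_priority` splits a thread budget among independent workloads strictly by priority. The wave-based launch model, the strict-priority policy and all names here are assumptions for illustration; the paper may use a different policy.

```python
from graphlib import TopologicalSorter


def run_in_waves(deps: dict[str, set[str]], limit: int) -> list[list[str]]:
    """Order a workload's parts so no part starts before its predecessors,
    running at most `limit` parts per wave (hypothetical launch model)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    sorter = TopologicalSorter(deps)  # maps each node to its predecessors
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready: list[str] = []
    waves: list[list[str]] = []
    while sorter.is_active():
        ready.extend(sorted(sorter.get_ready()))  # sorted for deterministic order
        wave, ready = ready[:limit], ready[limit:]
        # A real scheduler would launch `wave` on the GPU here and wait for it.
        waves.append(wave)
        sorter.done(*wave)
    return waves


def allocate_by_priority(budget: int,
                         workloads: list[tuple[str, int, int]]) -> dict[str, int]:
    """Give each (name, priority, demand) workload threads, highest priority
    first; lower-priority workloads only receive what is left."""
    allocation: dict[str, int] = {}
    for name, _priority, demand in sorted(workloads, key=lambda w: w[1], reverse=True):
        granted = min(demand, budget)
        allocation[name] = granted
        budget -= granted
    return allocation
```

For example, with the diamond-shaped graph `{"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}` and a limit of 1, the parts run as a, b, c, d; raising the limit to 2 lets b and c share a wave. `allocate_by_priority(600, [("detect", 2, 512), ("compress", 1, 256), ("log", 0, 64)])` gives `{"detect": 512, "compress": 88, "log": 0}`.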

Because the processor of the Jetson TX2 module is a system on chip (SoC) that integrates both the GPU and ARM CPU cores, a simple self-adaptive scheduling framework based on the above method is implemented on the ARM cores of the processor. The framework also supports an Ethernet communication interface, so it can easily be extended to setups in which several Jetson TX2 modules or several Docker containers are interconnected.
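Since the framework runs on the module's ARM cores under Linux, one plausible way to sample the GPU's internal sensor is the kernel's generic thermal sysfs interface, where each `/sys/class/thermal/thermal_zone*/` directory holds a `type` name and a `temp` reading in millidegrees Celsius. The zone name `GPU-therm` used as the default is an assumption about the Jetson TX2 board support package, and `read_zone_temp_c` is a hypothetical helper; the paper does not state how its sensor is read, so check the `type` files on the target module.

```python
from pathlib import Path


def read_zone_temp_c(zone_type: str = "GPU-therm",
                     root: Path = Path("/sys/class/thermal")) -> float:
    """Return the temperature (deg C) of the first thermal zone whose `type`
    file matches `zone_type`. The default zone name is an assumption."""
    for zone in sorted(root.glob("thermal_zone*")):
        type_file = zone / "type"
        if type_file.is_file() and type_file.read_text().strip() == zone_type:
            # The Linux thermal sysfs interface reports millidegrees Celsius.
            return int((zone / "temp").read_text().strip()) / 1000.0
    raise FileNotFoundError(f"no thermal zone of type {zone_type!r} under {root}")
```

Passing `root` explicitly keeps the helper testable off-target; on the module itself a control loop would call `read_zone_temp_c()` once per sampling period and feed the result to the temperature-to-parallelism controller.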

Experiments with the Jetson TX2 module placed in a thermal chamber show that this method successfully adjusts the scheduling strategy automatically as the ambient temperature varies. As a result, it reduces the negative impact of situations in which thermal throttling is triggered, causing large fluctuations in GPU frequency and power consumption.

Abstract document

IAC-18,D1,3,8,x46977.brief.pdf

Manuscript document

IAC-18,D1,3,8,x46977.pdf (authorized access only).

To obtain the manuscript, please contact the IAF Secretariat.