Thermal Management of Data Centers Through Thermal-Aware Job Scheduling


People and Sponsors

Principal Investigator:
          Sandeep K. S. Gupta

Postdoctoral Researcher:
          Georgios Varsamopoulos

PhD Students:
          Qinghui Tang
          Tridib Mukherjee
          Ayan Banerjee
          Michael Jonas

Sponsors:
          Intel, NSF, SFAz

Goal and Rationale

With the widespread use of high-density blade servers, the heat dissipation density of data centers has been rising rapidly. Higher operating temperatures, in turn, raise hardware failure rates and the associated costs. Improperly designed or operated data centers suffer either from overheated servers and potential system failures, or from overcooled systems that incur unnecessary utility costs. Minimizing a data center's operating cost (utilities, maintenance, and equipment upgrades and replacement) is therefore key to making the best use of computing resources and maximizing business outcomes.

The goal of this project is to build a dependable sensing platform that uses on-board and ambient sensors to collect temperature, humidity, power consumption, and computing load information. By combining this data with a heuristic control algorithm, we can dynamically adjust the thermal environment (through smarter job scheduling decisions, air conditioner capacity, fan speed, dynamic voltage and frequency scaling, etc.) to improve thermal conditions, reduce operating costs, and improve the business output of data centers.
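As a minimal sketch of one such heuristic control step, the snippet below raises or lowers a computer room air conditioner (CRAC) supply temperature so that the hottest measured server inlet sits just below a redline temperature. It assumes, purely for illustration, that every inlet temperature shifts one-for-one with the supply temperature. The function name `adjust_supply_temperature`, its parameters (`redline_c`, `margin_c`, `min_supply_c`, `max_supply_c`), and their default values are hypothetical and are not part of the project's software.

```python
from typing import Sequence


def adjust_supply_temperature(
    supply_c: float,
    inlet_temps_c: Sequence[float],
    redline_c: float = 25.0,      # hypothetical redline inlet temperature (deg C)
    margin_c: float = 0.5,        # hypothetical safety margin below the redline
    min_supply_c: float = 10.0,   # hypothetical lowest CRAC supply setting
    max_supply_c: float = 25.0,   # hypothetical highest CRAC supply setting
) -> float:
    """Return a new CRAC supply temperature in deg C.

    Assumption (illustrative only): each server inlet temperature moves
    one-for-one with the supply temperature, so the headroom between the
    hottest inlet and the redline can be added directly to the supply setting.
    """
    if not inlet_temps_c:
        raise ValueError("need at least one inlet temperature reading")
    hottest = max(inlet_temps_c)
    # Positive headroom: cooling is over-provisioned, so raise the supply
    # temperature to save energy. Negative headroom: lower it to protect servers.
    headroom = (redline_c - margin_c) - hottest
    proposed = supply_c + headroom
    # Clamp to the assumed operating range of the CRAC unit.
    return max(min_supply_c, min(max_supply_c, proposed))


# Example: all inlets are well below the redline, so the supply can be raised.
readings = [18.2, 19.5, 21.0, 20.1]
print(adjust_supply_temperature(15.0, readings))
```

A real controller would replace the one-for-one assumption with a measured or modeled relationship between supply and inlet temperatures; the clamp simply keeps the sketch within a plausible equipment range.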

In The Press

Forbes, “The Future Is Now”: “Dynamic thermal management of the data center – Developed in conjunction with Arizona State University, this research enables job scheduler software to take into account the temperature of servers or server blades before deciding which data center component should do the job. The result should be an online thermal control framework that monitors and manages data center thermal performance from a holistic viewpoint. The researchers say the challenge for the project is to make the system reactive so that it knows when servers are starting to fail because of heat issues. They say it could be another two years before this project could be presented to Intel as a potential product.…”

Project Timeline
2005: Developed an abstract heat flow model for data centers and verified it with CFD simulation
Q4 2005: Developed thermal-aware scheduling based on the abstract heat flow model and verified it with CFD simulation
Q1 2006: Published a paper on thermal-aware scheduling for data centers at DASC 2006
Q2 2006: Developed a software architecture for thermal-aware scheduling for the Moab Cluster Manager and successfully demonstrated it at Research@Intel Day using the ASU HPC data center
Q3 2006: Demonstrated thermal-aware scheduling at the Intel Country Fair
Q4 2006: Published work on the abstract heat flow model (ICISIP 2006)
Q1 2007: Published a paper on the thermal-aware scheduling software architecture (COMSWARE 2007)
Q2 2007: Performed power profiling of data center computing equipment; performed analysis simulations on heterogeneous data centers
Q3 2007: Published a paper on thermal-aware task scheduling through minimizing heat recirculation (IEEE Cluster 2007)
Q4 2007: Performed simulations on thermal-aware placement of queued tasks
Q1 2008: Performed performance simulations of incremental heuristics
Q2 2008:
- Qinghui Tang successfully defended his PhD thesis.
- The paper "Energy-Efficient, Thermal-Aware Task Scheduling for Homogeneous, High Performance Computing Data Centers: A Cyber-Physical Approach" was accepted to appear in the TPDS Special Issue on Power-Aware Parallel and Distributed Systems.
- Georgios Varsamopoulos gave an invited talk on thermal-aware data center management at the MoabCon conference.
Q3 2008:
- IMPACT released a white paper on Do Cool: coordinated, thermal-aware resource management for data centers.

Publications
  • G. Varsamopoulos, A. Banerjee, and S. K. S. Gupta, Energy Efficiency of Thermal-Aware Job Scheduling Algorithms under Various Cooling Models. International Conference on Contemporary Computing (IC3), Noida, India, August 2009.
  • T. Mukherjee, A. Banerjee, G. Varsamopoulos, S. K. S. Gupta, and S. Rungta, Spatio-Temporal Thermal-Aware Job Scheduling to Minimize Energy Consumption in Virtualized Heterogeneous Data Centers. Computer Networks (Elsevier), Special Issue on Virtualized Data Centers, accepted, 2009.
  • Q. Tang, Thermal-Aware Scheduling in Environmentally Coupled Cyber-Physical Distributed Systems. PhD Thesis, June 2008.
  • Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, Energy-Efficient, Thermal-Aware Task Scheduling for Homogeneous, High Performance Computing Data Centers: A Cyber-Physical Approach. IEEE Transactions on Parallel and Distributed Systems, Special Issue on Power-Aware Parallel and Distributed Systems (TPDS PAPADS), 19(11), pp. 1458–1472, November 2008.
  • Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, Thermal Aware Task Scheduling for Datacenters through Minimizing Heat Recirculation. IEEE Cluster 2007, Austin, TX, Sept. 2007.
  • T. Mukherjee, G. Varsamopoulos, and S. K. S. Gupta, Measurement-based Power Profiling of Data Center Equipment. GreenCom 2007, Austin, TX, Sept. 2007.
  • M. Jonas, G. Varsamopoulos, and S. K. S. Gupta, On Developing a Fast, Cost-Effective and Non-Invasive Method to Derive Data Center Thermal Maps (extended abstract). Workshop on Green Computing (in conjunction with IEEE Cluster 2007), Austin, TX, Sept. 2007.
  • T. Mukherjee, Q. Tang, C. Ziesman, and S. K. S. Gupta, Software Architecture for Dynamic Thermal Management in Datacenters. Int'l Conf. on Communication System Software and Middleware (COMSWARE), Jan. 2007.
  • Q. Tang, T. Mukherjee, S. K. S. Gupta, and P. Cayton, Sensor-Based Fast Thermal Evaluation Model for Energy Efficient High-Performance Datacenters. Int'l Conf. on Intelligent Sensing and Information Processing (ICISIP 2006), Dec. 2006.
  • Q. Tang, S. K. S. Gupta, D. Stanzione, and P. Cayton, Thermal-Aware Task Scheduling to Minimize Energy Usage of Blade Server Based Datacenters. 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC'06), 2006.

Research Issues
[Figure: ASU Data Center]
  • Collecting and aggregating high-fidelity sensor data
  • Characterizing the correlation among temperature rise, computing load, and power consumption
  • Building a computational thermal model of a data center
  • Creating a thermal-aware scheduling algorithm that incorporates:
    • Fault tolerance, prediction, and avoidance
    • Thermal interference effects on neighboring computing nodes, handled heuristically
  • Mathematically deriving an optimal thermal-aware scheduling algorithm
  • Measuring the effectiveness of the real-world solution against the optimal solution
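The scheduling items above (a computational thermal model plus heuristic handling of thermal interference between nodes) can be illustrated with a small sketch. It assumes a linear heat-recirculation model in which the inlet temperature of node i is T_sup + Σ_j D[i][j]·P_j, where `D` is a hypothetical matrix of recirculation coefficients (deg C of inlet rise at node i per watt drawn by node j), and each node's power follows an assumed two-level idle/busy model (`IDLE_W`, `BUSY_W`). A greedy heuristic then assigns each task to the idle node whose activation yields the lowest peak inlet temperature. This is a generic illustration of the idea, not the project's published algorithm; `inlet_temps`, `greedy_place`, and all numeric values are illustrative.

```python
from typing import List, Sequence

# Hypothetical per-node power model (watts): idle draw vs. draw when running a task.
IDLE_W = 100.0
BUSY_W = 250.0


def inlet_temps(supply_c: float, d: Sequence[Sequence[float]],
                power_w: Sequence[float]) -> List[float]:
    """Assumed linear recirculation model: T_in[i] = T_sup + sum_j d[i][j] * P[j]."""
    return [supply_c + sum(dij * pj for dij, pj in zip(row, power_w)) for row in d]


def greedy_place(num_tasks: int, supply_c: float,
                 d: Sequence[Sequence[float]]) -> List[int]:
    """Place one task per node, each time choosing the idle node whose
    activation gives the lowest peak inlet temperature across all nodes."""
    n = len(d)
    if num_tasks > n:
        raise ValueError("more tasks than nodes")
    power = [IDLE_W] * n
    busy = [False] * n
    chosen: List[int] = []
    for _ in range(num_tasks):
        best_node, best_peak = -1, float("inf")
        for node in range(n):
            if busy[node]:
                continue
            trial = list(power)
            trial[node] = BUSY_W  # tentatively run the task on this node
            peak = max(inlet_temps(supply_c, d, trial))
            if peak < best_peak:
                best_node, best_peak = node, peak
        busy[best_node] = True
        power[best_node] = BUSY_W
        chosen.append(best_node)
    return chosen


# Illustrative 3-node recirculation matrix (deg C per watt); row i = inlet of node i.
D = [
    [0.010, 0.002, 0.001],
    [0.004, 0.010, 0.001],
    [0.008, 0.006, 0.010],
]
print(greedy_place(2, 15.0, D))
```

With these illustrative coefficients the heuristic picks node 1 first and node 0 second, avoiding node 2 because heat it generates raises its own inlet the most. The greedy choice is cheap to compute but is not guaranteed to be optimal, which is why comparing a practical heuristic against an optimal solution appears as a separate research item.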
Related Work
Links

Guideline

  • ASHRAE, "Thermal Guidelines for Data Processing Environments" (2004)
