The 10x10 Foundation for Heterogeneity: Clustering Applications by Computation and Memory Behavior
Apala Guha; Pietro Ciccotti; Allan Snavely; Andrew A. Chien. 16 February, 2012.
Communicated by Andrew Chien.
The performance and energy-efficiency advantages of customized architectures are well known and widely pursued. To date, there has been no systematic basis for balancing the benefit of specialization against general-purpose coverage, much less for assembling accelerators to collectively support a general-purpose workload. Our study is a first step toward a systematic basis for heterogeneous architectures, balancing specialization and general-purpose coverage.
We analyze a collection of 34 programs drawn from five major benchmark suites and a few independent sources, producing clusters of critical loops based on operation and datatype as well as three dimensions of memory behavior. The computational characteristics of each cluster are an opportunity for customized architecture. The operation and datatype analysis produces 25 multi-loop clusters, corresponding to over 90% of the computations. Memory behavior studies produce a similar number of clusters. The clusters can be exploited individually or in groups, enabling specialization to be traded systematically against generality as proposed in Borkar and Chien’s “10x10”, influencing choices of accelerators, accelerator breadth, and ensembles of accelerators (in an SoC). For several clusters, we discuss examples of how they might be exploited architecturally for improved performance or energy efficiency. The fact that there are a reasonable number of clusters, and that these clusters span multiple application domains, demonstrates that architecture design need not cater to each domain individually, and bodes well for 10x10 research. We also show that the clusters are coherent, distinct, and stable, and that the applications are a good representation of general-purpose workloads.
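As a rough illustration of what clustering loops by operation mix can look like, the sketch below groups a handful of made-up loops and reports how much of the computation falls into multi-loop clusters. The abstract does not state which clustering algorithm or features the study uses; the single-linkage agglomerative method, the four-way operation mix, the loop names, the operation counts, and the distance threshold here are all assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical loops (not from the study): name, dynamic operation count,
# and fraction of operations that are [integer, floating-point, load/store, branch].
LOOPS = [
    ("fft_butterfly", 400, (0.10, 0.60, 0.25, 0.05)),
    ("stencil_update", 300, (0.12, 0.58, 0.26, 0.04)),
    ("hash_probe", 150, (0.55, 0.00, 0.30, 0.15)),
    ("string_scan", 100, (0.60, 0.00, 0.25, 0.15)),
    ("rng_step", 50, (0.95, 0.00, 0.03, 0.02)),
]


def distance(a, b):
    """Euclidean distance between two operation-mix vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(loops, threshold):
    """Single-linkage agglomerative clustering (an assumed method).

    Repeatedly merges the two closest clusters until the closest pair
    is farther apart than `threshold`.
    """
    clusters = [[loop] for loop in loops]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    distance(a[2], b[2])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters


def multi_loop_coverage(clusters):
    """Fraction of all dynamic operations that lie in clusters of two or more loops."""
    total = sum(ops for c in clusters for _, ops, _ in c)
    covered = sum(ops for c in clusters if len(c) > 1 for _, ops, _ in c)
    return covered / total


if __name__ == "__main__":
    result = cluster(LOOPS, threshold=0.15)
    for c in result:
        print([name for name, _, _ in c])
    print(f"multi-loop coverage: {multi_loop_coverage(result):.0%}")
```

In this toy input, the two floating-point-heavy loops and the two integer-heavy loops each merge into a cluster, while the remaining loop stays a singleton. The coverage figure plays the same role as the share of computation attributed to multi-loop clusters above, but the numbers are illustrative only.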
The original document is available in PDF (uploaded 16 February, 2012 by