“Portability and Performance in Heterogeneous Many Core Systems”
(Project PTDC/EIA-EIA/100035/2008)
Task “T5 - Distributed Memory” - Description and Expected results
OpenCL [1] is a framework for platform-independent parallel programming of heterogeneous systems; it includes a language, an API, libraries and a runtime system to support software development and execution. Its platform model consists of a host connected to one or more OpenCL devices and assumes a relaxed consistency shared memory model among them. OpenCL applications therefore perceive the underlying computing system as a single shared-memory heterogeneous parallel computer.
Interactive rendering of large scenes at high frame resolutions still requires a networked cluster of machines to handle the resulting workload. This is particularly true when indirect diffuse phenomena are included in the light transport simulation, because of the large number of secondary rays they generate. The goal of this task is to extend the OpenCL parallel computing model to a cluster environment, thus allowing the exploitation of powerful heterogeneous devices spread across the distributed environment.
Extension into a cluster will be achieved by introducing a new, distinct memory region into the OpenCL memory model, in addition to the “private”, “local”, “constant” and “global” memory regions defined by the OpenCL specification; this will be referred to as the “distributed” memory region. This region will allow read/write access from any OpenCL device within the cluster. The extended memory model will support an extended execution model that includes the concept of a “cluster application”, in addition to the “host program” and “kernels” originally defined by the OpenCL specification. The cluster application will be responsible for dynamically allocating distributed memory across the nodes. Distributed memory will be available to OpenCL devices under a relaxed consistency memory model, in accordance with the OpenCL specification. Such a consistency model is particularly well suited to the RDMA mechanisms offered by high-performance interconnects (e.g., Myrinet 10G), where remote memory is accessed asynchronously. Fully specifying the semantics of this extended memory model will be the first goal of this task.
Distributed memory will be supported by extending the primitives of the OpenCL Platform and Runtime Layers. The Platform Layer extensions will include, at least, new parameter values for device querying (so that devices can be chosen from among all the OpenCL devices of the cluster) and for context creation (so that contexts grouping OpenCL devices from different cluster nodes can be defined and operated). The Runtime Layer extensions will comprise, at minimum, new parameters or parameter values for creating buffer objects suitable for operation under distributed memory contexts; the OpenCL primitives for reading, writing and copying buffer objects will also be addressed.
The proposed extensions will be supported by a cluster-aware runtime environment consisting of a set of services running on each cluster node and a suite of management applications. The services will implement a directory of resources, support the remote operation of OpenCL entities and provide the distributed memory address space. A preprocessor will enable the use of the proposed extensions by generating the code required to interact with the cluster-aware runtime environment.
The task will produce four major results: 1) the specification of the semantics of the “distributed” memory model; 2) an extension of a subset of the OpenCL API to support the new model abstractions; 3) a preprocessor enabling the use of these extensions; 4) a cluster-aware runtime environment to be integrated with each local OpenCL runtime.
This task must integrate the performance model designed in T2, the scheduling mechanism proposed in T4 and the OpenCL kernels developed in T3. The team members from Bragança (IPB) are fundamental to the success of this task, given their extensive experience with, and knowledge of, distributed shared memory concepts and systems.
[1] Khronos OpenCL Working Group, “The OpenCL Specification, Version 1.0”, A. Munshi (Ed.), 2008.