Rob Everest
University of New South Wales

Embedding foreign code
----------------------

Special-purpose embedded languages facilitate generating high-performance code from purely functional high-level code; for example, we want to program highly parallel GPUs without the usual high barrier to entry and the time-consuming development process. To enable this we have Accelerate, a Haskell EDSL that uses a skeleton-based, generative approach to produce low-level CUDA code for execution on Nvidia GPUs.

In this talk, I will describe joint work with Trevor L. McDonell, Manuel M. T. Chakravarty, and Gabriele Keller, for which our paper was recently accepted to PADL. I will describe our solution to some of the practical problems with skeleton-based code generation and introduce an approach to enabling interoperability with native code. In particular, I will describe how template metaprogramming simplifies code generation and optimisation. Furthermore, I will present a design for a foreign function interface for an embedded language.

Earlier versions of Accelerate implemented CUDA C code templates and template instantiation with a mixture of C++ templates and C preprocessor macros. This solution, while workable, was fragile and provided no guarantees on the validity of the generated code. By leveraging the quasiquotation extension to Template Haskell, we are able to define CUDA skeletons as quoted CUDA C templates. Because the quoted templates are parsed when the backend is compiled, the syntactic validity of each skeleton is checked at compile time. This ensures that if we can compile the backend, the code it generates will always be syntactically valid. In addition, we can implement consumer-producer fusion by specific skeleton instantiation.

In the latter part of the talk I will present what is, to my knowledge, the first foreign function interface for an embedded language.
By leveraging Haskell's own FFI and Template Haskell, we are able to build an FFI into Accelerate that supports both calling foreign functions from within an Accelerate program and embedding an Accelerate program into an existing CUDA C/C++ program.
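
To make the idea of a foreign function interface for an embedded language concrete, here is a minimal toy sketch in plain Haskell. It is not Accelerate's API: the `Expr` type, `ForeignFun`, `run`, and the backend names are all hypothetical, and the "native" routine is an ordinary Haskell function standing in for compiled CUDA code. The sketch also assumes one particular design choice, that a foreign call carries a pure fallback written in the embedded language for backends that cannot run the native routine.

```haskell
module Main where

-- A toy embedded language of integer-vector computations
-- (hypothetical; it only stands in for a real EDSL's syntax tree).
data Expr
  = Use [Int]                              -- embed a host-side vector
  | Map (Int -> Int) Expr                  -- element-wise map
  | Foreign ForeignFun (Expr -> Expr) Expr -- native routine, pure fallback, argument

-- Stand-in for a native routine. A real backend would hold something
-- like a handle to compiled device code instead of a Haskell function.
data ForeignFun = ForeignFun
  { backendName :: String          -- backend that can execute the routine
  , nativeImpl  :: [Int] -> [Int]  -- the "foreign" implementation
  }

-- Evaluate an expression on the named backend. A foreign node uses the
-- native routine when the backend matches, and the fallback otherwise.
run :: String -> Expr -> [Int]
run _ (Use xs)  = xs
run b (Map f e) = map f (run b e)
run b (Foreign ff fallback e)
  | backendName ff == b = nativeImpl ff (run b e)
  | otherwise           = run b (fallback e)

-- Doubling as a "native" routine for the hypothetical "cuda" backend.
doubleNative :: ForeignFun
doubleNative = ForeignFun "cuda" (map (* 2))

-- Increment every element, then double via the foreign call,
-- with an equivalent pure fallback.
example :: Expr
example = Foreign doubleNative (Map (* 2)) (Map (+ 1) (Use [1, 2, 3]))

main :: IO ()
main = do
  print (run "cuda" example)        -- takes the native path
  print (run "interpreter" example) -- takes the fallback path
```

Both calls print `[4,6,8]`: the program means the same thing on every backend, and only the execution path differs. Keeping the fallback inside the foreign node is what lets a program with foreign calls remain portable in this toy model.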