Reading Group Archive (2014)
How to Design Programs
On October 17 to November 28 we discussed the book How to Design Programs by Felleisen et. al.
On October 3 we hosted a guest speaker Ben Lippmeier talking about his work on applying flow fusion in compilers to scale computations to large data sets.
Flow Fusion for the Biggest Data
The great MapReduce has fallen and shattered into myriad creatures: Pig, Impala, Dryad, Naiad, Dremel, Druid, and Shark to name a few. Once revered as the gateway to the biggest data, raw MapReduce is now being covered up and pushed aside in favour of faster and easier approaches. The myriad creatures Pig, Impala, Dryad and so on are languages and systems for distributed programming, largely originating from the databases community. Although they seem like a disparate collection at first, on closer inspection they turn out to share a common data-parallel data-flow architecture. Not mere databases, most are general purpose programming environments which support computational workloads consisting of arbitrary user defined-code. Not mere programming environments, some embed full language run-times and compile incoming queries to native code on the fly. Though this lens the modern distributed database is a full language, compiler, and runtime -- query optimisation is data-flow program optimisation, and query compilation is data-flow program compilation. The functional programmers in the audience will know that data-flow programs are also first-order functional programs, so they're that too.
In this talk I'll give an overview of the myriad creatures, then present my recent work on query / data-flow compilation. Existing compilation methods cannot fuse general branching data-flows -- meaning that if a produced table is consumed by several physical operators it may be materialised in memory (or on disk) in an intermediate stage. Flow fusion is an approach to compile such data-flows into single imperative loops that do not require intermediates to be materialised. I'll also talk about the ramifications for query planning -- choosing to fuse clusters of operators into single loops prevents others from being fused. In recent joint work we reduce this query planning problem to an Integer Linear Programming problem and use an external solver to compute the optimal clustering.
Respect Your Parents: How Attribution and Rewriting Can Get Along
On August 29 Tony Sloane gave a practice version of an upcoming SLE 2014 talk.
Attribute grammars describe how to decorate static trees. Rewriting systems describe how to transform trees into new trees. Attribution is undermined by rewriting because a node may appear in both the source and product of a transformation. If an attribute of that node depends on the node’s context, then a previously computed value may not be valid. We explore this problem and formalise it as a question of ancestry: the context of a node is given by the tree’s parent relationships and we must use the appropriate parents to calculate attributes that depend on the context. We show how respecting parents naturally leads to a view of context-dependent attributes as tree-indexed attribute families. Viewed in this way, attribution co-exists easily with rewriting transformations. We demonstrate the practicality of our approach by describing our implementation in the Kiama language processing library
Joint work with Matt Roberts and Len Hamey.
Type Inference for the Spine View of Data
On August 22 Matt Roberts gave a practice version of his upcoming WGP 2014 talk.
In this work we describe both a type checking and a type inference algorithm for generic programming using the "spine view" of data. The spine view of data is an approach to decomposing data in functional programming languages that supports generic programming in the style of Scrap Your Boilerplate and Stratego. The spine view of data has previously been described as a library in a statically typed language (as in Haskell), as a language feature in a dynamically typed language (as in Stratego), and as a calculus of patterns (as in the Pattern Calculus). The contribution of this paper is a type inference algorithm for the spine view and a sound type relation that underlies this inference algorithm. The type inference algorithm does not require any type annotations to be added in support of the spine view. This type inference algorithm is an extension of Hindley-Milner type inference, thus showing how to introduce the spine view of data as a language feature in any functional programming language based on Hindley-Milner.
Joint work with Tony Sloane.
Visual Dataflow Languages, Attribute Grammars
On August 15 we hosted a visitor Niklas Fors who is a PhD student at Lund University in Sweden. He will give a short presentation on his work building extensible visual dataflow-based languages (abstract below). We will also discuss attribute grammar systems, since Niklas' supervisor Prof Görel Hedin and colleagues are responsible for a number of contributions in that area, most notably reference attribute grammars and the JastAdd system.
Intercepting Dataflow Connections in Diagrams with Inheritance
Niklas Fors and Görel Hedin
Control systems are often built using visual dataflow-based languages, and supporting different variants may be challenging. We introduce the concept of connection interception based on inheritance. This mechanism allows a diagram to extend another diagram and intercept connections defined in the supertype, that is, to replace it by two other connections, in order to specialize the behaviour. This can be used to create extensible libraries that support different variants.
On July 11 we discussed a recent paper on the start of the art in language workbenches.
Type and Effect Systems
On June 27th we discussed a paper on Type and Effect Systems by Flemming Nielson and Hanne Riis Nielson.
On June 20th we watched and discussed a recent talk by Martin Odersky on Scala: The Simple Parts.
On August 1, James Moss gave a demo of his undergraduate winter project work on embedding a DSL in Swift.
On June 6th we investigated Apple’s new programming language Swift via some readings and a demo.
On May 16 and 23 we discussed Language-Oriented Programming, based on two overview articles: Language-Oriented Programming: The Next Paradigm and From Programming To Modeling – and back again. This week provides background for future discussions of JetBrains' MPS development environment.
On May 2 we discussed an unpublished paper about the language Katahdin: Mutating a Programming Language’s Syntax and Semantics at Runtime.
On April 4 we conducted a partial tour through the Ceylon language from Red Hat. Only the first few sections were covered.
On March 28 we discussed a paper about Parallaxis-III, a language for architecture-independent data parallel processing.
Data Parallel Haskell
On March 21 we discussed a status report from the Data Parallel Haskell project which aims to achieve expressive, efficient nested data parallel computation in the Haskell language.
Maxine Java VM
On March 14 we discussed a paper about the Maxine Java VM that is written in Java and is intended to form a more practical basis for teaching and research than the mainstream JVM implementations.
On Feb 28 we discussed F#'s facilities for extensible pattern matching by reading an ICFP 2007 paper on the topic.
On Feb 21 we discussed F#'s Strongly-Typed Language Support for Internet-Scale Information Sources (i.e., type providers).