Robin Garner Australian National University Effective Prefetch for Garbage Collection At the heart of most garbage collectors is a transitive closure operation over the heap. This operation dominates the running time of the collector, and tuning the central loop of the closure is an important step in improving the performance of a garbage collector. In this talk we look systematically into the tracing process, splitting the loop into its component parts and with the assistance of hardware performance counters we identify the dominant costs of each part. We show that no single operation accounts for more than 40% of the cost, and that both CPU and Memory costs are significant. Using this analysis, we make several observations on the design of mark-sweep garbage collectors that challenge conventional wisdom, and present a new hybrid marking scheme that outperforms each of the primary alternatives. Our main result is to reconsider the basic process of the depth-first search of the heap graph. By adopting a search algorithm which at first glance has worse complexity, we improve the locality of the algorithm. By itself the tradeoff is shown to be cost-neutral, but the modified trace opens up more scope for software prefetching, and we obtain speedups significantly better than any reported in previous work. This work was presented at ISMM in Montreal, October 2007.