Building a large scale database on top of MPI

In this talk I will report on our efforts to build a relational database engine capable of operating at supercomputer scale. Traditional database engines are notoriously difficult to scale because of their adherence to strong consistency, even in the event of failures. Cloud computing made this limitation very apparent and gave rise to a wide range of alternative designs in which performance and scalability were almost always chosen over consistency. Over the years, it has become clear that designs with reduced consistency only shift the burden of guaranteeing correctness to the application, where it is very difficult to provide sufficient guarantees. In the talk I will walk through the general design of a relational database engine, indicate where the bottlenecks lie, and show the results we have obtained so far in running large-scale parallel operators and implementing highly distributed, strongly consistent concurrency control using MPI on a supercomputer. Our results indicate that, by taking advantage of hardware trends and through careful design, it is possible to build a scalable relational engine capable of serving the needs of high-performance computing environments.
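The abstract does not describe the concurrency-control protocol itself. As a generic illustration of what "strongly consistent" means in a distributed engine (not the speaker's actual design), the sketch below shows a minimal two-phase commit, the classic protocol for atomically committing a transaction across nodes. In an MPI-based engine the calls between coordinator and participants would be message exchanges between ranks; here plain method calls stand in for them, and all class and method names are hypothetical.

```python
# Minimal two-phase commit (2PC) sketch. In a real MPI-based engine the
# coordinator/participant interactions below would be sends and receives
# between ranks; plain Python calls simulate them here. Names are
# illustrative, not taken from the talk.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "init"

    def prepare(self, txn):
        # Phase 1: vote. A participant votes "yes" only if it can make the
        # transaction durable locally (simulated as always succeeding here).
        self.state = "prepared"
        return True

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self, txn):
        # Phase 1: collect votes from every participant.
        votes = [p.prepare(txn) for p in self.participants]
        # Phase 2: commit only if all votes are "yes"; otherwise abort everywhere.
        if all(votes):
            for p in self.participants:
                p.commit(txn)
            return "committed"
        for p in self.participants:
            p.abort(txn)
        return "aborted"


nodes = [Participant(f"rank{i}") for i in range(4)]
outcome = Coordinator(nodes).run(txn="T1")
```

The point of the sketch is the cost model: the coordinator must hear from every participant before deciding, so commit latency is bounded by the slowest node, which is exactly the kind of bottleneck that low-latency supercomputer interconnects and MPI collectives can help mitigate.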

Gustavo Alonso studied Telecommunications (electrical engineering) at the Madrid Technical University (ETSIT, Politécnica de Madrid). As a Fulbright scholar, he completed an M.S. and Ph.D. in Computer Science at UC Santa Barbara. After graduating from Santa Barbara, he worked at the IBM Almaden Research Center before joining ETH Zurich. At ETH, he is part of the Systems Group and the Head of the Institute of Computing Platforms. Gustavo is a Fellow of the ACM and of the IEEE as well as a Distinguished Alumnus of the Department of Computer Science of UC Santa Barbara.
His research interests encompass almost all aspects of systems, from design to runtime. He works on distributed systems, databases, cloud computing, and hardware acceleration of data science. His recent research covers multi-core architectures, large clusters, FPGAs, large-scale data processing, cloud computing, and big data, mainly focusing on adapting traditional system software (operating systems, databases, networking) to modern hardware platforms.