From Papers to Products: Open-source Research Projects, Startup Companies

Apache Spark is an open-source software project that began as research in the UC Berkeley AMPLab. Over the last four years, Spark has grown into one of the largest open-source communities in big data, with over 120 developers and 30 companies contributing; the latest release alone had contributions from 83 people. Spark is a unified platform that offers an interactive shell, a clean API, a distributed in-memory computing engine, stream processing components, interactive SQL support, and libraries for machine learning and graph computation.

Write-limited sorts and joins for persistent memory

To mitigate the impact of the widening gap between the memory demands of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable but exhibits asymmetric I/O: writes are typically an order of magnitude more expensive than reads. This combination of byte addressability and I/O asymmetry gives persistent memory a unique performance profile, so it becomes imperative to find new ways to incorporate it seamlessly into database systems.
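To make the effect of write asymmetry concrete, here is a toy cost model, offered as an illustration only and not as the talk's method. It assumes each element read costs 1 unit and each element write costs a configurable penalty (10 approximates "an order of magnitude"). The wrapper `CountingArray` and its `cost` method are hypothetical names introduced for this sketch. It compares two textbook quadratic sorts: insertion sort, which performs O(n²) reads and O(n²) writes, and selection sort, which performs O(n²) reads but at most 2(n-1) writes. The write-limited algorithms discussed in the talk are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountingArray:
    """Hypothetical array wrapper that counts element reads and writes
    (a toy stand-in for a region of persistent memory)."""
    data: List[int]
    reads: int = 0
    writes: int = 0

    def __len__(self) -> int:
        return len(self.data)

    def get(self, i: int) -> int:
        self.reads += 1
        return self.data[i]

    def set(self, i: int, value: int) -> None:
        self.writes += 1
        self.data[i] = value

    def cost(self, write_penalty: float) -> float:
        # Assumed asymmetric cost: 1 unit per read, `write_penalty` units per write.
        return self.reads + write_penalty * self.writes


def insertion_sort(a: CountingArray) -> None:
    # Shifts larger elements right one slot at a time: O(n^2) reads AND writes.
    for i in range(1, len(a)):
        key = a.get(i)
        j = i - 1
        while j >= 0:
            v = a.get(j)
            if v <= key:
                break
            a.set(j + 1, v)  # every shift is a write
            j -= 1
        a.set(j + 1, key)


def selection_sort(a: CountingArray) -> None:
    # Scans for the minimum (O(n^2) reads) but swaps at most once per position,
    # so it performs at most 2 * (n - 1) writes.
    n = len(a)
    for i in range(n - 1):
        min_idx, min_val = i, a.get(i)
        for j in range(i + 1, n):
            v = a.get(j)
            if v < min_val:
                min_idx, min_val = j, v
        if min_idx != i:
            a.set(min_idx, a.get(i))
            a.set(i, min_val)


if __name__ == "__main__":
    import random

    random.seed(0)
    values = [random.randint(0, 10_000) for _ in range(500)]
    for sort in (insertion_sort, selection_sort):
        arr = CountingArray(list(values))
        sort(arr)
        assert arr.data == sorted(values)
        print(f"{sort.__name__:15} reads={arr.reads:7} writes={arr.writes:7} "
              f"cost(w=1)={arr.cost(1):9.0f} cost(w=10)={arr.cost(10):9.0f}")
```

Both algorithms read a quadratic number of elements, so under a symmetric cost model they look broadly similar. Once writes are weighted about ten times more heavily, the algorithm that minimizes writes pulls far ahead. That is the kind of trade-off that motivates redesigning sorts and joins for persistent memory.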


Subscribe to UW-Madison Computer Sciences Department RSS