Apache Spark is an open source software project that began as research in the UC Berkeley AMPLab. Over the last four years, Spark has grown into one of the largest open source communities in big data, with over 120 developers and 30 companies contributing; the latest release alone had contributions from 83 people. Spark is a unified platform with an interactive shell, a clean API, a distributed in-memory computing engine, stream processing components, interactive SQL support, and libraries for machine learning and graph computing.
New memory technologies promise denser and cheaper main memory, and may one day displace DRAM. However, many of them experience permanent failures due to wear far more quickly than DRAM does. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to phase-change memory (PCM), are extremely inefficient.
To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically an order of magnitude more expensive than reads. Together, byte addressability and I/O asymmetry give persistent memory a unique performance profile. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems.