PhD Final Oral Defense Kwanghyun Park

Friday, May 27, 2016 -
10:00am to 12:00pm
3310 Computer Sciences

Speaker Name: 

Kwanghyun Park

Speaker Institution: 

UW--Madison

Cookies: 

No

Description: 

*Title:* Data Processing Using Flash Storage: Some Opportunities and
Limitations

*Committee:*
(Advisor) Jignesh M. Patel, Professor, Computer Sciences
Jeffrey F. Naughton, Professor, Computer Sciences
AnHai Doan, Professor, Computer Sciences
Paris Koutris, Assistant Professor, Computer Sciences
Yang-Suk Kee, Director of Samsung Memory Solutions Lab, Electrical
Engineering

*Abstract*
*In many data intensive workloads, I/O is a key bottleneck. In a storage
hierarchy in a canonical database system, non-volatile storage devices
(e.g., hard disk drives and flash solid state drives) are used as permanent
data storage subsystems, whereas volatile storage devices (e.g., DRAM and
CPU registers) are used to stage data from the non-volatile storage for
processing by the CPU. Under this hardware architecture, non-volatile
storage is connected to the rest of the system via common host I/O
interfaces, such as SAS, SATA, and PCIe. Data movement cost through these
I/O interfaces has become the largest performance bottleneck for many data
intensive workloads. Therefore, in this thesis we explore alternative
solutions to achieve high performance data processing by reducing the
(expensive) data movement cost across the I/O interfaces.*
*In the first and second parts of this thesis, we propose a “code push
down” technology to reduce the data movement cost from flash solid state
drives (SSDs) to DRAM. We use the computation capability of the SSD device
to push down selected database operations into the SSD devices, thereby
dramatically reducing the actual data movement cost through host I/O
interfaces.*
*Another alternative solution that we propose in the third part of this
thesis is to preload necessary data (i.e., hot data) from disks to DRAM
before the user query actually requests the data. In this part of the thesis,
we focus on how to load hot data efficiently at system restart, which could
save the data movement cost at query time.*
*Collectively, this thesis discusses some opportunities for using SSDs in
data processing platforms, and develops insights about the current
limitations and potential future opportunities for using the computational
processing power inside SSDs to alleviate the I/O bottleneck in data
intensive workloads.*
