Most people exhibit the patience of a two-year-old when it comes to apps loading on their smartphones or information retrieval on their computers. We post on social media, collect fitness data, and keep hundreds or even thousands of photographs on our phones and computers or in the cloud without a thought about what it takes to store and manipulate those massive amounts of data.
But as we gather more and more data and ask more sophisticated questions about it, manipulating that data becomes slower and slower. If we keep using the same methods of data storage and computation, we'll reach a point where analysis becomes so slow that it is no longer effective, according to Jignesh Patel, a University of Wisconsin-Madison professor of computer science and an expert on databases and big data. Our apps won't launch as fast, and the information we need won't be at our fingertips.
Now, however, Patel and his collaborators at UW-Madison and eleven other U.S. universities that make up CRISP (the Center for Research on Intelligent Storage and Processing-in-memory) hope to salve our data frustrations with the help of a grant from the Semiconductor Research Corporation (SRC).
The way things work now, explains Patel, data storage and computation sit in two separate buckets connected by the computational equivalent of a straw: data travels through the straw to be analyzed in the computation bucket, and then more data (the original data and the data generated by the analysis) goes back through the straw to the data bucket. As more and more data is gathered, users are limited by the amount of data that can move through that straw. “Data is the new oxygen, and the computing engines, i.e. processors, can do more and more computing. To harness their full potential, you need to get the oxygen to them,” says Patel.
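The straw analogy can be made concrete with a back-of-envelope calculation. The sketch below (all bandwidth and throughput numbers are illustrative assumptions, not figures from the CRISP project) compares the time spent moving a dataset between storage and processor against the time the processor actually needs to crunch it:

```python
# Back-of-envelope sketch of the "straw" bottleneck.
# All numbers are hypothetical, chosen only to illustrate the imbalance.

def transfer_time_s(data_bytes, bandwidth_bytes_per_s):
    """Time spent pushing data through the memory bus (the 'straw')."""
    return data_bytes / bandwidth_bytes_per_s

def compute_time_s(num_ops, ops_per_s):
    """Time the processor needs for the arithmetic itself."""
    return num_ops / ops_per_s

GiB = 2**30
data = 64 * GiB          # size of the dataset being analyzed
bandwidth = 50 * GiB     # assumed memory bandwidth: ~50 GiB/s
throughput = 10**12      # assumed compute rate: ~1 trillion ops/s
ops = data               # assume roughly one operation per byte scanned

move = transfer_time_s(data, bandwidth)
crunch = compute_time_s(ops, throughput)

# Under these assumptions the processor finishes its share in a
# fraction of the time it takes just to deliver the data, so the
# straw, not the compute engine, sets the pace.
print(f"transfer: {move:.2f}s  compute: {crunch:.2f}s")
```

With these illustrative numbers, moving the data takes well over a second while the computation itself takes under a tenth of a second, which is exactly the gap that processing-in-memory approaches aim to close.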
At its core, the project tackles a fundamental research problem. Computing is going through a radical transformation. For decades, each processor generation, arriving roughly every two years, was about twice as fast as the last. That exponential growth was achieved largely by increasing the speed at which transistors can be switched on and off (which is how computation is carried out). But we can't flip transistors much faster than we do today; doing so generates so much heat that the processor can melt down. Instead, we can build more powerful processors by packaging a large number of simpler, slower compute engines together.

That creates a key problem: how do we get data to these networks of compute engines? “One way is to architect the devices so that there are pockets of compute and data connected in a network, and there is some data close to each compute engine,” Patel explains. But, it turns out, you can only put small amounts of data next to each compute engine. Some data will always be far away, and even data that is close by may not be close to the specific compute engine that needs it at that moment. “You have a complex problem of figuring out how to effectively overlay a complex network of compute topology over another complex network of data storage topology. And it gets more complicated as the application's compute and data patterns can change rather quickly, which requires reoptimizing these overlays,” says Patel.
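The placement problem Patel describes can be sketched in miniature. The toy example below (the distance matrix, function name, and greedy strategy are all hypothetical simplifications, not the project's actual algorithm) maps data partitions onto compute engines so each partition lands near the engine likeliest to use it:

```python
# Toy sketch of overlaying compute topology on data topology.
# distance[e][p] = network hops between compute engine e and the
# memory pocket holding data partition p (hypothetical values).
distance = [
    [1, 4, 3],
    [2, 1, 5],
    [4, 3, 1],
]

def assign_partitions(distance):
    """Greedy placement: send each partition to its closest engine.

    Real systems must also balance load across engines and
    re-optimize as access patterns shift; this sketch ignores
    both to show only the core matching idea.
    """
    num_engines = len(distance)
    num_parts = len(distance[0])
    return [
        min(range(num_engines), key=lambda e: distance[e][p])
        for p in range(num_parts)
    ]

print(assign_partitions(distance))  # partition index -> nearest engine
</n```

Even this greedy version hints at why the real problem is hard: when an application's access pattern changes, the distances that matter change with it, and the whole overlay has to be recomputed on the fly.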
The UW-Madison CRISP team is led by Patel, Kevin Eliceiri of the Laboratory for Optical and Computational Instrumentation, and Jane Li of the Department of Electrical and Computer Engineering. Li's efforts focus on the chips, the hardware that analyzes the data; Eliceiri works on imaging applications; and Patel is responsible for “making the software play nicely with the hardware.”
The three scientists work together, for example, when Eliceiri, an expert on medical imaging, discovers through imaging that a patient's cancer cells are metastasizing. Taking samples of the cells and analyzing them can take days, but in the future this type of imaging may allow a doctor performing surgery to tell instantly whether the cancer is spreading. Li hopes to develop new computer systems that can quickly and intelligently analyze the data, improving processing speed and performance. And Patel, through his work developing new software approaches to overcoming data bottlenecks, can devise methods to better organize the data and make sure it can be analyzed by the new systems Li develops. The five-year SRC grant funds nine graduate students and two postdoctoral researchers, who have a unique opportunity to work on this cross-disciplinary project.
The new research initiative by the CRISP team will speed our ability to process and manipulate the masses of data that accumulate, whether in our daily lives as we check our bank account balances or connect through social media, or in applications like analyzing a mass of cancer cells.