Title: Concepts for a FLASH-Based Parallel Computer
Abstract: In-memory computing has exploded in popularity recently as the volume, variety and velocity of big data (the so-called 3 Vs) increase exponentially, heralding a golden age of parallel computing that has led to a price war for cluster computing services in the cloud. But since DRAM makes up more than half the cost of the average server, it is clear that no matter what clever architecture we may come up with, memory cost will severely limit the availability of in-memory computing for the foreseeable future. The only alternative that is both technically feasible and commercially available from non-proprietary sources at this time is NAND FLASH. Though FLASH is very much slower than DRAM, I believe it is possible to make it work for a parallel computer (but not a serial computer), because the FLASH will sit at the end of a network, and the runtime software already has to solve the network latency problem of the parallel computer in the first place. Such a computer could be extremely efficient: up to 10 times cheaper than a DRAM-based computer with the same capacity and instruction throughput, while consuming one-fifth the power. Unfortunately, the trend toward multi-level cell technologies (MLC, TLC) is making FLASH slower and slower at the system level, so a FLASH-based architecture would soon reach a performance dead end. However, the latency problems are not inherent in the physics; rather, they stem from the way one has to operate the chip to deal with noise and device variations resulting from tighter tolerances. It may be possible to change the chip interface to get around these limitations, in effect moving certain operations into the controller DSP where they can be done in parallel. But even if successful, all this will take quite some time, and there is a chicken-and-egg problem: what market incentive would motivate the FLASH manufacturers to do the R&D to create the new chip interface in the first place?
This talk proposes a roadmap whereby we can start with a relatively conventional cluster computer using generous amounts of DRAM and today’s MLC/TLC FLASH, and evolve compatibly towards one using tiny amounts of DRAM, giving us time to develop the necessary software and the new FLASH chip. The software for small DRAM footprints and the new FLASH chip work equally well for x86 architectures and for the proposed new architecture. The talk concludes with a discussion of how this all might plausibly play out in the marketplace, and why I believe we ought to work on it.
Bio: Peter Hsu was born in Hong Kong and came to the United States at age 15. He received a B.S. degree from the University of Minnesota at Minneapolis in 1979, and the M.S. and Ph.D. degrees from the University of Illinois at Urbana-Champaign in 1983 and 1985, respectively, all in Computer Science. His first job was at IBM Research in Yorktown Heights from 1985 to 1987, working on superscalar code generation with the 801 compiler team. He then joined his former professor at Cydrome, which developed an innovative VLIW computer. In 1988 he moved to Sun Microsystems and tried to build a water-cooled gallium arsenide SPARC processor, but the technology was not sufficiently mature and the effort failed. He joined Silicon Graphics in 1990 and designed the MIPS R8000 TFP microprocessor, which shipped in the SGI Power Challenge systems in 1995. He served as a Director of Engineering until 1997, then left to co-found his own startup, ArtX, best known for designing the Nintendo GameCube. Peter left ArtX in 1999 and worked briefly at Toshiba America, then became a visiting Industrial Researcher at the University of Wisconsin at Madison in 2001. ArtX was acquired in 2000 by ATI Technologies, which has since been acquired by AMD. Throughout the 2000s he consulted for various startups, and attended the Art Academy University and the California College of the Arts in San Francisco, where he learned to paint oil portraits, and a Paul Mitchell school, where he learned to cut and color hair. In the late 2000s he consulted for Sun Labs, which led to discussions about the RAPID research project, a power-efficient massively parallel computer for accelerating data analytics in the Oracle database. Peter joined Oracle Labs as an Architect in 2011. He became an independent researcher in early 2016.