Ilia A. Lebedev


Research

Current Projects:

OpenRCL Machine - A Reconfigurable Computing System Supporting the OpenCL Paradigm.

This project is an ongoing effort to bring a comprehensive programming model to reconfigurable computing. By restricting the reconfigurable implementation to a collection of interconnected application-specific cores and a customizable memory subsystem, this project seeks to drastically decrease the cost of developing reconfigurable computing applications while largely preserving their performance and power-efficiency advantages. More specifically, we aim to introduce a novel paradigm for reconfigurable computing: instead of synthesizing an application directly to hardware, we introduce the concept of an "architectural template" and co-optimize the hardware and software by customizing the resulting programmable manycore architecture for each application. We believe that, through specialization, we can achieve high energy efficiency on fixed-point throughput-computing workloads. To provide a viable programming model, we integrate with an existing OpenCL toolchain.

PIN - Platform Interconnect Network

A low-latency, high-throughput interconnect network that allows large arrays of FPGAs to be networked efficiently in latency-critical applications. The PIN project places a strong emphasis on latency, seeking primarily to decrease the network radius in order to facilitate the implementation of a range of topologies over this host network.
PIN uses a modified 2D mesh topology and is structurally guaranteed to be deadlock-free. As PIN is an infrastructure project, we chose to keep the implementation simple, flexible, and transparent. The target platform for this project is the BEE3. The intra-board links are 32 bits wide, use DDR signaling, and are clocked board-synchronously at a nominal frequency of 200 MHz (400 MT/s). The data is deskewed by a tuner that calibrates each link to minimize its error rate. The links have been characterized at frequencies as high as 400 MHz.
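To make "network radius" and deadlock freedom concrete, the following sketch models a plain cols x rows 2D mesh with dimension-order (XY) routing, the textbook deadlock-free routing scheme for meshes: a packet first travels along X and then along Y, never turning from Y back to X, so no cyclic channel dependency can form. This is an illustration only; PIN's mesh modification and routing rule are not specified here, and the function names `xy_route` and `mesh_stats` are hypothetical.

```python
from itertools import product


def xy_route(src, dst):
    """Dimension-order (XY) route on a 2D mesh: move along X first, then Y.

    src and dst are (x, y) node coordinates. Returns the list of nodes
    visited, including both endpoints.
    """
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                      # resolve the X dimension first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                      # then resolve the Y dimension
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def mesh_stats(cols, rows):
    """Return (diameter, average hop count) over all ordered pairs of
    distinct nodes in a cols x rows mesh under XY routing."""
    nodes = list(product(range(cols), range(rows)))
    hops = [len(xy_route(a, b)) - 1 for a in nodes for b in nodes if a != b]
    return max(hops), sum(hops) / len(hops)


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(mesh_stats(4, 4))          # diameter 6, average ~2.67 hops
```

In a plain mesh the worst-case route grows as (cols - 1) + (rows - 1) hops, which is why shrinking the network radius is the main lever for reducing latency.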
Inter-board links are implemented using the Virtex 5 GTP hard blocks and take advantage of the Xilinx soft IP implementing the Aurora protocol. Intra-board links boast a latency of less than 15 ns with no handshake, and 25 ns if a handshake is used. PIN offers an extremely low error rate per link, allowing the network to run for days with no expected errors. We have characterized the I/O behavior of the inter-chip and inter-board interfaces available on the BEE3.
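The raw intra-board link bandwidth follows directly from the figures given for PIN: DDR signaling transfers data on both clock edges, so a 32-bit link at 200 MHz makes 400 million transfers per second. A minimal sketch of that arithmetic, assuming no encoding or framing overhead (none is stated); `link_bandwidth` is a hypothetical helper:

```python
def link_bandwidth(width_bits, clock_hz, transfers_per_cycle=2):
    """Raw bandwidth of a parallel link, ignoring any protocol overhead.

    DDR signaling moves data on both clock edges, hence the default
    transfers_per_cycle=2. Returns (transfers per second, bytes per second).
    """
    transfers = clock_hz * transfers_per_cycle
    return transfers, transfers * width_bits / 8


# Nominal frequency and highest characterized frequency.
for f_mhz in (200, 400):
    mts, bps = link_bandwidth(32, f_mhz * 1e6)
    print(f"{f_mhz} MHz: {mts / 1e6:.0f} MT/s, {bps / 1e9:.1f} GB/s per link")
```

At the nominal 200 MHz this gives 1.6 GB/s per link, and at the highest characterized frequency of 400 MHz, 3.2 GB/s.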

More information will be available soon.

BCM - High-Throughput Bayesian Computing Machine with Reconfigurable Hardware.

We take advantage of reconfigurable hardware to construct a high-throughput Bayesian computing machine (BCM) capable of evaluating probabilistic networks with arbitrary DAG (directed acyclic graph) topology. The BCM achieves high throughput by exploiting the abundant memory structures in the FPGA fabric, along with a number of other architectural features characteristic of the FPGA platform.
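As a minimal illustration of evaluating a probabilistic network with DAG topology, the sketch below computes the joint probability of a full assignment by visiting nodes in topological order (parents before children) and multiplying each node's conditional probability, then obtains a marginal by brute-force summation. The three-node network, its probabilities, and the function names are hypothetical; this does not describe the BCM's actual operators or datapath.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
# parents[n] lists n's parents; cpt[n] maps a tuple of parent values
# to P(n = True | parents).
parents = {"Rain": [], "Sprinkler": [], "WetGrass": ["Rain", "Sprinkler"]}
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.3},
    "WetGrass": {(False, False): 0.0, (False, True): 0.8,
                 (True, False): 0.9, (True, True): 0.99},
}

# A static evaluation order in which every node follows its parents.
order = list(TopologicalSorter(parents).static_order())


def joint(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    p = 1.0
    for node in order:
        p_true = cpt[node][tuple(assignment[q] for q in parents[node])]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def marginal(node):
    """P(node = True) by brute-force summation over all assignments."""
    total = 0.0
    for values in product((False, True), repeat=len(order)):
        a = dict(zip(order, values))
        if a[node]:
            total += joint(a)
    return total


print(round(marginal("WetGrass"), 4))  # 0.3774
```

Here P(WetGrass = True) = 0.56*0 + 0.24*0.8 + 0.14*0.9 + 0.06*0.99 = 0.3774. Brute-force summation grows exponentially with the number of nodes; it is used only because it is correct for any DAG and easy to check by hand.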
The BCM architecture can be applied without modification to a wide range of important algorithms, from artificial intelligence to computational biology.
  • We have developed an innovative memory allocation scheme based on the maximal matching algorithm, allowing us to fully utilize all memory ports.
  • We deeply pipeline each processing node, taking advantage of latency tolerance inherent in our scheduling algorithm, to optimize the throughput of each operator.
  • We statically schedule the machine to maximize overall throughput.
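A maximal matching is one to which no further edge can be added. The sketch below computes one greedily for a bipartite graph whose edges say which memory port may serve which operand access. It is a generic illustration of the technique named above, not the BCM's actual allocator; the accesses, port names, and function name are hypothetical.

```python
def greedy_maximal_matching(edges):
    """Return a maximal matching of a bipartite graph as {request: port}.

    edges is an iterable of (request, port) pairs meaning `request` may be
    served by `port`. An edge is accepted only if neither endpoint is already
    matched, so when the pass ends no further edge can be added.
    """
    used_ports, matching = set(), {}
    for request, port in edges:
        if request not in matching and port not in used_ports:
            matching[request] = port
            used_ports.add(port)
    return matching


# Hypothetical example: four operand reads and the two ports (A, B)
# of each of two dual-ported block RAMs.
edges = [
    ("x", "bram0.A"), ("x", "bram0.B"),
    ("y", "bram0.A"), ("y", "bram0.B"),
    ("z", "bram1.A"), ("z", "bram1.B"),
    ("w", "bram1.A"), ("w", "bram1.B"),
]
print(greedy_maximal_matching(edges))
# {'x': 'bram0.A', 'y': 'bram0.B', 'z': 'bram1.A', 'w': 'bram1.B'}
```

A greedy pass is the simplest way to obtain a maximal matching, and any maximal matching is at least half the size of a maximum matching.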
The resulting machine exhibits nearly linear performance scaling with available FPGA resources. Significant speedup over CPU and GPU implementations, with a far lower power envelope, is achievable even with commodity FPGAs.
A paper describing this work has been accepted to, and will appear in, the Eighteenth ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 21-23, 2010.

InfiniMesh - A Survey of Mesh On-Chip Interconnects In 90nm CMOS Standard Cells - [CS250 Course Project]

More information coming at the end of the Fall 2009 semester.

Hard Liquor HDL - A pure, distilled, and compact HDL - [CS164 Course Project]

More information coming at the end of the Fall 2009 semester.

Large Projects:

RAMP - Research Accelerator for Multiple Processors

RAMP is a collection of research projects exploring rapid-prototyping techniques for manycore architectures, conducted by a number of universities including, but not limited to, the University of California, Berkeley, Stanford University, and the University of Texas at Austin.
The following explanation is taken from the RAMP website:

Processor architectures have crossed a critical threshold. Manufacturers have given up attempting to extract ever more performance from a single core and instead have turned to multi-core designs.
However, little is known about how to build, program, or manage systems of 64 to 1024 processors, and the computer architecture community lacks the basic infrastructure tools required to carry out this research.
Fortunately, Moore's law has not only enabled these dense multi-core chips, it has also enabled extremely dense FPGAs. Today, one to two dozen cores can be programmed into a single FPGA. With multiple FPGAs on a board and multiple boards in a system, large complex architectures can be explored.
Such a system will not just invigorate multiprocessor research in the architecture community, but since processor cores can run at 100 to 200 MHz, a large-scale multiprocessor would be fast enough to run operating systems and large programs at speeds sufficient to support software research. Hence, we believe such a system will accelerate research across all the fields that touch multiple processors: operating systems, compilers, debuggers, programming languages, scientific libraries, and so on.

Please visit the RAMP website.

Archived Projects:

FLINT - Flexible, Latency-Insensitive SPARCv8, Optimized for the Virtex 5 Architecture - [CS194-6 Course Project]

FLINT (Flexible Latency-Insensitive SPARC) is a generator for a family of SPARCv8 processor implementations optimized for Virtex 5 FPGAs. This project was developed in the context of the Fall 2008 offering of UCB CS194-6, and was intended as an exercise in latency-insensitive design on the FPGA platform.
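Latency-insensitive design means that a module's results depend only on the order of data on its channels, not on how many cycles each transfer takes. The sketch below illustrates the idea with a generic bounded FIFO channel using valid/ready-style flow control, simulated cycle by cycle. It is not FLINT's or GateLib's actual interface; the class, function, and parameter names are hypothetical.

```python
from collections import deque


class Channel:
    """A bounded FIFO with valid/ready-style flow control.

    The producer may push only when ready() is true (not full), and the
    consumer may pop only when valid() is true (not empty).
    """

    def __init__(self, depth):
        self.fifo, self.depth = deque(), depth

    def ready(self):
        return len(self.fifo) < self.depth

    def valid(self):
        return len(self.fifo) > 0

    def push(self, item):
        assert self.ready()
        self.fifo.append(item)

    def pop(self):
        assert self.valid()
        return self.fifo.popleft()


def simulate(data, depth, consumer_stalls):
    """Cycle-by-cycle simulation. consumer_stalls(cycle) returning True means
    the consumer does not accept data in that cycle. Returns (output, cycles)."""
    ch, pending, out, cycle = Channel(depth), list(data), [], 0
    while pending or ch.valid():
        if ch.valid() and not consumer_stalls(cycle):
            out.append(ch.pop())
        if pending and ch.ready():
            ch.push(pending.pop(0))
        cycle += 1
    return out, cycle


data = list(range(8))
fast, t_fast = simulate(data, depth=2, consumer_stalls=lambda c: False)
slow, t_slow = simulate(data, depth=2, consumer_stalls=lambda c: c % 3 != 0)
assert fast == slow == data   # same result regardless of timing
print(t_fast, t_slow)         # 9 25
```

Both runs produce the same output stream; only the cycle count changes (9 cycles without stalls, 25 when the consumer accepts data only every third cycle). This independence from transfer latency is what allows latency-insensitive modules to be re-pipelined without changing their function.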
More practically, the design was built upon (and helped shape) GateLib, an extensive hardware library and build environment. Most notably, the project relied on RCBIOS, a flexible toolkit for general-purpose communication between an FPGA design and a PC host, to implement a number of powerful debugging features, including single-step execution, CPU state readout, and debug access to memory.

FLINT was developed in close collaboration with Chris Fletcher and Alex Williams.
All source code developed for this project has been released under the BPL.


Other projects include a wireless videoconferencing solution on the Calinx2 FPGA platform and an analog "oscilloscope" audio visualization, among others. More information coming soon.