Configurable Logic Block Design, Verification and Optimization

In this project, my teammate, Linh Chu, and I designed a 16-bit Configurable Logic Block (CLB), a circuit that can implement any 4-input combinational logic function. It is an essential building block that gives Field Programmable Gate Arrays (FPGAs) their flexibility. The circuit was developed in Cadence using 45nm Salicide 1.0V/1.8V 1P 11M technology.

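The CLB is composed of four key components: a 16:1 lookup table (LUT), a 16-bit SRAM array, a serial-in parallel-out (SIPO) shift register, and a non-overlapping clock generator, all of which are integrated into the final Configurable Logic Block.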

16:1 Lookup Table (LUT)

The LUT accepts 4 binary inputs and stores 16 output values, one for each of the 2^4 = 16 possible input combinations. The inputs act as selectors that choose the corresponding pre-stored output. The beauty of the LUT is its configurability: by reprogramming the pre-stored array, the LUT can implement any 4-input logic function.
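To make the idea concrete, here is a minimal behavioral sketch in Python of a 4-input LUT as a 16-entry table. The function names (`make_lut`, `lut_read`) and the bit ordering (first input as the most significant index bit) are assumptions for illustration only, not part of the actual circuit.

```python
def make_lut(func):
    """Build the 16-entry table for a 4-input Boolean function."""
    table = []
    for index in range(16):
        # Unpack the index into four input bits (assumed order: a is the MSB).
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        table.append(int(bool(func(a, b, c, d))))
    return table


def lut_read(table, a, b, c, d):
    """Use the four inputs as an address into the stored table."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return table[index]


# The same lookup mechanism implements different functions
# purely by changing the stored contents.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
```

Reprogramming the LUT corresponds to nothing more than replacing `table`; the lookup logic itself never changes.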

2:1 Multiplexer schematic
2:1 Multiplexer symbol

We start with the design of a 2:1 MUX, which, like the LUT, selects one of its inputs (here, one of two) based on a select line. A LUT can be constructed by connecting 15 MUXes in a tree structure, as shown below. Two CMOS logic inverters at the output serve as a buffer, restoring the voltage after pass-transistor logic degradation in the MUXes.

16-input LUT schematic
LUT symbol
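The MUX tree can be sketched behaviorally as follows. This is a logic-level model only: it ignores the pass-transistor voltage drop and the output buffer, and the select-bit ordering (least significant bit drives the first level) is an assumption, not necessarily the wiring used in the schematic.

```python
def mux2(d0, d1, sel):
    """2:1 multiplexer: pass d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def lut16(data, sel_bits):
    """Reduce 16 stored bits to one output through a tree of 2:1 MUXes.

    data:     list of 16 stored bits
    sel_bits: four select bits, least significant first (assumed ordering)
    Returns (output_bit, number_of_muxes_used).
    """
    assert len(data) == 16 and len(sel_bits) == 4
    level = list(data)
    mux_count = 0
    for sel in sel_bits:
        # Each level halves the number of candidate signals.
        next_level = []
        for i in range(0, len(level), 2):
            next_level.append(mux2(level[i], level[i + 1], sel))
            mux_count += 1
        level = next_level
    return level[0], mux_count
```

The four levels use 8, 4, 2, and 1 MUXes respectively, which is where the total of 15 comes from.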

SRAM Array

The LUT’s 16 output values are stored in an SRAM array. Each cell is a standard 6T design, accessed through a complementary pair of bit lines (BL / BL̅) and a word line (WL), with the bit lines precharged and discharged to read and write the cell. See this tutorial for a detailed walkthrough of SRAM operation.

6T SRAM cell
SRAM cell symbol

To make the cell easier to integrate, we wrapped it with a small peripheral interface exposing five I/O signals.

6T SRAM cell with peripherals
SRAM symbol

Sixteen identical cells are tiled into the final 16-bit array:

16-bit SRAM array

Serial In Parallel Out (SIPO) Register

Loading 16 bits into the SRAM in parallel would require 16 dedicated input pins. For simplicity and minimal layout area, we instead use a SIPO shift register to load the data serially: it accepts one bit per clock cycle and presents all 16 bits in parallel once shifting is complete.

D Flip-Flop schematic
D Flip-Flop symbol

The SIPO is built by cascading 16 D flip-flops. Each clock edge shifts the data one stage forward, so the full word is available after 16 cycles.

SIPO register schematic
Shift register animation: an example of a 4-bit SIPO register [source]
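The shifting behavior can be sketched in a few lines of Python. The class name `SIPO` and its methods are hypothetical, and the model treats each flip-flop as an ideal edge-triggered element, so it says nothing about setup, hold, or clock-phase details.

```python
class SIPO:
    """Behavioral model of a serial-in parallel-out shift register."""

    def __init__(self, width=16):
        self.stages = [0] * width  # one entry per D flip-flop

    def clock(self, serial_in):
        # On each clock edge every stage captures the previous stage's value,
        # and the first stage captures the serial input.
        self.stages = [serial_in & 1] + self.stages[:-1]

    def parallel_out(self):
        return list(self.stages)


def load_word(bits):
    """Shift a list of bits in, one per clock cycle, and return the parallel word."""
    reg = SIPO(len(bits))
    for bit in bits:
        reg.clock(bit)
    return reg.parallel_out()
```

Note that the first bit shifted in ends up in the last stage, so in this model the parallel word is the serial stream in reverse order.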

Non-overlapping Clock Generator

The SIPO requires both CLK and its complement, CLK̅ (see the D flip-flop schematic), so we designed a non-overlapping clock generator to produce the two complementary signals.

Non-overlapping clock generator schematic
Clock generator symbol
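The defining property of a non-overlapping clock, that the two phases are never high at the same time, can be illustrated with a small sampled-time sketch. This is not the transistor-level circuit from the schematic: the function name, the sampled representation, and the `delay` parameter are all assumptions chosen only to show the dead time between phases.

```python
def non_overlapping_phases(clk, delay=1):
    """Derive two non-overlapping phases from a sampled clock.

    clk:   list of 0/1 samples of the input clock
    delay: dead time in samples (hypothetical parameter)
    Returns (phi1, phi2) as lists of 0/1 samples.
    """
    phi1, phi2 = [], []
    for t, now in enumerate(clk):
        earlier = clk[t - delay] if t >= delay else clk[0]
        # phi1 rises only after clk has been high for `delay` samples.
        phi1.append(1 if (now and earlier) else 0)
        # phi2 rises only after clk has been low for `delay` samples.
        phi2.append(1 if (not now and not earlier) else 0)
    return phi1, phi2


# Three periods of a clock with 4 low samples followed by 4 high samples.
clk = ([0] * 4 + [1] * 4) * 3
phi1, phi2 = non_overlapping_phases(clk)
```

Around every clock transition both phases are low for `delay` samples, which is the gap that keeps the master and slave stages of a flip-flop from being transparent simultaneously.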

Configurable Logic Block (CLB)

With all four modules complete, we wired them together into the full CLB: the SIPO delivers the serially loaded configuration bits to the SRAM, the SRAM drives the LUT’s data inputs, and the non-overlapping clock coordinates timing across the entire pipeline.

Complete CLB schematic
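The data flow through the whole block (shift in, store, look up) can be summarized with a small behavioral model. The class `CLBModel`, its method names, and the explicit `write_sram` step are hypothetical; the real design's write control and input ordering may differ.

```python
class CLBModel:
    """Behavioral sketch of the CLB: SIPO register -> SRAM array -> LUT."""

    def __init__(self):
        self.shift_reg = [0] * 16  # SIPO stages
        self.sram = [0] * 16       # stored configuration bits

    def shift_in(self, bit):
        # One clock cycle of the SIPO: the new bit enters stage 0.
        self.shift_reg = [bit & 1] + self.shift_reg[:-1]

    def write_sram(self):
        # Copy the parallel SIPO outputs into the 16 SRAM cells.
        self.sram = list(self.shift_reg)

    def evaluate(self, a, b, c, d):
        # The four inputs select one stored bit (assumed order: a is the MSB).
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.sram[index]


def configure(clb, truth_table):
    """Load a 16-entry truth table over 16 clock cycles."""
    # The first bit shifted in ends up in the last stage, so send it reversed.
    for bit in reversed(truth_table):
        clb.shift_in(bit)
    clb.write_sram()
```

For example, configuring the model with the table of a "at least three inputs high" function and then calling `evaluate` reproduces that function for every input combination.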

Performance Metrics

Post-layout benchmarking on the full design yielded the following results:

Metric                     Value
Max operating frequency    1 GHz
Loading energy             3.212 × 10⁻³ nJ
Active energy              1.895 × 10⁻³ nJ

Loading energy: energy to shift all 16 configuration bits into the SRAM. Active energy: energy to cycle through all 16 input combinations (i.e., from 0x0 to 0xF) at max frequency.
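For a rough sense of scale, the figures above can be converted into per-bit and per-evaluation numbers. The sketch below assumes that the active-energy measurement applies one input combination per clock cycle at 1 GHz (16 cycles in total); that assumption is ours and is not stated in the measurement itself.

```python
F_MAX_HZ = 1e9               # max operating frequency from the table
LOAD_ENERGY_NJ = 3.212e-3    # loading energy from the table
ACTIVE_ENERGY_NJ = 1.895e-3  # active energy from the table
N_BITS = 16                  # configuration bits / input combinations

# Convert nJ to pJ (1 nJ = 1000 pJ).
load_energy_pj = LOAD_ENERGY_NJ * 1e3
active_energy_pj = ACTIVE_ENERGY_NJ * 1e3

# Average energy per configuration bit and per evaluated input combination.
load_pj_per_bit = load_energy_pj / N_BITS
active_pj_per_eval = active_energy_pj / N_BITS

# Assumption: one input combination per cycle, so the sweep lasts 16 cycles.
sweep_time_s = N_BITS / F_MAX_HZ
active_power_mw = (ACTIVE_ENERGY_NJ * 1e-9) / sweep_time_s * 1e3

print(f"loading: {load_energy_pj:.3f} pJ total, {load_pj_per_bit:.3f} pJ per bit")
print(f"active:  {active_energy_pj:.3f} pJ total, {active_pj_per_eval:.3f} pJ per evaluation")
print(f"average active power over the sweep: {active_power_mw:.3f} mW")
```

Under that assumption, this works out to roughly 0.20 pJ per configuration bit, roughly 0.12 pJ per evaluated input combination, and an average active power of about 0.12 mW.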

That covers the full design flow for our 16-bit CLB. Details on transistor sizing, schematic verification, timing-hazard resolution, and circuit-level optimization are available upon request — feel free to reach out 😀