# A 64-Element 28GHz Digital Beamformer Based on Tileable Synchronized Distributed-Beamforming Chiplets

*Abstract*—A tileable 16-element, 4-beam, 28GHz digital beamforming chiplet networks over power-efficient, low-latency Streaming-AIB data links to distribute digital beamforming processing across multiple chiplets. Spiral chiplet connectivity enables scaling to an arbitrary array size with a single beamforming chiplet design. A multi-chip digital PLL ensures digital clock synchronization between chiplets. Scalability is demonstrated with a prototype 4-chiplet, 64-element module. Over-the-air tests confirm accurate 64-element beampatterns, a noise figure of 20dB, a Streaming-AIB BER of 3E-12, and a low power consumption of 23mW per beam-channel.

Keywords—digital beamforming, delta-sigma modulator, chiplet, phased array, mm-wave receiver, synchronization, AIB.

## I. INTRODUCTION

Large-scale digital beamforming is an essential technology for emerging 6G, satellite communication, and defense applications. Digital beamforming delivers accurate beam patterns and rapid, precise steering. Moreover, a critical advantage of digital beamforming is its ability to efficiently generate multiple simultaneous beams using little additional power and area. Large arrays increase the effective aperture size, improve spatial selectivity, and enable more MIMO channels. A larger aperture and a more directed beam improve SNR and help overcome mm-wave path loss. Emerging systems require hundreds or thousands of elements. Arrays of this size are impossible with a single chip and require a tiled, multi-chip approach. This work introduces a collaborative, scalable chiplet architecture that enables an arbitrarily large digital beamforming array using a single chiplet design.

For moderate-sized arrays (16 elements or fewer), single-chip integration of the mm-wave frontend, ADCs, and DSP provides the most compact form factor [1-3]. Mounting a single chip in the center of the antenna array ensures low mm-wave routing loss. Recent work combines an mm-wave frontend and digital beamforming on a ceramic substrate with a patch antenna array to demonstrate a compact, fully integrated mm-wave-antenna-to-digital module [1]. However, a single-chip solution becomes impractical at larger array sizes due to mm-wave routing losses, flip-chip I/O congestion, and on-chip routing congestion.

Conventional multi-chip digital beamforming sends raw high-speed ADC data from multiple frontend chips to a central beamforming DSP chip [2], typically over high-speed SERDES links (Fig. 1(a)). JESD204B/C [2,4] is commonly used for these high-speed ADC-to-DSP links. Drawbacks of the conventional approach include: (1) high power consumption of long-distance (10+cm), high-bandwidth communication of raw ADC data across the entire module; (2) module routing complexity and congestion; (3) the location of the DSP beamforming processor, which may require multiple boards or 3D stacking to maintain the desired antenna pitch; (4) limited chip I/O; and (5) digital clock phase synchronization across the module.

Fig. 1. (a) Conventional digital beamforming and (b) proposed distributed chiplet-based beamforming over efficient Streaming-AIB data links.

This work introduces a new scalable chiplet tiling architecture (Fig. 1(b)) for mm-wave digital beamforming that addresses the challenges of large-scale tiled digital beamforming by introducing: (1) a single chiplet design with mm-wave processing, ADCs, and DSP; (2) collaborative chiplet beamforming in which the chiplet array collectively implements digital beamforming; (3) a spiral tiling geometry enabling an arbitrary array size using identical chiplets; (4) low-latency, high-speed Streaming-AIB chiplet communication; and (5) multi-chip digital clock synchronization.

We demonstrate these new techniques in a 4-chiplet, 64-element module. The chiplet design leverages a compact mm-wave frontend, continuous-time bandpass delta-sigma modulator (CTBPDSM) ADCs, and area- and power-efficient mux-based bitstream processing. The tileable 28GHz 16-element chiplet is fabricated in 40nm CMOS. Four chiplet dies are mounted on the backside of a custom 15-layer Kyocera LTCC substrate. Each die connects to 16 antenna elements from a 28GHz 8x8 antenna array on the top side of the substrate. Over-the-air tests confirm accurate 64-element beampatterns and a low power consumption of 23mW per beam-channel.

## II. SYSTEM ARCHITECTURE

## A. Distributed Collaborative Processing

In distributed digital beamforming, each chiplet collaborates in the beamforming process, eliminating the need for the central DSP processor, reducing the digital communication bandwidth and power consumption by an order of magnitude, and enabling an arbitrarily large array (Fig. 1(b)). The entire array is divided into sub-arrays simultaneously processed by individual chiplets (Fig. 2(a)). In our prototype, each chiplet is responsible for its own 16-element array subsection, including the mm-wave frontend, digitization, and partial-array beamforming. Partial beamforming data accumulates and sums at each daisy-chained chiplet, with each chiplet receiving partial digital beam data from the previous chiplet and transmitting data to the subsequent chiplet.

Fig. 2. (a) Distributed beamforming operation and (b) extendable spiral tiling.

The mathematical linearity of beamforming operations enables the distribution of digital beamforming with no loss in accuracy or bandwidth (Fig. 2(a)). Each chiplet weights and sums the I/Q digital receive data from its 16-element array section, forming a local partial-array beam. The chiplet then time-aligns and sums the local partial beam with the partial-array beam data from the previous chiplet in the daisy chain. In turn, the chiplet forwards the accumulated partial-array beams to the next chiplet over a chiplet-to-chiplet link. The final chiplet in the daisy chain outputs fully formed beams.
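The linearity argument can be checked numerically. The following is a minimal sketch, assuming ideal arithmetic, a single time instant, and already time-aligned data; the sample values and steering weights are hypothetical and are not taken from the prototype. It compares one central 64-element weighted sum with four 16-element partial sums accumulated along a daisy chain.

```python
import cmath
import random

N_ELEMENTS = 64            # full array size (from the text)
ELEMENTS_PER_CHIPLET = 16  # sub-array handled by each chiplet (from the text)


def full_array_beam(samples, weights):
    """Reference: a single central weighted sum over every element."""
    return sum(w * x for w, x in zip(weights, samples))


def daisy_chain_beam(samples, weights, chunk=ELEMENTS_PER_CHIPLET):
    """Each chiplet adds its local partial beam to the running sum it receives."""
    running = 0j  # the first chiplet has no upstream partial beam
    for start in range(0, len(samples), chunk):
        local = sum(w * x for w, x in zip(weights[start:start + chunk],
                                          samples[start:start + chunk]))
        running += local  # value forwarded to the next chiplet in the chain
    return running


random.seed(0)
# Hypothetical I/Q samples and unit-magnitude steering weights (illustration only).
samples = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N_ELEMENTS)]
weights = [cmath.exp(-1j * 0.3 * n) for n in range(N_ELEMENTS)]

central = full_array_beam(samples, weights)
chained = daisy_chain_beam(samples, weights)
```

Under these assumptions, `central` and `chained` agree to floating-point rounding. The sketch leaves out finite word length and the time alignment between chiplets.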

## B. Streaming-AIB Chiplet Data Links

The chip-to-chip links in distributed beamforming benefit from emerging parallel chiplet data link schemes, such as AIB [5] and UCIe, which offer order-of-magnitude improvements in latency, efficiency, and complexity compared to traditional SERDES links like JESD204C [4]. However, existing parallel chiplet links are designed for digital applications with millimeter lengths and asynchronous data transfer [2]. We introduce a new Streaming-AIB architecture to exploit the energy efficiency of AIB data links and apply them to large-scale distributed digital beamforming arrays. Streaming-AIB: (1) adds streaming to support the synchronous operation of digital beamformers, (2) extends the link distance by 10x over conventional AIB to support the 18mm inter-chip links required for the 28GHz array, and (3) adapts the physical I/O placement to routing restrictions on-chip and within the module.

## C. Spiral Tiling Architecture for Distributed Processing

A unique spiral-shaped physical chiplet placement implements the accumulate-and-sum daisy chain described in Section II.A and extends to an arbitrary array size with a single chiplet design. As shown in Fig. 2(b), the orientation of each die in the spiral is rotated 90° relative to the previous die, so that the diagonally positioned TX and RX Streaming-AIB I/O pins remain aligned between chips. This scheme facilitates use of a single chiplet design using one set of I/O pins.



Fig. 3. Data communication power for conventional beamforming compared to distributed chiplet beamforming with differing numbers of beams.

## D. Power Advantage of Distributed Processing

Distributed digital beamforming delivers an order of magnitude saving in the power consumption of the digital data communication because the chiplet-to-chiplet data link: (1) avoids communication of redundant information and (2) takes advantage of more efficient chiplet AIB data communication.

Fig. 3 compares the digital communication power consumption for a conventional beamformer with JESD204 links to a central beamforming processor with the Streaming-AIB link power consumption of the distributed architecture. We use a chiplet size of 16 elements, a bandwidth of 100MHz, and an oversampling ratio of 2.5. We assume a nominal 10-bit ADC resolution but also expand the link bit-width in the distributed architecture to account for the array SNR gain. Although the conventional approach can support an arbitrarily large number of simultaneous beams, in practice, the number of simultaneous beams is typically much smaller than the number of elements. We use energy per bit of 1.4pJ and 5pJ for Streaming-AIB and JESD204, respectively. With 64 elements, the conventional system has a digital communication power of 1.6W, dominating the total system power. On the other hand, the proposed approach has an equivalent communication power consumption of only 54mW for a beam space of 4.
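The 64-element figures quoted above can be approximately reproduced with simple arithmetic. The sketch below is a back-of-the-envelope model, not the authors' calculation. It assumes separate I and Q streams per element in the conventional case, and one 13-lane, 1Gbps-per-lane Streaming-AIB link (as described in Section III) between each pair of neighbouring chiplets in the distributed case. The function and parameter names are hypothetical.

```python
def conventional_link_power_w(elements, bandwidth_hz=100e6, osr=2.5,
                              adc_bits=10, iq_streams=2, energy_per_bit_j=5e-12):
    """Raw ADC data from every element travels to a central processor."""
    sample_rate = bandwidth_hz * osr  # 250 MS/s per stream
    # iq_streams=2 is an assumption (separate I and Q words per element).
    bit_rate = elements * iq_streams * adc_bits * sample_rate
    return bit_rate * energy_per_bit_j


def distributed_link_power_w(elements, elements_per_chiplet=16, lanes=13,
                             lane_rate_bps=1e9, energy_per_bit_j=1.4e-12):
    """Only partial beams travel, one link per neighbouring chiplet pair."""
    chiplets = elements // elements_per_chiplet
    links = chiplets - 1  # assumption: the final off-module link is not counted
    return links * lanes * lane_rate_bps * energy_per_bit_j


conventional = conventional_link_power_w(64)  # about 1.6 W
distributed = distributed_link_power_w(64)    # about 0.055 W
```

With these assumptions the conventional estimate matches the 1.6W quoted in the text, and the distributed estimate lands close to the quoted 54mW. The 13-lane link width corresponds to the four-beam-space configuration only; other beam counts in Fig. 3 would need a different link width, which this sketch does not model.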

## III. CHIPLET INTERFACE AND SYNCHRONIZATION

## A. Streaming-AIB Chip-to-Chip Data Link

The 1.4pJ/bit, 13Gbps Streaming-AIB bus comprises 13 parallel 1Gbps data lanes accompanied by a forwarded 1GHz clock (Fig. 4). Each TX data lane uses a 4:1 serializer to time-division-multiplex four 250Mbps beam-space outputs into a 1Gbps output lane. A 250MHz forwarded clock, also used for clock synchronization, indicates the start of the data packet. The RX interface re-clocks the data with the forwarded 1GHz clock, and a 1:4 de-serializer separates the beam-space outputs. To compensate for link latency, downstream chiplets delay their beams in a FIFO before summing with the received beams. The FIFO delay, which accounts for processing and communication delays, is calibrated once with on-chip test structures. The summed partial-array I and Q beams feed to the next chiplet over the Streaming-AIB interface. Streaming-AIB easily interfaces with additional downstream digital processing, which we demonstrate on a Xilinx Kintex FPGA. A transmit driver boost mode enables the final chiplet in the chain to relay processed beams over a 19.3cm channel to the off-substrate FPGA, eliminating high-power external buffers.
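The lane multiplexing and FIFO alignment described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the chip's RTL. Each list entry stands for whatever one lane carries in one slot, the link latency `LINK_LATENCY` is a hypothetical value counted in 250MHz samples, and the link itself is modelled as a pure delay.

```python
from collections import deque

LINK_LATENCY = 3  # hypothetical link + processing latency, in 250 MHz samples


def serialize_4to1(beams):
    """Interleave four equal-length beam streams into one lane at 4x the rate."""
    assert len(beams) == 4
    return [word for group in zip(*beams) for word in group]


def deserialize_1to4(lane):
    """Split a lane back into four streams; slot 0 is the packet start."""
    return [lane[slot::4] for slot in range(4)]


def fifo_delay(stream, depth):
    """Delay a stream by `depth` samples, zero-filled at start-up."""
    fifo = deque([0] * depth)
    out = []
    for word in stream:
        fifo.append(word)
        out.append(fifo.popleft())
    return out


# Hypothetical partial beams: 4 beam-space outputs, 8 samples each.
upstream = [[10 * b + t for t in range(8)] for b in range(4)]
local = [[100 + 10 * b + t for t in range(8)] for b in range(4)]

lane = serialize_4to1(upstream)
recovered = deserialize_1to4(lane)
received = [fifo_delay(s, LINK_LATENCY) for s in recovered]    # link modelled as a delay
aligned_local = [fifo_delay(s, LINK_LATENCY) for s in local]   # calibrated FIFO depth
summed = [[u + l for u, l in zip(us, ls)]
          for us, ls in zip(received, aligned_local)]
```

Once the FIFO depth matches the link latency, each entry of `summed` combines upstream and local samples from the same original time index. A mismatched depth would add samples from different instants.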



Fig. 4. The 1GHz 13-lane Streaming-AIB links time-multiplex four partial beams. On-chip FIFOs time-align beam data before summation.



Fig. 5. A multi-chip digital PLL time-aligns local clocks across chiplets.

## B. Multi-Chip PLL for Digital Clock Synchronization

Multi-chip digital clock synchronization is a crucial challenge in multi-chip digital beamforming. Phase-synchronized digital chiplet clocks are required to ensure time alignment and I/Q polarity alignment. H-tree module routing evenly distributes a 4GHz chiplet reference clock, which clocks the ADCs and DSP. Chiplets derive 2GHz, 1GHz, and 250MHz clocks from this reference. Phase synchronization errors between chiplets arise when clock dividers initialize in different phase states after a reset, which may not reach all chips simultaneously due to process variation. To solve the synchronization challenge, we introduce a multi-chip digital PLL (Fig. 5) that phase-aligns local clocks across chiplets.
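The divider phase ambiguity can be seen in a toy model. The sketch below assumes a simple divide-by-16 counter (4GHz to 250MHz) and a reset release that differs between two chiplets by a whole number of 4GHz cycles. The function names are hypothetical, and the model illustrates only the problem, not the PLL that corrects it.

```python
REF_HZ = 4e9
DIVIDE_RATIO = 16  # 4 GHz / 16 = 250 MHz


def divided_clock(reset_release_cycle, n_cycles):
    """Divided-clock level for each reference cycle.

    The divider counter starts from zero when reset is released;
    before that the output is held low.
    """
    levels = []
    for cycle in range(n_cycles):
        if cycle < reset_release_cycle:
            levels.append(0)
        else:
            count = (cycle - reset_release_cycle) % DIVIDE_RATIO
            levels.append(1 if count < DIVIDE_RATIO // 2 else 0)
    return levels


def phase_offset_cycles(release_a, release_b):
    """Phase offset between two divided clocks, in reference cycles."""
    return (release_b - release_a) % DIVIDE_RATIO


# Two chiplets whose resets are released 3 reference cycles apart (hypothetical).
offset = phase_offset_cycles(0, 3)
offset_ps = offset / REF_HZ * 1e12        # offset in picoseconds
offset_deg = 360 * offset / DIVIDE_RATIO  # fraction of the 250 MHz period
```

In this model a three-cycle reset skew leaves the two 250MHz clocks 750ps (67.5 degrees) apart even though both chiplets see the same 4GHz reference. This is the kind of error that the delay adjustment has to remove.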

Chiplets synchronize in pairs, with one acting as a leader and the other as a follower. In the first chiplet, the leader generates divided clocks from the 4GHz reference and forwards the 250MHz clock to the follower. The follower adjusts a shift register delay (in the local 4GHz domain) to compensate for the channel delay and match the phase of the received clock with that of the original 250MHz leader clock. To find the delay, the follower loops back this delayed 250MHz clock to the leader, which adds a shift register delay to the received looped-back clock. A phase comparator, clocked at 4GHz, measures the lead/lag error direction and magnitude between the leader 250MHz clock and the loop-back 250MHz clock. A phase counter calculates the average error and corrects the delay setting. Once the 250MHz clock is aligned, the clock generator produces phase-aligned 2GHz, 1GHz, and 500MHz clocks from the 4GHz reference. Each pair of chiplets repeats the same process to measure and compensate for the inter-chiplet channel delay. Continuous PLL operation allows the system to respond to glitches in the source clock.

## IV. BEAMFORMING CHIPLET DESIGN DETAILS

## A. Compact mm-Wave-to-Baseband Channel Stripe

To maintain low area and low mm-wave module routing losses, each chiplet uses a compact channel stripe design (Fig. 6). An inductor-less LNA reduces inter-channel inductive coupling and provides a compact layout. A passive mixer down-converts to a 1GHz IF with a 27GHz PLL providing the LO signal. Two CTBPDSM sub-ADCs provide inherent anti-alias filtering and an easy-to-drive resistive input [1]. Clocking the two sub-ADCs on opposite 4GHz clock edges and summing forms a two-tap FIR filter that suppresses crosstalk from the ADC sampling clock (Fig. 7(a)). Furthermore, combining both sub-ADCs provides 6dB digital signal gain and 3dB SNR improvement. Operating with a single sub-ADC reduces power.

Fig. 7. (a) ADC alias suppression and (b) MUX-based bitstream processing.
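The two-tap behaviour and the 6dB/3dB figures can be sketched with idealized arithmetic. The model below assumes that sampling on opposite edges of the 4GHz clock acts as an ideal 125ps tap spacing, that both sub-ADCs see identical signal components, and that their noise is uncorrelated and of equal power. It ignores the small phase difference between the two sampling instants at the IF.

```python
import cmath
import math

CLOCK_HZ = 4e9
TAP_DELAY_S = 0.5 / CLOCK_HZ  # opposite clock edges: half a period (assumed ideal)


def two_tap_magnitude(freq_hz):
    """|1 + exp(-j*2*pi*f*tau)| for two summed taps spaced by TAP_DELAY_S."""
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * TAP_DELAY_S))


# Response at the sampling-clock frequency: the two taps cancel.
clock_leak = two_tap_magnitude(CLOCK_HZ)

# Equal signal amplitudes add in voltage; uncorrelated noise adds in power.
signal_gain_db = 20 * math.log10(2)           # about 6 dB
noise_gain_db = 10 * math.log10(2)            # about 3 dB
snr_gain_db = signal_gain_db - noise_gain_db  # about 3 dB
```

In this idealized model the summed response has a null at 4GHz, which is consistent with suppressing sampling-clock crosstalk. The gain arithmetic reproduces the 6dB signal gain and 3dB SNR improvement stated in the text.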

## B. Power and Area Efficient Mux-Based Beam Processing

Mux-based bitstream processing (Fig. 7(b)) implements partial digital beamforming on the un-decimated bitstream of an oversampling ADC output, reducing power and area compared to operating on the full word width [1]. The digital processing downconverts the IF signal to baseband quadrature signals, where digital multiplication with weights phase-rotates the I and Q signals. Summing the phase-rotated I and Q components across the 16 antenna channels on each chip forms a 16-element partial-array beam.
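The idea behind mux-based weighting can be shown with a toy model. It is a sketch only. It assumes a two-level (±1) modulator bitstream and relies on the 1GHz IF being one quarter of the 4GHz sampling clock, so that the digital LO sequences take only the values 0 and ±1. The actual bitstream format, sign convention, and weight word length are not specified here, and the function names are hypothetical.

```python
LO_I = (1, 0, -1, 0)  # cos(pi*n/2): fs/4 local oscillator, in-phase
LO_Q = (0, -1, 0, 1)  # -sin(pi*n/2): quadrature (sign convention assumed)


def mux3(select, value):
    """3-way mux: returns -value, 0, or +value without a multiplier."""
    if select > 0:
        return value
    if select < 0:
        return -value
    return 0


def mux_weighted_sample(bit, n, w_re, w_im):
    """Downconvert one bitstream sample and phase-rotate it by w_re + j*w_im."""
    i_sel = bit * LO_I[n % 4]  # always in {-1, 0, +1}; sign logic in hardware
    q_sel = bit * LO_Q[n % 4]
    out_re = mux3(i_sel, w_re) - mux3(q_sel, w_im)
    out_im = mux3(i_sel, w_im) + mux3(q_sel, w_re)
    return out_re, out_im


def multiply_reference(bit, n, w_re, w_im):
    """Same operation with an explicit complex multiplication."""
    z = complex(bit * LO_I[n % 4], bit * LO_Q[n % 4]) * complex(w_re, w_im)
    return z.real, z.imag


# Hypothetical integer weight; check every bit value and LO phase.
matches = all(
    mux_weighted_sample(bit, n, 37, -91) == multiply_reference(bit, n, 37, -91)
    for bit in (-1, 1) for n in range(8)
)
```

Because each downconverted sample is 0 or ±1, the weighted result is always one of a few pre-computed values, so a selector can stand in for a full-width multiplier. Summing the selected values across the 16 channels then gives the partial-array beam.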

## V. PROTOTYPE AND MEASUREMENTS

The tileable 16-element chiplet is fabricated in 40nm CMOS. To demonstrate scaling, four dies are mounted on the backside of a custom 15-layer Kyocera LTCC substrate (Fig. 8). The substrate's top side is an 8x8 array of 28GHz aperture-coupled microstrip patch antennas. Each chiplet mounts in the center of a 4x4 sub-array to minimize mm-wave-to-chip routing loss. A ball-grid array on the back of the module mounts to a carrier PCB, which supplies a 4GHz digital clock, a 100MHz PLL reference, and power. An FPGA PCB connects to the carrier PCB for high-speed data acquisition.

Measured over-the-air 64-element beampatterns agree closely with simulated ideal results (Fig. 9). The measured Streaming-AIB chip-to-chip-to-FPGA bit error rate is 3E-12 at the maximum 1Gbps data rate over a channel consisting of an 18mm chip-to-chip link and a 193mm module-to-FPGA PCB link. The hot (333K) and cold (77K) source technique is used to measure the single-channel antenna-to-digital noise figure of 20dB. The total chiplet power consumption is 1497mW, with 275mW from digital power and 18mW from the Streaming-AIB I/O. Table 1 compares this work to state-of-the-art digital beamformers and tiled beamformers. The power/element/beam-space output is 23mW in low-power mode and 29mW in 2x sub-ADC mode.

Fig. 8. (a) Die photo and (b) both sides of 15-layer Kyocera LTCC substrate.
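The hot/cold source measurement is commonly evaluated with the standard Y-factor relation. The sketch below applies that textbook formula to the source temperatures quoted above. The measured Y-factor itself is not reported in the text, so the example only shows what power ratio a 20dB noise figure would imply. The 290K reference temperature and the function names are assumptions of the sketch.

```python
import math

T0_K = 290.0  # conventional reference temperature for noise figure


def noise_figure_db(y_linear, t_hot_k=333.0, t_cold_k=77.0):
    """Noise figure from the hot/cold output power ratio Y (linear)."""
    t_e = (t_hot_k - y_linear * t_cold_k) / (y_linear - 1.0)  # receiver noise temp.
    return 10 * math.log10(1 + t_e / T0_K)


def expected_y(nf_db, t_hot_k=333.0, t_cold_k=77.0):
    """Y-factor implied by a given noise figure (inverse of the above)."""
    t_e = T0_K * (10 ** (nf_db / 10) - 1)
    return (t_hot_k + t_e) / (t_cold_k + t_e)


y_20db = expected_y(20.0)          # implied hot/cold power ratio
nf_back = noise_figure_db(y_20db)  # round trip back to the noise figure
```

For a 20dB noise figure with these source temperatures, the implied Y-factor is only about 1.009, so the hot and cold output powers differ by well under 0.1dB. A measurement of this kind therefore depends on resolving small power differences.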

## VI. CONCLUSION

This work shows the potential of tiled distributed digital beamforming for applications demanding large array sizes, high efficiency, compact form factor, and ease of module and system integration. A new scalable digital beamforming architecture supports arbitrarily large arrays through distributed collaborative beamforming across tiled chiplets. Distributed beamforming combined with a power-efficient Streaming-AIB interface balances computational load throughout the array and is extendable to larger arrays using a single chiplet design. Streaming-AIB delivers high inter-chiplet bandwidth (13Gbps) with low latency, high reliability (BER < 3E-12), and low energy consumption (1.4pJ/bit). A digital PLL solves the critical problem of multi-chip digital clock synchronization. Four tiled chiplets on an LTCC substrate deliver an efficient antenna-to-digital solution with minimal size and weight.

#### ACKNOWLEDGMENT



Fig. 9. Measured and simulated beam patterns of 64-element beamformer.

#### References

- [1] R. Lu, et al., "A 16-Element Fully Integrated 28GHz Digital Beamformer with In-Package 4×4 Patch Antenna Array and 64 Continuous-Time Band-Pass Delta-Sigma Sub-ADCs," 2020 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), Los Angeles, CA, USA, 2020, pp. 343-346.
- [2] C. Hornbuckle, et al., "Low-Power K/Q-Band Digital Phased Array Chiplet," 2022 IEEE International Symposium on Phased Array Systems & Technology (PAST), Waltham, MA, USA, 2022.
- [3] J. McSpadden, et al., "MIDAS Wideband mmW Digital Tile," 2022 IEEE International Symposium on Phased Array Systems & Technology (PAST), Waltham, MA, USA, 2022, pp.
- [4] D. Jones, "JESD204C Primer: What's New and in It for You," ADI Analog Dialogue, June 2019.
- [5] D. Kehlet, "Accelerating Innovation Through A Standard Chiplet Interface: The Advanced Interface Bus (AIB)," Intel White Paper.
- [6] D. Dosluoglu, et al., "A Reconfigurable Digital Beamforming V-Band Phased-Array Receiver," ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference (ESSCIRC), Milan, Italy, 2022, pp. 493-496.

Table 1. Comparison with published digital beamformers.

|                      | This Work            |                      | R. Lu [1]            | D. Dosluoglu [6]       |            | C. Hornbuckle [2]          | J. McSpadden [3]              |
|----------------------|----------------------|----------------------|----------------------|------------------------|------------|----------------------------|-------------------------------|
| Technology           | 40nm CMOS            |                      | 40nm CMOS            | 28nm CMOS              |            | 12nm FinFET                | 12nm FinFET                   |
| Frequency [GHz]      | 28                   |                      | 28                   | 47 - 57                |            | 18 - 50                    | 18 - 50                       |
| Phase Shift Res.     | 10 bits              |                      | 10 bits              | 10 bits                |            |                            |                               |
| Generated Beams      | 4                    |                      | 4                    | 1                      |            | Variable                   | Variable                      |
| Beam-space Outputs   | 4                    |                      | 1                    | 2                      |            | Variable                   | Variable                      |
| Elements per IC      | 16                   |                      | 16                   | 4                      |            | 16                         | 16                            |
| Array Size           | 64                   |                      | 16                   | 4                      |            | 64                         |                               |
| On-Package Antenna   | Yes                  |                      | Yes                  | No                     |            | Yes                        | Yes                           |
| ADC SNDR [dB]        | 44 (est.), 1x ADC/CH | 47 (est.), 2x ADC/CH |                      | 9.1-10.2 (Comparator)  | 21 (CTDSM) | 49 (sim.)                  | 41.7 (sim.)                   |
| Tiling Support       | Yes                  |                      | No                   | No                     |            | Yes                        | Yes                           |
| Multi-chip Clk Sync. | Yes                  |                      |                      |                        |            | No                         | No                            |
| RF Measure Input     | Antenna Over-the-Air |                      | Antenna Over-the-Air | Probe                  |            |                            |                               |
| RX BW [MHz]          | 100                  |                      | 100                  | 400                    |            | 2000                       | 2000                          |
| Element NF [dB]      | 20 (Antenna-to-BB)   |                      | 19 (Antenna-to-BB)   | 7.8-12 (RF Front End)  |            | 13.2 (sim.) (RF Front End) | 15.1–17 (sim.) (RF Front End) |
| Die Area [mm²]       | 7.73                 |                      | 7.73                 | 0.72                   |            | 95 (TX+RX)                 | 100 (TX+RX)                   |
| Area/Element [mm²]   | 0.48                 |                      | 0.48                 | 0.18                   |            | 2.97 (TX+RX)               | 6.25 (TX+RX)                  |
| Power [mW]           | 1497 (1x ADC/CH)     | 1847 (2x ADC/CH)     | 2832                 | 384                    |            | 2880 (sim. RX)             |                               |
| Power/Element [mW]   | 94 (1x ADC/CH)       | 115 (2x ADC/CH)      | 177                  | 96                     |            | 180 (sim. RX)              |                               |
| Off-Chip Data Link   | Streaming-AIB        |                      |                      |                        |            | SERDES                     | SERDES                        |
| Off-Chip Data Rate   | 13 lanes x 1Gbps     |                      |                      |                        |            | 6 lanes x 30Gbps           | 4 lanes x 28Gbps              |
| Data Link BER        | 3E-12                |                      |                      |                        |            |                            |                               |