# Final Report

## Low-Cost Object Detection RF CMOS Sensor Development for Active Safety Systems

HOSSEIN HASHEMI GORDON S. MARSHALL EARLY CAREER CHAIR ASSISTANT PROFESSOR

Department of Electrical Engineering - Electrophysics, Viterbi School of Engineering, University of Southern California

SPONSOR: DEPARTMENT OF TRANSPORTATION (METRANS CENTER)

February 2009

## Contents

| Section | Title |
|---------|-------|
| 1 | Summary |
| 2 | The RF-Multibeam Spatio-Temporal RAKE Transceiver Architecture |
| 3 | Code Requirements |
| 4 | Baseband Implementation - Analog versus Digital |
| 4.1 | 22-29GHz Commercial Vehicular-Radar Application Space - Dynamic-Range Analysis |
| 4.2 | ADC Power Consumption |
| 4.3 | Analog Correlator Power Consumption |
| 5 | A 4-Channel 24-26GHz RF-Multibeam ST-RAKE Transceiver for Vehicular-Radar Applications in 90nm CMOS |
| 6 | Summary |

## 1 Summary

This report documents the research results and accomplishments of the program titled "Low-Cost Object Detection RF CMOS Sensor Development for Active Safety Systems" that was sponsored by the Department of Transportation through the METRANS Center at USC.

The main accomplishment has been the design and implementation of a CMOS radar chip with a newly proposed Multi-beam Spatio-Temporal RAKE transceiver architecture that attempts to *harness* multi-path reflections to gather more information about the environment around the car. The principle of operation as well as simulated and measured results from a 4-channel 24-26GHz 90nm CMOS prototype will be presented. The chip forms 4 simultaneous beams in one (horizontal) dimension. In the transmit mode, a code sequence from an orthogonal set of codes is sent in each beam angle. In the receive mode, all reflected codes coming from each beam angle are detected and discriminated using a parallel set of analog correlators and integrators (for signal-to-noise enhancement). The correlated signals are then converted to digital data streams using off-chip Analog-to-Digital Converters (ADC) for further processing (object recognition, scene reconstruction, etc.) in a Digital Signal Processor (DSP) that is responsible for code generation and timing circuitry as well. This CMOS radar chip can be integrated in a low-cost vehicular sensor for short-range object detection for blind-spot detection, side and rear impact sensing, parking assistance and pedestrian detection.

The research was carried out by Dr. Harish Krishnaswamy, who received his PhD from USC in January 2009 under the supervision of the PI, as a major component of his dissertation<sup>1</sup>.

Envisioned follow-up research includes integration of the RF CMOS radar chip with an antenna array and a DSP in a test-bed that can be programmed using a computer for algorithmic and imaging measurements in controlled and realistic environments.

<sup>1</sup>Most of this report is adapted from the PhD dissertation of Harish Krishnaswamy titled "Architectures and Integrated Circuits for RF and mm-Wave Multiple-Antenna Systems on Silicon", Chapter 5: An RF-Multibeam Spatio-Temporal RAKE Transceiver Architecture for Radar.

## 2 The RF-Multibeam Spatio-Temporal RAKE Transceiver Architecture



Figure 1: The RF-multibeam spatio-temporal RAKE architecture. Separate I/Q paths in the baseband section are omitted for simplicity.

Fig. 1 depicts the proposed RF-multibeam spatio-temporal RAKE architecture for radar.

At the center of the architecture is an  $N \times B$  RF multibeam matrix. The function of this matrix is to form multiple (B) fixed simultaneous narrow beams with different spatial orientations, each with its own input, across its N outputs. Several multibeam-matrix architectures are known in the literature and are discussed later. It should be noted that the multiple beams need not be fixed in spatial orientation - some amount of steering on each beam can be incorporated to ensure full spatial coverage.

To illustrate the principle of operation of the architecture, a timing diagram is provided in Fig. 2. At transmit time, *B* orthogonal codes are sent along the different beams. In Fig. 2, this period is represented by the signal TX turning on. During this period, the beam inputs of the RF multibeam matrix are excited by orthogonal codes (represented in Fig. 2 by the rather trivial code sequences 110 and 010) modulated onto the carrier signal. In the absence of multipath, each transmitted code would return only along the direction in which it was sent. However, in the presence of multipath, as is shown in Fig. 1, codes may return along other directions as well. At receive time, represented by the signal RX turning on, the architecture "listens" for the return of all codes along all beams. This is achieved by employing a bank of correlators matched to all the transmitted codes on each beam input. This enables the architecture to isolate LoS reflections as well as multipaths for enhanced scene reconstruction. The time interval between the transmit and receive periods is set by the desired time of flight (*i.e.*, LoS/multipath distance) that the radar is looking for. This time interval, also called the radar range bin, is swept sequentially for complete spatial coverage.

Figure 2: Timing diagram for the operation of the RF-Multibeam Spatio-Temporal RAKE architecture.

The architecture may also be described in the language of communication theory<sup>2</sup>. Fig. 3 depicts a communication-theoretic description of the RF-Multibeam ST-RAKE architecture. While the transmitter and the receiver are co-located for radar, and share most of their resources in the RF-Multibeam ST-RAKE architecture, they are represented separately in Fig. 3 as is typically done in communication systems. Assuming N transmitting and receiving antennas, the channel matrix  $\boldsymbol{H}$  is an  $N \times N$  matrix whose entries represent the channel's transfer function for the corresponding transmitting-receiving antenna pair. In communication theory, the environment is typically assumed to be an idealized, rich scattering environment, and the entries of  $\boldsymbol{H}$  are independent, identically distributed Gaussian random variables [2].

<sup>2</sup>It should be noted that boldfaced letters are used to signify matrices and column vectors. The superscript H represents the conjugate (Hermitian) transpose of a matrix or vector.

To describe the RF-Multibeam ST-RAKE architecture, we follow the representation of [2], where a physical channel model is constructed from its constituent multipaths. Specifically,

$$\boldsymbol{H} = \sum_{l=1}^{L} \beta_l \boldsymbol{a}(\theta_{R,l}) \boldsymbol{a}^H(\theta_{T,l}), \qquad (1)$$

where L is the number of multipaths and  $\beta_l$  is the complex gain associated with the  $l^{th}$  multipath.  $\theta_{T,l}$ and  $\theta_{R,l}$  are the phase progressions associated with the angles of transmission at the transmitter and incidence at the receiver respectively for the  $l^{th}$  multipath, and  $\boldsymbol{a}(\theta) = [1, e^{-j\theta}, \ldots, e^{-j(N-1)\theta}]^T$ . The multibeam matrix  $\boldsymbol{M}$  is an  $N \times B$  matrix given by

$$(\boldsymbol{M})_{p,q} = e^{-j(p-1)\theta_{M,q}}, \tag{2}$$

where B is the number of beams formed by the multibeam matrix and  $\theta_{M,q}$  is the phase progression associated with the  $q^{th}$  beam. The input  $\boldsymbol{X}$  is the vector  $[x_1, x_2, \dots, x_B]^T$, where  $x_1, x_2, \dots, x_B$  are the B orthogonal codes that are sent along the individual beams.

Figure 3: Communication-theoretic description of the RF-Multibeam Spatio-Temporal RAKE architecture. While the transmitter and the receiver are typically co-located for radar, and share most of their resources in the RF-Multibeam ST-RAKE architecture, they are represented separately in this figure as is typically done in communication systems.

Ignoring channel noise, the final output  $\boldsymbol{Y}$  is given by

$$\begin{aligned}
\boldsymbol{Y} &= \boldsymbol{M}^{H} \boldsymbol{H} \boldsymbol{M} \boldsymbol{X} \\
&= \boldsymbol{M}^{H} \boldsymbol{H} \sum_{m=1}^{B} x_{m} \boldsymbol{a}(\theta_{M,m}) \\
&= \boldsymbol{M}^{H} \sum_{l=1}^{L} \sum_{m=1}^{B} \beta_{l} \boldsymbol{a}(\theta_{R,l}) x_{m} AF_{v}(\theta_{T,l} - \theta_{M,m}), \\
\Rightarrow \ (\boldsymbol{Y})_{n} &= \sum_{l=1}^{L} \sum_{m=1}^{B} \beta_{l} x_{m} AF_{v}(\theta_{T,l} - \theta_{M,m}) AF_{v}(\theta_{M,n} - \theta_{R,l}).
\end{aligned} \tag{3}$$

$AF_{v}(\theta) = 1 + e^{j\theta} + \cdots + e^{j(N-1)\theta}$  is the complex-voltage array factor. The goal of the ST-RAKE radar is to determine the complex multipath gains  $\beta_{l}$ . From (3), it is easy to see that  $\beta_{l}$  is determined by correlating  $(\boldsymbol{Y})_{n}$  with  $x_{m}$ , with n and m chosen such that  $\theta_{R,l} = \theta_{M,n}$  and  $\theta_{T,l} = \theta_{M,m}$ . The bank of correlators in the ST-RAKE architecture performs these correlations for all transmitted codes m = 1..B and all receive beams n = 1..B.
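The following is a minimal numerical sketch of (1)-(3), not the implementation used in this work. It assumes a 4-antenna, 4-beam system whose beam phase progressions lie on a uniform grid (so the beams are mutually orthogonal), two hypothetical multipaths that depart and arrive exactly on beam directions, and length-4 Walsh-Hadamard codes with all paths falling in the same range bin. All names (`steering`, `paths`, `estimate`) are illustrative.

```python
import cmath
import math

N = 4  # number of antennas (assumed)
B = 4  # number of beams (assumed)

def steering(theta):
    """Column vector a(theta) = [1, e^{-j*theta}, ..., e^{-j*(N-1)*theta}]."""
    return [cmath.exp(-1j * k * theta) for k in range(N)]

def matvec(A, x):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Beam phase progressions on a uniform grid (assumption: orthogonal beams).
theta_M = [2 * math.pi * q / N for q in range(B)]

# Multibeam matrix M (N x B), eq. (2): column q is a(theta_M[q]).
M = [[cmath.exp(-1j * p * theta_M[q]) for q in range(B)] for p in range(N)]
# Conjugate transpose M^H (B x N).
MH = [[M[p][q].conjugate() for p in range(N)] for q in range(B)]

# Hypothetical multipaths: (beta, transmit-beam index, receive-beam index).
paths = [(0.8 + 0.1j, 0, 0),  # LoS: leaves and returns on beam 0
         (0.3j, 1, 2)]        # multipath: leaves on beam 1, returns on beam 2

# Channel matrix H (N x N), eq. (1): sum of beta * a(theta_R) * a^H(theta_T).
H = [[0j] * N for _ in range(N)]
for beta, tx, rx in paths:
    a_t, a_r = steering(theta_M[tx]), steering(theta_M[rx])
    for i in range(N):
        for k in range(N):
            H[i][k] += beta * a_r[i] * a_t[k].conjugate()

# One orthogonal code per beam (rows of a 4x4 Walsh-Hadamard matrix).
codes = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
n_chips = len(codes[0])

# Y = M^H H M X, eq. (3), evaluated chip by chip.
Y = [matvec(MH, matvec(H, matvec(M, [codes[m][t] for m in range(B)])))
     for t in range(n_chips)]

# Correlate (Y)_n with x_m; normalize by code length and the N^2 array gain.
estimate = [[sum(Y[t][n] * codes[m][t] for t in range(n_chips))
             / (n_chips * N * N)
             for m in range(B)] for n in range(B)]
```

Here `estimate[n][m]` is the recovered gain for a path transmitted on beam `m` and received on beam `n`. Under the stated on-grid assumption, only the entries matching the two hypothetical paths are non-zero; off-grid angles would leak into neighbouring entries through the array-factor sidelobes.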

The architecture is called an RF-multibeam spatio-temporal RAKE because it is inspired by the RAKE-receiver architecture for communication in multipath-ridden channels [3]. The original RAKE architecture was proposed for single-antenna systems and employed "RAKE fingers" only in the temporal domain to coherently combine different multipaths. In the proposed transceiver architecture, the multiple beams constitute RAKE fingers in the spatial domain, and the correlators that sequentially correlate the incoming signal with delayed versions of the code templates to determine the target's distance can be thought of as RAKE fingers in the temporal domain<sup>3</sup>.

<sup>3</sup>Indeed, hardware and power resources permitting, multiple correlators for each code and beam can be implemented that *simultaneously* correlate the received signal with different delayed versions of the code template. This would set up multiple simultaneous RAKE fingers in the temporal domain.

While the RF-multibeam spatio-temporal RAKE architecture has been illustrated here in the context of pulse-based radar, it may be extended to continuous-wave (CW) radar systems as well. The main advantages of pulse-based radar systems arise from their time-gated nature. The isolation between the transmit and receive sections is enhanced because of the different transmit and receive times. Multipath resolution is eased due to the difference in arrival times of the different LoS and multipath reflections. Finally, the isolated transmit and receive times imply that it is possible for the transmit and receive circuitry to share the same antenna array through T/R switches rather than expensive circulators. As a result, the focus will be on pulse-based radar systems suitable for the commercial vehicular-radar application space.



Figure 4: 2D RAKE receiver proposed for CDMA systems [4]. The slice corresponding to the  $i^{th}$  user is depicted. N antennas are employed and  $L_i$  multipaths are resolved for the  $i^{th}$  user.

The concept of a spatio-temporal RAKE has been proposed earlier for CDMA base-station receivers. In [4], the authors propose a 2D RAKE (Fig. 4), where, for each user, the multipaths are coherently combined by the implementation of multiple fingers, each employing beamforming tuned to the direction of arrival of that multipath and time-delayed correlation tuned to the delay of that multipath. When performed across multiple users, each with a unique spreading code, the architecture bears a strong resemblance to the proposed architecture. In the 2D RAKE, the settings of the beamformers and the time delays are determined by a channel-estimation block that determines the time and angle of arrival of each multipath. The presented radar architecture, in effect, tries to perform this channel estimation. The 2D RAKE is sometimes also called the beamformer RAKE. Other variants, such as the space-time maximal-ratio-combining RAKE, decoupled space-time RAKE, joint space-time RAKE and space-time eigen RAKE, have also been studied [5].

## 3 Code Requirements

The orthogonal waveforms or codes to be employed on each beam of the RF-Multibeam ST-RAKE architecture must satisfy certain criteria. A sharp autocorrelation profile is essential for maximum range resolution. From this point of view, Barker codes are optimal in terms of the peak-to-sidelobe ratio for any set of truncated coding sequences and hence are popular for radar [1]. In addition, the codes on the different beams must have low cross correlations *for all time shifts* to minimize interference between the codes. In practice, additional signal processing may be required on the received and correlated data to account for finite cross correlations.
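The two code requirements above can be illustrated with a short sketch. It evaluates the aperiodic autocorrelation of the length-13 Barker code and the shifted cross-correlation of two length-4 Walsh-Hadamard rows. The helper name `aperiodic_xcorr` and the choice of example sequences are assumptions made for illustration only.

```python
def aperiodic_xcorr(a, b, shift):
    """Sum of a[t] * b[t + shift] over the chips where both codes overlap."""
    return sum(a[t] * b[t + shift]
               for t in range(len(a)) if 0 <= t + shift < len(b))

# Length-13 Barker code: sharp autocorrelation peak with unit-level sidelobes.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
peak = aperiodic_xcorr(barker13, barker13, 0)
max_sidelobe = max(abs(aperiodic_xcorr(barker13, barker13, k))
                   for k in range(1, len(barker13)))

# Two Walsh-Hadamard rows: orthogonal when aligned, but not when shifted.
walsh_a = [1, 1, -1, -1]
walsh_b = [1, -1, -1, 1]
zero_shift = aperiodic_xcorr(walsh_a, walsh_b, 0)
worst_shifted = max(abs(aperiodic_xcorr(walsh_a, walsh_b, k))
                    for k in range(-3, 4) if k != 0)

print(peak, max_sidelobe)          # autocorrelation peak and largest sidelobe
print(zero_shift, worst_shifted)   # aligned and worst-case shifted cross-correlation
```

The Walsh-Hadamard pair shows why orthogonality at zero shift is not sufficient: echoes of different codes arriving with different delays can still interfere strongly, which is the motivation for requiring low cross-correlation at all time shifts.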



Figure 5: (a) Cross-correlation properties of modified bipolar Walsh-Hadamard sequences of length 16. The diagonal matrix used to modify the Walsh-Hadamard matrix has the following diagonal elements: -1 1 1 1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 (b) Auto-correlation properties of the modified Walsh-Hadamard sequences.

The design of appropriate code families for the RF-Multibeam ST-RAKE architecture is a topic that is beyond the scope of this project and is a worthy topic for future research. It is anticipated that the hardware implementation that is discussed later in this report will serve as a testbed for experimentation with different codes.

## 4 Baseband Implementation - Analog versus Digital



Figure 6: (a) A simplified single-antenna, single-correlator, pulse-based radar with analog baseband processing. (b) Single-antenna, single-correlator, pulse-based radar with digital baseband processing. (c) A single-antenna, pulse-based radar with multiple correlators for simultaneous scanning of multiple range bins.

Baseband signal processing in radar involves the correlation of the received and downconverted signal with a delayed version of the transmitted baseband template. This correlation must be performed, either sequentially using a single correlator (Fig. 6(a) and (b)) or in parallel using multiple correlators (Fig. 6(c)), for different delay values to search for targets at different distances. For each distance, or range bin, often multiple pulses must be transmitted, received, correlated and accumulated to achieve sufficient SNR. Once this data has been collected for different range bins, additional signal processing may also be required, such as background or clutter removal.
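As a rough illustration of the sequential search described above, the sketch below sweeps a single correlator across candidate delays and picks the range bin with the largest output. It is a noiseless, chip-rate, real-valued toy model; the length-7 Barker template, the echo delay and the echo gain are hypothetical values, and pulse accumulation and clutter removal are omitted.

```python
barker7 = [1, 1, 1, -1, -1, 1, -1]  # example template (assumption)

def range_bin_sweep(rx, template, n_bins):
    """Correlate rx with the template delayed by each candidate range bin."""
    outputs = []
    for delay in range(n_bins):
        acc = 0.0
        for t, chip in enumerate(template):
            if delay + t < len(rx):
                acc += rx[delay + t] * chip  # multiply ...
        outputs.append(acc)                  # ... and integrate, one bin at a time
    return outputs

# Hypothetical received signal: one echo of the template, delayed by 5 chips.
true_delay, echo_gain = 5, 0.5
rx = [0.0] * 20
for t, chip in enumerate(barker7):
    rx[true_delay + t] += echo_gain * chip

sweep = range_bin_sweep(rx, barker7, n_bins=14)
detected_bin = max(range(len(sweep)), key=lambda d: abs(sweep[d]))
```

In a parallel implementation (Fig. 6(c)) each iteration of the outer loop would correspond to a separate hardware correlator; in the sequential implementations (Fig. 6(a) and (b)) the same correlator is reused with a different template delay for each transmitted pulse.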

Tradeoffs in digital baseband design have been examined in the context of ultra-wideband impulse systems for other applications in the past [13], [14]. In this section, the problem of partitioning the baseband processing across the analog and digital domains is examined for single-antenna and phased-array radar systems for simplicity. However, the conclusions are applicable to the RF-multibeam ST-RAKE architecture as well. Analog baseband signal processing in radar (Fig. 6(a)) involves the use of an analog correlator - essentially a multiplier or mixer followed by an integrator. The resultant integrated signal is then digitized for further radar signal processing. On the other hand, digital baseband processing involves direct digitization of the downconverted signal (Fig. 6(b)), and all signal processing is done in the digital domain.

The advantage of some degree of analog pre-processing is that it alleviates some performance requirements of the ADC that follows it, such as speed and/or dynamic range. Digital signal processors are typically more flexible and reconfigurable than their analog counterparts. However, this advantage is not particularly valuable for radar, where the signal processing required is a simple matched filter. To quantify the trade-offs between the two approaches, it is necessary to determine the dynamic-range requirement for the vehicular-radar application space.

Table 1: FCC-mandated specifications for the vehicular-radar application space [15].

| Parameter    | Specification                                                         |
|--------------|-----------------------------------------------------------------------|
| Bandwidth    | Bandwidth contained between 22 and 29GHz                              |
|              | Center frequency $> 24.075$ GHz                                       |
|              | Signal bandwidth $> 20\%$ or 500MHz                                   |
| Signal level | Emissions $< -41.3$ dBm EIRP over 1MHz, 1ms average                   |
|              | Peak emissions $< 0$ dBm EIRP in 50MHz around peak-emission frequency |

Table 2: Application-specific performance requirements for the vehicular-radar application space [?].

| Parameter        | Requirement                                                       |
|------------------|-------------------------------------------------------------------|
| Range            | Minimum ranging distance $= R_{min} = 15$ cm                      |
|                  | Maximum ranging distance $= R_{max} = 30$ m                       |
| Range Resolution | 5cm                                                               |
| Target Size      | Minimum RCS = $\sigma_{min} = 0.1 \text{m}^2$ (plastic 1/2" pipe) |
|                  | Maximum RCS = $\sigma_{max} = 100 \text{m}^2$ (automobile)        |

### 4.1 22-29GHz Commercial Vehicular-Radar Application Space - Dynamic-Range Analysis

The Federal Communications Commission (FCC) has opened up 7GHz of bandwidth from 22 to 29GHz for the deployment of UWB vehicular-radar sensors [15], intended for driver-assistance functions such as blind-spot detection, parking assistance and pre-crash detection. Single-antenna radar sensors for these applications have been developed in SiGe processes [1]. The FCC-mandated specifications and application-specific performance requirements are summarized in Tables 1 and 2.

The ranging and target-size specifications, in conjunction with the radar equation, determine the dynamic range required for the application. The radar equation enables us to compute the power received by the radar for a given transmitted power, target distance, target RCS, frequency of operation and radar antenna gain, and is given by

$$P_{RX} = \frac{P_{TX}G^2\lambda^2\sigma}{(4\pi)^3R^4}, \tag{4}$$

where R is the distance of the LoS target,  $\sigma$  is its RCS,  $\lambda$  is the free-space wavelength corresponding to the frequency of operation, G is the radar antenna gain,  $P_{TX}$  is the transmitted power in the continuous-wave sense and  $P_{RX}$  is the received power. The distance R can vary from 15cm to 30m, and the RCS can vary from  $0.1\text{m}^2$  to  $100\text{m}^2$ . Assuming that the dynamic range is determined by the maximum- and minimum-possible received-signal levels, this results in a dynamic-range requirement of 122dB! In reality, the radar equation is not valid for large objects that are very close to the radar, as it assumes spherical-wave propagation and a target size that is small compared to the target distance. Therefore, a more realistic assumption is that the maximum received signal at the radar is equal to the transmitted power ( $P_{TX}$ ) due to complete reflection from a large, nearby target. The minimum received signal is produced by the presence of the smallest object ( $\sigma_{min}=0.1\text{m}^2$ ) at the maximum distance ( $R_{max}=30\text{m}$ ), and can be determined to be  $9 \times 10^{-14} \times P_{TX}$  assuming  $\lambda=12\text{mm}$  for a center frequency of 25GHz and an antenna gain of 5dBi. This results in a dynamic-range requirement of approximately 130dB.
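The two dynamic-range figures in this paragraph can be reproduced from (4) with a few lines of arithmetic. The sketch below uses only the values stated in the text and in Table 2; the function and variable names are illustrative.

```python
import math

def rx_over_tx(gain_dbi, wavelength_m, rcs_m2, range_m):
    """Ratio P_RX / P_TX from the radar equation (4)."""
    g = 10 ** (gain_dbi / 10)  # antenna gain as a linear ratio
    return (g ** 2 * wavelength_m ** 2 * rcs_m2
            / ((4 * math.pi) ** 3 * range_m ** 4))

def db(x):
    return 10 * math.log10(x)

GAIN_DBI = 5.0        # antenna gain
WAVELENGTH_M = 0.012  # 12 mm at 25 GHz

# Naive estimate: largest target at R_min versus smallest target at R_max.
strongest = rx_over_tx(GAIN_DBI, WAVELENGTH_M, 100.0, 0.15)
weakest = rx_over_tx(GAIN_DBI, WAVELENGTH_M, 0.1, 30.0)
naive_dr_db = db(strongest / weakest)

# More realistic estimate: the strongest return is capped at P_TX itself.
realistic_dr_db = -db(weakest)

print(f"weakest return / P_TX = {weakest:.2e}")
print(f"naive dynamic range     = {naive_dr_db:.1f} dB")
print(f"realistic dynamic range = {realistic_dr_db:.1f} dB")
```

The naive figure depends only on the ratios $(R_{max}/R_{min})^4$ and $\sigma_{max}/\sigma_{min}$, so the antenna gain and wavelength cancel; they matter only for the second, absolute estimate.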

The design of circuits that exhibit such a large dynamic range, whether they are analog or digital in nature, requires a large power consumption. In order to alleviate the dynamic-range requirement on the receiver, a radar that uses a single receiving correlator and sequentially searches different range bins using transmitted pulses of different amplitudes and different receiver front-end gains is preferred (Figs. 6(a) and (b) over Fig. 6(c)). While the overall scan time is compromised in this approach, the power consumption, and even the energy expended for a complete scan, are significantly reduced due to the reduced dynamic-range requirement. Once the range bin is fixed, the dynamic range is only determined by the maximum and minimum target sizes, and hence is as low as  $\frac{\sigma_{max}}{\sigma_{min}} = 30 \text{dB}$<sup>4</sup>.

Table 3 derives RF system specifications for a 22-29GHz phased-array radar for vehicular applications assuming pulsed operation. The signal bandwidth is set by the desired range resolution. The number of elements is determined from the desired beamwidth, which is chosen to distinguish a typical automobile at the maximum ranging distance. Based on the FCC specifications for the average- and peak-power levels, the duty cycle and continuous-wave EIRP are determined. The continuous-wave output-power requirement per PA is relaxed in comparison to the system EIRP due to the presence of multiple elements. In order to determine the worst-case SNR and worst-case scan time, a single-channel noise figure of 8dB is assumed based on the capabilities of current CMOS technology in the 22-29GHz frequency range. The worst-case SNR, achieved for the presence of the smallest target at the maximum distance, is determined to be -31.2dB and derives benefit from the presence of multiple array elements. In order to improve this SNR to a threshold of +10dB, multiple pulses must be sent, received, correlated and accumulated, which results in a worst-case scan time of approximately 1ms. It should be mentioned that 32 elements are difficult to integrate onto a monolithic CMOS radar chip. It is likely that practical systems will use multiple scalable sub-arrays to achieve the desired resolution, with each sub-array integrating 4 or 8 elements onto a single chip.
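The chain of calculations in Table 3 can be followed step by step in the sketch below. It uses the formulas listed in the table and the inputs from Tables 1 and 2; variable names are illustrative. Two choices are assumptions of this sketch: the element count and duty cycle are set to the rounded values listed in Table 3 (32 and 0.4%), while the path loss is left unrounded, so the last two results differ from the tabulated ones by roughly half a dB and about 0.1ms.

```python
import math

def db(x):
    return 10 * math.log10(x)

C_LIGHT = 3e8               # m/s
RANGE_RESOLUTION_M = 0.05   # Table 2
R_MAX_M = 30.0              # Table 2
CAR_WIDTH_M = 2.0           # typical automobile dimension (Table 3)
SIGMA_MIN_M2 = 0.1          # Table 2
WAVELENGTH_M = 0.012        # 25 GHz center frequency
ANTENNA_GAIN_DBI = 5.0
NOISE_FIGURE_DB = 8.0

bandwidth_hz = C_LIGHT / (2 * RANGE_RESOLUTION_M)
t_pulse_s = 1 / bandwidth_hz
beamwidth_rad = CAR_WIDTH_M / R_MAX_M
beamwidth_deg = math.degrees(beamwidth_rad)
n_from_formula = 2 / beamwidth_rad + 1  # about 31
n_elements = 32                          # value listed in Table 3

# Duty cycle that meets the average and peak emission limits simultaneously.
duty_exact = 10 ** ((-41.3 + db(50) - 0) / 10)
duty_cycle = 0.004                       # rounded value used in Table 3

eirp_dbm = -41.3 + db(bandwidth_hz / 1e6) + db(1 / duty_cycle)
p_pa_dbm = eirp_dbm - 2 * db(n_elements) - ANTENNA_GAIN_DBI

g_lin = 10 ** (ANTENNA_GAIN_DBI / 10)
path_loss_db = db(g_lin ** 2 * WAVELENGTH_M ** 2 * SIGMA_MIN_M2
                  / ((4 * math.pi) ** 3 * R_MAX_M ** 4))

signal_dbm = p_pa_dbm + 2 * db(n_elements) + path_loss_db
noise_dbm = -174 + db(bandwidth_hz) + NOISE_FIGURE_DB - db(n_elements)
snr_min_db = signal_dbm - noise_dbm

# Pulses needed to reach +10 dB, each occupying T_pulse / duty_cycle of time.
scan_time_s = 10 ** ((abs(snr_min_db) + 10) / 10) * t_pulse_s / duty_cycle

print(f"BW = {bandwidth_hz / 1e9:.1f} GHz, T_pulse = {t_pulse_s * 1e12:.0f} ps")
print(f"EIRP = {eirp_dbm:.1f} dBm, P_PA = {p_pa_dbm:.1f} dBm")
print(f"SNR_min = {snr_min_db:.1f} dB, scan time = {scan_time_s * 1e3:.2f} ms")
```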

Under these assumptions, the digital-baseband approach requires an ADC with 30dB of dynamic range and a sampling rate of 6GSa/s. In addition, a digital matched filter with the same dynamic range and speed is also required. In the analog-baseband approach, while the analog correlator must demonstrate a dynamic range of 30dB, the speed requirement on the ADC is reduced by a factor equal to the duty cycle. This is because high-speed correlation is performed in the analog domain, and the ADC only needs to sample and digitize the integrated value after each pulse<sup>5</sup>. To determine the power consumption of these two approaches, the following sections quantify the power consumption of each of these blocks given their performance requirements.

### 4.2 ADC Power Consumption

To determine the power consumption of a high-speed ADC with a given sampling rate and a required number of bits, a survey of state-of-the-art high-speed ADCs is useful. Table 4 depicts GSa/s ADCs with the highest figures of merit based on a survey recently published by B. Walden [16]. The figure of merit ( $FOM_{ADC}$ ) is defined as

$$FOM_{ADC} = \frac{2^{ENOB} f_{sample}}{P_{ADC}},\tag{5}$$

where  $f_{sample}$  is the sampling speed in Hertz and  $P_{ADC}$  is the power consumption in watts. The required number of effective ADC bits (*ENOB*) is determined from *DR*, the required dynamic range, using the commonly-used formula

<sup>4</sup>This implies that 130-30=100dB of PA output-power control and receiver-gain control are required to account for the signal-level differences across the range bins.

<sup>5</sup>As was mentioned earlier, multiple pulses are often required to achieve sufficient SNR. This accumulation can be done either in the digital domain or in the analog domain. If performed in the analog domain, the speed requirement of the ADC would be further reduced by a factor equal to the number of pulses accumulated.

Table 3: RF performance specifications for a phased-array radar for 22-29GHz vehicular applications based on FCC specifications assuming pulsed-sinusoid operation.

| RF Performance | Method of Calculation | Spec. |
|----------------|-----------------------|-------|
| Bandwidth ($BW$) | $\frac{3 \times 10^8 \text{m/s}}{2 \times \text{Range Resolution (5cm)}}$ | 3GHz |
| Pulse Width ($T_{pulse}$) | $\frac{1}{BW}$ | 333ps |
| Beamwidth | $\frac{\text{Typical Automobile Dimension (2m)}}{\text{Maximum Range (30m)}} \times \frac{180^\circ}{\pi}$ | $4^\circ$ |
| Number of Array Elements ($N$) | $\frac{2}{\text{Beamwidth in radians}} + 1$ | 32 |
| Optimal Duty Cycle | $10^{\frac{-41.3\text{dBm} + 10\log_{10}50 - 0\text{dBm}}{10}}$ | 0.4% |
| Continuous-Wave TX EIRP ($P_{EIRP}$) | $-41.3\text{dBm} + 10 \log_{10} \frac{BW}{1\text{MHz}} + 10 \log_{10} \frac{1}{0.4\%}$ | 17.5dBm |
| Antenna Gain ($G$) | Based on typical antennas | 5dBi |
| Continuous-Wave Output Power per PA ($P_{PA}$) | $P_{EIRP} - 20\log_{10}N - G$ | -17.6dBm |
| RX Single-Channel Noise Figure ($NF$) | Based on current CMOS technology | 8dB |
| Worst-case free-space path loss ($P_{path \ loss}$) | $\frac{G^2\lambda^2\sigma_{min}}{(4\pi)^3R_{max}^4}$ | -130dB |
| Worst-case SNR ($SNR_{min}$) | $(P_{PA} + 20\log_{10}N + P_{path\ loss}) - (-174\text{dBm} + 10\log_{10}BW + NF - 10\log_{10}N)$ | -31.2dB |
| Worst-case scan time (Desired SNR is 10dB) | $10^{\frac{\lvert SNR_{min} \rvert + 10\text{dB}}{10}} \times T_{pulse} \times \frac{1}{\text{Duty Cycle (0.4\%)}}$ | 1.1ms |

| Vendor | Tech. | $f_{sample}$ (GSa/s) | $n_{bits}$ | SNDR (dB) | ENOB | $P_{ADC}$ (W) | $FOM_{ADC}$ (TSa/J) |
|--------|-------|----------------------|------------|-----------|------|---------------|---------------------|
| Nortel (P. Schvas) | $0.13\mu m$ SiGe | 22 | 5 | 22.8 | 3.8 | 3 | 0.102 |
| Agilent Labs (Poulton et al.) | $0.18\mu m$ CMOS | 20 | 8 | 29.5 | 4.9 | 9 | 0.067 |
| HP (Schiller, Byrne) | Bipolar Hybrid | 4 | 8 | 41.5 | 6.9 | 39 | 0.012 |
| HP/Rockwell (Poulton, Wang) | GaAs HBT | 4 | 6 | 33.1 | 5.5 | 5.7 | 0.032 |
| Rockwell (RAD008) | GaAs HBT | 3 | 8 | 46 | 7.7 | 5.5 | 0.111 |
| Atmel (AT84AS008) | npn bipolar | 2.2 | 10 | 48 | 8 | 4.2 | 0.134 |
| Atmel (AT84AS004) | npn bipolar | 2 | 10 | 51 | 8.5 | 6.5 | 0.111 |
| Rockwell (RSC-ADC080S) | GaAs HBT | 2 | 8 | 37 | 6.2 | 5 | 0.029 |
| Maxim (Max 108) | Bipolar | 1.5 | 8 | 46.9 | 7.8 | 5.25 | 0.064 |
| Atmel (TS83102) | npn bipolar | 1.4 | 10 | 47.5 | 7.9 | 4.6 | 0.014 |
| Teranetics (S. Gupta et al.) | $0.13\mu m$ CMOS | 1 | 11 | 55 | 9.2 | 0.25 | 2.3 |
| Rockwell (RAD010) | GaAs HBT | 1 | 10 | 55 | 9.2 | 5 | 0.115 |

Table 4: Survey of moderate-dynamic-range ADCs with sampling rates larger than 1 GSa/s (courtesy [16]). *ENOB* is derived from (6).

$$ENOB \approx \frac{10 \log_{10} DR}{6}. \tag{6}$$

It is noteworthy that

$$P_{ADC} = \frac{2^{\frac{10\log_{10}DR}{6}} f_{sample}}{FOM_{ADC}} \approx \frac{\sqrt{DR} \times f_{sample}}{FOM_{ADC}}. \tag{7}$$

In other words, the power consumption of an ADC is proportional to the square root of the required dynamic range. It should be noted that the assumption that the power consumption doubles with every extra bit of precision is only true for ADCs with moderate dynamic range. For ADCs with a dynamic range greater than 75dB, the dynamic range tends to be limited by thermal noise rather than quantization noise, which results in a quadrupling of power for every extra bit [17]. The ADC power consumption then becomes proportional to the dynamic range, rather than to its square root.
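As a hedged illustration of how (5)-(7) are used, the sketch below estimates the ADC power for the 30dB, 6GSa/s digital-baseband case and for the analog-baseband case in which the ADC rate is scaled by the 0.4% duty cycle. The figure of merit is an assumption of this sketch: it takes the best entry of Table 4 (2.3 TSa/J) and further assumes that this value also holds at the much lower sampling rate. The resulting numbers are therefore indicative only and are not results of this report.

```python
import math

def enob_from_dr_db(dr_db):
    """Effective number of bits from dynamic range in dB, eq. (6)."""
    return dr_db / 6.0

def adc_power_w(dr_db, f_sample_hz, fom_sa_per_j):
    """ADC power from the figure-of-merit definition, eqs. (5) and (7)."""
    return 2 ** enob_from_dr_db(dr_db) * f_sample_hz / fom_sa_per_j

DR_DB = 30.0           # per-range-bin dynamic range
F_NYQUIST_HZ = 6e9     # sampling rate for the digital-baseband approach
DUTY_CYCLE = 0.004     # Table 3
FOM_ASSUMED = 2.3e12   # Sa/J; best entry in Table 4 (assumption)

p_digital_w = adc_power_w(DR_DB, F_NYQUIST_HZ, FOM_ASSUMED)
p_analog_w = adc_power_w(DR_DB, F_NYQUIST_HZ * DUTY_CYCLE, FOM_ASSUMED)

# Check of the approximation 2^(10*log10(DR)/6) ~ sqrt(DR) used in (7).
exact_factor = 2 ** enob_from_dr_db(DR_DB)
sqrt_factor = math.sqrt(10 ** (DR_DB / 10))

print(f"digital-baseband ADC: {p_digital_w * 1e3:.1f} mW")
print(f"analog-baseband ADC:  {p_analog_w * 1e3:.3f} mW")
print(f"2^ENOB = {exact_factor:.2f}, sqrt(DR) = {sqrt_factor:.2f}")
```

Because (7) is linear in $f_{sample}$, the ratio between the two ADC power estimates is simply the inverse of the duty cycle, independent of the assumed figure of merit.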

#### 4.3 Analog Correlator Power Consumption

Fig. 7 depicts the schematic of an analog correlator. The downconverted received signal is multiplied with the code template in a current-commutating mixer and the resultant signal (which is in the current domain) is dumped onto an integrating capacitor. The integrated voltage is sampled by an ADC after each pulse and a shunt switch resets the correlator in preparation for the next pulse. It should be noted that this schematic assumes that the accumulation of multiple pulses to achieve high SNR is performed in the digital domain. If this accumulation is to be performed in the analog domain to further reduce the ADC's speed requirement, switches may be included in series with the integrating capacitor. The switches can be used to disconnect the capacitor from the active devices in between pulses, so that the capacitor voltage does not decay due to the finite output resistance of the circuit. The shunt switch would then reset the capacitor only after sufficient SNR has been achieved in preparation for the next range bin.



Figure 7: An analog correlator.

A useful definition of the dynamic range of the correlator is the ratio of the input-referred -1dB compression point to the input-referred noise level. In order to determine the input-referred -1dB compression point, a "saturation-model" is assumed for the input differential pair with a linear gain of  $g_m$  and a differential threshold voltage of  $V_{diff,th}$  (Fig. 7)<sup>6</sup>. For a saturation block, a sinusoidal differential input voltage of amplitude A produces a differential output current with a fundamental component given by

$$(I_2 - I_1)_{fund} = \frac{g_m A}{\pi} \left( 2\sin^{-1}\left(\frac{V_{diff,th}}{A}\right) + \frac{2V_{diff,th}}{A} \sqrt{1 - \frac{V_{diff,th}^2}{A^2}} \right).$$
(8)

This fundamental component is compressed by 1dB for an input amplitude of

$$A_{-1dB} \approx 1.25 V_{diff,th} = 1.25 \frac{I_{bias}}{g_m}.$$
(9)

<sup>6</sup>$g_{m}$ is the transconductance of each device and $V_{diff,th} = \frac{I_{bias}}{g_{m}}$.
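
The factor of 1.25 in (9) can be checked numerically. The sketch below normalizes the fundamental component in (8) to the linear response $g_m A$ and bisects for the amplitude at which the gain has dropped by 1dB. The function names and the bisection bounds are illustrative choices.

```python
import math

def fundamental_gain_ratio(a_over_vth):
    """Gain of the fundamental in (8), normalized to the linear gain g_m.

    a_over_vth is A / V_diff,th. Below the threshold the block is linear.
    """
    if a_over_vth <= 1.0:
        return 1.0
    x = 1.0 / a_over_vth
    return (2.0 * math.asin(x) + 2.0 * x * math.sqrt(1.0 - x * x)) / math.pi

def compression_amplitude(compression_db=1.0, lo=1.0, hi=10.0, iters=60):
    """Bisect for the A / V_diff,th at which the gain drops by compression_db."""
    target = 10.0 ** (-compression_db / 20.0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fundamental_gain_ratio(mid) > target:
            lo = mid   # gain still above target: move to larger amplitudes
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratio = compression_amplitude()
print(f"A_-1dB / V_diff,th = {ratio:.3f}")
```

The root lies close to the value of 1.25 used in (9).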

It is assumed that compression is dominated by the input swing rather than the swing at the output nodes. To determine the noise performance, assume that the code template that the input is correlated with is a pulse of width $T_{pulse}$, as shown in Fig. 7. For the duration of the pulse, the commutating transistors switch completely, resulting in the equivalent circuit depicted on the right in Fig. 7. The differential short-circuit output noise current, obtained by replacing the capacitor with a short circuit, is given by $\overline{I_{out,n}^2} = \frac{\overline{I_{dn,n1}^2} + \overline{I_{dn,n2}^2} + \overline{I_{dn,p1}^2} + \overline{I_{dn,p2}^2}}{4} = 4kT\gamma g_{d0}\Delta f$, assuming the pMOS current sources are sized to have an identical $g_{d0}$ to the input nMOS transistors for simplicity<sup>7</sup>. The nMOS current source does not contribute any noise in differential mode, and the switching noise of the commutating transistors is ignored under the assumption of hard-switching square-wave input pulses. Then, we have

$$V_{out,n}(t) = \frac{1}{C} \int_{-\infty}^{t} I_{out,n}(\tau) \times V_{pulse}(\tau) d\tau, \qquad (10)$$

where  $V_{pulse}(t)$  is the code template with a normalized pulse amplitude of 1. The ADC samples the integrated signal at the end of the pulse ( $t = T_{pulse}$ ). In determining the mean-square value of this sample, we have

$$\begin{aligned}
\overline{V_{sample}^{2}} &= E\left[V_{out,n}^{2}(T_{pulse})\right] \\
&= E\left[\frac{1}{C^{2}}\int_{-\infty}^{T_{pulse}}I_{out,n}(t_{1}) \times V_{pulse}(t_{1})dt_{1}\int_{-\infty}^{T_{pulse}}I_{out,n}(t_{2}) \times V_{pulse}(t_{2})dt_{2}\right] \\
&= \frac{1}{C^{2}}E\left[\int_{-\infty}^{T_{pulse}}\int_{-\infty}^{T_{pulse}}I_{out,n}(t_{1})I_{out,n}(t_{2})V_{pulse}(t_{1})V_{pulse}(t_{2})dt_{1}dt_{2}\right] \\
&= \frac{1}{C^{2}}\int_{-\infty}^{T_{pulse}}\int_{-\infty}^{T_{pulse}}4kT\gamma g_{d0}\delta(t_{1}-t_{2})V_{pulse}(t_{1})V_{pulse}(t_{2})dt_{1}dt_{2} \\
&= \frac{4kT\gamma g_{d0}}{C^{2}}T_{pulse}.
\end{aligned}$$
(11)

To refer this noise to the input, the sampled noise level must be divided by the gain applied to an input pulse, which is $\frac{g_m T_{pulse}}{2C}$. Therefore, the input-referred noise level becomes

$$\overline{V_{in,n}^2} = \frac{16kT\gamma g_{d0}}{g_m^2} \frac{1}{T_{pulse}}.$$
(12)

Interestingly, this result indicates that the correlator's noise performance is identical to that of a differential pair with pMOS current-source loads and  $\frac{1}{T_{pulse}}$  as the noise-equivalent bandwidth. The dynamic range may now be written as

$$DR = \frac{A_{-1dB}^2}{\overline{V_{in,n}^2}} \approx \frac{1.56I_{bias}^2 T_{pulse}}{16kT\gamma g_{d0}} \approx \frac{0.1I_{bias}V_{od}T_{pulse}}{kT\gamma},\tag{13}$$

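as, in general, $g_{d0}$ is equal to $\frac{2I_{ds}}{V_{od}}$, where $V_{od}$ is the overdrive voltage $(=V_{gs}-V_{th})$, and each input device carries $I_{ds} = \frac{I_{bias}}{2}$, so that $g_{d0} = \frac{I_{bias}}{V_{od}}$.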
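
To make (13) concrete, the sketch below evaluates the dynamic range for a given bias point and inverts the expression for the bias current needed to reach a target dynamic range. The function names and the temperature of 300K are assumptions that are not stated in the report; the example bias point (4.3mA, roughly 200mV of overdrive, a 200ps pulse and $\gamma = 2/3$) is the one quoted for the simulations of Fig. 8.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K
PREFACTOR = 1.25 ** 2 / 16.0  # 1.56/16 in (13), approximately 0.1

def correlator_dr(i_bias_a, v_od_v, t_pulse_s, gamma, temp_k=300.0):
    """Linear dynamic range from (13), using g_d0 = I_bias / V_od."""
    kt = K_BOLTZMANN * temp_k  # temp_k = 300 K is an assumed value
    return PREFACTOR * i_bias_a * v_od_v * t_pulse_s / (kt * gamma)

def bias_current_for_dr(dr_db, v_od_v, t_pulse_s, gamma, temp_k=300.0):
    """Invert (13) for the bias current that achieves dr_db."""
    kt = K_BOLTZMANN * temp_k
    dr_linear = 10.0 ** (dr_db / 10.0)
    return dr_linear * kt * gamma / (PREFACTOR * v_od_v * t_pulse_s)

dr = correlator_dr(4.3e-3, 0.2, 200e-12, 2.0 / 3.0)
dr_db = 10.0 * math.log10(dr)
print(f"theoretical DR = {dr_db:.1f} dB")
print(f"I_bias for that DR = {bias_current_for_dr(dr_db, 0.2, 200e-12, 2.0 / 3.0) * 1e3:.2f} mA")
```

Because the dynamic range is linear in $I_{bias}$, every additional 10dB of dynamic range costs a tenfold increase in bias current at a fixed overdrive and pulse width.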


 $<sup>^{7}\</sup>gamma$  is the device excess noise factor and  $g_{d0}$  is the channel conductance at zero drain-source bias.



Figure 8: (a) SpectreRF simulations of an analog correlator implemented in IBM's 8RF  $0.13\mu$ m CMOS process. The schematic of the correlator is provided in Fig. 7. The pulse width is 200ps and C=1pF.  $\gamma$  is determined to be 2/3 from the process models. (b) Simulated dynamic range computed as the ratio of the output-referred -1dB compression point to the RMS sampled output noise voltage.

Equation (13) captures several trade-offs in the design of analog signal-processing elements. Firstly, it is clear that to support a larger dynamic range, a linearly-larger power consumption is required. Secondly, as the pulse width decreases, which corresponds to an increase in the signal bandwidth, a larger power consumption is required to maintain the same dynamic range. This is due to the greater amount of noise that is integrated over the larger signal bandwidth. Finally, the overdrive voltage  $V_{od}$  can be related to available supply voltage as

$$nV_{od} + \text{Output Swing Budget} = V_{dd}, \tag{14}$$

where n is the number of devices that are vertically stacked in the circuit (four for the correlator depicted in Fig. 7). As a result, with the reducing supply voltages that result from the scaling of technology to lower process nodes, the overdrive voltages reduce and a larger current consumption is required to maintain the same dynamic range.

To verify this theoretical formulation, simulations are performed in SpectreRF using transistors from IBM's 8RF 0.13µm CMOS process across different bias levels (Fig. 8). The device sizes are indicated in Fig. 7 for  $I_{bias}$ =4.3mA, and the sizes for other current levels are scaled linearly to maintain constant overdrive levels of approximately 200mV for each transistor. A pulse width of 200ps is employed which corresponds to a signal bandwidth of approximately 5GHz. C=1pF, V<sub>dd</sub>=1.5V and  $\gamma$  is approximately 2/3 based on the process models. The RMS value of the sampled output noise voltage is determined through root-mean-square-averaging across several transient-noise-simulation runs. A good agreement is seen between theory and simulations. There is a constant multiplicative difference of approximately 5dB between the theory and simulations in both the output-referred -1dB compression point and dynamic range, and this is attributed to capacitive parasitics. However, the linear dependence of dynamic range on power consumption is indeed observed.

It should be noted that flicker noise has been ignored in the presented analysis *and* simulations. However, it can be a significant factor that increases the noise level and limits the dynamic range, especially in deep-submicron processes. The incorporation of flicker noise into the presented formulation is a topic for future investigation.

#### 4.4 Digital Correlator Power Consumption

Fig. 9 depicts the block diagram of a typical digital correlator, also called a matched filter. A bank of $n_{bits}$-wide shift registers is used to time-shift the received signal. As is the case with ADCs, $n_{bits}$ is determined from the dynamic-range requirement. The number of shift registers is equal to the code length $(2^m)$. Each register is then multiplied by the correlator coefficients, which represent the code sequence. Assuming two-level codes for simplicity, these coefficients would be 11...1 or 00...0. Hence, the multipliers can be implemented simply using multiplexers. The various multiplied values are then added using a binary tree of *n*-bit adders.



Figure 9: Block diagram of a typical digital correlator.

To determine the power consumption of the shift-register portion, the analysis of [18] is followed. It is assumed that each digital gate provides a unit load to its corresponding driver. In each register, on average, for random data, $\frac{n_{bits}}{2}$ bits flip their values every clock cycle. The fan-out of each bit is 2 because it drives the next register and a load in the correlation network. In addition, the clock flips twice every cycle. Therefore,

$$P_{shift} \propto f_{CLK} V_{dd}^2 \left( 2^m \times \frac{n_{bits}}{2} \times 2 + 2^m \times n_{bits} \times 2 \right) \propto n_{bits} 2^m f_{CLK} V_{dd}^2.$$
(15)

The first term in the equation above represents the $CV^2$ switching power associated with the $\frac{n_{bits}}{2}$ bits that flip every clock cycle on average in each register. The second term represents the switching power associated with the clock.

The power consumption in the multipliers is ignored owing to the simplifying assumption of two-level codes. In the adder tree, the first-level adders are $2^{m-1}$ in number and $n_{bits}$ wide. The second-level adders are $2^{m-2}$ in number and $(n_{bits} + 1)$ wide, and so on. Ignoring the increase in adder width and assuming that the power consumption of an *n*-bit adder is proportional to *n*, we have

$$P_{adder} \propto f_{CLK} V_{dd}^2 \times n_{bits} \times \left(2^{m-1} + 2^{m-2} + \dots + 1\right) \propto n_{bits} \left(2^m - 1\right) f_{CLK} V_{dd}^2.$$
(16)
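
The counting arguments behind (15) and (16) can be sketched as follows. The function names are illustrative, the results are in relative units of switched load, and the `grow_width` option is an addition that shows how much the fixed-width simplification in (16) leaves out.

```python
def shift_register_switching(m, n_bits):
    """Relative switched load of the shift registers: the two terms of (15)."""
    n_regs = 2 ** m
    data_term = n_regs * (n_bits / 2) * 2   # n_bits/2 flips per register, fan-out of 2
    clock_term = n_regs * n_bits * 2        # the clock toggles twice per cycle
    return data_term + clock_term

def adder_tree_bits(m, n_bits, grow_width=False):
    """Adder bit-widths summed over a binary tree with 2**m inputs.

    With grow_width=False this equals n_bits * (2**m - 1), as in (16).
    With grow_width=True each level is one bit wider than the previous one.
    """
    total = 0
    for level in range(1, m + 1):
        n_adders = 2 ** (m - level)
        width = n_bits + (level - 1 if grow_width else 0)
        total += n_adders * width
    return total

m, n_bits = 3, 5  # code length of 8 and 5 bits, the values used in Section 4.5
print("shift registers:", shift_register_switching(m, n_bits))
print("adder tree, fixed width:", adder_tree_bits(m, n_bits))
print("adder tree, growing width:", adder_tree_bits(m, n_bits, grow_width=True))
```

For short codes the growth in adder width adds only a small correction, which is consistent with ignoring it in (16).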

Given these formulations for the power consumptions of the shift-register and adder portions, it becomes possible to define a normalized Figure of Merit for digital matched filters  $(FOM_{corr})$ .

$$FOM_{corr} = \frac{P_{corr}}{n_{bits} 2^m f_{CLK} V_{dd}^2 L},\tag{17}$$


where $P_{corr}$ is the power dissipation and L is the channel length of the technology employed. L is present because the power dissipation is also proportional to the capacitance associated with each node, which scales down roughly linearly with technology. It is interesting to note that $P_{corr} \propto n_{bits} \propto \frac{10}{6} \log_{10} DR$. In other words, the power dissipation of digital matched filters is proportional to the logarithm of the dynamic-range requirement.

| Ref. | Technology | $f_{CLK}$ | $V_{dd}$ | $n_{bits}$ | Code Length | $P_{corr}$ | $FOM_{corr}$ |
|------|------------|-----------|----------|------------|-------------|------------|--------------|
| [18] | $2\mu$m | 25MHz | 5V | 8 | 256 | 1.373W | $5.4 \times 10^{-7}$ |
| [18] | $2\mu$m | 25MHz | 5V | 8 | 256 | 0.753W | $3 \times 10^{-7}$ |
| [19]$^*$ | $0.8\mu$m | 50MHz | 5V | 4 | 512 | 0.092W | $4.5 \times 10^{-8}$ |
| [19]$^*$ | $0.8\mu$m | 20MHz | 2.5V | 4 | 512 | 0.007W | $3.4 \times 10^{-8}$ |
| [20]$^*$ | $0.8\mu$m | 93MHz | 5V | 4 | 176 | 0.138W | $1.1 \times 10^{-7}$ |
| [20]$^*$ | $0.8\mu$m | 44MHz | 2.6V | 4 | 176 | 0.030W | $1.8 \times 10^{-7}$ |
| [21] | $0.6\mu$m | 2.5MHz | 2V | 4 | 16 | 0.0016W | $4.2 \times 10^{-6}$ |
| [22] | $0.18\mu$m | 15.6MHz | 1.6V | 6 | 128 | 0.0009W | $1.6 \times 10^{-7}$ |
| [23] | $0.8\mu$m | 50MHz | 3V | 1 | 128 | 0.170W | $3.7 \times 10^{-6}$ |

Table 5: Survey of digital matched-filter designs. The references marked with an asterisk ($^*$) are designs with I and Q channels. Therefore, their power dissipation is halved.

In order to determine the typical  $FOM_{corr}$  of digital matched filters, a survey of matched-filter designs is performed. The results of the survey are depicted in Table 5. It should be mentioned that several designs in the survey employ several samples per code bit. This increases the effective code length (depth of the shift register and number of adders). It also increases the clock frequency beyond the code rate. Based on this survey, the best (lowest)  $FOM_{corr}$  achieved is  $3.4 \times 10^{-8}$ .
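
The figure of merit in (17) can be recomputed directly from the entries of Table 5. In the sketch below the function names are illustrative, and the channel length is expressed in metres, which is the unit that reproduces the tabulated values.

```python
def fom_corr(p_corr_w, n_bits, code_length, f_clk_hz, v_dd_v, l_m):
    """Figure of merit from (17); a lower value is better."""
    return p_corr_w / (n_bits * code_length * f_clk_hz * v_dd_v ** 2 * l_m)

def matched_filter_power_w(fom, n_bits, code_length, f_clk_hz, v_dd_v, l_m):
    """Invert (17) to estimate the power of a digital matched filter."""
    return fom * n_bits * code_length * f_clk_hz * v_dd_v ** 2 * l_m

# First row of Table 5: 2 um, 25 MHz, 5 V, 8 bits, code length 256, 1.373 W.
fom_first = fom_corr(1.373, 8, 256, 25e6, 5.0, 2e-6)
# Lowest entry: 0.8 um, 20 MHz, 2.5 V, 4 bits, code length 512, 0.007 W.
fom_best = fom_corr(0.007, 4, 512, 20e6, 2.5, 0.8e-6)
print(f"first row: {fom_first:.2e}")
print(f"best entry: {fom_best:.2e}")
```

The second function is the form used when the surveyed figure of merit is projected onto a new design point.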

#### 4.5 Comparison of Analog- and Digital-Baseband Approaches

Fig. 10 shows a comparison between the power consumptions of the various blocks as a function of the required dynamic range. A bandwidth of 3GHz for vehicular radar sets the sampling rate at 6GSa/s for the ADC to be used in the digital-baseband approach.  $FOM_{ADC}$  is assumed to be 2.3TSa/J, the best reported in the survey described earlier. As was mentioned earlier, for high-resolution ADCs, the assumption that an extra bit of resolution doubles the power consumption no longer holds true, since the ADC tends to be limited by thermal noise. Therefore, a second line representing the quadrupling of power for every extra bit in a high-resolution ADC is also included for DR > +75dB. The bandwidth also sets the pulse width at 333ps for the analog correlator.  $\gamma$  is assumed to be equal to 3 (typical for deep-submicron processes),  $V_{dd}$  is set to 1.2V (typical for a 90nm CMOS process) and  $V_{od}$  is assumed to be 0.175V, assuming four stacked transistors in an analog correlator and an output swing budget of 0.5V. For the digital matched filter, a code length of 8 is assumed. The clock frequency is 3GHz,  $V_{dd}$  is 1.2V and L is set to 80nm (the drawn channel length of a typical 90nm CMOS process).  $FOM_{corr}$  is assumed to be  $3.4 \times 10^{-8}$ , the best reported in the survey of digital matched filters described earlier.

The power consumption of the analog correlator rises the fastest with the required dynamic range due to their linear relationship. The ADC has a power consumption that is proportional to the square root of the dynamic range, while the digital matched filter exhibits a logarithmic dependence, resulting in a gradual increase in the power consumption. For the digital-baseband approach, it is clear that, for a typical 90nm process, the ADC is the bottleneck in terms of the power consumption when compared to the digital matched filter.



Figure 10: Comparison of the power consumptions of a 6GSa/s ADC, an analog correlator handling a 333ps pulse with  $\gamma = 3$ ,  $V_{od} = 0.175$ V and  $V_{dd} = 1.2$ V, and a digital matched filter with a code length of 8,  $f_{CLK} = 3$ GHz, L = 80nm and  $V_{dd} = 1.2$ V. The figures of merit for the ADC and digital matched filter are obtained from the surveys presented earlier.

Based on these numbers, Table 6 compares the power consumptions of the analog- and digital-baseband approaches. The required dynamic range for vehicular radar is 30dB, as was discussed earlier, and the duty cycle is 0.4%. In the digital-baseband approach, the power consumption is dominated by the high-speed ADC, and is 83.5mW. In the analog-baseband approach, as was discussed earlier, the speed requirement on the ADC is reduced by a factor equal to the duty cycle. Furthermore, a 30dB-dynamic-range analog correlator requires only $2.6\mu$W. As a result, the total power consumption is as low as 0.34mW, making the analog-baseband approach the preferred implementation for vehicular radar<sup>8</sup>.

The digital-baseband approach is power hungry because the high-speed ADC continuously digitizes the received signal at a high speed, even though the radar is only interested in one range bin at any given time. The power consumption of this ADC may also be reduced by a factor equal to the duty cycle through the design of an ADC that exploits "intelligent sampling" only in the window (*i.e.*, range bin) of interest. Such techniques that exploit known properties of the system/application must be investigated to achieve high-speed digitization at high dynamic ranges and reasonable power consumptions. Another application that can benefit from such investigations is Software-Defined Radio (SDR). The classical view of SDR involves direct digitization of the received signal after the antenna [24]. To cover the major radio standards up to 5GHz, a 12-bit, 10GSa/s ADC is required [24], which currently consumes a large power of 1.9W based on the survey described earlier. In [24], the authors describe a mixed-signal preconditioning technique that relaxes the dynamic-range requirement of the ADC, resulting in significant power savings.

<sup>&</sup>lt;sup>8</sup>For a dynamic-range requirement as low as 30dB, it is possible that stray factors that were not taken into account in the analog-correlator formulation will dominate over the computed power consumption of  $2.6\mu$ W. Such factors include the power of the common-mode feedback circuitry and the power associated with the charging and discharging of the capacitors of the commutating transistors that are being switched hard. Nevertheless, the power consumptions of both the analog and digital correlators are dominated by their respective ADCs, and the ADC that is required for the analog-baseband design requires significantly lower power due its alleviated speed requirement.

Table 6: Comparison of the power consumptions of the analog- and digital-baseband approaches for vehicular radar. The bandwidth is assumed to be 3GHz, the duty cycle is 0.4% and the required dynamic range is 30dB. $FOM_{ADC}$ is assumed to be 2.3TSa/J and $FOM_{corr}$ is taken as $3.4 \times 10^{-8}$.

| Component | Digital Baseband: Comment | Digital Baseband: Power | Analog Baseband: Comment | Analog Baseband: Power |
|-----------|---------------------------|-------------------------|--------------------------|------------------------|
| ADC | 6GSa/s, 5 bits | 83.5mW | 24MSa/s, 5 bits | 0.334mW |
| Analog Correlating Integrator | N/A | N/A | $\gamma = 3$, $V_{dd} = 1.2$V, $V_{od} = 0.175$V, $T_{pulse}$=333ps | $2.6\mu$W |
| Digital Matched Filter | $f_{CLK}$=3GHz, $V_{dd}$=1.2V, L=80nm, Code Length=8 | 0.5mW | N/A | N/A |
| Total Power | | 84mW | | 0.34mW |
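
The entries of Table 6 follow from (6), (7), (13), (14) and (17) with the parameters listed in this section, as the sketch below illustrates. Two inputs are assumptions rather than values stated in the report: a temperature of 300K, and a correlator power taken as $I_{bias} V_{dd}$. With these assumptions the computed values round to the tabulated ones.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
TEMP_K = 300.0              # assumed temperature, not stated in the report

dr_db = 30.0
bandwidth_hz = 3e9
duty_cycle = 0.004
fom_adc = 2.3e12            # Sa/J, best entry of the ADC survey
fom_corr = 3.4e-8           # best entry of the matched-filter survey
gamma, v_dd = 3.0, 1.2
n_stack, swing_budget = 4, 0.5
code_length, f_clk, l_m = 8, 3e9, 80e-9

n_bits = dr_db / 6.0                                   # (6)
dr_linear = 10.0 ** (dr_db / 10.0)
f_sample = 2.0 * bandwidth_hz                          # 6 GSa/s
t_pulse = 1.0 / bandwidth_hz                           # about 333 ps
v_od = (v_dd - swing_budget) / n_stack                 # (14)

# Digital baseband: full-rate ADC followed by a digital matched filter.
p_adc_digital = 2.0 ** n_bits * f_sample / fom_adc                       # (7)
p_matched = fom_corr * n_bits * code_length * f_clk * v_dd ** 2 * l_m    # (17)

# Analog baseband: ADC rate scaled by the duty cycle, plus the correlator.
p_adc_analog = p_adc_digital * duty_cycle
i_bias = dr_linear * K_BOLTZMANN * TEMP_K * gamma / (0.1 * v_od * t_pulse)  # (13)
p_correlator = i_bias * v_dd   # assumed: power = bias current times supply

print(f"digital: ADC {p_adc_digital * 1e3:.1f} mW, matched filter {p_matched * 1e3:.2f} mW")
print(f"analog:  ADC {p_adc_analog * 1e3:.3f} mW, correlator {p_correlator * 1e6:.1f} uW")
print(f"totals:  digital {(p_adc_digital + p_matched) * 1e3:.0f} mW, "
      f"analog {(p_adc_analog + p_correlator) * 1e3:.2f} mW")
```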

## 5 A 4-Channel 24-26GHz RF-Multibeam ST-RAKE Transceiver for Vehicular-Radar Applications in 90nm CMOS

Based on the RF Multibeam ST-RAKE concept, a 24-26GHz single-chip radar is implemented in IBM's 9RF-LP 90nm CMOS process for vehicular-radar applications. The radar employs a $4\times4$ multibeam matrix and hence supports 4 beams. The radar chip occupies an area of $3.4\times4.2$mm<sup>2</sup>. The chip microphotograph and block diagram are depicted in Fig. 11 and Fig. 12 respectively. The RF front end of each channel consists of an LNA and a PA which share an antenna path through a low-loss T/R switch<sup>9</sup>. The T/R switch's interface with the antenna is single-ended to ease the mm-wave chip-antenna transition, but its interface with the PA and LNA is differential through an on-chip balun. The implementation of the LNAs and PAs as differential circuits is mainly motivated by the need to reduce substrate coupling between the PA and LNA and between adjacent channels. At the output of the LNA and the input of the PA, another set of switches is incorporated to allow the PA and the LNA to share the area-hungry mm-wave multibeam matrix for beamforming. The outputs of the multibeam matrix interface with the baseband blocks, which incorporate separate I/Q paths, perform on-chip analog baseband processing and share the multibeam matrix between transmit and receive modes. An on-chip LO is implemented and feeds the baseband blocks.

At the time of the writing of this report, the measurement of several blocks of the prototype is still ongoing. Measurements have been obtained for the on-chip LO and the $4\times4$ multibeam matrix and are presented here. It is anticipated that the measurement of the remaining blocks as well as a system-level demonstration of the chip's ST-RAKE radar functionality will be completed shortly.

<sup>9</sup>The implementation of switches enables the sharing of the antenna because the radar is designed to be half-duplex. In other words, simultaneous transmit and receive capability is not required. As was mentioned before, this is an advantage that pulse-based radars enjoy over continuous-wave radars - the time-gated nature enables sharing of resources and improves isolation between the transmit and receive sections.

Figure 11: Chip microphotograph of the experimental 90nm CMOS 24-26GHz RF-multibeam ST-RAKE radar.

Fig. 13 depicts the circuit diagram of the T/R switch and Fig. 14 depicts its close-up microphotograph. At the input is a stacked transformer that converts the single-ended input to a differential signal. The stacked windings are single-turn spirals with an outer dimension (OD) of $195\mu$m and a width (W) of $10\mu$m. The top spiral is implemented in the top metal layer (LB), while the lower spiral is implemented in the next two metal layers (M1\_2B and M2\_2B), which are strapped together. The second spiral has a capacitance shunted across its terminals to resonate out the inductance of the spiral at 25GHz. Once the input is converted to a differential signal, the T/R switch employs quarter-wavelength ($\lambda/4$) transmission lines to eliminate the series transistor that is required in conventional T/R switch designs. A $\lambda/4$ line with a differential characteristic impedance close to the required 100$\Omega$ is employed in each branch. When a branch is disabled, the shunt transistors at the end provide a (near) short-circuit to ground. The $\lambda/4$ line transforms the short circuit to an open circuit so that the matching condition for the other branch is undisturbed. The $\lambda/4$ lines are implemented as coupled coplanar waveguides (CPWs) in the LB metal layer with a width (W) of 8$\mu$m, a spacing to ground (S) of 10$\mu$m and a differential spacing (D) of 10$\mu$m. The ground plane underneath is formed in the two bottom metal layers (M1 and M2). The coupled CPWs are bent to minimize their area consumption (Fig. 14).
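
The short-to-open transformation performed by the $\lambda/4$ line can be illustrated with the standard input-impedance expression for a lossless transmission line. In the sketch below the line is assumed to be lossless and exactly a quarter wavelength long at 25GHz, and the on-resistance of 5$\Omega$ for the shunt transistors is a hypothetical value chosen only for illustration; the report does not state it.

```python
import math

def line_input_impedance(z0, z_load, f_hz, f_quarter_hz):
    """Input impedance of a lossless line that is a quarter wavelength
    long at f_quarter_hz, terminated in z_load."""
    theta = (math.pi / 2.0) * (f_hz / f_quarter_hz)  # electrical length in radians
    c, s = math.cos(theta), math.sin(theta)
    return z0 * (z_load * c + 1j * z0 * s) / (z0 * c + 1j * z_load * s)

Z0_DIFF = 100.0   # differential characteristic impedance from the text
R_ON = 5.0        # hypothetical on-resistance of the shunt switch

for f_ghz in (24.0, 25.0, 26.0):
    z_in = line_input_impedance(Z0_DIFF, R_ON, f_ghz * 1e9, 25e9)
    print(f"{f_ghz:.0f} GHz: |Z_in| = {abs(z_in):.0f} ohm")
```

At the design frequency the disabled branch presents $Z_0^2/R_{on}$, which is large compared with 100$\Omega$, so it loads the enabled branch only lightly.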

Fig. 15(a) depicts the simulated small-signal S-parameters of the switch when port 3 is enabled and port 2 is disabled. The insertion loss  $(S_{31})$  is approximately 3dB, half of which is contributed by the input balun and half by the switch. The reflection coefficients are acceptable over the frequency ranges of interest and the transmission to the disabled port  $(S_{21})$  is approximately -25dB. The large signal performance of the switch at 25GHz is depicted in Fig. 15(b). The elimination of the series transistor greatly enhances the switch's power handling capability, and the input-referred -1dB compression point is approximately 19dBm.

Figure 12: Block diagram of an experimental 90nm CMOS 24-26GHz RF-multibeam ST-RAKE radar.

Figure 13: Circuit diagram of the T/R switch.

Figure 14: Chip microphotograph of the T/R switch. The coupled CPWs are bent substantially to minimize their area consumption.

Figure 15: (a) Simulated small-signal performance of the T/R switch. Port 3 is enabled and port 2 is disabled. (b) Simulated large-signal T/R switch performance.

Figure 16: (a) Circuit diagram of the four-stage LNA. (b) Chip microphotograph of the LNA.

Fig. 16(a) depicts the circuit diagram of the LNA and its output switch and Fig. 16(b) depicts the chip microphotograph. The LNA is a four-stage design, with current sharing employed in the first and second pairs of stages. Each stage employs a differential pair with a dummy pair for unilateralization through $C_{gd}$-cancellation. The stages are all identical to each other except for the fact that the first stage has an input matching network and the last stage has an output matching network that are designed to match the ports to $100\Omega$. The inductors are implemented as spirals in the LB metal layer with a patterned ground shield formed in M1. The total bias current consumption is 18mA from the supply voltage of 1.2V. Stages 2 and 4 also have pMOS switches at their outputs. These switches reduce the LNA gain during the transmit phase to prevent saturation of the LNA due to the PA's output and also contribute toward stability of the PA-LNA combination.

The output side switch consists of series transistor switches, with a dummy set of transistors with inverted connections. These dummy transistors exploit the differential nature of the circuit to provide feedthrough-capacitance cancellation and essentially enhance isolation. This too is essential for the stability of the PA-LNA combination in transmit mode.

Figure 17: (a) Circuit diagram of the single-stage pseudo-differential PA. (b) Chip microphotograph of the PA.

Figure 18: (a) Simulated small-signal S-parameters of the PA-LNA-T/R switch combination in receive mode. (b) Simulated large-signal performance of the PA-LNA-T/R switch combination in receive mode.

Figure 19: (a) Simulated small-signal gain of the PA-LNA-T/R switch combination in transmit mode. (b) Simulated large-signal performance of the PA-LNA-T/R switch combination in transmit mode.

Fig. 17(a) depicts the circuit diagram of the PA and its input switch. Fig. 17(b) depicts the chip microphotograph. The input switch is identical to the output switch of the LNA. The PA is a single-stage cascoded pseudo-differential pair that is designed for class A operation. Class A operation is necessitated by the fact that the PA must be linear to support the superposition of the codes being transmitted on the different beams. The PA is biased to draw 60mA from the supply using a scaled replica branch that uses an operational amplifier-based feedback loop to fix the current across process and temperature variations. As is the case in the LNA, there is a shunt pMOS switch at the output of the PA to reduce the gain in receive mode to ensure stability of the LNA-PA combination. The $490\mu$m CPS at the input of the PA is necessitated by the layout to compensate for the extra length of the LNA due to its four-stage nature.

Fig. 18 depicts the simulated performance of the T/R switch, LNA and PA combination in receive mode. The LNA is expected to exhibit a peak small-signal gain of approximately 32dB at 25GHz and an NF<5.7dB over the 24-26GHz frequency range<sup>10</sup>. Fig. 18(b) shows the simulated large-signal performance of the LNA at 25GHz. The input-referred -1dB compression point of the LNA is expected to be approximately -35dBm.

Fig. 19 depicts the simulated performance of the T/R switch, LNA and PA combination in transmit mode. The PA is expected to exhibit a peak small-signal gain of approximately 11dB at 25GHz. Fig. 19(b) shows the large-signal performance of the PA at 25GHz. The PA is expected to achieve an output-referred -1dB compression point of 8dBm and a saturated output power level in excess of 13.5dBm.

<sup>10</sup>Based on T/R switch simulations, the contribution of the T/R switch to the NF is approximately 3dB.

Figure 20: (a) The Blass Matrix. (b) Chu's multibeam architecture for the simple case of two antennas.

Figure 21: (a) A 4-input, 4-output Butler matrix. (b) Implemented -3dB quadrature branchline hybrids.

Several multibeam matrices have been reported in past literature [7], [8], [9]. The Blass matrix [8] (Fig. 20(a)) and Chu's architecture [9] (Fig. 20(b)) are true-time-delay architectures and hence are suitable for extremely wideband signals. The Blass matrix uses varying lengths of transmission lines to generate delay differences and employs area-hungry couplers to transfer signals from one transmission line to another. Chu's architecture reduces the number of delay elements required in comparison to the Blass matrix. In the implementation described in [9], the couplers are replaced with active buffers to transfer the signal from one line to another. This further reduces the area requirement but comes at the expense of power consumption and linearity.

The Butler matrix [7], in the form of a 4-input, 4-output realization, is depicted in Fig. 21(a). It employs -3dB quadrature hybrids and fixed-length transmission lines to generate the multiple beams. Being a purely passive structure, it is bidirectional in nature and highly linear. In view of these advantages, the Butler matrix was chosen for the implementation described in this section. It is, however, *not* a true-time-delay architecture as it maintains prescribed constant *phase differences* over frequency at its outputs between the signals incident at the inputs. Hence, it is suitable only for applications where the fractional bandwidth is not too large. However, for a four-element array in the vehicular-radar frequency range, the lack of true-time delays does not significantly degrade array performance.
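
The effect of constant phase differences, as opposed to true time delays, can be illustrated by computing how the beam directions move with frequency. The sketch below assumes an ideal 4-input, 4-output Butler matrix with progressive phase steps of ±45° and ±135° and an antenna spacing of half a wavelength at 25GHz; the function name is illustrative, and the measured matrix will deviate from these ideal values.

```python
import math

def beam_angle_deg(phase_step_rad, f_hz, f_design_hz):
    """Beam direction of a uniform array with half-wavelength spacing at
    f_design_hz, driven with a frequency-independent progressive phase step."""
    sin_theta = (phase_step_rad / math.pi) * (f_design_hz / f_hz)
    return math.degrees(math.asin(sin_theta))

# Progressive phase steps of an ideal 4x4 Butler matrix (assumed ideal).
PHASE_STEPS_DEG = (-135.0, -45.0, 45.0, 135.0)

for step in PHASE_STEPS_DEG:
    angles = [beam_angle_deg(math.radians(step), f, 25e9) for f in (24e9, 25e9, 26e9)]
    print(f"{step:+.0f} deg step:", ", ".join(f"{a:+.1f}" for a in angles))
```

Across 24-26GHz the outer beams shift by only a few degrees and the inner beams by less than a degree, which is small relative to the beamwidth of a four-element array.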

Fig. 21(b) depicts the implemented -3dB, 90° couplers, realized as branchline hybrids. The transmission lines of the branchline hybrids are implemented using coupled coplanar waveguides in LB, with the ground plane implemented in M1 and M2. Fig. 22(a) depicts a chip microphotograph of a single hybrid, while Figs. 22(b)-(d) summarize the simulated performance. Simulations based on foundry models for the coupled CPWs are compared to EM simulations of the hybrid in IE3D. A reasonable agreement is seen in all parameters. Based on the foundry models, the loss from the input port to the through and coupled ports is 4-5dB in the desired frequency range. This implies a dissipative loss of 1-2dB when compared to the ideal hybrid loss of 3dB to each port. The input reflection coefficient, isolation to the isolated port and phase difference between the through and coupled ports are all acceptable based on simulations.

The small-signal S-parameters of the Butler matrix are measured through test SGS pads that are placed in the layout at the four outputs of the Butler matrix. Unfortunately, test pads were not placed at the four inputs due to space constraints. As a result, the inputs of the four T/R switches are probed with the PA-LNA-T/R switch combinations configured in receive mode. Using this probing scheme, the insertion gains and phases over frequency from each input to each output are determined, and these data are used to synthesize normalized UWB array patterns for each beam (Fig. 23). The waveform is assumed to be a pulsed sinusoid with 500ps pulse width and 25GHz center frequency.



Figure 22: (a) Chip microphotograph of the implemented -3dB quadrature branchline hybrid. (b) Simulated insertion loss to through and coupled ports. (c) Simulated reflection coefficient and isolation to the isolated port. (d) Phase difference between through and coupled ports.



Figure 23: Synthesized normalized UWB array patterns of the  $4 \times 4$  Butler matrix from measured S-parameter data.



Figure 24: Chip microphotograph and block diagram of each beam's baseband block - transmit mode.

The antenna spacing is assumed to be half-wavelength at 25GHz and an energy detector is assumed to gauge the output strength. The array performance is seen to be reasonable and confirms the fact that the lack of true-time-delays does not significantly degrade performance for the bandwidths and number of array elements of interest.

The Butler matrix is area-hungry and makes poor use of CMOS technology, as it does not exploit the active devices that can be reliably integrated to reduce area. A more-compact 4-input, 4-output Butler matrix operating at 24GHz was reported recently [10], and occupies a silicon area of $0.9\text{mm} \times 0.46\text{mm}$ while exhibiting a measured minimum insertion loss of 2.25dB. While the bidirectional nature of the architecture does allow the Butler matrix to be shared between transmit and receive modes, thus keeping the overall transceiver area in check, alternate compact multibeam architectures that do not rely on passive elements for delays/phase shifts are a worthy line of research for the future.

The output of each beam is connected to a baseband block, the chip microphotograph and block diagram of which are depicted in Fig. 24 for transmit mode. A differential Wilkinson power splitter/combiner [11] is used to combine the signals from the I and Q sub-blocks. The splitter is constructed using coupled CPW lines with the appropriate characteristic impedance $(100 \times \sqrt{2} \approx 140\Omega)$. Each I/Q sub-block consists of a passive mixer that, in up-conversion mode, is driven by a TX modulator and an LO buffer. The passive mixer is chosen for bidirectionality - the mixer, the LO buffer driving its LO port and the power splitter/combiner are shared between transmit and receive modes. The TX modulator receives its code data from a shift register - this enables flexibility during testing for experimentation with different coding schemes. The frequency of the code, which governs the bandwidth of the transmitted signal, is set by the externally provided shift register clock. The other timing signals that control the generation and duration of the code are externally provided as well.

The block diagram and chip microphotograph of the baseband block in receive mode are depicted in Fig. 25. The RF signal is split by the power splitter to the I and Q sub-blocks, where it is downconverted to baseband by the passive mixer. The downconverted signal in each sub-block is then correlated with each of the four transmitted codes and then integrated using four correlators and integrators. The four correlators receive the template codes from shift registers in a manner similar to the TX modulator for testing flexibility.

Figure 25: Chip microphotograph and block diagram of each beam's baseband block - receive mode.

Figure 26: (a) Schematic diagram of the passive mixer in each baseband I/Q sub-block. (b) Simulated large-signal downconversion performance of the passive mixer.
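The correlate-and-integrate operation performed in each I/Q sub-block can be sketched in discrete time as follows. This is only an illustration under simplifying assumptions: the four codes are taken to be length-4 Walsh-Hadamard sequences (a hypothetical stand-in, since the actual code family is left open in this work), the received signal is chip-aligned with one sample per chip, and noise is ignored. The function names are hypothetical.

```python
# Hypothetical length-4 Walsh-Hadamard codes standing in for the four transmit codes.
CODES = [
    [+1, +1, +1, +1],
    [+1, -1, +1, -1],
    [+1, +1, -1, -1],
    [+1, -1, -1, +1],
]


def correlate_and_integrate(samples, template):
    """Multiply chip-by-chip with the template (correlator) and sum (integrator)."""
    return sum(s * t for s, t in zip(samples, template))


def correlator_bank(samples, codes=CODES):
    """Outputs of the four correlator/integrator pairs, normalized by code length."""
    n = len(samples)
    return [correlate_and_integrate(samples, c) / n for c in codes]


if __name__ == "__main__":
    # Received baseband signal: a superposition of the four codes with
    # illustrative amplitudes (one per transmitted beam).
    amplitudes = [0.8, 0.0, -0.3, 0.1]
    received = [sum(a * c[i] for a, c in zip(amplitudes, CODES)) for i in range(4)]
    print(correlator_bank(received))
```

Because the assumed codes are mutually orthogonal, each correlator/integrator pair recovers the amplitude associated with its own code and rejects the other three.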

Fig. 26(a) shows the schematic diagram of the passive mixer present in each I/Q sub-block. Fig. 26(b) shows the simulated large-signal downconversion performance of the mixer. The baseband side of the mixer is terminated in 100$\Omega$, the LO signal has a frequency of 25GHz and a differential peak voltage of 1V, the RF frequency is 26GHz and $V_{bias} = 0.5V$. The input-referred -1dB compression point is close to -1dBm, and the conversion gain is approximately -8.4dB.

Fig. 27 depicts the schematic diagram of the coupled VCO architecture employed for I/Q LO generation. Each individual VCO is implemented using a cross-coupled nMOS pair and an LC resonant load. The 227pH inductor is implemented as a single-turn spiral inductor in LB with an outer dimension of 145$\mu$m, a line width of 15$\mu$m and an M1 slotted ground shield. The Q at 25GHz is 24.6 based on foundry models. Peak Q is achieved at a frequency of 33.5GHz and is equal to 26.1. nMOS varactors are included for frequency tuning, and a bank of calibration varactors is also included to accommodate process and layout mismatches and ensure good quadrature. A pMOS current source is employed due to its superior flicker noise performance when compared to its nMOS counterpart. The two VCOs are coupled to each other to ensure quadrature through differential-pair injection transistors, one pair of which has an inverted output connection relative to the other [12]. Each VCO is also equipped with an output buffer that generates sufficient output power to distribute the I/Q LO signals to the various baseband blocks. Each VCO, along with its injection transistors, consumes approximately 5.2mA of current, and each output buffer consumes 15.4mA.

Figure 27: Schematic diagram of the 25GHz VCO employed for I/Q LO generation.

Figure 28: Circuit diagram of the LO distribution network.

I/Q LO distribution is accomplished as follows. Each I/Q LO buffer in the baseband blocks has a 100$\Omega$ resistor shunted across its input to terminate the long 100$\Omega$ coupled CPW line that connects it to the corresponding output buffer of the coupled VCO (Fig. 28). Therefore, each I/Q output buffer of the coupled-VCO pair drives four terminated 100$\Omega$ lines in parallel, and hence its output is matched to 25$\Omega$. Each I/Q LO buffer in the baseband blocks consumes 9mA and this, in conjunction with the power generated by the VCO output buffers, is sufficient to ensure a peak differential swing in excess of 1V at the various baseband mixers.

Figure 29: (a) Measured frequency tuning characteristic of the I/Q VCO. All calibration bits are set to 0/1. The simulated tuning range when all bits are 0 is also depicted. (b) Simulated and measured phase noise performance when the control voltage is set to 1.2V and all calibration bits are set to 0.

The power consumption associated with LO distribution is rather large and may be computed to be $2 \times 15.4\text{mA} \times 1.2\text{V} + 8 \times 9\text{mA} \times 1.2\text{V} = 123.4\text{mW}$. The sources of this power consumption are the 100$\Omega$ termination resistors at the inputs of the I/Q LO buffers that dissipate the power generated by the I/Q VCO's output buffer and the I/Q LO buffers themselves that provide additional voltage gain. An alternate design approach involves the matching of the input impedance of each I/Q passive mixer to 100$\Omega$. This would potentially eliminate the 100$\Omega$ termination resistors and the I/Q LO buffers, as the matching network would provide the required voltage gain.
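The LO-distribution power figure and the 25$\Omega$ load seen by each VCO output buffer can be reproduced with a few lines of arithmetic. The sketch below uses only the currents, supply voltage and impedances quoted in the text; the constant and function names are hypothetical.

```python
SUPPLY_V = 1.2
VCO_OUTPUT_BUFFER_MA = 15.4   # per I/Q VCO output buffer
LOCAL_LO_BUFFER_MA = 9.0      # per I/Q LO buffer in a baseband block
N_VCO_BUFFERS = 2             # one each for I and Q
N_LOCAL_BUFFERS = 8           # 4 baseband blocks x (I, Q)
LINE_IMPEDANCE_OHM = 100.0    # terminated coupled-CPW line
LINES_PER_BUFFER = 4          # one line per baseband block


def lo_distribution_power_mw():
    """Return (VCO output buffers, local LO buffers, total) in mW."""
    vco_buffers = N_VCO_BUFFERS * VCO_OUTPUT_BUFFER_MA * SUPPLY_V
    local_buffers = N_LOCAL_BUFFERS * LOCAL_LO_BUFFER_MA * SUPPLY_V
    return vco_buffers, local_buffers, vco_buffers + local_buffers


def parallel_load_ohm(z_ohm=LINE_IMPEDANCE_OHM, n=LINES_PER_BUFFER):
    """Impedance of n identical terminated lines connected in parallel."""
    return z_ohm / n


if __name__ == "__main__":
    print(lo_distribution_power_mw())
    print(parallel_load_ohm())
```

The breakdown shows that the eight local LO buffers account for the larger share of the LO-distribution power, which is why eliminating them through impedance matching is attractive.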

It should be noted that the power associated with LO distribution is further exacerbated by the long distances (1-3mm) over which the LO signals need to be routed. These long distances result in transmission-line losses that must be overcome by the I/Q VCO's output buffers. The challenge of LO distribution is central to multibeam transceivers and power- and area-efficient techniques to accomplish the distribution remain an open topic of research.

Fig. 29(a) depicts the measured frequency-tuning characteristic of the I/Q VCO when all calibration bits are set to 0/1. The simulated tuning curve when all bits are 0 is also depicted. The frequency error is approximately 1GHz at a control-voltage value of 0V, and 1.4GHz at a control-voltage value of 1.2V. It is found that this discrepancy corresponds to an increase in the capacitance of the VCO's tuned loads by approximately 17fF. The simulated and measured phase noise of the VCO for a control-voltage value of 1.2V when all calibration bits are set to 0 are shown in Fig. 29(b). The VCO achieves a phase noise performance of -93.7dBc/Hz at a 1MHz offset.
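As a rough plausibility check on the 17fF figure, the frequency shift caused by extra tank capacitance can be estimated from the ideal LC resonance formula. This sketch is an assumption-laden simplification: it treats the tank as a single lumped inductor and capacitor, takes 25GHz as the nominal simulated frequency (the exact simulated values are not stated here), and ignores how the extra capacitance is distributed across the differential outputs. The names are hypothetical.

```python
import math

L_TANK_H = 227e-12     # tank inductance quoted in the text
F_NOMINAL_HZ = 25e9    # assumed nominal oscillation frequency
EXTRA_C_F = 17e-15     # additional tank capacitance inferred in the text


def tank_capacitance(freq_hz, l_h=L_TANK_H):
    """Capacitance that resonates with l_h at freq_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * l_h)


def detuned_frequency(freq_hz, extra_c_f, l_h=L_TANK_H):
    """Resonant frequency after adding extra_c_f to the tank capacitance."""
    c_total = tank_capacitance(freq_hz, l_h) + extra_c_f
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_total))


if __name__ == "__main__":
    shift_hz = F_NOMINAL_HZ - detuned_frequency(F_NOMINAL_HZ, EXTRA_C_F)
    print(f"estimated frequency shift: {shift_hz / 1e9:.2f} GHz")
```

Under these assumptions the estimated shift is on the order of 1GHz, the same order as the observed 1-1.4GHz discrepancy between simulation and measurement.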

The sensitivity of a coupled I/Q VCO's quadrature phase relationship to process and layout mismatches is an important performance metric. Fig. 30 shows the simulated phase difference across the I and Q outputs as a function of the capacitance mismatch between the two VCOs<sup>11</sup>. The calibration varactors can be used to compensate for up to 15fF of capacitance mismatch.

<sup>11</sup> An extra mismatch capacitance is added in shunt to one of the VCO cores to perform the simulation. All calibration bits are set to 0.

Figure 30: Simulated I/Q phase difference as a function of mismatch capacitance.

Figure 31: (a) Circuit diagram of the TX modulator. (b) Circuit diagram of the analog correlator.

Figure 32: Circuit diagram of the analog integrator.

Fig. 31 illustrates the circuit diagrams of the TX modulator and the analog correlator. The TX modulator is a simple hard-switching differential pair that accepts the rail-to-rail square-wave code from the shift register and produces an output square wave of controllable amplitude. The output amplitude is controlled by setting the bias current. The bias current may be varied in seven linear steps from 1.78mA to 10.78mA through the implementation of three controllable binary-weighted current sources in parallel. This in turn causes the output amplitude of the TX modulator to vary from 90mV to 550mV, which serves to provide approximately 16dB of transmit power control per code. Common-mode feedback is employed to maintain a constant output common-mode level across the various bias currents. The analog correlator is a doubly-balanced current-commutating mixer with the commutating transistors being driven by the square-wave code sequence provided by the shift registers. Each correlator consumes $530\mu$A and employs shunt switches at its output to vary the load resistance and hence the voltage gain from 13dB to 4dB.
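The transmit power-control range follows directly from the ratio of the maximum and minimum output amplitudes, and the bias settings follow from the three binary-weighted current sources. The sketch below assumes a fixed 1.78mA baseline with a 3-bit code adding equal increments on top of it; this decomposition is an assumption made for illustration, and the names are hypothetical.

```python
import math

I_MIN_MA, I_MAX_MA = 1.78, 10.78   # bias-current range quoted in the text
N_BITS = 3                         # three binary-weighted current sources
V_MIN_MV, V_MAX_MV = 90.0, 550.0   # output-amplitude range quoted in the text


def bias_currents_ma():
    """Eight bias settings separated by seven equal steps (assumed linear)."""
    n_steps = 2 ** N_BITS - 1
    lsb_ma = (I_MAX_MA - I_MIN_MA) / n_steps
    return [I_MIN_MA + code * lsb_ma for code in range(2 ** N_BITS)]


def power_control_range_db():
    """Transmit power-control range from the output voltage-amplitude ratio."""
    return 20.0 * math.log10(V_MAX_MV / V_MIN_MV)


if __name__ == "__main__":
    print([round(i, 2) for i in bias_currents_ma()])
    print(f"power control range: {power_control_range_db():.1f} dB")
```

The computed range is slightly below 16dB, in agreement with the approximate figure quoted above.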

Fig. 32 illustrates the circuit diagram of the analog integrator. The design consists of a differential-pair transconductance cell with an active load and an integrating output capacitor. Three levels of nMOSes and pMOSes are stacked to increase the output resistance. This reduces the frequency of the dominant output pole and, as a result, increases the hold-time of the integrator. A shunt switch is included along with the output capacitor to reset the integrator's output to zero.

Table 7 summarizes the simulated performance of the experimental prototype. As was mentioned earlier, the measurement of several blocks of the prototype is still ongoing, and will be completed shortly. It should be pointed out that the power consumption of the radar transceiver can be significantly reduced by switching off the PA and other transmit components during receive mode, and the LNA and other receive components during transmit mode, through the design of fast-power-on, fast-power-off circuits.

| Implementation                              |                                                                  |
|---------------------------------------------|------------------------------------------------------------------|
| Technology                                  | 90nm CMOS                                                        |
| Die Area                                    | $4.2$ mm $\times 3.4$ mm                                         |
| Supply Voltage                              | 1.2V                                                             |
| RF-Path Performance                         |                                                                  |
| PA+T/R switch output-referred $CP_{-1dB}$   | 8dBm                                                             |
| PA+T/R switch saturated output power        | > 13.5dBm                                                        |
| LNA+T/R switch NF                           | < 5.7dB over 24-26GHz                                            |
| LO-Path Performance                         |                                                                  |
| Measured I/Q VCO tuning range               | 5.6%                                                             |
| Measured I/Q VCO phase noise at 1MHz offset | -93.7dBc/Hz                                                      |
| Array Performance                           |                                                                  |
| Number of beams                             | 4                                                                |
| Number of antenna channels                  | 4                                                                |
| RF-Path Power Consumption                   |                                                                  |
| PAs                                         | $4 \times 60 \text{mA} \times 1.2 \text{V} = 288 \text{mW}$      |
| LNAs                                        | $4 \times 18 \text{mA} \times 1.2 \text{V} = 86.4 \text{mW}$     |
| LO-Path Power Consumption                   |                                                                  |
| I/Q VCO and buffers                         | 49.4mW                                                           |
| Baseband-Path Power Consumption             |                                                                  |
| TX modulators                               | $8 \times 10.75 \text{mA} \times 1.2 \text{V} = 103.2 \text{mW}$ |
| Analog correlators                          | $32 \times 0.5 \text{mA} \times 1.2 \text{V} = 19.2 \text{mW}$   |
| Integrators                                 | $32 \times 3.08 \text{mA} \times 1.2 \text{V} = 118.3 \text{mW}$ |
| I/Q local LO buffers                        | $8 \times 9 \text{mA} \times 1.2 \text{V} = 86.4 \text{mW}$      |
| Total Power Consumption                     | 751mW                                                            |


Table 7: Summary of the simulated performance of the implemented 90nm CMOS 24-26GHz RF-multibeam ST-RAKE radar.
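The total power figure in Table 7 can be cross-checked by summing the individual entries. The sketch below uses the instance counts and per-instance currents listed in the table, with the LO-path entry taken directly in mW; the dictionary layout and function names are hypothetical.

```python
SUPPLY_V = 1.2
# (number of instances, current per instance in mA), as listed in Table 7.
BLOCKS = {
    "PAs": (4, 60.0),
    "LNAs": (4, 18.0),
    "TX modulators": (8, 10.75),
    "Analog correlators": (32, 0.5),
    "Integrators": (32, 3.08),
    "I/Q local LO buffers": (8, 9.0),
}
LO_PATH_MW = 49.4  # "I/Q VCO and buffers" entry, quoted directly in mW


def power_budget_mw():
    """Per-block power consumption in mW."""
    budget = {name: n * i_ma * SUPPLY_V for name, (n, i_ma) in BLOCKS.items()}
    budget["I/Q VCO and buffers"] = LO_PATH_MW
    return budget


def total_power_mw():
    return sum(power_budget_mw().values())


if __name__ == "__main__":
    for name, p_mw in power_budget_mw().items():
        print(f"{name:24s} {p_mw:7.1f} mW")
    print(f"{'Total':24s} {total_power_mw():7.1f} mW")
```

Note that the total adds transmit and receive blocks together, so it represents the case where nothing is powered down between modes.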

## 6 Summary

In summary, an RF-Multibeam ST-RAKE transceiver architecture has been introduced for radar and imaging applications. The architecture has the ability to isolate not only LoS reflections but multipath reflections as well. The collection of additional multipath-reflection information enhances scene reconstruction, as multipath reflections impinge on the desired object(s) from directions other than the LoS. An experimental prototype, operating in the 24-26GHz frequency range and targeting the commercial vehicular-radar application space, was built to verify the principle of operation of the architecture.

The design of orthogonal codes that are central to the operation of the RF-Multibeam ST-RAKE transceiver architecture is a topic that is beyond the scope of this work. It is expected that the developed experimental prototype will serve as a testbed for different code families, and hence will aid future investigations in this direction.

## References

- [1] I. Gresham, A. Jenkins, R. Egri, C. Eswarappa, N. Kinayman, N. Jain, R. Anderson, F. Kolak, R. Wohlert, S. P. Bawell, J. Bennett and J.-P. Lanteri, "Ultra-Wideband Radar Sensors for Short-Range Vehicular Applications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 52, no. 9, pp. 2105-2122, Sep. 2004.
- [2] A. M. Sayeed, "Deconstructing multiantenna fading channels," *IEEE Transactions on Signal Processing*, vol. 50, no. 10, pp. 2563-2579, October 2002.
- [3] R. Price and P. E. Green, Jr., "A communication technique for multipath channels," *Proceedings of the IRE*, vol. 46, pp. 555-570, March 1958.
- [4] B. H. Khalaj, A. Paulraj and T. Kailath, "2-D rake receivers for CDMA cellular systems," in *Proceedings of the 1994 IEEE Global Telecommunications Conference*, vol. 1, pp. 400-404, Nov.-Dec. 1994.
- [5] C. Brunner, J. S. Hammerschmidt, A. Seeger and J. A. Nossek, "Space-time eigenrake and downlink eigenbeamformer: exploiting long-term and short-term channel properties in WCDMA," in *Proceedings of the 2000 IEEE Global Telecommunications Conference*, vol. 1, pp. 138-142, Nov.-Dec. 2000.
- [6] B. J. Wysocki and T. A. Wysocki, "Orthogonal Binary Sequences with Wide Range of Correlation Properties," in *Proceedings of the Sixth International Symposium on Communication Theory and Applications*, pp. 483-485, July 2001.
- [7] J. Butler and R. Lowe, "Beam-Forming Matrix Simplifies Design of Electronically Scanned Antennas," *Electronic Design*, pp. 170-173, April 1961.
- [8] J. Blass, "Multidirectional Antenna: A New Approach to Stacked Beams," in *IRE International Conference Record*, vol. 8, part 1, 1960.
- [9] T. Chu and H. Hashemi, "A CMOS UWB Camera with 7x7 Simultaneous Active Pixels," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 120-121, February 2008.
- [10] T.-Y. Chin, S.-F. Chang, C.-C. Chang and J.-C. Wu, "A 24-GHz CMOS Butler Matrix MMIC for multi-beam smart antenna systems," in *2008 IEEE RFIC Symposium Digest of Technical Papers*, pp. 633-636, June 2008.
- [11] E. Wilkinson, "An N-way hybrid power divider," *IRE Transactions on Microwave Theory and Techniques*, vol. MTT-8, no. 1, pp. 116-118, Jan. 1960.
- [12] A. Rofougaran, J. Rael, M. Rofougaran and A. Abidi, "A 900MHz CMOS LC-Oscillator with Quadrature Outputs," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 392-393, February 1996.
- [13] I. D. O'Donnell and R. W. Brodersen, "An ultra-wideband transceiver architecture for low power, low rate, wireless systems," *IEEE Transactions on Vehicular Technology*, vol. 54, no. 5, pp. 1623-1631, September 2005.

- [14] D. D. Wentzloff, R. Blazquez, F. S. Lee, B. P. Ginsburg, J. Powell and A. P. Chandrakasan, "System design considerations for ultra-wideband communication," *IEEE Communications Magazine*, vol. 43, no. 8, pp. 114-121, August 2005.
- [15] First report and order, revision of part 15 of the Commission's rules regarding ultra wideband transmission systems, FCC, Washington, DC, ET Docket 98-153, 2002.
- [16] R. H. Walden, "Analog-to-Digital Conversion in the Early 21st Century," in 2007 IEEE International Microwave Symposium - Workshops and Tutorials: WMK Ultrafast Analog-to-Digital (A/D) Conversion Technique and its Applications, June 2007.
- [17] B. Murmann, "A/D Converter Trends: Power Dissipation, Scaling and Digitally Assisted Architectures," in *Proceedings of the 2008 IEEE Custom Integrated Circuits Conference*, pp. 105-112, Sep. 2008.
- [18] D. Garrett and M. Stan, "Power Reduction Techniques for a Spread Spectrum Based Correlator," in *Proceedings of the 1997 International Symposium on Low Power Electronics and Design*, pp. 225-230, Aug. 1997.
- [19] M.-L. Liou and T.-D. Chiueh, "A low-power digital matched filter for direct-sequence spreadspectrum signal acquisition," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 6, pp. 933-943, June 2001.
- [20] J.-S. Wu, M.-L. Liou, H.-P. Ma, and T.-D. Chiueh, "A 2.6-V 44-MHz all-digital QPSK direct sequence spread-spectrum transceiver IC," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 10, pp. 1539-1540, Oct. 1997.
- [21] S.-H. Yen and C.-K. Wang, "A 2 V CMOS programmable pipelined digital differential matched filter for DS-CDMA system," in *Proceedings of the First IEEE Asia-Pacific Conference on ASICs*, pp. 403-404, August 1999.
- [22] S. Goto, T. Yamada, N. Takayama, Y. Matsushita, Y. Harada and H. Yasuura, "A low-power digital matched filter for spread-spectrum systems," in *Proceedings of the 2002 International Symposium on Low Power Electronics and Design*, pp. 301-306, 2002.
- [23] T. Shibano, K. Lizuka, M. Miyamoto, M. Osaka, R. Miyama and A. Kito, "Matched filter for DS-CDMA of up to 50 MChip/s based on sampled analog signal processing," in *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, pp. 100-439, February 1997.
- [24] R. Bagheri, A. Mirzaei, M. E. Heidari, S. Chehrazi, M. Lee, M. Mikhemar, W. K. Tang and A. A. Abidi, "Software-defined radio receiver: dream to reality," *IEEE Communications Magazine*, vol. 44, no. 8, pp. 111-118, August 2006.