Design techniques for low-power wide-band direct digital frequency synthesizers of spread spectrum communication applications

Jiandong Jiang
Iowa State University

Part of the Electrical and Electronics Commons

Recommended Citation
Jiang, Jiandong, "Design techniques for low-power wide-band direct digital frequency synthesizers of spread spectrum communication applications" (2001). Retrospective Theses and Dissertations. 1049.
https://lib.dr.iastate.edu/rtd/1049

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact digirep@iastate.edu.
Design techniques for low-power wide-band direct digital frequency
synthesizers of spread spectrum communication applications

by

Jiandong Jiang

A dissertation submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Electrical Engineering (Microelectronics)
Major Professor: Edward K.F. Lee

Iowa State University
Ames, Iowa
2001

Copyright © Jiandong Jiang 2001. All rights reserved.
This is to certify that the Doctoral dissertation of

Jiandong Jiang

has met the dissertation requirements of Iowa State University

Major Professor

For the Major Program

For the Graduate College
# TABLE OF CONTENTS

**LIST OF FIGURES**

**LIST OF TABLES**

**ABSTRACT**

**CHAPTER 1 INTRODUCTION**

1.1 Introduction to Frequency Synthesis
1.2 Objectives of the Work
1.3 Dissertation Organization

**CHAPTER 2 LITERATURE REVIEW**

2.1 DDFS Architecture
2.2 Memory Compression Techniques
2.3 DDFS Implementation Examples

**CHAPTER 3 ANALYSIS OF DDFS OUTPUT SPECTRUM**

3.1 Conventional Nonlinear DAC Based DDFS
3.2 Noise and Spurious Signals

**CHAPTER 4 LINEAR INTERPOLATION TECHNIQUE FOR SEGMENTED NONLINEAR DAC**

4.1 Linear Interpolation Technique for Segmented Nonlinear DAC’s
4.2 Segmentation Optimization for Segmented Nonlinear DAC’s

**CHAPTER 5 NONLINEAR INTERPOLATION TECHNIQUE FOR SEGMENTED NONLINEAR DAC**

5.1 Nonlinear Interpolation Technique for Segmented Nonlinear DAC
5.2 Segmentation Optimization of the Proposed Segmented DAC

**CHAPTER 6 A LOW-POWER WIDE-BAND SEGMENTED NONLINEAR DAC BASED DDFS**

6.1 Specifications of the DDFS Chip
6.2 Behavioral Model of the Segmented Nonlinear DAC Based DDFS
6.3 Circuit Design of the DDFS Chip
6.4 DDFS Layout Design
6.5 Chip Packaging and PCB Design
6.6 Evaluation Setup and Experimental Results

**CHAPTER 7 CONCLUSION AND CONTRIBUTIONS**

7.1 Conclusion
7.2 Contributions

**REFERENCES**

**ACKNOWLEDGMENTS**
LIST OF FIGURES

Figure 1.1. Architecture of a PLL synthesizer
Figure 1.2. Block diagram of a DAFS
Figure 1.3. Block diagram and typical waveforms of a generic DDFS
Figure 2.1. Tierney et al. DDFS architecture
Figure 2.2. DDFS using sine wave DAC
Figure 2.3. Logic to exploit quarter wave symmetry
Figure 2.4. Sunderland's algorithmic approximation
Figure 2.5. ROM partition of the Nicolas' architecture
Figure 2.6. Hutchison's architecture
Figure 3.1. Conceptual nonlinear DAC based DDFS architecture
Figure 3.2. Graphical representation of the value of $\phi_k$'s
Figure 3.3. Model of DDFS noise sources
Figure 3.4. Nicolas' modification on phase accumulator
Figure 3.5. Theoretical worst case spurious responses
Figure 4.1. Block diagram of a phase interpolation DDFS
Figure 4.2. The proposed linear phase interpolation DDFS
Figure 4.3. Amplitudes vs. phase for 12 bit phase (with 3 interpolation bits)
Figure 4.4. 16-bit multiple resistor-string linear DAC
Figure 4.5. Proposed multiple resistor-string sine wave DAC
Figure 4.6. Proposed R-C hybrid sine wave DAC
Figure 4.7. Amplitude error plots of "5-5" segmented sine wave DAC
Figure 4.8. Estimated maximum amplitude error vs. standard deviation
Figure 5.1. Graphical representation of the fine DAC output and the interpolation of the sine wave DAC
Figure 5.2. The proposed nonlinear phase interpolation DDFS
Figure 5.3. Amplitudes vs. phase for 12-bit phase resolution segmented DAC based DDFS (with 3 interpolation bits)
Figure 5.4. Amplitudes vs. phase for 9-bit phase resolution non-segmented DAC based DDFS (without 3 interpolation bits)
Figure 5.5. Amplitude error plots of the non-segmented and "3-4-3" segmented sine wave DAC
Figure 5.6. Estimated maximum amplitude error vs. standard deviation
Figure 6.1. Generic transceiver architecture
Figure 6.2. The top-down design methodology
Figure 6.3. Block diagram of the "3-4-3" segmentation
Figure 6.4. Behavioral model of the nonlinear phase interpolation DDFS
Figure 6.5. Results from the 12-bit phase resolution DDFS model ($F_{out}/F_{clk} = 43/1024$)
Figure 6.6. Register-based pipelined system
Figure 6.7. 1-bit transmission gate full adder
Figure 6.8. Full pipelined 16-bit phase accumulator
Figure 6.9. Simulation result of the phase accumulator at 500 MHz clock rate
Figure 6.10. Simulation result of the phase accumulator at 1000 MHz clock rate
Figure 6.11. Global clock driver
Figure 6.12. Transmission gate one's complementor
Figure 6.13. A local decoder for the rows between the 2nd row and the 7th row
Figure 6.14. DAC cell of the sine wave DAC with complementary current outputs
Figure 6.15. Schematic and waveforms of a voltage level shifter
Figure 6.16. The biasing circuit of the nonlinear DAC
Figure 6.17. Spice simulation of DDFS schematic ($F_{out}/F_{clk} = 3/256$)
Figure 6.18. PSD plots of DDFS at clock rate of 500 MHz (a) and 200 MHz (b)
Figure 6.19. Layout of the prototype DDFS chip
Figure 6.20. Layout column order of the coarse sine wave DAC
Figure 6.21. Photomicrograph of the DDFS chip
Figure 6.22. Bonding diagram of the DDFS chip
Figure 6.23. The structure of the 4-layer board
Figure 6.24. Digital power connections
Figure 6.25. Analog power connections
Figure 6.26. Testing arrangement using transformer
Figure 6.27. Layout of the evaluation board in Eagle®
Figure 6.28. Test setup to evaluate the DDFS chip
Figure 6.29. Evaluation board with soldered components
Figure 6.30. Waveform of $1/128 \times F_{CLK}$ sine wave output at 600 MHz clock rate
Figure 6.31. Zoom-in of the waveform of $1/128 \times F_{CLK}$ at 30 MHz clock rate
Figure 6.32. Waveform of $1/256 \times F_{CLK}$ sine wave output at 930 MHz clock rate
Figure 6.33. Spectrum of $3/8 \times F_{CLK}$ sine wave output, where the clock frequency is 80 MHz
Figure 6.34. SFDR versus clock frequency for $3/8 \times F_{CLK}$ output
Figure 6.35. Spectrum of $3/8 \times F_{CLK}$ sine wave output, where the clock frequency is 300 MHz
Figure 6.36. SFDR versus clock frequency for $f_{OUT} = 65/4096$ of $f_{CLK}$
Figure 6.37. Spectrum of $65/4096$ $F_{CLK}$ output for 64 MHz clock frequency
Figure 6.38. SFDR versus synthesized frequency for clock frequency of 300 MHz
Figure 6.39. Power dissipation versus clock frequency for $f_{OUT} = 1/64 \times f_{CLK}$
Figure 6.40. Power dissipation versus synthesized frequency for $f_{CLK} = 500$ MHz
Figure 6.41. Spectrum plots of $1/4 \times F_{CLK}$ output for clock frequency of 300 MHz
LIST OF TABLES

Table 1.1. Comparison of frequency synthesis techniques
Table 4.1. FM's for the segmented nonlinear sine wave DAC using linear interpolation technique
Table 4.2. Optimal segmentations for sine wave DAC’s for different resolutions when linear interpolation technique is used
Table 5.1. Optimal parameters for a segmented DAC using nonlinear interpolation technique
Table 5.2. Optimal segmentations for different phase and amplitude resolutions when linear interpolation technique is used
Table 6.1. Values of the coarse DAC matrix cells
Table 6.2. Interpolation values of the fine DAC's
Table 6.3. Comparison to the non-segmented nonlinear DAC based DDFS
Table 6.4. SFDR values from Spice simulations
Table 6.5. Measured characteristics of the DDFS chip
Table 6.6. Comparison among the recently reported wide-band DDFS's
ABSTRACT

For frequency agile communication systems, fast frequency switching in fine frequency steps with good spectral purity is crucial. The Direct Digital Frequency Synthesizer (DDFS) is best suited to these applications, but it is not widely employed in wireless communication systems because of its high power consumption. In general, low power consumption and high integration level are two challenges for designers of mixed signal circuits and communication systems. In this dissertation, new design techniques for DDFS at both the architecture and circuit levels are proposed and investigated in order to minimize power consumption and optimize performance. A ROM-less, wide-band, low-power DDFS prototype using a segmented sine wave Digital-to-Analog Converter (DAC) was designed, fabricated and tested to demonstrate the new design techniques.

First, to further reduce power consumption and save chip area, two new phase interpolation ROM-less DDFS architectures are proposed. The segmentation technique is applied to the design of the sine wave DAC for DDFS: (1) based upon trigonometric identities, a segmented sine wave DAC with fine nonlinear interpolation DAC’s is proposed; (2) based upon first order Taylor series and simple linear interpolation, a segmented sine wave DAC with a fine linear interpolation DAC is proposed. Second, a figure of merit (FM) is defined to find optimal sine wave DAC segmentations for various resolutions of the segmented sine wave DAC’s. The effects of device mismatch on the performance of the segmented sine wave DAC are also discussed. Third, for a DDFS using a current-steering segmented sine wave DAC with 12-b phase resolution and 11-b amplitude resolution, a behavioral model in Verilog was used to verify the functionality and validate the architecture. Finally, a DDFS prototype was designed and fabricated in a standard 0.25 µm CMOS process. The measured SFDR is better than 50 dB with output frequencies up to 3/8 of the 300 MHz clock frequency. The prototype occupies an active area of 1.4 mm$^2$ and consumes 240 mW at a 300 MHz clock frequency. The new techniques reduce the power dissipation and die area substantially when compared to conventional ROM based DDFS designs with on-chip DAC.
CHAPTER 1 INTRODUCTION

In this information age, the demand for wider bandwidth and wireless communication has become a driving force for information technology (IT). As building blocks of network infrastructure and mobile communication, frequency synthesizers play a very important role. This chapter presents a brief introduction to frequency synthesis, the objectives of this Ph.D. work, and the organization of the dissertation.

1.1 Introduction to Frequency Synthesis

A frequency synthesizer is defined as an electronic system that generates one or more frequencies from a single frequency reference and some other control signals, such that the ratio of the output frequency to the reference frequency is a rational fraction. Because of their complexity and cost, frequency synthesizers were first used in complex and demanding applications, such as satellite communication terminals, military radios, and radar systems. As frequency synthesis has matured, frequency synthesizers have found their way into high performance measurement and test equipment, wireless communication systems, networking systems, commercial broadcasting radios, etc. In short, frequency synthesizers have become ubiquitous with the advances and the evolution of technology [1].

Frequency synthesis techniques fall into two types according to their generation mechanism: direct and indirect frequency synthesis. In indirect frequency synthesis, a feedback mechanism is used for the output to lock onto the reference frequency. Indirect frequency synthesis techniques include Phase Locked Loop (PLL) and fractional-N PLL. In direct frequency synthesis, no feedback mechanism is used. Direct frequency synthesis techniques include direct analog frequency synthesis (DAFS) and direct digital frequency synthesis (DDFS).

1.1.1 PLL and Fractional-N PLL

The basic architecture of a PLL synthesizer is shown in Figure 1.1. A PLL synthesizer consists of a phase detector, a low-pass loop filter, a voltage-controlled oscillator (VCO) and a programmable frequency divider configured in a loop. The feedback frequency is phase/frequency locked to the main reference frequency, and the output of a PLL synthesizer is obtained from a secondary oscillator.

Figure 1.1. Architecture of a PLL synthesizer
The output frequency can be a multiple of the reference frequency due to the frequency divider. PLL synthesizers provide a very wide frequency range and good spectral purity. Because of the feedback loop in a PLL synthesizer, its frequency switching is slower than that of DDFS's or DAFS's. The division range of the programmable divider, which is quite limited, determines the tuning range of a PLL synthesizer. The synthesizer is called a fractional-N PLL synthesizer if the divider is a fractional-N divider. PLL synthesis is the most widely used frequency synthesis technique, because it offers a very wide output frequency range and good economics. PLL synthesizers have found applications not only in radar systems and satellite communication systems, but also in television receivers, digital communication systems, car radios, and stereo systems for home entertainment [3][4].

1.1.2 DAFS Technique

Figure 1.2 shows the block diagram of a direct analog frequency synthesizer, which generates frequencies of 21 MHz, 22 MHz, and 23 MHz. In DAFS, a group of frequencies is generated from the main reference frequency or obtained directly from precision oscillators. These reference frequencies are mixed and filtered, added and/or subtracted, multiplied and/or divided according to the requirements of the output frequencies. The output frequency of DAFS can be higher than the reference frequency due to mixing and filtering. Notice that there is no feedback mechanism involved.

Usually, DAFS's offer excellent spectral purity and good switching speed. The DAFS switching speed is mainly determined by the response time of the filters and the switches, and the DAFS spectral purity is mainly determined by the spectral purity of the references and the linearity of the mixers, filters and other components. Since the realization of DAFS's is quite complicated and expensive, their applications are limited to areas such as medical imaging and spectrometers, fast-switching anti-jam communications, and radar [1].
1.1.3 DDFS Technique

For frequency agile communication systems such as spread spectrum wireless LAN’s and some digital cellular systems, fast frequency switching in fine frequency steps with good spectral purity is crucial. Synthesizing different output frequencies for these systems is often achieved using DDFS’s. The original DDFS was proposed by Tierney et al. [2]. Figure 1.3 shows a generic conceptual block diagram of a DDFS.

Figure 1.3. Block diagram and typical waveforms of a generic DDFS

As shown in Figure 1.3, a DDFS consists of: (1) a phase accumulator that gives the phase of a sine wave, (2) a ROM lookup table from which the digital amplitudes of a sine wave can be addressed, (3) a DAC that converts the digital amplitudes to analog format, and (4) a low pass filter that removes the aliases and unwanted harmonics. The output frequency range of a DDFS is limited by the Nyquist theorem and in practice extends only up to about 45% of the reference frequency. DDFS is usually considered power-hungry because of the digital circuits it requires, such as the ROM lookup table. As a result, DDFS has not been widely used in portable wireless communication systems. A distinctive advantage of DDFS's is that they provide convenient digital modulation for some modulation schemes (FM, PM, AM), as well as a friendly interface to the controlling computer. On the down side, the digital logic, the phase-to-sine-amplitude mapping, and the D/A conversion seriously limit the output frequency range. Benefiting from the advances of IC fabrication processes and DSP technology, DDFS is now a very important frequency synthesis technique with applications ranging from data communication and cellular telephones to radar systems and medical imaging. More details about this technique will be discussed in the following chapters.

Table 1.1 compares three different synthesis techniques [5]. The frequency range is defined as the range of output frequencies that each frequency synthesis technique can generate. The tuning range of a synthesizer is defined as the frequency tuning step. The switching speed of a synthesizer is defined as the speed at which the output frequency can be changed. The spectral purity of a synthesizer is defined as the ratio between the strongest spurious signal and the desired output signal.

Table 1.1. Comparison of frequency synthesis techniques

<table>
<thead>
<tr>
<th>Characteristic</th>
<th>DAFS</th>
<th>DDFS</th>
<th>PLL (Fractional N)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency range</td>
<td>Very wide</td>
<td>Limited</td>
<td>Very wide</td>
</tr>
<tr>
<td>Tuning range</td>
<td>Bad</td>
<td>Excellent</td>
<td>Limited</td>
</tr>
<tr>
<td>Switching speed</td>
<td>Fast</td>
<td>Very fast</td>
<td>Moderate</td>
</tr>
<tr>
<td>Spectral purity</td>
<td>Excellent</td>
<td>Good</td>
<td>Good</td>
</tr>
<tr>
<td>Complexity</td>
<td>Complicated</td>
<td>Simple</td>
<td>Complicated</td>
</tr>
<tr>
<td>Size</td>
<td>Bulky</td>
<td>Small</td>
<td>Smallest</td>
</tr>
<tr>
<td>Cost</td>
<td>Expensive</td>
<td>Cheap</td>
<td>Cheapest</td>
</tr>
</tbody>
</table>

1.2 Objectives of the work

Due to the portability requirement and the advances of integrated circuit fabrication technology, low power design is one of the main challenges in the design of mixed signal circuits and communication systems. For frequency agile communication systems, such as CDMA digital cellular telephones, spread-spectrum wireless LAN's and military frequency-hopped communication systems, fast switching speed and fine tuning resolution are critical [6][7]. DDFS is considered most suitable for these applications because it offers the fastest switching, fine frequency tuning resolution, and modulation flexibility.

By eliminating the ROM lookup table of the conventional DDFS, the nonlinear DAC technique has proved to be an effective approach for implementing low power ROM-less DDFS [8][9]. Since the nonlinear DAC discussed in [8][9] is equivalent to a thermometer-code DAC, the power consumption and the chip area of a nonlinear DAC can be further reduced by applying the segmentation technique. In addition, the maximum operating speed of a nonlinear DAC based DDFS can be increased by using advanced deep sub-micron CMOS technology and pipelined system timing. The objectives of this Ph.D. work are summarized as follows:
(a) To further reduce the power dissipation of a DDFS by using new design techniques and investigating new architectures;
(b) To model the proposed DDFS architecture at the system level and verify the functionality through simulation;
(c) To find the optimal sine wave DAC segmentations for future applications;
(d) To design a low-power high speed DDFS based on the proposed architecture.

1.3 Dissertation Organization

The dissertation consists of seven chapters. CHAPTER 1 gives a general
introduction of frequency synthesis and the objective of this Ph.D. work. In CHAPTER
2, a review of recent publications on DDFS's is presented. In CHAPTER 3, DDFS
output spectrum is analyzed. CHAPTER 4 presents a linear interpolation technique for
segmented nonlinear DAC and provides optimal segmentations in terms of a defined
figure of merit. In CHAPTER 5, a nonlinear interpolation technique for segmented
nonlinear DAC is proposed and optimal segmentations in terms of the defined figure of
merit are recommended as well. CHAPTER 6 addresses the design of a segmented
nonlinear DAC based DDFS chip and the experimental results of the prototype chip.
CHAPTER 7 draws conclusions for this Ph.D. work and lists the major contributions.
CHAPTER 2 LITERATURE REVIEW

In the last decade, DDFS has been studied extensively. This chapter reviews recent publications on DDFS's, organized into three categories: DDFS architectures, memory compression techniques, and implementation examples.

2.1 DDFS Architecture

2.1.1 DDFS Using Look-up Table ROM

The conventional DDFS architecture was originally proposed by Tierney et al. in 1971 [2]. In general, a DDFS consists of a phase accumulator, a phase-to-sine-amplitude look-up table ROM, a digital-to-analog converter (DAC), and a low-pass filter. Figure 2.1 shows the simplified block diagram of a conventional DDFS. The mathematical expression for a sinusoidal waveform with fixed amplitude $A$, fixed phase $\varphi$, and constant frequency $\omega$ is $A \sin (\omega t + \varphi)$. Since this is a well-understood, elegant function, the sine wave can be built up from the "ground", based upon the observations below: (a) The signal phase is a linear function of time whose slope is the angular frequency $\omega$, i.e. $\Phi(t) = \omega t + \varphi$. This linear, periodic phase can be realized by a digital accumulator that is clocked by the reference frequency. (b) The sine wave amplitudes can be derived by the mapping $\Phi(t) \rightarrow A\sin \Phi(t)$. Usually, a ROM/RAM look-up table is employed to realize this nonlinear mapping. (c) The digital representation of the sinusoidal signal is then converted to an analog sine wave by a linear DAC. (d) The harmonic frequencies are removed by a low-pass filter, so that a smooth analog sine wave is reconstructed. In much of the literature and in many commercial products, the low-pass filter is not included in what is called a DDFS. The Tierney et al. architecture is simple yet powerful, and most DDFS's are designed based upon this original structure [6][10]-[12].

Figure 2.1. Tierney et al. DDFS architecture
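Observations (a) to (c) can be illustrated with a small behavioral sketch in Python. It is only an illustration under stated assumptions, not the dissertation's model: the names `ddfs_samples`, `fcw` (frequency control word), `acc_bits`, `addr_bits` and `amp_bits` are hypothetical, the ROM stores a full sine period, and the DAC and low-pass filter are omitted. It relies on the standard DDFS relation that the output frequency equals `fcw * f_clk / 2**acc_bits`.

```python
import math

def ddfs_samples(fcw, acc_bits, addr_bits, amp_bits, n_samples):
    """Return quantized sine samples from a ROM-based DDFS model (illustrative)."""
    # Sine ROM: 2**addr_bits entries holding signed integer amplitudes.
    full_scale = 2 ** (amp_bits - 1) - 1
    rom = [round(full_scale * math.sin(2 * math.pi * k / 2 ** addr_bits))
           for k in range(2 ** addr_bits)]
    phase = 0
    out = []
    for _ in range(n_samples):
        # Only the top addr_bits of the accumulator address the ROM.
        out.append(rom[phase >> (acc_bits - addr_bits)])
        # The accumulator wraps modulo 2**acc_bits, which makes the phase periodic.
        phase = (phase + fcw) % (2 ** acc_bits)
    return out

def output_frequency(fcw, acc_bits, f_clk):
    """Synthesized frequency for a given frequency control word."""
    return fcw * f_clk / 2 ** acc_bits
```

With a 16-bit accumulator and `fcw = 1024`, the phase wraps every 64 clock cycles, so the model produces an output at 1/64 of the clock frequency.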

2.1.2 CORDIC Algorithm Based DDFS

In a CORDIC algorithm based DDFS, the phase-to-sine-amplitude mapping is performed by a CORDIC processor. J. Volder first proposed the CORDIC trigonometric computing technique in 1959 [13]. The CORDIC algorithm performs vector coordinate rotation by using simple iterative shift and add/subtract operations, which are easy to implement with digital circuits such as adders, shifters, multiplexers and registers. The most speed critical component is the adder/subtractor because of its carry propagation. It was suggested that the CORDIC processor could be accelerated by introducing a redundant number representation into the internal computation and eliminating the carry propagation from each addition/subtraction [14][15]. The other approach to increasing the throughput of the CORDIC algorithm is to use pipelined processors, with the penalty of a large chip size [16]. Two major problems prevent popular use of the CORDIC algorithm in DDFS architectures, namely poor frequency resolution and high power consumption. The architecture reported in [17] tries to circumvent both of these problems by modifying the classical CORDIC algorithm.
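The shift-and-add iteration described above can be sketched as follows. This is a textbook rotation-mode CORDIC written in floating point for readability, not the modified algorithm of [17]; the function name `cordic_sin_cos` and the default of 16 iterations are assumptions. In hardware, each multiplication by `2.0 ** -i` would be a right shift by `i` bits.

```python
import math

def cordic_sin_cos(theta, iterations=16):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos)."""
    # Elementary rotation angles atan(2^-i), normally held in a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation scales the vector; precompute the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start from (1/gain, 0) so that the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x
```

Each iteration adds roughly one bit of angular accuracy, which is why the adder/subtractor delay dominates the achievable throughput.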

2.1.3 Simplified Angle-Rotation Algorithm Based DDFS

This patented approach uses a simplified angle-rotation algorithm instead of a lookup table [18]. The high-speed, high-precision DDFS IC is implemented as a multiplierless feed-forward data path that allows easy pipelining and limits the accumulation of round-off errors [19]. The modular architecture permits outputs of arbitrary precision by simply cascading enough angle-rotation stages in the data path. The chip was tested to be functional at a clock rate of 100 MHz and produces 16-b sinusoids with a Spurious Free Dynamic Range (SFDR) of 100 dBc.

2.1.4 Digital Signal Processor Based DDFS

In this approach, a sine wave is generated by programming a Digital Signal Processor. A second order IIR resonator, whose impulse response is a sine wave, is used to obtain the DDFS outputs [20]. Because the impulse response of the IIR resonator can be expressed as a second order recursive formula, each sine wave sample requires only a few multiplication and addition operations. If a fast DSP microprocessor can evaluate the recursive equation in "real-time", a sine wave output is obtained. This technique is simple to design and reliable to operate, but it is typically slow due to the DSP speed constraint.
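One common form of such a recursion is $y[n] = 2\cos(\omega_0)\,y[n-1] - y[n-2]$ with $y[0] = 0$ and $y[1] = \sin(\omega_0)$, which yields $y[n] = \sin(n\omega_0)$. The sketch below assumes this form; the exact resonator used in [20] may differ, and the names `resonator_sine`, `f_out` and `f_s` are hypothetical.

```python
import math

def resonator_sine(f_out, f_s, n_samples):
    """Generate sin(2*pi*f_out*n/f_s) with a second order recursion."""
    w0 = 2 * math.pi * f_out / f_s
    coeff = 2 * math.cos(w0)                # the only multiplier coefficient
    y_prev2, y_prev1 = 0.0, math.sin(w0)    # y[0] and y[1]
    out = [y_prev2, y_prev1]
    while len(out) < n_samples:
        # One multiplication and one subtraction per output sample.
        y = coeff * y_prev1 - y_prev2
        out.append(y)
        y_prev2, y_prev1 = y_prev1, y
    return out[:n_samples]
```

In a fixed-point DSP the rounding of `coeff` and of each product accumulates over time, so practical implementations usually re-initialize or correct the state periodically.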
2.1.5 DDFS Using Nonlinear Sine Wave DAC

To improve the spectral purity of a sine wave output, a larger lookup table ROM is required in a conventional ROM based DDFS. A larger ROM lookup table consumes more power, needs a longer access time, occupies a larger chip area, and is less reliable. To deal with this problem, many ROM compression and optimization techniques were proposed [21] and will be reviewed in more detail later. Clearly, the power dissipation of a DDFS will decrease substantially if the ROM lookup table is eliminated. A design technique to implement low-power ROM-less DDFS was proposed in [8][9]. Figure 2.2 shows the simplified block diagram. Theoretically, for the same phase resolution and amplitude resolution, the performance of a DDFS using a sine wave DAC will match that of the conventional ROM based DDFS with on-chip DAC. In the design discussed in [8][9], the nonlinear sine wave DAC performs both the phase-to-sine-amplitude mapping and the linear D/A conversion. The sine wave DAC's are equivalent to a ROM lookup table and a thermometer-coded linear DAC. The advantages of DDFS's using sine wave DAC's are low power and low cost.

Figure 2.2. DDFS using sine wave DAC

2.2 Memory Compression Techniques

2.2.1 Exploitation of Waveform Symmetry

A well-known technique of ROM compression is to store only π/2 radians of the sine function, and to generate the complete period of the sine wave by exploiting the quarter-wave symmetry. The two most significant phase bits are used to decode the quadrant, while the remaining bits are used to address a one-quadrant sine lookup table. Figure 2.3 shows the logic to exploit this waveform symmetry. The most significant bit (MSB) determines the sign of the lookup table result, and the second MSB determines whether the amplitude is increasing or decreasing. With this technique, the ROM lookup table shrinks to approximately one fourth of the size of the straightforward $0\sim2\pi$ ROM, but part of this saving is offset by the additional digital logic circuits. In practice, a simple 1's complementor can be used instead of a 2's complementor if a one-half least significant bit (LSB) offset is introduced into the phase and the amplitude. The hardware requirements of this compression technique can therefore be reduced [21][22].

Figure 2.3. Logic to exploit quarter wave symmetry
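The quadrant decoding and the half-LSB offset can be illustrated with the following sketch. It is a minimal model under stated assumptions rather than the logic of Figure 2.3: the names `build_quarter_rom` and `sine_from_quarter_rom` are hypothetical, and each stored or returned integer `v` is taken to represent the value `v + 0.5`, so that negating the amplitude reduces to a bitwise one's complement.

```python
import math

def build_quarter_rom(phase_bits, amp_bits):
    """One-quadrant sine table sampled at half-LSB phase offsets."""
    n = 2 ** (phase_bits - 2)
    scale = 2 ** (amp_bits - 1)
    # Entry k is the floor of scale*sin(...); it represents entry + 0.5.
    return [min(scale - 1,
                int(scale * math.sin(2 * math.pi * (k + 0.5) / 2 ** phase_bits)))
            for k in range(n)]

def sine_from_quarter_rom(phase, rom, phase_bits):
    """Full-period sine amplitude from the quarter-wave table."""
    addr_bits = phase_bits - 2
    mask = (1 << addr_bits) - 1
    sign = (phase >> (phase_bits - 1)) & 1      # MSB: sign of the output
    mirror = (phase >> (phase_bits - 2)) & 1    # 2nd MSB: falling quadrant
    addr = phase & mask
    if mirror:
        addr = ~addr & mask                     # one's complement of the address
    mag = rom[addr]
    return ~mag if sign else mag                # one's complement of the amplitude
```

Because of the half-LSB offsets, mirroring the address and negating the amplitude both reduce to bit inversion, so no carry-propagating adder is needed in either complementor.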

For applications where quadrature outputs are desired, one can take advantage of the eighth-wave symmetry of the sine and cosine waveforms to compress the ROM lookup tables [11]. Specifically, only $0\sim\pi/4$ radians of the sine function and $0\sim\pi/4$ radians of the cosine function are stored in two separate ROM's. The two MSB's work as described above. The third MSB is XORed with the second MSB, and the resulting signal selects the outputs of the two ROM's accordingly for the quadrature outputs.

2.2.2 Sine-Phase Difference Algorithm

To decrease the ROM storage requirement of the quarter wave sine function, the difference function $f(\Phi) = \sin (\pi \Phi/2) - \Phi$ instead of $\sin (\pi \Phi/2)$, is stored. Notice the maximum value of the difference function is smaller,

$$\text{Max} \left[ \sin \frac{\Phi \pi}{2} - \Phi \right] = 0.21 \text{Max} \left[ \sin \frac{\Phi \pi}{2} \right] \quad (2-1)$$

By storing the difference, two bits of the amplitude word length in each memory location are saved. The penalty for this memory saving is that an additional adder is required to compute the final sine function amplitudes,

$$\sin \frac{\Phi \pi}{2} = \left( \sin \frac{\Phi \pi}{2} - \Phi \right) + \Phi \quad (2-2)$$
The other advantage of this technique is that the propagation delay of the lookup table is reduced because fewer amplitude bits are stored. This may allow a higher reference clock frequency.
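The two-bit saving implied by (2-1) can be checked numerically with a short sketch. The table size (10 address bits) and the 11-bit full scale are arbitrary choices for illustration, and the names `difference_rom`, `bits_full` and `bits_diff` are hypothetical.

```python
import math

def difference_rom(addr_bits, amp_bits):
    """Return the sine table, the phase ramp and their difference."""
    n = 2 ** addr_bits
    full_scale = 2 ** amp_bits - 1
    # Quarter-wave sine and the linear phase ramp, both scaled to full scale.
    sine = [round(full_scale * math.sin(math.pi * (k / n) / 2)) for k in range(n)]
    ramp = [round(full_scale * k / n) for k in range(n)]
    # Only the difference is stored; the ramp is added back as in (2-2).
    diff = [s - r for s, r in zip(sine, ramp)]
    return sine, ramp, diff

sine, ramp, diff = difference_rom(10, 11)
bits_full = max(sine).bit_length()   # word length needed for the sine itself
bits_diff = max(diff).bit_length()   # word length needed for the difference
```

For these parameters the difference peaks at roughly 0.21 of full scale, which is below one quarter, so each ROM word is two bits shorter than a word of the plain sine table.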

2.2.3 Sunderland’s Algorithmic Approximations

Based upon simple trigonometric identities, the quarter wave ROM lookup table can be divided into a coarse ROM and a fine ROM. The extra hardware of the technique is an adder [23]. Figure 2.4 shows the block diagram of this compression method.

If the phase bits are divided into three parts: A, B, and C, based upon trigonometric identities, the desired sine function for the region between 0 and $\pi/2$ can be written as,

$$\sin \frac{\pi (A + B + C)}{2} = \sin \frac{\pi (A + B)}{2} \cos \frac{\pi C}{2} + \cos \frac{\pi (A + B)}{2} \sin \frac{\pi C}{2} \quad (2-3)$$
We further assume that the numbers of bits for the A part, the B part and the C part are a, b and c, respectively. If $A < 1$, $B < 2^{-a}$, and $C < 2^{-(a+b)}$, the right hand side of the equation (2-3) can be approximated as,

$$\sin \frac{\pi (A + B + C)}{2} \approx \sin \frac{\pi (A + B)}{2} + \cos \frac{\pi A}{2} \sin \frac{\pi C}{2} \quad (2-4)$$

Because $\sin (\pi C/2)$ is very small, the second term on the right hand side is much smaller than the first term. Therefore, the first term on the right hand side of (2-4) can be stored in the coarse ROM, whose addressing bits come from the A part and the B part, and the second term can be stored in the fine ROM, whose addressing bits come from the A part and the C part.

This compression technique is very effective. The design discussed in [23] gives the following memory requirements:

(a) For a quarter wave ROM with 12 addressing bits and 11 amplitude bits, the storage requirement is $2^{12} \times 11 = 45056$ bits.

(b) Following Sunderland's algorithmic approximation, the 12 bits are divided into three 4-bit fields. The coarse ROM requires $2^{8} \times 11 = 2816$ bits and the fine ROM needs $2^{8} \times 4 = 1024$ bits, so the total storage requirement is 3840 bits.

(c) The compression ratio is 11.73:1.
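The storage figures above, and the size of the error introduced by (2-4), can be reproduced with a short script. This is only a sketch: it uses unquantized floating-point table entries and the plain form of (2-4), so it does not model the rounding of the ROM words or any refinement used in [23]. The helper name `approx_and_exact` is hypothetical.

```python
import math

A_BITS = B_BITS = C_BITS = 4      # 12-bit quarter-wave phase, split 4/4/4
AMP_BITS, FINE_BITS = 11, 4       # word widths quoted from [23]

# Storage arithmetic for the uncompressed and the coarse/fine ROMs
full_rom = 2 ** (A_BITS + B_BITS + C_BITS) * AMP_BITS
coarse_rom = 2 ** (A_BITS + B_BITS) * AMP_BITS
fine_rom = 2 ** (A_BITS + C_BITS) * FINE_BITS
ratio = full_rom / (coarse_rom + fine_rom)

def approx_and_exact(a: int, b: int, c: int) -> tuple[float, float]:
    """Return (coarse + fine, exact sine) for integer field values a, b, c."""
    A = a / 2 ** A_BITS
    B = b / 2 ** (A_BITS + B_BITS)
    C = c / 2 ** (A_BITS + B_BITS + C_BITS)
    coarse = math.sin(math.pi * (A + B) / 2)                      # addressed by A and B
    fine = math.cos(math.pi * A / 2) * math.sin(math.pi * C / 2)  # addressed by A and C
    return coarse + fine, math.sin(math.pi * (A + B + C) / 2)

# Worst absolute error of the approximation over all 2^12 phase values
max_err = max(
    abs(approx - exact)
    for a in range(2 ** A_BITS)
    for b in range(2 ** B_BITS)
    for c in range(2 ** C_BITS)
    for approx, exact in [approx_and_exact(a, b, c)]
)
print(full_rom, coarse_rom + fine_rom, round(ratio, 2), f"{max_err:.2e}")
```

The error term comes almost entirely from replacing $\cos \frac{\pi (A+B)}{2}$ by $\cos \frac{\pi A}{2}$, which is why it stays small as long as $B$ is confined to the bits just below $A$.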

2.2.4 Nicolas' Architecture

Similar to Sunderland's algorithmic approximation, Nicolas' DDFS architecture stores samples chosen by numerical optimization [22]. The optimum partition of the ROM address word lengths for 13 bits of phase resolution is determined through computer simulation: $A = 4$, $B = 4$, and $C = 5$, as shown in Figure 2.5 [22]. The interpolation values are chosen to minimize either the mean square error or the maximum absolute error of the interpolation within each coarse phase sample's region. Notice that the sine-phase difference algorithm is employed in this architecture. The fine ROM storage is reduced further by exploiting the symmetry of the fine ROM correction factors. This modification, however, needs subtract/add control logic in place of the adder that sums the coarse ROM and fine ROM outputs in Sunderland's approximation.

Figure 2.5. ROM partition of the Nicolas' architecture

2.2.5 Taylor Series Approximation

L. A. Weaver and R. J. Kerr proposed a new phase-to-sine-amplitude conversion technique in a patent [24]. Based on a Taylor series approximation, the quarter wave lookup table ROM can be divided into three smaller ROM's, which reduces the total storage requirement. The Taylor series expansion of the sine function around $\theta = a$ is,

$$
\sin \frac{\pi \theta}{2} = \sin \frac{\pi a}{2} + k_1 (\theta - a) \cos \frac{\pi a}{2} - \frac{k_2 (\theta - a)^2}{2} \sin \frac{\pi a}{2} + \cdots \tag{2-5}
$$

where $k_1$ and $k_2$ are constants. The approximation keeps only three terms, because the remaining terms contribute much less to the accuracy. The penalty of this technique is that it needs one multiplier and two adders, so it is complicated to implement. Due to the speed constraints of the multiplier and adders, a DDFS using the Taylor series approximation is slow. QUALCOMM used this technique in the BiCMOS DDFS product Q2334, which works at a 50 MHz clock frequency with a 12-bit word length [25].
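To see how small the discarded terms of (2-5) are, the three-term expansion can be compared with the exact sine. The sketch below assumes $k_1 = \pi/2$ and $k_2 = (\pi/2)^2$, which follow from differentiating $\sin(\pi\theta/2)$; the 6-bit grid of expansion points is a hypothetical choice and is not taken from [24] or [25].

```python
import math

K1 = math.pi / 2           # scale of the first derivative of sin(pi*theta/2)
K2 = (math.pi / 2) ** 2    # scale of the second derivative

def taylor3(theta: float, a: float) -> float:
    """Three-term Taylor approximation of sin(pi*theta/2) around theta = a."""
    d = theta - a
    s = math.sin(math.pi * a / 2)
    c = math.cos(math.pi * a / 2)
    return s + K1 * d * c - K2 * d * d / 2 * s

COARSE_BITS = 6   # hypothetical number of bits selecting the expansion point a
STEPS = 64        # evaluation points inside each coarse interval

max_err = 0.0
for i in range(2 ** COARSE_BITS):
    a = i / 2 ** COARSE_BITS
    for s in range(STEPS):
        theta = a + s / (STEPS * 2 ** COARSE_BITS)
        err = abs(taylor3(theta, a) - math.sin(math.pi * theta / 2))
        max_err = max(max_err, err)
print(f"max three-term error = {max_err:.2e}")
```

Under these assumptions the worst error is bounded by the cubic remainder $(\pi/2)^3 (\theta - a)^3 / 6$, which is on the order of $10^{-6}$ for this grid.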

2.2.6 Hutchison’s Architecture

Based upon simple linear interpolation, the quarter wave lookup table ROM is partitioned into two smaller ROM's [26]. The coarse ROM represents the total ROM in fewer addresses, and the second ROM uses linear interpolation to obtain the intermediate sine function amplitudes. Figure 2.6 shows the block diagram of Hutchison’s algorithm.

Figure 2.6. Hutchison's architecture

More specifically, the fine ROM stores the difference between the true value and the value of the sine function at $K_1$ points. To illustrate the efficiency of this compression technique, let us study an example based on the same notation of Figure 2.6 [1]. Assume $W = 14$, $D = 12$, $K_1 = 8$, $K_2 = 12$, $M=4$. The original ROM size is $2^{12}\times 11 = 45,056$ bits. For a DDFS using Hutchison’s architecture, the total ROM size is $2^8\times 11 + 2^{12}\times 4 = 19,200$ bits. Thus, the compression ratio is 2.347.

2.3 DDFS Implementation Examples

2.3.1 Superconducting DDFS (SDDFS)

A. Spooner and coworkers designed and fabricated a DDFS on a 1-cm$^2$ substrate in superconducting niobium (Nb) Josephson technology and tested it at 4 Kelvin. The chip contains a 12-bit pipelined Modified Variable Threshold Logic incremental phase accumulator, a SQUID cell based sine ROM core with Sunderland’s compression algorithm, and an R-2R DAC fabricated directly on the SDDFS chip. The reported spurious content is –30 dB at a 2 GHz clock for a 500 MHz output frequency [27].

2.3.2 GaAs/HBT DDFS's

Advanced GaAs and Si-bipolar IC processes, although usually costly, have taken a share of the DDFS market because of their very high operating frequencies. A GaAs numerically controlled oscillator (NCO) was designed and fabricated in a 1 µm GaAs E/D MESFET process based on DCFL logic. The NCO provides both DDFS and direct digital phase modulation. The design consists of a frequency phase accumulator, a phase modulation accumulator, an on-chip ROM lookup table, and waveform output logic, which includes twelve 25 Ω drivers. The NCO achieved an SFDR of -46 dBc using an 8-bit TriQuint DAC. The total power dissipation is less than 2 W [28].

Caglio and coworkers designed an integrated GaAs FM-CW Direct Digital Synthesizer (DDS) by using Philips Microwave Limeil standard ER07AD technology [29]. The DDS is composed of a double phase accumulator (implemented with five chips) and a Digital to Analog Sine Converter (monolithic). The DDS is able to generate chirp signals up to 100 MHz. The maximum measured clock frequency on the phase accumulator is 1.25 GHz and the power consumption is 320 mW. The total power dissipation of the DDS system is 2.2 Watts.

A high speed, high spectral purity DDS was designed by G. W. Kent and N. Sheng. The hybrid GaAs circuit consists of a HBT DAC and a MESFET accumulator/ROM combination. Spectral purity is better than -55 dBc worst spur, up to 245 MHz output frequency [30].

A monolithic digital chirp synthesizer (DCS) chip was fabricated using 1.5 μm GaAs/AlGaAs HIL technology. This DCS chip is capable of producing linear frequency-modulated (chirp) waveforms or single frequency waveforms. It consists of two 28-b pipelined accumulators, a 1.8-kbit sine ROM, a 1.8-kbit cosine ROM, and two 8-bit DAC's. The DCS chip operated at 450 MHz clock frequency with a power dissipation of 18 W [31].
2.3.3 Bipolar/BiCMOS DDFS's

Saul and Mudd presented a bipolar DDS with 5-kHz to 100-MHz frequency output range in 1988. The spurious signal is less than $-32 \text{ dBc}$, and the switching speed is 17 ns. Close-to-carrier measurements indicate a noise floor lower than $-138 \text{ dBc}$ at $\pm 25 \text{ kHz}$ [32]. In another paper of JSSC'90, Saul and Taylor reported a 500 MHz output frequency DDFS with 1 Hz frequency resolution in a 1-µm silicon bipolar process (Plessey Process HE) [10].

A DDS with an on-chip DAC was designed and fabricated in a 0.8-µm BiCMOS technology. At a 150-MHz clock frequency, the SFDR is better than 60 dBc at low synthesized frequencies, decreasing to 52 dBc in the worst case at high synthesized frequencies in the output band (0-75 MHz). The minimum tuning step is 0.0349 Hz with a frequency switching speed of 140 ns. The total power dissipation is 0.6 W at 150 MHz with a 5 V supply. This DDS can operate up to 170 MHz [33].

2.3.4 Low Power CMOS DDFS's

In portable applications and wireless communication systems, power consumption is one of the major concerns. G. Chang et al designed a DDFS with quadrature outputs in a 1µm CMOS process. At 40 MHz clock frequency with a 3 V supply, the DDFS dissipates 40 mW. At low output frequencies, the SFDR is -56 dBc while the worst case ($\xi/3$) SFDR is -50 dBc [34].

Liao and Chen presented a low-power low-voltage DDFS designed in a 0.6 µm CMOS process from TSMC (Taiwan Semiconductor Corporation, Taiwan, China) in
1997. This chip achieved low power dissipation by using a low supply voltage (2 V) and the proposed ROM compression method. The chip was functional at 62.5 MHz clock frequency. The total power consumption of the DDFS chip is 32 mW at 100 MHz with a 2 V supply (powermill simulation) [35].

A multi-threshold-voltage 0.5 μm CMOS process makes 2-V operation possible for the DDFS logic and DAC for a synthesizer chip-set. The CMOS DDFS consumes 45 mW out of the total 157 mW power budget. The synthesizer achieves an SFDR of 50 dB at 2 GHz [36].

S. Mortezapour and E. Lee presented two low-power quadrature DDFS designs using the proposed nonlinear DAC. In this new DDFS architecture, the conventional ROM lookup table is eliminated, which yields a significant saving in total power. For a clock frequency of 25 MHz with a 3.3-V supply, the power dissipation was measured to be less than 4 mW for the prototype DDFS using a nonlinear resistor string DAC. For a 3.3-V power supply and a clock frequency of 230 MHz, the total power dissipation for the DDFS using a nonlinear current-mode DAC was measured to be 92 mW with a 1.8 MHz output synthesized frequency. For both DDFS's, the SFDR's are over 55 dB for low synthesized frequencies. The prototype DDFS using the nonlinear R-string DAC was designed and fabricated in a 1.2-μm CMOS process. The DDFS using the nonlinear current-mode DAC was designed and fabricated in a 0.5-μm CMOS process [8][9].

Based on ROM reduction methods that use linear interpolation between the sample points and the eighth symmetry of the sine function, a low power DDFS was designed and fabricated in 0.8 μm CMOS technology. The DDFS features 60 dBc spectral purity, 9-bit output data for the sine function, and 29-Hz frequency resolution with a power dissipation of 9.5 mW (at 30 MHz, 3.3 V). The ROM size is 416 bits. The DDFS chip is used as a building block for a wireless spread-spectrum communication system [37].

2.3.5 High Speed CMOS DDFS's

Previously, DDFS's have been considered low speed frequency synthesizers due to the speed constraints of digital logic and DAC’s in a CMOS process. The speed of digital logic has been improved greatly by recent advances in IC fabrication technology. L. Tan and H. Samueli reported a 200 MHz quadrature digital synthesizer/mixer in a 0.8-μm CMOS process in 1995. This chip exhibits a wide frequency range (dc to 100 MHz), high spectral purity (−84.3 dBc), fast switching (5 ns), and fine frequency resolution (0.047 Hz). This design takes advantage of the eighth symmetry of the sine and cosine functions to reduce the memory requirement. The chip also provides modulation capabilities [11].

In another paper by L. Tan et al, an 800-MHz quadrature DDFS chip utilizing a parallel architecture was presented. The QDDFS features high spectral purity (−84.3 dBc), a wider frequency range (dc to 400 MHz), fast switching (5 ns), and fine frequency resolution (0.188 Hz). Using a 0.8 μm CMOS technology, this architecture achieves a four-fold speed increase over the previous fastest CMOS design at four times the chip area and power dissipation [12].
2.3.6 Special Purpose CMOS DDFS's

Using a 3.5-µm CMOS/SOS (silicon on sapphire) technology, a single chip, radiation-hardened DDFS was reported in [23]. In this DDFS chip, Sunderland and co-workers implemented a computationally efficient algorithm to reduce memory storage. Combined with a commercial DAC, the DDFS chip has demonstrated spectral purity of -65 dBc over a band extending to 3/8 of the clock frequency. Operation up to 7.5 MHz is possible in a worst case environment, including ionizing radiation levels up to $3 \times 10^5$ rads (Si).
CHAPTER 3 ANALYSIS OF DDFS OUTPUT SPECTRUM

Spectral purity of DDFS’s is one of the major concerns in applications. Due to phase truncation, some of the spurious signals may be close to the desired signal and difficult to remove. Therefore, it is important to study the spurious signals and noise in DDFS output. In this chapter, the design of a conventional nonlinear DAC based DDFS will be discussed first. Since the nonlinear DAC based DDFS is equivalent to a ROM based DDFS, the phase truncation error of the nonlinear DAC based DDFS is similar to that of the ROM based DDFS. Based on this observation, the phase truncation error of DAC based DDFS according to the conventional analysis of phase truncation error will be discussed.

3.1 Conventional Nonlinear DAC Based DDFS

3.1.1 Design of Non-segmented Nonlinear DAC Based DDFS

In the nonlinear DAC based DDFS, the sine wave DAC converts digital phase directly into analog sine wave amplitude. By using the first two MSB’s of the phase bits, the quarter wave symmetry of a sine wave can be utilized to reduce the power consumption and the chip area of the nonlinear DAC. Figure 3.1 shows the block diagram of a nonlinear DAC based DDFS. The design of the nonlinear DAC can be explained by first assuming that the phase resolution is j bits; the amplitude resolution of the output signal is i bits; and the amplitude of the output sine wave is equal to $2^{i-1}$ steps.

Figure 3.1. Conceptual nonlinear DAC based DDFS architecture

The ideal output of the nonlinear DAC $v_{0,\text{ideal}}$ is a function of the complementor output $st(n)$ and the MSB of the phase accumulator output. It can be written as,

$$
v_{0,\text{ideal}} = \begin{cases}
(2^i - 1) \sin \left( \frac{2\pi st(n)}{2^i - 1} \right), & \text{for } MSB = 0 \\
-(2^i - 1) \sin \left( \frac{2\pi st(n)}{2^i - 1} \right), & \text{for } MSB = 1
\end{cases} \tag{3-1}
$$

where $st(n)$ has a range between 0 and $2^{j-2}-1$. Since the nonlinear DAC is based on thermometer-code DAC architecture, it has $2^{i-2}$ cells for the positive part of the sine wave output and $2^{i-2}$ cells for the negative part of the sine wave output. The absolute value of the DDFS output is determined by the complementor output and can be described as

$$|v_0| = \sum_{k=0}^{st(n)} o_k \tag{3-2}$$

where $o_k$ is the $k$-th DAC cell output value, which represents the difference between the two adjacent DAC output values when the phase differs by 1. Based on (3-1) and (3-2), each $o_k$ can be calculated using the following iterative equation:

$$o_k = \begin{cases}
\text{int}\left(\left(2^i - 1\right)\sin \frac{2\pi(0.5)}{2^{j-1}}\right) & \text{for } k = 0 \\
\text{int}\left(\left(2^i - 1\right)\sin \frac{2\pi(k + 0.5)}{2^{j-1}} - \sum_{n=0}^{k-1} o_n\right) & \text{for } 1 \leq k \leq 2^{j-2} - 1
\end{cases} \tag{3-3}$$

Where int [·] denotes the operation of rounding a real number to the nearest integer for matching purposes in an actual implementation. Figure 3.2 illustrates the value of $o_k$ graphically. It also illustrates the effect of quantization error $\varepsilon_k$. 

Figure 3.2. Graphical representation of the values of $o_k$'s
The value of 0.5 in (3-3) introduces a half least significant bit (LSB) offset to the phase and amplitude so that XOR gates can be used as the 1's complementor [14]. The maximum value of $o_k$, $o_{\text{max}}$, is approximately equal to the maximum slope of the sine wave and can be derived as ceiling $[(2^j - 1)\pi/2^{j-1}]$. To simplify the layout of the nonlinear DAC, all DAC cells are made the same size, each sized for $o_{\text{max}}$ unit current sources.
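The iterative cell sizing can be sketched in a few lines. The sketch below is a reading of the procedure rather than the author's implementation: it takes the full-scale amplitude as a free parameter and assumes the phase argument is $2\pi(k+0.5)/2^{p}$, with $p$ the number of phase bits per full period, so that $k = 0 \ldots 2^{p-2}-1$ sweeps exactly one quarter period. The names `dac_cells` and `slope_bound`, and the example values (amplitude $2^{10}-1$, $p = 9$), are illustrative assumptions.

```python
import math

def dac_cells(amplitude: int, phase_bits: int) -> list[int]:
    """Thermometer-cell weights o_k for one quarter period of the sine wave.

    Each cell is the rounded difference between the ideal amplitude at
    phase step k (with the half-LSB offset) and the sum of all previous cells.
    """
    cells: list[int] = []
    running = 0
    for k in range(2 ** (phase_bits - 2)):
        target = amplitude * math.sin(2 * math.pi * (k + 0.5) / 2 ** phase_bits)
        o_k = math.floor(target - running + 0.5)   # round to the nearest integer
        cells.append(o_k)
        running += o_k
    return cells

amplitude, phase_bits = 2 ** 10 - 1, 9              # assumed example values
cells = dac_cells(amplitude, phase_bits)
# Largest cell is limited by the maximum slope of the sine wave per phase step
slope_bound = math.ceil(amplitude * 2 * math.pi / 2 ** phase_bits)
print(len(cells), sum(cells), max(cells), slope_bound)
```

Because each cell absorbs the rounding error of the cells before it, the running sum always equals the rounded ideal amplitude, and the quantization error never accumulates along the string of cells.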

3.1.2 Disadvantages of Non-segmented Nonlinear DAC Based DDFS

Since the nonlinear DAC has about the same power dissipation as an $i+1$ bit thermometer-code linear DAC and there is no ROM lookup table, the power dissipation of the nonlinear DAC based DDFS is less than that of a conventional ROM based DDFS. The advantages of non-segmented DACs are inherent monotonicity, relaxed device-matching requirements, and reduced non-ideal dynamic effects. To improve the spectral purity of a DDFS, however, the phase truncation error needs to be minimized, so more phase bits should be converted to sine amplitudes. When the number of phase resolution bits $j$ increases by 1, the number of DAC cells doubles. For a high performance sine wave DAC, local synchronization latches are required, so the DAC size increases due to the additional digital circuits (local decoders, local latches, larger thermometer decoder). Since more digital circuits then work at the reference clock frequency, the power consumption increases substantially. Therefore, it is desirable to reduce the number of DAC cells to further minimize the chip area and power dissipation. In the following chapters, two design techniques will be proposed for low power DDFS design.

### 3.2 Noise and Spurious Signals

#### 3.2.1 Fundamental Theory

The output spectrum of an ideal DDFS should contain only a perfect sinusoid, i.e. a function like $\delta(\omega_0)$ in the frequency domain. In practice, many other frequency components, such as clock leakage, distortion, and aliasing signals, may appear in the output spectrum. A DDFS is a sampled-data system in nature, since the sine wave output is first represented by digital amplitude samples and is then reconstructed in analog form by D/A conversion and filtering. Hence, the Nyquist sampling theorem applies to a DDFS.

The Nyquist sampling theorem states that any (stochastic, finite-energy) signal having a band-limited spectrum ($\omega \leq B$) can be represented by its discrete samples in time, provided that the sampling rate is at least $2 \times F_0$, where $F_0 = B / 2\pi$. For a DDFS with a reference clock frequency of $F_{clk}$, the highest output frequency is limited to the Nyquist frequency $0.5 \times F_{clk}$. In practice, it is limited to about $0.45 \times F_{clk}$, or 45% of the clock frequency, because the low pass filter must be realizable [1].
3.2.2 Noise Sources in DDFS

For simplicity, assume an ideal phase accumulator, phase-to-amplitude mapping block and DAC, an additive noise model for a DDFS is shown in Figure 3.3. The first noise source includes two parts: (a) the phase truncation error, i.e. only part of the MSB's are used to generate sine wave amplitude; (b) the quantization effect of the accumulator due to the finite word length of the accumulator. The noise power of the first noise source \( \varepsilon_{n1} \) is determined by the phase resolution of DDFS, the frequency control word, and the truncated phase bits.

Figure 3.3. Model of DDFS noise sources

The second source with noise power \( \varepsilon_{n2} \) is due to the finite word length of the digital sine function samples stored in the ROM if a ROM based DDFS is considered. For nonlinear DAC based DDFS, the noise power \( \varepsilon_{n2} \) comes from the systematic design procedure. The output value of a nonlinear DAC cell is rounded to an integer when implementing the nonlinear DAC. The third noise source is D/A conversion noise that
includes the quantization noise and some dynamic non-linearity effects [22][38]. For nonlinear DAC based DDFS, \( e_{a2} \) and \( e_{a2}' \) will be combined to one single error source.

### 3.2.3 Amplitude Quantization Error

Due to the fact that a real number \( R \) is represented by a digital word with finite length, there exists the so-called amplitude quantization noise. To calculate the signal to quantization noise ratio, let us consider a sine wave with amplitude \( A \), which is represented by a \( N \)-bit digital word:

(a) The minimum quantization step = 1 \( U_{\text{LSB}} = 2A/2^N \).

(b) Assume that the quantization error \( e_n = R - [R]_q \) can be treated as a random white noise sequence that is uniformly distributed between \(-0.5 U_{\text{LSB}} \) and \( 0.5 U_{\text{LSB}} \) with zero mean. The noise power is then given by

\[
E_n = A^2 2^{-2N+2}/12
\]

(c) The effective power of a sine wave is, \( E_s = (2A)^2/8 = A^2/2 \).

(d) Therefore the signal to noise ratio (SNR) in dB is,

\[
\text{SNR} = 10\log_{10}(E_s/E_n) = 10\log_{10}(1.5 \times 2^{2N}) = 6.02N + 1.76 \text{ dB} \quad (3-4)
\]

The above signal to noise ratio is a fundamental ratio in theory. Based upon a random noise approximation, this equation gives good estimation of theoretical noise floor for DDFS with large \( N \). Furthermore, increasing the \( N \)-bit digital word by one bit will increase the SNR by about 6 dB.
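The white-noise model behind (3-4) can be checked by quantizing a sampled sine wave directly. The following is a minimal sketch under stated assumptions: a rounding quantizer with step $2A/2^N$, an arbitrary sampling frequency chosen to be irrational so that the samples spread over all phases, and a hypothetical function name.

```python
import math

def quantized_sine_snr_db(n_bits: int, samples: int = 50_000,
                          amplitude: float = 1.0) -> float:
    """Measured SNR (dB) of a sine wave rounded to steps of 2A / 2^N."""
    step = 2 * amplitude / 2 ** n_bits        # 1 U_LSB
    freq = (math.sqrt(5) - 1) / 2             # irrational cycles per sample
    signal_power = 0.0
    noise_power = 0.0
    for n in range(samples):
        x = amplitude * math.sin(2 * math.pi * freq * n)
        q = step * round(x / step)            # quantized amplitude
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

N = 10
measured = quantized_sine_snr_db(N)
predicted = 6.02 * N + 1.76
print(f"measured {measured:.2f} dB, predicted {predicted:.2f} dB")
```

For word lengths of this size the measured value should track the prediction closely; for very small $N$ the error is no longer well modeled as white noise and the agreement degrades.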
3.2.4 Analysis of Spurious Signals

In most applications, a low pass filter is employed after the D/A converter to remove any output signals beyond the Nyquist frequency (i.e. $F_{clk}/2$). Usually, the error due to phase truncation is worse than the noise caused by amplitude quantization because it generates spurs below the Nyquist frequency rather than a noise floor. Thus, the low pass filter cannot remove the spurs due to phase truncation. In some cases, particular combinations of the frequency control word, the phase accumulator width, and the truncated phase bit width result in worst-case spurs very close to the desired frequency, which again cannot be removed by filtering. Therefore, it is important to estimate the number of spurs, the locations of the spurs, and their magnitudes.

Due to the ROM size limitation or the amplitude resolution of the nonlinear DAC, the truncation of phase bits is required for compact and low power DDFS. To use the same notation in Figure 3.1, the truncated phase bits ($L-M=B$ bits) do not generate sine amplitudes directly but can manifest themselves through propagation to $M$ higher phase bits. This is the major source of error due to the phase truncation. By the nature of the operation of the phase accumulator, the error signal is periodic and deterministic.

For a DDFS with $L$-bit accumulator and frequency control word $Fr$, the output sequence of the DDFS is ideally given by,

$$S(n) = \sin \left( \frac{2\pi Fr}{2^L} n \right) \quad (3-5)$$
In essence, the DDFS output sequence is the sampled value of a sinusoid with frequency \( Fr/2^L \). Considering that \( B = L-M \) bits are truncated, the output sequence is given by,

\[
S_t(n) = \sin \left( 2\pi \frac{2^B}{2^L} \left\langle \frac{F_r}{2^B} n \right\rangle \right) \tag{3-6}
\]

where the operator "\( \langle \rangle \)" represents the operation of rounding to the nearest integer. Equation (3-6) can be further rewritten as,

\[
S_t(n) = \sin \left( \frac{2\pi}{2^L} [F_r n - \varepsilon_t(n)] \right) \tag{3-7}
\]
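The relation between the truncated phase and the phase error sequence can be illustrated with integer arithmetic. This sketch models truncation as discarding the $B$ least significant accumulator bits (a floor operation), which is an assumption about how the operator is realized in hardware; the word lengths, the control word, and the function names are hypothetical.

```python
from math import gcd

def accumulator_phase(fr: int, n: int, acc_bits: int) -> int:
    """Value of an L-bit phase accumulator after n clock cycles."""
    return (fr * n) % 2 ** acc_bits

def split_phase(acc: int, trunc_bits: int) -> tuple[int, int]:
    """Return (kept MSB field, discarded LSB field) of the accumulator value."""
    return acc >> trunc_bits, acc & (2 ** trunc_bits - 1)

L_BITS, B_BITS, FR = 12, 4, 6     # hypothetical word lengths and control word

errors = []
for n in range(64):
    acc = accumulator_phase(FR, n, L_BITS)
    kept, err = split_phase(acc, B_BITS)
    # Truncated phase = full phase minus the error term
    assert (kept << B_BITS) == acc - err
    errors.append(err)

# The error sequence is deterministic and repeats with this period
period = 2 ** B_BITS // gcd(FR, 2 ** B_BITS)
print(period, errors[:period])
```

The printed sequence repeats every `period` samples, which is the periodic, deterministic behavior that produces discrete spurs rather than a noise floor.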

Nicolas and Samueli suggested that the phase error sequence \( \varepsilon_t(n) \) be modeled as the sampled values of a continuous-time saw-tooth waveform \( \varepsilon_t(t) \) [22]. The amplitude of the saw-tooth waveform is \( 2^B \), and the frequency is \( Fr/2^B \). The Fourier series of the error sequence is then obtained so that the spectral properties of the phase error sequence can be characterized. The procedure to obtain the discrete spectrum of the phase truncation error function \( \varepsilon_t(n) \) is summarized below [22][38]:

1. Let: \( \text{gcd} (a, b) = \) the greatest common divisor of \( a \) and \( b \), \( [Y]_x = Y \) modulus \( x \),

\[
\Lambda = 2^{B-1}/\text{gcd} (F_r, 2^B), \quad \Gamma = F_r/\text{gcd} (F_r, 2^B).
\tag{3-8}
\]

2. Number of spurs = \( \Lambda \)

3. Magnitude of all spurs is given by,

\[
\xi_K = \frac{\pi 2^{-L} \text{gcd} (F_r, 2^B)}{\sin \left( \pi K \frac{\text{gcd} (F_r, 2^B)}{2^B} \right)} \tag{3-9}
\]
(4) The sequential frequency number $F_n$ of a spur location is related to the actual analog spur frequency $F_{sp}$ by

$$
F_n = \frac{F_{sp}}{\gcd (F_r, 2^L)} \frac{2^L}{F_{clk}} \tag{3-10}
$$

$$
F_{sp} = \frac{F_n F_{clk} \gcd (F_r, 2^L)}{2^L} \tag{3-11}
$$

(5) The locations of the spurs in the spectrum (between 0 and $2^L/\gcd(F_r, 2^L)$) are,

a) for $2^{L-B}$ divides $(F_n-\Gamma)$: $K = \left[ \frac{F_n - \Gamma}{2^{L-B}} \Gamma^{\Lambda-1} \right]_{2\Lambda}$ (3-12)

b) for $2^{L-B}$ divides $(-F_n-\Gamma)$: $K = \left[ \frac{-F_n - \Gamma}{2^{L-B}} \Gamma^{\Lambda-1} \right]_{2\Lambda}$ (3-13)

c) Otherwise: $\xi_K = 0$ (3-14).

Some implications and observations from the above analysis results are very useful for practical design of DDFS:

a) For $\gcd (F_r, 2^B) = 2^{B-1}$, number of spurs $\Lambda = 2^{B-1}/\gcd (F_r, 2^B) = 1$;

b) If $\gcd (F_r, 2^B) = 2^B$, no spur exists due to phase truncation;

c) For $F_r$ values that have the same $\gcd (F_r, 2^B)$, the output spectra have a one-to-one correspondence between the magnitudes of their spurs;

d) The amplitude of the worst-case spur due to phase truncation only is given by,

$$
\xi_{\text{worst}} = \frac{\pi 2^{-L} \gcd (F_r, 2^B)}{\sin \left[ \pi \frac{\gcd (F_r, 2^B)}{2^B} \right]} \tag{3-15}
$$
It was shown that the amplitude of the worst-case spur is determined by the frequency control word, the word length of the accumulator, and the number of truncated bits. The largest worst-case spur occurs when gcd \( (F_r, 2^B) = 2^{B-1} \), in which case the amplitude of the spur is,

\[
\text{Max}\{\xi_{\text{worst}}\} = \pi 2^{B-L-1} = \pi 2^{-M-1} \quad (3-16)
\]

It is obvious that this largest worst-case spur amplitude decreases as the phase resolution increases. To avoid \( F_r \) values that would lead to gcd \( (F_r, 2^B) = 2^{B-1} \), one straightforward solution is to use only odd values of \( F_r \), which forces gcd \( (F_r, 2^B) \) to be one. This gains 3.922 dB in spurious free dynamic range at the price of halving the frequency resolution of the synthesizer. By observing the behavior of the least significant bit of the phase accumulator for an odd frequency control word, Nicolas and Samueli introduced a way to emulate the operation of an \( L+1 \) bit phase accumulator without much hardware complexity. The modified accumulator is shown in Figure 3.4. An additional D-type flip-flop (DFF) and an inverter are connected to provide an output that toggles between one and zero for the carry input of the LSB adder. This hardware modification provides a net spur performance gain of 3.922 dB without degradation of the phase accumulator performance [38].

Figure 3.4. Nicolas' modification on phase accumulator
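The 3.922 dB figure can be reproduced from the worst-case spur expression. The sketch below reads the worst-case amplitude as the spur magnitude (3-9) evaluated at $K = 1$, i.e. $\pi 2^{-L} g / \sin(\pi g / 2^B)$ with $g = \gcd(F_r, 2^B)$; the word lengths and function name are hypothetical.

```python
import math

def worst_spur(fr: int, acc_bits: int, trunc_bits: int) -> float:
    """Worst-case phase-truncation spur amplitude relative to the carrier."""
    g = math.gcd(fr, 2 ** trunc_bits)
    if g == 2 ** trunc_bits:
        return 0.0   # the truncated bits never change: no truncation spur
    return math.pi * 2.0 ** (-acc_bits) * g / math.sin(math.pi * g / 2 ** trunc_bits)

L_BITS, B_BITS = 20, 8                                        # hypothetical word lengths
largest = worst_spur(2 ** (B_BITS - 1), L_BITS, B_BITS)       # gcd = 2^(B-1)
odd_word = worst_spur(3, L_BITS, B_BITS)                      # odd control word, gcd = 1
gain_db = 20 * math.log10(largest / odd_word)
print(f"largest = {largest:.3e}, odd = {odd_word:.3e}, gain = {gain_db:.3f} dB")
```

For the first case the function reduces to $\pi 2^{B-L-1}$, in agreement with (3-16), and the ratio to the odd-word case approaches $\pi/2$ as $B$ grows, which is where the 3.922 dB comes from.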

The locations and amplitudes of the spurious signals, which are determined only by the frequency control words and the accumulator resolution, are predicted by the procedure discussed above. According to the DDFS noise model, the ROM and the linear DAC (or the sine wave DAC in a nonlinear DAC based DDFS) affect the relative strength of a spurious signal. In other words, the amplitude errors contribute to the noise floor of a DDFS spectrum. As a conclusion of the spurious response analysis, the theoretical worst-case spurious responses of DDFS's are shown in Figure 3.5 [38]. In this plot, the contribution of finite word length effects in the ROM and DAC is assumed to be 1/2 LSB.

Figure 3.5. Theoretical worst case spurious responses

CHAPTER 4 LINEAR INTERPOLATION TECHNIQUE
FOR SEGMENTED NONLINEAR DAC

For a nonlinear sine wave DAC, the phase-to-sine-amplitude mapping is a nonlinear conversion. Therefore, segmenting the nonlinear DAC will not be as simple as segmenting a linear DAC. In this chapter, a linear interpolation technique for segmenting a nonlinear DAC is proposed and a figure of merit is defined in order to search for the segmentations that give a reasonable trade-off between power dissipation and performance.

4.1 Linear Interpolation Technique for Segmented Nonlinear DAC

4.1.1 Phase Interpolation DDFS

To decrease power consumption and save chip area of a sine wave DAC based DDFS, a phase interpolation DDFS architecture is proposed in Figure 4.1. Similar to coarse-fine-ROM compression technique of conventional DDFS, the thermometer-code sine wave DAC of a ROM-less DDFS can be divided into a coarse DAC and a fine interpolation DAC. The coarse nonlinear DAC provides low-resolution sine function amplitudes, and the fine nonlinear DAC gives additional resolution by interpolating between the two low-resolution amplitude samples. The total chip area and the total power consumption of the coarse nonlinear DAC and the fine interpolation DAC are less than those of a non-segmented nonlinear DAC, since the total DAC cell number can be reduced significantly.
This new nonlinear DAC based DDFS can be called a segmented nonlinear DAC based DDFS. Segmentation techniques have been used in linear DAC design to save chip area and power consumption. However, unlike a linear DAC, which has a simple linear relationship between the MSB's and the LSB's, the nonlinear sine wave DAC does not have this property due to the inherent nonlinear relationship in the phase-to-sine-amplitude conversion. To achieve satisfactory segmentations for the phase interpolation DDFS shown in Figure 4.1, two segmentation techniques will be proposed and studied.

### 4.1.2 Linear Interpolation Technique

If the sine wave on $[0, \pi/2]$ is divided into infinitesimally small pieces, each piece can be treated as a straight line segment. That is to say, a piecewise linear approximation can be used to represent the sine wave curve. Similar to Hutchison's ROM compression architecture, a non-segmented sine wave DAC can be divided into a coarse sine wave DAC and a fine linear interpolation DAC when simple linear interpolation is applied. The first order approximation of the sine wave amplitude within a small region between $\theta_a$ and $\theta_b$ can be written as,

$$\sin \frac{\pi \theta}{2} \approx \sin \frac{\pi \theta_a}{2} + \frac{\sin \frac{\pi \theta_b}{2} - \sin \frac{\pi \theta_a}{2}}{\theta_b - \theta_a} \left( \theta - \theta_a \right) \tag{4-1}$$

for $\theta \in [\theta_a, \theta_b]$ and $[\theta_a, \theta_b] \subset [0, 1]$, where $\sin \frac{\pi \theta_a}{2}$ and $\sin \frac{\pi \theta_b}{2}$ are the values at the two ends of a sine wave piece.

The first term on the right hand side of (4-1) can be realized using a coarse sine wave DAC, which is a thermometer-code nonlinear DAC. The design of the thermometer-code nonlinear DAC was discussed in Chapter 3. The second term on the right hand side can be implemented by using a fine linear interpolation DAC. Figure 4.2 shows the conceptual block diagram of a DDFS architecture based on the proposed linear interpolation technique. The coarse DAC passes an interval to the fine DAC, and the fine linear DAC then provides finer interpolation values between the two adjacent coarse values. In Figure 4.2, the coarse DAC has $(N+1)$ bits and the fine interpolation DAC has $M$ bits. With this segmentation, the total DAC cell number of the segmented DAC is less than that of a non-segmented DAC, as the following inequality indicates,

$$2^{N+1} + 2^M < 2^{N+M+1} = 2^{p-1} \tag{4-2}$$

Therefore, the chip area and power consumption are reduced because fewer DAC cells are needed.
For a DDFS example with 12-bit phase resolution and 11-bit amplitude resolution, we assume that the quarter-wave coarse DAC has 7 phase bits, and the linear fine interpolation DAC has 3 interpolation phase bits. The amplitude of the coarse DAC is given by,

\[ o_k = \begin{cases} \text{int} \left[ \left(2^{10} - 1\right) \sin \frac{2\pi(0.5)}{2^{9}} \right] & \text{for } k = 0 \\ \text{int} \left[ \left(2^{10} - 1\right) \sin \frac{2\pi(k + 0.5)}{2^{9}} - \sum_{n=0}^{k-1} o_n \right] & \text{for } 1 \leq k \leq 2^7 - 1 \end{cases} \tag{4-3} \]

The amplitude plot is shown in Figure 4.3. Based upon equation (4-1), the fine linear DAC provides additional fine amplitude samples by linearly interpolating within the intervals determined by the coarse sine wave DAC.
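The trade-off in this example can be quantified with a short calculation. The sketch below assumes ideal (unquantized) coarse values and a normalized quarter-wave phase in $[0, 1]$; it applies the cell-count inequality (4-2) to the 7-bit coarse and 3-bit fine split and evaluates the error of the interpolation (4-1) at every fine point. The function name is hypothetical.

```python
import math

COARSE_BITS, FINE_BITS = 7, 3     # quarter-wave split used in the example

def interpolated_and_exact(coarse: int, fine: int) -> tuple[float, float]:
    """Linear interpolation per (4-1) and the exact sine at the same phase."""
    theta_a = coarse / 2 ** COARSE_BITS
    theta_b = (coarse + 1) / 2 ** COARSE_BITS
    theta = theta_a + fine / 2 ** (COARSE_BITS + FINE_BITS)
    s_a = math.sin(math.pi * theta_a / 2)
    s_b = math.sin(math.pi * theta_b / 2)
    approx = s_a + (s_b - s_a) / (theta_b - theta_a) * (theta - theta_a)
    return approx, math.sin(math.pi * theta / 2)

# DAC cell counts on the two sides of inequality (4-2)
segmented_cells = 2 ** COARSE_BITS + 2 ** FINE_BITS
full_cells = 2 ** (COARSE_BITS + FINE_BITS)

max_err = max(
    abs(approx - exact)
    for coarse in range(2 ** COARSE_BITS)
    for fine in range(2 ** FINE_BITS)
    for approx, exact in [interpolated_and_exact(coarse, fine)]
)
print(segmented_cells, full_cells, f"{max_err:.2e}")
```

Under these assumptions the interpolation error, relative to a full scale of 1, is well below one LSB of an 11-bit amplitude word ($2^{-11} \approx 4.9 \times 10^{-4}$), while the cell count drops from 1024 to 136.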
4.1.3 Proposed Segmented Nonlinear DAC Architectures

Due to the nonlinear relationship in the phase-to-sine-amplitude conversion, the coarse sine wave DAC can be implemented using a nonlinear resistor string. Unlike linear resistor string, the taps of the nonlinear resistor string are designed using the equation (3–3). The fine interpolation DAC can be realized using a regular linear resistor string, or by using a thermometer-coded programmable capacitor array. Voltage buffers may be required after the coarse DAC in order to drive the fine linear DAC.

To illustrate this idea, the design of a high-resolution multiple resistor-string linear DAC [39] is discussed here. Figure 4.4 shows the architecture of a 16-bit multiple resistor string linear DAC.
As can be seen from Figure 4.4, the multiple resistor-string linear DAC consists of a coarse resistor string (R1 to R255) that is decoded by 8 MSB's, and a fine resistor string (R256 to R511) that is decoded by 8 LSB's. The fine tapped resistor string is connected between buffers whose inputs are two adjacent nodes of the first resistor string. The second resistor string linearly interpolates between the two adjacent voltages from the first resistor string. A special arrangement of the switches in the MSB segment decoder logic and in the LSB segment decoder logic can be used to make the system independent of the offset voltage of the voltage followers.
If the coarse resistor string is designed based on a nonlinear resistor-string, and the fine resistor-string is designed based on a simple linear interpolation resistor-string, then this multiple resistor-string sine wave DAC can be used to realize the proposed linear interpolation technique. Figure 4.5 shows a simplified schematic of a 7-bit multiple resistor-string nonlinear DAC.

![Proposed multiple resistor-string sine wave DAC](image)

**Figure 4.5. Proposed multiple resistor-string sine wave DAC**

The DDFS using this nonlinear DAC has 8-bit phase resolution since the 2nd MSB controls the complementor, which is shown in Figure 4.2. It should be pointed out that the coarse resistor string is symmetric about the mid-point, because the MSB is used to decide the sign of the sine wave (i.e., when MSB = 1, the taps in the lower half of the resistor string are selected according to the three middle MSB's). The resistors of the fine R-string are of equal value. This approach needs only $2 \times 2^3 + 2^3 = 24$ resistor taps along with three voltage buffers, while a full thermometer-coded R-string sine wave DAC needs $2^7 = 128$ resistor taps and one voltage buffer. If the phase resolution is 12 bits, the "5-5" segmentation (5 bits for the coarse DAC and 5 bits for the fine DAC) multiple R-string approach requires $2 \times 2^5 + 2^5 = 96$ resistor taps and three voltage buffers, while the full thermometer-coded R-string sine wave DAC requires $2 \times 2^{10} = 2048$ resistor taps and one voltage buffer. This architecture guarantees monotonicity provided that the offset voltage of the voltage buffers is not a major concern. However, the operational amplifiers must be fast enough and have low noise [41]. Hence, it would be better to design this multiple R-string sine wave DAC in an advanced Bi-CMOS process. The proposed multiple resistor-string sine wave DAC can work at low voltage if a good low-voltage operational amplifier is available, thus further decreasing power dissipation.
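
The tap counts quoted above follow a simple pattern, which the short Python sketch below reproduces. The helper name `multi_string_taps` is an illustrative assumption; the full thermometer-coded count of 2048 taps is simply taken from the text.

```python
def multi_string_taps(coarse_bits, fine_bits):
    """Resistor taps of the multiple R-string sine DAC:
    two mirrored coarse halves plus one fine interpolation string."""
    return 2 * 2 ** coarse_bits + 2 ** fine_bits

taps_3_3 = multi_string_taps(3, 3)    # 8-bit phase example
taps_5_5 = multi_string_taps(5, 5)    # 12-bit phase example
saving_12bit = 2048 / taps_5_5        # relative to the 2048 taps quoted in the text
```

For the 12-bit case this corresponds to roughly a twenty-fold reduction in resistor taps, at the cost of two additional voltage buffers.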

Another approach to implement the proposed linear interpolation sine wave DAC is to combine a nonlinear tapped resistor string with a programmable capacitor array based on switched-capacitor techniques. A good example of a resistor-capacitor hybrid linear DAC is reported in [42]. In this two-stage approach, a switched-capacitor binary-weighted D/A converter, controlled by the 8 LSB's, has its capacitors connected to adjacent nodes of a linear resistor-string D/A converter that is controlled by the MSB's.

To apply this resistor-capacitor hybrid linear DAC idea to implement a segmented sine wave DAC, we can use the MSB's of the phase accumulator to select
two adjacent nodes of the nonlinear coarse resistor string and use the LSB's of the phase accumulator to control a thermometer-coded charge-redistribution DAC that accomplishes the linear interpolation as shown in Figure 4.6. Based upon the DDFS structure in Figure 4.2, the MSB and the other middle MSB's determine which pair of voltages across the coarse nonlinear resistor string is passed on to the thermometer-code charge-redistribution DAC.

Figure 4.6. Proposed R-C hybrid sine wave DAC
The operation of this resistor-capacitor hybrid nonlinear DAC is similar to the one designed by Yang and Martin [42], except that the programmable capacitor array of this hybrid nonlinear DAC is thermometer-coded and the nonlinear resistor string is designed according to the amplitude equation (4–3). Like the multiple resistor-string sine wave DAC, the proposed resistor-capacitor hybrid sine wave DAC is most suitable for low power DDFS's. The capacitor $C_d$ in Figure 4.6 is used as a deglitching capacitor. Because of the inherent sample-and-hold operation of the thermometer-coded charge-redistribution DAC, many undesirable spurs due to settling errors can be avoided, and a DDFS using the proposed resistor-capacitor hybrid DAC can have very good spectral performance along with low power dissipation, as long as the operational amplifier can settle within half of the clock period.

When applying the proposed linear interpolation technique to a ROM-less DDFS design, both the multiple resistor-string DAC and the resistor-capacitor hybrid DAC architectures are suitable for low power applications, such as battery powered communication systems and wireless LAN's.

### 4.2 Segmentation Optimization for Segmented Nonlinear DAC's

#### 4.2.1 Segmentation Considerations

The segmentation technique discussed above trades off the performance for lower power consumption. Different segmentations for the coarse DAC and the fine DAC may result in quite different performance due to the different systematic
amplitude errors introduced during the process of approximating the sine wave. To study this, the maximum amplitude difference between an ideal sine wave and the segmented nonlinear DAC output (\(MAX_{ERR}\)) is utilized for describing the accuracy of the nonlinear DAC. The maximum amplitude difference is analogous to the maximum integral nonlinearity (\(INL\)) for a linear DAC. Unlike the INL of a linear DAC, the maximum amplitude difference is due to the systematic design procedure of the sine wave DAC. The maximum amplitude difference (\(MAX_{ERR}\)) is approximately inversely proportional to spurious free dynamic range of the synthesized output signal.

To represent the tradeoff between the power consumption (or chip area) and the spectral performance, the total number of DAC cells of the coarse DAC and the fine DAC is counted for different segmentations. The total number of DAC cells (\(TOT_{CELL}\)) is interpreted differently depending on the implementation approach. For example, \(TOT_{CELL}\) may represent the total number of resistor taps for a multiple resistor-string sine wave DAC. In a resistor-capacitor hybrid segmented nonlinear DAC, \(TOT_{CELL}\) represents the total number of resistor taps and capacitors. Under the above assumption, a figure of merit for segmentation optimization can be defined as,

\[
FM = \frac{MAX\_ERR_{N\text{-}M} \times TOT\_CELL_{N\text{-}M}}{MAX\_ERR_{\text{non-segmented}} \times TOT\_CELL_{\text{non-segmented}}} \quad (4-3)
\]

For a DDFS with 12-bit phase resolution and 11-bit amplitude resolution that uses the linear interpolation technique, we assume that (1) the number of MSB's in the coarse sine wave DAC is \(N\); and (2) the number of LSB's in the fine linear DAC is \(M\). Using MATLAB simulations, the \( MAX_{ERR} \), \( TOT_{CELL} \), and \( FM \) are calculated for different segmentations and are listed in Table 4.1.

Table 4.1. FM's for the segmented nonlinear sine wave DAC using linear interpolation technique

<table>
<thead>
<tr>
<th>N-M</th>
<th>MAX_ERR (LSB)</th>
<th>TOT_CELL</th>
<th>FM</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-9</td>
<td>299</td>
<td>514</td>
<td>300.16</td>
</tr>
<tr>
<td>2-8</td>
<td>77</td>
<td>260</td>
<td>39.10</td>
</tr>
<tr>
<td>3-7</td>
<td>19</td>
<td>136</td>
<td>5.04</td>
</tr>
<tr>
<td>4-6</td>
<td>5</td>
<td>80</td>
<td>0.78</td>
</tr>
<tr>
<td>5-5</td>
<td>1.39</td>
<td>64</td>
<td>0.18</td>
</tr>
<tr>
<td>6-4</td>
<td>1.30</td>
<td>80</td>
<td>0.20</td>
</tr>
<tr>
<td>7-3</td>
<td>1.30</td>
<td>136</td>
<td>0.34</td>
</tr>
<tr>
<td>8-2</td>
<td>1.09</td>
<td>260</td>
<td>0.56</td>
</tr>
<tr>
<td>9-1</td>
<td>0.97</td>
<td>514</td>
<td>0.98</td>
</tr>
<tr>
<td>10-0</td>
<td>0.50</td>
<td>1024</td>
<td>1.00</td>
</tr>
</tbody>
</table>
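
The figure of merit can be recomputed from the tabulated MAX_ERR values, as in the hedged Python sketch below. It assumes that TOT_CELL equals $2^N + 2^M$ for the segmented cases and that the non-segmented reference has MAX_ERR = 0.5 LSB and 1024 cells, as in the "10-0" row; the function names are illustrative only.

```python
def tot_cells(n, m):
    """Cell count of an N-M segmented DAC (assumed to be 2^N + 2^M)."""
    return 2 ** n + 2 ** m

def figure_of_merit(max_err, cells, ref_err=0.5, ref_cells=1024):
    """FM relative to the non-segmented ('10-0') reference."""
    return (max_err * cells) / (ref_err * ref_cells)

# MAX_ERR values (LSB) copied from Table 4.1, keyed by (N, M)
max_err_table = {
    (1, 9): 299, (2, 8): 77, (3, 7): 19, (4, 6): 5, (5, 5): 1.39,
    (6, 4): 1.30, (7, 3): 1.30, (8, 2): 1.09, (9, 1): 0.97,
}
fm = {seg: figure_of_merit(err, tot_cells(*seg)) for seg, err in max_err_table.items()}
best = min(fm, key=fm.get)   # segmentation with the smallest FM
```

Because the MAX_ERR entries are rounded, values recomputed this way may differ from the tabulated FM in the last digit.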

Based on the FM values in Table 4.1, the "5-5" segmentation is considered to be the optimal segmentation. It reduces \( TOT_{CELL} \) by a factor of 16 compared to the non-segmented DAC ("10-0"). Compared to the "5-5" segmentation, the fine interpolation DAC of the "6-4" segmentation needs only half as many resistor taps or capacitors; thus the "6-4" segmentation may be preferred for higher speed, owing to the smaller time constant and lighter loading of the inter-stage buffers when the sine wave DAC is implemented using the multiple resistor-string technique. The additional chip area and power consumption of the "6-4" segmentation are almost negligible when compared to the "5-5" segmentation. Figure 4.7 shows the amplitude error plot of the "5-5" DAC over one period.
Based on the above optimization consideration, optimal segmentations of different amplitude and phase resolutions are given in Table 4.2 as a design guideline for practical implementation of nonlinear DAC based DDFS design. In most cases, the values of N and M are almost equal to one half of the phase resolutions (excluding the two MSB’s).

#### 4.2.2 Device Mismatch Effect on "5-5" Segmentation Sine Wave DAC

Table 4.2 Optimal segmentations for sine wave DAC's for different resolutions when linear interpolation technique is used.

<table>
<thead>
<tr>
<th>Resolutions (amplitude–phase)</th>
<th>Optimal Segmentation (N-M)</th>
<th>FMs of the Optimal Segmentation</th>
</tr>
</thead>
<tbody>
<tr>
<td>8-10</td>
<td>4-4</td>
<td>0.30</td>
</tr>
<tr>
<td>8-11</td>
<td>5-4</td>
<td>0.22</td>
</tr>
<tr>
<td>8-12</td>
<td>5-5</td>
<td>0.16</td>
</tr>
<tr>
<td>9-11</td>
<td>5-4</td>
<td>0.26</td>
</tr>
<tr>
<td>9-12</td>
<td>5-5</td>
<td>0.18</td>
</tr>
<tr>
<td>9-13</td>
<td>6-5</td>
<td>0.10</td>
</tr>
<tr>
<td>10-12</td>
<td>5-5</td>
<td>0.18</td>
</tr>
<tr>
<td>10-13</td>
<td>5-6</td>
<td>0.12</td>
</tr>
<tr>
<td>10-14</td>
<td>6-6</td>
<td>0.08</td>
</tr>
<tr>
<td>11-13</td>
<td>6-5</td>
<td>0.12</td>
</tr>
<tr>
<td>11-14</td>
<td>6-6</td>
<td>0.08</td>
</tr>
<tr>
<td>11-15</td>
<td>6-7</td>
<td>0.06</td>
</tr>
<tr>
<td>12-14</td>
<td>6-6</td>
<td>0.08</td>
</tr>
<tr>
<td>12-15</td>
<td>7-6</td>
<td>0.06</td>
</tr>
<tr>
<td>12-16</td>
<td>7-7</td>
<td>0.04</td>
</tr>
</tbody>
</table>

To study device mismatch effects on the "5-5" segmentation sine wave DAC implemented for a DDFS with 12-bit phase resolution and 11-bit amplitude resolution, we assume that the "5-5" segmented DAC is implemented using the multiple resistor-string architecture and that each unit resistor is normally distributed with a mean of 1 LSB and a standard deviation \(\sigma\). If these unit resistors are used to design an 11-bit linear thermometer-code DAC, the peak theoretical value of INL is $0.5 \times \sqrt{2048} \sigma = 16\sqrt{2} \sigma$ [43]. From the discussion in the previous section, the maximum amplitude error (MAX_ERR) is associated with the systematic segmentation design. Taking the device mismatch effect into account, the estimated maximum amplitude error is given by,

$$MAX\_ERR' = \sqrt{MAX\_ERR^2 + \left(16\sqrt{2}\sigma\right)^2} \quad (4-4)$$

In equation (4-4), when the amplitude error due to device mismatch is taken into account, the device mismatch effect becomes a significant contributor to the performance degradation of the segmented nonlinear DAC. To simulate the device mismatch effect, the mismatch error on each resistor is represented by adding a random variable that has zero mean and a uniform distribution over [-1, 1], with different standard deviations. The maximum amplitude error is then obtained from MATLAB simulation. The sine wave DAC has the same architecture as before. Figure 4.8 shows the curve of the estimated amplitude errors based on equation (4-4) and the maximum magnitude errors from the MATLAB simulations. The simulation data correlate well with the estimation equation (4-4). Therefore, equation (4-4) can be used to estimate the mismatch effect in practice.

Figure 4.8. Estimated maximum amplitude error vs. standard deviation

From Table 4.1, the maximum amplitude error of the "5-5" segmentation is 1.39 LSB. If the magnitude error due to device mismatch is also around 1.39 LSB, the estimated amplitude error will be 1.96 LSB. In Figure 4.8, when the standard deviation of the mismatch error is less than 0.06 LSB, the estimated amplitude error based on equation (4-4) is less than 1.94 LSB. This means that device mismatch will affect the performance when the standard deviation of the device mismatch errors is greater than 0.06 LSB.
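
The two numerical cases above can be checked directly from equation (4-4). The following Python sketch is only an illustration; the function name `estimated_max_err` and the default of 2048 unit elements (the 11-bit case in the text) are assumptions of this example.

```python
import math

def estimated_max_err(systematic_err, sigma, unit_elements=2048):
    """Equation (4-4): root-sum-square of the systematic error and the
    mismatch term 0.5 * sqrt(N) * sigma, all in LSB."""
    mismatch_inl = 0.5 * math.sqrt(unit_elements) * sigma
    return math.hypot(systematic_err, mismatch_inl)

# Mismatch term equal to the systematic error of the "5-5" segmentation
equal_case = estimated_max_err(1.39, 1.39 / (16 * math.sqrt(2)))
# Standard deviation of 0.06 LSB
at_sigma_006 = estimated_max_err(1.39, 0.06)
```

With these inputs the function gives about 1.97 LSB for the equal-contribution case and about 1.94 LSB for a standard deviation of 0.06 LSB, in line with the values quoted in the text.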

It is known that the relationship between the chip area and the standard deviation $\sigma$ of device random mismatch is approximately described as follows [43],

$$ Area \propto \frac{1}{\sigma^2} \quad (4-5) $$

From (4-5), the required die area increases rapidly as the device mismatch error decreases; for example, halving $\sigma$ requires roughly four times the area. Since a smaller mismatch standard deviation requires a larger chip area, it is necessary to find the tolerance of device mismatch errors such that the desired performance of a segmented nonlinear DAC can be achieved within a reasonable chip area. Since resistors and capacitors are among the most expensive devices in integrated circuit fabrication, attention should be paid to the matching characteristics of the resistor/capacitor devices. Based upon the above discussion, we can estimate reasonable sizes for the resistors/capacitors used in the segmented nonlinear DAC for a desired performance.

## CHAPTER 5 NONLINEAR INTERPOLATION TECHNIQUE FOR SEGMENTED NONLINEAR DAC

In this chapter, another design technique based on nonlinear interpolation is proposed for segmented nonlinear DAC. The segmentation optimization for the nonlinear DAC using this nonlinear interpolation technique will also be discussed.

### 5.1 Nonlinear Interpolation Technique for Segmented Nonlinear DAC

In reality, the sine wave is divided into a finite number of smaller pieces, and each piece is not a segment of a straight line. Due to this nonlinear nature, a nonlinear interpolation technique is proposed for segmenting the nonlinear DAC. Similar to Sunderland's memory compression algorithm [23], the new nonlinear interpolation technique is based upon simple trigonometric identities. Suppose the phase resolution of a nonlinear DAC based DDFS is j and the amplitude resolution is i+1. To make use of the quarter-wave symmetry of a sine wave, the two MSB's are used to decode the quadrant of the sine wave. The remaining j−2 phase bits are divided into three parts: α, β and γ, where α is the MSB part, β is the middle part and γ is the LSB part. We further assume that the numbers of bits for the α part, the β part and the γ part are a, b and c, respectively. Then the values of α, β and γ are \(\alpha = x \cdot 2^{b+c}\), \(\beta = y \cdot 2^{c}\), and \(0 \leq \gamma \leq 2^c - 1\), where x and y are integers given as \(0 \leq x \leq 2^a - 1\) and \(0 \leq y \leq 2^b - 1\), respectively. The relative sizes of \( \alpha, \beta \) and \( \gamma \) can be written as \( 2^{a+b+c} > \alpha \gg \beta \gg \gamma \), and the first quadrant of the sine wave can be expressed as

\[
(2^i - 1) \sin \frac{\pi(\alpha + \beta + \gamma)}{2(2^{a+b+c} - 1)} = (2^i - 1) \sin \frac{\pi(\alpha + \beta)}{2(2^{a+b+c} - 1)} + f(\alpha, \beta, \gamma) \tag{5-1}
\]

For \( \gamma = 0 \), the first term on the right hand side is equal to the left side and the second term on the right hand side is equal to zero. The first term is monotonic and can be realized as a coarse nonlinear sine wave DAC by using the nonlinear DAC implementation technique discussed in [8][9]. The corresponding DAC cell output values can be found according to the formula of (3-16). Since the total number of bits for \( \alpha \) and \( \beta \) is less than \( j-2 \) bits, the total number of coarse nonlinear DAC cells will be much less than \( 2^{j-1} \), and hence, the coarse DAC will be much smaller than the original full thermometer-code nonlinear DAC. The second term \( f(\alpha, \beta, \gamma) \) is used for interpolating additional amplitude steps between two adjacent coarse DAC outputs and is provided from a fine nonlinear DAC output. Based on trigonometric identities, the output of the fine interpolating DAC is approximately given by

\[
f(\alpha, \beta, \gamma) = (2^i - 1) \cos \frac{\pi(\alpha + \beta_{\text{avg}})}{2(2^{a+b+c} - 1)} \sin \frac{\pi \gamma}{2(2^{a+b+c} - 1)} \tag{5-2}
\]

where \( \beta_{\text{avg}} \) is the average value of the \( \beta \)'s. Because \( \sin \frac{\pi \gamma}{2(2^{a+b+c} - 1)} \) is small and because of the relative sizes of \( \alpha \) and \( \beta \), \( \beta_{\text{avg}} \) is used so that the interpolation term is determined by \( \alpha \) and \( \gamma \) only, which reduces the number of cells in the fine nonlinear DAC.
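
To give a feel for the size of the approximation made in (5-2), the Python sketch below compares the exact first-quadrant sine with the sum of the coarse term and the approximated interpolation term, before any quantization. It assumes that \( \beta_{\text{avg}} \) is the arithmetic mean of all \( \beta = y \cdot 2^c \) values; the function name and default parameters (the "3-4-3" case with i = 10) are illustrative choices.

```python
import math

def interpolation_error(a=3, b=4, c=3, i=10):
    """Largest |exact - approximated| amplitude (in LSB) over the first quadrant,
    without the int[] truncation used in the actual DAC cells."""
    full_scale = 2 ** i - 1
    scale = math.pi / (2 * (2 ** (a + b + c) - 1))   # phase step in radians
    beta_avg = (2 ** b - 1) / 2 * 2 ** c             # assumed mean of beta = y * 2^c
    worst = 0.0
    for x in range(2 ** a):
        alpha = x * 2 ** (b + c)
        for y in range(2 ** b):
            beta = y * 2 ** c
            for gamma in range(2 ** c):
                exact = full_scale * math.sin(scale * (alpha + beta + gamma))
                coarse = full_scale * math.sin(scale * (alpha + beta))
                fine = full_scale * math.cos(scale * (alpha + beta_avg)) * math.sin(scale * gamma)
                worst = max(worst, abs(exact - (coarse + fine)))
    return worst

worst_343 = interpolation_error()
```

This un-quantized error is not the same quantity as MAX_ERR, which also includes truncation effects, but it shows how the error shrinks when fewer \( \beta \) values share one interpolation curve (for example, `interpolation_error(a=5, b=2, c=3)` is smaller than `worst_343`).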
It can be observed that the output of the fine DAC is non-monotonic. When $\gamma$ equals zero, the output of the fine DAC, i.e. $f(\alpha, \beta_{avg}, 0)$, always equals zero for any value of $\alpha$. Hence, if the fine nonlinear DAC is to be realized using the technique described in [8][9], the value of $o_k$ in (3–15) for $\alpha = x \cdot 2^{b+c}$ and $\gamma = 0$ will be negative and have the same absolute value as $o_{k-1}$ (i.e. the fine DAC output value of $f((x-1) \cdot 2^{b+c}, \beta_{avg}, 2^{c}-1)$), in order to have the fine DAC output value of $f(x \cdot 2^{b+c}, \beta_{avg}, 0)$ equal to zero. As a result, it may raise a matching issue between the values of the $o_k$'s. In addition, a larger and more complex fine DAC is required due to the number of required current sources.

Fortunately, the interpolation values for a fixed value of $\alpha$ are monotonic and can be simply realized using a monotonic nonlinear sub-DAC according to the technique discussed in the previous chapter. Therefore, the fine interpolation DAC can be constructed using $2^a - 1$ nonlinear sub-DAC's. A different sub-DAC is activated according to $\alpha$, and the output of the corresponding nonlinear sub-DAC is determined by $\gamma$. Figure 5.1 illustrates the output of the fine nonlinear DAC for different values of $\alpha$ and $\gamma$. Notice that the $2^b$ sections of the sine wave with the same $\alpha$ value are interpolated by the same $\alpha$-th sub-DAC. Based on (5–2), the DAC cell output of the $\alpha$-th sub-DAC, $o_{\alpha,m}$, can be approximated as

$$o_{\alpha,m} = \begin{cases} \text{int}\left[ \left(2^i - 1\right) \cos \frac{\pi (\alpha + \beta_{avg})}{2(2^{a+b+c} - 1)} \sin \frac{0.5\pi}{2(2^{a+b+c} - 1)} \right] & \text{for } m = 0 \\ \text{int}\left[ \left(2^i - 1\right) \cos \frac{\pi (\alpha + \beta_{avg})}{2(2^{a+b+c} - 1)} \sin \frac{(m + 0.5)\pi}{2(2^{a+b+c} - 1)} \right] - \sum_{n=0}^{m-1} o_{\alpha,n} & \text{for } 1 \leq m \leq 2^c - 1 \end{cases} \tag{5-3}$$
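
As a starting-point sketch only (the text notes that the final cell values were further optimized in MATLAB), the sub-DAC cell values of (5–3) can be tabulated as follows. The code assumes that $\beta_{avg}$ is the arithmetic mean of all $\beta = y \cdot 2^c$ values and builds one row per value of $\alpha$; the function name `sub_dac_cells` is hypothetical.

```python
import math

def sub_dac_cells(a=3, b=4, c=3, i=10):
    """Cell values o_{alpha,m} per equation (5-3), one list per value of alpha."""
    full_scale = 2 ** i - 1
    scale = math.pi / (2 * (2 ** (a + b + c) - 1))
    beta_avg = (2 ** b - 1) / 2 * 2 ** c          # assumed definition of beta_avg
    table = []
    for x in range(2 ** a):
        alpha = x * 2 ** (b + c)
        gain = full_scale * math.cos(scale * (alpha + beta_avg))
        cells, running_sum = [], 0
        for m in range(2 ** c):
            target = int(gain * math.sin(scale * (m + 0.5)))
            cells.append(target - running_sum)    # o_{alpha,m}
            running_sum = target
        table.append(cells)
    return table

sub_dacs = sub_dac_cells()
```

In this sketch every cell value is non-negative, so each sub-DAC is monotonic in $\gamma$, and the sub-DACs near the peak of the sine wave contribute much less than those near the zero crossing.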
To maximize the SFDR, the $o_{\alpha,m}$'s are further optimized based on MATLAB simulations using (5–3) as a starting point. The overall DDFS output is the sum of the outputs of the coarse nonlinear DAC and the fine interpolation sub-DAC's. If the current-steering technique is employed to implement the DAC cells, this summation can be realized by simply connecting the output nodes of the coarse DAC and the fine sub-DAC's together. Compared to the additional digital hardware required by the coarse-fine-ROM approach used in a ROM based DDFS [23], this is an attractive feature for segmenting the nonlinear DAC. If the proposed nonlinear phase interpolation DDFS architecture is implemented, adding one phase bit to the fine interpolation DAC will only double the number of fine DAC cells. Furthermore, dividing the phase bits into separate input bits for the coarse DAC and the fine sub-DAC's decreases the complexity of the thermometer-code decoder and speeds up the logic operation as well. In the interpolation amplitude equation (5–3), $\beta_{avg}$ is used in the interpolation term. This implies that the interpolation term is independent of the $\beta$ part of the phase bits. In a practical implementation of the proposed DDFS architecture, the coarse DAC cell that has the largest $\beta$ for a fixed $\alpha$ can be used to implement the corresponding sub-DAC for that $\alpha$ by adding local decoding logic inside that coarse DAC cell. This leads to further savings of power and chip area. In the following chapter, this DAC cell-sharing scheme will be further discussed.

From the above discussion, a novel nonlinear phase interpolation DDFS for segmenting the sine wave DAC is proposed. Conceptually, the DDFS architecture is shown in Figure 5.2. The first two MSB's are used for decoding the quadrant of the sine wave. Hence, the phase resolution of this DDFS is $(a+b+c+2)$ bits. In actual implementation, the fine DAC is separated into $2^a$ number of sub-DAC's, which can be implemented together with the coarse DAC. The implementation details will be discussed further in Chapter 6.
To illustrate the proposed nonlinear phase interpolation DDFS architecture, a DDFS example based upon the proposed architecture with 12-bit phase resolution and 11-bit amplitude resolution is discussed as follows. The segmented DAC is partitioned to have \( a = 3, \ b = 4, \) and \( c = 3. \) The amplitudes for the coarse DAC cells are given by (5-4), and the amplitudes for the \( \alpha \)-th fine sub-DAC cells are given by (5-5), respectively.

\[
o_k = \begin{cases}
\text{int}\left[ \left(2^{10} - 1\right) \sin \frac{\pi(0.5)}{2(2^{10}-1)} \right] & \text{for } \alpha = \beta = 0 \\
\text{int}\left[ \left(2^{10} - 1\right) \sin \frac{\pi(\alpha + \beta + 0.5)}{2(2^{10}-1)} \right] - \sum_{n=0}^{k-1} o_n & \text{for } 1 \leq \alpha \leq 7 \times 2^7,\ 1 \leq \beta \leq 15 \times 2^3
\end{cases}
\tag{5-4}
\]

\[
o_{\alpha,m} = \begin{cases}
\text{int}\left[ \left(2^{10} - 1\right) \cos \frac{\pi(\alpha + \beta_{avg})}{2(2^{10}-1)} \sin \frac{0.5\pi}{2(2^{10}-1)} \right] & \text{for } m = 0 \\
\text{int}\left[ \left(2^{10} - 1\right) \cos \frac{\pi(\alpha + \beta_{avg})}{2(2^{10}-1)} \sin \frac{(m + 0.5)\pi}{2(2^{10}-1)} \right] - \sum_{n=0}^{m-1} o_{\alpha,n} & \text{for } 1 \leq m \leq 7
\end{cases}
\tag{5-5}
\]
Figure 5.3. Amplitudes vs. phase for the 12 bit segmented DAC (9+3 phase bits)

Figure 5.4. Amplitudes vs. phase for the 9-b non-segmented DAC

Figure 5.3 shows the output of the "3-4-3" segmented DAC. Figure 5.4 shows the output of the coarse nonlinear DAC. The ideal sine function is plotted in both figures by using thinner lines. It is shown in Figure 5.3 that more amplitude samples are provided by the fine DAC, thus the steps of the DAC output curve are finer. Therefore, the spectral performance of the proposed nonlinear phase interpolation DDFS is better.
The SFDR of the DDFS using this nonlinear interpolation technique will be presented later in Chapter 6.

### 5.2 Segmentation Optimization of the Proposed Segmented DAC

#### 5.2.1 Segmentation Considerations

Similar to the segmentation considerations discussed in chapter 4, the performance of a segmented nonlinear DAC is represented by the maximum amplitude difference between an ideal sine wave and the segmented nonlinear DAC output (MAX_ERR). The relative savings of chip area and power consumption of the segmented nonlinear DAC is represented by the ratio of the total DAC cells to the DAC cells of the non-segmented counterpart. A figure of merit (FM) is defined as,

\[
FM = \frac{MAX\_ERR_{a\text{-}b\text{-}c} \times TOT\_CELL_{a\text{-}b\text{-}c}}{MAX\_ERR_{\text{non-seg}} \times TOT\_CELL_{\text{non-seg}}} = \frac{MAX\_ERR_{a\text{-}b\text{-}c} \times (2^{c+a} + 2^{a+b})}{2^{a+b+c-1}} \tag{5-6}
\]

where \( TOT\_CELL_{a\text{-}b\text{-}c} \) is the total number of DAC cells in an "a-b-c" segmented DAC and \( TOT\_CELL_{\text{non-seg}} \) is the total number of DAC cells in a non-segmented DAC. The total number of fine DAC cells is \( 2^{c+a} \), but the real number may be smaller because some of the fine interpolation values can be zero. Furthermore, the fine DAC cells can be shared with the coarse DAC cells as discussed later.
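
A short Python sketch of this figure of merit is given below. It takes the non-segmented reference as 0.5 LSB times \( 2^{a+b+c} \) cells, i.e. a denominator of \( 2^{a+b+c-1} \); the function names are illustrative and the MAX_ERR inputs must come from a separate simulation.

```python
def tot_cells_abc(a, b, c):
    """Fine cells (2^(c+a)) plus coarse cells (2^(a+b)) of an a-b-c segmented DAC."""
    return 2 ** (c + a) + 2 ** (a + b)

def figure_of_merit_abc(max_err, a, b, c):
    """FM relative to the non-segmented DAC (MAX_ERR = 0.5 LSB, 2^(a+b+c) cells)."""
    return max_err * tot_cells_abc(a, b, c) / 2 ** (a + b + c - 1)

fm_343 = figure_of_merit_abc(1.64, 3, 4, 3)   # MAX_ERR of 1.64 LSB for "3-4-3"
```

For the "3-4-3" segmentation this gives 192 cells and a figure of merit of about 0.62.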

For a nonlinear DAC based DDFS shown in Figure 5.2 with 12-bit phase resolution and 11-bit amplitude resolution, the values of MAX_ERR and FM for various segmentations were calculated by using MATLAB. Table 5.1 shows some of the calculated values. The MAX_ERR and FM for the non-segmented nonlinear DAC are also listed for comparison. Due to quantization error, the MAX_ERR for the non-segmented case is equal to 0.5 LSB independent of j and i. The corresponding FM can also be shown to be independent of j and i, and is always equal to 1, which provides an unbiased reference for comparison. The results for the segmentations corresponding to \( c > a + b \) and \( c = 1 \) are not listed. For segmentations with \( c > a + b \), the MAX_ERR's are usually more than 5 LSBs, and the FM's are usually greater than 5.6. Although the MAX_ERR's are small for \( c = 1 \), the corresponding FM's are usually greater than 1.00 due to the large number of DAC cells.

<table>
<thead>
<tr>
<th>a-b-c</th>
<th>MAX_ERR (LSB)</th>
<th>TOT_CELL</th>
<th>FM</th>
</tr>
</thead>
<tbody>
<tr>
<td>1-7-2</td>
<td>2.24</td>
<td>264</td>
<td>1.16</td>
</tr>
<tr>
<td>2-6-2</td>
<td>1.62</td>
<td>272</td>
<td>0.86</td>
</tr>
<tr>
<td>3-5-2</td>
<td>1.24</td>
<td>288</td>
<td>0.70</td>
</tr>
<tr>
<td>4-4-2</td>
<td>1.14</td>
<td>320</td>
<td>0.72</td>
</tr>
<tr>
<td>5-3-2</td>
<td>1.14</td>
<td>384</td>
<td>0.86</td>
</tr>
<tr>
<td>6-2-2</td>
<td>1.14</td>
<td>512</td>
<td>1.14</td>
</tr>
<tr>
<td>7-1-2</td>
<td>1.14</td>
<td>768</td>
<td>1.72</td>
</tr>
<tr>
<td>1-6-3</td>
<td>4.09</td>
<td>144</td>
<td>1.12</td>
</tr>
<tr>
<td>2-5-3</td>
<td>2.25</td>
<td>160</td>
<td>0.70</td>
</tr>
<tr>
<td>3-4-3</td>
<td>1.64</td>
<td>192</td>
<td>0.62</td>
</tr>
<tr>
<td>4-3-3</td>
<td>1.41</td>
<td>256</td>
<td>0.70</td>
</tr>
<tr>
<td>5-2-3</td>
<td>1.41</td>
<td>384</td>
<td>1.06</td>
</tr>
<tr>
<td>6-1-3</td>
<td>1.41</td>
<td>640</td>
<td>1.76</td>
</tr>
<tr>
<td>1-5-4</td>
<td>8.36</td>
<td>96</td>
<td>1.56</td>
</tr>
<tr>
<td>2-4-4</td>
<td>4.17</td>
<td>128</td>
<td>1.04</td>
</tr>
<tr>
<td>3-3-4</td>
<td>2.36</td>
<td>192</td>
<td>0.88</td>
</tr>
<tr>
<td>4-2-4</td>
<td>1.93</td>
<td>320</td>
<td>1.20</td>
</tr>
<tr>
<td>5-1-4</td>
<td>1.29</td>
<td>576</td>
<td>1.46</td>
</tr>
<tr>
<td>1-4-5</td>
<td>17.00</td>
<td>96</td>
<td>3.18</td>
</tr>
<tr>
<td>2-3-5</td>
<td>8.17</td>
<td>160</td>
<td>2.56</td>
</tr>
<tr>
<td>3-2-5</td>
<td>3.81</td>
<td>288</td>
<td>2.14</td>
</tr>
<tr>
<td>4-1-5</td>
<td>2.31</td>
<td>544</td>
<td>2.46</td>
</tr>
<tr>
<td>5-5-0</td>
<td>0.50</td>
<td>1024</td>
<td>1.00</td>
</tr>
</tbody>
</table>

From the table, it can be observed that a large coarse DAC (i.e. small \( c \)) usually leads to a small MAX_ERR but also to a large chip area and power dissipation due to the total number of DAC cells. It can be further observed that when the value of \( c \) is fixed, the MAX_ERR becomes smaller as the value of \( a \) increases. This is because fewer sine wave sections (\( 2^b \) of them) have to be interpolated by the same \( \alpha \)-th sub-DAC, as illustrated in Figure 5.1. When the value of \( a \) increases up to a certain point, the differences between all the \( 2^b \) sections for a fixed value of \( \alpha \) become approximately the same. Furthermore, all the additional fine steps within a section can almost be approximated using linear interpolation. As a result, all the \( o_{\alpha,m} \)'s in the \( \alpha \)-th sub-DAC will have about the same value (refer to Table 4.6) and the MAX_ERR's will remain almost constant for any further increase in the value of \( a \). This point represents the optimal segmentation for a given value of \( c \), since any further increase in the value of \( a \) will only increase the number of sub-DACs, and hence the die area and power dissipation, without improving MAX_ERR. Among the different combinations, the "3-4-3" segmentation gives a MAX_ERR almost equal to the minimum value for \( c = 3 \) and has the smallest FM, which represents a good compromise between area, power and accuracy. Thus, it was selected for the prototype DDFS chip.
As a guideline for designing DDFS using segmented nonlinear DAC with different i and j values, Table 5.2 shows the optimal segmentations in terms of the defined figure of merit for the segmented nonlinear DAC's with different phase and amplitude resolutions.

Table 5.2 Optimal segmentations for different phase and amplitude resolutions when non-linear interpolation technique is used

<table>
<thead>
<tr>
<th>Resolution (i–j)</th>
<th>Optimal Segmentation (a-b-c)</th>
<th>FM</th>
<th>MAX_ERR (LSB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>8–10</td>
<td>3-3-2</td>
<td>0.42</td>
<td>1.14</td>
</tr>
<tr>
<td>8–11</td>
<td>3-4-2</td>
<td>0.41</td>
<td>1.37</td>
</tr>
<tr>
<td>8–12</td>
<td>2-5-3</td>
<td>0.24</td>
<td>1.55</td>
</tr>
<tr>
<td>9–11</td>
<td>3-4-2</td>
<td>0.51</td>
<td>1.59</td>
</tr>
<tr>
<td>9–12</td>
<td>3-4-3</td>
<td>0.33</td>
<td>1.59</td>
</tr>
<tr>
<td>9–13</td>
<td>2-5-4</td>
<td>0.21</td>
<td>2.10</td>
</tr>
<tr>
<td>10–12</td>
<td>3-4-3</td>
<td>0.31</td>
<td>1.64</td>
</tr>
<tr>
<td>10–13</td>
<td>4-4-3</td>
<td>0.27</td>
<td>1.47</td>
</tr>
<tr>
<td>10–14</td>
<td>3-5-4</td>
<td>0.16</td>
<td>1.61</td>
</tr>
<tr>
<td>11–13</td>
<td>4-4-3</td>
<td>0.31</td>
<td>1.67</td>
</tr>
<tr>
<td>11–14</td>
<td>3-5-4</td>
<td>0.22</td>
<td>2.21</td>
</tr>
<tr>
<td>11–15</td>
<td>4-5-4</td>
<td>0.16</td>
<td>1.58</td>
</tr>
<tr>
<td>12–14</td>
<td>4-5-3</td>
<td>0.28</td>
<td>1.72</td>
</tr>
<tr>
<td>12–15</td>
<td>4-5-4</td>
<td>0.18</td>
<td>1.93</td>
</tr>
<tr>
<td>12–16</td>
<td>4-5-5</td>
<td>0.09</td>
<td>1.85</td>
</tr>
</tbody>
</table>

In Table 5.2, the phase resolution of the DDFS is actually j+2 bits, and the amplitude resolution of the DDFS is i+1 bits. The MAX_ERR's for these segmentations are also listed for reference. Since the rule of thumb for DDFS design usually calls for an amplitude resolution between 8 bits and 12 bits, and a phase resolution 2 to 3 bits greater than the amplitude resolution, only these combinations of phase and amplitude resolutions are shown in the table. For high amplitude and phase resolutions, a DDFS using a segmented nonlinear DAC has significant advantages in terms of power dissipation and die area. Figure 5.5 shows the error plots for the non-segmented DAC and the "3-4-3" segmented sine wave DAC. It is shown clearly in Figure 5.5 that the "3-4-3" segmented sine wave DAC has larger amplitude errors.

Figure 5.5. Amplitude error plots of the non-segmented and "3-4-3" segmented DAC

It should be mentioned that a segmented nonlinear DAC might have higher glitch energy. This phenomenon is very similar to the glitches produced in a linear segmented DAC. If the turn-on and turn-off times of the fine sub-DAC cells differ from those of the coarse DAC cells, a temporary increase or decrease in output current results, and hence glitches occur in the output. The maximum glitch amplitude due to this phenomenon is proportional to

$$\max \left[ \sum_{m=0}^{2^c-1} o_{\alpha,m} \right].$$

This value should be used as a criterion for selecting among segmentation combinations when glitch amplitude becomes the main concern in the design. Nevertheless, this kind of glitch can be minimized if local latches are used inside the DAC cells to synchronize the turn-on and turn-off times of the DAC cell output currents.

#### 5.2.2 Device Mismatch Effects on "3-4-3" Segmentation DAC

To study device mismatch effects on the "3-4-3" segmentation sine wave DAC, we assume that each unit current source generates an output current with a mean of 1 LSB and a standard deviation (σ). The distribution is assumed to be normal. For 2048 unit current sources, the peak theoretical value of INL for a linear DAC is $0.5 \times \sqrt{2048} \sigma = 16\sqrt{2}\sigma$ [43]. Similar to the discussion in Chapter 4, the estimated worst-case maximum amplitude error can be expressed as follows,

$$\text{MAX}_\text{ERR'} = \sqrt{\text{MAX}_\text{ERR}^2 + (16\sqrt{2}\sigma)^2} \quad (5-7)$$

Using a MATLAB simulation similar to the one discussed in Chapter 4, the curve of the estimated amplitude errors based on equation (5-7) and the maximum magnitude errors from the MATLAB simulations are shown in Figure 5.6. Again, the simulation data correlate well with equation (5-7). By the same arguments as in Chapter 4, when the peak theoretical value of INL due to device mismatch is comparable to the maximum amplitude error due to systematic design, device mismatch becomes an important contributor to the performance degradation of a segmented nonlinear DAC. If the peak theoretical value of INL is 1.64 LSB, the estimated worst-case maximum error will be 2.32 LSB from (5-7). In Figure 5.6, when the standard deviation of the mismatch error is less than 0.06 LSB, the estimated worst-case amplitude error is less than 2.22 LSB. This indicates that device mismatch will affect the performance of the “3-4-3” segmented nonlinear DAC when the standard deviation is greater than 0.06 LSB. In practice, this bound on the standard deviation can be used to estimate the chip area required for satisfactory performance.
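
Equation (5-7) can be evaluated numerically with a short sketch such as the one below. The systematic error `MAX_ERR = 1.64` LSB is an assumption made for this example only: it is the value for which a peak INL of 1.64 LSB gives the 2.32 LSB quoted above. The function name `max_err_with_mismatch` is hypothetical.

```python
import math


def max_err_with_mismatch(max_err, sigma, n_units=2048):
    """Equation (5-7): root-sum-square of the systematic error and the peak INL.

    max_err : systematic maximum amplitude error in LSB
    sigma   : standard deviation of one unit current source in LSB
    n_units : number of unit current sources (2048 in the text)
    """
    inl_peak = 0.5 * math.sqrt(n_units) * sigma  # equals 16*sqrt(2)*sigma for 2048 units
    return math.hypot(max_err, inl_peak)


MAX_ERR = 1.64  # assumed systematic error in LSB (not taken from Table 5.2)

# Standard deviation for which the peak INL equals 1.64 LSB.
sigma_equal = 1.64 / (16 * math.sqrt(2))
print(round(max_err_with_mismatch(MAX_ERR, sigma_equal), 2))  # prints 2.32
```

When the two error terms are equal, the combined error is larger than either term by a factor of the square root of two, which is why mismatch becomes significant once the peak INL approaches the systematic error.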

![Graph showing estimated maximum amplitude error vs. standard deviation.](image)

Figure 5.6. Estimated maximum amplitude error vs. standard deviation

A segmented nonlinear DAC based DDFS prototype chip is presented in this chapter. The DDFS has 12 bits of phase resolution and 11 bits of amplitude resolution. It was fabricated in a standard 0.25 µm CMOS process with an active area of 1.4 mm². For a clock frequency of 300 MHz, the spurious-free dynamic range (SFDR) is better than 50 dB for output frequencies up to 3/8 of the clock frequency.

6.1 Specifications of the DDFS Chip

6.1.1 DDFS Specification

A DDFS is best suited to frequency-agile communication applications because it offers a fine frequency step and fast switching speed. It is important to understand the requirements of the application in order to define the DDFS specification. In some wireless standards, such as Advanced Mobile Phone Service (AMPS) and North American Digital Cellular (NADC), the channel spacing can be as small as 30 kHz with the center frequency in the vicinity of 900 MHz or 1.9 GHz. This means that the Local Oscillator (LO) frequency may be required to change in steps of only 30 kHz when changing the receiver or the transmitter channel. Figure 6.1 shows a generic wireless transceiver architecture that uses a frequency synthesizer to select different channels [44]. When the frequency synthesizer uses a DDFS, the final output frequency
of the frequency synthesizer is obtained by mixing the DDFS output with the output of a high-frequency local oscillator inside the frequency synthesizer. In this case, the DDFS is used for fine frequency selection and the high-frequency local oscillator is used for converting the DDFS output frequency up to the gigahertz range.

![Generic transceiver architecture](image)

Figure 6.1. Generic transceiver architecture

Wide frequency range and low power consumption are two major challenges in DDFS design. The proposed nonlinear phase interpolation DDFS has the potential to consume less power. The phase noise and spurs of a synthesizer affect the transceiver system performance. Typically, the synthesizer spurs should be approximately 60 dB below the carrier. To achieve an SFDR over 60 dB, both the phase resolution and the amplitude resolution of the sine wave DAC have to be chosen appropriately. From the theoretical worst-case spurious response shown in Figure 3.3, a DDFS with 12 bits of phase resolution and 11 bits of amplitude resolution can provide over 60 dB SFDR. To demonstrate the proposed DDFS technique, a 16-bit phase accumulator is used.
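
To relate the 16-bit accumulator to the frequency step discussed above, the following minimal sketch uses the standard DDFS relation $F_{out} = \text{FCW} \cdot F_{clk} / 2^N$, where FCW is the frequency control word and $N$ is the accumulator width. The function names are hypothetical, and the 200 MHz clock is used only as an example value.

```python
def tuning_step(f_clk, acc_bits=16):
    """Smallest output frequency step in Hz of an acc_bits-wide accumulator."""
    return f_clk / 2**acc_bits


def output_frequency(fcw, f_clk, acc_bits=16):
    """Output frequency in Hz for a given frequency control word."""
    return fcw * f_clk / 2**acc_bits


def fcw_for(f_out, f_clk, acc_bits=16):
    """Nearest frequency control word for a desired output frequency."""
    return round(f_out * 2**acc_bits / f_clk)


f_clk = 200e6                             # example clock frequency (assumption)
print(tuning_step(f_clk))                 # about 3051.76 Hz per FCW step
print(fcw_for(30e3, f_clk))               # FCW closest to a 30 kHz output
print(output_frequency(43 * 64, 500e6))   # F_out for F_out/F_clk = 43/1024
```

With these example numbers the tuning step is roughly 3 kHz, which is finer than the 30 kHz channel spacing mentioned above, although a 30 kHz step is not an exact multiple of it.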

The speed of a DDFS using a sine wave DAC is determined by the speed of the logic operations and the digital-to-analog conversion. Among the logic blocks, the phase accumulator may become the bottleneck. A full-pipelined architecture is the best choice for a high-speed accumulator, which may then achieve a speed comparable to that of a simple logic circuit. For measuring the high-speed DAC, the output current is applied directly to two 50 Ω or 75 Ω off-chip resistors or a differential-to-single-ended transformer. A 10-b 500-Msample/s current-steering DAC in a standard digital 0.35-μm CMOS process was reported in [43]. Hence, it is possible to design a 500 MHz DDFS in a 0.25-μm CMOS process.

In summary, the DDFS prototype chip is to provide: 12 bits of phase resolution, 11 bits of amplitude resolution, maximum operation clock frequency ≥ 200 MHz, and SFDR ≥ 60 dB for low synthesized output frequency.

6.1.2 Design Methodology

The DDFS chip design follows a top-down design methodology. First, the system specifications are determined based on the application requirements. Second, the amplitudes of the segmented DAC cells are calculated. Third, a behavioral model is developed in Verilog® Hardware Description Language (HDL) to describe the functionality of the DDFS chip; it is simulated at the system level to verify the functionality of the DDFS system and to optimize the segmented nonlinear DAC. Fourth,
the schematic of the DDFS system is designed and SPICE simulation is performed. Fifth, the layout of the DDFS is designed and the physical design is verified using the Cadence design verification tools. Finally, an evaluation printed circuit board (PCB) is designed and the prototype DDFS chips are tested. A block diagram of this design methodology is shown in Figure 6.2.

![Block Diagram]

Figure 6.2. The top-down design methodology

### 6.2 Behavioral Model of the Segmented Nonlinear DAC Based DDFS

Before starting the schematic design, it is necessary to verify the functionality of the proposed DDFS, which can be described using Verilog® HDL. A SPICE simulation of the DDFS schematic usually takes days to finish, whereas the DDFS behavioral model runs in Verilog-XL® in only minutes.

6.2.1 Design of "3-4-3" Segmentation Sine Wave DAC

From the DDFS system specifications, the twelve MSB's of the phase accumulator output are converted to the sine function amplitude. The amplitude resolution of the DAC is eleven bits. Figure 6.3 shows the block diagram of the "3-4-3" segmentation sine wave DAC.

![Block diagram of the "3-4-3" segmentation sine wave DAC](image)

Figure 6.3. Block diagram of the "3-4-3" segmentation sine wave DAC

Notice that the seven shaded coarse DAC cells are shared by the fine sub-DAC's that provide the interpolation steps for the coarse DAC cells in the same row. This scheme reduces chip area and power consumption and improves device matching. Some dedicated local decoders are designed for this cell-sharing scheme, and seven local decoders of the coarse DAC cells are saved. The global fine DAC decoder consists of a
3-bit thermometer-code decoder and a 3-to-7 decoder. For the coarse sine wave DAC, the row decoder is a 3-bit thermometer-code decoder and the column decoder is a 4-bit thermometer-code decoder. The values of the coarse DAC cells are listed in Table 6.1. Table 6.2 lists the interpolation values for the fine interpolation DAC.

Table 6.1. Values of the coarse DAC cells

<table>
<thead>
<tr>
<th>α \ β</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>13</td>
<td>12</td>
</tr>
<tr>
<td>1</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>12</td>
<td>13</td>
<td>12</td>
<td>12</td>
<td>12</td>
<td>12</td>
<td>12</td>
<td>11</td>
<td>12</td>
<td>12</td>
<td>12</td>
<td>12</td>
<td>12</td>
</tr>
<tr>
<td>3</td>
<td>10</td>
<td>11</td>
<td>11</td>
<td>11</td>
<td>10</td>
<td>10</td>
<td>10</td>
<td>10</td>
<td>11</td>
<td>10</td>
<td>9</td>
<td>10</td>
<td>10</td>
<td>9</td>
<td>10</td>
<td>9</td>
</tr>
<tr>
<td>4</td>
<td>9</td>
<td>8</td>
<td>9</td>
<td>8</td>
<td>9</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>7</td>
<td>8</td>
<td>7</td>
<td>8</td>
<td>7</td>
</tr>
<tr>
<td>5</td>
<td>7</td>
<td>7</td>
<td>7</td>
<td>7</td>
<td>6</td>
<td>6</td>
<td>5</td>
<td>6</td>
<td>5</td>
<td>6</td>
<td>5</td>
<td>6</td>
<td>5</td>
<td>6</td>
<td>5</td>
<td>6</td>
</tr>
<tr>
<td>6</td>
<td>4</td>
<td>5</td>
<td>5</td>
<td>4</td>
<td>5</td>
<td>4</td>
<td>3</td>
<td>4</td>
<td>3</td>
<td>3</td>
<td>4</td>
<td>2</td>
<td>3</td>
<td>3</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>7</td>
<td>2</td>
<td>3</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 6.2. Interpolation values for the fine interpolation DAC

<table>
<thead>
<tr>
<th>α \ γ</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>5</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>7</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
</tbody>
</table>
There are only forty non-zero values in Table 6.2 that need to be realized by the fine DAC. Since the last row contains all zeros, a 3-to-7 decoder instead of a 3-to-8 decoder is used. Therefore, the hardware requirement is less than what is predicted in Chapter 5 during the optimization of the sine wave DAC.
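
The structure behind the two tables can be illustrated with a minimal sketch of a "3-4-3" segmented quarter-wave sine approximation. This is not the optimization procedure used in this work: the full scale `AMP = 1023`, the half-LSB phase offset, and the rule of rounding the row-averaged residual are all assumptions made for illustration. The sketch stores the fine values as cumulative offsets per (α, γ), whereas the fine DAC is described above in terms of individual cell currents.

```python
import math

A_BITS, B_BITS, G_BITS = 3, 4, 3             # "3-4-3" split of the 10 quarter-wave phase bits
N_PHASE = 1 << (A_BITS + B_BITS + G_BITS)    # 1024 phase steps per quarter wave
AMP = 1023                                   # assumed quarter-wave full scale in LSB


def ideal(k):
    """Ideal quarter-wave sine amplitude with a half-LSB phase offset."""
    return AMP * math.sin(math.pi / 2 * (k + 0.5) / N_PHASE)


def build_tables():
    step = 1 << G_BITS
    n_coarse = 1 << (A_BITS + B_BITS)
    # Rounded sine value at each coarse phase point (alpha, beta).
    levels = [round(ideal(i * step)) for i in range(n_coarse)]
    # Coarse cell currents are the increments between consecutive coarse points.
    coarse = [levels[0]] + [levels[i] - levels[i - 1] for i in range(1, n_coarse)]
    # Fine offsets depend only on (alpha, gamma): average the residual over beta.
    fine = []
    for a in range(1 << A_BITS):
        row = []
        for g in range(step):
            resid = [ideal(((a << B_BITS) + b) * step + g) - levels[(a << B_BITS) + b]
                     for b in range(1 << B_BITS)]
            row.append(round(sum(resid) / len(resid)))
        fine.append(row)
    return coarse, fine


def dac_output(k, coarse, fine):
    """Sum of the selected coarse cells plus the shared fine offset of the row."""
    c = k >> G_BITS                      # coarse index (alpha, beta)
    a = k >> (B_BITS + G_BITS)           # row index alpha
    g = k & ((1 << G_BITS) - 1)          # fine index gamma
    return sum(coarse[:c + 1]) + fine[a][g]


coarse, fine = build_tables()
max_err = max(abs(ideal(k) - dac_output(k, coarse, fine)) for k in range(N_PHASE))
print(coarse[:4])   # prints [1, 12, 13, 12]
print(fine[0], round(max_err, 2))
```

Under these assumptions the first four coarse values agree with the first four entries of the first row of Table 6.1. The error introduced by sharing one set of fine values across a whole row is largest near the peak of the sine wave, where the slope changes most within a row.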

6.2.2 DDFS Behavioral Model and Simulation at System-level

To model the DAC cells, a simplified block diagram of the behavioral model for the DDFS using the “3-4-3” sine wave DAC is shown in Figure 6.4. By nature, the proposed DDFS using a sine wave DAC is a mixed-signal system. The digital parts of the proposed DDFS can be described using HDL. To describe the proposed nonlinear DAC in HDL, note that it converts the digital phase from the phase accumulator to the analog sine wave output. The proposed nonlinear DAC is therefore analogous to a ROM lookup table plus a conventional linear DAC. If we assume that the digital-to-analog conversion is ideal, the output of the proposed nonlinear DAC can be modeled using a phase-to-sine-amplitude lookup table (as shown in Figure 6.4). The values in the lookup table are from Table 6.1 and Table 6.2. Under these assumptions and arrangements, the system-level simulation results validate the proposed DDFS architecture and provide the upper bound of the DDFS performance.

![Behavioral model of the nonlinear phase interpolation DDFS](image-url)

In implementing the phase-to-sine-amplitude lookup table, an address-decoding scheme is designed to simulate the local decoders of the proposed nonlinear DAC. To realize the proposed DAC-cell-sharing scheme, a special local decoding scheme is designed such that the fine DAC cells for a given row (with a fixed $\alpha$ value) are implemented inside the coarse DAC cell with the largest $\beta$ in the same row. Specifically, the fine interpolation steps between the coarse values are provided by these fine DAC cells inside the coarse DAC cell with the largest $\beta$ value. When the entire row is selected, all the fine DAC cells in this row are selected. The results from Verilog-XL® are analyzed using MATLAB® programs. Figure 6.5 shows the simulation results from the DDFS model. The clock frequency is assumed to be 500 MHz and the ratio of the output frequency to the clock frequency is $F_{\text{out}}/F_{\text{clk}} = 43/1024$.

Figure 6.5. Results from the 12-bit phase resolution DDFS model ($F_{out}/F_{clk} = 43/1024$)

From the power spectral density (PSD) plot, the signal-to-noise ratio (SNR) is 61.60 dB and the SFDR is 71.58 dB. By using the simulation at system level, the segmented nonlinear DAC was optimized and the DDFS architecture was validated.
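
The kind of system-level check described above can be sketched in a few lines. The example below is a stand-in, not the Verilog model used in this work: it replaces the lookup table built from Table 6.1 and Table 6.2 with an ideally rounded 11-bit sine (an assumption), so its SFDR will differ from the 71.58 dB reported above. The names `ddfs_samples` and `sfdr_db` are hypothetical.

```python
import cmath
import math

ACC_BITS, PHASE_BITS, AMP = 16, 12, 1023   # AMP: assumed peak amplitude (11 bits with sign)


def ddfs_samples(fcw, n):
    """Behavioral DDFS: 16-bit accumulator, 12 MSBs mapped to a rounded sine."""
    acc, out = 0, []
    for _ in range(n):
        p = acc >> (ACC_BITS - PHASE_BITS)              # keep the 12 MSBs as phase
        out.append(round(AMP * math.sin(2 * math.pi * (p + 0.5) / 2**PHASE_BITS)))
        acc = (acc + fcw) % 2**ACC_BITS
    return out


def sfdr_db(x, k_sig):
    """SFDR in dB from a plain DFT; k_sig is the bin of the fundamental."""
    n = len(x)
    w = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]   # twiddle factors
    mags = [abs(sum(v * w[(k * i) % n] for i, v in enumerate(x)))
            for k in range(1, n // 2)]
    sig = mags[k_sig - 1]
    spur = max(m for k, m in enumerate(mags, start=1) if k != k_sig)
    return 20 * math.log10(sig / spur)


# F_out/F_clk = 43/1024 corresponds to FCW = 43 * 64 for a 16-bit accumulator,
# so 1024 samples contain exactly 43 periods of the output.
x = ddfs_samples(fcw=43 * 64, n=1024)
print(round(sfdr_db(x, 43), 1))
```

Because the record length covers an integer number of output periods, no window function is needed and the spurs fall exactly on DFT bins.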

6.3 Circuit Design of the DDFS Chip

6.3.1 Pipelined System Approach

A DDFS is a synchronous system. In a conventional DDFS, the ROM lookup table is the bottleneck for high-speed operation. The proposed DDFS architecture is suitable for high-speed applications because the slow phase-to-amplitude mapping ROM is eliminated. For a 500 MHz clock, the arithmetic operations must complete within 2 ns. This requirement is very stringent for both digital and analog circuits. To increase
the clock frequency, register-based pipelined system timing is employed in this DDFS chip. The latency of the pipelined system delays frequency switching by a fixed number of clock periods. A register-based pipelined system allows all state changes to occur at the rising (or falling) clock edges. A typical pipelined system is shown in Figure 6.6 [45]. At a 500 MHz clock rate, the timing requirement for this example is

\[ T_q + T_d + T_s \leq 2 \text{ns} \quad (6-1) \]

where \( T_d \) is the worst-case delay through the combinatorial logic block, \( T_q \) is the delay from Register A to the input of the combinatorial block, and \( T_s \) is the set-up time for Register B.
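
As a worked example of (6-1), the sketch below checks a timing budget against the clock period. The delay values are hypothetical and are not taken from this design; the function names are likewise assumptions.

```python
def max_clock_mhz(t_q_ns, t_d_ns, t_s_ns):
    """Highest clock frequency in MHz that satisfies T_q + T_d + T_s <= T_clk."""
    return 1e3 / (t_q_ns + t_d_ns + t_s_ns)


def meets_timing(t_q_ns, t_d_ns, t_s_ns, f_clk_mhz=500.0):
    """True if one pipeline stage fits within a clock period at f_clk_mhz."""
    return t_q_ns + t_d_ns + t_s_ns <= 1e3 / f_clk_mhz


# Hypothetical stage delays in ns: register output, logic, and set-up time.
print(meets_timing(0.3, 1.2, 0.2))                   # True: 1.7 ns fits in 2 ns
print(meets_timing(0.3, 1.2, 0.2, f_clk_mhz=1000))   # False: 1.7 ns exceeds 1 ns
print(round(max_clock_mhz(0.3, 1.2, 0.2)))           # about 588 MHz
```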

![Figure 6.6. Register-based pipelined system](image)

### 6.3.2 The Phase Accumulator Design

A phase accumulator generates the phase and determines the output frequency as well as the frequency tuning resolution. It is critical for spectral purity. The phase accumulator is one of the slowest digital blocks in a DDFS because it cannot complete a multi-bit addition in a short clock period if a simple carry look-ahead scheme is used.

Usually, a phase accumulator consists of a multi-bit adder and some registers. Since the operation speed of a full adder depends strongly on its carry propagation delay, a multi-bit adder is slower than a single-bit adder because the carry must propagate through more bit positions. By using a pipelined scheme [33], the speed of the phase accumulator can be increased. Based on this technique, a 16-b full-pipelined phase accumulator was designed. The 1-bit area-optimized transmission-gate full adder [46], which is used in the phase accumulator, is shown in Figure 6.7.
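
The principle of a bit-level pipelined accumulator can be illustrated with the generic behavioral sketch below. It is not a model of the circuit in this chapter: it assumes one 1-bit full adder per slice with registered sum and carry, an input skew of $i$ cycles for bit $i$, and a de-skew of all output bits, so its latency need not match the figures quoted for the prototype. The function names are hypothetical.

```python
def pipelined_accumulator(fcw_seq, nbits=16):
    """Bit-sliced pipelined accumulator; returns the de-skewed output words."""
    s = [0] * nbits          # registered sum bit of each slice
    c = [0] * nbits          # registered carry-out of each slice
    hist = []                # hist[t][i] = sum register of slice i at cycle t
    for t in range(len(fcw_seq) + nbits):
        hist.append(s[:])
        ns, nc = [0] * nbits, [0] * nbits
        for i in range(nbits):
            # Input skew: slice i sees the FCW bit that was applied i cycles earlier.
            f = (fcw_seq[t - i] >> i) & 1 if 0 <= t - i < len(fcw_seq) else 0
            cin = c[i - 1] if i > 0 else 0       # carry registered by the slice below
            ns[i] = s[i] ^ f ^ cin
            nc[i] = (s[i] & f) | (s[i] & cin) | (f & cin)
        s, c = ns, nc
    # Output de-skew: delay bit i by (nbits - 1 - i) cycles so all bits line up.
    out = []
    for t in range(nbits - 1, len(hist)):
        word = 0
        for i in range(nbits):
            word |= hist[t - (nbits - 1 - i)][i] << i
        out.append(word)
    return out


def reference_accumulator(fcw_seq, nbits=16):
    """Plain accumulator for comparison: A(0) = 0, A(t+1) = A(t) + FCW(t)."""
    acc, out = 0, [0]
    for f in fcw_seq:
        acc = (acc + f) % (1 << nbits)
        out.append(acc)
    return out


seq = [2752] * 40 + [12345] * 40     # includes one frequency change
print(pipelined_accumulator(seq) == reference_accumulator(seq))   # True
```

In such a structure each clock period only has to accommodate a single 1-bit addition, at the cost of a fixed latency before a new frequency control word takes effect at the output.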

![Figure 6.7. 1-bit transmission gate full adder](image)

In order to drive another adder and a register, two buffers are added after the transmission gates of the "SUM" and "CARRY_OUT" outputs. Figure 6.8 shows the schematic of the full-pipelined phase accumulator. For low dynamic power dissipation, static D flip-flops (DFF's) are used in the phase accumulator. The 12 MSB's of the 16 phase bits are converted to the sine wave magnitude. Hence, only the 12 MSB's are delayed by the shift registers in Figure 6.8 so that they are synchronized. As a result, 58 registers are saved. The hardware modification discussed in Chapter 3 is employed in this accumulator to reduce spurs in the spectrum [22]. When \( \text{RESET} = 1 \), the carry input of the LSB adder toggles periodically between 0 and 1, emulating an additional bit such that \( \text{gcd}(\text{FCW}, 2^B) = 1 \). Therefore, the hardware modification randomizes errors introduced by phase truncation and amplitude quantization. The "RESET" pin provides the option to switch the spur reduction method on or off [33]. As can be seen in Figure 6.8, the maximum latency is thirty-two clock cycles for the LSB of the phase accumulator output.

Figure 6.8. Full pipelined 16-bit phase accumulator
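
The effect of the toggling carry input can be checked with a short sketch. It assumes that the carry alternates 0, 1, 0, 1, ... starting from 0 (the starting value is an assumption) and shows that a $B$-bit accumulator driven this way produces the same sequence as the upper $B$ bits of a $(B+1)$-bit accumulator whose control word is $2 \cdot \text{FCW} + 1$, which is always odd. The function names are hypothetical.

```python
from math import gcd


def period(fcw, bits):
    """Number of clock cycles before the accumulator state repeats."""
    return 2**bits // gcd(fcw, 2**bits)


def accumulate_with_toggling_carry(fcw, bits, n):
    """B-bit accumulator whose LSB carry input alternates 0, 1, 0, 1, ..."""
    acc, out = 0, []
    for t in range(n):
        out.append(acc)
        acc = (acc + fcw + (t % 2)) % 2**bits
    return out


def accumulate_extended(fcw, bits, n):
    """(B+1)-bit accumulator with the odd control word 2*FCW + 1; upper B bits."""
    acc, out = 0, []
    for _ in range(n):
        out.append(acc >> 1)
        acc = (acc + 2 * fcw + 1) % 2**(bits + 1)
    return out


fcw, bits = 43 * 64, 16
print(period(fcw, bits))                 # 1024: short period without the modification
print(period(2 * fcw + 1, bits + 1))     # 131072: full period with the emulated bit
print(accumulate_with_toggling_carry(fcw, bits, 500)
      == accumulate_extended(fcw, bits, 500))   # True
```

The longer period means the truncation and quantization errors repeat far less often, which spreads their energy over many more spectral lines.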

Figure 6.9 shows the SPICE simulation result of the phase accumulator using TSMC CMOS 0.25-\( \mu \)m typical-typical models at 75 °C with a 500 MHz clock. Figure 6.10 shows another result of the phase accumulator using the typical-typical models at 75 °C with a clock frequency of 1000 MHz. Based upon these
simulations, the accumulator can work at a clock frequency over 500 MHz for various design corners. To test the functionality of the phase accumulator, some accumulator outputs are designed to drive output pads. These outputs are: the sum of the first adder P0, the carry-out of the first adder CARRY0, the sum of the fourth adder P3, the sum of the sixteenth adder P15, and the carry-out of the sixteenth adder OVFL.

Figure 6.10. Simulation result of the phase accumulator at 1000 MHz clock rate

6.3.3 Clock Driver Design

In this DDFS design, static DFF's and latches with a standard two-phase clocking scheme are employed. Since even a small amount of clock skew may limit the speed of operation, a technique for eliminating process-dependent clock skew was applied in the
global clock driver design [47]. With this technique, the delays of the two complementary clock signals propagating along two different paths can be matched against all process variations if the sums of the pull-up delays in the two paths are matched to each other, and likewise the sums of the pull-down delays. Figure 6.11 shows the simplified schematic of the global clock driver circuit. To balance the clock signal delays, the global clock signal outputs of this driver are carefully routed to the local clock buffers that drive the DFF's and latches directly.

![Figure 6.11. Global clock driver](image)

6.3.4 One's Complementor and Decoders Design

From the DDFS system design, the conventional two's complementor can be simplified to a one's complementor if a half LSB offset is introduced to the phase and amplitude. The one's complementor can then be realized simply by using exclusive-or (XOR) logic gates. Transmission gate XOR gates are utilized to design the one's
complementor due to the low die area requirement and low power dissipation. For a 10-bit 1's complementor, the 2nd MSB signal needs to drive 20 transmission gates and 10 minimum size inverters. Therefore, an additional buffer is needed to improve the driving capability of the 2nd MSB signal. Figure 6.12 shows the schematic of the transmission gate XOR in which a buffer is added at the output to increase the driving capability.
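
The reason a one's complementor suffices when a half-LSB phase offset is used can be verified numerically. The sketch below assumes a 12-bit phase word whose two MSB's select the quadrant and whose remaining 10 bits address a quarter-wave sine; the function names are hypothetical, and the sign is applied arithmetically here rather than by complementary current sources.

```python
import math

PHASE_BITS = 12
QUARTER_BITS = PHASE_BITS - 2
MASK = (1 << QUARTER_BITS) - 1


def quarter_sine(k):
    """First-quadrant sine with a half-LSB phase offset, k in 0..1023."""
    return math.sin(2 * math.pi * (k + 0.5) / 2**PHASE_BITS)


def sine_from_quarter(p):
    """Full-wave sine rebuilt from the quarter wave using a one's complementor."""
    msb = (p >> (PHASE_BITS - 1)) & 1       # selects the sign of the output
    msb2 = (p >> (PHASE_BITS - 2)) & 1      # selects the mirrored quadrants
    k = p & MASK
    if msb2:
        k ^= MASK                           # XOR every bit with the 2nd MSB
    mag = quarter_sine(k)
    return -mag if msb else mag


worst = max(abs(sine_from_quarter(p) - math.sin(2 * math.pi * (p + 0.5) / 2**PHASE_BITS))
            for p in range(2**PHASE_BITS))
print(worst < 1e-9)   # True: the mirrored index 1023 - k is exactly the one's complement
```

Without the half-LSB offset the mirrored index would be $1024 - k$, which requires a two's complement and therefore an adder.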

![Schematic of Transmission Gate XOR](image)

**Figure 6.12. Transmission gate one's complementor**

Static CMOS logic is used to implement the thermometer-code decoders and the 3-to-7 decoder. The binary-to-thermometer-code decoders are designed using the K-map minimization technique. Based upon the DDFS behavioral model, there are three kinds of local decoders in the coarse DAC and two kinds of special local decoders in the fine interpolation DAC. As discussed later, each DAC cell has two current
sources: one for the positive part of the sine function and one for the negative part. Thus, two sets of complementary decoding outputs are required for the two differential switches in each DAC cell. As an example, the decoder for the $N^{th}$ row ($2 \leq N \leq 7$) is shown in Figure 6.13.
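
For reference, a binary-to-thermometer-code decoder can be sketched as follows. The first function is a behavioral definition; the second gives one possible minimized gate-level form for 3 bits, derived here by hand and not taken from the schematics of this design. Both function names are hypothetical.

```python
def thermometer(code, bits):
    """Behavioral decoder: output i (0-based) is 1 when code > i."""
    return [1 if code > i else 0 for i in range(2**bits - 1)]


def thermometer3_gates(code):
    """One minimized sum-of-products form of the 3-bit decoder (illustrative)."""
    b2, b1, b0 = (code >> 2) & 1, (code >> 1) & 1, code & 1
    return [
        b2 | b1 | b0,        # code >= 1
        b2 | b1,             # code >= 2
        b2 | (b1 & b0),      # code >= 3
        b2,                  # code >= 4
        b2 & (b1 | b0),      # code >= 5
        b2 & b1,             # code >= 6
        b2 & b1 & b0,        # code >= 7
    ]


print(thermometer(5, 3))   # [1, 1, 1, 1, 1, 0, 0]
print(all(thermometer3_gates(c) == thermometer(c, 3) for c in range(8)))   # True
```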

![Figure 6.13. A local decoder for the rows between the 2nd row and the 7th row](image)

6.3.5 DAC Cell Design

The DAC cells are designed using a current-steering scheme. The output currents feed directly into two off-chip 50-$\Omega$ resistors or a differential-to-single-ended transformer. To obtain a 500-mV voltage swing from each single-ended output, or a 1000-mV voltage swing from the differential output, the unit current is approximately 5 $\mu$A. Figure 6.14 shows a simplified schematic of a DAC cell. Each DAC cell consists of two sets of current sources, N and P. Two differential switches and two cascoded current sources are employed to generate both the positive region and the negative region of the sine function. The values of the current sources are identical, but they are selected differently by different local decoders. Notice that cascoded current sources are used here to reduce current variation due to voltage changes at the output nodes and the digital signal feed-through from the switches. For the negative sine wave region, only the N current sources are turned on according to $\alpha$, $\beta$ and $\gamma$, and all the P current sources are off. For the positive sine wave region, all the N current sources are on and the P current sources are turned on according to $\alpha$, $\beta$ and $\gamma$ [9]. The regions of the sine wave output are determined by the MSB of the phase accumulator. This approach ensures a smooth transition between the two regions and hence reduces the glitch energy.
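
The unit current quoted above can be reproduced with a one-line calculation. The sketch assumes that the full single-ended swing corresponds to $2^{11}$ unit currents flowing into the load resistor, which is an assumption consistent with the 11-bit amplitude resolution rather than a statement about the exact number of cells that are on. The function name is hypothetical.

```python
def unit_current(v_swing, r_load, amp_bits=11):
    """Unit (LSB) current in amperes for a given single-ended swing and load."""
    return v_swing / (r_load * 2**amp_bits)


i_unit = unit_current(0.5, 50.0)          # 500 mV into 50 ohms
print(round(i_unit * 1e6, 2))             # prints 4.88 (microamperes)
print(round(i_unit * 2**11 * 1e3, 2))     # prints 10.0 (milliamperes full scale)
```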

![Diagram of DAC cell](image)

Figure 6.14. DAC cell of the sine wave DAC with complementary current

The current sources in the DAC cells are biased by distributed local biasing circuits, which are in turn biased by global biasing currents. More details on the biasing scheme are given in the next section. The coarse DAC cells that are also used by the nonlinear sub-DAC's are designed like the other coarse DAC cells, except that more local latches and local decoders are required to control the current sources for different $\gamma$ inputs. As a result, they require a larger area. All the DAC cells have dummy transistors to improve the matching of the current sources. DAC cells whose output current $o_k$ is less than $\max|o_k|$ include extra dummy transistors. To minimize the parasitic resistance effect of the switches, the sizes of the switching NMOS transistors are scaled according to the DAC cell output currents. Local latches are used to synchronize the local decoder outputs. To reduce the switching signal feed-through effect, voltage level shifters are used to decrease the logic signal swing. The schematic of the voltage level shifter is shown in Figure 6.15.

![Figure 6.15. Schematic and waveforms of a voltage level shifter](image)

The level shifter simply consists of four PMOS transistors used as switches, each with an aspect ratio of $W/L = 1.2 \, \mu m/0.24 \, \mu m$. The outputs for the positive current switches from the local decoder and the local latches (P_DECP, P_DECN) are then transformed to SWP and SWN, respectively. SWP and SWN have voltage swings from
the digital power supply DVDD to the externally adjustable voltage ADJ_SWV. From SPICE simulations, the value of ADJ_SWV should be set to about 1.9 V to ensure minimum glitches.

6.3.6 Biasing Circuit Design

To minimize parasitic resistance effect on the biasing of the DAC current sources, global biasing currents are utilized instead of global biasing voltages. Figure 6.16 shows the biasing scheme of this DDFS chip.

Figure 6.16. The biasing circuit of the nonlinear DAC

Twenty-four pairs of biasing currents are generated in the "Global Biasing" block similar to the idea of adding local clock buffers in the phase accumulator design. Local biasing circuits are carefully distributed in the entire sine wave DAC such that
they can be shared by adjacent DAC cells. PMOS cascoded current mirrors provide
global bias currents to the local biasing circuits. The local biasing circuit consists of
diode-connected NMOS transistors and is used to provide local biasing voltages directly
to the cascoded current sources. An external resistor is used to generate the reference
current for the sine wave DAC.

### 6.3.7 Compression Efficiency and Spice Simulation

The proposed DDFS architecture reduces power consumption and saves chip
area by decreasing the number of DAC cells significantly. Based upon the DDFS circuit
implementation details discussed in the previous sections, and assuming that the
non-segmented nonlinear DAC is implemented using the same circuit design, the
compression efficiency of the proposed DDFS using the phase interpolated sine wave DAC
is shown in Table 6.3. Notice that the fine DAC cells that have zero current are not
counted. Compared to the full thermometer-code sine wave DAC, the nonlinear phase
interpolated sine wave DAC decreases the total number of DAC cells by a factor of more than 5. The
proposed DDFS architecture is therefore an attractive alternative for low-power applications.

Table 6.3. Compression efficiency of the proposed DDFS

<table>
<thead>
<tr>
<th></th>
<th>DDFS using full thermometer-coded sine wave DAC</th>
<th>Proposed DDFS using nonlinear interpolated sine wave DAC</th>
</tr>
</thead>
<tbody>
<tr>
<td>DAC cells</td>
<td>1024</td>
<td>175</td>
</tr>
<tr>
<td>5-b global decoders</td>
<td>2</td>
<td>1 (equivalent)</td>
</tr>
<tr>
<td>Global DFF's</td>
<td>64</td>
<td>39</td>
</tr>
<tr>
<td>Local latches</td>
<td>4096</td>
<td>700</td>
</tr>
<tr>
<td>Local decoder</td>
<td>1024</td>
<td>175</td>
</tr>
</tbody>
</table>
For this DDFS design, SPICE simulation is feasible because of the substantial reduction in DAC cells. Unlike the behavioral model simulation, a SPICE simulation takes a few days to complete. Figure 6.17 shows simulated waveforms of the MSB of the phase accumulator and the two differential outputs, measured across 50-Ω resistors for a clock frequency of 500 MHz at a temperature of 75 °C.

![Figure 6.17. Spice simulation of DDFS schematic (F_{out}/F_{clk} = 3/256)](image_url)

It can be seen in Figure 6.17 that the outputs stabilize after 16 clock periods, i.e., after 32 ns for a clock rate of 500 MHz. The peak-to-peak single-ended voltage swing of the analog sine wave output is about 530 mV, so the differential voltage swing is 1060 mV. Discrete Fourier transform (DFT) analysis was performed on the differential output. Table 6.4 lists some of the SFDR values from the DDFS schematic
simulations based on TSMC CMOS 0.25-μm typical-typical models at a temperature of 75 °C, where \( F_{\text{out}} / F_{\text{clk}} \) represents the ratio between the output frequency and the clock frequency. It shows that the DDFS can operate up to a clock frequency of 1000 MHz, but the SFDR decreases to 35.70 dB for \( F_{\text{out}} = 48.83 \) MHz. For low clock frequencies and low synthesized output frequencies, the SFDR is greater than 67 dB. From the behavioral model simulation, the upper performance bound is about 71 dB, which is quite close to the SPICE simulation results. At a 500 MHz clock rate, the SFDR's are better than 50 dB for low synthesized output frequencies. As examples of the SPICE results, Figure 6.18 gives two PSD (power spectral density) plots for clock frequencies of 500 MHz and 200 MHz, respectively.

Table 6.4. SFDR values from the DDFS schematic simulations

<table>
<thead>
<tr>
<th>Clock (MHz)</th>
<th>Output (MHz)</th>
<th>$F_{\text{out}}/F_{\text{clk}}$</th>
<th>SFDR (dB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>0.488</td>
<td>25/512</td>
<td>67.75</td>
</tr>
<tr>
<td>200</td>
<td>25</td>
<td>1/8</td>
<td>67.65</td>
</tr>
<tr>
<td>500</td>
<td>5.859</td>
<td>3/256</td>
<td>59.63</td>
</tr>
<tr>
<td>500</td>
<td>22.46</td>
<td>23/512</td>
<td>55.34</td>
</tr>
<tr>
<td>500</td>
<td>24.41</td>
<td>25/512</td>
<td>52.25</td>
</tr>
<tr>
<td>500</td>
<td>62.5</td>
<td>1/8</td>
<td>48.39</td>
</tr>
<tr>
<td>1000</td>
<td>48.83</td>
<td>25/512</td>
<td>35.70</td>
</tr>
</tbody>
</table>
6.4 DDFS Layout Design

6.4.1 Layout Design of the Prototype DDFS

An experimental prototype of the proposed nonlinear phase interpolation DDFS has been designed and fabricated in TSMC 0.25-µm single-poly, penta-metal CMOS technology. The layout of the DDFS chip including pads is shown in Figure 6.19. For economic reasons, the area of the DDFS chip is required to be at least 4 mm². The total DDFS die area is 2320 µm × 2020 µm, and the active area is 1.4 mm². In Figure 6.19, the most active and noisy digital blocks, such as the phase accumulator, the global clock driver, and the one's complementor, lie in the lower right part of the die. The segmented nonlinear DAC is in the upper half and occupies approximately 4/7 of the active area. The row and column decoders are located around the edge of the DAC cell matrix. The global biasing circuit lies above the DAC cells. Local clock buffers and local biasing circuits are distributed carefully in order to minimize parasitic errors, such as metal path parasitic resistance and capacitance. There are 52 pads in total on the chip. Eight pads at the corners are used as dummy pads for double bonding. The analog output pads and other sensitive reference input pads lie at the top of the figure. To decrease signal reflection, sharp corners in the wide power supply paths are avoided by using 45° metal paths.

Figure 6.19. Layout of the prototype DDFS Chip
6.4.2 Layout Design Considerations

As can be seen from Figure 6.19, a careful floor plan has been done to separate the sensitive analog section from the noisy digital blocks. This mixed-signal chip has separate power supplies and grounds: AVDD & DVDD, AGND & DGND. Wide sheets of metal are used for POWER and GROUND in order to minimize the voltage drop along lines due to parasitic resistance. Moreover, several pins are used for power supplies and grounds to reduce the parasitic inductance of bond wires and package traces. Multiple pads and double bonding help to reduce parasitic inductance effect on current outputs. On-chip bypassing capacitors, which are realized by NMOS or PMOS transistors with their drains and sources shorted together, are applied between DVDD and DGND. The total bypassing capacitance used for this purpose is around 7.5 nF. Another bypassing capacitance of about 0.8 nF is applied between reference voltage "ADJ_SW" and DGND. The substrate is connected to AGND, because by doing so the substrate and analog ground "AGND" will have the same variation. Layout design follows the design rule recommendations from TSMC for preventing latch-up effect. Many substrate-connected shields are employed to reduce coupling.

In laying out the cascoded current sources, a one-dimensional interdigitated layout style is utilized to reduce the effect of process gradients. Dummy transistors are used to provide a better-matched environment. To make global placement and routing easier, all the current cells have the same size, while their currents vary from 1 to 13 units. Multi-finger transistors are used for the switches, and dummy transistors are utilized to reduce sidewall parasitic capacitance. The analog part and the digital part occupy similar chip area in
each DAC matrix cell. Since the fifth metal layer has the smallest parasitic effects, the current outputs are routed using this metal layer. To minimize parasitic effects, there are no Electro-Static Discharge (ESD) protection diodes in the output pads.

Global reference currents, instead of voltages, are distributed to reduce the effect of interconnect resistance. The path from the global clock buffer to each local buffer is carefully routed. Inside the sine wave DAC cell matrix, a set of three or four DAC cells uses a common local clock buffer and a common local biasing circuit. In the layout design of the phase accumulator, careful floor planning was done to place the 1-b adders, the local clock buffers, and the registers such that high-speed operation of the phase accumulator can be achieved. Due to the layout complexity of the proposed sine wave DAC, complete randomization of the DAC cells is difficult to realize. Therefore, only the column order of the coarse DAC is randomized. Figure 6.20 shows the actual column order of the coarse DAC cells.

\[ \begin{array}{ccccccccccccccc}
8 & 7 & 10 & 9 & 15 & 14 & 12 & 11 & 13 & 1 & 5 & 4 & 3 & 6 & 2 \\
\end{array} \]

![Column 4-b Thermometer-code decoder](image)

Figure 6.20. Layout column order of the coarse sine wave DAC

6.5 Chip Packaging and PCB Design

6.5.1 Chip Packaging

Figure 6.21 shows the photomicrograph of the DDFS chip. Five DDFS chips were packaged in a 44-lead TQFP by ASAT. The remaining twenty unpackaged DDFS dies were sent to Rockwell Collins for further evaluation. The die size of the packaged die is 104 mils × 91 mils. The package model is 44L 210 × 210 TQFP 10 × 10 ETCH. The bonding diagram of the DDFS chip is shown in Figure 6.22. It can be seen from the bonding diagram that the dummy pads at the corners of the die are used for double bonding the corresponding pins. During testing, IC sockets for this 44-lead TQFP were used on the PCB.

6.5.2 DDFS Evaluation PCB Design

The printed circuit board (PCB) was designed using EAGLE® from CadSoft Computer™. To reduce coupling effects, the evaluation PCB was designed as a 4-layer board. Figure 6.23 shows the structure of the 4-layer PCB. The top and bottom layers are signal layers. The second layer is for the grounds and the third layer is for the power supplies.
The measured performance of the device under test (DUT) using the evaluation board can be improved when analog power, digital power, and pad power are connected to separate power supplies. Therefore, the analog power, digital power, and pad power have their own individual banana-style connectors [48]. Figure 6.24 shows the digital power connections.

The power and ground connections for the pads are similar to the digital power connections. Due to the sensitivity of analog power and analog ground, a linear voltage regulator with fixed output voltage of 2.5 V (LM 2937-2.5 from National Semiconductor) is used. Figure 6.25 shows the analog power and ground connections.
Notice that for the best noise rejection on the power supplies, the high value bulk capacitors are placed around the external power connectors, while the smaller value capacitors, which are required for high frequency noise rejection, are placed close to the device under test (DUT).

![Figure 6.25. Analog power connections](image)

In order to test the DUT using a spectrum analyzer or an oscilloscope, the differential current outputs are combined to single-ended output by using a Mini-Circuits transformer (T1-6T). Figure 6.26 shows the simplified connection for this purpose. The flexibility of getting one of the outputs is achieved by the 0-Ω resistor jumper points on the board.

![Figure 6.26. Testing arrangement using transformer](image)

In practice, a 50-$\Omega$ SMA (SubMiniature version A) connector is used for the analog output, and another SMA connector is used for the reference clock input signal. For better impedance matching, the clock path on the PCB is terminated to the digital ground through an additional 50-$\Omega$ resistor. Figure 6.27 shows the layout of the DDFS evaluation board.

![Figure 6.27. Layout of the evaluation board in Eagle®](image-url)

6.6 Evaluation Setup and Experimental Results

6.6.1 Evaluation Setup

The equipment and evaluation setup needed to evaluate the DDFS chip are shown in Figure 6.28. The 4-layer evaluation boards were manufactured by Gerland Leiterplatten GMBH, Germany. Figure 6.29 shows a photo of the test PCB with the soldered components.

Figure 6.28. Test setup to evaluate the DDFS chip

6.6.2 Experimental Results

With the test setup described above, the packaged DDFS chips were evaluated at room temperature. The functionality of the phase accumulator was confirmed by observing the waveforms of the digital output signals on an oscilloscope (Tek-TDS 694C). Some of the sine wave outputs were recorded directly from the oscilloscope. Figure 6.30 shows the waveform of the DDFS for an output frequency of 4.69 MHz. The clock frequency was 600 MHz, and the clock frequency to output frequency ratio was set to 128. Figure 6.31 shows the zoomed-in waveform of the DDFS for a clock frequency of 30 MHz. Glitches due to code transitions can be seen in Figure 6.31.

Figure 6.30. Waveform of $1/128 \times F_{CLK}$ sine wave output at 600 MHz clock rate

Figure 6.31. Zoomed-in waveform of $1/128 \times F_{CLK}$ sine wave output at 30 MHz clock rate

In order to test the maximum clock frequency of the logic circuitry in this DDFS, the clock frequency to output frequency ratio was set to 256. Figure 6.32 shows the waveform of the DDFS for an output frequency of 3.64 MHz at 930 MHz clock rate. The high operation speed of the digital circuitry is due to the pipelined timing scheme.

Figure 6.32. Waveform of $1/256 \times F_{CLK}$ sine wave output at 930 MHz clock rate
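The clock-to-output frequency ratios quoted above follow from the standard DDFS relation $f_{OUT} = FCW / 2^N \times f_{CLK}$. The following minimal sketch computes the frequency control word for a given ratio, assuming the 16-b accumulator width mentioned for the prototype; the function names are illustrative and are not taken from the design.

```python
ACC_BITS = 16  # accumulator width stated for the prototype phase accumulator


def fcw_for_ratio(ratio: int, acc_bits: int = ACC_BITS) -> int:
    """Frequency control word giving f_out = f_clk / ratio."""
    modulus = 1 << acc_bits
    if ratio <= 0 or modulus % ratio != 0:
        raise ValueError("ratio must divide 2**acc_bits exactly")
    return modulus // ratio


def output_frequency(fcw: int, f_clk_hz: float, acc_bits: int = ACC_BITS) -> float:
    """Nominal output frequency f_out = FCW / 2**N * f_clk."""
    return fcw * f_clk_hz / (1 << acc_bits)


if __name__ == "__main__":
    for ratio, f_clk in ((128, 600e6), (256, 930e6)):
        fcw = fcw_for_ratio(ratio)
        print(ratio, fcw, output_frequency(fcw, f_clk) / 1e6, "MHz")
```

These are nominal values computed from the stated clock frequencies; the frequencies read from the instruments may differ slightly.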

Usually, the worst-case spurs occur when the output frequency is tuned close to $1/4$ or $3/8$ of the clock frequency. In Figure 6.33, the measured SFDR is 63.83 dB with a 30.08 MHz output ($3/8 \times f_{CLK}$) for a clock frequency of 80 MHz. The worst-case spur is the $3^{rd}$ harmonic alias at 10.02 MHz ($f_{alias} = 3 \times 3/8 \times f_{CLK} - f_{CLK} = 1/8 \times f_{CLK}$).

Figure 6.33. Spectrum of $3/8 \times F_{CLK}$ output, where the clock frequency is 80 MHz
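The location of the aliased harmonic can be checked numerically. The third harmonic of a $3/8 \times f_{CLK}$ output lies at $9/8 \times f_{CLK}$ and folds back to $1/8 \times f_{CLK}$ in the first Nyquist zone. The sketch below implements this folding for any harmonic; it uses nominal frequencies and hypothetical function names.

```python
def fold_to_first_nyquist(f_hz: float, f_clk_hz: float) -> float:
    """Fold a frequency into the first Nyquist zone [0, f_clk / 2]."""
    f = f_hz % f_clk_hz
    return f_clk_hz - f if f > f_clk_hz / 2 else f


def harmonic_alias(k: int, f_out_hz: float, f_clk_hz: float) -> float:
    """Alias frequency of the k-th harmonic of f_out in a sampled system."""
    return fold_to_first_nyquist(k * f_out_hz, f_clk_hz)


if __name__ == "__main__":
    f_clk = 80e6
    f_out = 3 * f_clk / 8  # nominal 30 MHz output
    for k in (2, 3, 5):
        print(k, harmonic_alias(k, f_out, f_clk) / 1e6, "MHz")
```

With the nominal 80 MHz clock, the third harmonic folds to 10 MHz, in line with the 10.02 MHz spur observed in the measured spectrum.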

Figure 6.34. SFDR versus clock frequency for $3/8 \times F_{CLK}$ output

Figure 6.34 shows the SFDR as a function of clock frequency for \( f_{\text{OUT}} = 3/8 \times f_{\text{CLK}} \). It can be seen that for this frequency control word, the DDFS achieves an SFDR of over 40 dB up to a clock frequency of 450 MHz. At the clock frequency of 300 MHz, the spectrum of the \( 3/8 \times f_{\text{CLK}} \) sine wave output is shown in Figure 6.35. The SFDR is 57.33 dB.

![Figure 6.35. Spectrum of 3/8 \times f_{\text{CLK}} output, where the clock frequency is 300 MHz](image)

Figure 6.36 shows the SFDR as a function of clock frequency for output frequency \( f_{\text{OUT}} = 65/4096 \times f_{\text{CLK}} \). It can be seen that the DDFS can operate up to 500 MHz with an SFDR greater than 50 dB for this frequency control word. Figure 6.37 shows the spectrum plot of a \( 65/4096 \times f_{\text{CLK}} \) output frequency for a clock frequency of 64 MHz. The SFDR is 64.50 dB.

Figure 6.36. SFDR versus clock frequency for $f_{\text{OUT}} = 65/4096 \times f_{\text{CLK}}$

Figure 6.37. Spectrum of $65/4096 \times f_{\text{CLK}}$ output for 64 MHz clock frequency

Figure 6.38 shows SFDR as a function of synthesized output frequency. For a clock frequency of 300 MHz, the SFDR is better than 60 dB when the synthesized output frequency is low and decreases to 50.34 dB when synthesized output frequency is high. The SFDR is better than 50 dB with output frequencies up to 3/8 of the clock frequency.

![Graph showing SFDR as a function of synthesized output frequency](image)

Figure 6.38. SFDR versus synthesized frequency for clock frequency of 300 MHz

When the supply voltage was set to 2.5 V, the power dissipation was measured to be 240 mW with a reference clock of 300 MHz. The synthesized output frequency was 4.68 MHz. The accumulator consumes approximately 30% of the total power dissipation. For an output frequency of $1/64 \times f_{CLK}$, the total power dissipation versus clock frequency is shown in Figure 6.39.
In Figure 6.39, the power dissipation is approximately linearly proportional to the clock frequency. The power dissipation of the analog circuitry in the DDFS is almost constant for fixed biasing currents. Thus, the power dissipation of the digital circuitry determines the relationship between the power dissipation and the clock frequency. Note that the digital building blocks of this DDFS chip were designed using static logic style, and the major part of static logic power dissipation is the dynamic power dissipation, which is proportional to the switching frequency, as illustrated by the following equation [45],

$$P_d = C_L V_{DD}^2 f_p \tag{6-4}$$

The linear relationship of (6-4) explains the trend in Figure 6.39. Figure 6.40 shows the power dissipation versus synthesized output frequency for a fixed clock frequency of 500 MHz. The power dissipation increases as the synthesized frequency increases, since more logic state changes occur.
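The linear trend implied by (6-4) can be sketched as a constant analog term plus a dynamic term proportional to the clock frequency. The capacitance and analog power values below are placeholders chosen only for illustration; they are not measured or extracted values from the chip, and the function names are hypothetical.

```python
def dynamic_power(c_load_f: float, vdd_v: float, f_switch_hz: float) -> float:
    """Dynamic power of static logic, P_d = C_L * V_DD**2 * f_p, as in (6-4)."""
    return c_load_f * vdd_v ** 2 * f_switch_hz


def total_power(f_clk_hz: float, p_analog_w: float, c_eff_f: float,
                vdd_v: float = 2.5) -> float:
    """Constant analog power plus frequency-proportional digital power."""
    return p_analog_w + dynamic_power(c_eff_f, vdd_v, f_clk_hz)


if __name__ == "__main__":
    P_ANALOG = 0.05   # W, placeholder for the fixed-bias analog circuitry
    C_EFF = 100e-12   # F, placeholder effective switched capacitance
    for f_mhz in (100, 300, 500):
        p = total_power(f_mhz * 1e6, P_ANALOG, C_EFF)
        print(f_mhz, "MHz ->", round(p * 1e3, 2), "mW")
```

In this model the slope of power versus clock frequency is $C_L V_{DD}^2$ and the intercept is the analog power, which matches the qualitative behavior described for Figure 6.39.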

As discussed in chapter 3, the extra 1/2 LSB added to the accumulator emulates the operation of a phase accumulator with one additional bit, thus forcing the greatest common divisor of the frequency control word and the truncated word to be one. This hardware modification also has the effect of randomizing the errors introduced by the nonlinear sine wave DAC. For some frequency control words, adding this 1/2 LSB makes the output spectrum worse [33]. Therefore, it is recommended that this spur reduction method be optional, depending on the application. To test the effect of the hardware modification in the phase accumulator, the spectrum plots with "RESET=0" and "RESET=1" are shown in Figure 6.41. In Figure 6.41 (a), where the hardware modification was turned off, there are more spurs and the SFDR is only 35 dB. When the hardware modification was turned on, as shown in Figure 6.41 (b), there are only two major spurious signals and the SFDR is 51 dB due to the randomization effect.

Figure 6.41. Spectrum plots of $1/4 \times F_{CLK}$ output for clock frequency of 300 MHz
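The effect of the 1/2-LSB modification on the length of the accumulator sequence can be illustrated with a short sketch. It assumes the common model in which a permanent 1/2-LSB offset on an $N$-bit accumulator behaves like an $(N+1)$-bit accumulator driven by the odd control word $2 \times FCW + 1$; the exact hardware of chapter 3 is not reproduced here, and the function name is illustrative.

```python
from math import gcd


def sequence_period(fcw: int, acc_bits: int, half_lsb: bool = False) -> int:
    """Number of clock cycles before the accumulator phase sequence repeats."""
    if half_lsb:
        # ASSUMED model: the extra 1/2 LSB acts as one more accumulator bit
        # with the control word forced odd.
        fcw = 2 * fcw + 1
        acc_bits += 1
    modulus = 1 << acc_bits
    # The sequence n * fcw mod 2**acc_bits repeats after modulus / gcd steps.
    return modulus // gcd(fcw, modulus)


if __name__ == "__main__":
    fcw = (1 << 16) // 4  # 1/4 * f_clk output on a 16-b accumulator
    print("without modification:", sequence_period(fcw, 16))
    print("with modification:   ", sequence_period(fcw, 16, half_lsb=True))
```

For the $1/4 \times f_{CLK}$ case, the unmodified sequence repeats every 4 cycles, whereas the modified sequence runs through all $2^{17}$ states, so the DAC errors are no longer concentrated on a few phase values.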

6.6.3 Summary of the DDFS Chip

Table 6.6 summarizes the performance of the DDFS chip. When compared with the recently reported DDFS's shown in Table 6.7, this work achieves a higher operation speed with comparable spectral performance, consumes considerably less power than [11] and [33], and occupies the smallest active area. This is achieved mainly by using the segmented sine wave DAC in place of the ROM lookup table and the linear DAC of a conventional DDFS. The segmentation technique for the sine wave DAC has proved effective in reducing power dissipation and saving chip area.

Table 6.6. Summary of the DDFS chip performance

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.25 μm CMOS process</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>~240 mW @ 300 MHz</td>
</tr>
<tr>
<td>Active area</td>
<td>1.4 mm$^2$</td>
</tr>
<tr>
<td>Power supplies</td>
<td>2.5 V</td>
</tr>
<tr>
<td>Phase resolution</td>
<td>12 bits</td>
</tr>
<tr>
<td>Amplitude resolution</td>
<td>11 bits</td>
</tr>
<tr>
<td>SFDR for 300 MHz clock</td>
<td>&gt; 50 dB with $f_{\text{OUT}} \leq 3/8 \times f_{\text{CLK}}$</td>
</tr>
<tr>
<td>Maximum clock frequency</td>
<td>930 MHz (digital circuitry), 500 MHz (DDFS)</td>
</tr>
</tbody>
</table>

Table 6.7. Comparison among the recently reported DDFS's

<table>
<thead>
<tr>
<th></th>
<th>[11]</th>
<th>[33]</th>
<th>[9]</th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>0.8-μm CMOS</td>
<td>0.8-μm BiCMOS</td>
<td>0.5-μm CMOS</td>
<td>0.25-μm CMOS</td>
</tr>
<tr>
<td>Clock frequency</td>
<td>200 MHz</td>
<td>150 MHz</td>
<td>230 MHz</td>
<td>300 MHz</td>
</tr>
<tr>
<td>Phase resolution</td>
<td>14-b</td>
<td>12-b</td>
<td>10-b</td>
<td>12-b</td>
</tr>
<tr>
<td>Amplitude resolution</td>
<td>12-b</td>
<td>10-b</td>
<td>11-b</td>
<td>11-b</td>
</tr>
<tr>
<td>On-chip DAC's</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Complementary Outputs</td>
<td>N/A</td>
<td>Yes</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Power dissipation</td>
<td>2 W</td>
<td>0.6 W</td>
<td>0.092 W</td>
<td>0.240 W</td>
</tr>
<tr>
<td>Active area</td>
<td>15.9 mm²</td>
<td>3.9 mm²</td>
<td>1.6 mm²</td>
<td>1.4 mm²</td>
</tr>
</tbody>
</table>

CHAPTER 7 CONCLUSIONS AND CONTRIBUTIONS

7.1 Conclusions

In this dissertation, new design techniques were proposed to minimize the power consumption and to optimize the performance of a Direct Digital Frequency Synthesizer (DDFS) using a segmented sine wave Digital-to-Analog Converter (DAC). Using these techniques, the number of DAC cells can be significantly reduced, so lower power dissipation and smaller die size can be achieved. The nonlinear segmentation approach can achieve very high speed because it uses the current-steering technique. Both multiple resistor string and resistor-capacitor hybrid DAC architectures are proposed to realize the linear segmented sine wave DAC. The linear segmentation approach can achieve smaller die area and even lower power consumption because it needs fewer resistors/capacitors and fewer local decoders. The linear technique is more suitable for instrumentation applications. To demonstrate the new techniques, a ROM-less, high-speed, low-power DDFS prototype was designed and fabricated in a standard 0.25-μm CMOS process.

7.2 Contributions

In chapter 4, a linear phase interpolation technique was proposed to realize the sine wave DAC's for DDFS. Both multiple resistor string and resistor-capacitor hybrid DAC architectures were proposed to realize the segmented sine wave DAC. The DDFS using the R-C hybrid sine wave DAC can have better spectral performance than the other proposed architectures because of its inherent sample/hold operation. A figure of merit (FM) is defined to optimize the segmented sine wave DAC. For a DDFS with 12-b phase resolution and 11-b amplitude resolution, "5-5" is the optimal segmentation for the sine wave DAC according to the FM criterion [49].

In chapter 5, a nonlinear phase interpolation technique was proposed for DDFS using a nonlinear DAC. A current-steering scheme is proposed to implement the nonlinear phase interpolation DDFS in order to achieve high operation speed. For the prototype DDFS chip with 12-b phase resolution and 11-b amplitude resolution, the "3-4-3" segmentation was found to be the optimal one for the sine wave DAC according to the FM criterion. As a guideline for the design of both nonlinear and linear phase interpolation DDFS's, optimal segmented sine wave DAC's were obtained by calculating the figures of merit (FM's). The effects of device mismatch on the segmented sine wave DAC were also briefly discussed. In order to describe the functionality of the prototype chip, a behavioral model was developed in Verilog.

In chapter 6, a 16-b fully pipelined phase accumulator was designed for the prototype DDFS. SPICE simulations with nominal models, a 2.5 V power supply, and a temperature of 75 °C predict that the maximum clock frequency of the phase accumulator can be over 1 GHz. In the segmented sine wave DAC design, some of the coarse DAC cells are shared with the fine interpolation DAC. Therefore, good matching between the coarse DAC and the fine DAC as well as small die area were achieved. The decoding scheme of the DAC is designed such that the sine wave DAC works like a thermometer-code DAC and the dynamic effects due to transitions between the positive and negative regions are minimized. In the layout of the coarse sine wave DAC, the order of the columns is quasi-randomized. This helps to decrease the effect of process and temperature gradients on the chip. A 4-layer evaluation printed circuit board (PCB) was designed for testing the prototype DDFS chip. The measured SFDR is better than 50 dB with output frequencies up to 3/8 of the 300 MHz clock frequency. The maximum clock frequency is over 500 MHz for the entire DDFS chip, and the maximum clock frequency is 930 MHz for the digital circuits alone. The DDFS prototype occupies an active area of 1.4 mm$^2$ and consumes 240 mW for a clock frequency of 300 MHz. When compared to other high-speed CMOS DDFS's with on-chip DAC that were published recently, the presented DDFS chip achieves the highest clock frequency with comparable spectral performance and consumes considerably less power and die area [50][51].

Further improvement of the prototype DDFS can be achieved if a better current-steering DAC cell can be designed to reduce digital signal feed-through. An on-chip band-gap reference circuit can also be used to improve temperature stability. All the current sources of the DAC cells can be laid out together in one area that is separated from the digital circuitry (decoders, latches, etc.) in order to reduce noise and improve matching. Both the multiple resistor string and the resistor-capacitor hybrid segmented sine wave DAC's are promising for low-power applications. Therefore, a future implementation of a prototype DDFS is important for the study of the proposed linear phase interpolation technique. Another interesting research topic is to explore digital frequency or phase modulation schemes based on the DDFS using a segmented sine wave DAC. Finally, it would be interesting to combine the advantages of a PLL and a DDFS in a hybrid frequency synthesizer, which can have a wide frequency range and fine frequency steps.

REFERENCES


ACKNOWLEDGEMENTS

I would like to express my most sincere and grateful appreciation to my major professor, Dr. Edward K.F. Lee. I thank him for the opportunity of working closely with him in the Department of Electrical and Computer Engineering at Iowa State University, and for his invaluable guidance and insight throughout the research project.

I would also like to thank Dr. William Black, Dr. Robert Weber, Dr. Marwan Hassoun, Dr. Chris Chu, and Dr. Yuhong Yang for serving as members of my Program of Study (POS) committee and reviewing this dissertation. The comments from my POS committee and the discussions with them were important factors in the success of this research. I also thank Dr. Randy Geiger, who taught me the first course on analog VLSI design at ISU.

I greatly appreciate the help from my colleagues in the Analog and Mixed Signal VLSI design center as well as peers from other groups. Special thanks go to Maria Blanco who supported the design center extremely well and treated all the VLSI students like her own children. I also thank Jason Boyd who helped me on soldering. Many thanks to my old college roommate Bei Liu for his friendship and great help. I would also like to thank my colleagues Huanzhang Huang, Huawen Jin, Lin Wu, Tao Han, Weibiao Zhang, Baiying Yu, Hui Liu, Huiming Xia, to name a few, for their friendship and support. My fellow Chinese basketball and soccer players at ISU, thank you all for making my life at Ames enjoyable.

I am deeply grateful to my parents Gequan Jiang and Yizuo Xiu for their love and support throughout my life. Their confidence in me and their pride in my successes have driven me all these years. Special and deepest thanks go to my wife Zhiying, who inspired and accompanied me through the hard times. I deeply appreciate her love and support. I also thank Zhiying for giving birth to our lovely son, baby Brandon Jiang. I want to thank baby Brandon for all the love and joy he brings to us.

I appreciate the internship opportunities provided by Rockwell International Inc. in 1997 and Texas Instruments Inc. in 1998. I am also grateful for the support from Rockwell Collins and the Roy J. Carver Charitable Trust under grant #98-229.