Phase to Amplitude Converter
The phase-to-amplitude converter calculates the amplitude for the current phase angle. This operation is essentially the same as evaluating the sine or cosine function of its argument. Two methods are widely used for implementation in hardware: the CORDIC algorithm and the ROM lookup table.
A CORDIC algorithm is often used for lower-speed applications such as audio processing. CORDIC stands for COordinate Rotation DIgital Computer. It is an iterative method of performing vector rotations by arbitrary angles. When a unit vector is rotated in Cartesian coordinates, its x-axis component is the cosine of the rotation angle and its y-axis component is the sine. The angle is measured between the positive x-axis and the vector. CORDIC is computationally very simple, as it requires only shifts and adds. The disadvantage for our purpose is that it is iterative and usually requires one clock cycle per bit of resolution. For a master clock frequency around 100 MHz, we would need a clock speed of about 1.5 GHz for the CORDIC (roughly 15 iterations per output sample). That is well beyond today's FPGA capabilities, and if it becomes possible in the future, you will probably want to run the DDS at that speed. See Ray Andraka's excellent pages on digital signal processing for more details on the CORDIC algorithm.
As we want to generate RF rather than audio frequencies, our target is one analogue output sample per clock cycle. As noted above, this is not (easily) possible with CORDIC, so we chose a lookup table. The amplitude is stored in a ROM, which in an FPGA is effectively an initialized RAM, and the phase angle generated in the previous stage addresses it. Access times of FPGA on-chip memories are in the 10 ns range, so clock speeds of 100 MHz and more can be achieved. The immediate question that comes up when discussing lookup tables is “how large should it be?”
Well, as a rule of thumb, if the resolution of the DA converter is n bits, then there should be at least 2^{n+2} entries in the table, each of them n bits wide. But why?
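
One way to see where the rule of thumb may come from, offered as a sketch and not as the author's own derivation: the sine is steepest at its zero crossings, so the largest jump between two adjacent table entries is roughly the peak amplitude times the phase step. The function name `max_step_in_lsb` and the choice of 2^(n-1) LSBs as peak amplitude are assumptions made for this illustration.

```python
import math

def max_step_in_lsb(dac_bits, table_bits):
    """Approximate largest change between adjacent entries, in DAC LSBs,
    of a full-circle sine table with 2**table_bits entries."""
    amplitude = 2 ** (dac_bits - 1)   # assumed peak amplitude in LSBs
    entries = 2 ** table_bits
    # The steepest slope of amplitude*sin(phi) is `amplitude` per radian,
    # and one table step spans 2*pi/entries radians.
    return amplitude * 2 * math.pi / entries

# Compare tables with 2^n, 2^(n+1) and 2^(n+2) entries for a 14-bit DAC.
for extra_bits in (0, 1, 2):
    print(extra_bits, round(max_step_in_lsb(14, 14 + extra_bits), 3))
```

Under these assumptions the largest step shrinks from about π LSB with 2^n entries to about π/4 LSB with 2^{n+2} entries, which is the first power of two at which adjacent entries differ by less than one LSB.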
In the case of a 14-bit DAC, about 2^{16} entries in the lookup table would be needed. That is almost 128 kB, well beyond the affordable sizes inside an FPGA. We could consider a lookup table in a fast external RAM, but its access times would probably be worse than those of the built-in RAM. On-chip memory sizes of about 1 kB are feasible even with the smaller and cheaper FPGAs. 8 kB or 16 kB are still possible if you can afford such FPGAs, but much larger memory is certainly not worth the money. Fortunately, there are simple ways to keep the lookup table surprisingly small. First of all, the sine function has two symmetries: the function values between {0..π/2} are the same as those between {π/2..π} in reverse order, i.e. they are mirrored about x = π/2. The function values between {0..π} are the same as those between {π..2π} with a negative sign. The following graphic shows the symmetries.

Only the shaded function values between {0..π/2} need to be stored in the lookup table. All other values can be derived from them. This simple trick reduces the number of required entries to a quarter, and we also save one bit of output width, because the output data range is only {0..1} instead of {-1..1}. That reduces the required size in the example above from 128 kB to below 32 kB.
That was very useful, but we are still about a factor of 32 above the achievable size inside an FPGA. Another old trick serves very well in this case: interpolation. If we have only a few equally spaced values, we may calculate the intermediate ones from their neighbours. It turns out that linear interpolation is good enough to calculate the 31 intermediate values between two stored values, with an error that stays below or on the order of one bit. A program to calculate the error of a sine lookup table can be found in the download area.
The phase accumulator delivers a total of ρ bits to the phase-to-amplitude converter. As is obvious from the graph above, the most significant bit is high for arguments between π and 2π. It therefore controls a complement calculator, which negates the output from the LUT when the bit is high and passes it through unmodified when the bit is low (i.e. the argument is between 0 and π). The second-highest bit is high when the argument is between π/2 and π or between 3π/2 and 2π. The address inputs to the lookup table are modified in these cases: the argument is calculated by subtracting the phase accumulator output from π. The subsequent bits from the phase accumulator, as many as required, are taken directly as address inputs for the lookup table. Further bits control the interpolator, so they determine the distance to each of the two interpolation points.
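
The interpolation claim can be checked numerically. The following is a minimal sketch and not the program from the download area. It assumes a quarter-wave table of 512 stored values with 32 interpolation steps between neighbours (2^14 quarter-wave points, matching the 14-bit DAC example) and a peak amplitude of 8191 LSBs. The function name `max_interpolation_error` is hypothetical, and the table carries one extra endpoint so that the last interval has a right-hand neighbour.

```python
import math

def max_interpolation_error(entries=512, steps=32, full_scale=8191):
    """Worst deviation, in LSBs, between a linearly interpolated
    quarter-wave sine table and the exact sine."""
    # Stored values are rounded to integers, as they would be in a ROM.
    table = [round(full_scale * math.sin(math.pi / 2 * i / entries))
             for i in range(entries + 1)]
    worst = 0.0
    for i in range(entries):
        for k in range(steps):
            # Linear interpolation between stored neighbours i and i+1.
            approx = table[i] + (table[i + 1] - table[i]) * k / steps
            exact = full_scale * math.sin(
                math.pi / 2 * (i * steps + k) / (entries * steps))
            worst = max(worst, abs(approx - exact))
    return worst

print(max_interpolation_error())          # 512 entries, 32 steps each
print(max_interpolation_error(16, 1024))  # far too coarse a table
```

With these assumed parameters the first figure should stay below one LSB, while the deliberately coarse second table is off by several LSBs. This suggests that the rounding of the stored values, not the straight-line approximation, dominates the error at 512 entries.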
Let me give an example to make things clear: the accumulator size ρ is 28 bits, the DAC has a resolution of 14 bits and the quarter lookup table has 512 entries, each 14 bits wide. The output from the phase accumulator is simply called “phase” and the input to the DAC is called “data”. Then the signals are connected as follows:
phase[ρ-1] = phase[27]: controls the complementer
phase[ρ-2] = phase[26]: controls the π − x subtractor
phase[ρ-3:ρ-11] = phase[25:17]: address input to the lookup table
phase[ρ-12:ρ-16] = phase[16:12]: weighting factor for the interpolator
It is possible to use more bits for the interpolator, but additional precision is usually not gained. Note that, finally, an offset has to be added to make the result unipolar if that is required by the DAC employed.
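
The bit assignment above can be modelled in software. The following is a behavioural sketch under stated assumptions, not a description of the actual FPGA logic. The names `phase_to_amplitude` and `to_unipolar`, the peak amplitude of 8191 and the rounding in the interpolator are all hypothetical choices. The sketch also stores a 513th table value as an endpoint, so that the interpolator and the π − x mirror can reach sin(π/2); a hardware design might handle this differently.

```python
import math

RHO = 28        # phase accumulator width (from the example)
DAC_BITS = 14   # DAC resolution (from the example)
ADDR_BITS = 9   # 512-entry quarter table, phase[25:17]
FRAC_BITS = 5   # interpolation weight, phase[16:12]
FULL_SCALE = (1 << (DAC_BITS - 1)) - 1   # 8191, assumed peak amplitude

# Quarter-wave table; index 512 is an extra endpoint (assumption).
TABLE = [round(FULL_SCALE * math.sin(math.pi / 2 * i / (1 << ADDR_BITS)))
         for i in range((1 << ADDR_BITS) + 1)]

def phase_to_amplitude(phase):
    """Map a 28-bit phase word to a signed amplitude."""
    negate = (phase >> (RHO - 1)) & 1   # phase[27]: complementer
    mirror = (phase >> (RHO - 2)) & 1   # phase[26]: pi - x subtractor
    span = 1 << (ADDR_BITS + FRAC_BITS)
    q = (phase >> (RHO - 2 - ADDR_BITS - FRAC_BITS)) & (span - 1)  # phase[25:12]
    if mirror:
        q = span - q                    # pi - x within the half wave
    addr = q >> FRAC_BITS               # table address
    frac = q & ((1 << FRAC_BITS) - 1)   # distance to the lower neighbour
    if addr == 1 << ADDR_BITS:
        value = TABLE[addr]             # exactly pi/2, frac is zero here
    else:
        low, high = TABLE[addr], TABLE[addr + 1]
        # Linear interpolation using only a multiply, an add and a shift.
        value = low + (((high - low) * frac + (1 << (FRAC_BITS - 1))) >> FRAC_BITS)
    return -value if negate else value

def to_unipolar(value):
    """Add a mid-scale offset for a DAC that expects unsigned data."""
    return value + (1 << (DAC_BITS - 1))

# Quarter-period samples: 0, pi/2, pi, 3*pi/2.
for quadrant in range(4):
    print(quadrant, phase_to_amplitude(quadrant << (RHO - 2)))
```

The lowest 12 bits of the phase word are ignored in this model, in line with the bit assignment above, where only phase[27:12] reaches the converter.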