More Analysis of the CUDA Code

Each SETI@Home Work Unit comes with exactly 1,048,576 complex data points, where the real and imaginary components of each point are both represented by single-precision floating point numbers.  These roughly one million data points are stored in a variable named "DataIn", and the number of data points is stored in a variable called "NumDataPoints."  These variables are defined in the analyzeFuncs.cpp file as such:

typedef float sah_complex[2];

sah_complex *DataIn;
int NumDataPoints;

In the CUDA version of SETI@Home, this dataset is copied into the memory of the graphics card and stored there.  Afterwards, a set of 58,347 "Chirp/Fft Pairs" is generated and copied into the memory of the graphics card as well.  The pairs are stored in an array called ChirpFftPairs, and the total number of Chirp/Fft Pairs is stored in a variable called num_cfft.  These are defined as such:

typedef struct {
     double ChirpRate;
     int ChirpRateInd;
     int FftLen;
     int GaussFit;
     int PulseFind;
} ChirpFftPair_t;

int num_cfft;
ChirpFftPair_t *ChirpFftPairs;

After the seti_analyze function, which holds the core of the SETI@Home analysis, has loaded these two data structures, it iterates over each of the 58,347 Chirp/Fft Pairs and analyzes the input data based on the parameters of each pair.  In more detail, each iteration:

  1. Chirps the original DataIn data using the ChirpRateInd and ChirpRate values from the current Chirp/Fft pair, along with another value that comes from the work unit called "subband sample rate."
  2. Performs a fast Fourier transform on the chirped data using the FFTW ("Fastest Fourier Transform in the West") library.  The FFT length is also specified as a parameter in the Chirp/Fft Pair.
  3. Passes the results of the FFT to a function called "GetPowerSpectrum," which computes the power of each complex value, i.e. the sum of the squares of its real and imaginary components.
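As a rough sketch of steps 1 and 3 (the FFT in step 2 is delegated to FFTW and left out here), the code below applies a linear chirp and then computes the power of each complex value. The names chirp_sketch and power_spectrum_sketch are hypothetical, and the chirp formula is an assumed textbook form (a phase of 0.5 * chirp_rate * t^2 cycles); the real ChirpData function may use a different sign or scaling.

```cpp
#include <cassert>
#include <cmath>

typedef float sah_complex[2];  // same layout as in analyzeFuncs.cpp

// Hypothetical helper, not the project's ChirpData: rotates sample i by a
// phase of 0.5 * chirp_rate * t^2 cycles, where t = i / sample_rate.
// This is an assumed form of a linear chirp. in and out may be the same array.
void chirp_sketch(const sah_complex* in, sah_complex* out, int n,
                  double chirp_rate, double sample_rate) {
    assert(sample_rate > 0.0);
    const double two_pi = 6.283185307179586;
    for (int i = 0; i < n; ++i) {
        double t = i / sample_rate;
        double cycles = 0.5 * chirp_rate * t * t;
        cycles -= std::floor(cycles);  // keep only the fractional part of the phase
        double c = std::cos(two_pi * cycles);
        double s = std::sin(two_pi * cycles);
        float re = in[i][0];
        float im = in[i][1];
        // Complex multiply (re + i*im) * (c + i*s)
        out[i][0] = static_cast<float>(re * c - im * s);
        out[i][1] = static_cast<float>(re * s + im * c);
    }
}

// Hypothetical helper: power of each complex value, re^2 + im^2.
void power_spectrum_sketch(const sah_complex* freq, float* power, int n) {
    for (int i = 0; i < n; ++i) {
        power[i] = freq[i][0] * freq[i][0] + freq[i][1] * freq[i][1];
    }
}
```

With a chirp rate of 0 the phase is always zero, so chirp_sketch simply copies its input.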

After these 3 steps, the resulting data is processed, depending on certain values, by the following functions, which are not candidates for FPGA acceleration at this time:

  • FindSpikes
  • FindAutocorrelation
  • analyze_pot

It should be noted that the GPU/CUDA code also optimizes parts of the analyze_pot function, along with some of the functions which analyze_pot calls.

Getting Started

So what does this mean for us?

Well, what if we replicated the same thing in a separate project and then used that separate project to develop the FPGA code?  We can re-implement those sections of code, namely the ChirpData, Fft, and GetPowerSpectrum functions, in a high-level language like LabVIEW and see what advanced tools that automatically port LabVIEW code to an FPGA can do for us.

For more information on LabVIEW and its graphical FPGA development environment, including its FPGA builder tool that automatically moves parts of an application into an FPGA, see: www.ni.com.

Step 1 – Export the Input Data

I modified the original analyzeFuncs.cpp file to "dump" all of the input data and the results at certain points of execution.  The goal here was to dump enough data so that the separate program would have enough information to reproduce a portion of the calculation in an isolated environment, and to validate that its results are accurate by loading the results from a previous run and comparing the values.

For the input data, we need two files, one for the Work Unit Data points, and another for the Chirp/Fft pairs.

Work Unit Data points will be stored in a file called "binWorkUnitDataPoints.bin" and will have the following format:

<Number of Data Points> (int)

<Data Point 1 – Real component> (float)

<Data Point 1 – Imaginary component> (float)

<Data Point N – Real component> (float)

<Data Point N – Imaginary component> (float)
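A minimal sketch of writing and reading this layout is shown here. The helper names dump_data_points and load_data_points are hypothetical, and the code assumes the file is written and read on the same machine (native byte order, same sizes for int and float).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

typedef float sah_complex[2];  // same layout as in analyzeFuncs.cpp

// Hypothetical helper: writes <count> (int), then count (real, imaginary)
// float pairs, in native byte order.
bool dump_data_points(const char* path, const sah_complex* data, int count) {
    assert(path && count >= 0);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&count, sizeof(int), 1, f) == 1;
    if (ok && count > 0) {
        std::size_t n = static_cast<std::size_t>(count);
        ok = std::fwrite(data, sizeof(sah_complex), n, f) == n;
    }
    return (std::fclose(f) == 0) && ok;
}

// Hypothetical helper: reads the file back as interleaved floats
// (re0, im0, re1, im1, ...). Returns false on a short or malformed file.
bool load_data_points(const char* path, std::vector<float>& interleaved) {
    assert(path);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    int count = 0;
    bool ok = std::fread(&count, sizeof(int), 1, f) == 1 && count >= 0;
    if (ok) {
        interleaved.resize(2 * static_cast<std::size_t>(count));
        if (!interleaved.empty()) {
            ok = std::fread(&interleaved[0], sizeof(float),
                            interleaved.size(), f) == interleaved.size();
        }
    }
    std::fclose(f);
    return ok;
}
```

Inside analyzeFuncs.cpp the dump would presumably be called as dump_data_points("binWorkUnitDataPoints.bin", DataIn, NumDataPoints).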

The Chirp/Fft pairs will be stored in a file called "binChirpFftPairs.bin" and will have the following format:

<Number of Chirp/Fft Pairs> (int)

<Chirp/Fft Pair 1 – Chirp Rate> (double)

<Chirp/Fft Pair 1 – Chirp Rate Ind> (int)

<Chirp/Fft Pair 1 – Fft Length> (int)

<Chirp/Fft Pair 1 – Gauss Fit> (int)

<Chirp/Fft Pair 1 – Pulse Find> (int)

 

<Chirp/Fft Pair N – Chirp Rate> (double)

<Chirp/Fft Pair N – Chirp Rate Ind> (int)

<Chirp/Fft Pair N – Fft Length> (int)

<Chirp/Fft Pair N – Gauss Fit> (int)

<Chirp/Fft Pair N – Pulse Find> (int)
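The sketch below writes and reads this layout one field at a time, so the file matches the format above regardless of any padding the compiler adds to the struct. The helper names dump_chirp_fft_pairs and load_chirp_fft_pairs are hypothetical, and native byte order is assumed.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

typedef struct {
    double ChirpRate;
    int ChirpRateInd;
    int FftLen;
    int GaussFit;
    int PulseFind;
} ChirpFftPair_t;

// Hypothetical helper: writes the pair count, then each pair field by field.
bool dump_chirp_fft_pairs(const char* path, const ChirpFftPair_t* pairs, int count) {
    assert(path && count >= 0);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(&count, sizeof(int), 1, f) == 1;
    for (int i = 0; ok && i < count; ++i) {
        const ChirpFftPair_t& p = pairs[i];
        ok = std::fwrite(&p.ChirpRate, sizeof(double), 1, f) == 1 &&
             std::fwrite(&p.ChirpRateInd, sizeof(int), 1, f) == 1 &&
             std::fwrite(&p.FftLen, sizeof(int), 1, f) == 1 &&
             std::fwrite(&p.GaussFit, sizeof(int), 1, f) == 1 &&
             std::fwrite(&p.PulseFind, sizeof(int), 1, f) == 1;
    }
    return (std::fclose(f) == 0) && ok;
}

// Hypothetical helper: reads the pairs back in the same field order.
// Returns false if the file ends before all announced pairs were read.
bool load_chirp_fft_pairs(const char* path, std::vector<ChirpFftPair_t>& pairs) {
    assert(path);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    int count = 0;
    bool ok = std::fread(&count, sizeof(int), 1, f) == 1 && count >= 0;
    pairs.clear();
    for (int i = 0; ok && i < count; ++i) {
        ChirpFftPair_t p;
        ok = std::fread(&p.ChirpRate, sizeof(double), 1, f) == 1 &&
             std::fread(&p.ChirpRateInd, sizeof(int), 1, f) == 1 &&
             std::fread(&p.FftLen, sizeof(int), 1, f) == 1 &&
             std::fread(&p.GaussFit, sizeof(int), 1, f) == 1 &&
             std::fread(&p.PulseFind, sizeof(int), 1, f) == 1;
        if (ok) pairs.push_back(p);
    }
    std::fclose(f);
    return ok;
}
```

Writing the whole struct array with a single fwrite would also work on many compilers, but it ties the file format to the struct's in-memory layout, which is why this sketch avoids it.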

Step 2 – Export the Results of PowerSpectrum

For the results, or output data, we will save the data that comes out of the GetPowerSpectrum function.  This function is called in every iteration of the loop over the Chirp/Fft Pairs, and because each iteration of the loop takes a good amount of time to execute, I will limit the execution to a few iterations of the loop.  Also, upon examination of the ChirpData function, one sees that no processing really occurs whenever the chirprateind variable is 0.  I examined the ChirpFftPairs array and noticed that this changes after Chirp/Fft pair number 15.  So I will take the outer for loop, which is located on line 433 of analyzeFuncs.cpp, and modify it to start at index (icfft) 16 and to stop before index 20.  Here is a quick snippet:

for (icfft = 16; icfft < 20; icfft++) {

The for loop performs the steps outlined above in each iteration, and after the call to GetPowerSpectrum on line 560 we see that the results of the operation are stored in a variable named "PowerSpectrum."

560                 GetPowerSpectrum( WorkData,
561                                   &PowerSpectrum[CurrentSub],
562                                   fftlen
563                                 );

Now the code itself actually calls GetPowerSpectrum many times per iteration, working on the data set fftlen data points at a time, but the results do not overlap each other and all end up in one really large array called "PowerSpectrum," which is declared as:

float *PowerSpectrum;

whose memory is allocated on line 211:

211         PowerSpectrum = (float*) calloc_a(NumDataPoints, sizeof(float), MEM_ALIGN);

We will save the values of this array along with the current iteration number (icfft) so that we can take the input data (DataIn) along with the proper Chirp/Fft pair, re-calculate the results ourselves in the external program, and compare them.

The results will be stored in a file named "binPowerSpectrum.bin" and will have the following format:

<icfft> (int)

<Number of Data Points> (int)

<Data Point 1> (float)

<Data Point N> (float)

The icfft numbers for our first run should be 16, 17, 18, and 19.  For further runs, I will experiment with moving the iterator over different values.  Multiple calls to dump the Power Spectrum will result in the results being appended to the end of any existing file, so any consumer of this file will have to parse through it to find the required icfft.
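Because records are appended one after another, a reader can walk the file header by header and skip the records it does not need. The sketch below shows one way to do that; append_power_spectrum and find_power_spectrum are hypothetical names, and native byte order is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: appends one record of the form
// <icfft> (int), <count> (int), count floats.
bool append_power_spectrum(const char* path, int icfft,
                           const float* power, int count) {
    assert(path && count >= 0);
    std::FILE* f = std::fopen(path, "ab");  // "ab" appends to any existing file
    if (!f) return false;
    std::size_t n = static_cast<std::size_t>(count);
    bool ok = std::fwrite(&icfft, sizeof(int), 1, f) == 1 &&
              std::fwrite(&count, sizeof(int), 1, f) == 1 &&
              (n == 0 || std::fwrite(power, sizeof(float), n, f) == n);
    return (std::fclose(f) == 0) && ok;
}

// Hypothetical helper: scans the records in order and loads the first one
// whose icfft matches. Returns false if no such record is found.
bool find_power_spectrum(const char* path, int wanted_icfft,
                         std::vector<float>& power) {
    assert(path);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool found = false;
    int header[2];  // header[0] = icfft, header[1] = number of data points
    while (std::fread(header, sizeof(int), 2, f) == 2 && header[1] >= 0) {
        std::size_t n = static_cast<std::size_t>(header[1]);
        if (header[0] == wanted_icfft) {
            power.resize(n);
            found = (n == 0) ||
                    std::fread(&power[0], sizeof(float), n, f) == n;
            break;
        }
        // Not the record we want: skip over its data points.
        if (std::fseek(f, static_cast<long>(n * sizeof(float)), SEEK_CUR) != 0) {
            break;
        }
    }
    std::fclose(f);
    return found;
}
```

If the same icfft is dumped in more than one run, this sketch returns the oldest record, so deleting the file between runs keeps things unambiguous.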

Step 3 – Implement an external application in C++

I implemented a separate application in C++ that reads the input binary files mentioned above and runs the ChirpData, Fft, and GetPowerSpectrum functions and compares the results obtained with the results in the file.  This application is simple and requires that you have Visual C++ 2005.
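For the comparison step, exact equality of floats is often too strict when the same math runs through a different compiler or code path, so a tolerance-based check is one reasonable option. The sketch below is an illustration only and not necessarily what the downloadable application does; count_mismatches and both tolerance parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: counts the positions where two result arrays differ
// by more than abs_tol + rel_tol * max(|expected|, |actual|).
// A NaN on either side is counted as a mismatch.
std::size_t count_mismatches(const float* expected, const float* actual,
                             std::size_t n, float rel_tol, float abs_tol) {
    assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
    std::size_t bad = 0;
    for (std::size_t i = 0; i < n; ++i) {
        float diff = std::fabs(expected[i] - actual[i]);
        float scale = std::max(std::fabs(expected[i]), std::fabs(actual[i]));
        if (!(diff <= abs_tol + rel_tol * scale)) {
            ++bad;
        }
    }
    return bad;
}
```

A result of zero means the recomputed power spectrum agrees with the dumped one within the chosen tolerances; which tolerances are appropriate is a judgment call that depends on fftlen and the magnitude of the data.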

Until I find a host, you can download the zip file for now:

seti_fpga
