
Are You Ready to Join the Party?

I made a few more commits to the master branch of the repository on GitHub. The repository is now in a state where, if you have the following:

- Visual Studio 2005
- LabVIEW 2014 SP1
- LabVIEW FPGA Module

then you can join in on all the fun! If you do not, well, you can still join in on all the fun by opening the Visual Studio 2005 solution located here: https://github.com/FpgaAtHome/seti_fpga/blob/master/seti_boinc/client/win_build/seti_boinc.sln

Then we will have to think of a strategy for how to handle the calling of various FPGA platforms. Right now the solution references the LabVIEW build tools in "C:\Program Files (x86)\National Instruments\LabVIEW 2014\cintools", and the Seti_LabView DLL that exists in this repository, located in: "$(SolutionDir)..\..\..\seti_labview\builds\seti_labview\Seti_LabView"

You can help by creating a new DLL. Call it whatever you want, and make the DLL implement the same interface – FpgaInterface – located here: https://github.com/FpgaAtHome/seti_fpga/blob/master/seti_boinc/client/win_build/FpgaInterface.h

We should modify the "PerformFft" function to return a boolean value of true or false to indicate whether or not the external DLL can handle an Fft of the requested length. The code will then load this DLL at runtime. There can be many implementations of this DLL, and the one to use can be specified somehow: either through a seti configuration option, or the mere presence of the DLL should be sufficient. For example, if seti_labview.dll is present, it is loaded; if seti_xilinx.dll is present, that DLL is loaded.

So, to summarize, the current to-do list is:

1. Create a DLL which handles the calculation of an Fft of one or more of the following lengths: 8, 16, 32… and all powers of 2 up to and including 131,072.
2. Modify seti_boinc to pick up and read this DLL.
3. Modify seti_boinc and add a configuration option to select which DLL to use.
4. Create a seti_labview DLL that conforms to this standard.
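To make the "return true or false" idea a bit more concrete, here is a minimal sketch of how seti_boinc could try several backends in turn. It is only an illustration under assumptions: FftBackend, ToyBackend, IsSupportedLength and DispatchFft are hypothetical names of mine, the real FpgaInterface.h may look quite different, and the toy backend does not compute an actual Fft. The sketch also skips the DLL loading itself, which on Windows would go through LoadLibrary and GetProcAddress.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Samples = std::vector<std::complex<float>>;

// Hypothetical stand-in for FpgaInterface; the real header may differ.
class FftBackend {
public:
    virtual ~FftBackend() = default;
    virtual std::string Name() const = 0;
    // Returns true if this backend handled an Fft of data.size() points,
    // false if it cannot handle that length (data is left untouched).
    virtual bool PerformFft(Samples& data) = 0;
};

// True for 8, 16, 32, ... up to and including 131,072.
inline bool IsSupportedLength(std::size_t n) {
    return n >= 8 && n <= 131072 && (n & (n - 1)) == 0;
}

// Toy backend that accepts power-of-two lengths up to maxLen.
// It only records that it was called; it does NOT transform the data.
class ToyBackend : public FftBackend {
public:
    ToyBackend(std::string name, std::size_t maxLen)
        : name_(std::move(name)), maxLen_(maxLen), calls_(0) {
        assert(IsSupportedLength(maxLen_));
    }
    std::string Name() const override { return name_; }
    bool PerformFft(Samples& data) override {
        if (!IsSupportedLength(data.size()) || data.size() > maxLen_) {
            return false;  // tell the caller to try something else
        }
        ++calls_;
        return true;
    }
    int calls() const { return calls_; }

private:
    std::string name_;
    std::size_t maxLen_;
    int calls_;
};

// Tries each backend in order and returns the name of the first one that
// accepts the request. An empty string means no backend could handle the
// length, so the caller should fall back to the normal CPU code path.
inline std::string DispatchFft(
    std::vector<std::unique_ptr<FftBackend>>& backends, Samples& data) {
    for (auto& backend : backends) {
        if (backend->PerformFft(data)) {
            return backend->Name();
        }
    }
    return std::string();
}
```

With this shape, the order of the backends plays the role of the configuration option: whichever DLL is listed first and says "yes" wins, and an unsupported length falls through to the existing code.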
Up after this:

- Modify the loop starting at line 587 and ending on line 648 in the file listed below to perform the entire Fft calculation before entering this loop. https://github.com/FpgaAtHome/seti_fpga/blob/master/seti_boinc/client/analyze
- Modify this same loop to calculate ChirpData outside the loop.
- Modify this same loop to calculate GetPowerSpectrum.
- Add more and more Fft lengths until we achieve a good-enough performance increase!

Thanks, and feel free to email me at johnstratoudakis at gmail.com. Of course, comments are the preferred method, so that everybody can see your questions.

Up next: A better description of the code we are looking to optimize, its current state and the roadmap....

Fpga@Home is Back!

So at the start of this year we decided to resurrect the Seti@Home project. I went through all of the original Seti@Home source code and all of our work from back in 2014, and was able to get things working again. You can take a look at my work and join in. All you need is a GitHub account.

The GitHub repository which I set up is located here: https://github.com/FpgaAtHome/seti_fpga

This repository has the following structure:

- boinc-old
- boinc_depends_win_vs2005
- seti_boinc
- LabViewTester.cpp
- seti_labview

I do want to clean things up a bit, but for now I am focusing on giving everybody a simple, easy-to-use code repository that you do not have to analyze for 10 hours before figuring out how to get things running.

Here is a description of what currently exists in the repository:

boinc-old
I found the “boinc-old” repository to work with the latest version of Seti@Home. It appeared to me that Seti@Home did not work out of the box with the latest version of boinc – which is version 2.0. You can find this repository linked directly here: http://boinc.berkeley.edu/gitweb/?p=boinc-old.git;a=summary

boinc_depends_win_vs2005
This git repository has to exist in the same parent directory as boinc-old in order for the boinc-old project to compile.

seti_boinc
This is the latest version of the seti_boinc source code that I found. I had to make quite a few changes to make it compile with Visual Studio 2005.

seti_labview
This is where all of the LabVIEW code is located. It is a work in progress that will include:
- a pure LabVIEW implementation of portions of the Seti@Home code
- a LabVIEW FPGA implementation

LabViewTester.cpp
A C++ project, currently written in Visual Studio 2010, that will call the seti_labview code to exercise and test the LabVIEW FPGA component. This is hopefully extendable to other FPGA platforms.
So my hopes are to organize the repository as follows:

- boinc-old
- boinc_depends_win_vs2005
- seti_boinc
- seti_labview
    - Exposes a DLL that can be called by seti_boinc and seti_tester.
    - Will do portions of the Seti@Home analysis in pure LabVIEW and LabVIEW FPGA.
- seti_tester
    - Does some tests on Ffts and other calculations that are done by the Seti@Home project. These tests should be used to validate and test code being written in seti_labview and seti_<new_platform>.
    - Exercises the DLL functions exposed by seti_labview, including the pure LabVIEW implementation and the LabVIEW FPGA implementation.

Note: I am currently working on a quick "Benchmarking" distraction, and when I stop I will update the master branch with a cleaner version and some...

Visit to NSF-CHREC at BYU

A few weeks ago I was in Salt Lake City, Utah visiting friends. As I was thinking of my trip, Brigham Young University (BYU) came to mind, and how they are involved in Reconfigurable Computing (btw, reconfigurable typically means FPGAs). I did some searches and my memory was correct! At NI Week 2011, I attended a talk by Professor Brent Nelson from BYU on Increasing FPGA Design Productivity.

Turns out that BYU is one of the National Science Foundation's (NSF) Centers for High-Performance Reconfigurable Computing (CHREC), which means they are really serious about accelerating algorithms with FPGAs! From the BYU CHREC page you can see their focus is: B1-12: Rapid FPGA Design Prototyping and Implementation, and B6-12: Reliable FPGA-Based Systems.

My meeting with Professor Nelson was very interesting and encouraging. Turns out they are users of Xilinx AutoESL (now called Vivado HLS). We discussed the SETI@Home algorithm and how it could be accelerated by FPGAs. Seems like our work here could be the basis for undergraduate senior Electrical and Computer Engineering projects. This work could also be the basis for benchmarking and comparing high-level synthesis tools by seeing how they fare in accelerating the same algorithm (i.e. SETI@Home); see July 2012 on this blog on the same topic. It could also have astronomy and defense applications where real-time analysis of data is required.

A next step for us should be to organize some of our work thus far into a paper that will make it easier for others to join this...

More Analysis of the CUDA Code

There are precisely 1,048,576 complex data points that come in with each SETI@Home Work Unit, where the real and imaginary components are each represented by a single-precision floating point number. These one million odd data points are stored in a variable named "DataIn", with the number of data points stored in a variable called "NumDataPoints." These variables are defined in the analyzeFuncs.cpp file as such:

    typedef float sah_complex[2];
    sah_complex *DataIn;
    int NumDataPoints;

In the CUDA version of SETI@Home, this dataset is passed to the graphics card and stored in its local memory. Afterwards, a set of 58,347 "Chirp/Fft Pairs" is generated and also stored in the memory of the graphics card. The total number of Chirp/Fft Pairs is stored in a variable called num_cfft. These variables are defined as such:

    typedef struct {
        double ChirpRate;
        int ChirpRateInd;
        int FftLen;
        int GaussFit;
        int PulseFind;
    } ChirpFftPair_t;
    int num_cfft;
    ChirpFftPair_t *ChirpFftPairs;

After these two data structures have been loaded by the seti_analyze function, which holds the core of the SETI@Home analysis, the seti_analyze function starts to iterate over each of these 58,347 Chirp/Fft Pairs and performs an analysis on the input data based on the parameters of each Chirp/Fft Pair. Here is a more detailed description of each iteration:

1. It chirps the original DataIn data using the ChirpRateInd and ChirpRate values from the current Chirp/Fft pair, along with another value that comes from the work unit called "subband sample rate."
2. After the original data set has been chirped, a discrete fast Fourier transform is performed on the chirped data using the Fastest Fourier Transform in the West (FFTW) library. The Fft length is also specified as a parameter in the Chirp/Fft Pair.
3. The results from the Fft are then processed by a function called "GetPowerSpectrum," which simply squares each of the components of the data passed in.
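To give a feel for how small steps 1 and 3 are, here is a rough sketch of both. Please treat it as an illustration and not as a copy of the SETI@Home code: ChirpSketch and PowerSpectrumSketch are hypothetical names, the chirp uses the textbook linear-chirp phase (half the chirp rate times time squared, in cycles) which I am assuming matches what ChirpData does, and the power spectrum is taken to mean the squared real part plus the squared imaginary part of each bin. Only the sah_complex typedef comes from analyzeFuncs.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

typedef float sah_complex[2];  // as declared in analyzeFuncs.cpp

// Step 1 (sketch): rotate each sample by the phase of a linear chirp.
// chirpRate is assumed to be in Hz per second and sampleRate (the
// "subband sample rate") in samples per second.
inline void ChirpSketch(const sah_complex* in, sah_complex* out,
                        std::size_t n, double chirpRate, double sampleRate) {
    assert(sampleRate > 0.0);
    const double kTwoPi = 6.283185307179586;
    for (std::size_t i = 0; i < n; ++i) {
        const double t = static_cast<double>(i) / sampleRate;
        double cycles = 0.5 * chirpRate * t * t;
        cycles -= std::floor(cycles);  // keep only the fractional cycle
        const double angle = kTwoPi * cycles;
        const float c = static_cast<float>(std::cos(angle));
        const float s = static_cast<float>(std::sin(angle));
        const float re = in[i][0];
        const float im = in[i][1];
        out[i][0] = re * c - im * s;   // complex multiply by (c + j*s)
        out[i][1] = re * s + im * c;
    }
}

// Step 3 (sketch): power of each frequency bin, re^2 + im^2.
inline void PowerSpectrumSketch(const sah_complex* freq, float* power,
                                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        power[i] = freq[i][0] * freq[i][0] + freq[i][1] * freq[i][1];
    }
}
```

Both loops touch each element exactly once and carry no state from one iteration to the next, which is the kind of structure that tends to map well onto an FPGA pipeline.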
After these 3 steps, the resulting data is processed, depending on certain values, with the following functions, which are not candidates for FPGA acceleration at this time:

- FindSpikes
- FindAutocorrelation
- analyze_pot

It should be noted that the GPU/CUDA code also optimizes parts of the analyze_pot function, along with some of the functions which analyze_pot calls.

Getting Started

So what does this mean for us? Well, what if we replicated the same thing in a separate project and then used that separate project to develop the FPGA code? We can re-implement those sections of code – namely the ChirpData, Fft, and GetPowerSpectrum functions – in a high-level language like LabVIEW and see what some advanced tools that automatically port LabVIEW code to an FPGA can do for us. For more information on LabVIEW and its graphical FPGA development environment, including its FPGA builder tool that automatically moves parts of an application into an FPGA, see: www.ni.com.

Step 1 – Export the Input Data

I modified the original analyzeFuncs.cpp file to "dump" all of the input data and the results at certain points of execution. The goal here was to dump enough data so that the separate program would have enough information to reproduce a portion of the calculation in an isolated environment, and to validate that the results are accurate by loading the results from a previous run and comparing the values. For the input data, we need two files: one for the Work Unit Data points, and another for the Chirp/Fft pairs. Work Unit Data points will be stored in a file called "binWorkUnitDataPoints.bin"...
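The dump-and-compare idea described above could look roughly like the sketch below. Everything about the layout is an assumption made for illustration (a 32-bit point count followed by the raw real/imaginary float pairs, in the byte order of the machine that wrote them); it is not necessarily the format used for the actual dump files. DumpDataPoints, LoadDataPoints and MatchesWithin are hypothetical helper names. The functions work on streams, so the same code can target a std::ofstream opened in binary mode or an in-memory buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

typedef float sah_complex[2];  // as declared in analyzeFuncs.cpp

// Writes an assumed layout: int32 count, then count (re, im) float pairs.
inline void DumpDataPoints(std::ostream& os, const sah_complex* data,
                           std::int32_t n) {
    assert(n >= 0);
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    os.write(reinterpret_cast<const char*>(data),
             static_cast<std::streamsize>(sizeof(sah_complex)) * n);
}

// Reads the layout written above and returns the samples as interleaved
// floats (re0, im0, re1, im1, ...). Returns an empty vector if the stream
// is too short or the count is negative.
inline std::vector<float> LoadDataPoints(std::istream& is) {
    std::int32_t n = 0;
    if (!is.read(reinterpret_cast<char*>(&n), sizeof n) || n < 0) {
        return std::vector<float>();
    }
    std::vector<float> flat(2 * static_cast<std::size_t>(n));
    if (n > 0 &&
        !is.read(reinterpret_cast<char*>(flat.data()),
                 static_cast<std::streamsize>(flat.size() * sizeof(float)))) {
        return std::vector<float>();
    }
    return flat;
}

// Compares a fresh result against values loaded from a previous run.
inline bool MatchesWithin(const std::vector<float>& a,
                          const std::vector<float>& b, float tolerance) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tolerance) return false;
    }
    return true;
}
```

A tolerance is used instead of an exact comparison because a re-implementation on another platform may round differently from the original floating point code.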

New Year and New Direction…

So it is New Year's Day 2013 in New York. I am home and not supposed to be “working” because it is New Year's Day. Regardless, I decide to work on the analysis of the FPGA@Home movement because it is not really work for me, it is fun. My brother Terry walks into the room and I start explaining what I have found to him. He then tells me, “Why not look at the SETI CUDA source code to see what functions they optimized?”

I download it from subversion and try to compile it… it fails… of course, but I realize that I don’t need to run it, I just need to analyze it. So I go to the seti_analyze function and I find a bunch of calls and determine which functions are offloaded to the GPU! Now all I have to do is determine if these functions are FPGA-acceleratable! Here are the functions:

- ChirpData
- fftw_execute_dfts
- GetPowerSpectrum
- summax <= custom function written by CUDA people, likely to be a summation of different functions in the regular code
- GaussFit
- find_triplets
- find_pulse

So, the CUDA code works by basically downloading the entire set of data points – which usually numbers 1,048,576 complex data points – to the graphics card, then running the functions listed above on it and sending the results back to the host computer to do the rest of the analysis. I think we are getting real...

More Detailed Profiling Results

Now that I have a list of functions to profile, I want to gather some more information to determine exactly how long each one takes to execute with a Release build of the executable, while making the smallest possible impact on the performance of the code being profiled. To do this, I need to do the following:

- Create a “StopWatch” class that I can use to:
    - Name a section of code for easy identification
    - Time that section of code using a high-resolution counter like QueryPerformanceCounter (available only on the Windows Platform)
    - Store some important variables
- Find all function calls from the previous post and use the StopWatch class to time those sections of code and to note the value of certain important variables that would indicate things like:
    - the length of any input array
    - the length of the FFT being calculated
    - other similar function-specific parameters
- Provide the smallest possible impact on the performance of the application (i.e. don’t insert a printf statement inside a for loop that will add a significant amount of delay per iteration).

So, a quick search on Google resulted in the following code in a class named “CStopWatch”. See http://cplus.about.com/od/howtodothingsi2/a/timing.htm for more information. I modified it of course to suit my needs, adding a constructor that allows you to “name” a section of code, as well as two integer parameters to help identify what exactly was occurring in the iteration.

My CStopWatch class writes everything to standard output from a separate thread, of course, to limit the impact it makes on the analysis. After I did a first run, the file stderr.txt was growing too fast, so I decided to modify my logging class to log only executions that are over 750 microseconds in total execution time. I then wrote a small awk script to parse the output to give me a Maximum and Average execution time for each function.
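For anyone who wants to try the same approach, here is a minimal sketch of the idea. It is not my CStopWatch class: it uses the portable std::chrono::steady_clock instead of QueryPerformanceCounter, keeps the records in memory instead of writing them from a separate thread, and does the Average/Maximum summary in C++ instead of awk. The names TimingRecord, TimingLog, ScopedStopWatch and Summarize are made up for this example.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// One timed section: a name plus two integers describing the iteration
// (for example an array length and an Fft length).
struct TimingRecord {
    std::string name;
    int param1;
    int param2;
    long long micros;
};

// Keeps only the records that took longer than the threshold.
class TimingLog {
public:
    explicit TimingLog(long long thresholdMicros)
        : thresholdMicros_(thresholdMicros) {}
    void Add(const TimingRecord& record) {
        if (record.micros > thresholdMicros_) records_.push_back(record);
    }
    const std::vector<TimingRecord>& records() const { return records_; }

private:
    long long thresholdMicros_;
    std::vector<TimingRecord> records_;
};

// Starts timing when constructed and reports to the log when destroyed,
// so a section is timed by declaring one of these at the top of a scope.
class ScopedStopWatch {
public:
    ScopedStopWatch(TimingLog& log, std::string name,
                    int param1 = 0, int param2 = 0)
        : log_(log), name_(std::move(name)), param1_(param1),
          param2_(param2), start_(std::chrono::steady_clock::now()) {}
    ScopedStopWatch(const ScopedStopWatch&) = delete;
    ScopedStopWatch& operator=(const ScopedStopWatch&) = delete;
    ~ScopedStopWatch() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed);
        log_.Add(TimingRecord{name_, param1_, param2_,
                              static_cast<long long>(us.count())});
    }

private:
    TimingLog& log_;
    std::string name_;
    int param1_;
    int param2_;
    std::chrono::steady_clock::time_point start_;
};

struct Summary {
    double averageMs;
    double maxMs;
    int count;
};

// Average and maximum time per section name, in milliseconds.
inline std::map<std::string, Summary> Summarize(
    const std::vector<TimingRecord>& records) {
    std::map<std::string, long long> totalMicros;
    std::map<std::string, Summary> out;
    for (const TimingRecord& r : records) {
        Summary& s = out[r.name];  // starts zeroed on first use
        totalMicros[r.name] += r.micros;
        ++s.count;
        const double ms = r.micros / 1000.0;
        if (ms > s.maxMs) s.maxMs = ms;
    }
    for (auto& entry : out) {
        entry.second.averageMs =
            totalMicros[entry.first] / 1000.0 / entry.second.count;
    }
    return out;
}
```

A section would be timed with something like `ScopedStopWatch sw(log, "ChirpData", NumDataPoints, FftLen);` at the top of the block of interest, with `log` created as `TimingLog log(750);`. Keep in mind that dropping the short executions means the averages only describe the runs that made it past the threshold.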
To sum things up, here are my results: (Note: I added functions that were not listed in the original profiler and I named sections of larger functions using the underscore character and section number)

Function Name            Average (milliseconds)   Maximum (milliseconds)
FindSpikes               184.917                  3,153.166
analyze_pot_section_1    155.373                  369.653
outerloop                109.149                  3,328.679
analyze_pot_section_2    59.354                   3,152.386
ChirpData                4.052                    14.12
find_pulse               4.032                    38.737
fftwf_execute_r2r        2.512                    9.079
f_GetChiSq               2.181                    3.688
GetFixedPoT              1.871                    7.457
GaussFit                 1.556                    22.356
FindAutoCorrelation      1.336                    1.336
f_GetPeak2               1.333                    2.727
f_GetTrueMean2           1.313                    2.463
fftwf_execute_dft        1.254                    6.409
GetPowerSpectrum         1.118                    1.711
f_GetPeak1               0.765                    0.765

- FindSpikes is 83 lines long and has 2 sets of nested for loops.
- analyze_pot_section_1 is 44 lines long, has one for loop, and that for loop calls “GetFixedPoT” and “GaussFit” many times.
- Outerloop is the big grand for loop that loops over each chirp/fft pair, so we can forget about putting that whole section of code into an FPGA…
- analyze_pot_section_2 also has a large for loop that is nested inside of yet another for loop, and the inside for loop makes many calls to find_triplets and find_pulse.

The rest of the functions have a much shorter average execution time, and there must be a good reason for this, probably because they are much simpler. I will go through the rest of the functions and list how many for loops and how many lines of code each one has.

ChirpData – 56 lines of code – a single for loop that does a simple trigonometric calculation on the input data. Looks like a good candidate for FPGA acceleration.

find_pulse – 253 lines of code –...

Profiling SETI@Home @ Home

So instead of just randomly analyzing functions in the SETI@Home source code, I decided that I wanted to be scientific about my analysis while I search for which functions take up the most execution time. It turns out that there is a nice freeware C++ Profiler available for the Windows Platform named “Very Sleepy”. This profiler works as long as you have the debugging symbols available for the executable that you wish to profile. So for us this means selecting the Release Build configuration for the seti_boinc project and making sure that the Debug Information Format is set to any of the “Program Database” options. Eerily, this option is already turned on for the Release build of the SETI@Home project. I wonder if this is another oversight of the SETI@Home project team, or if it doesn’t really matter… Well, at least the rest of the configuration does appear to have the optimization settings selected.

How to Profile SETI@Home

Step 1 – Rebuild with Program Database Option Turned on

To set your project up for profiling, right-click on the “seti_boinc” project from within Visual Studio and select “Configuration Properties->C/C++->General”, and then make sure that any of the “Program Database” options are selected. After you have done this, re-build your solution.

Step 2 – Start your SETI@Home client

After rebuilding the application, click “Debug->Start without Debugging.” Again, make sure that the dll “libfftw3f-3-1a_upx.dll” is located in the same directory as your setiathome executable.

Step 3 – Start Profiling

Load the Very Sleepy application profiler and select the SETI@Home executable, then select either “Profile All” or “Profile Selected” from the right side of the window. Wait about 1 minute, then click “OK”. Waiting too long will generate a really large file that could crash your system…

Here are the results from my profiling run of the setiathome code:

% Exclusive   Function Name          File Name
18.91%        fftwf_set_timelimit    <not called directly by seti code>
8.37%         GetFixedPoT            analyzePoT.cpp
6.75%         analyze_pot            analyzePoT.cpp
5.50%         f_GetPeak              gaussfit.cpp
5.36%         find_pulse             pulsefind.cpp
4.58%         GaussFit               gaussfit.cpp
4.48%         FindSpikes             spike.cpp
4.46%         f_GetTrueMean          gaussfit.cpp
4.11%         f_GetChiSq             gaussfit.cpp
3.03%         lcgf                   lcgamm.cpp

So… it looks like libfftw, aka the Fastest Fourier Transform in the West, is taking up most of the CPU time… I wonder how much time the other FFT libraries would be using… After that, GetFixedPoT and analyze_pot take up the most time. This gives us plenty of work to do. If you wish to join in the effort, take one of the functions listed above and try to figure out which has the most repetitive tasks that could hopefully be optimized by offloading to an FPGA. Remember, any operation that is sent to an FPGA must be serialized into a long data...

Get Involved! Start Analyzing the SETI@Home Source Code…...

So, do you want to join the FPGA@Home effort and don’t know where to begin? If you like analyzing source code, this post is for you. Our next step is to find spots that can be accelerated by an FPGA. Below are instructions for how to download and run the SETI@Home source code. All you need is a Windows machine with Visual Studio 2005.

Note: You must have Visual Studio 2005, as there are certain deprecated libraries that are no longer available in later versions of Visual Studio.

How to Run SETI@Home from its source code on Windows XP SP3

Step 1 – Run Windows Update and Install Visual Studio 2005

Please run Windows Update just to be safe, and make sure that you have installed Visual Studio 2005 onto your machine. I used the Standard Edition for testing these instructions. Any edition should work, although I am not sure if the free edition – the Express Edition – will work or not. If you have the Express Edition and try this out, please let me know either way.

Note: In case you were wondering where you can get the Visual C++ 2005 Express Edition, you can download it from softpedia.com, as Microsoft has removed it from their website in an effort to make you use the latest version of their products… http://www.softpedia.com/get/Programming/Other-Programming-Files/Microsoft-Visual-C-Toolkit.shtml

Step 2 – Install TortoiseSVN (or your favorite subversion client)

Download and run the installer from here: http://tortoisesvn.net (click Downloads and select “Download Now”)

Step 3 – Install git

The people over at BOINC have set up some pretty good instructions for setting up git on your machine. You can access these instructions here: http://boinc.berkeley.edu/trac/wiki/SourceCodeGit/Windows#

Step 4 – Checkout SETI@Home source code using Subversion

It is easiest if you create a directory in the root of your partition called “Boinc”, so we will (1) make a directory called C:\Boinc, and (2) in that directory right-click, select “Checkout Project” and use the following options:

    URL of repository: https://setisvn.ssl.berkeley.edu/svn/seti_boinc
    Checkout directory: c:\Boinc\seti_boinc

or you can issue the command line version:

    cd c:\boinc
    svn co https://setisvn.ssl.berkeley.edu/svn/seti_boinc

Step 5 – Clone BOINC git repository into C:\Boinc

You can use Git gui, or you can use the command line version. Personally I would just use the Git Bash command line and get it over with! Do the following from a Git Bash command prompt window (Start->Run->All Programs->Git->Git Bash):

    cd /c/Boinc
    git clone git://boinc.berkeley.edu/boinc.git
    git clone git://boinc.berkeley.edu/boinc_depends_win_vs2005.git

This will create 2 directories under C:\Boinc, one named boinc, and the other named boinc_depends_win_vs2005. The Visual C++ projects located in the C:\boinc\seti_boinc directory reference the projects and files that are located in these newly created boinc directories.

Note: boinc_depends_win_vs2005 is a pretty large download (over 750MB), so go take a break, get some organic tea, you have some free time on your hands.

Step 6 – Fix the Source Code

That’s right… the code doesn’t work out of the box for building with Visual Studio 2005, so I have recorded all of the changes that I made for this to work. Perhaps somebody from the SETI@Home project team can work with me to incorporate these changes into the trunk of their subversion repository. You can use the following patch file to apply the appropriate changes to the seti_boinc subversion directory in order for it to build under Visual Studio 2005.

Download this file: http://www.fpgaathome.org/downloads/Patch_SETIatHome.patch

Right-click on the folder seti_boinc and select “Apply-Patch”, then click “Patch all items” and click exit. Or you can...