Ready for partial reconfiguration!
Many articles already exist about dynamic partial reconfiguration, but they often rely on pre-generated bitstreams (built with Xilinx's proprietary design suite). Thanks to the amazing work of the guys at fpgatools, who reverse-engineered most of the internal configuration bits of the Xilinx Spartan-6 LX9, it is now possible to generate or modify the FPGA configuration without any Xilinx tools, e.g., directly on a microcontroller.
How does it work?
- The FLR client generates requests over a serial line and controls the FLR server.
- The FLR server processes the requests in its internal working buffer. It can communicate with the FPGA over JTAG and read/write its internal configuration memory.
- The FPGA stays active the whole time and changes its behavior according to the configuration being updated on the fly.
The advantage of this architecture, in which a chunk of the bitstream is read back, patched and written back to the target, is that not all bits of the internal configuration have to be understood. The FLR server only patches the known bits (e.g., a LUT equation), while the other (possibly unknown) bits remain untouched.
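The read-patch-write principle can be illustrated with a minimal Python sketch. It assumes a frame is held as a list of 16-bit words, as on the Spartan-6, and describes a patch with hypothetical (word index, mask, value) triples. Only the bits under the mask change, so bits whose meaning is unknown are written back exactly as they were read.

```python
from typing import Iterable, List, Tuple

# Hypothetical patch description: (word_index, mask, value).
# Only the bits set in `mask` are replaced by the matching bits of `value`.
Patch = Tuple[int, int, int]


def patch_frame(words: List[int], patches: Iterable[Patch]) -> List[int]:
    """Return a copy of `words` in which only the masked bits are modified."""
    out = list(words)  # leave the read-back copy itself untouched
    for index, mask, value in patches:
        mask &= 0xFFFF  # configuration words are 16 bits wide (Spartan-6)
        # Clear the masked bits, then insert the new ones; keep all others.
        out[index] = (out[index] & ~mask & 0xFFFF) | (value & mask)
    return out


if __name__ == "__main__":
    # Frame contents read back from the device (arbitrary values here).
    read_back = [0x1234, 0xABCD, 0x0F0F]
    # Overwrite only the low byte of word 1; every other bit stays as read.
    patched = patch_frame(read_back, [(1, 0x00FF, 0x0055)])
    print([hex(w) for w in patched])
```

Working on a copy keeps the original read-back buffer available, for example to compare or retry before writing back.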
- [A] two serial-to-USB interfaces: one for sending FLR protocol requests, one for the debug output
- [B] ARM debug interface
- [C] ARM STM32F103 microcontroller, running at 48MHz
- [D] Xilinx programming interface for FPGA (not used here)
- [E] Xilinx Spartan-6 LX9
- [F] Oscilloscope
Each step below is followed by the corresponding FLR request words.
1. Read a chunk of the Spartan-6 configuration over JTAG into a buffer of the microcontroller: 0x0002030f15000200 (<- read 2 frames starting from row:3 maj:15 min:21)
2. Change the LUT equation in the microcontroller's copy of the FPGA configuration: 0x0121030f02050000 (<- set LUT equation at row:3 maj:15 idx:2 lut_b_mm) 0xffffffffffffcccc (<- new 64-bit equation)
3. Write the modified configuration back to the FPGA: 0x0003000000020000 (<- write 2 frames starting from offset:0)
Note: to make the reconfiguration sequence faster, the reconfiguration server on the microcontroller is fed with hard-coded requests, which avoids the serial-line and PC-client delays. In this configuration, the reconfiguration is completely autonomous.
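A hard-coded request sequence like the one in the note can be sketched as a table of 64-bit words. In this Python sketch, the words are copied from the three reconfiguration steps. Serializing each word as 8 big-endian bytes is an assumption made for illustration and is not taken from the FLR specification (http://vjordan.info/flr), which defines the actual framing. The annotated values are visible directly in the hex. In the read request, bytes 2 to 4 are 0x03, 0x0f and 0x15, matching row 3, major 15 and minor 21.

```python
import struct
from typing import List

# Request words copied from the reconfiguration steps (read, patch, write).
READ_2_FRAMES = 0x0002030F15000200   # read 2 frames from row:3 maj:15 min:21
SET_LUT_B_MM = 0x0121030F02050000    # set LUT equation at row:3 maj:15 idx:2 lut_b_mm
NEW_EQUATION = 0xFFFFFFFFFFFFCCCC    # new 64-bit LUT equation
WRITE_2_FRAMES = 0x0003000000020000  # write 2 frames from offset:0

HARD_CODED_SEQUENCE: List[int] = [
    READ_2_FRAMES,
    SET_LUT_B_MM,
    NEW_EQUATION,
    WRITE_2_FRAMES,
]


def serialize(words: List[int]) -> bytes:
    """Pack each 64-bit request word as 8 bytes, most significant byte first.

    The big-endian byte order is an assumption of this sketch.
    """
    return b"".join(struct.pack(">Q", w) for w in words)


if __name__ == "__main__":
    payload = serialize(HARD_CODED_SEQUENCE)
    print(len(payload), payload[:8].hex())
```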
The smallest addressable piece of configuration is the frame (65 16-bit words). Two frames have to be changed for a LUT equation. Partial reads and writes also require a dummy frame containing only 0xFFFF words (according to [UG380]), hence the minimal amount of data transferred for a read or a write is:
(2 + 1) frames * 65 words * 2 bytes = 390 bytes
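The same arithmetic can be written as a small Python helper. It uses the figures stated above: 65 16-bit words per frame and one extra dummy frame per partial read or write. The helper name `transfer_bytes` is hypothetical.

```python
WORDS_PER_FRAME = 65  # Spartan-6 frame size, in 16-bit words
BYTES_PER_WORD = 2    # 16-bit configuration words
DUMMY_FRAMES = 1      # 0xFFFF pad frame required for partial read/write


def transfer_bytes(frames: int) -> int:
    """Bytes moved by one partial read or write of `frames` frames."""
    return (frames + DUMMY_FRAMES) * WORDS_PER_FRAME * BYTES_PER_WORD


if __name__ == "__main__":
    print(transfer_bytes(2))  # LUT change: 2 frames + 1 dummy frame -> 390
```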
In this test, a simple AND gate is initially loaded into the FPGA from the serial flash memory. The sequence transforms the AND gate into an OR gate. In the oscilloscope screenshot below, the green signal is connected to a GPIO pin of the microcontroller, which is set high before the reconfiguration starts and set low at the end. The blue signal is the output of the initial AND gate, which becomes an OR gate. During the test, only one of the gate inputs is held high, so the output goes from low (AND) to high (OR) once the new equation takes effect.
The complete dynamic partial reconfiguration sequence took less than 30ms (top signal). The bottom signal shows that the new LUT equation becomes valid before the end of the reconfiguration sequence.
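The Python sketch below shows how a gate turns into a LUT equation by building a 64-bit truth table for a 6-input LUT. It assumes the usual Xilinx LUT6 INIT convention, in which bit i of the table is the output for the input combination whose binary value is i, with A1 as the least significant input. Wiring the gate to inputs A1 and A2 is a hypothetical choice. The bit order inside the configuration frames depends on how fpgatools maps LUT bits and may differ, so this sketch is not expected to reproduce the word used in step 2.

```python
from typing import Callable

# Gate function taking (A1, A2, A3, A4, A5, A6), each 0 or 1.
LutFunction = Callable[[int, int, int, int, int, int], int]


def lut6_init(func: LutFunction) -> int:
    """Build a 64-bit truth table where bit i is func() applied to the bits of i.

    Assumes the Xilinx LUT6 INIT ordering with A1 as the least significant
    input; the raw bitstream ordering may be different.
    """
    init = 0
    for i in range(64):
        inputs = [(i >> bit) & 1 for bit in range(6)]  # A1..A6
        if func(*inputs):
            init |= 1 << i
    return init


def and_gate(a1: int, a2: int, a3: int, a4: int, a5: int, a6: int) -> int:
    return a1 & a2  # two-input AND on A1/A2, other inputs ignored


def or_gate(a1: int, a2: int, a3: int, a4: int, a5: int, a6: int) -> int:
    return a1 | a2  # two-input OR on A1/A2, other inputs ignored


if __name__ == "__main__":
    print(hex(lut6_init(and_gate)))  # 0x8888888888888888
    print(hex(lut6_init(or_gate)))   # 0xeeeeeeeeeeeeeeee
```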
Can it be faster?
Bit-banging on the microcontroller side is currently very slow, because the GPIOs are driven directly by the CPU, which only runs at 48MHz (instead of the possible 72MHz, because of a bug on my board). It could be about 1.5 times faster at the maximum clock speed (72/48), or roughly three times faster using an STM32F4 at 160MHz. Using the FSMC and the 16-bit SelectMAP FPGA interface, reconfiguration could be significantly faster still (up to 16 bits at 50MHz according to the datasheet).
At the maximum theoretical speed, a minimal reconfiguration sequence would take:
((2 + 1) * 65) * 2 = 390 words (read + write)
390 * (1 / 50,000,000) s = 0.0000078 s = 7.8 µs (i.e., about 128 kHz)
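This estimate can be reproduced with the Python sketch below. Like the calculation above, it assumes one 16-bit word is transferred per clock cycle at 50MHz for the 390-word read plus write. Command overhead and the microcontroller's patching time are ignored, so the result is a lower bound. The function name is hypothetical.

```python
FRAMES_CHANGED = 2    # frames touched by one LUT equation
DUMMY_FRAMES = 1      # extra 0xFFFF frame for partial read/write
WORDS_PER_FRAME = 65  # Spartan-6 frame, in 16-bit words
TRANSFERS = 2         # one read-back plus one write


def min_reconfig_time(clock_hz: float) -> float:
    """Lower-bound time in seconds, assuming one 16-bit word per clock cycle."""
    words = (FRAMES_CHANGED + DUMMY_FRAMES) * WORDS_PER_FRAME * TRANSFERS
    return words / clock_hz


if __name__ == "__main__":
    t = min_reconfig_time(50_000_000)
    print(f"{t * 1e6:.1f} us, {1 / t / 1e3:.0f} kHz")
```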
Update 2015.05.19: What about the Virtex-6 family?
According to the Virtex-6 configuration user guide [UG360] (v3.8, p.117), the Frame Address Register (FAR) allows the configuration to be addressed down to row granularity (see Row Address in the table). Therefore, it is no longer required to write a frame spanning a whole minor (65 16-bit words for the Spartan-6, 81 32-bit words for the Virtex-6), as on the Spartan-6, which uses the FAR_MAJ and FAR_MIN registers.
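Here is a Python sketch of packing and unpacking a Virtex-6 FAR value. The field positions are taken from the FAR description in UG360 as recalled here: block type [23:21], top/bottom [20], row [19:15], column [14:7] and minor [6:0]. They should be checked against the guide before being relied on. The row, column and minor values in the example reuse the Spartan-6 numbers from earlier purely for illustration.

```python
from typing import Dict, Tuple

# Assumed Virtex-6 FAR bit fields (verify against UG360): name -> (shift, width)
FIELDS: Dict[str, Tuple[int, int]] = {
    "block_type": (21, 3),
    "top_bottom": (20, 1),
    "row": (15, 5),
    "column": (7, 8),
    "minor": (0, 7),
}


def encode_far(**values: int) -> int:
    """Pack named fields into a FAR value, rejecting values that do not fit."""
    far = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        far |= value << shift
    return far


def decode_far(far: int) -> Dict[str, int]:
    """Split a FAR value back into its named fields."""
    return {name: (far >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


if __name__ == "__main__":
    far = encode_far(block_type=0, top_bottom=0, row=3, column=15, minor=21)
    print(hex(far), decode_far(far))
```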
- Xilinx, ISE Design Suite, http://www.xilinx.com/products/design-tools/ise-design-suite/
- Wolfgang Spraul, FpgaTools GitHub repository, https://github.com/Wolfgang-Spraul/fpgatools
- Vincent Jordan, FPGA Live Reconfiguration protocol specifications, http://vjordan.info/flr
- [UG380] Xilinx, Spartan-6 FPGA Configuration User Guide, http://www.xilinx.com/support/documentation/user_guides/ug380.pdf
- [UG360] Xilinx, Virtex-6 FPGA Configuration User Guide, http://www.xilinx.com/support/documentation/user_guides/ug360.pdf