> The XSOC RISC is also not very small.

That stings! Oh well -- de gustibus non disputandum est -- so what's new in small processor cores? Anyone care to fill in the table below with more recent entries?

Small (IMHO):
PicoBlaze: 76-96 slices (approx. double to get LUTs)
gr0000 (simple Virtex-optimized 16-bit RISC): < 200 LUTs + 1 BRAM
gr1000 (unpublished Virtex-optimized pipelined 16-bit RISC): < 200 LUTs + 1 BRAM
xr16: 260 logic cells (258 4-LUTs, 52 3-LUTs, 165 flip-flops, 112 TBUFs)

Middlin':
Nios-II/e: ~550 LEs (Cyclone)

Not very small:
MicroBlaze: ~900 LUTs
Nios (16-bit): ~1100 LEs
Nios-II/f: 1800 LEs (Cyclone)

(Disclaimer: all data may be obsolete/wrong.)

You can of course implement a 16- or 32-bit datapath on an 8-bit datapath (taking 2 or 4 cycles per operation); but that will not significantly improve the "performance divided by area" number that is my preferred figure of merit.

Cheers,
Jan Gray
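
A rough sketch of the "performance divided by area" figure of merit mentioned above, in Python. The area model (a fixed control cost plus a per-bit datapath cost), the 100 MHz clock, and every LUT number here are illustrative assumptions, not measurements of any core in the table.

```python
def perf_per_area(clock_mhz, cycles_per_op, area_luts):
    """Operations per microsecond per LUT: (clock / cycles per op) / area."""
    return (clock_mhz / cycles_per_op) / area_luts


def area_luts(datapath_bits, control_luts=100, luts_per_bit=6):
    """Hypothetical area model: fixed control logic plus a per-bit datapath cost."""
    return control_luts + luts_per_bit * datapath_bits


# Assumed: the same 100 MHz clock for both designs; a 32-bit operation takes
# 1 cycle on a 32-bit datapath and 4 cycles when serialized over 8 bits.
wide = perf_per_area(100, 1, area_luts(32))    # area = 100 + 192 = 292 LUTs
narrow = perf_per_area(100, 4, area_luts(8))   # area = 100 + 48 = 148 LUTs

print(f"32-bit datapath: {wide:.3f} ops/us/LUT")
print(f" 8-bit datapath: {narrow:.3f} ops/us/LUT")
```

Under these assumptions the narrow datapath roughly halves the area but quarters the throughput, so the figure of merit does not improve. A real comparison would need measured clock rates and LUT counts for each core.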

small CPUs
Started by ●October 22, 2004
Reply by ●October 22, 2004
At 12:30 AM 10/22/2004, you wrote:
> > The XSOC RISC is also not very small.
>
>That stings! Oh well -- de gustibus non disputandum est -- so what's new in
>small processor cores? Anyone care to fill in the table below with more
>recent entries?

I must apologize. My memory was incorrect. I did a brief survey a while back and did not remember any CPU cores that were less than about 1000 LUTs for CPUs that were completely functional. I think I did not include yours in my mental list of small CPUs because, as opposed to GPL, the license does not allow commercial use, which is what I was looking for.

Now that I have looked at it a bit harder, I see that the big LUT consumer, the wide mux, is implemented in TBUFs. That won't fly in the newer Xilinx parts or the Altera parts. Implemented in LUTs instead, the mux would use about 4 LUTs per bit (7 inputs), or 64 more LUTs, which still leaves a very small CPU at about 375 LUTs.

I have not looked hard at the others. But I did look at your notes with the GR0040. You estimate a 32 bit GR0050 at 330 LUTs; even after adding 128 more for the wide mux, that is only about 460 LUTs, still very good for a 32 bit CPU. How do you expect to extend the immediate operands to a full 32 bits? Multiple imm instructions?

>Small (IMHO):
>PicoBlaze: 76-96 slices (approx. double to get LUTs)
>gr0000 (simple Virtex-optimized 16-bit RISC): < 200 LUTs + 1 BRAM
>gr1000 (unpublished Virtex-optimized pipelined 16-bit RISC): < 200 LUTs + 1
>BRAM
>xr16: 260 logic cells (258 4-LUTs, 52 3-LUTs, 165 flip-flops, 112 TBUFs)
>
>Middlin':
>Nios-II/e: ~550 LEs (Cyclone)

32 bits though, right?

>Not very small:
>MicroBlaze: ~900 LUTs
>Nios (16-bit): ~1100 LEs
>Nios-II/f: 1800 LEs (Cyclone)

I believe the MicroBlaze and Nios-II/f are 32 bits too.

>(Disclaimer: all data may be obsolete/wrong.)
>
>You can of course implement a 16- or 32-bit datapath on an 8-bit datapath
>(taking 2 or 4 cycles per operation); but that will not significantly
>improve the "performance divided by area" number that is my preferred figure
>of merit.

I would think a metric should also take into account the efficiency of the instruction set. Using 16 bit instructions can use more program memory than 8 bit instructions. But I guess that would be very hard to measure other than using benchmarks.

Likewise, the instruction set affects processor speed in ways other than just clock speed. But that takes us into the nebulous world of benchmarking as well.

Rick Collins

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
http://www.arius.com

4 King Ave                  301-682-7772 Voice
Frederick, MD 21701-3110    301-682-7666 FAX
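
The LUT estimates in the post above can be reproduced with a few lines of Python. This is only a sketch: the 4-LUTs-per-bit mux cost and the 330-LUT GR0050 figure come from the post, while taking the xr16 base as 258 + 52 = 310 LUTs (the 4-LUT and 3-LUT counts in the quoted table) is an assumption about how the "about 375" figure was reached. The helper name `with_lut_mux` is made up for illustration.

```python
MUX_LUTS_PER_BIT = 4  # from the post: about 4 LUTs per bit for a 7-input mux


def with_lut_mux(base_luts, width_bits, luts_per_bit=MUX_LUTS_PER_BIT):
    """Total LUTs once a TBUF-based wide mux is rebuilt from LUTs."""
    return base_luts + luts_per_bit * width_bits


# Assumption: xr16 base = 258 4-LUTs + 52 3-LUTs from the quoted table.
xr16_total = with_lut_mux(258 + 52, 16)   # 310 + 64
# 330 LUTs is the 32-bit GR0050 estimate cited in the post.
gr0050_total = with_lut_mux(330, 32)      # 330 + 128

print(xr16_total, gr0050_total)  # 374 458
```

These totals line up with the rounded figures in the post (about 375 and about 460 LUTs).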
Reply by ●October 22, 2004
Arius - Rick Collins wrote:

> I must apologize. My memory was incorrect. I did a brief survey a while
> back and did not remember any CPU cores that were less than about 1000 LUTs
> for CPUs that were completely functional. I think I did not include yours
> in my mental list of small CPUs because, as opposed to GPL, the license
> does not allow commercial use, which is what I was looking for.

But can one use LUTs as a reasonable benchmark, since not all LUTs are equal? Also, features like fast carry and dual-port RAM on Xilinx parts tend to cloud just what you can do.

>>You can of course implement a 16- or 32-bit datapath on an 8-bit datapath
>>(taking 2 or 4 cycles per operation); but that will not significantly
>>improve the "performance divided by area" number that is my preferred figure
>>of merit.

Good benchmark, but the 8-bit toy computer that fits in a 32-cell CPLD cannot be easily beaten.
http://www.tu-harburg.de/~setb0209/cpu/mcpu.html

> I would think a metric should also take into account the efficiency of the
> instruction set. Using 16 bit instructions can use more program memory
> than 8 bit instructions. But I guess that would be very hard to measure
> other than using benchmarks.
>
> Likewise, the instruction set affects processor speed in ways other than
> just clock speed. But that takes us into the nebulous world of
> benchmarking as well.

I think the real factor is the random logic needed. FPGAs have problems with this no matter what brand of FPGA you use. Many years ago, BYTE magazine had a benchmark of sorts that took into account the instruction set of the computer rather than raw speed. This, I think, is what you are looking for.
Reply by ●October 22, 2004
At 02:32 AM 10/22/2004, you wrote:
> But can one use LUTs as a reasonable benchmark, since not all LUTs
>are equal? Also, features like fast carry and dual-port RAM on Xilinx parts
>tend to cloud just what you can do.

No one is trying to compare different FPGAs, just the CPU designs. For a given FPGA, LUTs are a useful measure of the size of a design. In fact, they are the *only* measure, since LUTs are typically the limiting resource in the chip.

Rick Collins

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
http://www.arius.com

4 King Ave                  301-682-7772 Voice
Frederick, MD 21701-3110    301-682-7666 FAX
