Dear all,

Does anyone know how to design a 16x16 multiplier that completes in one clock cycle? I know that some FPGAs provide an embedded 16x16 multiplier. Does it operate in one clock cycle? Where can I get the layout, schematic, or Verilog design of such a multiplier?

Thanks,
Reala

16 X 16 multiplier
Started by ●July 21, 2002
Reply by ●July 24, 2002
Hi,

Implementing a 16x16 multiplier in one clock cycle is easy. The real question is: at what clock frequency?

Please clarify.

Bye,
Hadi

Reply by ●July 24, 2002
Yes. Unless the frequency is specified, we cannot say whether it is possible to implement or not. Please tell us the frequency of operation.

Regards,
Sridhar Nandula
Design Engineer
Reply by ●July 24, 2002
Hadi, Sridhar,

Thank you for your replies. The clock frequency is not too high: it should be 20 MHz to 40 MHz.

My boss tells me that if we implement a 16x16 multiplier directly, it will be very big. So I want to know some techniques for designing a single-cycle 16x16 multiplier that reduce its size.

Moreover, suppose I implement the design in an FPGA and then move it to an ASIC. Since an FPGA has specific blocks (e.g. logic blocks, lookup tables), how can I carry these over to my ASIC? Are there tools to do this? Should I buy a library from the FPGA vendor? Or can the FPGA tools generate some file format (e.g. a netlist) that ASIC layout tools can read?

Thank you for your help.

With best regards,
Reala

Reply by ●July 25, 2002
Hi Reala,

Implementing a multiplier in a single clock means you have to use combinational logic whose delay fits within one clock period (1/frequency). It will take a fair amount of logic. Any good digital design book gives an idea about multipliers (I read about them in Smith's ASIC book), and there is a lot of information on the net about combinational multipliers. If you go for a sequential multiplier, it will take less logic than a combinational one and still give a good clock frequency, at the cost of several cycles per product. You can even try pipelining a combinational multiplier yourself.

P.S. You also have a dedicated multiplier in Virtex-II, which Leonardo Spectrum will use if you just write * in your design. Check it out too; it is faster and works fine.

Best of luck :)

Regards,
Bala.C
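To make the sequential option concrete, here is a minimal Python sketch of a shift-and-add multiplier. It is a behavioural model, not synthesizable code, and it is only one common way to build a sequential multiplier, not necessarily the one Bala had in mind. The function name `shift_add_multiply` and the `width` parameter are made up for illustration. Each loop iteration stands for one clock cycle: a single adder is reused 16 times, which is why the logic is small and why the result takes 16 cycles instead of one.

```python
def shift_add_multiply(a, b, width=16):
    """Unsigned shift-and-add multiply, one partial product per clock cycle.

    Behavioural model of a sequential multiplier built from a single adder
    and two shift registers. Returns (product, cycles_used).
    """
    mask = (1 << width) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in `width` unsigned bits")

    product = 0        # models a 2*width-bit result register
    multiplicand = a   # shifted left by one bit each cycle
    multiplier = b     # shifted right by one bit each cycle
    cycles = 0
    for _ in range(width):
        # Add the shifted multiplicand only when the current multiplier bit is 1.
        if multiplier & 1:
            product += multiplicand
        multiplicand <<= 1
        multiplier >>= 1
        cycles += 1
    return product, cycles


product, cycles = shift_add_multiply(0xFFFF, 0xFFFF)
print(product == 0xFFFF * 0xFFFF, cycles)  # prints: True 16
```

A single-cycle combinational multiplier effectively unrolls this loop into 16 rows of adders, which is where the extra area comes from.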
Reply by ●July 25, 2002
Hi Bala,

Thank you for your help. For more details, we would like to design a multiplier for a DSP chip. One feature of a DSP chip is single-cycle multiplication. If I design the multiplier directly, I am afraid it will be too big. So I want to know how to design a multiplier for a DSP chip.

Yes, I know about the dedicated multiplier in Virtex-II. But if I want to move the design from FPGA to ASIC, how do I translate the dedicated multiplier to our ASIC? Can I get the design of this multiplier, or pay to buy it from Xilinx?

Thanks,
Reala

Reply by ●July 25, 2002
> For more details, we would like to design a multiplier for a DSP chip.

Do you really need single-cycle latency, or is it acceptable to be able to start a new multiply every cycle, but get the result of each one several cycles after it was started? For many DSP tasks, such as digital filters, a multiple-cycle latency is acceptable.

Anyhow, pipelining the multiplier may not reduce its size, but it will certainly let you get a faster cycle time.
Reply by ●July 25, 2002
Dear Eric,

Actually, my boss requests single cycle. Moreover, the MAC (multiply-adder) module is another difficult task for me. If the multiply is not single cycle, the MAC will be very slow. Am I correct?

As the specification of our DSP chip is not finalized, maybe the most important thing for me is to study how to develop a DSP chip. If you have any sites to recommend for learning DSP development, please tell me ^_^.

Thanks a lot.

With best regards,
Reala

Reply by ●July 25, 2002
> Actually, my boss requests single cycle.

OK, but a pipelined multiplier is still considered single cycle, in that every cycle you can start a new multiply.

> Moreover, the MAC (multiply-adder) module is another difficult task for me.
> If the multiply is not single cycle, the MAC will be very slow. Am I correct?

Pipelined multiply works just fine for MAC.

Let's say you're going to do a series of 50 MACs, and you have a pipelined multiplier with a three-cycle latency. Your operands are A0..A49 and B0..B49. Further, let's assume that your accumulator takes one cycle.

           multiplier        multiplier   accumulator
           inputs            output       output
           (multiplicands)   (product)    (sum of products)
           ---------------   ----------   ----------------------------
Cycle 0:   input A0, B0      don't care   don't care
Cycle 1:   input A1, B1      don't care   don't care
Cycle 2:   input A2, B2      don't care   don't care
Cycle 3:   input A3, B3      A0*B0        force zero
Cycle 4:   input A4, B4      A1*B1        A0*B0
Cycle 5:   input A5, B5      A2*B2        A0*B0+A1*B1
Cycle 6:   input A6, B6      A3*B3        Sum for i = 0 to 2 of Ai*Bi
Cycle 7:   input A7, B7      A4*B4        Sum for i = 0 to 3 of Ai*Bi
....
Cycle 49:  input A49, B49    A46*B46      Sum for i = 0 to 45 of Ai*Bi
Cycle 50:  don't care        A47*B47      Sum for i = 0 to 46 of Ai*Bi
Cycle 51:  don't care        A48*B48      Sum for i = 0 to 47 of Ai*Bi
Cycle 52:  don't care        A49*B49      Sum for i = 0 to 48 of Ai*Bi
Cycle 53:  don't care        don't care   Sum for i = 0 to 49 of Ai*Bi

As you can see, you've completed 50 MACs in 54 cycles, even though the total time to compute one MAC is 4 cycles.

As I said before, a pipelined parallel multiplier will generally take as much space as a flow-through parallel multiplier. But a pipelined parallel multiplier with a latency of three can typically be cycled almost three times faster than the flow-through multiplier, so you get nearly three times the total data throughput.
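The schedule above can be checked with a small cycle-by-cycle model. This is only a sketch in Python, not hardware code: the function name `pipelined_mac` and its parameters are invented for illustration, and for simplicity the model computes each product when the operands enter and then just carries it through the pipeline registers, whereas a real pipelined multiplier spreads the work across its stages. The timing is what matters here, and it follows the table: a product appears `mult_latency` cycles after its operands go in, and the accumulator shows the updated sum one cycle later.

```python
from collections import deque


def pipelined_mac(a_values, b_values, mult_latency=3):
    """Cycle-by-cycle model of a pipelined multiplier feeding an accumulator.

    Returns (final_sum, total_cycles), where total_cycles counts from the
    cycle the first operands are presented up to and including the cycle in
    which the accumulator output shows the final sum.
    """
    if len(a_values) != len(b_values):
        raise ValueError("operand lists must have the same length")
    if mult_latency < 1:
        raise ValueError("mult_latency must be at least 1")

    n = len(a_values)
    # Pipeline registers: stages[0] is the newest entry, stages[-1] is the
    # value visible at the multiplier output. None models "don't care".
    stages = deque([None] * mult_latency)
    accumulator = 0   # accumulator register, forced to zero at the start
    accumulated = 0   # number of products added so far
    cycle = 0

    while accumulated < n:
        product_out = stages[-1]  # multiplier output during this cycle
        # Clock edge at the end of the cycle: accumulate and shift the pipeline.
        if product_out is not None:
            accumulator += product_out
            accumulated += 1
        next_in = a_values[cycle] * b_values[cycle] if cycle < n else None
        stages.pop()
        stages.appendleft(next_in)
        cycle += 1

    # The final sum is visible on the accumulator output during cycle `cycle`,
    # so cycles 0..cycle have elapsed.
    return accumulator, cycle + 1


a = list(range(50))
b = list(range(50, 100))
total, cycles = pipelined_mac(a, b)
print(total == sum(x * y for x, y in zip(a, b)), cycles)  # prints: True 54
```

In this model N MACs take N + mult_latency + 1 cycles, so the fixed four-cycle overhead becomes negligible as the series gets longer.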
Reply by ●July 25, 2002
Dear Eric,

Thanks a lot. I understand much better now.

Reala