Somewhat related is decoding during the EX stage (late-stage
decoding).
Lately I've been moving more of the decoding out of the usual decode
stage and into the EX stage. FPGAs seem to be good enough at decoding
that there is typically only a minimal slowdown. I've found it a lot
easier to pass a single signal called 'opcode' around than a whole
bunch of individual decode signals.
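Each unit decodes the opcode itself. If the opcode isn't one the unit
handles, the unit sets its output to zero, so that the outputs of the
different units can just be OR'd together. Here's a logic unit as an
example: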
`define XORL 7'd16
`define ANDL 7'd17
`define ORL  7'd18
`define XORM 7'd20
`define ANDM 7'd21
`define ORM  7'd22
`define XORH 7'd24
`define ANDH 7'd25
`define ORH  7'd26

module logic_unit(op, a, b, o);
parameter WID = 64;
input [6:0] op;      // opcode
input [WID:1] a;     // operand 'a'
input [WID:1] b;     // operand 'b'
output [WID:1] o;    // output result
reg [WID:1] o;

always @(a or b or op)
    case (op)
    `XORL,
    `XORM,
    `XORH:   o = a ^ b;
    `ANDL,
    `ANDM,
    `ANDH:   o = a & b;
    `ORL,
    `ORM,
    `ORH:    o = a | b;
    default: o = {WID{1'b0}};   // not a logic opcode: drive zero
    endcase
endmodule
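
To show what OR-ing the outputs together might look like, here is a rough sketch of a second unit and a combining module. The `ADD and `SUB opcode values, the add_unit module and the ex_result module are all made up for illustration and are not from my actual design. It assumes logic_unit and its `defines are compiled alongside it.

```verilog
// Hypothetical opcodes for illustration only; chosen so they do not
// overlap the logic opcodes (16..26).
`define ADD 7'd32
`define SUB 7'd33

// Hypothetical second unit following the same convention as logic_unit:
// it decodes 'op' itself and drives zero for opcodes it does not own.
module add_unit(op, a, b, o);
parameter WID = 64;
input [6:0] op;      // opcode
input [WID:1] a;     // operand 'a'
input [WID:1] b;     // operand 'b'
output [WID:1] o;    // output result
reg [WID:1] o;

always @(a or b or op)
    case (op)
    `ADD:    o = a + b;
    `SUB:    o = a - b;
    default: o = {WID{1'b0}};
    endcase
endmodule

// Hypothetical EX-stage result: no select mux and no separate decode
// signals, just the unit outputs OR'd together.
module ex_result(op, a, b, o);
parameter WID = 64;
input [6:0] op;
input [WID:1] a;
input [WID:1] b;
output [WID:1] o;

wire [WID:1] logic_o;
wire [WID:1] add_o;

logic_unit #(WID) u_logic(.op(op), .a(a), .b(b), .o(logic_o));
add_unit   #(WID) u_add  (.op(op), .a(a), .b(b), .o(add_o));

assign o = logic_o | add_o;
endmodule
```

This only works if no opcode is claimed by more than one unit. At most one unit then drives a non-zero value for any given opcode, and the OR just passes it through.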
