Tagged: MIPSfpga CorExtend UDI
- January 27, 2016 at 10:04 pm #52745
CorExtend is a feature of the MIPS32 microAptiv microprocessor that allows system designers to define and add their own instructions, which operate on data in the general-purpose registers in the same manner as standard MIPS instructions.
The MicroAptiv UP Integrator’s Guide, which is included in the MIPSfpga documentation, describes CorExtend, but not in enough detail. This post clarifies the CorExtend UDI interface protocol in order to help other participants of the Imagination University Programme create their own projects with custom instructions.
This post can also be downloaded as a PDF file via the link MIPS microAptiv UP Processor CorExtend UDI interface protocol guide.
An example project with simulation sources can be downloaded from GitHub: https://github.com/zatslogic/UDI_example. The project is described later in this post.
The position of CorExtend in the top-level RTL hierarchy of an m14k microAptiv processor core is shown below.
All core signals at the m14k cpu level, including the CorExtend UDI signals, are listed in the MIPS32 microAptiv UP Processor Core Family Integrators Guide (Table 2.3 Signal Descriptions for m14k cpu Level). In addition to the signals connected to m14k cpu, a custom CorExtend block has variable-width external signals that are propagated out of m14k top.
To implement a custom CorExtend block, m14k_edp_buf_misc and m14k_udi_stub should be modified. The input and output signals of m14k_edp_buf_misc should be connected to each other, for example, like this.
assign UDI_ir_e[31:0] =mpc_ir_e ;
assign UDI_irvalid_e =mpc_irval_e ;
assign UDI_rs_e[31:0] =edp_abus_e ;
assign UDI_rt_e[31:0] =edp_bbus_e ;
assign UDI_endianb_e =cpz_rbigend_e ;
assign UDI_kd_mode_e =cpz_kuc_e ;
assign UDI_kill_m =mpc_killmd_m ;
assign UDI_start_e =mpc_run_ie ;
assign UDI_run_m =mpc_run_m ;
assign UDI_greset =greset ;
assign UDI_gscanenable =gscanenable ;
assign UDI_gclk =gclk ;
assign edp_udi_wrreg_e[4:0]=UDI_wrreg_e ;
assign edp_udi_ri_e =UDI_ri_e ;
assign edp_udi_stall_m =UDI_stall_m ;
assign edp_udi_present =UDI_present ;
assign edp_udi_honor_cee=UDI_honor_cee ;
mvp_mux2 #(32) _res_m_31_0_(res_m[31:0],mpc_udislt_sel_m, asp_m, UDI_rd_m);
The actual custom CorExtend block should replace m14k_udi_stub. An example of the interaction between CorExtend and the microAptiv UP core is shown in the waveform below.
The UDI_present signal must be tied high. UDI_honor_cee can be tied low; if it is tied high, the Status CEE bit must be set using the mtc0 instruction before any attempt to execute a CorExtend instruction. Otherwise the CorExtend Unusable exception occurs and UDI_kill_m is asserted for two clock cycles, starting on the clock cycle after UDI_start_e is asserted.
Every instruction word executed by the core arrives on UDI_ir_e[31:0] together with the UDI_irvalid_e signal. UDI_start_e indicates the execution stage of the microAptiv UP core pipeline. If the instruction has RS and/or RT operands, they arrive on UDI_rs_e[31:0] and UDI_rt_e[31:0], respectively, together with the UDI_start_e signal.
Some parts of an instruction must be decoded in the same cycle in which UDI_start_e arrives. This is crucial for generating UDI_ri_e, which must be asserted in the same cycle as UDI_start_e if the instruction is illegal. If the instruction has to write its result to one of the processor’s general-purpose registers, the address of RD must be presented on UDI_wrreg_e[4:0] in the same cycle as UDI_start_e. Other fields of the instruction may be registered and decoded later.
The signal UDI_wrreg_e[4:0] can address 31 of the processor’s general-purpose registers; the value 5'd0 means that no register is written.
The result of a UDI instruction that is to be written to the register file must be presented on UDI_rd_m[31:0] in the cycle after UDI_start_e. If the result will be available later, UDI_stall_m must be asserted in the clock cycle after UDI_start_e. UDI_stall_m must be deasserted in the clock cycle before the result is present on UDI_rd_m[31:0].
The image below shows the UDI instruction format. The major opcode of a UDI instruction is SPECIAL2, which equals 6'd28. The RS and RT fields address the source operand registers. Bits 15..6 may be used for the purposes of the custom CorExtend block; for example, the address of the destination register can be placed there. In the function field, bits 5..4 have the mandatory value 2'b01, and bits 3..0 can encode up to 16 UDI instructions.
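To make the field layout concrete, here is a minimal Python sketch that packs a UDI instruction word from its fields. The function name encode_udi and the generic 10-bit parameter user (bits 15..6) are hypothetical names chosen for illustration; RS and RT are assumed to sit in the standard MIPS positions, bits 25..21 and 20..16.

```python
SPECIAL2 = 28  # major opcode 6'd28, bits 31..26


def encode_udi(rs, rt, user, func):
    """Pack a UDI instruction word (illustrative helper, not part of any tool).

    rs, rt -- source register numbers, 0..31 (bits 25..21 and 20..16)
    user   -- 10-bit value for bits 15..6, free for the CorExtend block
    func   -- UDI instruction number, 0..15 (function field bits 3..0)
    """
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register number out of range")
    if not 0 <= user < 1024:
        raise ValueError("user field must fit in 10 bits")
    if not 0 <= func < 16:
        raise ValueError("func must fit in 4 bits")
    funct6 = (0b01 << 4) | func  # bits 5..4 are always 2'b01
    return (SPECIAL2 << 26) | (rs << 21) | (rt << 16) | (user << 6) | funct6


# A destination register number of 10 placed in bits 15..11 of the word
word = encode_udi(rs=8, rt=9, user=10 << 5, func=0)
print(f"{word:08x}")  # 71095010
```

The printed word matches the first UDI instruction in the test program further down in this post.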
The implementation of a custom CorExtend block is illustrated by the following example of a DSP accelerator block.
The block performs several closely related operations. It calculates the instantaneous power of a quadrature signal, which is defined as
$P(t) = a^2(t) + b^2(t)$
where a(t) and b(t) are the real and imaginary parts of the quadrature signal, respectively.
This operation is useful for signal detection by comparison with a threshold.
The implemented UDI instructions are listed below.
Instruction: UDI0 RD, RS, RT
Explanation: RD = RS[31:16]^2 + RT[31:16]^2
Function field: 6'b010000
Instruction: UDI1 RD, RS, RT
Explanation: RD = (RS[31:16]^2 + RT[31:16]^2) >> 1
Function field: 6'b010001
Instruction: UDI2 RD, RS
Explanation: RD = RS[31:16]^2
Function field: 6'b010010
Instruction: UDI3 RS
Explanation: stored_threshold = RS
Function field: 6'b010011
Instruction: UDI4 RD, RS, RT
Explanation: RD = ( (RS[31:16]^2 + RT[31:16]^2) > stored_threshold ) ? 1:0
Function field: 6'b010100
Instruction: UDI5 RD, RS, RT
Explanation: RD = ( ((RS[31:16]^2 + RT[31:16]^2) >> 1) > stored_threshold ) ? 1:0
Function field: 6'b010101
Instruction: UDI6 RD, RS, RT
Explanation: RD = ( RS[31:16]^2 > stored_threshold ) ? 1:0
Function field: 6'b010110
UDI0 calculates the instantaneous power. RS and RT are the source operands, which contain the 16-bit real and imaginary parts of a quadrature signal. The 32-bit result is placed in the RD destination register.
UDI1 performs essentially the same operation as UDI0. The difference is that UDI1 shifts the result right by one bit to prevent overflow.
UDI2 calculates the instantaneous power using only the real part of a quadrature signal. The RT operand is not used.
UDI3 stores a 32-bit threshold value in an internal register of the CorExtend block; no result is returned.
UDI4, UDI5, and UDI6 perform the UDI0, UDI1, and UDI2 operations, respectively, and compare the result with the stored threshold value. If the threshold is exceeded, a value of 32'd1 is returned. Otherwise, a value of 32'd0 is returned.
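The behaviour of these instructions can be sketched as a small Python reference model. This is only an illustration, not the Verilog from the example project. It assumes that the 16-bit operands are signed, that results wrap to 32 bits, and that the threshold comparison is unsigned; the post does not state these points, but they agree with the results reported for the test program later on. The names UdiModel, s16_hi, and execute are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s16_hi(reg):
    """Return bits 31..16 of a 32-bit register as a signed 16-bit value (assumed signed)."""
    v = (reg >> 16) & 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


class UdiModel:
    """Behavioural sketch of the DSP accelerator block (UDI0..UDI6)."""

    def __init__(self):
        self.threshold = 0  # stored_threshold, written by UDI3

    def execute(self, func, rs=0, rt=0):
        """Run UDI<func>; return the 32-bit result, or None for UDI3."""
        if func == 3:
            self.threshold = rs & MASK32
            return None
        if func not in (0, 1, 2, 4, 5, 6):
            raise ValueError("illegal UDI instruction")
        a, b = s16_hi(rs), s16_hi(rt)
        kind = func & 0x3  # 0: a^2 + b^2, 1: (a^2 + b^2) >> 1, 2: a^2
        if kind == 0:
            p = a * a + b * b
        elif kind == 1:
            p = (a * a + b * b) >> 1
        else:
            p = a * a
        p &= MASK32
        if func & 0x4:  # UDI4..UDI6: unsigned compare with the threshold (assumed)
            return 1 if p > self.threshold else 0
        return p


m = UdiModel()
print(hex(m.execute(0, 0x80000000, 0xBEAF0000)))  # 0x50aa3ba1
m.execute(3, 0xBEAFDEAD)
print(m.execute(4, 0x80000000, 0xBEAF0000))       # 0
```

With the operands used in the test program, the model returns 0 for UDI4 while the threshold is 0xbeafdead, and 1 once the threshold is lowered to 0x0001feed.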
All instructions except UDI3 write their results to the register file and therefore require the address of the destination register. To that end, an RD field was included in the instruction word structure, as shown in the figure below.
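The decoding that has to happen in the same cycle as UDI_start_e can be sketched in Python as follows. This is a behavioural illustration under stated assumptions, not the project's RTL: RD is assumed to occupy bits 15..11 (this agrees with the machine code of the test program), function codes above 6 are treated as illegal for this particular block, and words that are not UDI instructions are assumed to leave UDI_ri_e and UDI_wrreg_e at zero. The function name decode_e_stage is hypothetical.

```python
def decode_e_stage(ir, irvalid=True):
    """Return (is_udi, ri, wrreg) for the instruction word seen in the E stage.

    ri    models UDI_ri_e: 1 for a UDI encoding this block does not implement
    wrreg models UDI_wrreg_e[4:0]: destination GPR, 0 means "no write"
    """
    opcode = (ir >> 26) & 0x3F
    funct = ir & 0x3F
    # A UDI instruction has the SPECIAL2 opcode (28) and funct[5:4] == 2'b01
    is_udi = bool(irvalid) and opcode == 28 and (funct >> 4) == 0b01
    if not is_udi:
        return (False, 0, 0)
    op = funct & 0xF
    if op > 6:                     # only UDI0..UDI6 exist in this example block
        return (True, 1, 0)
    rd = (ir >> 11) & 0x1F         # assumed position of the RD field
    wrreg = 0 if op == 3 else rd   # UDI3 returns no result
    return (True, 0, wrreg)


print(decode_e_stage(0x71095010))  # (True, 0, 10)
print(decode_e_stage(0x71600013))  # (True, 0, 0)
```

Only these outputs need to be valid in the E stage; the operation selection itself could be registered and used one cycle later.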
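The code listing below shows a program written in MIPS assembly for testing all of the developed UDI instructions. Note that in the listing the UDI operands are written in the order RS, RT, RD, as follows from the machine code (for example, 0x71095010 encodes RS = $8, RT = $9, RD = $10).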
Machine Code Instruction Address Assembly Code
3c088000 // bfc00000: lui $8, 0x8000
3c09beaf // bfc00004: lui $9, 0xbeaf
71095010 // bfc00008: udi0 $8 $9 $10
71095011 // bfc0000c: udi1 $8 $9 $10
71005012 // bfc00010: udi2 $8 $10
3c0bbeaf // bfc00014: lui $11, 0xbeaf
356bdead // bfc00018: ori $11,$11, 0xdead
71600013 // bfc0001c: udi3 $11
71095014 // bfc00020: L1: udi4 $8 $9 $10
71095015 // bfc00024: udi5 $8 $9 $10
71095016 // bfc00028: udi6 $8 $9 $10
3c0b0001 // bfc0002c: lui $11, 0x0001
356bfeed // bfc00030: ori $11,$11, 0xfeed
71600013 // bfc00034: udi3 $11
1000fff9 // bfc00038: beq $0, $0, L1
00000000 // bfc0003c: nop
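As a quick check of the loop in the listing, the target of the final beq can be computed from its machine code. The sketch below uses the standard MIPS32 rule that the branch target is the address of the delay-slot instruction plus the sign-extended 16-bit offset shifted left by two; the function name branch_target is hypothetical.

```python
def branch_target(pc, word):
    """Compute the target address of a MIPS32 branch located at address pc."""
    imm = word & 0xFFFF
    if imm & 0x8000:               # sign-extend the 16-bit offset
        imm -= 0x10000
    return (pc + 4 + (imm << 2)) & 0xFFFFFFFF


print(hex(branch_target(0xBFC00038, 0x1000FFF9)))  # 0xbfc00020
```

The result is the address of the instruction labelled L1, so the program keeps repeating UDI4, UDI5, UDI6 and the second UDI3.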
An example project that implements the custom CorExtend block described above in Verilog can be downloaded from https://github.com/zatslogic/UDI_example.
The project includes all sources needed for simulation except the files from the rtl_up directory. You may need XilinxCorelib for simulation. It can be compiled in Vivado using the Tcl command compile_simlib.
The example project has two variants of the custom CorExtend block. The first one performs all UDI instructions in one cycle. The second one has additional pipelining and requires more cycles for some instructions. It was made specifically to exercise the UDI_stall_m signal.
The waveforms below show the simulation of the assembly program listed above.
In the first, unpipelined variant, the first three instructions UDI0, UDI1, and UDI2 are executed as shown in the figure below.
It can be seen that the instructions arrive on UDI_ir_e together with the signals UDI_irvalid_e and UDI_start_e. The operands are valid in the same cycle on UDI_rs_e and UDI_rt_e. The address of the GPR to which the result is written is also generated in this cycle. In the next cycle the result is valid on UDI_rd_m.
The signals from the register file (rf) are shown in the waveform as well. The result is written to the GPR whose address is displayed on mpc_dest_w. The data value can be seen on edp_wrdata_w, with the strobe on mpc_rfwrite_w. The addresses of the operands being read from the GPRs are presented on mpc_rega_i and mpc_regb_i.
In the figure below, the instructions UDI3, UDI4, UDI5, and UDI6 are shown.
As can be seen from the code listing, UDI3 writes the value 0xbeafdead to stored_threshold. UDI4, UDI5, and UDI6 then each return zero, since none of the computed values exceeds the threshold.
In the next waveform, UDI3 is executed again with the new threshold value 0x0001feed and, after the branch to L1 is taken, UDI4, UDI5, and UDI6 are executed again. Here the threshold value is lower than the computed values, and thus the results of these instructions are 0x00000001.
The next three waveforms show the simulation of the pipelined UDI block.
In the figure below, instructions UDI0, UDI1, and UDI2 are executed. UDI_stall_m is asserted while the computation is in progress in the UDI block. The result arrives on UDI_rd_m in the cycle after UDI_stall_m is deasserted. In the following cycle the result is written to the GPR.
In the waveform below, instructions UDI4, UDI5, and UDI6 are executed with the UDI_stall_m signal asserted.
In the waveform below, instructions UDI4, UDI5, and UDI6 are executed again. The result value differs from that in the figure above.