Adds the new j1 subtract support to keep the j4 properly software compatible. Also some additions to the j4a example
RGD2 committed Mar 8, 2016
1 parent ffbefd4 commit 83014c5
Showing 4 changed files with 83 additions and 49 deletions.
Binary file modified j1a/icestorm/j1a8k.bin
Binary file not shown.
Binary file modified j1a/icestorm/j4a.bin
Binary file not shown.
67 changes: 48 additions & 19 deletions j1a/j4a_multithread_example.fs
@@ -1,43 +1,72 @@
\ Examples for the j4a's multitasking capabilities.
\ You probably don't want to #include this

: assign1 ( xt -- ) $100 io! ;
: assign2 ( xt -- ) $200 io! ;
: assign3 ( xt -- ) $400 io! ;
: kill1 2 $4000 io! ;
: kill2 4 $4000 io! ;
: kill3 8 $4000 io! ;

: on1 ( xt -- ) $100 io! ;
: on2 ( xt -- ) $200 io! ;
: on3 ( xt -- ) $400 io! ;
: once ( standard/looping xt -- runonce xt ) 1 or ;
: kill1 0 on1 2 $4000 io! ;
: kill2 0 on2 4 $4000 io! ;
: kill3 0 on3 8 $4000 io! ;
: stopall 0 on1 0 on2 0 on3 ;
: killall kill1 kill2 kill3 ;
\ note: killx will only work from slot 0; if called from a numbered slot, that slot will only kill itself.

\ and now for very simple "model-view-controller" app
\ this exploits the j4's architecture to break the system design into separate, simple pieces

variable display
variable delay

0 display !
42 delay !

: update display @ 1 + display ! ;
: show display @ leds ;
' show assign1 \ assigns show to slot 1. note no loop.

: show display @ leds ; \ this is the "view" in a MVC. You might have a more complicated "display" function.

' show on1 \ assigns show to slot 1. note no loop.

: update display @ 1 + display ! ; \ this is the "model" in a MVC pattern

: t2 update delay @ ms ;
' t2 $200 assign2 \ assigns a timed update to slot 2, also no loop.
' t2 on2 \ assigns a timed update to slot 2, also no loop.

\ leds will count upward, but quit will still run. that was nice and easy, wasn't it?


\ leds will count upward, but quit will still run.
\ try:
\ 10 delay !
\ here, user interaction via quit is the "controller" in the MVC pattern, but we could just as well have a small loop polling some switches, running on3

: slowcount 0 0 do i leds 42 ms loop ;
' slowcount assign3 \ conflicts with the show thread, but both keep running anyway. Note that it has a loop. Will take ~3 hours to finish.

\ now let's sabotage our system:
: slowcount 0 0 do i leds 10 ms loop ;
' slowcount on3 \ conflicts with the show thread, but both keep running anyway.
\ Note that it has a loop. this will take ~3 hours to finish, then will run repeatedly since we forgot to mark it to run just once.


0 assign3 \ asks nicely, but does not stop slot 3, because it's paying no attention. (giving a good example of a locked thread).
0 on3 \ asks nicely, but does not stop slot 3, because it's paying no attention. (giving a good example of a locked thread).
\ 0 is treated as a special case, it causes the core to just sit and poll again, effectively "stopping" it.

kill3 \ selectively resets just the third slot, which does stop it.
kill3 \ selectively resets just the third slot, which does do a nondiscretionary interrupt. Also resets its stacks.
\ note that CTRL-C in the python shell will reset all cores, clearing the stacks, but won't clear the XT's.

0 assign1 0 assign2 \ ask the other two nicely to stop
0 on1 0 on2 \ ask the other two nicely to stop, in between iterations. A "cooperative" interrupt if you will
0 leds \ the led's would be left in whatever state, so we have to clean up ourselves.

: offleds 0 leds ;
' offleds 1 or assign1 \ Run a word just once, not continuously. Good for initialisation and cleanup after changing tasks.
' offleds once on1 \ Run a word just once, not continuously. Good for initialisation and cleanup after changing tasks.
\ Valid XT's are always even, the lsb is used to autoclear the taskexec register after it has been read once by the slot running it.
\ Note that it is not safe to write XT's one after the other to the same core - it takes several cycles for it to poll for the XT even if it's stopped. so a program with sequential writes *won't* result in each task running one after the next.

\ one need not name one's code:
:noname -1 leds ; once
:noname 0 leds ; once swap on2 on3 \ one core turns leds on, the next will turn them off.
\ note that without 'once' these will 'fight', resulting in the LEDs flashing at high speed.


\ initialise a core, run for a while, then nicely interrupt that core and clean up before stopping
:noname 0 ; on1 \ initialise a counter on core 1's stack
1 ms \ wait a little while, so core 1 does definitely get a chance to run.
:noname delay @ ms 1+ dup leds ; on1 \ uses just the stack to count on the leds
10000 ms
:noname drop 0 leds ; once on1 \ stops and cleans up the stack, also turns the leds off .
65 changes: 35 additions & 30 deletions j1a/verilog/j4.v
@@ -19,23 +19,23 @@ module j4(
output wire [1:0] io_slot,
output wire [15:0] return_top,
input wire [3:0] kill_slot_rq);

reg [1:0] slot, slotN; // slot select

greycount tc(.last(slot), .next(slotN));

reg [4:0] dsp, dspN;// data stack pointers, -N is not registered,
reg [14:0] dspD; // -D is the delay shift register.

reg [`WIDTH-1:0] st0, st0N;// top of data stacks
reg [3*`WIDTH-1:0] st0D; // top delay


reg [12:0] pc /* verilator public_flat */, pcN; // program counters
reg [38:0] pcD; // pc Delay

wire [12:0] pc_plus_1 = pc + 13'd1;

reg reboot = 1;
reg [3:0] kill_slot = 4'h0;

@@ -44,38 +44,41 @@ module j4(
// because this *was* pcN, we will instead use what will be pc one clock cycle in the future, which is pcD[12:0]. But wait:
// We make this two clock cycles into the future, and then register again insn so
// that the instruction is already available, not needing to be read from ram... This is why it's pcD[25:13], then:

reg [15:0] insn_now = 0;

// adds a clock delay, but this is fine.
always @(posedge clk) insn_now <= insn;
always @(posedge clk) insn_now <= insn;
// note every reference below here which was to insn will now be to insn_now instead.
// this automatically includes all memory reads, instructions or otherwise.


// io_din is registered once in j4a.v, so it still needs 3 delays to be good.
reg [3*`WIDTH-1:0] io_din_delay = 0;
always @(posedge clk) io_din_delay <= {io_din, io_din_delay[3*`WIDTH-1:`WIDTH]};
wire [`WIDTH-1:0] io_din_now = io_din_delay[`WIDTH-1:0];


// The D and R stacks
wire [`WIDTH-1:0] st1, rst0;
// stack delta controls

// stack delta controls
wire [1:0] dspI, rspI;

reg dstkW,rstkW; // data stack write / return stack write

wire [`WIDTH-1:0] rstkD; // return stack write value

stack2pipe4 #(.DEPTH(16)) dstack_(.clk(clk), .rd(st1), .we(dstkW), .wd(st0), .delta(dspI));
stack2pipe4 #(.DEPTH(19)) rstack_(.clk(clk), .rd(rst0), .we(rstkW), .wd(rstkD), .delta(rspI));


// stack2 #(.DEPTH(24)) dstack(.clk(clk), .rd(st1), .we(dstkW), .wd(st0), .delta(dspI));
// stack2 #(.DEPTH(24)) rstack(.clk(clk), .rd(rst0), .we(rstkW), .wd(rstkD), .delta(rspI));

wire [16:0] minus = {1'b1, ~st0} + st1 + 1;
wire signedless = st0[15] ^ st1[15] ? st1[15] : minus[16];

always @*
begin
// Compute the new value of st0. Could be pipelined now.
@@ -92,15 +95,17 @@ module j4(
9'b0_011_?0100: st0N = st0 | st1;
9'b0_011_?0101: st0N = st0 ^ st1;
9'b0_011_?0110: st0N = ~st0;
9'b0_011_?0111: st0N = {`WIDTH{(st1 == st0)}};
9'b0_011_?1000: st0N = {`WIDTH{($signed(st1) < $signed(st0))}};

9'b0_011_?0111: st0N = {`WIDTH{(minus == 0)}}; // =
9'b0_011_?1000: st0N = {`WIDTH{(signedless)}}; // <

9'b0_011_?1001: st0N = {st0[`WIDTH - 1], st0[`WIDTH - 1:1]};
9'b0_011_?1010: st0N = {st0[`WIDTH - 2:0], 1'b0};
9'b0_011_?1011: st0N = rst0;
9'b0_011_?1100: st0N = io_din_now; // was io_din, which was a cycle late like insn and st0/st1/rst0/pc/dsp etc
9'b0_011_?1101: st0N = io_din_now;
9'b0_011_?1100: st0N = minus[15:0];
9'b0_011_?1101: st0N = io_din_now; // was io_din, which was a cycle late like insn and st0/st1/rst0/pc/dsp etc
9'b0_011_?1110: st0N = {{(`WIDTH - 5){1'b0}}, dsp};
9'b0_011_?1111: st0N = {`WIDTH{(st1 < st0)}};
9'b0_011_?1111: st0N = {`WIDTH{(minus[16])}}; // u<
default: st0N = {`WIDTH{1'bx}};
endcase
end
@@ -162,19 +167,19 @@ module j4(
if (!resetq) begin
reboot <= 1'b1;
{ pc, dsp, st0} <= 0;

slot <= 2'b00;
kill_slot <= 4'hf;
end else begin
// reboot needs to be set a clock in advance of the targeted slot.
// reboot needs to be set a clock in advance of the targeted slot.
// kill_slot_rq is read-ahead in case the next thread to execute's time should be up already.
reboot <= kill_slot[slotN] | kill_slot_rq[slotN];
reboot <= kill_slot[slotN] | kill_slot_rq[slotN];

// kill_slot register holds the signals until the right time, and are auto-cleared as each slot is reached, as it will already have been reset by the clock before.
kill_slot[3] <= kill_slot_rq[3] ? 1'b1 : ( (slot == 2'd3) ? 1'b0 : kill_slot[3]) ;
kill_slot[2] <= kill_slot_rq[2] ? 1'b1 : ( (slot == 2'd2) ? 1'b0 : kill_slot[2]) ;
kill_slot[1] <= kill_slot_rq[1] ? 1'b1 : ( (slot == 2'd1) ? 1'b0 : kill_slot[1]) ;
kill_slot[0] <= kill_slot_rq[0] ? 1'b1 : ( (slot == 2'd0) ? 1'b0 : kill_slot[0]) ;
kill_slot[1] <= kill_slot_rq[1] ? 1'b1 : ( (slot == 2'd1) ? 1'b0 : kill_slot[1]) ;
kill_slot[0] <= kill_slot_rq[0] ? 1'b1 : ( (slot == 2'd0) ? 1'b0 : kill_slot[0]) ;

pc <= pcD[12:0];
dsp <= dspD[4:0];
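
The new `=`, `<`, `u<` and subtract cases in j4.v all derive from a single 17-bit subtraction. As a sanity check, here is a minimal testbench sketch that compares them against the direct expressions on the removed lines. It assumes `WIDTH` is 16 and that the new subtract opcode is meant to produce `st1 - st0`. The module name `cmp_check_tb` and the `check` task are hypothetical and not part of the repository; only the `minus` and `signedless` wires are copied from the diff.

```verilog
// Hypothetical standalone testbench, not part of the j4 sources.
// Assumption: `WIDTH is 16, so st0 and st1 are 16-bit values.
`timescale 1ns/1ps
module cmp_check_tb;
  reg [15:0] st0, st1;
  integer i, errors;

  // Copied from the new j4.v lines: 17-bit st1 - st0.
  // Bit 16 ends up set exactly when the subtraction borrows (st1 < st0, unsigned).
  wire [16:0] minus = {1'b1, ~st0} + st1 + 1;
  // If the sign bits differ, st1 is smaller exactly when it is the negative one;
  // otherwise the unsigned borrow gives the signed answer as well.
  wire signedless = st0[15] ^ st1[15] ? st1[15] : minus[16];

  // Compare each derived result with the direct expression it replaces.
  task check;
    begin
      #1; // let the continuous assignments settle
      if ((minus == 0) !== (st1 == st0)) errors = errors + 1;                  // =
      if (signedless !== ($signed(st1) < $signed(st0))) errors = errors + 1;   // <
      if (minus[16] !== (st1 < st0)) errors = errors + 1;                      // u<
      if (minus[15:0] !== (st1 - st0)) errors = errors + 1;                    // subtract (assumed intent)
    end
  endtask

  initial begin
    errors = 0;
    // Corner cases around zero and the sign boundary.
    st0 = 16'h0000; st1 = 16'h0000; check;
    st0 = 16'h0001; st1 = 16'h0000; check;
    st0 = 16'h0000; st1 = 16'hffff; check;
    st0 = 16'h7fff; st1 = 16'h8000; check;
    st0 = 16'h8000; st1 = 16'h7fff; check;
    st0 = 16'hffff; st1 = 16'hffff; check;
    // Random pairs for broader coverage.
    for (i = 0; i < 10000; i = i + 1) begin
      st0 = $random;
      st1 = $random;
      check;
    end
    if (errors == 0) $display("all comparisons agree");
    else $display("%0d mismatches", errors);
    $finish;
  end
endmodule
```

With Icarus Verilog this can be run as `iverilog -o cmp_check cmp_check_tb.v && vvp cmp_check`, where the file name is again only an example. Sharing one subtractor between the four opcodes is presumably what keeps the j4 in step with the j1 change the commit message refers to, at the cost of a slightly less obvious signed comparison.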
