smile – Tile1 core + debugger playground
smile is a small environment for experimenting with a single RV32 core (Tile1), simple accelerators, and a debugger, outside of the full smicro SoC harness.
It gives you:
- A standalone RV32 core (
Tile1) - A clean memory abstraction (
MemoryPort) - A tiny instruction decoder and execute helpers
- A debugger REPL (breakpoints, stepping, memory dump)
- A testbench (
tb_tile1) that ties it all together
The idea: you can iterate on the core, accelerators, and debugger in smile, then reuse the same pieces when the core is embedded in smicro.
Big picture: who talks to whom?
In the tb_tile1 testbench, the wiring looks like this:
+---------------------+
| tb_tile1.cpp |
|---------------------|
| - creates… |
| - Tile1 |
| - Dram |
| - DramMemoryPort |
| - accel |
| - debugger |
+----------+----------+
|
V
Tile1 (CPU core, Tile1.cpp / Tile1_exec.cpp)
|
| MemoryPort::read32 / write32
V
DramMemoryPort (shim, defined in tb_tile1.cpp)
|
| Dram::read / Dram::write
V
Dram (pulled in from smicro/src/Dram.hpp)
+-------------------+
| AccelArraySum |
| (AccelArraySum.*) |
| talks to memory |
| via same |
| MemoryPort |
+-------------------+
On each cycle:
- The debugger calls Sim::run().
- Cascade ticks the clock and calls Tile1::tick().
- Tile1::tick() fetches an instruction via mem_port_->read32(pc).
- The memory port shim turns that into a Dram::read(…) and returns the word.
- Tile1 decodes and executes the instruction (possibly calling the accelerator).
- The debugger inspects state and prints traces / breakpoints as needed.
smile/ directory structure
smile/
├── include/ # Public headers for core, memory shim, accelerators, debugger
│ ├── AccelArraySum.hpp # single-cycle array-sum accelerator interface
│ ├── AccelArraySumMc.hpp # multi-cycle array-sum accelerator interface
│ ├── AccelDemoAdd.hpp # demo add accelerator interface
│ ├── AccelPort.hpp # abstract accelerator contract + ACCEL_E_* codes
│ ├── Debugger.hpp # debugger REPL interface
│ ├── Diagnostics.hpp # postmortem diagnostics helpers
│ ├── Instruction.hpp # RV32 decoder interface
│ ├── MemCtrlTimedPort.hpp # fixed-latency MemoryPort wrapper
│ ├── Tile1_exec.hpp # exec_* helper declarations
│ └── Tile1.hpp # Tile1 core interface + MemoryPort API
├── src/
│ ├── AccelArraySum.cpp # single-cycle array-sum accelerator
│ ├── AccelArraySumMc.cpp # multi-cycle array-sum accelerator
│ ├── AccelDemoAdd.cpp # trivial demo accelerator
│ ├── Debugger.cpp # debugger REPL + stepping logic
│ ├── Diagnostics.cpp # helper traces/asserts
│ ├── Instruction.cpp # RV32 decoder
│ ├── MemCtrlTimedPort.cpp # fixed-latency MemoryPort implementation
│ ├── tb_tile1.cpp # testbench main() + suite injection
│ ├── Tile1_exec.cpp # exec_* helpers (ALU, load/store, branch, CSR, custom0)
│ ├── Tile1.cpp # core fetch/decode/execute/trap + stall logic
│ └── util/
│ ├── FlatBinLoader.cpp # load flat .bin into MemoryPort
│ └── FlatBinLoader.hpp # loader interface for flat binaries
├── progs/ # RV32 test programs + generated binaries
│ ├── link_rv32.ld # linker script (places _start at 0x0)
│ ├── core/ # core bring-up / ISA tests (smurf, smexit, etc.)
│ ├── sci/ # small scientific kernels
│ ├── *.elf / *.bin # generated outputs (e.g., smurf.bin, hmm_step.bin)
│ └── .smile_dbg # optional debugger breakpoint file
├── docs/
│ └── accel_port.md # shared v1 accelerator contract
└── CMakeLists.txt # build rules for tile1/tb_tile1 + smile_progs
Current core test programs (smile/progs/core/)
These are the small, freestanding RV32 programs that are currently used with smile for micro-architectural and debugger testing.
smexit.c– minimal exit sanity check
Issues anecallwith syscall ID 93 (put in a7 so system knows how to interpret thisecall) and an exit code (e.g., 7 in a0) almost immediately. Used to verify that:- ECALL handling works
Tile1::request_exit()is wired correctly- the simulator halts cleanly with the expected exit code
smurf.c– main bring-up + trap-handling demo
Sets up a stack, installs a trap handler inmtvec, computesSUM(0…9) = 45into memory at0x0100, and sets a flag at0x0104depending on whether the sum is correct. It then:- triggers
ebreakandecall - uses
mcauseto distinguish breakpoint vs ecall - writes
0xBEEFto0x0108on breakpoint and0xDEADto0x0104on ecall
This program exercises:
- LW/SW and simple ALU arithmetic in a loop
- conditional branches
- CSR setup via
mtvec - trap entry/exit (mepc/mcause, mret)
- triggers
smurf_debug.c– debugger REPL exercise
Writes known patterns to “scratch” locations at0x0100and0x0104, loads specific constants into registers (t0,t1,s0,a0), then executesebreakfollowed byecall. It is designed so that you can:- break into the debugger
- inspect registers and memory from the REPL
- confirm that exit carries the expected code
smurf_threads.c– toy shared-sum “two-thread” demo
Two C functions (thread0andthread1) both update a shared globalsum:thread0adds 1 tosumfive times, with anebreakafter each addthread1adds 2 tosumfive times, also breaking after each add
After both complete, the program ECALLs and returnssumas the exit code. This is useful for:- observing repeated traps/breakpoints in a loop
- watching how shared memory updates show up in the debugger
All of these programs are linked with progs/link_rv32.ld, which places _start at address 0x00000000 so Tile1 can begin execution by simply starting at PC=0 after reset.
Current scientific programs (smile/progs/sci/)
These are small scientific kernels.
sum_lpv.c– LPV-style reduction Initializes an array ofN32-bit values at0x0200(1,2,…,N), computes their sum, stores the result at0x0100, and exits via ECALL 93 with the sum as the exit code. Intended as a first “LPV-like” scalar kernel to study instruction mix and memory access patterns onTile1.
Core Pieces
Tile1(Tile1.hpp/cpp): the RV32 core implementation- Files:
include/Tile1.hpp,src/Tile1.cpp,src/Tile1_exec.cpp,include/Instruction.hpp,src/Instruction.cpp - Role: implements a simple RV32I core
- one main entrypoint per cycle:
void Tile1::tick() - standard logical handling sequence: fetch, decode, execute, trap/PC update
- one main entrypoint per cycle:
- Notes:
- uses
MemoryPortto talk to memory (abstract interface,Tile1never talks directly to DRAM) - uses
Instructionfor decoding RV32I instructions - uses exec_* helpers in
Tile1_exec.cppfor ALU, loads, branches, CSR, custom0
- uses
- Files:
MemoryPort: a general protocol link forTile1to talk to memories through- Files:
include/Tile1.hpp(abstract class defined here),src/tb_tile1.cpp(concrete implementation for a Dram model) - Role: abstract memory interface for allowing
Tile1to access all sorts of memory backends- exposes simple word-oriented API:
read32(addr),write32(addr, value). - designed to be synchronous (0-delay) for simplicity.
- exposes simple word-oriented API:
- Files:
- Accelerators:
- Files:
include/AccelPort.hpp,include/AccelArraySum.hpp/cpp,include/AccelDemoAdd.hpp/cpp - Role: simple accelerators that
Tile1can call via custom0 instructionsAccelPort: abstract interface for “something that can take a CUSTOM-0 instruction and optional memory access.”AccelArraySum: an example accelerator that sums an array in memory.
- Files:
Debugger:- Files:
include/Debugger.hpp,src/Debugger.cpp - Role: a simple REPL debugger for stepping through instructions, setting breakpoints, and inspecting state
- connected to
Tile1to read registers, PC, and memory viaMemoryPort.
- connected to
- Files:
Testbench:- Files:
src/tb_tile1.cpp - Role: a simple testbench that instantiates
Tile1,Dram, andDebuggerand runs the simulation.- loads a flat binary program into memory
- runs the simulation loop, calling
Tile1::tick()andDebugger::run()
- Files:
tb_tile1 quick reference
tb_tile1 is the standalone runner for smile: it constructs Tile1, memory (DramMemoryPort + MemCtrlTimedPort), optional accelerator, and the debugger loop.
If -prog is provided, it loads that flat binary and runs it. If -prog is empty, it injects a built-in suite program selected by -suite.
| Option | Default | Meaning |
|---|---|---|
-showcontexts |
0 |
Print Cascade component contexts and exit. |
-prog=<path> |
"" |
Path to flat .bin to load. When set, suite injection is skipped. |
-load_addr=<hex/int> |
0x0 |
Address where the program image is written. |
-start_pc=<hex/int> |
0x0 |
Initial PC override. If 0, injected suites start at load_addr. |
-mem_latency=<n> |
0 |
Fixed latency (cycles) used by MemCtrlTimedPort. |
-ideal_mem=<0/1> |
0 |
Force Tile1 ideal memory mode (sync read/write, no request/response stalls). |
-mem_model=timed\|ideal |
timed |
Tile1 memory model selection. |
-accel=none\|demo_add\|array_sum\|array_sum_mc |
array_sum |
Accelerator attached to CUSTOM-0. |
-suite=proto_accel_sum\|proto_accel_sum_altaddr\|proto_accel_sum_badarg\|proto_accel_sum_unsupported\|proto_accel_sum_twice |
proto_accel_sum |
Built-in injected test suite used only when -prog is empty. |
-selfcheck=<0/1> |
0 |
Run built-in regression matrix across accel/suite/memory latency; exits nonzero on failure. |
-steps=<n> |
0 |
Auto-run for n cycles; <=0 enters interactive debugger REPL. |
-sw_threads=<1|2> |
1 |
Number of software thread contexts scheduled by the debugger. |
-ignore_bpfile=<0/1> |
0 |
Do not load .smile_dbg breakpoints on startup. |
Example usage
Build tb_tile1
cd smarc
smarc $ cmake -S . -B build -DCEDAR_DIR=/path/to/Cascade/cedar -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
smarc $ cmake --build build --target tb_tile1 -j
Run Built-In Suites (No -prog)
Run the default injected suite (proto_accel_sum):
./build/smile/tb_tile1 -steps=50 -sw_threads=1
Run an explicit injected suite:
./build/smile/tb_tile1 -suite=proto_accel_sum_twice -accel=array_sum_mc -mem_latency=5 -steps=800
Recent validated proto runs:
./build/smile/tb_tile1 -accel=array_sum -suite=proto_accel_sum -steps=200
./build/smile/tb_tile1 -accel=array_sum_mc -mem_latency=5 -suite=proto_accel_sum -steps=400
./build/smile/tb_tile1 -accel=array_sum_mc -mem_latency=5 -suite=proto_accel_sum_twice -steps=800
./build/smile/tb_tile1 -accel=array_sum_mc -mem_latency=5 -suite=proto_accel_sum_badarg -steps=200
./build/smile/tb_tile1 -accel=array_sum_mc -mem_latency=5 -suite=proto_accel_sum_unsupported -steps=200
Selfcheck Regression (-selfcheck)
Run the built-in regression matrix (non-interactive). It returns exit code 0 when all cases pass, otherwise 1:
./build/smile/tb_tile1 -selfcheck=1
Run External Program (-prog)
Build a flat binary and run it:
smarc $ cd smile/progs
smarc/smile/progs $ riscv64-unknown-elf-gcc -march=rv32i_zicsr -mabi=ilp32 -nostartfiles -nostdlib -T link_rv32.ld core/smurf.c -o prog.elf
smarc/smile/progs $ riscv64-unknown-elf-objcopy -O binary prog.elf prog.bin
smarc/smile/progs $ cd ../..
smarc $ ./build/smile/tb_tile1 -prog=smile/progs/prog.bin -load_addr=0x0 -start_pc=0x0 -steps=100 -sw_threads=1
Build and run the accelerator smoke test (accel_sum_test) via smile_progs:
# You don't have to build .bin one-by-one; the smile_progs target builds them all.
smarc $ cmake --build build --target smile_progs -j
smarc $ ./build/smile/tb_tile1 -prog=smile/progs/accel_sum_test.bin -accel=array_sum_mc -mem_latency=5 -steps=5000
smarc $ ./build/smile/tb_tile1 -prog=smile/progs/accel_sum_test.bin -accel=array_sum -steps=2000
Expected for both runs:
[EXIT] Program exited with code 136
Interactive Debugger REPL
Launch the debugger (no -steps):
./build/smile/tb_tile1 -prog=smile/progs/prog.bin -load_addr=0x0 -start_pc=0x0 -sw_threads=1
# then in the REPL:
smile> step
smile> regs
smile> mem 0x100 4
smile> break 0x10
smile> cont
smile> exit
Multithread Scheduling (-sw_threads)
tb_tile1 can schedule one or two software thread contexts in the debugger loop:
-sw_threads=1(default): single-context execution (recommended for performance and instruction counting).-sw_threads=2: round-robin two contexts; useful for multithread/debug experiments.
Example:
# Single-thread (baseline perf counters)
./build/smile/tb_tile1 -prog=smile/progs/prog.bin -steps=2000000 -sw_threads=1
# Two software contexts (typically doubles instruction/counter totals if both run same program)
./build/smile/tb_tile1 -prog=smile/progs/prog.bin -steps=2000000 -sw_threads=2
Relationship to smicro
- smile focuses on
- a single core (
Tile1) - instruction set
- accelerators
- debugger REPL
- a single core (
- smicro focuses on
- SoC wiring (MemCtrl, DRAM, suites),
- memory protocols (MemReq/MemResp),
- different topologies and test suites
The same Tile1 + Instruction + Tile1_exec + accelerator code is meant to be shared between smile and smicro. The only thing that changes is how the MemoryPort is implemented and how the core gets clocked.