Super NES Programming/Super FX tutorial

Introduction

The Super FX is a custom 16-bit RISC processor with a special bitmap emulation function, designed to bring rudimentary 3D capabilities to the Nintendo Super NES. It is programmed in its own Super FX assembly language. Each SuperFX title combines standard SNES assembly code with separately assembled SuperFX routines stored as binary data in the cartridge. The chip can run in parallel with the SNES CPU under certain conditions. Each SuperFX cartridge has on-board RAM, which the SuperFX chip uses as a frame buffer and for general purpose operations, and which it can share with the Super NES.

Existing titles

The SuperFX chip was used in 8 released SNES games, in the unreleased Starfox 2, and in several tech demos, two of which have binaries available.

Title | SuperFX Version | ROM Size | Work RAM Size | Save RAM Size
Starfox/Starwing | Mario Chip | 8 Mbit | 256 kbit | None
Dirt Racer | GSU-1 | 4 Mbit | 256 kbit | None
Dirt Trax FX | GSU-1 | 4 Mbit | 512 kbit | None
Stunt Race FX | GSU-1 | 8 Mbit | 512 kbit | 64 kbit
Starfox 2 | GSU-1 | 8 Mbit | 512 kbit | 64 kbit
Vortex | GSU-1 | 4 Mbit | 256 kbit | None
Voxel (demo) | GSU-1 | 3 Mbit | 512 kbit | None
Powerslide (demo) | GSU-1 | 3 Mbit | 512 kbit | None
DOOM | GSU-2 | 16 Mbit | 512 kbit | None
Yoshi's Island | GSU-2-SP1 | 16 Mbit | 256 kbit | 64 kbit
Winter Gold | GSU-2 | 16 Mbit | 512 kbit | 64 kbit

Theory of Operation

The SuperFX is a co-processor for the SNES CPU. Its task is to execute complex mathematical calculations much faster than the SNES can, and to generate bitmap pictures for the simple 3D rendering used in SuperFX games. The SuperFX and SNES processors share access to a common work RAM and game pak ROM bus. Only one of the two may access the game pak ROM or RAM at any time, and ownership is controlled by special registers. Scheduling which processor owns the buses at any moment is a large part of optimizing a SuperFX program.

The RAM inside a SuperFX cartridge is separate from battery-backed save RAM. It can hold the results of calculations, a SuperFX program, bulk data, or the bitmap that the SuperFX is generating with PLOT. Existing cartridges carry 256 kbit or 512 kbit (32 KB or 64 KB) of it.

The SuperFX can fetch instructions in 3 ways: from game pak RAM, from game pak ROM (reading straight out of the ROM chip), or from a special 512 byte instruction cache.

It is possible for the SuperFX to run in parallel with the SNES CPU when using the 512 byte instruction cache. This involves loading a program into the cache and then telling the SuperFX to start its work. Code in the cache generally runs about 3x faster than code in game pak RAM or ROM. The SuperFX can interrupt the SNES CPU when it finishes processing.

When using the special bitmap functions of the SuperFX, the finished bitmap can be copied quickly from game pak RAM into SNES video RAM and displayed on the screen. The SNES is by default a tile and sprite based console, so the pixel based scene construction used in 3D rendered games is very inefficient on stock SNES hardware. In SuperFX games such as DOOM and Starwing, the SuperFX rapidly paints pixel based scene bitmaps into game pak RAM, and the result is then transferred to SNES VRAM for display many times per second.

Hardware Revisions

There are 3 different hardware revisions of the SuperFX. All revisions are functionally compatible in terms of instruction set but support different ROM sizes.

Registers

The SuperFX registers are mapped from 0x3000 to 0x32FF. Some are 16-bit; some are 8-bit. The explanation of each register is shown in this section. 0x3100-0x32FF is the Instruction Cache.

Overview

The Super FX chip has 16 general-purpose 16-bit registers labeled R0 to R15, plus 11 control registers. Additionally, the memory space from 0x3100 to 0x32FF forms the instruction cache.

Register | Address | Description | Access from SNES
R0 | 3000 | default source/destination register | R/W
R1 | 3002 | pixel plot X position register | R/W
R2 | 3004 | pixel plot Y position register | R/W
R3 | 3006 | for general use | R/W
R4 | 3008 | lower 16 bit result of lmult | R/W
R5 | 300a | for general use | R/W
R6 | 300c | multiplier for fmult and lmult | R/W
R7 | 300e | fixed point texel X position for merge | R/W
R8 | 3010 | fixed point texel Y position for merge | R/W
R9 | 3012 | for general use | R/W
R10 | 3014 | for general use | R/W
R11 | 3016 | return address set by link | R/W
R12 | 3018 | loop counter | R/W
R13 | 301a | loop point address | R/W
R14 | 301c | rom address for getb, getbh, getbl, getbs | R/W
R15 | 301e | program counter | R/W

Control Registers

Name | Address | Description | Size | Access from SNES
SFR | 3030 | status flag register | 16 bits | R/W
- | 3032 | unused | - | -
BRAMR | 3033 | Backup RAM register | 8 bits | W
PBR | 3034 | program bank register | 8 bits | R/W
- | 3035 | unused | - | -
ROMBR | 3036 | rom bank register | 8 bits | R
CFGR | 3037 | control flags register | 8 bits | W
SCBR | 3038 | screen base register | 8 bits | W
CLSR | 3039 | clock speed register | 8 bits | W
SCMR | 303a | screen mode register | 8 bits | W
VCR | 303b | version code register (read only) | 8 bits | R
RAMBR | 303c | ram bank register | 8 bits | R
- | 303d | unused | - | -
CBR | 303e | cache base register | 16 bits | R

Instruction Cache

Name | Address | Description | Size | Access from SNES
1 | 3100 | First byte of instruction cache | 8 bits | R/W
2 | 3101 | Second byte of instruction cache | 8 bits | R/W
... | ... | ... | 8 bits | R/W
512 | 32FF | Five hundred and twelfth byte of instruction cache | 8 bits | R/W

SFR Status Flag Register

The SFR is a very important register. Within the SuperFX, its flags drive branching after a calculation has been evaluated. From the SNES CPU, it can be read to determine the status of the SuperFX.

Bit Description
0 -
1 Z Zero flag
2 CY Carry flag
3 S Sign flag
4 OV Overflow flag
5 G Go flag (set to 1 when the GSU is running)
6 R Set to 1 when reading ROM using R14 address
7 -
8 ALT1 Mode set-up flag for the next instruction
9 ALT2 Mode set-up flag for the next instruction
10 IL Immediate lower 8-bit flag
11 IH Immediate higher 8-bit flag
12 B Set to 1 when the WITH instruction is executed
13 -
14 -
15 IRQ Set to 1 when GSU caused an interrupt. Set to 0 when read by 65c816
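
As a rough illustration of the bit layout above, the following Python sketch decodes a 16-bit SFR value into named flags. The names `SFR_FLAGS` and `decode_sfr` are made up for this example; only the bit positions come from the table.

```python
# Bit positions are taken from the SFR table above.
# SFR_FLAGS and decode_sfr are hypothetical helper names for this sketch.
SFR_FLAGS = {
    "Z": 1, "CY": 2, "S": 3, "OV": 4, "G": 5, "R": 6,
    "ALT1": 8, "ALT2": 9, "IL": 10, "IH": 11, "B": 12, "IRQ": 15,
}


def decode_sfr(value):
    """Return a dict mapping each SFR flag name to True or False."""
    value &= 0xFFFF  # SFR is a 16-bit register
    return {name: bool(value & (1 << bit)) for name, bit in SFR_FLAGS.items()}


# Example value with bits 15, 5 and 1 set (IRQ, G and Z).
flags = decode_sfr(0x8022)
print(flags["IRQ"], flags["G"], flags["Z"], flags["CY"])
```

On real hardware the SNES would read the two bytes at 0x3030 and 0x3031 and combine them before decoding; per the table, that read also clears the IRQ flag.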

BRAMR Backup RAM Register

Protects the save RAM inside the Game Pak. This should normally be set to 0 (write disable), and set to 1 (write enable) only while saving the game.

Bit Description
0 BRAM Flag (0 = write disable, 1=write enable)
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not Used
6 Not Used
7 Not Used

PBR Program Bank Register

When the SuperFX is loading code it references the PBR register to specify the bank being used. The LJMP instruction is the general method used to change this register.

Bit Description
0 A16 Address Select
1 A17 Address Select
2 A18 Address Select
3 A19 Address Select
4 A20 Address Select
5 A21 Address Select
6 A22 Address Select
7 A23 Address Select

ROMBR Game Pak ROM Bank Register

When using the ROM buffering system, this register specifies the bank of the game pak ROM being copied into the buffer. The ROMB instruction is the general method used to change this register.

Bit Description
0 A16 ROM Address Select
1 A17 ROM Address Select
2 A18 ROM Address Select
3 A19 ROM Address Select
4 A20 ROM Address Select
5 A21 ROM Address Select
6 A22 ROM Address Select
7 A23 ROM Address Select

CFGR Config Register

Controls the clock multiplier and interrupt mask.

Bit Description
0 Not used
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 MS0 (0=standard,1=high speed)
6 Not Used
7 IRQ (0=normal, 1=masked)

Note: If the chip is set to run at 21.4 MHz (CLSR bit 0 = 1), the MS0 flag should be set to 0.
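
A minimal sketch of how the CFGR value might be assembled, and how the note about CLSR could be checked before writing both registers. The helper names are hypothetical. The bit positions come from the CFGR table above and the CLSR table in the CLSR section.

```python
# Bit masks from the CFGR and CLSR tables; helper names are illustrative only.
CFGR_MS0 = 1 << 5    # 0 = standard, 1 = high speed multiply
CFGR_IRQ = 1 << 7    # 0 = normal, 1 = interrupt masked
CLSR_FAST = 1 << 0   # 0 = 10.7 MHz, 1 = 21.4 MHz


def make_cfgr(high_speed_multiply=False, mask_irq=False):
    """Build the 8-bit value to write to CFGR (0x3037)."""
    value = 0
    if high_speed_multiply:
        value |= CFGR_MS0
    if mask_irq:
        value |= CFGR_IRQ
    return value


def clock_config_ok(cfgr, clsr):
    """Return False for the combination the note warns against:
    21.4 MHz selected in CLSR while MS0 is set in CFGR."""
    return not ((clsr & CLSR_FAST) and (cfgr & CFGR_MS0))


print(hex(make_cfgr(mask_irq=True)))
print(clock_config_ok(make_cfgr(high_speed_multiply=True), CLSR_FAST))
```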

SCBR Screen Base Register

This register sets the starting address of the graphics storage area. It is written to directly, rather than through a specific instruction.

Bit Description
0 A10 Screen Base Select
1 A11 Screen Base Select
2 A12 Screen Base Select
3 A13 Screen Base Select
4 A14 Screen Base Select
5 A15 Screen Base Select
6 A16 Screen Base Select
7 A17 Screen Base Select
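
Because the eight bits of SCBR supply address lines A10 to A17, the screen base is the register value multiplied by 1024. The sketch below shows that conversion in both directions. The function names are invented for this example.

```python
# SCBR bits 0-7 map to address lines A10-A17 (see the table above),
# so the frame buffer always starts on a 1 KB boundary.
# Function names are hypothetical.

def screen_base_address(scbr):
    """Return the game pak RAM offset selected by an SCBR value."""
    return (scbr & 0xFF) << 10


def scbr_for_address(address):
    """Return the SCBR value for a 1 KB aligned screen base address."""
    if address % 0x400 != 0:
        raise ValueError("screen base must be a multiple of 0x400")
    if not 0 <= address <= 0x3FC00:
        raise ValueError("address out of range for an 8-bit SCBR")
    return address >> 10


print(hex(screen_base_address(0x08)))
print(hex(scbr_for_address(0x2000)))
```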

CLSR Clock Register

Controls the clock frequency of the Super FX chip.

Bit Description
0 CLSR, 0=10.7 MHz, 1=21.4 MHz
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not used
6 Not Used
7 Not used

SCMR Screen Mode Register

This register sets the number of colors and screen height for the PLOT graphics acceleration routine and additionally controls whether the Super FX or SNES has control of the game pak ROM and work RAM.

Bit Description
0 Color Mode MD0
1 Color Mode MD1
2 Screen Height HT0
3 Game pak Work RAM Access - RAN (0=SNES,1=SuperFX)
4 Game pak ROM Access - RON (0=SNES,1=SuperFX)
5 Screen Height HT1
6 Not used
7 Not used

Screen Height Truth Table

HT1 HT0 Mode
0 0 128 pixels
0 1 160 pixels
1 0 192 pixels
1 1 OBJ Mode

Color Mode Truth Table

MD1 MD0 Mode
0 0 4 colors
0 1 16 colors
1 0 Not used
1 1 256 colors
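
The two truth tables and the bit layout above can be combined into a small encoder. This is only a sketch: `make_scmr` and its parameters are hypothetical names, while the bit positions and mode values follow the tables.

```python
# (HT1, HT0) and (MD1, MD0) pairs from the truth tables above.
HEIGHT_BITS = {128: (0, 0), 160: (0, 1), 192: (1, 0), "obj": (1, 1)}
COLOR_BITS = {4: (0, 0), 16: (0, 1), 256: (1, 1)}


def make_scmr(colors, height, ran=False, ron=False):
    """Build the 8-bit SCMR value.

    colors: 4, 16 or 256; height: 128, 160, 192 or "obj".
    ran / ron: True gives the SuperFX access to game pak RAM / ROM.
    """
    md1, md0 = COLOR_BITS[colors]
    ht1, ht0 = HEIGHT_BITS[height]
    return (md0 << 0) | (md1 << 1) | (ht0 << 2) | \
           (int(ran) << 3) | (int(ron) << 4) | (ht1 << 5)


# 16 colors, 192 pixels high, SuperFX owns both RAM and ROM.
print(hex(make_scmr(16, 192, ran=True, ron=True)))
```

Note that HT1 sits in bit 5 while HT0 sits in bit 2, so the height field is not contiguous. SCMR is write-only from the SNES side, so a program would normally keep its own copy of the last value written.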

VCR Version Register

The version of the SuperFX chip in use can be read from this register.

Bit Description
0 VC0
1 VC1
2 VC2
3 VC3
4 VC4
5 VC5
6 VC6
7 VC7

RAMBR Game Pak RAM Bank Register

When transferring data between game pak work RAM and the Super FX registers, this register specifies the bank of game pak RAM being used. The RAMB instruction is the general method used to change this register. Only one bit is used, selecting RAM bank 0x70 or 0x71.

Bit Description
0 A16 (0x70 when 0, 0x71 when 1)
1 Not Used
2 Not Used
3 Not Used
4 Not Used
5 Not Used
6 Not Used
7 Not Used

CBR Cache Base Register

This register specifies the address, in either game pak ROM or work RAM, from which data will be loaded into the cache. Both the LJMP and CACHE instructions are accepted ways to change this register.

Bit Description
0 - (0 when read always)
1 - (0 when read always)
2 - (0 when read always)
3 - (0 when read always)
4 A4
5 A5
6 A6
7 A7
8 A8
9 A9
10 A10
11 A11
12 A12
13 A13
14 A14
15 A15


Memory Map

From Super NES CPU point of view

Super FX Interface: Mapped to 0x3000 to 0x32FF, in banks 0x00-0x3F and 0x80-0xBF
Game ROM: Mapped to 2 Megabytes from 0x0000-0x8000. Mirror mapped from bank 0x40 0x0000, stored in 32KB blocks.
Game Work RAM: Mapped to 128KB starting from Bank 0x70:0x0000. 8KB mapped from 0x6000 in each of bank 0x00 - 0x3F. RAM mirror is in banks 0x80-0xBF.
Game Save RAM: Mapped to 128KB from bank 0x78:0x0000
SNES CPU ROM: 6 MB ROM is mapped from bank 0x80:0x8000

From Super FX point of view

Game ROM: Mapped to 2 Megabytes from 0x0000-0x8000. 2 Megabyte mirror mapped from bank 0x40:0x0000 onwards, stored in 32KB blocks. Other memory locations viewable from the SNES should not be addressed.
Game Work RAM: Mapped to 128KB starting from Bank 0x70:0x0000.
Note: The Super FX accesses memory through three bank control registers: Program Bank Register (PBR), ROM Bank Register (ROMBR) and RAM Bank Register (RAMBR).

Instruction Set

The SuperFX instruction set is distinct from the Super Nintendo's native instruction set. It allows faster, more sophisticated 16-bit mathematical functions and includes some specific graphics manipulation functions.

Some instructions can be assembled as a single byte, with the instruction (high nibble) and its argument (low nibble) packed into the same storage byte. This allows faster execution and greater instruction density, both important objectives when designing a co-processor. One such instruction is "add", whose high nibble is 0x5 and whose low nibble selects one of the 16 general purpose SuperFX registers (0x0-0xF).

Quite a few instructions require an "ALT" instruction to be executed before the opcode. This modifies the behavior of the same opcode to perform a slightly different operation. There are 3 possible ALT codes - ALT1(0x3D), ALT2(0x3E) and ALT1+ALT2(0x3F). In the table below, the specific ALT code is listed for each instruction.
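
The nibble packing and the ALT prefixes can be sketched as a tiny encoder. The function name `encode_reg_op` is hypothetical. The opcode nibble 0x5 and the ALT values are those quoted in this section and in the instruction table.

```python
# ALT prefix bytes as listed above.
ALT1, ALT2, ALT3 = 0x3D, 0x3E, 0x3F


def encode_reg_op(opcode_nibble, reg, alt=None):
    """Encode an instruction whose low nibble is a register number.

    opcode_nibble: high nibble of the opcode (e.g. 0x5 for add/adc).
    reg: register number 0-15.
    alt: optional ALT prefix byte emitted before the opcode.
    """
    if not 0 <= opcode_nibble <= 0xF:
        raise ValueError("opcode nibble must fit in 4 bits")
    if not 0 <= reg <= 15:
        raise ValueError("register must be R0-R15")
    prefix = bytes([alt]) if alt is not None else b""
    return prefix + bytes([(opcode_nibble << 4) | reg])


print(encode_reg_op(0x5, 3).hex())        # add R3 -> 53
print(encode_reg_op(0x5, 3, ALT1).hex())  # adc R3 -> 3d53
```

The same opcode byte therefore means "add" on its own and "adc" when preceded by ALT1, which is why the ALT column matters when reading the table.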

Most instructions rely on previously selected registers for their operands. These are chosen with the FROM, TO and WITH instructions. FROM specifies the general purpose register that holds the input variable, and TO specifies the register that receives the calculation result. WITH sets both to the same register in a single instruction. The variable and result registers are known as the source and destination registers respectively.

Instruction Set Table

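Instruction | Description | ALT (hex) | Code (hex) | Arg | Length (bytes) | B | ALT1 | ALT2 | O/V | S | CY | Z | ROM | RAM | Cache | Classification | Note
adc | Add with carry | 3D | 0x5 | Rn | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
adc | Add with carry | 3F | 0x5 | #n | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
add | Add | None | 0x5 | Rn | 1 | 0 | 0 | 0 | * | * | * | * | 3 | 3 | 1 | Arithmetic Operation Instructions
add | Add | 3E | 0x5 | #n | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
alt1 | Set ALT1 mode | None | 0x3d | / | 1 | / | 1 | / | / | / | / | / | 3 | 3 | 1 | Prefix Flag Instructions
alt2 | Set ALT2 mode | None | 0x3e | / | 1 | / | / | 1 | / | / | / | / | 3 | 3 | 1 | Prefix Flag Instructions
alt3 | Set ALT3 mode | None | 0x3f | / | 1 | / | 1 | 1 | / | / | / | / | 3 | 3 | 1 | Prefix Flag Instructions
and | Logical AND | None | 0x7 | Rn | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Logical Operation Instructions
and | Logical AND | 3E | 0x7 | #n | 2 | 0 | 0 | 0 | / | * | / | * | 6 | 6 | 2 | Logical Operation Instructions
asr | Arithmetic Shift Right | None | 0x96 | / | 1 | 0 | 0 | 0 | / | * | * | * | 3 | 3 | 1 | Shift Instructions
bcc | Branch on carry clear | None | 0x0c | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bcs | Branch on carry set | None | 0x0d | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
beq | Branch on equal | None | 0x09 | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bge | Branch on greater than or equal to zero | None | 0x06 | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bic | Bit clear mask | 3D | 0x7 | Rn | 2 | 0 | 0 | 0 | / | * | / | * | 6 | 6 | 2 | Logical Operation Instructions
bic | Bit clear mask | 3F | 0x7 | #n | 2 | 0 | 0 | 0 | / | * | / | * | 6 | 6 | 2 | Logical Operation Instructions
blt | Branch on less than zero | None | 0x07 | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bmi | Branch on minus | None | 0x0b | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bne | Branch on not equal | None | 0x08 | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bpl | Branch on plus | None | 0x0a | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bra | Branch always | None | 0x05 | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bvc | Branch on overflow clear | None | 0x0e | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
bvs | Branch on overflow set | None | 0x0f | e | 2 | / | / | / | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
cache | Set cache base register | None | 0x02 | / | 1 | 0 | 0 | 0 | / | / | / | / | 3-4 | 3-4 | 1 | GSU Control Instructions
cmode | Set Plot mode | 3D | 0x4e | / | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Plot/related instructions
cmp | Compare | 3F | 0x6 | Rn | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
color | Set plot color | None | 0x4e | / | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | Plot/related instructions
dec | Decrement | None | 0xe | Rn | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Arithmetic Operation Instructions
div2 | Divide by 2 | 3D | 0x96 | / | 2 | 0 | 0 | 0 | / | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
fmult | Fractional signed multiply | None | 0x9f | / | 1 | 0 | 0 | 0 | / | * | * | * | 11 or 7 | 11 or 7 | 8 or 4 | Arithmetic Operation Instructions | Cycles depend on CFGR register
from | Set Sreg | None | 0xb | Rn | 1 | / | / | / | / | / | / | / | 3 | 3 | 1 | Prefix Register Instructions
getb | Get byte from ROM buffer | None | 0xef | / | 1 | 0 | 0 | 0 | / | / | / | / | 3-8 | 3-8 | 1-6 | Data Transfer From game pak ROM to register | Cycles vary due to ROM buffer
getbh | Get high byte from ROM buffer | 3D | 0xef | / | 2 | 0 | 0 | 0 | / | / | / | / | 6-10 | 6-9 | 2-6 | Data Transfer From game pak ROM to register | Cycles vary due to ROM buffer
getbl | Get low byte from ROM buffer | 3E | 0xef | / | 2 | 0 | 0 | 0 | / | / | / | / | 6-10 | 6-9 | 2-6 | Data Transfer From game pak ROM to register | Cycles vary due to ROM buffer
getbs | Get signed byte from ROM buffer | 3F | 0xef | / | 2 | 0 | 0 | 0 | / | / | / | / | 6-10 | 6-9 | 2-6 | Data Transfer From game pak ROM to register | Cycles vary due to ROM buffer
getc | Get byte from ROM to color register | None | 0xdf | / | 1 | 0 | 0 | 0 | / | / | / | / | 3-10 | 3-9 | 1-6 | Data Transfer From game pak ROM to register | Cycles vary due to ROM buffer
hib | Value of high byte of register | None | 0xc0 | / | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Byte transfer Instructions
ibt | Load immediate byte data | None | 0xa | Rn, #pp | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Data Transfer / Immediate data to register
inc | Increment | None | 0xd | Rn | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Arithmetic Operation Instructions
iwt | Load immediate word data | None | 0xf | Rn, #xx | 3 | 0 | 0 | 0 | / | / | / | / | 9 | 9 | 3 | Data Transfer / Immediate data to register
jmp | Jump | None | 0x9 | Rn | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | Jump, Branch and Loop Instructions
ldb | Load byte data from RAM | 3D | 0x4 | Rm | 1 | 0 | 0 | 0 | / | / | / | / | 11 | 13 | 6 | Data Transfer From game pak RAM to register
ldw | Load word data from RAM | None | 0x4 | Rm | 1 | 0 | 0 | 0 | / | / | / | / | 10 | 12 | 7 | Data Transfer From game pak RAM to register
lea | Load effective address | None | 0xf | Rn, xx | 3 | 0 | 0 | 0 | / | / | / | / | 9 | 9 | 3 | Macro Instructions
link | Link Return Address | None | 0x9 | #n | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | Jump, Branch and Loop Instructions
ljmp | Long jump | 3D | 0x9 | Rn | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Jump, Branch and Loop Instructions
lm | Load word data from RAM, using 16 bits | 3D | 0xf | Rn, (xx) | 2 | 0 | 0 | 0 | / | / | / | / | 20 | 21 | 11 | Data Transfer From game pak RAM to register
lms | Load word data from RAM, short address | 3D | 0xa | Rn, (yy) | 2 | 0 | 0 | 0 | / | / | / | / | 17 | 17 | 10 | Data Transfer From game pak RAM to register
lmult | 16x16 signed multiply | 3D | 0x9f | / | 2 | 0 | 0 | 0 | / | * | * | * | 10 or 14 | 10 or 14 | 5 or 9 | Arithmetic Operation Instructions | Cycles depend on CFGR register
lob | Value of low byte of register | None | 0x9e | / | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Byte transfer Instructions
loop | Loop | None | 0x3c | / | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Jump, Branch and Loop Instructions
lsr | Logical shift right | None | 0x03 | / | 1 | 0 | 0 | 0 | / | 0 | * | * | 3 | 3 | 1 | Shift Instructions
merge | Merge high byte of R8 and R7 | None | 0x70 | / | 1 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Byte transfer Instructions
move | Move word data from Rn' to Rn | None | 0x2n1n' | Rn, Rn' | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Data transfer register to register
moves | Move word data from Rn' to Rn and set flags | None | 0x2nBn' | Rn, Rn' | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Data transfer register to register
mult | Signed multiply | None | 0x8 | Rn | 1 | 0 | 0 | 0 | / | * | / | * | 3 or 5 | 3 or 5 | 1 or 2 | Arithmetic Operation Instructions | Cycles depend on CFGR register
mult | Signed multiply | 3E | 0x8 | #n | 2 | 0 | 0 | 0 | / | * | / | * | 6 or 8 | 6 or 8 | 2 or 3 | Arithmetic Operation Instructions | Cycles depend on CFGR register
nop | No operation | None | 0x01 | / | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | GSU Control Instructions
not | Invert all bits | None | 0x4f | / | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | Logical Operation Instructions
or | Logical OR | None | 0xc | Rn | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | Logical Operation Instructions
or | Logical OR | 3E | 0xc | #n | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Logical Operation Instructions
plot | Plot pixel | None | 0x4c | / | 1 | 0 | 0 | 0 | / | / | / | / | 3-48 | 3-51 | 1-48 | Plot/related instructions | Cycles vary due to RAM buffer and program
ramb | Set RAM data bank | 3E | 0xdf | / | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Bank Set/up Instructions
rol | Rotate left through carry | None | 0x04 | / | 1 | 0 | 0 | 0 | / | * | * | * | 3 | 3 | 1 | Shift Instructions
romb | Set ROM Data bank | 3F | 0xdf | / | 2 | 0 | 0 | 0 | / | / | / | / | 6 | 6 | 2 | Bank Set/up Instructions
ror | Rotate right through carry | None | 0x97 | / | 1 | 0 | 0 | 0 | / | * | * | * | 3 | 3 | 1 | Shift Instructions
rpix | Read pixel color | 3D | 0x4c | / | 2 | 0 | 0 | 0 | / | * | / | * | 24-80 | 24-78 | 20-74 | Plot/related instructions
sbc | Subtract with carry | 3D | 0x6 | Rn | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
sbk | Store word data, last RAM address used | None | 0x9 | / | 1 | 0 | 0 | 0 | / | / | / | / | 3-8 | 7-11 | 1-6 | Data Transfer From register to game pak RAM
sex | Sign extend register | None | 0x95 | / | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Byte transfer Instructions
sm | Store word data to RAM using 16 bits | 3E | 0xf | Rn, (xx) | 3 | 0 | 0 | 0 | / | / | / | / | 12-17 | 16-20 | 4-9 | Data Transfer From register to game pak RAM | Cycles vary due to RAM buffer and program
sms | Store word data to RAM, short address | 3E | 0xa | Rn, (yy) | 3 | 0 | 0 | 0 | / | / | / | / | 9-14 | 13-17 | 3-8 | Data Transfer From register to game pak RAM | Cycles vary due to RAM buffer and program
stb | Store byte data to RAM | 3D | 0x3 | Rm | 2 | 0 | 0 | 0 | / | / | / | / | 6-9 | 8-14 | 2-5 | Data Transfer From register to game pak RAM | Cycles vary due to RAM buffer and program
stop | Stop processor | None | 0x00 | / | 1 | 0 | 0 | 0 | / | / | / | / | 3 | 3 | 1 | GSU Control Instructions
stw | Store word data to RAM | None | 0x3 | Rm | 1 | 0 | 0 | 0 | / | / | / | / | 3-8 | 7-11 | 1-6 | Data Transfer From register to game pak RAM | Cycles vary due to RAM buffer and program
sub | Subtract | None | 0x6 | Rn | 1 | 0 | 0 | 0 | * | * | * | * | 3 | 3 | 1 | Arithmetic Operation Instructions
sub | Subtract | 3E | 0x6 | #n | 2 | 0 | 0 | 0 | * | * | * | * | 6 | 6 | 2 | Arithmetic Operation Instructions
swap | Swap low and high byte | None | 0x4d | / | 1 | 0 | 0 | 0 | / | * | / | * | 3 | 3 | 1 | Byte transfer Instructions
to | Set Dreg | None | 0x1 | Rn | 1 | / | / | / | / | / | / | / | 3 | 3 | 1 | Prefix Register Instructions
umult | Unsigned multiply | 3D | 0x8 | Rn | 2 | 0 | 0 | 0 | / | * | / | * | 6 or 8 | 6 or 8 | 2 or 3 | Arithmetic Operation Instructions | Number of cycles depends on CFGR register
umult | Unsigned multiply | 3F | 0x8 | #n | 2 | 0 | 0 | 0 | / | * | / | * | 6 or 8 | 6 or 8 | 2 or 3 | Arithmetic Operation Instructions | ?
with | Set Sreg and Dreg | None | 0x2 | Rn, ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | Prefix Register Instructions | ?
xor | Logical Exclusive Or | 3D | 0xc | Rn | 2 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | Logical Operation Instructions | ?
xor | Logical Exclusive Or | 3F | 0xc | #n | 2 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | Logical Operation Instructions | ?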

Sreg and Dreg

For certain instructions, the Sreg and Dreg must be specified before the instruction is run. The Sreg is the "Source Register" and the Dreg is the "Destination Register" - each specified as one of the 16 general purpose registers. Use of the "to", "from" and "with" instructions specifies the Sreg and Dreg.
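
To make the prefix mechanism concrete, here is a small sketch that assembles the byte sequence for "from R1, to R2, add R3". The opcode nibbles (0xB for from, 0x1 for to, 0x2 for with, 0x5 for add) are the ones listed in the instruction table. The helper names are hypothetical, and the reading that the result is R2 = R1 + R3 assumes add computes Dreg = Sreg + Rn.

```python
# Prefix opcodes from the instruction table: high nibble selects the
# prefix, low nibble selects the register. Helper names are illustrative.

def _reg_op(nibble, reg):
    if not 0 <= reg <= 15:
        raise ValueError("register must be R0-R15")
    return (nibble << 4) | reg


def op_from(reg):
    """Select Sreg (source register)."""
    return _reg_op(0xB, reg)


def op_to(reg):
    """Select Dreg (destination register)."""
    return _reg_op(0x1, reg)


def op_with(reg):
    """Select the same register as both Sreg and Dreg."""
    return _reg_op(0x2, reg)


def op_add(reg):
    """add Rn (no ALT prefix)."""
    return _reg_op(0x5, reg)


# from R1 ; to R2 ; add R3   (assumed meaning: R2 = R1 + R3)
sequence = bytes([op_from(1), op_to(2), op_add(3)])
print(sequence.hex())  # b11253
```

Using `op_with(4)` followed by `op_add(3)` would instead give the two-byte sequence 24 53, an in-place update of R4 under the same assumption.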

Bitmap Emulation

The Bitmap Emulation function is one of the major acceleration functions of the SuperFX. It allows pixel based drawing within a frame buffer, as opposed to the tile based approach of SNES VRAM. 3D rendering needs fast pixel by pixel drawing. The SuperFX provides the framework to plot individual pixels to the frame buffer quickly, and then transfer the plotted picture to SNES VRAM.
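
To show why a hardware plot function helps, the sketch below sets one pixel in a single 8x8 tile stored in the standard SNES planar tile format, where each pixel's color bits are spread across several bitplane bytes. It is a software illustration only: `plot_in_tile` is a hypothetical name, and the sketch does not model the PLOT hardware itself or how tiles are ordered inside the SuperFX frame buffer.

```python
def plot_in_tile(tile, x, y, color, bpp=4):
    """Set pixel (x, y) of one 8x8 SNES planar tile to `color`.

    tile: bytearray of 8 * bpp bytes (16, 32 or 64 bytes).
    bpp:  2, 4 or 8 bits per pixel (4, 16 or 256 colors).
    """
    if bpp not in (2, 4, 8):
        raise ValueError("bpp must be 2, 4 or 8")
    if not (0 <= x < 8 and 0 <= y < 8):
        raise ValueError("pixel outside the 8x8 tile")
    if not 0 <= color < (1 << bpp):
        raise ValueError("color does not fit in bpp bits")
    mask = 0x80 >> x  # leftmost pixel is the most significant bit
    for plane in range(bpp):
        # Planes are stored in pairs: each pair uses 16 bytes,
        # with the two planes interleaved row by row.
        offset = (plane // 2) * 16 + y * 2 + (plane % 2)
        if color & (1 << plane):
            tile[offset] |= mask
        else:
            tile[offset] &= ~mask & 0xFF


tile = bytearray(32)              # one empty 16-color tile
plot_in_tile(tile, 0, 0, 0b0101)  # touches planes 0 and 2
print(tile[0], tile[1], tile[16], tile[17])
```

A single pixel write touches up to `bpp` separate bytes with read-modify-write operations, which is the bookkeeping that makes pixel drawing slow on the stock SNES CPU.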

Fast Multiply

The SuperFX has 4 multiplication instructions.

The MULT/UMULT instructions are faster than the LMULT/FMULT instructions.
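
The four instructions can be modelled roughly as follows. The roles of R6 (multiplier) and R4 (lower 16 bits of the lmult result) come from the register table. The sketch additionally assumes, as the instructions are commonly documented, that mult and umult operate on the low 8 bits of their operands and that fmult returns the upper 16 bits of the 32-bit product. Treat those details as assumptions to verify against hardware documentation.

```python
def _s8(v):
    """Interpret the low 8 bits of v as a signed value."""
    v &= 0xFF
    return v - 0x100 if v & 0x80 else v


def _s16(v):
    """Interpret the low 16 bits of v as a signed value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def mult(sreg, rn):
    """Signed 8x8 multiply, 16-bit result (assumed operand width)."""
    return (_s8(sreg) * _s8(rn)) & 0xFFFF


def umult(sreg, rn):
    """Unsigned 8x8 multiply, 16-bit result (assumed operand width)."""
    return ((sreg & 0xFF) * (rn & 0xFF)) & 0xFFFF


def lmult(sreg, r6):
    """Signed 16x16 multiply. Returns (upper 16 bits, lower 16 bits);
    the lower half is what the register table says lands in R4."""
    product = (_s16(sreg) * _s16(r6)) & 0xFFFFFFFF
    return product >> 16, product & 0xFFFF


def fmult(sreg, r6):
    """Fractional signed multiply: upper 16 bits of the product (assumed)."""
    return lmult(sreg, r6)[0]


print(hex(mult(0xFF, 2)), hex(umult(0xFF, 2)))
print([hex(part) for part in lmult(0x4000, 0x4000)])
```

The cycle columns in the instruction table show the trade-off: mult and umult cost a handful of cycles, while lmult and fmult cost noticeably more in exchange for the wider operands.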

Compiling SuperFX routines

Whilst SNES assembly language programs can be assembled using a regular 65c816 assembler, SuperFX assembly language requires a dedicated assembler. The original assembler used for existing SuperFX games has not been released outside the closed development community.

An open source assembler called sfxasm is available for assembling SuperFX programs.

https://sourceforge.net/projects/sfxasm/

Once assembled, SuperFX programs are included in the SNES assembly language program as binary data. The SNES program then directs the SuperFX to run the preassembled program packed into the ROM.

Using the SuperFX in a SNES program

When the SNES boots up with a SuperFX game, the SuperFX chip is idle and you don't need to do anything to start the normal SNES routine of loading the ROM and executing code. Once the SNES has booted, performed its startup routines and is generally ready, the SuperFX can be activated in your program. Note that for emulators to support SuperFX instructions, the $FFD6 byte in the header must be 0x13, 0x14, 0x15 or 0x1A, and the $FFD5 byte should be 0x20.
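
A quick way to sanity-check a ROM image against the header values above is sketched below. The function name is hypothetical, and the file offset 0x7FC0 for address $FFC0 is an assumption that holds for a LoROM image without a 512 byte copier header.

```python
# Header values quoted above; the set name is illustrative.
SUPERFX_CHIP_TYPES = {0x13, 0x14, 0x15, 0x1A}


def looks_like_superfx_rom(rom, header_base=0x7FC0):
    """Check the $FFD5 and $FFD6 header bytes of a ROM image.

    rom: bytes-like object holding the ROM image.
    header_base: file offset of SNES address $FFC0 (assumed LoROM,
                 no copier header).
    """
    if len(rom) < header_base + 0x17:
        return False
    map_mode = rom[header_base + 0x15]   # $FFD5
    chip_type = rom[header_base + 0x16]  # $FFD6
    return map_mode == 0x20 and chip_type in SUPERFX_CHIP_TYPES


# Build a blank 32 KB image and fill in just the two header bytes.
image = bytearray(0x8000)
image[0x7FD5] = 0x20
image[0x7FD6] = 0x13
print(looks_like_superfx_rom(image))
```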

Initializing

The SuperFX chip should be initialized before running code. This includes setting the basic config registers.

Choosing the execution mode

As mentioned before, code can be loaded into the Super FX in 3 different ways - from ROM, game pak RAM and also the 512 byte cache. Depending which way you want to go, there is a slightly different procedure.

Setup - ROM Mode

1. Set up the Program Bank Register (PBR) for where the SuperFX program starts.
2. Give the SuperFX exclusive access to the ROM by setting the "RON" flag in the SCMR register.
3. Write the start address to the program counter (R15) in the SuperFX. Writing R15 starts execution, so this step comes last.
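
The ROM mode setup can be sketched as an ordered list of (address, value) register writes that the SNES would perform. The register addresses and the RON bit position come from the tables earlier in this tutorial. The function name and the `scmr_base` parameter are hypothetical. The program counter is written last because, as described under "Starting processing", a write to R15 is what starts the SuperFX.

```python
# Register addresses from the register tables (bank 0x00).
PBR = 0x3034
SCMR = 0x303A
R15 = 0x301E
SCMR_RON = 1 << 4  # game pak ROM access: 1 = SuperFX


def rom_mode_writes(program_bank, start_address, scmr_base=0x00):
    """Return the register writes for ROM mode, in order.

    scmr_base carries the color/height bits the program already wants
    in SCMR; the register is write-only, so it cannot be read back.
    """
    return [
        (PBR, program_bank & 0xFF),
        (SCMR, (scmr_base | SCMR_RON) & 0xFF),
        (R15, start_address & 0xFF),             # low byte
        (R15 + 1, (start_address >> 8) & 0xFF),  # high byte, written last
    ]


for address, value in rom_mode_writes(0x01, 0x8000):
    print(f"{address:04X} <- {value:02X}")
```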

Setup - RAM Mode

1. Transfer the program from ROM into game pak RAM using copy routines.
2. Set up the Program Bank Register (PBR) for where the SuperFX program starts.
3. Write to the SuperFX program counter (R15).

Setup - Cache Mode

1. Transfer the program from ROM into cache RAM (0x3100 onwards) using copy routines. The program needs to be loaded in blocks of 16 bytes, otherwise the SuperFX will not execute the instructions in a partly filled 16 byte segment. This also applies to tiny programs under 16 bytes; to get around this, write something into the 16th byte (0x310F).
2. Write to the SuperFX program counter (R15); the value is usually 0.
3. The SuperFX program will execute independently of the SNES until it hits a "STOP" instruction. When it finishes, it raises an interrupt (IRQ) on the SNES unless the interrupt mask bit in the CFGR register is set. If the interrupt is masked, the SuperFX goes idle and waits for the next command from the SNES to start execution.
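
The 16 byte block requirement in step 1 can be handled by padding the program before it is copied. The sketch below pads with the nop opcode (0x01 in the instruction table). That padding byte is a choice made for this example, and the helper names are hypothetical.

```python
CACHE_BASE = 0x3100   # first byte of the instruction cache
CACHE_SIZE = 512      # bytes
BLOCK_SIZE = 16       # the cache is filled in 16 byte segments
NOP = 0x01            # nop opcode from the instruction table


def pad_for_cache(program):
    """Pad a program with nops up to a multiple of 16 bytes."""
    program = bytes(program)
    if len(program) > CACHE_SIZE:
        raise ValueError("program does not fit in the 512 byte cache")
    remainder = len(program) % BLOCK_SIZE
    if remainder:
        program += bytes([NOP]) * (BLOCK_SIZE - remainder)
    return program


def cache_writes(program):
    """Return (address, value) pairs for copying the program to the cache."""
    return [(CACHE_BASE + i, b) for i, b in enumerate(pad_for_cache(program))]


# A one-byte program consisting of just "stop" (0x00).
writes = cache_writes(b"\x00")
print(len(writes), hex(writes[-1][0]))
```

With a one-byte program the last write lands on 0x310F, which matches the workaround described in step 1.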

Starting processing

Processing starts when the SuperFX notices that the SNES has written to its program counter register (R15).

Stopping processing

The SuperFX can be stopped in one of two ways - by executing a "stop" instruction in the SuperFX's program, or from the SNES by writing a "0" to the GO flag in the SuperFX's SFR register.

Interrupt on stop

The SuperFX raises an interrupt (IRQ) on the SNES when it executes a stop instruction. It is possible to mask the interrupt by setting the "IRQ" bit in the CFGR register. If the interrupt is not masked, check the IRQ flag bit in the SFR register to work out whether an interrupt came from the Super FX or from screen blanking.

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.