go6asm

go6asm language reference

A strict ca65-syntax cross-assembler, linker and static analyzer for the NMOS 6502 — native CLI or in-browser WebAssembly.

go6asm assembles a deliberate subset of cc65's ca65 syntax and is byte-identical to ca65/ld65 for that subset (verified by a differential test suite against cc65). It targets the original NMOS 6502 instruction set — all 151 official opcodes. Undocumented/"illegal" opcodes and the 65C02 / 6510 / 65816 variants are not yet supported. Output is a raw image, a Commodore .prg, or a relocatable .o6 object; the core never touches the filesystem, so the CLI, the browser, and embedding hosts behave identically.

What is assembly?

A CPU only runs tiny numeric instructions: load this byte, add, jump if the last result was zero. Assembly language gives each of those instructions a readable name (LDA, ADC, BEQ), plus labels and data, so you can write machine code without counting bytes by hand. An assembler translates that text into the exact bytes the chip executes; a linker decides where in the machine's memory those bytes live.

The 6502 is an 8-bit processor from 1975. The Apple II and BBC Micro ran one, and the Commodore 64 (6510) and NES (Ricoh 2A03) ran close variants. Its instruction set is small enough to hold in your head, which makes it a good place to actually meet the hardware rather than read about it. If you've only written in higher-level languages, this is the layer underneath them.

How to read this. go6asm is meant for learning as much as for building. You can write only instructions and let the assembler supply the routine boilerplate, or spell out every detail yourself — it is the same source language either way. Anything the assembler fills in for you it will also show you on request (-explain), so the short version stays a stepping stone to the explicit one rather than a thing to unlearn.

1 Authoring layers

One source language; you choose how much to write out. Each step here is optional and additive: start with just instructions and take over a piece at a time. Writing something explicitly just switches off the matching inference — nothing else about the language changes. ("Layer" is only a name for how much you spell out, not a mode you toggle.)

Layer 0 — just instructions

No directives. You write labels and opcodes; go6asm chooses the load address from the target and generates the NMI/RESET/IRQ vector block. A target's symbol pack supplies the hardware register names.

; a complete, runnable sim-tui program — no .org, no .segment
        LDA #CmdClear      ; CmdClear/RegCmd from the sim-tui pack
        STA RegCmd
done:   JMP done

Enable with go6asm -simple file.s; add -explain and the assembler prints exactly what it filled in. The generated vectors point NMI/IRQ → $0000 ("no handler") and RESET → the first instruction.
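The generated vector block is six bytes at $FFFA: NMI, RESET, then IRQ, each a little-endian address. A minimal Python sketch of that layout, assuming a reset address of $E000 purely for illustration (`vector_block` is a hypothetical helper, not part of go6asm):

```python
def vector_block(reset_addr, nmi_addr=0x0000, irq_addr=0x0000):
    """Return the 6 bytes stored at $FFFA-$FFFF in NMI, RESET, IRQ order."""
    out = bytearray()
    for addr in (nmi_addr, reset_addr, irq_addr):
        out += addr.to_bytes(2, "little")   # the 6502 stores addresses low byte first
    return bytes(out)

# NMI/IRQ -> $0000 ("no handler"), RESET -> first instruction (assumed at $E000 here)
print(vector_block(0xE000).hex(" "))  # 00 00 00 e0 00 00
```

Running the assembler with -explain remains the authoritative way to see what was actually filled in.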

Layer 1 — choose the target and load address

Single-line directives that set one thing each, leaving the rest inferred:

Directive         Effect
.target sim-tui   Select a built-in memory map (§8).
.load $0600       Override the default load address.
.org $0600        Set the assembly origin (also overrides the load address).

Layer 2 — full control

The complete subset below — you place the code and write the vector table yourself. Any explicit .segment is the bright line: it tells go6asm you are arranging memory yourself, so all Layer-0 inference switches off and you link with a config (§9).

Same program, two ways. The Layer-0 example above is not a different dialect — go6asm rewrites it into the explicit Layer-2 form (a CODE segment, a synthesized VECTORS segment, a generated config) and assembles that. The two produce byte-identical output. Layer 0 saves typing the ceremony; it never hides a different program. Run the same source with -explain to see the explicit version it built.

Object/Layer-0 namespace. The relocatable path uses a flat symbol namespace: .proc/.scope/.org and cheap/unnamed labels are rejected there with a diagnostic (E0023). Nested scopes are fully supported on the flat (non-object) path.

2 Lexical structure

One statement per line. A ; begins a comment to end of line (never inside a string or char literal). Trailing comments — and the comment line directly above an instruction — are carried through to the disassembly.

Token        Forms
Decimal      42
Hex          $1A, 0x1A
Binary       %1010, 0b1010
Char         'A' (one byte; \n \t \\ \' \xNN escapes)
String       "text" (.asciiz/.byte/.include contexts; same escapes + \")
Identifier   letter/_ then letters/digits/_

Value model. Every expression is a signed 32-bit value with ca65-parity wraparound (the same overflow/truncation semantics as ca65), independent of the 8/16-bit operand it ultimately lands in.
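To make the value model concrete, here is a small Python sketch of signed 32-bit wraparound. It assumes ordinary two's-complement behaviour and does not try to reproduce every ca65 edge case; `wrap32` is a hypothetical name used only for illustration.

```python
def wrap32(value):
    """Reduce any integer to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF                      # keep the low 32 bits
    if value & 0x80000000:                   # sign bit set: value is negative
        return value - 0x100000000
    return value

print(wrap32(0x7FFFFFFF + 1))   # -2147483648
print(wrap32(0xFFFFFFFF))       # -1
print(wrap32(40 * 2 - 1))       # 79
```

Small values pass through unchanged; only results that leave the 32-bit range wrap.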

source
; a comment runs to end of line
        LDA #42          ; decimal
        LDX #$1A         ; hex (also 0x1A)
        LDY #%10110000   ; binary (also 0b10110000)
        LDA #'A'         ; char literal
        ADC #'0'         ; '0' = $30
go6asm disassembly  addr · bytes · instruction · your comment
$0000  A9 2A        LDA #$2A                 ; decimal
$0002  A2 1A        LDX #$1A                 ; hex (also 0x1A)
$0004  A0 B0        LDY #$B0                 ; binary (also 0b10110000)
$0006  A9 41        LDA #$41                 ; char literal
$0008  69 30        ADC #$30                 ; '0' = $30

Each literal collapses to its byte: 42 → $2A, %10110000 → $B0, 'A' → $41. Strings and the \n \t \\ \" \xNN escapes go in the data directives (§7); the disassembler also threads your source comments back in.

3 Labels & constants

Form          Meaning
name:         Global label at the current address.
@name:        Cheap-local label (scoped between two global labels).
:             Unnamed label; reference as :+/:- (:++ = second forward, etc.).
name = expr   Constant assignment (may reference labels; resolved to a fixed point).

Scope qualifiers: ::name starts at the global scope; A::b descends named scopes. A constant becomes a visible symbol (memory-view / linker) only when it is marked with .export, .exportzp or .global, which is exactly ca65's rule. For example, naming a zero-page variable:

X  = $10
.exportzp X
        LDX X          ; the disassembler now shows "X", not $10

Cheap-local labels (@name:) reset between global labels; unnamed labels (:) are referenced by direction:

delay:  LDX #$FF
@loop:  DEX            ; @loop is local to 'delay'
        BNE @loop
        RTS

        LDX #0
:       LDA src,X      ; an unnamed label
        BEQ :+         ; branch forward to the next ':'
        INX
        BNE :-         ; branch back to the previous ':'
:       RTS

In a disassembly these resolve to plain addresses (a cheap or unnamed label has no name to show). Bytes the analyzer can't prove are reachable code are rendered as .byte rather than guessed at — see §10–11.
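Resolving unnamed labels by direction can be sketched in Python as a lookup into the sorted list of ':' addresses. This is an illustration only: `resolve_unnamed` is a hypothetical helper, and the addresses below assume the copy loop above starts at $0000 with src assembled as an absolute operand.

```python
from bisect import bisect_right

def resolve_unnamed(label_addrs, ref_addr, ref):
    """Resolve ':+', ':++', ':-', ':--' against ascending unnamed-label addresses.

    label_addrs: address of every ':' label, in ascending order
    ref_addr:    address of the instruction that makes the reference
    """
    count = len(ref) - 1                           # ':+' -> 1, ':++' -> 2
    split = bisect_right(label_addrs, ref_addr)    # labels[:split] lie at or before ref_addr
    index = split + count - 1 if ref[1] == "+" else split - count
    if not 0 <= index < len(label_addrs):
        raise ValueError(f"no unnamed label for {ref!r}")
    return label_addrs[index]

labels = [0x0002, 0x000A]                          # the two ':' lines (assumed addresses)
print(hex(resolve_unnamed(labels, 0x0005, ":+")))  # 0xa  (BEQ :+)
print(hex(resolve_unnamed(labels, 0x0008, ":-")))  # 0x2  (BNE :-)
```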

4 Expressions

The ca65 operator set, lowest to highest precedence. * alone is the current program counter.

Tier             Operators
boolean          ||   &&   .xor
comparison       = <> < > <= >=
additive         + - | ^ (bitwise or/xor)
multiplicative   * / .mod & << >>
unary            + - ~ !   < low byte   > high byte   ^ bank byte

In a relocatable context an operand must reduce to symbol [± constant] (optionally under </>); a same-segment label difference cancels to a constant. Anything more complex there is E0026 — never a silent miscompile.

source
WIDTH = 40
base  = $0400
        LDA #<base       ; low byte  -> $00
        LDX #>base       ; high byte -> $04
        LDY #WIDTH*2-1   ; 32-bit math, then truncated -> $4F
loop:   JMP *            ; '*' = THIS instruction (spin here)
go6asm disassembly
$0000  A9 00        LDA #$00                 ; low byte  -> $00
$0002  A2 04        LDX #$04                 ; high byte -> $04
$0004  A0 4F        LDY #$4F                 ; 32-bit math, then truncated -> $4F
loop:
$0006  4C 06 00     JMP loop                 ; '*' = THIS instruction (spin here)

<base/>base resolve to $00/$04, WIDTH*2-1 is computed in 32-bit then truncated to the one-byte operand ($4F), and * is the address of its own instruction — so JMP * is the classic spin-to-self (the disassembler even names it loop).
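The byte-selection operators are plain shifts and masks. A short Python sketch of the arithmetic in the example above (the function names `lo`, `hi` and `bank` are illustrative, not go6asm identifiers):

```python
def lo(value):
    """Unary '<': bits 0-7."""
    return value & 0xFF

def hi(value):
    """Unary '>': bits 8-15."""
    return (value >> 8) & 0xFF

def bank(value):
    """Unary '^': bits 16-23."""
    return (value >> 16) & 0xFF

WIDTH = 40
base = 0x0400
print(f"${lo(base):02X} ${hi(base):02X} ${lo(WIDTH * 2 - 1):02X}")  # $00 $04 $4F
```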

5 Addressing modes

All NMOS modes. go6asm auto-sizes zero-page vs absolute from the operand's value (unknown/forward → absolute, ca65-parity). Force with z: (zero page) or a: (absolute).

Mode                          Syntax
Implied / Accumulator         RTS   ASL A
Immediate                     LDA #$10   LDA #<label
Zero page (,X ,Y)             LDA $10   LDA $10,X
Absolute (,X ,Y)              LDA $1000,X
Indirect                      JMP ($1000)
(Indirect,X) / (Indirect),Y   LDA ($10,X)   LDA ($10),Y
Relative                      BEQ label (-128 to +127 bytes from the next instruction)

source
        LDA #$00         ; immediate
        STA $10          ; zero page (auto-sized: $10 < $100)
        STA $0400,X      ; absolute,X
        LDA ($20),Y      ; (indirect),Y
        LDA a:$0040      ; force absolute (else this would be zero page)
        JMP ($FFFC)      ; indirect
go6asm disassembly
$0000  A9 00        LDA #$00                 ; immediate
$0002  85 10        STA $10                  ; zero page (auto-sized: $10 < $100)
$0004  9D 00 04     STA $0400,X              ; absolute,X
$0007  B1 20        LDA ($20),Y              ; (indirect),Y
$0009  AD 40 00     LDA $0040                ; force absolute (else this would be zero page)
$000C  6C FC FF     JMP ($FFFC)              ; indirect

Note the byte counts: STA $10 is two bytes (zero page, auto-sized because $10 < $100) while LDA a:$0040 is three (a: forced absolute, AD 40 00) even though $0040 would otherwise fit in zero page.
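The sizing rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not go6asm's implementation: it covers only LDA and STA, and the `force` and `known` parameters are hypothetical stand-ins for the z:/a: prefixes and for forward references.

```python
# Official opcodes for the two mnemonics used in the example above.
OPCODES = {
    ("LDA", "zp"): 0xA5, ("LDA", "abs"): 0xAD,
    ("STA", "zp"): 0x85, ("STA", "abs"): 0x8D,
}

def encode(mnemonic, value, force=None, known=True):
    """Pick zero page vs absolute, then emit opcode + operand bytes."""
    if force == "z":
        mode = "zp"
    elif force == "a" or not known or value > 0xFF:
        mode = "abs"                      # unknown/forward values fall back to absolute
    else:
        mode = "zp"
    opcode = OPCODES[(mnemonic, mode)]
    if mode == "zp":
        return bytes([opcode, value])     # bytes() rejects values outside 0..255
    return bytes([opcode]) + value.to_bytes(2, "little")

print(encode("STA", 0x10).hex(" "))             # 85 10
print(encode("LDA", 0x0040, force="a").hex(" "))  # ad 40 00
```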

6 Instructions

All 151 official NMOS 6502 opcodes (the table is generated by probing ca65 and count-locked in tests). Branch displacement is checked (E0014). Undocumented opcodes are rejected (E0001); --allow-illegal and other CPU variants are post-v1.
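The branch range check follows from how the 6502 encodes a branch: one signed byte, relative to the address of the next instruction. A Python sketch of that calculation (`branch_offset` is a hypothetical helper; go6asm reports the failure as E0014 rather than raising an exception):

```python
def branch_offset(branch_addr, target_addr):
    """Return the displacement byte for a 2-byte branch located at branch_addr."""
    delta = target_addr - (branch_addr + 2)    # measured from the following instruction
    if not -128 <= delta <= 127:
        raise ValueError(f"branch out of range: {delta}")
    return delta & 0xFF                        # two's-complement encoding

# DEX at $0002, BNE back to it from $0003: the classic D0 FD
print(f"{branch_offset(0x0003, 0x0002):02X}")  # FD
```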

7 Directives

Data

.byte / .byt    Bytes; accepts string literals.
.word / .addr   16-bit little-endian words.
.dbyt           16-bit big-endian.
.asciiz         String + terminating $00.
.res n[,fill]   Reserve n bytes (fill, default 0).
.incbin "f"     Include raw bytes from a host-supplied file.

Layout & symbols

.org $XXXX            Set assembly origin (flat path).
.segment "NAME"       Switch segment (Layer 2 / object).
.target / .load       Layer-1 inference controls (§1).
.export / .exportzp   Make a label/constant visible to the linker.
.import / .importzp   Declare an external symbol.
.global / .globalzp   Export if defined here, else import.

Structure, macros, conditionals

.proc/.endproc · .scope/.endscope   Nested scopes (flat path).
.macro/.endmacro · .paramcount      Parameterised macros.
.if/.elseif/.else/.endif            Conditional assembly.
.ifdef / .ifndef                    Defined-symbol conditionals.
.repeat/.endrepeat                  Counted repetition.
.assert cond, action, "msg"         Build-time assertion (error/warning).
.include "f"                        Splice another in-memory source file.

A worked example — a parameterised macro and a build-time switch (assembled with the flat-64k target; DEBUG not defined):

source
.macro POKE addr, val      ; parameterised macro
        LDA #val
        STA addr
.endmacro

.export reset
reset:  POKE $D020, $00    ; the macro expands inline
.ifdef DEBUG
        BRK                ; only with -D DEBUG
.endif
        RTS
go6asm disassembly
reset:
$0200  A9 00        LDA #$00
$0202  8D 20 D0     STA $D020
$0205  60           RTS

The POKE invocation became LDA #$00 / STA $D020 inline, and the .ifdef DEBUG block produced nothing because DEBUG wasn't defined — both happen before code generation, so they leave no trace in the bytes. Data directives (.byte, .word, .res, .asciiz, .repeat) emit their bytes as listed; the disassembler renders anything that isn't reached as code as .byte (§10–11).
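As a rough model of what the data directives emit, the Python sketch below mirrors the byte layouts listed in the Data table. The `d_*` function names are hypothetical, strings are assumed to be plain ASCII, and range handling is simplified compared with a real assembler.

```python
def d_byte(*items):
    """.byte: integers become single bytes, strings become their characters."""
    out = bytearray()
    for item in items:
        out += item.encode("ascii") if isinstance(item, str) else bytes([item])
    return bytes(out)

def d_word(value):
    """.word / .addr: 16-bit little-endian."""
    return value.to_bytes(2, "little")

def d_dbyt(value):
    """.dbyt: 16-bit big-endian."""
    return value.to_bytes(2, "big")

def d_asciiz(text):
    """.asciiz: the string followed by a terminating $00."""
    return text.encode("ascii") + b"\x00"

def d_res(count, fill=0):
    """.res n[,fill]: n copies of the fill byte."""
    return bytes([fill]) * count

print(d_word(0x1234).hex(" "), "|", d_dbyt(0x1234).hex(" "))  # 34 12 | 12 34
```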

8 Targets & memory maps

Target     Where things are
raw ("")   Flat, code @ $0200, no vectors. Learn the bare CPU.
flat-64k   One 64 KB image, code @ $0200, vectors @ $FFFA.
sim-tui    ROM @ $E000; RAM $0000–$1FFF; VIC/VIA I/O $A000–$DFFF; vectors $FFFA; $2000–$9FFF unmapped. Carries a symbol pack (RegCmd, CharBase, ViaIFR, …).
lcm-32     carledwards/lcm-32 bare-metal board (map from carledwards/6502-simulator): RAM $0000–$1FFF; two W65C22 VIAs at $6000/$8000; ROM @ $E000; vectors $FFFA. Symbol pack of the VIA registers (VIA1_DDRA, VIA1_ORA, …).
c64        Commodore .prg, load @ $0801 (BASIC start).

A target supplies the default load address, the vector base, a symbol pack, and the device region map the static analyzer judges against. Custom maps use an ld65 config subset (MEMORY/SEGMENTS) passed with -C.

9 Objects & linking

The relocatable .o6 format ("GO6O") holds named segments at origin 0, a symbol table (export/import/local), and relocations (low/high byte, word, branch). The linker places segments per a memory-map config, resolves symbols, patches relocations, applies fills, and emits the image. Built-in configs: flat-64k, sim-tui, c64. The whole path is byte-identical to ld65 for the supported subset.
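Patching a relocation means writing the final symbol value into the bytes of a segment that was assembled at origin 0. The Python sketch below illustrates the low-byte, high-byte and word cases only (branch relocations are omitted). The record shape and the `apply_relocation` name are assumptions for illustration and say nothing about the actual .o6 layout.

```python
def apply_relocation(image, offset, kind, value):
    """Patch one relocation site inside a mutable image (bytearray)."""
    if kind == "low":
        image[offset] = value & 0xFF
    elif kind == "high":
        image[offset] = (value >> 8) & 0xFF
    elif kind == "word":
        image[offset:offset + 2] = (value & 0xFFFF).to_bytes(2, "little")
    else:
        raise ValueError(f"unsupported relocation kind: {kind}")

# "done: JMP done" assembled at origin 0, with a word relocation on its operand.
segment = bytearray([0x4C, 0x00, 0x00])
segment_base = 0xE000              # hypothetical placement chosen by the linker
done = segment_base + 0            # 'done' sits at segment offset 0
apply_relocation(segment, 1, "word", done)
print(segment.hex(" "))            # 4c 00 e0
```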

10 Static analysis

From the reset/NMI/IRQ vectors (and program start), go6asm traces every path the CPU can take and reports — without running the code:

Code    Finding
A0001   A BRK reached on a normal-execution path.
A0002   An illegal/undefined opcode reached as code (executing data).
A0003   JMP (indirect) — an analysis boundary (Note).
A0004   A store into a read-only region (writing ROM).
A0005   An access to unmapped memory.
A0006   Control leaves the analyzed image (Note).

Honesty rule. What it can't bound (indirect/computed addresses, jump tables, self-modifying code) is reported as a boundary ("analysis stops here"), never silently passed and never asserted safe. Memory checks are device-specific: the same code can be fine on one target and a bug on another.
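The device-specific memory checks can be pictured as a lookup against the target's region map. The Python sketch below uses the sim-tui ranges from §8 and assumes the ROM runs from $E000 to $FFFF. `check_access` is a hypothetical helper that models only A0004 and A0005, not the analyzer's path tracing.

```python
SIM_TUI = [
    (0x0000, 0x1FFF, "ram"),
    (0xA000, 0xDFFF, "io"),
    (0xE000, 0xFFFF, "rom"),   # assumption: ROM extends to the top of memory
]

def check_access(addr, is_store, regions=SIM_TUI):
    """Return a finding code for one memory access, or None if it looks fine."""
    for start, end, kind in regions:
        if start <= addr <= end:
            if is_store and kind == "rom":
                return "A0004"         # store into a read-only region
            return None
    return "A0005"                     # address is in no mapped region

print(check_access(0xE000, is_store=True))    # A0004
print(check_access(0x2000, is_store=False))   # A0005
print(check_access(0x0010, is_store=True))    # None
```

Passing a different region list shows why the same store can be clean on one target and a finding on another.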

11 Outputs

.bin          Raw linked image.
.prg          Commodore image (2-byte load-address header).
.o6           Relocatable object (-obj).
listing       Address / bytes / source (-l).
.lbl          VICE monitor labels (al C:$ADDR .name).
.sym          Canonical symbol JSON.
disassembly   Decoded + symbol-substituted + your source comments threaded back in.
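The difference between .bin and .prg is only the two-byte header: a Commodore .prg starts with its load address, low byte first. A minimal Python sketch (`make_prg` is a hypothetical helper, shown only to illustrate the layout):

```python
def make_prg(load_addr, image):
    """Prefix a raw image with its little-endian 2-byte load address."""
    return load_addr.to_bytes(2, "little") + bytes(image)

# A raw image loaded at the c64 target's default address, $0801
print(make_prg(0x0801, b"\xA9\x00").hex(" "))  # 01 08 a9 00
```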

12 CLI & JS API

go6asm [flags] file.s [more.s|data.bin ...]
  -o out        output file              -org $ADDR  origin
  -t target     built-in memory map      -C cfg      ld65-style config
  -D N=V        predefine a constant     -obj        emit .o6
  -simple       Layer-0 inference        -explain    print inferences
  -disasm       annotated disassembly    -analyze    static analysis
  -prg          Commodore .prg           -lbl/-sym   debug sidecars
go6ld  a.o6 ... -t target [-o out] [-prg] [-m map] [-L lbl] [-S sym]
go6dump file.o6     ; inspect a relocatable object

In the browser, go6asm.assemble({source, target, obj, defines}) returns {success, bytes, origin, symbols, comments, disasm, analysis, lbl, sym, object, errors} — the same engine, no server.

13 Diagnostics

Every phase emits one shared diagnostic shape with a ca65-style caret:

error[E0014]: branch out of range
  --> game.s:42:5
   |
42 |     BEQ done
   |     ^^^^^^^^
   |
   = hint: use the inverse branch + JMP for a long branch

Code ranges: E0001–E00xx errors (assembly/link), I0001+ informational (Layer-0 inferences, silent unless -explain), A0001–A0006 static-analysis findings.

This reference describes what is implemented today. Roadmap items (stdlib includes, more CPU variants, additional analyzer checks, MCP deploy) live in the project design doc. Try any of it in the playground.