===================================================================
Atari 2600 TIA Hardware Notes
===================================================================

v1.0 6-March-2003
by Andrew Towers
mariofrog@bigpond.com


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ TIA Hardware Notes (a Small Opus on the TIA)

Following is a whole bunch of notes on the TIA I made while I
was trying to work out how the whole thing is put together.
You'll need a copy of the TIA schematics to understand the more
complicated bits of this since I was looking at them when I
wrote it. According to my copy they were scanned in by Mark De
Smet. They are now available for download from Atari Compendium.

I started out searching through the stella archives for any info
on triggering the players more than once per scanline (at the
time I wanted to draw more than the 12 copies possible by flicker
and 3-repeat) - and I came across Eckhard Stolberg's 'grid2' demo
from Oct 1998, followed by a long series of threads over several
months discussing how the technique actually manages to work =)
From all the articles it looked like a complete black art and
no-one had a theory that would explain it fully.

Then I came across the TIA-1A schematics, and proceeded to spend
the next 3-4 days solid drinking copious amounts of coffee and
taking very little sleep while I tried to figure the whole mess
out from the gate level up. (hey, the 2600 is a new hobby, I can
splurge ;) In the end I found that as usual writing a 'few quick
notes' turned into 'writing a tutorial' or, 'a small opus on the
TIA'. So, here we go.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Polynomial Counters, what the heck is this?

Almost all of the timing and counting within the TIA is implemented
in the form of "polynomial counters", so this seems a good place to
start. If you've never come across these before (I hadn't) they
seem a really obscure way to go about counting things, but they're
very small and simple to implement and therefore cheap on silicon.
They also have the useful property that 'adding 1' takes constant
time (unlike a ripple-carry adder/counter) - as long as you don't
want to know where you're up to in traditional binary numeric form,
they're perfect ;)

Actually, as the TIA designers pointed out, you can use a small
lookup table to convert from one to the other, and you can compare
two counter states to see if you're up to the same count without
knowing the numeric values. But this is getting off track and
hand-wavery. If you want to know the maths behind polynomial
counters I suggest you look elsewhere, I'm no mathematician ;)
These things seem to be used as pseudo-random number generators or
noise generators (see the TIA sound generator, ditto for the GBA)
more than anything else.

A polynomial counter (actually a form of "Linear Feedback Shift
Register") consists of a shift register, as the name suggests,
with some sort of feedback logic - in this case a single two-
input XNOR gate obfuscated in NOR logic. They have the property
that they will step through up to (2^n)-1 unique states when
optimally wired up, from any starting state except for the illegal
state (and of course it's possible to power-up in the illegal
state =) so for a 6-bit shift register there can be at most 63
valid states.
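
To make this concrete, here is a minimal Python sketch of a 6-bit
counter of this kind. The tap positions are my assumption: XNOR of
the two rightmost bits, shifted in from the left, which is what the
counter state tables later in these notes appear to follow. It was
not read off the schematics. The names lfsr_step and lfsr_sequence
are made up for illustration.

```python
def lfsr_step(state):
    """Advance a 6-bit polynomial counter by one count.

    `state` is a 6-character bit string, written as in the tables.
    Assumption: feedback is the XNOR of the two rightmost bits and
    is shifted in from the left (inferred from the tables only).
    """
    feedback = "1" if state[4] == state[5] else "0"
    return feedback + state[:5]


def lfsr_sequence(start="000000"):
    """Return every state visited before the counter is back at start."""
    states = [start]
    state = lfsr_step(start)
    while state != start:
        states.append(state)
        state = lfsr_step(state)
    return states


if __name__ == "__main__":
    seq = lfsr_sequence()
    print(len(seq))             # length of the free-running cycle
    print(seq[:6])              # first few states
    print(lfsr_step("111111"))  # the illegal state maps to itself
```

Left to free-run, this model visits 63 states and never reaches
111111, while 111111 steps to itself, which is why it needs an
explicit reset.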

The TIA uses the same polynomial counter circuit for all of its
horizontal counters - there is a HSync counter, two Player
Position and two Missile Position counters, and the Ball Position
counter. The sound generator has a more complex design involving
another polynomial counter or two - I haven't delved into the
workings of this one yet.

Beside each counter there is a two-phase clock generator. This
takes the incoming 3.58 MHz colour clock (CLK) and divides by
4 using a couple of flip-flops. Two AND gates are then used to
generate two independent clock signals thusly:
 __          __          __
_| |_________| |_________| |_________  PHASE-1 (H@1)
       __          __          __
_______| |_________| |_________| |___  PHASE-2 (H@2)

You'll need a fixed-spacing font to make sense of that.
The two clock lines are used to perform a two-step increment
of the counter, as well as being used independently to move
data through the supporting clocked logic.
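
As a rough sketch of the divide-by-4 arrangement, the two phases
can be modelled as below. Which CLK within each group of four
raises H@1 and which raises H@2 is an assumption here (first and
third), chosen only so that the phases are evenly spaced and never
overlap; two_phase is a made-up name.

```python
def two_phase(clk_count):
    """Return the (H@1, H@2) levels for a given CLK number.

    Assumption: H@1 is high on the first CLK of every four and
    H@2 on the third, so the two phases never overlap.
    """
    phase = clk_count % 4
    return (1 if phase == 0 else 0, 1 if phase == 2 else 0)


if __name__ == "__main__":
    # Print three full H@1-H@2 cycles, one character per CLK.
    print("".join(str(two_phase(t)[0]) for t in range(12)), "H@1")
    print("".join(str(two_phase(t)[1]) for t in range(12)), "H@2")
```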

This concept seems to come up a -lot- in the TIA, I think it's
some sort of Zen NMOS thing, it seems to go hand in hand with
storing data in back of inverters all over the place (a * is used
in the TIA schematics to denote this), and building bit-shifting
chains into your data storage so you don't need addressing ;p
If you've ever wondered why the Playfield bit order is so obscure,
now you know.

Each counter has a wired-AND counter state decode matrix (woo..)
connected in parallel with the shift register. In every case,
the top line of the decoder on the schematics checks for '111111'
and forces a Reset if it is encountered. This is to prevent the
counter getting stuck in the illegal state when it powers up as
mentioned earlier.

Also in every case, the next decoder line is the 'wrap-around'
value - when this state comes up, the counter does a self-reset
to 000000 on the next phase-2 clock, and usually does something
useful like generating a START signal for graphics output.

The rest of the counter decodes depend entirely on which counter
we're looking at, so let's get into 'em.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Horizontal Sync Counter

The HSync counter counts from 0 to 56 once for every TV scan-line
before wrapping around, a period of 57 counts at 1/4 CLK (57*4=228
CLK). The counter decodes shown below provide all the horizontal
timing for the control lines used to construct a valid TV signal.

This table shows the elapsed number of CLK, CPU cycles, Playfield
(PF) bits and Playfield pixels at the start of each counter state
(ie when the counter changes to this state on the rising edge of
the H@2 clock). The decoded control lines are usually clocked into
other logic blocks during the next H@1-H@2 cycle (within 4 CLK).

Value	HCount	CLK	CPU	PF	Pixel	Control

000000	0	0	0
100000	1	4	1.3
110000	2	8	2.6
111000	3	12	4
111100	4	16	5.3			Set H-SYNC [SHS]
111110	5	20	6.6
011111	6	24	8
101111	7	28	9.3
110111	8	32	10.6			Reset H-SYNC [RHS]
111011	9	36	12
111101	10	40	13.3
011110	11	44	14.6
001111	12	48	16			ColourBurst [RCB]
100111	13	52	17.3
110011	14	56	18.6
111001	15	60	20
011100	16	64	21.3			Reset H-BLANK [RHB]
101110	17	68	22.6	0	0
010111	18	72	24	1	4	Late RHB [LRHB]
101011	19	76	25.3	2	8
110101	20	80	26.6	3	12
011010	21	84	28	4	16
001101	22	88	29.3	5	20
000110	23	92	30.6	6	24
000011	24	96	32	7	28
100001	25	100	33.3	8	32
010000	26	104	34.6	9	36
101000	27	108	36	10	40
110100	28	112	37.3	11	44
111010	29	116	38.6	12	48
011101	30	120	40	13	52
001110	31	124	41.3	14	56
000111	32	128	42.6	15	60
100011	33	132	44	16	64
110001	34	136	45.3	17	68
011000	35	140	46.6	18	72
101100	36	144	48	19	76	Center [CNT]
110110	37	148	49.3	20	80
011011	38	152	50.6	21	84
101101	39	156	52	22	88
010110	40	160	53.3	23	92
001011	41	164	54.6	24	96
100101	42	168	56	25	100
010010	43	172	57.3	26	104
001001	44	176	58.6	27	108
000100	45	180	60	28	112
100010	46	184	61.3	29	116
010001	47	188	62.6	30	120
001000	48	192	64	31	124
100100	49	196	65.3	32	128
110010	50	200	66.6	33	132
011001	51	204	68	34	136
001100	52	208	69.3	35	140
100110	53	212	70.6	36	144
010011	54	216	72	37	148
101001	55	220	73.3	38	152	
010100	56	224	74.6	39	156	RESET, HBLANK [SHB]
101010	57	(228)	(76)	(40)	(160)	(already at 000000)
010101	58	232	-	-	-
001010	59	236	-	-	-
000101	60	240	-	-	-
000010	61	244	-	-	-
000001	62	248	-	-	-
000000	0	0	-	-	-	(cycle)
111111	-	-	-	-	-	ERROR (Reset to 000000)

Key:
SHS   Turn on the TV HSYNC signal to start Horizontal flyback.
RHS   Turn off the HSYNC signal, delayed 4 CLK.
RCB   Reset Colour Burst, delayed 4 CLK latching [CB].
RHB   Reset HBlank (enable output), delayed 4 CLK latching [HB].
LRHB  Late RHB, used instead of RHB when [HMOVE] latch is set.
CNT   Center screen, start copy/reflect PF, delayed 4 CLK for [CNTD].
SHB   Start HBlank (disable output), Reset HCount to 000000.

The HSync counter resets itself after 57 counts; the decode on
HCount=56 performs a reset to 000000 delayed by 4 CLK, so
HCount=57 becomes HCount=0. This gives a period of 57 counts
or 228 CLK.
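
The numeric columns of the table above follow directly from the
HCount value, which can be sketched as below. The function name
hsync_row and the dictionary keys are illustrative only; the
arithmetic simply restates the table (4 CLK per count, 3 CLK per
CPU cycle, Playfield bit 0 at HCount=17).

```python
def hsync_row(hcount):
    """Rebuild the CLK / CPU / PF / Pixel columns for one HCount."""
    if not 0 <= hcount <= 56:
        raise ValueError("HCount runs from 0 to 56")
    clk = 4 * hcount                  # one count every 4 CLK
    row = {"clk": clk, "cpu": clk / 3, "pf": None, "pixel": None}
    if hcount >= 17:                  # first PF bit in the table
        row["pf"] = hcount - 17
        row["pixel"] = 4 * (hcount - 17)
    return row


if __name__ == "__main__":
    for hcount in (0, 16, 17, 36, 56):
        print(hcount, hsync_row(hcount))
```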

Playfield pixels start on the [RHB] control line at CLK=64, but
the first visible pixel won't appear until CLK=68 due to the
clocking on its output. The [CNT] control line either starts the
Playfield again as normal, or starts a reverse-shifted copy when
reflect-playfield [REF] is enabled.

RSYNC resets the two-phase clock for the HSync counter to the
H@1 rising edge when strobed. It looks like this could be used
to move the HSync counter into phase with the CPU on any cycle
(although there is some auto-synchronisation between the two-phase
clock and the div-by-3 counter for the CPU clock, I haven't looked
into this yet.) A full H@1-H@2 cycle after RSYNC is strobed, the
HSync counter is also reset to 000000 and HBlank is turned on.
This one requires more investigation.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Player 0 and Player 1 Horizontal Position Counters

There are two independent Player Horizontal Position Counters, one
each for player 0 and player 1. The counters are identical; only
one is drawn in the schematics. This section describes only the
player 0 counter.

The player position counter controls the position of the player
graphics object (P0) on each scanline. The player counter counts
from 0 to 39 and then wraps around, giving a period of 40 counts
at 1/4 CLK (160 CLK) - also the number of visible pixels on a
scanline.

This table shows the elapsed number of CLK and CPU cycles at the
beginning of each counter state (the CPU column isn't particularly
relevant). Each START decode is delayed by 4 CLK in decoding, plus
a further 1 CLK to latch the START at the graphics scan counter.
The START decodes are ANDed with flags from the NUSIZ register
before being latched, to determine whether to draw that copy.
Actual graphics output is shown in parentheses for non-stretched
copies of the player.

Value	PCount	CPU	CLK	Event

000000	0	0	0	(draw -012)
100000	1	1.3	4	(draw 3456)
110000	2	2.6	8	(draw 7---)
111000	3	4	12	START DRAWING (NUSIZ=001,011)
111100	4	5.3	16	(draw -012)
111110	5	6.6	20	(draw 3456)
011111	6	8	24	(draw 7---)
101111	7	9.3	28	START DRAWING (NUSIZ=011,010,110)
110111	8	10.6	32	(draw -012)
111011	9	12	36	(draw 3456)
111101	10	13.3	40	(draw 7---)
011110	11	14.6	44
001111	12	16	48
100111	13	17.3	52
110011	14	18.6	56
111001	15	20	60	START DRAWING (NUSIZ=100,110)
011100	16	21.3	64	(draw -012)
101110	17	22.6	68	(draw 3456)
010111	18	24	72	(draw 7---)
101011	19	25.3	76
110101	20	26.6	80
011010	21	28	84
001101	22	29.3	88
000110	23	30.6	92
000011	24	32	96
100001	25	33.3	100
010000	26	34.6	104
101000	27	36	108
110100	28	37.3	112
111010	29	38.6	116
011101	30	40	120
001110	31	41.3	124
000111	32	42.6	128
100011	33	44	132
110001	34	45.3	136
011000	35	46.6	140
101100	36	48	144
110110	37	49.3	148
011011	38	50.6	152
101101	39	52	156	RESET, START DRAWING (always)
010110	40	53.3	160	(already at 000000)
001011	41	54.6	
100101	42	56	
010010	43	57.3	
001001	44	58.6	
000100	45	60	
100010	46	61.3	
010001	47	62.6	
001000	48	64	
100100	49	65.3	
110010	50	66.6	
011001	51	68	
001100	52	69.3	
100110	53	70.6	
010011	54	72	
101001	55	73.3	
010100	56	74.6	
101010	57	76	
010101	58	-
001010	59	-
000101	60	-
000010	61	-
000001	62	-
000000	0	-		(cycle)
111111	-	-		ERROR (Reset to 000000)

The graphics output for players contains some extra clocking
logic not present for the Playfield or other screen objects.
It takes 1 additional CLK to latch the player START signal.
The rest of the clocking logic is in common with the other
graphics objects; therefore we can say that player graphics
are delayed by 1 CLK. This is why the leftmost possible start
position for a RESP0 is pixel 1, not pixel 0. (You can HMOVE
the player further left though, if necessary.)

The most important thing to note about the player counter is
that it only receives CLK signals during the visible part of
each scanline, when HBlank is off; exactly 160 CLK per scanline
(except during HMOVE). During the other 68 CLK per line, the
counter lies dormant on the exact 1/4 phase it was up to.
The [MOTCK] (motion clock?) line supplies the CLK signals
for all movable graphics objects during the visible part of
the scanline. It is an inverted (out of phase) CLK signal.

This arrangement means that resetting the player counter on any
visible pixel will cause the main copy of the player to appear
at that same pixel position on the next and subsequent scanlines.
There are 5 CLK worth of clocking/latching to take into account,
so the actual position ends up 5 pixels to the right of the
reset pixel (approx. 9 pixels after the start of STA RESP0).

For better or worse, the manual 'reset' signal (RESP0) does not
generate a START signal for graphics output. This means that
you must always do a 'reset' then wait for the counter to
wrap around (160 CLK later) before the main copy of the player
will appear. However, if you have any of the 'close', 'medium'
or 'far' copies of the player enabled in NUSIZ, these will be
drawn on the current and subsequent scanlines as the appropriate
decodes are reached and generate their START signals.
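
The START decodes from the player table can be summarised in a
few lines of Python. This is only a restatement of the table; the
names START_DECODES, start_counts and copy_offsets are invented
for the example, and the offsets assume non-stretched copies.

```python
# PCount value -> NUSIZ copy modes (low 3 bits) that enable its START.
START_DECODES = {
    3: {0b001, 0b011},          # 'close' copy
    7: {0b011, 0b010, 0b110},   # 'medium' copy
    15: {0b100, 0b110},         # 'far' copy
}
WRAP_COUNT = 39                 # main copy, always enabled


def start_counts(nusiz):
    """Return the PCount values that raise START for this NUSIZ."""
    mode = nusiz & 0b111
    counts = [c for c, modes in sorted(START_DECODES.items()) if mode in modes]
    counts.append(WRAP_COUNT)
    return counts


def copy_offsets(nusiz):
    """Pixel offsets of the extra copies relative to the main copy."""
    return [4 * (c + 1) for c in start_counts(nusiz) if c != WRAP_COUNT]


if __name__ == "__main__":
    for nusiz in range(8):
        print(format(nusiz, "03b"), start_counts(nusiz), copy_offsets(nusiz))
```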


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Player 0 and Player 1 Graphics Scan Counters

The Player Graphics Scan Counters are 3-bit binary ripple counters
attached to the player objects, used to determine which pixel
of the player is currently being drawn by generating a 3-bit
source pixel address. Apart from the 4-bit HMOVE counter described
later, these are the only binary ripple counters in the TIA.

The Scan Counters are never reset, so once the counter receives
the Start signal it will count fully from 0 to 7. Counting is
only performed during the visible part of the scanline since
it is driven by the [MOTCK] line used to advance the Player
Position Counter. This gives rise to "sprite wrapping" whereby
a player positioned so it ends past the righthand side of the
screen will finish drawing at the beginning of the next scanline.
Note that a HMOVE can gobble up the wrapped player graphics -
see below.

The count frequency is determined by the NUSIZ register for that
player; this is used to selectively mask off the clock signals to
the Graphics Scan Counter. Depending on the player stretch mode,
one clock signal is allowed through every 1, 2 or 4 graphics CLK.
The stretched modes are derived from the two-phase clock; the H@2
phase allows 1 in 4 CLK through (4x stretch), both phases ORed
together allow 1 in 2 CLK through (2x stretch).

The NUSIZ register can be changed at any time in order to alter
the counting frequency, since it is read every graphics CLK.
This should allow possible player graphics warp effects etc.
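
The clock masking can be sketched as follows. The phase numbering
(H@1 on the first CLK of four, H@2 on the third) is an assumption
made for the example, and scan_clock_enabled / scan_advances are
hypothetical names.

```python
def scan_clock_enabled(clk, stretch):
    """True if this graphics CLK is let through to the scan counter.

    Assumption: H@1 is CLK phase 0 and H@2 is CLK phase 2.
    """
    phase = clk % 4
    if stretch == 1:
        return True                 # every CLK
    if stretch == 2:
        return phase in (0, 2)      # H@1 OR H@2: 1 in 2 CLK
    if stretch == 4:
        return phase == 2           # H@2 only: 1 in 4 CLK
    raise ValueError("stretch must be 1, 2 or 4")


def scan_advances(n_clk, stretch):
    """Count how often the scan counter advances in n_clk graphics CLK."""
    return sum(1 for clk in range(n_clk) if scan_clock_enabled(clk, stretch))


if __name__ == "__main__":
    for stretch in (1, 2, 4):
        print(stretch, scan_advances(32, stretch))
```

In this model a full 8-pixel player needs 8, 16 or 32 graphics CLK
for the 1x, 2x and 4x modes respectively.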

Player Reflect bit - this is read every time a pixel is generated,
and used to conditionally invert the bits of the source pixel
address. This has the effect of flipping the player image drawn.
This flag could potentially be changed during the rendering of
the player, for example this might be used to draw bits 01233210.
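
A small sketch of the address inversion, using the same pixel
numbering as the 01233210 example (pixel 0 is the first one drawn
when reflect is off). The function names are invented, and changing
the reflect flag part-way through is modelled simply as a list of
flag values, one per scan count.

```python
def source_address(scan_count, reflect):
    """Invert all three address bits when the reflect flag is set."""
    return (scan_count ^ 0b111) if reflect else scan_count


def drawn_pixels(reflect_per_count):
    """Source pixel used at each scan count 0..7, given the reflect
    flag in force when that pixel is generated."""
    return [source_address(n, r) for n, r in enumerate(reflect_per_count)]


if __name__ == "__main__":
    print(drawn_pixels([False] * 8))               # normal
    print(drawn_pixels([True] * 8))                # reflected
    print(drawn_pixels([False] * 4 + [True] * 4))  # flipped half-way
```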

Player graphics registers - there are four 8-bit registers in the
TIA for storing Player graphics, two for each player. Only two
of these are ever directly accessible; these are labelled the
"new" player graphics registers on the schematics. Unless the
Player Vertical Delay (VDELPn) is set, the "new" registers are
always drawn.

Writes to GRP0 always modify the "new" P0 value, and the
contents of the "new" P0 are copied into "old" P0 whenever
GRP1 is written. (Likewise, writes to GRP1 always modify the
"new" P1 value, and the contents of the "new" P1 are copied
into "old" P1 whenever GRP0 is written). It is safe to modify
GRPn at any time, with immediate effect.

Vertical Delay bit - this is also read every time a pixel is
generated and used to select which of the "new" (0) or "old" (1)
Player Graphics registers is used to generate the pixel. (ie
the pixel is retrieved from both registers in parallel, and
this flag used to choose between them at the graphics output).
It is safe to modify VDELPn at any time, with immediate effect.
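
The "new"/"old" register behaviour described above can be modelled
in a few lines. This is a behavioural sketch only, not a picture of
the actual latches; the class and method names are made up.

```python
class PlayerGraphics:
    """Behavioural model of the four player graphics registers."""

    def __init__(self):
        self.new = [0, 0]            # "new" P0, P1 (written via GRPn)
        self.old = [0, 0]            # "old" P0, P1 (copied, never written)
        self.vdel = [False, False]   # VDELP0, VDELP1

    def write_grp(self, player, value):
        """A write to GRPn sets "new" Pn and copies the OTHER
        player's "new" value into its "old" register."""
        other = 1 - player
        self.new[player] = value & 0xFF
        self.old[other] = self.new[other]

    def displayed(self, player):
        """VDEL selects "old" (1) or "new" (0) at the output."""
        return self.old[player] if self.vdel[player] else self.new[player]


if __name__ == "__main__":
    g = PlayerGraphics()
    g.write_grp(0, 0xAA)   # new P0 = $AA
    g.write_grp(1, 0x55)   # new P1 = $55, old P0 = $AA
    g.write_grp(0, 0x0F)   # new P0 = $0F, old P1 = $55
    g.vdel[0] = True
    print(hex(g.displayed(0)), hex(g.displayed(1)))
```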


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Missile 0 and Missile 1 Horizontal Position Counters

There are also two individual Horizontal Position Counters for
missile 0 and missile 1. The counters are independent and identical.

These counters use exactly the same counter decodes as the players,
but without the extra 1 CLK delay to start writing out graphics.

Missiles use the same control lines as the player from the NUSIZ
register to determine the number of copies drawn, although they
ignore the player scaling options (you'll just get a single copy
for the scaled player modes).

Missile width is implemented in the same way as the ball width; it
appears to be exactly the same gate arrangement (see below).

The Missile-to-player reset is implemented by resetting the M0
counter when the P0 graphics scan counter is at %100 (in the middle
of drawing the player graphics) AND the main copy of P0 is being
drawn (ie the missile counter will not be reset when a subsequent
copy is drawn, if any). This second condition is generated from a
latch outputting [FSTOB] that is reset when the P0 counter wraps
around, and set when the START signal is decoded for a 'close',
'medium' or 'far' copy of P0.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Ball Horizontal Position Counter

The ball position counter controls the position of the ball
graphics object (BL) on each scanline. The ball counter counts
from 0 to 39 and then wraps around, giving a period of 40 counts
at 1/4 CLK (160 CLK).

Ball width is given by combining clock signals of different widths
based on the state of the two size bits (the gates form an AND ->
OR -> AND -> OR -> out arrangement, with a hanger-on AND gate).
See notes later for all the messy details ;p

It seems a shame to have a whole polynomial counter for the ball, and
no special effects aside from its size - except for one small detail.

If you look closely at the START signal for the ball, unlike all
the other position counters - the ball reset RESBL does send a START
signal for graphics output! This makes the ball incredibly useful
since you can trigger it as many times as you like across the same
scanline and it will start drawing immediately each time :)

So it's good for cutting holes in things, drawing background details,
clipping the edges off things, etc. It can even be used to draw simple
sprites, or used as the background colour (because it's behind
everything else) for a two-colour sprite.

Actually on my 2600jr (long rainbow), setting the ball size to 8
pixels results in solid colour when it's reset every 9 pixels
(this might just be colour bleeding, I'm not sure).

Value	BCount	CPU	CLK	Event

000000	0	0	0	(draw 0123)
100000	1	1.3	4	(draw 4567)
110000	2	2.6	8
111000	3	4	12
111100	4	5.3	16
111110	5	6.6	20
011111	6	8	24
101111	7	9.3	28
110111	8	10.6	32
111011	9	12	36
111101	10	13.3	40
011110	11	14.6	44
001111	12	16	48
100111	13	17.3	52
110011	14	18.6	56
111001	15	20	60
011100	16	21.3	64
101110	17	22.6	68
010111	18	24	72
101011	19	25.3	76
110101	20	26.6	80
011010	21	28	84
001101	22	29.3	88
000110	23	30.6	92
000011	24	32	96
100001	25	33.3	100
010000	26	34.6	104
101000	27	36	108
110100	28	37.3	112
111010	29	38.6	116
011101	30	40	120
001110	31	41.3	124
000111	32	42.6	128
100011	33	44	132
110001	34	45.3	136
011000	35	46.6	140
101100	36	48	144
110110	37	49.3	148
011011	38	50.6	152
101101	39	52	156	RESET, START DRAWING
010110	40	53.3	160	(already at 000000)
001011	41	54.6	
100101	42	56	
010010	43	57.3	
001001	44	58.6	
000100	45	60	
100010	46	61.3	
010001	47	62.6	
001000	48	64	
100100	49	65.3	
110010	50	66.6	
011001	51	68	
001100	52	69.3	
100110	53	70.6	
010011	54	72	
101001	55	73.3	
010100	56	74.6	
101010	57	76	
010101	58	-
001010	59	-
000101	60	-
000010	61	-
000001	62	-
000000	0	-		(cycle)
111111	-	-		ERROR (Reset to 000000)

Vertical Delay bit - the VDELBL control bit works in the same
manner as the player VDEL bits; the state of VDELBL is used
every CLK to determine which of the "new" (0) or "old" (1)
ENABL values to use at the graphics output. Writes to ENABL
always modify the "new" value, and whenever GRP1 is written
the "new" value is copied into the "old". It is safe to
modify VDELBL and ENABL at any time, with immediate effects.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Using the Horizontal Position Counters

The documented way to use a player position counter is to reset
it with RESPn on any CPU cycle divisible by 5 during the visible
scanline (5 is a convenient number for DEX-BNE loops), set up
HMPn to adjust the position by +7 (left) to -8 (right) pixels,
and hit HMOVE immediately after the next WSYNC. Then configure
NUSIZn for the number and spacing of copies required, and let
the hardware go about its business. Once this is set up, you
can just change the graphics in GRPn every scanline to get one,
two or three copies at fixed spacing.

In fact the hardware has hard-wired requirements for almost none
of the above =) The fixed spacing between copies is hard-wired
and HMOVE is largely not negotiable, but the rest is complete
tosh.

The TIA renders each movable graphics object according to
independent position counters running at 1/4 CLK with a period
of 40 increments, and synchronised to the last RESPn/RESMn/RESBL
strobe. Each and every time a counter wraps around, the 'main'
copy of the object starts to draw. Since it takes 4 CLK to reset
the counter to zero and 4 CLK to increment the counter, the image
can be expected to appear after exactly 40 full counts, or 160 CLK.

The counters are normally only running during the 'visible' part
of a scanline, unless you're doing a HMOVE. Since the scanline
has 160 visible pixels, this yields the documented behavior that
a RESPn/etc sets the position for the next scanline. It's out
by 5 pixels when you set it, but who's counting?

Due to extra clocking logic for Player graphics output, the first
player pixel won't appear until 1 CLK later than for any other
graphics object once rendering 'starts'. See the HSync/Player
Counter info above for an explanation of this.

During the horizontal blank (see the Horizontal Counter info
above) the Player, Missile and Ball counters stop receiving
CLK signals, so they pause on the exact 1/4 CLK they're up to
and resume where they left off at the first visible pixel on
the next scanline. This gives rise to the 'wrap around' effect,
to the point of splitting a copy of the player image in half
because it happened to start too near the right edge of the
screen ;)

The object counters are running at the same 1/4 CLK rate as the
HSync counter, but you can set them out of phase with the HSync
counter (and therefore the Playfield) by resetting any of them
on a CPU cycle that isn't divisible by 4. (If this were not the
case, there would only be 40 possible positions along the
scanline and we could all go home early). You can also use the
HMOVE command, which is described below.
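
The phase relationship is just arithmetic on 3 CLK per CPU cycle
against 4 CLK per count. A minimal sketch, assuming CPU cycle 0
lines up with HCount=0 and ignoring any latching delays (the
function name is invented):

```python
def phase_vs_hsync(cpu_cycle):
    """CLK offset (0-3) into the current HSync count at the start
    of a given CPU cycle. 0 means in phase with the HSync counter."""
    return (3 * cpu_cycle) % 4


if __name__ == "__main__":
    for cycle in range(8):
        print(cycle, phase_vs_hsync(cycle))
```

Any four consecutive CPU cycles cover all four phases in this
model, and only cycles divisible by 4 land in phase.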


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Playing with the HMOVE registers

In principle the operation of HMOVE is quite straight-forward;
if a HMOVE is initiated immediately after HBlank starts, which
is the case when HMOVE is used as documented, the [HMOVE] signal
is latched and used to delay the end of the HBlank by exactly
8 CLK, or two counts of the HSync Counter. This is achieved in
the TIA by resetting the HB (HBlank) latch on the [LRHB] (Late
Reset H-Blank) counter decode rather than the normal [RHB] (Reset
H-Blank) decode.

The extra HBlank time shifts everything except the Playfield
right by 8 pixels, because the position counters will now
resume counting 8 CLK later than they would have without the
HMOVE. This is also the source of the HMOVE 'comb' effect;
the extended HBlank hides the normal playfield output for the
first 8 pixels of the line.

In order to move less than 8 pixels right the TIA performs
'clock stuffing' on the Player, Missile and Ball position
counters, whereby a number of clock pulses between 0 and 15
are sent to the counters during HMOVE. Each extra clock pulse
eats up 1/4 count in the object's horizontal position counter,
and thereby moves the object left one pixel. This must be done
during HBlank because it is sending these extra clock pulses
down the same clock lines that usually receive [MOTCK] pulses
during the visible part of the scanline.

The Stella Programmer's Guide states that "the motion registers
should not be modified for at least 24 machine cycles after an
HMOVE command". This is indeed for internal hardware
considerations, although perhaps not entirely mysterious.
After several attempts, I finally got my head around the
heavily obfuscated logic in the schematics. It turns out to
be fairly simple, and quite elegant :)

The HMOVE values set by the programmer are stored in a matrix
of 4-bit data latches with built-in comparators - each latch
effectively contains a wired-XOR gate, and the 4 latches for
a given HMxx register are arranged in a wired-NOR formation
to give a 4-bit comparator.

Beside the matrix of HMxx latches is a 4-bit binary ripple
counter. It begins at 15 and decrements down to zero during
the HMOVE at a rate of 1 decrement every 4 CLK (it's built
from 2-phase clocked logic). The counter is wired in parallel
to the comparators in all 5 HMxx registers.

At the beginning of the HMOVE, a latch is set for each movable
object to indicate that it requires more motion to the left.
When the comparator for a given object detects that none of
the 4 bits match the bits in the counter state, it clears this
latch (a clever exercise in reverse logic!) Until this time,
the output of the latch is sent through to the movable object
once every 4 CLK (on every H@1 signal from the HSync two-phase
clock) as an extra "stuffed" clock signal.
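
Here is a minimal model of the counter, comparator and latch for
the normal case where HMxx is left alone during the HMOVE. The
argument motion_count is the value as the comparator sees it (a
0-15 count of pixels to move left). The ordering of compare and
pulse within each step is my assumption, and stuffed_clocks is a
made-up name.

```python
def stuffed_clocks(motion_count):
    """Count the extra clock pulses sent to one movable object.

    The ripple counter runs 15 -> 0. The "more movement required"
    latch is cleared once NO bit position of the stored value
    matches the counter, i.e. when the two are bitwise complements.
    Assumption: the compare takes effect before that step's pulse.
    """
    latch = True
    pulses = 0
    for counter in range(15, -1, -1):
        if (motion_count ^ counter) == 0b1111:   # no bits match
            latch = False
        if latch:
            pulses += 1
    return pulses


if __name__ == "__main__":
    print([stuffed_clocks(n) for n in range(16)])
```

With a stable HMxx value the model sends exactly motion_count
pulses, one per pixel of movement to the left.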

Since one extra CLK pulse is sent every 4 CLK, this takes at
most 4*16=64 CLK (including counter reset at the end), or
64/3=21 CPU cycles. It takes 3 CLK after the HMOVE command
is received to decode the [SEC] signal (at most 6 CLK depending
on the timing of STA HMOVE) and a further 4 CLK to set the
"more movement required" latches. So we need to wait at least
71/3=23.66 CPU cycles before the HMOVE operation is complete.
For a normal HMOVE after WSYNC, it might be safe by cycle 23
(this has not been tested).

The first compare (against 15) will be sampled 15 CLK after STA
HMOVE begins and every 4 CLK thereafter. The first counter
decrement will happen at CLK 17, and every 4 CLK thereafter.

You may have noticed that the above discussion ignores the
fact that HMxx values are specified in the range +7 to -8.
In an odd twist, this was done purely for the convenience
of the programmer! The comparator for D7 in each HMxx latch
is wired up in reverse, costing nothing in silicon and
effectively inverting this bit so that the value can be
treated as a simple 0-15 count for movement left. It might
be easier to think of this as having D7 inverted when it
is stored in the first place.
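
The D7 inversion is easy to express in code. In this sketch
hm_value is the byte written to HMxx (motion in D7-D4), and the
function names are invented. The net movement combines the 0-15
stuffed clocks (left) with the 8 pixel shift right from the
extended HBlank.

```python
def motion_count(hm_value):
    """0-15 movement-left count seen by the comparator: the top
    nibble of the HMxx byte with D7 inverted."""
    return ((hm_value >> 4) & 0xF) ^ 0b1000


def net_pixels_left(hm_value):
    """Net movement for a normal HMOVE: stuffed clocks move left,
    the extended HBlank moves everything 8 pixels right."""
    return motion_count(hm_value) - 8


if __name__ == "__main__":
    for value in (0x70, 0x60, 0x00, 0xF0, 0x80):
        print(hex(value), motion_count(value), net_pixels_left(value))
```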

In theory then the side effects of modifying the HMxx registers
during HMOVE should be quite straight-forward. If the internal
counter has not yet reached the value in HMxx, a new value greater
than this (in 0-15 terms) will work normally. Conversely, if
the counter has already reached the value in HMxx, new values
will have no effect because the latch will have been cleared.

Much more interesting is this: if the counter has not yet
reached the value in HMxx (or has reached it but not yet
committed the comparison) and a value with at least one bit
in common with all remaining internal counter states is
written (zeros or ones), the stopping condition will never be
reached and the object will be moved a full 15 pixels left.
In addition to this, the HMOVE will complete without clearing
the "more movement required" latch, and so will continue to send
an additional clock signal every 4 CLK (during visible and
non-visible parts of the scanline) until another HMOVE operation
clears the latch. The HMCLR command does not reset these latches.

The Cosmic Ark stars effect achieved this by writing the value
$60 to HMM0, 21 cycles after HMOVE starts. See this message in
the archives:
http://www.biglist.com/lists/stella/archives/199705/msg00024.html

Following is how I believe it works: at 21 cycles in, the internal
counter has just decremented to %0000 and is about to test this
against the HMxx registers (2 CLK from now, if my timings are
correct). If we flip the top bit of $60 as described above,
we have the binary pattern %1110. This pattern has at least one
bit in common with the final remaining state (the bottom zero
bit), and also has bits in common with the default counter state
%1111 which will arise when the counter resets. This means the
compare will pass now and forever more :) For this to work, I
expect that they must have set HMM0 to $70 before using the trick
(binary %0111, or %1111 with the bit flipped), but after a cursory
glance at Thomas' commented Cosmic Ark code I haven't found this.

Looking at the archives relating to Cosmic Ark and Rabbit Transit
tricks, I also notice that a HMCLR 20 cycles in has the same effect.
In this case it will be resetting HMxx to %1000 (bit-flipped)
which also obeys the rules for bypassing the stopping condition.
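
The "at least one bit in common" rule can be checked mechanically.
This sketch takes the HMxx byte written mid-HMOVE and the counter
states still to come, and reports whether the stopping condition
is dodged for all of them. Which states actually remain at a given
cycle is an input here, not something the code works out; the
function names are hypothetical.

```python
def shares_a_bit(pattern, counter):
    """True if at least one of the 4 bit positions holds the same
    value in both (so the latch is NOT cleared on this compare)."""
    return (pattern ^ counter) != 0b1111


def bypasses_stop(hm_value, remaining_states):
    """True if a newly written HMxx byte avoids the stopping
    condition for every remaining counter state."""
    pattern = ((hm_value >> 4) & 0xF) ^ 0b1000   # D7 inverted
    return all(shares_a_bit(pattern, c) for c in remaining_states)


if __name__ == "__main__":
    remaining = [0b0000, 0b1111]   # last state, then the reset state
    print(bypasses_stop(0x60, remaining))   # Cosmic Ark write
    print(bypasses_stop(0x00, remaining))   # HMCLR
    print(bypasses_stop(0x70, remaining))   # %1111 stops on %0000
```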


Also of note, the HMOVE latch used to extend the HBlank time
is cleared when the HSync Counter wraps around. This fact is
exploited by the trick that involves hitting HMOVE on the 74th
CPU cycle of the scanline; the CLK stuffing will still take
place during the HBlank and the [HMOVE] latch will be set just
before the counter wraps around. It will then be cleared again
immediately (and therefore ignored) when the counter wraps,
preventing the HMOVE comb effect. Since the extended HBlank
is needed to move all objects right 8 pixels, this has the
limitation that objects can only be moved left, and the normal
HMOVE numbering no longer applies. Instead the HMOVE value is
interpreted as (8 + value) pixels to the left, ie:

    -8 = 0    -4 = 4     0 = 8      4 = 12
    -7 = 1    -3 = 5     1 = 9      5 = 13
    -6 = 2    -2 = 6     2 = 10     6 = 14
    -5 = 3    -1 = 7     3 = 11     7 = 15

This means that all objects will be moved 8 pixels left unless
you set their HMxx value to -8 for zero movement.
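
The mapping for the cycle-74 trick is simple enough to state as a
one-liner; the sketch below just restates the table above (the
function name is made up).

```python
def late_hmove_pixels_left(hm):
    """Pixels moved left for a signed HMxx value (-8..+7) when
    HMOVE is hit late, so there is no extended HBlank to offset
    the stuffed clocks."""
    if not -8 <= hm <= 7:
        raise ValueError("HMxx value must be in the range -8..+7")
    return 8 + hm


if __name__ == "__main__":
    for hm in range(-8, 8):
        print(hm, late_hmove_pixels_left(hm))
```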

I've recently found a post in the Stella mailing list archives
that gave these results by exhaustive testing, posted by Brad Mott:
http://www.biglist.com/lists/stella/archives/199804/msg00198.html


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Graphics Scan Counters during HMOVE

Since the Graphics Scan Counters are never reset, player
graphics output can wrap around as mentioned above.

A HMOVE 8 pixels right (-8 << 4), has no effect on the scan
counter since it will perform no "clock stuffing" of the
player counters for that player (the extended HBlank time
moves everything right 8 pixels).

Any other HMOVE value will gobble up at least one pixel,
or more proportional to the HMOVE value. Since a HMOVE
value really represents a count from 0 (for -8) to 15
(for +7) with the top bit inverted, this is the number
of player pixels that will be gobbled up by the HMOVE.

This means that a HMOVE of 0 will gobble up all remaining
wrapped output for the non-stretched player modes, since it
sends 8 extra clocks to the player. (Note that this is only
true if HMOVE was actually strobed for the scanline,
otherwise the configured HMxx registers never have any
effect). For the stretched player modes there could be some
output left - it takes 16 stuffed clocks to eat up a full
2X player, and 32 clocks to eat up a full 4X player.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ HMOVE during the visible scanline.

I mentioned above that HMOVE sends extra clock pulses down
the same clock lines that are usually used during the visible
part of the scanline. In theory this means that performing a
HMOVE during the visible part of the scanline should have no
effect. However, looking at how the various clock signals
interact, I suspect it is possible. I did some preliminary
experiments (on a 2600 Jr) at some point, and I seem to
remember having some success.

In this case the extra HMOVE clock pulses act to perform
'plugging' instead of the normal 'stuffing'; by this I mean
that the extra pulses plug up the gaps in the normal [MOTCK]
pulses, preventing them from counting as clock pulses. This
only works because the extra HMOVE pulses are derived from
the two-phase clock on the HSync counter, which is itself
derived from CLK (the TIA colour clock input), whereas
[MOTCK] is an inverted CLK signal - so they are more or less
precisely out of phase :)

I'm not sure how universal (or reliable!) this might turn out
to be, but I haven't seen it mentioned before. Also of note,
this technique can only be used to effect a move to the right,
at a rate of 1 pixel every 4 CLK (since this is the rate that
HMOVE generates the extra clock pulses).


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ The Re-trigger Trick, and all that jazz

I've read some theories suggesting that re-triggering is a
hack, possibly dependent on chip revision, where you trick
the TIA into rendering more than three copies by hitting
RESP0/RESP1 during the rendering of a 'legitimate' copy, or
some other method to confuse the poor chip. Through extensive
coffee consumption, I have determined that this is not the
case. Perhaps peering at the TIA schematics for countless
hours on end, until I fell asleep (two days in a row), may
have helped also.

The behaviour of the TIA positioning registers is quite
predictable and completely independent from its graphics
output logic, as documented above. What remains are issues
involving the timing of RESPn commands, given that the TIA
counts things at 1/4 CLK and the CPU runs at 1/3 CLK =)

Following is a table of the cycle decodes for the Player
counters, starting from CLK=0 when the counter resets. This
is an excerpt from the Player Counter table listed elsewhere
in the document (I recommend you go have a look, the spacing
between events should look oddly familiar ;)

Value	PCount	CPU	CLK	Event

111000	3	4	12	START DRAWING (NUSIZ=001,011)	Close
101111	7	9.3	28	START DRAWING (011,010,110)	Medium
111001	15	20	60	START DRAWING (100,110)		Far
101101	39	52	156	RESET, START DRAWING (always)	Main

The columns from the left are: the polynomial counter state
(see notes above), the decimal value that the player counter
is up to, the number of CPU cycles since the counter reset,
and the number of CLKs elapsed since the counter reset.

You'll notice I'm now talking about everything relative to
RESPn on the current scanline, rather than the beginning of
the scanline. This is because this is all that matters.
You should understand the following point:

  If you hit RESPn at least twice on every scanline,
  you will never see the 'main' copy of that player,
  ever, on any scanline.

This is because the counter will always be reset before it
manages to complete a full 40 counts (160 CLK), and so the
'main' copy will never start drawing.

This is tricky to test, especially if you don't reset a few
things when you stop (eg, for VSync) - whenever you stop
hitting RESPn, you will start to get the normal output on the
next and subsequent scanlines, including the 'main' copy.
The very top visible scanline is a perfectly valid
'subsequent scanline' after the very bottom visible scanline,
once you get past the first frame ;)

If you've set up NUSIZn for 3 copies close (011), you'll be
getting four copies on every scanline on which you hit RESPn
twice, as long as they are far enough apart. This works because
it doesn't take a counter wrap-around to get to the 'close' and
'medium' copies as shown in the table above. They will appear
4+12+1=17 and 4+28+1=33 pixels after each RESPn CLK arrives
in the TIA (it takes 4 extra CLK to reset the counter, and 1
extra CLK to start the graphics output).

It's important to note, that as long as the second RESPn on
the line causes a reset after the 'start' signal has been
generated for the 'medium' copy of the first RESPn, you will
get four copies regardless of how far apart the RESPn hits
are. If you do the second RESPn too soon you'll end up with
only three copies - the 'close' from the first RESPn, followed
by the 'close' and the 'medium' from the second RESPn.
If you do the second RESPn before the first 'close' copy,
you'll only end up with the 'close' and 'medium' from the
second RESPn.
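
The three cases above can be sketched in Python as follows. This is a
simplified model, not a simulation of the chip: it assumes NUSIZn is
011, assumes nothing else resets the counter on the line, and ignores
the boundary cases where the second reset lands while a Start signal
is still being clocked through. The function name is made up.

```python
CLOSE_DECODE = 12    # CLK after the counter reset (from the table above)
MEDIUM_DECODE = 28


def copies_with_two_resets(gap_clk):
    """Copies drawn when a second RESPn follows the first by gap_clk CLK."""
    copies = []
    for name, decode in (("close", CLOSE_DECODE), ("medium", MEDIUM_DECODE)):
        # The first count only produces a copy if the counter reached
        # this decode before being reset again.
        if gap_clk > decode:
            copies.append("first " + name)
    # The second count is assumed to run on undisturbed.
    copies += ["second close", "second medium"]
    return copies


for gap in (9, 18, 48):
    print(gap, copies_with_two_resets(gap))
```

Both resets take the same 4 CLK to act, so only the gap between the
two RESPn hits matters here, not where they fall on the line.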

From this it follows that if you set NUSIZ0 to 011, hit RESP0
and wait until the 'medium' copy has started, then change NUSIZ0
to 100 or 110, you will get all of 'close', 'medium' and 'far'.


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Re-triggering after exactly 18, 33, 66 or 162 CLK

These are special cases only because resetting a Position Counter
(RESPn, RESMn, RESBL) also resets the two-phase clock attached
to it, and this in turn affects the clocked logic on the output
of the counter decodes.

For the player counters, this affects the four decodes that
produce the Start signal for copies of the player graphics.
These are generated 12, 28, 60, and 156 CLK after the Position
Counter has been reset, in order to trigger the 'close', 'medium',
'far' and 'main' copies.

These decodes pass through a block of logic that requires a full
cycle of the two-phase clock (hence the normal 4 CLK delay before
graphics output common to all movable graphics objects). If the
Position Counter and therefore the two-phase clock are reset
during this decoding process, the Start signal will either be
lost or delayed up to 3 CLK depending on exact timing.

This effect is most evident when attempting to re-trigger the
player graphics over and over again. For example, examine this
retriggering technique:

        STA     RESP0   ;3 reset P0, call this 0 CLK.
        CMP     $EA     ;3 nop
        STA     RESP0   ;3 reset P0 again, after 18 CLK.
        CMP     $EA     ;3 nop
        STA     RESP0   ;3 reset P0 again, after 18 CLK.

The visible result of this will be a 'close' copy of P0 shifted
right by two pixels from the expected position, followed by a
second 'close' copy shifted right by two pixels, and finally a
third 'close' copy, not shifted right. There will be an 18 pixel
gap between the first two copies of P0, and only a 16 pixel gap
before the third copy.

In order to fix up the spacing of the final copy, it is necessary
to trigger P0 yet again exactly 18 CLK later, but clear GRP0 in
the meantime so nothing is drawn.
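
The spacing described above can be checked with a little arithmetic.
This Python sketch assumes each 'close' copy normally appears
4+12+1 = 17 CLK after its RESP0, and that any copy followed by another
reset exactly 18 CLK later is pushed right by 2 pixels. Positions are
measured from the first RESP0, and the function name is hypothetical.

```python
CLOSE_OFFSET = 4 + 12 + 1   # reset + 'close' decode + start latch, in CLK


def retrigger_positions(reset_times, delay=2):
    """Position of the 'close' copy produced by each reset, in CLK."""
    positions = []
    for i, t in enumerate(reset_times):
        # Every copy except the last is followed by another reset.
        followed = i + 1 < len(reset_times)
        positions.append(t + CLOSE_OFFSET + (delay if followed else 0))
    return positions


def gaps(positions):
    """Spacing between consecutive copies."""
    return [b - a for a, b in zip(positions, positions[1:])]


three = retrigger_positions([0, 18, 36])
four = retrigger_positions([0, 18, 36, 54])
print(three, gaps(three))
print(four, gaps(four))
```

With three resets the last copy ends up closer than the others; with
a fourth reset (and GRP0 cleared for it) the three visible copies are
evenly spaced.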

If the retriggering will be continuing onto the next line there is
no need to do this; just ensure that the first re-trigger on the
next line happens 18 visible pixels after the last RESP0 on the
previous line (ie 18 CLK later, minus HBlank time).


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Notes on the Ball/Missile width enclockifier

Just to reiterate, ball width is given by combining clock signals
of different widths based on the state of the two size bits (the
gates form an AND -> OR -> AND -> OR -> out arrangement, with a
hanger-on AND gate).

The Enable (output) signal is built in two halves, arranged back-
to-back at the final OR gate.

The first half comes from one of three sources combined through
the earlier OR gate and then AND-ed with the Start signal:

(1) If D4 and D5 are both clear, one of the two-phase clock signals
(active 1 in 4 colour CLK) yields a single pixel of output.
(2) If D4 is set, a line active 2 in every 4 colour CLK is borrowed
from the two-phase clock generator (this yields 2 pixels).
(3) Finally D5 itself is used directly - the Start signal is active
for 4 CLK so this generates 4 pixels.

The second half is added if both D4 and D5 are set; a delayed copy
of the Start signal (4 colour CLK wide again) is OR-ed into the
Enable signal at the final OR gate.
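
The gate arrangement above can be modelled in a few lines of Python.
This is a sketch only: the Start signal is taken to cover CLK 0-3 and
its delayed copy CLK 4-7, and which phases of the two-phase clock the
"1 in 4" and "2 in 4" lines are active on is an assumption made for
illustration. All names are made up.

```python
def ball_enable(d4, d5, clk):
    """Enable output for one colour CLK, counted from the Start signal."""
    start = 0 <= clk < 4            # Start is active for 4 CLK
    delayed_start = 4 <= clk < 8    # delayed copy, 4 CLK wide again
    phase = clk % 4
    one_in_four = phase == 0        # assumed phase of the 1-in-4 line
    two_in_four = phase < 2         # assumed phases of the 2-in-4 line

    # First half: three sources through the earlier OR, AND-ed with Start.
    source = ((not d4 and not d5 and one_in_four)
              or (d4 and two_in_four)
              or d5)
    first_half = source and start

    # Second half: only added when both size bits are set.
    second_half = d4 and d5 and delayed_start

    return bool(first_half or second_half)   # the final OR gate


def ball_width(d4, d5):
    """Number of pixels output for a given pair of size bits."""
    return sum(ball_enable(d4, d5, clk) for clk in range(8))


for d5 in (False, True):
    for d4 in (False, True):
        print(int(d5), int(d4), ball_width(d4, d5))
```

Under these assumptions the four combinations of D5 and D4 give widths
of 1, 2, 4 and 8 pixels.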

I hope someone had as much fun building this little circuit as I
had pulling it apart again ;p


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ CPU Clock to Player Pixel Table

The Player Position Counter can be reset to zero (with RESP0/1) on
any CPU cycle as shown below, and copies will appear at the pixel
positions listed for 'close', 'medium' and/or 'far' depending on
the flags in NUSIZ; 1, 2, 3 or (if you change NUSIZ at the right
time) 4 copies at hard-wired positions after the reset. If the
counter is allowed to wrap around, the 'main' copy will appear
on the next line.

Resetting the counter takes 4 CLK, decoding the 'start drawing' signal
takes 4 CLK, latching the 'start' takes a further 1 CLK giving a
total 9 CLK delay after a RESP0/1. Since the playfield itself takes
4 CLK to start drawing, the player is visibly delayed by exactly
5 CLK (9 - 4) - hence the magic '5' :)

NOTE: The player counter can be safely reset 18 CLK after the previous
reset and the previous copy will still be drawn. BUT the 'start' signal
for the previous copy will be delayed a further 2 CLK due to the 2-
phase clock being reset before the 'start' signal has been clocked
through to the 'start' latch.

CPU	CLK	Pixel	Main	Close	Medium	Far	PF

0	0	-	1	17	33	65	-
...
22	66	-	1	17	33	65	-
22.6 --------------------------------------------------------
23	69	1	6	22	38	70	0.25
24	72	4	9	25	41	73	1
25	75	7	12	28	44	76	1.75
26	78	10	15	31	47	79	2.5
27	81	13	18	34	50	82	3.25
28	84	16	21	37	53	85	4
29	87	19	24	40	56	88	
30	90	22	27	43	59	91	
31	93	25	30	46	62	94	
32	96	28	33	49	65	97	
33	99	31	36	52	68	100	
34	102	34	39	55	71	103	
35	105	37	42	58	74	106	
36	108	40	45	61	77	109	
37	111	43	48	64	80	112	
38	114	46	51	67	83	115	
39	117	49	54	70	86	118	
40	120	52	57	73	89	121	
41	123	55	60	76	92	124	
42	126	58	63	79	95	127	
43	129	61	66	82	98	130	
44	132	64	69	85	101	133	
45	135	67	72	88	104	136	
46	138	70	75	91	107	139	
47	141	73	78	94	110	142	
48	144	76	81	97	113	145	
49	147	79	84	100	116	148	
50	150	82	87	103	119	151	
51	153	85	90	106	122	154	
52	156	88	93	109	125	157	
53	159	91	96	112	128	0	
54	162	94	99	115	131	3	
55	165	97	102	118	134	6	
56	168	100	105	121	137	9	
57	171	103	108	124	140	12	
58	174	106	111	127	143	15	
59	177	109	114	130	146	18	
60	180	112	117	133	149	21	
61	183	115	120	136	152	24	
62	186	118	123	139	155	27	
63	189	121	126	142	158	30	
64	192	124	129	145	1	33	
65	195	127	132	148	4	36	
66	198	130	135	151	7	39	
67	201	133	138	154	10	42	
68	204	136	141	157	13	45	
69	207	139	144	0	16	48	
70	210	142	147	3	19	51	
71	213	145	150	6	22	54	
72	216	148	153	9	25	57	
73	219	151	156	12	28	60	
74	222	154	159	15	31	63	
75	225	157	2	18	34	66	
76	228	0	5	21	37	69	
----------------------------------------------------- Start HBLANK

Also note that hitting RESP0 before HBLANK has finished will reset
the counter immediately, but it will only start counting again when
HBLANK goes off. Due to output clocking, this will produce player
graphics at playfield pixel 1.
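
The rows of the table can be regenerated with the following Python
sketch. It assumes 68 CLK of HBlank, 160 visible pixels per line, and
copies at 5, 5+16, 5+32 and 5+64 pixels after the pixel on which
RESP0/1 lands. The names are invented for illustration, and the 2 CLK
delay mentioned in the NOTE above is not modelled.

```python
HBLANK_CLK = 68
LINE_PIXELS = 160
COPY_OFFSETS = {"main": 5, "close": 5 + 16, "medium": 5 + 32, "far": 5 + 64}
HBLANK_ROW = {"main": 1, "close": 17, "medium": 33, "far": 65}


def player_pixels(cpu_cycle):
    """Pixel positions of each copy for a RESP0/1 on the given CPU cycle."""
    clk = 3 * cpu_cycle              # the CPU runs at 1/3 CLK
    if clk < HBLANK_CLK:
        # A reset during HBlank gives the same result on every cycle.
        return dict(HBLANK_ROW)
    pixel = clk - HBLANK_CLK
    return {name: (pixel + offset) % LINE_PIXELS
            for name, offset in COPY_OFFSETS.items()}


for cycle in (0, 23, 53, 76):
    print(cycle, player_pixels(cycle))
```

Positions past the right edge wrap around modulo 160, which is why the
later rows of the table start again from small numbers.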


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ The Venerable 6-digit Score Trick

The 6-digit score trick involves putting both players into 3-repeat
mode (011 or 110 in NUSIZ0/1) and resetting them such that all the
player 2 images are positioned exactly between all the player 1
images, ergo:

  P1  P2
  v   v
  1   2   1   2   1   2

Then you need to set the graphics up (GRP0/1) for the first two
digits, and write some very precise timing code to wait until the
scan-line is just about to start drawing the first copy of P1.
While you're waiting, get the rest of the graphics loaded into
the registers (A, X, and Y).

At this point you need to start storing all the graphics you've
loaded into GRP0 and GRP1 as fast as you can - it will look like
this because there's only one way to do it fast enough:

  STA GRP0  ; 3
  STX GRP1  ; 3
  STY GRP0  ; 3
  ST? GRP1  ; 3   we've run out of registers!

Notice that each one takes 3 cycles to execute (which is 9 pixels)
and makes the change on the -end- of the 3rd cycle. We could use
the stack pointer register (S) for the last one and do a TSX, but
that would take 5 cycles (that's 15 pixels) which is too long.

To get it working you need to turn on VDELP0/1 (vertical delay)
which allows you to set up the first 3 digits in the TIA's
graphics registers before the beginning of the scanline, and
requires only the 3 remaining registers to hold the last 3 digits.

I've found a post in the Stella archives that explains this
technique in great detail, so I'll stop here.

http://www.biglist.com/lists/stella/archives/199704/msg00137.html


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Fine Print

Please note that these notes are my own, and are made available
without any warranties of any kind. They may include errors,
omissions and much that is apocryphal; use at your own risk.

Please let me know if you spot anything that is blatantly wrong
and I'll update the document. I'm also happy to answer any
questions about this stuff.

Copyright (C) Andrew Towers 2003