== Details on how characterisation is tracked through the different rule applications


For each rule s/_target_/_format_/g applied to string1 to produce string2
(where string1 != string2), produce two integer vectors of length
string2.length()+2. The first and last cells of each vector represent the
zero-length start and end anchors respectively. Each of the other cells aligns
to a character in the new string.

Vector 1 is the startmap. Each cell contains the offset required to get the
index of the cell in the previous startmap that has the same start position.

Vector 2 is the endmap. Each cell contains the offset required to get the index
of the cell in the previous endmap that has the same end position. 

That is, the start position of the ith (1-based) character of string2 is the
same as that of the jth character of string1, where j = i+startmap[i]. If
j == 0 or j == string1.length()+1 (the last cell), the 'character' indicated is
actually a boundary. The initial mapping vectors (before any rules are applied)
ensure that these boundary 'characters' are adjusted to point to the right
character: the first cell of startmap is shifted one to the right, and the last
cell of endmap is shifted one to the left.
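
A minimal Python sketch of the initial vectors and of a single lookup step. The
names `initial_maps` and `previous_index` are hypothetical. The sketch assumes
that every cell other than the two adjusted boundary cells starts at offset
zero; the text above only specifies the two boundary adjustments.

```python
def initial_maps(text):
    """Build the initial (startmap, endmap) pair for a string.

    Cell 0 is the start anchor, cell len(text)+1 is the end anchor, and
    cells 1..len(text) align to the characters (1-based).
    """
    n = len(text)
    startmap = [0] * (n + 2)   # assumption: identity offsets everywhere else
    endmap = [0] * (n + 2)
    startmap[0] = 1            # start anchor shifted one to the right
    endmap[n + 1] = -1         # end anchor shifted one to the left
    return startmap, endmap


def previous_index(offsets, i):
    """Return j = i + offsets[i], the matching cell in the previous map."""
    return i + offsets[i]


startmap, endmap = initial_maps("abc")
print(startmap)  # [1, 0, 0, 0, 0]
print(endmap)    # [0, 0, 0, 0, -1]
```

With these vectors the start anchor resolves to the first character and the end
anchor to the last one, while ordinary character cells map to themselves.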

Calculating these start and end maps relies on boundaries within the _target_
and _format_. Relevant boundaries are:

* start and end of string1 and string2
* start and end of the spans of string1 matched by _target_
* start and end of the spans of string2 produced by applying _format_ to 
	_target_
* start and end position of every capture group from _target_ matched in string1
	and the start and end positions of spans in string2 produced by expanding
	capture group references, *up to the point that the capture groups are in
	order in _target_*
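
As an illustration of the string1 side of this list, the sketch below collects
the boundaries that Python's `re` module reports for each match and each
capture group. The function name `string1_boundaries` is hypothetical, positions
are 0-based character offsets rather than vector cells, and the sketch does not
model the string2 side or the "capture groups in order" restriction.

```python
import re


def string1_boundaries(pattern, string1):
    """Collect the sorted boundary positions in string1 for one rule."""
    bounds = {0, len(string1)}              # start and end of string1
    for m in re.finditer(pattern, string1):
        bounds.update(m.span())             # start and end of the matched span
        for g in range(1, m.re.groups + 1):
            if m.start(g) != -1:            # skip groups that did not take part
                bounds.update(m.span(g))    # start and end of capture group g
    return sorted(bounds)


print(string1_boundaries(r"(\d+)-(\d+)", "x 12-345 y"))  # [0, 2, 4, 5, 8, 10]
```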

After a partial match (a single match of _target_), spans in the (possibly not
yet complete) string2 can be divided into four types:

* Span prior to the match (but after any previous matches)
	- offset is zero, plus any shift incurred in previous matches
* Span after the final match
	- offset is zero, plus any shift incurred in this or previous matches
* Span produced by expanding capture group references
	- offset is zero, plus any shift incurred in previous matches, or so far in
	  this one
* Span within the match, but not coming from a capture group
	- should have a start point equal to the end point of the last capture group
	  reference (or the start of the match)
	- should have an end point equal to the start point of the next capture group
	  reference (or the end of the match)
	- start offset is decreased through the span, so each character maps to the
	  same point, with the offset of the first character equal to any shift
	  incurred in previous matches, or so far in this one
	- end offset is decreased through the span, so each character maps to the
	  same point, with the offset of the *last* character of this span equal to
	  any shift incurred in previous matches, or so far in this one, plus any
	  shift resulting from this span. To get the right offset for the *first*
	  character of the span, add the size of the span (gap) - 1.

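Here, shift is the difference in length between the matched span and the
replacement span (matched length minus replacement length, so that
j = i+offset still holds for the characters that follow). Since the match
prefix, suffix and capture groups replace like with like, only spans of the last
type can modify the shift.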
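
The span rules above can be sketched in Python for the simplest case, where
_format_ is a literal string with no capture group references, so only the
first, second and fourth span types occur. The function name `apply_rule` is
hypothetical, and the sketch assumes that _target_ never matches the empty
string. It is an illustration under those assumptions, not the actual
implementation.

```python
import re


def apply_rule(target, fmt, string1):
    """Apply s/target/fmt/g (literal fmt) and return (string2, startmap, endmap)."""
    pieces = []
    startmap = [0]   # cell 0: start anchor, no shift incurred yet
    endmap = [0]
    shift = 0        # cumulative (matched length - replacement length)
    pos = 0          # 0-based position in string1 consumed so far
    for m in re.finditer(target, string1):
        # Span prior to the match: offset is the shift from previous matches.
        gap = m.start() - pos
        pieces.append(string1[pos:m.start()])
        startmap.extend([shift] * gap)
        endmap.extend([shift] * gap)
        # Span within the match, not coming from a capture group.
        size = len(fmt)
        new_shift = shift + (m.end() - m.start()) - size
        for k in range(size):
            startmap.append(shift - k)                 # all map to the match start
            endmap.append(new_shift + (size - 1 - k))  # all map to the match end
        pieces.append(fmt)
        shift = new_shift
        pos = m.end()
    # Span after the final match, followed by the end anchor.
    tail = len(string1) - pos
    pieces.append(string1[pos:])
    startmap.extend([shift] * (tail + 1))
    endmap.extend([shift] * (tail + 1))
    return "".join(pieces), startmap, endmap


print(apply_rule("--", "+", "a--b"))
# ('a+b', [0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
```

In the example, the `+` in cell 2 starts where the first `-` starts (2 + 0 = 2)
and ends where the second `-` ends (2 + 1 = 3), and every later cell carries
the shift of 1.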

After final tokenisation (where, again, indexes need to be kept), the first and
last characters of a token can be traced back through the {start,end}maps. The
final index from the startmap should be decremented by one (since it is a
1-based array, to allow for the initial anchor). The final index from the endmap
should give the token end position directly.
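
A short Python sketch of this trace. The names `trace_back` and `token_span` are
hypothetical, each list of maps is assumed to be ordered from the initial
mapping to the most recent rule, and the result is read here as a 0-based start
with an exclusive end, which is an interpretation rather than something the
text states. The example vectors are written out by hand for the rule
s/--/+/g applied to "a--b".

```python
def trace_back(maps, i):
    """Follow cell i back through the maps, most recent rule first."""
    for offsets in reversed(maps):
        i += offsets[i]
    return i


def token_span(startmaps, endmaps, first, last):
    """Map a token's first and last cells (1-based) to a span in the original."""
    start = trace_back(startmaps, first) - 1   # drop the slot for the start anchor
    end = trace_back(endmaps, last)            # used directly as the end position
    return start, end


original = "a--b"
# Initial maps for "a--b", then the maps produced by s/--/+/g (giving "a+b").
startmaps = [[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1]]
endmaps = [[0, 0, 0, 0, 0, -1], [0, 0, 1, 1, 1]]

start, end = token_span(startmaps, endmaps, 2, 2)   # the token "+" in "a+b"
print(start, end, original[start:end])              # 1 3 --
```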