>> << Down to: Dyad Back to: Vocabulary Thru to: Dictionary
Rank 1 -- operates on lists of y, producing a list of variable length for each one -- WHY IS THIS IMPORTANT?
Partitions a string into boxed words according to J's rules for word formation.
In other programming languages, words are called tokens.
;: '<;._1' +-+--+--+ |<|;.|_1| +-+--+--+
Note that an entire numeric list or a comment is a single word.
This matters if you use (;:) to tokenize some syntax resembling J but not J
y=: 'Fine, easy as 1 2 3?' ;: y +----+-+----+--+-----+-+ |Fine|,|easy|as|1 2 3|?| +----+-+----+--+-----+-+ z=: ' ''s t''=: 3 5' NB. multiple assignment' ;: z +-----+--+---+-----------------------+ |'s t'|=:|3 5|NB. multiple assignment| +-----+--+---+-----------------------+
Compare this with the action of verb cutopen
cutopen is a
cutopen y +-----+----+--+-+-+--+ |Fine,|easy|as|1|2|3?| +-----+----+--+-+-+--+Common uses
1. Partition an open list into boxed strings
;: 'alpha bravo charlie' +-----+-----+-------+ |alpha|bravo|charlie| +-----+-----+-------+
Useful if you are sure J's word rules will work for your y.
Not recommended for general parsing, because ;:y signals a J error if the string is ill-formed according to J's rules
]z =: 'Eugene O''Neill' Eugene O'Neill ;: z |open quote | Eugene O'Neill
Rank Infinity -- operates on x and y as a whole -- WHY IS THIS IMPORTANT?
Partitions a string (or a more general kind of list) y according to the rules of a given finite-state (Mealy) machine defined by noun x.
Common uses1. Tokenize strings of code following some other syntax than J. The x-argument defines the required syntax.
Processing a string using (x ;: y) is typically many times faster than using other primitives.
The entry in the J Dictionary for ;: has a fully-worked example.
A brief overview is as follows, indicating the key terms
The y argument is the sequence of inputs to the machine. y will be converted to a list of numbers using the m portion of the x argument.
Each item of y produces one input to the machine. The inputs become column indexes to the state table.
The x argument: description of the machineThe machine description, the x argument to (;: y), is a list of boxes f;s;m;ijrd . m and ijrd may be omitted.
m, controlling conversion of the y argumentm controls the conversion of the items of y to column numbers. m may be:
ijrd gives the initial values of 4 variables used by the sequential machine
s gives the row/action table. This table has as many rows as needed to encode all the states of the sequential machine, and as many columns as there are possible columns numbers of mapped input. Each 1-cell of s is a 2-element list, containing in order:
Each action code denotes a combination of changes applied to the output array and indexing at the next step with the meaning as follows:
where indexing at the next step is relative to the current input index i:
The smview
verb provided by the graphviz addon can be used to visualize one of these state tables.
When the action code indicates that something should be added to the output, the value that is appended depends on the f parameter, which came from the x argument to (x ;: y) and the values of the iteration variables. The values appended for different values of f are (r=row number, c=column number, j=word pointer, i=input pointer):
The indicated data is appended to the output array whenever an 'add single word' action is executed.
The 'add multiple words' action also adds the data, but coalesces multiple occurrences into a single added value. 'Add multiple words' actions are delayed, and consecutive ones from the same state are combined into a single word that is emitted only when the input string is exhausted, or a word is emitted from a different state.
Running the MachineExecution of the state machine follows this program:
NB. On input the variables i, j, r, and d have been set, the state table s and the NB. output format f have been extracted, the argument y has been converted into a NB. list of column numbers n, and the result has been initialized to a zero-item array NB. where the items have the proper shape for f. sq=:4 :0 'f s m ijrd' =. x,(#x)}.0;0;'';0 _1 0 _1 assert. 2 <: #x 'i j r d' =. ijrd 'pj pr' =. j,_1 if. 0 < L. m do. n =. (y i.~;m) { (#m),~(#&>m)#i.#m elseif. ''-:m do. n =. y elseif. do. n =. (a.i.y){m end. result=. f {:: (0#a:);'';i.&.>0 2;0;0 3;0 6 while. i <: #n do. if. i = #n do. if. d >: 0 do. 'newrow action' =. (<r,c =. d) { s elseif. j = _1 do. break. elseif. f = 5 do. break. NB. Don't output final flush elseif. do. 'newrow action' =. 0 5 end. else. 'newrow action' =. (<r,c =. i { n) { s end. assert. newrow < #s if. f = 5 do. result =. result , i, j, r, c, newrow, action end. select. action case. 0 do. case. 6 do. break. case. 7 do. i =. i-1 continue. NB. backtrack fcase. 2;3;4;5 do. NB. emit assert. j >: 0 if. f ~: 5 do. ej=. ((r=pr)*action>3) { j,pj select. f case. 0 do. newdata =. < ej }. i {. y case. 1 do. newdata =. ej }. i {. y case. 2 do. newdata =. ej , i case. 3 do. newdata =. (}:$s) #: r,c case. 4 do. newdata =. ej , (i-ej) , (}:$s) #: r,c case. do. 'Invalid output type' 13!:8 (1) end. if. (action <: 3)+.r~:pr do. result =. result , newdata else. result =. newdata (<:#result)} result end. end. if. r~:pr do. pj =. j end. pr =. action {_1 _1 _1 _1,r,r case. 1 do. j =. (action e. 1 2 4) { _1,i case. do. 'Invalid action' 13!:8 (1) end. NB. end of select. action r =. newrow i =. i + 1 end. result )
Notes:
To illustrate the use of (x ;: y) we will build a machine to recognize C-style hex constants in character input, where a hex constant is '0x' followed by any positive number of hexadecimal digits.
We see that the input characters are in 4 classes
We will assign these column numbers 3, 2, 1, and 0 respectively. The conversion control m can be generated by the statements
m =. a. e. '0x123456789abcdefABCDEF' m =. m + a. e. '0x' m =. m + a. e. '0'
and can be verified by
(a. i. '0x2aq') { m 3 2 1 1 0
We can build the state table s now. There will be 4 rows. First, we wait for 0; then we expect x; then we expect a hexadecimal digit, signaling start-of-word if we see it; then we wait for a non-hexadecimal-digit, and output the word when we get one. The state table will look like this:
This state table is generated by:
s =. 1 4 2 $ 0 0 0 0 0 0 1 1 s =. s , 4 2 $ 0 0 0 0 2 0 0 0 s =. s , 4 2 $ 0 0 3 0 0 0 3 0 s =. s , 4 2 $ 0 3 3 0 0 3 3 0
and we use it with
(0;s;m;0 _1 0 0) ;: 'qqq0x30x30x40x0xxxx' +----+----+ |0x30|0x40| +----+----+ (0;s;m;0 _1 0 0) ;: 'qqq0x30x30x40x0x34a' +----+----+-----+ |0x30|0x40|0x34a| +----+----+-----+
Note in the second example that 0x34a was emitted by the end-of-input action.
Words, revisitedThe behavior of ;:y can be emulated, approximately (the monad can trigger open quote errors), using (0;sj;mj);:y where
mj=: 256$0 NB. X other mj=: 1 (9,a.i.' ')}mj NB. S space and tab mj=: 2 (,(a.i.'Aa')+/i.26)}mj NB. A A-Z a-z excluding N B mj=: 3 (a.i.'N')}mj NB. N the letter N mj=: 4 (a.i.'B')}mj NB. B the letter B mj=: 5 (a.i.'0123456789_')}mj NB. 9 digits and _ mj=: 6 (a.i.'.')}mj NB. . the decimal point mj=: 7 (a.i.':')}mj NB. : the colon mj=: 8 (a.i.'''')}mj NB. Q quote mj=: 9 (a.i.'{')}mj NB. { the left curly brace mj=:10 (10)} mj NB. LF mj=:11 (a.i.'}')}mj NB. } the right curly brace sj=: 0 10#:10*}.".;._2(0 :0) ' X S A N B 9 . : Q { LF }']0 1.1 0.0 2.1 3.1 2.1 6.1 1.1 1.1 7.1 11.1 10.1 12.1 NB. 0 space 1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 11.2 10.2 12.2 NB. 1 other 1.2 0.3 2.0 2.0 2.0 2.0 1.0 1.0 7.2 11.2 10.2 12.2 NB. 2 alp/num 1.2 0.3 2.0 2.0 4.0 2.0 1.0 1.0 7.2 11.2 10.2 12.2 NB. 3 N 1.2 0.3 2.0 2.0 2.0 2.0 5.0 1.0 7.2 11.2 10.2 12.2 NB. 4 NB 9.0 9.0 9.0 9.0 9.0 9.0 1.0 1.0 9.0 9.0 10.2 9.0 NB. 5 NB. 1.4 0.5 6.0 6.0 6.0 6.0 6.0 1.0 7.4 11.4 10.2 12.4 NB. 6 num 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 8.0 7.0 7.0 7.0 NB. 7 ' 1.2 0.3 2.2 3.2 2.2 6.2 1.2 1.2 7.0 11.2 10.2 12.2 NB. 8 '' 9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 9.0 10.2 9.0 NB. 9 comment 1.2 0.2 2.2 3.2 2.2 6.2 1.2 1.2 7.2 11.2 10.2 12.2 NB. 10 LF 1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 13.0 10.2 1.2 NB. 11 { 1.2 0.3 2.2 3.2 2.2 6.2 1.0 1.0 7.2 1.2 10.2 14.0 NB. 12 } 1.2 0.3 2.2 3.2 2.2 6.2 1.7 1.7 7.2 1.2 10.2 1.2 NB. 13 {{ 1.2 0.3 2.2 3.2 2.2 6.2 1.7 1.7 7.2 1.2 10.2 1.2 NB. 14 }} )
FIXME: link to relevant point in j version history, once that has been published, for this version of the state table.
More InformationFor detailed development of elementary examples, see: Vocabulary/SequentialMachineNotes and Generate useful tokens from input.
Also explore the "Sequential Machine" and "Huffman Code" labs from the J session, in Help > Studio > Labs...
in J6, the labs are found at Studio > Labs...
in J8, the labs are found at Help > Studio > Labs...
The entry in the J Dictionary for (;:) also contains useful examples to build the noun x to define your own Sequential Machine, including a complete definition equivalent to the monadic use of (;:) to tokenize a string of J code.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4