The Glyph Substitution (GSUB) table provides data for substitution of glyphs for appropriate rendering of scripts, such as cursively-connecting forms in Arabic script, or for advanced typographic effects, such as ligatures.
Many language systems require substitution of alternate glyph forms. For example, in the Arabic script, the glyph shape that depicts a particular character varies according to its position in a word or text string (see Figure 1). In other language systems, glyph substitutes are aesthetic options for the user, such as the use of ligature glyphs in the English language (see Figure 2).
Figure 1. Isolated, initial, medial, and final forms of the Arabic character HAH
Figure 2. Two Latin glyphs and their associated ligature
OpenType fonts use character encoding standards, such as the Unicode Standard, that assume a distinction between characters and glyphs: text is encoded as sequences of characters, and the 'cmap' table provides a mapping from each character to a single default glyph. Multiple characters are not directly mapped to a single glyph, as needed for ligatures; and a single character is not mapped directly to multiple glyphs, as may be needed for some complex-script scenarios. The GSUB table provides a way to describe such substitutions, enabling applications to apply them during text layout and rendering to achieve desired results.
To access substitute glyphs, GSUB maps from the glyph index or indices defined in a 'cmap' subtable to the glyph index or indices of the substitute glyphs. For example, if a font has three alternative forms of an ampersand glyph, the 'cmap' table associates the ampersand's character code with only one of these glyphs. In GSUB, the indices of the other ampersand glyphs are then referenced from this one default index.
The text-processing client uses the GSUB data to manage glyph substitution actions. GSUB identifies the glyphs that are input to and output from each glyph substitution action, specifies how and where the client uses glyph substitutes, and regulates the order of glyph substitution operations. Any number of substitutions can be defined for each script or language system represented in a font.
The GSUB table supports seven types of glyph substitutions that are widely used in international typography:
A single substitution replaces a single glyph with another single glyph. This is used, for example, to render positional glyph variants in Arabic and vertical text in East Asia (see Figure 3).
Figure 3. Alternative forms of parentheses used when positioning Kanji vertically
A multiple substitution replaces a single glyph with more than one glyph. This is used to specify actions such as ligature decomposition (see Figure 4).
Figure 4. Decomposing a Latin ligature glyph into its individual glyph components
An alternate substitution identifies functionally equivalent but different-looking forms of a glyph. These glyphs are often referred to as aesthetic alternatives. For example, a font might have five different glyphs for the ampersand symbol, but only one would be the default glyph referenced from the 'cmap' table. The client could use the default glyph or substitute any of the four alternatives (see Figure 5).
Figure 5. Alternative ampersand glyphs in a font
A ligature substitution replaces several glyph indices with a single glyph index, as when an Arabic ligature glyph replaces a string of separate glyphs (see Figure 6). When a string of glyphs can be replaced with a single ligature glyph, the first glyph is substituted with the ligature. The remaining glyphs in the string are deleted; this does not include glyphs that are skipped as a result of lookup flags.
Figure 6. Three Arabic glyphs and their associated ligature glyph
Contextual substitution is an extension of the above lookup types, describing glyph substitutions in context: that is, a substitution of one or more glyphs within a certain pattern of glyphs. Each substitution describes one or more input glyph sequences and one or more substitutions to be performed on that sequence. Contextual substitutions can be applied to specific glyph sequences, glyph classes, or sets of glyphs.
Chained contexts substitution extends the capabilities of contextual substitution. As with contextual substitution, actions can be performed on one or more glyphs within a pattern of glyphs (the input sequence). But the actions can be constrained by chained glyph sequence contexts: a backtrack sequence that precedes the input sequence, and a lookahead sequence that follows the input sequence. Three formats allow the backtrack, input and lookahead sequence patterns to be described using specific glyphs, glyph classes, or glyph sets.
Reverse chaining contextual single substitution allows one glyph to be substituted with another by chaining the input glyph to a backtrack and/or lookahead sequence. The difference between this and other lookup types is that processing of the input glyph sequence goes from end to start.
The GSUB data formats used to implement the different types of substitution include an eighth type, substitution extension. This provides a format extension mechanism, allowing reference to subtables using 32-bit offsets rather than 16-bit offsets. It does not provide an additional type of substitution action, however.
GSUB table and OpenType Font Variations
OpenType Font Variations allow a single font to support many design variations along one or more axes of design variation. For example, a font with weight and width variations might support weights from thin to black, and widths from ultra-condensed to ultra-expanded. For general information on OpenType Font Variations, see the chapter, OpenType Font Variations Overview.
In a variable font, it may be desirable to have different glyph-substitution actions used for different regions within the font's variation space. For example, for narrow or heavy instances in which counters become small, it may be desirable to make certain glyph substitutions to use alternate glyphs with certain strokes removed or outlines simplified to allow for larger counters. Such effects can be achieved using a FeatureVariations table within the GSUB table. The FeatureVariations table is described in the chapter, OpenType Layout Common Table Formats. See also the Required Variation Alternates ('rvrn') feature in the OpenType Layout tag registry.
GSUB table organization
The GSUB table begins with a header that defines offsets to a ScriptList, a FeatureList, a LookupList, and an optional FeatureVariations table (see Figure 7).
For a detailed discussion of ScriptLists, FeatureLists, LookupLists, and FeatureVariations tables, see the chapter, OpenType Layout Common Table Formats.
Figure 7. High-level organization of GSUB table
This organization helps text-processing clients to easily locate the features and lookups that apply to a particular script or language system. To access GSUB information, clients should use the following procedure:
For a detailed description of the Feature Variations table and how it is processed, see the FeatureVariations table section in the OpenType Layout Common Table Formats chapter.
Lookup data is defined in Lookup tables, which are defined in the OpenType Layout Common Table Formats chapter. A Lookup table contains one or more Lookup subtables that define the specific conditions, type, and results of a substitution action used to implement a feature. Specific Lookup subtable types are used for glyph substitution actions, and are defined in this chapter. All subtables within a Lookup table must be of the same lookup type, as listed in the following table for the GsubLookupType enumeration:
GsubLookupType enumeration
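As a rough aid, the lookup type numbers can be collected from the section headings later in this chapter. The Python sketch below does that; the member names and the helper `check_uniform_lookup_type` are hypothetical, and the value 8 for reverse chaining is inferred from the chapter's count of eight lookup types with types 1 through 7 assigned elsewhere.

```python
from enum import IntEnum


class GsubLookupType(IntEnum):
    """Lookup types as numbered in this chapter's section headings.

    Member names are illustrative only, not taken from the specification.
    """
    SINGLE = 1
    MULTIPLE = 2
    ALTERNATE = 3
    LIGATURE = 4
    CONTEXT = 5
    CHAINED_CONTEXT = 6
    EXTENSION = 7
    REVERSE_CHAINED_SINGLE = 8  # inferred: the one remaining type


def check_uniform_lookup_type(subtable_types):
    """Return the one lookup type shared by all subtables of a Lookup table.

    Raises ValueError if the list is empty or the types differ, reflecting
    the rule that all subtables in a Lookup must be of the same type.
    """
    kinds = {GsubLookupType(t) for t in subtable_types}
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one lookup type, got {sorted(kinds)}")
    return kinds.pop()
```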
Each lookup type has one or more subtable formats. The "best" format depends on the type of substitution and the resulting storage efficiency. When glyph information is best presented in more than one format, a single lookup may define more than one subtable, as long as all the subtables are for the same lookup type. For example, within a given lookup, a glyph index array format could best represent one set of target glyphs, whereas a glyph index range format could be better for another set.
A series of substitution operations on the same glyph or string requires multiple lookups, one for each separate action. Each lookup has a different array index in the LookupList table and is applied in the LookupList order. The substitution action of each lookup is applied to the results of previous lookups. Some substitution lookups could "feed" later lookups by producing a glyph sequence that matches the input sequence pattern of a later lookup that would not have matched the original glyph sequence. The opposite is also possible: one substitution lookup produces a glyph sequence that does not match the pattern of a later lookup that would have matched the original glyph sequence. Thus, the ordering of lookups in the LookupList can be very significant.
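The effect of lookup ordering can be seen in a toy model. The sketch below is not a layout engine: glyphs are plain strings, each "lookup" is a single ligature rule, and the names `ligate` and `apply_in_order` are invented for illustration. It shows one lookup feeding another when applied in one order and failing to do so in the other.

```python
def ligate(glyphs, components, ligature):
    """Replace every occurrence of `components` with the single glyph `ligature`."""
    out, i, n = [], 0, len(components)
    while i < len(glyphs):
        if glyphs[i:i + n] == components:
            out.append(ligature)
            i += n  # skip all glyphs consumed by the match
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_in_order(glyphs, lookups):
    """Apply each lookup to the whole run before moving to the next lookup."""
    for components, ligature in lookups:
        glyphs = ligate(glyphs, components, ligature)
    return glyphs


ff = (["f", "f"], "f_f")
ffi = (["f_f", "i"], "f_f_i")  # only matches the output of the `ff` lookup
text = ["f", "f", "i"]

print(apply_in_order(text, [ff, ffi]))  # ['f_f_i']
print(apply_in_order(text, [ffi, ff]))  # ['f_f', 'i']
```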
During text processing, a client applies a lookup to each glyph in the string before moving to the next lookup. A lookup is finished for a glyph after the client locates the target glyph or glyph context and performs a substitution, if specified. To move to the "next" glyph, the client will skip all the glyphs that participated in the lookup operation: glyphs that were substituted as well as any other glyphs that formed an input sequence context for the operation. Only glyphs in the input sequence are skipped; in the case of chained contexts substitution, the glyphs in the lookahead sequence are not skipped.
The next section of this chapter describes the GSUB header and the subtables defined for each GsubLookupType. Examples at the end of this chapter illustrate the GSUB header and six of the eight LookupTypes, including the three formats available for contextual substitutions (LookupType 5).
GSUB table structures
The GSUB table begins with a header that contains a version number for the table and offsets to three tables: ScriptList, FeatureList, and LookupList. For descriptions of each of these tables, see the chapter, OpenType Layout Common Table Formats. Example 1 at the end of this chapter shows a GSUB Header version 1.0 table definition.
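As a rough illustration of the version 1.0 and 1.1 header layouts given in this section, the following Python sketch reads the header fields from raw table bytes. The function name `parse_gsub_header` and the dictionary shape are assumptions made for this example; OpenType data is big-endian, hence the `>` in the format strings.

```python
import struct


def parse_gsub_header(data):
    """Parse a GSUB header (version 1.0 or 1.1) from the start of `data`.

    Returns a dict keyed by the field names used in the header tables.
    A NULL (zero) featureVariationsOffset is reported as None.
    """
    major, minor, script_list, feature_list, lookup_list = struct.unpack_from(
        ">HHHHH", data, 0
    )
    if major != 1 or minor not in (0, 1):
        raise ValueError(f"unsupported GSUB version {major}.{minor}")
    header = {
        "majorVersion": major,
        "minorVersion": minor,
        "scriptListOffset": script_list,
        "featureListOffset": feature_list,
        "lookupListOffset": lookup_list,
        "featureVariationsOffset": None,
    }
    if minor == 1:
        # The Offset32 field follows the five uint16/Offset16 fields (10 bytes).
        (feature_variations,) = struct.unpack_from(">I", data, 10)
        header["featureVariationsOffset"] = feature_variations or None
    return header
```

All offsets returned here are relative to the beginning of the GSUB table, so a caller would add them to the table's position in the font file before reading further.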
GSUB Header, version 1.0

| Type | Name | Description |
| --- | --- | --- |
| uint16 | majorVersion | Major version of the GSUB table, = 1. |
| uint16 | minorVersion | Minor version of the GSUB table, = 0. |
| Offset16 | scriptListOffset | Offset to ScriptList table, from beginning of GSUB table. |
| Offset16 | featureListOffset | Offset to FeatureList table, from beginning of GSUB table. |
| Offset16 | lookupListOffset | Offset to LookupList table, from beginning of GSUB table. |

GSUB Header, version 1.1

| Type | Name | Description |
| --- | --- | --- |
| uint16 | majorVersion | Major version of the GSUB table, = 1. |
| uint16 | minorVersion | Minor version of the GSUB table, = 1. |
| Offset16 | scriptListOffset | Offset to ScriptList table, from beginning of GSUB table. |
| Offset16 | featureListOffset | Offset to FeatureList table, from beginning of GSUB table. |
| Offset16 | lookupListOffset | Offset to LookupList table, from beginning of GSUB table. |
| Offset32 | featureVariationsOffset | Offset to FeatureVariations table, from beginning of the GSUB table (may be NULL). |

Lookup type 1 subtable: single substitution
Single substitution (SingleSubst) subtables tell a client to replace a single glyph with another glyph. The subtables can be either of two formats. Both formats require two distinct sets of glyph indices: one that defines input glyphs (specified in the Coverage table), and one that defines the output glyphs. Format 1 requires less space than format 2, but it is less flexible.
Single substitution format 1
Format 1 calculates the indices of the output glyphs, which are not explicitly defined in the subtable. To calculate an output glyph index, format 1 adds a constant delta value to the input glyph index. The input and output glyphs do not need to be in continuous glyph ID ranges, but the delta between each input glyph ID and its output glyph ID must be the same constant. This format does not use the Coverage index that is returned from the Coverage table.
The SingleSubstFormat1 subtable begins with a format identifier of 1. An offset references a Coverage table that specifies the indices of the input glyphs. The deltaGlyphID is a constant value added to each input glyph index to calculate the index of the corresponding output glyph. Addition of deltaGlyphID is modulo 65536. If the result after adding deltaGlyphID to the input glyph index is less than zero, add 65536 to obtain a valid glyph ID.
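The modulo-65536 arithmetic can be sketched in a few lines of Python. Here the Coverage table is modelled as a plain set of glyph IDs, which is a simplification, and the function name `single_subst_format1` is hypothetical.

```python
def single_subst_format1(glyph_id, covered, delta_glyph_id):
    """Return the substitute for `glyph_id` under a format 1 subtable.

    `covered` stands in for the Coverage table (a set of input glyph IDs);
    `delta_glyph_id` is the signed int16 delta. Glyphs that are not covered
    are returned unchanged.
    """
    if glyph_id not in covered:
        return glyph_id
    # Masking with 0xFFFF performs the addition modulo 65536, which also
    # maps a negative sum back into the valid glyph ID range.
    return (glyph_id + delta_glyph_id) & 0xFFFF


print(single_subst_format1(5, {5}, -10))      # 65531
print(single_subst_format1(65530, {65530}, 10))  # 4
```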
Example 2 at the end of this chapter uses format 1 to replace standard numerals with lining numerals.
SingleSubstFormat1 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier: format = 1. |
| Offset16 | coverageOffset | Offset to Coverage table, from beginning of substitution subtable. |
| int16 | deltaGlyphID | Add to original glyph ID to get substitute glyph ID. |

Single substitution format 2
Format 2 is more flexible than format 1 but requires more space. It provides an array of output glyph indices (substituteGlyphIDs) explicitly matched to the input glyph indices specified in the Coverage table.
The SingleSubstFormat2 subtable specifies a format identifier, an offset to a Coverage table that defines the input glyph indices, and an array of output glyph indices (substituteGlyphIDs).
The substituteGlyphIDs array must contain the same number of glyph indices as the Coverage table, and the glyphs must be ordered to match the order of corresponding input glyphs in the Coverage table. To locate the corresponding output glyph index in the substituteGlyphIDs array, this format uses the Coverage index returned from the Coverage table.
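A minimal sketch of the format 2 lookup follows. It assumes the Coverage table has already been decoded into a sorted list of glyph IDs whose list position is the Coverage index; the helper names `coverage_index` and `single_subst_format2` are invented for this example.

```python
from bisect import bisect_left


def coverage_index(coverage_glyphs, glyph_id):
    """Return the Coverage index of `glyph_id`, or None if it is not covered.

    `coverage_glyphs` is assumed to be a sorted list of glyph IDs.
    """
    i = bisect_left(coverage_glyphs, glyph_id)
    if i < len(coverage_glyphs) and coverage_glyphs[i] == glyph_id:
        return i
    return None


def single_subst_format2(glyph_id, coverage_glyphs, substitute_glyph_ids):
    """Map a covered glyph to substitute_glyph_ids[coverage index]."""
    if len(substitute_glyph_ids) != len(coverage_glyphs):
        raise ValueError("substituteGlyphIDs must have one entry per covered glyph")
    index = coverage_index(coverage_glyphs, glyph_id)
    return glyph_id if index is None else substitute_glyph_ids[index]


# Glyphs 30, 42 and 57 are replaced by 300, 17 and 901 respectively.
print(single_subst_format2(42, [30, 42, 57], [300, 17, 901]))  # 17
```

Unlike format 1, the output IDs here are arbitrary, which is what the extra array buys at the cost of space.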
Example 3 at the end of this chapter uses format 2 to substitute vertically oriented glyphs for horizontally oriented glyphs.
SingleSubstFormat2 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier: format = 2. |
| Offset16 | coverageOffset | Offset to Coverage table, from beginning of substitution subtable. |
| uint16 | glyphCount | Number of glyph IDs in the substituteGlyphIDs array. |
| uint16 | substituteGlyphIDs[glyphCount] | Array of substitute glyph IDs, ordered by Coverage index. |

Lookup type 2 subtable: multiple substitution
A multiple substitution (MultipleSubst) subtable replaces a single glyph with a sequence of glyphs, as when multiple glyphs replace a single ligature. The subtable has a single format.
Multiple substitution format 1
The MultipleSubstFormat1 subtable specifies a format identifier, an offset to a Coverage table that defines the input glyph indices, and an array of offsets to Sequence tables that define the output glyph indices. The Sequence table offsets are ordered by the Coverage index of the input glyphs.
For each input glyph listed in the Coverage table, a Sequence table defines the output glyphs. Each Sequence table contains a count of the glyphs in the output glyph sequence and an array of output glyph indices.
Note: The order of the output glyph indices depends on the writing direction of the text. For text written left to right, the left-most glyph will be the first glyph in the sequence. Conversely, for text written right to left, the right-most glyph will be first.
The use of multiple substitution for deletion of an input glyph is prohibited. The glyphCount value must always be greater than 0.
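The relationship between the Coverage index and the Sequence tables might be sketched as follows. This is a simplified model, assuming the Coverage table is an ordered list of glyph IDs and each Sequence table is a Python list; `apply_multiple_subst` is a hypothetical name, and lookup flags are ignored.

```python
def apply_multiple_subst(glyphs, coverage_glyphs, sequences):
    """Replace each covered glyph with its output sequence.

    `sequences[i]` is the Sequence for the glyph at Coverage index i.
    An empty Sequence is rejected, since glyphCount must be greater than 0.
    """
    if len(sequences) != len(coverage_glyphs):
        raise ValueError("one Sequence table is required per covered glyph")
    if any(len(seq) == 0 for seq in sequences):
        raise ValueError("a Sequence must contain at least one glyph")
    index_of = {g: i for i, g in enumerate(coverage_glyphs)}
    out = []
    for glyph in glyphs:
        i = index_of.get(glyph)
        out.extend(sequences[i] if i is not None else [glyph])
    return out


# Glyph 99 (a ligature) decomposes into glyphs 20, 21, 22.
print(apply_multiple_subst([10, 99, 11], [99], [[20, 21, 22]]))
# [10, 20, 21, 22, 11]
```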
Example 4 at the end of this chapter shows how to replace a single ligature with three glyphs.
MultipleSubstFormat1 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier: format = 1. |
| Offset16 | coverageOffset | Offset to Coverage table, from beginning of substitution subtable. |
| uint16 | sequenceCount | Number of Sequence table offsets in the sequenceOffsets array. |
| Offset16 | sequenceOffsets[sequenceCount] | Array of offsets to Sequence tables. Offsets are from beginning of substitution subtable, ordered by Coverage index. |

Sequence table

| Type | Name | Description |
| --- | --- | --- |
| uint16 | glyphCount | Number of glyph IDs in the substituteGlyphIDs array. This must always be greater than 0. |
| uint16 | substituteGlyphIDs[glyphCount] | String of glyph IDs to substitute. |

Lookup type 3 subtable: alternate substitution
An alternate substitution (AlternateSubst) subtable identifies any number of aesthetic alternatives from which a user can choose a glyph variant to replace the input glyph. For example, if a font contains four variants of the ampersand symbol, the 'cmap' table will specify the index of one of the four glyphs as the default glyph index, and an AlternateSubst subtable will list the indices of the other three glyphs as alternatives. A text-processing client would then have the option of replacing the default glyph with any of the three alternatives.
The subtable has one format.
Alternate substitution format 1
The AlternateSubstFormat1 subtable contains a format identifier, an offset to a Coverage table containing the indices of glyphs with alternative forms, and an array of offsets to AlternateSet tables.
For each glyph in the Coverage table, an AlternateSet subtable contains a count of the alternative glyphs and an array of their glyph indices. Because all the glyphs are functionally equivalent, they can be in any order in the array.
Example 5 at the end of this chapter shows how to replace the default ampersand glyph with alternative glyphs.
AlternateSubstFormat1 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier: format = 1. |
| Offset16 | coverageOffset | Offset to Coverage table, from beginning of substitution subtable. |
| uint16 | alternateSetCount | Number of AlternateSet tables. |
| Offset16 | alternateSetOffsets[alternateSetCount] | Array of offsets to AlternateSet tables. Offsets are from beginning of substitution subtable, ordered by Coverage index. |

AlternateSet table

| Type | Name | Description |
| --- | --- | --- |
| uint16 | glyphCount | Number of glyph IDs in the alternateGlyphIDs array. |
| uint16 | alternateGlyphIDs[glyphCount] | Array of alternate glyph IDs, in arbitrary order. |

Lookup type 4 subtable: ligature substitution
A ligature substitution (LigatureSubst) subtable identifies ligature substitutions where a single glyph replaces multiple glyphs. One LigatureSubst subtable can specify any number of ligature substitutions. The subtable has one format.
Ligature substitution format 1
The LigatureSubstFormat1 subtable contains a format identifier, a Coverage table offset, and an array of offsets to LigatureSet tables. The Coverage table specifies only the index of the first glyph component of each ligature set.
Example 6 at the end of this chapter shows how to replace a string of glyphs with a single ligature.
LigatureSubstFormat1 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier: format = 1. |
| Offset16 | coverageOffset | Offset to Coverage table, from beginning of substitution subtable. |
| uint16 | ligatureSetCount | Number of LigatureSet tables. |
| Offset16 | ligatureSetOffsets[ligatureSetCount] | Array of offsets to LigatureSet tables. Offsets are from beginning of substitution subtable, ordered by Coverage index. |

A LigatureSet table, one for each covered glyph, specifies all the ligature sequences that begin with the covered glyph. For example, if the Coverage table lists the glyph index for a lowercase "f", then a LigatureSet table will define ligatures that begin with "f", such as the "ffl", "fl", "ffi", "fi" and "ff" ligatures. If the Coverage table also lists the glyph index for a lowercase "e", then a different LigatureSet table will define ligatures that begin with "e", such as the "etc" ligature.
A LigatureSet table consists of a count of the ligatures that begin with the covered glyph and an array of offsets to Ligature tables, which define the glyphs in each ligature. The order in the Ligature offset array defines the preference for using the ligatures. For example, if the "ffl" ligature is preferable to the "ff" ligature, then the Ligature array would list the offset to the "ffl" Ligature table before the offset to the "ff" Ligature table.
LigatureSet table

| Type | Name | Description |
| --- | --- | --- |
| uint16 | ligatureCount | Number of Ligature tables. |
| Offset16 | ligatureOffsets[ligatureCount] | Array of offsets to Ligature tables. Offsets are from beginning of LigatureSet table, ordered by preference. |

For each ligature in the set, a Ligature table specifies the glyph ID of the output ligature glyph; a count of the total number of component glyphs in the ligature, including the first component; and an array of glyph IDs for the components. The array starts with the second component glyph in the ligature (input glyph sequence index = 1, componentGlyphIDs array index = 0) because the first component glyph is specified in the Coverage table.
Note: The componentGlyphIDs array lists glyph IDs according to the writing direction (that is, the logical order) of the text. For text written right to left, the right-most glyph will be first. Conversely, for text written left to right, the left-most glyph will be first.
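The interplay of Coverage, LigatureSet preference order, and the "second component onward" array can be sketched as below. This is a simplified model under stated assumptions: glyphs are strings, the Coverage table is an ordered list, each LigatureSet is a list of `(ligature_glyph, remaining_components)` pairs in preference order, and lookup flags are ignored. The function names are hypothetical.

```python
def match_ligature(glyphs, pos, coverage_glyphs, ligature_sets):
    """Return (ligature_glyph, component_count) for the first Ligature
    in preference order that matches at `pos`, or None."""
    first = glyphs[pos]
    if first not in coverage_glyphs:
        return None
    ligature_set = ligature_sets[coverage_glyphs.index(first)]
    for ligature_glyph, rest in ligature_set:
        # `rest` starts at the second component; the first came from Coverage.
        if glyphs[pos + 1:pos + 1 + len(rest)] == rest:
            return ligature_glyph, 1 + len(rest)
    return None


def apply_ligature_subst(glyphs, coverage_glyphs, ligature_sets):
    """Apply the ligature rules left to right over a glyph run."""
    out, pos = [], 0
    while pos < len(glyphs):
        hit = match_ligature(glyphs, pos, coverage_glyphs, ligature_sets)
        if hit is None:
            out.append(glyphs[pos])
            pos += 1
        else:
            ligature_glyph, component_count = hit
            out.append(ligature_glyph)  # first glyph becomes the ligature
            pos += component_count      # remaining components are dropped
    return out


coverage = ["f"]
sets = [[("f_f_l", ["f", "l"]), ("f_f", ["f"])]]  # "ffl" preferred over "ff"
print(apply_ligature_subst(["f", "f", "l", "a", "f", "f"], coverage, sets))
# ['f_f_l', 'a', 'f_f']
```

Listing "ff" before "ffl" in the set would make the longer ligature unreachable in this model, which is why the preference order matters.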
Ligature table

| Type | Name | Description |
| --- | --- | --- |
| uint16 | ligatureGlyph | Glyph ID of ligature to substitute. |
| uint16 | componentCount | Number of components in the ligature. |
| uint16 | componentGlyphIDs[componentCount - 1] | Array of component glyph IDs, starting with the second component, ordered in writing direction. |

Lookup type 5 subtable: contextual substitution
A contextual substitution subtable describes glyph substitutions in context that replace one or more glyphs within a certain pattern of glyphs.
Contextual substitution subtables can use any of three formats that are common to the GSUB and GPOS tables. These define input sequence patterns to be matched against the text glyph sequence, and then actions to be applied to glyphs within the input sequence. The actions are specified as "nested" lookups, and each is applied to a particular sequence position within the input sequence.
Each sequence position + nested lookup combination is specified in a SequenceLookupRecord. Examples 7, 8, and 9 at the end of this chapter illustrate use of sequence lookup records within the GSUB table.
While the subtable formats are common between the GSUB and GPOS tables, the lookups referenced by sequence lookup records within the GSUB table are referenced by index into the GSUB LookupList table. In this way, actions specified by a GSUB contextual lookup can only be substitutions.
An input sequence pattern is matched against the current glyph sequence before any substitution actions are performed. The substitutions may change the current glyph sequence, but that has no effect on the initial matching operation. For a given lookup subtable, there may be multiple sequence lookup records, and these are processed in the specified order. Each substitution action on the glyph sequence applies to the results from the preceding sequence lookup records. Note in particular that the sequence position index in each sequence lookup record is relative to the glyph sequence as modified by the actions of preceding SequenceLookupRecords.
For example, consider a contextual lookup specifying an input glyph sequence of four glyphs. Suppose that no substitution is performed on the first glyph, but that the middle two glyphs will be replaced with a ligature, and a single glyph will replace the fourth glyph. Suppose also that the actions are listed in that order.
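The four-glyph scenario can be traced with a small Python sketch. It applies the rule stated above, that each sequence position is relative to the glyph sequence as already modified by earlier records. Glyphs are strings, the two nested "lookups" are stand-in functions, and every name here is invented for illustration.

```python
def ligate_pair(glyphs, pos):
    """Stand-in ligature lookup: merge the glyphs at pos and pos + 1."""
    return glyphs[:pos] + [glyphs[pos] + "_" + glyphs[pos + 1]] + glyphs[pos + 2:]


def to_alt(glyphs, pos):
    """Stand-in single substitution: replace the glyph at pos with a variant."""
    return glyphs[:pos] + [glyphs[pos] + ".alt"] + glyphs[pos + 1:]


def apply_records(glyphs, match_start, records):
    """Apply (sequence_index, nested_lookup) records in the listed order.

    Each sequence_index is interpreted against the current, possibly
    already modified, glyph sequence.
    """
    for sequence_index, nested_lookup in records:
        glyphs = nested_lookup(glyphs, match_start + sequence_index)
    return glyphs


# No action at position 0; ligate positions 1 and 2; then substitute the
# original fourth glyph, which has moved to position 2 after the ligature.
records = [(1, ligate_pair), (2, to_alt)]
print(apply_records(["a", "b", "c", "d"], 0, records))
# ['a', 'b_c', 'd.alt']
```

Using sequence index 3 for the second record would point past the end of the shortened sequence, which is the mistake the ordering rule is meant to prevent.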
Contextual substitution format 1: simple glyph contexts
Format 1 defines the context for a glyph substitution as a particular sequence of glyphs. For example, a context could be <xyz>, <holiday>, <!?*#@>, or any other glyph sequence.
For example, suppose the glyph string <abc> is to be replaced with its reverse glyph string <cba>. The input context would be defined as the glyph sequence, <abc>. Two single-substitution actions can be specified: the "a" at sequence position 0 is substituted by "c", and the "c" at sequence position 2 is substituted by "a".
Format 1 contextual substitutions are implemented using a SequenceContextFormat1 table. See Sequence context format 1: simple glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Example 7 at the end of the chapter uses a SequenceContextFormat1 table to replace a sequence of three glyphs with a sequence preferred for the French language system.
Contextual substitution format 2: class-based glyph contexts
Format 2 defines contexts for glyph substitutions as input sequence patterns, with patterns expressed in terms of glyph classes. The glyph classes are defined using a Class Definition table. Several sequence patterns may be specified, with each pattern specifying a class of glyphs for each input sequence position.
For example, suppose that a swash capital glyph should replace each uppercase letter glyph that is preceded by a space glyph and followed by a lowercase letter glyph (a glyph sequence of space - uppercase - lowercase). The set of uppercase glyphs would constitute one glyph class (class 1), the set of lowercase glyphs would constitute a second class (class 2), and the space glyph would constitute a third class (class 3). The input context might be specified as a pattern of one glyph from class 3, followed by one glyph from class 1, followed by one glyph from class 2.
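The space, uppercase, lowercase pattern could be checked along these lines. The sketch models the Class Definition table as a dictionary from glyph name to class number, with unlisted glyphs falling into class 0; the glyph names and the function `matches_class_pattern` are assumptions for illustration only.

```python
def glyph_class(class_def, glyph):
    """Return the class of `glyph`; glyphs not listed belong to class 0."""
    return class_def.get(glyph, 0)


def matches_class_pattern(glyphs, pos, class_def, pattern):
    """True if the glyphs starting at `pos` have the classes in `pattern`."""
    window = glyphs[pos:pos + len(pattern)]
    if len(window) != len(pattern):
        return False  # not enough glyphs left to match the whole pattern
    return all(glyph_class(class_def, g) == c for g, c in zip(window, pattern))


# Class 1: uppercase, class 2: lowercase, class 3: the space glyph.
class_def = {
    **{name: 1 for name in "ABC"},
    **{name: 2 for name in "abc"},
    "space": 3,
}
pattern = [3, 1, 2]  # space, uppercase, lowercase

print(matches_class_pattern(["space", "A", "b"], 0, class_def, pattern))  # True
print(matches_class_pattern(["space", "A", "B"], 0, class_def, pattern))  # False
```

In a real subtable the swash substitution would then be attached to sequence position 1, the uppercase glyph.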
Format 2 contextual substitutions are implemented using a SequenceContextFormat2 table. See Sequence context format 2: class-based glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Example 8 at the end of this chapter uses a SequenceContextFormat2 table to substitute Arabic mark glyphs for base glyphs of different heights.
Contextual substitution format 3: coverage-based glyph contexts
Format 3 defines a context for glyph substitutions as an input sequence pattern, with the pattern expressed in terms of Coverage tables. A different Coverage table is defined for each sequence position.
Format 3 is like format 2 in that patterns are defined using sets of glyphs. However, with the glyph classes used in format 2, each glyph is in exactly one class. With format 3, any glyph can occur in multiple Coverage tables.
Unlike Formats 1 and 2, however, this format can define only one context.
For example, consider an input context that contains a lowercase glyph (position 0), followed by an uppercase glyph (position 1), either a lowercase or numeral glyph (position 2), and then either a lowercase or uppercase vowel (position 3). This context requires four Coverage tables, one for each position:
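One way to picture the four Coverage tables is as four sets, one per position, with glyphs free to appear in more than one set. The Python sketch below uses ASCII characters as stand-ins for glyphs; the set names and `matches_coverage_pattern` are hypothetical.

```python
import string

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)
VOWELS = set("aeiouAEIOU")

# One Coverage set per input sequence position. Note that lowercase glyphs
# appear in several sets, which class-based format 2 would not allow.
coverages = [
    LOWER,           # position 0: lowercase
    UPPER,           # position 1: uppercase
    LOWER | DIGITS,  # position 2: lowercase or numeral
    VOWELS,          # position 3: lowercase or uppercase vowel
]


def matches_coverage_pattern(glyphs, pos, coverages):
    """True if each glyph from `pos` onward is in the Coverage set for its position."""
    window = glyphs[pos:pos + len(coverages)]
    if len(window) != len(coverages):
        return False
    return all(g in cov for g, cov in zip(window, coverages))


print(matches_coverage_pattern(list("aB3E"), 0, coverages))  # True
print(matches_coverage_pattern(list("aB3x"), 0, coverages))  # False
```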
Format 3 contextual substitutions are implemented using a SequenceContextFormat3 table. See Sequence context format 3: coverage-based glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Example 9 at the end of this chapter uses SequenceContextFormat3 to substitute swash glyphs for two out of three glyphs in a sequence.
Lookup type 6 subtable: chained contexts substitution
A chained contexts substitution subtable describes glyph substitutions in context with an ability to look back and/or look ahead in the sequence of glyphs. The design of the chained contexts substitution subtable is parallel to that of the contextual substitution subtable, including the availability of three formats. Each format can describe one or more chained backtrack, input, and lookahead sequence combinations, and one or more substitutions for glyphs in each input sequence.
Note: Substitutions can be specified only for the input sequence context, not for backtrack and lookahead sequences.
See the introduction to the Contextual substitution section for general remarks regarding contextual substitutions, which also apply to chained contexts substitutions.
Note that backtrack sequences are specified in reverse logical order. See the Chained sequence context format 1 section in the OpenType Layout Common Table Formats chapter for details regarding chained backtrack, input, and lookahead sequences.
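The reverse ordering of the backtrack sequence is easy to get wrong, so a small sketch may help. It models simple glyph contexts with glyphs as strings; `matches_chained` is a hypothetical name, and lookup flags are ignored.

```python
def matches_chained(glyphs, pos, backtrack, input_seq, lookahead):
    """True if input_seq matches at `pos` with the given surrounding context.

    `backtrack` is in reverse logical order: backtrack[0] is the glyph
    immediately before `pos`, backtrack[1] the one before that, and so on.
    `lookahead` is in logical order, starting right after the input sequence.
    """
    end = pos + len(input_seq)
    if glyphs[pos:end] != input_seq:
        return False
    if glyphs[end:end + len(lookahead)] != lookahead:
        return False
    if len(backtrack) > pos:
        return False  # not enough glyphs before the input sequence
    return all(glyphs[pos - 1 - k] == g for k, g in enumerate(backtrack))


glyphs = list("xyabcd")
# Context: "xy" before, "ab" as input, "c" after. Backtrack is listed y, x.
print(matches_chained(glyphs, 2, ["y", "x"], ["a", "b"], ["c"]))  # True
print(matches_chained(glyphs, 2, ["x", "y"], ["a", "b"], ["c"]))  # False
```

Only the glyphs matched by `input_seq` would be eligible for substitution; the backtrack and lookahead glyphs serve purely as conditions.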
Chained contexts substitution format 1: simple glyph contexts
Format 1 defines the context for a glyph substitution as a particular sequence of glyphs. For example, a context could be <xyz>, <holiday>, <!?*#@>, or any other glyph sequence. Specific glyph sequences are used for input, backtrack or lookahead contexts.
Format 1 chained context substitutions are implemented using a ChainedSequenceContextFormat1 table. See Chained sequence context format 1: simple glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Chained contexts substitution format 2: class-based glyph contexts
Format 2 defines contexts for glyph substitutions as patterns expressed in terms of glyph classes. The glyph classes are defined using a Class Definition table. Several sequence patterns may be specified, with each pattern specifying a class of glyphs for each sequence position.
To chain contexts, three separate Class Definition tables are used for the backtrack sequence, input sequence, and lookahead sequence.
Format 2 chained context substitutions are implemented using a ChainedSequenceContextFormat2 table. See Chained sequence context format 2: class-based glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Chained contexts substitution format 3: coverage-based glyph contexts
Format 3 defines contexts for glyph substitutions as patterns expressed in terms of Coverage tables. A different Coverage table is defined for each position in a sequence. To chain contexts, three separate sets of Coverage tables are used for the backtrack sequence, input sequence, and lookahead sequence.
Format 3 is like format 2 in that patterns are defined using sets of glyphs. However, with the glyph classes used in format 2, each glyph is in exactly one class. With format 3, any glyph can occur in multiple Coverage tables.
Format 3 chained context substitutions are implemented using a ChainedSequenceContextFormat3 table. See Chained sequence context format 3: coverage-based glyph contexts in the OpenType Layout Common Table Formats chapter for complete details.
Lookup type 7 subtable: substitution subtable extension
This lookup type provides a way to access lookup subtables within the GSUB table using 32-bit offsets. This is needed if the total size of the subtables exceeds the 16-bit limits of the various other offsets in the GSUB table. In this specification, the subtable stored at the 32-bit offset location is termed the "extension" subtable.
This subtable type uses one format.
Substitution extension format 1
SubstExtensionFormat1 subtable

| Type | Name | Description |
| --- | --- | --- |
| uint16 | format | Format identifier. Set to 1. |
| uint16 | extensionLookupType | Lookup type of subtable referenced by extensionOffset (that is, the extension subtable). |
| Offset32 | extensionOffset | Offset to the extension subtable, of lookup type extensionLookupType, relative to the start of the ExtensionSubstFormat1 subtable. |

The extensionLookupType field must be set to any lookup type other than 7. If a lookup table uses extension subtables, then all of the extension subtables must have the same extensionLookupType. All offsets to extension subtables are set in the usual way, that is, relative to the start of the ExtensionSubstFormat1 subtable.
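Reading the three fields of the extension subtable and resolving the 32-bit offset might look like the sketch below. The function name `parse_subst_extension` is hypothetical, and the returned position is an absolute index into whatever byte buffer was passed in.

```python
import struct


def parse_subst_extension(data, subtable_offset):
    """Parse an extension subtable located at `subtable_offset` in `data`.

    Returns (extensionLookupType, absolute position of the extension
    subtable). The position is subtable_offset + extensionOffset because
    the offset is relative to the start of the extension subtable itself.
    """
    fmt, extension_lookup_type, extension_offset = struct.unpack_from(
        ">HHI", data, subtable_offset
    )
    if fmt != 1:
        raise ValueError(f"unknown extension subtable format {fmt}")
    if extension_lookup_type == 7:
        raise ValueError("extensionLookupType must not be 7")
    return extension_lookup_type, subtable_offset + extension_offset
```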
When a layout engine encounters a GSUB type 7 Lookup table, it shall:
Lookup type 8 subtable: reverse chaining contextual single substitution
The reverse chaining contextual single substitution subtable (ReverseChainSingleSubst) describes single-glyph substitutions in context with an ability to look back and/or look ahead in the sequence of glyphs. The major difference between this and other lookup types is that processing of the input glyph sequence goes from end to start.
Compared to chained contexts substitution (lookup subtable type 6), this format is restricted to only a coverage-based subtable format, input sequences can contain only a single glyph, and only single substitutions are allowed on this glyph. This constraint is integrated into the subtable format.
This lookup type is designed specifically for Arabic script writing styles like Nastaliq in which the shape of the glyph is determined by the following glyph, beginning at the last glyph of the "joor", or set of connected glyphs.
This subtable type uses one format.
Reverse chained contexts single substitution format 1: coverage-based glyph contexts
Format 1 defines a chaining context rule as a sequence of Coverage tables. Each position in the sequence may define a different Coverage table for the set of glyphs that matches the context pattern. With format 1, the glyph sets defined in the different Coverage tables may intersect.
Despite reverse order processing, the order of the Coverage tables listed in the Coverage array must be in logical order (follow the writing direction). The backtrack sequence is as illustrated for the Chained sequence context format 1 table, in the OpenType Layout Common Table Formats chapter. The input sequence is one glyph located at i in the logical string. The backtrack begins at i - 1 and increases in offset value as one moves toward the logical beginning of the string. The lookahead sequence begins at i + 1 and increases in offset value as one moves toward the logical end of the string. In processing a reverse chaining substitution, i begins at the logical end of the string and moves to the beginning.
The subtable contains a Coverage table for the input glyph and Coverage table arrays for backtrack and lookahead sequences. It also contains an array of substitute glyph indices (substituteGlyphIDs), which are substitutions for glyphs in the Coverage table, and a count of glyphs in the substituteGlyphIDs array. The substituteGlyphIDs array must contain the same number of glyph indices as the Coverage table. To locate the corresponding output glyph index in the substituteGlyphIDs array, this format uses the Coverage index returned from the Coverage table.
Example 10 at the end of this chapter uses ReverseChainSingleSubstFormat1 to substitute Arabic glyphs with a correct stroke thickness on the left (exit) to match the stroke thickness on the right (entry) of the following glyph (in logical order).
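The processing order described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name, the use of Python lists and sets in place of Coverage tables, and all glyph IDs are hypothetical. It also assumes that context matching sees glyphs already substituted earlier in the same end-to-start pass.

```python
def apply_reverse_chain(glyphs, coverage, backtrack, lookahead, substitutes):
    """Apply one reverse chaining single substitution rule to a glyph run.

    coverage    -- input glyph IDs in Coverage-index order
    backtrack   -- list of glyph sets; backtrack[k] is tested at i - 1 - k
    lookahead   -- list of glyph sets; lookahead[k] is tested at i + 1 + k
    substitutes -- output glyph IDs, parallel to coverage
    """
    if len(coverage) != len(substitutes):
        raise ValueError("substitutes must have one entry per covered glyph")
    out = list(glyphs)
    for i in range(len(out) - 1, -1, -1):  # logical end toward logical start
        if out[i] not in coverage:
            continue
        back_ok = i >= len(backtrack) and all(
            out[i - 1 - k] in cov for k, cov in enumerate(backtrack))
        ahead_ok = i + len(lookahead) < len(out) and all(
            out[i + 1 + k] in cov for k, cov in enumerate(lookahead))
        if back_ok and ahead_ok:
            # The Coverage index selects the substitute glyph.
            out[i] = substitutes[coverage.index(out[i])]
    return out


# Hypothetical glyph IDs: 10 and 11 have thin exits, 20 and 21 are their
# thick-exit forms, and 20, 21 and 30 all have thick entries.
thick_entry = {20, 21, 30}
result = apply_reverse_chain([10, 11, 30], [10, 11], [], [thick_entry], [20, 21])
print(result)
```

In this run, glyph 11 is replaced first because it precedes glyph 30; its replacement (21) then supplies the lookahead context that lets glyph 10 be replaced as well. A start-to-end pass over the same data would not replace glyph 10.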
ReverseChainSingleSubstFormat1 subtable
Type      Name                                           Description
uint16    format                                         Format identifier: format = 1.
Offset16  coverageOffset                                 Offset to Coverage table, from beginning of substitution subtable.
uint16    backtrackGlyphCount                            Number of glyphs in the backtrack sequence.
Offset16  backtrackCoverageOffsets[backtrackGlyphCount]  Array of offsets to coverage tables in backtrack sequence, in glyph sequence order.
uint16    lookaheadGlyphCount                            Number of glyphs in lookahead sequence.
Offset16  lookaheadCoverageOffsets[lookaheadGlyphCount]  Array of offsets to coverage tables in lookahead sequence, in glyph sequence order.
uint16    glyphCount                                     Number of glyph IDs in the substituteGlyphIDs array.
uint16    substituteGlyphIDs[glyphCount]                 Array of substitute glyph IDs, ordered by Coverage index.
GSUB structure examples
The rest of this chapter describes and illustrates examples of the various GSUB subtables, including each of the three formats available for contextual substitutions. All the examples reflect unique parameters described below, but the samples provide a useful reference for building subtables specific to other situations.
All the examples have three columns showing hex data, source, and comments.
Example 1 shows a typical GSUB Header table definition.
Example 1
[Table: hex data, source, and comments for GSUBHeader]
Example 2 illustrates the SingleSubstFormat1 subtable, which uses ranges to replace single input glyphs with their corresponding output glyphs. The indices of the output glyphs are calculated by adding a constant delta value to the indices of the input glyphs. In this example, the Coverage table has a format identifier of 2 to indicate the range format, which is used because the input glyph indices are in consecutive order in the font. The Coverage table specifies one range that contains a startGlyphID for the "0" (zero) glyph and an endGlyphID for the "9" glyph.
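The delta calculation can be sketched in Python as below. The function name, the range, and the delta value are hypothetical and chosen only for illustration; the sketch assumes glyph IDs are 16-bit values, so the addition wraps modulo 65536.

```python
def single_subst_format1(glyph_id, ranges, delta):
    """Return the substitute for glyph_id, or glyph_id itself if not covered.

    ranges -- list of (startGlyphID, endGlyphID) pairs, as in a range-format
              Coverage table
    delta  -- constant added to every covered glyph ID
    """
    for start, end in ranges:
        if start <= glyph_id <= end:
            return (glyph_id + delta) % 65536  # keep the result in 16 bits
    return glyph_id


# Hypothetical values: ten consecutive digit glyphs and a delta of 0xC0.
DIGIT_RANGE = [(0x004F, 0x0058)]
DELTA = 0x00C0
print([hex(single_subst_format1(g, DIGIT_RANGE, DELTA)) for g in (0x4F, 0x58, 0x59)])
```

Because one delta serves the whole range, the subtable stays the same size no matter how many consecutive glyphs it covers.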
Example 2
[Table: hex data, source, and comments for SingleSubstFormat1]
Example 3 uses the SingleSubstFormat2 subtable for lists to substitute punctuation glyphs in Japanese text that is written vertically. Horizontally oriented parentheses and square brackets (the input glyphs) are replaced with vertically oriented parentheses and square brackets (the output glyphs).
The Coverage table, format 1, identifies each input glyph index. The number of input glyph indices listed in the Coverage table matches the number of output glyph indices listed in the subtable. For correct substitution, the order of the glyph indices in the Coverage table (input glyphs) must match the order in the Substitute array (output glyphs).
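The pairing of Coverage order with substitute order might be sketched as follows. All names and glyph IDs are hypothetical, and the sketch assumes the Coverage glyph list is sorted by glyph ID so that a binary search can return the Coverage index.

```python
from bisect import bisect_left


def coverage_index(coverage, glyph_id):
    """Return the Coverage index of glyph_id, or None if it is not covered."""
    i = bisect_left(coverage, glyph_id)
    return i if i < len(coverage) and coverage[i] == glyph_id else None


def single_subst_format2(glyph_id, coverage, substitutes):
    """Replace glyph_id with the substitute at the same index, if covered."""
    if len(coverage) != len(substitutes):
        raise ValueError("Coverage and substitute array lengths differ")
    idx = coverage_index(coverage, glyph_id)
    return glyph_id if idx is None else substitutes[idx]


# Hypothetical glyph IDs for horizontal ( ) [ ] and their vertical forms.
HORIZONTAL = [0x003C, 0x0040, 0x004B, 0x004F]
VERTICAL = [0x0131, 0x0135, 0x013E, 0x0143]
print(hex(single_subst_format2(0x0040, HORIZONTAL, VERTICAL)))
```

Unlike the delta-based format, the output glyph IDs here need not follow any arithmetic pattern; only the shared index ties an input glyph to its substitute.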
Example 3
[Table: hex data, source, and comments for SingleSubstFormat2]
Example 4 uses a MultipleSubstFormat1 subtable to replace a single "ffi" ligature with three individual glyphs that form the string <ffi>. The subtable defines a format identifier of 1, an offset to a Coverage table that specifies the glyph index of the "ffi" ligature (the input glyph), an offset to a Sequence table that specifies the sequence of glyph indices for the <ffi> string in its substitute array (the output glyph sequence), and a count of Sequence table offsets.
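A one-to-many substitution of this kind can be sketched in Python as below. The function name and the glyph IDs are hypothetical; Python lists stand in for the Coverage table and the Sequence tables.

```python
def multiple_subst(glyphs, coverage, sequences):
    """Expand each covered glyph into its output sequence.

    coverage  -- covered glyph IDs in Coverage-index order
    sequences -- output glyph sequences, parallel to coverage
    """
    out = []
    for g in glyphs:
        if g in coverage:
            out.extend(sequences[coverage.index(g)])
        else:
            out.append(g)
    return out


# Hypothetical glyph IDs for the "ffi" ligature and its component letters.
FFI_LIGATURE, F, I = 0x00F1, 0x001A, 0x001D
expanded = multiple_subst([0x0010, FFI_LIGATURE, 0x0011], [FFI_LIGATURE], [[F, F, I]])
print([hex(g) for g in expanded])
```

The output run is longer than the input run, so a client applying such a lookup has to account for the change in glyph count.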
Example 4
[Table: hex data, source, and comments for MultipleSubstFormat1]
Example 5 uses the AlternateSubstFormat1 subtable to replace the default ampersand glyph (input glyph) with one of two alternative ampersand glyphs (output glyph).
In this case, the Coverage table specifies the index of a single glyph, the default ampersand, because it is the only glyph covered by this lookup. The AlternateSet table for this covered glyph identifies the alternative glyphs: AltAmpersand1GlyphID and AltAmpersand2GlyphID.
In Example 5, the index position of the AlternateSet table offset in the AlternateSet array is zero (0), which correlates with the index position (also zero) of the default ampersand glyph in the Coverage table.
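The relationship between the Coverage index and the AlternateSet array can be sketched as follows. The glyph IDs and the choice parameter are hypothetical; how a client decides which alternate to use is outside the scope of this sketch.

```python
def alternate_subst(glyph_id, coverage, alternate_sets, choice=None):
    """Return the chosen alternate for glyph_id, or glyph_id if none applies.

    coverage       -- covered glyph IDs in Coverage-index order
    alternate_sets -- lists of alternate glyph IDs, parallel to coverage
    choice         -- zero-based index into the AlternateSet (hypothetical
                      client-side selection); None keeps the default glyph
    """
    if choice is None or glyph_id not in coverage:
        return glyph_id
    alternates = alternate_sets[coverage.index(glyph_id)]
    return alternates[choice] if 0 <= choice < len(alternates) else glyph_id


# Hypothetical glyph IDs for the default ampersand and its two alternates.
AMPERSAND, ALT_AMPERSAND_1, ALT_AMPERSAND_2 = 0x003A, 0x00C9, 0x00CA
COVERAGE = [AMPERSAND]
ALTERNATE_SETS = [[ALT_AMPERSAND_1, ALT_AMPERSAND_2]]
print(hex(alternate_subst(AMPERSAND, COVERAGE, ALTERNATE_SETS, choice=1)))
```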
Example 5
[Table: hex data, source, and comments for AlternateSubstFormat1]
Example 6 shows a LigatureSubstFormat1 subtable that defines data to replace a string of glyphs with a single ligature glyph. Because a LigatureSubstFormat1 subtable can specify glyph substitutions for more than one ligature, this subtable defines three ligatures: "etc", "ffi", and "fi".
The sample subtable contains a format identifier (1) and an offset to a Coverage table. The Coverage table, which lists an index for each first glyph in the ligatures, lists indices for the "e" and "f" glyphs. The Coverage table range format is used here because the "e" and "f" glyph indices are numbered consecutively.
In the LigatureSubst subtable, ligatureSetCount specifies two LigatureSet tables, one for each covered glyph, and the ligatureSetOffsets array stores offsets to them. In this array, the "e" LigatureSet precedes the "f" LigatureSet, matching the order of the corresponding first-glyph components in the Coverage table.
Each LigatureSet table identifies all ligatures that begin with a covered glyph. The sample LigatureSet table defined for the "e" glyph contains only one ligature, "etc". A LigatureSet table defined for the "f" glyph contains two ligatures, "ffi" and "fi".
The sample FLigaturesSet table has offsets to two Ligature tables, one for "ffi" and one for "fi". The ligatureOffsets array lists the "ffi" Ligature table first to indicate that the "ffi" ligature is preferred to the "fi" ligature.
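The lookup structure described for this example might be modeled in Python as below. This is a simplified sketch: glyph IDs are hypothetical, Python lists stand in for the Coverage, LigatureSet, and Ligature tables, and ligatures within a set are tried in the order listed.

```python
def ligature_subst(glyphs, coverage, ligature_sets):
    """Replace component sequences with ligature glyphs.

    coverage      -- first-component glyph IDs in Coverage-index order
    ligature_sets -- parallel to coverage; each set is a list of
                     (remaining_components, ligature_glyph) in preference order
    """
    out, i = [], 0
    while i < len(glyphs):
        g = glyphs[i]
        if g in coverage:
            for rest, ligature in ligature_sets[coverage.index(g)]:
                if glyphs[i + 1:i + 1 + len(rest)] == rest:
                    out.append(ligature)
                    i += 1 + len(rest)  # skip all consumed components
                    break
            else:  # no ligature in the set matched
                out.append(g)
                i += 1
        else:
            out.append(g)
            i += 1
    return out


# Hypothetical glyph IDs; "e" and "f" are consecutive, as in the example.
E, F, C, I, T = 0x0019, 0x001A, 0x0017, 0x001D, 0x0028
ETC, FFI, FI = 0x0347, 0x0348, 0x0349
COVERAGE = [E, F]
LIGATURE_SETS = [
    [([T, C], ETC)],             # ligatures beginning with "e"
    [([F, I], FFI), ([I], FI)],  # ligatures beginning with "f"
]
print([hex(g) for g in ligature_subst([E, T, C, F, F, I], COVERAGE, LIGATURE_SETS)])
```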
Example 6
[Table: hex data, source, and comments for LigatureSubstFormat1]
Example 7 illustrates format 1 contextual substitution, using a SequenceContextFormat1 subtable to replace a string of three glyphs with another string. For the French language system, the subtable defines a contextual substitution that replaces the input sequence, space-dash-space, with the output sequence, thin space-dash-thin space.
The contextual substitution, called Dash Lookup in this example, contains one SequenceContextFormat1 subtable called the DashSubtable. The subtable specifies two contexts: a SpaceGlyph followed by a DashGlyph, and a DashGlyph followed by a SpaceGlyph. In each sequence, a single substitution replaces the SpaceGlyph with a ThinSpaceGlyph.
The Coverage table, labeled DashCoverage, lists two glyph IDs for the first glyphs in the SpaceGlyph and DashGlyph sequences. One SequenceRuleSet table is defined for each covered glyph.
SpaceAndDashSubRuleSet lists all the contexts that begin with a SpaceGlyph. It contains an offset to one SequenceRule table (SpaceAndDashSubRule), which specifies two glyphs in the context sequence, the second of which is a DashGlyph. The SequenceRule table contains a SequenceLookupRecord that lists the position in the sequence where the glyph substitution should occur (position 0) and the index of the SpaceToThinSpaceLookup applied there to replace the SpaceGlyph with a ThinSpaceGlyph. DashAndSpaceSubRuleSet lists all the contexts that begin with a DashGlyph. An offset points to a SequenceRule table (DashAndSpaceSubRule), which specifies two glyphs in the context sequence, and the second one is a SpaceGlyph. The SequenceRule table contains a SequenceLookupRecord that lists the position in the sequence where the glyph substitution should occur, and an index to the same lookup used in the SpaceAndDashSubRule. The lookup replaces the SpaceGlyph with a ThinSpaceGlyph.
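The two rule sets described above can be modeled in Python as follows. The sketch is illustrative only: glyph IDs and function names are hypothetical, and it shows how a rule is matched and applied at one given position, without modeling how a layout engine advances through the glyph run afterward.

```python
# Hypothetical glyph IDs.
SPACE, DASH, THIN_SPACE = 0x0028, 0x005D, 0x005E


def space_to_thin_space(glyph_id):
    """Stand-in for SpaceToThinSpaceLookup: a single substitution."""
    return THIN_SPACE if glyph_id == SPACE else glyph_id


# Coverage lists the first glyph of each context; rule sets are parallel to it.
# Each rule is (remaining input glyphs, [(sequenceIndex, lookup), ...]).
COVERAGE = [SPACE, DASH]
RULE_SETS = [
    [([DASH], [(0, space_to_thin_space)])],   # SPACE followed by DASH
    [([SPACE], [(1, space_to_thin_space)])],  # DASH followed by SPACE
]


def apply_at(glyphs, pos):
    """Apply the first matching rule at pos; return None if nothing matches."""
    if glyphs[pos] not in COVERAGE:
        return None
    for rest, records in RULE_SETS[COVERAGE.index(glyphs[pos])]:
        if glyphs[pos + 1:pos + 1 + len(rest)] == rest:
            out = list(glyphs)
            for seq_index, lookup in records:
                out[pos + seq_index] = lookup(out[pos + seq_index])
            return out
    return None


print(apply_at([SPACE, DASH], 0))
print(apply_at([DASH, SPACE], 0))
```

Both rules reuse the same single substitution; only the sequence index differs, which is what lets one nested lookup serve both contexts.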
Example 7
[Table: hex data, source, and comments for SequenceContextFormat1]
Example 8 illustrates a format 2 contextual substitution using a SequenceContextFormat2 subtable with glyph classes to replace default mark glyphs with their alternative forms. Glyph alternatives are selected depending upon the height of the base glyph that they combine with; that is, the mark glyph used above a high base glyph differs from the mark glyph above a very high base glyph.
In the example, SetMarksHighSubtable contains a Class Definition table that defines four glyph classes: default mark glyphs (class 1), high base glyphs (class 2), very high base glyphs (class 3), and all remaining glyphs, including medium-height base glyphs (class 0). The subtable also contains a Coverage table that lists each base glyph that functions as a first component in a context, ordered by glyph index.
Two ClassSequenceRuleSet tables are defined, one for substituting high marks and one for very high marks. No ClassSequenceRuleSets are specified for class 0 and class 1 glyphs because no contexts begin with glyphs from these classes. The classSeqRuleSetOffsets lists offsets to the ClassSequenceRuleSet tables in class value order, so the offset for ClassSequenceRuleSet for class 2 precedes that for class 3.
Within each ClassSequenceRuleSet, a ClassSequenceRule is defined. In SetMarksHighSubClassSet2, corresponding to contexts that begin with a glyph in class 2, the ClassSequenceRule table specifies an input sequence with two glyphs: the first glyph in class 2 (a high glyph), and the second in class 1 (a mark glyph). The SequenceLookupRecord specifies applying SubstituteHighMarkLookup at the second position in the sequence; that is, a high mark glyph will replace the default mark glyph.
In SetMarksVeryHighSubClassSet3, corresponding to contexts that begin with a glyph in class 3, the ClassSequenceRule specifies an input sequence with two glyphs: the first in class 3 (a very high glyph), and the second in class 1 (a mark glyph). The SequenceLookupRecord specifies applying SubstituteVeryHighMarkLookup at the second position in the sequence; that is, a very high mark glyph will replace the default mark glyph.
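A class-based context of this shape might be sketched in Python as below. All glyph IDs and names are hypothetical; a dictionary stands in for the Class Definition table, with unlisted glyphs falling into class 0, and the rule-set list is indexed by the class of the first glyph.

```python
# Hypothetical glyph IDs.
MARK, HIGH_MARK, VERY_HIGH_MARK = 0x0060, 0x0061, 0x0062
MEDIUM_BASE, HIGH_BASE, VERY_HIGH_BASE = 0x0030, 0x0031, 0x0032

CLASS_DEF = {MARK: 1, HIGH_BASE: 2, VERY_HIGH_BASE: 3}  # all others: class 0
COVERAGE = {HIGH_BASE, VERY_HIGH_BASE}

# Stand-ins for SubstituteHighMarkLookup and SubstituteVeryHighMarkLookup.
HIGH_MARK_LOOKUP = {MARK: HIGH_MARK}
VERY_HIGH_MARK_LOOKUP = {MARK: VERY_HIGH_MARK}

# Indexed by the class of the first glyph; classes 0 and 1 start no context.
# Each rule is (classes of remaining glyphs, [(sequenceIndex, lookup), ...]).
CLASS_RULE_SETS = [
    None,
    None,
    [([1], [(1, HIGH_MARK_LOOKUP)])],
    [([1], [(1, VERY_HIGH_MARK_LOOKUP)])],
]


def glyph_class(glyph_id):
    return CLASS_DEF.get(glyph_id, 0)


def apply_class_context(glyphs, pos):
    """Apply the first matching class rule at pos; None if nothing matches."""
    if glyphs[pos] not in COVERAGE:
        return None
    rule_set = CLASS_RULE_SETS[glyph_class(glyphs[pos])]
    for rest_classes, records in rule_set or []:
        following = glyphs[pos + 1:pos + 1 + len(rest_classes)]
        if [glyph_class(g) for g in following] == rest_classes:
            out = list(glyphs)
            for seq_index, lookup in records:
                target = out[pos + seq_index]
                out[pos + seq_index] = lookup.get(target, target)
            return out
    return None


print(apply_class_context([HIGH_BASE, MARK], 0))
print(apply_class_context([VERY_HIGH_BASE, MARK], 0))
```

Adding another high base glyph to this model only requires a new entry in the class definition and the coverage set; the rules themselves stay unchanged.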
Example 8
[Table: hex data, source, and comments for SequenceContextFormat2]
Example 9 illustrates a format 3 contextual substitution, using a SequenceContextFormat3 subtable with Coverage tables to describe a context sequence of three lowercase glyphs in the pattern: any ascender or descender glyph in position 0 (zero), any x-height glyph in position 1, and any descender glyph in position 2. The overlapping sets of covered glyphs for positions 0 and 2 make Format 3 better for this context than the class-based Format 2.
In positions 0 and 2, swash versions of the glyphs replace the default glyphs. The contextual-substitution lookup is SwashLookup (LookupList index = 0), and its subtable is SwashSubtable. The SwashSubtable defines three Coverage tables: AscenderDescenderCoverage, XheightCoverage, and DescenderCoverage, one for each glyph position in the context sequence, respectively.
The SwashSubtable also defines two SequenceLookupRecords: one that applies to position 0, and one for position 2. (No substitutions are applied to position 1.) The record for position 0 uses a single substitution lookup called AscDescSwashLookup to replace the current ascender or descender glyph with a swash ascender or descender glyph. The record for position 2 uses a single substitution lookup called DescSwashLookup to replace the current descender glyph with a swash descender glyph.
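The coverage-per-position pattern can be sketched in Python as follows. Glyph IDs, the swash glyph mappings, and the function name are hypothetical; sets stand in for the three Coverage tables, and dictionaries stand in for the two single substitution lookups.

```python
# Hypothetical glyph IDs for a few lowercase letters.
A, B, D, G, O, P = 0x0042, 0x0043, 0x0045, 0x0048, 0x0050, 0x0051

ASCENDER_DESCENDER = {B, D, G, P}  # position 0
XHEIGHT = {A, O}                   # position 1
DESCENDER = {G, P}                 # position 2; overlaps with position 0
POSITION_COVERAGE = [ASCENDER_DESCENDER, XHEIGHT, DESCENDER]

# Stand-ins for AscDescSwashLookup and DescSwashLookup.
ASC_DESC_SWASH = {B: 0x0143, D: 0x0145, G: 0x0148, P: 0x0151}
DESC_SWASH = {G: 0x0148, P: 0x0151}
LOOKUP_RECORDS = [(0, ASC_DESC_SWASH), (2, DESC_SWASH)]  # position 1 untouched


def apply_coverage_context(glyphs, pos):
    """Match one Coverage set per position starting at pos, then substitute."""
    window = glyphs[pos:pos + len(POSITION_COVERAGE)]
    if len(window) < len(POSITION_COVERAGE):
        return None
    if not all(g in cov for g, cov in zip(window, POSITION_COVERAGE)):
        return None
    out = list(glyphs)
    for seq_index, lookup in LOOKUP_RECORDS:
        target = out[pos + seq_index]
        out[pos + seq_index] = lookup.get(target, target)
    return out


print(apply_coverage_context([G, A, P], 0))
```

A glyph such as G belongs to both the position 0 and position 2 sets. A single class definition assigns each glyph to exactly one class, so this overlap is easier to express with one Coverage table per position.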
Example 9
[Table: hex data, source, and comments for SequenceContextFormat3]
Example 10 uses a ReverseChainSingleSubstFormat1 subtable to substitute glyphs with a form that has a thick connection to the left (thick exit). This allows the glyph to correctly connect to the letter form to the left of it.
The ThickExitCoverage table is the listing of glyphs to be matched for substitution.
The LookaheadCoverage table, labeled ThickEntryCoverage, lists four glyph IDs for the glyph following a substitution coverage glyph. This lookahead coverage attempts to match the context that will cause the substitution to take place.
The substituteGlyphIDs array provides the glyphs to replace glyphs that correspond in order in the ThickExitCoverage table.
Example 10
[Table: hex data, source, and comments for ReverseChainSingleSubstFormat1]