Spoc-Text already supports Highlighting of more than 20 Programming and Markup Languages, but You can add Your own Language.
Languages are defined by *.lang Files in the Sub-Folder _Resources/Syntax. These are XML Files with a defined Schema that allows You to define Parsing Grammars based on Regular Expression Text
Matching.
Lang Files can also define Styles, but because Styles change more often and are edited by more People, it is better to put them into a separate, associated *.css File, whose Syntax and Structure are much
easier to work with. This also clearly separates the Grammar from the Highlighting Styles. Note, however, that Styles and Keywords from CSS Files are defined globally and cannot be restricted to a certain
Text Section. For that Purpose You need to use <Style> Elements.
Lang Files contain multiple <Grammar> Elements that define Sets of <Match> and <Range> Elements.
When parsing a Text only a single <Grammar> is currently active, but a <Range> can contain or refer to a different <Grammar> for its Section. This happens by either nesting a <Grammar> in the <Range> or by referencing it by Name.
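The two Ways of attaching a <Grammar> to a <Range> can be sketched as follows. This is a minimal Sketch that only uses Elements and Attributes shown elsewhere in this Article; the Language Name "Demo", the Grammar Names, the Style Names and the Patterns are hypothetical, and referencing a Grammar defined in the same File is an Assumption.

```xml
<Grammars name="Demo" extensions=".demo" xmlns="http://spoc-web.com/xtext/styles/2015">
  <Grammar name="Demo">
    <!-- Variant 1: the Grammar for this Section is nested in the Range -->
    <Range style=".Braces" multiLine="1" start="\{" stop="\}">
      <Grammar name="InBraces">
        <Match style=".Number">\d+</Match>
      </Grammar>
    </Range>
    <!-- Variant 2: the Range refers to a Grammar by Name -->
    <Range style=".Brackets" multiLine="1" start="\[" stop="\]" grammarRef="Numbers" />
  </Grammar>
  <!-- Hypothetical named Grammar that Variant 2 refers to -->
  <Grammar name="Numbers">
    <Match style=".Number">\d+</Match>
  </Grammar>
</Grammars>
```

In both Variants the outer Grammar is suspended between the start and the stop Pattern, and only the inner Rules are applied to that Section.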
<Grammar>s use so-called Regular Expressions for Text-Matching. They may contain these Sub-Elements:
<Match>es and <Range>s compete when parsing Text. <Range>s are determined first, because they can define Structures that may span the whole File. 'Earlier' (left or above) Matches in the Text have Precedence. When two <Range>s compete, the one defined earlier in the *.lang File wins.
Similarly <Match>es are determined within these Sections: by Text Position first and then by Position in the Style File (when matching at the same Text Position).
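As an Illustration of these Rules, consider two of the <Range>s from the XML Grammar explained later in this Article. Both start Patterns match at the first Character of an XML Comment, so the Definition Order decides. The Fragment is a Sketch: "<" is written as "&lt;" because XML requires it inside Attribute Values, and the Comments describe the Behaviour expected from the Rules above, not verified Output.

```xml
<Grammar name="XML">
  <!-- Defined first: wins when both Ranges could start at the same Text Position -->
  <Range style=".Comment" multiLine="1" start="&lt;!--" stop="-->" />
  <!-- Defined later: its start also matches the first Character of a Comment, but loses there -->
  <Range style=".XmlTag" multiLine="1" start="&lt;" stop=">" />
</Grammar>
```

For a Text that contains a Tag followed by a Comment, the Tag Range starts first because it is further left in the Text. At the Start of the Comment both Patterns match at the same Position, and the Comment Range wins because it is defined earlier in the File.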
This Section explains the actual XML Grammar used for Syntax-Highlighting all Kinds of XML Files. Of course You can (and should) use Spoc-Text to edit them, because it defines a
Highlighting-Scheme that makes it easier to read and Autocomplete-Keywords that help in writing XML. The Sample XML Code You see in this Article was simply copied and pasted into this
Web-Page.
<Grammars name="XML" extensions=".xml .xsl .xslt .xsd ... .wsdl .disco .ps1xml .nuspec" xmlns="http://spoc-web.com/xtext/styles/2015"
folding="XmlFolding" indenting="DefaultIndentation" keep_left_markup="1" keep_right_markup="1">
This Start Tag defines the Language Name, the applicable File-Extensions, and the Strategies used to identify Folding Sections and to indent the following Line. The keep_left_markup and keep_right_markup Attributes are usually defaulted to 1/true for Programming Languages and to 0/false for Markup Languages, in which case the Markup is removed when copying or printing.
The first <Grammar> Sub-Element defines the Default Rule-Set. It is the initially active <Grammar> and must either be unnamed or carry the same Name as the Language.
<Grammar name="XML">
It contains several <Range> Definitions for different Sections of an XML File. Each <Range> selects a Style and has a start and a stop Attribute
containing Regular Expressions that detect where this Range starts and ends. Styles are defined either in this *.lang File or, better, in separate CSS Files.
<Range style=".Comment" multiLine="1" start="&lt;!--" stop="-->"/>
<Range style=".CData" multiLine="1" start="&lt;!\[CDATA\[" stop="]]>" />
<Range style=".DocType" multiLine="1" start="&lt;!DOCTYPE" stop=">" />
<Range style=".XmlDeclaration" multiLine="1" start="&lt;\?" stop="\?>" />
You can also import Grammars from different Files to reuse them or to form a unified Language:
<Import grammarRef="EntitySet"/>
This adds the Rules for XML/HTML Entities to the Base XML Grammar.
The next <Range> is especially interesting, because it contains a new <Grammar> to parse XML Attributes. This means that for the Text between the start and the stop Characters, the inner Rules and Ranges are used for parsing and their Styles are applied:
<Range style=".XmlTag" multiLine="1" start="&lt;" stop=">" >
<Grammar name="Tag">
<Range style=".AttributeValue" multiLine="1" grammarRef="EntitySet" start='"' stop='"|(?=&lt;)' />
<Range style=".AttributeValue" multiLine="1" grammarRef="EntitySet" start="'" stop="'|(?=&lt;)" />
<Match style=".AttributeValue">=</Match>
<Match style=".AttributeName">[\d\w_\-\.]+(?=\s*=)</Match>
</Grammar>
</Range>
Programming Language Grammars often define a Set of built-in Literals, so-called Keywords. These are highlighted with the given Style and presented at the Top of the Completion List when
activated. Since they are used literally, they don't need to be Regex-escaped. They are declared with a very simple Syntax that includes a brief Description, which is displayed as a Tooltip to
support Selection from the List:
<KeyWords style=".Standard">
<Key Word="xml:lang=" content="Universal Attribute to indicate the Language of the (Text-)Content"/>
<Key Word="xml:space='preserve'" content="preserve|default Universal Attribute to control the Whitespace Handling of the (Text-)Content"/>
...
</KeyWords>
Keywords are like <Match>es, except that they don't use Regular Expressions and therefore don't need to be Regex-escaped
(they still need the usual escaping of the XML or CSS File in which they are defined).
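The Difference between Regex escaping and File escaping can be sketched with a few hypothetical Keywords. The Keywords and Descriptions below are invented for Illustration; only the <KeyWords> and <Key> Syntax is taken from the Sample above.

```xml
<KeyWords style=".Standard">
  <!-- "." and "(" are taken literally, so no Regex escaping is needed -->
  <Key Word="Math.Abs(" content="Hypothetical Keyword containing Regex Metacharacters" />
  <!-- "<" and "&" must still be escaped, because this is an XML File -->
  <Key Word="&lt;=" content="Hypothetical less-or-equal Operator" />
  <Key Word="&amp;&amp;" content="Hypothetical logical AND Operator" />
</KeyWords>
```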
As mentioned above, You don't have to separate Styling and Grammar. You can define Styles anywhere in the *.lang File using the supported subset of CSS Properties:
<Style name=".Comment" color="Green" content="&lt;!-- comment -->" />
<Style name=".CData" color="Blue" content="&lt;![CDATA[data]]>" />
<Style name=".XmlTag" color="DarkMagenta" content='&lt;tag attribute="value" />' />
<Style name=".AttributeName" color="Red" content='&lt;tag attribute="value" />' />
<Style name=".AttributeValue" color="Blue" content='&lt;tag attribute="value" />' />
<Style name=".Entity" color="Teal" content="index.aspx?a=1&amp;b=2" />
Styles and Keywords are better defined in CSS Files, because it is much easier to write in CSS than in the *.lang Grammar. Additionally You can specify HTML-Translations there using the
target-name Property. The only disadvantage is that such Definitions are global.
And that's all about Lang Files; now go and specify the Styling for Your own Language.
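As a starting Point, a complete but minimal *.lang File for a hypothetical INI-like Language might look like the following Sketch. It only combines Elements and Attributes shown in this Article; the Language Name, Extension, Style Names, Colors and Patterns are invented, and it is an Assumption that the folding, indenting, keep_markup and multiLine Attributes may be omitted.

```xml
<Grammars name="INI" extensions=".ini" xmlns="http://spoc-web.com/xtext/styles/2015">
  <!-- Default Grammar: the first Grammar, named like the Language -->
  <Grammar name="INI">
    <!-- [section] Headers (hypothetical Pattern) -->
    <Range style=".Section" start="\[" stop="\]" />
    <!-- ; Comment up to the End of the Line (hypothetical Pattern) -->
    <Match style=".Comment">;.*</Match>
    <!-- Key Names: Characters followed by "=", using a Lookahead as in the XML Grammar -->
    <Match style=".Key">[\w\.]+(?=\s*=)</Match>
  </Grammar>
  <!-- Styles kept in the same File; the Colors are arbitrary -->
  <Style name=".Section" color="DarkMagenta" content="[section]" />
  <Style name=".Comment" color="Green" content="; comment" />
  <Style name=".Key" color="Red" content="key=value" />
</Grammars>
```

Once the Grammar is stable, the <Style> Elements could be moved into an associated CSS File, as recommended above.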