Next Generation TypoScript Parser

Classification:tsp
Version:1.0.1
Language:en
Description:Next Generation TypoScript Parser (tsp)
Keywords:typoscript, performance, parser
Copyright:2016
Author:Elmar Hinz
Email:t3elmar@gmail.com
License:This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml
Rendered:June 18, 2016

The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org.

Table of Contents

Introduction

This extensions ships a TypoScript parser, that is suited to replace the original TypoScript parser for frontend rendering. In fact a family of parsers has been introduced, specialized on different tasks.

  • FE: TypoScriptConditionsPreProcessor
  • FE: TypoScriptProductionParser
  • BE: TypoScriptSyntaxParser

What it is not

No Boost in Performance

The parsing of TypoScript just takes a few milliseconds. Hence, it’s not the primary goal to speed up the performance but to improve the architecture. The algorithm is twice as fast as the original algorithm, but with the split into conditions preprocessor and processor the time is about the same again.

What it is

Public Presentation

First of all this extension is a public presentation of the rewritten parser. Should it replace the old parser of the core? If yes, it needs to be tested in the wild before until it is really stable.

Standalone Usage

It’s possible to use the TypoScript parser outside of the TYPO3 CMS, if you like the TypoScript syntax and want to use it for configuration in other fields. This is possible with or without the conditions preprocessor.

Improving the error detection

The error detection covers the error detection of the origional parser and tries be be a little better already. Also the displaying of the line numbers has been worked upon. See Screenshots!

Planned improvements in future versions:

  • CLI interface to check TS within continuous integration workflows.
  • Do syntax highlighting of conditions, instead of printing them in one color.
  • Detect the difference of objects and properties, because only objects are allowed ot be copied by reference.
  • (Related) Throw verbose errors from TS objects, catch them and and display them into the backend.
New Architecture

The reason to write a new TypoScript parser is, to get a modern architecture for it:

  • easy to understand
  • easy to debug
  • easy to extend

A modern parser makes it more easy to get rid of flaws in TypoScript, enhance error detection and add new features like if-else conditions, that work the way you are used to from other languages.

Condition Preprocessor

Condition evaluation has been separated into a preprocessor class. It becomes possible to use the TypoScript parser without bothering with conditions at all or apply different types of preprocessors. It’s more simple to enhance the condition preprocessing, as an example think of a fullblown IF-ELSEIF-ELSE-END structure.

As with the old parser the condition matching is handled by a third object. Exchanging this object enables the development of conditions, that address a completly different field than the TYPO3 CMS.

Differences

  • Escaping of dots by backslash is not supported.

Screenshots

Line numbering

_images/TowKindsOfLineNumbers.png

The line numbers show the numbering of the template and the overall numbering within the template tree.

_images/ErrorsWithLineNumberingTurnedOff.png

When line numbering is turned off the error messages contain the line number instead.

_images/ErrorsWithLineNumberingTurnedOn.png

When line numbering is turned on the error messages don’t duplicate the information.

Types of errors

_images/KeyValueErrors.png

For invalid lines it is assumed that the user want’s to enter an operator line. It is checked for invalid key and operator.

_images/BracesInExess.png

Braces in access are shown in the line where they occur.

_images/BracesMissing.png

Missing closing braces are detected at conditions and at the end of the template.

_images/UnclosedComment.png

An unclosed multiline comment is detected at the end of the template. Multiline comments can be used to comment out parts of the script. Included elements like conditions don’t result in an error.

_images/UnclosedValue.png

An unclosed multiline value is detected at the end of the template.

Administration

Install the extension, clear caches and check if your frontend is rendered as expected and if you get the advanced error feedback in the backend.

If anything goes wrong, uninstall and report the issue.

https://github.com/elmar-hinz/TX.tsp/issues

Known Issues

No Exceptions are Thrown

The TypoScript production parser currently doesn’t throw execptions. It expects valid TS as input. To check if your input is valid use the syntax higlighting parser in the BE.

No exceptions are thrown because the original parser doesn’t throw exceptions either. Modules of the backend are not prepared to catch exeptions from the parser and break if execeptions would be thrown from invalid TS.

Intolerant for Invalid TS

The TypoScript production parser will silently break, if feed with invalid TS. It is optimized for speed and is less tolerant for invalid TS than the origional parser.

This means in rare cases code that works for the original parser may break with the TypoScript production parser. Use the syntax highlighting parser to fix the TS code.

XCLASS issues

The origional parser is not fully replaced but extended by XCLASS registration. The extended class serves as adapter to the standalone classes. Conflicts may occur with extensions, that also XCLASS the core parser.

Appendix

Architecture

The major goal of the architecture is flexibility, to enable the development of new features and to enable the user to customize the parsers to his needs. The main devices to reach this goal are:

  • Separation of concerns
  • Programming against interfaces
  • Dependency injection
  • Classes as identifiers

Separtion of Concerns

The classes are rather small to encapsulate a single concern.

The syntax tracker is the most complex example. It focuses on the parsing algorithm, while it delegates the representation of tokens and execptions to dedicated classes. The collecting of tokens and exeptions is done by tracker classes. The tracker objects are finally accessed by a formatter class to produce the highlighted output.

Concerns represented by one class each:

  • Parsing
  • Representation of a token
  • Representation of an exception
  • Tracking tokens
  • Tracking exceptions
  • Formatting the report

Programming against Interfaces

Whereever two classes cooperate, there is an interface between them. A class can have multiple interfaces, if it cooperates with multiple other classes. All this interfaces are defined as PHP interfaces, that are stored into the folder Classes/Interfaces.

A class should not depend on other classes to cooperate, but on interfaces. It is free to cooperate with every class that implements the matching interface. Each class can be exchanged by a customized class, as long as the customized class provides the interfaces, that the given classes can talk to.

An example usage of this interfaces are the mock objects of the unit tests. While testing a single class it is decoupled from other classes, by using mock objects, that implement the interface to test against.

Dependency Injection

Dependency injection is related to programming against interfaces. If a class must not depend on other classes, it must not create classes by the keyword new itself. Instead objects, that implement the required interface, are injected.

For sure a place is needed where all this dependency injection is done, where the objects are created and wired up. This is done in the main application classes that are stored in the folder Main/. You can think of an application class as a kind of configuration, that composes objects according to your taste. You write a new one of this main configuration classes, to compose your own application or to alter an existing one.

Classes as Identifieres

An exception from the rule, to not use the keyword new, are the tokens and exceptions. Each class is designed to serve as an identifier. You can think of them as constants. The object is created by the keyword new as you mean exactly it’s class as identifier, not the interface. They are final.

Nonetheless there is flexibilty. The exceptions and tokens are created by parsers and you can exchange the parser creating them. That means you can exchange the part, that contains the new keywords.

You can create your own exceptions and tokens by writing new classes. It’s just a few lines each, because they inherit almoust all from abstract classes. The freedom to easily add new tokens and exceptions is one reason, why they are not implemented as constants, apart from the additional functionality a class offers.

Exceptions

The Exception Hierarchy

  • Exception
    • TypoScriptParsetimeException (abstract)
      • TypoScriptBraceInExcessException
      • TypoScriptKeysException
      • TypoScriptUnclosedConditionException
      • TypoScriptBracesMissingAtConditionE
      • TypoScriptOperatorException
      • TypoScriptUnclosedValueException
      • TypoScriptBracesMissingAtEndOfT
      • TypoScriptParsetimeException
      • TypoScriptUnclosedCommentException

Where is the TypoScriptRuntimeException?

Where is a TypoScriptParsetimeException there should also be a TypoScriptRuntimeException, shouldn’t it?

TypoScript pasetime exceptions occur while parsing TypoScript into a PHP array tree. Runtime exceptions would make sense in the ContentObjectRenderer, when the PHP array tree is used to render the page.

Both parts are connected by the PHP array tree, but apart from that, they are not connected. The array tree could come from a differnt source. The parser could render an array tree for a completly different purpose.

Follows:

1.) A TypoScriptParsetimeException doesn’t belong into the parser package. 2.) Both types of exceptions should not inherit from a common

TypoScriptException to not introduce an unnecessary dependency of the packages. Instead both directly inherit from Exception.

Tokens

The Token Hierarchy

  • AbstractTypoScriptToken
    • TypoScriptIgnoredToken
    • TypoScriptOperatorToken
    • TypoScriptValueToken
    • TypoScriptCommentContextToken
    • TypoScriptKeysPostspaceToken
    • TypoScriptPrespaceToken
    • TypoScriptCommentToken
    • TypoScriptKeysToken
    • TypoScriptValueContextToken
    • TypoScriptConditionToken
    • TypoScriptOperatorPostspaceToken
    • TypoScriptValueCopyToken

Tokens as Type

First of all the token object is a device to ship a type and a value. The Type is the class itself, the value is set with the constructor and accessible by the method getValue().

Tokens to Format Token Tags

The token object represents a token type, not a formatting class. Despite of this, by calling the method toTag() a HTML tag representation of the token can be created. This is just additional sugar in addition to the primary function. String representations of the token can be created by external methods as well. The tag creation can be customized by the methodes setTag() and setClasses(). The default values are chosen to match the CSS classes of the existing syntax highlighting of the backend.

Research

\Core\TypoScript\Parser\TyposcriptParser

Overview

The method parse() is a preprocessor that handels including and excluding of template parts by condtions.

It doesn’t parse the incoming lines to end first, but delegates the parts immediately to parseSub() (a kind of depth-first parsing of the template tree).

The method doSyntaxHighlight() is responsible to generate a syntax highlighted HTML string. It also calls the preprocessor parse() but sets a flag, that disables the coditions, so that all parts are evaluated.

The latter is strange in two aspects. It doesn’t make sense to send syntax highlighting through a conditioning preprocessor. It doesn’t make sense to parse into an array tree, when one actually want’s a HTML string as result.

Conditions

In the method parse() the template is branched into rendered and non-rendered parts based on conditions. The condition evalutation is delegated to a $matchObj, that is injected by parameter.

For each condition the method creates a hash and stores it into $this->sections array. This are used by the TemplateService, to cache the rendered templates matching combinations of conditions, that evaluate to true.

Line numbering

There is a line number offset, that sums up the line numbers of previously rendered templates. It is advanced at end of parse().

The line numbers of the current template are tracked by $this->rawP in the main loop of parseSub() and also for the condition sections, that evaluate to false in the method nextDivider(). $this->rawP is reset to zero at the beginning of the rendering of the current template in the method parse().

Error handling

method error($errorString, $severity = 2).

This method collects into $this->errors[] = [a, b, c, d] with:

  • a = error message
  • b = severity
  • c = line number
  • d = template line number offset

Collected messages:

  • ‘Script is short of XXX braces.’
  • ‘An end brace is in excess.’
  • ‘On return to [GLOBAL] scope, the script was short of XXX braces.’
  • ‘A multiline value section is not ended with a parenthesis!’
  • ‘Object Name String, contains invalid character XXX. Must be alphanumeric or one of: “_:-.”.’
  • ‘Object Name String XXX was not followed by any operator, =<>({‘
  • ‘### ERROR: XXX’ (Error to be extract from an error comment created in previous parsing steps like during template includes.)
Syntax highlighting

Highlighted parsing is controlled by the method doSyntaxHighlight().

It sets the flag $this->syntaxHighLight to true and the template string is parsed. The flag activates the additional highlighting functionality during the process of parsing. Finally the method syntaxHighlight_print() is called to format the collected results including the error messages.

Registration of highlighted parts of lines is done during parsing by the method regHighLight() if the above flag is set. The parts are collected into

  • $this->highLightData
  • $this->highLightData_bracelevel

Both arrays count per line, the first one the higlighted sections of the line, the second one the depth of brace nesting.

Breakpoints

A breakpoint is a line number in $this->breakPointLN to break the execution of the rendering. The method parseSub() returns with a marker [_BREAK]. This marker stops the further execution of the main loop in parse().

TemplateService

TemplateService is a service that makes use of the parser. A main task of TemplateService is, to cache the rendered template for different combinations of conditions of a page.

ExtendedTemplateService

The class ExtendedTemplateService contains method for the TS module in TYPO3 backend. It extends TemplateService.

Lessons Learned

The overall time to parse the TypoScript of a website takes just a few milliseconds. It is not a critical part of the overall page rendering time. Yet the development of this extension was also focused on performance.

Time to parse the templates vs. time to parse TypoScript

When measured with the TYPO3 core time tracker (admin panel) the template parsing takes a few hundred milliseconds. When measuring and summing up all calls to the TypoScript parse function (TypoScriptParser::parse()) it takes just a few milliseconds. The difference is most likley to be explained by I/O calls to read the templates.

Non-Recursive Parser

The Non-Recursive Parser is the approach taken by this parser. The whole rendering happens within one function by using simple loop structures. Calls to itself or other methods are avoided as far as reasonable. This turns out to be twice as fast as the recursive Original TypoScript Parser.

Original TypoScript Parser

The original parser of the TYPO3 core uses recursive calls to handle the nesting of the braces of the object name pathes.

JSON Parser

The idea of the JSON Parser was, to use the PHP function json_decode to create the large TypoScript tree consisting of hundreds of PHP arrays on the binary level. TypoScript was rewritten to a valid JSON string as input.

Unfortunately json_decode does merging but not recursive merging. As overwriting is a feature of TypoScript this requires to prepare the JSON rendering by any approach to do the overwriting in advance. An array was created, containing the full object path as key and the value as value to solve this. Although this creates no nested tree, it takes time.

Together with the conversion to a JSON string in the second step, there is no advantage in speed. Taking the non-recursive approach to handle the two steps, it ends up in a similar speed as the Original TypoScript Parser.

TODO

  • Class hierarchies
  • Update the screenshots.
  • CLI interface
  • Hash sections for the TemplateService.
  • Breakpoints
  • Errors from previous parsing steps (see: research)