Next Generation TypoScript Parser¶
Classification: | tsp |
---|---|
Version: | 1.0.1 |
Language: | en |
Description: | Next Generation TypoScript Parser (tsp) |
Keywords: | typoscript, performance, parser |
Copyright: | 2016 |
Author: | Elmar Hinz |
Email: | t3elmar@gmail.com |
License: | This document is published under the Open Content License available from http://www.opencontent.org/opl.shtml |
Rendered: | June 18, 2016 |
The content of this document is related to TYPO3, a GNU/GPL CMS/Framework available from www.typo3.org.
Introduction¶
This extension ships a TypoScript parser that is suited to replace the original TypoScript parser for frontend rendering. In fact, a family of parsers has been introduced, each specialized for a different task:
- FE: TypoScriptConditionsPreProcessor
- FE: TypoScriptProductionParser
- BE: TypoScriptSyntaxParser
What it is not¶
No Boost in Performance¶
Parsing TypoScript takes only a few milliseconds. Hence, the primary goal is not to speed it up but to improve the architecture. The parsing algorithm itself is twice as fast as the original algorithm, but with the split into conditions preprocessor and parser the overall time is about the same again.
What it is¶
Public Presentation¶
First of all, this extension is a public presentation of the rewritten parser. Should it replace the old parser of the core? If so, it first needs to be tested in the wild until it is really stable.
Standalone Usage¶
It’s possible to use the TypoScript parser outside of the TYPO3 CMS, if you like the TypoScript syntax and want to use it for configuration in other fields. This is possible with or without the conditions preprocessor.
Improving the error detection¶
The error detection covers everything the original parser detects and already tries to be a little better. The display of the line numbers has also been reworked. See Screenshots!
Planned improvements in future versions:
- CLI interface to check TS within continuous integration workflows.
- Do syntax highlighting of conditions, instead of printing them in one color.
- Detect the difference between objects and properties, because only objects are allowed to be copied by reference.
- (Related) Throw verbose errors from TS objects, catch them and display them in the backend.
New Architecture¶
The reason for writing a new TypoScript parser is to give it a modern architecture:
- easy to understand
- easy to debug
- easy to extend
A modern parser makes it easier to get rid of flaws in TypoScript, to enhance error detection and to add new features such as if-else conditions that work the way you are used to from other languages.
Condition Preprocessor¶
Condition evaluation has been separated into a preprocessor class. This makes it possible to use the TypoScript parser without bothering with conditions at all, or to apply different types of preprocessors. It also becomes simpler to enhance the condition preprocessing; as an example, think of a full-blown IF-ELSEIF-ELSE-END structure.
As with the old parser, the condition matching is handled by a third object. Exchanging this object enables the development of conditions that address a completely different field than the TYPO3 CMS.
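The separation described above can be sketched as follows. This is a minimal illustration, not the extension's code: the interface name, both class names and the method names are hypothetical, and the sketch ignores [ELSE] and multiline values.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: decides whether a condition line such as "[mobile]" matches.
interface ConditionMatcherInterface
{
    public function match(string $condition): bool;
}

// Hypothetical preprocessor: keeps only the lines of the global scope
// and of condition sections that the injected matcher accepts.
final class SimpleConditionsPreProcessor
{
    private $matcher;

    public function __construct(ConditionMatcherInterface $matcher)
    {
        $this->matcher = $matcher;
    }

    public function process(string $template): string
    {
        $active = true;
        $kept = [];
        foreach (explode("\n", $template) as $line) {
            $trimmed = trim($line);
            if ($trimmed !== '' && $trimmed[0] === '[') {
                // [GLOBAL] and [END] return to the unconditional scope.
                $upper = strtoupper($trimmed);
                $active = ($upper === '[GLOBAL]' || $upper === '[END]')
                    ? true
                    : $this->matcher->match($trimmed);
                continue;
            }
            if ($active) {
                $kept[] = $line;
            }
        }
        return implode("\n", $kept);
    }
}

// A matcher for a field unrelated to TYPO3: it only knows a fixed set of true conditions.
final class FixedSetMatcher implements ConditionMatcherInterface
{
    private $trueConditions;

    public function __construct(array $trueConditions)
    {
        $this->trueConditions = $trueConditions;
    }

    public function match(string $condition): bool
    {
        return in_array($condition, $this->trueConditions, true);
    }
}

$template = "a = 1\n[mobile]\na = 2\n[desktop]\na = 3\n[GLOBAL]\nb = 4";
$preProcessor = new SimpleConditionsPreProcessor(new FixedSetMatcher(['[mobile]']));
$result = $preProcessor->process($template);
// $result now holds the lines "a = 1", "a = 2" and "b = 4".
```

The output of such a preprocessor is plain TypoScript without conditions, so a parser that knows nothing about conditions can consume it.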
Differences¶
- Escaping of dots by backslash is not supported.
Screenshots¶
Line numbering¶

The line numbers show the numbering of the template and the overall numbering within the template tree.

When line numbering is turned off the error messages contain the line number instead.

When line numbering is turned on the error messages don’t duplicate the information.
Types of errors¶

For invalid lines it is assumed that the user wants to enter an operator line. The line is checked for an invalid key and an invalid operator.

Braces in excess are shown in the line where they occur.

Missing closing braces are detected at conditions and at the end of the template.

An unclosed multiline comment is detected at the end of the template. Multiline comments can be used to comment out parts of the script. Included elements like conditions don’t result in an error.

An unclosed multiline value is detected at the end of the template.
Administration¶
Install the extension, clear caches and check if your frontend is rendered as expected and if you get the advanced error feedback in the backend.
If anything goes wrong, uninstall and report the issue.
Known Issues¶
No Exceptions are Thrown¶
The TypoScript production parser currently doesn’t throw exceptions. It expects valid TS as input. To check whether your input is valid, use the syntax highlighting parser in the BE.
No exceptions are thrown because the original parser doesn’t throw exceptions either. The backend modules are not prepared to catch exceptions from the parser and would break if exceptions were thrown for invalid TS.
Intolerant for Invalid TS¶
The TypoScript production parser will silently break if fed with invalid TS. It is optimized for speed and is less tolerant of invalid TS than the original parser.
This means in rare cases code that works for the original parser may break with the TypoScript production parser. Use the syntax highlighting parser to fix the TS code.
XCLASS issues¶
The original parser is not fully replaced but extended by XCLASS registration. The extended class serves as an adapter to the standalone classes. Conflicts may occur with extensions that also XCLASS the core parser.
Appendix¶
Architecture¶
The major goal of the architecture is flexibility: to enable the development of new features and to let users customize the parsers to their needs. The main means to reach this goal are:
- Separation of concerns
- Programming against interfaces
- Dependency injection
- Classes as identifiers
Separation of Concerns¶
The classes are kept rather small, so that each encapsulates a single concern.
The syntax tracker is the most complex example. It focuses on the parsing algorithm, while it delegates the representation of tokens and exceptions to dedicated classes. The collecting of tokens and exceptions is done by tracker classes. The tracker objects are finally accessed by a formatter class to produce the highlighted output.
Concerns represented by one class each:
- Parsing
- Representation of a token
- Representation of an exception
- Tracking tokens
- Tracking exceptions
- Formatting the report
Programming against Interfaces¶
Wherever two classes cooperate, there is an interface between them. A class can have multiple interfaces if it cooperates with multiple other classes. All these interfaces are defined as PHP interfaces, which are stored in the folder Classes/Interfaces.
A class should not depend on the classes it cooperates with, but on interfaces. It is free to cooperate with every class that implements the matching interface. Each class can be exchanged for a customized class, as long as the customized class provides the interfaces that the given classes can talk to.
An example usage of these interfaces are the mock objects of the unit tests. While a single class is tested, it is decoupled from other classes by using mock objects that implement the interface to test against.
Dependency Injection¶
Dependency injection is related to programming against interfaces. If a class must not depend on other classes, it must not create objects itself with the keyword new. Instead, objects that implement the required interface are injected.
Of course, a place is needed where all this dependency injection is done, where the objects are created and wired up. This is done in the main application classes that are stored in the folder Main/. You can think of an application class as a kind of configuration that composes objects according to your taste. You write a new one of these main configuration classes to compose your own application or to alter an existing one.
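A minimal sketch of this wiring, assuming hypothetical interface and class names (the extension's real interfaces live in Classes/Interfaces and its real application classes in Main/): the parser only knows the tracker interface, and a single "main" class creates and connects the objects.

```php
<?php
declare(strict_types=1);

// Hypothetical interfaces, named for illustration only.
interface ExceptionTrackerInterface
{
    public function push(\Exception $exception): void;
}

interface ParserInterface
{
    public function parse(string $template): void;
}

final class ListExceptionTracker implements ExceptionTrackerInterface
{
    private $exceptions = [];

    public function push(\Exception $exception): void
    {
        $this->exceptions[] = $exception;
    }

    public function count(): int
    {
        return count($this->exceptions);
    }
}

// The parser never creates its collaborator; it receives anything that
// implements the interface. Only the exception itself is created with "new".
final class BraceCountingParser implements ParserInterface
{
    private $tracker;

    public function __construct(ExceptionTrackerInterface $tracker)
    {
        $this->tracker = $tracker;
    }

    public function parse(string $template): void
    {
        if (substr_count($template, '{') !== substr_count($template, '}')) {
            $this->tracker->push(new \RuntimeException('Unbalanced braces'));
        }
    }
}

// The "main" class is the only place where objects are created and wired up.
final class SyntaxCheckMain
{
    public static function build(): array
    {
        $tracker = new ListExceptionTracker();
        $parser = new BraceCountingParser($tracker);
        return [$parser, $tracker];
    }
}

[$parser, $tracker] = SyntaxCheckMain::build();
$parser->parse("page {\n  10 = TEXT\n");
$errorCount = $tracker->count();
```

Swapping the tracker for a mock or a logging variant then only requires a different main class; the parser stays untouched.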
Classes as Identifiers¶
An exception to the rule not to use the keyword new are the tokens and exceptions. Each class is designed to serve as an identifier. You can think of them as constants. The object is created with the keyword new because you mean exactly its class as identifier, not the interface. They are final.
Nonetheless there is flexibility. The exceptions and tokens are created by parsers, and you can exchange the parser creating them. That means you can exchange the part that contains the new keywords.
You can create your own exceptions and tokens by writing new classes. It’s just a few lines each, because they inherit almost everything from abstract classes. The freedom to easily add new tokens and exceptions is one reason why they are not implemented as constants, apart from the additional functionality a class offers.
Exceptions¶
The Exception Hierarchy¶
- Exception
- TypoScriptParsetimeException (abstract)
- TypoScriptBraceInExcessException
- TypoScriptKeysException
- TypoScriptUnclosedConditionException
- TypoScriptBracesMissingAtConditionE
- TypoScriptOperatorException
- TypoScriptUnclosedValueException
- TypoScriptBracesMissingAtEndOfT
- TypoScriptParsetimeException
- TypoScriptUnclosedCommentException
Where is the TypoScriptRuntimeException?¶
Where there is a TypoScriptParsetimeException, there should also be a TypoScriptRuntimeException, shouldn’t there?
TypoScript parsetime exceptions occur while parsing TypoScript into a PHP array tree. Runtime exceptions would make sense in the ContentObjectRenderer, when the PHP array tree is used to render the page.
Both parts are connected by the PHP array tree, but apart from that, they are not connected. The array tree could come from a different source. The parser could render an array tree for a completely different purpose.
It follows:
1.) A TypoScriptRuntimeException doesn’t belong in the parser package.
2.) Both types of exceptions should not inherit from a common TypoScriptException, so as not to introduce an unnecessary dependency between the packages. Instead, both inherit directly from Exception.
Tokens¶
The Token Hierarchy¶
- AbstractTypoScriptToken
- TypoScriptIgnoredToken
- TypoScriptOperatorToken
- TypoScriptValueToken
- TypoScriptCommentContextToken
- TypoScriptKeysPostspaceToken
- TypoScriptPrespaceToken
- TypoScriptCommentToken
- TypoScriptKeysToken
- TypoScriptValueContextToken
- TypoScriptConditionToken
- TypoScriptOperatorPostspaceToken
- TypoScriptValueCopyToken
Tokens as Type¶
First of all, the token object is a device to ship a type and a value. The type is the class itself; the value is set with the constructor and is accessible by the method getValue().
Tokens to Format Token Tags¶
The token object represents a token type, not a formatting class. Despite this, an HTML tag representation of the token can be created by calling the method toTag(). This is just sugar on top of the primary function. String representations of the token can be created by external methods as well. The tag creation can be customized by the methods setTag() and setClasses(). The default values are chosen to match the CSS classes of the existing syntax highlighting of the backend.
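The following sketch illustrates the idea of a token as type plus value with an optional tag representation. Only the method names getValue(), toTag(), setTag() and setClasses() come from the text; their signatures, the class names and the CSS class ts-operator are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical base class; the real AbstractTypoScriptToken may differ in detail.
abstract class AbstractSketchToken
{
    private $value;
    private $tag = 'span';
    private $classes = '';

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function getValue(): string
    {
        return $this->value;
    }

    public function setTag(string $tag): void
    {
        $this->tag = $tag;
    }

    public function setClasses(string $classes): void
    {
        $this->classes = $classes;
    }

    // Convenience only: renders the escaped value inside the configured tag.
    public function toTag(): string
    {
        return sprintf(
            '<%s class="%s">%s</%s>',
            $this->tag,
            htmlspecialchars($this->classes),
            htmlspecialchars($this->value),
            $this->tag
        );
    }
}

// Each concrete token is a few lines; the class itself is the identifier.
final class SketchKeysToken extends AbstractSketchToken
{
}

final class SketchOperatorToken extends AbstractSketchToken
{
}

$token = new SketchOperatorToken('<');
$token->setClasses('ts-operator');

$isOperator = $token instanceof SketchOperatorToken;
$isKeys = $token instanceof SketchKeysToken;
$html = $token->toTag();
```

Because the type is carried by the class, a consumer can branch with instanceof instead of comparing string or integer constants.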
Research¶
\Core\TypoScript\Parser\TyposcriptParser¶
Overview¶
The method parse() is a preprocessor that handles the including and excluding of template parts by conditions.
It doesn’t read the incoming lines to the end first, but immediately delegates the parts to parseSub() (a kind of depth-first parsing of the template tree).
The method doSyntaxHighlight() is responsible for generating a syntax-highlighted HTML string. It also calls the preprocessor parse(), but sets a flag that disables the conditions, so that all parts are evaluated.
The latter is strange in two respects. It doesn’t make sense to send syntax highlighting through a conditioning preprocessor. It doesn’t make sense to parse into an array tree when one actually wants an HTML string as result.
Conditions¶
In the method parse() the template is branched into rendered and non-rendered parts based on conditions. The condition evaluation is delegated to a $matchObj that is injected by parameter.
For each condition the method creates a hash and stores it in the $this->sections array. These hashes are used by the TemplateService to cache the rendered templates for the combinations of conditions that evaluate to true.
Line numbering¶
There is a line number offset that sums up the line numbers of previously rendered templates. It is advanced at the end of parse().
The line numbers of the current template are tracked by $this->rawP in the main loop of parseSub(), and in the method nextDivider() for the condition sections that evaluate to false. $this->rawP is reset to zero in the method parse(), at the beginning of the rendering of the current template.
Error handling¶
The method error($errorString, $severity = 2) collects each error into $this->errors[] = [a, b, c, d] with:
- a = error message
- b = severity
- c = line number
- d = template line number offset
Collected messages:
- ‘Script is short of XXX braces.’
- ‘An end brace is in excess.’
- ‘On return to [GLOBAL] scope, the script was short of XXX braces.’
- ‘A multiline value section is not ended with a parenthesis!’
- ‘Object Name String, contains invalid character XXX. Must be alphanumeric or one of: “_:-.”.’
- ‘Object Name String XXX was not followed by any operator, =<>({‘
- ‘### ERROR: XXX’ (Error to be extracted from an error comment created in previous parsing steps, for example during template includes.)
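The shape of one collected error record can be sketched like this. The class and its two line counter properties are hypothetical stand-ins for the core parser; adding offset and line number to get an overall line is likewise an assumption of this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the collecting part of the core parser.
final class ErrorCollectorSketch
{
    public $errors = [];
    public $lineNumber = 0;        // line within the current template (assumed name)
    public $lineNumberOffset = 0;  // lines of previously parsed templates (assumed name)

    public function error(string $errorString, int $severity = 2): void
    {
        // [a, b, c, d] = [message, severity, line number, template line number offset]
        $this->errors[] = [$errorString, $severity, $this->lineNumber, $this->lineNumberOffset];
    }
}

$collector = new ErrorCollectorSketch();
$collector->lineNumberOffset = 120;
$collector->lineNumber = 7;
$collector->error('An end brace is in excess.');

[$message, $severity, $line, $offset] = $collector->errors[0];
$overallLine = $offset + $line; // assumption: overall numbering is offset plus local line
```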
Syntax highlighting¶
Highlighted parsing is controlled by the method doSyntaxHighlight(). It sets the flag $this->syntaxHighLight to true, and the template string is parsed. The flag activates the additional highlighting functionality during the process of parsing. Finally the method syntaxHighlight_print() is called to format the collected results, including the error messages.
Registration of highlighted parts of lines is done during parsing by the method regHighLight() if the above flag is set. The parts are collected into $this->highLightData and $this->highLightData_bracelevel. Both arrays are indexed per line: the first one holds the highlighted sections of the line, the second one the depth of brace nesting.
Breakpoints¶
A breakpoint is a line number in $this->breakPointLN at which the execution of the rendering is stopped. The method parseSub() returns with a marker [_BREAK]. This marker stops the further execution of the main loop in parse().
TemplateService¶
TemplateService is a service that makes use of the parser. A main task of TemplateService is to cache the rendered template for different combinations of conditions of a page.
ExtendedTemplateService¶
The class ExtendedTemplateService contains methods for the TS module in the TYPO3 backend. It extends TemplateService.
Lessons Learned¶
Parsing the TypoScript of a website takes just a few milliseconds overall. It is not a critical part of the overall page rendering time. Yet the development of this extension also focused on performance.
Time to parse the templates vs. time to parse TypoScript¶
When measured with the TYPO3 core time tracker (admin panel), the template parsing takes a few hundred milliseconds. When all calls to the TypoScript parse function (TypoScriptParser::parse()) are measured and summed up, it takes just a few milliseconds. The difference is most likely explained by I/O calls to read the templates.
Non-Recursive Parser¶
The Non-Recursive Parser is the approach taken by this parser. The whole rendering happens within one function, using simple loop structures. Calls to itself or to other methods are avoided as far as reasonable. This turns out to be twice as fast as the recursive Original TypoScript Parser.
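To illustrate the general technique (not the extension's actual algorithm), here is a minimal non-recursive sketch for a small subset of TypoScript: assignments with =, opening braces at the end of a line and closing braces on their own line. The function name is hypothetical; brace nesting is handled by a stack of key prefixes within a single loop, and the result uses the trailing-dot convention of the TypoScript array tree.

```php
<?php
declare(strict_types=1);

function parseNonRecursive(string $template): array
{
    $tree = [];
    $prefixStack = []; // key prefixes of the currently open braces

    foreach (explode("\n", $template) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip empty lines and single-line comments
        }
        if ($line === '}') {
            array_pop($prefixStack);
            continue;
        }
        if (substr($line, -1) === '{') {
            $prefixStack[] = trim(substr($line, 0, -1));
            continue;
        }
        $parts = explode('=', $line, 2);
        if (count($parts) !== 2) {
            continue; // sketch only: other operators are ignored
        }
        // Build the full object path, then walk down the tree by reference.
        $path = implode('.', array_merge($prefixStack, [trim($parts[0])]));
        $segments = explode('.', $path);
        $last = array_pop($segments);
        $node = &$tree;
        foreach ($segments as $segment) {
            if (!isset($node[$segment . '.'])) {
                $node[$segment . '.'] = [];
            }
            $node = &$node[$segment . '.'];
        }
        $node[$last] = trim($parts[1]);
        unset($node); // break the reference before the next line
    }
    return $tree;
}

$template = implode("\n", [
    'page = PAGE',
    'page {',
    '  10 = TEXT',
    '  10.value = Hello',
    '  10 {',
    '    wrap = <b>|</b>',
    '  }',
    '}',
]);
$tree = parseNonRecursive($template);
```

The only state carried between lines is the prefix stack, so nesting depth never translates into call depth.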
Original TypoScript Parser¶
The original parser of the TYPO3 core uses recursive calls to handle the nesting of the braces of the object name paths.
JSON Parser¶
The idea of the JSON Parser was to use the PHP function json_decode to create the large TypoScript tree, consisting of hundreds of PHP arrays, on the binary level. TypoScript was rewritten into a valid JSON string as input.
Unfortunately json_decode does merging but not recursive merging. As overwriting is a feature of TypoScript, this requires some approach that does the overwriting in advance, before the JSON is rendered. To solve this, an array was created that contains the full object path as key and the value as value. Although this creates no nested tree, it takes time.
Together with the conversion to a JSON string in the second step, there is no advantage in speed. Even when the non-recursive approach is taken for the two steps, it ends up at a speed similar to that of the Original TypoScript Parser.
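The limitation can be reproduced with a few lines. The JSON string and the object paths below are made up for this illustration; the flat array mirrors the workaround described above, where the full object path is the key.

```php
<?php
declare(strict_types=1);

// The same key appears twice: json_decode() keeps the later value
// instead of merging the two nested objects.
$json = '{"page": {"wrap": "<i>|</i>"}, "page": {"value": "Hello"}}';
$decoded = json_decode($json, true);

// Workaround: do the overwriting in advance on a flat array,
// with the full object path as key and the value as value.
$flat = [];
$flat['page.wrap'] = '<i>|</i>';
$flat['page.value'] = 'Hello';
$flat['page.wrap'] = '<b>|</b>'; // a later assignment overwrites the earlier one
```

After decoding, $decoded['page'] only contains the key value, while the flat array holds both paths with the overwritten wrap.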
TODO¶
- Class hierarchies
- Update the screenshots.
- CLI interface
- Hash sections for the TemplateService.
- Breakpoints
- Errors from previous parsing steps (see: research)