Hacking a simple syntax highlighter around Spec’s TextModel

The Text object in Pharo.

I need to do basic highlighting of SQL code inside a text editor in Pharo. Spec is a high-level GUI framework that makes it easy to design reusable widgets in Pharo. To get text input, Spec provides a TextModel widget. Using this widget for “raw text” or “Smalltalk code” input is easy: respectively, use the widget as is or configure the TextModel with the #beForCode message. But when it comes to highlighting another syntax, it is another story.

No easy, high-level mechanism seems to be provided to build your own syntax highlighter for a TextModel. (I had a look at the Shout package, which seems to provide this kind of mechanism for Rubric’s text editor, but I did not find a way to hack at this level and get results at Spec’s level.)

The whole hack explained here starts from the following observation: if you add attributes to a Text instance, they get displayed in the TextModel.

text := 'Hello World!' asText.
text
    addAttribute: TextEmphasis bold from: 1 to: 5;
    addAttribute: (TextColor color: Color red) from: 7 to: 11;
    addAttribute: (TextColor color: Color green) from: 12 to: 12.

TextModel new
    text: text;
    openWithSpec
A TextModel with a stylised Text.
All the source code associated with this blog post is open source, released under the MIT licence, and available in this GitHub repository.

A quick syntax highlighter

Let’s create a SpecSyntaxHighlighting package. A dedicated object is needed to do the highlighting. Let’s name it SSHSyntaxHighlighter.

Object subclass: #SSHSyntaxHighlighter
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'SpecSyntaxHighlighting'

This class is abstract and provides the three following messages:

  • #applyOn: which is the public interface to highlight a Text instance,
SSHSyntaxHighlighter>>#applyOn: aText
    self
        clean: aText;
        highlight: aText
  • #clean: which cleans the attributes of a Text instance; and
SSHSyntaxHighlighter>>#clean: aText
    aText runs setRuns: { aText size } setValues: #(())
  • #highlight: which adds attributes to the parts of the Text requiring highlighting. #highlight: is the message that SSHSyntaxHighlighter subclasses should override to actually perform the highlighting.
SSHSyntaxHighlighter>>#highlight: aText
    self subclassResponsibility

Let’s add two subclasses to SSHSyntaxHighlighter:

1. SSHNullSyntaxHighlighter, which is an implementation of the null object design pattern and will be required when no syntax highlighting is needed. This object overrides #highlight: and #clean: and does nothing in both method implementations.

2. SSHRegexSyntaxHighlighter, which allows one to associate TextAttributes with regular expressions in order to perform basic syntax highlighting on a Text instance.
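The null highlighter from item 1 is described but not listed in this post. A minimal sketch, assuming it is defined like the other classes of the package and that both overrides are simply left empty, could look like this:

```smalltalk
SSHSyntaxHighlighter subclass: #SSHNullSyntaxHighlighter
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'SpecSyntaxHighlighting'

SSHNullSyntaxHighlighter>>#clean: aText
    "Null object: leave the existing attributes of aText untouched."

SSHNullSyntaxHighlighter>>#highlight: aText
    "Null object: add no attribute to aText."
```

Because #applyOn: only sends #clean: and #highlight:, applying this highlighter to a Text should leave it exactly as it was.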

SSHSyntaxHighlighter subclass: #SSHRegexSyntaxHighlighter
    instanceVariableNames: 'regexToTextAttribute'
    classVariableNames: ''
    package: 'SpecSyntaxHighlighting'

The mapping between regular expressions and TextAttributes will be stored as a Dictionary in the instance variable #regexToTextAttribute. This inst. var. has an accessor and a mutator. The #initialize method creates an empty Dictionary in #regexToTextAttribute inst. var.

SSHRegexSyntaxHighlighter>>#initialize
    super initialize.
    self regexToTextAttribute: Dictionary new
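The accessor and mutator mentioned above are not listed in the post. Assuming they are plain getter and setter methods, they would presumably read as follows:

```smalltalk
SSHRegexSyntaxHighlighter>>#regexToTextAttribute
    "Answer the Dictionary mapping regex strings to collections of TextAttributes."
    ^ regexToTextAttribute

SSHRegexSyntaxHighlighter>>#regexToTextAttribute: aDictionary
    regexToTextAttribute := aDictionary
```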

It is now possible to implement the #highlight: method:

SSHRegexSyntaxHighlighter>>#highlight: aText
    | str |
    str := aText asString.
    self regexToTextAttribute keysAndValuesDo: [ :regex :attributes |
        (str allRangesOfRegexMatches: regex) do: [ :interval |
            attributes do: [ :attribute |
                aText addAttribute: attribute from: interval first to: interval last ] ] ]
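To see what #highlight: iterates over, it can help to evaluate #allRangesOfRegexMatches: on its own in a playground. The sample string and pattern below are made up for illustration:

```smalltalk
| ranges |
"Each match is answered as an Interval of 1-based, inclusive character positions."
ranges := 'SELECT id FROM t' allRangesOfRegexMatches: 'SELECT|FROM'.

"Answers the bounds of each match: 1->6 for SELECT and 11->14 for FROM."
ranges collect: [ :interval | interval first -> interval last ]
```

These bounds are exactly what #highlight: passes to #addAttribute:from:to:.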

Our regex-based syntax highlighter is now ready to be used. Nevertheless, the following methods are added to make the manipulation of the regexToTextAttribute dictionary easier.

  • First, a wrapper method to add a regex and its attributes:
SSHRegexSyntaxHighlighter>>#addRegex: aString withAttributes: anArrayOfTextAttributes
    self regexToTextAttribute
        at: aString put: anArrayOfTextAttributes
  • Second, a helper to add a regex with a single text attribute.
SSHRegexSyntaxHighlighter>>#addRegex: aString withAttribute: aTextAttribute
    self addRegex: aString withAttributes: { aTextAttribute }
  • Then, a helper for keyword highlighting. The keyword string provided as parameter is converted to a regular expression that selects the keyword only when it stands alone (i.e. preceded by the start of the text or a non-word character, and followed by the end of the text or a non-word character). Note that the matched range includes these surrounding characters.
SSHRegexSyntaxHighlighter>>#addKeyword: aString withAttribute: aTextAttribute
    self addRegex: '(^|\W)',aString,'($|\W)' withAttribute: aTextAttribute
  • Finally, a helper allowing one to add multiple keywords sharing the same text attribute.
SSHRegexSyntaxHighlighter>>#addKeywords: aCollectionOfKeywords withAttribute: aTextAttribute
    aCollectionOfKeywords do: [ :keyword |
        self addKeyword: keyword withAttribute: aTextAttribute ]

Integration in Spec framework

Now that a basic syntax highlighter is available, it is time to build a text widget that uses it. The SSHTextModel object is created as a subclass of TextModel.

TextModel subclass: #SSHTextModel
    instanceVariableNames: 'syntaxHighlighterHolder'
    classVariableNames: ''
    package: 'SpecSyntaxHighlighting'

The #syntaxHighlighterHolder inst. var. will hold a NewValueHolder holding an instance of one of SSHSyntaxHighlighter’s subclasses. Two methods to read and change the value of this NewValueHolder are implemented:

SSHTextModel>>#syntaxHighlighter
    ^ syntaxHighlighterHolder value
SSHTextModel>>#syntaxHighlighter: aSSHSyntaxHighlighter
    syntaxHighlighterHolder value: aSSHSyntaxHighlighter

Here comes the “hacky part”. We need the syntax highlighting to be executed each time the user types a letter. The method #whenTextChanged:, which takes a one-argument Block (the argument being the Text instance), exists. Nevertheless, modifying the attributes of the Text in the block provided to this method does not work. As a temporary solution, I decided to do the highlighting in the block provided to the #acceptBlock: method. This block is called each time the user accepts the input (i.e. each time the user presses Cmd+S). The problem with #acceptBlock: is that users have to save their code to update the syntax highlighting. To work around this inconvenience, #autoAccept: can be called with true as parameter.

To keep the possibility of performing actions when the text is accepted, SSHTextModel overrides #acceptBlock: as follows:

SSHTextModel>>#acceptBlock: aBlock
    super acceptBlock: [ :text |
        self syntaxHighlighter applyOn: text.
        aBlock value: text ]

The fact that #acceptBlock: is used to perform syntax highlighting is thus transparent for the users of SSHTextModel.
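As an illustration of that transparency, here is a small sketch in which a client sets its own accept block on an SSHTextModel. The logging to the Transcript and the sample text are just examples; only the classes and messages defined in this post are assumed:

```smalltalk
| highlighter |
highlighter := SSHRegexSyntaxHighlighter new
    addKeyword: 'SELECT' withAttribute: TextEmphasis bold;
    yourself.

SSHTextModel new
    syntaxHighlighter: highlighter;
    autoAccept: true;
    "The client block is wrapped by the override: highlighting runs first, then this block."
    acceptBlock: [ :text | Transcript show: text asString; cr ];
    text: 'SELECT 1';
    openWithSpec
```

With #autoAccept: set to true, the client block is expected to run on every change, so anything expensive in it will run on every keystroke too.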

Finally the following #initialize method is implemented in order to set up the object correctly when it is instantiated:

SSHTextModel>>#initialize
    super initialize.
    syntaxHighlighterHolder := SSHNullSyntaxHighlighter new asValueHolder.
    self acceptBlock: [ :text | ]

Voilà!

Back to our SQL highlighter

keywords := #(ADD ALL ALLOCATE ALTER AND ANY ARE ARRAY AS ASENSITIVE ASYMMETRIC AT ATOMIC AUTHORIZATION BEGIN BETWEEN BIGINT BINARY BLOB BOOLEAN BOTH BY CALL CALLED CASCADED CASE CAST CHAR CHARACTER CHECK CLOB CLOSE COLLATE COLUMN COMMIT CONDITION CONNECT CONSTRAINT CONTINUE CORRESPONDING CREATE CROSS CUBE CURRENT CURRENT_DATE CURRENT_DEFAULT_TRANSFORM_GROUP CURRENT_PATH CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_TRANSFORM_GROUP_FOR_TYPE CURRENT_USER CURSOR CYCLE DATE DAY DEALLOCATE DEC DECIMAL DECLARE DEFAULT DELETE DEREF DESCRIBE DETERMINISTIC DISCONNECT DISTINCT DO DOUBLE DROP DYNAMIC EACH ELEMENT ELSE ELSEIF END ESCAPE EXCEPT EXEC EXECUTE EXISTS EXIT EXTERNAL FALSE FETCH FILTER FLOAT FOR FOREIGN FREE FROM FULL FUNCTION GET GLOBAL GRANT GROUP GROUPING HANDLER HAVING HOLD HOUR IDENTITY IF IMMEDIATE IN INDICATOR INNER INOUT INPUT INSENSITIVE INSERT INT INTEGER INTERSECT INTERVAL INTO IS ITERATE JOIN LANGUAGE LARGE LATERAL LEADING LEAVE LEFT LIKE LOCAL LOCALTIME LOCALTIMESTAMP LOOP MATCH MEMBER MERGE METHOD MINUTE MODIFIES MODULE MONTH MULTISET NATIONAL NATURAL NCHAR NCLOB NEW NO NONE NOT NULL NUMERIC OF OLD ON ONLY OPEN OR ORDER OUT OUTER OUTPUT OVER OVERLAPS PARAMETER PARTITION PRECISION PREPARE PRIMARY PROCEDURE RANGE READS REAL RECURSIVE REF REFERENCES REFERENCING RELEASE REPEAT RESIGNAL RESULT RETURN RETURNS REVOKE RIGHT ROLLBACK ROLLUP ROW ROWS SAVEPOINT SCOPE SCROLL SEARCH SECOND SELECT SENSITIVE SESSION_USER SET SIGNAL SIMILAR SMALLINT SOME SPECIFIC SPECIFICTYPE SQL SQLEXCEPTION SQLSTATE SQLWARNING START STATIC SUBMULTISET SYMMETRIC SYSTEM SYSTEM_USER TABLE TABLESAMPLE THEN TIME TIMESTAMP TIMEZONE_HOUR TIMEZONE_MINUTE TO TRAILING TRANSLATION TREAT TRIGGER TRUE UNDO UNION UNIQUE UNKNOWN UNNEST UNTIL UPDATE USER USING VALUE VALUES VARCHAR VARYING WHEN WHENEVER WHERE WHILE WINDOW WITH WITHIN WITHOUT YEAR).
highlighter := SSHRegexSyntaxHighlighter new
    addKeywords: keywords withAttribute: TextEmphasis bold;
    yourself.
SSHTextModel new
    autoAccept: true;
    syntaxHighlighter: highlighter;
    text: 'SELECT * FROM table WHERE id = 42 AND property = foo ORDER BY name;';
    openWithSpec.

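Let’s implement syntax highlighting of numeric constants (regex built from the PostgreSQL documentation):

"The longer alternatives come first so that the plain-integer alternative does not match only the leading digits of a constant such as 1e5."
highlighter
    addRegex: '\d+\.\d*(e(\+|-)?\d+)?|\d*\.\d+(e(\+|-)?\d+)?|\d+e(\+|-)?\d+|\d+' withAttribute: (TextColor color: Color green darker)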
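Before plugging a pattern like this into the highlighter, it can be tried against a few sample constants in a playground. The sketch below assumes the four forms of numeric constant listed in the PostgreSQL documentation (digits, digits with a fractional part, a leading dot, and an exponent), with the exponent made of one or more digits; the sample strings are only illustrations:

```smalltalk
| numberRegex |
numberRegex := '\d+\.\d*(e(\+|-)?\d+)?|\d*\.\d+(e(\+|-)?\d+)?|\d+e(\+|-)?\d+|\d+'.

"#matchesRegex: tests whether the whole string matches; each sample here is a valid constant."
#('42' '3.14' '.5' '1e10' '2.5e-3') collect: [ :each | each matchesRegex: numberRegex ].

"A plain identifier is not a numeric constant and should not match."
'foo' matchesRegex: numberRegex
```

Note that the highlighter searches for matches anywhere in the text, so digits embedded in an identifier would still be coloured; handling that would need a boundary check similar to the one used for keywords.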

And that’s all for this blogpost. The implementation is hacky (the #acceptBlock: part) but it does the job. Do not hesitate to drop me a comment if you have other ideas around this subject.