Build Your Text Editor With Rust! Final Part

Kofi Otuo
15 min read · Oct 22, 2021


This part concludes the series “Build Your Text Editor With Rust”. For those who may have stumbled on this page without going through the other parts, here is a summary of what we’ve done so far: we started with setup, then moved on to reading keypresses and entering “raw mode”, drawing on the screen and moving the cursor around, displaying text files (making our program a text viewer), editing text files and saving changes, and implementing a cool search feature. In this final part, we add syntax highlighting.

Let’s begin by coloring each digit cyan:

Highlight Digits

We no longer add the text we want to display directly to editor_contents. Instead, we loop through the characters and check whether each one is a digit. If it is, we add the escape sequence for cyan, add the character, and finally reset the color (you can find what each escape sequence maps to in terms of ANSI codes on Wikipedia). The colors that we’ll use are supported by many terminals. If the terminal doesn’t support colors, we just add the digit and ignore the error; the let _ = ... syntax just prevents the compiler from warning about the unused Result.
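As a rough, standalone sketch of the idea (not the editor's actual code), the loop can be written against a plain String. The raw ANSI constants and the highlight_digits name are assumptions made for this example; the editor itself issues the equivalent crossterm commands into editor_contents.

```rust
// Raw ANSI sequences stand in for the crossterm commands used in the editor.
// These constants and the function name are hypothetical, for illustration.
const CYAN: &str = "\x1b[36m"; // set foreground to cyan
const RESET: &str = "\x1b[39m"; // restore the default foreground color

/// Returns `line` with every ASCII digit wrapped in cyan/reset sequences.
fn highlight_digits(line: &str) -> String {
    let mut out = String::new();
    for c in line.chars() {
        if c.is_ascii_digit() {
            out.push_str(CYAN);
            out.push(c);
            out.push_str(RESET);
        } else {
            out.push(c);
        }
    }
    out
}
```

Printing highlight_digits("a1") in a color-capable terminal shows a plain a followed by a cyan 1.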

Refactor Syntax Highlighting ✨

Now we know how to color digits, but what we want is to actually highlight entire strings, keywords, comments, and so on. We can’t just decide what color to use based on the class of each character, like we’re doing with digits currently. What we want to do is figure out the highlighting for each row of text before we display it, and then re-highlight a line whenever it gets changed.

The highlighting rules might differ from one programming language to another. For instance, how we highlight the syntax in a Python file might be different from how we’d like to highlight a Rust file. What we would like to do is write a “generic” method for highlighting syntax which can be easily modified or specialized for different programming languages. So we’re going to create a trait (similar to an interface in object-oriented programming languages such as Java and Kotlin) to help us implement that.

But even before creating the trait, let’s add a new field, highlight, to Row which would contain the highlight information for that Row.

Now let’s add the trait.

For now, the trait has a method which will go through the characters of a Row and highlight them by setting each value in the highlight array. Let’s write one implementation of SyntaxHighlight:

The code is wrapped in a macro before we implement SyntaxHighlight. This works somewhat like a #[derive] for that struct, except that we’re using macro_rules, which is less complex. The macro has one parameter: the name of the struct you want to create. The macro then creates the struct and implements SyntaxHighlight for it.

In the update_syntax method, we get the Row we have to update and then loop through the chars of that Row and then update Row.highlight. We add the simple assert_eq!() to make sure our code works as expected. You can remove it after you’re sure the program works fine.
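To make the shape of this concrete, here is a minimal sketch under stated assumptions: the Row fields, the two HighlightType variants, the update_syntax signature, and the exact invocation syntax of syntax_struct! are guesses for illustration, not necessarily what the series uses.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    Number,
}

// Simplified stand-in for the editor's Row (assumed fields).
struct Row {
    render: String,
    highlight: Vec<HighlightType>,
}

trait SyntaxHighlight {
    fn update_syntax(&self, at: usize, editor_rows: &mut Vec<Row>);
}

// Creates a unit struct with the given name and implements the trait for it.
macro_rules! syntax_struct {
    (struct $name:ident;) => {
        struct $name;

        impl SyntaxHighlight for $name {
            fn update_syntax(&self, at: usize, editor_rows: &mut Vec<Row>) {
                let row = &mut editor_rows[at];
                row.highlight = row
                    .render
                    .chars()
                    .map(|c| {
                        if c.is_ascii_digit() {
                            HighlightType::Number
                        } else {
                            HighlightType::Normal
                        }
                    })
                    .collect();
                // Sanity check: one highlight entry per rendered character.
                assert_eq!(row.render.chars().count(), row.highlight.len());
            }
        }
    };
}

syntax_struct! {
    struct RustHighlight;
}
```

Calling RustHighlight.update_syntax(0, &mut rows) fills in the highlight vector of the first row.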

Next, let’s make a syntax_color() function that maps values in highlight to the actual color codes we want to draw them with:

Note that Color is from crossterm::style::Color.

Now let’s finally add the function that draws the colored text:

The color_row function is the same as we had in draw_rows but modified a bit to use the trait methods instead. Let’s now add SyntaxHighlight to Output:

Since we’re using dynamic dispatch, we use the Box<dyn SyntaxHighlight> syntax. Not every file type might have a syntax highlight method implemented, hence we wrap the field in Option<T>. We’re using the RustHighlight struct that we created from the macro as the default value for now. In draw_rows, we replace the coloring code to use syntax_highlight if it’s available.

To get the code working, let’s call update_syntax everywhere that modifies a Row, similar to how we implemented render_row():

Now whenever there’s a change in any of the rows, update_syntax would be called to reflect the changes. That concludes our refactoring of the syntax highlighting system.

Colorful Search Results 🔭

Before we start highlighting strings and keywords and all that, let’s use our highlighting system to highlight search results. We’ll start by adding SearchMatch to the HighlightType enum, and mapping it to the color blue:

Now all we have to do is set the matched substring’s highlight to SearchMatch in our search code:

First, we get a mutable reference to the Row containing the query. To highlight the matched word, we loop through the highlight array from index to index + the query’s length, setting each corresponding highlight value to SearchMatch.

Restore Syntax Highlighting After Search ⚖️

Currently, search results stay highlighted in blue even after the user is done using the search feature. We want to restore the highlight values to their previous values after each search. We’re going to derive Clone for HighlightType, similar to what we did for CursorController. Then we’ll save the original highlight and the corresponding row_index in SearchIndex:

Now let’s modify find_callback:

At the start of the function we use .take() to return the owned version of the value in previous_highlight, if any. take() replaces the original value with None, which is what we want in this case. If there was a previous match, we reset that row’s highlight. When there’s a match we store the previous highlight before modifying.
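The save-and-restore pattern can be sketched in isolation as follows. This is a simplified illustration, not the real find_callback: the helper names restore_previous and mark_match, and the reduced Row and SearchIndex shapes, are assumptions made for the example.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    SearchMatch,
}

struct Row {
    highlight: Vec<HighlightType>,
}

struct SearchIndex {
    // (row_index, that row's highlight before the match was painted)
    previous_highlight: Option<(usize, Vec<HighlightType>)>,
}

/// Undo the previous match, if any. `take()` leaves `None` behind.
fn restore_previous(search: &mut SearchIndex, rows: &mut [Row]) {
    if let Some((row_index, saved)) = search.previous_highlight.take() {
        rows[row_index].highlight = saved;
    }
}

/// Save the row's highlight, then paint `len` cells starting at `start`.
fn mark_match(search: &mut SearchIndex, rows: &mut [Row], row_index: usize, start: usize, len: usize) {
    let row = &mut rows[row_index];
    search.previous_highlight = Some((row_index, row.highlight.clone()));
    // Clamp the range so it never runs past the end of the row.
    let end = (start + len).min(row.highlight.len());
    let start = start.min(end);
    for h in &mut row.highlight[start..end] {
        *h = HighlightType::SearchMatch;
    }
}
```

Calling restore_previous at the top of every search callback, before looking for the next match, means at most one row is ever left painted.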

Assignment for the reader:

You’ll notice that when you’re on the last match (irrespective of the direction) and you keep pressing the Arrow key that got you to the last match, the color for that row resets. This is the expected behavior. What if we want the color to remain? Before you continue, you should try modifying the program to fix that.

Optimize color_row() ⚡️

So far the coloring works fine but color_row performs 3 operations which may not be necessary. First, it writes the escape sequence for the color, then it writes a single letter and finally it writes the escape sequence to reset the color. In practice, most characters are going to be the same color as the previous character, so most of the escape sequences are redundant. Let’s keep track of the current text color as we loop through the characters, and only print out an escape sequence when the color changes:

After looping through the chars we reset the color so that the remaining rows aren’t affected.
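A small standalone sketch of the optimization, assuming a two-variant Color enum and raw ANSI sequences in place of crossterm (both are hypothetical stand-ins for this example), and taking the already-resolved color of each character as input:

```rust
// Hypothetical stand-in for crossterm's Color, reduced to two values.
#[derive(Copy, Clone, PartialEq)]
enum Color {
    Reset,
    Cyan,
}

fn escape(color: Color) -> &'static str {
    match color {
        Color::Reset => "\x1b[39m",
        Color::Cyan => "\x1b[36m",
    }
}

/// Emits an escape sequence only when the color differs from the previous
/// character's color, then resets once at the end of the row.
fn color_row(render: &str, colors: &[Color]) -> String {
    let mut out = String::new();
    let mut current = Color::Reset;
    for (c, &color) in render.chars().zip(colors) {
        if color != current {
            out.push_str(escape(color));
            current = color;
        }
        out.push(c);
    }
    // Reset so that the following rows are not affected.
    out.push_str(escape(Color::Reset));
    out
}
```

For a row like 12a, the two digits now share a single cyan sequence instead of getting one each.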

Colorful Numbers 🖍

Alright, let’s start working on highlighting numbers properly. First, we’ll change our for loop in update_syntax to a while loop, to allow us to consume multiple characters each iteration:

Now let’s define an is_separator() function that takes a character and returns true if it’s considered a separator character:

Since many programming languages have similar separators, we make is_separator a default method.

Right now, numbers are highlighted even if they’re part of an identifier, such as the 32 in int32_t. To fix that, we’ll require that numbers are preceded by a separator character, which includes whitespace or punctuation characters. Let’s add a previous_separator variable to update_syntax() that keeps track of whether the previous character was a separator. Then let’s use it to recognize and highlight numbers properly:

We initialize previous_separator to true because we consider the beginning of the line to be a separator. (Otherwise numbers at the very beginning of the line wouldn’t be highlighted.)

previous_highlight is set to the highlight type of the previous character. To highlight a digit, we now require the previous character to either be a separator, or to also be highlighted with HighlightType::Number.

When we decide to highlight the current character a certain way, we increment i to “consume” that character, set previous_separator to false to indicate we are in the middle of highlighting something, and then continue the loop. We will use this pattern for each thing that we highlight.

If we end up at the bottom of the while loop, we set previous_separator according to whether the current character is a separator, and we increment i to consume the character.

Now let’s support highlighting numbers that contain decimal points:

A . character that comes after a character that we just highlighted as a number will now be considered part of the number.
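Putting the pieces of this section together, the loop might look roughly like the sketch below. The separator set in is_separator and the free function highlight_numbers are assumptions for illustration; in the editor this logic lives inside update_syntax and writes into Row.highlight.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    Number,
}

// The exact separator set is an assumption; adjust it per language.
fn is_separator(c: char) -> bool {
    c.is_whitespace() || ",.()+-/*=~%<>\"';&[]".contains(c)
}

fn highlight_numbers(render: &str) -> Vec<HighlightType> {
    let chars: Vec<char> = render.chars().collect();
    let mut highlight = vec![HighlightType::Normal; chars.len()];
    // The beginning of the line counts as a separator.
    let mut previous_separator = true;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let previous_highlight = if i > 0 {
            highlight[i - 1]
        } else {
            HighlightType::Normal
        };
        let continues_number = previous_highlight == HighlightType::Number;
        if (c.is_ascii_digit() && (previous_separator || continues_number))
            || (c == '.' && continues_number)
        {
            highlight[i] = HighlightType::Number;
            i += 1; // consume the character
            previous_separator = false;
            continue;
        }
        previous_separator = is_separator(c);
        i += 1;
    }
    highlight
}
```

With the input int32_t 4.5, only the trailing 4.5 is marked as a number; the 32 inside the identifier is left alone.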

Detect File Type 🧑‍🔬

Before we go on to highlight other things, we’re going to add file type detection to our editor. This will allow us to have different rules for how to highlight different types of files. For example, text files wouldn’t have any highlighting, and Rust files should highlight numbers, strings, comments, and many different keywords specific to Rust.

To begin, we’ll add an extensions() to SyntaxHighlight so that all types that implement SyntaxHighlight would have to specify the corresponding file extensions:

After creating the extensions function, we implement it in our macro. First, we add a new parameter, extensions, which also takes an expression (expr), and name it ext. We then create a struct with a field named extensions of type &'static [&'static str] (we have to explicitly specify the 'static lifetime since we’re using a reference in a struct field). We also create a new method that returns an instance of the struct with the specified extensions.

Now let’s create a select_syntax method which would return a SyntaxHighlight object or None if there’s no corresponding SyntaxHighlight for that file extension:

select_syntax now returns the right SyntaxHighlight object. To add a new SyntaxHighlight object, just insert it into the array. Since EditorRows is the struct with the filename, we pass a mutable syntax_highlight to EditorRows::new() so that it can be set to the right SyntaxHighlight object.
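A minimal sketch of the lookup, assuming select_syntax receives the file name as a &str and that the struct is written by hand rather than generated by the macro (both are simplifications for this example):

```rust
use std::path::Path;

trait SyntaxHighlight {
    fn extensions(&self) -> &[&str];
}

// Hand-written here for brevity; in the series the macro generates this.
struct RustHighlight {
    extensions: &'static [&'static str],
}

impl RustHighlight {
    fn new() -> Self {
        Self { extensions: &["rs"] }
    }
}

impl SyntaxHighlight for RustHighlight {
    fn extensions(&self) -> &[&str] {
        self.extensions
    }
}

/// Returns the first highlighter that claims the file's extension, if any.
fn select_syntax(filename: &str) -> Option<Box<dyn SyntaxHighlight>> {
    // Files without an extension (or with a non-UTF-8 one) get no highlighter.
    let extension = Path::new(filename).extension()?.to_str()?;
    // To support another language, add its highlighter to this list.
    let candidates: Vec<Box<dyn SyntaxHighlight>> = vec![Box::new(RustHighlight::new())];
    candidates
        .into_iter()
        .find(|syntax| syntax.extensions().contains(&extension))
}
```

Here select_syntax("main.rs") yields the Rust highlighter, while select_syntax("notes.txt") yields None.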

Let’s not forget to update the syntax_highlight when the user uses SaveAs:

After giving the file a new name, we have to go through all the rows and then highlight them appropriately.

Next, we’ll show the current file type. Some programming languages such as C can have files with different extensions (e.g. .h and .c), so we can’t use the extension as the file type. We want the file type to just show which programming language the file belongs to. To do that, let’s add a new method to SyntaxHighlight which returns the corresponding file type:

Now let’s show the file type, if SyntaxHighlight has been implemented for it:

If there’s no corresponding SyntaxHighlight for that file type, we show “no ft” (no file type).

Colorful Strings 📝

Now let’s start highlighting proper. We’ll begin by highlighting strings:

We’re coloring strings green. In some programming languages (such as Rust) a single quote delimits a character literal, while in others, like Python, it delimits a string. For now we’ll color the “character literal” dark green. (Later, you can use a boolean or similar to indicate whether or not that distinction should be made.) We will use an in_string variable to keep track of whether we are currently inside a string. If we are, then we’ll keep highlighting the current character as a string until we hit the closing quote:

Highlight Strings

We actually store either a double-quote (") or a single-quote (') character as the value of in_string, so that we know which one closes the string.

So, going through the code from top to bottom: If in_string is set, then we know the current character can be highlighted with HighlightType::String. Then we check if the current character is the closing quote (val == c), and if so, we reset in_string to None. Then, since we highlighted the current character, we have to consume it by incrementing i and continuing out of the current loop iteration. We also set previous_separator to true so that if we’re done highlighting the string, the closing quote is considered a separator.

If we’re not currently in a string, then we have to check if we’re at the beginning of one by checking for a double- or single-quote. If we are, we store the quote in in_string, highlight it with HighlightType::String, and consume it.

In the vast majority of languages, the sequence \' or \" inside a string is an escaped quote and doesn’t close the string. For instance, in the line:

else if c == '"' || c == '\'' {

the \' inside the second character literal must not be treated as its closing quote.

String Escapes

If we’re in a string and the current character is a backslash (\), and there’s at least one more character in that line that comes after the backslash, then we highlight the character that comes after the backslash with HighlightType::String and consume it. We increase i by 2 to consume both characters at once.
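The string and escape handling described in this section can be sketched as a standalone function. This is an illustration under assumptions: highlight_strings is a hypothetical name, the previous_separator bookkeeping is left out for brevity, and both quote styles map to the same HighlightType::String.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    String,
}

fn highlight_strings(render: &str) -> Vec<HighlightType> {
    let chars: Vec<char> = render.chars().collect();
    let mut highlight = vec![HighlightType::Normal; chars.len()];
    // Holds the quote character that opened the current string, if any.
    let mut in_string: Option<char> = None;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if let Some(quote) = in_string {
            highlight[i] = HighlightType::String;
            // A backslash escapes the next character, so consume both.
            if c == '\\' && i + 1 < chars.len() {
                highlight[i + 1] = HighlightType::String;
                i += 2;
                continue;
            }
            if c == quote {
                in_string = None; // closing quote reached
            }
            i += 1;
            continue;
        } else if c == '"' || c == '\'' {
            in_string = Some(c);
            highlight[i] = HighlightType::String;
            i += 1;
            continue;
        }
        i += 1;
    }
    highlight
}
```

Because the escaped character is consumed together with its backslash, an escaped quote never reaches the c == quote check and so cannot close the string early.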

Colorful Single Line Comments 📎

Next let’s highlight single-line comments. We’ll give comments a dark grey color:

We’ll let each language specify its own single-line comment pattern, as they differ a lot between languages. Let’s create a comment_start field and method:

Now to the highlighting code:

Single Line Comments

An empty string may have been passed as comment_start; in that case we won’t highlight any comments. We wrap our comment highlighting code in an if statement that makes sure we’re not in a string, since we’re placing this code above the string highlighting code (order matters in this function). We use the .as_bytes() method on comment_start since we also read render as the bytes of that row.

We then check if this character is the start of a comment. If so, we give the rest of the line a HighlightType::Comment setting and then break out of the loop. The range i..i + comment_start.len() spans exactly as many chars as comment_start. But since i + comment_start.len() can go past the end of the row, we use cmp::min to clamp it.
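The bounds handling is easier to see in a reduced example. The sketch below is an assumption-laden simplification: highlight_line_comment is a hypothetical free function, and it ignores strings entirely, whereas the editor only runs this check when not inside a string.

```rust
use std::cmp;

#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    Comment,
}

fn highlight_line_comment(render: &str, comment_start: &str) -> Vec<HighlightType> {
    let bytes = render.as_bytes();
    let marker = comment_start.as_bytes();
    let mut highlight = vec![HighlightType::Normal; bytes.len()];
    // An empty marker means the language has no single-line comments.
    if marker.is_empty() {
        return highlight;
    }
    let mut i = 0;
    while i < bytes.len() {
        // Clamp the end of the window so the slice never goes out of bounds.
        let end = cmp::min(i + marker.len(), bytes.len());
        if &bytes[i..end] == marker {
            // The rest of the line is a comment.
            for h in &mut highlight[i..] {
                *h = HighlightType::Comment;
            }
            break;
        }
        i += 1;
    }
    highlight
}
```

Near the end of the row the clamped window is shorter than the marker, so the comparison simply fails instead of panicking.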

Colorful Keywords 🔑

Now let’s turn to highlighting keywords. We’re going to allow languages to specify an arbitrary number of keywords and their corresponding colors. For now we’re going to have only 3 types of keywords, but you can always expand it. We’ll highlight actual keywords in one color, common type names in another color, and keywords that go with macro_rules in a third color:

What we simply do in the macro is to add a new field named keywords which could contain an arbitrary number of items such that each item should first have a color specified, followed by ; and then the keywords we want to color with the color specified. We also added & as a separator so strings like &str and &mut self and signatures would be rendered properly. (Also added [])

Note how, by using a macro, we’re able to write our own syntax to build our struct.

Let’s modify our HighlightType enum so we can pass a color directly:

Now let’s highlight them:

Highlight Keywords

Keywords require a separator both before and after the keyword. Otherwise, the match in rematch, matching, or matched would be highlighted as a keyword, which is definitely a problem we want to avoid. So we check previous_separator to make sure a separator came before the keyword, before looping through each possible keyword. Recall that keywords can contain any number of pairs, each made up of a color and an arbitrary number of keywords. So we expand the macro repetition ($( )*) to get each (color, keywords) pair. We then expand the repetition again so we can operate on each keyword. Note that the color is still available in this scope.

Similar to how we implemented comment highlighting, for each keyword we determine whether slicing render for it would go out of bounds. We do that check before comparing whether that character is the start of a keyword. We also check whether the character after the keyword is a separator. If the keyword is the last word on the line, we consider it a keyword. Next, we highlight the whole keyword. After highlighting, we increase i by the keyword’s length, set previous_separator to false, and then continue the loop.
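The same checks can be sketched without the macro, using plain slices for the keyword groups. Everything named here (the Color variants, the Other(Color) variant, the KEYWORDS and TYPES lists, the separator set, and highlight_keywords itself) is an assumption chosen for illustration; the series generates the equivalent loops through macro repetition.

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
enum Color {
    Yellow,
    Magenta,
}

#[derive(Copy, Clone, Debug, PartialEq)]
enum HighlightType {
    Normal,
    Other(Color), // carries the color directly
}

// The separator set is an assumption for this sketch.
fn is_separator(c: u8) -> bool {
    c.is_ascii_whitespace() || b",.()+-/*=~%<>\"';&[]".contains(&c)
}

const KEYWORDS: &[&str] = &["fn", "let", "match", "mut"];
const TYPES: &[&str] = &["i32", "str", "usize"];

fn highlight_keywords(render: &str) -> Vec<HighlightType> {
    let groups = [(Color::Yellow, KEYWORDS), (Color::Magenta, TYPES)];
    let bytes = render.as_bytes();
    let mut highlight = vec![HighlightType::Normal; bytes.len()];
    let mut previous_separator = true;
    let mut i = 0;
    'outer: while i < bytes.len() {
        if previous_separator {
            for &(color, words) in &groups {
                for word in words {
                    let end = i + word.len();
                    // Bounds check first, then compare the candidate bytes.
                    if end > bytes.len() || &bytes[i..end] != word.as_bytes() {
                        continue;
                    }
                    // The keyword must end the line or be followed by a separator.
                    if end == bytes.len() || is_separator(bytes[end]) {
                        for h in &mut highlight[i..end] {
                            *h = HighlightType::Other(color);
                        }
                        i = end;
                        previous_separator = false;
                        continue 'outer;
                    }
                }
            }
        }
        previous_separator = is_separator(bytes[i]);
        i += 1;
    }
    highlight
}
```

In the line let x: i32 = matching; this marks let and i32 but leaves matching untouched, since the match inside it is not followed by a separator.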

Colorful Multi-line Comments 🖇

We have one last feature to implement: multi-line comment highlighting. We’ll color multi-line comments with the same color as single line comments. We’ll let each language specify a multi line comment start and end. In Rust these are /* and */ respectively.

We use a tuple to hold the information about the start and end of multi line comments. The first value refers to the start and the last value refers to the end.

Now for the highlighting code. We won’t worry about multiple lines just yet.

First we add an in_comment boolean variable to keep track of whether we’re currently inside a multi-line comment. Moving down into the while loop, we check to make sure we’re not in a string, because having /* inside a string doesn’t start a comment in virtually all languages. If we’re currently in a multi-line comment, then we can safely highlight the current character. We then check whether we’re at the end of the comment. If we are, then we highlight the remaining string which indicates the comment’s end. If we’re not at the end of the comment, we simply consume the current character which we already highlighted.

If we’re not currently in a multi-line comment, then we check if we’re at the beginning of a multi-line comment. If so, we highlight the whole multi line comment start string, set in_comment to true, and then increase i appropriately.

Now let’s fix a bit of a complication that multi-line comments add: single-line comments should not be recognized inside multi-line comments.

Multi line Comments

Now let’s work on highlighting multi-line comments that actually span over multiple lines. To do this, we need to know if the previous line is part of an un-closed multi-line comment. Let’s add an is_comment boolean variable to the Row struct:

Now, the final step:

Final Step

We now assign in_comment before current_row, since in_comment borrows editor_rows immutably but current_row borrows it mutably, and we can’t borrow a value as immutable while it is borrowed as mutable. We initialize in_comment to true if the previous row has an un-closed multi-line comment. If that’s the case, then the current row will start out being highlighted as a multi-line comment.

At the bottom of update_syntax(), we set the value of the current row’s is_comment to whatever state in_comment got left in after processing the entire row. That tells us whether the row ended as an un-closed multi-line comment or not. Then we have to consider updating the syntax of the next lines in the file. So far, we have only been updating the syntax of a line when the user changes that specific line. But with multi-line comments, a user could comment out an entire file just by changing one line. So it seems like we need to update the syntax of all the lines following the current line. However, we know the highlighting of the next line will not change if the value of this line’s is_comment did not change. So we check if it changed, and only call update_syntax() on the next line if is_comment changed (and if there is a next line in the file). Because update_syntax keeps calling itself with the next line, the change will continue to propagate to more and more lines until one of them is unchanged, at which point we know that all the lines after that one must be unchanged as well.
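The propagation rule can be sketched on its own, leaving out the actual coloring. This is a simplified illustration: the reduced Row, the ends_in_comment helper, and the hard-coded /* and */ markers are assumptions, and strings and single-line comments are ignored here.

```rust
// Reduced stand-in for the editor's Row (assumed fields).
struct Row {
    render: String,
    is_comment: bool, // does this row end inside an un-closed comment?
}

/// Scans one row and reports whether it ends inside an un-closed comment.
fn ends_in_comment(render: &str, mut in_comment: bool) -> bool {
    let bytes = render.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if in_comment && bytes[i..].starts_with(b"*/") {
            in_comment = false;
            i += 2;
        } else if !in_comment && bytes[i..].starts_with(b"/*") {
            in_comment = true;
            i += 2;
        } else {
            i += 1;
        }
    }
    in_comment
}

fn update_syntax(at: usize, rows: &mut [Row]) {
    // Read the previous row's state before taking the mutable borrow.
    let in_comment = at > 0 && rows[at - 1].is_comment;
    let row = &mut rows[at];
    let ends_open = ends_in_comment(&row.render, in_comment);
    let changed = row.is_comment != ends_open;
    row.is_comment = ends_open;
    // Only the next row can be affected, and only if our end state changed.
    if changed && at + 1 < rows.len() {
        update_syntax(at + 1, rows);
    }
}
```

Opening a comment on the first row therefore re-highlights rows one by one until the row that closes it, and no further.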

That’s It! 🎊🎉

Our text editor is finished. You can find the repository here. You can create an issue if you have any questions, or if you detected a typo or so. You can also email me. If you’d also like to support me, you can donate using this link and suggest more cool tutorials you’d like to have.

I’ll also release more tutorials like this one, so if you haven’t already followed, you probably should!

We can link up on Upwork to work together, or if you’d like more assistance in your journey to become a proficient software engineer.

Ideas For Features To Implement On Your Own

You can add more features that you’d like. Here are some suggestions.

If you want to extend the program on your own, I suggest trying to actually use it as your text editor for a while. You will very quickly become painfully aware of all sorts of features you’re used to having in a text editor, but are missing in pound. Those are the features you should try to add.

Here are some ideas you could try, roughly in order of increasing difficulty:

  • More File Types: Add more syntax rules for various programming languages. You can do this by using the syntax_struct! macro; don’t forget to include the new struct in Output::select_syntax().
  • Line Numbers: Display the line number to the left of each line of the file.
  • Soft indent: If you like using spaces instead of tabs, make the Tab key insert spaces instead of \t.
  • Auto indent: When starting a new line, indent it to the same level as the previous line.
  • Soft-wrap lines: When a line is longer than the screen width, use multiple lines on the screen to display it instead of horizontal scrolling.
  • Copy and paste: Give the user a way to select text, and then copy the selected text when they press Ctrl-C, and let them paste the copied text when they press Ctrl-V.
  • Mouse Control: Allow the user to scroll through the file using the mouse. You can also use this to prevent the user from scrolling out of the file when scrolling down. You can also allow the user to insert characters where the last left mouse button click occurred. You can find a head start here.
  • More Coloring: Other expressions such as function names, various “types” such as String, Option and so on, along with macro_rules and various macros could use some highlighting. Currently, “lifetimes” aren’t highlighted well so you might also want to fix that.
  • Config file: Have the editor read a config file (maybe named .{opened_file}.pc) to set options that are currently constants, like TAB_STOP and QUIT_TIMES. Try to make more things configurable.
  • Multiple buffers: Allow having multiple files open at once, and have some way of switching between them.
  • UTF-8: This might not seem like a big issue, but as the Rust book says, strings aren’t so simple. Currently, invalid UTF-8 characters would crash the editor. Also, characters which are more than 1 byte (such as emojis) would cause the editor to crash when scrolling through those characters. You could use iterators such as chars().skip().take() instead of direct indexing (which panics). This also includes modifying the editing features of the program to insert and delete characters without panicking, as well as fixing the cursor position in the presence of these multi-byte characters.


Kofi Otuo

Sr. Software Engineer in Systems, Mobile and Blockchain programming. I write coding tutorials. Reach Me: https://www.upwork.com/freelancers/~0196d30a485de56f48