Basics of Web Word Processor (2)

Web Word Processor (“Web Word”) is attractive software that lets users edit documents anywhere they have access to a browser. It does not yet support every feature of a native word processor, but I hope it will in the near future. I have spent two years developing Web Word, and it has been hard work, though I have tried to look on the bright side. Implementing each feature brought its own thrills and excitement, and I hope you feel some of that as you read this article.

My previous article explained the criteria for categorizing Web Word products, why pages are both necessary and complex to implement, contentEditable, and the principles behind laying out and positioning pages in HTML. In this article, I will discuss how to implement simple page views and editing features using actual code.

What to Implement

As discussed in the previous article, the simple requirements for implementing page views are as follows:

  • When a paragraph straddles a page boundary, it should be split and displayed across the two pages.
  • Text should be laid out again in real time as the user enters or deletes characters.

Tables require much more implementation work and many more considerations, so this article focuses mainly on text. In CSS terms, the layout targets are elements rendered as display: inline;.

Let’s Get Started

In the previous article, I explained the steps to implement Layout. This article walks through them in the following order:

  • Dividing a Paragraph between Two Pages
  • Dividing a Paragraph into Lines
  • Events Requiring Page Layouts Again

Let’s see the actual code implementing these steps.

Dividing a Paragraph between Two Pages

Each page has its own margins, and text can be typed in any area inside the margins. Therefore, the actual area for text display is anywhere except the margins. In the code below, it is an element that has class="page-body". Simply put, I will call it pageBodyElement.

The actual size of an A4 sheet is 210mm x 297mm, but this is rather oversized to fit in the space of this article, so I will use an arbitrary size of 150mm x 80mm for our convenience.

Write HTML code to implement pages as shown below:

Setting pageBodyElement

contentEditable="true" is set on pageBodyElement, so it is currently in editable mode. style="outline: 0px;" removes the outline that is otherwise visible in edit mode. I used the p tag to represent paragraphs. For an empty paragraph, I added a bogus element (a br tag) so that the cursor can be displayed.
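As a rough illustration of the markup described above, here is a minimal sketch that builds one page as an HTML string. The function name buildPageMarkup, the class name page, the data-page attribute, and the 10mm margin are assumptions made for this example; only class="page-body", contentEditable, the outline style, the p/br structure, and the 150mm x 80mm size come from the article.

```javascript
// Hypothetical helper: returns the markup for one page as a string.
// The 150mm x 80mm size is from the article; the 10mm margin is an assumption.
function buildPageMarkup(pageNumber, marginMm = 10) {
  const widthMm = 150;
  const heightMm = 80;
  return [
    // The outer element is the sheet; its padding plays the role of the margins.
    `<div class="page" data-page="${pageNumber}" style="width: ${widthMm}mm; height: ${heightMm}mm; padding: ${marginMm}mm; box-sizing: border-box;">`,
    // The editable area inside the margins (pageBodyElement).
    `  <div class="page-body" contenteditable="true" style="outline: 0px; height: 100%;">`,
    // An empty paragraph needs a bogus <br> so the cursor has somewhere to sit.
    `    <p><br></p>`,
    `  </div>`,
    `</div>`
  ].join("\n");
}
```

In a browser, the result could be assigned to the innerHTML of a container element to get a single empty, editable page.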

Now we’ve got a very basic document editor.

We will now move the leftover part of the paragraph to the next page.

If more text is entered in this state, it will run past the bottom margin as shown below.

If we were going to create a Word Processor that didn’t require page implementation, we could have simply specified overflow-y: hidden or overflow-y: scroll. However, since our aim is higher, let's take a look at how we can deal with this.

Simply put, we are going to find every paragraph element whose bottom edge lies below the bottom of the page body on page 1, and then move them all to the next page.

The first task is to check whether such a paragraph exists by using _findExceedParagraph().

As the code clearly shows, only p tags are currently regarded as paragraphs. There are many more types of Block-Level Elements; visit MDN to see what else you can add.
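The check described above can be sketched roughly as follows. This is not the original _findExceedParagraph() listing but an illustration under the assumption that "exceeding" means a paragraph's bottom edge lies below the bottom edge of pageBodyElement, both measured with getBoundingClientRect(). The names findExceedParagraph and getBottom are chosen for this example.

```javascript
// Bottom edge of an element in viewport coordinates.
function getBottom(element) {
  return element.getBoundingClientRect().bottom;
}

// Returns the first <p> whose bottom edge lies below the page body,
// or null when everything fits. Only p tags count as paragraphs here.
function findExceedParagraph(pageBodyElement) {
  const pageBottom = getBottom(pageBodyElement);
  const paragraphs = Array.from(pageBodyElement.querySelectorAll("p"));
  return paragraphs.find((p) => getBottom(p) > pageBottom) || null;
}
```

To support other block-level elements, the selector passed to querySelectorAll() would need to be widened, for example to "p, h1, h2, ul".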

The second task is to find all the exceeding paragraphs (_getExceedAllParagraphs()) and move them across to the next page (_insertParagraphsToBodyAtFirst()).

Please note the line of code in the _getExceedAllParagraphs() function that handles the case where the height of a single paragraph is greater than that of a page.

This happens when the height of the first paragraph is greater than that of the page. Unless this case is handled properly within the layout flow, an infinite number of pages will be created. If this kind of oversized paragraph is left as it is, the text will cross the margin as shown below. We will take care of this problem in the Dividing a Paragraph into Lines section. In practice, a paragraph that contains a large picture or a tall table is likely to cause this problem, and handling it requires more advanced layout processing.
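To make the infinite-page problem concrete, here is a minimal sketch of collecting the exceeding paragraphs with a guard for an oversized first paragraph. It is an illustration, not the original _getExceedAllParagraphs() code, and it assumes that paragraphs are stacked top to bottom, so every paragraph after the first exceeding one exceeds as well.

```javascript
// Returns every paragraph that must move to the next page.
function getExceedAllParagraphs(pageBodyElement) {
  const pageBottom = pageBodyElement.getBoundingClientRect().bottom;
  const paragraphs = Array.from(pageBodyElement.querySelectorAll("p"));
  const firstExceeding = paragraphs.findIndex(
    (p) => p.getBoundingClientRect().bottom > pageBottom
  );
  if (firstExceeding === -1) {
    return []; // everything fits
  }
  // Guard: if the very first paragraph is taller than the page, keep it here.
  // Moving it would only recreate the same situation on a fresh page, forever.
  const start = firstExceeding === 0 ? 1 : firstExceeding;
  return paragraphs.slice(start);
}
```

With this guard, an oversized first paragraph stays on its page and still crosses the margin, which is exactly the leftover problem that line-level splitting has to solve.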

_insertParagraphsToBodyAtFirst() moves all the exceeding paragraphs across to the next page. If the next page is blank, we can simply append the paragraph elements to its pageBodyElement. If the page is not blank, we insert them at the top of the page. Any paragraphs that were previously split must be combined back into one at this stage. Otherwise, we will see two separate paragraphs where there should be one.
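The move-and-rejoin step might look something like the sketch below. It is not the original _insertParagraphsToBodyAtFirst() code. In particular, the data-split-id attribute (read through dataset.splitId) is a hypothetical marker invented for this example to recognise the two halves of a previously split paragraph.

```javascript
// Moves `paragraphs` (in document order) to the top of `nextPageBody`.
function insertParagraphsAtFirst(nextPageBody, paragraphs) {
  // First existing element on the next page; null when the page is blank.
  const anchor = nextPageBody.firstElementChild;
  for (const p of paragraphs) {
    // insertBefore(node, null) appends, which covers the blank-page case.
    nextPageBody.insertBefore(p, anchor);
  }
  // Rejoin halves: if the last moved paragraph and the paragraph that was
  // already at the top share a split id (hypothetical), merge them into one.
  const last = paragraphs[paragraphs.length - 1];
  if (anchor && last && last.dataset.splitId &&
      last.dataset.splitId === anchor.dataset.splitId) {
    while (anchor.firstChild) {
      last.appendChild(anchor.firstChild);
    }
    anchor.remove();
    last.normalize(); // merge adjacent text nodes left by the join
  }
}
```

The merged paragraph keeps its split id in this sketch; whether to clear it once both halves are back together is a design decision left open here.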

Applying layout to a page can increase the number of pages. Layout must then be applied to each newly created page as well, up to the very last page. Let’s take a look at the code for the entire page layout.

_layout() applies page layout to the first page through the very last page. _layoutPage() uses the function mentioned earlier to apply Layout to the specified page.
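The overall loop can be sketched independently of the DOM. In the illustration below, layoutAllPages and its three callback parameters (findOverflow, moveToPage, createPage) are names invented for this example and are not the original _layout() or _layoutPage() signatures. The key point is that the page list can grow while the loop is running, so its length is re-read on every iteration.

```javascript
// pages: array of page objects (for example, pageBodyElements).
// findOverflow(page): removes and returns the items that do not fit on `page`.
// moveToPage(page, items): puts `items` at the top of `page`.
// createPage(): returns a new, empty page.
function layoutAllPages(pages, findOverflow, moveToPage, createPage) {
  // pages.length is evaluated each time, so newly added pages are laid out too.
  for (let i = 0; i < pages.length; i += 1) {
    const overflow = findOverflow(pages[i]);
    if (overflow.length > 0) {
      if (i === pages.length - 1) {
        pages.push(createPage()); // last page overflowed: add a new one
      }
      moveToPage(pages[i + 1], overflow);
    }
  }
  return pages;
}
```

The loop terminates only if findOverflow() eventually returns an empty list, which is why an oversized first paragraph must never be reported as overflow.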

This shows Layout being applied to all pages. I added a small delay so that the Page Layout process can be seen in action.

Dividing a Paragraph into Lines

Now is the right time to take care of a paragraph whose height is greater than that of the page. This is shown in the image.

As you can see, the last line crossed the margin. As I mentioned earlier, you cannot obtain the coordinates of individual letters from a Text Node alone; you actually need to wrap all text nodes with span tags. There are two things to do here: first, detect the lines within the paragraph, and second, split a paragraph that extends past the page. Let’s take a look at the actual code.

_splitParagraph() has been added. If there is an over-the-margin paragraph, we need to separate the paragraph in two, starting with the exceeding line of the paragraph. All other paragraphs separated as a result of executing _getExceedAllParagraphs() are also collected and moved across to the next page.

For better understanding of wrapping text nodes with spans, I highlighted the wrapping of each letter in red. (In reality, borders should not be displayed in order to avoid paragraph distortion.)

Another thing to pay attention to is how the cursor is preserved. The example assumes a collapsed cursor, but a real implementation must restore whatever the current cursor state is. (In fact, implementing this alone in depth requires a lot of work.)

This is the step where the number of lines in a paragraph is detected.
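One way to detect lines, once every letter is wrapped in a span, is to group the spans by their vertical position. The sketch below is an assumption about how this could be done, not the original listing: it starts a new line whenever a span begins at or below the bottom of the current line, which presumes that consecutive lines do not overlap vertically. The name groupSpansIntoLines is invented for this example.

```javascript
// spans: the wrapping <span> elements of one paragraph, in document order.
// Returns an array of lines: { top, bottom, spans }.
function groupSpansIntoLines(spans) {
  const lines = [];
  for (const span of spans) {
    const rect = span.getBoundingClientRect();
    const current = lines[lines.length - 1];
    if (!current || rect.top >= current.bottom) {
      // The span starts at or below the current line's bottom: new line.
      lines.push({ top: rect.top, bottom: rect.bottom, spans: [span] });
    } else {
      // Same line: extend it, keeping the tallest span's extent.
      current.spans.push(span);
      current.top = Math.min(current.top, rect.top);
      current.bottom = Math.max(current.bottom, rect.bottom);
    }
  }
  return lines;
}
```

The number of lines in the paragraph is then simply the length of the returned array.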

This is the step where the exceeding lines in a paragraph are detected.
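Finding the exceeding lines then reduces to a comparison against the bottom of the page body. The following is a small sketch under the assumption that each line is an object with a bottom coordinate, in the same coordinate system as the page body's bottom edge. The function name findFirstExceedingLine and the line shape are hypothetical.

```javascript
// lines: array of { bottom } objects in top-to-bottom order.
// Returns the index of the first line that crosses pageBottom, or -1.
function findFirstExceedingLine(lines, pageBottom) {
  return lines.findIndex((line) => line.bottom > pageBottom);
}
```

A result of 0 means that not even the first line fits, so there is nothing to split and the whole paragraph has to be handled as a unit; a positive index marks the line where the paragraph should be split.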

This is where exceeding paragraphs are split into two. Store the ID of a split paragraph so that you can combine it back later.
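The split itself could be sketched as below. This is an illustration rather than the original _splitParagraph() code. It assumes the wrapping spans are direct children of the paragraph, and it records the pairing in a hypothetical data-split-id attribute (dataset.splitId) generated from a simple counter.

```javascript
let nextSplitId = 1; // hypothetical id source for pairing the two halves

// Moves `firstExceedingSpan` and every following sibling into a new paragraph
// and returns that new paragraph (not yet attached to the document).
function splitParagraphAtSpan(paragraph, firstExceedingSpan) {
  // Shallow clone: same tag and attributes, but no children.
  const tail = paragraph.cloneNode(false);
  // Store the same id on both halves so they can be combined back later.
  const splitId = paragraph.dataset.splitId || String(nextSplitId++);
  paragraph.dataset.splitId = splitId;
  tail.dataset.splitId = splitId;
  let node = firstExceedingSpan;
  while (node) {
    const next = node.nextSibling; // read before moving the node
    tail.appendChild(node);
    node = next;
  }
  return tail;
}
```

The returned paragraph would then be moved to the next page together with the other exceeding paragraphs.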

Remove the wrapping span tags, maintain Cursor, and normalize() the split texts.

Now, when text straddles two pages, it is wrapped line by line and the overflowing lines are displayed on the next page.



Events Requiring Page Layouts Again

In this example, layout is applied starting from the first page whenever a keyup event fires, that is, each time a character is typed. We also need to handle other events such as Copy & Paste and Delete.
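Wiring the layout to events might look like the following sketch. The function attachLayoutTriggers and its parameters are names made up for this example; runLayout stands for whatever function lays out the document from the first page. Only the keyup trigger comes from the article; passing other event types is an assumption about how the remaining cases could be covered.

```javascript
// Runs `runLayout` whenever one of `eventTypes` fires on `editorElement`.
// Returns a function that removes the listeners again.
function attachLayoutTriggers(editorElement, runLayout, eventTypes = ["keyup"]) {
  const handler = () => runLayout();
  for (const type of eventTypes) {
    editorElement.addEventListener(type, handler);
  }
  return function detach() {
    for (const type of eventTypes) {
      editorElement.removeEventListener(type, handler);
    }
  };
}
```

One option worth evaluating for paste and delete is the standard input event, which fires on contentEditable elements after their content has changed. Running a full layout on every event may be too slow for long documents, so some form of throttling is probably needed in practice.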

Page layout during typing in text


We have looked at how we could implement pages, which is essential in developing a Web Word. I do not encourage you to jump right into the development of a Web Word. Rather, be cautious and take a careful approach because there are a lot of features still to be considered as follows:

  • Adding more Block-Level Elements
  • Processing Block-Level Elements that are nested deep within the DOM Tree (in this case, the enclosing elements must be split all the way up until the paragraph's parent is pageBodyElement)
  • Processing a single paragraph that does not fit in the remaining area of a page (images, tables, etc.).
  • Splitting a table that runs past the margin of one page, continuing it on the next page, and rejoining it later (splitting cells is quite a difficult task).
  • Maintaining the cursor and text composition for Korean input
  • And taking care of those subtle 1-pixel differences!

In my experience, it is better to agree with your client in advance on which features the Web Word needs. Otherwise, your software will constantly be compared to a native word processor. In my opinion, if a specification for obtaining coordinates from a Text Node is added, or if WebAssembly gains DOM access, both the development and the performance of Web Word will improve considerably.

If you are a front-end developer who would dare to give his or her best shot despite all the challenges, give it a try, brave one! Here are my little gifts: the source code (html-page-layout) and the demo.