Extracting the contents of a table in Selenium
The easiest way is to locate the table's WebElement and invoke its getText method. This works quite well: it is fast and it returns only visible text. It has one potential downside: the boundaries between cells are lost, so you need to split the string into lines and words to recover them.
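The splitting step can be sketched in plain Java as follows. This is a minimal illustration, not the original code: the class name TableTextParser and the sample text are made up, and it assumes that getText puts each row on its own line, separates cells by whitespace, and that no cell contains whitespace itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TableTextParser {

    // Splits the text obtained from a table's getText() into rows and cells.
    // Assumption: one line per row, cells separated by whitespace,
    // and no whitespace inside any cell.
    public static List<List<String>> parse(String tableText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : tableText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            rows.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample; in a real test this string would come from getText().
        String text = "Name Qty\nApples 3\nPears 5";
        System.out.println(parse(text)); // prints [[Name, Qty], [Apples, 3], [Pears, 5]]
    }
}
```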
If the cells themselves have blank spaces in them, the parsing becomes more involved.
An alternative solution is to explicitly get the rows and cells using findElementsByTagName and then call getText on each individual cell, constructing a list of lists of strings in the process.
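A rough sketch of this cell-by-cell approach is shown here. It is an illustration rather than the original code: the class name TableCellWalker is invented, and it uses findElements with By.tagName and By.cssSelector, the generic equivalents of findElementsByTagName. It assumes the Selenium Java bindings are on the classpath and that the table contains no nested tables.

```java
import java.util.ArrayList;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

public class TableCellWalker {

    // Walks the table row by row and cell by cell.
    // Assumption: `table` is the WebElement of a <table> without nested tables.
    public static List<List<String>> extract(WebElement table) {
        List<List<String>> rows = new ArrayList<>();
        for (WebElement tr : table.findElements(By.tagName("tr"))) {
            List<String> cells = new ArrayList<>();
            // Collect header and data cells in document order.
            for (WebElement cell : tr.findElements(By.cssSelector("th, td"))) {
                cells.add(cell.getText()); // one WebDriver command per cell
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

Unlike the string-splitting approach, this keeps cells with embedded spaces intact, since each cell is read on its own.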
However, this is hideously slow for big tables! The reason is the repeated calls to isDisplayed and getText: each one is a separate round trip to the browser, so the cost grows with the number of cells.
So the idea is to construct the nested array browser-side, in JavaScript, and return it to Java in one go.
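One way this could look is sketched below, using Selenium's JavascriptExecutor. The class name TableScriptExtractor and the script itself are illustrative assumptions, not the original code. The script reads textContent from each cell, so it does no visibility check, and it assumes the element passed in is a real table element, so that the DOM rows and cells collections are available.

```java
import java.util.List;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class TableScriptExtractor {

    // Runs in the browser: builds an array of rows, each an array of cell texts.
    // arguments[0] is the table element passed to executeScript.
    private static final String SCRIPT =
        "var rows = arguments[0].rows, result = [];" +
        "for (var i = 0; i < rows.length; i++) {" +
        "  var cells = rows[i].cells, texts = [];" +
        "  for (var j = 0; j < cells.length; j++) {" +
        "    texts.push(cells[j].textContent.trim());" +
        "  }" +
        "  result.push(texts);" +
        "}" +
        "return result;";

    // Assumption: `driver` implements JavascriptExecutor, as the standard drivers do.
    @SuppressWarnings("unchecked")
    public static List<List<String>> extract(WebDriver driver, WebElement table) {
        // A JavaScript array of arrays of strings comes back as nested Lists.
        return (List<List<String>>) ((JavascriptExecutor) driver).executeScript(SCRIPT, table);
    }
}
```

The whole table crosses the wire in a single command, which is where the speed-up over the cell-by-cell loop comes from.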
This approach seems to work quite well. Its speed is comparable to that of the original getText solution. In its current form, it has the disadvantage that it doesn’t check the visibility of each individual cell. (Remember that getText returns only visible content.)
I have implemented the three approaches in this GitHub repo, which tests them against a sample table.