🌬️Oracle SQL — Split Table Column into words as Records

Ido Montekyo
idomongo
Published in
3 min readSep 24, 2023
Photo by Glen Carrie on Unsplash

So, I want to create a word cloud to identify common and major words of my free text, which is split into a column spread over several records.

select *
from alice_in_wonderland

1 So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.
2 There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, “Oh dear! Oh dear! I shall be late!” (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
3 In another moment down went Alice after it, never once considering how in the world she was to get out again.
4 The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well.
...

Can I create a new table with this text split into words and have one record per word?

Luckily, Lalit Kumar explains exactly how to do this here.

Step #1

  • So, actually, a single whitespace “ “ can be used as the delimiter.
  • We can use Oracle’s REGEX along with CONNECT BY (Hierarchical Queries) to get the desired effect.
SELECT
TRIM(regexp_substr(t.text, '[^ ]+', 1, lines.column_value)) text
FROM
alice_in_wonderland t,
TABLE ( CAST(MULTISET(
SELECT
level
FROM
dual
CONNECT BY
regexp_substr(t.text, '[^ ]+', 1, level) IS NOT NULL
) AS sys.odcinumberlist) ) lines
ORDER BY
lines.column_value


TEXT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
and
dry
leaves,
and
the
fall
was
over.

...

Step #2

Now we can add GROUP BY and COUNT statements using wrapper SQL statement.

SELECT
text,
COUNT(1) word_count
FROM
(
SELECT
TRIM(regexp_substr(t.text, '[^ ]+', 1, lines.column_value)) text
FROM
alice_in_wonderland t,
TABLE ( CAST(MULTISET(
SELECT
level
FROM
dual
CONNECT BY
regexp_substr(t.text, '[^ ]+', 1, level) IS NOT NULL
) AS sys.odcinumberlist) ) lines
ORDER BY
lines.column_value
) x
GROUP BY
text
ORDER BY
2 DESC


the 61
to 48
she 47
and 41
was 40
a 39
it 31
of 30
I 18
in 18
very 17
that 16
...

Note the space used in the REGEXP satement: regexp_substr(t.text, ‘[^😊]+’, 1, level)

We got this, Let’s go!

--

--

Ido Montekyo
idomongo

System Analysis. System Design. Architecture. Databases. Project Management. Speaker. People Motivator.