A Database That Links All Fields Together In A Web. It Begins With Saving Every Field As A Separate Record
By David Grace (www.DavidGraceAuthor.com)
A File Finder — Remember Magellan?
Back in the days before Windows (1993 to be exact) it could be difficult to find the file you wanted on your hard disk.
I decided to build a program that would do that. I called it CrossRef. The key idea was to treat every file characteristic as a parent with the related file name as its child. Each child was also linked back to each of its parents so clicking on a parent showed its children and clicking on any child showed its parents.
Every file date was considered the parent of every file created on that date. Every proper name in a file was the parent of every file that contained that proper name. Headings were parents. And so forth.
The program parsed the contents of each text file to create the list of proper names and headings that would be treated as parents of that file.
Each of these parents — dates, names, headings— were put in an alphabetical list and when any of them was highlighted, a second, alphabetized list of the file names that were associated with those names or dates appeared.
If you highlighted the name of any file you would see the list of all the proper names, the date, etc. associated with that file.
Because you could tag names and dates and apply an AND filter, you could quickly narrow the results of any search.
I built a custom database as the engine to run the program. After I finished it I realized that the database structure I had designed was good for a lot more than just relating file characteristics to file names. It was good for relating any piece of data to any other piece of data.
Every piece of data was a parent to many other pieces of data as its children and each child had its own children. Since everything was linked you could cycle from parents through children, grandchildren, great grandchildren . . . on out to 64 levels deep.
Traditional Database Structures
The long-established basic unit of databases is the “record” which contains related pieces of information, with each unit of information being called a “field.”
If the database is your address book then each record contains the information related to one person’s name and the fields are that person’s street address, city, state, zip code, phone number, email address, etc.
In effect, your friend’s name, “John Smith” is treated like the parent and John’s phone number, zip code, city, state, etc. are John Smith’s “children.”
In a standard database the NAME field (“John Smith”) would be a “key” field which means that it would be maintained in a sorted list so that it could be quickly found. It would be the main access point to the database.
The user would search for “John Smith” which would be found quickly because it was maintained in a sorted structure.
When the “John Smith” record was found, then the other bits of data related to John Smith, the fields in the John Smith record, his street address, his zip code, his phone number, would also be found and displayed.
But suppose you wanted to find all the people in your address book who lived in a specific zip code or on a specific street? If those fields weren’t also designated as key fields whose contents were maintained in a sorted list you were out of luck. And, even if they were so maintained, you would just get a list of rows of data, each beginning with “94087.”
It would be up to you to manually scan through each row to extract or find the information you wanted.
Since I was interested in building a system in which every piece of data could be linked to every other related piece of data, the standard database structure of records and fields just wasn’t going to work well for me.
The Basic Structure Of The New Database
Instead, I decided that every piece of data that would otherwise be a field in a standard database record would now be individually stored in its own record. Each such record would contain two fields. The first field would be the text data itself (“John Smith” or “94087” or “Sunnyvale”) and the second field would be a long integer — a number between about minus 2.147 billion to plus 2.147 Billion.
“John Smith” went into its own record and as John Smith’s data was added to the database that data also went into their own individual records that were automatically created at the time of entry.
So, when you entered John’s zip code a separate record was automatically created for that zip code and John Smith was automatically linked to the zip code record and the zip code record was automatically linked to John Smith. Smith was a parent to 94087 and Sunnyvale; 94087 was a parent to John Smith and Sally Jones; and Sunnyvale was a parent of John Smith, Sally Jones and Bill Johnson.
The same thing happened when you entered John Smith’s street address, city, state, email, etc. Each became a child of John Smith and each also became a parent of John Smith.
The second field in the “John Smith” record contained the number of a record in another file, the child-list file, that held the first entry in a list of all the locations of all of John Smith’s children.
That second field in the zip-code entry’s main database record contained the number of a record in the child-list file that held the first number in a list of the numbers of the records that were children of that zip code. One of the numbers in that list of the zip code’s children was the main file record number that held the entry “John Smith.”
When the program opened the John Smith record it would find a number that told it where to look in the child-list file for the links to John Smith’s children.
Those links were a list of numbers that pointed back to the records in the main file that contained each of John Smith’s children.
So, in that child-list file the program would find a number pointing to the record that contained John Smith’s zip code, then a number that pointed to the record that contained John Smith’s street address, then the number of the record that contained John Smith’s phone number, etc.
Thus, like a standard database, searching for “John Smith” would quickly bring up all the bits of data that were the children of John Smith.
But, unlike a standard database, if you searched for a zip code, e.g. 94087, you would quickly get a list of all the people in your address book who lived in that zip code, including John Smith. If you clicked on John’s city, Sunnyvale, you would get a list everyone in your address book who lived in Sunnyvale. And so on.
Records in standard databases have a fixed number of fields with fixed lengths and most are not sorted fields, key fields, which means they are not able to be quickly located.
In my structure every field, every individual piece of data, was saved as its own entry in the sorted main database file so it could be found very quickly. Since this system assigned children one-by-one as they were entered there were no practical limits on the number of fields/children that could be associated with any entry. John Smith could have three children or three hundred. There were no pre-set limits.
The User Interface
I built the program to display the data in three columns.
If you entered “John Smith” in the search box for column one, then “John Smith” would appear at the top of a list in the first column, followed by “John Somers”, “John Trainer” etc.
When the cursor was on “John Smith” in column One, then column Two would show:
- 436 Lincoln Ave.
- Mary Smith
If you moved the cursor to “Sunnyvale” in column Two then column Three would fill with the names of everyone else in your address book who lived in Sunnyvale. Suppose one of those people was “Sally Jones.”
If you then clicked on “Sally Jones” in column Three all the columns would immediately shift to the left so that all of John Smith’s children that had been in column Two would now be in column One. Sally Jones would be highlighted in column Two, and all of Sally Jones’ data would now appear in column Three.
If you clicked on “California” in column Three, the data would again shift to the left so that Sally Jones would now be in column One. California would be in column Two and column Three would show an alphabetic list of every person in your address book who lived in California.
You could continue to migrate to the right through the database through 64 virtual columns.
Filtering — A Real Estate Database Example
This structure made it relatively easy to perform AND and OR searches (filtering).
Suppose that this was a database of real estate for sale. The entry “421 Main Street, Sunnyvale, CA” in column One might have the following children in column Two:
- 2 baths
- 3 bedrooms
- Air conditioning
- Sunnyvale Unified School District
- swimming pool
If you highlighted “swimming pool” in column Two you would get a list in column Three of all of the other houses currently for sale that also had swimming pools.
You could AND tag “swimming pool” and “tennis court” in column One and column two would show you a list of all of the houses for sale that had both a swimming pool and a tennis court.
If you OR tagged swimming pool and tennis court in column One, column two would show you a list of all of the houses for sale that had EITHER a swimming pool OR a tennis court.
You could concatenate filters to look for houses in column One that were either in Palo Alto or Mountain View that had either a swimming pool or a tennis court and the matching houses would appear in column Two.
You could then highlight any house in column Two and its fields, its children, would appear in column three.
Adding Text, Pictures, Sound & Video
I attached a third field to each main file record. That third field pointed to a record in a text file that could contain up to 64,000 text characters about that entry .
So, if you highlighted “421 Main Street” in column Two, a window would open that contained a text description of the property. Of course you would want to include photos and links to videos on that text screen.
I added the ability to parse these text entries and look for words or phrases that were in the database.
If the text description of 421 Main Street contained the phrase “Sunnyvale Unified School District” which was also an entry in the database, then that phrase was automatically turned into a hot link that when clicked would take you to the text page for the database entry “Sunnyvale Unified School District.”
If you jumped to the school district’s text page you might find that it contained a word or phrase that matched another entry in the database. You could then click on that link and jump to that new page.
You could, of course, jump backward through the string of hot links back to the text screen you started at.
Importing Data From Other Sources
I built utilities that allowed Ruby to import and convert comma delimited and dBase databases (what’s dBase you ask?) into the Ruby parent-child database structure so that information in existing databases could be accessed in this way.
Use As An Educational Tool
I thought that this could also be used as an educational tool. I created a utility that could parse a marked-up text file and import those marked-up words and phrases into Ruby. The idea was that an existing text book file could be “marked up” and then the Ruby utility would parse it and create a Ruby database from that file.
Use As A Text Book
Suppose you converted an astronomy textbook into this kind of a database. You could either mark it up and import only specific words and phrases or you could import every word that was not on an exclusion list (“and”, “the”, “that”, “was”, etc.).
Each of these words or phrases would have its own record in the main database file and every entry would have an unlimited number of children that were related to it.
For example, the word “stars” might be linked to each of the several hundred stars named in the book. Selecting “earth” would give you a list of everything in the book linked to earth — luna, moon, sol, solar system, sun, third planet, etc.
This opened up lots of new ways to access and look at data. You could jump from solar system to planets to Mars to Phobos, etc.
If you imported the astronomy book into this database you might go to “Planets” in column One and then to “Mars” in column Two and then you could open the text window linked to Mars and you could read about Mars and see photos, watch videos, etc.
That same Mars text screen might have a hot link to Phobos. By clicking on it you would go to the Phobos text page. That page might contain the hot-linked word “moons.” By clicking on it you would go to the book’s text page on moons.
Essentially, you would be able to navigate through the entire body of knowledge contained in the book and then jump back at any time to the three-column data lists.
Discovery Of Indirectly Linked Data
Another feature of this structure was its ability to show indirectly related data.
If you think of “Solar System” as the parent and Asteroids, Moons, Planets, and Sun as Solar System’s children then Planet’s children — Earth, Jupiter, Mars, Neptune, Venus, etc. — are Solar System’s grand children.
Put differently, collectively, all of the children of Asteroids, Moons, Planets, etc. are grandchildren of Solar System.
This kind of a data structure gives you the ability to find indirect links between data items, that is indirect links between parents and grandchildren.
Find The Children That Are Common To Parent Earth and Parent Mars
Let’s start with a simple filter of two data items, Earth and Mars. Think of them each as parents.
Create a list of the children that are common to both Earth and Mars — an AND filter of Earth and Mars. This is a relatively small list of only the children that Earth and Mars have in common, maybe Moon and Planet.
Look at each child on this list, Moon & Planet, and make new, separate lists of each of the children of Moon and each of the children of Planet. These are lists of the grandchildren of the children who are common to both parents, Earth and Mars.
Find The Grandchildren That Are Common To Parent Earth And Parent Mars
Make A Grandchildren AND Filter List
Compare all of Moon’s children to all of Planet’s children and make a new list that shows only the items that are both children of Moon and also children of Planet. In other words, show a limited list of Earth’s and Mars’ grandchildren.
This is a grandchildren AND filter. This list will have relatively few entries.
Find The Grandchildren Who Are On Any Of The Lists, An OR List
Make A Grandchildren OR Filter List
Start with the same list of the children that are common to both Earth and Mars, namely, Moon and Planet.
Again, make separate lists of the children of Moon and children of Planet. These are the same grandchildren lists you created in the first step.
Now make a new list that contains the grandchildren that are on either of these two grandchildren lists. Take every child of Moon and every child of Planet and put them all together into one big list.
This is an OR list and it will be a relatively large list.
List All Of The Children Of Either Parent Earth Or Parent Mars
Make A Grandchildren AND Filter List
Create one list that contains both the children of Parent Earth and also the children of Parent Mars.
This is a parent Earth OR a parent Mars filter. This is a relatively large list.
Make separate lists of the children of each of the entries on the above list. These lists show all the grandchildren of Earth and also all the grandchildren of Mars.
Compare all those grandchildren lists to each other and show all the grandchildren that are common to all of those grandchildren lists.
This is a grandchildren AND filter. This list will have a relatively small number of entries.
Make A Grandchildren OR Filter List
Start again with the list of all entries that are children of either parent Earth OR parent Mars. This will be a relatively large list.
Now make a new list that contains all of the grandchildren of Earth and add to that list all of the grandchildren of Mars all on one big list.
This will be a relatively large list.
If the database you’re using is the astronomy textbook and the two parents that are being looked at are Earth and Mars then this type of grandchildren filtering will give you greater or lesser lists of entries that bear some indirect relationship to Earth and Mars. These relationships may be non-obvious.
Of course, you’re not limited to only two items. You could tag each of the nine planets and perform each of these four types of filters — Parent AND, Parent OR, Grandchildren AND, Grandchildren OR on all nine parents.
You could also expand the principle and add great-grandchildren filtering as well. A Parent AND filter followed by a Grandchildren AND filter would likely yield a zero result.
A Parent OR filter followed by a Grandchild OR filter would likely yield such a huge list as to be completely worthless.
But, A Parent AND filter coupled with a Grandchild OR filter and a Parent OR filter coupled with a Grandchild AND filter might each yield interesting results.
What Happened To This Product?
I built both products, CrossRef for file discovery and Ruby for general information management. Both worked quite well as test beds and proofs of principle.
Since this was the early to mid-1990s, I wrote both programs in Visual Basic for DOS. Yes, DOS!
As you can imagine, that was a lot of code. Then Microsoft killed VB for DOS and replaced it with VB for Windows.
Because CrossRef was dependent on the file system, changes from the DOS file system would require changes in the CrossRef program. Also, since CrossRef created cross references by parsing file contents the rise of the MS Word file format and changes in the Word Perfect file format also would require major changes to the CrossRef program.
While CrossRef was structurally powerful, as a practical matter it was hopelessly obsolete.
Ruby still ran fine under all versions of Windows through XP, but it looked like crap compared to a real Windows program, as you can see from the graphic at the beginning of this article.
I never had the emotional energy to migrate it all to VB for Windows.
Ruby still runs on Win 7 using DOSBOX. As you can see from the screen shot at the beginning of this story, it still works, though the DOS screens sure look clunky.
Though visually obsolete, I still think it provides a superior way to view, skim, scan and access many kinds of data.
I’ve often thought that this structure might be useful to people who needed to search for indirect connections between data items, such a police department investigating a series of related crimes.
It occurred to me that if dozens or even hundreds of data items were added to the database for each of the crimes and each of the victims that the indirect filtering tools might turn up relationships or investigative leads that otherwise would be inaccessible.
I also thought that if all the known symptoms of various conditions — equipment failures, sickness, whatever — were treated as parents and the known causes for each such symptom were added as their children, that AND and OR filtering of the symptoms applicable to any particular illness or equipment failure could be used to create a narrow list of the causes of that particular condition, thus reducing the possibilities down to some manageable level. My thought was that this might be used as a crude expert system.
A Little More Technical Data
For those of you interested in a bit more technical data, I created a hashed structure to control saving and finding the string data in the main DB file. Field two of the main file contained a long integer pointing to a double linked list which list contained pointers to the children of each parent in sorted order. Field three in each record in the main DB file contained a pointer to the file containing the linked text records associated with each item in the main DB file.
Because this was DOS I could not use large in-memory arrays and disk space was also at somewhat of a premium. I limited the hash numbers to a short integer, 32,000, so the hash numbers for strings began to collide as the DB grew bigger. This required that a collision table be built into the hashing system and that added some latency as the database grew.
If I had built this in today’s Visual Basic I would have used an off-the-shelf DB engine to keep track of the string fields in the main DB file without the overhead of the hashing mechanism or collision files, and I also would have used a commercial database to store and access the text entries linked to each string in the main DB file.
Also, today I would not maintain the list of each data item’s children in sorted order. Instead, I would save the links unsorted in the order added and sort them in memory immediately before they were displayed.
Well, this is all now just food for thought. If anyone wants to build this kind of a data system to run under Win 10, let me know.
– David Grace (www.DavidGraceAuthor.com)