Shared Schemas

Thomas Reggi
Oct 13, 2015

I want a world where schemas are forkable and versionable, and I want a CMS that allows for creating content as well as managing schema definitions.

The world is filled with information and data. Object-oriented programming is a philosophy and a method of coding that has grown greatly in popularity over the past decade; however, the way that we think about the information itself is decentralized and siloed.

With Web 2.0 we were promised a comprehensive semantic understanding of data on the web. The philosophy of the semantic web has failed to gain real popularity over the past couple of years, and I have a theory why.

The semantic web isn’t here because organizations (http://schema.org/ and http://microformats.org/) have tried to enforce standards that don’t work for everyone and don’t evolve organically. The benefit of having an ecosystem of schemas is obvious: with the ability to version and fork, the people who need a solution can collaborate and collectively choose what best suits their needs.

I want to see a world where data makes sense, and forking and versioning of schemas is a breeze. For instance, my fork reggi/postalAddress/v0.0.1 is derived from another schema, basic/postalAddress/v3.21.9 (following the pattern username/schema/version). With something like this we get the ability to migrate data from one schema to another, or update the version, without a major headache.
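
To make the username/schema/version idea concrete, here is a minimal Python sketch. Everything in it beyond the two reference strings above is an assumption for illustration: the `SchemaRef` type, the `parse_ref` and `migrate` helpers, and the field names in the mapping are all hypothetical, not part of any existing tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaRef:
    """A parsed username/schema/version reference (hypothetical type)."""
    username: str
    schema: str
    version: tuple  # (major, minor, patch), semver-style


REF_PATTERN = re.compile(r"^([^/]+)/([^/]+)/v(\d+)\.(\d+)\.(\d+)$")


def parse_ref(ref: str) -> SchemaRef:
    """Split a reference such as 'reggi/postalAddress/v0.0.1' into its parts."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a username/schema/version reference: {ref!r}")
    username, schema, major, minor, patch = match.groups()
    return SchemaRef(username, schema, (int(major), int(minor), int(patch)))


def migrate(record: dict, field_map: dict) -> dict:
    """Rename fields per field_map; fields not in the map carry over unchanged."""
    return {field_map.get(key, key): value for key, value in record.items()}


fork = parse_ref("reggi/postalAddress/v0.0.1")
upstream = parse_ref("basic/postalAddress/v3.21.9")

# Assumed mapping from upstream field names to the fork's field names.
UPSTREAM_TO_FORK = {"zip": "postalCode", "state": "region"}

old_record = {"street": "1 Main St", "zip": "11201", "state": "NY"}
new_record = migrate(old_record, UPSTREAM_TO_FORK)
print(new_record)
```

A real migration would also have to handle dropped fields, type changes and defaults; a plain rename map is only the simplest case.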

What is a schema? A schema is a set of properties that are assigned to a specific object. I think of a schema like an online form: when you’re filling out your address, each field is a property of the address object. Some form inputs are a bit more complicated than others. When you put in your credit card information, for example, there’s a field for the expiration date, and the interface for it can be a calendar. Schemas underlie all objects. Think about the properties that these have: Address, Person, Movie, Birthdate, Date, Product, Image.
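
One way to picture "a schema is a set of properties" is the rough Python sketch below. The `Property` type, the `widget` field and the specific Address fields are assumptions chosen for illustration; a real system would likely use something richer, such as a JSON Schema document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """One field of a schema (hypothetical type)."""
    name: str
    kind: type
    widget: str = "text"   # which form control edits this property
    required: bool = True


# Assumed Address schema: each property corresponds to one form field.
ADDRESS = [
    Property("street", str),
    Property("city", str),
    Property("postalCode", str),
    Property("unit", str, required=False),
]

# The expiration date uses a different interface, as described above.
CREDIT_CARD = [
    Property("number", str),
    Property("expiration", str, widget="calendar"),
]


def validate(schema, record) -> list:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    known = {prop.name for prop in schema}
    for prop in schema:
        if prop.name not in record:
            if prop.required:
                problems.append(f"missing required property: {prop.name}")
        elif not isinstance(record[prop.name], prop.kind):
            problems.append(f"{prop.name} should be {prop.kind.__name__}")
    for key in record:
        if key not in known:
            problems.append(f"unknown property: {key}")
    return problems


print(validate(ADDRESS, {"street": "1 Main St", "city": "Brooklyn", "postalCode": "11201"}))
print(validate(ADDRESS, {"street": 5, "city": "Brooklyn"}))
```

Keeping the widget next to the property is one possible way to tie a form field to the data it edits.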

What I’d love to see is a version of Wikipedia built using a schema-driven technique, so that app creators can have access to open-source APIs for any content. Here’s a scenario: you want to build a small site that shows the state flower for each US state. Wikipedia already has this list (here), but there’s no semantic content on that page, and so much data is lost. With a State object and a Flower object you can get very far by creating both of these schemas and then filling in the information. I want this information free, queryable and downloadable. If this existed, I would be able to query and download the States with their Flower properties, then make an app using that data. Right now I’d have to scrape that data and create my own database, as many people do. Most information is accessible but not exposed for you to download yourself; look at, for instance, the Star Wars API or IMDb.
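
The state-flower scenario could look something like this in Python. It is a sketch under stated assumptions: the `State` and `Flower` classes, their property names and the `query` helper are hypothetical, and only three states are filled in by hand.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Flower:
    name: str


@dataclass(frozen=True)
class State:
    name: str
    abbreviation: str
    flower: Flower


# A tiny hand-entered data set; a shared repository would hold all fifty.
STATES = [
    State("California", "CA", Flower("California poppy")),
    State("New York", "NY", Flower("Rose")),
    State("Texas", "TX", Flower("Bluebonnet")),
]


def query(objects, *paths):
    """Project each object onto dotted property paths such as 'flower.name'."""
    rows = []
    for obj in objects:
        row = {}
        for path in paths:
            value = obj
            for part in path.split("."):
                value = getattr(value, part)
            row[path] = value
        rows.append(row)
    return rows


rows = query(STATES, "name", "flower.name")

# "Downloadable" here simply means serializable to a portable format.
print(json.dumps(rows, indent=2))
```

Because the data is typed by schema rather than buried in page markup, the app only asks for the properties it needs instead of scraping.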

This is the information age; let’s free our information.

Ideas:

  • Fork other users’ schemas
  • Incrementally version schemas using semver
  • Migrate schemas
  • Extending a schema
  • CMS for schema design
  • CMS for any object
  • Provides a guideline for objects and apps. Need a generic Bootstrap postal address HTML form? Export the HTML.
  • Connects UI components to objects and forms.
  • Merging duplicate content
  • Organic ecosystem for schemas
  • Form field + property are tied together
  • Form fields are objects
  • Customize the interface you wish to use to interact with a field (e.g. a calendar)
  • Way to fork existing databases and data
  • Division between content and schema architecture (instance and construct, e.g. Thomas Reggi and Person)
  • Way to reference existing objects within CMS
  • An object-oriented Wikipedia with an API
  • Wikipedia does long-form content well; where it fails is in storing and organizing properties and keeping them consistent.
  • Someone would have an Actors repo that would contain the data for Actors in a given schema. They can make the decision to migrate to a different schema; everything is version controlled, so all the data is always available in an older commit. You can fork the whole repo and add new Actors, and they can accept your pull request, or perhaps it’s more of an open-governance picture where no single person or organization has all the power. The problem is that this is a lot of information, which is hard to store, organize and visualize. This is somewhere between Wikipedia and GitHub.
  • How do you prevent “unpopular Actors” from being in the Actors repo? The idea is you don’t have to. I guess you don’t need an Actors repo outright. Person would be an object, and you can extend that Person with the Actor object. Then if you wanted to query for every actor, you can do that and assemble a data set. If you wanted to get popular actors, you can query all people that are actors with at least n movies they’ve starred in. You can even query the actors by the sum of the ratings of the movies they’ve been in. So I guess there is no “Actors repo”. Data has information in it, and you can query to get anything you need and assemble a data set that way.
  • The problem with that is you need to query for specific schemas, and when you get the information back it will be in different schemas. If you wanted it all in one format, you would have to migrate it all to one on the fly. :(
  • I know from doing Ancestry research that if you have an object for a given person and you apply a ‘mother’ property to them, and their ‘mother’ also has a ‘mother’, then the original person should have a ‘grandmother’ object assigned automatically. This is an interesting phenomenon that needs a way of being described when a schema is being created. Likewise, if you have a ‘step-father’ and he has a ‘son’, that object should be applied automatically as your ‘step-brother’.
  • With the advent of data APIs we’ve given our data to gatekeepers that keep it locked away.
  • The world of APIs allows us to do many great things, but it prevents us from easily being able to access, update, share and redistribute data.
  • Photograph (Media) > Author (Person) > Specific Person
  • Referencing a specific object in a decentralized way is impossible; you can’t reference a specific person by a content hash. The reference is totally arbitrary.
  • Some properties act as meta properties: created_at, updated_at, authored_by. Within a global CMS the behavior of these properties needs to be tied to the currently logged-in user (as an object) and their timezone. Ideally an object such as {global-cms/user/#latest:{current-user}.name} is something the CMS is aware of as a variable.
  • What if you could put any data object through the lens of another schema? Say “thomasreggi” was an object and the schema reggi/person had a `.name` property (set to Thomas Reggi), and there was another schema that was a fork of the same repo (e.g. josh/person) and mapped `.name` to `.fullName`. Then {reggi/person/#latest:thomasreggi.name} and {josh/person/#latest:thomasreggi.fullName} would both render “Thomas Reggi”. As long as there’s a map or connection from the authored schema to the lens schema, it’s totally possible.
  • Aliased properties, so that `.fullName` and `.name` are both the same; `.surname` and `.lastName` could be the same, likewise for `.state` and `.province`.
  • Conditional properties: if Country is United States, show the “state” field; if Country is France, don’t.
  • Automatic properties / inferred data. If Person is under 21 and resides in the United States, they can’t legally drink alcohol, so `legalDrinkingAge: false`. In this case the property can be set to a function based on other properties of the object. It would also need to import information from another object, “country.legalDrinkingAge”. So you can pass the person’s country into this function, get back an integer, and check whether the person is at `legalDrinkingAge`.
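
The "lens" and alias ideas from the list could be sketched roughly as follows in Python. The registry layout, the `read_through` helper and the notion of a shared canonical property name are assumptions made for illustration; only the schema names and the `.name` / `.fullName` mapping come from the example above.

```python
# Assumed registry: for each schema, how its property names map onto a
# shared canonical vocabulary. A fork records the renames it made.
SCHEMA_TO_CANONICAL = {
    "reggi/person": {"name": "name"},
    "josh/person": {"fullName": "name"},
}

# Each stored object remembers which schema it was authored in.
OBJECTS = {
    "thomasreggi": {"schema": "reggi/person", "data": {"name": "Thomas Reggi"}},
}


def read_through(lens_schema: str, object_id: str, prop: str):
    """Read a property of an object as seen through another schema."""
    stored = OBJECTS[object_id]
    # Translate the lens property to its canonical name.
    canonical = SCHEMA_TO_CANONICAL[lens_schema][prop]
    # Find the authored property that maps to the same canonical name.
    for authored_prop, canon in SCHEMA_TO_CANONICAL[stored["schema"]].items():
        if canon == canonical:
            return stored["data"][authored_prop]
    raise KeyError(f"no mapping from {lens_schema}.{prop} to {stored['schema']}")


print(read_through("reggi/person", "thomasreggi", "name"))
print(read_through("josh/person", "thomasreggi", "fullName"))
```

Both calls resolve to the same stored value, which is the point of the lens: the data is written once and the schema decides what each property is called.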
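
Conditional and automatic properties from the last two ideas might be expressed as plain functions over an object and the objects it references. This Python sketch assumes a hypothetical `COUNTRIES` table, a `hasStates` flag and the helper names shown; the derived flag is called `ofLegalDrinkingAge` here only to keep it distinct from the integer `country.legalDrinkingAge`.

```python
# Assumed Country objects; only the properties needed for this example.
COUNTRIES = {
    "United States": {"legalDrinkingAge": 21, "hasStates": True},
    "France": {"legalDrinkingAge": 18, "hasStates": False},
}


def visible_fields(address: dict) -> list:
    """Conditional property: show the 'state' field only where it applies."""
    fields = ["street", "city", "country"]
    if COUNTRIES[address["country"]]["hasStates"]:
        fields.insert(2, "state")
    return fields


def of_legal_drinking_age(person: dict) -> bool:
    """Automatic property: derived from the person's age and their country."""
    threshold = COUNTRIES[person["country"]]["legalDrinkingAge"]  # an integer
    return person["age"] >= threshold


def with_inferred(person: dict) -> dict:
    """Return a copy of the person with the derived property filled in."""
    return {**person, "ofLegalDrinkingAge": of_legal_drinking_age(person)}


print(visible_fields({"country": "United States"}))
print(visible_fields({"country": "France"}))
print(with_inferred({"name": "Sam", "age": 20, "country": "United States"}))
```

Storing the rule as a function rather than a value means the property stays correct when the person’s age or country changes.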

Originally published at gist.github.com.

Thomas Reggi

Brooklyn born full-stack web developer, madly in ♥ with JavaScript.