Immuta v1.2: A focus on data access & governance for data scientists.
Prior to our 1.0 release, you could make any data source "exposable" by Immuta by conforming to the Immuta API and exposing a REST endpoint. This was powerful, and we started there to ensure we were completely agnostic to storage technology and data format: structured, semi-structured, or unstructured. Similarly, to build a data policy, you could conform to the Immuta API and expose a "policy" REST service that could make complex, fully customizable access decisions. Again, this gave us the flexibility to handle any complex data policy a customer could throw at us. There was a lot of meat to that approach; the problem was that you really shouldn't ask people to create custom endpoints (write code) for standard storage technologies and standard data policies.
Version 1.0 ushered in our new user interface for exposing data from various storage technologies. This user interface not only allows you to expose data sources through button clicks, but you can also build standard data policies through the UI as well. As discussed, we’ve expanded on these 1.0 UI capabilities to now include the following databases out of the box:
Elasticsearch integration in Immuta is interesting due to the nature of Elasticsearch. Although Elasticsearch is a NoSQL database, unlike some of our other UI-supported data sources that are non-relational, such as S3, it has a concept of a schema, albeit a dynamic one. Because of this, we were able to expose Elasticsearch within Immuta in ways that allow its data to be curated for analytics via both the Immuta virtual file system and the SQL access pattern. You can chunk groupings of Elasticsearch results into virtual files while at the same time allowing BI tools to integrate with and query Elasticsearch as if it were a SQL-capable database, all with privacy controls enforced dynamically. This also lets us expose an Elasticsearch data dictionary so people can collaborate on fields and subfields within the JSON structure of the Elasticsearch data.
Take note of the dot-notation to represent hierarchical data within Elasticsearch as “columns”.
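To make the dot-notation idea concrete, here is a minimal sketch of how a nested Elasticsearch-style document can be flattened into dotted "column" names. This is an illustration only, not Immuta's actual implementation; the function and example fields are hypothetical:

```python
def flatten(doc, prefix=""):
    """Recursively flatten a nested document into dot-notation columns."""
    columns = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path
            columns.update(flatten(value, prefix=f"{name}."))
        else:
            columns[name] = value
    return columns

# A nested Elasticsearch-style document (hypothetical example data)
doc = {"user": {"name": "alice", "geo": {"lat": 38.9, "lon": -77.0}}, "status": "active"}
print(flatten(doc))
# {'user.name': 'alice', 'user.geo.lat': 38.9, 'user.geo.lon': -77.0, 'status': 'active'}
```

Each leaf value becomes a flat column whose name encodes its position in the hierarchy, which is how subfields can surface in a SQL-style view and a data dictionary.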
Expanding on the data dictionary thread, we now have the ability to expose a manually created data dictionary for collaboration on non-relational data sources. For example, let's say you had some JSON data stored in S3 that you've exposed in Immuta, and you know the format of that data. You can create a manual data dictionary in Immuta so users accessing the data can understand it, collaborate on uses of it, and ask questions about its structure and values.
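A manual data dictionary is essentially field-level metadata maintained by a human rather than derived from a schema. The sketch below shows the idea; the field names, types, and descriptions are entirely hypothetical example content, not Immuta's data model:

```python
# A manual data dictionary for JSON data in S3, expressed as plain field
# metadata keyed by dotted field name (hypothetical example content).
data_dictionary = {
    "event.timestamp": {"type": "string", "description": "ISO-8601 event time"},
    "event.user_id": {"type": "integer", "description": "Internal user identifier"},
    "event.payload": {"type": "object", "description": "Free-form event details"},
}

def describe(field):
    """Look up a field so collaborators can understand the data's structure."""
    meta = data_dictionary.get(field)
    if meta is None:
        return f"{field}: undocumented"
    return f"{field} ({meta['type']}): {meta['description']}"

print(describe("event.user_id"))
# event.user_id (integer): Internal user identifier
```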
We’ve also added spatial capabilities by enabling PostGIS through our SQL access pattern. This means you can expose spatial PostGIS tables in Immuta and apply policy controls. For example, you could integrate ArcGIS with the Immuta data layer and visualize the spatial data on a map; as you zoom, the available geometries are limited by the policies applied to your user. This also means you can extract spatial features (GeoJSON, WKT) when exposing non-relational sources, store them in Immuta, and then query for those geometries with policies enforced based on the upstream blob they were extracted from.
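Extracting spatial features mostly comes down to converting between the two common representations mentioned above. Here is a minimal, self-contained sketch of converting simple GeoJSON geometries to WKT, suitable for storage in a PostGIS geometry column; it is an illustration, not Immuta's extraction code, and handles only two geometry types:

```python
def geojson_to_wkt(geometry):
    """Convert a simple GeoJSON geometry (Point or Polygon) to WKT text."""
    gtype = geometry["type"]
    if gtype == "Point":
        x, y = geometry["coordinates"]
        return f"POINT ({x} {y})"
    if gtype == "Polygon":
        # A GeoJSON polygon is a list of rings, each a list of [x, y] pairs
        rings = ", ".join(
            "(" + ", ".join(f"{x} {y}" for x, y in ring) + ")"
            for ring in geometry["coordinates"]
        )
        return f"POLYGON ({rings})"
    raise ValueError(f"Unsupported geometry type: {gtype}")

print(geojson_to_wkt({"type": "Point", "coordinates": [-77.0, 38.9]}))
# POINT (-77.0 38.9)
```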
We also needed to add more flexibility in how data sources are made available to users in the Storefront.
The first portion of this feature enables you to expose a data source from a remote source, yet keep it private to yourself. This is important when users are experimenting with how they want to represent the data view or how they want their policy to behave. While a data source is private, you can also subscribe specific users as desired, then promote the data source to public when ready (and downgrade it back to private, if desired).
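The lifecycle described above can be sketched as a small state model. This is a hypothetical illustration of the private/public visibility rules, not Immuta's actual API or class names:

```python
class DataSource:
    """Sketch of the private/public data source lifecycle (hypothetical API)."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.visibility = "private"  # new sources start private to their creator
        self.subscribers = {owner}

    def subscribe(self, user):
        # While private, specific users can still be subscribed by the owner
        self.subscribers.add(user)

    def promote(self):
        self.visibility = "public"  # now visible to everyone in the Storefront

    def demote(self):
        self.visibility = "private"  # downgrade back to private if desired

    def can_see(self, user):
        return self.visibility == "public" or user in self.subscribers

ds = DataSource("clickstream", owner="alice")
ds.subscribe("bob")
print(ds.can_see("bob"), ds.can_see("carol"))  # True False
ds.promote()
print(ds.can_see("carol"))  # True
```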
The second feature added flexibility by allowing users to edit certain aspects of the data sources exposed in the Immuta Storefront. For example, the parent data source may prescribe a particular file system structure and expose a large amount of data.
Users with lesser permissions can now take this data source and extend it in the ways they need; we call this creating child data sources. For example, the user could change the directory structure, add additional filters to the data, or add new columns.
This is powerful because it allows users to create views of data from existing views while still maintaining the policy controls. Children always retain the policy controls of their parent data source, so there is no risk of a data leak from creating child data sources. This concept also allows the people with permission to create the initial parent data sources to be more liberal with the level of data they expose, trusting that the data scientists leveraging the data will create more meaningful child data sources over time.
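The key invariant, that a child can only narrow what the parent exposes and never widen it, can be sketched like this. The classes, policies, and example rows are hypothetical illustrations, not Immuta internals:

```python
# Sketch of how child data sources always retain parent policy controls.
class Source:
    def __init__(self, rows, policy, parent=None, extra_filter=None):
        self.rows = rows
        self.policy = policy            # row-level policy inherited from creation
        self.parent = parent
        self.extra_filter = extra_filter

    @classmethod
    def child_of(cls, parent, extra_filter):
        # Children add their own filters but always inherit the parent's policy
        return cls(parent.rows, parent.policy, parent=parent, extra_filter=extra_filter)

    def query(self, user):
        rows = [r for r in self.rows if self.policy(user, r)]  # parent policy always applies
        if self.extra_filter is not None:
            rows = [r for r in rows if self.extra_filter(r)]   # child narrows, never widens
        return rows

rows = [{"dept": "hr", "salary": 90}, {"dept": "eng", "salary": 120}]
policy = lambda user, r: user == "admin" or r["dept"] != "hr"  # hide HR rows from non-admins
parent = Source(rows, policy)
child = Source.child_of(parent, extra_filter=lambda r: r["salary"] > 100)
print(child.query("bob"))  # [{'dept': 'eng', 'salary': 120}]
```

Because the policy check runs before the child's filter, no combination of child filters can surface a row the parent policy would have hidden.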
Child data sources also inherit all collaboration and definitions from their parents. This aligns with the collaboration approach Immuta has always promoted: providing a single location to understand everything about your data holdings and to scale knowledge about the work you are doing with that data.
Lastly, we’ve furthered our authentication and authorization capabilities beyond LDAP integration to Active Directory, OAuth, and PKI. As part of this expansion, we’ve also added the ability to take a hybrid approach to authentication versus authorization. Meaning, you could authenticate against Active Directory, for example, yet keep that user’s authorizations internal to Immuta, if desired. Conversely, you could both authenticate with Active Directory and pull the user’s authorizations from Active Directory. In both instances, Immuta acts as the rules engine and data unification layer, acting on the authorizations returned from Active Directory, or from Immuta internally when using the hybrid approach.
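The hybrid split can be summarized in a few lines: authentication always goes to the external identity provider, while the source of authorizations is a separate choice. Everything below, including the in-memory "directory" and the stored groups, is a hypothetical illustration rather than Immuta's implementation:

```python
# Sketch of the hybrid authentication/authorization split (all stores hypothetical).
EXTERNAL_DIRECTORY = {"alice": {"password": "s3cret", "groups": ["analysts", "admins"]}}
IMMUTA_AUTHORIZATIONS = {"alice": ["pii_readers"]}

def login(username, password, authz_source="external"):
    """Authenticate against the external directory, then pull authorizations
    either from that directory or from Immuta's internal store (hybrid mode)."""
    entry = EXTERNAL_DIRECTORY.get(username)
    if entry is None or entry["password"] != password:
        raise PermissionError("authentication failed")
    if authz_source == "external":
        return entry["groups"]                      # authn and authz both external
    return IMMUTA_AUTHORIZATIONS.get(username, [])  # hybrid: authz kept in Immuta

print(login("alice", "s3cret"))                           # ['analysts', 'admins']
print(login("alice", "s3cret", authz_source="internal"))  # ['pii_readers']
```

In either mode, the groups returned here are what the rules engine would act on when evaluating data policies.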
We believe the features contained in 1.2 further our capabilities surrounding data connectivity and flexibility in working with exposed remote data. Future releases will take this further, allowing us to connect analytics to data just as we connect users to data today. More on that soon!