Love Your Database: Current Research
A quick update. I am still actively engaged with the LYDB project. I’m trying to get solid if only in-principle explanations of how to move your model into the database, and how to otherwise leverage things that are easiest/most securely etc done in the database.
I also have in mind that people’s time is short, so if I can’t write something up clearly and concisely, I haven’t done so. But so I don’t lose folks’ interest, I will quickly mention a few things I’ve been looking into. If anyone has any advice in any of these areas, please let me know.
Databases tend to have pretty robust security features, with hierarchical roles and row security and the like. I love the idea of row security, in particular, so that if any account is compromised, the attacker can only affect the data owned by that account.
Postgres’ roles model is pretty nice, and I would suggest that if you don’t have an authorization feature for your application, you might consider actually just using that, if only because it is there, it’s rugged, it works, and it’s easily shared between applications. Note that these roles don’t have to be used for database connections — you could have your current users table and use a set of corresponding, non-login database roles to represent your users and groups and privileges and how they relate to each other.
The row security feature is pretty easy to use for accounts that are used for login. But because connections are heavyweight, that’s not how our apps work — we use a connection pool with the same privileged account for all connections. I’m still looking into whether there is a simple and robust way of using row security in this environment. Unfortunately, discussion on the pgsql-general list suggests no.
Stored Procedures and Triggers
What I’ve been struggling with here is what to say that isn’t really obvious or overly detailed for this stage of my project.
Obvious: stored procedures and triggers will usually be faster and simpler than the alternative. Maybe that’s not so obvious, because why aren’t more people using server-side code?
I asked around for objections. The only really solid one I’ve heard is that it locks you into one database. This is a valid objection, but I don’t think it’s a strong one: SQL databases are sufficiently different that people don’t often switch, and unless you use very vanilla ORM, switching will be so painful folks don’t often bother. Also, you can stick to PL/pgSQL, which is deliberately similar to the language in Oracle, making that migration relatively painless.
For Postgres, it appears that the best current advice is: use PL/pgSQL if you’re writing server-side code that is mostly just joining together SQL. The reason is that PL/pgSQL uses the same types as SQL, so everything is just simpler.
If you need to do something that is more actual computation outside of SQL, for most purposes I would propose PL/V8 (ie Ecmascript). It’s performant, all developers know it these days, the implementation is featureful (you can do just about anything you can with PL/pgSQL), and it will usually be available (it’s standard on Amazon RDS, in particular).
For scientific or statistical computing purposes, you should be aware of PL/Python and PL/R. Your statistical code will likely run vastly faster if it’s right in the database.
Why and How to SQL
I also want to do some write-up on SQL at all levels. At the basic level, I think many folks who use databases are unfamiliar with the relational model and still think of what the database does for them using faulty models. The database is a logic engine! If anyone knows of a really come to Jesus sort of write up on that for developers with faulty mental models, I’d love to see it, but I think I probably have to write it.
At the more intermediate level, even developers who can design basic schemas and write competent basic joins and queries might have a decent mental model, but don’t understand how expressive a modern database makes that model. So I’d like to do a write-up of building normalized data up into representations that suits particular tasks using Views, Common Table Expressions, Window Functions, Sequence Generating Functions and a Partridge in a Pear Tree. :-)
A lot of developers also don’t understand how to use compound indexes properly.
At the more advanced end of things, there are a few topics I’d like to explore.
One thing I’m particularly intrigued by is using Postgres as a data integration point. Foreign Data Wrappers is an amazing and far-ranging feature. I have no idea at this point how practical all that is, or what the best practices would be, but I can see a lot of ways in which federating multiple data sources through Postgres is theoretically cleaner, simpler and more efficient than other approaches.
Did you know it’s part of the SQL standard (SQL/MED, since 2003)?
I would like to cover Postgres’ full-text search, its features and limitations, particularly when it would be necessary to use the likes of Solr.
It would also be interesting to cover Postgres’ NoSQL features, and perhaps to compare them with CouchDB and the like.
Finally, I’d like to consider Postgres’ support for custom types, which is really quite sophisticated and mature. We take it for granted that we should create and use custom types in our other programming tools; why don’t we tend to do that much in Postgres? Is it just that this is a very Postgres-specific feature?