--- title: Partitioning Without Partitions tags: SQL description: I look at an old SQL hack for partitioning called "UNION ALL views". --- One of the common modern features of relational databases is the ability to partition tables. "Partition" can mean a few different things, but in this case, I'm talking about the common use-case of 1- taking a large table, 2- splitting it up into smaller tables, but 3- still accessing those tables as if they were a single table. There are some benefits to partitioning, particularly around maintenance tasks. Smaller tables mean swifter table scans, faster table operations, and faster histogram generation, not to mention the opportunity for more widely varied histograms for your data. As mentioned, though the feature is *relatively* modern. Oracle added it in 1997, SQL Server in 1998 (sort of), and later in 2005 (for real). DB2 didn't add it till 2007 (seriously). PostgreSQL added table inheritance (which is commonly used to implement partitioning) somewhere prior to 2000. But the need predates 1997. And, well, DB2 users were out in the cold until less than a decade ago. What did you do if you needed to handle Big Data[^bigdata] in 1995? ### UNION [ALL] As it turns out, enterprising hackers can do all sorts of amazing things. See, there's a SQL operator called `UNION`, which is a name that absolutely makes sense if you think about data in a very specific way. Remember [sets](https://en.wikipedia.org/wiki/Set_(mathematics))? Maybe as a data structure, maybe from formal logic, maybe from advanced math? Well, if you think about the results of a SQL `SELECT` as a set, then the [`UNION`](https://en.wikipedia.org/wiki/Union_(set_theory)) operator does exactly what you think it would: it takes the results of two different `SELECT` statements and delivers the set of unique results[^sets]. For example[^postgres]: ~~~ {.sql} =# SELECT generate_series(1,5); generate_series ----------------- 1 2 3 4 5 =# SELECT generate_series(1,5) UNION SELECT generate_series(4,8); generate_series ----------------- 1 2 3 4 5 6 7 8 ~~~ Note that this is a real set-like union -- the numbers 4 and 5 were part of both `SELECT` results, but only showed up once in the output. .... Which might not be what you want. Often times, maybe even most often, you want all of the results. That's what the `UNION ALL` operator does: ~~~ {.sql} =# SELECT generate_series(1,5) UNION ALL SELECT generate_series(4,8); generate_series ----------------- 1 2 3 4 4 5 5 6 7 8 ~~~ Now we're talking. That may not have set-theoretical utility, but it's much more useful for the pragmatics of table partitioning. ### CREATE VIEW foo With `UNION ALL`, partitioning actually becomes relatively straightforward (if still hacky and a bit brittle). You just create your partitions as separate tables, and then slap them together in a big view: ~~~ {.sql} =# CREATE TABLE foo2014 (d date, a int, b text); =# CREATE TABLE foo2015 (d date, a int, b text); =# CREATE TABLE foo2016 (d date, a int, b text); =# CREATE VIEW foo AS SELECT * FROM foo2014 UNION ALL SELECT * FROM foo2015 UNION ALL SELECT * FROM foo2016; ~~~ Et, voila! You can now select data from one place, and have your DB search a bunch of places. To play with a live demo, [check this sqlfiddle](http://sqlfiddle.com/#!15/953c5/2) (and consider donating to SQL Fiddle). Now, this is hacky. Unless you can hang triggers on your views, you probably need to send all of your `INSERT` and `UPDATE` traffic to the partitions directly, rather than to the "table" itself. And your database may not be smart enough to avoid looking at all of your partitions for each query[^constraintexclusion]. And it's brittle. I do hope you automated partition creation, because if your columns are inconsistently named or typed, then you'll have problems. And each time you need a new partition, you have to re-create the view. But it works. And it works in anything that conforms to SQL99. It even works in sqlite (though its usefulness there is debatable, since sqlite databases are a single file on-disk anyway). There's not much use for this technique now that we have more mature and robust partitioning capabilities available to us, though. Tomorrow I'll walk through some non-partitioning uses for these techniques that might be a little more relevant. [^bigdata]: Where "Big" is relative to the time. We're talking gigadata, not teradata, and certainly not petadata. [^sets]: SQL99 defines `INTERSECT` and `EXCEPT` operators, as well, for all of your tables-as-sets needs. [^postgres]: Using PostgreSQL (specifically v9.5.4). I cheated slightly and used `ORDER BY` clauses to actually generate the results, for readability purposes. The SQL as-written will return results in a potentially-unstable order. [^constraintexclusion]: Most are, but sometimes require extra configuration. And all the ones I know of rely on constraints on each partition table, and possible mirrors of those constraints in each `SELECT` clause in the view.