2016-10-07-union-view-tricks.md 5.5 KB


title: Uses for UNION ALL and Views tags: SQL

description: A few things you can pull off with some SQL that are not hacks to get partitioning

Right, so yesterday[^yesterday], I talked about one of the cooler historical SQL hacks out there. Turns out you can (perhaps somewhat incompletely) replicate useful features purely with pure SQL99 (and pure SQL92[^sqlold]). There are, in fact, several additional fun ways to use the UNION operator in conjunction with views, and I'd like to talk about them.

The Now Partition

Let's say you have a heavy-write table that is also heavy-read. The writes come often, but you also regularly need to run reports against that data. And your customers are pretty picky about timeliness, so you can't just run the reports on a batch-populated, heavily-indexed table. But putting lots of indexes on this table would bring the insert/update time far below what you need to keep up.

Your customers are complaining, your devs are helpess. What are you to do?

Imagine combining the fast-update speed of a small, unindexed table with the fast querying of a thoroughly-indexed, batch-populated table! The possibilities would be endless.

All this is possible with a pattern I like to call "The Now Partition".

The concept is straightforward: send your insert traffic to one table, and have a batch process move the data transactionally to a queryable table. That much seems obvious. The secret sauce[^obvious] is using a view to UNION ALL those two tables together. Something like:

=# CREATE TABLE foo_20161005 (d date, a int, b text);
=# CREATE INDEX ON foo_20161005 (a);
=# CREATE INDEX ON foo_20161005 (b);
=# CREATE TABLE foo_20161006 (d date, a int, b text);
=# CREATE INDEX ON foo_20161006 (a);
=# CREATE INDEX ON foo_20161006 (b);
=# CREATE TABLE foo_now (d date, a int, b text);
=# CREATE VIEW foo AS
  SELECT * FROM foo_20161005
    UNION ALL
  SELECT * FROM foo_20161006
    UNION ALL
  SELECT * FROM foo_now;

Just make sure all your OLTP traffic hits foo_now, and you're all set. The batch transfer should be straightforward, though I advise constraining your DELETE. Unconstrained, you might run into some race conditions and delete more than you bargained for.

This may look familiar if you've read yesterday's post. Don't worry, the next one's a bit further afield.

The Lazy Archive

Let's imagine a different scenario. You need to be able to query all of history, but you only care that it's fast for the past year. Anything further back can take hours or days. Materialized views (or manual summary tables, depending on your RDBMS's capabilities) provide the speed you need.

So what's the problem? Your data is complicated. You need a few dozen different materialized views to accommodate all of your report use cases. And you've got decades of data archived -- that's not only a whole lot of storage space for all those materialized views, maintaining them takes more effort and more time.

The solution? Have materialized views for the last year, and keep the archived facts unsummarized. And (drumroll) connect them using UNION ALL in views!

Something like:

=# CREATE VIEW foo_by_a AS
  SELECT rundate, a, sum(b), sum(c), ...
      FROM foo
      WHERE rundate < '2016-01-01'
      GROUP BY rundate, a
    UNION ALL
  SELECT rundate, a, b, c, ...
      FROM foo2016_summ_by_a;

=# CREATE VIEW foo_by_b AS
  SELECT rundate, b, sum(a), sum(c), ...
      FROM foo
      WHERE rundate < '2016-01-01'
      GROUP BY rundate, b
    UNION ALL
  SELECT rundate, b, a, c, ...
      FROM foo2016_summ_by_b;

=# ....

Boom! There you go.

Unlike the earlier examples, we're not just slapping similar tables together. Instead, we're slapping together the results of two queries that happen to have the same column layout.

Of course, if you look back over all the other examples, that's what we're doing everywhere. SELECT * and "the table" is an easy difference to gloss over. It's good to remember occasionally that UNION is an operator that operates on result sets, though, not tables. And that makes it very flexible.

That flexibility is what we're exploiting here, creating an on-the-fly set of summarized data in response to those infrequent queries on old data. And it should work without slowdown on queries in the past year[^slowdown].

Ultimately, this is just one trick, but it's such a useful one. If you find yourself in a position where you need your data in two (or more) different layouts, and want the best of both worlds, well UNION ALL is exactly the tool for the job.

Let me know if you have any other UNION ALL patterns you like!

[^yesterday]: Okay, so it was two days ago. Posterity will forgive me. Unless posterity somehow can't read footnotes, and decides that "yesterday" could mean "two days ago" in the early 21st century. Or that I didn't know math, dates, or date math.

[^obvious]: Which, let's be honest, is probably also a bit obvious. I mean, I mentioned at the top of the page that I was going to talk about UNION and views, and you're all clever people, so ....

[^sqlold]: At least as far back as SQL-92. I haven't been able to track down a copy of SQL-89 or SQL-86 to check the existence of UNION.

[^slowdown]: Depending on your RDBMS. Like yesterday's caveat about configuration, you may need to tweak some settings for your planner to be smart about your old data. And there may be systems with very poor optimizers that end up reading in your entire archive table every query. I recommend migrating away from such systems (at least for your reporting needs).