2016-08-25-different-distances.md 13 KB


title: Different Measures of Distance tags: geometry

description: A few different ways of calculating distance in noncontinuous 2d spaces

How far is it from one place to another? How does that change if you're on a flat plane? What if that plane has discrete distances, like a checkerboard, or Civ map, or pixels on a screen?

If we've got two points, how far do you have to move to get from one to the other?

Euclidean Distance

The shortest distance between two points{width=150 height=150}

This is the go-to of distance calculations. If you're talking about continuous, uniform planar (2d) space^[It holds for 3d and higher-dimensional spaces, too.], Euclidean distance is the shortest distance between two points.

In the most general sense, the way you get this distance is by measuring it. If you're lucky enough to be using Cartesian coordinates, then you can calculate the distance based on the distance between the two coordinate points. For our purposes, I'm going to use $a$ and $b$, located at $(x_1, y_1)$ and $(x_2, y_2)$.

Pythagoras figured out how to calculate the length of the hypotenuse of a right triangle. The fun part is that, with Cartesian coordinates, any pair of points can now define a right triangle. Just take one point with an x coordinate from one point and a y coordinate from the other, and the triangle itself can be defined as $((x_!, y_1), (x_1, y_2), (x_2, y2))$^[Or $((x!, y_1), (x_2, y_1), (x_2, y_2))$ if you're feeling frisky].

At that point, it becomes pretty easy. The length of the two sides is just the absolute value of the difference between their x and y coordinates. So, $|x_1 - x_2|$ and $|y_1 - y_2|$. Those are easy to plug into the Pythagorean equation to get the distance between the points ($d$):

$$d = \sqrt{ |x_1 - x_2|^2 + |y_1 - y_2|^2 }$$

In code, this might look something like:

def eucliddistance(x1, y1, x2, y2):
    xdist = abs(x1 - x2)
    ydist = abs(y1 - y2)
    return math.sqrt( (xdist ** 2) + (ydist ** 2) )

This all seems a bit unnecessary, though, right? Cartesian coordinates are universal. They're how we think about points. All of our screens are Cartesian planes, and even our 3d worlds are Cartesian. Done and done.

Still, I think it's good to appreciate how we got here. Why? Because it doesn't work for our purposes.

A Euclidean distance Voronoi diagram{width=150 height=150}

A "point", to Euclid, is infinitesimal -- it has no size, nor shape. And a Euclidean plane is continuous -- there is no "smallest distance" you can move.

Pixels, and graph squares, on the other hand, take up space. And if you want to get from $(0,0)$ to $(7,14)$ on a discrete plane, there's no such thing as moving 15.56247..... That's the nature of a discrete space -- everything's an integer. And the other fun bit is that you can't move in arbitrary directions. On a square grid, you've only got 8 directions to go, at most.

It is also, frankly, a bit costly to calculate. Square roots are a bit more complicated than adding and subtracting.

Still, Euclidean distance is the definition of distance. And though we can't use it to determine distance of movement, we can still use it for comparative purposes.

Voronoi Diagrams

For illustration purposes, I've decided to use Voronoi diagrams. Voronoi diagrams are a way of splitting up planes around a set of sites, such that all points in the plane are associated with the closest site.

The simplest, slowest possible way to generate such a diagram is to iterate over each pixel in an image, calculate the distance between that pixel and each site, sort the sites by distance, and assign that site to the pixel. So I coded that up. I've put that code up on github. Nothing fancy going on, I assure you.

By looking at the Voronoi diagrams for the same set of sites, using different distance measures, we can get an appreciation for how those measures vary, and what their characteristics are.

Manhattan Distance

A taxicab distance Voronoi diagram{width=150 height=150}

The island of Manhattan is densly populated, and densely built. For our purposes, though, it's also, notably, laid out like a grid. If you want to get two blocks up and two blocks over, you're going to have to move four total block lengths to get there. And that's a pretty straightforward distance measure.

It's also known as taxicab distance, rectilinear distance, and a bunch of other things. And it's the simplest calculation we're going to be making. Just add the lengths of each side of our triangle:

$$d = |x_1 - x_2| + |y_1 - y_2|$$

In code, it might look like this:

def taxidistance(x1, y1, x2, y2):
    return (abs(x1 - x2) + abs(y1 - y2))

If you're on a checkerboard grid, and you are limited in movement to only the four cardinal directions, this is how you're going to measure getting from place to place. If you want to get from $(0,0)$ to $(7,14)$ in a taxi, you're going to go 21 squares, which is a fair bit further than 15.56.

If you take a look at the Voronoi diagram derived from taxi distance, you'll see that the shapes have changed from straight-edged polygons to odd, almost art-deco-like shapes. In particular, some of the places where there were very long, thin cells in the Euclidean version have changed to blocky, angled, and often much distorted versions.

Civ Distance

A 'Civ' distance Voronoi diagram{width=150 height=150}

If Manhattan distance is what you reckon with when you can only move in the four cardinal directions, what do you do when you can move in all 8 possible directions? Up until the most recent edition, the Civilization games had exactly that scenario[^1]. So let's call this the Civ distance.

Calculating this one is also easy. Instead of adding the sides together, you just take the length of the longest side.

How's that work, exactly? Well, if you're going from $(0,0)$ to $(7,14)$, you go at an angle for 7 squares (moving twice as fast as taxicabs). At that point, you're at $(7,7)$, and it's a straight shot to $(7,14)$ only 7 squares away. Just 14 squares total.

Mathematically, it looks like this:

$$d = \max(|x_1 - x_2|, |y_1 - y_2|)$$

And in code, something like this:

def civdistance(x1, y1, x2, y2):
    return (max(abs(x1 - x2), abs(y1 - y2)))

Again, very simple. And 14 is actually shorter than 15.56. By taxi, up and over is a distance of two, by Euclid that same distance is 1.41 and change. But by Civ distance, that same span is only one. That, of course leads to some different distortions in the voronoi diagram. Where the taxicab Voronoi cells stretched up, down, left, and right, the Civ cells are stretching further in the diagonals.

Overall, Civ distance is truer to Euclid than taking the taxi, but it's still a very different beast.

D&D Distance

Perhaps you've played some Dungeons & Dragons. Perhaps you've specifically played 3.5ed^[Or its offshoot, Pathfinder; or, for that matter, any of the d20 games that also derived from that source]. That game was notorious for its complex, tactical, grid-based combat system. Each square represented 5 feet of space, and each character had a limited amount of movement per turn.

{width=150 height=150}

Now, it didn't make sense to allow only movement in the four cardinal directions. That would, if nothing else, be incredibly frustrating for players trying to move their characters around the map.

They also didn't want to allow willy-nilly Civ-style movement, since that would offer a clear advantage to anyone who wished to use the diagonals[^4ed].

Ultimately, they decided on an iterative process which was a bit more complex:

  • The first time (and every odd time) you move a diagonal, it costs 5 feet of movement/distance, just like moving in a cardinal direction

  • The second time (and every even time) you move a diagonal, it costs 10 feet of movement/distance, like you're taking a taxicab

The net result is that, if you start out moving diagonally only, it's 5 feet, then 15, then 20, then 30, then 35, then 45, etc. Translating feet to grid square units, it's 1 unit, then 3, then 4, then 6, then 7, then 9.

That pattern, as it turns out, is the floor of 1.5 * the diagonal distance.

To generalize that to more than just the purely diagonal, we figure out how much we can go straight, then do the rest diagonal, and add them together:

$$d_1 = \max(|x_1 - x_2|, |y_1 - y_2|) - \min(|x_1 - x_2|, |y_1 - y_2|)$$ $$d_2 = \lfloor 1.5 \times (\min(|x_1 - x_2|, |y_1 - y_2|))\rfloor$$ $$d = d_1 + d_2$$

And in code, that looks like this[^odd]:

def dandddistance(x1, y1, x2, y2):
    mindist = min(abs(x1 - x2), abs(y1 - y2))
    maxdist = max(abs(x1 - x2), abs(y1 - y2))
    return ((maxdist - mindist) + (1.5 * mindist))

Looking at the Voronoi diagram, this is easily the closest match to Euclid. And that's not surprsing, really. When Euclid things up and over is 1.41 and change, taxicabs and Civ think it's 2 and 1, respectively, D&D thinks it's (about) 1.5, which is much, much closer to the continuous distance. To get from $(0,0)$ to $(7,14)$ by way of D&D distance is 17, which is pretty close to the 15.56 of Euclid, and without the distortions of Civ.

Of course, it's a bit more complex to calculate than Civ or taxi, but it doesn't require taking any square routes, which marks it as still computationally simpler than Euclid. And it's grid-native. The designers of 3.5 did a good job finding a relatively simple approximation for distance.

Conclusion

And here we are at the end. Hopefully getting from point $a$ to point $b$ wasn't too rough, and I know I learned a thing or two on the way.

Things get more complicated when you start measuring things in three dimensions. The D&D algorithm, particularly, is unlikely to scale well as you end up figuring out the appropriate values for each of the 26 different directions you can go. Maybe I'll revisit this again.

And let's not forget non-spatial distances! Hamming distance is particularly fun to think about. It has uses in small-scale AI tasks like autocorrect, as well as in very-large-dimension binary vector spaces^duh. I guess I'm saying that distances are fascinating things, and there's so much more to talk about.

Thanks for bearing with me.

[^1]: For an even longer time, the king piece on a chessboard had the exact same abilities, which is why this is probably more commonly known as "chessboard distance", or Chebyshev distance, after Pafnuty Chebyshev, a 19th-century Russian mathematician. But I've played more Civ than chess, frankly.

[^4ed]: When it came time to publish the 4th edition of D&D, however, they changed their mind on this. The complexity of the 3.5 distance calculation wasn't worth the verisimilitude, so they moved to using Civ distances instead. This decision also led to the infamous square fireballs, but no system is perfect.

[^odd]: You might notice that there isn't a min or math.floor call in that code. There was, for a bit, but it was generating some strange artifacts, where strange striping was occurring. I suspect that's due to an odd quantizing effect, but haven't looked into it deeply. In the meantime, using the un-floored numbers produce a more coherent image.

\ ...\ \ For some machine learning tasks, you end up representing different things in very large vector spaces (one dimension for each "feature" you are keeping track of). In many cases, binary features are more tractable and just as useful -- for example, you might keep track of whether a certain word shows up in a document, rather than how many times it shows up. When those two situations coincide, Hamming distance becomes more palatable and tractable than Euclidean distance.]