Taxicab/Manhattan | Euclidean Distance mathematical intuition

In this article, I will cover taxicab (which is also known as Manhattan distance) and Euclidean distance with the mathematical intuition behind it. Distances are typically used to measure the difference between two points in the given space. Both distances are widely used in machine learning applications. One the example is Nearest Neighbor algorithm.

Pre-requisite

Basic understanding of the 2 dimensional coordinate system (Cartesian coordinates system)

In the below grid (figure 1), there are two points named A and B. The height & width of each grid block (square) is 1x1.

(figure 1) Defining dots on the grid

We want to move from point A to B using the shortest path, but the condition is that one can only move in blocks (gridlines). So to move from point A to point B in the given grid, we need to walk 5 points horizontally (forward) and 2 points vertically (upward).

(figure 2) Move in blocks from A to B

Let’s number the grid from left to right (x-axis) and bottom to top (y-axis). And define point A at (2, 3) and point B is (7, 5). To move from point A to point B (shortest path), it’s the same drill as above. On the x-axis to move from 2 to 7, walk 5 blocks and on the y-axis to move from 3 to 5, walk 2 blocks. Adding 5 (x-axis) blocks + 2 (y-axis) blocks, the total distance becomes 7.

(figure 3) Move in blocks from A to B axes

Label the coordinates of point A as (x1 = 2, y1 = 3) and point B as (x2 = 7, y2 = 5). We can rewrite the horizontal distance as x2 - x1 and vertical distance as y2 - y1, so it becomes 7 - 2 and 5 - 3 respectively which results in 5, 2 respectively. And the total distance (d) becomes 5 + 2, which is 7. So, if we write this in general terms, then it becomes -

d = (x_2 - x_1) + (y_2 - y_1)

Plug in the x and y values

d = (7 - 2) + (5 - 3)

d = (5) + (2)

d = 7

Would the distance remain the same, If the reverse distance is calculated that is from B to A? Ideally, it should be. Let’s find out.

Point B = (x1 = 7, y1 = 5) Point A = (x2 = 2, y2 = 3)

d = (2 - 7) + (3 - 5)

d = (-5) + (-2)

d = -5 - 2

d = -7

The distance from point B to A is negative 7. And the distance could not be negative. To eliminate the negation, take the modulus or absolute differences between the two points.

d = |(2 - 7)| + |(3 - 5)|

d = |-5| + |-2|

d = 5 + 2

d = 7

(figure 4) Taxicab/Manhattan distance

In general terms for two dimensions, we can write it as -

\mathbf{d = |x_2 - x_1| + |y_2 - y_1|}

The subtraction can be done either way, $|x_2 - x_1|$ or $|x_1 - x_2|$ and both will return the same positive value due to modulus.

Extending the above formula to calculate the distance in n dimensions

\mathbf{d=\sum_{i=1}^n|a_i - b_i|}

$\sum_{i=1}^n$ : represents the dimensions, starting from 1 to n.

$|a_i - b_i|$ : The absolute difference represents the value of the difference between two points in the ith dimension. Modulus or absolute value is used to ensure that the distance is non-negative.

$a_i$ , $b_i$ : are the coordinates of two points A and B in $i^{\text{th}}$ dimension. For example, in 3-dimensional space $a_1$ , $b_1$ are the x coordinates, $a_2$ , $b_2$ are the y coordinates and $a_3$ , $b_3$ are the z coordinates of points A and B.

What is covered so far is known as Taxicab or Manhattan distance. It is also known as L1 distance. Let’s consider a scenario where there is no restriction of moving in blocks, instead one could directly go from A to B in a straight line (Euclidean distance). To calculate Euclidean distance, we can leverage Pythagorean’s theorem.

Pythagorean’s Theorem | Euclidean distance

For a given two points (A, B) construct the right triangle; label the adjacent side as a, the opposite side as b, and hypotenuse (the longer side in the triangle) as c. The length of adjacent side is the absolute difference between $x_1\ \&\ x_2\$ which is 5 ( $\ \mathbf{|x_2 - x_1| = |7 - 2| = 5}$ ). The length of opposite side is the absolute difference between $y_1\ \&\ y_2\$ which is 2 ( $\ \mathbf{|y_2 - y_1| = |5 - 3| = 2}$ ). The Pythagorean theorem states that the sum of the square of the adjacent side, the opposite side is equal to the hypotenuse side square which would be 29.

$\mathbf{c^2 = a^2 + b^2}\ _{.\ .\ .\ Pythagorean\ Theorem}$
$_{where}\ \\ \ \mathbf{a = (x_2 - x_1)}\ \&\ \mathbf{b = (y_2 - y_1)}$

c^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2

c^2 = (7 - 2)^2 + (5 - 3)^2

c^2 = (5)^2 + (2)^2

c^2 = 25 + 4

c^2 = 29

(figure 5) Pythagorean Theorem

From the pythagorean’s theorem; to find the length of the hypotenuse (longer side), simply take the square root on both sides as below. Let’s call it d as distance.

\mathbf{d = \sqrt{a^2 + b^2}},\ where\ \ a = (x_2 - x_1)\ \&\ b = (y_2 - y_1)

d = \sqrt{29}\ \ \ .\ .\ .\ .\ .\ .\ from\ above

\mathbf{d = 5.39}

The length of the hypotenuse side (longer side) or the straight distance between points A and B is 5.39. Hence, the Eucledian distance.

Eucledian distance is the straight distance between two given points in Eucledian space. The example covered above is in two-dimensional space ( $\mathbb{R}^2$ ) but one can extend it to n dimensional space ( $\mathbb{R}^n$ ). It is also known as L2 distance.

(figure 6) Eucledian distance

For example, to measure the distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ in 3 dimensional space, the equation would be

d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}

Extending it to n dimensional space for given two points $(x_1, x_2,...,x_n)$ and $(y_1, y_2,...,y_n)$ the equation would be

d = \sqrt{\sum_{i=1}^{n} (y_i - x_i)^2}

When to use

Few pointers on when to use Taxicab/Manhattan (L1) distance

Taxicab/Manhattan distance is ideal for a grid-like data like a chessboard, or city blocks layout where the movements are restricted to horizontal, vertical directions.
It is suited while dealing with sparse data. For example, while dealing with text similarity using term frequencies (sparse). It is because the absolute differences in individual dimensions could provide more meaningful distinctions between points. It can also handle high-dimensional data well.
Use this distance when you want a metric more robust to outliers because it takes the absolute distance between two given points rather than squaring the differences (Euclidean distance). As a result, the large deviations in specific dimensions do not dominate. One can consider L1 distance in the scenarios where outliers are present but do not want it to dominate the results drastically.
Manhattan distance can be considered while dealing in high-dimensional spaces, where component-wise comparisons are important and the advantage it has is, that it is less sensitive to outliers due to absolute differences, unlike Euclidean distance which squares the differences.
It is useful when dealing with discrete or non-smooth data changes or events.

Few pointers on when to use Eucledian (L2) distance

Euclidean distance is well suited when dealing with smooth, continuous data where differences in features are smooth and not drastic. Typically leveraged while working with clustering, classification, and regression algorithms.
Typically, it is suited where similarity is determined by a straight line distance between two data points in the given feature space.
Euclidean distance is not robust to outliers, so it is useful where you want to penalize the large deviations because of the nature of squaring the differences.
Leverage Euclidean distance when doing dimensionality reduction.
When the data is standardized, normalized, euclidean distance can work well given that the individual feature will not dominate the rest.

When to avoid

Few pointers on when to avoid Taxicab/Manhattan (L1) distance

Avoid using taxicab distance when geometric interpretations are important like angles, spherical shapes, straight lines because it works well with only grid-like structures.
Due to the linear nature of taxicab distance, it treats all the deviations equally across all dimensions which may fail to capture the natural relationship between given points. This occurs when dealing with continuous or dense data.
Do not use Manhattan distance when you want to penalize large deviations compared to small deviations.
Do not use with clustering algorithms that assume a spherical, natural shape.
Avoid using it when features are at different scales or inconsistent. Ensure normalization before utilizing the distance.

Few pointers on when to avoid Eucledian (L2) distance

Do not use euclidean distance with data that has significant outliers becuase it squares the differences which would result in outlier domination.
Do not use Euclidean distance with categorical or ordinal data instead try hamming or some other relevant metric.
Euclidean distance does not capture angular similarity. The angular similarity is important when dealing with problems like document similarity.
Does not work well with features at different scales.
In high dimensional space, the relative differences between points become smaller, making it harder to distinguish between data points.

Python Snippets

Taxicab/Manhattan distance 2 dimensional - taxicab_manhattan_2d.py
Taxicab/Manhattan distance N dimensional - taxicab_manhattan_nd.py
Eucledian distance 2 dimensional - euclidean_2d.py
Eucledian distance N dimensional - euclidean_nd.py

Thank you for reading!