Choosing the parameters n_landmarks and cocycle_idx

The purpose of this example is to demonstrate the effect of different choices of the n_landmarks and cocycle_idx parameters.

The n_landmarks parameter

The n_landmarks parameter tells the algorithm how many points to sample from the original point cloud; the main computations are then performed on this sample. Sampling makes it possible to run the algorithm on large datasets, as long as n_landmarks is kept relatively small. Choosing n_landmarks is a trade-off between computation time and how well the sample represents the whole point cloud.
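The text does not say how the landmarks are chosen. Greedy maxmin (furthest-point) sampling is a common strategy for this kind of subsampling, and the sketch below assumes it purely for illustration. The helper `maxmin_landmarks` is hypothetical and not part of DREiMac's API. It shows the basic idea: each new landmark is the point furthest from all landmarks chosen so far, so a few points already spread out over the whole cloud.

```python
import numpy as np

def maxmin_landmarks(X, n_landmarks, seed_idx=0):
    """Hypothetical helper: greedy maxmin (furthest-point) sampling.

    Returns the indices of n_landmarks points of X; each new landmark is the
    point furthest from all landmarks chosen so far.
    """
    X = np.asarray(X, dtype=float)
    n_points = X.shape[0]
    if not 1 <= n_landmarks <= n_points:
        raise ValueError("n_landmarks must be between 1 and the number of points")
    idx = np.empty(n_landmarks, dtype=int)
    idx[0] = seed_idx
    # distance from every point to its nearest landmark chosen so far
    dist_to_landmarks = np.linalg.norm(X - X[seed_idx], axis=1)
    for k in range(1, n_landmarks):
        idx[k] = int(np.argmax(dist_to_landmarks))
        new_dist = np.linalg.norm(X - X[idx[k]], axis=1)
        dist_to_landmarks = np.minimum(dist_to_landmarks, new_dist)
    return idx

# Example: 1000 noisy points on a circle, reduced to 20 landmarks
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 1000)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((1000, 2))
landmarks = maxmin_landmarks(X, 20)
print(X[landmarks].shape)  # (20, 2)
```

Any subsequent computation would then run on 20 points instead of 1000, which is where the savings in computation time come from.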

The cocycle_idx parameter

Once the n_landmarks parameter has been chosen and the persistent cohomology of the landmarks has been computed, the cocycle_idx parameter tells the algorithm which persistent cohomology class to use to build the cohomological coordinates. Roughly speaking, short-lived classes (that is, classes represented by points close to the diagonal of the persistence diagram) represent small topological features. Consequently, cohomological coordinates built from these classes will be constant on large portions of the point cloud and will parametrize small topological features. Conversely, long-lived classes represent large-scale topological features. It is common to start with long-lived classes, since these are usually easier to identify and to interpret.
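To make "close to the diagonal" concrete: a class with birth \(b\) and death \(d\) has lifetime \(d - b\), and its point \((b, d)\) lies at distance \((d - b)/\sqrt{2}\) from the diagonal. The sketch below uses a made-up H1 diagram (not computed from any data in this example) and an arbitrary threshold, only to illustrate how short-lived and long-lived classes can be told apart.

```python
import numpy as np

# Made-up H1 persistence diagram: one (birth, death) row per class
dgm_h1 = np.array([
    [0.10, 0.15],   # short-lived: close to the diagonal
    [0.20, 1.90],   # long-lived
    [0.12, 0.20],   # short-lived
    [0.30, 1.10],   # long-lived
])

lifetimes = dgm_h1[:, 1] - dgm_h1[:, 0]
# Euclidean distance of each point (b, d) to the diagonal d = b
dist_to_diagonal = lifetimes / np.sqrt(2)

# Illustrative, arbitrary threshold separating small from large features
threshold = 0.5
long_lived = np.flatnonzero(lifetimes > threshold)
print(long_lived)  # [1 3]
```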

import matplotlib.pyplot as plt
from dreimac import CircularCoords, GeometryExamples, CircleMapUtils
from persim import plot_diagrams

We load a simple dataset consisting of three circles in \(\mathbb{R}^2\) of different radii.

X = GeometryExamples.three_circles()

plt.scatter(X[:,0],X[:,1], s = 10)
plt.gca().set_aspect("equal") ; _ = plt.axis("off")
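GeometryExamples.three_circles() is DREiMac's own generator, and its exact centers, radii and sample sizes are not stated here. To experiment without DREiMac, a similar dataset can be built with NumPy. The function `three_circles_standin` below is hypothetical, and its centers, radii, noise level and point counts are our own assumptions, chosen only so that the three circles are disjoint and have different radii.

```python
import numpy as np

def three_circles_standin(n_per_circle=200, noise=0.02, seed=0):
    """Hypothetical stand-in: three disjoint noisy circles in R^2 with different radii."""
    rng = np.random.default_rng(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (8.5, 0.0)]   # assumed centers
    radii = [1.0, 1.5, 2.0]                           # assumed radii
    circles = []
    for (cx, cy), r in zip(centers, radii):
        theta = rng.uniform(0, 2 * np.pi, n_per_circle)
        pts = np.column_stack([cx + r * np.cos(theta), cy + r * np.sin(theta)])
        # add a little Gaussian noise to each coordinate
        circles.append(pts + noise * rng.standard_normal(pts.shape))
    return np.vstack(circles)

X_standin = three_circles_standin()
print(X_standin.shape)  # (600, 2)
```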

Here, we display the persistence diagram of the landmarks for various choices of the n_landmarks parameter. When n_landmarks is too small, the persistence diagram does not reflect the topology of the data well; only when the parameter reaches about \(50\) do the three classes start to appear clearly.

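n_landmarks_choices = [10, 20, 50, 100, 200]

# One panel per choice, wide enough for the diagrams to be readable
plt.figure(figsize=(4 * len(n_landmarks_choices), 4))
for i, n_landmarks in enumerate(n_landmarks_choices):
    cc = CircularCoords(X, n_landmarks=n_landmarks)
    plt.subplot(1, len(n_landmarks_choices), i + 1)
    # _dgms (a private attribute) holds the persistence diagrams of the landmarks
    plot_diagrams(cc._dgms, title="n_landmarks = " + str(n_landmarks))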
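One way to quantify how representative the sample is (our choice of measure, not something DREiMac reports) is the covering radius: the largest distance from any point of the cloud to its nearest landmark. The sketch below computes it for the same n_landmarks choices, assuming greedy maxmin sampling via a hypothetical helper and using a single densely sampled circle as stand-in data. With maxmin sampling, the landmarks for a larger n_landmarks extend those for a smaller one, so the covering radius can only shrink as n_landmarks grows.

```python
import numpy as np

def maxmin_landmarks(X, n_landmarks):
    """Hypothetical helper: greedy maxmin sampling; returns indices and covering radius."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)  # distance to nearest landmark so far
    for _ in range(1, n_landmarks):
        idx.append(int(np.argmax(dist)))
        dist = np.minimum(dist, np.linalg.norm(X - X[idx[-1]], axis=1))
    # after the loop, dist[i] is the distance from X[i] to its nearest landmark
    return np.array(idx), dist.max()

# Stand-in data: 2000 points on the unit circle
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 2000)
X = np.column_stack([np.cos(theta), np.sin(theta)])

radii = {}
for n_landmarks in [10, 20, 50, 100, 200]:
    _, radii[n_landmarks] = maxmin_landmarks(X, n_landmarks)
    print(f"n_landmarks = {n_landmarks:3d}  covering radius = {radii[n_landmarks]:.3f}")
```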

We now fix n_landmarks \(= 100\) and focus on the cocycle_idx parameter.

We display the output circular coordinates for the three natural choices of cocycle_idx: the most persistent class (cocycle_idx \(=0\)), the second most persistent class (cocycle_idx \(=1\)), and the third most persistent class (cocycle_idx \(=2\)). DREiMac always indexes classes by their persistence, from most to least persistent.
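To illustrate this indexing convention, the sketch below orders the classes of a made-up H1 diagram from most to least persistent, so that position k in the resulting order plays the role of cocycle_idx \(= k\). It mirrors the convention stated above; it is not DREiMac's internal code, and the diagram values are invented.

```python
import numpy as np

# Made-up H1 diagram: three long-lived classes and two short-lived ones
dgm_h1 = np.array([
    [0.05, 0.12],
    [0.30, 1.20],
    [0.20, 2.40],
    [0.08, 0.10],
    [0.25, 1.75],
])

lifetimes = dgm_h1[:, 1] - dgm_h1[:, 0]
# Sort row indices by decreasing lifetime: order[k] is the row that
# would correspond to cocycle_idx = k under the convention in the text
order = np.argsort(-lifetimes, kind="stable")
for cocycle_idx, row in enumerate(order[:3]):
    print(cocycle_idx, dgm_h1[row], round(float(lifetimes[row]), 2))
```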

In this case, the circular coordinates returned by the algorithm precisely reflect the three circular features that are visible in the data.

cc = CircularCoords(X, n_landmarks=100)

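cocycle_idx_choices = [0, 1, 2]

plt.figure(figsize=(4 * len(cocycle_idx_choices), 4))
for i, cocycle_idx in enumerate(cocycle_idx_choices):
    # circle-valued coordinate built from the cocycle_idx-th most persistent class
    circular_coordinate = cc.get_coordinates(cocycle_idx=cocycle_idx)
    plt.subplot(1, len(cocycle_idx_choices), i + 1)
    # color each point by its circular coordinate using a cyclic colormap
    plt.scatter(X[:, 0], X[:, 1], s=10, c=CircleMapUtils.to_sinebow(circular_coordinate))
    plt.title("cocycle_idx = " + str(cocycle_idx))
    plt.gca().set_aspect("equal"); _ = plt.axis("off")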