Volume Optimal Cycle: Tightest representative cycle of a generator on persistent homology

This paper shows a mathematical formalization, algorithms and computation software of volume optimal cycles, which are useful to understand geometric features shown in a persistence diagram. Volume optimal cycles give us concrete and optimal homologous structures, such as rings or cavities, on a given data. The key idea is the optimality on $(q + 1)$-chain complex for a $q$th homology generator. This optimality formalization is suitable for persistent homology. We can solve the optimization problem using linear programming. For an alpha filtration on $\mathbb{R}^n$, volume optimal cycles on an $(n-1)$-th persistence diagram is more efficiently computable using merge-tree algorithm. The merge-tree algorithm also gives us a tree structure on the diagram and the structure has richer information. The key mathematical idea is Alexander duality.


Introduction
Topological Data Analysis (TDA) [1,2], which clarifies the geometric features of data from the viewpoint of topology, is developed rapidly in this century both in theory and application. In TDA, persistent homology and its persistence diagram (PD) [3,4] are important tool for TDA. Persistent homology enables us to capture multiscale topological features effectively and quantitatively. Fast computation softwares for persistent homology are developed [5,6] and many applications are achieved such as materials science [7,8,9], sensor networks [10], evolutions of virus [11], and so on. From the viewpoint of data analysis, a PD has some significant properties: translation and rotation invariance, multiscalability and robustness to noise. PDs are considered to be compact descriptors for complicated geometric data.
qth homology H q encodes q dimensional geometric structures of data such as connected components (q = 0), rings (q = 1), cavities (q = 2), etc. qth persistent homology encodes the information about q dimensional geometric structures with their scale. A PD, a multiset 1 in R × (R ∪ {∞}), is used to summarize the information. Each point in a PD is called a birth-death pair, which represents a homologous structure in the data, and the scale is encoded on x-axis and y-axis.
Typical workflow of the data analysis with persistent homology is as follows: (1) Construct a filtration from data • Typical input data is a point cloud, a finite set of points in R n and a typical filtration is an alpha filtration (2) Compute the PD from the filtration (3) Analyze the PD to investigate the geometric features of the data In the last part of the above workflow, we often want to inversely reconstruct a geometric structure corresponding each birth-death pair on the PD, such as a ring or a cavity, into the original input data. Such inverse analysis is practically important for the use of PDs. For example, we consider the 1st PD shown in Fig. 1 from the atomic configuration of amorphous silica computed by molecular dynamical 1 A multiset is a set with multiplicity on each point.  Figure 1. The 1st PD for the atomic configuration of amorphous silica in [7], reproduced from the simulation data. The data is provided by Dr. Nakamura. simulation [7]. In this PD, there are some characteristic bands C P , C T , C O , B O , and these bands correspond to typical geometric structures in amorphous silica. To analyze the PD more deeply, we want to reconstruct rings corresponding such birth-death pairs in the original data. In the paper, optimal cycles, one of such inverse analysis methods, are effectively used to clarify such typical structures. A representative cycle of a generator of the homology vector space has such information, but it is not unique and we want to find a better cycle to understand the homology generator for the analysis of a PD. For example, Fig. 2(a) has one homology generator on H 1 , and cycles z 1 , z 2 , and z 3 shown in Fig. 2 (b), (c), and (d) are the same homologous information. However, we consider that z 3 is best to understand the homology. Optimization problems on homology are used to find such a representative cycle. We can find the "tightest" representative cycle under a certain formalization. Such optimization problems have been widely studied under various settings [12,13,14], and two concepts, optimal cycles [15] and volume optimal cycles [16], have been successfully applied to persistent homology. The optimal cycle minimizes the size of the cycle, while the volume optimal cycle minimizes the internal volume of the cycle. Both of these two methods give a tightest cycle in different senses. The volume optimal cycles for persistent homology have been proposed in [16] under the restriction of dimension. We can use them only for (n − 1)-th persistent homology embedded in R n , but under the restriction, there is an efficient computation algorithm using Alexander duality.
In this paper, we generalize the concept of volume optimal cycles on any persistent homology and show the computation algorithm. The idea in [16] is not applied to find a volume optimal ring (a volume optimal cycle for q = 1) in a point cloud in R 3 but our method is applicable to such a case. In that case, optimal cycles are also applicable, but our new algorithm is simpler, faster for large data, and gives us better information.
The contributions of this paper are as follows: • The concept of volume optimal cycles is proposed to identify good representatives of generators in persistent homology. This is useful to understand a persistence diagram.
-The concept has been already proposed in [16] in a strongly limited sense about dimension and this paper generalize it. -Optimal cycles are also usable for the same purpose, but the algorithm in this paper is easier to implement, faster, and gives better information.
* Especially, children birth-death pairs shown in Section 6 are available only with volume optimal cycles. • Mathematical properties of volume optimal cycles are clarified.
• Effective computation algorithms for volume optimal cycles are proposed.
• The algorithm is implemented and some examples are computed by the program to show the usefulness of volume optimal cycles.
The rest of this paper is organized as follows. The fundamental ideas such as persistent homology and simplicial complexes are introduced in Section 2. In Section 3 the idea of optimal cycles is reviewed. Section 4 is the main part of the paper. The idea of volume optimal cycles and the computation algorithm in a general setting are presented in Section 4. Some mathematical properties of volume optimal cycles are also shown in this section. In Section 5 we show some special properties of (n − 1)-th persistent homology in R n and the faster algorithm. We also explain tree structures in (n − 1)-th persistent homology. In Section 6, we compare volume optimal cycles and optimal cycles. In Section 7 we show some computational examples by the proposed algorithms. In Section 8, we conclude the paper.

Persistent homology
In this section, we explain some preliminaries about persistent homology and geometric models. Persistent homology is available on various general settings, but we mainly focus on the persistent homology on a filtration of simplicial complexes, especially an alpha filtration given by a point cloud.
That is, X t ⊂ X t holds for every t ≤ t . Then we define qth homology vector spaces {H q (X t )} t∈T whose coefficient is a field k and homology maps ϕ t s : is called the qth persistent homology. The theory of persistent homology enables us to analyze the structure of this family.
Under some assumptions, H q (X) is uniquely decomposed as follows [3,4]: consists of a family of vector spaces and linear maps: This means that for each I(b i , d i ) there is a q dimensional hole in X and it appears at t = b i , persists up to t < d i and disappears at t = d i . In the case of d i = ∞, the q dimensional hole never disappears on X. b i is called a birth time, d i is called a death time, and the pair (b i , d i ) is called a birth-death pair. When X is a filtration of finite simplicial/cell/cubical complexes on T with #T < ∞ (we call X a finite filtration under the condition), such a unique decomposition exists. When we have the unique decomposition, the qth persistence diagram of X, D q (X), is defined by a multiset and the 2D scatter plot or the 2D histogram of D q (X) is often used to visualize the diagram.
We investigate the detailed algebraic structure of persistent homology for the preparation. For simplicity, we assume the following condition on X.
Under the condition, is a filtration of complexes. For a general finite filtration, we can construct a filtration satisfying Condition 1 by ordering all simplices properly. Let ∂ q : C q (X) → C q−1 (X) be the boundary operator on C q (X) and ∂ (k) q : C q (X k ) → C q−1 (X k ) be a boundary operator of C q (X k ). Cycles Z q (X k ) and boundaries B q (X k ) are defined by the kernel of ∂ (k) q and the image of ∂ (k) q+1 , and qth homology vector spaces are defined by H q (X k ) = Z q (X k )/B q (X k ). Condition 1 says that if σ k is a q-simplex, holds. From the decomposition theorem and (2), for each birth-death pair where i=1 . An algorithm of computing a PD actually finds persistence cycles from a given filtration. The persistence cycle of (b i , d i ) is not unique, therefore we want to find a "good" persistence cycle to find out the geometric structure corresponding to each birth-death pair. That is the purpose of the volume optimal cycle, which is the main topic of this paper. We remark that the condition (7) can be easily proved from (3)(4)(5)(6) and the decomposition theorem, and hence we only need to show (3)(4)(5)(6) to prove that given {z i } p i=1 are persistence cycles.
2.2. Alpha filtration. One of the most used filtrations for data analysis using persistent homology is an alpha filtration [2,17]. An alpha filtration is defined from a point cloud, a set of finite points P = {x i ∈ R n }. The alpha filtration is defined as a filtration of alpha complexes and they are defined by a Voronoi diagram and a Delaunnay triangulation.
The Voronoi diagram for a point cloud P , which is a decomposition of R n into Voronoi cells The Delaunnay triangulation of P , Del(P ), which is a simplicial complex whose vertices are points in P , is defined by where [x i0 · · · x iq ] is the q-simplex whose vertices are x i0 , . . . , x iq ∈ P . Under the assumption of general position in the sense of [17], the Delaunnay triangulation is a simplicial decomposition of the convex hull of P and it has good geometric properties. The alpha complex Alp(P, r) with radius parameter r ≥ 0, which is a subcomplex of Del(P ), is defined as follows: where B r (x) is the closed ball whose center is x and whose radius is r. A significant property of the alpha complex is the following homotopy equivalence to the r-ball model.
where |Alp(P, r)| is the geometric realization of Alp(P, r). The alpha filtration for P is defined by {Alp(P, r)} r≥0 . Figure 3 illustrates an example of a filtration by r-ball model and the corresponding alpha filtration. The 1st PD of this filtration is {(r 2 , r 5 ), (r 3 , r 4 )}. Since there are r 1 < · · · < r K such that Alp(P, s) = Alp(P, t) for any r i ≤ s < t < r i+1 , we can treat the alpha filtration as a finite filtration Alp(P, r 1 ) ⊂ · · · ⊂ Alp(P, r K ). We mention an weighted alpha complex and its filtration [18]. An alpha complex is topologically equivalent to the union of r-balls, while an weighted alpha complex is topologically equivalent to the union of √ r 2 + α i -balls, where α i depends on each point. The weighted alpha filtration is useful to study the geometric structure of a point cloud whose points have their own radii. For example, for the analysis of atomic configuration, the square of ionic radii or Van der Waals radii are used as α i .

Optimal cycle
First, we discuss an optimal cycle on normal homology whose coefficient is k = Z 2 . Figure 2(a) shows a simplicial complex whose 1st homology vector space H 1 is isomorphic to Z 2 . In Fig. 2(b), (c), and (d), z 1 , z 2 , and z 3 have same information about H 1 . That is, ] . However, we intuitively consider that z 3 is the best to represent the hole in Fig. 2 since z 3 is the shortest loop in these loops. Since the size of a loop z = σ:1−simplex α σ σ ∈ Z 1 (X) is equal to #{σ : 1-simplex | α σ = 0}, and this is 0 "norm" 23 , we write it z 0 . Here, z 3 is the solution of the following problem: The minimizing z is called the optimal cycle for z 1 . From the definition of homology, we can rewrite the problem as follows: (8) minimize z 0 , subject to: Now we complete the formalization of the optimal cycle on a simplicial complex with one hole.
How about the case if a complex has two or more holes? We consider the example in Fig. 4. From z 1 and z 2 , we try to find z 1 and z 2 using a similar formalization. If we apply the optimization (8) to each z 1 and z 2 , z 1 and z 2 are found. How can we find z 1 from z 1 and z 2 ? The problem is a hole represented by z 2 , therefore we "fill" that hole and solve the minimization problem. Mathematically, filling a hole corresponds to considering Z 1 (X)/(B 1 (X) ⊕ z 2 ) instead of Z 1 (X)/B 1 (X) and the following optimization problem gives us the required loop z 1 . minimize z 0 , subject to: 2 For a finite dimensional R-or C-vector space whose basis is {g i } i , the 0 norm · 0 is defined by i α i g i 0 = #{i | α i = 0}. Mathematically this is not a norm since it is not homogeneous, but in information science and statistics, it is called 0 norm. 3 On a Z 2 -vector space, any norm is not defined mathematically, but it is natural that we call this 0 norm.
When you have a complex that has many holes, you can apply the idea repeatedly to find all optimal cycles. The idea of optimal cycles obviously applied qth homology for any q.
3.1. How to compute an optimal cycle. Finding a basis of a homology vector space is not a difficult problem for a computer. We prepare a matrix representation of the boundary operator and apply matrix reduction algorithm. Please read [19] for the detailed algorithm. Therefore the problem is how to solve the above minimizing problem.
In general, solving a optimization problem on a Z 2 linear space is a difficult problem. The problem is a kind of combinatorial optimization problems. They are well studied but it is well known that such a problem is sometimes hard to solve on a computer.
One approach is using linear programming, used in [20]. Since optimization problem on Z 2 is hard, we use R as a coefficient. For R coefficient, 0 norm also means the size of loop and 0 optimization is natural for our purpose. However, 0 optimization is also a difficult problem. Therefore we replace 0 norm to 1 norm. It is well known in the fields of sparse sensing and machine learning that 1 optimization gives a good approximation of 0 optimization. That is, we solve the following optimization problem instead of (8).
minimize z 1 , subject to: This is called a linear programming and we can solve the problem very efficiently by good solvers such as cplex 4 and Clp 5 . Another approach is using integer programming, used in [12,15]. 1 norm optimization gives a good approximation, but maybe the solution is not exact. However, if all coefficients are restricted into 0 or ±1 in the optimization problem (9), the 0 norm and 1 norm is identical, and it gives a better solution. This restriction on the coefficients has another advantage that we can understand the optimal solution in more intuitive way. Such an optimization problem is called integer programming. Integer programming is much slower than linear programming, but some good solvers such as cplex and Clp are available for integer programming.

3.2.
Optimal cycle for a filtration. Now, we explain optimal cycles on a filtration to analyze persistent homology shown in [15]. We start from the example  In the filtration, a hole [z 1 ] appears at X 2 and disappear at X 3 , another hole [z 2 ] appears at X 4 and [z 3 ] appears at X 5 . The 1st PD of the filtration is {(2, 3), (4, ∞), (5, ∞)}.
The persistence cycles z 1 , z 2 , z 3 , are computable by the algorithm of persistent homology and we want to find z 3 or z 3 to analyze the hole corresponding to the birth-death pair (5, ∞). The hole [z 1 ] has been already dead at X 5 and [z 2 ] remains alive at X 5 , so we can find z 3 or z 3 to solve the following optimization problem: minimize z 0 subject to: In this case, z 3 is chosen because z 3 0 > z 3 0 . By generalizing the idea, we show Algorithm 1 to find optimal cycles for a filtration X 6 . Of course, to solve the optimization problem in Algorithm 1, we can use the computation techniques shown in Section 3.1.

Algorithm 1 Computation of optimal cycles on a filtration
Compute D q (X) and persistence cycles z 1 , . . . , z n Choose (b i , d i ) ∈ D q (X) by a user Solve the following optimization problem minimize z 1 , subject to:

Volume optimal cycle
In this section, we propose volume optimal cycles, a new tool to characterize generators appearing in persistent homology. In this section, we will show the generalized version of volume optimal cycles and the computation algorithm. The limited version of volume optimal cycles shown in [16] will be explained in the next section.
We assume Condition 1 and consider the filtration X : k is the linear map on C q (X) satisfying σ * k (σ j ) = δ kj for any σ k , σ j : q-simplex.
Note that the persistent volume is defined only if the death time is finite. The volume optimal cycle for (b i , d i ) and the optimal volume for the pair are defined as follows.
Definition 3. ∂ẑ is the volume optimal cycle andẑ is the optimal volume for (b i , d i ) ∈ D q (X) ifẑ is the solution of the following optimization problem. minimize z 0 , subject to (10), (11), and (12).
The following theorem ensures that the optimization problem of the volume optimal cycle always has a solution.
Theorem 4. There is always a persistent volume of any The following theorem ensures that the volume optimal cycle is good to represent the homology generator corresponding to (b i , d i ).
Intuitively say, a homology generator is dead by filling the internal volume of a ring, a cavity, etc., and a persistent volume is such an internal volume. The volume optimal cycle minimize the internal volume instead of the size of the cycle.
Proof of Theorem 5. We prove the following arguments. The theorem follows from these arguments.
The condition (11), τ * (∂z i ) = 0 for all τ ∈ F q , means ∂z i ∈ Z q (X bi ). The condition (12) and this finishes the proof. 4.1. Algorithm for volume optimal cycles. To compute the volume optimal cycles, we can apply the same strategies as optimal cycles. Using linear programming with R coefficient and 1 norm is efficient and gives sufficiently good results. Using integer programming is slower, but it gives better results. Now we remark the condition (12). In fact it is impossible to handle this condition by linear/integer programming directly. We need to replace this condition to |σ * bi (∂z)| ≥ for sufficiently small > 0 and we need to solve the optimization problem twice for σ * bi (∂z) ≥ and σ * bi (∂z) ≤ − . However, as mentioned later, we can often remove the constraint (12) to solve the problem and this fact is useful for faster computation.
We can also apply the following heuristic performance improvement technique to the algorithm for an alpha filtration by using the locality of an optimal volume. The simplices which contained in the optimal volume for (b i , d i ), are contained in a neighborhood of σ di . Therefore we take a parameter r > 0, and we use F is the ball of radius r whose center is the centroid of σ di . Obviously, we cannot find a solution with a too small r. In Algorithm 2, r is properly chosen by a user but the computation software can automatically increase r when the optimization problem cannot find a solution.
We also use another heuristic for faster computation. To treat the constraint (12), we need to apply linear programming twice for positive case and negative case. In many examples, the optimized solution automatically satisfies (12) even if we remove the constrain. There is an example in which the corner-cutting does not work (shown in 4.2), but it works well in many cases. One way is that we try to solve the linear programming without (12) and check the (12), and if (12) is satisfied, output the solution. Otherwise, we solve the linear programming twice with (12).
The algorithm to compute a volume optimal cycle for an alpha filtration is Algorithm 2.
Algorithm 2 Algorithm for a volume optimal cycle procedure Volume-Optimal-Cycle(X, r) Compute the persistence diagram D q (X) Choose a birth-death pair (b i , d i ) ∈ D q (X) by a user Solve the following optimization problem: minimize z 1 , subject to: if we find the optimal solutionẑ then if σ * bi (∂ẑ) = 0 then returnẑ and ∂ẑ else Retry optimization twice with the additional constrain: else return the error message to the user to choose larger r.
If your filtration is not an alpha filtration, possibly you cannot use the locality technique. However, in that case, the core part of the algorithm works fine and you can use the algorithm.

4.2.
Some properties about volume optimal cycles. In this subsection, we remark some properties about volume optimal cycles.
First, the volume optimal cycle for a birth-death pair is not unique. Figure 6 shows such an example. In this example, D 1 = {(1, 5), (3,4), (2, 6)} and both (b) and (c) is the optimal volumes of the birth-death pair (2,6). In this filtration, any weighted sum of (b) and (c) with weight λ and 1 − λ (0 ≤ λ ≤ 1) in the sense of chain complex is the volume optimal cycle of (2, 6) if we use R as a coefficient and 1 norm. However, standard linear programing algorithms choose an extremal point solution, hence the algorithms choose either λ = 0 or λ = 1 and our algorithm outputs either (b) or (c). Second, by the example in Fig 7, we show that the optimization problem for the volume optimal cycle may give a wrong solution if the constrain (12) is removed.
are birth-death pairs in the 1st PD, and the volume optimal cycle for (b 1 , d 1 ) is (α) in Fig. 7, but the algorithm gives (β) if the constrain (12) is removed.

Volume optimal cycle on (n − 1)-th persistent homology
In this section, we consider a triangulation of a convex set in R n and its (n−1)-th persistent homology. More precisely, we assume the following conditions. Condition 6. A simplicial complex X in R n satisfies the following conditions.
• Any k-simplex (k < n) in X is a face of an n-simplex • |X| is convex For example, an alpha filtration satisfies the above conditions if the point cloud has more than n points and satisfies the general position condition. In addition, we assume Condition 1 to simplify the statements of results and algorithms.
The thesis [16] pointed out that (n − 1)-th persistent homology is isomorphic to 0th persistent cohomology of the dual filtration by the Alexander duality under the assumption. By using this fact, the thesis defined volume optimal cycles under different formalization from ours. The thesis defined a volume optimal cycle as an output of Algorithm 3. In fact, the two definitions of volume optimal cycles are equivalent on (n − 1)-th persistent homology. 0th persistent cohomology is deeply related to the connected components, and we can compute the volume optimal cycle by linear computation cost. The thesis also pointed out that (n − 1)-th persistent homology has a tree structure called persistence trees (or PH trees).
In this section, we always use Z 2 as a coefficient of homology since using Z 2 makes the problem easier.
The following theorems hold.
Theorem 7. The optimal volume for (b i , d i ) ∈ D n−1 (X) is uniquely determined.
Theorem 8. If z i and z j are the optimal volumes for two different birth-death pairs (b i , d i ) and (b j , d j ) in D n−1 (X), one of the followings holds: Note that we can naturally regard any z = σ:n-simplex k σ σ ∈ C n (X) as a subset of n-simplices of X, {σ : n-simplex | k σ = 0}, since we use Z 2 as a homology coefficient.
From Theorem 8, we know that D n−1 (X) can be regarded as a forest (i.e. a set of trees) by the inclusion relation. The trees are called persistence trees.
We can compute all optimal volumes and persistence trees on D n−1 (X) by the merge tree algorithm (Algorithm 3). This algorithm is a modified version of the algorithm in [16]. To describe the algorithm, we prepare a directed graph (V, E) where V is a set of nodes and E is a set of edges. In the algorithm, an element of V is a n-cell in X ∪ {σ ∞ } and an element of E is a directed edge between two n-cells, where σ ∞ = R n \X is the n-cell in the one point compactification space R n ∪ {∞} S n . An edge has extra data in Z and we write the edge from σ to τ with extra data k as (σ k − → τ ). Since the graph is always a forest through the whole algorithm, we always find a root of a tree which contains a n-cell σ in the graph (V, E) by recursively following edges from σ. We call this procedure Root(σ, V, E).
The following theorem gives us the interpretation of the result of the algorithm to the persistence information. initialize V = {σ ∞ } and E = ∅ for k = K, . . . , 1 do if σ k is a n-simplex then add σ k to V else if σ k is a (n − 1)-simplex then let σ s and σ t are two n-cells whose common face is σ k The theorems in this section can be proven from the following facts: • From Alexander duality, for a simplicial complex X in R n , holds.
-∞ is required for one point compactification of R n .
-More precisely, we use the dual decomposition of X. • By applying above Alexander duality to a filtration, (n − 1)-th persistent homology is isomorphic to 0-th persistent cohomology of the dual filtration. • On a cell complexX, a basis of 0-th cohomological vector space is given by where cc(X) is the connected component decomposition of 0-cells inX.
• Merge-tree algorithm traces the change of connectivity in the filtration, and it gives the structure of 0-the persistent cohomology. We prove the theorems in Appendix A.

5.1.
Computation cost for merge-tree algorithm. In the algorithm, we need to find the root from its descendant node. The naive way to find the root is following the graph step by step to the ancestors. In the worst case, the time complexity of the naive way is O (N ) where N is the number of of n-simplices, and total time complexity of the algorithm becomes O(N 2 ). The union-find algorithm [21] is used for a similar data structure, and we can apply the idea of union-find algorithm. By adding a shortcut path to the root in a similar way as the union-find algorithm, the amortized time complexity is improved to almost constant time 7 . Using the technique, the total time complexity of the Algorithm 3 is O(N ). 7 More precisely, the amortized time complexity is bounded by the inverse of Ackermann function and it is less than 5 if the data size is less than 2 2 2 2 16 . Therefore we can regard the time complexity as constant.

Comparison between volume optimal cycles and optimal cycles
In this section, we compare volume optimal cycles and optimal cycles. In fact, optimal cycles and volume optimal cycles are identical in many cases. However, since we can use optimal volumes in addition to volume optimal cycles, we have more information than optimal cycles. One of the most prominent advantage of volume optimal cycles is children birth-death pairs, explained below. 6.1. Children birth-death pairs. In the above section, we show that there is a tree structure on an (n−1)-th persistence diagram computed from a triangulation of a convex set in R n . Unfortunately, such a tree structure does not exist in a general case. However, in the research of amorphous solids by persistent homology [7], a hierarchical structure of rings in R 3 is effectively used, and it will be helpful if we can find such a structure on a computer. In [7], the hierarchical structure was found by computing all optimal cycles and searching multiple optimal cycles which have common vertices. However, computing all optimal cycles or all volume optimal cycles is often expensive as shown in Section 7.4 and we require a cheaper method. The optimal volume is available for that purpose. When the optimal volume for a birth-death pair (b i , d i ) isẑ = σ di + σ k ∈Fq+1α k σ k , the children birth-death pairs of (b i , d i ) is defined as follows: This is easily computable from a optimal volume with low computation cost. Now we remark that if we consider (n − 1)-th persistent homology in R n , the children birth-death pairs of (b i , d i ) ∈ D n−1 (X) is identical to all descendants of (b i , d i ) in the tree structure. This fact is known from Theorem 8. This fact suggests that we can use children birth-death pairs as a good substitute for the tree structure appearing on D n−1 (X) in R n . The ability of children birth-death pairs is shown in Section 7.2, the example of amorphous silica.

6.2.
Some examples in which volume optimal cycles and optimal cycles are different. We show some differences between optimal cycles and volume optimal cycles on a filtration. In Fig 8, the 1st PD of this filtration is {(2, 5), (3, 4)}. The optimal cycle of (3, 4) is z 1 since z 1 1 < z 2 1 but the volume optimal cycle is z 2 . In this example, z 2 is better than z 1 to represent the birth-death pair (3,4). The example is deeply related to Theorem 5. Such a theorem does not hold for optimal cycles and it means that an optimal cycle may give misleading information about a birth-death pair. This is one advantage of volume optimal cycles compared to optimal cycles. Figure 8. A filtration whose optimal cycle and volume optimal cycle are different.
In Fig. 9 and Fig. 10, optimal cycles and volume optimal cycles are also different. In Fig. 9, the optimal cycle is z 1 but the volume optimal cycle z 2 . In Fig. 10, the optimal cycle for (3, 4) is z 1 but the volume optimal cycle is z 1 + z 2 .
In Fig 11, the 1st PD is (2, ∞) and we cannot define the volume optimal cycle but can define the optimal cycle. In general, we cannot define the volume optimal cycle for a birth-death pair with infinite death time. If we use an alpha filtration in R n , such a problem doest not occur because a Delaunnay triangulation is always Figure 9. Another filtration whose optimal cycle and volume optimal cycle are different. Figure 10. Another filtration whose optimal cycle and volume optimal cycle are different.
acyclic. But if we use another type of a filtration, we possibly cannot use volume optimal cycles. That may be a disadvantage of volume optimal cycles if we use a filtration other than an alpha filtration, such as a Vietoris-Rips filtration. Figure 11. A filtration without a volume optimal cycle. One more advantage of the volume optimal cycles is the simplicity of the computation algorithm. For the computation of the optimal cycles we need to keep track of all persistence cycles but for the volume optimal cycles we need only birth-death pairs. Some efficient algorithms implemented in phat and dipha do not keep track of such data, hence we cannot use such softwares to compute the optimal cycles without modification. By contrast we can use such softwares for the computation of the volume optimal cycles.

Example
In this section, we will show the example results of our algorithm. In all of these examples, we use alpha or weighted alpha filtrations.
For all of these examples, optimal volumes and volume optimal cycles are computed on a laptop PC with 1.2 GHz Intel(R) Core(TM) M-5Y71 CPU and 8GB memory on Debian 9.1. Dipha [5] is used to compute PDs, CGAL 8 is used to compute (weighted) alpha filtrations, and Clp [22] is used to solve the linear programming. Python is used to write the program and pulp 9 is used for the interface to Clp from python. Paraview 10 is used to visualize volume optimal cycles.
If you want to use the software, please contact with us. Homcloud 11 , a data analysis software with persistent homology developed by our laboratory, provides the algorithms shown in this paper. Homcloud provides the easy access to the volume optimal cycles. We can visualize the volume optimal cycle of a birth-death pair only by clicking the pair in a PD on Homcloud's GUI.   Figure 12 shows the 1st and 2nd PDs. The 1st PD has two birth-death pairs (0.001, 0.072) and (0.001, 0.453) and the 2nd PD has one birth-death pair (0.008, 0.081) far from the diagonal. These birth-death pairs correspond to generators of H 1 (T 2 ) k 2 and H 2 (T 2 ) k. Figure 13 shows the volume optimal cycles of these three birth-death pairs using Algorithm 2. Blue lines show volume optimal cycles, red lines show optimal volumes, black lines show σ d for each birth death pair (b, d) (we call this simplex the death simplex ). Black dots show the point cloud. By the figure, we understand how homology generators appear and disappear in the filtration of the torus point cloud. The computation times are 25sec, 33sec, and 7sec on our laptop PC. By using Algorithm 3, we can also compute volume optimal cycles in D 2 . In this example, the computation time by Algorithm 3 is about 2sec. This is much faster than Algorithm 2 even if Algorithm 3 computes all volume optimal cycles. 7.2. Amorphous silica. In this example, we use the atomic configuration of amorphous silica computed by molecular dynamical simulation as a point cloud and we try to reproduce the result in [7]. In this example, we use weighted alpha filtration whose weights are the radii of atoms. The number of atoms are 8100, 2700 silicon atoms and 5400 oxygen atoms. Figure 1 shows the 1st PD. This diagram have four characteristic areas C P , C T , C O , and B O . These areas correspond to the typical ring structures in the amorphous silica as follows. Amorphous silica consists of silicon atoms and oxygen atoms and the network structure is build by covalent bonds between silicons and   Figure 14 shows the volume optimal cycles for birth-death pairs in C P , C t , C O and B O . In this figure oxygen (red) and silicon (blue) atoms are also shown in addition to volume optimal cycles, optimal volumes, and death simplices. We can reproduce the result of [7] about ring reconstruction.
We also know that the oxygen atom rounded by the green circle in this figure is important to determine the death time. The death time of this birth-death pair is determined by the radius of circumcircle of the black triangle (the death simplex), hence if the oxygen atom moves away, the death time becomes larger. The oxygen atom is contained in another · · · −Si−O−Si−O−· · · ring structure around the volume optimal cycle (the blue ring). By the analysis of the optimal volume, we clarify that such an interaction of covalent bond rings determines the death times of birth-death pairs in C P . This analysis is impossible for the optimal cycles, and the volume optimal cycles enable us to analyze persistence diagrams more deeply. Figure 15 shows the children birth-death pairs of the green birth-death pair. The rings corresponding to these children birth-death pairs are subrings of the large ring corresponding to the green birth-death pair. This computation result shows that a ring in C P has subrings in C T , C O , and B O . The hierarchical structure of these rings shown in [7]. We can easily find such a hierarchical structure by using our new algorithm.
The computation time is 3 or 4 seconds for each volume optimal cycle on the laptop PC. The computation time for amorphous silica is much less than that for 2-torus even if the number of points in amorphous silica is larger than that in 2torus. This is because the locality of volume optimal cycles works very fine in the example of amorphous silica. 7.3. Face centered cubic lattice with defects. The last example uses the point cloud of face centered cubic (FCC) lattice with defects. By this example, we show how to use the persistence trees computed by Algorithm 3. The point cloud is prepared by constructing perfect FCC lattice, adding small Gaussian noise to each point, and randomly removing points from the point cloud. Figure 16.  In materials science, these cavities are famous as octahedron sites and tetrahedron sites. Figure 16(b) shows the 2nd PD of the lattice with defects. In the PD, birth-death pairs corresponding to octahedron and tetrahedron cavities remain ((i) and (ii) in Fig 16(b)), but other types of birth-death pairs appear in this PD. These pairs correspond to other types of cavities generated by removing points from the FCC lattice. Figure 17(a) shows a tree computed by Algorithm 3. Red markers are nodes of the tree, and lines between two markers are edges of the tree, where upper left nodes are ancestors and lower right nodes are descendants. The tree means that the largest cavity corresponding to most upper-left node has sub cavities corresponding descendant nodes. Figure 17(b) shows the volume optimal cycle of the most upperleft node, (c) shows the volume optimal cycles of pairs in (i), and (d) shows the volume optimal cycles of pairs in (ii). Using the algorithm, we can study the hierarchical structures of the 2nd PH. 7.4. Computation performance comparison with optimal cycles. We compare the computation performance between optimal cycles and volume optimal cycles. We use OptiPers for the computation of optimal cycles for persistent homology, which is provided by Dr. Escolar, one of the authors of [15]. OptiPers is written in C++ and our software is mainly written in python, and python is much slower than C++, so the comparison is not fair, but suggestive for the readers.
We use two test data. One test data is the atomic configuration of amorphous silica used in the above example. The number of points is 8100. Another data is the partial point cloud of the amorphous silica. The number of points is 881. We call these data the large data and the small data. Table 1 shows the computation time of optimal cycles/volume optimal cycles for all birth-death pairs in the 1st PD by OptiPers/Homcloud. optimal cycles (OptiPers) volume optimal cycles (Homcloud) the small data 1min 17sec 3min 9sec the large data 5hour 46min 4hour 13min Table 1. Computation time of optimal cycles and volume optimal cycles on the large/small data.
For the small data, OptiPers is faster than Homcloud, but to the contrary, for the large data, Homcloud is faster than OptiPers. This is because the performance improvement technique using the locality of the optimal volume works fine for the large data, but for the small data the technique is not so effective and the overhead cost using python is dominant for Homcloud. This benchmark shows that the volume optimal cycles have an advantage about the computation time when an input point cloud is large.

Conclusion
In this paper, we propose the idea of volume optimal cycles to identify good geometric realizations of homology generators appearing in persistent homology. Optimal cycles are proposed for that purpose in [15], but our method is faster for large data and gives better information. Especially, we can reasonably compute children birth-death pairs only from a volume optimal cycle. Volume optimal cycles are already proposed under the limitation of dimension in [16], and this paper generalize the idea.
Our idea and algorithm are widely applicable to and useful for the analysis of point clouds in R n by using the (weighted) alpha filtrations. Our method gives us intuitive understanding of PDs. In [23], such inverse analysis from a PD to its original data is effectively used to study many geometric data with machine learning on PDs and our method is useful to the combination of persistent homology and machine learning.
In this paper, we only treat simplicial complex, but our method is also applicable to a cell filtration and a cubical filtration. Our algorithms will be useful to study sublevel or superlevel filtrations given by 2D/3D digital images.

Appendix A. Proofs of Section 5
The theorems shown in this section are a kind of folklore theorems. Some researchers about persistent homology probably know the fact that the merge-tree algorithm gives a 0th PD, and the algorithm is available to compute an (n − 1)-th PD using Alexander duality, but we cannot find the literature for the complete proof. [16] stated that the algorithm also gives the tree structure on an (n − 1)-th PD, but the thesis does not have the complete proof. Therefore we will show the proofs here.
Alexander duality says that for any good topological subspace X of S n , the (k − 1)-th homology of X and (n − k)-th cohomology of S n \X have the same information. In this section, we show Alexander duality theorem on persistent homology. In this section, we always use Z 2 as a coefficient of homology and cohomology.
A.1. Persistent cohomology. The persistent cohomology is defined on a decreasing sequence Y : Y 0 ⊃ · · · ⊃ Y K of topological spaces. The cohomology vector spaces and the linear maps induced from inclusion maps define the sequence and this family of maps is called persistent cohomology H q (Y). The decomposition theorem also holds for persistent cohomology in the same way as persistent homology and we define the qth cohomologous persistence diagram D q (Y) using the decomposition.
A.2. Alexander duality. Before explaining Alexander duality, we show the following proposition about the dual decomposition.
Proposition 10. For any oriented closed n-manifold S and its simplicial decomposition K, there is a decomposition of M ,K, satisfying the followings: (1)K is a cell complex of M .
(2) There is a one-to-one correspondence between K andK. For σ ∈ K, we write the corresponding cell inK asσ. (3) dim σ = n − dimσ for any σ ∈ K. (4) If X ⊂ K a subcomplex of K, is a subcomplex ofK. (5) We consider the chain complex of K andK, let ∂ and∂ be boundary operators on those chain complexes, and let B andB be matrix representations of ∂ and∂, i.e. ∂σ i = j B ji σ j and∂σ i = jB jiσj . ThenB is the transpose of B.
This decompositionK is called the dual decomposition of K. One example of the dual decomposition is a Voronoi decomposition with respect to a Delaunnay triangulation. Using the dual decomposition, we can define the map θ from C k (K) to C n−k (K) for k = 0, . . . , n as the linear extension of The map θ satisfies the equation where δ is the coboundary operator on C * (K) from Proposition 10. The map θ induces the isomorphism H k (K) H n−k (K), and the isomorphism is called Poincaré duality.
Using the dual decomposition, we show Alexander duality theorem.
Theorem 11. For an n-sphere S n , its simplicial decomposition K, and a subcomplex of X ⊂ K, we take a dual decompositionK and a subcomplex ofX as in Proposition 10. Then,H k−1 (X) H n−k (X) (15) holds for any k = 1, . . . , n, whereH is the reduced (co)-homology.
To apply the duality theorem to persistent homology, we investigate that isomorphism in detail.
First, we consider the case of K = X. In this case,X = ∅ and the homology of X is the same as an n-sphere. Therefore,H k (X) = 0 for any k = 0, . . . , n − 1 and this is isomorphic toH n−k−1 (∅) = 0.
Next, we consider the case that K = X. In this case, there is a n-simplex of K which is not contained in X. We write the n-simplex as ω and let K 0 be K\{ω}.

Proposition 12.
There is the following isomorphism.
This isomorphism is induced by: The map θ is well-define and isomorphic since {σ s+1 , . . . ,σ t } is equal to the set of all (n − k)-simplices ofX. In addition, δ •θ =θ • ∂ holds where ∂ is the boundary operator on C * (K, X), and δ is the coboundary operator on C * (X) due to (14). Using the mapθ, the isomorphismθ * : H k (K, X) → H n−k (X) is defined as follows, The next key is the long exact sequence on the pair (K, X).
The map ∂ * is written as follows and j * is induced by the projection map from C k (K) to C k (K, X). If k = n, since bothH k (K) andH k−1 (K) are zero, The following map is isomorphic due to the long exact sequence (18).
By combining (16) and (20), we conclude the isomorphism (16) for k = n. We can explicitly write the isomorphism fromH n−k toH k−1 as follows using (17) and (19): When k = n, we need to treat the problem more carefully. From the long exact sequence (18), we can show that the following sequence is exact: Let {σ 1 , . . . , σ t−1 , σ t = ω} be n-simplices of K and {σ 1 , . . . , σ s } be n-simplices of X. From the assumption of X = K, s < t holds. It is easy to show that τ = σ 1 + · · · + σ t is the generator of Z n (K). From the definition of the reduced cohomology,H where Z 0 (X) = ker(δ : C 0 (X) → C 1 (X)) andτ =θ(j(τ )) =σ * s+1 + · · · +σ * t . For Z 2 coefficient, the following set is the basis of Z 0 (X) where cc(X) is the connected component decomposition of 0-cells inX. Therefore, we can writeH 0 (X) as:H where cc ω (X) ⊂ cc(X) is the set of connected components which do not containω. Using the above relations, we can showH 0 (X) H n−1 (X) whose isomorphism is the linear extension of the following: for all C ∈ cc ω (X).
A.3. Alexander duality and persistent homology. To apply Alexander duality to the persistent homology, we need to consider the relation between inclusion maps and the isomorphismθ * . For two subcomplex X 1 ⊂ X 2 of K, the following diagram commutes: where φ andφ ∨ are induced from the inclusion maps. Note that X 1 ⊂ X 2 induces X 1 ⊃X 2 andφ ∨ is defined from C n− (X 1 ) to C n− (X 2 ). Using (26), we have the following commutative diagram: We also have the following commutative diagram between two long exact sequences (28) From (27), (28), and the discussion in Section A.2, we have the following commutative diagram:H This diagram means that the isomorphism preserves the decomposition structure of persistent homology and henceH −1 (X) H n− (X) holds for X : X 0 ⊂ · · · ⊂ X K whereX :X 0 ⊃ · · · ⊃X K .
A.4. Alexander duality and a triangulation in R n . Here, we consider the simplicial filtration in R n satisfying Condition 6. Under the condition, we need to embed the filtration X on R n into S n by using one point compactification. We consider a embedding |X| → S n and take σ ∞ as S n \|X|. Using the embedding, we can regard X ∪ {σ ∞ } as a cell decomposition of S n . The above discussion about Alexander duality on persistent homology works on this cell complex, if we properly define the boundary operator and the dual decomposition. In that case, we regard σ ∞ as ω in the definition of K 0 .
A.5. Merge-Tree Algorithm for 0th Persistent Cohomology. The above discussion shows that only we need to do is to give an algorithm for computing 0th persistent cohomology of the dual filtration In fact, we can efficiently compute the 0th cohomologous persistence diagram using the following merge-tree algorithm.
To simplify the explanation of the algorithm, we assume the following condition. This condition corresponds to Condition 1 for persistent homology.
Under the condition, we explain the algorithm to compute the decomposition of 0th persistence cohomology on the decreasing filtration Y : Algorithm 4 computes the 0th cohomologous persistence diagram. In this algorithm, (V k , E k ) is a graph whose nodes are 0-cells of Y and whose edges have extra data in Z. Later we show this algorithm is applicable for computing D n−1 (X) using Alexander duality.
This algorithm tracks all {(V k , E k )} k=0,...,K for the mathematical proof, but when you implement the algorithm, you do not need to keep the history and you can directly update the set of nodes and edges.
Theorem 14. The 0th reduced cohomologous persistence diagramD 0 (Y) is given as follows:D To prove the theorem and justify the algorithm, we show some basic facts about the graph (V K , E k ) given by the algorithm. These facts are shown by checking the edges/nodes adding rule of each step in Algorithm 4.
Fact 16. For any k, (V k , E k ) is a forest, i.e. a set of trees. That is, the followings hold: • There is no loop in the graph • For any node, the number of outgoing edges from the node is zero or one.
-If the number is zero, the node is a root node -If the number is one, the node is a child node We can inductively prove Fact 16 since an edge is added between two roots of (V k , E k ) in the algorithm.
Fact 17. The topological connectivity of Y k is the same as (V k , T k ). That is, {σ i1 , . . . ,σ i } is all 0-simplices of a connected component in X k if and only if there is a tree in (V k , E k ) whose nodes are {σ i1 , . . . ,σ i }. This is because the addition of a node to the graph corresponds to the addition of a connected component in Y and the addition of an edge corresponds to the concatenation of two connected components.
Proof of Fact 18. The edgeσ s k − →σ t is added afterσ s k − →σ t is added in the algorithm since any edge is added between two root nodes, hence we have k < k. We also show that k < s < t and k < s < t from the rule of edge addition and this inequalities hold for any intermediate edge in the path, so we have s < s . The required inequality comes from these inequalities.
The following fact is shown since in the algorithm each edge is added between two root nodes.
Fact 19. Ifσ s is not a root of a tree in (V k , E k ), the subtree whose root node isσ s does not change in the sequence of graphs: (V k , E k ) ⊂ · · · ⊂ (V 0 , E 0 ).
Using these facts, we set up the 0th persistence cohomology. We prepare some symbols: R k = {σ s |σ s is a root of a tree in (V k , E k )}, desc k (σ s ) = {σ t : a descendant node ofσ s in (V k , E k ), includingσ s itself}, desc k (σ s ) = {σ t ∈ desc 0 (σ s ) | k ≤ t}, : the induced map of the inclusion map Y k ← − Y k+1 . We prove the following lemma.
Proof. From Fact 17 and the theory of 0th cohomology, we have that {y (k) s |σ s ∈ R k } are basis of H 0 (Y k ). Here, we prove the following three facts. Then the theory of linear algebra leads the statement of the lemma.
(i) is trivial. We show (ii). We can writeŷ (k) s explicitly by using the two graphs (V k , E k ) and (V 0 , E 0 ) by the following way. Let R k (σ s ) be R k (σ s ) = {σ t ∈ R k |σ t is a descendant ofσ s in (V 0 , E 0 ), includingσ s itself}.
Then we writeŷ (k) s = σt∈R k (σs) x (k) t . We can show the equation from the followings.
To prove the optimality of x (d) b , we show the following claim. Claim: If x is a persistent volume of (b, d), x (d) b ⊂ x holds. The claim immediately leads Theorem 7 and Theorem 9(ii). From Theorem 5, [∂x] b is well defined and nonzero. By the isomorphism for Alexander duality and Lemma 23(vi), there is R ⊂ R b such that From the relation (25) and the definition of x for any s with σ s ∈ R k . Therefore so there exists w ∈ C n (X b ) such that holds. Since X b is a simplicial complex embedded in R n , Z n (X b ) = 0 and s ∈ σ k : n-simplex | b < k ≤ d and w ∈ C n (X b ) = σ k : n-simplex | k < b , we have w = 0 and s .
From (10), x has always σ d term and so σ d ∈ R and finish the proof of the claim.
Theorem 8 and Theorem 9(iii) are immediately come from the definition of x (b) d and properties of the tree structure shown in Section A.5.