CoDa.jl: Compositional data analysis in Julia

This package defines a Composition{D} type representing a D-part composition as defined by Aitchison (1986). In Aitchison's geometry, the D-simplex together with addition (a.k.a. perturbation) and scalar multiplication (a.k.a. scaling) forms a vector space, and important properties hold:

Scaling invariance
Perturbation invariance
Permutation invariance
Subcompositional coherence

In practice, this means that one can operate on compositional data (i.e. vectors whose entries represent parts of a total) without destroying the ratios of the parts.
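
Some of the properties listed above can be checked numerically. The snippet below is a minimal sketch in plain Julia that does not use the package: `closure` is a hypothetical helper written for this illustration, and ordinary vectors stand in for compositions.

```julia
# Closure: rescale positive parts so that they sum to one.
# (Hypothetical helper for illustration, not part of the CoDa.jl API.)
closure(x) = x ./ sum(x)

x = [2.0, 0.1, 0.3]   # parts measured in arbitrary units

# Scaling invariance: changing the units does not change the composition.
@assert closure(10 .* x) ≈ closure(x)

# Permutation invariance: reordering the parts only reorders the result.
perm = [3, 1, 2]
@assert closure(x[perm]) ≈ closure(x)[perm]

# Subcompositional coherence: the ratio between two parts is the same in
# the full composition and in a subcomposition that contains both parts.
full = closure(x)
sub = closure(x[1:2])
@assert full[1] / full[2] ≈ sub[1] / sub[2]
```

The checks rely only on ratios between parts, which is why they hold regardless of the total.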

Installation

Get the latest stable release with Julia's package manager:

] add CoDa

Usage

Basics

Compositions are static vectors with named parts:

julia> using CoDa

julia> c = Composition(CO₂=2.0, CH₄=0.1, N₂O=0.3)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.0   
   CH₄ ┤■■ 0.1                                    
   N₂O ┤■■■■■ 0.3                                 
       └                                        ┘ 

julia> CoDa.parts(c)
(:CO₂, :CH₄, :N₂O)

julia> CoDa.components(c)
3-element StaticArrays.SVector{3, Union{Missing, Float64}} with indices SOneTo(3):
 2.0
 0.1
 0.3

julia> c.CO₂
2.0

Default part names are added when none are given:

julia> Composition(1.0, 0.1, 0.1)
                     3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   w2 ┤■■■■ 0.1                                  
   w3 ┤■■■■ 0.1                                  
      └                                        ┘

and serve for internal compile-time checks.

Compositions can be added, subtracted, negated, and multiplied by scalars. Other operations are also defined, including the dot product, induced norm, and distance:

julia> cₒ = Composition(CO₂=1.0, CH₄=0.1, N₂O=0.1)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   CH₄ ┤■■■■ 0.1                                  
   N₂O ┤■■■■ 0.1                                  
       └                                        ┘ 

julia> -cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■ 0.047619047619047616                   
   CH₄ ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
   N₂O ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
       └                                        ┘ 

julia> 0.5c
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■ 0.6207690197922022   
   CH₄ ┤■■■■ 0.13880817265812764                  
   N₂O ┤■■■■■■■■ 0.24042280754967013              
       └                                        ┘ 

julia> c - cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■ 0.3333333333333333  
   CH₄ ┤■■■■■■■■■■■■ 0.16666666666666666          
   N₂O ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5   
       └                                        ┘ 

julia> c ⋅ cₒ
3.7554028908352994

julia> norm(c)
2.1432393747688687

julia> aitchison(c, cₒ) # Aitchison distance
0.7856640352007868
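
The numbers displayed above can be reproduced with a few lines of plain Julia. This is a sketch of the underlying Aitchison operations on ordinary vectors, not the package's implementation; all function names (`perturb`, `power`, `negate`, `inner`, `anorm`, `adist`) are hypothetical and chosen for this illustration, and `c0` stands in for `cₒ`.

```julia
using Statistics: mean

closure(x) = x ./ sum(x)

# Vector-space operations on the simplex (hypothetical names).
perturb(x, y) = closure(x .* y)   # addition: x + y
power(λ, x) = closure(x .^ λ)     # scalar multiplication: λ * x
negate(x) = closure(1 ./ x)       # additive inverse: -x

# Centered log-ratio coordinates, used to define the inner product.
clr_sketch(x) = log.(x) .- mean(log.(x))

inner(x, y) = sum(clr_sketch(x) .* clr_sketch(y))   # dot product
anorm(x) = sqrt(inner(x, x))                        # induced norm
adist(x, y) = anorm(perturb(x, negate(y)))          # Aitchison distance

c = [2.0, 0.1, 0.3]
c0 = [1.0, 0.1, 0.1]

println(negate(c0))               # compare with -cₒ
println(power(0.5, c))            # compare with 0.5c
println(perturb(c, negate(c0)))   # compare with c - cₒ
println(inner(c, c0))             # compare with c ⋅ cₒ
println(anorm(c))                 # compare with norm(c)
println(adist(c, c0))             # compare with aitchison(c, cₒ)
```

Every result is closed to sum to one, which matches the displays of the derived compositions above.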

More complex functions can be defined in terms of these operations. For example, the function below defines the composition line passing through cₒ in the direction of c:

julia> f(λ) = cₒ + λ*c
f (generic function with 1 method)

Finally, two compositions are considered to be equal when their closures are approximately equal:

julia> c == c
true

julia> c == cₒ
false

Log-ratio transformations

Currently, the following log-ratio transformations are implemented:

julia> alr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
  1.8971199848858813
 -1.0986122886681096

julia> clr(c)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
  1.6309507528132907
 -1.3647815207407001
 -0.2661692320725906

julia> ilr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 -2.1183026052494185
 -0.3259894019031434

and their inverses alrinv, clrinv and ilrinv.
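
For readers who want to see what these transformations compute, here is a sketch in plain Julia on ordinary vectors. The `*_sketch` names are hypothetical. The orthonormal basis used for `ilr_sketch` is an assumption: it was chosen because it reproduces the values shown above, and the package may construct its basis differently.

```julia
using Statistics: mean

closure(x) = x ./ sum(x)

# Additive log-ratio: log of each part relative to the last part.
alr_sketch(x) = log.(x[1:end-1] ./ x[end])
alrinv_sketch(y) = closure(vcat(exp.(y), 1.0))

# Centered log-ratio: log of each part relative to the geometric mean.
clr_sketch(x) = log.(x) .- mean(log.(x))
clrinv_sketch(y) = closure(exp.(y))

# Isometric log-ratio in one particular orthonormal basis (assumed):
# coordinate i contrasts part i+1 with the geometric mean of parts 1..i.
ilr_sketch(x) = [sqrt(i / (i + 1)) * (log(x[i+1]) - mean(log.(x[1:i])))
                 for i in 1:length(x)-1]

c = [2.0, 0.1, 0.3]

println(alr_sketch(c))
println(clr_sketch(c))
println(ilr_sketch(c))

# The inverses recover the closed composition.
@assert alrinv_sketch(alr_sketch(c)) ≈ closure(c)
@assert clrinv_sketch(clr_sketch(c)) ≈ closure(c)
```

Note that the clr coordinates always sum to zero, while alr and ilr return D-1 unconstrained coordinates.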

Transforms for tables are defined in the TableTransforms.jl package: Closure, Remainder, ALR, CLR, and ILR. These transforms are functors that can be used as follows:

julia> table |> ILR()

Arrays

It is often useful to compose D columns of a table into D-part compositions. The package provides a CoDaArray type that implements the Julia array interface and the Tables.jl interface. We recommend using the function compose(table, cols) to construct such arrays:

julia> table = (a=[1,2,3], b=[4,5,6], c=[7,8,9])
(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

julia> ctable = compose(table, (:a,:b))
(c = [7, 8, 9], CODA = Composition{2, (:a, :b)}[1.000 : 4.000, 2.000 : 5.000, 3.000 : 6.000])

julia> ctable.CODA[1]
                2-part composition
     ┌                                        ┐ 
   a ┤■■■■■■■■■ 1.0                             
   b ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 4.0   
     └                                        ┘
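
The idea behind composing columns can be illustrated without the package. The sketch below uses a hypothetical helper, `compose_sketch`, that gathers the selected columns of a column table into one vector of named tuples and keeps the remaining columns as they are. It only mimics the shape of the result shown above: plain named tuples stand in for Composition objects and a plain Vector stands in for CoDaArray.

```julia
# Hypothetical helper, not the package's `compose`: gather the columns `cols`
# of a column table into one vector of named tuples (one per row) and keep
# the remaining columns unchanged.
function compose_sketch(table::NamedTuple, cols::Tuple{Vararg{Symbol}})
    nrows = length(first(table))
    rows = [NamedTuple{cols}(Tuple(float(table[col][i]) for col in cols))
            for i in 1:nrows]
    others = Tuple(k for k in keys(table) if !(k in cols))
    kept = NamedTuple{others}(Tuple(table[k] for k in others))
    return merge(kept, (CODA = rows,))
end

table = (a=[1, 2, 3], b=[4, 5, 6], c=[7, 8, 9])
ctable = compose_sketch(table, (:a, :b))

println(keys(ctable))     # column names of the composed table
println(ctable.CODA[1])   # first row as a named tuple of parts
```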

Random

D-part compositions can be created at random from a Dirichlet distribution:

julia> rand(Composition{3})
                 3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■ 0.39938229705106565     
   w2 ┤■■■■■■ 0.1491859823748656                 
   w3 ┤■■■■■■■■■■■■■■■■■■■ 0.45143172057406883   
      └                                        ┘
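
One standard way to sample from a Dirichlet distribution is to normalize independent draws. The sketch below assumes a flat Dirichlet with all concentration parameters equal to one, which is an assumption since the text above does not state the parameters. The name `rand_composition_sketch` is hypothetical.

```julia
using Random

# Draw a D-part composition from a flat Dirichlet(1, ..., 1) distribution
# (assumed parameters) by normalizing independent Exp(1) draws.
function rand_composition_sketch(rng::AbstractRNG, D::Integer)
    x = randexp(rng, D)   # D independent Exp(1) samples
    return x ./ sum(x)    # closure onto the simplex
end

rng = MersenneTwister(2024)   # fixed seed for reproducibility
w = rand_composition_sketch(rng, 3)
println(w)
```

The result is a vector of positive parts that sum to one, that is, a point in the simplex.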

Plots

Separate packages are available for plotting compositional data:

Relative variation biplots: Biplots.jl
Ternary diagrams (Makie.jl): TernaryDiagrams.jl
Ternary diagrams (Plots.jl): TernaryPlots.jl

References

This package is heavily influenced by Aitchison's monograph:

Aitchison, J. 1986. The Statistical Analysis of Compositional Data

and by other textbooks:

van den Boogaart, K. G. & Tolosana-Delgado, R. 2013. Analyzing Compositional Data with R
Pawlowsky-Glahn et al. 2015. Modeling and Analysis of Compositional Data
Pawlowsky-Glahn, V. & Buccianti, A. 2011. Compositional Data Analysis - Theory and Applications

Notes

The unicode display of composition objects can be obtained with the following code:

using UnicodePlots
using CoDa

function Base.show(io::IO, mime::MIME"text/plain",
                   c::Composition{D,PARTS}) where {D,PARTS}
  w = CoDa.components(c)
  x = Vector{Float64}()
  p = Vector{Symbol}()
  m = Vector{Symbol}()
  for i in 1:D
    if ismissing(w[i])
      push!(m, PARTS[i])
    else
      push!(p, PARTS[i])
      push!(x, w[i])
    end
  end
  plt = barplot(p, x, title="$D-part composition")
  isempty(m) || annotate!(plt, :t, "missing: $(join(m,", "))")
  show(io, mime, plt)
end

The code is not added to the CoDa.jl package itself because the UnicodePlots.jl package has become a very heavy dependency (see UnicodePlots/issues/291).
