Welcome to DeBaCl
-----------------

.. automodule:: debacl

.. contents::

Installation
------------

The most straightforward way to install DeBaCl is to download it from the PyPI
server. In a bash terminal (Linux and Mac)::

    $ pip install debacl

The current plan is to post a new version of DeBaCl on the PyPI server roughly
every two months. In between "official" updates, the latest code can be
installed by downloading the DeBaCl repo directly from GitHub. In a bash
terminal, the following commands download DeBaCl to a newly created "DeBaCl"
folder in your working directory, then add the new folder to the Python path::

    $ git clone https://github.com/CoAxLab/DeBaCl/
    $ export PYTHONPATH="$PWD/DeBaCl:$PYTHONPATH"

DeBaCl depends on the Python packages ``numpy``, ``networkx``, and
``prettytable``, and recommends the packages ``matplotlib`` (for plotting level
set trees), ``scipy``, and ``scikit-learn`` (for utilities that compute nearest
neighbors). These packages can all be installed with either "conda" or "pip"::

    $ pip install numpy networkx prettytable
    $ pip install matplotlib scipy scikit-learn

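Before moving on, it can help to confirm which of these packages are importable
in the current environment. The following is a minimal sketch that uses only the
standard library; the helper name ``check_packages`` is hypothetical and not part
of DeBaCl. Note that ``scikit-learn`` is imported under the name ``sklearn``.

```python
import importlib.util

# Import names for the packages listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "networkx", "prettytable"]
RECOMMENDED = ["matplotlib", "scipy", "sklearn"]


def check_packages(names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for group, names in (("required", REQUIRED), ("recommended", RECOMMENDED)):
        for name, found in check_packages(names).items():
            status = "ok" if found else "MISSING"
            print("{:<12} {:<12} {}".format(group, name, status))
```

Any package reported as ``MISSING`` can be installed with the ``pip`` commands
shown above.
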
Quickstart
----------
DeBaCl runs in any standard Python 2.7 interpreter, such as IPython::

    $ pip install ipython
    $ ipython

The first step is to simulate some data. Here we use the scikit-learn
``datasets`` module to draw 100 observations from the "two moons" distribution.
The data is stored in a 100 x 2 ``numpy`` array.

>>> from sklearn.datasets import make_moons
>>> X = make_moons(n_samples=100, noise=0.1, random_state=19)[0]

Next we import ``DeBaCl`` and construct the level set tree for our simulated
dataset.

>>> import debacl as dcl
>>> tree = dcl.construct_tree(X, k=10, prune_threshold=10)

The ``construct_tree`` function takes a two-dimensional ``numpy`` array (one row per observation) as input and returns a
``LevelSetTree`` object. The parameter ``k`` indicates how many points to
consider neighbors for each point when constructing a similarity graph; higher
values of ``k`` lead to a more connected graph and a "smoother" level set tree.
The ``prune_threshold`` parameter is the minimum size of branches in the output
tree; if any branches are smaller than this number *after* the level set tree is
created, they are merged with nearby branches.
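
To see why a larger ``k`` gives a more connected similarity graph, consider the
toy sketch below. It is not DeBaCl's implementation: the functions
``toy_knn_graph`` and ``count_components`` are hypothetical helpers written with
the standard library only (``math.dist`` needs Python 3.8 or later), and DeBaCl's
own graph construction may differ in its details.

```python
import math


def toy_knn_graph(points, k):
    """Return an undirected k-nearest-neighbor graph as {index: set(neighbors)}."""
    n = len(points)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        # Sort the other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Add the edge in both directions so the graph is undirected.
            graph[i].add(j)
            graph[j].add(i)
    return graph


def count_components(graph):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node] - seen)
    return components


# Two well-separated groups of three points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
for k in (1, 2, 3):
    print(k, count_components(toy_knn_graph(points, k)))
```

With ``k`` of 1 or 2 the two groups stay separate (two components); with ``k=3``
each point must reach into the other group, so the graph becomes a single
component.
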
Printing the tree lets us see the key statistics for the branches. Each row of
the print output shows the starting and ending density and mass levels of the
corresponding branch, the number of points that belong to the branch, and the
parent and children branches (if any).

>>> print(tree)
+----+-------------+-----------+------------+----------+------+--------+----------+
| id | start_level | end_level | start_mass | end_mass | size | parent | children |
+----+-------------+-----------+------------+----------+------+--------+----------+
| 0  | 0.000       | 0.196     | 0.000      | 0.220    | 100  | None   | [1, 2]   |
| 1  | 0.196       | 0.396     | 0.220      | 0.940    | 37   | 0      | []       |
| 2  | 0.196       | 0.488     | 0.220      | 1.000    | 41   | 0      | []       |
+----+-------------+-----------+------------+----------+------+--------+----------+

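
The level and mass columns in the table above describe the same heights on two
scales. The sketch below is one way to read the relationship, assuming that the
mass of a density level is the fraction of observations whose estimated density
is at or below that level. The function name ``mass_at_level`` and the density
values are made up for illustration; they are not DeBaCl output.

```python
def mass_at_level(densities, level):
    """Fraction of observations whose estimated density is at or below `level`."""
    below = sum(1 for d in densities if d <= level)
    # float() keeps the division exact under both Python 2 and Python 3.
    return below / float(len(densities))


# Made-up density estimates for ten observations (illustration only).
densities = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]

print(mass_at_level(densities, 0.0))  # no observation lies at or below level 0
print(mass_at_level(densities, 0.2))  # four of the ten observations
print(mass_at_level(densities, 0.5))  # every observation
```

Under that assumption, a branch whose ``end_mass`` is 0.220 ends at the density
level that excludes 22 percent of the data.
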
For complex level set trees, the console output does not convey good intuition
about the shape of the tree, and plotting the tree is a better option. Each
vertical line segment represents a branch of the level set tree; the bottom
endpoint is at the density level where the branch is born and the top endpoint
is at the density level where the branch either splits or vanishes.

>>> fig = tree.plot(form='density')[0]
>>> fig.show()

.. image:: readme_tree.png
   :height: 480px

Finally, use the ``get_clusters`` method to retrieve cluster labels from the
level set tree.

>>> labels = tree.get_clusters()

By default, each leaf node of the tree (i.e. a branch without any children)
becomes a cluster. Clusters are returned in the form of a ``numpy`` array with
two columns; the first column is the row index of the data point in the original
dataset, and the second is an integer cluster label.
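
Because the two leaf branches in the table above hold 37 and 41 of the 100
points, the returned array may have fewer rows than the dataset. The sketch
below shows one way to expand the two-column result into a label per
observation. The helper ``labels_to_vector`` and the fill value ``-1`` are
assumptions for illustration, not part of DeBaCl, and the input is a
hand-written stand-in rather than real ``get_clusters`` output.

```python
import numpy as np


def labels_to_vector(labels, n_points, fill=-1):
    """Expand (row index, cluster label) pairs into one label per observation."""
    labels = np.asarray(labels, dtype=int)
    full = np.full(n_points, fill, dtype=int)
    # Column 0 holds row indices into the dataset, column 1 the cluster labels.
    full[labels[:, 0]] = labels[:, 1]
    return full


# Hand-written stand-in for the array returned by tree.get_clusters().
labels = np.array([[0, 1], [2, 1], [3, 2], [5, 2]])
print(labels_to_vector(labels, n_points=6))
```

Observations that do not appear in the first column keep the fill value, which
makes the result convenient to pass as a color or grouping vector alongside the
original data.
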
Level Set Tree constructors
---------------------------

.. currentmodule:: level_set_tree

.. autosummary::
   :toctree: generated/
   :nosignatures:

   construct_tree
   construct_tree_from_graph
   load_tree

Level Set Tree methods
----------------------

.. currentmodule:: level_set_tree

.. autosummary::
   :toctree: generated/
   :nosignatures:

   LevelSetTree
   LevelSetTree.branch_partition
   LevelSetTree.get_clusters
   LevelSetTree.get_leaf_nodes
   LevelSetTree.plot
   LevelSetTree.prune
   LevelSetTree.save

Utilities
---------

.. currentmodule:: utils

.. autosummary::
   :toctree: generated/
   :nosignatures:

   define_density_level_grid
   define_density_mass_grid
   epsilon_graph
   knn_density
   knn_graph
   reindex_cluster_labels