Sparsity - sparse data processing toolbox

Sparsity builds on top of Pandas and Scipy to provide a DataFrame-like API for working with homogeneous numerical sparse data.

Sparsity provides Pandas-like indexing capabilities and group transformations on Scipy CSR matrices. This has proven to be extremely efficient, as shown below.
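To make the idea concrete, here is a minimal sketch of label-based indexing and a group sum on a CSR matrix, written with plain Scipy and Pandas only. It does not use Sparsity's own API and is not necessarily how Sparsity implements these operations; the helper names `select_rows` and `group_sum` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy data: 4 rows, 3 columns, mostly zeros.
data = sparse.csr_matrix(np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
]))
# A Pandas index holding one label per matrix row.
index = pd.Index(["a", "b", "a", "c"], name="key")


def select_rows(matrix, index, labels):
    """Label-based selection: find matching row positions, then slice the CSR matrix."""
    positions = np.flatnonzero(index.isin(labels))
    return matrix[positions], index[positions]


def group_sum(matrix, index):
    """Sum rows that share a label by multiplying with a sparse indicator matrix."""
    codes, uniques = pd.factorize(index)
    n_rows = matrix.shape[0]
    # indicator[g, r] == 1 when row r belongs to group g.
    indicator = sparse.csr_matrix(
        (np.ones(n_rows), (codes, np.arange(n_rows))),
        shape=(len(uniques), n_rows),
    )
    return indicator @ matrix, uniques


selected, selected_index = select_rows(data, index, ["a"])
grouped, group_labels = group_sum(data, index)
```

Both results stay sparse throughout; the group sum is a single sparse matrix product, so no dense intermediate is built.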

Furthermore, we provide a distributed implementation of this data structure built on the Dask framework. This includes distributed sorting, partitioning, grouping and much more.

Although we try to mimic the Pandas DataFrame API, some operations and parameters don’t make sense on sparse or homogeneous data. As a result, some interfaces differ slightly in their semantics and/or inputs.

Install

Sparsity is available from PyPI:

# Install using pip
$ pip install sparsity

Attention

Please use with care: this is a new project and might still contain bugs.