Import epiScanpy’s high-level API as:

import episcanpy.api as epi

Count Matrices: CT

Loading data and annotations, building count matrices, and filtering lowly covered methylation variables and lowly covered cells.

Load features

To build a count matrix for either methylation or open chromatin data, you must first load the segmentation of the genome of interest or the set of features of interest.

ct.load_features(file_features[, …])

Transform a BED file into a usable set of units over which to measure methylation levels.
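
As a rough illustration of what turning a BED file into measurable units involves, the sketch below groups BED-like intervals by chromosome. It is not epiScanpy's implementation: the helper name `parse_bed` and the returned dictionary layout are assumptions made for this example.

```python
import io
from collections import defaultdict

def parse_bed(handle):
    """Hypothetical helper: group BED intervals by chromosome."""
    features = defaultdict(list)
    for line in handle:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        features[chrom].append((int(start), int(end)))
    # Sort intervals by position within each chromosome.
    return {chrom: sorted(intervals) for chrom, intervals in features.items()}

# Toy BED content held in memory instead of a file on disk.
bed_text = "chr1\t100\t200\nchr2\t50\t80\nchr1\t10\t60\n"
features = parse_bed(io.StringIO(bed_text))
```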

ct.make_windows(size[, chromosomes, max_length])

Generate windows/bins of the given size for the appropriate genome (default choice is human).
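
The idea of fixed-size genome bins can be sketched in a few lines of plain Python. The helper `make_fixed_windows` and the toy chromosome lengths are hypothetical and only illustrate the tiling; the library's own function may handle chromosome selection and window naming differently.

```python
def make_fixed_windows(chrom_lengths, size):
    """Hypothetical helper: tile each chromosome with bins of `size` bp."""
    windows = {}
    for chrom, length in chrom_lengths.items():
        # The last window is truncated at the chromosome end.
        windows[chrom] = [
            (start, min(start + size, length))
            for start in range(0, length, size)
        ]
    return windows

# Toy chromosome lengths, not real genome sizes.
windows = make_fixed_windows({"chrA": 250, "chrB": 100}, size=100)
```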

ct.size_feature_norm(loaded_feature, size)

If the loaded features are too small or of different sizes, they can be normalised to a single given size by extending the feature coordinates in both directions.
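
One way to picture this normalisation is to centre a window of the target size on each feature's midpoint. The sketch below assumes that approach; the helper `normalise_feature_size` is hypothetical, and clamping at position 0 is a choice made for this example rather than documented library behaviour. Features already longer than the target would be trimmed by this sketch.

```python
def normalise_feature_size(features, size):
    """Hypothetical helper: resize (start, end) features to `size` bp."""
    resized = []
    half = size // 2
    for start, end in features:
        mid = (start + end) // 2
        # Extend on both sides of the midpoint without going below position 0.
        new_start = max(0, mid - half)
        resized.append((new_start, new_start + size))
    return resized

resized = normalise_feature_size([(100, 120), (10, 30)], size=200)
```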

ct.plot_size_features(loaded_feature[, …])

Plot the different feature sizes in a histogram.


Extract the names of the loaded features, specifying the chromosome they originated from.

Reading methylation file

Functions to read methylation files, extract methylation levels, and build the count matrices:

ct.build_count_mtx(cells, annotation[, …])

Build methylation count matrix for a given annotation.

ct.read_cyt_summary(sample_name, meth_type, …)

Read the file from which to extract methylation levels and, assuming it follows the Ecker/Methylpy format, extract the number of methylated reads and the total number of reads for the covered cytosines in the requested genomic context (CG or CH). The sample_name parameter is the name of the file to read.
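
To make the per-context read counting concrete, here is a minimal sketch over tab-separated cytosine records. The column layout (chromosome, position, strand, context, methylated reads, total reads) is an assumption for this example, and `summarise_cytosines` is a hypothetical helper, not the library function.

```python
import io

def summarise_cytosines(handle, meth_type):
    """Hypothetical helper: sum methylated and total reads for one context.

    Assumed columns: chrom, position, strand, context, methylated, total.
    """
    methylated = total = 0
    for line in handle:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        context, mc, cov = fields[3], int(fields[4]), int(fields[5])
        # CG context: the base after the cytosine is G; CH: any other base.
        is_cg = context[:2] == "CG"
        if (meth_type == "CG") == is_cg:
            methylated += mc
            total += cov
    return methylated, total

text = (
    "chr1\t10\t+\tCGA\t3\t4\n"
    "chr1\t15\t-\tCAT\t0\t5\n"
    "chr1\t22\t+\tCGG\t1\t2\n"
)
cg = summarise_cytosines(io.StringIO(text), "CG")
ch = summarise_cytosines(io.StringIO(text), "CH")
```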

ct.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

Reading open chromatin (ATAC) file

ATAC-seq specific functions to build count matrices and load data:

ct.bld_atac_mtx(list_bam_files, loaded_feat)

Build a count matrix one set of features at a time.

ct.save_sparse_mtx(initial_matrix[, …])

Convert a regular ATAC matrix into a sparse AnnData object.

General functions

Functions that are not specific to one omic:

Preprocessing: PP

Imputing missing data (methylation), filtering lowly covered cells or variables, and correcting for batch effects.

pp.coverage_cells(adata[, bins, key_added, …])

Histogram of the number of open features (in the case of ATAC-seq data) per cell.
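
The quantity behind this histogram is simply the number of non-zero features in each cell. A minimal sketch with a toy cells-by-features matrix follows; the helper name is hypothetical and the plotting step is left out.

```python
from collections import Counter

def open_features_per_cell(matrix):
    """Hypothetical helper: count non-zero features in each row (cell)."""
    return [sum(1 for value in row if value > 0) for row in matrix]

# Rows are cells, columns are features (toy data).
counts = [[0, 3, 1, 0], [2, 0, 0, 0], [1, 1, 0, 4]]
coverage = open_features_per_cell(counts)
# Tally how many cells share each coverage value, as a histogram would.
histogram = Counter(coverage)
```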


pp.select_var_feature(adata[, max_score, …])

This function computes a variability score to rank the most variable features across all cells.

pp.binarize(adata[, copy])

Convert the count matrix into a binary matrix.
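
Binarization can be illustrated on a plain nested list. This sketch assumes that any count above zero becomes 1, which is the usual convention for open chromatin data but is not confirmed here as the library's exact rule; the function shown is a stand-in, not the library's code.

```python
def binarize(matrix):
    """Return a copy in which every positive count becomes 1."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]

counts = [[0, 3, 1], [2, 0, 0]]
binary = binarize(counts)
```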

pp.lazy(adata[, pp_pca, nb_pcs, …])

Automatically computes PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, t-distributed stochastic neighbor embedding (tSNE), and Uniform Manifold Approximation and Projection (UMAP).

pp.load_metadata(adata, metadata_file[, …])

Load observational metadata in adata.obs.

pp.read_ATAC_10x(matrix[, cell_names, …])

Load sparse matrix (including matrices corresponding to 10x data) as AnnData objects.

pp.filter_cells(adata[, min_counts, …])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_features(data[, min_counts, …])

Filter features based on number of cells or counts.
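
Filtering by number of cells amounts to keeping only the columns that are non-zero in enough rows. The sketch below shows that logic on toy data; `filter_features_by_cells` and its `min_cells` argument are hypothetical names for this example.

```python
def filter_features_by_cells(matrix, min_cells):
    """Hypothetical helper: keep features detected in at least `min_cells` cells."""
    n_features = len(matrix[0])
    keep = [
        j for j in range(n_features)
        if sum(1 for row in matrix if row[j] > 0) >= min_cells
    ]
    # Rebuild the matrix with only the retained columns.
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, keep

counts = [[1, 0, 2], [0, 0, 1], [3, 0, 0]]
filtered, kept = filter_features_by_cells(counts, min_cells=2)
```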

pp.normalize_total(adata[, target_sum, …])

Normalize counts per cell.
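
Per-cell normalization rescales each row so that its counts add up to a common total. The sketch below illustrates the arithmetic only; the helper `normalize_rows` is hypothetical, and leaving all-zero cells at zero is a choice made for this example.

```python
def normalize_rows(matrix, target_sum):
    """Hypothetical helper: scale each row (cell) to sum to `target_sum`."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # Cells with no counts are left as zeros to avoid dividing by zero.
        scale = target_sum / total if total else 0.0
        normalized.append([value * scale for value in row])
    return normalized

norm = normalize_rows([[1, 3], [2, 2], [0, 0]], target_sum=100)
```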

pp.pca(adata[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

pp.normalize_per_cell(adata[, …])

Normalize total counts per cell.

pp.regress_out(adata, keys[, n_jobs, copy])

Regress out unwanted sources of variation.

pp.subsample(data[, fraction, n_obs, …])

Subsample to a fraction of the number of observations.

pp.downsample_counts(adata[, …])

Downsample counts from count matrix.

pp.neighbors(adata[, n_neighbors, n_pcs, …])

Compute a neighborhood graph of observations [McInnes18].

pp.sparse(adata[, copy])

Transform adata.X from a matrix or array to a csc sparse matrix.

Methylation matrices

Methylation specific count matrices.

pp.imputation_met(adata[, …])

Impute missing values in methylation level matrices.

pp.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

pp.readandimputematrix(file_name[, min_coverage])

Temporary function to load and impute a methylation count matrix into an AnnData object.

Tools: TL

tl.rank_features(adata, groupby[, omic, …])

Wrapper around scanpy's sc.tl.rank_genes_groups function.

tl.silhouette(adata_name, cluster_annot[, …])

Compute silhouette scores.
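
For readers unfamiliar with the metric, the silhouette score of a point is s = (b - a) / max(a, b), where a is its mean distance to the other points of its own cluster and b is its mean distance to the nearest other cluster. The sketch below implements that standard definition with Euclidean distance in plain Python (3.8+ for `math.dist`); it is not the library's code, and the toy coordinates are made up.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b), Euclidean distance."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = ["A", "A", "B", "B"]
scores = silhouette_scores(points, labels)
```

Scores close to 1 indicate points that sit well inside their own cluster, as with the two well-separated groups above.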

tl.load_markers(path, marker_list_file)

Convert a list of known cell type markers from the literature into a dictionary. The input is a list of known marker genes; the first row is treated as the header.

tl.identify_cluster(adata, cell_type, …[, …])

Use markers of a given cell type to plot peak openness for peaks in the promoters of those markers. Inputs: cell type, cell type markers, and peak-promoter intersections.

tl.top_feature_genes(adata, gtf_file[, …])


tl.diffmap(adata[, n_comps, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

tl.draw_graph(adata[, layout, init_pos, …])

Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].

tl.tsne(adata[, n_pcs, use_rep, perplexity, …])

t-SNE [Maaten08] [Amir13] [Pedregosa11].

tl.umap(adata[, min_dist, spread, …])

Embed the neighborhood graph using UMAP [McInnes18].

tl.louvain(adata[, resolution, …])

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].

tl.leiden(adata[, resolution, restrict_to, …])

Cluster cells into subgroups [Traag18].

Plotting: PL

The plotting module episcanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

pl.pca(adata[, color, feature_symbols, …])

Scatter plot in PCA coordinates.



Plot PCA results.

pl.pca_variance_ratio(adata[, n_pcs, log, …])

Plot the variance ratio.

pl.tsne(adata[, color, feature_symbols, …])

Scatter plot in tSNE basis.

pl.umap(adata[, color, feature_symbols, …])

Scatter plot in UMAP basis.


pl.rank_feat_groups(adata[, groups, …])

Plot ranking of features.

pl.rank_feat_groups_violin(adata[, groups, …])

Plot ranking of features for all tested comparisons.

pl.rank_feat_groups_dotplot(adata[, groups, …])

Plot ranking of features using dotplot plot (see dotplot())

pl.rank_feat_groups_stacked_violin(adata[, …])

Plot ranking of features using stacked_violin plot (see stacked_violin())

pl.rank_feat_groups_matrixplot(adata[, …])

Plot ranking of features using matrixplot plot (see matrixplot())

pl.rank_feat_groups_heatmap(adata[, groups, …])

Plot ranking of features using heatmap plot (see heatmap())

pl.rank_feat_groups_tracksplot(adata[, …])

Plot ranking of features using tracksplot plot (see tracksplot())

pl.cal_var(adata[, show])

pl.violin(adata, keys[, groupby, log, …])

Violin plot.

pl.scatter(adata[, x, y, color, use_raw, …])

Scatter plot along observations or variables axes.

pl.ranking(adata, attr, keys[, dictionary, …])

Plot rankings.

pl.clustermap(adata[, obs_keys, use_raw, …])

Hierarchically-clustered heatmap.

pl.stacked_violin(adata, var_names[, …])

Stacked violin plots.

pl.heatmap(adata, var_names[, groupby, …])

Heatmap of the expression values of genes.

pl.dotplot(adata, var_names[, groupby, …])

Makes a dot plot of the expression values of var_names.

pl.matrixplot(adata, var_names[, groupby, …])

Creates a heatmap of the mean expression values per cluster of each var_names. If groupby is not given, the matrixplot assumes that all data belongs to a single category.

pl.tracksplot(adata, var_names, groupby[, …])

In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.

pl.dendrogram(adata, groupby[, …])

Plots a dendrogram of the categories defined in groupby.

pl.correlation_matrix(adata, groupby[, …])

Plots the correlation matrix computed as part of sc.tl.dendrogram.

pl.prct_overlap(adata, key_1, key_2[, norm, …])

Percentage or cell count corresponding to the overlap of different cell types between two sets of annotations/clusters.
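
The numbers behind such an overlap plot can be sketched as a cross-tabulation of two per-cell label lists. The helper `overlap_table` is hypothetical, and expressing percentages relative to the first annotation is an assumption of this example; the library's `norm` option may normalise differently.

```python
from collections import Counter

def overlap_table(labels_1, labels_2, as_percentage=False):
    """Hypothetical helper: count cells for each pair of labels."""
    pair_counts = Counter(zip(labels_1, labels_2))
    if not as_percentage:
        return dict(pair_counts)
    # Express each pair as a percentage of its group in the first annotation.
    totals = Counter(labels_1)
    return {
        pair: 100.0 * n / totals[pair[0]]
        for pair, n in pair_counts.items()
    }

clusters = ["0", "0", "1", "1", "1"]
cell_types = ["T", "T", "B", "B", "T"]
counts = overlap_table(clusters, cell_types)
percent = overlap_table(clusters, cell_types, as_percentage=True)
```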

pl.overlap_heatmap(adata, key_1, key_2[, …])

Heatmap of the cluster correspondence between two sets of annotations.

pl.cluster_composition(adata, cluster, condition)

pl.silhouette(adata_name, cluster_annot[, …])

Plot the product of tl.silhouette as a silhouette plot

pl.silhouette_tot(adata_name, cluster_annot)

Compute silhouette scores and plot them.