API

Import epiScanpy’s high-level API as:

import episcanpy.api as epi

Count Matrices: CT

Loading data and annotations, building count matrices, and filtering lowly covered methylation variables and lowly covered cells.

Building count matrices

Quickly build a count matrix from a TSV/TBI file.

ct.bld_mtx_fly(tsv_file, annotation[, …])

Build a count matrix on the fly.

Load features

To build a count matrix for either methylation or open chromatin data, you must first load the segmentation of the genome of interest or the set of features of interest.

ct.load_features(file_features[, …])

Transform a BED file into a usable set of features (genomic units) over which methylation levels are measured.

ct.make_windows(size[, chromosomes, max_length])

Generate windows/bins of the given size for the appropriate genome (default choice is human).
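To illustrate what binning a genome into fixed-size windows means, here is a minimal pure-Python sketch. It assumes chromosome lengths are supplied as a dict and that bins use 0-based, half-open coordinates. `make_windows_sketch` is a hypothetical helper, not epiScanpy's `ct.make_windows`, and the chromosome lengths are toy values.

```python
def make_windows_sketch(chrom_sizes, size):
    """Split each chromosome into consecutive, non-overlapping bins of `size` bp.

    Hypothetical helper (not epiScanpy's ct.make_windows). `chrom_sizes` maps a
    chromosome name to its length in bp. Coordinates are 0-based and half-open;
    the last bin of each chromosome is truncated at the chromosome end.
    """
    if size <= 0:
        raise ValueError("size must be a positive number of base pairs")
    windows = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, size):
            windows.append((chrom, start, min(start + size, length)))
    return windows


# Toy chromosome lengths, made up for illustration (not a real assembly)
toy_genome = {"chr1": 12_000, "chr2": 5_000}
print(make_windows_sketch(toy_genome, 5_000))
# [('chr1', 0, 5000), ('chr1', 5000, 10000), ('chr1', 10000, 12000), ('chr2', 0, 5000)]
```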

ct.size_feature_norm(loaded_feature, size)

If the loaded features are too small or of different sizes, they can be normalised to a single given size by extending the feature coordinates in both directions.
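A minimal sketch of size normalisation, assuming each feature is re-centred on its midpoint and given exactly `size` bp, with the start clipped at 0 (in which case the end is shifted so the width is still `size`). `resize_features_sketch` is hypothetical; note that, under this assumption, features longer than `size` are trimmed rather than left untouched.

```python
def resize_features_sketch(features, size):
    """Re-centre each (chrom, start, end) feature on its midpoint with width `size`.

    Hypothetical helper. Coordinates are 0-based, half-open. If the new start
    would be negative it is clipped to 0 and the end becomes 0 + size, so every
    returned feature is exactly `size` bp wide.
    """
    resized = []
    for chrom, start, end in features:
        mid = (start + end) // 2
        new_start = max(0, mid - size // 2)
        resized.append((chrom, new_start, new_start + size))
    return resized


print(resize_features_sketch([("chr1", 1000, 1100), ("chr2", 50, 70)], 500))
# [('chr1', 800, 1300), ('chr2', 0, 500)]
```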

ct.plot_size_features(loaded_feature[, …])

Plot the distribution of feature sizes as a histogram.

ct.name_features(loaded_features)

Extract the names of the loaded features, specifying the chromosome they originated from.

Reading methylation files

Functions to read methylation files, extract methylation levels and build the count matrices:

ct.build_count_mtx(cells, annotation[, …])

Build methylation count matrix for a given annotation.
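As an illustration of what one row of a methylation count matrix can contain, the sketch below computes, for a single cell, the pooled methylation level of each feature: methylated reads divided by total reads over the cytosines inside the feature. The pooled ratio and the use of `None` for uncovered features are assumptions (averaging per-cytosine levels is another common choice), and `feature_methylation_sketch` is a hypothetical helper.

```python
from bisect import bisect_left


def feature_methylation_sketch(cytosines, features):
    """Methylation level of each feature for a single cell (hypothetical helper).

    cytosines: dict chrom -> list of (pos, n_methylated, n_total), sorted by pos
    features:  list of (chrom, start, end), 0-based, half-open
    Returns pooled levels sum(n_methylated) / sum(n_total), or None when no
    read covers the feature (a missing value rather than 0).
    """
    positions = {chrom: [pos for pos, _, _ in sites] for chrom, sites in cytosines.items()}
    levels = []
    for chrom, start, end in features:
        sites = cytosines.get(chrom, [])
        pos = positions.get(chrom, [])
        # Cytosines with start <= position < end
        covered = sites[bisect_left(pos, start):bisect_left(pos, end)]
        n_meth = sum(m for _, m, _ in covered)
        n_total = sum(t for _, _, t in covered)
        levels.append(n_meth / n_total if n_total else None)
    return levels


cell = {"chr1": [(10, 1, 2), (20, 3, 3), (150, 0, 4)]}
feats = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
print(feature_methylation_sketch(cell, feats))  # [0.8, 0.0, None]
```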

ct.read_cyt_summary(sample_name, meth_type, …)

Read a file from which to extract methylation levels and, assuming it follows the Ecker/Methylpy format, extract the number of methylated reads and the total number of reads for the covered cytosines in the requested genomic context (CG or CH). sample_name is the name of the file to read.
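A minimal parsing sketch, assuming tab-separated rows laid out like methylpy "allc" files (chromosome, position, strand, sequence context, methylated read count, total read count, ...). Check this column order against your own files. A cytosine is treated as CG when the base after the C is G and as CH otherwise. `read_cytosine_summary_sketch` is hypothetical and is not epiScanpy's `ct.read_cyt_summary`.

```python
def read_cytosine_summary_sketch(lines, context):
    """Collect (position, methylated reads, total reads) per chromosome.

    Hypothetical helper. Assumes tab-separated, methylpy "allc"-like rows:
    chrom, pos, strand, sequence context (e.g. "CGA", "CAT"), mc_count, total, ...
    Only cytosines in the requested context are kept: "CG" when the base after
    the C is G, "CH" otherwise.
    """
    if context not in ("CG", "CH"):
        raise ValueError("context must be 'CG' or 'CH'")
    summary = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, _strand, seq_context, mc, total = line.rstrip("\n").split("\t")[:6]
        is_cg = seq_context[1] == "G"
        if (context == "CG") != is_cg:
            continue
        summary.setdefault(chrom, []).append((int(pos), int(mc), int(total)))
    return summary


rows = [
    "chr1\t10\t+\tCGT\t1\t2\t1\n",
    "chr1\t11\t+\tCAT\t0\t3\t0\n",
    "chr1\t20\t-\tCGA\t3\t3\t1\n",
]
print(read_cytosine_summary_sketch(rows, "CG"))  # {'chr1': [(10, 1, 2), (20, 3, 3)]}
print(read_cytosine_summary_sketch(rows, "CH"))  # {'chr1': [(11, 0, 3)]}
```

Because the function only iterates over `lines`, an open text file handle can be passed in place of the list.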

ct.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

Reading open chromatin (ATAC) files

ATAC-seq specific functions to build count matrices and load data:

ct.bld_atac_mtx(list_bam_files, loaded_feat)

Build a count matrix one set of features at a time.
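The core of an ATAC count matrix is counting, for each cell, how many reads fall inside each feature. The sketch below shows that counting step under simplifying assumptions: instead of BAM files, each cell is given as a list of `(chrom, pos)` read positions, and features are assumed not to overlap. `count_reads_in_features` is a hypothetical helper, not epiScanpy's `ct.bld_atac_mtx`.

```python
from bisect import bisect_right


def count_reads_in_features(cell_reads, features):
    """Build a cell x feature count matrix from read positions.

    Hypothetical helper: instead of BAM files, each cell is given as a list of
    (chrom, pos) read positions. Features are (chrom, start, end), 0-based,
    half-open, and assumed NOT to overlap each other.
    """
    # Group features by chromosome, sorted by start, remembering their column index
    by_chrom = {}
    for col, (chrom, start, end) in enumerate(features):
        by_chrom.setdefault(chrom, []).append((start, end, col))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {chrom: [s for s, _, _ in iv] for chrom, iv in by_chrom.items()}

    cells = sorted(cell_reads)
    matrix = []
    for cell in cells:
        row = [0] * len(features)
        for chrom, pos in cell_reads[cell]:
            # Rightmost feature starting at or before pos; since features do not
            # overlap, it is the only one that can contain pos
            i = bisect_right(starts.get(chrom, []), pos) - 1
            if i >= 0:
                start, end, col = by_chrom[chrom][i]
                if start <= pos < end:
                    row[col] += 1
        matrix.append(row)
    return cells, matrix


features = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
reads = {
    "cellB": [("chr1", 250), ("chr2", 10), ("chr2", 60)],
    "cellA": [("chr1", 5), ("chr1", 50), ("chr1", 150)],
}
print(count_reads_in_features(reads, features))
# (['cellA', 'cellB'], [[2, 0, 0], [0, 1, 1]])
```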

ct.save_sparse_mtx(initial_matrix[, …])

Convert a regular ATAC matrix into a sparse AnnData object.

General functions

Functions that are not omic-specific:

ct.save_sparse_mtx(initial_matrix[, …])

Convert a regular ATAC matrix into a sparse AnnData object.

Preprocessing: PP

Imputing missing (methylation) data, filtering lowly covered cells or variables, and correcting for batch effects.

pp.coverage_cells(adata[, key_added, log, …])

Histogram of the number of open features (in the case of ATAC-seq data) per cell.
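For a cell x feature matrix (rows are cells), the quantity plotted here and the per-feature counterpart used by pp.coverage_features both reduce to counting nonzero entries along one axis. A minimal sketch on a toy list-of-lists matrix; the helper names are hypothetical.

```python
from collections import Counter


def open_features_per_cell(matrix):
    """Number of open (nonzero) features in each cell; rows are cells."""
    return [sum(1 for v in row if v > 0) for row in matrix]


def cells_per_feature(matrix):
    """Number of cells in which each feature is open; columns are features."""
    return [sum(1 for v in col if v > 0) for col in zip(*matrix)]


toy = [[0, 2, 1], [1, 0, 0], [0, 0, 3]]
per_cell = open_features_per_cell(toy)
print(per_cell)                # [2, 1, 1]
print(cells_per_feature(toy))  # [1, 1, 2]
# Histogram data: how many cells have k open features
print(sorted(Counter(per_cell).items()))  # [(1, 2), (2, 1)]
```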

pp.correlation_pc(adata, variable[, pc, …])

Correlation between a given PC and a covariate.

pp.coverage_features(adata[, binary, log, …])

Display how often a feature is measured as open (for ATAC-seq).

pp.density_features(adata[, threshold, …])

Display how often a feature is measured as open (for ATAC-seq).

pp.select_var_feature(adata[, min_score, …])

Compute a variability score to rank the most variable features across all cells.
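The exact variability score used here is not spelled out in this summary. As an illustration only, the sketch below assumes a binary (open/closed) matrix and a hypothetical score `1 - 2 * |p - 0.5|`, where `p` is the fraction of cells in which a feature is open. The score is 1 for features open in half of the cells and 0 for features open in all or none of them. Both helper names are hypothetical.

```python
def variability_scores(binary_matrix):
    """Hypothetical score per feature: 1 - 2*|p - 0.5|, p = fraction of cells open."""
    n_cells = len(binary_matrix)
    scores = []
    for col in zip(*binary_matrix):
        p = sum(1 for v in col if v > 0) / n_cells
        scores.append(1 - 2 * abs(p - 0.5))
    return scores


def top_variable_features(binary_matrix, n_top):
    """Column indices of the n_top highest-scoring features."""
    scores = variability_scores(binary_matrix)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:n_top]


# 4 cells x 3 features: open in 4, 2 and 1 cells respectively
m = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
print(variability_scores(m))        # [0.0, 1.0, 0.5]
print(top_variable_features(m, 2))  # [1, 2]
```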

pp.cal_var(adata[, show, color, save])

Show distribution plots of cells sharing features and variability score.

pp.variability_features(adata[, min_score, …])

Compute a variability score to rank the most variable features across all cells.

pp.binarize(adata[, copy])

Convert the count matrix into a binary matrix.

pp.lazy(adata[, pp_pca, svd_solver, nb_pcs, …])

Automatically compute PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, a t-distributed stochastic neighbor embedding (tSNE), and a Uniform Manifold Approximation and Projection (UMAP) embedding.

pp.load_metadata(adata, metadata_file[, …])

Load observational metadata into adata.obs.

pp.read_ATAC_10x(matrix[, cell_names, …])

Load a sparse matrix (including matrices corresponding to 10x data) as an AnnData object.

pp.filter_cells(adata[, min_counts, …])

Filter cell outliers based on counts and numbers of genes expressed.

pp.filter_features(data[, min_counts, …])

Filter features based on number of cells or counts.
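A minimal sketch of filtering features by the number of cells in which they are detected, on a toy list-of-lists matrix (rows are cells). The helper and its `min_cells` parameter are hypothetical and do not reproduce the signature of pp.filter_features.

```python
def filter_features_sketch(matrix, feature_names, min_cells):
    """Keep features (columns) that are nonzero in at least `min_cells` cells."""
    keep = [
        j for j, col in enumerate(zip(*matrix))
        if sum(1 for v in col if v > 0) >= min_cells
    ]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [feature_names[j] for j in keep]


X, names = filter_features_sketch([[0, 2, 1], [1, 0, 0], [0, 3, 3]], ["a", "b", "c"], min_cells=2)
print(X, names)  # [[2, 1], [0, 0], [3, 3]] ['b', 'c']
```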

pp.normalize_total(adata[, target_sum, …])

Normalize counts per cell.
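Per-cell normalization rescales each cell (row) so that its counts sum to a common target. A minimal sketch with a hypothetical helper; cells with zero total counts are left unchanged to avoid dividing by zero.

```python
def normalize_total_sketch(matrix, target_sum):
    """Scale each row so that it sums to target_sum; all-zero rows are kept as-is."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v * target_sum / total for v in row] if total else list(row))
    return out


print(normalize_total_sketch([[1, 3], [2, 2], [0, 0]], 100))
# [[25.0, 75.0], [50.0, 50.0], [0, 0]]
```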

pp.pca(adata[, n_comps, zero_center, …])

Principal component analysis [Pedregosa11].

pp.normalize_per_cell(adata[, …])

Normalize total counts per cell.

pp.regress_out(adata, keys[, n_jobs, copy])

Regress out unwanted sources of variation.

pp.subsample(data[, fraction, n_obs, …])

Subsample to a fraction of the number of observations.

pp.downsample_counts(adata[, …])

Downsample counts from count matrix.

pp.neighbors(adata[, n_neighbors, n_pcs, …])

Compute a neighborhood graph of observations [McInnes18].

pp.sparse(adata[, copy])

Transform adata.X from a matrix or array to a csc sparse matrix.
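To show what the CSC (compressed sparse column) layout stores, here is a pure-Python conversion from a dense list-of-lists matrix. It produces the three arrays that `scipy.sparse.csc_matrix((data, indices, indptr), shape=(n_rows, n_cols))` accepts: the nonzero values column by column, their row indices, and the offset where each column starts. `to_csc` is a hypothetical helper for illustration.

```python
def to_csc(matrix):
    """Return (data, indices, indptr) of a dense matrix in CSC layout."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    data, indices, indptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            v = matrix[i][j]
            if v != 0:
                data.append(v)     # nonzero value
                indices.append(i)  # its row index
        indptr.append(len(data))   # column j occupies data[indptr[j]:indptr[j + 1]]
    return data, indices, indptr


print(to_csc([[1, 0, 2], [0, 0, 3]]))  # ([1, 2, 3], [0, 0, 1], [0, 1, 1, 3])
```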

Methylation matrices

Methylation-specific count matrices.

pp.imputation_met(adata[, …])

Impute missing values in methylation level matrices.
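The imputation strategy used by pp.imputation_met is not described here. As one simple illustration (an assumption, not necessarily epiScanpy's method), the sketch below replaces each missing methylation level (`None`) with the mean of the observed levels for that feature across cells. Features with no observed value stay missing. `impute_feature_mean` is hypothetical.

```python
def impute_feature_mean(matrix):
    """Replace None entries with the mean of observed values in the same column."""
    n_cols = len(matrix[0]) if matrix else 0
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    # Build a new matrix; the input is not modified
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


levels = [[0.25, None], [0.75, 1.0], [None, None]]
print(impute_feature_mean(levels))  # [[0.25, 1.0], [0.75, 1.0], [0.5, 1.0]]
```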

pp.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

pp.readandimputematrix(file_name[, min_coverage])

Temporary function to load and impute a methylation count matrix into an AnnData object.

Tools: TL

tl.rank_features(adata, groupby[, omic, …])

Wrapper around the scanpy function sc.tl.rank_genes_groups.

tl.lazy(adata[, pp_pca, copy])

Automatically compute PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, a t-distributed stochastic neighbor embedding (tSNE), and a Uniform Manifold Approximation and Projection (UMAP) embedding.

tl.load_markers(path, marker_list_file)

Convert a list of known cell type markers from the literature into a dictionary. The input is a list of known marker genes whose first row is treated as the header.

tl.identify_cluster(adata, cell_type, …[, …])

Use the markers of a given cell type to plot peak openness for peaks located in the promoters of those markers. Inputs: the cell type, its markers, and the peak/promoter intersections.

tl.top_feature_genes(adata, gtf_file[, …])

Deprecated - Please use epi.tl.var_features_to_genes instead.

tl.var_features_to_genes(adata, gtf_file[, …])

Once the most variable features have been called, map them to genes using a GTF file.

tl.geneactivity(adata, gtf_file[, …])

Merge the values of peaks/windows/features overlapping gene bodies plus 2 kb upstream.
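A sketch of the gene activity idea for a single cell. Each gene body is extended by 2 kb on its 5' side (upstream depends on strand), and the values of all features overlapping that region are combined. Summing is an assumption about what "merge" means, and the gene tuple layout and `gene_activity_sketch` name are hypothetical (a real run would take gene coordinates from the GTF file).

```python
def gene_activity_sketch(features, values, genes, upstream=2000):
    """Sum feature values over each gene body extended `upstream` bp on its 5' side.

    features: list of (chrom, start, end); values: one value per feature (one cell)
    genes:    list of (name, chrom, start, end, strand); all coordinates 0-based, half-open
    """
    activity = {}
    for name, chrom, g_start, g_end, strand in genes:
        # Upstream is to the left on the + strand and to the right on the - strand
        if strand == "+":
            lo, hi = max(0, g_start - upstream), g_end
        else:
            lo, hi = g_start, g_end + upstream
        activity[name] = sum(
            v for (f_chrom, f_start, f_end), v in zip(features, values)
            if f_chrom == chrom and f_start < hi and f_end > lo  # intervals overlap
        )
    return activity


genes = [("G1", "chr1", 5000, 8000, "+"), ("G2", "chr1", 20000, 25000, "-")]
feats = [("chr1", 2500, 3500), ("chr1", 9000, 9500), ("chr1", 26000, 26500), ("chr1", 28000, 28500)]
print(gene_activity_sketch(feats, [1, 4, 2, 8], genes))  # {'G1': 1, 'G2': 2}
```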

tl.diffmap(adata[, n_comps, copy])

Diffusion Maps [Coifman05] [Haghverdi15] [Wolf18].

tl.draw_graph(adata[, layout, init_pos, …])

Force-directed graph drawing [Islam11] [Jacomy14] [Chippada18].

tl.tsne(adata[, n_pcs, use_rep, perplexity, …])

t-SNE [Maaten08] [Amir13] [Pedregosa11].

tl.umap(adata[, min_dist, spread, …])

Embed the neighborhood graph using UMAP [McInnes18].

tl.dpt(adata[, n_dcs, n_branchings, …])

Infer progression of cells through geodesic distance along the graph [Haghverdi16] [Wolf19].

tl.louvain(adata[, resolution, …])

Cluster cells into subgroups [Blondel08] [Levine15] [Traag17].

tl.leiden(adata[, resolution, restrict_to, …])

Cluster cells into subgroups [Traag18].

tl.kmeans(adata, num_clusters)

Compute k-means clustering on the PCA coordinates (X_pca).

tl.hc(adata, num_clusters)

Compute hierarchical clustering on the PCA coordinates (X_pca).

tl.getNClusters(adata, n_cluster[, …])

Test different Louvain settings to obtain the target number of clusters.
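One common way to organise such a search is bisection over the clustering resolution. The sketch below assumes that approach and takes a stand-in function `cluster_count(resolution)`. This function is hypothetical: in practice it would run Louvain at that resolution and count the clusters. The search also assumes that the number of clusters grows with resolution, which is typical but not guaranteed. `search_resolution` and its defaults are illustrative, not epiScanpy's parameters.

```python
def search_resolution(cluster_count, target, lo=0.0, hi=3.0, max_steps=20):
    """Bisect over resolution until cluster_count(resolution) == target.

    Returns the resolution found, or None if max_steps is exhausted.
    """
    for _ in range(max_steps):
        mid = (lo + hi) / 2
        n = cluster_count(mid)
        if n == target:
            return mid
        if n < target:
            lo = mid  # too few clusters: increase resolution
        else:
            hi = mid  # too many clusters: decrease resolution
    return None


# Toy stand-in for "run Louvain and count clusters"
def fake_count(r):
    return int(r * 10) + 1


print(search_resolution(fake_count, 6))  # 0.5625
```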

tl.dendogram(adata, groupby[, n_pcs, …])

Computes a hierarchical clustering for the given groupby categories.

tl.ARI(adata, label_1, label_2)

Compute Adjusted Rand Index.
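For reference, the standard pair-counting definition of the Adjusted Rand Index can be computed from the contingency table of two labelings. The sketch below implements that definition with the standard library; `scikit-learn` provides the same quantity as `sklearn.metrics.adjusted_rand_score`. Whether tl.ARI delegates to it is not stated here. `adjusted_rand_index` is a hypothetical helper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_1, labels_2):
    """Pair-counting ARI: (index - expected) / (max_index - expected)."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_1)
    if n < 2:
        return 1.0  # trivially identical partitions
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_1, labels_2)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_1).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_2).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index(["a", "a", "b", "b"], [1, 1, 0, 0]))  # 1.0 (same partition, renamed)
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))          # approximately 0.5714 (= 4/7)
```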

tl.AMI(adata, label_1, label_2)

Compute the Adjusted Mutual Information (AMI).

tl.homogeneity(adata, label_1, label_2)

Compute homogeneity score.

tl.silhouette(adata_name, cluster_annot[, …])

Compute silhouette scores.
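The silhouette score of a cell compares `a`, its mean distance to the other members of its own cluster, with `b`, its smallest mean distance to the members of any other cluster: `s = (b - a) / max(a, b)`. A minimal Euclidean sketch follows. It assigns 0 to cells in singleton clusters, the same convention scikit-learn uses. The point coordinates (e.g. PCA coordinates) and `silhouette_sketch` are illustrative assumptions.

```python
from math import dist


def silhouette_sketch(points, labels):
    """Per-point silhouette scores with Euclidean distance (needs >= 2 clusters)."""
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels)) if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: score defined as 0
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_sketch(pts, [0, 0, 1, 1]))  # four equal scores close to 0.9
```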

Plotting: PL

The plotting module episcanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.

pl.pca(adata, basis, *[, color, …])

Scatter plot in PCA coordinates.

pl.pca_overview(adata[, color, use_raw, …])

Plot PCA results.

pl.pca_variance_ratio(adata[, n_pcs, log, …])

Plot the variance ratio.

pl.tsne(adata, basis, *[, color, …])

Scatter plot in tSNE basis.

pl.umap(adata, basis, *[, color, …])

Scatter plot in UMAP basis.

pl.rank_feat_groups(adata[, groups, …])

Plot ranking of features.

pl.rank_feat_groups_violin(adata[, groups, …])

Plot ranking of features for all tested comparisons.

pl.rank_feat_groups_dotplot(adata[, groups, …])

Plot ranking of features using dotplot plot (see dotplot())

pl.rank_feat_groups_stacked_violin(adata[, …])

Plot ranking of features using stacked_violin plot (see stacked_violin())

pl.rank_feat_groups_matrixplot(adata[, …])

Plot ranking of features using matrixplot plot (see matrixplot())

pl.rank_feat_groups_heatmap(adata[, groups, …])

Plot ranking of features using heatmap plot (see heatmap())

pl.rank_feat_groups_tracksplot(adata[, …])

Plot ranking of features using tracksplot plot (see tracksplot())

pl.cal_var(adata[, show, color, save])

Show distribution plots of cells sharing features and variability score.

pl.violin(adata, keys[, groupby, log, …])

Violin plot.

pl.scatter(adata[, x, y, color, use_raw, …])

Scatter plot along observations or variables axes.

pl.ranking(adata, attr, keys[, dictionary, …])

Plot rankings.

pl.clustermap(adata[, obs_keys, use_raw, …])

Hierarchically-clustered heatmap.

pl.heatmap(adata, var_names[, groupby, …])

Heatmap of the expression values of genes.

pl.dotplot(adata, var_names[, groupby, …])

Makes a dot plot of the expression values of var_names.

pl.matrixplot(adata, var_names[, groupby, …])

Create a heatmap of the mean expression values per cluster of each var_names. If groupby is not given, the matrixplot assumes that all data belong to a single category.

pl.tracksplot(adata, var_names, groupby[, …])

In this type of plot each var_name is plotted as a filled line plot where the y values correspond to the var_name values and x is each of the cells.

pl.dendrogram(adata, groupby[, …])

Plots a dendrogram of the categories defined in groupby.

pl.correlation_matrix(adata, groupby[, …])

Plots the correlation matrix computed as part of sc.tl.dendrogram.

pl.prct_overlap(adata, key_1, key_2[, norm, …])

Percentage or number of cells corresponding to the overlap of the different cell types between two sets of annotations/clusters.
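The overlap between two annotations is essentially a contingency table. The sketch below expresses it as percentages, normalising within each label of the first annotation (for example, "what fraction of the T cells fall in each cluster"). That normalisation direction is an assumption for illustration, and `overlap_percentages` is a hypothetical helper.

```python
from collections import Counter


def overlap_percentages(labels_1, labels_2):
    """For each label of labels_1, the % of its cells carrying each label of labels_2."""
    pair_counts = Counter(zip(labels_1, labels_2))
    group_sizes = Counter(labels_1)
    table = {}
    for (a, b), n in pair_counts.items():
        table.setdefault(a, {})[b] = 100 * n / group_sizes[a]
    return table


cell_types = ["T", "T", "T", "T", "B", "B"]
clusters = ["0", "0", "0", "1", "1", "1"]
print(overlap_percentages(cell_types, clusters))
# {'T': {'0': 75.0, '1': 25.0}, 'B': {'1': 100.0}}
```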

pl.overlap_heatmap(adata, key_1, key_2[, …])

Heatmap of the cluster correspondence between two sets of annotations.

pl.cluster_composition(adata, cluster, condition)

pl.silhouette(adata_name, cluster_annot[, …])

Plot the output of tl.silhouette as a silhouette plot.

pl.silhouette_tot(adata_name, cluster_annot)

Compute silhouette scores and plot them.

pl.variability_features(adata[, min_score, …])

Compute a variability score to rank the most variable features across all cells.