episcanpy.api

API

Import epiScanpy’s high-level API as:

import episcanpy.api as epi

Count Matrices: CT

Loading data and annotations, building count matrices, and filtering lowly covered methylation variables and lowly covered cells.

Load features

To build a count matrix for either methylation or open chromatin data, you must first load the segmentation of the genome of interest or the set of features of interest.

ct.load_features(file_features[, …])

Transform a BED file into a usable set of units in which to measure methylation levels.
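To make the idea of turning a BED file into a set of units concrete, here is a minimal sketch in plain Python. It is not the implementation of ct.load_features: the helper name parse_bed_lines, the returned dictionary layout and the fallback feature names are assumptions made for illustration only.

```python
from collections import defaultdict


def parse_bed_lines(lines):
    """Group BED intervals by chromosome as (start, end, name) tuples.

    Hypothetical helper for illustration; not part of episcanpy.
    """
    features = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the optional 4th BED column as the name, else build one.
        name = fields[3] if len(fields) > 3 else f"{chrom}_{start}_{end}"
        features[chrom].append((start, end, name))
    # Sort the features of each chromosome by coordinate.
    return {chrom: sorted(feats) for chrom, feats in features.items()}


# Toy input; the coordinates and names are made up.
bed_example = [
    "track name=example",
    "chr1\t1000\t1500\tpromoter_A",
    "chr2\t200\t900",
    "chr1\t100\t400\tenhancer_B",
]
features = parse_bed_lines(bed_example)
print(features)
```

Grouping by chromosome keeps later lookups (which feature does this cytosine or read fall into?) restricted to one sorted list per chromosome.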

ct.make_windows(size[, chromosomes, max_length])

Generate windows/bins of the given size for the appropriate genome (default choice is human).
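The binning idea can be sketched as follows. This is an illustration under stated assumptions, not the code behind ct.make_windows: the function name make_windows_sketch, the chrom_lengths argument, the 0-based half-open intervals and the toy chromosome lengths are all hypothetical.

```python
def make_windows_sketch(size, chrom_lengths):
    """Tile each chromosome with consecutive windows of the given size.

    Hypothetical helper; intervals are 0-based and half-open (an assumption).
    The last window of a chromosome is truncated at the chromosome end.
    """
    windows = {}
    for chrom, length in chrom_lengths.items():
        windows[chrom] = [
            (start, min(start + size, length))
            for start in range(0, length, size)
        ]
    return windows


# Toy lengths for illustration only; these are not real genome sizes.
toy_lengths = {"chr1": 2500, "chr2": 1000}
windows = make_windows_sketch(1000, toy_lengths)
print(windows)
```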

ct.size_feature_norm(loaded_feature, size)

If the loaded features are too small or of different sizes, they can be normalised to a single given size by extending the feature coordinates in both directions.
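One way to picture this normalisation is to re-centre every feature on its midpoint and give it the requested width. The sketch below assumes that behaviour; whether ct.size_feature_norm centres on the midpoint or handles chromosome ends this way is not stated here, and the name normalise_feature_size is hypothetical.

```python
def normalise_feature_size(features, size):
    """Resize (start, end) features to a fixed width around their midpoint.

    Hypothetical helper. Starts are clipped at 0, in which case the window
    is shifted right so that it still spans exactly `size` bases.
    """
    normalised = []
    for start, end in features:
        midpoint = (start + end) // 2
        new_start = max(0, midpoint - size // 2)
        normalised.append((new_start, new_start + size))
    return normalised


# Toy features of 100, 200 and 20 bases, all resized to 500 bases.
resized = normalise_feature_size([(100, 200), (1000, 1200), (10, 30)], 500)
print(resized)
```

After this step every feature has the same width, so counts are comparable across features without a separate length correction.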

ct.plot_size_features(loaded_feature[, …])

Plot the distribution of feature sizes as a histogram.

ct.name_features(loaded_features)

Extract the names of the loaded features, specifying the chromosome they originated from.

Reading methylation file

Functions to read methylation files, extract methylation levels and build the count matrices:

ct.build_count_mtx(cells, annotation[, …])

Build methylation count matrix for a given annotation.

ct.read_cyt_summary(sample_name, meth_type, …)

Read the file from which to extract methylation levels and, assuming it follows the Ecker/Methylpy format, extract the number of methylated reads and the total number of reads for the covered cytosines in the requested genomic context (CG or CH). The sample_name parameter is the name of the file to read.
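The extraction described above can be sketched in a few lines. This is a simplified illustration, not the body of ct.read_cyt_summary. It assumes a tab-separated layout of chromosome, position, strand, three-base context, methylated read count, total read count and a methylation call, which is how Methylpy-style files are commonly laid out; the helper name summarise_cytosines and the toy records are hypothetical.

```python
def summarise_cytosines(lines, meth_type):
    """Sum methylated and total reads over covered cytosines in one context.

    Hypothetical helper. Assumed columns (tab-separated):
    chrom, position, strand, context, methylated reads, total reads, call.
    """
    if meth_type not in ("CG", "CH"):
        raise ValueError("meth_type must be 'CG' or 'CH'")
    methylated = total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        context, mc, cov = fields[3], int(fields[4]), int(fields[5])
        # The base after the cytosine decides the context: G -> CG; A/C/T -> CH.
        if context[1] == "G":
            site_type = "CG"
        elif context[1] in "ACT":
            site_type = "CH"
        else:
            continue  # ambiguous base (e.g. N): skip the site
        if cov > 0 and site_type == meth_type:
            methylated += mc
            total += cov
    return methylated, total


# Toy records for illustration only.
records = [
    "chr1\t100\t+\tCGA\t3\t4\t1",
    "chr1\t105\t-\tCAT\t0\t5\t0",
    "chr1\t110\t+\tCGT\t1\t2\t1",
    "chr1\t120\t+\tCTC\t1\t3\t0",
]
cg_counts = summarise_cytosines(records, "CG")
ch_counts = summarise_cytosines(records, "CH")
print(cg_counts, ch_counts)
```

The methylation level for a region is then the ratio of the two returned numbers, methylated reads over total reads.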

ct.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

Reading open chromatin (ATAC) file

ATAC-seq specific functions to build count matrices and load data:

ct.bld_atac_mtx(list_bam_files, loaded_feat)

Build a count matrix one set of features at a time.

ct.save_sparse_mtx(initial_matrix[, …])

Convert a regular ATAC matrix into a sparse AnnData object.

General functions

Functions that are not specific to one omic:

ct.save_sparse_mtx

Preprocessing: PP

Imputing missing data (methylation), filtering lowly covered cells or variables, correction for batch effect.

pp.coverage_cells(adata[, bins, key_added, …])

Histogram of the number of open features (in the case of ATAC-seq data) per cell.

pp.commoness_features

pp.binarize(adata[, copy])

Convert the count matrix into a binary matrix.
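As a rough illustration of what binarization means for a cells-by-features count matrix, the sketch below works on plain nested lists rather than an AnnData object. The helper name binarize_counts and the "any count above zero becomes 1" rule are assumptions, not a description of how pp.binarize is implemented.

```python
def binarize_counts(matrix):
    """Return a copy of the matrix with every positive count replaced by 1.

    Hypothetical helper; rows are cells and columns are features.
    """
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


# Toy count matrix: 3 cells x 4 features.
counts = [
    [0, 3, 1, 0],
    [2, 0, 0, 0],
    [5, 1, 4, 2],
]
binary = binarize_counts(counts)

# Row sums of the binary matrix give the number of open features per cell.
open_features_per_cell = [sum(row) for row in binary]
print(binary, open_features_per_cell)
```

The per-cell row sums are the quantity that a coverage histogram such as the one described for pp.coverage_cells would display.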

pp.lazy(adata[, pp_pca, nb_pcs, …])

Automatically computes PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, a t-distributed stochastic neighbor embedding (tSNE) and a Uniform Manifold Approximation and Projection (UMAP).

pp.load_metadata(adata, metadata_file[, …])

Load observational metadata in adata.obs.

pp.read_ATAC_10x(matrix[, cell_names, …])

Load sparse matrix (including matrices corresponding to 10x data) as AnnData objects.

Methylation matrices

Methylation specific count matrices.

pp.imputation_met(adata[, …])

Impute missing values in methylation level matrices.

pp.load_met_noimput(matrix_file[, path, save])

Read the raw count matrix and convert it into an AnnData object.

pp.readandimputematrix(file_name[, min_coverage])

Temporary function to load and impute a methylation count matrix as an AnnData object.

Tools: TL

tl.rank_features(adata, groupby[, omic, …])

Wrapper around scanpy's sc.tl.rank_genes_groups function.

tl.silhouette(adata_name, cluster_annot[, …])

Compute silhouette scores.
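For readers unfamiliar with the metric, the standard silhouette score of a point is (b - a) / max(a, b), where a is its mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below computes it with the standard library only; it is not the code behind tl.silhouette, and the function name, the Euclidean distance and the toy coordinates are assumptions.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette scores using Euclidean distance.

    Hypothetical helper. Points in singleton clusters, or data with a
    single cluster, get a score of 0.0.
    """
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        dist_by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                dist_by_cluster.setdefault(other, []).append(math.dist(p, q))
        own = dist_by_cluster.pop(label, None)
        if own is None or not dist_by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dist_by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well-separated toy clusters in 2D.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
print(good, bad)
```

Scores close to 1 indicate that a cell sits well inside its cluster, while negative scores suggest it is closer to another cluster than to its own.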

tl.lazy(adata[, pp_pca, copy])

Automatically computes PCA coordinates, loadings and variance decomposition, a neighborhood graph of observations, a t-distributed stochastic neighbor embedding (tSNE) and a Uniform Manifold Approximation and Projection (UMAP).

tl.load_markers(path, marker_list_file)

Convert a list of known cell type markers from the literature into a dictionary. Input: a list of known marker genes. The first row is treated as the header.
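The conversion from a marker table to a dictionary might look like the sketch below. The file layout is not specified here, so the example assumes a tab-separated table with one cell type and one marker gene per row after the header; that layout, the helper name markers_to_dict and the placeholder names are all hypothetical and do not describe the format tl.load_markers expects.

```python
import csv
import io


def markers_to_dict(handle):
    """Map each cell type to its list of marker genes.

    Hypothetical helper. Assumes tab-separated rows of (cell type, gene),
    with the first row treated as a header and skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # the first row is the header
    markers = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        cell_type, gene = row[0].strip(), row[1].strip()
        markers.setdefault(cell_type, []).append(gene)
    return markers


# Placeholder content; the cell types and gene names are made up.
marker_table = io.StringIO(
    "cell_type\tgene\n"
    "type_1\tGENE_A\n"
    "type_1\tGENE_B\n"
    "type_2\tGENE_C\n"
)
markers = markers_to_dict(marker_table)
print(markers)
```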

tl.identify_cluster(adata, cell_type, …[, …])

Use the markers of a given cell type to plot peak openness for peaks in the promoters of those markers. Inputs: cell type, cell type markers, peak-promoter intersections.

Plotting: PL

The plotting module episcanpy.plotting largely parallels the tl.* and a few of the pp.* functions. For most tools and for some preprocessing functions, you’ll find a plotting function with the same name.