Utils

scarches.utils.add_annotations(adata, files, min_genes=0, max_genes=None, varm_key='I', uns_key='terms', clean=True, genes_use_upper=True)

Add annotations to an AnnData object from files.

Parameters:
  • adata – Annotated data matrix.

  • files – Paths to text files with annotations. Each row is read as one gene set: the gene set's name in the first column, followed by the names of its genes.

  • min_genes – Only include gene sets that have more than min_genes genes present in adata.

  • max_genes – Only include gene sets that have fewer than max_genes genes present in adata.

  • varm_key – Store the binary array I, of size n_vars x number of annotated terms in files, in adata.varm[varm_key]. If I[i, j] = 1, gene i is present in annotation j.

  • uns_key – Store the gene sets’ names in adata.uns[uns_key].

  • clean – If True, removes the prefix up to and including the first underscore of each term name (such as ‘REACTOME_’) and truncates the name to its first thirty characters.

  • genes_use_upper – If True, converts gene names from files and adata to uppercase for comparison.
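The file layout and filtering rules described above can be made concrete with a small stdlib-only sketch that mirrors the documented behavior on plain Python lists instead of an AnnData object. The helper names parse_gene_sets and membership_matrix are hypothetical, whitespace-separated columns are an assumption about the file layout, and this is not the scArches source.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple


def parse_gene_sets(lines: Iterable[str], clean: bool = True,
                    upper: bool = True) -> List[Tuple[str, Set[str]]]:
    """Hypothetical parser: each row is '<set name> <gene> <gene> ...' (whitespace-separated, assumed)."""
    gene_sets = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank rows
        name, genes = fields[0], fields[1:]
        if upper:
            genes = [g.upper() for g in genes]  # compare gene names case-insensitively
        if clean:
            # Drop everything up to the first underscore (e.g. 'REACTOME_'), keep 30 characters.
            name = name.split("_", 1)[-1][:30]
        gene_sets.append((name, set(genes)))
    return gene_sets


def membership_matrix(var_names: Sequence[str], gene_sets: List[Tuple[str, Set[str]]],
                      min_genes: int = 0, max_genes: Optional[int] = None,
                      upper: bool = True) -> Tuple[List[List[int]], List[str]]:
    """Build the binary n_vars x n_terms matrix I and the list of kept term names."""
    names = [g.upper() if upper else g for g in var_names]
    columns, terms = [], []
    for term, genes in gene_sets:
        col = [1 if g in genes else 0 for g in names]  # column j: genes of the data set in term j
        n_present = sum(col)
        if n_present > min_genes and (max_genes is None or n_present < max_genes):
            columns.append(col)
            terms.append(term)
    # Transpose the kept columns into one row per gene (n_vars x n_terms).
    I = [list(row) for row in zip(*columns)] if columns else [[] for _ in names]
    return I, terms
```

For example, with the rows "REACTOME_CELL_CYCLE CDK1 CCNB1 MKI67", "KEGG_GLYCOLYSIS HK1 PKM" and "HALLMARK_TINY gapdh" and the gene names ["cdk1", "CCNB1", "PKM", "ACTB"], the kept terms are ["CELL_CYCLE", "GLYCOLYSIS"]; "TINY" is dropped because none of its genes are present, so it does not exceed min_genes=0.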

scarches.utils.weighted_knn_trainer(train_adata, train_adata_emb, n_neighbors=50)

Trains a weighted KNN classifier on train_adata.

Parameters:
  • train_adata (AnnData) – Annotated dataset to be used to train the KNN classifier with label_key as the target variable.

  • train_adata_emb (str) – Name of the obsm layer to be used for calculation of neighbors. If set to “X”, train_adata.X will be used.

  • n_neighbors (int) – Number of nearest neighbors in the KNN classifier.
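Conceptually, training a KNN model amounts to indexing the reference embedding so that, for any query cell, the n_neighbors closest reference cells and their distances can be retrieved later. The stdlib-only sketch below does this with a brute-force search; the class name BruteForceKNN, the Euclidean metric and the list-of-lists embedding are assumptions for illustration, not the scArches implementation (the knn_model parameter of weighted_knn_transfer indicates that the real function produces a KNeighborsTransformer).

```python
import heapq
import math
from typing import List, Sequence, Tuple


class BruteForceKNN:
    """Hypothetical stand-in for a fitted KNN model (not the scArches API)."""

    def __init__(self, n_neighbors: int = 50) -> None:
        self.n_neighbors = n_neighbors
        self._ref: List[List[float]] = []

    def fit(self, ref_emb: Sequence[Sequence[float]]) -> "BruteForceKNN":
        # "Training" a KNN model only stores the reference embedding (cells x dims).
        self._ref = [list(row) for row in ref_emb]
        return self

    def kneighbors(
        self, query_emb: Sequence[Sequence[float]]
    ) -> Tuple[List[List[float]], List[List[int]]]:
        """Distances and indices of the n_neighbors nearest reference cells, per query cell."""
        k = min(self.n_neighbors, len(self._ref))
        all_dists: List[List[float]] = []
        all_idx: List[List[int]] = []
        for q in query_emb:
            # Euclidean distance to every reference cell; keep the k smallest (ties by index).
            scored = [(math.dist(q, r), i) for i, r in enumerate(self._ref)]
            nearest = heapq.nsmallest(k, scored)
            all_dists.append([d for d, _ in nearest])
            all_idx.append([i for _, i in nearest])
        return all_dists, all_idx
```

For instance, BruteForceKNN(n_neighbors=2).fit([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]).kneighbors([[0.9, 0.0]]) returns the indices [[1, 0]]. A brute-force search costs one distance per reference cell per query, which is why production libraries offer tree-based or approximate indices for large atlases.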

scarches.utils.weighted_knn_transfer(query_adata, query_adata_emb, ref_adata_obs, label_keys, knn_model, threshold=1, pred_unknown=False, mode='package')

Annotates query_adata cells with a trained weighted KNN classifier.

Parameters:
  • query_adata (AnnData) – Annotated dataset to be used to query the KNN classifier.

  • query_adata_emb (str) – Name of the obsm layer to be used for label transfer. If set to “X”, query_adata.X will be used.

  • ref_adata_obs (pd.DataFrame) – The obs DataFrame of the reference AnnData object.

  • label_keys (str) – Names of the columns to be used as target variables (e.g. cell_type) in query_adata.

  • knn_model (KNeighborsTransformer) – KNN model trained on the reference AnnData with the weighted_knn_trainer function.

  • threshold (float) – Uncertainty threshold used to annotate cells as “Unknown”: cells with uncertainties higher than this value will be annotated as “Unknown”. Set to 1 to keep all predictions; this lets you experiment with thresholds later on.

  • pred_unknown (bool) – Whether cells may be annotated as “Unknown”. Defaults to False. If False, threshold is ignored and each cell is annotated with the label that is most common among its n_neighbors nearest cells.

  • mode (str) – Must be either “paper” or “package”. If mode is “package”, uncertainties are computed as 1 - P(pred_label); if it is “paper”, they are computed as 1 - P(true_label).
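How threshold, pred_unknown and mode interact can be illustrated with a stdlib-only sketch of a weighted vote over one query cell's neighbors. The Gaussian-kernel weighting of distances, the bandwidth sigma, the function names vote and transfer_label, and the plain-list inputs are all assumptions for illustration; this is not the scArches implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def vote(distances: Sequence[float], neighbor_labels: Sequence[str],
         sigma: float = 1.0) -> Dict[str, float]:
    """Normalized weighted vote per label for one query cell (Gaussian kernel assumed)."""
    weights = [math.exp(-(d ** 2) / (2 * sigma ** 2)) for d in distances]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: fall back to an unweighted vote.
        weights, total = [1.0] * len(distances), float(len(distances))
    probs: Dict[str, float] = defaultdict(float)
    for w, label in zip(weights, neighbor_labels):
        probs[label] += w / total
    return dict(probs)


def transfer_label(distances: Sequence[float], neighbor_labels: Sequence[str],
                   threshold: float = 1.0, pred_unknown: bool = False,
                   mode: str = "package", true_label: Optional[str] = None,
                   sigma: float = 1.0) -> Tuple[str, float]:
    """Return (predicted label, uncertainty) for one query cell."""
    if mode not in ("package", "paper"):
        raise ValueError("mode has to be one of 'paper' or 'package'")
    probs = vote(distances, neighbor_labels, sigma)
    pred = max(probs, key=probs.get)  # label with the largest weighted vote
    if mode == "package":
        uncertainty = 1.0 - probs[pred]  # 1 - P(pred_label)
    else:
        if true_label is None:
            raise ValueError("mode='paper' needs the cell's true label")
        uncertainty = 1.0 - probs.get(true_label, 0.0)  # 1 - P(true_label)
    if pred_unknown and uncertainty > threshold:
        pred = "Unknown"  # threshold only matters when pred_unknown=True
    return pred, uncertainty
```

With distances [0.0, 0.0, 0.0] and neighbor labels ["B", "B", "T"], the prediction is "B" with uncertainty 1/3. Passing pred_unknown=True with threshold=0.2 turns it into "Unknown", while the default threshold=1 never does, since an uncertainty of the form 1 - P cannot exceed 1.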