topomics.models.BaseTopicModel#

class topomics.models.BaseTopicModel(mdata, modalities=None)#

Base class for all models in the topomics package.

This class provides a common interface and shared functionality for all models. It can be extended by specific model implementations.

Attributes table#

Methods table#

check_input(mdata, modalities)

Validate and process the input data.

check_modalities_names()

Standardize and validate modality keys in data_dict.

clear_metric_cache()

Clear the cached metrics.

cross_modality_score(mod_a, mod_b, *[, ...])

Compute a SHARE-Topic–style cross-modal interaction matrix P_{a,b}.

fit(data)

Fit the model to the provided data.

get_cell_topic_dist()

Get the cell-topic matrix Θ (C × K).

get_entropy([normalised])

Compute mean entropy of cell-topic distributions.

get_feature_topic_dist(modality)

Get the feature-topic matrix Φ (K × G).

get_likelihood_per_modality(**kwargs)

Compute log-likelihood for each modality separately.

get_modality_weights(**kwargs)

Get normalized mixing weights showing how much each modality contributes to topic assignments.

get_perplexity(**kwargs)

Compute perplexity (reconstruction quality).

get_perplexity_per_modality(**kwargs)

Compute perplexity for each modality separately.

get_top_features_per_topic(modality[, ...])

Get top N features for each topic in a specific modality.

get_topic_diversity([modality])

Compute topic diversity as average pairwise cosine distance.

predict(data)

Predict using the fitted model on the provided data.

Attributes#

BaseTopicModel.spatial: bool = False#

Methods#

BaseTopicModel.check_input(mdata, modalities)#

Validate and process the input data.

Checks that the data is an AnnData or MuData object and that the modalities are correctly specified.

BaseTopicModel.check_modalities_names()#

Standardize and validate modality keys in data_dict.

Maps various synonyms to ‘rna’, ‘protein’, or ‘chromatin’, and rebuilds data_dict with standardized keys.
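The key standardization described above can be pictured with a small sketch. The synonym table and the helper name standardise_modality_keys below are hypothetical and chosen only for illustration; the synonyms actually recognized by topomics are not listed on this page and may differ.

```python
# Hypothetical synonym table: the real mapping used by the package may differ.
SYNONYMS = {
    "rna": "rna", "gex": "rna", "gene_expression": "rna",
    "protein": "protein", "adt": "protein",
    "chromatin": "chromatin", "atac": "chromatin", "peaks": "chromatin",
}


def standardise_modality_keys(data_dict):
    """Rebuild a dict so that every key is 'rna', 'protein', or 'chromatin'."""
    standardised = {}
    for key, value in data_dict.items():
        canonical = SYNONYMS.get(key.strip().lower())
        if canonical is None:
            raise ValueError(f"Unrecognised modality name: {key!r}")
        if canonical in standardised:
            raise ValueError(f"Two keys map to the same modality: {canonical!r}")
        standardised[canonical] = value
    return standardised


print(standardise_modality_keys({"GEX": "counts_a", "ADT": "counts_b"}))
```

Rejecting unknown or duplicate keys early, as this sketch does, is one possible design; silently dropping them would be another.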

BaseTopicModel.clear_metric_cache()#

Clear the cached metrics.

Call this method after retraining the model to ensure metrics are recomputed with the updated parameters.

BaseTopicModel.cross_modality_score(mod_a, mod_b, *, normalise=True, return_df=True)#

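Compute a SHARE-Topic–style cross-modal interaction matrix P_{a,b}.

Call this on a fitted topic model; the score relies on the model's topic-distribution accessors.

Parameters:
  • mod_a (str) – Name of the first modality.

  • mod_b (str) – Name of the second modality.

  • normalise (bool (default: True))

  • return_df (bool (default: True)) – If True, return a DataFrame rather than an ndarray.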

Return type:

ndarray | DataFrame

Returns:

P : shape (n_feat_a, n_feat_b) – interaction score between every feature of mod_a and every feature of mod_b

BaseTopicModel.fit(data)#

Fit the model to the provided data.

Parameters:

data – The input data to fit the model.

BaseTopicModel.get_cell_topic_dist()#

Get the cell-topic matrix Θ (C × K).

Return type:

ndarray

Returns:

Θ (ndarray) – Cell-topic matrix of shape (C, K), where C is the number of cells and K is the number of topics.

BaseTopicModel.get_entropy(normalised=True)#

Compute mean entropy of cell-topic distributions.

Higher entropy means each cell's topic proportions are spread more evenly across topics, i.e. the topic assignment of each cell is more uncertain.

Parameters:

normalised (bool (default: True)) – If True, renormalise each cell-topic distribution to sum to 1 before computing entropy.

Return type:

float

Returns:

float Mean entropy across all cells
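As an illustration of the quantity described here, the following is a minimal NumPy sketch of mean Shannon entropy over the rows of a cell-topic matrix. The function name mean_topic_entropy, the natural-log base, and the eps guard are assumptions made for this example, not the package's implementation.

```python
import numpy as np


def mean_topic_entropy(theta, normalised=True, eps=1e-12):
    """Mean Shannon entropy (in nats) of the rows of a (C x K) matrix."""
    theta = np.asarray(theta, dtype=float)
    if normalised:
        # Make every row a proper probability distribution.
        theta = theta / theta.sum(axis=1, keepdims=True)
    # eps avoids log(0) for topics with zero weight.
    per_cell = -(theta * np.log(theta + eps)).sum(axis=1)
    return float(per_cell.mean())


uniform = np.full((3, 4), 0.25)  # every cell spread evenly over 4 topics
peaked = np.eye(4)[:3]           # every cell assigned to a single topic

print(mean_topic_entropy(uniform))  # close to log(4), the maximum for K = 4
print(mean_topic_entropy(peaked))   # close to 0
```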

BaseTopicModel.get_feature_topic_dist(modality)#

Get the feature-topic matrix Φ (K × G).

Parameters:

modality (str) – The name of the modality for which to retrieve the feature-topic matrix.

Return type:

ndarray | DataFrame

Returns:

Φ : np.ndarray or pd.DataFrame Feature-topic matrix, where K is the number of topics and G is the number of features. If the modality has feature names, returns a DataFrame with those names.

BaseTopicModel.get_likelihood_per_modality(**kwargs)#

Compute log-likelihood for each modality separately.

Higher is better.

Return type:

dict[str, float]

Returns:

dict[str, float] Dictionary mapping modality names to log-likelihood values

BaseTopicModel.get_modality_weights(**kwargs)#

Get normalized mixing weights showing how much each modality contributes to topic assignments.

Only applicable for multimodal models with mixture-of-experts or similar architectures. Returns weights in range [0, 1] that sum to 1 per cell. Higher weight = model relies more on that modality for inferring topics.

Return type:

DataFrame | dict[str, ndarray]

Returns:

pd.DataFrame or dict[str, np.ndarray] Normalized mixing weights for each cell and modality.
  • DataFrame: cells × modalities

  • Dict: modality name → weights array
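To make the "sum to 1 per cell" property concrete, here is a minimal sketch that turns raw per-cell gating scores into normalized weights with a softmax. The existence of raw scores, the use of softmax, and the name normalise_modality_weights are all assumptions for illustration; a given model may derive its weights differently.

```python
import numpy as np


def normalise_modality_weights(scores):
    """Row-wise softmax: (cells x modalities) scores -> weights summing to 1."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the row maximum keeps the exponentials numerically stable.
    shifted = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


# Two cells, two modalities (hypothetical scores for 'rna' and 'protein').
weights = normalise_modality_weights([[0.0, 0.0], [2.0, 0.0]])
print(weights)
```

In this toy input the first cell weights both modalities equally, while the second relies more on the first modality.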

BaseTopicModel.get_perplexity(**kwargs)#

Compute perplexity (reconstruction quality).

Lower is better. Perplexity = exp(-log_likelihood / N_tokens)

Return type:

float

Returns:

float Perplexity score

BaseTopicModel.get_perplexity_per_modality(**kwargs)#

Compute perplexity for each modality separately.

Lower is better. Perplexity = exp(-log_likelihood / N_tokens)

Return type:

dict[str, float]

Returns:

dict[str, float] Dictionary mapping modality names to perplexity values
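The formula shared by get_perplexity and get_perplexity_per_modality can be worked through with a short sketch. The helper names and the way token counts are passed in are assumptions for this example only.

```python
import math


def perplexity(log_likelihood, n_tokens):
    """Perplexity = exp(-log_likelihood / N_tokens)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def perplexity_per_modality(log_likelihoods, token_counts):
    """Apply the same formula to each modality independently."""
    return {
        name: perplexity(ll, token_counts[name])
        for name, ll in log_likelihoods.items()
    }


# A model that assigns probability 1/100 to each of 50 observed tokens
# has log-likelihood 50 * log(1/100), so its perplexity is about 100.
ll = 50 * math.log(1 / 100)
print(perplexity(ll, 50))
print(perplexity_per_modality({"rna": ll}, {"rna": 50}))
```

The worked case shows why lower is better: a perplexity of 100 means the model is as uncertain as a uniform choice among 100 features.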

BaseTopicModel.get_top_features_per_topic(modality, n_features=10, return_scores=False)#

Get top N features for each topic in a specific modality.

Parameters:
  • modality (str) – Modality name (e.g., ‘rna’, ‘protein’, ‘chromatin’)

  • n_features (int (default: 10)) – Number of top features to return per topic.

  • return_scores (bool (default: False)) – If True, return (feature_name, score) tuples. If False, return feature names only.

Return type:

dict[str, list[str]] | dict[str, list[tuple[str, float]]]

Returns:

dict[str, list[str]] or dict[str, list[tuple[str, float]]] Dictionary mapping topic names (e.g., ‘topic_0’) to lists of top feature names or (feature_name, score) tuples.
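The return structure described above can be sketched by ranking each row of a K × G feature-topic matrix. This is an illustrative stand-in, not the package's code: the function name, the toy matrix, and the feature names are assumptions.

```python
import numpy as np


def top_features_per_topic(phi, feature_names, n_features=10, return_scores=False):
    """Rank features within each topic (row) of a (K x G) matrix."""
    phi = np.asarray(phi, dtype=float)
    result = {}
    for k, row in enumerate(phi):
        # Indices of the largest entries, highest score first.
        order = np.argsort(row)[::-1][:n_features]
        if return_scores:
            result[f"topic_{k}"] = [(feature_names[i], float(row[i])) for i in order]
        else:
            result[f"topic_{k}"] = [feature_names[i] for i in order]
    return result


phi = np.array([[0.1, 0.7, 0.2],
                [0.5, 0.2, 0.3]])
names = ["CD3E", "MS4A1", "LYZ"]
top = top_features_per_topic(phi, names, n_features=2)
print(top)  # {'topic_0': ['MS4A1', 'LYZ'], 'topic_1': ['CD3E', 'LYZ']}
```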

BaseTopicModel.get_topic_diversity(modality=None)#

Compute topic diversity as average pairwise cosine distance.

Higher values indicate more distinct topics. This metric measures how different the topic-feature distributions are from each other.

Parameters:

modality (str | None (default: None)) – If provided, compute diversity for this specific modality’s feature-topic distribution. If None, compute diversity averaged across all modalities.

Return type:

float

Returns:

float Average pairwise cosine distance between topic distributions (0-1). Higher = more diverse/distinct topics.
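A minimal sketch of average pairwise cosine distance between the rows of a single feature-topic matrix follows. The function name and the requirement of at least two topics are assumptions for this example; how the package averages across modalities when modality is None is not shown here.

```python
import numpy as np


def topic_diversity(phi):
    """Average pairwise cosine distance between rows of a (K x G) matrix."""
    phi = np.asarray(phi, dtype=float)
    if phi.shape[0] < 2:
        raise ValueError("need at least two topics to compare")
    # Scale each topic vector to unit length so dot products are cosines.
    unit = phi / np.linalg.norm(phi, axis=1, keepdims=True)
    cos_sim = unit @ unit.T
    # Use each unordered pair of topics exactly once.
    upper = np.triu_indices(phi.shape[0], k=1)
    return float(np.mean(1.0 - cos_sim[upper]))


distinct = np.eye(3)           # topics share no features
identical = np.ones((3, 5))    # topics are all the same

print(topic_diversity(distinct))   # 1.0
print(topic_diversity(identical))  # close to 0
```

For non-negative topic vectors, as in this sketch, cosine similarity lies between 0 and 1, which keeps the distance in the 0-1 range stated above.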

BaseTopicModel.predict(data)#

Predict using the fitted model on the provided data.

Parameters:

data – The input data for prediction.