gradec.model.annotate_lda

gradec.model.annotate_lda(dataset, dataset_nm, feature_group, n_topics=200, n_cores=1)

Annotate a Dataset with the results of an LDA model.

Parameters:
  • dataset (Dataset) – A Dataset with, at minimum, text available in the self.text_column column of its texts attribute.

  • n_topics (int) – Number of topics for topic model. This corresponds to the model’s n_components parameter. Must be an integer >= 1.

  • dataset_nm (str) – Dataset name. Possible options: “neurosynth” or “neuroquery”.

  • data_dir (str) – Path to data directory.

  • n_cores (int, optional) – Number of cores to use for parallelization. If <=0, defaults to using all available cores. Default is 1.
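
The constraints described above (a dataset name restricted to two options, n_topics as an integer >= 1, and n_cores falling back to all available cores when <= 0) can be sketched as a small check. This is an illustrative sketch only, not gradec's implementation: the helper check_lda_arguments and the constant VALID_DATASET_NAMES are hypothetical names, and the use of os.cpu_count() to detect the available cores is an assumption.

```python
import os

# Hypothetical constant: the two dataset names listed in the parameter description.
VALID_DATASET_NAMES = ("neurosynth", "neuroquery")


def check_lda_arguments(dataset_name, n_topics=200, n_cores=1):
    """Validate the documented constraints and resolve the core count.

    Hypothetical helper for illustration; it is not part of gradec.
    Returns the validated (n_topics, n_cores) pair.
    """
    if dataset_name not in VALID_DATASET_NAMES:
        raise ValueError(
            f"dataset name must be one of {VALID_DATASET_NAMES}, got {dataset_name!r}"
        )

    # n_topics must be an integer >= 1 (bool is rejected even though it subclasses int).
    if isinstance(n_topics, bool) or not isinstance(n_topics, int) or n_topics < 1:
        raise ValueError(f"n_topics must be an integer >= 1, got {n_topics!r}")

    # A value <= 0 means "use all available cores".
    # Assumption: os.cpu_count() is used for detection; it may return None,
    # in which case this sketch falls back to a single core.
    if n_cores <= 0:
        n_cores = os.cpu_count() or 1

    return n_topics, n_cores
```

For example, check_lda_arguments("neurosynth", n_topics=50, n_cores=2) returns (50, 2), while passing n_cores=0 returns the machine's core count in the second position.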

Returns:

dset (Dataset) – A new Dataset with an updated annotations attribute.