-
Notifications
You must be signed in to change notification settings - Fork 53
Hi everyone archr in scenicplus ??, #631
Description
Hi everyone,
I am trying to understand properly what should I give from my archr project build in R to scenic+ in order to make it run.
My Archr object is already clean, with a UMAP ATAC and UMAP RNA (miltiomic data) and preprocess with cisbp annotation .... How can I implement this informations to scenic plus ??
Thanks in advance for your advice and your time
I saw this previous issue but not get the point of everything
ISSUE OLD FROM DEC 6 2023 l @laurajohanne
I am working with an ArchR project with 64654 cells.
Obtaining count matrix (for peaks) by converting the count matrix to a h5ad file formate (due to size) ;
matrix <- ArchRtoSignac::getPeakMatrix(ArchRProject= ArchRPro)
scenic_plus_obj <- Seurat::CreateSeuratObject(counts = matrix, project = "count_matrix")
sceasy::convertFormat(scenic_plus_obj, from="seurat", to ="anndata",
outFile='/work/ArchR/data_trine_for_scenicplus/scenic_plus_obj.h5ad')
Next I follow the tutorial; https://pycistopic.readthedocs.io/en/latest/Toy_melanoma-RTD.html with the exception of when creating the cisTopic object since the count matrix is a Sparse matrix.
Create cisTopic object
cistopic_obj = create_cistopic_object(fragment_matrix= count_matrix, cell_names = cell_annotations_list , region_names = region_annotations_list, path_to_blacklist='/work/Scenicplus_pipeline/data/mm10-blacklist.v2.bed')
Adding cell information
cistopic_obj.add_cell_data(cell_data)
pickle.dump(cistopic_obj,
open(os.path.join(outDir, 'cistopic_obj.pkl'), 'wb'))
print(cistopic_obj)
Run models
models=run_cgs_models_mallet(path_to_mallet_binary,
cistopic_obj,
n_topics=[2,5,10,15,20,25,30,35,40,45,50],
n_cpu=55,
n_iter=500,
random_state=1,
alpha=50,
alpha_by_topic=True,
eta=0.1,
eta_by_topic=False,
tmp_path= tmpDir, #Use SCRATCH if many models or big data set
save_path = None)
with open(outDir+'Mallet_models_500.pkl', 'wb') as f:
pickle.dump(models, f)
model=evaluate_models(models,
select_model=None,
return_model=True,
metrics=['Arun_2010','Cao_Juan_2009', 'Minmo_2011', 'loglikelihood'],
plot_metrics= False,
save= outDir + 'models/model_selection.pdf')
cistopic_obj.add_LDA_model(model)
Find clusters
from pycisTopic.clust_vis import *
find_clusters(cistopic_obj,
target = 'cell',
k = 40,
res = [0.6, 1.2],
seed = 1,
prefix = 'pycisTopic_',
scale = True,
split_pattern = '-')
UMAP
from pycisTopic.clust_vis import *
run_umap(cistopic_obj, target = 'cell', scale = True, random_state = 1)
plot_metadata(cistopic_obj,
reduction_name = 'UMAP',
variables = ['OrderCelltype_merged', 'pycisTopic_leiden_40_0.6', 'pycisTopic_leiden_40_1.2'],
num_columns=3,
text_size=10,
dot_size=0.5,
figsize=(15,7),
save= '/work/Scenicplus_pipeline/dataoutput/visualization/dimensionality_reduction_label_uncorrected_merged_merged.pdf')
I do not encounter any issues in the next steps of the tutorial until I reach visualization with UMAP where I am having issues.
When using the cell annotation from the ArchR project the cells seems to be randomly clustered together. When using their own clustering method leiden they cluster nicely visualize but it does not match the cell annotation.
When using ArchR to cluster the data it also appear that the annotation is correct. The data have already been preprocessed and used by multiple people and I am confident that the annotation is correct.
UMAP from ArchR
And when using the UMAP coordinates from ArchR within pycisTopic it appear that the two methods find quite different coordinates.
UMAP with coordinates from ArchR in pycisTopic
I have already checked if the order of the cells are the same in both the ArchR project and the cisTopic object, which they are. And I have used same seed parameter and number of neighbors within both methods.
Does anyone might have an idea what my next step should be?
Thanks in advance,
Laura
Originally posted by @laurajohanne in #266