Single cell data analysis using VisR: Part1 - CellRanger
Well, thanks for coming. Today is basically part one of, hopefully, a series of two or three more sessions on doing single-cell analysis. The platform that these apps, as we call them, are built into is our VisR platform. VisR is free software that you can download. The link is here, by the way, along with the link to the slides, so you can download them later and get the links.
The whole idea behind VisR was to be able to quickly turn R libraries into these so-called apps so that they are easy to use. It was mainly meant to be used by biologists who aren't as proficient in R, but I think even people who know R will find it useful, because it packages everything: you write the R code, you add a few basic lines of code, and it automatically generates an app for you. I won't go into the details of it; if you are interested in building apps, there are some tutorial videos on YouTube.
Today I'm going to go through the first of the apps we have made for single-cell analysis. It is built on the 10x Genomics Cell Ranger R Kit library, and it adds a little more functionality over their own interactive tool, Loupe Cell Browser. I will also show you how you can mix these apps with some of the interactive functionality of the other apps to hopefully get a bit more insight into your data. As I said, this is the first app, and today I'm just going to go through Cell Ranger.
Even though each app provides its own analysis workflows, plots, and probably different types of analysis for the data, we are trying to make all the apps follow the same workflow so that it's easy to switch between them. Right now the workflow is: you specify the input directory for your single-cell data, the output directory where the results are exported, the analysis steps that the specific R library provides, and then the parameters for those steps.
This is going to be mainly an interactive demo; the slides are there for your reference later. The data I'll be using is the PBMC3k data: 2,700 single cells and around 33,000 genes. It's basically the typical data set, maybe a toy data set, for single-cell analysis. You can download it later if you want to start playing around with it, or you can use your own data set.
So maybe I can just start right now. Once you download VisRseq (let me show you the directory where I downloaded it; the desktop, I believe), you use one of these run commands based on the platform you have. I'm on a Mac, so I'm using the OS X one. Once you run it, you start with the main interface of VisR. The way VisR works is that you have your apps here, and you drag and drop your apps into the workspace, as I will show you in a second; you specify the parameters and run them. The apps underneath are R code: the R code runs, generates results, and brings the results back here, and then you can chain those results into other apps to create more complex workflows.
I'll start with the Cell Ranger app that we're going to go through today. It's under sequencing, single cell, and you drag and drop it. Here are the parameters. Initially I just specify the data set directory: the PBMC3k data that I downloaded and extracted. I specify the directory for it and open it. Now it's asking me to provide the output directory; this is where you want the plots, the report, and the results to go. For those of you who are familiar with the older version of VisRseq: previously you could only get them into VisRseq, and if you closed VisRseq you would lose everything you had unless you had saved it. Now we have decided to also save them all into a directory in case you want to go back and see what you got. So I'll just specify a new folder for it; this is where the output is going to be. Outputs, Cell Ranger; create it and open it.
You can ask the app to create a subdirectory for you every time you run it, so that it doesn't overwrite the existing results. If you check that, it creates a subdirectory named with the date and time. This is just to facilitate reproducibility: if you rerun with different parameters, you keep a history of everything you did in the past. So I'll keep that checked.
On to the analysis steps that Cell Ranger provides. For those of you who may not be familiar with it, the Cell Ranger pipeline already does some preliminary analysis on the data, including some clustering, and the R library they have is meant to poke into the analysis they have already done. To redo the clustering or perform further analysis, the Cell Ranger app doesn't provide that; that's why we have Monocle and Seurat, which we will hopefully go through in the next parts, next week. So here you would say "I want my cells table", and if you enable that, you can say what you want to be extracted into your cells table.
Before I run this, let me quickly switch to the slides. Once the tool has run, it exports several tables to that output directory, based on the parameters you have set. One of the tables is the cells table; it's a little hard to see on the slide.
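The timestamped output subdirectory described above can be sketched in a few lines. This is only an illustration of the idea, not VisR's actual code: the function name `make_run_directory` and the `YYYYMMDD_HHMMSS` naming scheme are assumptions, since the talk does not say what format the app uses.

```python
from datetime import datetime
from pathlib import Path


def make_run_directory(output_root, now=None):
    """Create and return a new, timestamped subdirectory under output_root."""
    # Hypothetical naming scheme; the format VisR really uses is not given in the talk.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_root) / stamp
    # exist_ok=False: refuse to reuse (and so overwrite) an earlier run's folder.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir
```

Each run then writes its report and tables into its own folder, which is what keeps a history of earlier parameter choices.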
The cells table holds the barcodes for the cells and some information about the total UMI count with log normalization, then the 2D projection of the cells using the t-SNE algorithm or PCA, and then the k-means results for different k values: 2, 3, 4, up to 10. As I said, since the Cell Ranger pipeline already provides these in the data, this is basically to give you further access to them. If you want to probe into individual genes, you can specify a gene list of your choice, and it also exports the counts for those genes with the normalization you have chosen.
So here is what I'm going to choose. I'm going to say: include all of those, and in addition I want the counts for the genes. Here it's just a comma-separated list of genes. I don't remember any, so I'm just going to cheat and copy and paste them from the slides. It can be any list: your housekeeping genes, or any genes that you want to look into further. You choose the normalization: whether you want the raw values, the sum divided by the median, or the log10 of that. Let's just run this without the differential expression analysis.
When I run this, it starts loading the Cell Ranger library. From now on everything is done on the R side; the apps are open source, so you can even open the code for this app and run it in RStudio if you are familiar with it. When it's done, it opens up the PDF report that it has generated. The report starts with a summary of your data: it has 2,700 cells, the percentage of valid barcodes, and so on. Then these are the plots that are provided by the app initially, but we'll go through them further to get a little more value out of them. First is the total number of UMIs per cell.
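The three normalization choices mentioned above (raw, sum over median, log10) can be illustrated with a small sketch. It is one plausible reading of the talk, not the library's code: it assumes "sum over median" means scaling each cell so that its total count equals the median total across cells, and it assumes a pseudocount of 1 before taking log10. The function name and the option names are hypothetical.

```python
import math
from statistics import median


def normalize_counts(counts_per_cell, method="log10"):
    """counts_per_cell: list of {gene: raw UMI count} dicts, one per cell."""
    if method not in ("raw", "median", "log10"):
        raise ValueError(f"unknown method: {method}")
    totals = [sum(cell.values()) for cell in counts_per_cell]
    med = median(totals)  # median total UMI count across all cells
    normalized = []
    for cell, total in zip(counts_per_cell, totals):
        if method == "raw":
            normalized.append(dict(cell))
            continue
        # Scale the cell so that its counts add up to the median total.
        scaled = {g: (c * med / total if total else 0.0) for g, c in cell.items()}
        if method == "log10":
            # Assumed pseudocount of 1 so that zero counts stay at zero.
            scaled = {g: math.log10(v + 1.0) for g, v in scaled.items()}
        normalized.append(scaled)
    return normalized
```

The point of the median scaling is that cells with very different sequencing depth become comparable before gene counts are plotted side by side.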
Each point is a cell. The dark points have more total UMI, and this is log2 normalized; the brighter points are the cells with the lower number of UMIs. And these are the k-means clustering results, when you cluster into two clusters, three clusters, four clusters, up to ten clusters. Again, this is just for preliminary analysis. And these are the gene markers for the genes that you specified initially. For example, here is the NKG7 gene, and where it appears more saturated you have more counts, and so on. This can help you go back and compare it with the clusters. So this is the base output that you get, even if you run this in RStudio.
Now, what VisR allows you to do is some interactive analysis on this. I can start by opening the data table that was just generated. If I open my analysis directory, there's this cells data table with the UMI count, the t-SNE coordinates, the principal components, the k-means results, and the genes that I specified as input. Now we can start exploring this table further. I'll start with the scatter plot: I drag the scatter plot app up and then the cells table, and as the x and y coordinates I use t-SNE 1 and t-SNE 2. This is the same plot that you would get as an output of the Cell Ranger R Kit, but now you have more interactivity over it. Let's start by coloring the cells by the total UMI count, as we had seen. You can change the color map and so on to match your other figures. Now let's color them by the different clusterings, so I select k-means 5.
You can toggle certain clusters on and off if you're interested in specific clusters, and you can set the tooltip to be your barcodes.
That's probably not that important, I would say; these barcodes are very cryptic, right? But sometimes your tables have other information about your cells, such as phenotype data, so you can say "I want the tooltips to be that phenotype column", and you can annotate them and so on.
You can also aggregate your cells. Sometimes, because of overplotting, you may not see how big each cluster is, so you can aggregate them. Here it's just showing the mean and the standard deviation around it; you can use different aggregate functions, such as median and quartiles, or you can scale based on the size of each cluster. So these are your different clusters, and you see that cluster 4 is a smaller cluster. This is just for the size comparison.
You can also do gating, a sort of filtering, if you want to get a specific set of cells. You just gate those cells, and here you get a list of them. You can export this as a separate table if you're interested in just those cells.
Now I'm going to play around a little with another app, the parallel coordinates. If you remember, we had those five or six genes for which we had the gene marker plots. Say we want to see how they are doing in the different clusters, in which cluster they are expressed more, and so on. I drag the data to the parallel coordinates app. Initially it's too crowded because there are so many columns. So first of all I'm going to color them again by the k-means results, k-means 5, and I'm going to disable all the columns except for those gene markers; I also don't want the barcodes. This is sort of an alternative to violin plots.
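The per-cluster aggregation described above for the scatter plot (mean, standard deviation, median, and cluster size) can be sketched roughly as follows. This is not the scatter plot app's implementation; the function name and the column names in the usage note (`kmeans5`, `tsne1`) are hypothetical stand-ins for whatever the exported cells table calls them.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def summarize_clusters(cells, cluster_key, value_keys):
    """Group cell records by cluster and summarize the chosen numeric columns."""
    groups = defaultdict(list)
    for cell in cells:
        groups[cell[cluster_key]].append(cell)
    summary = {}
    for cluster, members in sorted(groups.items()):
        stats = {"size": len(members)}  # cluster size, for scaled markers
        for key in value_keys:
            values = [m[key] for m in members]
            stats[key] = {
                "mean": mean(values),
                "sd": pstdev(values),  # population standard deviation
                "median": median(values),
            }
        summary[cluster] = stats
    return summary
```

Calling `summarize_clusters(cells, "kmeans5", ["tsne1", "tsne2"])` on a list of row dictionaries would give one summary point per cluster, which is what makes the relative cluster sizes visible despite overplotting.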
You can use the violin plots; this is an alternative if you want to be able to compare. Just looking at this, for example, we see that this gene, CD79A, is mostly expressed in this cluster, cluster number five, and this gene, NKG7, is mostly expressed in these two clusters: the green one, number three, and then the blue one, number two, and so on. I guess this makes more sense when you are looking at the specific genes of your interest.
By the way, in order to compare them fairly: when we were exporting the Cell Ranger output, we selected the normalization to be log10. If you were interested in just the raw values for those genes, you could export the raw values and you would get those here, but these are based on the log10 normalized values. We can also specify the axis ranges here if we want to compare them. I'm going to do all of them together, so I'll say: for all of them, go from zero to two, just so that it's the same across the dimensions.
You can change other visual properties to match the figures in your paper or elsewhere, for example if you want a custom color palette for your clusters. In fact, one of the custom color palettes is the one that's also used by the plots in the report we were just looking at. If I bring back the report, here, looking at k-means 5... why am I even looking here? I could do it in VisRseq. It's basically these clusters, but it would probably be better if I do it side by side. I have the parallel coordinates here, and I'm just going to delete this filter. If we want to compare these together, let's use the same color palette that this is using, so we can look at both of them together. I have lowered the resolution here, so it's a little hard to fit the two of them, but hopefully on your screen you have a bit more space.
Are there any questions so far? Maybe I should stop here for questions.
So that was our cells table. Each point, our unit, was a cell; for the next analysis, our units will be genes. What we had so far is a table with cells as the rows and properties of those cells as the columns, so for each cell you can say what the total UMI count is, what the count for a specific gene is, or which cluster it belongs to. If you want genes as your points, that's the analysis we're going to go through now. Initially, if we don't have a list of genes and want to get one, we use the differential expression analysis part of the pipeline.
In Cell Ranger, when you do the differential expression analysis, you are basically saying: compare the cells in one cluster against all other clusters. So you have to specify which of the clustering results you want to use; if you remember, we had nine of them here. As a result, if we use this clustering, k-means 2, we get all the genes that were differentially expressed in this cluster versus that cluster, and then everything that was differentially expressed in that cluster versus this one. So we get two sets of differentially expressed genes if we use k-means 2, and if we use k-means 5 we get five sets of differentially expressed genes, each corresponding to one of the clusters.
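The "one cluster against all other clusters" comparison can be made concrete with a deliberately simplified sketch. It only computes a log2 fold change of mean expression, cluster versus rest, and is not the statistical test the Cell Ranger R library runs; the function name and the pseudocount of 1 are assumptions made for illustration.

```python
import math


def one_vs_rest_log2fc(counts, clusters, pseudocount=1.0):
    """counts: {gene: [value per cell]}; clusters: [cluster label per cell].

    Returns {cluster: {gene: log2 fold change of that cluster vs. all others}}.
    """
    labels = sorted(set(clusters))
    if len(labels) < 2:
        raise ValueError("need at least two clusters to compare")
    result = {}
    for label in labels:
        inside = [i for i, c in enumerate(clusters) if c == label]
        outside = [i for i, c in enumerate(clusters) if c != label]
        per_gene = {}
        for gene, values in counts.items():
            mean_in = sum(values[i] for i in inside) / len(inside)
            mean_out = sum(values[i] for i in outside) / len(outside)
            # Assumed pseudocount keeps the ratio finite when a mean is zero.
            per_gene[gene] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
        result[label] = per_gene
    return result
```

With k clusters the result has k entries, one comparison per cluster, which mirrors the two sets for k-means 2 and the five sets for k-means 5 mentioned above.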
I would choose which one I want to use, something between two and ten. Let's use five. You can choose whether you want the output to have only the significant genes or all 32,000 genes: either a short list of genes or all of them. Right now I'll just get the short list of genes, and I'll show you what happens if you get all the genes; we can shortlist them later if we want.
Then we can specify which columns we want to be exported per gene: whether it's significant or not (basically a significance condition), the raw p-value, the adjusted p-value, the log2 fold change, the order based on how significant it was among the other genes, and a whole lot of other columns, such as the sum of the counts in this cluster compared to the sum of the counts in the other clusters. Just so that we get a smaller table right now, I'm going to deselect those and keep only whether they are significant or not, and maybe the log2 fold change as well. That gives me a significance condition for each cluster, so I can say whether this gene was significant in cluster one, whether it was significant in cluster two, and so on. Since I have chosen five as the clustering, k-means 5, I'm going to get five columns, each specifying the significance for one of the clusters, and also five columns for the log2 fold change within each of those clusters.
I can also do a heatmap plot and say: plot the three most significant genes for each of those five clusters in the heatmap. And then you just specify parameters for the other plots. Generally I think this is not that important, since later you're going to be using the interactive plots to fine-tune your plots, but the options are there in case you want to have them in your report.
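Picking the three most significant genes per cluster for the heatmap can be sketched as a simple ranking. The sketch assumes that "most significant" means smallest adjusted p-value, which the talk does not state explicitly, and the function name and input layout are hypothetical.

```python
def top_genes_per_cluster(adjusted_p, n=3):
    """adjusted_p: {cluster: {gene: adjusted p-value}} -> {cluster: [top n genes]}."""
    top = {}
    for cluster, p_values in adjusted_p.items():
        # Sort by p-value; break ties by gene name so the result is deterministic.
        ranked = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
        top[cluster] = [gene for gene, _ in ranked[:n]]
    return top
```

With five clusters and `n=3`, the heatmap would show at most fifteen rows, fewer if a gene ranks in the top three for more than one cluster and is drawn only once.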
I'll run this, and this time you will notice that it goes through computing the differential expression for each of the clusters. By the way, probably one benefit of the Cell Ranger app is that, since a lot of the results are precomputed (not the differential expression analysis, but the clustering results), it's the fastest among the three apps. On average, Seurat is going to take a matter of minutes, and Monocle may take up to hours depending on the type of analysis you choose. So for those apps maybe we will do some screen recording rather than a live demo; it won't be as easy to do a live demo for them.
This time, since we did a differential expression analysis, if I go back and look at the output (this is the new output, at 16:00; yes, that's the time), I'm getting another table for the differential expression analysis. I could have chosen to overwrite the existing analysis, but since I have chosen to create a new directory, I should make sure that I open the results of the new analysis. If you notice, now I have a c1 significance column, basically the significance for cluster 1, then the log2 fold change for cluster 1, then c2 significance and c2 log2 fold change, and c3 significance, all the way to c5. So I have one significance column per cluster.
If you want to take a look at them, you can use the table view: grab it here. These are the significant genes that we got. If I had chosen to include all the genes here in the Cell Ranger app, not just the significant ones (that is, if I had unchecked this), I would get the list of all 32,000 genes, and I could then use this column, "significant in any", which says whether that gene was significant in any of the clusters.
As we see here, it's all true, and it would be false for all the remaining genes. Right now it's 90 genes.
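The "significant in any" column and the shortlisting it enables can be sketched as below. The column names (`c1_significant` and so on) and the function names are hypothetical; the exported table may label its columns differently.

```python
def add_significant_in_any(rows, significance_columns):
    """Return copies of the gene rows with a derived 'significant_in_any' flag."""
    flagged = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row["significant_in_any"] = any(row[c] for c in significance_columns)
        flagged.append(new_row)
    return flagged


def shortlist(rows):
    """Keep only the genes that were significant in at least one cluster."""
    return [row for row in rows if row["significant_in_any"]]
```

Applied to the full table of roughly 32,000 genes, `shortlist` would reduce it to the genes that are significant somewhere, which is the 90-gene table shown in the demo.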
Now we want to see whether there is any intersection between the genes that are significant in some clusters versus the other ones. The conventional way to do this was a Venn diagram, and there's still a Venn diagram in VisR, but I'm going to use a different type of plot that's becoming more popular. I'm not sure if you have noticed it in some recent publications; it was initially also in Nature Methods. It's called UpSet, and it has an interactive online version as well as an R version, and we have an app made for it.
So I'm assigning the data to it. I only have to specify which columns it's going to use for the intersection. If you remember, we had these significance columns, which say whether a gene was significant in a given cluster or not, so I'm going to select those columns for cluster 1, cluster 2, cluster 3, cluster 4, and cluster 5. Let's just run it without specifying any other options. Well, actually, let's change some of the options so that it's easier to see: let's set the plot labels to be a bit larger. I'm going to increase all of them a little; we could also change them individually. OK, let's run it again.
What it shows is, first of all, our different clusters. You can see that cluster 1 had the largest number of significant genes, around 40-something; cluster 2 is the next one, cluster 5 is the third, cluster 3 is the next, and cluster 4 has the fewest. And here it shows all the possible intersections. I have 41 genes here in cluster 1 and 38 in cluster 2; there are five genes that are in both cluster 1 and cluster 5, four genes in cluster 3, one gene in cluster 5, and one gene that's in both cluster 4 and cluster 1. It shows all the possible intersections in a much better way than a Venn diagram would, and I recommend giving it a try over the Venn diagram.
Now, say you want to see which genes are in both cluster 5 and cluster 1. That's when you can use VisR's table view app. I bring the table view app down here and drag the data onto it. Maybe I'll run this again with a slightly smaller font size so that it fits. OK. If I'm interested in these five genes, which are in both cluster 1 and cluster 5, I can come here and say: I'm interested in the ones that are significant in cluster 1 and also significant in cluster 5, and I get those five genes. So, at least visually, the two go well together: you can use your UpSet output as a global view of your data and then pick out the specifics using the table view. Once you have this, you can save the result if you want; I'll name it cluster 1 and cluster 5 and export it to a data table, in case you later want to keep that list in your analysis directory as well.
Let's see if there's anything else that I wanted to cover today. Most of what I showed today is also on the slides for you to revisit, so I'll go through them quickly just as a refresher. Once we get the results, we can use the scatter plot to do interactive analysis, to do gating, or to get aggregated results. We can use the parallel coordinates to analyze specific genes of interest. And this is the differential expression analysis table; I disabled a lot of the columns because I didn't want a large table to look at. Then we got the UpSet results and looked at how to filter our gene lists based on the significance.
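The intersections that the UpSet plot displays can be computed directly from the significance columns. The sketch below groups genes by the exact combination of clusters in which they are significant, which appears to be what the bars in the demo show (41 genes only in cluster 1, five genes in exactly clusters 1 and 5, and so on); it is an illustration, not the code of the UpSet app, and the names used are hypothetical.

```python
def exclusive_intersections(rows, set_columns, id_key="gene"):
    """Group genes by the exact combination of sets they belong to.

    rows: gene records with one boolean per column in set_columns.
    Returns {tuple of column names: [gene ids]}; genes in no set are skipped.
    """
    groups = {}
    for row in rows:
        membership = tuple(c for c in set_columns if row[c])
        if membership:
            groups.setdefault(membership, []).append(row[id_key])
    return groups


def intersection_sizes(groups):
    """Bar heights for the plot, largest intersection first."""
    return sorted(((len(ids), combo) for combo, ids in groups.items()), reverse=True)
```

Looking up `groups[("c1", "c5")]` returns the gene names behind one bar, which corresponds to the table view filter "significant in cluster 1 and also significant in cluster 5" used in the demo, provided no gene is significant in a third cluster as well.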
Yeah, that's about it. Are there any questions? In the parallel coordinates? OK. Actually, yes, it depends on what you select. You can select it to be the mean, or you can select it to be the median; there's an option for that within the parallel coordinates here. You can say "I want it to be median and quartiles", mean, mean only, or min-max range, which shows the extent, just to make sure that I don't have anything washed out when I do the aggregation. So it depends on what you select. If you remember, your box plots or violin plots show this median plus or minus the quartiles, right? So this can be used in a similar way.
Let's see if we can use the violin plots to get a similar view. We want to group them by cluster, by k-means, and then we say we want a violin plot for these genes, and let's run it. I hadn't tried this. So this is the violin plot that corresponds to this parallel coordinates plot. I think it has its uses, but when you are comparing, I think the parallel coordinates give you a somewhat more concise and easier view, and you can change your color maps so that they match if you want to.
Are there any other questions? If you want to do differential expression between just one population versus another, would you just select that in the scatter plot? We don't have that, actually, but it's something that we can add, a sort of selection-based differential expression. Because the differential expression analysis part of Cell Ranger is done in real time (it's not preprocessed), we can do the differential expression analysis of the selected cells afterwards.
Yeah, it's not there yet, but it's probably something you can easily add. One thing, for those of you who may not have used VisR in the past: the idea behind VisR is also to have this sort of app store so that you can have your own apps. For each of these apps that you see here, the R code is open source. It's out there, so you can go and modify it; in fact, you can even modify the R code within VisR. This is the R code. But if you wanted to have your own set of apps for your own lab or your own group, you can easily create a repository, say on GitHub, if you can develop apps or have someone who can, and then you give VisR the URL for that repository. That's basically your own app store, and the apps are downloaded from that repository. Right now this is the master repository for the apps, but you can customize it to create your own app, or pick one of the apps and add new features to it. If you know R, that's completely possible. So even for people who know R, I hope it gives you some additional value. And right now it's not only the ones that you have downloaded.
For example, Bernie is helping out right now with the other apps. The repository for the apps is public, so he has created a fork of that repository. Bernie makes changes, and once he thinks he's done, he sends me back those changes. In Git there's this thing called a pull request, where you basically say "these are the changes, integrate them back", and I integrate them back into the repository. So you just have to click this; this is the refresh button.
This refresh does it locally; this refresh gets it from the repository. So you don't have to download again unless there was also a change to the framework, because once in a while we add new features to the framework, in which case you have to redownload it. But if you make changes here locally... I don't know if there is anything I can touch right now. Here, for example, the title: if I change this to "test title" and run this, there's the test title. Right now this is only a temporary change; until I actually save this, it won't even overwrite my local files. But if you now save this in the violin plot app, you have your saved, modified version. This is for quick changes; usually, if you want to make more significant changes, it's better to open the code in RStudio. This is the entire code for the Cell Ranger app in RStudio. That's where we usually make changes.
We're working on a few other apps, Seurat and Monocle among others. We're just polishing those up, but we'll do a couple more sessions like this to demo them. They're a little more involved. I don't know, do you have stuff prepared? Bernie had some ready, but I think it's better to keep it for the next session. If you want to play around with them, they are there. As I said, they follow the same principle: you drag the app into the workspace, say Seurat, and you specify the input. For example, the Cell Ranger app only allows the 10x pipeline output as the input, but for Seurat there are two options, and in Monocle, if you have your counts in a textual format, like a data table, Monocle allows you to specify that too. But here it's basically the same idea: you specify where the data is and the output directory. I won't go through the analysis; as you can see, there are more analysis steps. This one I should probably call just output Seurat. And then there are the different analysis steps that you can specify, but I won't go through them. If you want to play with them, feel free.
Where can they download it? Sure. And if you have any feedback... actually, I should have shown this. If you see bugs or have feature requests, it's fairly quick to report them: Help, then report a bug or suggest a feature. You only have to specify the title and description and whether it's a bug report or a feature request; the rest is optional. For example, if you want to attach a screenshot, you can attach a screenshot, and if you want us to email you back when the bug is fixed, you can put in your email, but that's optional. Even if you just write a title and description of your request or bug report, it's fairly quick. You don't have to do any registration or anything; you can quickly report them from within the tool.
As for the link, as you mentioned, it's in the slides. Maybe you can forward the link to the slides after the session. Previously it was called VisRseq, because it was just for sequencing data; recently I've renamed it to VisR. The new website (the old website is still active) is visrsoftware.github.io, and within that there's a download section. The latest version, which I uploaded today, is here, and these are the older versions. Older versions are sometimes good if, for example, something breaks in the new version and you want to quickly revert to the old version to be able to redo your analysis. But generally we try to put only the stable versions here, so it's better to just download the latest one. Thank you.
So this is going to be mainly an interactive demo; the slides are there for your reference later. The data that I'll be using is the PBMC3k data. It's 2,700 single cells and around 33,000 genes, and it's basically the typical data set, maybe a toy data set, for doing single-cell analysis. You can download it if you later want to start playing around with it, or you can use your own data set.
So maybe I can just start right now. Once you download VisRseq (let me show you the directory where I downloaded it; the desktop, I believe), based on the platform you have, you use one of these run commands. On Mac I'm using the OS X one. Once you run it, you start with the main interface of VisR. The way VisR works is that you have your apps here, and you drag and drop your apps into the workspace, as I will show you in a second, specify the parameters, and run them. The apps underneath are R code; the R code runs, generates results, and brings the results back here, and then you can chain those results into other apps to create more complex workflows.
I'll start with the Cell Ranger app that we're going to go through today, under sequencing, single cell, and you drag and drop it. Here are the parameters. Initially you specify the data set directory, so for the PBMC3k data that I downloaded and extracted, I just specify the directory and open it. Now it's asking me to provide the output directory; this is where you want the plots, the report, and the results to go. Some of you may be familiar with the older version of VisRseq.
Previously, you could only get the results inside VisRseq, and if you closed VisRseq you would lose everything you had got unless you had saved it. Now we have decided to also save them all into a directory, in case you want to go back and see where you got them. So I'll just specify a new folder for it; this is where the output is going to be. So: outputs, Cell Ranger, create it, and open it. You can ask it to create a subdirectory for you every time you run the app, so that it doesn't overwrite the existing results. If you check that, VisR creates a subdirectory named with the date and time. This is just to facilitate reproducibility: if you rerun the analysis with different parameters, you keep a history of all you did in the past. So I'll keep that checked.
Now the analysis steps that the Cell Ranger app provides. The Cell Ranger pipeline, for those of you who may not be familiar with it, already does some preliminary analysis on the data, including some clustering, and the R library they have is meant to poke into that analysis. Redoing the clustering or performing further analysis is not something the Cell Ranger app provides, but that's why we have Monocle and Seurat, which hopefully we'll go through in the next parts, next week. Here you would say, I want the cells table, and if you enable that you can say what you want to be extracted into your cells table.
Before I run this, let me quickly switch to the slides. Once the tool has run, it exports several tables to that output directory, based on the parameters you have set. One of them is the cells table; it's a little bit hard to see on the slide.
The cells table has the barcodes for the cells and some information about each cell: the total UMI count and its log normalization, then the 2D projection of the cells using the t-SNE algorithm or PCA, and then the k-means results for different values of k: 2, 3, 4, up to 10. As I said, the Cell Ranger pipeline already provides these in the data; this is to give you further access. If you want to probe into individual genes, you can specify a gene list of your choice, and it also exports the counts for those genes with the normalization you've chosen.
So here is what I'm going to choose. I'm going to say, include all of those, and in addition I want the counts for some genes. This is just a list of comma-separated genes. I don't remember any, so I'm just going to cheat and copy and paste them from their slides. It can be a list of your housekeeping genes or any genes that you want to look into further. You choose the normalization: whether you want the raw values, the sum divided by the median, or the log10 of that. Let's just run this without the differential expression analysis.
If I run this, it starts loading the Cell Ranger R library. From now on everything is done on the R side; you can even open the code for this app (the apps are open source) and run it in RStudio if you are familiar with it. It runs through, and when it's done it opens the PDF report that it has generated. Right now the report says, this is a summary of your data: it has 2,700 cells, the percentage of valid barcodes, and so on. Then these are the plots that are provided by the app to begin with, but we'll go through them further to get a little more value out of them. The first is the total number of UMIs per cell.
Here each point is a cell. The dark places have more total UMIs (this is log-normalized), and the brighter points are the cells with the lower number of UMIs. And these are the k-means clustering results, when you cluster into two clusters, three clusters, four clusters, up to ten clusters; again, this is just for preliminary analysis. And these are the gene markers for the genes that you specified initially. For example, here is the NKG7 gene, and where it's more saturated you have more counts, and so on. This can help you go back and compare with which cluster it is, if you are looking at these. So this is the base output that you get, even if you run this in RStudio.
What VisR allows you to do is some interactive analysis on top of this. I can start by opening the data table that was just generated. If I open my analysis directory, there's this cells data table with the UMI count, the t-SNE coordinates, the principal components, the k-means results, and the genes that I specified as input. So now we can start exploring this table further.
I'll start with the scatter plot. I drag the scatter plot app in, and then the cells table, and as x and y coordinates I use t-SNE 1 and t-SNE 2. This is the same plot that you would get as output of the Cell Ranger R kit, but now you have more interactivity over it. Let's start by coloring the points by the total UMI count, as we had seen. You can even change the color map and so on to match your other figures. But now let's color them by the different clusterings, so I select k-means 5.
You can toggle certain clusters on and off if you're interested in specific clusters, and you can specify the tooltip to be your barcodes.
That's probably not that important; these barcodes are very cryptic. But if you had other information about your cells (sometimes your tables have phenotype data), you can say, I want the tooltips to be that phenotype column, and you can annotate them, and so on.
You can also aggregate your cells. Sometimes, because of overplotting, you may not be able to tell how big each cluster is, so you can aggregate them. Here it's just showing the mean and the standard deviation around it. You can use different aggregate functions, such as median and quartiles, or you can say, scale based on the size of each cluster. So these are your different clusters, and you see that cluster 4 is a smaller cluster. This is just for size comparison.
You can also do gating, a sort of filtering, if you want to get a specific set of cells. You can just gate those cells, and here you get a list of them. You can export this as a separate table if you're interested in just those specific cells.
Now I'm going to play around a little bit with another app, the parallel coordinates. If you remember, we had those five or six genes for which we had the gene markers. Say we want to see how they are doing in different clusters, in which cluster they're expressed more, and so on. I drag the data onto the parallel coordinates again. Initially it's too crowded because there are so many columns. So first of all I'm going to color them again by the k-means results, k-means 5, and I'm going to disable all the columns except for those gene markers; I also don't want the barcodes. Now, just looking at this, this is an alternative to violin plots.
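The aggregation options described for the scatter plot (cluster size, mean and standard deviation, median and quartiles) can be sketched in a few lines. This is a minimal illustration in plain Python, not VisR's own R code; the field names `cluster`, `x`, and `y` and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def summarize(values):
    """Summary statistics for one coordinate of one cluster."""
    n = len(values)
    return {
        "mean": mean(values),
        # stdev and quantiles need at least two points
        "sd": stdev(values) if n > 1 else 0.0,
        "median": median(values),
        "quartiles": quantiles(values, n=4) if n > 1 else [values[0]] * 3,
    }


def aggregate_by_cluster(cells):
    """Group cells by cluster label and summarize their 2D coordinates.

    cells: list of dicts with hypothetical keys 'cluster', 'x', 'y'
    (for example a k-means label and two t-SNE coordinates).
    """
    groups = defaultdict(list)
    for cell in cells:
        groups[cell["cluster"]].append(cell)
    return {
        cluster: {
            "size": len(members),  # what "scale by cluster size" would use
            "x": summarize([m["x"] for m in members]),
            "y": summarize([m["y"] for m in members]),
        }
        for cluster, members in groups.items()
    }


# Made-up toy points, only to show the call.
toy_cells = [
    {"cluster": 1, "x": 0.0, "y": 0.0},
    {"cluster": 1, "x": 2.0, "y": 4.0},
    {"cluster": 2, "x": 10.0, "y": 10.0},
]
summary = aggregate_by_cluster(toy_cells)
```

Drawing one marker per cluster at the mean (or median) position, sized by `size`, gives the kind of overview that overplotting hides in the raw scatter plot.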
You can use the violin plots; this is an alternative if you want to be able to compare. Just looking at this, for example, we see that this gene, CD79A, is mostly expressed in this cluster, cluster number 5, and this gene, NKG7, is mostly expressed in these two clusters, the green one, number 3, and then the blue one, number 2, and so on. I guess this makes more sense when you are looking at specific genes of interest.
By the way, in order to compare them fairly: when we were exporting the Cell Ranger output, we selected the normalization to be log10. If you were interested in just the raw values for those genes, you can export the raw values and then you get the raw counts here, but these are the log10-normalized values. We can also specify the axis ranges here if we want to compare them. I'm just going to do all of them together, so I'll say, for all of them go from zero to two, just so that it's the same across the dimensions. And you can change other visual properties to match the figures in your paper or elsewhere, for example if you want a custom color palette for your clusters. In fact, one of the custom color palettes is the one that's used by the plots we were just looking at in the report. If I bring back the report, here, looking at k-means 5... why am I even looking here? I could actually do it in VisR. It's these same clusters, but it would probably be better if I do it side by side. I have the parallel coordinates here, and then here... where's the right-click? I'm just going to delete this filter.
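The three normalization choices mentioned for the exported gene counts (raw values, the sum divided by the median, or the log10 of that) could be read as follows. This is a sketch under stated assumptions, not the app's R code: it assumes that each cell's counts are rescaled so that its total equals the median total across cells, and that the log10 uses a pseudocount of 1. The mode names and the function name are hypothetical.

```python
import math
from statistics import median


def normalize_counts(counts_per_cell, mode="log10"):
    """Normalize per-cell gene counts.

    counts_per_cell: list of dicts mapping gene name -> raw UMI count,
    one dict per cell.
    mode: 'raw', 'median', or 'log10' (hypothetical names).
    """
    if mode == "raw":
        return [dict(counts) for counts in counts_per_cell]

    totals = [sum(counts.values()) for counts in counts_per_cell]
    median_total = median(totals)

    normalized = []
    for counts, total in zip(counts_per_cell, totals):
        # Assumption: scale each cell so its total matches the median total.
        scaled = {
            gene: (value * median_total / total if total else 0.0)
            for gene, value in counts.items()
        }
        if mode == "log10":
            # Assumption: pseudocount of 1 so that zero counts stay at zero.
            scaled = {gene: math.log10(v + 1.0) for gene, v in scaled.items()}
        normalized.append(scaled)
    return normalized


# Made-up counts for two hypothetical genes in three cells.
toy = [{"A": 1, "B": 1}, {"A": 6, "B": 2}, {"A": 2, "B": 2}]
median_scaled = normalize_counts(toy, mode="median")
log_scaled = normalize_counts(toy, mode="log10")
```

Whatever the exact formula, the point made in the demo holds: axes are only comparable across genes when every column was exported with the same normalization.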
To compare the two together, let's use the same color palette that this one is using, so we can look at both of them together. I have a lower resolution here, so it's a little bit hard to fit the two of them, but hopefully on your screen you have a little more space.
So far, are there any questions? Maybe I should stop here in case there are any questions. Yes. So that was our cells table, where our units are cells. For the next analysis our units would be genes; if you want to have genes as your points, that's the one we'll go through now. The one we had until now had cells as the rows of the table and properties of those cells as columns. So for each cell you can say what the total UMI count is, what the count for specific genes is, or which cluster it belongs to.
But to look at specific genes we have to get that list of genes first, and I'll show you how. Initially, if you don't have a list of genes and you want to get one, you would use the differential expression analysis part of the pipeline. In Cell Ranger, when you do the differential expression analysis, you are basically saying, compare the cells in one cluster against all other clusters. So you have to specify which of those clustering results you want to use; if you remember, we had nine of them here. As a result, if we use the clustering with two clusters, we get all the genes that were differentially expressed in this cluster versus that cluster, and then everything that was differentially expressed in that cluster versus this one. So we get two sets of differentially expressed genes if we use k-means 2; if we use k-means 5, we get five sets of differentially expressed genes, each corresponding to one of these clusters.
Here I would choose which one I want to use, something between 2 and 10. Let's use 5. You can choose whether you want the output to have only the significant genes or all 32,000 genes; that is, whether you want a short list of genes or all the genes. Right now I'll just get the short list; I'll show you what happens if you get all the genes. We can also shortlist them later if we want.
Then we can specify which columns we want exported per gene: whether it's significant or not (the significance condition), the raw p-value, the adjusted p-value, the log2 fold change, the ordering, based on how significant it was among the other genes, and a whole lot of other columns, such as the sum of the counts in this cluster compared to the sum of the counts in the other clusters. Just so that we get a smaller table right now, I'm going to deselect all and keep only whether they are significant or not, and maybe the log2 fold change as well. That gives me a significance condition for each cluster, so I can say whether this gene was significant in cluster 1, whether it was significant in cluster 2, and so on. Since I have chosen k-means 5 as the clustering, I'm going to get five columns, each specifying the significance for one of the clusters, and also five columns for the log2 fold change within each of those clusters.
I can also do a heatmap plot and say, plot the three most significant genes for each of those five clusters in the heatmap, and then specify parameters for the other plots. Generally I think this is not that important if you are later going to use the interactive plots to fine-tune your plot, but the option is there in case you want to have them in your report.
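The one-cluster-against-all-others comparison behind the per-cluster log2 fold change columns can be illustrated with a small sketch. It only shows the fold-change idea for a single gene, using mean counts and a pseudocount, both of which are assumptions; it is not the statistical test that Cell Ranger applies, and it does not compute significance. The function name is hypothetical.

```python
import math


def one_vs_rest_log2fc(counts, clusters, pseudocount=1.0):
    """Log2 fold change of one gene for each cluster versus all other cells.

    counts: per-cell counts for a single gene.
    clusters: cluster label for each cell, in the same order.
    Returns {cluster_label: log2((mean inside + p) / (mean outside + p))}.
    """
    result = {}
    for label in sorted(set(clusters)):
        inside = [c for c, k in zip(counts, clusters) if k == label]
        outside = [c for c, k in zip(counts, clusters) if k != label]
        if not outside:
            continue  # a single cluster has nothing to be compared against
        mean_inside = sum(inside) / len(inside)
        mean_outside = sum(outside) / len(outside)
        result[label] = math.log2(
            (mean_inside + pseudocount) / (mean_outside + pseudocount)
        )
    return result


# Made-up counts: the gene is high in cluster 1 and low in cluster 2.
fold_changes = one_vs_rest_log2fc([7, 7, 1, 1], [1, 1, 2, 2])
```

With five clusters this yields five values per gene, which matches the five log2 fold change columns in the exported table.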
I would run this. This time you will notice that it goes through computing the differential expression for each of the clusters. By the way, one benefit of the Cell Ranger app is that, since a lot of the results are precomputed (not the differential expression analysis, but the clustering results), it's the fastest among the three apps that you have seen. Seurat is going to take a matter of minutes, and Monocle may take up to hours depending on the type of analysis you choose. So for those apps maybe we will do some screen recording rather than a live demo; it won't be as easy to do a live demo for them.
This time, since we did a differential expression analysis, if I go back and look at the output, this is the new output, at 16:00; yes, that's the time. Now I'm getting another table, for the differential expression analysis. I could have chosen to overwrite the existing analysis, but since I chose to create a new directory, I should make sure that I open the results of the new analysis. If I open that table, you'll notice I now have c1 significance, which is the significance for cluster 1, then the log2 fold change for cluster 1, then c2 significance and c2 log2 fold change, c3 significance, all the way to c5. So I have one significance column per cluster.
If you want to take a look at them, you can use the table view: grab it here. These are the significant genes that we got. If I had chosen to include all the genes here in the Cell Ranger app, not just the significant ones, that is, if I had unchecked this, I would get the list of all 32,000 genes. Then I could use this column, significant in any, which says whether the gene was significant in any of the clusters.
Here it's all true, and it would be false for all the remaining genes. Right now it's 90 genes.
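The "significant in any" column is just a logical OR over the per-cluster significance columns, and shortlisting keeps the rows where it is true. A minimal sketch, assuming the table is a list of dicts; the column names `c1_significant` to `c5_significant` and the gene names are hypothetical stand-ins for whatever the exported table actually uses.

```python
def add_significant_in_any(rows, n_clusters=5):
    """Add a 'significant_in_any' flag to each gene row.

    rows: list of dicts with boolean columns named
    'c1_significant' ... 'cN_significant' (hypothetical names).
    """
    columns = [f"c{i}_significant" for i in range(1, n_clusters + 1)]
    for row in rows:
        row["significant_in_any"] = any(row.get(col, False) for col in columns)
    return rows


def shortlist(rows):
    """Keep only the genes that are significant in at least one cluster."""
    return [row for row in rows if row["significant_in_any"]]


# Made-up rows, only to show the call.
table = add_significant_in_any([
    {"gene": "GENE_A", "c1_significant": True, "c5_significant": True},
    {"gene": "GENE_B", "c2_significant": True},
    {"gene": "GENE_C"},
])
short = shortlist(table)
```

Exporting all genes and filtering on this flag afterwards gives the same short list as exporting only the significant genes in the first place.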
I'm not sure if you have noticed this in some recent publications. Now we want to see whether there is any intersection between the genes that are expressed in one cluster versus the others. The conventional way to do that was a Venn diagram, and there's still a Venn diagram in VisR, but I'm going to use a different type of plot that's becoming more popular; it was also in Nature Methods. It's becoming a more popular way of visualizing sets. It's called UpSet, and it has an interactive online version as well as an R version, so we have an app made for it.
I'm assigning the data to it. I only have to specify which columns it is going to use for the set intersection. If you remember, we had these significance columns, which say whether a gene was significant in a given cluster or not. So I'm just going to select those columns: cluster 1, cluster 2, cluster 3, cluster 4, cluster 5. Let's run it without specifying any other options. Well, actually, let's change some of the options so that it's easier to see. Let's set the plot labels to be a bit larger; I'm going to increase all of them a little bit, though we could change them individually. OK, let's run it again.
What it shows is, first of all, these are our different clusters. You can see that cluster 1 had the largest number of significant genes, around 40-something, cluster 2 is the next one, cluster 5 is the third, cluster 3 is the next, and cluster 4 has the fewest. And here it shows all the possible intersections. I have 41 here, then for cluster 2 there are 38; there are five genes that are in both cluster 1 and cluster 5, four genes in cluster 3, and one gene in cluster 5.
There's also one gene that's in both cluster 4 and cluster 1. It shows all the possible intersections in a much better way than a Venn diagram would, and I recommend giving it a try over the Venn diagram.
Now, if you want to get those genes, say you want to see which genes are in both cluster 5 and cluster 1, that's when you can use VisR's table view app. I bring the table view down here and drag the data onto it. (Maybe I'll run this again with a slightly smaller font size so that it fits.) OK, so if I'm interested in these five genes, which are in both cluster 1 and cluster 5, I can come here and say, I'm interested in the ones that are significant in cluster 1 and also significant in cluster 5, and I get those five genes. So, at least visually, the two go well together: you can use the UpSet output as a global view of your data and then pick the specifics using the table view. Once you have this, you can save the result if you want (I'll name it cluster 1 and cluster 5) and export it to a data table, if you later want to keep it in your analysis directory as well.
Let's see if there's anything else that I wanted to cover today. Most of what I showed today is also on the slides for you to revisit, so maybe I'll go through them quickly, just to refresh. Once we get the results, we can use the scatter plot for interactive analysis, to do gating or to get aggregated results. We can use the parallel coordinates to analyze specific genes of interest. And this is the differential expression analysis table; we disabled a lot of the columns.
I just didn't want to have a large table to look at. Then we got the UpSet results and looked at how to filter our gene lists based on significance.
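What the UpSet plot counts, and what the table view filter then extracts, can be sketched as two small functions. This is an illustration in plain Python, not the UpSet or VisR implementation; it assumes each gene is described by the set of clusters in which it was significant, and the function and gene names are hypothetical.

```python
from collections import Counter


def exclusive_intersections(membership):
    """Count genes per exact combination of clusters (the UpSet bars).

    membership: dict mapping gene -> set of cluster labels in which
    the gene was significant. Genes significant nowhere are skipped.
    """
    return Counter(
        frozenset(clusters) for clusters in membership.values() if clusters
    )


def genes_significant_in_all(membership, wanted):
    """Genes significant in every cluster in `wanted` (the table view filter)."""
    wanted = set(wanted)
    return sorted(
        gene for gene, clusters in membership.items() if wanted <= clusters
    )


# Made-up membership data, only to show the calls.
toy_membership = {
    "GENE_A": {1},
    "GENE_B": {1},
    "GENE_C": {1, 5},
    "GENE_D": {2},
    "GENE_E": set(),
}
bars = exclusive_intersections(toy_membership)
both = genes_significant_in_all(toy_membership, [1, 5])
```

Because each gene falls into exactly one combination, the bars add up to the number of significant genes: in the demo, 41 + 38 + 5 + 4 + 1 + 1 gives the 90 genes in the table.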
Yeah, that's about it. Are there any questions? About the parallel coordinates? Actually, yes, it depends on what you select. You can select it to be the mean, or you can select it to be the median; there's an option for that within the parallel coordinates here. You can say, I want it to be median and quartiles, mean, mean only, or min-max range, the last one just to show the extent, to make sure that I don't have anything washed out when I do the aggregate. So it depends on what you select. And if you remember, your box plots or your violin plots show this median plus or minus the quartiles, right? So this can be used the same way.
Let's see if we can use the violin plots to get a similar kind of view. We want to group them by cluster, by k-means, and then we say we want a violin plot for these genes. Let's run it; I didn't try this before. So this is the violin plot that corresponds to this parallel coordinates plot. I think it has its uses, but when you are comparing, I think the parallel coordinates give you a slightly more concise and easy way, and you can change your color maps so that they match if you want.
Are there any other questions? If you want to do differential expression between just one population versus another, would you just select that in the scatter plot? We don't have that, actually, but it's something that we can add, a sort of selection-based differential expression. Because the differential expression analysis part of Cell Ranger is done in real time (it's not preprocessed), we can do the differential expression analysis on the selected cells afterwards.
It's not there yet, but it's probably something we can easily add.
One thing, for those of you who may not have used VisR in the past: the idea behind VisR is also to have a sort of app store, so that you can have your own apps. For each of these apps that you see here, the R code is open source. It's out there, so you can go and modify it; in fact, you can even modify the R code within VisR. If you want to have your own set of apps for your own lab or your own group, you can easily create a repository, say on GitHub, if you can develop apps or you have someone who can. Then you can specify the GitHub URL for that repository; that's basically your own app store, and the apps get downloaded from that repository. Right now this is the master repository for the apps, but you can customize it to create your own app repository, or pick one of the apps and add new features to it if you know R. That's all completely possible. So even for people who know R, I hope it gives you some additional value.
So right now it's not only the one that you have downloaded. For example, Bernie is helping out right now with the other apps. The repository for the apps is public, so Bernie has created a fork of that repository. Bernie makes changes, and once he thinks he's done, he sends me back those changes. On GitHub there's this thing called pull requests, where you basically say, these are the changes, integrate them back, and I integrate them back into the master repository. Then we just have to click this, which is the refresh button.
This refresh does it locally; this other refresh gets it from the repository. So you don't have to download again, unless there was also a change to the framework, because once in a while we add features to the framework too, in which case you have to redownload it.
But you can also make changes here locally. I don't know if there is anything that I can touch right now... here, for example, the title. If I change this to a title of "test title", OK, and run this, there is the test title. Right now this is only a temporary change; until I actually save it, it won't even overwrite my local files. But if you now save this in the violin plot, you have your saved, modified version. This is for quick changes. Usually, if you want to make more significant changes, it's better to open it in RStudio. This is the entire code for the Cell Ranger app in RStudio, and that's where we usually make changes.
So, do you want to do it? We're working on a few other apps, Seurat and Monocle among others; right now we're just polishing those up, and we'll do a couple more sessions like this to demo them. They're a little bit more involved. Do you have stuff prepared? I think Bernie had it all ready, but I think it's better to keep it for the next session. If you want to play around with them, they are there. As I said, they follow the same principle, where you drag the app into the workspace, say Seurat, and you specify the input. The Cell Ranger app only allows the 10x pipeline output as its input, but for Seurat there are two options. You may also have your counts in a textual format, like a data table.
Monocle also allows you to specify that. Here it's basically the same idea: you specify where the data is and the output directory. I won't go through the analysis, but as you can see there are more analysis steps. And this one I should probably just call output.
I don't know... Seurat. And then there are the different analysis steps that you can specify. I won't go through them, but if you want to play with them, feel free. Where can they download it? Sure.
And if you have any feedback... actually, yes, I probably should have shown this. If you see bugs or have feature requests, it's fairly quick to report them: under Help, report a bug or suggest a feature. You only have to specify the title, the description, and whether it's a bug report or a feature request; the rest is optional. For example, if you want to attach a screenshot, you can, and if you want us to be able to email you back when the bug is fixed, you can put in your email, but that's optional. Even if you just write a title and a description of your request or bug report, it's fairly quick. You don't have to do any registration or anything; you can quickly report it from within the tool.
As for the link, as you mentioned, it's in the slides; maybe you can forward the link to the slides after the session. Previously it was called VisRseq, because it was just for sequencing data; recently I've renamed it to VisR. The old website is still active, but the new website is visrsoftware.github.io, and within that there's a download section. The latest version, which I uploaded today, is here, and these are the older versions. The older versions are sometimes useful if, for example, something breaks in the new version and you want to quickly revert to the old version to be able to redo your analysis. But generally we try to only put the stable versions here, so it's better to just download the latest one. Thank you.