Hypothesis Testing: Chi-Square Test for Data Analysis Using Excel


Thanks for all the love you have given my last two statistics videos, where I talked about the Z-test and the t-test using some examples. I got a great response to those videos, and that is one of the reasons I am here with this video on the chi-square test, however you choose to pronounce it. We usually study the chi-square test in statistics under hypothesis testing, but I'm pretty sure many people are not clear on the basics: where to use this kind of test, in which situations it applies, and what its practical applications are. So watch this video, where I explain what exactly the chi-square test is and how to implement it using Excel. Once you have implemented it in Excel and are clear on the basics, I'm sure you can take your learning further and try it out using Python. I am sure everyone will be able to understand what I am teaching. If you like my video, please like, share, and subscribe to the channel, and let me know in the comment section below what the topic of my next video should be. See you in the video.

Hi, welcome to this module on statistics. In this video we shall be talking about the chi-square test. The chi-square test only works for categorical data, which means data that falls into categories, for example gender (male or female) or color (red, yellow, green, etc.). In this particular example I have two categorical features: gender, which has two valid values, and location, which has three valid values: Japan, India, and Korea. The chi-square formula is the summation of (observed minus expected) to the power of 2, divided by expected. What the observed and expected values are, I'll explain shortly.

Before getting into this problem, we'll create our null hypothesis and alternate hypothesis, because ultimately we are going to use chi-square to test our hypothesis. I hope everybody knows about hypotheses, as we have already covered them: simply put, a hypothesis is a statement that might be true and that can be tested. In this example my null hypothesis is that there is no association between gender and location. The alternate hypothesis is that there is an association between gender and location. All good. If the p-value is less than 0.05, I will reject my null hypothesis; if the p-value is 0.05 or greater, I will fail to reject my null hypothesis.

Now, what is O and what is E? O is your observed values and E is your expected values. To find the observed values I'll take the help of a pivot table. I set the table range to my data and the output location to somewhere on the sheet. Done. I drag gender to the rows, location to the columns, and gender again to the values, so that it is counted, and that's my pivot table. I'll copy and paste this pivot table to create my observed values, delete the extra labels, and note it down as observed values. So these are my observed values: female customers staying in India are 6, in Japan 6, and in Korea 0; male customers staying in India are 2, in Japan 2, and in Korea 4. This is my row total (12 for female, 8 for male), this is my column total (8 for India, 8 for Japan, 4 for Korea), and this value, 20, is my grand total. Simple.

That's my observed table. Similarly, I will create another table for my expected values, and I will empty all the records in it because I don't need them right now. How do we calculate the expected values? It's very simple: an expected value is the row total multiplied by the column total, divided by the grand total. That means the expected value for Indian female customers is the row total (12) multiplied by the column total (8), divided by the grand total (20), which is 4.8. Similarly, for Japan it is the row total multiplied by the column total, divided by the grand total, which is again 4.8. For Korea the row total is 12, multiplied by the column total and divided by the grand total, which gives 2.4. For male customers in India it is 3.2, for male Japan it is the same as India, 3.2, and for male Korea it is 1.6. Everybody is clear with this table.

I will copy this table and paste it again, and delete all the values because I'm not going to use them. This third table holds the operation (O minus E) to the power of 2, divided by E. That's the same mathematical expression as the formula. So for the first cell it is observed minus expected, in a bracket, to the power of 2, divided by expected, which is 0.3. I will drag it across these three values and down for the other three. Cool. I'll repeat once again: this is my observed 6 minus my expected 4.8, which is 1.2, to the power of 2, which is 1.44, divided by 4.8, which gives 0.3. If you want to validate another one, say male India, it is 2 minus 3.2, to the power of 2, which is 1.44, divided by 3.2, which gives 0.45.

Now what is the summation of this operation? I will write it down here: the sum of all six values is 7.5. This becomes my X² value, which is my chi-square value.

What will be my DF, my degrees of freedom? The formula is very simple: degrees of freedom is the number of valid values in the rows minus 1, multiplied by the number of valid values in the columns minus 1. How many rows do we have? Female and male, so 2 minus 1 is 1. How many columns do we have? India, Japan, and Korea, so 3 minus 1 is 2. So the DF value will be 1 multiplied by 2, which is 2.
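Since the intro mentions trying this out in Python after Excel, here is a minimal sketch of the same steps: observed table, totals, expected values, the (O minus E) squared over E cells, their sum, and the degrees of freedom. The variable names are my own choices for illustration and are not part of the Excel sheet; the counts are the observed values read off the pivot table above.

```python
# Observed counts from the pivot table (rows: gender, columns: location).
# All names below are illustrative, not taken from the spreadsheet.
locations = ["India", "Japan", "Korea"]
observed = {
    "Female": [6, 6, 0],
    "Male": [2, 2, 4],
}

# Row totals, column totals, and the grand total.
row_totals = {gender: sum(counts) for gender, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected value = row total * column total / grand total.
expected = {
    gender: [row_totals[gender] * ct / grand_total for ct in col_totals]
    for gender in observed
}

# Per-cell contribution: (O - E)^2 / E.
contributions = {
    gender: [(o - e) ** 2 / e for o, e in zip(observed[gender], expected[gender])]
    for gender in observed
}

# Chi-square statistic is the sum of all cell contributions.
chi_square = sum(sum(cells) for cells in contributions.values())

# Degrees of freedom = (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(locations) - 1)

print(f"chi-square = {chi_square:.2f}, degrees of freedom = {dof}")
```

With these counts the sketch should agree with the spreadsheet: a chi-square value of about 7.5 with 2 degrees of freedom.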

Now this is where I will calculate my p-value. I will use the CHISQ.DIST.RT function, which returns the right-tailed probability of the chi-square distribution. I pass my X² value and my degrees of freedom, press Enter, and my p-value is 0.02. I will create a table out of it. As the p-value of 0.02 is less than 0.05, we are going to reject our null hypothesis, which means there is an association between gender and location.

That is the end of this session on chi-square. That's it; see you in the next videos for further topics.
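For anyone repeating the p-value step in Python, here is a small sketch of the right-tailed probability using only the standard library. It relies on the closed form of the chi-square right tail that holds when the degrees of freedom are even, which is enough for this example with DF = 2; the helper name chi_square_right_tail is hypothetical and is not an Excel or library function. For general degrees of freedom, a statistics library such as SciPy (scipy.stats.chi2.sf) would be the usual choice.

```python
import math


def chi_square_right_tail(x, dof):
    """Right-tail probability P(X >= x) of a chi-square variable.

    Illustrative helper: uses the closed form valid only for even,
    positive degrees of freedom:
        exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch only supports even, positive dof")
    half = x / 2.0
    term = 1.0   # the i = 0 term of the sum
    total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


chi_square = 7.5   # statistic from the worked example
dof = 2            # (2 - 1) * (3 - 1)
alpha = 0.05       # significance level used in the video

p_value = chi_square_right_tail(chi_square, dof)
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

For a chi-square value of 7.5 and 2 degrees of freedom this gives a p-value of roughly 0.02, below the 0.05 threshold, so the null hypothesis of no association is rejected.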