Hypothesis Testing – A Statistical Technique to Reject or Accept An Existing Claim

Business analysts assist decision makers in making the right decisions by providing timely insights, information and facts. Decision makers face many tough questions in day-to-day business management. The following are a few of them.

  1. Should XYZ Company reduce its manpower to 50% due to the corona pandemic?
  2. Which incentive scheme best motivates employees in the manufacturing industry?
  3. Would a 30% increase in the spend budget result in a 50% increase in sales?

Such questions need a statistical approach: establish a claim, then evaluate it to decide whether to accept or reject it. A hypothesis is a statistical statement of a claim, and “Hypothesis Testing” is the decision-making procedure used to accept or reject that claim.

What is Hypothesis?

It is an assumption about a population or a population parameter such as the population mean (µ), variance or proportion (p). It is something to be proved or disproved. It is an educated guess or a claim. It is also a tentative explanation of a principle operating in nature.

There are 2 types of hypothesis, i.e. the Null Hypothesis, denoted by H0, and the Alternative Hypothesis, denoted by H1.

Null Hypothesis – H0 : The Null Hypothesis states that no new (NULL) condition exists: nothing new is happening, the old theory is still true, the old standards or quality are still correct, and the system is under control. It assumes that there is no difference or no effect.

Alternative Hypothesis – H1 : The Alternative Hypothesis states that a new theory exists and is true, there are new standards, the existing system is not under control, or something different is happening.

Application of Hypothesis Testing and Type of Test

No. | Application | Example | Test Used
1 | To check whether Sample Mean = Population Mean | Avg salary of Company XYZ Sales Dept employees is Rs. 20,000 | One Sample T Test or Z Test
2 | To compare the Mean of one population vs. the Mean of another population | Apple vs. Samsung mobile phone sales in a region | Independent T Test or Z Test for both companies separately & compare
3 | To compare the effect before and after a particular event, on the same sample | Effect of a BP drug on patients, before and after consumption of the drug | Paired T Test
4 | To compare more than two independent variables or >2 independent populations | Sales of Nokia, Samsung and Motorola mobiles | ANOVA or F Test
5 | To find the association between 2 attributes (attributes are usually categorical/qualitative) | A person’s educational qualification vs. his earning capacity/salary/profession | Chi-Square Test
6 | To find out Goodness of Fit (Observed = Expected or not) | Estimated vs. actual sales | Chi-Square Test
Statistical Sampling Tests
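
As a rough illustration of application 6 (Goodness of Fit), the sketch below computes the Chi-Square statistic by hand and compares it with a table value. The sales figures are hypothetical, and the critical value 7.815 is the standard Chi-Square table entry for 3 degrees of freedom at a 5% level of significance; the function name is my own and not part of any library.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical quarterly sales (units): estimated vs. actual
expected = [100, 120, 130, 150]
observed = [90, 130, 125, 155]

chi2 = chi_square_statistic(observed, expected)

# 4 categories -> 3 degrees of freedom; table value at a 5% level of significance
CRITICAL_VALUE_DF3_ALPHA_005 = 7.815
reject_h0 = chi2 > CRITICAL_VALUE_DF3_ALPHA_005

print(f"chi-square = {chi2:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0: actual sales fit the estimate")
```

Here H0 is "Observed = Expected". A statistic below the table value means the differences are small enough to be attributed to chance.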

Steps in Testing of Hypothesis

  1. State or define the Null hypothesis H0 and the Alternative hypothesis H1 correctly. This first step is crucial to the entire process. Misrepresenting or misinterpreting the facts, or wrongly defining the Null hypothesis (H0) or the Alternative hypothesis (H1), wastes time, money and effort on establishing and proving unnecessary facts and assumptions.
  2. Specify the Level of Significance. In statistics, a ‘margin of error’ always exists in any model we build. No model or estimate is 100% accurate or foolproof. At this stage the researcher defines the ‘allowable non-confidence limits’, i.e. the percentage of error that is accepted as inevitable. Because error is an inevitable part of accepting or rejecting a hypothesis, the significance level defines the probability of committing a Type 1 Error, i.e. rejecting the Null hypothesis H0 when it is true. If a researcher works at a 95% confidence level, he is accepting a 5% error tolerance. This 5% or 0.05 is statistically referred to as α (alpha).
  3. Use an Appropriate Statistical Test. This third step depends on the requirements and on the definitions of the Null hypothesis H0 and the Alternative hypothesis H1 constructed in the first step. Types of test can be classified as follows:
Parametric Hypothesis Tests

4. Decision Rule

Every statistical test yields a P-value. Based on this value we decide whether to reject the Null hypothesis (H0) in favour of the Alternative hypothesis (H1), or to retain (fail to reject) the Null hypothesis.

Confidence levels may be set at 95%, 98% or 99%. Accordingly, alpha (α), i.e. the accepted error rate, changes to 0.05, 0.02 or 0.01. A researcher should be 1 – α confident in order to support his Alternative hypothesis (H1). In a nutshell, for α = 0.05:

Decision Rule – 1 : If P-Value <= 0.05, reject the Null hypothesis (H0) in favour of the Alternative hypothesis (H1)

Decision Rule – 2 : If P-Value > 0.05, do not reject (i.e. accept) the Null hypothesis (H0)

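If alpha is 0.05 and the P-Value is 0.02, then the P-Value is below alpha, so the researcher can reject the old or existing claim, i.e. the Null hypothesis, at the 5% level of significance. The P-Value is the probability of observing a result at least this extreme if the Null hypothesis were true; it is not a 98% confidence in the Alternative hypothesis H1.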

5. Draw Conclusion

Based on the P-Value, or on a comparison of the computed T-value or F-value with the corresponding table value, the researcher draws the conclusion of rejecting or accepting the Null hypothesis.
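
The five steps can be sketched in a few lines of Python. This is only a minimal illustration using a one-sample Z test built on the standard library: the salary figures, the assumed known population standard deviation of Rs. 1,500 and the function name one_sample_z_test are all hypothetical. H0 is "average salary = Rs. 20,000" and H1 is "average salary is different from Rs. 20,000".

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample Z test of H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a result at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 when the P-value does not exceed alpha
    return z, p_value, p_value <= alpha


# Hypothetical monthly salaries (Rs.) of 10 Sales Dept employees
salaries = [21500, 19800, 22400, 20900, 23100,
            21700, 20300, 22800, 21200, 22600]

z, p_value, reject_h0 = one_sample_z_test(salaries, mu0=20000, sigma=1500)
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

With a small sample and an unknown population standard deviation, a One Sample T Test would normally be used instead; the Z test is chosen here only because it needs nothing beyond the standard library.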

How to deal with sparse data in Machine Learning?

Sparse data is input data that is incomplete or has many missing values, on which we nevertheless train machine learning models to predict.

Data density, on the other hand, is exactly the opposite situation, where you do not have missing data.

The following data shows data sparsity and data density :

Name | Age | Income ($)
Benny | 27 | NA (0)
Anna | 21 | NA (0)
Alice | NA (0) | 8,500
Benson | 29 | NA (0)
Hudson | NA (0) | 7,500
Sheila | NA (0) | 9,800

 

In the above table we can notice the NAs (Not Available, represented as 0). Treated as a 6 x 3 sparse matrix of 18 elements (6 rows and 3 columns, excluding the column labels), the table has 6 elements with 0 as their value. So we can say that the input data has about 33% data sparsity and 67% data density.
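
The sparsity figure can be checked by counting the zeros directly. The snippet below is a small sketch in plain Python; it copies the rows of the table, stores every NA as 0, and counts the Name column towards the total, as the text does.

```python
# Rows copied from the table; NA is stored as 0
table = [
    ("Benny", 27, 0),
    ("Anna", 21, 0),
    ("Alice", 0, 8500),
    ("Benson", 29, 0),
    ("Hudson", 0, 7500),
    ("Sheila", 0, 9800),
]

total = sum(len(row) for row in table)                           # rows x columns
zeros = sum(1 for row in table for value in row if value == 0)   # NA cells

sparsity = zeros / total
density = 1 - sparsity

print(f"{zeros} of {total} elements are 0")
print(f"sparsity = {sparsity:.0%}, density = {density:.0%}")
```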

Data sparsity is a real-world scenario that one has to deal with. It is common in customer surveys, where respondents are extremely sensitive about sharing personal information.

How do we scale the sparse data?

Scaling is another significant preprocessing task to carry out on your sparse data. In scikit-learn, sklearn.preprocessing provides one class, MaxAbsScaler, and one function, maxabs_scale, for this purpose.

For our example, let us use one simple array with sparse data.

Code:

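from sklearn.preprocessing import MaxAbsScaler
import numpy as np

# 3 x 3 array in which the zeros stand for missing values
sparse_matrix = np.array([[25, 28,  0],
                          [47,  0, 30],
                          [ 0, 50, 80]])

# Learn the maximum absolute value of each column, then divide by it
scaler = MaxAbsScaler().fit(sparse_matrix)
scaler.transform(sparse_matrix)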

Output:

array([[ 0.53191489,  0.56      ,  0.        ],
       [ 1.        ,  0.        ,  0.375     ],
       [ 0.        ,  1.        ,  1.        ]])

Note : The above code was executed in the Spyder IDE with Python 3.

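MaxAbsScaler divides each element by the maximum absolute value of its column, so every feature ends up in the range [-1, 1], or [0, 1] when, as here, the data has no negative values. It does not shift or centre the data, so all 0 values stay 0 and the sparsity is preserved. One drawback is that this class does not handle outliers: a single extreme value determines the scale of its whole column.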
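
To see what the scaler computes, the same calculation can be written out in plain Python. This is a sketch of the arithmetic only, not scikit-learn's implementation; the function name max_abs_scale is my own, and leaving an all-zero column unchanged is an assumption made here to avoid dividing by zero.

```python
def max_abs_scale(rows):
    """Divide every value by the largest absolute value in its column."""
    columns = list(zip(*rows))
    # Assumption: an all-zero column gets a scale of 1.0, so it stays unchanged
    scales = [max(abs(v) for v in col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


matrix = [[25, 28, 0],
          [47, 0, 30],
          [0, 50, 80]]

scaled = max_abs_scale(matrix)
for row in scaled:
    print([round(v, 4) for v in row])

# Negative values keep their sign, so results can fall anywhere in [-1, 1]
print(max_abs_scale([[-4, 2], [2, -1]]))
```

Each column is divided by 47, 50 and 80 respectively, which reproduces the scikit-learn output for this array and shows why the zeros are untouched.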

 

Google AdWords Replaced Sidebar Ads with Product Listing Ads


I happened to notice that Google AdWords has replaced sidebar ads with Product Listing Ads. Now you can see just 4 ads at the top and 3 to 4 ads at the bottom of the Search Engine Result Page (SERP).

This is the SERP I got for the keyword “T Shirt Printing in Bangalore”.

Notice just 4 ads on the top

Adwords 4 Ads at Top

3 ads at the bottom of the Search Engine Result Page

Adwords Bottom Ads

What about side bar ads?

Side bar ads have been replaced by Product Listing Ads.

Side Bar Ads Replaced with Product Listing Ads

Why the change?
1. Google might have noticed that people are not clicking on the text or search ads appearing in the side bar. Consequently, side bar ads contribute a poor CTR and hence a poor conversion rate.
2. Google wants us to associate small images with text or search ads, as we do in Facebook ads. This is possible through Product Listing Ads.
3. Google is encouraging us to use Product Listing Ad technology to enhance the user's search experience. I like this campaign type because it is my chance to bring my virtual store onto the Search Engine Result Pages with appealing product images.
What does Google say?
There is no official confirmation from Google about this new AdWords ad display format, but there is a buzz about it around the internet. Check out the latest SEMPOST article for further reading.

Adobe SiteCatalyst first party and third party cookies. What to choose?

Cookies are fundamental building blocks of web analytics tools. A cookie is a small piece of data placed in the visitor's browser to track some vital information about the visitor.

With regard to web analytics cookies in general, and SiteCatalyst or Google Analytics cookies in particular, the use of cookies and of Personally Identifiable Information (PII) is governed by the “European ePrivacy Directive”.

Cookies can be classified as First Party and Third Party Cookies.

First party cookies are domain-specific cookies placed in the visitor's browser by the web analytics vendor on behalf of the customer. With first party cookies, the visitor information collected by Adobe SiteCatalyst is not shared with any other domain or party.

A first party cookie collects all data anonymously, with no reference to any personal data whatsoever.

First party cookies set by SiteCatalyst Reporting and Analytics recognize visitors who move across subdomains and top-level domains. For example, first party cookies set on musicworld.com also recognize visitors who move to music.musicworld.com or music.musicworld.co.in

First Party cookies are :

  1. s_cc : This cookie checks whether the browser is enabled to accept cookies or not. The default value is ‘true’, i.e. cookies are enabled; if not, it is ‘false’.
  2. s_sq : This cookie keeps the ClickMap data from the previous page.

Third Party Cookies

Adobe also uses third party domains such as 2o7.net and omtrdc.com to track and collect visitor data. Cookies set this way are third party cookies, where the data collected by these third parties may be shared [there is no evidence] with other domains, to target the user on those domains for remarketing or retargeting purposes. On the other hand, the majority of browsers have sophisticated filters that reject third party cookies due to security concerns. Surprisingly, even first party cookies can be rejected. This threatens the accuracy of the web analytics data collected by enterprise-level web analytics tools.

s_vi[##] is the third party unique visitor identification cookie placed by 2o7.net if you have chosen to go with third party cookies. On the other hand, you can opt for first party cookies instead.

Last but not least, SiteCatalyst cookie usage is governed by the “European Union ePrivacy Directive”; learn from the leader. Visit the Adobe Digital Marketing Blog for more information about the European Union ePrivacy Directive.