Hi, and welcome back. This is the main section of this course. It is based on the knowledge that you acquired previously, so if you haven’t been through the earlier sections, you may have a hard time keeping up. Make sure you have seen all the videos about confidence intervals, distributions, z-tables and t-tables, and have done all the exercises. If you’ve completed them already, you are good to go.

Confidence intervals provide us with an estimate of where the parameters are located. However, when you are making a decision, you need a yes/no answer. The correct approach in this case is to use a test. In this section, we will learn how to perform
one of the fundamental tasks in statistics: hypothesis testing!

Okay. There are four steps in data-driven decision-making. First, you must formulate a hypothesis. Second, once you have formulated a hypothesis, you will have to find the right test for it. Third, you execute the test. And fourth, you make a decision based on the
result.

Let’s start from the beginning. What is a hypothesis? Though there are many ways to define it, the most intuitive I’ve seen is: “A hypothesis is an idea that can be tested.” This is not the formal definition, but it explains the point very well. So, if I tell you that apples in New York are expensive, this is an idea, or a statement, but it is not testable until I have something to compare it with. For instance, if I define expensive as any price higher than $1.75 per pound, then it immediately becomes a hypothesis.

Alright, what’s something that cannot be
a hypothesis? An example may be: would the USA do better or worse under a Clinton administration, compared to a Trump administration? Statistically speaking, this is an idea, but there is no data to test it, so it cannot be the hypothesis of a statistical test. Actually, it is more likely to be a topic of another discipline. Conversely, in statistics, we may compare different US presidencies that have already been completed, such as the Obama administration and the Bush administration, as we have data on both.

Alright, let’s get out of politics and get
into hypotheses. Here’s a simple topic that can be tested. According to Glassdoor (the popular salary information website), the mean data scientist salary in the US is 113,000 dollars. So, we want to test if their estimate is correct.

There are two hypotheses that are made: the null hypothesis, denoted H zero, and the alternative hypothesis, denoted H one or H A. The null hypothesis is the one to be tested and the alternative is everything else. In our example, the null hypothesis would be: the mean data scientist salary is 113,000 dollars, while the alternative would be: the mean data scientist salary is not 113,000 dollars. Now, you would want to check if 113,000 is
close enough to the true mean, as estimated from our sample. If it is, you would fail to reject the null hypothesis (informally, you would “accept” it). Otherwise, you would reject the null hypothesis. The concept of the null hypothesis is similar to “innocent until proven guilty”: we assume that the mean salary is 113,000 dollars and we try to prove otherwise.

Alright. This was an example of a two-sided or a two-tailed test. You can also form one-sided or one-tailed
tests. Say your friend, Paul, told you that he thinks data scientists earn more than 125,000 dollars per year. You doubt him, so you design a test to see who’s right. The null hypothesis of this test would be: the mean data scientist salary is greater than or equal to 125,000 dollars. The alternative will cover everything else, thus: the mean data scientist salary is less than 125,000 dollars. By convention, the equality always goes in the null hypothesis.

It is important to note that outcomes of tests
refer to the population parameter rather than the sample statistic! As such, the result that we get is for the population. Another crucial consideration is that, generally, the researcher is trying to reject the null hypothesis. Think about the null hypothesis as the status quo and the alternative as the change or innovation that challenges that status quo. In our example, Paul was representing the status quo, which we were challenging.

Alright. That’s all for now. In the next lectures, we will see some examples
and learn how to make data-driven decisions.
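
To make the four steps a bit more concrete, here is a minimal sketch in Python of the two salary examples from this lecture. Everything numeric apart from the 113,000 and 125,000 benchmarks is an assumption made up for illustration: the `salaries` sample is hypothetical, the population standard deviation `ASSUMED_SIGMA` is simply assumed to be known (which is what lets us use a z-test here), and the 5% significance level is a common default rather than something stated in the lecture. The helper name `z_test` is also just an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alternative="two-sided", alpha=0.05):
    """One-sample z-test for a population mean with known sigma.

    alternative="two-sided": H0: mu == mu0   vs  H1: mu != mu0
    alternative="less":      H0: mu >= mu0   vs  H1: mu <  mu0
    alternative="greater":   H0: mu <= mu0   vs  H1: mu >  mu0
    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    n = len(sample)
    # Standardize the distance between the sample mean and the H0 value.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal: mean 0, std 1
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p_value, p_value < alpha


# HYPOTHETICAL sample of data scientist salaries (made up, not real data).
salaries = [117_000, 105_000, 98_000, 120_000, 110_000, 101_000,
            93_000, 115_000, 108_000, 99_000, 112_000, 104_000]
ASSUMED_SIGMA = 15_000  # assumption: population standard deviation is known

# Two-sided test of the Glassdoor figure. H0: mean salary = 113,000.
z_two, p_two, reject_two = z_test(salaries, 113_000, ASSUMED_SIGMA)

# One-sided test of Paul's claim. H0: mean salary >= 125,000.
z_one, p_one, reject_one = z_test(salaries, 125_000, ASSUMED_SIGMA,
                                  alternative="less")

for label, z, p, reject in [("Glassdoor (two-sided)", z_two, p_two, reject_two),
                            ("Paul (one-sided)", z_one, p_one, reject_one)]:
    decision = "reject H0" if reject else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, p = {p:.4f} -> {decision}")
```

With this made-up sample, the first test does not reject the null hypothesis and the second one does. Note that both decisions are statements about the population mean, not about the sample, and that a real analysis with an unknown standard deviation and a small sample would normally use a t-test instead.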

GOOD JOB

I'm pretty sure you made a mistake in your second example. The null must always be a form of equality and cannot be a >.

@365 Data Science I'm a little confused as to which hypothesis is supposed to be the counter and which we're trying to prove, since I watched a video by Investopedia where they said that the alternative hypothesis is ultimately the one we're trying to prove, while the null hypothesis is the one that's true until proven false. But my professor does the opposite. Does it really matter which hypothesis I set to = and =/=?

Thank you!

you will NEVER be able to test the hypothesis of a Clinton Presidency (Hillary) vs a Trump Presidency.

DEFINITELY Clinton administration

Thank you.

THIS MADE ME MORE CONFUSED

basic concepts in programming

this was sooooooooo helpful thanks a million

What is a null hypothesis

I was taught that the alternative never has = sign in it

This was awesome thank you so much.

confused… coz if i am not mistaken, null hypothesis use the symbol of "equal sign" and alternative hypothesis use "symbol of inequalities" like <, >, and is not equal to.

Awesome one!

I am here for more!

3:27 but i read somewhere that null hypothesis always contains equal sign…and that's the thing that confuses me in every single question

I was taught to never accept a hypothesis, but to either reject it or not reject it

Why is there a zero by the mean sign? What does it signify? Does it need to be there? It is in both the null and the alternative hypothesis.

They OJed Obama🤦🏽♂️