Basic Statistics : Your Passport to Data Proficiency

Ashish J. Edward
Sep 6, 2023
10 min read

Updated: Oct 11, 2024

What is Statistics ?

Statistics is like a super useful tool that helps us make sense of the world. You know how we're surrounded by numbers and data all the time? Well, statistics steps in to make that data talk and tell us interesting things.

Think about it this way: imagine you're at a party, and you start chatting with people about different things like how tall they are, how much money they make, or how many hours they sleep. Statistics is like the cool friend who helps you make sense of all that information.

See, when you collect a bunch of data, like everyone's heights at the party, statistics steps in to organize, analyze, and interpret that data. It's like finding patterns and stories hidden in the numbers.

You can use statistics to figure out the average height of the people at the party, which tells you about the "typical" height. Or you can see how spread out the heights are, giving you an idea of how much variety there is.

But wait, there's more! Statistics isn't just about numbers; it's also about uncertainty. Life is full of uncertainty – like trying to predict the weather or guessing the outcome of an election. Statistics helps us deal with that uncertainty by giving us tools to make informed guesses and decisions.

You might have heard of terms like probability and confidence intervals. These are statistical concepts that help us navigate the uncertain waters of life. They let us say things like "we are 95% confident that the average height of people at the party is between 160 cm and 180 cm."

Now, think beyond parties. Statistics is everywhere! From predicting election outcomes to understanding trends in the stock market, it's all about using math to help us understand the world around us a little better.

But here's the deal: statistics isn't magic. It's a tool, like a hammer or a wrench. And just like any tool, you need to use it right to get meaningful results. If you mess up the way you collect or analyse data, you could end up with wrong conclusions.

So, when you see statistics being used in news, research, or even in everyday conversations, remember that it's about finding the story within the numbers. It's about turning that bunch of data into insights that help us make better decisions, understand the world better, and maybe even impress your friends with your newfound data-wrangling skills.

Statistics is the science of collecting, analysing, presenting, and interpreting data. Its fundamental purpose is to use diverse computational and mathematical techniques to make sense of big and complex sets of information. Statistics assists us in reaching meaningful conclusions, making sound judgments, and identifying patterns or trends in data.

What are Population and Sample Data ?

Population and sample data are like keys to unlock insights from a massive puzzle. Imagine you're curious about a whole city's music taste – that's your population, everyone's preference combined.

But let's be real, talking to every single person about their favourite song isn't practical. That's where the sample data comes in. It's like getting a handful of puzzle pieces that represent the whole picture. So, you might ask a bunch of folks from different neighbourhoods about their music choice.

Here's an example: you're analyzing smartphone usage in your town. The entire town is your population, but you select a sample – maybe a hundred people – to study instead. This sample should mirror the town's diversity – all ages, jobs, and interests.

Think about ice cream flavours. If you want to know the town's favourite, you don't need to ask every person. A well-chosen group can give you a pretty close answer.

Now, here's the kicker: your sample's quality matters. It's like picking puzzle pieces that fit well. If you're trying to understand music trends, your sample should include young and old, rock fans and jazz enthusiasts.

In a nutshell, population and sample data let you study big things without going nuts. They help you understand trends, preferences, and behaviours without talking to every single person. Just like puzzle pieces, they reveal the bigger picture, making your research efficient and meaningful.

Population Data: all the individual observations or pieces of information that belong to a particular group you're interested in studying. It's like looking at every single member of that group.

Sample Data: a smaller set of observations selected from the larger population. The idea is that you study this smaller group to make conclusions or predictions about the whole population. It's like taking a representative chunk that gives you an idea of what's going on in the larger group without having to examine every single piece of data.

When you collect data from a population or a sample, there are various measurements and numbers you can calculate from the data. A parameter is a measure that describes the whole population. A statistic is a measure that describes the sample.

Sampling Techniques

Each sampling technique has its own flavor, and I'll toss in examples to help you pick the right one.

Random Sampling : This is like drawing names from a hat. Every member of your population has an equal chance of being picked. Use it when you want to avoid bias and have a diverse sample. For instance, if you're measuring the average height of students in a school, randomly pick a group of them.

Stratified Sampling : Think of this as making layers of a cake. Divide your population into smaller groups (strata) based on specific characteristics (like age or gender). Then, randomly pick samples from each group. Use it when you want to make sure you've got representation from all the different layers. For example, if you're studying a city's population, you could divide it into neighbourhoods and pick samples from each.

Systematic Sampling : Imagine you're collecting leaves from a tree. You don't pick randomly, but you choose every nth leaf. Use this when you have a list of the population and want a simple, evenly spaced way to pick from it. For instance, if you're analysing customer feedback forms, you could choose every 10th form.

Cluster Sampling : Picture a potluck party. Instead of inviting everyone, you invite a few whole groups (clusters) of friends. This works when your population naturally forms groups. For example, if you're studying schools in a district, you could randomly choose a few schools and collect data from all the students in those schools.

Convenience Sampling : Sometimes, you just go with what's easy. If you're at a café and you ask your nearby friends for opinions, that's convenience sampling. Use it when time and resources are limited. But be careful, it might not represent the whole population accurately.

Purposive Sampling : This is like handpicking players for a specialized sport. You choose specific individuals who fit your research needs. If you're studying expert opinions on a topic, you'd purposefully select experts.

Snowball Sampling : Think of a snowball rolling and getting bigger. Start with one participant, then they refer you to others, and the chain keeps growing. This is great for finding rare or hard-to-reach groups. For example, when researching a niche hobby or a small immigrant community.

Remember, the choice of technique depends on your research goals, the population you're studying, and the resources you have. Each technique has its strengths and weaknesses, so pick the one that best fits your needs like choosing the right tool for a job.
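To make the first four techniques a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the population of 100 students, the "grade" field, and the helper function names are assumptions, not part of any library.

```python
import random

# Hypothetical population: 100 student records, each tagged with a grade level.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [{"id": i, "grade": 9 + i % 4} for i in range(100)]


def simple_random_sample(items, k, rng):
    """Random sampling: every member has the same chance of being picked."""
    return rng.sample(items, k)


def stratified_sample(items, key, per_stratum, rng):
    """Stratified sampling: group by a characteristic, then sample within each group."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    picked = []
    for members in strata.values():
        picked.extend(rng.sample(members, per_stratum))
    return picked


def systematic_sample(items, step, rng):
    """Systematic sampling: every `step`-th member, starting from a random offset."""
    start = rng.randrange(step)
    return items[start::step]


def cluster_sample(items, key, n_clusters, rng):
    """Cluster sampling: pick whole groups at random and keep everyone in them."""
    clusters = sorted({item[key] for item in items})
    chosen = set(rng.sample(clusters, n_clusters))
    return [item for item in items if item[key] in chosen]


random_pick = simple_random_sample(population, 10, rng)
stratified_pick = stratified_sample(population, "grade", 3, rng)
systematic_pick = systematic_sample(population, 10, rng)
cluster_pick = cluster_sample(population, "grade", 2, rng)

print(len(random_pick), len(stratified_pick), len(systematic_pick), len(cluster_pick))
```

Notice how the cluster sample is much larger than the others: once a group is chosen, everyone in it comes along. Convenience, purposive, and snowball sampling depend on human judgment and referrals, so they don't reduce to a few lines of code in the same way.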

Sampling error is the gap between what you find in a sample and what's true for the whole population. Even when you pick a sample at random, it won't match the population perfectly in terms of numbers like averages and spreads. Scientific quests aim to apply findings from samples to the larger population, so smaller sampling errors are the goal. To tame this error beast, increase your sample size: larger random samples tend to land closer to the true population values.
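If you'd like to see this shrinking error for yourself, the small simulation below is one way to do it. It is only a sketch: the population of 10,000 heights is invented, and the function name is hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm, roughly bell-shaped.
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)


def average_sampling_error(sample_size, repeats=200):
    """Average gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(gaps)


# Draw many samples of each size and compare the typical gap.
for n in (10, 100, 1000):
    print(n, round(average_sampling_error(n), 2))
```

The printed gap should get smaller as the sample size grows, though with diminishing returns: going from 10 to 100 people helps far more than going from 1,000 to 1,090.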

Sample size calculator : http://www.raosoft.com/samplesize.html

Nomenclature & Statistical symbols

Formula

Sample data is an approximation of a population; hence, it always has some error or uncertainty built into it. The big question here is “Is this sample a good approximation of the population ?” Because of this uncertainty, the formulas for a sample differ from those for the population.

The term “x_bar” represents the sample mean. The symbol ‘Σ xi’ used in this formula represents the sum of all scores present in the sample (say, in this case) x1, x2, x3 and so on.

The symbol ‘n’ represents the total number of individuals or observations in the sample.

The term ‘Σ (xi – x_bar)²’ represents the sum of the squared deviations of the scores from the sample mean.
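Here is how those symbols translate into a few lines of Python. This is a sketch under an assumption: it uses the standard textbook formulas, where the sample variance divides by n - 1 and the population variance divides by the number of observations. The scores are made up.

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical sample: x1, x2, ..., x5

n = len(scores)                      # n: number of observations
x_bar = sum(scores) / n              # sample mean: (Σ xi) / n
squared_deviations = sum((x - x_bar) ** 2 for x in scores)  # Σ (xi - x_bar)²

sample_variance = squared_deviations / (n - 1)  # sample formula divides by n - 1
population_variance = squared_deviations / n    # population formula divides by n

# Python's statistics module has the same two flavours built in.
print(x_bar, sample_variance, statistics.variance(scores))
print(population_variance, statistics.pvariance(scores))
```

Dividing by n - 1 makes the sample variance slightly larger, which is a way of accounting for the extra uncertainty that comes with working from a sample.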

On MS Excel

Don’t let these symbols bother you. These are very simple measures used in statistics and yes, they are very practical in today’s world. We will cover these in detail next with examples.

Sampling Guidelines

Types of Data

In the world of statistics, a variable is like a trait of what you're studying. Picking the right traits to measure is crucial for solid experiments. Imagine you're exploring whether exercise improves memory. Key traits to measure could be the type of exercise, the time spent exercising, and memory test scores. These traits guide you in picking the right statistical tools and unravelling the study's results.

Quantitative variables are about numbers you can do math with – like counting apples or measuring temperature.

Continuous variables are like things that can have any value, no matter how small or big – for instance, your height or the time it takes to run a race.
Discrete variables are like things you can count individually – imagine the number of books on your shelf or the number of friends you have.

Categorical variables are about sorting things into groups – like colours (red, blue), types of pets (dogs, cats), or favourite genres (action, comedy).

A binary variable has only two outcomes, e.g. Yes/No, Pass/Fail, Heads/Tails.
A nominal variable is when things are sorted into groups, like sorting colours or types of animals, without any quantitative value. They are only categorized.
An ordinal variable is about arranging things in order, like ranking movie preferences as "favourite," "liked," or "disliked." They can be both categorized and ranked.
An interval variable is a type of measurement where the numerical difference between values is consistent, but there is no true zero point. For instance, with temperature measured in Celsius, the difference between 10°C and 20°C is the same as between 20°C and 30°C, but 0°C doesn't signify a complete absence of temperature. Interval variables can be categorized and ranked, and the intervals between data points are equal. E.g. Temperature, Test scores etc.

The way you measure something affects how you make sense of your data. Whether you're counting, ranking, or measuring precisely, the measurement level puts limits on the types of stats you can use to understand your info. And if you're aiming to test out ideas with your data, it also shapes the kinds of analyses you can do. So, before you dive into collecting data, decide on the measurement level that suits your goals best!

Wrap your head around this : A variable can wear multiple hats! Take ordinal variables – they can also play the quantitative game when their scale uses numbers. For instance, those star ratings on product reviews? They're ordinal (1 to 5 stars), but when you average them and get something like 4.3 stars, you're diving into the quantitative world.

Arithmetic Mean: When you want a simple average that treats all values equally, like checking the average age of a group, go for the arithmetic mean. It's like adding everything up and dividing fairly.

*Geometric Mean: But, when you're dealing with stuff that changes in percentages, like investment returns over years, the geometric mean shines. It gives you an average that considers growth rates, making it great for comparing things with varying scales. Think of it as a "growth-friendly" average!
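A quick sketch shows why the two averages can disagree. The returns here are invented: an investment gains 50% one year and loses 50% the next, so 100 becomes 150 and then 75.

```python
import statistics

# Hypothetical yearly growth factors: +50% is 1.5, -50% is 0.5.
growth_factors = [1.5, 0.5]

arithmetic = statistics.mean(growth_factors)
geometric = statistics.geometric_mean(growth_factors)

# Applying each "average" factor for two years, starting from 100.
print(100 * arithmetic ** 2)  # what the arithmetic mean implies
print(100 * geometric ** 2)   # what the geometric mean implies
```

The arithmetic mean says the average factor is 1.0, as if nothing changed. The geometric mean is about 0.87 per year, which compounds to the 75 you actually end up with.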

Standard Deviation: Imagine you have a bunch of test scores. If you want to know how far each score is from the average, use standard deviation. It's like showing how much the scores spread out from the middle.

**Relative Standard Deviation: Now, let's say you're comparing the weight fluctuations of two groups—one group is kids, and the other is adults. Since their weights are on different scales, the relative standard deviation helps you compare how much the ups and downs vary in relation to their average weights. It's like a fairness check for comparisons!
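The kids-versus-adults comparison can be sketched as follows. The weights are made up, and `relative_sd` is a hypothetical helper that assumes the usual definition: standard deviation divided by the mean, expressed as a percentage.

```python
import statistics

kids_kg = [20, 22, 18, 21, 19]    # hypothetical weights
adults_kg = [70, 74, 66, 72, 68]  # hypothetical weights


def relative_sd(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


for label, group in (("kids", kids_kg), ("adults", adults_kg)):
    print(label, round(statistics.stdev(group), 2), round(relative_sd(group), 1))
```

In this toy data the adults have the bigger standard deviation in kilograms, but relative to their average weight the kids actually vary more. That is the kind of reversal the relative standard deviation is meant to reveal.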

Type of statistics

Statistics is like a treasure map guiding us through the vast world of data. Within this map, two key landmarks stand out: descriptive statistics and inferential statistics. Let's explore these landmarks to understand their roles in uncovering insights from numbers.

Descriptive Statistics are like a snapshot of your data – a quick glance that captures the essence. Imagine you're taking a photo at a party. You capture the excitement, the laughter, and the overall vibe. Descriptive statistics do the same for data. They summarize the information, giving you a sense of the big picture without diving into intricate details.

These stats include measures like averages, which tell you what's "normal" or "typical." Then there's the range, which indicates how spread out the data is – like understanding the diversity of partygoers' ages. Standard deviation measures how much the data points vary from the average, providing insights into the level of consistency or variation.
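As a rough illustration, here is how that snapshot might look in Python for a handful of partygoers. The ages are invented for the example.

```python
import statistics

ages = [22, 25, 25, 28, 31, 34, 45]  # hypothetical partygoers

summary = {
    "mean": statistics.mean(ages),      # the "typical" age
    "median": statistics.median(ages),  # the middle age when sorted
    "range": max(ages) - min(ages),     # gap between oldest and youngest
    "stdev": statistics.stdev(ages),    # typical distance from the mean
}

for name, value in summary.items():
    print(name, round(value, 2))
```

Four numbers won't tell you everything about the guests, but they give a fair picture of the group without listing every age.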

Inferential Statistics, on the other hand, are like detectives on the case, unravelling the hidden stories within the data. Imagine you're trying to predict the outcome of a sports match based on players' past performances. Inferential statistics take a sample of data and use it to make educated guesses about the larger population.

Inferential stats help us make predictions, test hypotheses, and draw conclusions. They work by examining the relationships between variables. For instance, if you're studying the effect of sleep on exam performance, inferential statistics can determine if there's a significant connection or if the correlation is just coincidental.

Picture election polls – these are prime examples of inferential statistics at work. By surveying a sample of voters, analysts can predict how an entire population will likely vote. This allows us to gauge the bigger picture without having to question every single person.
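A poll result usually comes with a margin of error, and a simple version of that calculation can be sketched like this. The numbers are hypothetical (520 of 1,000 surveyed voters favour a candidate), the function name is made up, and the formula is the common normal-approximation interval, which assumes a reasonably large random sample.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion (z = 1.96 for about 95%)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical poll: 520 of 1,000 sampled voters favour candidate A.
low, high = proportion_confidence_interval(520, 1000)
print(round(low, 3), round(high, 3))
```

Here the interval runs from roughly 49% to 55%, so even though the sample shows 52% support, the poll alone can't rule out that the candidate is slightly below half in the full population.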

In a nutshell, descriptive statistics distill data into clear summaries, spotlighting trends and patterns. Inferential statistics make educated guesses about larger groups using smaller samples, unveiling deeper insights. Together, these two landmarks in the world of statistics help us extract valuable insights, enabling us to make informed decisions, test hypotheses, and understand the stories hidden within the numbers. So, whether you're summarizing data or drawing educated conclusions, these statistical tools guide you through the exciting journey of data exploration.
