Schottler Consulting Social and Market Research Knowledge Centre
What is data weighting and how do I do it?
Data weighting is a process that aims to correct for distortions in the composition of a survey sample. For instance, perhaps your sample is 60% male when it should be 50%, so you may want to reduce the influence of males so that they count as around 50% of the sample.
So how does one do this and use weights in a stats package like Stata, SAS or SPSS?
Tips for success in weighting social and market research survey data
The following tips are useful in helping with basic data weighting:
First work out which groups in your survey sample are most likely to affect the results. For instance, this may include gender (males and females), the age of respondents (perhaps you have 3 groups) and their location (metro and non-metro). These are likely to become the factors that you need to incorporate into your data weighting.
Next find a reference population - this is the profile of the true population. For instance, you may use population counts by gender x age x region from the Australian Bureau of Statistics Census.
If you're running a customer or stakeholder survey, the 'reference population' may be the gender x age x location profile of your clients. Only use this data as a reference population if the profile is actually known.
Then convert your survey and reference population tables to percentages of the table total - If the males 18-24 in Melbourne survey cell, for instance, has 50 respondents and there are 500 total respondents, this cell would be 50/500 or 10% of the total table.
Repeat the same steps for each cell within the reference population (your ABS population or other reference population).
Now you should have 2 tables, each with percentages that sum to 100% within the table.
As the last step, divide the percentage in each cell of the reference population table by the percentage in the matching cell of the survey table - This produces a basic weight. For instance, 20% in cell A of the reference population table divided by 10% in cell A of the survey table = 2.0.
If you ever get confused about which to divide by which, think of it this way: is the cell under-represented or over-represented relative to your reference population?
If it's under-represented, you have to weight it up (so the weight should be more than 1.0). If it's over-represented, you weight it down (so the weight should be less than 1.0, such as 0.5).
Import all the weights into your statistical package - If there are 2 genders x 3 age groups x 2 locations, you should have 12 different weights
This then gives you the ability to weight a basic survey.
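The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cell counts are invented (they are not real survey or ABS figures), only 2 genders x 2 locations are used to keep it short, and the function names cell_shares and cell_weights are hypothetical helpers rather than part of any statistics package.

```python
# Hypothetical cell counts for illustration only (not real survey or ABS data).
# Each key is one weighting cell: (gender, location).
survey_counts = {
    ("male", "metro"): 50,
    ("male", "non-metro"): 150,
    ("female", "metro"): 200,
    ("female", "non-metro"): 100,
}
population_counts = {
    ("male", "metro"): 200,
    ("male", "non-metro"): 300,
    ("female", "metro"): 300,
    ("female", "non-metro"): 200,
}


def cell_shares(counts):
    """Convert cell counts to shares of the table total (shares sum to 1.0)."""
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def cell_weights(survey, population):
    """Basic weight per cell = reference population share / survey share."""
    survey_share = cell_shares(survey)
    population_share = cell_shares(population)
    weights = {}
    for cell, pop_share in population_share.items():
        # A cell with no respondents cannot be weighted up; it has to be
        # merged with a neighbouring cell before weights are calculated.
        if survey_share.get(cell, 0) == 0:
            raise ValueError(f"No survey respondents in cell {cell}")
        weights[cell] = pop_share / survey_share[cell]
    return weights


weights = cell_weights(survey_counts, population_counts)
for cell, weight in weights.items():
    print(cell, f"{weight:.2f}")
```

In this made-up example the male metro cell is 10% of the survey but 20% of the reference population, so it is weighted up, while the female metro cell is over-represented and is weighted down. A useful check is that the weighted cell counts still add up to the original number of respondents.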
The benefit of applying the weights is that it will make sure that males/females, different age groups and different locations each contribute to your overall results in the proportions that match reality (your reference population)!
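To illustrate how weights change an overall result, the following Python sketch compares an unweighted and a weighted percentage for a yes/no question. Everything in it is assumed for illustration: the weights, the respondent counts and the answers are invented, and weighted_share is a hypothetical helper. In practice a package such as Stata, SAS or SPSS would apply the weight variable for you.

```python
# Hypothetical weights per cell (gender, location), for illustration only.
cell_weights = {
    ("male", "metro"): 2.0,
    ("male", "non-metro"): 1.0,
    ("female", "metro"): 0.75,
    ("female", "non-metro"): 1.0,
}

# Invented results: (cell, number of respondents, number answering "yes").
cell_results = [
    (("male", "metro"), 50, 40),
    (("male", "non-metro"), 150, 60),
    (("female", "metro"), 200, 50),
    (("female", "non-metro"), 100, 50),
]

# Expand to one record per respondent, as in a typical survey data file.
respondents = []
for cell, n, yes in cell_results:
    for i in range(n):
        respondents.append({"cell": cell, "answer": 1 if i < yes else 0})


def weighted_share(rows, weights):
    """Share answering "yes", with each respondent counted by their cell weight."""
    weight_sum = sum(weights[row["cell"]] for row in rows)
    yes_sum = sum(weights[row["cell"]] * row["answer"] for row in rows)
    return yes_sum / weight_sum


unweighted = sum(row["answer"] for row in respondents) / len(respondents)
weighted = weighted_share(respondents, cell_weights)
print(f"Unweighted: {unweighted:.1%}")
print(f"Weighted:   {weighted:.1%}")
```

Because the under-represented male metro cell has the highest share of "yes" answers in this invented data, weighting it up raises the overall result.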
It should be noted that there are also far more complicated forms of data weighting. These adjust for the probability of respondent selection in the sample, adjust for sampling distortions and also adjust to population benchmarks (e.g., Australian Bureau of Statistics data).
Landline and mobile surveys involve dual frame weighting approaches that make these adjustments and additionally combine the landline and mobile frames in a way that adjusts for a respondent's phone ownership.
However, as dual frame weighting can be very complex, many researchers and even companies do not know how to weight dual frame samples.
One recent example of a dual frame weighting approach we developed for a Victorian study is available here for reference (refer to p29).