## N Median Mode Range

**SPSS 1 Assignment Instructions**

This assignment is designed to teach you to describe a single variable: its central location and its dispersion, how to create an appropriate graphic to illustrate it, and how to discuss the way in which your variables are distributed.

**Use the following format:**

- A title page with “SPSS 1: Describing a single variable” as the title and your name, section, TA’s name, professor’s name, date, and G# in the upper right-hand corner.
- Start a new page for each variable.

- The variable name in **bold** and underlined at the top of the page.
- Your answers should have the following sections:
  - a) A properly formatted frequency table for the variable you are describing.
  - b) A table of appropriate summary statistics.
  - c) An appropriate graphic.
  - d) A paragraph describing your variable.

**For each of the variables:**

- Identify the level of measurement for each variable.
- Build a table that shows the cumulative % and frequencies.
  - The table must be in **APA format**.
  - Some variables will have to be recoded to effectively display in a table. RULE OF THUMB – no more than **10** categories should appear in ANY table.
- Report the summary statistics that describe the variable in terms of **all the appropriate measures of central location and dispersion**.
- Create **1 appropriate** graphic to display the distribution.
- Write a paragraph that describes the distribution in terms of central location, dispersion, outliers, and skew (if any).

**These are your variables:**

- From the **WORLD** 2012 dataset use the variable named “**polity**” with the label “**Higher scores more democratic (Polity)**”.
- From the **NES 2012** dataset use the variable named “**dem_marital**” with the label “**Marital Status**”.
- From the **NES 2012** dataset use the variable named “**relig_attend**” with the label “**Attendance: Religious Services**”.
- From the **GSS 2012** dataset use the variable named “**wordsum**” with the label “**Number of words correct in vocabulary test**”.
- From the **GSS 2012** dataset use the variable named “**educ_4**” with the label “**Education in 4 Categories**”.

***Sample Problem*** Variable: **age5**

- Level of measurement for this variable is **ordinal**.
- Cumulative % frequency table:

Age in 5 Categories<sup>a</sup>

| X | F | % | Cum % |
|---|---|---|---|
| 18-30 | 437 | 21.7 | 21.7 |
| 31-40 | 384 | 19.1 | 40.8 |
| 41-50 | 403 | 20.0 | 60.8 |
| 51-60 | 369 | 18.3 | 79.1 |
| 61+ | 421 | 20.9 | 100.0 |
| Total | 2015 | 100.0 | |

a. General Social Survey 2008

III. Table of summary statistics:

Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 2015 | 41-50 | 18-30 | (61+) – (18-30) | 18-30 | 61+ | 31-40 | 51-60 | (51-60) – (31-40) | 0.783 |

Bar chart:

- Descriptive paragraph requirements:
  - What does your variable measure?
  - Describe this variable in terms of all appropriate measurements of central location.
  - Describe this variable in terms of the appropriate measurements of dispersion.
  - If appropriate, is this distribution skewed negative or positive?
  - Discuss any other interesting and relevant details about this distribution.

**SPSS1 Frequently Asked Questions and Point Breakdown**

**5 questions:**

- Level of measurement:
  - You must have the correct level of measurement.
  - You should not put interval/ratio. You must clearly identify if the data is interval or ratio for full points.
  - You need to report the level of measurement on the original data, not the recoded data.

- Cumulative Frequency Table:
  - You must have a title, a source and appropriate columns.
  - Your table must be formatted according to APA guidelines. Week 6 under course content on our Blackboard Course website has both the template and an instructional video if you would like to refresh your memory from lab.
  - If you have more than 10 rows you should recode the data. Make sure that the valid total of your recoded cumulative frequency table matches the valid total of the cumulative frequency table on the original data.

- Descriptive Statistics:
  - Do not directly copy and paste from SPSS. You must format the tables according to the APA Guidelines.
  - Measures of central tendency must be correct for the level of measurement of the data.
  - Measures of dispersion must be appropriate for the level of measurement of the data.
  - Always report the category labels for categorical data.
  - You will need to perform some calculations. Remember, SPSS does not give you the IQR or the V-ratio. You will need to calculate these correctly for full points.
  - Remember, if you have ordinal data the numbers associated with the data are just the coding. Therefore, you need to simplify the IQR and the Range as much as possible. (For example, if you were using a Likert Scale, you could end up with Hate-Love for the range and Somewhat dislike-Somewhat love for the IQR.) Remember, for categorical data you must work with the category labels.
  - You must run statistics on the original data, not the recoded data. As such, your descriptive statistics should be appropriate to the level of measurement of your original data.
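Since SPSS does not report the V-ratio, and ordinal ranges must be written with labels, the two hand calculations can be sketched as follows. This is a minimal Python illustration, not part of the required SPSS workflow; `variation_ratio` and `label_span` are hypothetical helper names, and the numbers come from the age5 sample problem.

```python
def variation_ratio(mode_count, n_valid):
    """Share of valid cases that fall outside the modal category."""
    return 1 - mode_count / n_valid


def label_span(low_label, high_label):
    """Express a range or IQR for ordinal data using category labels."""
    return f"({high_label}) - ({low_label})"


# Sample problem (age5): the modal category 18-30 holds 437 of 2015 valid cases.
v_ratio = round(variation_ratio(437, 2015), 3)
print(v_ratio)                            # 0.783

# Range runs from the minimum to the maximum category.
print(label_span("18-30", "61+"))         # (61+) - (18-30)

# IQR runs from the Q1 category to the Q3 category.
print(label_span("31-40", "51-60"))       # (51-60) - (31-40)
```

The denominator should be the valid N from the frequency table, not the total number of rows including missing cases.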

- Graphics:
  - These can be copied and pasted from SPSS. You need to make sure that your graphs have titles and sources. You may use Excel, but we strongly encourage you to use SPSS.
  - Your graph must be appropriate to the level of measurement of your data (nominal = pie chart; ordinal = bar graph; interval/ratio = histogram).
    - If using a recoded variable, you should use the level of measurement AFTER the recoding (interval recoded to ordinal should use a bar graph).
    - However, if you use a histogram with the interval/ratio original data, you will not be penalized.
  - Have value labels. Remember to use the crosshairs in the Graph Editor in SPSS.
  - Bar charts should use % of cases (“Percent”), not count.

- Paragraph:
  - You need to describe both central tendency and dispersion. Do not just list all of the statistics. Focus on the most salient univariate statistics for your level of measurement (e.g., if you have ratio data, you should discuss the mean for central tendency and the standard deviation and the variance for measures of dispersion). You need to interpret these as well. What do they tell you about the variable being measured? (For example, if you have a variable on age and the average age was 25, what could you infer about the population?) Report interesting trends from the statistics.
  - Discuss the shape of the distribution curve (positive or negative skew) if it is appropriate for the level of measurement.
  - Make sure that you report outliers if it is appropriate to the level of measurement for your data.

**Solution**

**SPSS1: Describing a Single variable**

Variable named “**polity**” with the label “**Higher scores more democratic (Polity)**” from the **WORLD** 2012 dataset

- Level of measurement

Level of measurement for this variable is **interval**: the scores run from -10 to 10 and represent the degree of democracy, so a score of 0 is not an absolute zero.

- Cumulative Frequency Table

Higher scores more democratic (Polity)

| Score | Cum % |
|---|---|
| -10 | 1.4 |
| -9 | 3.5 |
| -8 | 4.9 |
| -7 | 11.8 |
| -6 | 12.5 |
| -5 | 13.2 |
| -4 | 16.0 |
| -3 | 18.1 |
| -2 | 23.6 |
| -1 | 25.0 |
| 0 | 25.7 |
| 1 | 27.1 |
| 2 | 29.2 |
| 3 | 31.9 |
| 4 | 34.7 |
| 5 | 38.9 |
| 6 | 47.2 |
| 7 | 54.9 |
| 8 | 66.7 |
| 9 | 77.8 |
| 10 | 100.0 |

- Descriptive Statistics

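Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 167 | 7 | 10 | 20 | -10 | 10 | -0.75 | 9 | 9.75 | 0.8084 |

Quartile spacing: Min to Q1 = 9.25, Q1 to Q2 = 7.75, Q2 to Q3 = 2, Q3 to Max = 1. V-ratio = 1 − 32/167 = 0.8084.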

- Graphics

- Paragraph

The scores are numeric, so the mean and standard deviation can be reported. The mean is 4.36 and the median is 7.00. Because the median is greater than the mean, the data are skewed to the left: a long tail of low scores pulls the mean down more than the median. The skewness statistic of -.972 supports this. Skewness measures asymmetry; a negative value means the left tail is longer and the mass of the distribution sits on the right of the figure. The standard deviation is 6.104. For a roughly normal distribution, about 95% of cases fall within 2 standard deviations of the mean, which here would be between 4.36 − 2×6.104 = −7.848 and 4.36 + 2×6.104 = 16.568. That rule does not hold well here: the distribution is left-skewed, and the upper bound lies beyond the scale maximum of 10.

Variable named “**dem_marital**” with the label “**Marital Status**” from the **NES 2012** dataset

- Level of measurement

Level of measurement for this variable is nominal

- Cumulative Frequency Table

PRE: Marital status

| Category | Cum % |
|---|---|
| Married: spouse present | 51.5 |
| Married: spouse absent | 57.0 |
| Widowed | 67.6 |
| Divorced | 73.5 |
| Separated | 91.1 |
| Never married | 100.0 |

- Descriptive Statistics

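Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 5905 | 1 | 1 (Married: spouse present) | 5 | 1 | 6 | 1 | 5 | 4 | 0.48467 |

Quartile spacing: Min to Q1 = 0, Q1 to Q2 = 0, Q2 to Q3 = 4, Q3 to Max = 1. V-ratio = 1 − 3043/5905 = 0.48467.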

- Graphics

- Paragraph

The data are nominal, so a pie chart has been plotted. The mode is category 1 (Married: spouse present), which holds 51.5% of valid cases, and the V-ratio of 0.485 means that just under half of respondents fall outside the modal category. SPSS also reports a mean of 2.59 and a median of 1, but these are computed from arbitrary category codes and carry no meaning for nominal data. For the same reason, outliers and skew do not apply.

Variable named “**relig_attend**” with the label “**Attendance: Religious Services**” from the **NES 2012** dataset

- Level of measurement

Level of measurement for this variable is ordinal

- Cumulative Frequency Table

Attendance: Religious Services

| Category | Cum % |
|---|---|
| Never | 42.9 |
| Few/Yr | 57.9 |
| 1-2/Mnth | 67.5 |
| Alm/Evwk | 78.6 |
| Ev Week | 100.0 |

- Descriptive Statistics

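Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 5884 | 1 (Few/Yr) | 0 (Never) | 4 | 0 (Never) | 4 (Ev Week) | 0 (Never) | 3 (Alm/Evwk) | 3 | 0.5707 |

Quartile spacing: Min to Q1 = 0, Q1 to Q2 = 1, Q2 to Q3 = 2, Q3 to Max = 1. V-ratio = 1 − 2526/5884 = 0.5707.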

- Graphics

- Paragraph

The data are ordinal, so a bar chart has been plotted. The median category is Few/Yr (code 1) and the mode is Never (code 0), which holds 42.9% of valid cases. The middle half of respondents falls between Never (Q1) and Alm/Evwk (Q3). SPSS reports a mean of 1.53, which is above the median code of 1, but the mean of ordinal codes has a limited role. Outliers do not apply to ordinal categories.
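For ordinal data, the quartiles can be read straight off the cumulative percent column: each quartile is the first category whose cumulative percent reaches 25, 50 or 75. The sketch below shows that lookup in Python as an illustration only; `category_at` is a hypothetical helper, and the figures are the cumulative percentages reported for relig_attend.

```python
def category_at(cum_table, percentile):
    """Return the first category whose cumulative percent reaches `percentile`.

    `cum_table` is a list of (label, cumulative percent) pairs in
    category order, ending at 100.
    """
    for label, cum_pct in cum_table:
        if cum_pct >= percentile:
            return label
    raise ValueError("cumulative percentages never reach the percentile")


# Cumulative percentages for relig_attend, in category order.
relig_attend = [
    ("Never", 42.9),
    ("Few/Yr", 57.9),
    ("1-2/Mnth", 67.5),
    ("Alm/Evwk", 78.6),
    ("Ev Week", 100.0),
]

q1, median, q3 = (category_at(relig_attend, p) for p in (25, 50, 75))
print(q1, median, q3)  # Never Few/Yr Alm/Evwk
```

Reporting the labels rather than the codes keeps the IQR readable: here it runs from Never to Alm/Evwk.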

Variable named “**wordsum**” with the label “**Number of words correct in vocabulary test**” from the **GSS 2012** dataset

- Level of measurement

Level of measurement for this variable is ratio scale

- Cumulative Frequency Table

Number Words Correct In Vocabulary Test

| Score | Cum % |
|---|---|
| 0 | .7 |
| 1 | 2.1 |
| 2 | 5.5 |
| 3 | 10.8 |
| 4 | 21.6 |
| 5 | 39.3 |
| 6 | 63.5 |
| 7 | 78.7 |
| 8 | 90.5 |
| 9 | 96.4 |
| 10 | 100.0 |

- Descriptive Statistics

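Summary Statistics

| N (valid) | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1283 | 6 | 6 | 10 | 0 | 10 | 5 | 7 | 2 | 0.7584 |

Quartile spacing: Min to Q1 = 5, Q1 to Q2 = 1, Q2 to Q3 = 1, Q3 to Max = 3. V-ratio = 1 − 310/1283 = 0.7584, using the 1283 valid cases rather than the 1975 total cases, 692 of which are missing.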

- Graphics

- Paragraph

The data are ratio-level, so the mean and standard deviation can be reported. The mean is 5.91 and the median is 6.00. Because the median is slightly greater than the mean, the data are mildly skewed to the left: a tail of low scores pulls the mean down more than the median. The skewness statistic of -.234 supports this. Skewness measures asymmetry; a negative value means the left tail is longer and the mass of the distribution sits on the right of the figure. The standard deviation is 1.988. For a roughly normal distribution, about 95% of cases fall within 2 standard deviations of the mean, which here would be between 5.91 − 2×1.988 = 1.934 and 5.91 + 2×1.988 = 9.886. Because the distribution is mildly left-skewed, the actual share can differ somewhat from that benchmark.
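The 2-standard-deviation benchmark can be compared against the actual frequency table. The sketch below is a Python illustration, not SPSS output; it takes the reported mean and standard deviation as given and uses the wordsum counts from the SPSS frequency table. Those displayed counts are rounded, so they add up to slightly fewer than the reported valid N of 1283.

```python
# Frequencies for wordsum (score -> count), as displayed in the SPSS output.
wordsum = {0: 9, 1: 17, 2: 43, 3: 69, 4: 138, 5: 227,
           6: 310, 7: 195, 8: 151, 9: 76, 10: 46}

# Mean and standard deviation as reported by SPSS.
mean, sd = 5.91, 1.988

# Interval of mean plus or minus 2 standard deviations.
low, high = mean - 2 * sd, mean + 2 * sd

# Count the cases whose score lies inside the interval.
n_cases = sum(wordsum.values())
inside = sum(count for score, count in wordsum.items() if low <= score <= high)
share = 100 * inside / n_cases

print(f"interval: {low:.3f} to {high:.3f}")
print(f"share of cases inside: {share:.1f}%")
```

Scores 2 through 9 fall inside the interval, and the resulting share lands a little below the 95% expected for a normal distribution.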

Variable named “**educ_4**” with the label “**Education in 4 Categories**” from the **GSS 2012** dataset

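- Level of measurement

Level of measurement for this variable is ordinal: the four categories are ordered from <HS to Coll+.

- Cumulative Frequency Table

Education: 4 Cats

| Category | Cum % |
|---|---|
| <HS | 16.2 |
| HS | 42.9 |
| Some Coll | 69.9 |
| Coll+ | 100.0 |

- Descriptive Statistics

Summary Statistics

| N (valid) | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1974 | 3 (Some Coll) | 4 (Coll+) | 3 | 1 (<HS) | 4 (Coll+) | 2 (HS) | 4 (Coll+) | 2 | 0.6996 |

Quartile spacing: Min to Q1 = 1, Q1 to Q2 = 1, Q2 to Q3 = 1, Q3 to Max = 0. V-ratio = 1 − 593/1974 = 0.6996, using the 1974 valid cases.

- Graphics

- Paragraph

The data are ordinal. A pie chart has been plotted, although a bar chart of percentages is the usual choice for ordered categories. The median category is Some Coll (code 3) and the mode is Coll+ (code 4), which holds 30.1% of valid cases. The middle half of respondents falls between HS (Q1) and Coll+ (Q3). SPSS reports a mean of 2.71, which is below the median code of 3, but the mean of category codes has a limited role. Outliers do not apply to categorical data.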

GET

FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.

DATASET NAME DataSet0 WINDOW=FRONT.

FREQUENCIES VARIABLES=wordsum

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT

/HISTOGRAM

/ORDER=ANALYSIS.

**Frequencies**

Notes | ||

Output Created | 25-Oct-2017 01:21:40 | |

Comments | ||

Input | Data | C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav |

Active Dataset | DataSet1 | |

Filter | <none> | |

Weight | Weight Variable | |

Split File | <none> | |

N of Rows in Working Data File | 1974 | |

Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |

Cases Used | Statistics are based on all cases with valid data. | |

Syntax | FREQUENCIES VARIABLES=wordsum /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT /HISTOGRAM /ORDER=ANALYSIS.
| |

Resources | Processor Time | 00:00:00.280 |

Elapsed Time | 00:00:00.310 |

Statistics

Number Words Correct In Vocabulary Test

| Statistic | Value |
|---|---|
| N (Valid) | 1283 |
| N (Missing) | 692 |
| Mean | 5.91 |
| Std. Error of Mean | .056 |
| Median | 6.00 |
| Mode | 6 |
| Std. Deviation | 1.988 |
| Variance | 3.954 |
| Skewness | -.234 |
| Std. Error of Skewness | .068 |
| Kurtosis | .067 |
| Std. Error of Kurtosis | .137 |
| Range | 10 |
| Minimum | 0 |
| Maximum | 10 |
| Percentile 25 | 5.00 |
| Percentile 50 | 6.00 |
| Percentile 75 | 7.00 |

Number Words Correct In Vocabulary Test

| | Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 0 | 9 | .5 | .7 | .7 |
| | 1 | 17 | .9 | 1.4 | 2.1 |
| | 2 | 43 | 2.2 | 3.4 | 5.5 |
| | 3 | 69 | 3.5 | 5.4 | 10.8 |
| | 4 | 138 | 7.0 | 10.8 | 21.6 |
| | 5 | 227 | 11.5 | 17.7 | 39.3 |
| | 6 | 310 | 15.7 | 24.2 | 63.5 |
| | 7 | 195 | 9.9 | 15.2 | 78.7 |
| | 8 | 151 | 7.7 | 11.8 | 90.5 |
| | 9 | 76 | 3.9 | 5.9 | 96.4 |
| | 10 | 46 | 2.3 | 3.6 | 100.0 |
| | Total | 1283 | 64.9 | 100.0 | |
| Missing | IAP | 662 | 33.5 | | |
| | DID NOT TRY | 30 | 1.5 | | |
| | Total | 692 | 35.1 | | |
| Total | | 1975 | 100.0 | | |

FREQUENCIES VARIABLES=dem_marital

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT

/PIECHART FREQ

/ORDER=ANALYSIS.

**Frequencies**

Notes | ||

Output Created | 25-Oct-2017 00:14:58 | |

Comments | ||

Input | Data | C:\Users\Akki\Desktop\fwdfiles\NES2012.sav |

Active Dataset | DataSet1 | |

Filter | <none> | |

Weight | Weight variable | |

Split File | <none> | |

N of Rows in Working Data File | 5916 | |

Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |

Cases Used | Statistics are based on all cases with valid data. | |

Syntax | FREQUENCIES VARIABLES=dem_marital /NTILES=4 /PIECHART FREQ /ORDER=ANALYSIS.
| |

Resources | Processor Time | 00:00:00.358 |

Elapsed Time | 00:00:00.621 |

Statistics

PRE: Marital status

| Statistic | Value |
|---|---|
| N (Valid) | 5905 |
| N (Missing) | 11 |
| Mean | 2.59 |
| Std. Error of Mean | .024 |
| Median | 1.00 |
| Mode | 1 |
| Std. Deviation | 1.874 |
| Variance | 3.514 |
| Skewness | .614 |
| Std. Error of Skewness | .032 |
| Kurtosis | -1.263 |
| Std. Error of Kurtosis | .064 |
| Range | 5 |
| Minimum | 1 |
| Maximum | 6 |
| Percentile 25 | 1.00 |
| Percentile 50 | 1.00 |
| Percentile 75 | 5.00 |

PRE: Marital status

| | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 1. Married: spouse present | 3043 | 51.4 | 51.5 | 51.5 |
| | 2. Married: spouse absent {VOL} | 320 | 5.4 | 5.4 | 57.0 |
| | 3. Widowed | 629 | 10.6 | 10.6 | 67.6 |
| | 4. Divorced | 347 | 5.9 | 5.9 | 73.5 |
| | 5. Separated | 1042 | 17.6 | 17.7 | 91.1 |
| | 6. Never married | 524 | 8.9 | 8.9 | 100.0 |
| | Total | 5905 | 99.8 | 100.0 | |
| Missing | System | 11 | .2 | | |
| Total | | 5916 | 100.0 | | |

FREQUENCIES VARIABLES=relig_attend

/NTILES=4

/BARCHART FREQ

/ORDER=ANALYSIS.

**Frequencies**

Notes | ||

Output Created | 25-Oct-2017 00:54:18 | |

Comments | ||

Input | Data | C:\Users\Akki\Desktop\fwdfiles\NES2012.sav |

Active Dataset | DataSet1 | |

Filter | <none> | |

Weight | Weight variable | |

Split File | <none> | |

N of Rows in Working Data File | 5916 | |

Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |

Cases Used | Statistics are based on all cases with valid data. | |

Syntax | FREQUENCIES VARIABLES=relig_attend /NTILES=4 /BARCHART FREQ /ORDER=ANALYSIS.
| |

Resources | Processor Time | 00:00:00.296 |

Elapsed Time | 00:00:00.270 |

Statistics

Attendance: Religious Services

| Statistic | Value |
|---|---|
| N (Valid) | 5884 |
| N (Missing) | 32 |
| Mean | 1.53 |
| Std. Error of Mean | .021 |
| Median | 1.00 |
| Mode | 0 |
| Std. Deviation | 1.616 |
| Variance | 2.613 |
| Skewness | .478 |
| Std. Error of Skewness | .032 |
| Kurtosis | -1.413 |
| Std. Error of Kurtosis | .064 |
| Range | 4 |
| Minimum | 0 |
| Maximum | 4 |
| Percentile 25 | .00 |
| Percentile 50 | 1.00 |
| Percentile 75 | 3.00 |

Attendance: Religious Services

| | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Never | 2526 | 42.7 | 42.9 | 42.9 |
| | Few/Yr | 879 | 14.9 | 14.9 | 57.9 |
| | 1-2/Mnth | 566 | 9.6 | 9.6 | 67.5 |
| | Alm/Evwk | 657 | 11.1 | 11.2 | 78.6 |
| | EvWeek | 1256 | 21.2 | 21.4 | 100.0 |
| | Total | 5884 | 99.5 | 100.0 | |
| Missing | System | 32 | .5 | | |
| Total | | 5916 | 100.0 | | |

GET

FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.

DATASET NAME DataSet0 WINDOW=FRONT.

FREQUENCIES VARIABLES=educ_4

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT

/PIECHART FREQ

/ORDER=ANALYSIS.

**Frequencies**

Notes | ||

Output Created | 25-Oct-2017 01:29:31 | |

Comments | ||

Input | Data | C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav |

Active Dataset | DataSet1 | |

Filter | <none> | |

Weight | Weight Variable | |

Split File | <none> | |

N of Rows in Working Data File | 1974 | |

Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing. |

Cases Used | Statistics are based on all cases with valid data. | |

Syntax | FREQUENCIES VARIABLES=educ_4 /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT /PIECHART FREQ /ORDER=ANALYSIS.
| |

Resources | Processor Time | 00:00:00.421 |

Elapsed Time | 00:00:00.630 |

Statistics

Education: 4 Cats

| Statistic | Value |
|---|---|
| N (Valid) | 1974 |
| N (Missing) | 1 |
| Mean | 2.71 |
| Std. Error of Mean | .024 |
| Median | 3.00 |
| Mode | 4 |
| Std. Deviation | 1.064 |
| Variance | 1.132 |
| Skewness | -.209 |
| Std. Error of Skewness | .055 |
| Kurtosis | -1.214 |
| Std. Error of Kurtosis | .110 |
| Range | 3 |
| Minimum | 1 |
| Maximum | 4 |
| Sum | 5347 |
| Percentile 25 | 2.00 |
| Percentile 50 | 3.00 |
| Percentile 75 | 4.00 |

Education: 4 Cats

| | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | <HS | 320 | 16.2 | 16.2 | 16.2 |
| | HS | 528 | 26.7 | 26.8 | 42.9 |
| | Some Coll | 533 | 27.0 | 27.0 | 69.9 |
| | Coll+ | 593 | 30.0 | 30.1 | 100.0 |
| | Total | 1974 | 99.9 | 100.0 | |
| Missing | System | 1 | .1 | | |
| Total | | 1975 | 100.0 | | |