## SPSS 1 Assignment Instructions

This assignment is designed to teach you to describe a single variable: its central location and its dispersion, how to create an appropriate graphic to illustrate it, and how to discuss the way in which your variables are distributed.

Use the following format:

1. A title page with “SPSS 1: Describing a single variable” as the title and your name, section, TA’s name, professor’s name, date, and G# in the upper right-hand corner.
2. Start a new page for each variable.
• The variable name in bold and underlined at the top of the page.
• a)  A properly formatted frequency table for the variable you are describing.
• b)  A table of appropriate summary statistics.
• c)  An appropriate graphic.
• d)  A paragraph describing your variable.

For each of the variables:

1. Identify the level of measurement for each variable.
2. Build a table that shows the cumulative % and frequencies.
• The table must be in APA format.
• Some variables will have to be recoded to display effectively in a table. RULE OF THUMB: no more than 10 categories should appear in ANY table.
3. Report the summary statistics that describe the variable in terms of all the appropriate measures of central location and dispersion.
4. Create 1 appropriate graphic to display the distribution.
5. Write a paragraph that describes the distribution in terms of central location, dispersion, outliers, and skew (if any).

1. From the WORLD 2012 dataset use the variable named “polity” with the label “Higher scores more democratic (Polity)”.
2. From the NES 2012 dataset use the variable named “dem_marital” with the label “Marital Status”.
3. From the NES 2012 dataset use the variable named “relig_attend” with the label “Attendance: Religious Services”.
4. From the GSS 2012 dataset use the variable named “wordsum” with the label “Number of words correct in vocabulary test”.
5. From the GSS 2012 dataset use the variable named “educ_4” with the label “Education in 4 Categories”.
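
The rule of thumb above (no more than 10 categories in any table) applies to wordsum, which has 11 possible scores. The following is a minimal Python sketch of the recoding logic, not the SPSS procedure itself. The counts are copied from the wordsum frequency output later in this document; the bin boundaries and the names `bins` and `recode` are hypothetical choices made only for illustration.

```python
# Frequency counts for wordsum (scores 0-10), copied from the SPSS output in this document.
wordsum_counts = {0: 9, 1: 17, 2: 43, 3: 69, 4: 138, 5: 227,
                  6: 310, 7: 195, 8: 151, 9: 76, 10: 46}

# Hypothetical recoding scheme: (label, lowest score, highest score).
bins = [("0-1", 0, 1), ("2-3", 2, 3), ("4-5", 4, 5),
        ("6-7", 6, 7), ("8-9", 8, 9), ("10", 10, 10)]

def recode(counts, bins):
    """Collapse value counts into the given bins, preserving bin order."""
    recoded = {}
    for label, low, high in bins:
        recoded[label] = sum(n for value, n in counts.items() if low <= value <= high)
    return recoded

recoded_counts = recode(wordsum_counts, bins)

# The recoded table must keep every valid case and stay within 10 categories.
assert sum(recoded_counts.values()) == sum(wordsum_counts.values())
assert len(recoded_counts) <= 10
print(recoded_counts)
```

The two assertions mirror the checks the instructions ask for: the valid total of the recoded table matches the original, and the table has at most 10 rows.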

***Sample Problem*** Variable: age5

1. Level of measurement for this variable is ordinal.
2. Cumulative % frequency table:

Age in 5 Categories^a

| X | F | % | Cum % |
|---|---|---|---|
| 18-30 | 437 | 21.7 | 21.7 |
| 31-40 | 384 | 19.1 | 40.8 |
| 41-50 | 403 | 20.0 | 60.8 |
| 51-60 | 369 | 18.3 | 79.1 |
| 61+ | 421 | 20.9 | 100.0 |
| Total | 2015 | 100.0 | |

a. General Social Survey 2008

3. Table of summary statistics:

Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 2015 | 41-50 | 18-30 | (61+) – (18-30) | 18-30 | 61+ | 31-40 | 51-60 | (51-60) – (31-40) | 0.783 |
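
SPSS does not report the V-ratio, so it has to be worked out by hand. As a rough sketch (the function name `variation_ratio` is an assumption, not part of the assignment), the V-ratio is one minus the share of valid cases in the modal category, which reproduces the 0.783 in the sample table:

```python
def variation_ratio(modal_count, n_valid):
    """V-ratio: share of valid cases that fall outside the modal category."""
    if n_valid <= 0 or not 0 <= modal_count <= n_valid:
        raise ValueError("modal_count must lie between 0 and n_valid")
    return 1 - modal_count / n_valid

# Sample problem: modal category 18-30 has F = 437 out of N = 2015.
sample_v = variation_ratio(437, 2015)
print(round(sample_v, 3))  # 0.783
```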

4. Bar chart:

5. Descriptive paragraph requirements:
• What does your variable measure?
• Describe this variable in terms of all appropriate measurements of central location.
• Describe this variable in terms of the appropriate measurements of dispersion.
• If appropriate, is this distribution skewed negative or positive?

SPSS1 Frequently Asked Questions and Point Breakdown

5 questions:

• Level of measurement:
1. You must have the correct level of measurement
2. Do not write “interval/ratio”. You must clearly identify whether the data are interval or ratio for full points.
3. You need to report the level of measurement on the original data, not the recoded data.
• Cumulative Frequency Table:
1. You must have a title, a source and appropriate columns.
2. Your table must be formatted according to APA guidelines. Week 6 under course content on our Blackboard Course website has both the template and an instructional video if you would like to refresh your memory from lab.
3. If you have more than 10 rows you should recode the data. Make sure that the valid total of your recoded cumulative frequency table matches the valid total of the cumulative frequency table on the original data.
• Descriptive Statistics:
1. Do not directly copy and paste from SPSS. You must format the tables according to the APA Guidelines.
2. Measures of central tendency must be correct for the level of measurement of the data.
3. Measures of dispersion must be appropriate for the level of measurement of the data.
4. Always report the category labels for categorical data.
5. You will need to perform some calculations. Remember, SPSS does not give you the IQR or the  V ratio. You will need to calculate these out correctly for full points.
6. Remember, if you have ordinal data, the numbers associated with the data are just the coding. Therefore, you need to simplify the IQR and the Range as much as possible. (For example, if you were using a Likert scale, you could end up with Hate-Love for the range and Somewhat dislike-Somewhat love for the IQR.) Remember that for categorical data you must work with the category labels.
7. You must run statistics on the original data, not the recoded data. As such, your descriptive statistics should be appropriate to the level of measurement of your original data.
• Graphics:
1. These can be copied and pasted from SPSS. You need to make sure that your graphs have titles and sources. You may use Excel, but we strongly encourage you to use SPSS.
2. Your graph must be appropriate to the level of measurement of your data (nominal = pie chart; ordinal = bar graph; interval/ratio = histogram).
3. If you use a recoded variable, use the level of measurement AFTER the recoding (interval recoded to ordinal should use a bar graph). However, if you use a histogram with the original interval/ratio data, you will not be penalized.
4. Have value labels. Remember to use the crosshairs in the Graph Editor in SPSS.
5. A bar chart should use % of cases (“Percent”), not count.
• Paragraph:
1. You need to describe both central tendency and dispersion. Do not just list all of the statistics. Focus on the most salient univariate statistics for your level of measurement (e.g., if you have ratio data, you should discuss the mean for central tendency and the standard deviation and the variance for dispersion). You need to interpret these as well. What do they tell you about the variable being measured? (For example, if you have a variable on age and the average age was 25, what could you infer about the population?) Report interesting trends from the statistics.
2. Discuss the shape of the distribution curve (positive or negative skew) if it is appropriate for the level of measurement.
3. Make sure that you report outliers if it is appropriate to the level of measurement for your data.

Solution

SPSS1: Describing a Single variable

Variable named “polity” with the label “Higher scores more democratic (Polity)” from WORLD 2012 dataset

1. Level of measurement

Level of measurement for this variable is interval: the scores rank countries by degree of democracy in equal steps from -10 to 10, but the scale has no absolute zero (0 is only its midpoint), so it cannot be treated as ratio.

2. Cumulative Frequency Table

Higher scores more democratic (Polity)

| Score | Cum % |
|---|---|
| -10 | 1.4 |
| -9 | 3.5 |
| -8 | 4.9 |
| -7 | 11.8 |
| -6 | 12.5 |
| -5 | 13.2 |
| -4 | 16.0 |
| -3 | 18.1 |
| -2 | 23.6 |
| -1 | 25.0 |
| 0 | 25.7 |
| 1 | 27.1 |
| 2 | 29.2 |
| 3 | 31.9 |
| 4 | 34.7 |
| 5 | 38.9 |
| 6 | 47.2 |
| 7 | 54.9 |
| 8 | 66.7 |
| 9 | 77.8 |
| 10 | 100 |

3. Descriptive Statistics

Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 167 | 7 | 10 | 20 | -10 | 10 | -0.75 | 9 | Min-Q1 9.25, Q1-Q2 7.75, Q2-Q3 2, Q3-Max 1 | 1 - 32/167 = 0.8083 |

4. Graphics

5. Paragraph

Because the scores are numeric, the mean is meaningful. The mean is 4.36 and the median is 7.00. The median exceeds the mean, which indicates that the data are skewed to the left, with a long tail of low scores pulling the mean down more than the median. The skewness statistic of -.972 confirms this: skewness measures asymmetry, and a negative value means the left tail is longer while the mass of the distribution is concentrated on the right of the figure. The standard deviation is 6.104. For a roughly normal distribution, approximately 95% of the data lie within 2 standard deviations of the mean, which here would be between 4.36 – 2×6.104 = -7.848 and 4.36 + 2×6.104 = 16.568. The upper bound lies beyond the scale maximum of 10, so the rule is only a rough guide for a left-skewed distribution such as this one.
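
The interval arithmetic in the paragraph above can be reproduced with a short Python sketch. The function name `two_sd_interval` and the constants `SCALE_MIN` and `SCALE_MAX` are illustrative assumptions; the mean, standard deviation, and scale limits are the values reported for polity in this solution.

```python
def two_sd_interval(mean, sd):
    """Return (lower, upper) bounds of the mean plus or minus two standard deviations."""
    return mean - 2 * sd, mean + 2 * sd

# Values reported for polity in the paragraph above.
lower, upper = two_sd_interval(4.36, 6.104)
print(round(lower, 3), round(upper, 3))  # -7.848 16.568

# The Polity scale runs from -10 to 10, so compare the bounds with that range.
SCALE_MIN, SCALE_MAX = -10, 10
print(upper > SCALE_MAX)  # True
```

An upper bound above the largest possible score is one quick sign that the distribution is not symmetric around its mean.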

Variable named “dem_marital” with the label “Marital Status” from the NES 2012 dataset

1. Level of measurement

Level of measurement for this variable is nominal

2. Cumulative Frequency Table

PRE: Marital status

| Category | Cum % |
|---|---|
| Married: spouse present | 51.5 |
| Married: spouse absent | 57.0 |
| Widowed | 67.6 |
| Divorced | 73.5 |
| Separated | 91.1 |
| Never married | 100.0 |

3. Descriptive Statistics

Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 5905 | 1 | 1 | 5 | 1 | 6 | 1 | 5 | Min-Q1 0, Q1-Q2 0, Q2-Q3 4, Q3-Max 1 | 1 - 3043/5905 = 0.48467 |

4. Graphics

5. Paragraph

Because the data are nominal, a pie chart has been plotted. The mode is 1 (Married: spouse present), which accounts for 51.5% of valid cases. SPSS also reports a mean of 2.59 and a median of 1, so the median is less than the mean, but both are computed from arbitrary category codes and have a limited role here. For the same reason, outliers do not apply to nominal data.

Variable named “relig_attend” with the label “Attendance: Religious Services” from the NES 2012 dataset

1. Level of measurement

Level of measurement for this variable is ordinal

2. Cumulative Frequency Table

Attendance: Religious Services

| Category | Cum % |
|---|---|
| Never | 42.9 |
| Few/Yr | 57.9 |
| 1-2/Mnth | 67.5 |
| Alm/Evwk | 78.6 |
| Ev Week | 100.0 |

3. Descriptive Statistics

Summary Statistics

| N | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 5884 | 1 | 0 | 4 | 0 | 4 | 0 | 3 | Min-Q1 0, Q1-Q2 1, Q2-Q3 2, Q3-Max 1 | 1 - 2526/5884 = 0.5707 |

4. Graphics

5. Paragraph

Because the data are ordinal, a bar chart has been plotted. The median is 1 (Few/Yr) and the mode is 0 (Never), which accounts for 42.9% of valid cases. SPSS also reports a mean of 1.53, which is greater than the median, but the mean has a limited role for ordinal codes. Outliers likewise do not apply to ordinal data.
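
For ordinal data the quartiles are reported as category labels rather than codes. The sketch below shows one way to read them off an ordered frequency table in Python. It uses a simplifying assumption (the percentile falls in the first category whose cumulative percent reaches it), which is not necessarily SPSS's exact percentile method, and the function names are hypothetical. The counts are the valid frequencies for relig_attend from the SPSS output in this document.

```python
# Valid counts for relig_attend in category order, copied from the SPSS output.
relig_counts = [("Never", 2526), ("Few/Yr", 879), ("1-2/Mnth", 566),
                ("Alm/Evwk", 657), ("EvWeek", 1256)]

def cumulative_percents(ordered_counts):
    """Return (label, cumulative percent) pairs for ordered categories."""
    n_valid = sum(count for _, count in ordered_counts)
    running, rows = 0, []
    for label, count in ordered_counts:
        running += count
        rows.append((label, 100 * running / n_valid))
    return rows

def percentile_category(ordered_counts, pct):
    """First category whose cumulative percent reaches pct (assumed rule)."""
    for label, cum in cumulative_percents(ordered_counts):
        if cum >= pct:
            return label
    raise ValueError("pct must be between 0 and 100")

q1 = percentile_category(relig_counts, 25)
median = percentile_category(relig_counts, 50)
q3 = percentile_category(relig_counts, 75)
print(q1, median, q3)  # Never Few/Yr Alm/Evwk
```

With these labels the IQR can be written in the form the FAQ asks for, from the Q1 category to the Q3 category, instead of as a difference of codes.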

Variable named “wordsum” with the label “Number of words correct in vocabulary test” from the GSS 2012 dataset

1. Level of measurement

Level of measurement for this variable is ratio scale

2. Cumulative Frequency Table

Number Words Correct In Vocabulary Test

| Score | Cum % |
|---|---|
| 0 | .7 |
| 1 | 2.1 |
| 2 | 5.5 |
| 3 | 10.8 |
| 4 | 21.6 |
| 5 | 39.3 |
| 6 | 63.5 |
| 7 | 78.7 |
| 8 | 90.5 |
| 9 | 96.4 |
| 10 | 100.0 |

3. Descriptive Statistics

Summary Statistics

| N (valid) | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1283 | 6 | 6 | 10 | 0 | 10 | 5 | 7 | Min-Q1 5, Q1-Q2 1, Q2-Q3 1, Q3-Max 3 | 1 - 310/1283 = 0.7584 |

4. Graphics

5. Paragraph

Because the data are ratio level, the mean is meaningful. The mean is 5.91 and the median is 6.00. The median is slightly greater than the mean, which indicates that the data are mildly skewed to the left, with a tail of low scores pulling the mean down more than the median. The skewness statistic of -.234 confirms this: skewness measures asymmetry, and a negative value means the left tail is longer while the mass of the distribution is concentrated on the right of the figure. The standard deviation is 1.988. For a roughly normal distribution, approximately 95% of the data lie within 2 standard deviations of the mean, which here is between 5.91 – 2×1.988 = 1.934 and 5.91 + 2×1.988 = 9.886. In the cumulative table, scores 2 through 9 account for 96.4 – 2.1 = 94.3% of valid cases, so the rule holds closely despite the slight left skew.
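
One way to check the two-standard-deviation statement against the cumulative table above is to read off the share of cases whose scores fall inside the interval. This is a small Python sketch under that reading; `share_within` is a hypothetical helper, and the cumulative percents are copied from the wordsum table in this solution.

```python
# Cumulative percents for wordsum scores 0-10, copied from the table above.
wordsum_cum = {0: 0.7, 1: 2.1, 2: 5.5, 3: 10.8, 4: 21.6, 5: 39.3,
               6: 63.5, 7: 78.7, 8: 90.5, 9: 96.4, 10: 100.0}

def share_within(cum_pct, lower, upper):
    """Percent of cases with lower <= score <= upper, from a cumulative table."""
    below = [pct for score, pct in cum_pct.items() if score < lower]
    up_to = [pct for score, pct in cum_pct.items() if score <= upper]
    return (max(up_to) if up_to else 0.0) - (max(below) if below else 0.0)

# Bounds from the paragraph above: 5.91 - 2*1.988 and 5.91 + 2*1.988.
share = share_within(wordsum_cum, 1.934, 9.886)
print(round(share, 1))  # 94.3
```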

Variable named “educ_4” with the label “Education in 4 Categories” from the GSS 2012 dataset

1. Level of measurement

Level of measurement for this variable is nominal

1. Cumulative Frequency Table
 Education: 4 Cats Category % Cum
3. Descriptive Statistics

Summary Statistics

| N (valid) | Median | Mode | Range | Minimum | Maximum | Q1 | Q3 | IQR | V-ratio |
|---|---|---|---|---|---|---|---|---|---|
| 1974 | 3 | 4 | 3 | 1 | 4 | 2 | 4 | Min-Q1 1, Q1-Q2 1, Q2-Q3 1, Q3-Max 0 | 1 - 593/1974 = 0.6996 |

4. Graphics

5. Paragraph

Because the data are treated as nominal, a pie chart has been plotted. The mode is 4. SPSS also reports a mean of 2.71 and a median of 3, so the median is greater than the mean, but the mean has a limited role here. Outliers do not apply to nominal data.

GET

FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.

DATASET NAME DataSet0 WINDOW=FRONT.

FREQUENCIES VARIABLES=wordsum

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT

/HISTOGRAM

/ORDER=ANALYSIS.

Frequencies

Notes

| Item | Value |
|---|---|
| Output Created | 25-Oct-2017 01:21:40 |
| Comments | |
| Input: Data | C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav |
| Input: Active Dataset | DataSet1 |
| Input: Filter | |
| Input: Weight | Weight Variable |
| Input: Split File | |
| Input: N of Rows in Working Data File | 1974 |
| Missing Value Handling: Definition of Missing | User-defined missing values are treated as missing. |
| Missing Value Handling: Cases Used | Statistics are based on all cases with valid data. |
| Syntax | FREQUENCIES VARIABLES=wordsum /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT /HISTOGRAM /ORDER=ANALYSIS. |
| Resources: Processor Time | 00:00:00.280 |
| Resources: Elapsed Time | 00:00:00.310 |

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav

Statistics: Number Words Correct In Vocabulary Test

| Statistic | Value |
|---|---|
| N Valid | 1283 |
| N Missing | 692 |
| Mean | 5.91 |
| Std. Error of Mean | .056 |
| Median | 6.00 |
| Mode | 6 |
| Std. Deviation | 1.988 |
| Variance | 3.954 |
| Skewness | -.234 |
| Std. Error of Skewness | .068 |
| Kurtosis | .067 |
| Std. Error of Kurtosis | .137 |
| Range | 10 |
| Minimum | 0 |
| Maximum | 10 |
| Percentile 25 | 5.00 |
| Percentile 50 | 6.00 |
| Percentile 75 | 7.00 |

Number Words Correct In Vocabulary Test

| | Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 0 | 9 | .5 | .7 | .7 |
| | 1 | 17 | .9 | 1.4 | 2.1 |
| | 2 | 43 | 2.2 | 3.4 | 5.5 |
| | 3 | 69 | 3.5 | 5.4 | 10.8 |
| | 4 | 138 | 7.0 | 10.8 | 21.6 |
| | 5 | 227 | 11.5 | 17.7 | 39.3 |
| | 6 | 310 | 15.7 | 24.2 | 63.5 |
| | 7 | 195 | 9.9 | 15.2 | 78.7 |
| | 8 | 151 | 7.7 | 11.8 | 90.5 |
| | 9 | 76 | 3.9 | 5.9 | 96.4 |
| | 10 | 46 | 2.3 | 3.6 | 100.0 |
| | Total | 1283 | 64.9 | 100.0 | |
| Missing | IAP | 662 | 33.5 | | |
| | DID NOT TRY | 30 | 1.5 | | |
| | Total | 692 | 35.1 | | |
| Total | | 1975 | 100.0 | | |

FREQUENCIES VARIABLES=dem_marital

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT

/PIECHART FREQ

/ORDER=ANALYSIS.

Frequencies

Notes

| Item | Value |
|---|---|
| Output Created | 25-Oct-2017 00:14:58 |
| Comments | |
| Input: Data | C:\Users\Akki\Desktop\fwdfiles\NES2012.sav |
| Input: Active Dataset | DataSet1 |
| Input: Filter | |
| Input: Weight | Weight variable |
| Input: Split File | |
| Input: N of Rows in Working Data File | 5916 |
| Missing Value Handling: Definition of Missing | User-defined missing values are treated as missing. |
| Missing Value Handling: Cases Used | Statistics are based on all cases with valid data. |
| Syntax | FREQUENCIES VARIABLES=dem_marital /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT /PIECHART FREQ /ORDER=ANALYSIS. |
| Resources: Processor Time | 00:00:00.358 |
| Resources: Elapsed Time | 00:00:00.621 |

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\NES2012.sav

Statistics: PRE: Marital status

| Statistic | Value |
|---|---|
| N Valid | 5905 |
| N Missing | 11 |
| Mean | 2.59 |
| Std. Error of Mean | .024 |
| Median | 1.00 |
| Mode | 1 |
| Std. Deviation | 1.874 |
| Variance | 3.514 |
| Skewness | .614 |
| Std. Error of Skewness | .032 |
| Kurtosis | -1.263 |
| Std. Error of Kurtosis | .064 |
| Range | 5 |
| Minimum | 1 |
| Maximum | 6 |
| Percentile 25 | 1.00 |
| Percentile 50 | 1.00 |
| Percentile 75 | 5.00 |

PRE: Marital status

| | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 1. Married: spouse present | 3043 | 51.4 | 51.5 | 51.5 |
| | 2. Married: spouse absent {VOL} | 320 | 5.4 | 5.4 | 57.0 |
| | 3. Widowed | 629 | 10.6 | 10.6 | 67.6 |
| | 4. Divorced | 347 | 5.9 | 5.9 | 73.5 |
| | 5. Separated | 1042 | 17.6 | 17.7 | 91.1 |
| | 6. Never married | 524 | 8.9 | 8.9 | 100.0 |
| | Total | 5905 | 99.8 | 100.0 | |
| Missing | System | 11 | .2 | | |
| Total | | 5916 | 100.0 | | |

FREQUENCIES VARIABLES=relig_attend

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT

/BARCHART FREQ

/ORDER=ANALYSIS.

Frequencies

Notes

| Item | Value |
|---|---|
| Output Created | 25-Oct-2017 00:54:18 |
| Comments | |
| Input: Data | C:\Users\Akki\Desktop\fwdfiles\NES2012.sav |
| Input: Active Dataset | DataSet1 |
| Input: Filter | |
| Input: Weight | Weight variable |
| Input: Split File | |
| Input: N of Rows in Working Data File | 5916 |
| Missing Value Handling: Definition of Missing | User-defined missing values are treated as missing. |
| Missing Value Handling: Cases Used | Statistics are based on all cases with valid data. |
| Syntax | FREQUENCIES VARIABLES=relig_attend /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SKEWNESS SESKEW KURTOSIS SEKURT /BARCHART FREQ /ORDER=ANALYSIS. |
| Resources: Processor Time | 00:00:00.296 |
| Resources: Elapsed Time | 00:00:00.270 |

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\NES2012.sav

Statistics: Attendance: Religious Services

| Statistic | Value |
|---|---|
| N Valid | 5884 |
| N Missing | 32 |
| Mean | 1.53 |
| Std. Error of Mean | .021 |
| Median | 1.00 |
| Mode | 0 |
| Std. Deviation | 1.616 |
| Variance | 2.613 |
| Skewness | .478 |
| Std. Error of Skewness | .032 |
| Kurtosis | -1.413 |
| Std. Error of Kurtosis | .064 |
| Range | 4 |
| Minimum | 0 |
| Maximum | 4 |
| Percentile 25 | .00 |
| Percentile 50 | 1.00 |
| Percentile 75 | 3.00 |

Attendance: Religious Services

| | Category | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Never | 2526 | 42.7 | 42.9 | 42.9 |
| | Few/Yr | 879 | 14.9 | 14.9 | 57.9 |
| | 1-2/Mnth | 566 | 9.6 | 9.6 | 67.5 |
| | Alm/Evwk | 657 | 11.1 | 11.2 | 78.6 |
| | EvWeek | 1256 | 21.2 | 21.4 | 100.0 |
| | Total | 5884 | 99.5 | 100.0 | |
| Missing | System | 32 | .5 | | |
| Total | | 5916 | 100.0 | | |

GET

FILE='C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav'.

DATASET NAME DataSet0 WINDOW=FRONT.

FREQUENCIES VARIABLES=educ_4

/NTILES=4

/STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT

/PIECHART FREQ

/ORDER=ANALYSIS.

Frequencies

Notes

| Item | Value |
|---|---|
| Output Created | 25-Oct-2017 01:29:31 |
| Comments | |
| Input: Data | C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav |
| Input: Active Dataset | DataSet1 |
| Input: Filter | |
| Input: Weight | Weight Variable |
| Input: Split File | |
| Input: N of Rows in Working Data File | 1974 |
| Missing Value Handling: Definition of Missing | User-defined missing values are treated as missing. |
| Missing Value Handling: Cases Used | Statistics are based on all cases with valid data. |
| Syntax | FREQUENCIES VARIABLES=educ_4 /NTILES=4 /STATISTICS=STDDEV VARIANCE RANGE MINIMUM MAXIMUM SEMEAN MEAN MEDIAN MODE SUM SKEWNESS SESKEW KURTOSIS SEKURT /PIECHART FREQ /ORDER=ANALYSIS. |
| Resources: Processor Time | 00:00:00.421 |
| Resources: Elapsed Time | 00:00:00.630 |

[DataSet1] C:\Users\Akki\Desktop\fwdfiles\GSS2012.sav

Statistics: Education: 4 Cats

| Statistic | Value |
|---|---|
| N Valid | 1974 |
| N Missing | 1 |
| Mean | 2.71 |
| Std. Error of Mean | .024 |
| Median | 3.00 |
| Mode | 4 |
| Std. Deviation | 1.064 |
| Variance | 1.132 |
| Skewness | -.209 |
| Std. Error of Skewness | .055 |
| Kurtosis | -1.214 |
| Std. Error of Kurtosis | .110 |
| Range | 3 |
| Minimum | 1 |
| Maximum | 4 |
| Sum | 5347 |
| Percentile 25 | 2.00 |
| Percentile 50 | 3.00 |
| Percentile 75 | 4.00 |
 Education: 4 Cats Frequency Percent Valid Percent Cumulative Percent Valid    