First Program
Assignment for Week 2: First Program
Code:
# -*- coding: utf-8 -*-
"""
Created on Sat Jul 6 10:48:29 2019
@author: gdeal
"""
# program to import nesarc dataset limited to respondents who ever needed to drink more to get the intended effect
import pandas
import numpy
# display floats in fixed-point notation instead of scientific notation
pandas.set_option('display.float_format',lambda x: '%f'%x)
# read in full nesarc dataset
nesarc_df=pandas.read_csv('nesarc_pds.csv',low_memory=False)
# convert variable names to upper case
nesarc_df.columns=map(str.upper,nesarc_df.columns)
# show number of observations and columns in base dataset
# print(len(nesarc_df))
# print(len(nesarc_df.columns))
# print("printing values of sex for subset S2BQ1A2")
# count_test = nesarc_df['S2BQ1A2'].value_counts(sort=False)
# print(count_test)
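# convert to numeric; convert_objects() was deprecated and later removed from pandas,
# so use to_numeric(), where errors='coerce' turns blank or non-numeric entries into NaN
nesarc_df['S2BQ1A2'] = pandas.to_numeric(nesarc_df['S2BQ1A2'], errors='coerce')
nesarc_df['AGE'] = pandas.to_numeric(nesarc_df['AGE'], errors='coerce')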
# create subset data frame with only respondents who have ever drunk more to
# feel the effect they wanted and who are 18 years or older but younger than 40.
df_sub1=nesarc_df[(nesarc_df['S2BQ1A2']==1) & (nesarc_df['AGE']<40)]
print("printing count of respondents with possible alcohol dependence")
print(len(df_sub1))
# print counts and percentages for sex of possible alcohol dependence subset
print("printing count of sex for subset of respondents with possible alcohol dependence")
count_sex = df_sub1['SEX'].value_counts(sort=False)
print(count_sex)
print("printing percentages for sex for subset of respondents with possible alcohol dependence")
pct_sex = df_sub1['SEX'].value_counts(sort=False,normalize=True)
print(pct_sex)
# print counts and percentages for ages of possible alcohol dependence subset
print("printing count of ages for subset of respondents with possible alcohol dependence")
count_ages = df_sub1['AGE'].value_counts(sort=False)
print(count_ages)
print("printing percentage of ages for subset of respondents with possible alcohol dependence")
pct_ages = df_sub1['AGE'].value_counts(sort=False,normalize=True)
print(pct_ages)
# print counts and percentages for whether respondents have been to a rehab clinic
print("printing count of whether respondents have been to a rehab clinic")
count_rehab = df_sub1['S2CQ2A6'].value_counts(sort=False)
print(count_rehab)
print("printing percentage of whether respondents have been to a rehab clinic")
pct_rehab = df_sub1['S2CQ2A6'].value_counts(sort=False,normalize=True)
print(pct_rehab)
Output:
printing count of respondents with possible alcohol dependence
2476
printing count of sex for subset of respondents with possible alcohol dependence
2 1037
1 1439
Name: SEX, dtype: int64
printing percentages for sex for subset of respondents with possible alcohol dependence
2 0.418821
1 0.581179
Name: SEX, dtype: float64
printing count of ages for subset of respondents with possible alcohol dependence
18 78
20 102
22 116
24 133
26 92
28 95
30 130
32 137
34 137
36 103
38 113
19 101
21 122
23 123
25 106
27 107
29 117
31 121
33 121
35 102
37 119
39 101
Name: AGE, dtype: int64
printing percentage of ages for subset of respondents with possible alcohol dependence
18 0.031502
20 0.041195
22 0.046850
24 0.053716
26 0.037157
28 0.038368
30 0.052504
32 0.055331
34 0.055331
36 0.041599
38 0.045638
19 0.040792
21 0.049273
23 0.049677
25 0.042811
27 0.043215
29 0.047254
31 0.048869
33 0.048869
35 0.041195
37 0.048061
39 0.040792
Name: AGE, dtype: float64
printing count of whether respondents have been to a rehab clinic
1 193
2 218
9 1
2064
Name: S2CQ2A6, dtype: int64
printing percentage of whether respondents have been to a rehab clinic
1 0.077948
2 0.088045
9 0.000404
0.833603
Name: S2CQ2A6, dtype: float64
Explanations:
The program produces a subset of the NESARC data set containing respondents who have at least once drunk more to feel the intended effect and who were 18 years or older and younger than 40 at the time of the survey.
This produced a working set of 2,476 observations, of which 58% were male and 42% were female.
The age table in the output shows the breakdown by age. The most frequent ages in the subset were 32 and 34 (137 respondents each), followed by 24 (133) and 30 (130).
Of the observations in the subset, 193 respondents answered that they had sought help at a rehab clinic and 218 answered that they had not. The remaining 2,065 (2,064 blank responses and 1 coded 9) did not answer the question or considered it irrelevant.
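Because most of the rehab responses are blank, it may help to treat blanks as missing values before counting. The following is only a minimal sketch on a small made-up Series (the values are hypothetical stand-ins for the S2CQ2A6 column, not NESARC data); it assumes blanks are stored as a single space, as the unlabeled row in the output suggests.

```python
import pandas

# Hypothetical miniature stand-in for the S2CQ2A6 column (not real NESARC data):
# '1' and '2' are answers, '9' is the extra code seen in the output, ' ' is a blank.
responses = pandas.Series(['1', '2', ' ', '9', ' ', '2', '1', ' '], name='S2CQ2A6')

# errors='coerce' turns anything that cannot be parsed as a number (the blanks) into NaN
numeric = pandas.to_numeric(responses, errors='coerce')

# dropna=False keeps the missing responses visible as their own row
counts_with_missing = numeric.value_counts(dropna=False).sort_index()
print(counts_with_missing)

# by default value_counts drops NaN, so normalize=True gives shares among non-blank responses
share_answered = numeric.value_counts(normalize=True).sort_index()
print(share_answered)
```

With this approach the percentages can be reported either over the whole subset or only over respondents who gave a non-blank response, depending on which denominator the research question calls for.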
In summary, the working set provides a total of 2,476 observations of respondents who have experienced at least some symptoms of alcohol dependence. Of these, 193 (about 8%) have sought treatment at a rehab facility. This gives a rough estimate of the number of respondents who have actively tried to recover from alcohol dependence, which is a central focus of the research question.
Note: I am still trying to figure out how to sort the values in the output in the correct order. The syntax in the example program has been deprecated and replaced with new functions.
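One possible way to get the rows in order, offered as a sketch rather than the course's method: value_counts returns a Series whose index holds the observed values, so calling sort_index() on the result orders the rows by value. The ages below are made-up stand-ins for df_sub1['AGE'].

```python
import pandas

# Hypothetical ages standing in for df_sub1['AGE'] (not real NESARC data)
ages = pandas.Series([20, 18, 19, 20, 18, 21, 19, 20], name='AGE')

# value_counts labels each row with the age value;
# sort_index then orders the rows by that label (ascending by default)
count_ages = ages.value_counts(sort=False).sort_index()
print(count_ages)

# the same chaining works for percentages
pct_ages = ages.value_counts(sort=False, normalize=True).sort_index()
print(pct_ages)
```

Applied to the program above, this would amount to appending .sort_index() to each value_counts call.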
Nice work. I think you are on the right path, and sorting the data will be covered in next week's lectures.