First Program



 Assignment for Week 2: First Program

Code:

# -*- coding: utf-8 -*-
"""
Created on Sat Jul  6 10:48:29 2019

@author: gdeal
"""
# program to import nesarc dataset limited to respondents who ever needed to drink more to get the intended effect

import pandas
import numpy

# display floats in fixed-point notation instead of scientific notation
pandas.set_option('display.float_format',lambda x: '%f'%x)

# read in full nesarc dataset
nesarc_df=pandas.read_csv('nesarc_pds.csv',low_memory=False)

# convert variable names to upper case
nesarc_df.columns=nesarc_df.columns.str.upper()

# show number of observations and columns in base dataset
# print(len(nesarc_df))
# print(len(nesarc_df.columns))

# print("printing values of sex for subset S2BQ1A2")
# count_test = nesarc_df['S2BQ1A2'].value_counts(sort=False)
# print(count_test)

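# convert variables to numeric; values that cannot be parsed become NaN
# (convert_objects was deprecated and later removed from pandas)
nesarc_df['S2BQ1A2'] = pandas.to_numeric(nesarc_df['S2BQ1A2'], errors='coerce')
nesarc_df['AGE'] = pandas.to_numeric(nesarc_df['AGE'], errors='coerce')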

# create subset data frame with only respondents who have ever drunk more to
# feel the effect they wanted and who are younger than 40 (the youngest
# respondents in the output are 18, so no lower age bound is applied).

df_sub1=nesarc_df[(nesarc_df['S2BQ1A2']==1) & (nesarc_df['AGE']<40)]
print("printing count of respondents with possible alcohol dependence")
print(len(df_sub1))

# print counts and percentages for sex of possible alcohol dependence subset
print("printing count of sex for subset of respondents with possible alcohol dependence")
count_sex = df_sub1['SEX'].value_counts(sort=False)
print(count_sex)

print("printing percentages for sex for subset of respondents with possible alcohol dependence")
pct_sex = df_sub1['SEX'].value_counts(sort=False,normalize=True)
print(pct_sex)

# print counts and percentages for ages of possible alcohol dependence subset
print("printing count of ages for subset of respondents with possible alcohol dependence")
count_ages = df_sub1['AGE'].value_counts(sort=False)
print(count_ages)

print("printing percentage of ages for subset of respondents with possible alcohol dependence")
pct_ages = df_sub1['AGE'].value_counts(sort=False,normalize=True)
print(pct_ages)

# print counts and percentages for whether respondents have been to a rehab clinic
print("printing count of whether respondents have been to a rehab clinic")
count_rehab = df_sub1['S2CQ2A6'].value_counts(sort=False)
print(count_rehab)

print("printing percentage of whether respondents have been to a rehab clinic")
pct_rehab = df_sub1['S2CQ2A6'].value_counts(sort=False,normalize=True)
print(pct_rehab)


Output:
printing count of respondents with possible alcohol dependence
2476
printing count of sex for subset of respondents with possible alcohol dependence
2    1037
1    1439
Name: SEX, dtype: int64
printing percentages for sex for subset of respondents with possible alcohol dependence
2   0.418821
1   0.581179
Name: SEX, dtype: float64
printing count of ages for subset of respondents with possible alcohol dependence
18     78
20    102
22    116
24    133
26     92
28     95
30    130
32    137
34    137
36    103
38    113
19    101
21    122
23    123
25    106
27    107
29    117
31    121
33    121
35    102
37    119
39    101
Name: AGE, dtype: int64
printing percentage of ages for subset of respondents with possible alcohol dependence
18   0.031502
20   0.041195
22   0.046850
24   0.053716
26   0.037157
28   0.038368
30   0.052504
32   0.055331
34   0.055331
36   0.041599
38   0.045638
19   0.040792
21   0.049273
23   0.049677
25   0.042811
27   0.043215
29   0.047254
31   0.048869
33   0.048869
35   0.041195
37   0.048061
39   0.040792
Name: AGE, dtype: float64
printing count of whether respondents have been to a rehab clinic
1     193
2     218
9       1
     2064
Name: S2CQ2A6, dtype: int64
printing percentage of whether respondents have been to a rehab clinic
1   0.077948
2   0.088045
9   0.000404
    0.833603
Name: S2CQ2A6, dtype: float64


Explanations:
The program produces a subset of the NESARC data set that includes respondents who have at least once drunk more to feel an effect and who were 18 years or older and younger than 40 at the time of the survey.
This produced a working set of 2,476 observations. Of these:
58% were male and 42% were female
The table included shows the breakdown by age. The most frequent ages in the subset were: 24, 30, 32 & 34
Of the observations in the subset, 193 respondents answered that they had sought help at a rehab clinic and 218 answered that they had not. The remaining 2,065 (2,064 with a blank value and 1 coded 9) did not answer the question or considered it irrelevant.
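The blank entry in the rehab table is stored as text rather than a number, which is why it appears as its own category. Below is a minimal sketch of one way to treat blanks and the code 9 as missing. It assumes (the post does not confirm this) that 9 stands for an unknown answer and that blanks are stored as a single space. The Series is rebuilt from the counts in the output above rather than read from the data file, and the names `raw` and `rehab` are only illustrative.

```python
import numpy
import pandas

# Rebuild the S2CQ2A6 column from the counts printed above (illustrative only;
# blanks are assumed to be stored as a single space in the CSV).
raw = pandas.Series(['1'] * 193 + ['2'] * 218 + ['9'] * 1 + [' '] * 2064)

# Blanks cannot be parsed as numbers, so errors='coerce' turns them into NaN.
rehab = pandas.to_numeric(raw, errors='coerce')

# Assumption: 9 codes an unknown answer, so treat it as missing too.
rehab = rehab.replace(9, numpy.nan)

# dropna=False keeps the missing answers as their own row in the table.
print(rehab.value_counts(dropna=False).sort_index())

# Share of the whole subset that answered 1 (compare with 0.077948 above).
share_yes = (rehab == 1).sum() / len(rehab)
print(round(share_yes, 6))
```

Recoding this way keeps the percentages relative to the full subset while making it explicit which rows count as missing.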
In summary, the working set provides a total of 2,476 observations of respondents who have experienced at least some symptoms of alcohol dependence. Of these, 193, or about 8%, have sought treatment at a rehab facility. This provides a rough estimate of the number of respondents who have actively tried to recover from alcohol dependence, which is a central focus of the research question.

Note: I am still trying to figure out how to sort the values in the output into the correct order. The syntax in the example program has been deprecated and replaced with new functions.
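One possible approach, sketched under the assumption that the tables should be ordered by the value itself (age, for example): `value_counts` returns a Series whose index holds the distinct values, and chaining `sort_index()` orders that index. The small `ages` Series is made-up illustration data, not NESARC.

```python
import pandas

# Made-up ages for illustration only (not NESARC data).
ages = pandas.Series([20, 18, 19, 20, 18, 20, 19, 21])

# value_counts() orders by frequency by default; sort_index() reorders the
# result by the values themselves (here, by age).
count_ages = ages.value_counts().sort_index()
print(count_ages)

# The same chaining works for percentages.
pct_ages = ages.value_counts(normalize=True).sort_index()
print(pct_ages)
```

Applied to the program above, this would mean replacing `value_counts(sort=False)` with `value_counts().sort_index()` on each table.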

Comments

  1. Nice work,
    I think you are on the right path, and sorting the data will be covered in next week's lectures.
