DATA5100: Data Mining: R Programming
Short Project 1

Scenario

The only data researcher/chemical analyst at the Blane Research Company has resigned. Before resigning, this person was analyzing the wine dataset in order to make research recommendations to local growers. The dataset holds the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars. It records 14 attributes for each wine: a class label and the quantities of 13 chemical constituents.

Today is your first day as a data researcher/chemical analyst at the Blane Research Company. To get you up to speed, your supervisor has directed you to take the wine.csv file and analyze it using R.

Assignment Instructions

Download the wine.csv file from ulearn. Follow the steps below, take a screenshot of the output of each step, and place the screenshots in a Word document with a full description of each one. When you have completed the assignment, name your file ShortProject1.doc and submit it via the Short Project 1 submission link.

The attributes are as follows:

1) Class
2) Alcohol
3) Malic acid
4) Ash
5) Alcalinity of ash
6) Magnesium
7) Phenols
8) Flavanoids
9) Nonflavanoid phenols
10) Proanthocyanins
11) Color intensity
12) Hue
13) OD280/OD315 of diluted wines
14) Proline

1) Load data into R

1. Open RStudio and set the directory where you saved wine.csv as the working directory.

2. Load wine.csv into an R object named data.
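
The two steps above can be sketched as follows. This is only a minimal illustration, not the required solution: it writes a tiny stand-in wine.csv with made-up values and hypothetical column names to a temporary folder so that it runs anywhere, and it assumes the real file has a header row. For the assignment, use the folder that actually holds your downloaded wine.csv.

```r
# Stand-in for the real download: write a tiny CSV with made-up values to a
# temporary folder so this sketch runs anywhere. The column names here are
# hypothetical; the real wine.csv may use different ones.
data_dir <- tempdir()
writeLines(
  c("Class,Alcohol,MalicAcid",
    "1,14.2,1.7",
    "2,12.4,0.9",
    "3,13.1,3.3"),
  file.path(data_dir, "wine.csv")
)

# Step 1: make that folder the working directory.
setwd(data_dir)
getwd()  # confirm where R is now looking for files

# Step 2: read the file into an object named data.
# Assumption: the first line of the file holds the column names.
data <- read.csv("wine.csv", header = TRUE)
```

In RStudio the working directory can also be set from the menu (Session > Set Working Directory), which has the same effect as calling setwd().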

2) Examine the data

1. Determine the dimensionality of data.

2. Determine the column names of data.

3. Determine the structure of data.

4. Determine the attributes of data.

5. List the first 5 rows of data.

6. List the first 5 columns of data.

7. List the contents of the Alcohol column in data.

8. Convert the last result into a column vector.

9. List the contents of the first 10 rows of the Alcohol column in data.

10. Convert the last result into a column vector.
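
One possible way to approach the ten steps above is sketched below, using base R only. The data frame here is a toy stand-in with made-up values and hypothetical column names, so the output will not match the real wine.csv. The sketch also assumes that "column vector" means a matrix with a single column; if your instructor intends something else, adjust steps 8 and 10 accordingly.

```r
# Toy stand-in for wine.csv (made-up values, hypothetical column names).
data <- data.frame(
  Class           = rep(1:3, each = 4),
  Alcohol         = seq(12, by = 0.2, length.out = 12),
  MalicAcid       = seq(1, by = 0.1, length.out = 12),
  Ash             = seq(2, by = 0.05, length.out = 12),
  AlcalinityOfAsh = seq(15, by = 0.5, length.out = 12),
  Magnesium       = seq(90, by = 2, length.out = 12)
)

dim(data)         # 1. number of rows and columns
names(data)       # 2. column names
str(data)         # 3. structure: type and first values of each column
attributes(data)  # 4. attributes: names, class, row.names
head(data, 5)     # 5. first 5 rows
data[, 1:5]       # 6. first 5 columns
data$Alcohol      # 7. contents of the Alcohol column (a plain vector)

# 8. Same values as a one-column matrix (one reading of "column vector").
alcohol_col <- matrix(data$Alcohol, ncol = 1)
alcohol_col

data$Alcohol[1:10]  # 9. first 10 rows of the Alcohol column

# 10. Those 10 values as a one-column matrix.
first10_col <- matrix(data$Alcohol[1:10], ncol = 1)
first10_col
```

Typing an expression on its own line prints its value at the console, which is what the screenshots need to capture.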

3) Explore Individual Variables

1. Determine the summary (five-number summary and mean) of each of the fourteen variables in the data set (Class, Alcohol, MalicAcid, …).

2. Determine the mean, median, and range of the variable Alcohol.

3. Determine the 0th, 25th, 50th, 75th, and 100th percentiles of the variable Phenols.

4. Determine the 10th, 30th, and 65th percentiles of the variable Phenols.

5. Determine the inter-quartile range (IQR) of the variable Hue.

6. Determine the frequency of each data value in the variable Class.

7. View the last result in a pie chart.

8. Determine the variance of the variable Flavanoids.

9. Determine the standard deviation of the variable Flavanoids.
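
The steps in this part map onto a handful of base R functions, sketched below. Again this is an illustration under assumptions rather than the expected answer: the data frame is a toy stand-in with made-up values, and the column names (Phenols, Hue, Flavanoids) are guesses that may not match the headers in the real wine.csv. Check names(data) and substitute the actual names.

```r
# Toy stand-in for wine.csv (made-up values, hypothetical column names).
data <- data.frame(
  Class      = rep(1:3, times = c(5, 4, 3)),
  Alcohol    = seq(12, by = 0.2, length.out = 12),
  Phenols    = seq(1, by = 0.25, length.out = 12),
  Hue        = seq(0.6, by = 0.05, length.out = 12),
  Flavanoids = seq(0.5, by = 0.3, length.out = 12)
)

summary(data)  # 1. min, quartiles, median, mean, max for every column

# 2. Mean, median, and range (minimum and maximum) of Alcohol.
alcohol_mean   <- mean(data$Alcohol)
alcohol_median <- median(data$Alcohol)
alcohol_range  <- range(data$Alcohol)

# 3. Default quantile() call gives the 0, 25, 50, 75, and 100 percent points.
phenols_quartiles <- quantile(data$Phenols)

# 4. Other percentiles are requested through probs, as fractions of 1.
phenols_custom <- quantile(data$Phenols, probs = c(0.10, 0.30, 0.65))

hue_iqr <- IQR(data$Hue)            # 5. inter-quartile range

class_counts <- table(data$Class)   # 6. frequency of each Class value
pie(class_counts)                   # 7. pie chart of those frequencies

flav_var <- var(data$Flavanoids)    # 8. sample variance
flav_sd  <- sd(data$Flavanoids)     # 9. sample standard deviation
```

Both var() and sd() use the sample (n - 1) denominator, so sd() is the square root of var().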
