# The Data Science Roadmap for Medical Students, Part I (Fundamentals)

The data science roadmap for medical students

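• De-emphasizes mathematics and statistics
• Short, fast-paced, and easy to pick up
• Focuses on building intuition and encourages code reuse
• Fits medical thinking and the needs of medical and pharmaceutical data
• Practice first: solving real problems in medical science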

## Week 3 / A Beginner's Primer

### Data! Data! Data!

#### Statistical Data Types

• Numeric: Data that are expressed on a numeric scale.
  • Continuous: Data that can take on any value in an interval. (Synonyms: interval, float, numeric)
  • Discrete: Data that can take on only integer values, such as counts. (Synonyms: integer, count)
• Categorical: Data that can take on only a specific set of values representing a set of possible categories. (Synonyms: enums, enumerated, factors, nominal)
  • Binary: A special case of categorical data with just two categories of values, e.g., 0/1, true/false. (Synonyms: dichotomous, logical, indicator, boolean)
  • Ordinal: Categorical data that has an explicit ordering. (Synonym: ordered factor)
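As a rough illustration of how these types might map onto Python values, the sketch below builds one hypothetical patient record. The field names (`systolic_bp`, `n_admissions`, `blood_type`, `smoker`, `pain_level`), the two enum classes, and the `statistical_type` helper are all made up for this example and are not part of any library.

```python
from enum import Enum, IntEnum


class BloodType(Enum):
    """Categorical (nominal): a fixed set of labels with no order."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


class PainLevel(IntEnum):
    """Ordinal: categories with an explicit order."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical patient record, one value per statistical data type.
patient = {
    "systolic_bp": 128.5,              # continuous: any value in an interval
    "n_admissions": 3,                 # discrete: integer counts
    "blood_type": BloodType.AB,        # categorical: one of a fixed set
    "smoker": False,                   # binary: exactly two categories
    "pain_level": PainLevel.MODERATE,  # ordinal: ordered categories
}


def statistical_type(value):
    """Return a rough statistical type label for a Python value."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "binary"
    # IntEnum must be checked before Enum and int, since it is both.
    if isinstance(value, IntEnum):
        return "ordinal"
    if isinstance(value, Enum):
        return "categorical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    raise TypeError(f"unsupported value: {value!r}")


types = {name: statistical_type(value) for name, value in patient.items()}
for name, label in types.items():
    print(f"{name}: {label}")
```

Ordinal values can be compared (`PainLevel.MILD < PainLevel.SEVERE` is true), whereas the plain `Enum` members of `BloodType` deliberately have no order.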

#### Rectangular Data

The typical frame of reference for an analysis in data science is a rectangular data object, like a spreadsheet or database table.

Common file formats for such data include XML, JSON, and CSV.

##### Data frame
Rectangular data (like a spreadsheet) is the basic data structure for statistical and machine learning models.
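
Data stored as JSON is not always rectangular, because each object may carry a different set of keys. The sketch below shows one possible way to bring such data into a rectangular shape using only the standard library; the patient fields (`id`, `age`, `sex`, `sbp`) are hypothetical sample data.

```python
import json

# Hypothetical JSON export: one object per patient, with uneven keys.
raw = """
[
  {"id": 1, "age": 54, "sex": "F", "sbp": 132.0},
  {"id": 2, "age": 61, "sex": "M"},
  {"id": 3, "age": 47, "sbp": 118.5}
]
"""

objects = json.loads(raw)

# Collect every key in first-seen order; these become the table's columns.
columns = []
for obj in objects:
    for key in obj:
        if key not in columns:
            columns.append(key)

# Build one row per object, using None where a value is missing,
# so that every row has the same length (the "rectangle").
rows = [[obj.get(col) for col in columns] for obj in objects]

for row in [columns] + rows:
    print(row)
```

In practice a library such as pandas offers a `DataFrame` type for this kind of table; the plain lists here are only meant to make the rectangular shape visible.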
##### Feature

A column within a table is commonly referred to as a feature.

Synonyms: attribute, input, predictor, (independent) variable, regressor, covariate

##### Outcome measurement Y

Many data science projects involve predicting an outcome Y.

Synonyms: dependent variable, response, target, output
• In a regression problem, Y is quantitative (e.g., price, blood pressure).
• In a classification problem, Y takes values in a finite, unordered set (e.g., survived/died, digits 0-9, cancer class of a tissue sample).

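
The distinction between the two problem types can be sketched in a few lines. In the example below, the records, the feature names, and the helpers `split_xy` and `problem_type` are hypothetical. The type check is only a crude heuristic: class labels stored as integers (such as digits 0-9) would be mistaken for a quantitative outcome, so in real work the analyst decides the problem type.

```python
# Hypothetical records; "sbp" (systolic blood pressure) and "survived"
# are two possible outcomes Y for the same features.
records = [
    {"age": 54, "bmi": 27.1, "sbp": 132.0, "survived": True},
    {"age": 61, "bmi": 31.4, "sbp": 145.5, "survived": False},
    {"age": 47, "bmi": 22.8, "sbp": 118.0, "survived": True},
]


def split_xy(records, outcome, features):
    """Separate the feature columns X from the outcome column y."""
    X = [[rec[name] for name in features] for rec in records]
    y = [rec[outcome] for rec in records]
    return X, y


def problem_type(y):
    """Crude heuristic: numeric, non-boolean y suggests regression."""
    is_quantitative = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in y
    )
    return "regression" if is_quantitative else "classification"


features = ["age", "bmi"]
X, y_sbp = split_xy(records, "sbp", features)
_, y_survived = split_xy(records, "survived", features)

print(problem_type(y_sbp))       # regression
print(problem_type(y_survived))  # classification
```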
##### Records

A row within a table is commonly referred to as a record.

Synonyms: case, example, instance, observation, pattern, sample
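
To make the row and column vocabulary concrete, here is a minimal sketch that reads a small CSV table with the standard `csv` module. The CSV content and the `feature` helper are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical CSV content; the first line holds the feature names.
text = """id,age,sex,sbp
1,54,F,132.0
2,61,M,145.5
3,47,F,118.0
"""

# Each row of the table becomes one record (a dict keyed by feature name).
records = list(csv.DictReader(io.StringIO(text)))


def feature(records, name):
    """Return one column: the same field taken from every record."""
    return [rec[name] for rec in records]


n_records = len(records)       # number of rows
n_features = len(records[0])   # number of columns

first_record = records[0]
ages = feature(records, "age")  # csv yields strings; convert as needed

print(n_records, n_features)
print(first_record)
print(ages)
```

Note that `csv.DictReader` returns every value as a string, so numeric features such as `age` would still need converting (for example with `int`) before analysis.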