Who is the target audience?

This tutorial has been prepared for those who want to learn the basics of NumPy, Pandas and Scikit-learn. It is especially useful for algorithm developers and for anyone who is curious about Machine Learning and wants in-depth knowledge of ML or just needs to brush up on a few concepts. After completing this tutorial, you will have a basic level of expertise from which you can work toward more advanced topics.

Why would this tutorial prove useful?

Since libraries are an integral part of data preprocessing, understanding them is of utmost importance. Knowing the functions these libraries provide can make your coding tasks a lot simpler and save you precious time and energy.

To explore any path, we need to brush up on the skills that lay the foundation and ease our journey to the final destination.


In-depth knowledge of Python libraries helps us lay a strong foundation for mastering Machine Learning, which proves essential in the long run.


NumPy, Pandas and Scikit-learn are some of the important libraries that make machine learning a whole lot easier and less time-consuming. They are the pillars on which a strong model can be built.

What are Python libraries?

A Python library is a reusable chunk of code that you may want to include in your programs or projects. A library typically bundles many useful modules that you can import for your everyday programming.

With technology reaching astonishing heights, Data Science, Artificial Intelligence and Machine Learning are buzzwords we hear frequently. They have transformed the way we live. So what's all the fuss about?

What is Machine Learning?

Here’s all you need to know about beginning your journey to excel in machine learning.


Machine Learning (ML) is an application of Artificial Intelligence (AI) that gives a system the ability to learn and improve from experience without being explicitly programmed. A widely used formal definition of ML is:

A computer program is said to learn from experience ‘E’ with respect to some task ‘T’ and some performance measure ‘P’ if its performance on T, as measured by P, improves with experience E.

Okay, so now that we are clear on what Machine Learning is, let us understand why we should invest time in mastering it.


Goals of studying Machine Learning:


  • To make computers more intelligent. The more direct objective here is to develop systems for specific practical learning tasks in an application domain.

  • To develop computational models of the human learning process and perform computer simulations.

  • To explore new learning methods and develop general learning algorithms independent of applications.


Now, let us dive right into ML and start our journey to master it. Let us first get acquainted with some Python libraries required for Machine Learning.

(Note: Python libraries cover far too much ground for a single article, so this one covers only the basics you need to get going.)


If you are thinking about a career in Machine Learning or Data science, the very first thing you will need to do is study some libraries.


Why are libraries important in Machine Learning?

Machine Learning is largely based on mathematics. Designing an ML model involves complex mathematical calculations. Python libraries let us perform these calculations without writing numerous lines of code.
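
As a small illustration of that point, the sketch below computes the same dot product twice: once with a hand-written loop and once with a single NumPy call. The variable names and values (weights, features) are made up for this example and do not come from any particular model.

```python
import numpy as np

# Hypothetical values, chosen only for illustration
weights = [0.5, 1.5, 2.0]
features = [2.0, 4.0, 1.0]

# Plain Python: accumulate the products one pair at a time
total = 0.0
for w, x in zip(weights, features):
    total += w * x

# NumPy: the same calculation in a single call
total_np = np.dot(weights, features)

print(total, total_np)
```

Both approaches give the same number; the NumPy version is simply shorter and scales better to large arrays.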

Basic Study of NumPy Library:

NumPy forms the foundation of the machine learning stack. NumPy (Numerical Python) is a Python package consisting of multi-dimensional array objects and a collection of routines for processing them.
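
To make "array objects plus routines" concrete, here is a minimal sketch. The array values are arbitrary; the attributes and methods used (ndim, shape, sum, mean) are standard parts of NumPy's array object.

```python
import numpy as np

# A 2 x 3 array: two rows, three columns
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)          # number of dimensions
print(arr.shape)         # size along each dimension
print(arr.sum())         # a routine applied to the whole array
print(arr.mean(axis=0))  # mean of each column
```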

In this article, we will cover the NumPy operations most frequently used in ML.


Firstly, we need to import the NumPy library using the following code:


import numpy as np

Once we import the NumPy library, we can use the routines that come with it to perform array operations with ease. These include:

  1. Creating a Vector:


A 1-D array is known as a vector. A vector can be created using NumPy as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([11, 21, 31])

# Create a vector as a column
vector_column = np.array([[15], [25], [35]])
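
The difference between the row and column forms shows up in the shape attribute. The sketch below reuses the two vectors from above and, as an addition not in the original, shows that reshape can turn one form into the other.

```python
import numpy as np

vector_row = np.array([11, 21, 31])
vector_column = np.array([[15], [25], [35]])

print(vector_row.shape)     # (3,)   one axis with 3 elements
print(vector_column.shape)  # (3, 1) three rows, one column

# reshape(-1, 1) turns the row form into a column; -1 lets NumPy infer the row count
as_column = vector_row.reshape(-1, 1)
print(as_column.shape)      # (3, 1)
```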

2. Creating a Matrix: A 2-D array is known as a matrix. It can be created using NumPy as follows:

# Load library
import numpy as np

# Create a matrix
matrix = np.array([[1, 2, 3], [41, 52, 63]])
print(matrix)
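
A quick way to check what was created is to inspect the matrix's dimensions. The sketch below reuses the same matrix; looking at shape, size and the transpose is an addition for illustration, not a step from the original.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [41, 52, 63]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.size)   # 6 elements in total

# The transpose swaps rows and columns
transposed = matrix.T
print(transposed.shape)  # (3, 2)
```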

3. Selecting Elements: Selection of one or more elements from the matrix can be done using the NumPy library as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([1, 2, 3, 4, 5, 6])

# Create a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

# Select the 3rd element of the vector (indexing starts at 0)
print(vector_row[2])

# Select the element in the 2nd row, 2nd column
print(matrix[1, 1])

# Select all elements of the vector
print(vector_row[:])

# Select everything up to and including the 3rd element
print(vector_row[:3])

# Select everything after the 3rd element
print(vector_row[3:])

# Select the last element
print(vector_row[-1])

# Select the first 2 rows and all the columns of the matrix
print(matrix[:2, :])

# Select all rows and the 2nd column of the matrix (kept as a 2-D column)
print(matrix[:, 1:2])
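
One detail worth noticing in the last selection: a slice such as 1:2 keeps the column as a 2-D array, whereas a plain integer index drops that axis. A minimal sketch of the difference, using the same 3 x 3 matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = matrix[:, 1]    # integer index: the column axis is dropped
col_2d = matrix[:, 1:2]  # slice: the column axis is kept

print(col_1d, col_1d.shape)  # [2 5 8] (3,)
print(col_2d.shape)          # (3, 1)
```

Which form you want depends on what the next step expects; many Scikit-learn estimators, for example, expect 2-D input for features.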

Basic Study of Pandas Library:


Pandas, whose name derives from ‘panel data’, has so many uses that it might save time to list the things it cannot do instead of what it can! Just as humans have basic needs, your data has one too: Pandas. It helps you analyze, clean, and transform your data.

We will now look at some essential bits of information regarding Pandas and its use.


We usually import Pandas under a shorter alias (pd), a widely used convention that keeps code concise.


import pandas as pd

The two primary components of Pandas are Series and DataFrame.


A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
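
The relationship between the two can be sketched as follows. The fruit names and numbers are illustrative, and using pd.concat to assemble the table is just one possible way to do it.

```python
import pandas as pd

# Two Series, each representing one column
pears = pd.Series([3, 7, 0, 11], name="Pears")
oranges = pd.Series([0, 9, 5, 2], name="oranges")

# Placing them side by side (axis=1) yields a DataFrame
orders = pd.concat([pears, oranges], axis=1)
print(orders)

# Selecting a single column gives back a Series
print(type(orders["Pears"]))
```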

There are many ways to create a DataFrame. The simplest is to build a dictionary and pass it to the DataFrame constructor.


1. Creating a DataFrame and locating values:

a. Create a dictionary

data = {'Pears': [3, 7, 0, 11], 'oranges': [0, 9, 5, 2]}

b. Pass it to DataFrame constructor

orders = pd.DataFrame(data)

A dictionary in Python is a collection of key-value pairs. Here, each key becomes a column name and each list of values becomes that column's data.


By default the rows are numbered from 0. Let's label each row instead by passing an index (here, a list of names).


orders = pd.DataFrame(data, index=['Jonas', 'Dan', 'Serena', 'Emily'])

c. Locate Values
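
The original does not show code for this step. The sketch below assumes label-based lookup with .loc (and position-based lookup with .iloc) is what is intended, and reuses the orders DataFrame built above.

```python
import pandas as pd

data = {'Pears': [3, 7, 0, 11], 'oranges': [0, 9, 5, 2]}
orders = pd.DataFrame(data, index=['Jonas', 'Dan', 'Serena', 'Emily'])

# Locate a whole row by its index label (returns a Series)
dan_row = orders.loc['Dan']
print(dan_row)

# Locate a single value by row label and column name
dan_pears = orders.loc['Dan', 'Pears']
print(dan_pears)  # 7

# Locate a row by its integer position instead of its label
first_row = orders.iloc[0]
print(first_row)
```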


2. Reading Values from a CSV file:


With CSV files, all you need is a single line to load the data:


df = pd.read_csv('path/to/your/file.csv')  # replace with the location of your CSV file
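
To try read_csv without a file on disk, the sketch below feeds it an in-memory string through io.StringIO. The CSV contents and the column name 'name' are made up for illustration; with a real file you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = "name,Pears,oranges\nJonas,3,0\nDan,7,9\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())   # first rows of the table
print(df.shape)    # (rows, columns)

# Optionally use one of the columns as the row index
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col='name')
print(df_indexed.loc['Dan'])
```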

Basic Study of Scikit-Learn Library:


If you are looking for a robust library to bring your machine learning models into production, Scikit-learn is a popular choice.


Scikit-learn supports different operations that are performed by machine learning models like classification, regression, clustering, model selection, etc.


You name it, and Scikit-learn most likely has a module for it.
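
As a taste of what that looks like in practice, here is a minimal classification sketch. The choice of the bundled iris dataset, logistic regression, the 25% test split and random_state=0 are all assumptions made for this example, not recommendations from the original.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features each
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate the model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier on the training portion
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators follow this same pattern of fit on training data, then predict or score on new data, so swapping in a different model usually changes only a line or two.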


These are the basic prerequisites to get you started with some simple ML models. The deeper you dive, the more libraries you will explore.
