1． What is the fundamental idea behind Support Vector Machines?
The fundamental idea behind Support Vector Machines is to fit the widest possible “street” between the classes. In other words, the goal is to have the largest possible margin between the decision boundary that separates the two classes and the training instances. When performing soft margin classification, the SVM searches for a compromise between perfectly separating the two classes and having the widest possible street (i.e., a few instances may end up on the street). Another key idea is to use kernels when training on nonlinear datasets.
2． What is a support vector?
After training an SVM, a support vector is any instance located on the “street”, including its border. The decision boundary is entirely determined by the support vectors. Any instance that is not a support vector has no influence whatsoever: you could remove such instances, add more of them, or move them around, and as long as they stay off the street they won’t affect the decision boundary. Computing the predictions only involves the support vectors, not the whole training set.
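The claim that only the support vectors matter can be checked with a small experiment. The sketch below is illustrative only: the toy data is made up, and a linear kernel with a large C is assumed so that the fit approximates hard margin classification. It trains Scikit-learn's SVC on all instances, then retrains on the support vectors alone and compares the two boundaries.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration).
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [6.0, 6.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A large C approximates hard margin classification.
full = SVC(kernel="linear", C=1000.0).fit(X, y)
sv_idx = full.support_  # indices (into X) of the support vectors

# Refit using only the support vectors; every other instance is dropped.
reduced = SVC(kernel="linear", C=1000.0).fit(X[sv_idx], y[sv_idx])

print("support vectors:\n", full.support_vectors_)
print("full    w, b:", full.coef_, full.intercept_)
print("reduced w, b:", reduced.coef_, reduced.intercept_)
```

Up to the solver's tolerance, both fits should report the same weights and intercept, which is what "no influence whatsoever" means in practice.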
3． Why is it important to scale the inputs when using SVMs?
SVMs try to fit the largest possible “street” between the classes, so if the training set is not scaled, the SVM will tend to neglect features with small scales (see Figure 5-2).
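One common way to apply the scaling is to chain a scaler and the SVM in a Scikit-learn pipeline, so the same transformation is reused at prediction time. The following is a minimal sketch, assuming StandardScaler is an acceptable choice of scaler; the four data points and the value of C are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Made-up data: feature 0 spans roughly 1 to 5, feature 1 roughly 20 to 80.
X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])
y = np.array([0, 0, 1, 1])

# The scaler is fit on the training data and applied before the SVM.
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=100.0))
scaled_svm.fit(X, y)

# Inspect what the SVM actually sees: zero mean, unit variance per feature.
X_scaled = scaled_svm.named_steps["standardscaler"].transform(X)
print("means:", X_scaled.mean(axis=0))
print("stds: ", X_scaled.std(axis=0))
print("predictions:", scaled_svm.predict(X))
```

After scaling, both features vary over a comparable range, so neither one dominates the width of the street purely because of its units.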
4． Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?
An SVM classifier can output the distance between the test instance and the decision boundary, and you can use this as a confidence score. However, this score cannot be directly converted into an estimation of the class probability. If you set probability=True when creating an SVM in Scikit-learn, then after training it will calibrate the probabilities using Logistic Regression on the SVM’s scores (trained by an additional five-fold cross-validation on the training data). This will add the predict_proba() and predict_log_proba() methods to the SVM.
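The two kinds of output described above can be compared side by side. This is a rough sketch rather than a recommended setup: the dataset comes from make_blobs with arbitrary parameters, and a linear kernel is assumed so that dividing the score by the norm of the weight vector gives a geometric distance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (parameters chosen arbitrarily).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

# probability=True triggers the extra cross-validated calibration step.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

X_new = X[:3]
scores = clf.decision_function(X_new)            # signed confidence scores
distances = scores / np.linalg.norm(clf.coef_)   # signed distances to the boundary
proba = clf.predict_proba(X_new)                 # calibrated class probabilities

print("scores:       ", scores)
print("distances:    ", distances)
print("probabilities:\n", proba)
```

The sign of each score determines the predicted class, while the probabilities come from the separate calibration model, so the two outputs are related but not interchangeable.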
5． Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?
This question applies only to linear SVMs, since kernelized SVMs can only use the dual form. The computational complexity of the primal form of the SVM problem is proportional to the number of training instances m, while the computational complexity of the dual form is proportional to a number between m² and m³. So if there are millions of instances, you should definitely use the primal form, because the dual form will be much too slow.
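In Scikit-learn, one way to request the primal formulation for a linear SVM is the dual=False option of LinearSVC. The sketch below assumes that class is a suitable stand-in for the linear SVM discussed here; the synthetic dataset is far smaller than millions of instances and only serves to show the call.

```python
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data with many more instances than features (sizes are arbitrary).
X, y = make_blobs(n_samples=5000, n_features=50, centers=2, random_state=0)

# dual=False asks the solver to optimize the primal problem,
# which is the usual choice when n_samples > n_features.
primal_svm = make_pipeline(StandardScaler(), LinearSVC(dual=False, C=1.0))
primal_svm.fit(X, y)

train_accuracy = primal_svm.score(X, y)
print("training accuracy:", train_accuracy)
```

To get a feel for the gap in cost: with m = 1,000,000 instances, m² is already 10¹², a million times larger than m itself.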
6． Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: Should you increase or decrease γ (gamma)? How about C?
If an SVM classifier trained with an RBF kernel underfits the training set, there might be too much regularization. To decrease it, you need to increase gamma or C (or both).
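The effect of gamma and C on underfitting can be seen by comparing training accuracy for two settings. The sketch below uses the make_moons toy dataset; the specific hyperparameter values are arbitrary choices for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Nonlinear toy dataset (noise level chosen arbitrarily).
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Very small gamma and C: heavy regularization, likely to underfit.
underfit = SVC(kernel="rbf", gamma=0.001, C=0.01).fit(X, y)

# Larger gamma and C: less regularization, a more flexible boundary.
better = SVC(kernel="rbf", gamma=5.0, C=100.0).fit(X, y)

acc_low = underfit.score(X, y)
acc_high = better.score(X, y)
print("training accuracy, small gamma and C:", acc_low)
print("training accuracy, large gamma and C:", acc_high)
```

Training accuracy alone does not say when to stop: pushing gamma and C too far can overfit, so in practice the values would be checked on held-out data.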