**First published:** 19 Nov 2016

**Last updated:** 19 Nov 2016

Machine learning theory makes extensive use of mathematics, which might put you off if you are not familiar with the requisite topics and notation. Having said that, it is still possible to learn and apply machine learning algorithms without knowing the mathematical details. For instance, it is straightforward to apply linear regression using a library such as scikit-learn without knowing about gradient descent and cost functions. Similarly, you can build complex neural networks with several hidden layers using tools such as Keras without understanding the details of backpropagation.

However, I would recommend you spend some time learning the fundamental mathematics to get some insight into how the machine learning algorithms you use actually work. This will help you make informed decisions when choosing between different approaches and, even more importantly, it will help you figure out why you might not be getting good performance.
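To give a flavour of what a library call hides, here is a minimal sketch of gradient descent fitting a straight line by minimising a mean squared error cost. The function name `fit_line`, the learning rate, the step count and the toy data are all my own assumptions for illustration; this is not how scikit-learn implements linear regression.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost J = (1/n) * sum(error^2).
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(w, b)
```

On this noise-free data the fitted slope and intercept should end up very close to 2 and 1. Try a much larger learning rate and the updates diverge, which is exactly the kind of behaviour that is hard to diagnose without knowing what the cost function and its gradient are doing.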

If, on the other hand, you plan to study machine learning in depth and want to understand cutting-edge research papers in the field, then the mathematical knowledge is a requirement.

The mathematical topics you need to look into to get started with machine learning are probability and statistics, linear algebra and calculus. Information theory is another very important topic if you really want to go deep into machine learning theory.

**By Salman Khan - Khan Academy**

If your mathematical knowledge is a little rusty, or you need a gentle introduction to the basics of probability, statistics, linear algebra and calculus, the brief one-to-one lectures and interactive exercises at Khan Academy - Math will help you get up to speed quickly.

**By Joe Blitzstein - Harvard University**

In these 34 video lectures you will get an introduction to probability, including topics such as sample spaces and events, conditioning, Bayes' theorem, univariate and multivariate distributions, Markov chains, and limit theorems, namely the law of large numbers and the central limit theorem. Handouts and practice problems with solutions are also available.

**By Gilbert Strang - MIT**

As part of the MIT OpenCourseWare, this Linear Algebra course is delivered by eminent professor Gilbert Strang, an excellent lecturer.

**By David Jerison - MIT**

As part of the MIT OpenCourseWare, this single variable calculus course covers differentiation and integration of functions of one variable.

**By Denis Auroux - MIT**

As part of the MIT OpenCourseWare, this multivariable calculus course picks up where 18.01SC Single Variable Calculus left off, covering differentiation and integration of functions of more than one variable.

**By mathematicalmonk**

The video lectures by mathematicalmonk assume a certain level of mathematical knowledge, since they are intended for senior undergraduates or graduate students. Having said that, the topics are introduced at a gentle pace and are very well explained, so do give them a try. I would suggest starting off with the probability primer series, and if you really want to study machine learning in detail you should also view the information theory series of lectures.

**By David J. C. MacKay**

Information theory, founded by Claude Shannon, is a fundamental theory underpinning many modern technologies, including data compression and machine learning. I think the best way to learn information theory is through this book by the late David MacKay. A series of 16 video lectures is also available here.
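As a small taste of the subject, the sketch below computes Shannon entropy, the average number of bits per symbol needed to encode a message drawn from a given symbol distribution. The function name `shannon_entropy` and the two example strings are my own assumptions for illustration and are not taken from MacKay's book.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Return the entropy of the symbol distribution in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over the observed symbol frequencies.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


uniform = shannon_entropy("abcdabcd")  # four equally likely symbols
skewed = shannon_entropy("aaaaaaab")   # one symbol dominates
print(uniform, skewed)
```

The evenly spread message needs 2 bits per symbol, while the skewed one needs far less because it is more predictable. That gap is what a data compressor exploits, and the same quantity shows up in machine learning through losses such as cross-entropy.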

**By Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani**

This book is really good if you are getting started, and it is available online for free; get it here. It uses the R statistical language, but if you are more into Python, Jordi Warmenhoven was kind enough to provide a GitHub repo implementing some of the book code in Python.

**By Trevor Hastie, Robert Tibshirani and Jerome Friedman**

This book is also by Hastie and Tibshirani, amongst others, but is much more mathematical than An Introduction to Statistical Learning with Applications in R. In fact, this was the authors' original book on the subject, and the introductory book was released later to cater for a less mathematically inclined audience that prefers a more hands-on approach. This book is also available for free; get it here.

**By David Barber**

This book covers a lot of material, from probabilistic graphical models, Bayesian methods, supervised and unsupervised methods, Gaussian processes and mixture models, all the way to dynamical systems for time series analysis and prediction. Although the mathematical content in this book is introduced gradually and kept to the minimum possible, an undergraduate level of knowledge in statistics, linear algebra and calculus is assumed. I would say it sits somewhere between An Introduction to Statistical Learning with Applications in R and the more maths-heavy books such as Kevin Murphy's Machine Learning: A Probabilistic Perspective and Christopher M. Bishop's Pattern Recognition and Machine Learning.

**By mathematicalmonk**

Just like the probability primer and information theory video lectures by mathematicalmonk, the machine learning lecture series is a gem. It starts off by explaining the difference between supervised and unsupervised machine learning, then moves on to k-nearest neighbour classification, trees, bootstrapping, random forests, maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation, all the way to Bayesian methods, graphical models, and hidden Markov models. There are over 100 lectures of not more than 15 minutes each.

**By Andrew Ng - Coursera**

If I am not mistaken, this is the first massive open online course (MOOC) on machine learning, delivered by one of the finest minds in the field, Andrew Ng, professor at Stanford University. Sign up for free on Coursera to enrol in the machine learning course. The material is explained very well, and although there is some mathematical notation involved every now and then, it is kept to a minimum, with the emphasis being more on getting an intuition for how the algorithms presented work.

**By Andrew Ng - Stanford University**

If the machine learning course delivered by Andrew Ng on Coursera left you wanting to learn more about the mathematical underpinnings, head to the machine learning lecture collection delivered by Andrew Ng at Stanford University.

**By Hugo Larochelle - Université de Sherbrooke**

Hugo Larochelle's neural network class covers a lot of material starting from simple feedforward neural networks and how to train them, then moving on to conditional random fields and restricted Boltzmann machines, and finishing off with deep neural networks with applications in computer vision and natural language processing.

**By Fei-Fei Li, Andrej Karpathy and Justin Johnson - Stanford University**

CS231n focuses on deep learning for computer vision applications, specifically convolutional neural networks. If you want to supplement the lectures with notes you can access them here.

**By Richard Socher**

CS224d focuses on deep neural network techniques applied to natural language processing, covering topics such as recurrent neural networks, long short-term memory (LSTM) architecture, recursive neural networks, convolutional neural networks for sentence classification and also dynamic memory networks. If you prefer to read notes and other suggested readings, you can get them from here.