Python Libraries for Computational Statistics, Data Science & Machine Learning

Python libraries for computational statistics and data science

Python is an excellent choice for working with data, but it is not the only one; R, for instance, is also a great option. Because Python is very accessible and is a general-purpose language that you can use for other projects, it may be the best choice for you. If so, here is a list of libraries you should be familiar with when working with data in Python.


NumPy is a package for efficient scientific computing in Python. It handles N-dimensional arrays efficiently because its core is written in C and its linear algebra routines rely on well-tested, fast libraries such as BLAS and LAPACK. It also provides features for linear algebra, Fourier transforms, and random number generation that is more flexible than Python's built-in random module.
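
As a rough illustration of the features just mentioned, here is a minimal sketch that touches arrays, linear algebra, the Fourier transform, and random numbers. The variable names, the seed, and the sample data are made up for this example.

```python
import numpy as np

# Vectorised arithmetic on a 3x3 array: no explicit Python loop needed.
a = np.arange(9, dtype=float).reshape(3, 3)
scaled = a * 2.0 + 1.0

# Linear algebra: solve the small system A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Fourier transform: a sine wave with 4 cycles over 64 samples
# should show its largest spectral magnitude in frequency bin 4.
t = np.arange(64) / 64
signal = np.sin(2 * np.pi * 4 * t)
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))

# Random numbers from a seeded generator (the seed 0 is arbitrary).
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

print(x, peak_bin, samples.mean())
```

The seeded generator makes the random draws reproducible from run to run, which is usually what you want while exploring data.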



SciPy builds on NumPy and extends it with higher-level scientific routines. It contains functions for linear algebra, interpolation, integration, clustering, and more.
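
To give a feel for two of these areas, the following sketch integrates a function numerically and interpolates between a few sample points. The sample points and variable names are invented for illustration.

```python
import numpy as np
from scipy import integrate, interpolate

# Numerical integration of sin(x) from 0 to pi; the exact answer is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Interpolation: fit a cubic spline through four samples of y = x**2
# and evaluate it at a point that was not in the sample set.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = xs ** 2
spline = interpolate.CubicSpline(xs, ys)
mid = float(spline(1.5))

print(area, abs_err, mid)
```

Note that quad returns both the estimate and an estimate of the absolute error, so you can check how much to trust the result.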



Pandas is a library that provides data structures for handling your data sets. Its central structure is the DataFrame, a table-like object with labeled rows and columns. DataFrames support powerful and efficient numerical operations similar to those of NumPy's array object.
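
Here is a small sketch of what working with a DataFrame can look like. The column names and values are made up for this example.

```python
import pandas as pd

# Build a small table; each dictionary key becomes a column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# Column arithmetic is vectorised, much like a NumPy array.
df["doubled"] = df["value"] * 2

# Split the rows by group and compute the mean of each group.
means = df.groupby("group")["value"].mean()

print(df)
print(means)
```

The groupby step is the kind of operation that would take a loop and a dictionary in plain Python, and it is one of the main reasons people reach for Pandas.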




Scikit-learn is one of the most widely used machine learning libraries for Python. It provides tools for data preprocessing, classification, regression, clustering, dimensionality reduction, and model selection.
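
The sketch below strings a few of these pieces together: it splits a data set, scales the features, and fits a classifier. The choice of the bundled iris data set, logistic regression, and the split parameters are assumptions made for this example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example data set that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for evaluation (random_state is arbitrary).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Putting the scaler inside the pipeline means it is fitted only on the training rows, which keeps information from the test set out of the model.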




Weka is a Java library for data mining and machine learning, but it has a Python wrapper, so you may want to learn more about it. It is open source, free to use, and maintained by the University of Waikato.





Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of formats and interactive environments across multiple platforms. You can use it for simple graphs and plots, and it is also powerful enough for more general visualizations.
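
A minimal sketch of a labeled line plot follows. The data, labels, and figure size are invented for this example, and the non-interactive Agg backend is chosen only so that the script also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for a GUI window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# One figure with a single set of axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Render to PNG in memory; passing a file name instead writes to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
png_bytes = buf.getvalue()
```

Changing the format argument to "pdf" or "svg" produces vector output, which is usually the better option for print.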



Shogun is a machine learning library written in C++ that offers a Python interface. It focuses on large-scale kernel methods such as support vector machines (SVMs) and comes with a range of different SVM implementations.