Python Libraries for Computational Statistics, Data Science & Machine Learning

Python libraries for computational statistics and data science

Python is an excellent choice for working with data, though it is not the only one; R, for instance, is also a strong option. But because Python is very accessible and is a general-purpose language that you can reuse for other projects, it may be the best choice for you. If so, here is a list of libraries you should be familiar with if you work with data in Python.

NumPy

NumPy is a package for efficient scientific computing in Python. It handles N-dimensional arrays efficiently because its core routines run in compiled code and rely on C and Fortran libraries that are well-tested and very fast. It also provides features for linear algebra, Fourier transforms, and better random number generation than what is available in Python's standard library.
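
As a rough illustration of the features just listed, the sketch below builds a small array, solves a linear system, takes a Fourier transform, and draws random numbers. The matrix values, the seed, and the variable names are arbitrary choices made for this example.

```python
import numpy as np

# A 2-D array (matrix) and a 1-D array (vector); the values are illustrative.
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system a @ x = b.
x = np.linalg.solve(a, b)

# Fourier transform of a constant signal: only the zero-frequency bin is non-zero.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator, so repeated runs give the same draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(x)
print(spectrum.real)
print(samples.mean(), samples.std())
```

Operations such as `a @ x` or `samples.mean()` act on whole arrays at once, which is where most of the speed advantage over plain Python loops comes from.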

website: http://www.numpy.org

SciPy

SciPy builds on NumPy and extends it with higher-level numerical routines. It contains functions for linear algebra, interpolation, integration, clustering, and more.
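
A minimal sketch of three of those areas (integration, interpolation, and linear algebra) might look like the following. The integrand, the sample points, and the matrix are made up for the example.

```python
import numpy as np
from scipy import integrate, interpolate, linalg

# Integration: numerically integrate t**2 from 0 to 1 (the exact value is 1/3).
area, abs_error = integrate.quad(lambda t: t ** 2, 0.0, 1.0)

# Interpolation: fit a cubic spline through ten samples of sin(x),
# then evaluate it at a point between the samples.
xs = np.linspace(0.0, np.pi, 10)
spline = interpolate.CubicSpline(xs, np.sin(xs))
approx = float(spline(np.pi / 2))

# Linear algebra: determinant of a small matrix.
det = linalg.det(np.array([[1.0, 2.0],
                           [3.0, 4.0]]))

print(area, approx, det)
```

Note that SciPy functions take and return NumPy arrays, so the two libraries are normally used together.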

website: https://www.scipy.org/

Pandas

Pandas is a library of data structures for handling your data sets. Its central structure is the DataFrame, a table-like object with labeled rows and columns. DataFrames support powerful and efficient numerical operations similar to those of NumPy’s array object.
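
To make the DataFrame idea concrete, here is a small sketch that builds a table, adds a computed column, and aggregates by group. The column names and values are invented for the example.

```python
import pandas as pd

# A small table with one label column and one numeric column (illustrative data).
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 2.0, 3.0, 5.0],
})

# Column arithmetic is vectorized, much like NumPy array arithmetic.
df["doubled"] = df["value"] * 2

# Group rows by label and compute the mean of each group.
means = df.groupby("group")["value"].mean()

print(df)
print(means)
```

The result of the `groupby` step is a pandas Series indexed by group label, so `means["a"]` gives the mean for group "a".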

website: http://pandas.pydata.org/

Scikit-learn

Scikit-learn is one of the most popular machine learning libraries for Python. It provides tools for data preprocessing, classification, regression, clustering, dimensionality reduction, and model selection.
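
The sketch below combines two of those pieces, preprocessing and classification, on the Iris data set that ships with scikit-learn. The choice of logistic regression, the 25% test split, and the random seed are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in data set: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Chain preprocessing (feature scaling) and a classifier into one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Most scikit-learn estimators share the same `fit` / `predict` / `score` interface, so swapping in a different classifier usually means changing only the line that builds the pipeline.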

website: http://scikit-learn.org/

Weka

Weka is a Java library for data mining and machine learning, but it has a Python wrapper that you may want to learn more about. It is free, open-source software maintained by the University of Waikato.

website: https://pypi.python.org/pypi/python-weka-wrapper and https://www.cs.waikato.ac.nz/ml/weka/index.html

Matplotlib

Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of formats and interactive environments across multiple platforms. You can use it for simple graphs and plots, but it is also powerful enough for general visualization.
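
As a minimal sketch of producing a figure and writing it to a file, the example below plots two curves and saves them as a PNG. The output file name and location are hypothetical, and the "Agg" backend is selected only so the script also runs on machines without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

# One figure with a single set of axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Hypothetical output path; the file extension selects the output format.
out_path = os.path.join(tempfile.gettempdir(), "trig_functions.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

Changing the extension in `out_path` to `.pdf` or `.svg` writes a vector version of the same figure.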

website: http://matplotlib.org/

Shogun

Shogun is a machine learning library, written in C++ with a Python interface, that focuses on large-scale kernel methods such as support vector machines (SVMs). It comes with a range of different SVM implementations.

website: http://www.shogun-toolbox.org/