Python Libraries for Computational Statistics, Data Science & Machine Learning
Python libraries for computational statistics and data science
Python is a strong choice for working with data, although it is not the only one; R, for instance, is also a great option. But because Python is very accessible and is a general-purpose language that you can use for other projects, it may be the best choice for you. If so, here is a list of libraries that you should be familiar with if you work with data in Python.
NumPy
NumPy is a package for efficient scientific computing in Python. It handles N-dimensional arrays efficiently because its core is implemented in C and builds on well-tested, very fast numerical libraries. It also provides features for linear algebra, Fourier transforms, and more capable random number generation than what is available in Python's standard library.
website: http://www.numpy.org
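As a rough illustration of the features mentioned above, the following sketch touches arrays, linear algebra, a Fourier transform, and random numbers. The array values and the seed are made up for the example and do not come from any real data set.

```python
import numpy as np

# A 2x3 array; arithmetic applies element-wise, with no explicit loops.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
doubled = a * 2
col_means = a.mean(axis=0)  # mean of each column

# Linear algebra: solve the system M @ x = b.
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(M, b)

# Fourier transform of a short, hypothetical signal.
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))

# Random numbers from a seeded generator, so repeated runs give the same draws.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(col_means, x, samples)
```

Seeding the generator is optional; it is used here only so that the example is reproducible.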
SciPy
SciPy builds on NumPy. It contains functions for linear algebra, interpolation, integration, clustering, and more.
website: https://www.scipy.org/
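To give a feel for two of the areas listed above, here is a minimal sketch of numerical integration and interpolation with SciPy. The integrand and the sample points are arbitrary choices made for this example.

```python
import numpy as np
from scipy import integrate, interpolate

# Numerical integration: integrate x^2 from 0 to 3 (the exact value is 9).
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Interpolation: fit a cubic spline through a few hypothetical sample points
# and evaluate it between two of them.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.0, 2.0, 4.0, 6.0])
spline = interpolate.CubicSpline(xs, ys)
y_mid = float(spline(1.5))

print(area, y_mid)
```

`quad` returns both the integral and an estimate of the absolute error, which is why two values are unpacked.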
Pandas
Pandas is a library of data structures for handling your data sets. Its central structure is the DataFrame, a table-like object with labeled rows and columns. DataFrames support powerful and efficient numerical operations similar to those of NumPy's array object.
website: http://pandas.pydata.org/
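A small sketch of what working with a DataFrame can look like. The table below (cities, years, and sales figures) is invented purely for illustration.

```python
import pandas as pd

# A hypothetical table: one row per city and year.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Bergen"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [100.0, 120.0, 80.0, 95.0],
})

# Column arithmetic is vectorized, much like NumPy arrays.
df["sales_thousands"] = df["sales"] / 1000.0

# Group rows by city and average the sales within each group.
mean_by_city = df.groupby("city")["sales"].mean()

# Boolean indexing keeps only the rows that match a condition.
recent = df[df["year"] == 2021]

print(mean_by_city)
print(recent)
```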
Scikit-learn
Scikit-learn is one of the most popular machine-learning libraries for Python. It covers data preprocessing, classification, regression, clustering, dimensionality reduction, and model selection.
website: http://scikit-learn.org/
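The sketch below strings together a few of the pieces named above: preprocessing, a classifier, and a held-out test split. The choice of the bundled iris data set, logistic regression, and the split parameters are assumptions made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small example data set that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (feature scaling) followed by a classifier, as one estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Fraction of held-out samples that are classified correctly.
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in a pipeline means the scaling is fitted on the training data only, which avoids leaking information from the test set.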
Weka
Weka is a Java library for data mining and machine learning, but it has a Python wrapper that you may want to learn more about. It is open source, free to use, and maintained by the University of Waikato.
website: https://pypi.python.org/pypi/python-weka-wrapper and https://www.cs.waikato.ac.nz/ml/weka/index.html
Matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of formats and interactive environments across multiple platforms. You can use it for standard graphs and plots, and it is also flexible enough for more general visualizations.
website: http://matplotlib.org/
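A minimal sketch of producing a figure and saving it to a file. The curves, labels, and output file name are arbitrary, and the non-interactive "Agg" backend is selected only so that the example also runs on machines without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Sample two curves on the interval [0, 2*pi].
x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Save as PNG in the system's temporary directory (hypothetical file name).
out_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

Changing the file extension (for example to `.pdf` or `.svg`) is usually enough to get a different output format.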
Shogun
Shogun is a machine learning library written in C++ with a Python interface. It focuses on large-scale kernel methods such as support vector machines (SVMs) and comes with a range of different SVM implementations.
website: http://www.shogun-toolbox.org/