Many libraries and tools are available for learning machine learning and for building projects with it, and you can work in many programming languages. This article focuses on two of them, Python and Java, and on the open source tools commonly used with each. We will also install Weka, an open source machine learning software package for data mining.
Installing the software
To begin with, you need Java and Python installed on your machine. If you have not done so already, install Java, Python and their development tools now. You will also want an IDE installed so you can write the software. If you need help with the installation on a Mac, you can read how to install Python and Java.
Installing Python Tools
Python has a large variety of packages for machine learning. Let’s cover a few:
- Pkg-config – Pkg-config is not a Python library, but it is required for the installation of Graphviz. Pkg-config is a helper tool for compiling binaries.
- Graphviz – Graph visualization software for drawing trees and graphs. It is not a Python library itself; PyGraphviz (below) provides the Python interface to it.
- NumPy – Tools for scientific computing with Python. NumPy is commonly used because its arrays are implemented in C and stored compactly in memory. For example, operations on NumPy arrays are typically several times faster than the equivalent operations on Python lists, and the arrays can be reshaped and manipulated with ease. So you can, for instance, view a single row or column of a matrix as an array.
- Pillow – An image processing library for Python. Other libraries may require it, and you can also use it directly when working with images, for example on an image recognition project.
- Matplotlib – 2D plotting library that you may use to visualize your data.
- PyGraphviz – A Python interface to Graphviz, also used to produce visualizations of your data.
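To make the NumPy point about rows, columns and reshaping concrete, here is a minimal sketch. It assumes NumPy is already installed (the install commands follow in the next step), and the matrix contents are made-up illustrative values.

```python
import numpy as np

# A 3x4 matrix holding the integers 0..11 (illustrative data only).
matrix = np.arange(12).reshape(3, 4)

row = matrix[1]        # second row as a 1-D array
column = matrix[:, 2]  # third column as a 1-D array

# Basic slices are views: they share memory with the matrix,
# so writing through the column also changes the matrix.
column[0] = 99

# Same 12 values arranged as 2 rows of 6.
reshaped = matrix.reshape(2, 6)

print(row)
print(column)
print(reshaped.shape)
```

Because slices are views rather than copies, selecting a row or column does not duplicate the data. Call `.copy()` on the slice if you need an independent array.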
Ok! Let’s get this all installed.
brew install pkg-config
brew install graphviz
pip install numpy pillow matplotlib pygraphviz
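If you want to confirm that the Python packages landed in the interpreter you intend to use, a small check like the following may help. It is only a sketch: the helper name `missing_modules` is made up for this example, and the module list assumes the usual import names (Pillow is imported as `PIL`).

```python
import importlib.util

# Import names of the packages installed above.
# Note that the Pillow package is imported as "PIL".
MODULES = ["numpy", "PIL", "matplotlib", "pygraphviz"]


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up and does not import them, so it is quick and has no side effects.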
Finally, for Python, we are going to install the Weka wrapper that will allow us to make calls to the Java implementation of Weka. In order to allow that to work, we are going to need to install javabridge first, then we can install the wrapper.
- JavaBridge – Tool that allows Java to be called from within Python.
- Python-Weka-Wrapper – The actual Weka wrapper for Python.
pip install javabridge
pip install python-weka-wrapper
And that is it! We are all set up and ready to start using Python for data mining and machine learning.
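As a rough idea of what a first call through the wrapper might look like, here is a minimal sketch that cross-validates Weka's J48 decision tree on an ARFF file. The function name `evaluate_j48` and the file name `iris.arff` are hypothetical, and the module and method names follow the python-weka-wrapper documentation as I understand it, so check them against the version you installed.

```python
def evaluate_j48(arff_path, folds=10, seed=1):
    """Cross-validate a J48 decision tree on an ARFF file and return Weka's summary text."""
    # Imported inside the function so this file can still be loaded
    # on a machine where the wrapper is not installed.
    import weka.core.jvm as jvm
    from weka.classifiers import Classifier, Evaluation
    from weka.core.classes import Random
    from weka.core.converters import Loader

    jvm.start()  # the wrapper needs a running Java virtual machine
    try:
        loader = Loader(classname="weka.core.converters.ArffLoader")
        data = loader.load_file(arff_path)
        data.class_is_last()  # treat the last attribute as the class label

        classifier = Classifier(classname="weka.classifiers.trees.J48")
        evaluation = Evaluation(data)
        evaluation.crossvalidate_model(classifier, data, folds, Random(seed))
        return evaluation.summary()
    finally:
        jvm.stop()  # always shut the JVM down, even if loading or evaluation fails


if __name__ == "__main__":
    # "iris.arff" is a placeholder; point this at any ARFF dataset you have.
    print(evaluate_j48("iris.arff"))
```

The JVM can only be started once per Python process with this wrapper, so in a larger program you would start it once at the beginning and stop it at the end rather than inside each function.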
Installing Java Tools
For Java, it makes sense to install the GUI as well as the package. The Weka Installation Page has instructions for the several types of installation files available. I recommend installing the all-in-one version first.
Once the download finishes, open the .dmg file and copy the Weka application to your Applications folder. On first launch, you may have to approve the application for security reasons, depending on your OS settings, since it is not a signed application.
Using Weka in a Java Project
For your Java projects that use the Weka library, you can include the library as a Maven dependency.
<!-- https://mvnrepository.com/artifact/nz.ac.waikato.cms.weka/weka-stable -->
<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>