HMLP: High-performance Machine Learning Primitives

WARNING

HMLP and GOFMM are research projects and are not in production use. For the SC'17 and SC'18 artifacts, see /artifact and our GOFMM papers [SC'17 and SC'18] for details.

README

Thank you for deciding to give HMLP a try! This code can be used directly or plugged into many learning tasks such as kernel summation, nearest neighbor search, k-means clustering, and convolutional networks. It is also possible to create a specific general N-body operator by using our template frameworks.

See INSTALL for installation instructions. Check out LICENSE if you want to use or redistribute any part of HMLP. Note that different parts of HMLP may have different licenses; we usually state the specific license at the top of each file.

HMLP (High Performance Machine Learning Primitives) is a portable framework that provides high-performance, memory-efficient operations similar to matrix-matrix multiplication, and their extensions, based on the BLIS framework. Currently, HMLP has implementations for Intel x86_64, Knights Landing, ARM and NVIDIA GPUs. We may support other architectures in the future.

Depending on your needs, you may only require the basic features if you just want to use some existing primitives. We suggest reading different parts of the documentation according to the features you require. Please check out the wiki pages to see which features best suit your needs.

Architecture-dependent implementations (a.k.a. microkernels, or kernels for short) are identified and separated from the C++ loop-based framework. Thus, porting any HMLP primitive to a new architecture only requires rewriting the kernel part. You are welcome to contribute more kernels beyond this list. Check out the guidelines on implementing microkernels for HMLP on our wiki pages.

DOCUMENTATION

GOFMM, MPI-GOFMM, HMLP templates, and HMLP runtime APIs are documented with doxygen.

INSTALL

HMLP is tested on Linux and OSX. Compilation REQUIRES:

  • Intel or GNU compilers with C++11, AVX and OpenMP support (for x86_64);
  • Arm GNU compilers with OpenMP support (for Arm; see Cross Compilation for details on compiling for Android);
  • Intel-16 or later compilers (for Intel MIC, KNL);
  • nvcc (for NVIDIA GPUs with compute capability > 2.0).

Configuration

Edit set_env.sh to set the compilation options.

You MUST manually set each environment variable in the "REQUIRED" CONFIGURATION section if any of those variables is not defined properly on your system.

    export CC=icc                 # use the Intel C compiler
    export CXX=icpc               # use the Intel C++ compiler
    export CC=gcc                 # use the GNU C compiler
    export CXX=g++                # use the GNU C++ compiler
    export HMLP_USE_BLAS=false    # if you don't have a BLAS library
    export MKLROOT=xxx            # path to Intel MKL
    export OPENBLASROOT=xxx       # path to OpenBLAS
    export HMLP_USE_CUDA=true     # compile code with CUDA templates

The default BLAS library for the Intel compilers is MKL. For GNU compilers, cmake will try to find a suitable BLAS/LAPACK library on your system. If cmake fails to locate BLAS/LAPACK, the compilation will fail as well, and you will need to set the path manually in the cmake file.
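Because a missing variable only shows up later as a cmake or compiler error, it can help to check the environment before building. The following is a minimal sketch, not part of HMLP: `check_hmlp_env` is a hypothetical helper, and the set of names it checks (CC, CXX, HMLP_USE_BLAS) is an assumption taken from the options listed above, so adjust it to match the "REQUIRED" section of your set_env.sh.

```shell
# Hypothetical helper (not part of HMLP): report which variables are unset or empty.
# The list of names is an assumption based on the options described above.
check_hmlp_env() {
    status=0
    for name in CC CXX HMLP_USE_BLAS; do
        # Look up the variable whose name is stored in $name (names are fixed literals).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    # Non-zero exit status if at least one variable is missing.
    return "$status"
}
```

After sourcing set_env.sh, a call such as `check_hmlp_env && cmake ..` would only start the configuration when every listed variable is set.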

Cmake installation

    source set_env.sh
    mkdir build
    cd build
    cmake ..
    make
    make install
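When these steps are scripted, it is usually convenient to stop at the first failure instead of continuing with a half-configured build. The sketch below wraps the same commands in a hypothetical function, `hmlp_build` (not part of HMLP), and assumes it is called from the top-level HMLP directory where set_env.sh lives.

```shell
# Hypothetical wrapper (not part of HMLP) around the installation steps above.
# Assumes the current directory is the top-level HMLP directory.
hmlp_build() {
    # Run in a subshell so that `cd` and the sourced variables do not leak
    # into the calling shell. Each step runs only if the previous one succeeded.
    (
        . ./set_env.sh &&
        mkdir -p build &&
        cd build &&
        cmake .. &&
        make &&
        make install
    )
}
```

The function returns the exit status of the first failing step, so it can be used as `hmlp_build || echo "build failed"`.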

Cross Compilation

If your Arm device runs an OS with a native compiler and cmake support, the installation instructions above should work as they are. However, if your target runs Android, which currently does not have a native C/C++ compiler, you will need to cross compile HMLP on Linux or OSX first. Although there are many ways to cross compile, we suggest the following steps:

  1. Install Android Studio with LLDB, cmake and NDK support.
  2. Create a standalone toolchain from the NDK.
  3. Install adb (Android Debug Bridge).
  4. Compile HMLP with cmake. It will look for your Arm gcc/g++, ar and ranlib.
  5. Use the following commands to push the executables and scripts in hmlp/build/bin to the device and run them.

    adb devices
    adb push /hmlp/build/bin/* /data/local/tmp
    adb shell
    cd /data/local/tmp
    ./run_hmlp.sh
    ./run_gkmx.sh

EXAMPLE

The default build also compiles all the examples in /example. To run some basic examples from the testing drivers:

    cd build
    ./run_hmlp.sh
    ./run_gkmx.sh

To use the HMLP library, include the header file <hmlp.h> and link either ${HMLP_DIR}/build/libhmlp.a statically or ${HMLP_DIR}/build/libhmlp.so (or .dylib) dynamically.

C/C++ example:

    ...
    #include <hmlp.h>
    ...

Static linking example:

    icc ... -I${HMLP_DIR}/build/include ${HMLP_DIR}/build/libhmlp.a

Dynamic linking example on Linux:

    icc ... -I${HMLP_DIR}/build/include -L${HMLP_DIR}/build -lhmlp

Dynamic linking example on Mac:

    icc ... -I${HMLP_DIR}/build/include -L${HMLP_DIR}/build -lhmlp
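With dynamic linking, the loader also has to find libhmlp when the program starts. If the library is not installed in a default search location, one common approach is to add its directory to the loader's search path, which is LD_LIBRARY_PATH on Linux and DYLD_LIBRARY_PATH on Mac. The sketch below uses a hypothetical helper, `hmlp_runtime_path_var` (not part of HMLP), that picks the variable name from the output of `uname -s`.

```shell
# Hypothetical helper (not part of HMLP): print the name of the environment
# variable the dynamic loader consults. $1 is the kernel name from `uname -s`.
hmlp_runtime_path_var() {
    case "$1" in
        Darwin) echo DYLD_LIBRARY_PATH ;;   # Mac
        *)      echo LD_LIBRARY_PATH ;;     # Linux and other ELF systems
    esac
}
```

For example, on Linux `hmlp_runtime_path_var "$(uname -s)"` prints LD_LIBRARY_PATH, after which `export LD_LIBRARY_PATH=${HMLP_DIR}/build:${LD_LIBRARY_PATH}` makes the library in the build directory visible to the loader.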

TESTING

When you follow the steps in INSTALL using cmake, Google Test is downloaded and compiled, and all testing routines located in /test are built. All executables are placed in /build. To run the whole test suite, follow these instructions.

    cd build
    make test

Optionally, you can perform a coverage analysis and update the report with

    make coverage

The latest coverage report can be found at

ACKNOWLEDGEMENTS

The HMLP library was primarily authored by

    Chenhan D. Yu (The University of Texas at Austin)

but many others have contributed input and feedback. Contributors are listed according to the part they contributed to:

GOFMM (Geometry-Oblivious FMM) in SC'17

    James Levitt (The University of Texas at Austin)
    Severin Reiz (Technische Universität München)
    George Biros (The University of Texas at Austin)

General Stride K-Nearest Neighbors in SC'15

    Jianyu Huang (The University of Texas at Austin)
    Woody Austin (The University of Texas at Austin)
    George Biros (The University of Texas at Austin)

General Stride Kernel Summation in IPDPS'15

    Bill March (The University of Texas at Austin)
    George Biros (The University of Texas at Austin)

2-level Strassen in SC'16

    Jianyu Huang (The University of Texas at Austin)
    Leslie Rice (The University of Texas at Austin)

HMLP on Arm

    Jianyu Huang (The University of Texas at Austin)
    Matthew Badin (Qualcomm Corp. Santa Clara)

BLIS framework support

    Tyler Smith (The University of Texas at Austin)
    Robert van de Geijn (The University of Texas at Austin)
    Field Van Zee (The University of Texas at Austin)

Special thanks go to the following individual, who walked me through the whole BLIS framework:

    Field Van Zee (The University of Texas at Austin)

Thank you again for being interested in HMLP!

Best regards,

Chenhan D. Yu, chenhan@cs.utexas.edu