When you need ML algorithms implemented clearly in just NumPy
16,346 stars

When You Need ML Algorithms Implemented Clearly in Just NumPy

You know the feeling. You're trying to understand how a machine learning algorithm actually works under the hood, but every implementation you find is either a black-box library call or a tangled mess of framework-specific code. The math makes sense on paper, but the code? Not so much. Enter numpy-ml: a collection of machine learning algorithms written exclusively in NumPy, designed for readability over raw performance.

This isn't another ML framework trying to compete with TensorFlow or PyTorch. It's something rarer—a teaching tool that doubles as a prototyping sandbox. If you've ever wished you could step through the forward pass of a Transformer attention mechanism line by line, or trace the exact math behind a Wasserstein GAN loss, this project is for you.
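To give a flavor of what "line by line" means for attention, here is a minimal sketch of single-head scaled dot-product attention in plain NumPy. It is an illustration written for this article, not code taken from numpy-ml, and the function names `softmax` and `scaled_dot_product_attention` are assumptions rather than the library's API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention (hypothetical helper, not numpy-ml's API).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights         # output has shape (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)            # one output row per query
print(weights.sum(axis=1))  # attention weights for each query sum to 1
```

Every intermediate array here can be printed and inspected, which is the kind of transparency the project is aiming for.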

What It Does

numpy-ml is a Python library that implements a wide range of machine learning algorithms using only NumPy as its numerical backend. There is no TensorFlow, PyTorch, or JAX underneath, just NumPy arrays and the algorithms built on top of them. The project ranges from classical statistical models to modern deep learning architectures.

The available models include Gaussian mixture models (with EM training), hidden Markov models (with Viterbi decoding and Baum-Welch parameter estimation), latent Dirichlet allocation for topic modeling, and a substantial neural networks module. That neural networks section alone covers layers (LSTM, Elman RNN, convolutional layers with padding, dilation, and stride), modules (bidirectional LSTM, Transformer-style multi-headed attention, WaveNet-style residual blocks), optimizers (SGD with momentum, AdaGrad, RMSProp, Adam), weight initializers (Xavier, He), and full models like variational autoencoders and Wasserstein GANs.
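As one concrete example from that list, Viterbi decoding for a hidden Markov model fits in a few lines of NumPy. The sketch below is not numpy-ml's implementation; the `viterbi` function, its argument names, and the toy two-state model are all assumptions made for illustration. It works in log space to avoid underflow on long sequences.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely hidden-state path (hypothetical helper, not numpy-ml's API).

    log_pi: (S,)   log initial state probabilities
    log_A:  (S, S) log transitions, A[i, j] = P(state j at t | state i at t-1)
    log_B:  (S, O) log emissions,   B[i, k] = P(observation k | state i)
    obs:    sequence of integer observation indices
    """
    n_states, T = log_pi.shape[0], len(obs)
    delta = np.empty((T, n_states))               # best log-score ending in each state
    backptr = np.zeros((T, n_states), dtype=int)  # best predecessor of each state

    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        # cand[i, j]: best score of being in state i at t-1, then moving to j
        cand = delta[t - 1][:, None] + log_A
        backptr[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + log_B[:, obs[t]]

    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path, delta[-1].max()

# Toy model: states 0 = healthy, 1 = fever; observations 0 = normal, 1 = cold, 2 = dizzy.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

path, log_prob = viterbi(np.log(pi), np.log(A), np.log(B), [0, 1, 2])
print(path, np.exp(log_prob))  # path is [0, 0, 1] with probability 0.01512
```

Note that this version assumes all probabilities are strictly positive; a zero entry would produce `-inf` under `np.log`, which still compares correctly but triggers a NumPy warning.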

Tree-based models are here too: CART decision trees, random forests, and gradient-boosted decision trees. Linear models cover ridge regression, logistic regression, ordinary least squares, and Bayesian linear regression with conjugate priors. The library also includes reinforcement learning agents that can train on OpenAI Gym environments, though Gym is an optional install.
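The linear models are a good place to see how little code the underlying math needs. As a rough sketch (again plain NumPy written for this article, not numpy-ml's own classes), ridge regression has the closed-form solution beta = (X^T X + alpha I)^(-1) X^T y, and setting alpha to 0 recovers ordinary least squares. The function name `ridge_fit` and the synthetic data are assumptions, and no intercept term is fitted.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge coefficients (hypothetical helper, no intercept term)."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    # Solving the linear system is more stable than forming an explicit inverse.
    return np.linalg.solve(gram, X.T @ y)

# Synthetic data with known coefficients, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=200)

beta_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, alpha=50.0)  # a larger alpha shrinks the coefficients

print(beta_ols)
print(beta_ridge)
```

The penalized coefficients always have a smaller norm than the unpenalized ones, which is the shrinkage effect ridge regression is known for.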

Why It's Cool

The value proposition here is unusual, and that's what makes it interesting.

  • Readability is the primary design goal. Most ML codebases optimize for speed or memory efficiency. numpy-ml optimizes for legibility. If you're trying to understand how the forward-backward algorithm works, you can read the actual implementation rather than deciphering optimized C++ bindings.

  • It covers the full spectrum. You get classical models (HMMs, LDA, Gaussian mixtures) alongside modern deep learning components (Transformer attention, WaveNet blocks, GANs). This makes it useful as a reference for both traditional and contemporary ML techniques.

  • The neural network module is surprisingly comprehensive. It includes not just standard lay

Last updated: Jun 11, 2026