From zero to hero: A structured learning path for mastering NumPy
Getting Started • Learning Path • Examples • Contributing • License
This repository provides a comprehensive, step-by-step guide to mastering NumPy, the fundamental package for scientific computing in Python. Each phase builds systematically on previous knowledge, with practical examples and clear explanations.
- Beginners looking to build a solid foundation in NumPy
- Intermediate users wanting to deepen their understanding of advanced features
- Students preparing for data science, machine learning, or AI coursework
- Professionals transitioning to roles requiring numerical computation skills
For those familiar with Python environments, get started immediately:
git clone https://github.com/Sourabh-Kumar04/Numpy-Basic.git
cd Numpy-Basic
python -m venv venv && source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
jupyter notebook
Begin with Phase_1/01_phase_1.ipynb and progress through each phase sequentially.
Phase | Status | Topics | Key Concepts |
---|---|---|---|
Phase 1 | Complete | NumPy Fundamentals | Arrays vs Lists, Creating Arrays, Data Types, Basic Operations |
Phase 2 | Complete | Data Manipulation | Indexing, Slicing, Sorting, Boolean Masks, Fancy Indexing |
Phase 3 | Complete | Array Transformation | Reshaping, Stacking, Splitting, Broadcasting Rules |
Phase 4 | In Progress | Advanced Topics | Vector/Matrix Operations, Trigonometric Functions, Statistics, File Operations |
Numpy-Basic/
├── LICENSE                  # Apache 2.0 License
├── README.md                # Project documentation
├── main.py                  # Example runner script
├── pyproject.toml           # Project dependencies and configuration
├── uv.lock                  # Dependency lock file (for uv users)
├── Phase_1/
│   └── 01_phase_1.ipynb     # NumPy Basics: Arrays, Creation, Types
├── Phase_2/
│   └── 01_phase_2.ipynb     # Data Access: Indexing, Slicing, Filtering
├── Phase_3/
│   └── 01_phase_3.ipynb     # Data Transformation: Reshaping, Stacking, Broadcasting
└── Phase_4/
    ├── 01_phase_4.ipynb     # Advanced: Math Operations, Statistics, Visualization
    ├── array1.npy           # Sample data file for practice
    ├── array2.npy           # Sample data file for practice
    ├── array3.npy           # Sample data file for practice
    └── numpy_logo.npy       # NumPy logo encoded as an array
- Python 3.8+ installed
- Git (for cloning the repository)
- Basic familiarity with Python programming
- Clone the repository
git clone https://github.com/Sourabh-Kumar04/Numpy-Basic.git
cd Numpy-Basic
- Set up a virtual environment (Choose your preferred method)
# Option 1: Standard venv
python -m venv venv
source venv/bin/activate # On macOS/Linux
# OR
venv\Scripts\activate # On Windows
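# Option 2: Using uv (faster alternative)
uv venv # Creates .venv in the project directory
source .venv/bin/activate # On macOS/Linux (on Windows: .venv\Scripts\activate)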
- Install dependencies
# Option 1: Using pip
pip install -r requirements.txt
# Option 2: Using uv with pyproject.toml
uv pip install -e .
- Launch Jupyter Notebook
jupyter notebook
- Why NumPy over standard Python lists?
- Performance benchmarks showing speed differences
- Memory efficiency comparisons
- Vectorized operations
- Creating arrays from different sources
- From Python lists
- Using built-in functions: zeros(), ones(), arange(), linspace()
- Random number generation
- Understanding array data types and properties
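The array-creation topics listed above can be sketched in a few lines. This is a minimal illustration, not taken from the notebooks; the variable names and the seed value are arbitrary choices.

```python
import numpy as np

# From a Python list; NumPy infers the dtype from the values.
from_list = np.array([1, 2, 3, 4])

# Built-in constructors.
zeros = np.zeros((2, 3))            # 2x3 array of 0.0 (float64 by default)
ones = np.ones(4, dtype=np.int32)   # explicit dtype
steps = np.arange(0, 10, 2)         # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced points, endpoints included

# Random numbers via the Generator API; the seed is an arbitrary choice.
rng = np.random.default_rng(seed=42)
samples = rng.random(3)             # uniform floats in [0, 1)

print(zeros.shape, zeros.dtype)     # (2, 3) float64
print(steps)                        # [0 2 4 6 8]
print(grid)                         # [0.   0.25 0.5  0.75 1.  ]
print(ones.dtype, samples.shape)    # int32 (3,)
```

Passing dtype explicitly, as with ones above, is a reasonable habit whenever memory use or integer overflow matters.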
- Accessing array elements
- Basic indexing vs fancy indexing
- Difference between views and copies
- Slicing multi-dimensional arrays
- Advanced selection with boolean masks
- Filtering data with conditions
- Combining multiple conditions
- Practical comparison between np.where() and boolean indexing
- Sorting arrays and finding unique values
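As a rough sketch of the selection topics above, the example below combines conditions, contrasts boolean indexing with np.where(), and then sorts and deduplicates. The scores array is made-up sample data, not from the notebooks.

```python
import numpy as np

scores = np.array([72, 45, 90, 45, 61, 90, 38])

# Combine conditions with & (and), | (or), ~ (not); parentheses are required
# because these operators bind more tightly than comparisons.
mid_range = scores[(scores >= 45) & (scores < 90)]
print(mid_range)                    # [72 45 45 61]

# Boolean indexing drops non-matching elements; np.where keeps the shape
# and picks between two alternatives element by element.
capped = np.where(scores > 70, 70, scores)
print(capped)                       # [70 45 70 45 61 70 38]

# Fancy indexing with a list of integers selects by position.
print(scores[[0, 2, -1]])           # [72 90 38]

# np.sort returns a sorted copy; np.unique returns sorted distinct values.
print(np.sort(scores))              # [38 45 45 61 72 90 90]
values, counts = np.unique(scores, return_counts=True)
print(values, counts)               # [38 45 61 72 90] [1 2 1 1 2]
```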
- Inspecting array properties
- Shape, size, dimensions, data type
- Reshaping arrays with reshape(), ravel(), flatten()
- Adding/removing dimensions with newaxis and squeeze()
- Combining arrays
- Vertical stacking with vstack()
- Horizontal stacking with hstack()
- General stacking with concatenate()
- Broadcasting rules and compatibility
- When operations work between arrays of different shapes
- Common broadcasting errors and how to fix them
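The reshaping and stacking functions named above behave roughly as follows. This is a small sketch with arbitrary example arrays; it assumes contiguous input, which is when ravel() can return a view.

```python
import numpy as np

a = np.arange(6)                 # shape (6,)
m = a.reshape(2, 3)              # shape (2, 3); a view when possible

flat_copy = m.flatten()          # always a copy
flat_view = m.ravel()            # a view when the memory layout allows it

col = a[:, np.newaxis]           # shape (6, 1): add an axis
back = col.squeeze()             # shape (6,): drop length-1 axes

top = np.array([[1, 2, 3]])
bottom = np.array([[4, 5, 6]])
stacked_v = np.vstack((top, bottom))               # rows on top of each other
stacked_h = np.hstack((top, bottom))               # side by side
stacked_c = np.concatenate((top, bottom), axis=0)  # same result as vstack here

print(stacked_v.shape)           # (2, 3)
print(stacked_h.shape)           # (1, 6)
print(stacked_c.shape)           # (2, 3)
```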
- Vector, matrix, and tensor operations
- Dot products, cross products, matrix multiplication
- Linear algebra operations
- Comprehensive angle function reference
- Trigonometric functions (sin, cos, tan)
- Inverse trigonometric functions (arcsin, arccos, arctan2)
- Statistical functions for data analysis
- Measures of central tendency
- Measures of dispersion
- Percentiles and quantiles
- Working with NumPy's native file formats
- .npy for single arrays
- .npz for multiple arrays
- Data visualization with matplotlib
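A brief sketch of the vector, matrix, and angle operations listed above. The vectors and matrices are arbitrary examples chosen so the results are easy to check by hand.

```python
import numpy as np

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

dot = np.dot(u, v)         # 0.0, because the vectors are orthogonal
cross = np.cross(u, v)     # [0. 0. 1.], perpendicular to both inputs

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
product = A @ B            # matrix product; B swaps the columns of A
det = np.linalg.det(A)     # approximately -2.0 (floating-point result)

# arctan2(y, x) uses the signs of both arguments to pick the quadrant.
angle = np.arctan2(1.0, -1.0)

print(dot, cross)
print(product)             # [[2 1]
                           #  [4 3]]
print(det)
print(np.degrees(angle))   # about 135 degrees
```

Note that `*` on two arrays is element-wise multiplication; `@` (or np.matmul) is the matrix product.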
import numpy as np
import time
# Python list operation (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
python_list = list(range(1000000))
python_list = [x * 2 for x in python_list]
list_time = time.perf_counter() - start
# NumPy array operation
start = time.perf_counter()
numpy_array = np.arange(1000000)
numpy_array = numpy_array * 2
numpy_time = time.perf_counter() - start
print(f"Python list processing time: {list_time:.5f} seconds")
print(f"NumPy array processing time: {numpy_time:.5f} seconds")
print(f"NumPy is {list_time/numpy_time:.1f}x faster!")
Output (example run; actual timings vary by machine and NumPy version)
Python list processing time: 0.12345 seconds
NumPy array processing time: 0.00567 seconds
NumPy is 21.8x faster!
import numpy as np
# Create sample data
data = np.random.randint(0, 100, size=(5, 5))
print("Original data:")
print(data)
# Boolean masking (values greater than 50)
mask = data > 50
filtered_data = data[mask]
print("\nValues greater than 50:")
print(filtered_data)
# Using np.where() for conditional values
result = np.where(data > 50, data * 2, data)
print("\nValues > 50 doubled, others unchanged:")
print(result)
import matplotlib.pyplot as plt
import numpy as np
# Set dark style
plt.style.use('dark_background')
# Generate data
x = np.linspace(0, 2 * np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create plot
plt.figure(figsize=(10, 6))
plt.plot(x, y1, label='sin(x)', color='cyan', linewidth=2)
plt.plot(x, y2, label='cos(x)', color='magenta', linewidth=2)
plt.title("Trigonometric Functions", fontsize=16)
plt.xlabel("x (radians)", fontsize=12)
plt.ylabel("Amplitude", fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Function | Description | Example |
---|---|---|
np.mean() | Arithmetic mean | np.mean(arr, axis=0) |
np.median() | Median value | np.median(arr) |
np.std() | Standard deviation (ddof=1 gives the sample estimate) | np.std(arr, ddof=1) |
np.var() | Variance | np.var(arr) |
np.min() | Minimum value | np.min(arr, axis=1) |
np.max() | Maximum value | np.max(arr) |
np.percentile() | q-th percentile (q from 0 to 100) | np.percentile(arr, 75) |
np.quantile() | q-th quantile (q from 0 to 1) | np.quantile(arr, [0.25, 0.5, 0.75]) |
np.corrcoef() | Correlation coefficient matrix | np.corrcoef(x, y) |
np.cov() | Covariance matrix | np.cov(x, y) |
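To make the axis and ddof arguments in the table concrete, here is a small sketch on a hand-picked 2x3 array (the data is illustrative only).

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

overall = np.mean(data)             # 3.5, over all elements
per_column = np.mean(data, axis=0)  # [2.5 3.5 4.5], collapses the rows
per_row = np.mean(data, axis=1)     # [2. 5.], collapses the columns

# ddof=0 (the default) divides by N; ddof=1 divides by N - 1.
pop_var = np.var(data[0])           # 2/3 for [1, 2, 3]
sample_var = np.var(data[0], ddof=1)  # 1.0

median = np.percentile(data, 50)    # 3.5, same as np.median(data)
same = np.quantile(data, 0.5)       # 3.5, quantile uses a 0-1 scale

print(overall, per_column, per_row)
print(pop_var, sample_var)
print(median, same)
```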
import numpy as np
# Create sample array
array = np.random.normal(0, 1, size=(100, 100))
# Save to .npy file
np.save('sample_array.npy', array)
# Load from .npy file
loaded_array = np.load('sample_array.npy')
# Verify it's the same
print("Arrays are identical:", np.array_equal(array, loaded_array))
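For several arrays in one file, the .npz format works along similar lines. The sketch below is an assumption-labeled illustration: the file name bundle.npz and the keys first and second are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

a = np.arange(5)
b = np.linspace(0.0, 1.0, 3)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.npz")   # hypothetical file name
    np.savez(path, first=a, second=b)        # keyword names become archive keys

    with np.load(path) as bundle:            # closes the archive on exit
        keys = sorted(bundle.files)
        a_loaded = bundle["first"]
        b_loaded = bundle["second"]

print(keys)                                  # ['first', 'second']
print(np.array_equal(a, a_loaded))           # True
print(np.array_equal(b, b_loaded))           # True
```

np.savez_compressed takes the same arguments and trades some speed for smaller files.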
Why use NumPy instead of Python lists?
NumPy arrays are more efficient than Python lists for numerical operations because:
- They store data of a single type in contiguous memory blocks
- They support vectorized operations that run in compiled loops (which can use SIMD instructions)
- They offer specialized numerical functions optimized in C
- They use less memory for the same amount of numerical data
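The memory claim can be checked with a rough estimate like the one below. It assumes CPython, where a list holds pointers to separate int objects; the list figure is an approximation, not an exact accounting.

```python
import sys
import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers; each int is a separate Python object on top of that.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes     # 8 bytes per int64 element

print(f"list:  ~{list_bytes / 1e6:.1f} MB (estimate)")
print(f"array:  {array_bytes / 1e6:.1f} MB")
```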
What's the difference between a view and a copy?
- A view is just a different way to access the same data - changes to the view affect the original array
- A copy is a new array with the same values - changes to the copy don't affect the original
- Basic slicing typically returns views, while advanced indexing returns copies
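A short sketch of the difference, using np.shares_memory() and the .base attribute to confirm which result is a view:

```python
import numpy as np

original = np.arange(5)           # [0 1 2 3 4]

view = original[1:4]              # basic slice: a view
view[0] = 99
print(original)                   # [ 0 99  2  3  4]

picked = original[[1, 2, 3]]      # advanced (integer-list) indexing: a copy
picked[0] = -1
print(original)                   # [ 0 99  2  3  4] (unchanged)

print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, picked))  # False
print(view.base is original)               # True
```

When an independent array is needed from a slice, calling .copy() on it makes the intent explicit.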
What are broadcasting rules?
Broadcasting allows NumPy to perform operations on arrays of different shapes. The rules are:
- Shapes are compared dimension by dimension, starting from the trailing (rightmost) dimensions
- Missing leading dimensions are treated as having size 1
- Two dimensions are compatible when they are equal or when one of them is 1
- Dimensions with size 1 are stretched to match the other array
- If any pair of dimensions is incompatible, NumPy raises a ValueError
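The rules can be tried out on a few shapes, as in this minimal sketch. It assumes NumPy 1.20 or newer for np.broadcast_shapes(); the arrays themselves are arbitrary.

```python
import numpy as np

matrix = np.ones((3, 4))
row = np.arange(4)                 # shape (4,), treated as (1, 4)
col = np.arange(3)[:, np.newaxis]  # shape (3, 1)

print((matrix + row).shape)        # (3, 4)
print((matrix + col).shape)        # (3, 4)
print((row + col).shape)           # (3, 4): both operands are stretched

# np.broadcast_shapes reports the result shape without allocating arrays.
result_shape = np.broadcast_shapes((3, 1), (4,))
print(result_shape)                # (3, 4)

# Shapes (3, 4) and (3,) are incompatible: trailing dimensions 4 and 3 differ.
try:
    matrix + np.arange(3)
except ValueError as exc:
    print("broadcast error:", exc)
```

A common fix for the failing case is to add an axis, for example np.arange(3)[:, np.newaxis], so the shapes become (3, 4) and (3, 1).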
- Official NumPy Documentation
- NumPy Cheat Sheet (PDF)
- From Python to NumPy
- NumPy Tutorials
- 100 NumPy Exercises
Contributions to improve this repository are welcome! Here's how you can help:
- Fork the repository
- Create a branch for your feature or fix
- Commit your changes with descriptive messages
- Push to your branch
- Submit a Pull Request
git commit -m "✨ Added Phase_3: Array reshaping and broadcasting examples"
Emoji | Description |
---|---|
✨ | New features or content |
🐛 | Bug fixes |
📝 | Documentation updates |
🔧 | Configuration changes |
🧹 | Code cleanup |
🎨 | Style improvements |
- GitHub Discussions: Open a discussion
- Issue Tracker: Report bugs or request features
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
If you use this repository in your research or educational materials, please cite it as:
@misc{kumar2025numpybasic,
author = {Kumar, Sourabh},
title = {NumPy-Basic: Comprehensive Guide to NumPy Fundamentals},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Sourabh-Kumar04/Numpy-Basic},
howpublished = {\url{https://github.com/Sourabh-Kumar04/Numpy-Basic}},
}
- Author: Sourabh Kumar
- GitHub: @Sourabh-Kumar04
- LinkedIn: linkedin.com/in/sourabh-kumar04