The universe of machine learning and data science is a fascinating one. On the surface, it may seem as though one is inundated with data in various forms, be it text, image, or voice. If dealt with properly, however, it makes for not just a great learning experience but a rather enjoyable one too!
No matter what data you’re dealing with, the first step should always be to make it analyzable. Dealing with such massive data also requires a large amount of computational power.
For this reason, efficient storage and manipulation of data is a key job of any tool. NumPy (Numerical Python) provides an efficient interface that allows one to store and operate on different kinds of numerical data. And that is what we are going to discuss in this blog. We are sure that you are going to enjoy learning about NumPy!
Introduction to NumPy:
Surely, you all know that Python is a great general-purpose programming language on its own. However, popular libraries like NumPy make it a powerful environment for scientific computing. In technical terms, NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. NumPy contains many other things as well:
- A powerful N-dimensional array object
- Broadcasting functions
- Tools for integrating C/C++
Difference between NumPy arrays and Python lists:
|NumPy Arrays||Python Lists|
|1. NumPy arrays are homogeneous (all elements are of the same type). Eg: np.array([1.0, 3.1, 5e-04])||1. Python lists are heterogeneous (elements can be of different types). Eg: [1, -0.038, 'gear', True]|
|2. NumPy arrays have various methods, functions, and attributes to ease complex computations.||2. Python lists have few methods compared to NumPy arrays.|
|3. Elements of a NumPy array are stored contiguously in memory. (More about this is given below).||3. A Python list stores references to its elements, and the elements themselves can be scattered in memory.|
Why should we use NumPy?
One simple answer to this question is that it definitely eases our programming life! Just imagine, if you had an efficient way of handling data for mathematical operations – wouldn’t you prefer it to writing the same on your own?
Let’s consider a quick example to understand it better. Here, we will compare Python’s built-in sum() function and NumPy's np.sum() function to compute the sum of all values in an array.
Python’s built-in sum function takes approximately 362 milliseconds per loop, whereas the np.sum function takes only 1.9 microseconds!
Note: The sum function and the np.sum function are not identical. np.sum is mostly useful for multi-dimensional arrays.
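The comparison described above can be reproduced with a rough sketch like the one below. The array size and the number of runs are arbitrary choices made for this illustration, and the timings depend on your machine, so they will not match the figures quoted above.

```python
import timeit

import numpy as np

# 100,000 random floats; the size is an arbitrary choice for this sketch.
values = np.random.rand(100_000)

runs = 3  # number of repetitions, also arbitrary

# Total time for `runs` calls of each approach.
builtin_time = timeit.timeit(lambda: sum(values), number=runs)
numpy_time = timeit.timeit(lambda: np.sum(values), number=runs)

print(f"built-in sum: {builtin_time / runs:.6f} s per run")
print(f"np.sum:       {numpy_time / runs:.6f} s per run")
```

Both calls compute the same total (up to floating-point rounding); only the time taken differs.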
Applications of NumPy:
- Mathematics and Statistics 📉
- Visualisations (NumPy along with libraries like Matplotlib)📊
- Machine Learning 🤖
- Image Processing 📸 and many more
We can use the pip command to install the NumPy package, either from the command prompt or from a Jupyter notebook (the leading ! is only needed inside a notebook):
! pip install numpy
By convention, we import NumPy as np for ease of coding. To get the version of NumPy, you can read the attribute np.__version__.
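A minimal sketch of the import convention and the version check, assuming NumPy is already installed:

```python
import numpy as np  # conventional alias used throughout this post

# __version__ is a plain string attribute of the package.
print(np.__version__)
```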
Section – 1: In this section, we will learn:
- How to create NumPy arrays from Python Lists
- How to create NumPy arrays from scratch
- How to create an array with a sequence of numbers
- How to create an array of random numbers
Creating arrays from Python lists:
A Python list can be converted into a NumPy array using the function numpy.array().
Note: In Python, a list can contain different data types, but in a NumPy array all elements must be of the same type. If they are not, NumPy will upcast them to a common type, if possible.
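As a small illustration of the conversion and of upcasting, here is a sketch; the variable names are made up for this example.

```python
import numpy as np

# A list of integers becomes an integer array.
int_array = np.array([1, 2, 3, 4])
print(int_array.dtype)  # a platform-dependent integer type

# Mixing integers and floats upcasts every element to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A nested list becomes a two-dimensional array.
nested = np.array([[1, 2, 3], [4, 5, 6]])
print(nested.shape)  # (2, 3)
```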
Creating N-Dimensional arrays using NumPy from scratch:
NumPy provides different methods to create different types of arrays, like:
- np.zeros # Creates an array filled with zeros
- np.ones # Creates an array filled with ones
- np.full # Creates an array of a specified shape filled with a given number
- np.arange # Creates an array of evenly spaced numbers from a start value up to, but not including, a stop value (it starts with zero by default)
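The creation functions listed above can be tried out as follows. The shapes and fill values are arbitrary choices for this sketch.

```python
import numpy as np

zeros = np.zeros(5)                # 1-D array of five 0.0 values
ones = np.ones((2, 3), dtype=int)  # 2x3 array of integer ones
sevens = np.full((2, 2), 7)        # 2x2 array where every element is 7
first_five = np.arange(5)          # integers 0, 1, 2, 3, 4

print(zeros)
print(ones)
print(sevens)
print(first_five)
```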
Let us now look at a visualization of how data is stored in a NumPy array:
The np.arange() function creates a sequence of numbers in the given range. It has four basic parameters:
- start: the first value of the sequence (defaults to 0)
- stop: the end of the sequence (the stop value itself is excluded)
- step: the spacing between consecutive values (defaults to 1)
- dtype: the data type of the array elements
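A short sketch of how these four parameters interact; the particular values are only illustrative.

```python
import numpy as np

# start=0, stop=10, step=2: the stop value 10 is not included.
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# A fractional step produces a float array.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]

# dtype forces the element type of the result.
as_float = np.arange(1, 4, dtype=float)
print(as_float.dtype)  # float64
```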
Creating arrays of random numbers:
The np.random module provides random samples. This module contains simple random data generating methods, distribution functions and random generator functions. Some functions in the np.random module are as follows:
- np.random.rand() # Generates uniformly distributed random numbers in the interval [0.0, 1.0)
- np.random.normal() # Generates normally distributed random numbers (mean 0 and variance 1 by default)
- np.random.randint() # Generates random integers from low (inclusive) to high (exclusive)
- np.random.random() # Generates random float values in the half-open interval [0.0, 1.0)
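Here is a sketch of these functions in use. The seed value 42 and the array sizes are arbitrary; np.random.seed() is called first so that repeated runs produce the same numbers.

```python
import numpy as np

np.random.seed(42)  # fix the seed so the results are repeatable

uniform = np.random.rand(3, 2)             # 3x2 uniform samples from [0, 1)
gaussian = np.random.normal(0, 1, size=4)  # 4 samples, mean 0, std 1
dice = np.random.randint(1, 7, size=10)    # 10 integers from 1 to 6
floats = np.random.random(3)               # 3 floats from [0, 1)

# Re-seeding with the same value reproduces the first draw.
np.random.seed(42)
repeat = np.random.rand(3, 2)
print(np.array_equal(uniform, repeat))  # True
```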
Note: The np.random functions generate different random numbers on each execution. To generate the same random numbers every time, set a seed with np.random.seed().
Section – 2: In this section, we will learn:
- How to manipulate NumPy arrays to access data and subarrays
- How to split, reshape, and join NumPy arrays
Each array has the following attributes – ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):
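For instance, on a three-dimensional array (the shape is chosen arbitrarily for this sketch):

```python
import numpy as np

grid = np.zeros((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns

print(grid.ndim)   # 3
print(grid.shape)  # (2, 3, 4)
print(grid.size)   # 24
```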
Indexing of NumPy arrays is similar to Python’s list indexing. In a 1D array, the ith value can be accessed by specifying the desired index in square brackets.
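A quick sketch with an illustrative array x:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

print(x[0])   # 10 (first element)
print(x[3])   # 40 (fourth element)
print(x[-1])  # 50 (negative indices count from the end)
```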
Indexing of N-Dimensional NumPy array:
In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices, one index per dimension.
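For a two-dimensional array, the first index selects the row and the second selects the column. The array m below is made up for this sketch.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(m[0, 0])   # 1 (row 0, column 0)
print(m[1, 2])   # 6 (row 1, column 2)
print(m[-1, 0])  # 7 (last row, column 0)
```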
Multi-dimensional NumPy array values can be modified using index notation, as specified above:
Note: Unlike Python lists, NumPy arrays have a fixed element type. If we assign a value of a different type, NumPy converts it to the array's type where it can; for example, a floating-point value assigned to an integer array is silently truncated.
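The sketch below modifies an integer array in place and shows the truncation that happens when a float is assigned to it. The values are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # integer array

m[0, 1] = 20    # same type: stored as-is
print(m[0, 1])  # 20

m[1, 0] = 9.99  # float into an integer array: the fraction is dropped
print(m[1, 0])  # 9
```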
NumPy Subarrays and Array Slicing:
NumPy slicing follows the standard Python list slicing. We use square brackets to access individual array elements and colon ( : ) to slice and access subarrays.
The general form is x[start:stop:step]. By default, start = 0, stop = size of the dimension, and step = 1.
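A few one-dimensional slices, using an illustrative array x:

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(x[:4])    # [0 1 2 3]    first four elements
print(x[4:7])   # [4 5 6]      index 4 up to, not including, 7
print(x[::2])   # [0 2 4 6 8]  every second element
print(x[::-1])  # [9 8 7 6 5 4 3 2 1 0]  reversed
```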
Slicing multi-dimensional NumPy array:
In a multi-dimensional NumPy array, [:, :] selects all rows and all columns. [:2, :] selects the rows from index 0 up to, but not including, index 2, together with all columns.
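The same ideas on a small 3x4 array (values chosen for illustration):

```python
import numpy as np

m = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print(m[:, :].shape)  # (3, 4)  all rows, all columns
print(m[:2, :])       # rows 0 and 1, all columns
print(m[:, 1])        # [1 5 9]  column 1 of every row
print(m[:2, ::2])     # rows 0 and 1, every second column
```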
A NumPy array can be copied using the copy() method.
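One way to copy an array is its copy() method. The sketch below assumes that is the approach meant here, and contrasts it with plain slicing, which returns a view that shares memory with the original array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

# A slice is a view: writing to it changes the original array.
view = m[:1, :2]
view[0, 0] = 99
print(m[0, 0])  # 99

# copy() returns independent data: the original is left untouched.
independent = m[:1, :2].copy()
independent[0, 0] = -1
print(m[0, 0])  # still 99
```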
Reshaping of NumPy arrays:
reshape is one of the most important functions in the NumPy library. It changes the shape of an array, for example converting an m-dimensional array into an n-dimensional array, without changing its data.
Note: In order to make reshape() work properly, the size of the initial array must match the size of the reshaped array. One of the common uses of this method is to convert a 1-dimensional array to an n-dimensional array.
Instead of using the reshape method, adding a dimension can be done more easily with np.newaxis inside an indexing expression.
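A sketch of both approaches on a nine-element array; the names are illustrative.

```python
import numpy as np

x = np.arange(1, 10)    # nine elements: 1..9

grid = x.reshape(3, 3)  # 9 elements fit exactly into a 3x3 array
print(grid.shape)       # (3, 3)

# np.newaxis inserts a new axis of length 1 at the given position.
row = x[np.newaxis, :]  # row vector
col = x[:, np.newaxis]  # column vector
print(row.shape)        # (1, 9)
print(col.shape)        # (9, 1)
```

Reshaping to a shape whose total size differs from 9, such as (2, 5), would raise a ValueError.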
NumPy array Concatenation:
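Concatenation, or joining, of two arrays in NumPy is primarily done using three functions:
- np.concatenate # Joins arrays along a given axis
- np.vstack # Vertical Stack
- np.hstack # Horizontal Stack
np.concatenate has two main arguments: the arrays to concatenate and the axis.
- axis = 0 # Concatenate row-wise, along the first axis (default)
- axis = 1 # Concatenate column-wise, along the second axis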
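A sketch of joining two 2x2 arrays. Alongside np.concatenate and np.hstack it also uses np.vstack, NumPy's vertical counterpart to np.hstack. The arrays a and b are made up for this example.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

rows = np.concatenate([a, b], axis=0)  # b's rows are placed below a's
cols = np.concatenate([a, b], axis=1)  # b's columns are placed beside a's
print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)

# For 2-D arrays, vstack and hstack give the same results.
print(np.array_equal(np.vstack([a, b]), rows))  # True
print(np.array_equal(np.hstack([a, b]), cols))  # True
```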
With this, we conclude the first part of our Basics of NumPy series. We hope we simplified it for you and made it enjoyable for you to read and learn. In the next part, we will cover a few interesting topics in depth, such as arithmetic functions, trigonometric functions, exponents and logarithms, minimum and maximum, etc.
Complete Jupyter Notebook: https://jovian.ml/v-snehith999/numpy-basics-part-1?action=embed&cellId=2
Basics of NumPy – Part 2: https://hub.jovian.ml/?p=654&preview=true
– Snehit Vaddi
I am a Machine Learning enthusiast. I teach machines how to see, listen, and learn.