20 Examples to Master Pandas Series
A core data structure of Pandas
Pandas is a highly popular data analysis and manipulation library for Python. It provides versatile and powerful functions to handle data in tabular form.
The two core data structures of Pandas are DataFrame and Series. DataFrame is a two-dimensional structure with labelled rows and columns. It is similar to a SQL table. Series is a one-dimensional labelled array. The labels of the values in a Series are referred to as the index. Both DataFrame and Series are able to store any data type.
In this article, we will go through 20 examples that demonstrate various operations we can perform on a Series.
Let’s first import the libraries and then start with the examples.
import numpy as np
import pandas as pd
1. DataFrame is composed of Series
An individual row or column of a DataFrame is a Series.
Consider a DataFrame named df. If we select a particular row or column, the returned data structure is a Series.
a = df.iloc[0, :]
print(type(a))
pandas.core.series.Series
b = df[0]
type(b)
pandas.core.series.Series
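As a minimal sketch, assuming a small DataFrame with default integer column labels (the values are made up for illustration), we can check that both selections return a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 DataFrame; the default column labels are 0, 1, 2
df = pd.DataFrame(np.arange(9).reshape(3, 3))

row = df.iloc[0, :]   # first row, selected by position
col = df[0]           # column labelled 0

print(isinstance(row, pd.Series))
print(isinstance(col, pd.Series))
```

Both checks print True, because a single row and a single column are each one-dimensional.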
2. Series consists of values and index
Series is a labelled array. We can access both the values and the labels, which are referred to as the index.
ser = pd.Series(['a','b','c','d','e'])
print(ser.index)
RangeIndex(start=0, stop=5, step=1)
print(ser.values)
['a' 'b' 'c' 'd' 'e']
3. Index can be customized
As we saw in the previous example, an integer index starting from zero is assigned to a Series by default. However, we can change it using the index parameter.
ser = pd.Series(['a','b','c','d','e'], index=[10,20,30,40,50])
print(ser.index)
Int64Index([10, 20, 30, 40, 50], dtype='int64')
4. Series from a list
We have already seen this in the previous examples. A list can be passed to the Series function to create a Series.
list_a = ['data', 'science', 'machine', 'learning']
ser = pd.Series(list_a)
type(ser)
pandas.core.series.Series
5. Series from a NumPy array
Another common way to create a Series is using a NumPy array. It is just like creating from a list. We only change the data passed to the Series function.
arr = np.random.randint(0, 10, size=50)
ser = pd.Series(arr)
6. Accessing individual values
Since a Series contains labelled items, we can access a particular item using its label (i.e. the index).
ser = pd.Series(['a','b','c','d','e'])
print(ser[0])
a
print(ser[2])
c
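With the default index, labels and positions happen to coincide. The sketch below uses a custom integer index (an assumption for illustration) to show the difference between label-based access and position-based access through loc and iloc:

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

first_by_label = ser[10]          # label-based lookup on an integer index
second_by_label = ser.loc[20]     # explicit label-based lookup
first_by_position = ser.iloc[0]   # explicit position-based lookup

print(first_by_label, second_by_label, first_by_position)
```

Using loc and iloc makes the intent explicit, which helps when the index itself consists of integers.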
7. Slicing a Series
We can also use the index to slice a Series.
ser = pd.Series(['a','b','c','d','e'])
print(ser[:3])
0 a
1 b
2 c
dtype: object
print(ser[2:])
2 c
3 d
4 e
dtype: object
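Slicing behaves differently depending on whether we slice by position or by label. The following sketch, which assumes a Series with string labels chosen for illustration, shows that iloc excludes the end position while loc includes the end label:

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c', 'd', 'e'],
                index=['v', 'w', 'x', 'y', 'z'])

by_position = ser.iloc[1:3]    # positions 1 and 2; the end is excluded
by_label = ser.loc['w':'y']    # labels w, x and y; the end is included

print(by_position)
print(by_label)
```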
8. Data types
Pandas assigns an appropriate data type when creating a Series. We can change it using the dtype parameter. Of course, an appropriate data type needs to be selected.
ser1 = pd.Series([1,2,3,4,5])
print(ser1)
0 1
1 2
2 3
3 4
4 5
dtype: int64
ser2 = pd.Series([1,2,3,4,5], dtype='float')
print(ser2)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
9. Number of items in a Series
There are multiple ways to count the number of values in a Series. Since it is a collection, we can use the built-in len function of Python.
ser = pd.Series([1,2,3,4,5])
len(ser)
5
We can also use the size and shape attributes of a Series.
ser.size
5
ser.shape
(5,)
The shape attribute returns the size in each dimension as a tuple. Since a Series is one-dimensional, shape contains only its length. The size attribute returns the total number of elements in a Series or DataFrame. If used on a DataFrame, size returns the product of the number of rows and columns.
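To make the difference concrete, here is a small sketch comparing the two attributes on a Series and on a DataFrame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical DataFrame with 3 rows and 2 columns
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
ser = df['x']

print(ser.shape, ser.size)   # one dimension, three elements
print(df.shape, df.size)     # rows and columns, then rows * columns
```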
10. Unique and Nunique
The unique and nunique functions return the unique values and the number of unique values, respectively.
ser = pd.Series(['a','a','a','b','b','c'])
ser.unique()
array(['a', 'b', 'c'], dtype=object)
ser.nunique()
3
11. Largest and smallest values
The nlargest and nsmallest functions return the largest and smallest values in a Series. We get the 5 largest or smallest values by default but it can be changed using the n parameter.
ser = pd.Series(np.random.random(size=500))
ser.nlargest(n=3)
292 0.997681
236 0.997140
490 0.996117
dtype: float64
ser.nsmallest(n=2)
157 0.001499
140 0.002313
dtype: float64
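Because the example above uses random numbers, its output changes on every run. A deterministic sketch with a few fixed values (chosen here for illustration) shows that the results are sorted and keep their original index labels:

```python
import pandas as pd

ser = pd.Series([7, 2, 9, 4, 1])

top2 = ser.nlargest(n=2)       # largest values first
bottom2 = ser.nsmallest(n=2)   # smallest values first

print(top2)
print(bottom2)
```

The index of each result tells us where the value sits in the original Series.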
12. Series from a dictionary
If we pass a dictionary to the Series function, the returned Series contains the values of the dictionary. The keys of the dictionary become the index.
dict_a = {'a':1, 'b':2, 'c':8, 'd':5}
pd.Series(dict_a)
a 1
b 2
c 8
d 5
dtype: int64
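The index parameter can be combined with a dictionary as well. In the sketch below, the label 'z' is a made-up key that does not exist in the dictionary, so Pandas fills it with a missing value:

```python
import pandas as pd

dict_a = {'a': 1, 'b': 2, 'c': 8, 'd': 5}

# Only the requested labels are kept, in the requested order;
# 'z' is not a key of dict_a, so its value becomes NaN
ser = pd.Series(dict_a, index=['d', 'a', 'z'])

print(ser)
```

Since NaN is a floating point value, the resulting Series has a float dtype instead of an integer one.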
13. Converting data type
We have the option to choose a data type when creating a Series. Pandas allows for changing the data type later on as well.
For instance, the following Series contains integers stored as strings, so its dtype is object. We can use the astype function to convert them to integers.
ser = pd.Series(['1','2','3','4'])
ser
0 1
1 2
2 3
3 4
dtype: object
ser.astype('int')
0 1
1 2
2 3
3 4
dtype: int64
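The astype function raises an error if a value cannot be converted. As a hedged alternative for messy data, the sketch below uses pd.to_numeric with errors='coerce', which turns unparseable entries into missing values (the 'x' entry is made up for illustration):

```python
import pandas as pd

ser = pd.Series(['1', '2', 'x', '4'])

# Entries that cannot be parsed as numbers become NaN
nums = pd.to_numeric(ser, errors='coerce')

print(nums)
```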
14. Number of occurrences of values
The value_counts function returns the number of occurrences of each unique value in a Series. It is useful to get an overview of the distribution of values.
ser = pd.Series(['a','a','a','b','b','c'])
ser.value_counts()
a 3
b 2
c 1
dtype: int64
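If we are interested in proportions rather than raw counts, the normalize parameter can be used. A minimal sketch on the same data:

```python
import pandas as pd

ser = pd.Series(['a', 'a', 'a', 'b', 'b', 'c'])

# Share of each unique value instead of its count
shares = ser.value_counts(normalize=True)

print(shares)
```

The shares add up to 1, and the most frequent value still comes first.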
15. From series to list
Just like we can create a Series from a list, it is possible to convert a Series to a list.
ser = pd.Series(np.random.randint(10, size=10))
ser.to_list()
[8, 9, 0, 0, 7, 1, 8, 6, 0, 8]
16. Null values
A Series is likely to contain missing values. Pandas makes it very simple to detect and deal with them.
For instance, the count function returns the number of non-missing values in a Series.
ser = pd.Series([1, 2, 3, np.nan, np.nan])
ser.count()
3
17. Null values — 2
Another way to detect missing values is the isna function. It returns a boolean Series in which True marks a missing value.
ser = pd.Series([1, 2, 3, np.nan, np.nan])
ser.isna()
0 False
1 False
2 False
3 True
4 True
dtype: bool
We can count the number of missing values by chaining the sum function with the isna function.
ser.isna().sum()
2
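Once missing values are detected, two common ways to deal with them are filling and dropping. The sketch below fills with the mean, which is just one possible choice, and drops the missing values as an alternative:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, np.nan, np.nan])

filled = ser.fillna(ser.mean())   # mean ignores NaN, so it is 2.0 here
dropped = ser.dropna()            # keeps only the non-missing values

print(filled)
print(dropped)
```

Both functions return a new Series and leave the original unchanged.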
18. Rounding floating point numbers
In data analysis, we are most likely to have numerical values. Pandas is highly capable of manipulating numerical data. For instance, the round function rounds floating point numbers to a specified number of decimal places.
Consider the following Series.
ser
0 0.349425
1 0.552831
2 0.104823
3 0.899308
4 0.825984
dtype: float64
Here is how the round function is used:
ser.round(2)
0 0.35
1 0.55
2 0.10
3 0.90
4 0.83
dtype: float64
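Note that round goes to the nearest value rather than always upward. As a small sketch using the values shown above, we can compare it with NumPy's ceil function, which is one way (an assumption here, not part of the example above) to force rounding up at two decimals:

```python
import numpy as np
import pandas as pd

ser = pd.Series([0.349425, 0.552831, 0.104823, 0.899308, 0.825984])

rounded = ser.round(2)               # nearest value at 2 decimals
ceiled = np.ceil(ser * 100) / 100    # always rounds up at 2 decimals

print(rounded)
print(ceiled)
```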
19. Comparison operators
We can apply comparison operators such as equal, less than, or greater than to a Series. They return a boolean Series in which True marks the values that satisfy the specified condition.
ser = pd.Series([1, 2, 3, 4])
ser.eq(3)
0 False
1 False
2 True
3 False
dtype: bool
ser.gt(2)
0 False
1 False
2 True
3 True
dtype: bool
The full list of these comparison functions:
- lt: Less than
- le: Less than or equal
- gt: Greater than
- ge: Greater than or equal
- eq: Equal
- ne: Not equal
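The boolean Series returned by these functions is commonly used as a mask to filter values. A minimal sketch, combining two conditions with the & operator:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4])

mask = ser.gt(2)                          # same result as ser > 2
greater_than_2 = ser[mask]                # keep values where mask is True
between = ser[ser.ge(2) & ser.lt(4)]      # 2 <= value < 4

print(greater_than_2)
print(between)
```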
20. Data aggregations
We can apply aggregate functions such as mean, sum, median and so on to a Series. One way is to apply them separately.
ser = pd.Series([1, 2, 3, 4, 10])
ser.mean()
4.0
There is a better way if we need to apply multiple aggregate functions. We can pass them in a list to the agg function.
ser.agg(['mean','median','sum', 'count'])
mean 4.0
median 3.0
sum 20.0
count 5.0
dtype: float64
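The result of agg is itself a Series indexed by the function names, so individual results can be looked up by label. A short sketch on the same data:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 10])

stats = ser.agg(['mean', 'median', 'sum', 'count'])

# Each aggregate is available under the name of its function
print(stats['mean'])
print(stats['median'])
```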
Conclusion
We have gone through 20 examples that demonstrate the properties of Series and the functions to interact with it. Series is just as important as DataFrame because a DataFrame is composed of Series.
The examples in this article cover a great deal of commonly used data operations with Series. There are, of course, more functions and methods to be used with Series. You can learn more advanced or detailed operations as you need them.
Thank you for reading. Please let me know if you have any feedback.