20 Examples to Master Pandas Series

A core data structure of Pandas

Pandas is a highly popular data analysis and manipulation library for Python. It provides versatile and powerful functions to handle data in tabular form.

The two core data structures of Pandas are DataFrame and Series. DataFrame is a two-dimensional structure with labelled rows and columns, similar to a SQL table. Series is a one-dimensional labelled array, and the labels of its values are referred to as the index. Both DataFrame and Series can store any data type.

In this article, we will go through 20 examples that demonstrate various operations we can perform on a Series.

Let’s first import the libraries and then start with the examples.

import numpy as np
import pandas as pd

1. DataFrame is composed of Series

An individual row or column of a DataFrame is a Series.

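Consider a small DataFrame with the default integer row and column labels. If we select a particular row or column, the returned data structure is a Series.

df = pd.DataFrame(np.random.randint(10, size=(4, 3)))  # sample data; columns are labelled 0, 1, 2
a = df.iloc[0, :]  # first row
print(type(a))
<class 'pandas.core.series.Series'>
b = df[0]  # column labelled 0
print(type(b))
<class 'pandas.core.series.Series'>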
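
A row and a column differ in what their index holds: a row's index is made of the column labels, while a column's index is the DataFrame's row index. The small DataFrame and its labels below are hypothetical, made up only for this illustration.

```python
import pandas as pd

# Hypothetical DataFrame, made up for this illustration
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}, index=['r1', 'r2', 'r3'])

row = df.loc['r1']  # a row: its index holds the column labels
col = df['x']       # a column: its index holds the row labels

print(row.index.tolist())  # ['x', 'y']
print(col.index.tolist())  # ['r1', 'r2', 'r3']
print(row.name, col.name)  # r1 x
```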

2. Series consists of values and index

Series is a labelled array. We can access both the values and their labels, which are referred to as the index.

ser = pd.Series(['a','b','c','d','e'])
print(ser.index)
RangeIndex(start=0, stop=5, step=1)
print(ser.values)
['a' 'b' 'c' 'd' 'e']
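
Besides the values attribute, a Series has a to_numpy method, which the pandas documentation suggests using instead of values when a NumPy array is needed. A minimal sketch, pulling out both the values and the index labels:

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c', 'd', 'e'])

arr = ser.to_numpy()         # the values as a NumPy array
labels = ser.index.tolist()  # the index labels as a plain Python list

print(arr)     # ['a' 'b' 'c' 'd' 'e']
print(labels)  # [0, 1, 2, 3, 4]
```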

3. Index can be customized

As we saw in the previous example, a Series is assigned a default integer index starting from zero. However, we can change it using the index parameter.

ser = pd.Series(['a','b','c','d','e'], index=[10,20,30,40,50])
print(ser.index)
Int64Index([10, 20, 30, 40, 50], dtype='int64')
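
The index can also be replaced after the Series is created, by assigning a new list of labels (of the same length) to the index attribute. The letter labels below are hypothetical, chosen only for illustration.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c', 'd', 'e'])

# Replace the default RangeIndex with new labels; the length must match
ser.index = ['v', 'w', 'x', 'y', 'z']

print(ser['x'])  # c
```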

4. Series from a list

We have already seen this in the previous examples. A list can be passed to the Series constructor to create a Series.

list_a = ['data', 'science', 'machine', 'learning']
ser = pd.Series(list_a)
type(ser)
pandas.core.series.Series

5. Series from a NumPy array

Another common way to create a Series is from a NumPy array. It works just like creating one from a list; only the data passed to the constructor changes.

arr = np.random.randint(0, 10, size=50)
ser = pd.Series(arr)

6. Accessing individual values

Since a Series contains labelled items, we can access a particular item using its label (i.e. its index value).

ser = pd.Series(['a','b','c','d','e'])
print(ser[0])
a
print(ser[2])
c
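
With the default integer index, an item's label and its position are the same number. With a custom index they differ, and the loc (label-based) and iloc (position-based) accessors make the intent explicit. This sketch reuses the index values from example 3.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'c', 'd', 'e'], index=[10, 20, 30, 40, 50])

print(ser.loc[20])  # label 20 -> b
print(ser.iloc[0])  # position 0 -> a
print(ser[20])      # with an integer index, [] looks up labels -> b
```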

7. Slicing a Series

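We can also slice a Series with the same syntax used for Python lists. On a Series with the default integer index, this slicing works by position and the end point is excluded, just like a list.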

ser = pd.Series(['a','b','c','d','e'])
print(ser[:3])
0 a
1 b
2 c
dtype: object
print(ser[2:])
2 c
3 d
4 e
dtype: object
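
Slicing by label works differently: with loc, the end label is included. The string labels below are hypothetical, chosen so that labels and positions differ.

```python
import pandas as pd

# Hypothetical string labels, chosen so that labels and positions differ
ser = pd.Series(['a', 'b', 'c', 'd', 'e'], index=['v', 'w', 'x', 'y', 'z'])

print(ser.iloc[:3].tolist())      # positions 0, 1, 2 (end excluded): ['a', 'b', 'c']
print(ser.loc['v':'x'].tolist())  # labels 'v' through 'x' (end included): ['a', 'b', 'c']
```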

8. Data types

Pandas infers an appropriate data type when creating a Series. We can override it using the dtype parameter, as long as the values can be represented in the chosen type.

ser1 = pd.Series([1,2,3,4,5])
print(ser1)
0 1
1 2
2 3
3 4
4 5
dtype: int64
ser2 = pd.Series([1,2,3,4,5], dtype='float')
print(ser2)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
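
To see why the chosen dtype has to fit the data, we can request an integer dtype for strings that are not numbers. This is a sketch; the exact exception message depends on the pandas and NumPy versions, so both ValueError and TypeError are caught.

```python
import pandas as pd

try:
    # Letters cannot be represented as integers, so construction fails
    pd.Series(['a', 'b', 'c'], dtype='int')
    cast_failed = False
except (ValueError, TypeError) as err:
    cast_failed = True
    print(f'{type(err).__name__}: {err}')
```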

9. Number of items in a Series

There are multiple ways to count the number of values in a Series. Since it is a collection, we can use the built-in len function of Python.

ser = pd.Series([1,2,3,4,5])
len(ser)
5

We can also use the size and shape attributes of a Series.

ser.size
5
ser.shape
(5,)

The shape attribute returns a tuple with the size of each dimension. Since a Series is one-dimensional, the tuple holds only its length. The size attribute returns the total number of elements in a Series or DataFrame, so on a DataFrame it equals the number of rows multiplied by the number of columns.
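
The difference between shape, size, and len shows up on a DataFrame. A minimal sketch with a hypothetical 3 by 2 DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})  # hypothetical 3x2 DataFrame

print(df.shape)  # (3, 2): rows, columns
print(df.size)   # 6: 3 rows * 2 columns
print(len(df))   # 3: len counts rows only
```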

10. Unique and Nunique

The unique and nunique functions return the unique values and the number of unique values, respectively.

ser = pd.Series(['a','a','a','b','b','c'])
ser.unique()
array(['a', 'b', 'c'], dtype=object)
ser.nunique()
3
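
Missing values are treated differently by the two functions: unique keeps NaN among the returned values, while nunique skips it unless dropna=False is passed. A short sketch:

```python
import numpy as np
import pandas as pd

ser = pd.Series(['a', 'a', 'b', np.nan])

print(ser.unique())               # 'a', 'b' and the missing value
print(ser.nunique())              # 2: NaN is not counted by default
print(ser.nunique(dropna=False))  # 3: NaN is counted as a distinct value
```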

11. Largest and smallest values

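The nlargest and nsmallest functions return the largest and smallest values in a Series, along with their index labels. By default we get the 5 largest or smallest values, but this can be changed using the n parameter. Since the Series below holds random numbers, your output will differ.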

ser = pd.Series(np.random.random(size=500))
ser.nlargest(n=3)
292 0.997681
236 0.997140
490 0.996117
dtype: float64
ser.nsmallest(n=2)
157 0.001499
140 0.002313
dtype: float64

12. Series from a dictionary

If we pass a dictionary to the Series constructor, the values of the dictionary become the values of the Series and the keys become the index.

dict_a = {'a':1, 'b':2, 'c':8, 'd':5}
pd.Series(dict_a)
a 1
b 2
c 8
d 5
dtype: int64
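
If the index parameter is passed together with a dictionary, only the listed keys are kept, in the given order, and any label that is not a key gets NaN. In this sketch, 'z' is a hypothetical missing key; the NaN also forces the dtype to float.

```python
import pandas as pd

dict_a = {'a': 1, 'b': 2, 'c': 8, 'd': 5}

# Keep 'd' and 'a' in that order; 'z' is not a key, so it becomes NaN
ser = pd.Series(dict_a, index=['d', 'a', 'z'])
print(ser)
```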

13. Converting data type

We have the option to choose a data type when creating a Series. Pandas allows for changing the data type later on as well.

For instance, the following Series contains numbers stored as strings, so its dtype is object. We can use the astype function to convert them to integers.

ser = pd.Series(['1','2','3','4'])
ser
0 1
1 2
2 3
3 4
dtype: object
ser.astype('int')
0 1
1 2
2 3
3 4
dtype: int64
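
astype raises an error if any value cannot be converted. When some entries may not be numeric, pd.to_numeric with errors='coerce' is a common alternative: unparseable entries become NaN instead. The sample values below are hypothetical.

```python
import pandas as pd

ser = pd.Series(['1', '2', 'three', '4'])

# ser.astype('int') would fail on 'three';
# errors='coerce' turns it into NaN and converts the rest
nums = pd.to_numeric(ser, errors='coerce')
print(nums)
```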

14. Number of occurrences of values

The value_counts function returns the number of occurrences of each unique value in a Series. It is useful to get an overview of the distribution of values.

ser = pd.Series(['a','a','a','b','b','c'])
ser.value_counts()
a 3
b 2
c 1
dtype: int64
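
For an overview of the distribution, relative frequencies are often more useful than raw counts. Passing normalize=True to value_counts returns proportions that sum to 1:

```python
import pandas as pd

ser = pd.Series(['a', 'a', 'a', 'b', 'b', 'c'])

# normalize=True returns each value's share of the total instead of its count
props = ser.value_counts(normalize=True)
print(props.round(3))  # a 0.5, b 0.333, c 0.167
```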

15. From series to list

Just like we can create a Series from a list, it is possible to convert a Series to a list.

ser = pd.Series(np.random.randint(10, size=10))
ser.to_list()
[8, 9, 0, 0, 7, 1, 8, 6, 0, 8]
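
One detail worth knowing: to_list returns built-in Python scalars, whereas to_numpy keeps NumPy types. A quick check (the values are random, so only the types matter here):

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(10, size=10))

as_list = ser.to_list()    # Python list of built-in ints
as_array = ser.to_numpy()  # NumPy array that keeps a NumPy integer dtype

print(type(as_list[0]))   # <class 'int'>
print(type(as_array[0]))  # a NumPy integer type
```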

16. Null values

A Series may contain missing values. Pandas makes it very simple to detect and deal with them.

For instance, the count function returns the number of non-missing values in a Series.

ser = pd.Series([1, 2, 3, np.nan, np.nan])
ser.count()
3

17. Null values — 2

Another way to detect missing values is the isna function. It returns a boolean Series in which missing values are marked True.

ser = pd.Series([1, 2, 3, np.nan, np.nan])
ser.isna()
0 False
1 False
2 False
3 True
4 True
dtype: bool

We can count the missing values by chaining the sum function after isna, since each True is counted as 1.

ser.isna().sum()
2
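
Once missing values are detected, the two most common ways to deal with them are to drop them with dropna or to replace them with fillna. A minimal sketch; the fill value 0 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3, np.nan, np.nan])

print(ser.dropna().tolist())   # [1.0, 2.0, 3.0]: missing values removed
print(ser.fillna(0).tolist())  # [1.0, 2.0, 3.0, 0.0, 0.0]: missing values replaced
```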

18. Rounding floating point numbers

Numerical values are very common in data analysis, and Pandas is highly capable of manipulating them. For instance, the round function rounds floating point numbers to a specified number of decimal places.

Consider the following Series.

ser = pd.Series([0.349425, 0.552831, 0.104823, 0.899308, 0.825984])
ser
0 0.349425
1 0.552831
2 0.104823
3 0.899308
4 0.825984
dtype: float64

Here is how the round function is used:

ser.round(2)
0 0.35
1 0.55
2 0.10
3 0.90
4 0.83
dtype: float64
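
Note that round does not always round halves up. Like NumPy, it rounds values that are exactly halfway to the nearest even number, which this sketch makes visible with decimals left at its default of 0:

```python
import pandas as pd

halves = pd.Series([0.5, 1.5, 2.5, 3.5])

# Exact halves go to the nearest even integer: 0.5 -> 0.0 and 2.5 -> 2.0
print(halves.round().tolist())  # [0.0, 2.0, 2.0, 4.0]
```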

19. Comparison operators

We can apply comparison operators such as equal, less than, or greater than to a Series. They return a boolean Series in which the values that satisfy the condition are marked True.

ser = pd.Series([1, 2, 3, 4])
ser.eq(3)
0 False
1 False
2 True
3 False
dtype: bool
ser.gt(2)
0 False
1 False
2 True
3 True
dtype: bool

The full list of comparison methods:

  • lt: Less than
  • le: Less than or equal
  • gt: Greater than
  • ge: Greater than or equal
  • eq: Equal
  • ne: Not equal
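
Each method also has an operator form (gt is >, eq is ==, and so on), and the resulting boolean Series can be used to filter the original one. A short sketch:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4])

mask = ser > 2                 # same result as ser.gt(2)
print(mask.equals(ser.gt(2)))  # True

# A boolean Series selects the items marked True
print(ser[mask].tolist())  # [3, 4]

# Conditions combine with & (and) and | (or); the parentheses are required
print(ser[(ser > 1) & ser.lt(4)].tolist())  # [2, 3]
```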

20. Data aggregations

We can apply aggregate functions such as mean, sum, and median to a Series. One way is to call each of them separately.

ser = pd.Series([1, 2, 3, 4, 10])
ser.mean()
4.0

There is a better way if we need to apply multiple aggregate functions. We can pass them in a list to the agg function.

ser.agg(['mean','median','sum', 'count'])
mean 4.0
median 3.0
sum 20.0
count 5.0
dtype: float64
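
For a quick overview, the describe function computes a fixed set of summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) in a single call. A sketch on the same Series:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 10])

# describe returns a Series indexed by statistic name
summary = ser.describe()
print(summary)
```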

Conclusion

We have gone through 20 examples that demonstrate the properties of Series and the functions for working with them. A Series is just as important as a DataFrame because a DataFrame is composed of Series.

The examples in this article cover a great deal of commonly used data operations with Series. There are, of course, more functions and methods to be used with Series. You can learn more advanced or detailed operations as you need them.

Thank you for reading. Please let me know if you have any feedback.
