How to create a DataFrame in Pandas from a list and add columns

Imagine we have a list and we want to use it as a Pandas DataFrame in Python. How do we do that?

# Our list is a list of strings
fruit_list = ['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']
print(fruit_list)

And just for fun, I got my inspiration for these delicious fruits from this website: https://www.whateatly.com/category/fruits/

We have now created and printed our list of fruit names. The next question is how to turn this list into a DataFrame.

# First import the Pandas library
import pandas as pd

# Our list is a list of strings
fruit_list = ['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']
print(fruit_list)

# Turn fruit_list into a DataFrame
df = pd.DataFrame(fruit_list)
df

What have we done? Counting from the top of the script: on line 2 we import the pandas library as pd. On line 5 we create our list of strings, and on line 6 we print it. On line 9 we call the pd.DataFrame constructor with fruit_list as its argument and store the result as our DataFrame df. On line 10 we display the df DataFrame.

Let’s have a look at the result in the console:

['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']

               0
0         Apple
1        Banana
2        Cherry
3  Dragon Fruit
4    Elderberry

As you can see, line 1 of the output shows our assembly of fruits printed as a list, brackets and all. From line 3 on we see a table-like structure of our fruits, plus an index column with row numbers starting from 0. The 0 above the fruits is the column name, which Pandas also generates because we did not supply one. Pandas does this for us automatically, and it is a good help for getting an overview of the DataFrame.
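
If the automatic column name 0 is not what you want, the DataFrame constructor also accepts a columns argument. The snippet below is a minimal sketch of that idea; the name df_named and the column label 'Fruit' are chosen here for illustration only.

```python
import pandas as pd

fruit_list = ['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']

# Passing columns= names the single column instead of leaving it as 0
df_named = pd.DataFrame(fruit_list, columns=['Fruit'])

print(df_named.columns.tolist())  # ['Fruit']
print(df_named.shape)             # (5, 1)
```

The row index still starts at 0; only the column label changes.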

So now we know how to turn a list into a DataFrame. But there are of course more ways to turn lists into DataFrames:

Two column DataFrame from two lists

Let’s say that we already have our fruit_list and we want to turn it into a shopping list, so we add a quantity next to each item. I call this list number_items. It’s a bit funny to ask in a shop for 25 cherries and 50 elderberries, but that’s how I do it for this exercise.

# First import the Pandas library
import pandas as pd

# Our list is a list of strings called fruit_list
fruit_list = ['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']

# Create a list of numbers called number_items
number_items = [3, 4, 25, 2, 50]

# Zip fruit_list and number_items together, store the result in DataFrame df and
# give appropriate column names
df = pd.DataFrame(list(zip(fruit_list, number_items)), columns=['Fruit', 'Number'])
df

Which results in:

          Fruit  Number
0         Apple       3
1        Banana       4
2        Cherry      25
3  Dragon Fruit       2
4    Elderberry      50
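
Zipping is one way to pair the two lists. Another common option, sketched below, is to pass a dictionary that maps each column name to its list. The names df_zip and df_dict are illustrative only; both approaches should give the same table.

```python
import pandas as pd

fruit_list = ['Apple', 'Banana', 'Cherry', 'Dragon Fruit', 'Elderberry']
number_items = [3, 4, 25, 2, 50]

# The zip approach: one tuple per row, column names given separately
df_zip = pd.DataFrame(list(zip(fruit_list, number_items)), columns=['Fruit', 'Number'])

# The dictionary approach: one key per column, its list holds the values
df_dict = pd.DataFrame({'Fruit': fruit_list, 'Number': number_items})

print(df_dict)
```

The dictionary form keeps each column name next to its data, which can be easier to read when there are more than two lists.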

Adding a column to a DataFrame

Now let’s spice up our DataFrame df a bit by adding a column with a rating.

# Add a column 'Rating' with values [8, 9, 7, 9, 5]
df['Rating'] = [8, 9, 7, 9, 5]
df

And the result of our actions shows the added Rating column.

          Fruit  Number  Rating
0         Apple       3       8
1        Banana       4       9
2        Cherry      25       7
3  Dragon Fruit       2       9
4    Elderberry      50       5

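Something interesting is the difference between df and print(df). In a notebook such as Jupyter, ending a cell with df renders the DataFrame as a formatted table, while print(df) produces the plain-text version. In a regular script a bare df on its own line displays nothing, so use print(df) there.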

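Note that adding a column with values can also be done with the .assign() method. It returns a new DataFrame rather than changing df in place, which is why we assign the result back to df:

# Adding a column val to our DataFrame by using the .assign() method
df = df.assign(val=[324, 35, 645, 867, 78])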
df

With the result in the console:

          Fruit  Number  Rating  val
0         Apple       3       8  324
1        Banana       4       9   35
2        Cherry      25       7  645
3  Dragon Fruit       2       9  867
4    Elderberry      50       5   78
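
To make the behaviour of .assign() concrete, here is a small sketch on a shortened table. The names df_demo and df_extra are illustrative. It shows that the original DataFrame keeps its columns and that several columns can be added in one call.

```python
import pandas as pd

df_demo = pd.DataFrame({'Fruit': ['Apple', 'Banana'], 'Number': [3, 4]})

# .assign() returns a new DataFrame; df_demo itself is left untouched
df_extra = df_demo.assign(val=[324, 35], val2=1)

print(df_demo.columns.tolist())   # ['Fruit', 'Number']
print(df_extra.columns.tolist())  # ['Fruit', 'Number', 'val', 'val2']
```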

Adding a column with 1’s

In some cases it’s necessary to add a column with a fixed value. In this example we add a column val2 that holds a 1 on every row.

# Add a column val2 with value 1 on each row.
df['val2'] = 1
df

Which shows the new column val2 with values 1 in each row.

          Fruit  Number  Rating  val  val2
0         Apple       3       8  324     1
1        Banana       4       9   35     1
2        Cherry      25       7  645     1
3  Dragon Fruit       2       9  867     1
4    Elderberry      50       5   78     1
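
A single value is repeated on every row, but a list has to match the number of rows. The sketch below, using an illustrative three-row table called df_demo, shows both cases; the variable length_error is only there to capture the error message.

```python
import pandas as pd

df_demo = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Cherry']})

# A single value is repeated on every row
df_demo['val2'] = 1

# A list must have exactly one value per row, otherwise pandas raises a ValueError
try:
    df_demo['too_short'] = [1, 2]
    length_error = None
except ValueError as err:
    length_error = str(err)

print(df_demo['val2'].tolist())  # [1, 1, 1]
print(length_error)
```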

Adding a column with values based on a condition

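The next step is to add a column Review to this DataFrame, with value 1 if the Rating is 8 or higher and value 0 if the Rating is lower than 8. A convenient way to apply such a condition is NumPy’s where function, so we import NumPy as np. Note that the code below writes the values as the strings '1' and '0'; drop the quotes if you want integers.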

import numpy as np

# Adding a column Review with value 1 if Rating>=8 and 0 if Rating<8.
df['Review'] = np.where(df['Rating']>=8, '1', '0')
df

np.where evaluates the condition for the whole column at once, which makes NumPy a powerful companion to DataFrames. The result is:

          Fruit  Number  Rating  val  val2 Review
0         Apple       3       8  324     1      1
1        Banana       4       9   35     1      1
2        Cherry      25       7  645     1      0
3  Dragon Fruit       2       9  867     1      1
4    Elderberry      50       5   78     1      0
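
If the flag is meant to be counted or summed later, an integer column is usually more practical than text. The following sketch, on an illustrative table called ratings, shows two ways to get integers: np.where with unquoted values, and casting the boolean comparison directly.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({'Rating': [8, 9, 7, 9, 5]})

# Unquoted 1 and 0 give an integer column
ratings['Review_int'] = np.where(ratings['Rating'] >= 8, 1, 0)

# Equivalent: cast the boolean comparison to integers
ratings['Review_mask'] = (ratings['Rating'] >= 8).astype(int)

print(ratings['Review_int'].tolist())  # [1, 1, 0, 1, 0]
print(ratings['Review_int'].sum())     # 3
```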

Adding a column with several values

In the previous example we have seen how to make a new column with just two values, 1 and 0. However, what if we want several values, for instance ‘delicious’, ‘plain’ and ‘not good’? For this we define a function which classifies the column Rating as follows:

  • values below 6: not good;
  • values from 6 up to, but not including, 8: plain;
  • values of 8 and above: delicious.

# Define a function which helps classify the values in Rating
def f(row):
    if row['Rating'] < 6:
        classification = 'not good'
    elif row['Rating'] < 8:
        classification = 'plain'
    else:
        classification = 'delicious'
    return classification

# Create a new column Review2 by applying the function above to each row (axis=1)
df['Review2'] = df.apply(f, axis=1)
print(df)

The result of this is shown below:

          Fruit  Number  Rating  val  val2 Review    Review2
0         Apple       3       8  324     1      1  delicious
1        Banana       4       9   35     1      1  delicious
2        Cherry      25       7  645     1      0      plain
3  Dragon Fruit       2       9  867     1      1  delicious
4    Elderberry      50       5   78     1      0   not good

Use compared values in columns to create a new column

Imagine we ran two tests on the fruit and we only want to buy the best fruit. The second test gave different results. Where the second test scored at least as high as the first, we want to mark the fruit in a column Further_check as ‘yes’, and in all other cases as ‘no’. How do we do this?

# Add the new columns Test1 and Test2
df = df.assign(Test1=[7, 9, 5, 6, 6])
df = df.assign(Test2=[7, 10, 7, 8, 5])

# Create the new column 'Further_check' with value yes if the values of Test2
# are equal to or higher than Test1. Lower values are marked with no.
df['Further_check'] = np.where(df['Test2'] >= df['Test1'], 'yes', 'no')

Our DataFrame now looks as follows:

          Fruit  Number  Rating  val  val2 Review    Review2  Test1  Test2  \
0         Apple       3       8  324     1      1  delicious      7      7   
1        Banana       4       9   35     1      1  delicious      9     10   
2        Cherry      25       7  645     1      0      plain      5      7   
3  Dragon Fruit       2       9  867     1      1  delicious      6      8   
4    Elderberry      50       5   78     1      0   not good      6      5   

  Further_check  
0           yes  
1           yes  
2           yes  
3           yes  
4            no  
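
The np.where calls in this article each choose between two values. For three or more labels, such as the Review2 classification, np.select is a possible alternative to a row-by-row function. The sketch below is not the approach used above but a variation on it; the names ratings, conditions and labels are illustrative.

```python
import numpy as np
import pandas as pd

ratings = pd.DataFrame({'Rating': [8, 9, 7, 9, 5]})

# Conditions are checked in order; the first one that is True decides the label
conditions = [ratings['Rating'] < 6, ratings['Rating'] < 8]
labels = ['not good', 'plain']

# Rows matching none of the conditions (Rating of 8 and above) get the default
ratings['Review2'] = np.select(conditions, labels, default='delicious')

print(ratings['Review2'].tolist())
# ['delicious', 'delicious', 'plain', 'delicious', 'not good']
```

Because the conditions are evaluated on whole columns, this tends to be faster than apply on large tables, at the cost of being a little less explicit than a plain if/elif/else function.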

The last step in this article is to subset our DataFrame on the condition Further_check == 'yes'.

# Use .loc to select the rows where Further_check is 'yes' and keep only the Fruit column, store in df_check
df_check = df.loc[df['Further_check'] == 'yes', 'Fruit']

# Print df_check
print(df_check)

Resulting in our list of fruits that we definitely need to check further:

0           Apple
1          Banana
2          Cherry
3    Dragon Fruit
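
Selecting a single column with .loc gives back a pandas Series rather than a DataFrame. The sketch below, on an illustrative three-row table called df_demo, shows how to get a Series, a one-column DataFrame, or a plain Python list from the same filter.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Elderberry'],
    'Further_check': ['yes', 'yes', 'no'],
})
mask = df_demo['Further_check'] == 'yes'

as_series = df_demo.loc[mask, 'Fruit']    # single label: a Series
as_frame = df_demo.loc[mask, ['Fruit']]   # list of labels: a DataFrame
as_list = as_series.tolist()              # plain Python list

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_list)                   # ['Apple', 'Banana']
```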

So much fruit makes me want to take a bite. You can find the Python script here as a .zip file. I hope this helped you a bit further. If you have a question, feel free to reach out at info [ at ] hylkerozema.nl

Enjoy fruits!
