Pandas DataFrames - askthedev.com

Pandas is one of the most widely used Python libraries for data analysis and manipulation, known for its data structures and analysis tools. Its central feature is the DataFrame, which offers an efficient way to handle and process structured data. This article walks complete beginners through the basics of using DataFrames in Pandas with short, clear examples.

1. What is a DataFrame?

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Think of it like a spreadsheet in Excel or a SQL table.
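To make the definition concrete, the following minimal sketch builds a small DataFrame from made-up sample values (the name people and its contents are illustrative assumptions) and inspects the properties mentioned above: two dimensions, labeled axes, mutable size, and per-column types.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
people = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
})

print(people.shape)          # (rows, columns) -> (2, 2)
print(list(people.columns))  # column labels -> ['Name', 'Age']
print(list(people.index))    # row labels -> [0, 1]

# Size-mutable: adding a column changes the shape of the same object
people["Active"] = [True, False]
print(people.shape)          # (2, 3)

# Heterogeneous: each column carries its own dtype
print(people.dtypes)
```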

2. Creating a DataFrame

2.1 From a Dictionary

You can create a DataFrame from a dictionary where keys are the column names and values are lists of column values.


import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charles"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
print(df)

2.2 From a List of Tuples

Another method to create a DataFrame is from a list of tuples. Each tuple represents a row in the DataFrame.


data = [
    ("Alice", 25, "New York"),
    ("Bob", 30, "Los Angeles"),
    ("Charles", 35, "Chicago")
]

df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)

2.3 From a List of Lists

Creating a DataFrame from a list of lists is also possible. Each inner list forms one row in the DataFrame.


data = [["Alice", 25, "New York"], ["Bob", 30, "Los Angeles"], ["Charles", 35, "Chicago"]]

df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)

2.4 Using a CSV File

Pandas can read a CSV file directly into a DataFrame. The example below assumes a file named data.csv exists in the current working directory.


df = pd.read_csv("data.csv")
print(df)
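If you have no CSV file at hand, the sketch below may help. It feeds read_csv from an in-memory string through io.StringIO instead of a file on disk; the CSV content and the variable names are assumptions made for illustration. It also shows two commonly used optional parameters, usecols and index_col.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk
csv_text = (
    "Name,Age,City\n"
    "Alice,25,New York\n"
    "Bob,30,Los Angeles\n"
    "Charles,35,Chicago\n"
)

# The first line of the CSV is used as the column header by default
csv_df = pd.read_csv(io.StringIO(csv_text))
print(csv_df)

# Read only two columns and use "Name" as the row index
by_name = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Name", "Age"],
    index_col="Name",
)
print(by_name)
```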

3. Viewing Data in a DataFrame

To view the entire DataFrame, you can simply print it. However, for larger DataFrames, you may want to view just a specific number of rows.


print(df.head(3))  # Display the first 3 rows
print(df.tail(3))  # Display the last 3 rows
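As a small supplementary sketch, both methods also work without an argument, in which case they return 5 rows. The 10-row frame named numbers is a made-up example.

```python
import pandas as pd

# Hypothetical 10-row DataFrame holding the values 0 through 9
numbers = pd.DataFrame({"n": range(10)})

first_default = numbers.head()  # no argument: the first 5 rows
last_two = numbers.tail(2)      # the last 2 rows

print(first_default)
print(last_two)
```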

4. Selecting Data in a DataFrame

4.1 Selecting Rows

You can select rows by integer position with .iloc. The end of the slice is excluded, so 0:2 returns the first two rows:


selected_rows = df.iloc[0:2]  # Selects the first 2 rows
print(selected_rows)
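The difference between position and label is easier to see when the row labels are not the default 0, 1, 2. The sketch below uses a hypothetical DataFrame with string labels to compare .iloc (position-based) with .loc (label-based).

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
df_labeled = pd.DataFrame(
    {"Age": [25, 30, 35]},
    index=["alice", "bob", "charles"],
)

by_position = df_labeled.iloc[0:2]         # positions 0 and 1 (end excluded)
by_label = df_labeled.loc["alice":"bob"]   # labels alice through bob (end included)
single_row = df_labeled.iloc[0]            # a single row comes back as a Series

print(by_position)
print(by_label)
print(single_row)
```

Both slices return the same two rows here, even though one counts positions and the other names labels.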

4.2 Selecting Columns

To select a single column, use its label:


age_column = df["Age"]
print(age_column)
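To select several columns at once, you can pass a list of labels. In this sketch (df_cols is an illustrative name), a single label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample data mirroring the earlier examples
df_cols = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charles"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

one_column = df_cols["Age"]              # single label -> Series
two_columns = df_cols[["Name", "Age"]]   # list of labels -> DataFrame

print(type(one_column).__name__)   # Series
print(type(two_columns).__name__)  # DataFrame
```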

4.3 Selecting Specific Rows and Columns

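You can select specific rows and columns together with .loc (label-based) or .iloc (position-based). The example below uses .loc, whose slices include the end label:


specific_selection = df.loc[0:1, ["Name", "City"]]  # Selects rows labeled 0 through 1 (inclusive) and the specified columns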
print(specific_selection)
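For comparison, here is a sketch of the same selection written with both accessors, using made-up sample data. Note that the .iloc slice needs 0:2 to reach the second row, because its end position is excluded.

```python
import pandas as pd

# Hypothetical sample data with the default integer row labels
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charles"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

with_loc = people.loc[0:1, ["Name", "City"]]  # labels 0 through 1, columns by name
with_iloc = people.iloc[0:2, [0, 2]]          # positions 0 and 1, columns by position

print(with_loc.equals(with_iloc))  # True

# A single cell: row label 1, column "City"
print(people.loc[1, "City"])       # Los Angeles
```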

5. Modifying Data in a DataFrame

5.1 Adding a New Column

Adding a new column is straightforward:


df["Salary"] = [50000, 60000, 55000]
print(df)

5.2 Renaming a Column

Renaming a column can be done using the rename method:


df.rename(columns={"City": "Location"}, inplace=True)
print(df)

5.3 Dropping a Column

To drop a column, use the drop method:


df.drop("Salary", axis=1, inplace=True)
print(df)
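The examples above modify df in place. As an alternative sketch, the same three steps can be chained with methods that return a new DataFrame and leave the original untouched. The name staff and its values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data
staff = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Age": [25, 30],
    "City": ["New York", "Los Angeles"],
})

result = (
    staff
    .assign(Salary=[50000, 60000])          # add a column
    .rename(columns={"City": "Location"})   # rename a column
    .drop(columns="Salary")                 # drop a column
)

print(list(result.columns))  # ['Name', 'Age', 'Location']
print(list(staff.columns))   # ['Name', 'Age', 'City'] (original unchanged)
```

Whether to use inplace=True or reassignment is largely a matter of style; chaining keeps each intermediate step free of side effects.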

6. Filtering Data in a DataFrame

Filtering allows you to extract rows that meet certain conditions. For example, to filter for ages greater than 30:


filtered_df = df[df["Age"] > 30]
print(filtered_df)
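Conditions can also be combined. The sketch below, on made-up sample data, uses & (and), | (or), and isin. Each condition needs its own parentheses, because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charles"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# Both conditions must hold
both = people[(people["Age"] >= 30) & (people["City"] != "Chicago")]

# At least one condition must hold
either = people[(people["Age"] < 30) | (people["City"] == "Chicago")]

# Membership test against a list of allowed values
in_cities = people[people["City"].isin(["New York", "Chicago"])]

print(both)
print(either)
print(in_cities)
```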

7. Sorting Data in a DataFrame

Sorting a DataFrame is easily done with the sort_values method:


sorted_df = df.sort_values(by="Age")
print(sorted_df)
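By default sort_values sorts in ascending order and returns a new DataFrame. The following sketch, using made-up data with a tie in the Age column, shows descending order and sorting by more than one column.

```python
import pandas as pd

# Hypothetical sample data; Bob and Diana share the same age
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charles", "Diana"],
    "Age": [25, 30, 35, 30],
})

# Oldest first
oldest_first = people.sort_values(by="Age", ascending=False)

# Age descending, then Name ascending to break ties;
# reset_index(drop=True) renumbers the rows 0, 1, 2, ...
multi_sorted = (
    people
    .sort_values(by=["Age", "Name"], ascending=[False, True])
    .reset_index(drop=True)
)

print(oldest_first)
print(multi_sorted)
```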

8. Grouping Data in a DataFrame

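You can group rows by the values in a column and aggregate each group. The example below groups by Location (the City column was renamed in section 5.2) and passes numeric_only=True so that text columns such as Name are skipped:


grouped_df = df.groupby("Location").mean(numeric_only=True)  # Takes the mean of numerical columns for each location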
print(grouped_df)
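Grouping is more informative when a group contains more than one row. The sketch below uses a hypothetical DataFrame named records in which one city appears twice, and shows both a single-column aggregation and named aggregation with agg.

```python
import pandas as pd

# Hypothetical sample data; Chicago appears twice
records = pd.DataFrame({
    "City": ["Chicago", "Chicago", "Houston"],
    "Age": [30, 40, 28],
    "Salary": [50000, 70000, 60000],
})

# Mean of one column per group (returns a Series indexed by City)
mean_age = records.groupby("City")["Age"].mean()

# Several aggregations at once, each with its own output column name
summary = records.groupby("City").agg(
    avg_age=("Age", "mean"),
    max_salary=("Salary", "max"),
)

print(mean_age)
print(summary)
```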

9. Combining DataFrames

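To combine two DataFrames, you can use functions such as pd.concat or pd.merge. The example below stacks two DataFrames with pd.concat:


# Creating another DataFrame
# The column is named "Location" to match df after the rename in section 5.2;
# mismatched column names would produce extra columns filled with NaN
data2 = {
    "Name": ["Diana", "Edward"],
    "Age": [28, 32],
    "Location": ["Houston", "Phoenix"]
}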
df2 = pd.DataFrame(data2)

# Concatenating DataFrames
combined_df = pd.concat([df, df2], ignore_index=True)
print(combined_df)
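While concat stacks DataFrames on top of each other, merge joins them on a shared column, much like a SQL join. Here is a minimal sketch, assuming two made-up DataFrames (people and salaries) that share a Name column.

```python
import pandas as pd

# Hypothetical sample data; only Alice and Bob appear in both frames
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charles"],
    "Age": [25, 30, 35],
})
salaries = pd.DataFrame({
    "Name": ["Alice", "Bob", "Diana"],
    "Salary": [50000, 60000, 52000],
})

# Inner join (the default): keep only names present in both frames
inner = pd.merge(people, salaries, on="Name")

# Left join: keep every row of people; missing salaries become NaN
left = pd.merge(people, salaries, on="Name", how="left")

print(inner)
print(left)
```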

10. Conclusion

In summary, Pandas DataFrames provide a powerful and efficient way to manipulate structured data. This article covered the core tasks: creating, viewing, selecting, modifying, filtering, sorting, grouping, and combining DataFrames.

FAQ

Q: What is Pandas?
A: Pandas is an open-source library in Python used for data manipulation and analysis.
Q: What is the benefit of using a DataFrame?
A: DataFrames allow for easy storage and manipulation of large data sets in a tabular format.
Q: How can I visualize data in a DataFrame?
A: You can use libraries such as Matplotlib or Seaborn to create visualizations based on the data in a DataFrame.
