Using pandas with chunksize (Best for Large Files)

pythonsevenmentor
The chunksize parameter of pd.read_csv reads the CSV file in smaller portions, so the whole file never has to be loaded into memory at once.
import pandas as pd

# Define filter condition function
def filter_chunk(chunk):
    return chunk[chunk["column_name"] > 50]  # Example: Filter rows where column_name > 50

# Process in chunks and write to a new CSV
chunksize = 10000  # Adjust based on available memory
filtered_data = pd.concat(
    filter_chunk(chunk)
    for chunk in pd.read_csv("large_file.csv", chunksize=chunksize)
)

# Save filtered data
filtered_data.to_csv("filtered_file.csv", index=False)
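
If the filtered result is itself too large to hold in memory, each chunk can be written to the output file as soon as it is filtered instead of concatenating everything first. A minimal sketch, assuming the same placeholder file names and filter column as above:

import pandas as pd

# Streaming variant: filter each chunk and append it to the output CSV
# immediately, so only one chunk is ever held in memory at a time.
chunksize = 10000  # Adjust based on available memory
first_chunk = True

for chunk in pd.read_csv("large_file.csv", chunksize=chunksize):
    filtered = chunk[chunk["column_name"] > 50]  # Placeholder filter condition
    # Write the header only for the first chunk, then append without it
    filtered.to_csv(
        "filtered_file.csv",
        mode="w" if first_chunk else "a",
        header=first_chunk,
        index=False,
    )
    first_chunk = False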

Do visit: Python course in Pune