Using pandas with chunksize (Best for Large Files)

pythonsevenmentor
The chunksize parameter makes pd.read_csv return an iterator of DataFrames instead of loading the whole file at once, so the CSV can be processed in smaller portions without exhausting memory.
import pandas as pd

# Define filter condition function
def filter_chunk(chunk):
    return chunk[chunk["column_name"] > 50]  # Example: Filter rows where column_name > 50

# Process in chunks and write to a new CSV
chunksize = 10000  # Adjust based on available memory
filtered_data = pd.concat(filter_chunk(chunk) for chunk in pd.read_csv("large_file.csv", chunksize=chunksize))

# Save filtered data
filtered_data.to_csv("filtered_file.csv", index=False)
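
Note that pd.concat still materializes the entire filtered result in memory, so this only helps when the filtered rows fit in RAM. If even the filtered output is too large, a fully streaming variant can append each filtered chunk to the output file as it is processed. A minimal sketch, assuming the same hypothetical large_file.csv and column_name as above:

import pandas as pd

chunksize = 10000  # Adjust based on available memory
first = True
for chunk in pd.read_csv("large_file.csv", chunksize=chunksize):
    filtered = chunk[chunk["column_name"] > 50]
    # Overwrite and write the header on the first chunk, then append without it
    filtered.to_csv("filtered_file.csv",
                    mode="w" if first else "a",
                    header=first, index=False)
    first = False

This keeps peak memory at roughly one chunk, at the cost of reopening the output file once per chunk.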
