Speeding up spatial filtering in GeoPandas

Unioning the lakes into one big multipolygon slows down any processing; you shouldn't do this if you don't have to. And checking whether each river intersects that multipolygon is slow because no spatial index is used.

I have a polygon df with ~21k lakes and a line df with ~42k rivers:

import geopandas as gpd
from timeit import default_timer
df_lakes = gpd.read_file(r'C:/Users/bera/Desktop/gistest/lakes_20k.gpkg')
df_rivers = gpd.read_file(r'C:/Users/bera/Desktop/gistest/river_42k.gpkg')
# print(df_lakes.shape[0])
# 20716
# print(df_rivers.shape[0])
# 41600

Your method:

#Union the polygons into one multipolygon, then check for each river whether it intersects the multipolygon
start = default_timer()
lakes_union = df_lakes.union_all() #Union the lakes into one geometry
union_time = default_timer()-start
print(f"Time to union: {round(union_time, 1)} s")
#Time to union: 11.6 s

inter_mask = df_rivers.geometry.intersects(lakes_union) #Check intersections
intersection_time = default_timer()-start-union_time
print(f"Time to check for intersections: {round(intersection_time,1)} s")
#Time to check for intersections: 37.9

df_rivers_intersecting_lakes = df_rivers[inter_mask] #Mask/select the intersecting rivers
selection_time = default_timer()-start-union_time-intersection_time
print(f"Time to select: {round(selection_time,1)} s")
#Time to select: 0.0 s

print(f"Total time: {round(default_timer()-start,1)} s")
#Total time: 49.5 s

print(df_rivers_intersecting_lakes.shape[0])
#13478 rows
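
If you want to keep this structure, a middle-ground sketch (assuming Shapely >= 2.0, where shapely.prepare builds an internal index on a geometry in place) is to prepare the union once before the vectorized intersects call:

import shapely
shapely.prepare(lakes_union) #Prepare in place (returns None); later predicate tests against it are faster
inter_mask = df_rivers.geometry.intersects(lakes_union)

But this only speeds up the predicate tests; the expensive union itself remains.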

Try a spatial join instead; it uses a spatial index:

start = default_timer()
df_rivers["unique_id"] = range(df_rivers.shape[0]) #Create a unique id
df_rivers_intersecting_lakes = df_rivers.sjoin(
    df=df_lakes[["geometry"]], how="inner", predicate="intersects") 
#Each river can intersect multiple lakes, so duplicates are created. Drop them by the unique id
df_rivers_intersecting_lakes = df_rivers_intersecting_lakes.drop_duplicates(
    subset=["unique_id"]).drop(columns=["unique_id", "index_right"])
print(f"Total time for spatial join: {round(default_timer()-start, 1)}")
#Total time for spatial join: 0.2

print(df_rivers_intersecting_lakes.shape[0])
#13478 rows

That's roughly 250 times faster (49.5 s vs 0.2 s).
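
You can also query the spatial index directly and skip the join machinery (a sketch, assuming GeoPandas >= 0.12, where GeoSeries.sindex.query accepts an array of geometries; older versions call this query_bulk):

import numpy as np
#Query the rivers' spatial index with all lake geometries at once.
#Returns a (2, n) array of positions: row 0 indexes df_lakes, row 1 indexes df_rivers
pairs = df_rivers.sindex.query(df_lakes.geometry, predicate="intersects")
#A river intersecting several lakes appears several times, so deduplicate
df_rivers_intersecting_lakes = df_rivers.iloc[np.unique(pairs[1])]

This gives the same selection as the sjoin without creating and dropping duplicate rows.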

And for intersection, difference, etc. you should use geopandas.overlay or geopandas.clip.
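
For example, a minimal clip sketch (note that clip returns the parts of the rivers inside the lakes, which is not the same as selecting whole intersecting rivers):

df_rivers_inside_lakes = gpd.clip(df_rivers, df_lakes) #River segments within lake polygons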

