Importing a GPKG File into a Database

Take the GeoPackage file of global city data as an example (GeoPackage is an SQLite-based format for geospatial data). Download link: https://geodata.ucdavis.edu/gadm/gadm4.1/gadm_410-gpkg.zip

import geopandas as gpd
from sqlalchemy import create_engine

# Read the whole GeoPackage into a GeoDataFrame
gdf = gpd.read_file("gadm_410.gpkg")

# Create the database engine
db_string = "postgresql://postgres:xxxxxx@127.0.0.1:5432/geo"
engine = create_engine(db_string)
total = len(gdf)
chunk_size = 1000

# The dataset is large and writing it all at once raises a memory error, so write in batches
for i in range(0, total, chunk_size):
    sub_gdf = gdf.iloc[i:i + chunk_size]
    # The first batch replaces any existing table; later batches append to it
    mode = "replace" if i == 0 else "append"
    sub_gdf.to_postgis(name="world_city", con=engine, if_exists=mode, index=False)
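
The batch boundaries used in the loop can be checked without a database by pulling them into a small generator. This is only a sketch: `chunk_plan` is a hypothetical helper name, not part of GeoPandas or SQLAlchemy, and it assumes the first batch should replace the table while every later batch appends to it, as in the code above.

```python
def chunk_plan(total, chunk_size):
    """Yield (start, stop, if_exists) for each batch of rows.

    Hypothetical helper: the first batch uses "replace" so the table is
    recreated, and every later batch uses "append".
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        yield start, stop, "replace" if start == 0 else "append"


# Possible use with the variables from the import script (assumed names):
# for start, stop, mode in chunk_plan(len(gdf), 1000):
#     gdf.iloc[start:stop].to_postgis(
#         name="world_city", con=engine, if_exists=mode, index=False
#     )
```

Keeping the plan separate from the write makes it easy to print or log the batches before touching the database.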

To import into DuckDB instead, using its spatial extension:

LOAD spatial;
CREATE TABLE world_city AS SELECT * FROM ST_Read('gadm_410.gpkg');

Alternatively, skip the import and use DuckDB as a bridge to query the gpkg file directly.

LOAD spatial;
SELECT UID FROM ST_Read('gadm_410.gpkg') WHERE ST_Contains(geom, ST_Point(76.3132597490079, 12.551483941107392)) OR ST_Contains(geom, ST_Point(98.28512001168576, 38.06662865215472));

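Strictly speaking, "importing" is a false problem here: a gpkg file is itself a sqlite3 database file and can be opened directly with sqlite3. Spatial queries, however, require loading the SpatiaLite extension first. After several attempts, I found that the query results were wrong: with the condition WHERE ST_Contains(geom, ST_Point(76.3132597490079, 12.551483941107392)) OR ST_Contains(geom, ST_Point(98.28512001168576, 38.06662865215472)), the query returned every record instead of 2 records. The cause remains a mystery.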
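
Both observations can be explored with Python's standard sqlite3 module alone. The sketch below is not part of the original workflow, and `list_gpkg_layers` and `rows_matching` are hypothetical helper names. The first reads `gpkg_contents`, the table in which a GeoPackage registers its layers. The second shows a SQLite behaviour that may be related to the all-rows result: any non-zero integer, including -1, counts as true in a WHERE clause. Whether SpatiaLite's ST_Contains actually returned -1 for these geometries was not verified here, so treat it only as an assumption worth testing.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs from the gpkg_contents table."""
    con = sqlite3.connect(path)
    try:
        return con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()


def rows_matching(predicate_value, n_rows=5):
    """Count the rows kept by `WHERE ? OR ?` in a throwaway in-memory table.

    predicate_value stands in for whatever a spatial predicate returned;
    the value -1 is an assumption about SpatiaLite, not a measured result.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER)")
        con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(n_rows)])
        (count,) = con.execute(
            "SELECT count(*) FROM t WHERE ? OR ?",
            (predicate_value, predicate_value),
        ).fetchone()
        return count
    finally:
        con.close()
```

Calling `list_gpkg_layers("gadm_410.gpkg")` should list the layers in the downloaded file, and `rows_matching(-1)` keeps every row while `rows_matching(0)` keeps none. If the assumption holds, comparing explicitly, as in `ST_Contains(...) = 1`, would be one thing to try. SpatiaLite also documents a GeomFromGPB() function for converting GeoPackage geometry blobs, which may be relevant.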