Using Python to Unzip

by G. Peterson, PetersonGIS
October 1, 2009

Recently I had to download 10 SSURGO soils datasets from the Geospatial Data Gateway. The datasets, which hold soils polygon data for 10 Midwestern U.S. states, came as large zip files. In most cases there was a separate zip file for each county in each state. After putting all of these into one folder on my hard drive, I decided some programming was in order. The client's previous GIS person had manually extracted all the zip files, though for only one state. I wasn't about to do that for 10 states' worth of zip files. In the future, the client wants this done for every state in the contiguous U.S., so it'll be good to have an automated procedure in place.

Not only that, but all I needed was one shapefile that was buried within the SSURGO file structure. It went something like this: you start with a zip file (say, soil_ia003), unzip it, open the Spatial folder (the unzipped contents are two folders and three documents), then find the shapefile named soilmu_a_ia003. Of course, the last five characters change depending on the state abbreviation and the last three digits of the county FIPS code. My ultimate goal was to get all the shapefiles named soilmu_a_xxxxx into a single folder so I could then run some batch GIS processes on them with yet more code.
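As a side illustration of that naming rule, here is a minimal sketch of how a file name could be checked and split into its state and county parts. It is not part of the original script: the function name parse_soilmu_name, the regular expression, and the example path are assumptions based on the soilmu_a_ia003 example (a two-letter state abbreviation followed by three digits), and it is written for Python 3.

```python
import re

# Assumed pattern: "soilmu_a_" + two-letter state abbreviation + three digits,
# followed by a file extension (a shapefile is several files: .shp, .shx, .dbf, ...).
SOILMU_PATTERN = re.compile(r"soilmu_a_([a-z]{2})(\d{3})\.", re.IGNORECASE)


def parse_soilmu_name(path):
    """Return (state, county_digits) for a soilmu_a_* file, or None if no match."""
    match = SOILMU_PATTERN.search(path)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


# Hypothetical member name inside one of the county zip files
print(parse_soilmu_name("soil_ia003/spatial/soilmu_a_ia003.shp"))
```

A check like this is stricter than needed for the job described here; the script that follows simply looks for the soilmu_a_ prefix.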

The resulting code (please excuse the hard-coded file paths, etc.) looks like this:

import os
import zipfile

print("")
print("")
print("--------------------------------------------")
print(" Extracting soilmu_a_* Shapefiles")
print("--------------------------------------------")

# Scratch folder for nested zip files, and the folder that receives the shapefiles
tempDir = "d:\\temp"
extractDir = "d:\\Projects\\ColoradoStateUniversity\\Data\\SoilKeys"

def extractSoilsDataTest( dir ):
    # Recurse through dir and pass every zip file found to extractFile
    for item in os.listdir( dir ):
        if os.path.isdir( os.path.join( dir , item ) ):
            extractSoilsDataTest( os.path.join( dir , item ) )

        if item.endswith(".zip"):
            extractFile( dir, item )
            #print(item)


def extractFile( dir, filename ):
    if not filename.endswith(".zip"):
        return

    print(os.path.join(dir, filename))
    file = open(os.path.join(dir, filename), 'rb')
    zip = zipfile.ZipFile(file)

    for compressedFile in zip.namelist():
        if compressedFile.endswith(".zip"):
            # Nested zip: write it to tempDir, then extract from it in turn
            print(os.path.join(tempDir,compressedFile))
            # might be able to use a temporary file here (see the
            # tempfile module) and avoid writing permanent files to disk
            outfile = open( os.path.join(tempDir,compressedFile), 'wb')
            outfile.write(zip.read(compressedFile))
            outfile.close()
            extractFile( tempDir, compressedFile )

        if ( compressedFile.find("soilmu_a_") > -1 ):
            # Keep the archive's folder layout underneath extractDir
            writeToFile = compressedFile.replace("/", "\\")
            print(writeToFile)
            extractTo = os.path.join(extractDir,writeToFile)
            verifyDir( extractTo )
            outfile = open( extractTo, 'wb')
            outfile.write(zip.read(compressedFile))
            outfile.close()

    file.close()


def verifyDir( filename ):
    # Create the folder that will hold filename if it does not exist yet
    directory = os.path.dirname( filename )
    if not os.path.isdir( directory ):
        os.makedirs( directory )


extractSoilsDataTest( "d:\\Projects\\ColoradoStateUniversity\\Data\\SSURGO" )
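
The comment in extractFile wonders whether the nested zip files could be handled without writing them to disk. One way this might be done is to read each nested archive into memory and open it from there. The following is a minimal sketch under a few assumptions, not a drop-in replacement for the script above: it targets Python 3, the function name extract_soilmu and its parameters are made up for illustration, and it flattens everything into one folder instead of keeping the archive's folder layout.

```python
import io
import os
import zipfile


def extract_soilmu(zip_source, extract_dir, marker="soilmu_a_"):
    """Write every member whose file name contains marker into extract_dir.

    zip_source may be a path or a seekable file-like object. Nested zip
    files are opened from an in-memory buffer, so no temp folder is needed.
    Returns the list of paths written.
    """
    os.makedirs(extract_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".zip"):
                # Read the nested archive into memory and recurse into it
                nested = io.BytesIO(archive.read(name))
                written.extend(extract_soilmu(nested, extract_dir, marker))
            elif marker in os.path.basename(name):
                # Drop the folders inside the archive; keep only the file name
                target = os.path.join(extract_dir, os.path.basename(name))
                with open(target, "wb") as outfile:
                    outfile.write(archive.read(name))
                written.append(target)
    return written
```

Holding a nested archive in memory trades disk writes for RAM, so for very large county files the temp-folder approach may still be the safer choice. Flattening relies on the county code making each file name unique; files with the same name would overwrite one another.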