I have a feature layer containing about 460,000 records, and it currently takes about 20 minutes to read that table with the arcpy.da.TableToNumPyArray() function. Is there a more efficient way to read the rows so I can then manipulate the data? It seems like there should be a more “C-like” way to access them. Here is the function in its entirety, though I’m focused on the line near the bottom where I call arcpy.da.TableToNumPyArray() to read out the data (two alternatives I’ve been considering are sketched after the function):
import arcpy
import numpy as np

def import_surface(surface, grid, nrow, ncol, row_col_fields, field_to_convert):
    """
    References raster surface to model grid.
    Returns an array of size nrow, ncol.
    """
    out_table = r'in_memory\{}'.format(surface)
    grid_oid_fieldname = arcpy.Describe(grid).OIDFieldName
    # Compute the mean surface value for each model cell and output a table (~20 seconds)
    arcpy.sa.ZonalStatisticsAsTable(grid, grid_oid_fieldname, surface, out_table, 'DATA', 'MEAN')
    # Build some layers and views
    grid_lyr = r'in_memory\grid_lyr'
    table_vwe = r'in_memory\table_vwe'
    arcpy.MakeFeatureLayer_management(grid, grid_lyr)
    arcpy.MakeTableView_management(out_table, table_vwe)
    grid_lyr_oid_fieldname = arcpy.Describe(grid_lyr).OIDFieldName
    # table_vwe_oid_fieldname = arcpy.Describe(table_vwe).OIDFieldName
    # Join the output zonal stats table with the grid to assign row/col to each value.
    arcpy.AddJoin_management(grid_lyr, grid_lyr_oid_fieldname, table_vwe, 'OID_', 'KEEP_ALL')
    # Take the newly joined grid/zonal stats and read out tuples of (row, col, val) (takes ~20 minutes)
    a = arcpy.da.TableToNumPyArray(grid_lyr, row_col_fields + [field_to_convert], skip_nulls=False)
    # Reshape the 1D array output by TableToNumPyArray into a 2D structured array, sorting by row/col (~0.1 seconds)
    a = np.rec.fromrecords(a.tolist(), names=['row', 'col', 'val'])
    a.sort(order=['row', 'col'])
    b = np.reshape(a.val, (nrow, ncol))
    return b
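
For reference, this is the kind of row-level access I had in mind. It's just a minimal sketch using arcpy.da.SearchCursor; I haven't confirmed it's any faster against the joined layer, and read_rows is a hypothetical helper name, with the field list being the same one I pass to TableToNumPyArray() above:

import arcpy
import numpy as np

def read_rows(table, fields):
    # SearchCursor streams rows one at a time instead of materializing
    # the whole table in a single call; each row comes back as a tuple,
    # so the accumulated list can feed np.rec.fromrecords directly.
    with arcpy.da.SearchCursor(table, fields) as cursor:
        records = [row for row in cursor]
    return np.rec.fromrecords(records, names=['row', 'col', 'val'])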
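
Another idea I've been toying with, assuming the AddJoin is what makes the read slow (untested): skip the join entirely, read the zonal stats table and the grid once each with TableToNumPyArray(), and match OIDs in NumPy. The 'OID_' and 'MEAN' field names are what ZonalStatisticsAsTable writes out in my geodatabase, and import_surface_nojoin is just a hypothetical name:

import arcpy
import numpy as np

def import_surface_nojoin(surface, grid, nrow, ncol, row_col_fields):
    """Same result as import_surface, but the join happens in NumPy."""
    out_table = r'in_memory\{}'.format(surface)
    oid_field = arcpy.Describe(grid).OIDFieldName
    arcpy.sa.ZonalStatisticsAsTable(grid, oid_field, surface, out_table, 'DATA', 'MEAN')
    # Read each table once; neither read goes through a joined layer.
    stats = arcpy.da.TableToNumPyArray(out_table, ['OID_', 'MEAN'])
    cells = arcpy.da.TableToNumPyArray(grid, [oid_field] + row_col_fields)
    # Map zone OID -> mean, filling grid cells with no zonal value with NaN.
    mean_by_oid = dict(zip(stats['OID_'], stats['MEAN']))
    vals = [mean_by_oid.get(oid, np.nan) for oid in cells[oid_field]]
    records = list(zip(cells[row_col_fields[0]], cells[row_col_fields[1]], vals))
    a = np.rec.fromrecords(records, names=['row', 'col', 'val'])
    a.sort(order=['row', 'col'])
    return np.reshape(a.val, (nrow, ncol))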