
Fast, accurate, open-source geocoding in Python

Sunday 11:15 AM–11:45 AM in Hall B

Any time you search for a cafe on Google Maps, you are using a geocoder: an algorithm that converts a text input into a standardized form, along with its precise geographic coordinates. Mapping points on a journey, linking large customer databases, even contact tracing all involve geocoding. Google’s Geocoding API is a common way to do this, but it can be costly and slow when there are a large number of locations. Furthermore, in sensitive domains such as healthcare, the use of an external API may not be permitted.

So why don’t we make our own geocoder in Python instead?

In this talk, I will introduce an open source Python package for geocoding. The talk is aimed at anyone who tinkers with geospatial data. I will describe how the package was built, exploring along the way a few different geocoding algorithms as well as tools such as embedded databases and text embeddings.

See this talk and many more by getting your ticket to PyCon AU now!


Address data is one of the most common (and messiest) forms of data. Typical problems that arise when working with addresses include: finding a restaurant; linking large customer databases; pinpointing the locations of businesses, events or places of interest; or mapping out a cycling trip.

Answering these types of questions at scale requires geocoding algorithms, which convert an address to a standardized form along with a pair of coordinates, or reverse geocoding algorithms, which convert coordinates to an address. However, commercial tools such as Google’s Geocoding API may not provide a suitable solution, because of cost and speed limitations or technical constraints when working with sensitive data.
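To make the two directions concrete, here is a minimal Python sketch of geocoding and reverse geocoding. It is not the package presented in the talk: the three addresses are made up, the coordinates are approximate city centres, and the function names (normalize, geocode, reverse_geocode) are hypothetical. Geocoding is reduced to an exact lookup on normalized text, and reverse geocoding to a nearest-neighbour search using the haversine distance.

```python
import math

# Hypothetical reference data: (standardized address, latitude, longitude).
# The addresses are invented; coordinates are approximate city centres.
GAZETTEER = [
    ("1 EXAMPLE STREET, SYDNEY NSW 2000", -33.8688, 151.2093),
    ("2 SAMPLE ROAD, MELBOURNE VIC 3000", -37.8136, 144.9631),
    ("3 DEMO AVENUE, BRISBANE QLD 4000", -27.4698, 153.0251),
]


def normalize(text):
    """Upper-case the text, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.upper()
    )
    return " ".join(cleaned.split())


# Lookup table keyed on the normalized form of each reference address.
INDEX = {normalize(addr): (addr, lat, lon) for addr, lat, lon in GAZETTEER}


def geocode(text):
    """Return (standardized address, lat, lon), or None if there is no exact match."""
    return INDEX.get(normalize(text))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def reverse_geocode(lat, lon):
    """Return the reference address closest to the given coordinates."""
    nearest = min(GAZETTEER, key=lambda row: haversine_km(lat, lon, row[1], row[2]))
    return nearest[0]
```

Calling geocode("1 example street, sydney nsw 2000") returns the standardized Sydney entry with its coordinates, while reverse_geocode(-37.8, 145.0) returns the Melbourne entry. Real input is rarely an exact match after normalization, and a linear scan does not scale to millions of addresses, which is where fuzzier matching and indexed storage would come in.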

In this talk, I will introduce an open source Python package for geocoding and reverse geocoding.

I will begin by presenting some typical use cases, then describe the challenges and some algorithmic approaches. Along the way we’ll delve into the wealth of publicly available, high quality location data, and learn about embedded databases and more elaborate techniques such as text embeddings, before I present the final package.

This talk is aimed at anyone with even a cursory interest in geospatial data, from beginners to geospatial analysts, and at anyone who just wants a better idea of how tools like Google’s geocoder work under the hood.

Alex Lee He/Him • saunteringcat

Alex is currently a researcher and data scientist whose work involves developing machine learning and statistical algorithms for application to large-scale healthcare datasets. He has experience working on data science projects in business, government and academia, ranging from signal processing and machine learning applied to time series data, to geospatial analysis, and most recently NLP applied to problems in healthcare. His background is in maths, and he has been part of the data world since 2015. Outside of work he enjoys cycling, Middle Eastern cooking experiments, languages, books and spending time with friends and family.