# Spatial Search Functionality With Neo4j

WL

William Lyon / January 03, 2023

In this post we explore some techniques for working with geospatial data in Neo4j. We will cover some basic spatial Cypher functions, spatial search, routing algorithms, and different methods of importing geospatial data into Neo4j. Be sure to sign up for my newsletter to be notified of new blog posts like these:

Want to know when the next blog post or video is published? Subscribe now!

This post is based on my talk at NODES 2022, so you can watch the video recording instead:

## .css-12m0k8p{pointer-events:auto;}Spatial Cypher Functions And The .css-1xqfck9{display:inline-block;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;font-size:0.84em;padding-left:0.2em;padding-right:0.2em;border-radius:0.125rem;background-color:#EDF2F7;color:#1A202C;}`Point` Type.css-16wuya{color:#3182ce;font-weight:400;outline:none;opacity:0;margin-left:0.375rem;}.css-16wuya:focus{opacity:1;box-shadow:0 0 0 3px rgba(66,153,225,0.6);}#

Native geospatial functionality in Neo4j is based on support for a native `Point` type. The `Point` data type can be used to represent point data using either 3D or 2D geographic (WGS84) or cartesian coordinates reference systems (CRS). Points (or arrays of points) can be stored as properties on nodes or relationships in Neo4j. Cypher contains several functions for working with points, notably:

• `point()` - for creating `Point` instances
• `point.distance()` - calculates distance between two points. Can also be used for radius distance search when used in a `WHERE` predicate
• `point.withinBBox()` - to find points within a given bounding box

Point instances can be created using the `point()` function in Cypher and passing an object that specifies latitude, longitude, and optionally height (when using geographic CRS) or x, y and optionally z (when using cartesian CRS). In this article we will focus on the 2D geographic CRS (using latitude an longitude).

Here we create a point instance by specifying latitude and longitude:

``````RETURN point(
{latitude:49.38713, longitude:12.12711}
)
``````
``````╒════════════════════════════════════════════════╕
│"point({latitude:49.38713, longitude:12.12711})"│
╞════════════════════════════════════════════════╡
│point({srid:4326, x:12.12711, y:49.38713})      │
└────────────────────────────────────────────────┘
``````

To set a point property value we can use the `SET` Cypher command. Here we create a new `PointOfInterest` node representing a bakery and set the `location` property to a point value:

``````CREATE (p:PointOfInterest)
SET p.name = "West Side Bakery",
p.location = point(
{latitude:49.38713, longitude:12.12711}
)
RETURN p
`````` We can use the `point.distance()` function to search for our bakery. Here we search for nodes with the `PointOfInterest` label that are within 200 meters of a given latitude and longitude:

``````MATCH (p:PointOfInterest)
WHERE point.distance(
p.location, point({latitude:49.38713, longitude:12.12711})
) < 200
RETURN p
``````

Let's look at some real-world data to see how we can implement spatial search functionality in a simple web map application.

## Spatial Search With Neo4j#

A common feature for web applications that deal with geospatial data involves spatial search: finding things near other things or within the bounds of a box or perhaps within a polygon. Let's see how we can use Neo4j to search for points of interest on a map using Leaflet.js for the following types of spatial search:

2. Within Bounding Box
3. Within Polygon I wanted to use a real world dataset, so I pulled in points of interest from the Daylight Earth Table distribution of OpenStreetMap. The Daylight Earth Table is based on the Daylight OpenStreetMap distribution, but adds a data schema that classifies OpenStreetMap tags into a three level ontology: theme, class, and subclass. We'll be using the "poi" theme. Currently there are almost 39 million points of interest in the dataset. The Daylight Earth Table is available in Parquet files stored in S3 so I used the `pandas` Python library to read the Parquet files from S3 into a DataFrame. By converting the DataFrame to Python dicts and passing in batches as Cypher parameters we can import the dataset in a few minutes.

``````# Here we define a function to split our DataFrame into batches for import

def insert_data(query, rows, batch_size=10000):
total = 0
batch = 0
start = time.time()
result = None

while batch * batch_size < len(rows):
res = conn.query(query, parameters = {'rows': rows[batch*batch_size:(batch+1)*batch_size].to_dict('records')}, db="osmpois")
total += res['total']
batch += 1
result = {"total": total, "batches": batch, "time": time.time()-start}
print(result)
return result
``````
``````def insert_pois(rows, batch_size=10000):
print(rows)
query = '''
UNWIND \$rows AS row
CREATE (p:PointOfInterest {name: row.names.local})
CREATE (g:Geometry)
SET g.location = point({latitude: toFloat(row.point), longitude: toFloat(row.point) })
CREATE (g)<-[:HAS_GEOMETRY]-(p)
SET g:Point
CREATE (t:Tags)
SET t += row.original_tags_dict
CREATE (p)-[:HAS_TAGS]->(t)
WITH *
CALL apoc.create.addLabels(p, [row.class, row.subclass]) YIELD node
RETURN COUNT(*) AS total
'''
return insert_data(query, rows, batch_size)
`````` See the code on GitHub: johnymontana/daylight-earth-graph

Now that we have all points of interest loaded into Neo4j we'll create a point index to enable fast index-backed spatial search operations.

``````CREATE POINT INDEX poiIndex
FOR (p:Point) ON (p.location)
``````

Our first spatial search type is radius distance search: finding all points of interest within a given distance of a point. Here we search for all points of interest within 200 meters of a point in Northern California.

``````PROFILE MATCH (p:Point)
WHERE
point.distance(p.location, point({latitude: 37.563434, longitude:-122.322255})) < 200
RETURN p{.*} AS point
``````

By prepending `PROFILE` to our Cypher query we will render the database query plan as well as basic performance metrics for the query, such as memory usage, execution time, number of rows at each operation, and db hits. We can see that our point index is used resulting in an execution time of 1-2ms. Compare to the execution plan on the left which does not use an index and rather must scan every point of interest and takes about 4 seconds to execute. When we're dealing with millions of nodes it behooves us to make sure our spatial search operations are using an index for best performance. The next type of spatial search to consider is searching within a rectangular bounding box. We will use the `point.withinBBox` Cypher predicate function to accomplish this. This function takes 3 arguments: a point value that defines a location and two other point values that define the bounding box's lower left and upper right points.

``````MATCH (p:Point)
WHERE point.withinBBox(
p.location,
point({longitude:-122.325447, latitude: 37.55948 }),  // Lower left point
point({longitude:-122.314675 , latitude: 37.563596})) // Upper right point
RETURN p
``````

We can prepend `PROFILE` to the above Cypher query to verify that we are indeed using the index for this spatial search operation. Point in polygon search will return points within the bounds of an arbitrary polygon. Point in polygon search is not natively supported in Neo4j, however with some help on the client side of our application we can accomplish a polygon search while taking advantage of the point database index to ensure the search is fast. To accomplish this we will use the following steps:

1. Convert polygon to bounding box
2. Cypher `withinBBox` query
3. Filter results to those within polygon Index backed point in polygon search can be accomplished by first converting the polygon to a bounding box, using Cypher's `point.withinBBox` predicate function to find points within the bounding box (using our database index), and then filtering the results on the client to the polygon bounds. Example code here, using Turf.js:

``````const polygon = layer.toGeoJSON();
var bbox = turf.bbox(polygon); // convert polygon to bounding box

// Within Bounding Box Cypher query
const cypher = `
MATCH (p:Point)-[:HAS_GEOMETRY]-(poi:PointOfInterest)-[:HAS_TAGS]->(t:Tags)
WHERE point.withinBBox(
p.location,
point({longitude: \$lowerLeftLon, latitude: \$lowerLeftLat }),
point({longitude: \$upperRightLon, latitude: \$upperRightLat}))
RETURN p { latitude: p.location.latitude,
longitude: p.location.longitude,
name: poi.name,
categories: labels(poi),
tags: t{.*}
} AS point
`;

var session = driver.session({
database: "osmpois",
});

session
.run(cypher, {
lowerLeftLat: bbox,
lowerLeftLon: bbox,
upperRightLat: bbox,
upperRightLon: bbox,
})
.then((result) => {
const bboxpois = [];
result.records.forEach((record) => {
const poi = record.get("point");
var point = [poi.longitude, poi.latitude];
bboxpois.push(point);
});
// filter results of bouding box query to polygon bounds
const poisWithin = turf.pointsWithinPolygon(
turf.points(bboxpois),
polygon
);

poisWithin.features.forEach((e) => {
L.marker([
e.geometry.coordinates,
e.geometry.coordinates,
])
.bindPopup("Polygon");
})
``````

You can find the code for the spatial search examples (including using the Leaflet.js web map with Neo4j) on GitHub: johnymontana/geospatial-graph-demos

### Working With Line Geometries#

So far we've been working with points, let's take a quick look at an example using line geometries. For this quick example I exported my Strava activities data and loaded into Neo4j. To import data, first export user data from Strava then to add activities we will use the exported `activities.csv` file and load individual activity routes from the GPX files (first converted to geojson):

``````// Create Activity Nodes
MERGE (a:Activity {activity_id: row.`Activity ID`})
SET a.filename = row.Filename,
a.activity_type = row.`Activity Type`,
a.distance = toFloat(row.Distance),
a.activity_name = row.`Activity Name`,
a.activity_data = row.`Activity Date`,
a.activity_description = row.`Activity Description`,
a.elevation_high = toFloat(row.`Elevation High`),
a.elevation_loss = toFloat(row.`Elevation Loss`),
a.elevation_gain = toFloat(row.`Elevation Gain`),
a.elevation_low = toFloat(row.`Elevation Low`),
a.moving_time = toFloat(row.`Moving Time`),
a.max_speed = toFloat(row.`Max Speed`),

// Parse geojson geometries and create Geometry:Line nodes
MATCH (a:Activity)
WITH a WHERE a.filename IS NOT NULL AND a.filename CONTAINS ".gpx"
MERGE (n:Geometry {geom_id:a.activity_id })
MERGE (n)<-[:HAS_FEATURE]-(a)
WITH n,a
CALL apoc.load.json('file:///' + replace(a.filename, '.gpx', '.geojson')) YIELD value
UNWIND value.features.geometry.coordinates AS coord
WITH n, collect(point({latitude: coord, longitude: coord})) AS coords
SET n.coordinates = coords
SET n:Line;
``````

We store the routes as an array of point values to represent the line geometry of each route. Now, we can use the same spatial search techniques described above with our line geometries. By using the list predicate `any` Cypher function in our query we can check if at least one point in the line are within the radius distance search distance.

``````WITH point({latitude: \$latitude, longitude: \$longitude}) AS radiusCenter
MATCH (g:Geometry)<-[:HAS_FEATURE]-(a:Activity)
WHERE any(
)
RETURN *
``````

Code for this example is available on GitHub here.

## Routing With Graph Algorithms#

One more use case I'd like to touch on besides spatial search is that of routing. We'll use the Graph Data Science Neo4j Sandbox to find airline routes between airports. There are a number of relevant path finding algorithms we might consider. In this case we will use Dijkstra's algorithm, an extension of breadth first search that uses relationship weights to find the shortest path between two nodes, or airports in our case. Dijkstra's algorithm is one of many available in Neo4j's Graph Data Science library. First, we will project our airline route graph into an in-memory projected graph structure used by the Graph Data Science library:

``````CALL gds.graph.project(
'routes-weighted',
'Airport',
'HAS_ROUTE',
{
relationshipProperties: 'distance'
}
) YIELD graphName, nodeProjection, nodeCount,
relationshipProjection, relationshipCount
``````

Then, to search for the shortest path between two airports we use the `gds.shortestPath.dijkstra` procedure:

``````MATCH (source:Airport {iata: 'PUQ'}), (target:Airport {iata: 'HNL'})
CALL gds.shortestPath.dijkstra.stream('routes-weighted', {
sourceNode: source,
targetNode: target,
relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode,
totalCost, nodeIds, costs, path
RETURN
nodes(path) as path
`````` In this post we introduced some of the spatial functionality natively supported by Neo4j including the point type and related Cypher functions and demonstrated how to accomplish various spatial search operations as well as a brief look at routing with graph algorithms.

## Resources#

For comments and discussion please join the conversation for this post at Dev.to