API Documentation

Graph building and troubleshooting

MissingLinks.get_example_data — Function

get_example_data()

Get the example dataset used in the Quickstart and save it into the current working directory, in a folder called "example_data".

source

MissingLinks.semi_to_fully_noded — Function

semi_to_fully_noded(data...; snap_tolerance=1e-6, split_tolerance=1e-6)

Convert GDAL data sources where the sources are "semi-noded" - that is, coincident ends, or an end coincident with a line segment, count as intersections - into a fully noded graph with nodes at each intersection, ready to be built into a graph.

Data is a vararg, so arbitrarily many layers may be supplied. They will be merged into a single output layer.

The snap tolerance is how close an end has to be to a line to be considered connected. The split tolerance is how close two splits must be to each other to be considered the same split, and should generally be much smaller, to avoid a situation where two ends snap to a line at points near each other, the line is split at one of them, and the split is more than snap_tolerance distance from the other one. I further recommend setting the tolerance when building the graph to snap_tolerance + split_tolerance. These should both be quite small; larger gaps in the network should be addressed with add_short_edges!.

source

MissingLinks.graph_from_gdal — Function

graph_from_gdal(layers...; tolerance=1e-6, max_edge_length=250)

Build a graph from one or more GeoDataFrame layers.

This builds a graph from a fully-noded GIS network. You can pass in an arbitrary number of GeoDataFrame layers, which will be combined into a single graph. They should all be in the same coordinate system.

The tolerance specifies how far apart nodes can be and still be considered connected. This should be smaller; larger gaps should be closed by add_short_edges!; see discussion there for why.

We assume that potential links are always where two edges pass closest to one another. As the length of the edges increases, this assumption gets worse. max_edge_length controls how long edges can be; any edges longer than this will be split.

Both tolerance and max_edge_length are in the same units as the underlying data.

source

MissingLinks.graph_to_graphml — Function

graph_to_graphml(outfile, graph; pretty=false)

Export the graph to a graphml file specified in outfile, for visualization or manipulation with other tools.

If pretty is set to true, the XML file will be pretty-printed.

source

MissingLinks.graph_to_gis — Function

graph_to_gis(fn, G; crs=nothing, layer_name="edges", update=false)

Write graph G to GIS file fn.

The file type is determined by extension; I recommend .gpkg for GeoPackage output. The format must support multiple layers. You can specify a CRS (e.g. GFT.EPSG(32119)) so that the output has CRS information, which will make it easier to load into GIS. You can also specify a layer_name if you plan to have multiple layers in the output file. If update=true, will add layers to a file rather than overwriting.

source

MissingLinks.nodes_to_gis — Function

nodes_to_gis(G)

Extract the nodes of graph G into a GeoDataFrame.

source

MissingLinks.remove_tiny_islands — Function

remove_tiny_islands(graph, min_vertices_to_retain)

Remove islands from graph with fewer than min_vertices_to_retain, and return a modified graph.

Often, due to bad data, there will be small islands isolated from the rest of the network. This identifies all components of the graph, and returns a copy of the graph with only those components with at least min_vertices_to_retain.

This should already work for directed graphs, as for directed graphs it uses strongly connected components to determine vertices to remove. See extensive discussion here to see why this is relevant.

source

MissingLinks.add_short_edges! — Function

add_short_edges!(graph, max_edge_length)

Add edges to graph in-place between all nodes that are closer than max_edge_length to one another, and currently more than 2 * maxedgelength apart via the network.

You might think, as I did, that snapping nodes during graph build would be an easier solution, but that can be prone to errors because it results in nodes being actually moved. Consider this situation:

---a (0.3 m) b--3.1m---c (0.2m) d----

Assuming the vertices are read in a b c d order, b and c will get snapped to a, and d will not get snapped to anything.

Instead, we recommend only snapping a very small distance during graph build (default 1e-6 meters). We then add edges between nodes close together.

source

MissingLinks.find_disconnected_crossings — Function

find_disconnected_crossings(G, dmat; tolerance=10)

Find locations where edges of graph G cross without intersecting, and where the network distance between the intersecting points is more than tolerance.

These are often graph errors, but may also be overpasses, tunnels, etc. Returns a GeoDataFrames-compatible point DataFrame with the locations, for further examination in GIS.

source

MissingLinks.find_dead_ends — Function

find_dead_ends(G)

Find locations in graph G where there is a dead end.

Returns a GeoDataFrame. This is useful for network data creation as by opening this layer in GIS you can easily see where there are existing dead ends and determine if they are correct or not.

source

MissingLinks.remove_elevation! — Function

remove_elevation!(geom)

Remove the Z component of a geometry, and change the ArchGDAL wrapper type to match.

This does mutate its argument, but in order to re-wrap in a new ArchGDAL type a new wrapper must be returned. So you should use the return value of the function.

Workaround for https://github.com/yeesian/ArchGDAL.jl/issues/333

source

MissingLinks.TraversalPermissionSettings — Type

Settings about what streets are considered walkable. Contains sets walkabletags, notwalkabletags, and overridewalkabletags. Each of these is a pair "key"=>"value" (e.g. "highway"=>"footway"). A way is considered walkable if it (has a tag that is in walkabletags and does not have a tag in notwalkabletags) or (has a tag that is in overridewalkabletags). overridewalkabletags exists to handle roads that normally would not be considered walkable, but have e.g. "foot"="yes".

The default tags are below. You can modify these after constructing a TraversalPermissionsSettings, by directly adding/removing from the sets, or you can replace them completely by constructing the settings with arguments, e.g.

TraversalPermissionSettings(
    walkable_tags=Set(["highway"=>"footway"]),
    not_walkable_tags=Set(["foot"=>"no"]),
    override_walkable_tags=Set(["foot"=>"designated"])
)

If even that is not enough control, you can create your type and define MissingLinks.is_traversable to determine if a link should be included using arbitrary Julia code.

Walkable tags

highway=road
highway=footway
sidewalk=yes
highway=path
highway=sidewalk
highway=cycleway
highway=service
highway=residential
sidewalk=both
sidewalk=left
sidewalk:left=yes
sidewalk:right=yes
sidewalk:both=yes
highway=track
highway=pedestrian
sidewalk=right
highway=crossing
highway=steps

Non-walkable tags

foot=no
access=no

Override walkable tags

foot=yes
foot=designated

source

MissingLinks.graph_from_osm — Function

graph_from_osm(pbf, settings, projection)

Build a graph from OpenStreetMap PBF data. pbf should be the path to a PBF file.

settings is a an object that determines what edges are considered traversable. It should have a method defined MissingLinks.is_traversable(settings, way::Way) where way is a Way object from OpenStreetMapPBF.jl, which returns true if a way should be included in the graph. The simplest is to use a [TraversalPermissionSettings]{@ref} object.

MissingLinks always works with Euclidean coordinates, so coordinates need to be projected. projection is a GeoFormatTools coordinate system to project the OSM data to. e.g. for North Carolina I use GeoFormatTools.EPSG(32119).

source

MissingLinks.is_traversable — Function

is_traversable(settings, way)

Based on settings, should way be included in the graph? Override if you are creating custom traversal permissions settings objects.

source

MissingLinks.extent — Function

extent(G)

Get the spatial extent of graph G. Returns a NamedTuple with members min_x, max_x, min_y, max_y. The bounding box is based only on the vertices.

source

Opportunity data

MissingLinks.create_graph_weights — Function

create_graph_weights(graph, geodata, weightcols, distance)

Using a graph and a spatial data frame, assign weights to graph nodes.

Each row in the spatial data frame should have a weight associated with it (if they are all equivalent, the weight can be set to a column of ones). For each row, all graph edges within distance of the geometry of that row will be identified (the spatial data frame should be in the same projection as the network data used to create the graph). An equal amount of weight will be assigned to the nodes at each end of all identified edges. (In a corner, twice as much weight will be assigned to the node at the corner, as it will receive weight from each of the links adjacent to the property. The effect may be even more pronounced if the streets on the other side of the intersection from the row are also within distance).

weightcols specifies which columns to use as weights. It must be a vector. Multiple weight columns can be specified. The function will return an n x w matrix, where n is the number of nodes in the graph, and w is the number of weightcols specified. This will contain the weight for each node for each weightcol.

source

Link identification, deduplication, and scoring

MissingLinks.fill_distance_matrix! — Function

fill_distance_matrix!(G, mtx::AbstractMatrix{T}; maxdist=5000, origins=1:nv(G))

Fill an already created distance matrix mtx with shortest-path distances based on graph G.

The units are the same as the underlying data, and will be rounded to the resolution of whatever the element type T of mtx is. We usually use UInt16 meters as these can represent quite long trips with reasonable accuracy while minimizing memory consumption. Using other data types is not tested.

To make this faster, you can set a maximum distance maxdist; for destinations beyond this distance (or destinations that are unreachable altogether) the matrix will contain typemax(T).

Will use multiple threads if Julia is started with multiple threads.

source

MissingLinks.identify_potential_missing_links — Function

identify_potential_missing_links(graph, distance_matrix, max_link_dist, min_net_dist)

Identify locations in graph that are less than max_link_dist apart geographically, but at least min_net_dist apart in network distance (or disconnected entirely).

dmat should be a distance matrix for all nodes in the graph, generally created by fill_distance_matrix!

source

MissingLinks.deduplicate_links — Function

deduplicate_links(graph, list, distance_matrix, sphere_of_influence_radius)

Deduplicate links using a heuristic.

Arguments are a list of links (e.g. as produced by identify_potential_missing_links), a distance matrix between nodes for the graph used to identify those links, and the radius of the sphere of influence (see below).

Consider the situation below. Lines are existing roads. Here, you have the ends of two blocks in different subdivisions, that do not connect.

--a--o    o--d--
     |    |
     c    e
     |    |
--b--o    o--f--

Since there are three edges in each subdivision in the graph above, all of which pass within (say) 100m of each other, there are nine possible candidate links: ad, ae, af, bd, be, bf, cd, ce, cf. Clearly, all of these provide essentially the same level of access, so we want to deduplicate them.

We consider two candidate links to be duplicates if they connect pairs of locations within a sphere_of_influence_radius of one another (we use 100m in the paper). To identify these links, we use the following greedy algorithm. Note that since it is a greedy algorithm, the order the links are identified in matters. Currently, link identification is not multithreaded, so the order of the links should be deterministic, and therefore you should get the same results from this deduplication whenever it is run.

For the first candidate link, we identify the "sphere of influence" of that link; the sphere of influence is all nodes that are within 100m of either end of the candidate link. We calculate this using the node-to-node distance matrix and the distances of the link from the start and end of the edges it connects. We do not need to define separate spheres of influence for the start and end of the link—as long as the radius of the sphere of influence is less than half the minimum network distance for proposing a link, any candidate link that connects two nodes in the sphere of influence must have one end near each end of the original link that defined the sphere of influence.

For subsequent candidate links, we determine if the link is within an already-identified sphere of influence. We check to see if one of the ends of each of the edges the candidate link connects is in an already identified sphere of influence. If it is, we further check to see if the network distances between the ends of this candidate link and the corresponding ends of candidate link that defined the sphere of influence are less than the threshold for both ends of the links (note that links are undirected, so we explicitly consider links duplicates even if the start of one is near the end of the other and vice versa). If these network distances are both less than the radius of the sphere of influence, we add the link to this sphere of influence and continue to the next link. If they are not, we continue to the next sphere of influence. If a link is not within any existing sphere of influence, we define a new sphere of influence based on the link.

We then return the link with the shortest geographic distance from each sphere of influence. Note that there may still be links that are close to one another; if two links define adjacent or even overlapping spheres of influence, but with neither link in either sphere of influence, the best links in the two spheres of influence may be very similar ones in between the original two links. You could iterate the algorithm if you wanted to to prevent this, but this could result in links being supplanted by ones more than 100m away.

source

MissingLinks.score_links — Function

score_links(decay_function, graph, links, distance_matrix, origin_weights, dest_weights, decay_cutoff_m)

Score the contribution of each link in links to aggregate accessibility, using the decay function and weights specified.

decay_function is a Julia function that takes a single argument, a distance, and returns a weight; smaller weights mean less influence. For a two-mile cumulative opportunities function, this could just be e.g. x -> x < 3218. It is the first argument to allow use of Julia do-block syntax.

links is the vector of links (e.g. from deduplicate_links). The distance matrix is the same one used throughout the analysis. origin_weights and dest_weights are vectors of weights associated with each node, e.g. computed by create_graph_weight.

decay_cutoff_m is the distance at which decay_function goes to zero (for a smooth function, we recommend reimplementing as piecewise with a dropoff to zero at some point where additional access is largely immaterial). It is in meters if the input data were in meters (which we recommend as we have not tested with other units to ensure units are not hardcoded anywhere. Just use meters. Be a scientist.)

For each link, we identify how much making this connection would increase the aggregate number of origin-weighted destinations (i.e. the sum across origins of the number of destinations accessible within a threshold distance). We do this in several steps:

Identify all origins and destinations within the threshold of the start of the proposed new link
For each origin and destination: a. generate distances from the origin to the destination via the new link by summing - Distance from origin to link - Length of link - Distance from link to each destination b. if the distance via the new link is longer than the existing distance, continue to the next pair c. otherwise, compute the distance weight using the weighting function in the accessibility metric for the distance via the new link and the existing distance d. take the difference of these weights e. multiply by the origin and destination weights
Sum the results
Reverse the start and end of the link, and repeat
Sum the results

source

MissingLinks.links_to_gis — Function

links_to_gis(graph, links, pairs...)

Convenience function that converts a vector of links (and associated graph) into a GeoDataFrame, suitable for export to a GIS file.

pairs` should be pairs of :col_name=>values that will be added to the output. For example, you will often want to add scores to your output.

source

Routing and mapping of impacts of individual links

MissingLinks.route_one_to_one — Function

route_one_to_one(G, origin, dest; crs=nothing, index=nothing)

Do a one-to-one route. If CRS is specified, origin and dest can be in lat-lon (not lon-lat) coordinates and will be reprojected to the specified CRS before routing.

If you're doing many routes, setting index to the result of MissingLinks.index_graph_edges` will speed up results.

If you want to do a route on a graph with candidate link(s) included, see realize_graph`.

source

MissingLinks.distance_surface — Function

distance_surface(G, origin; maxdist=5000, resolution=25, crs=nothing)

Get a Raster.jl raster of the travel times from origin.

source

MissingLinks.regional_access — Function

regional_access(decay_func, G, dmat, link, weights, decay_cutoff; resolution=50)

Perform a regional access analysis (change in access to weights for each point) as a result of the link. Return a Rasters.jl raster with the values at each point in the area around the link. decay_cutoff is the distance (in coordinate system units) after which additional destinations have no impact on accessibility.

source

Graph partitioning

Large graphs are intractable; these tools can be used to split up a graph and analyze it.

MissingLinks.GraphPartition — Type

A GraphPartition represents a subgraph of a larger graph, defined by a bounding box. The subgraph is actually buffer units larger than the bounding box, to avoid edge effects, but only candidate links with an end inside the bounding box should be considered. Consider a portion of a graph that looks like this:

✂

A –✂–- B ✂ | ✂ | ✂ | D–-✂–- C ✂

A is relatively directly connected to D, but suppose where the scissors are is the edge of a partition that extends to the left. In the partition that extends to the left, AB and CD will be present, but BC will not as it is completely outside the partition. This will result in a candidate link AB->CD being identified and may score highly as it closes a major gap in the graph. This subgraph will be completely within the partition to the right, and we will get the correct link and score there, but the score may be lower or AB->CD may not be identified at all.

I think what we need to do is only include links where at least one end of them is in the "main" part of the partition, and then increase the buffer to maxdist + maxlinkdist.

source

MissingLinks.partition — Function

partition(G, nrow, ncol, overlap)

Partition graph G into nrow x ncol subgraphs. This is useful with graphs that are too large to process at once. buffer is the amount the graphs should overlap, in meters/graph units (though using units other than meters is untested and unscientific).

Specifically, the bounding box of each subgraph will be extended by buffer meters, so the actual overlap will be 2 * overlap.

Returns an nrow x ncol matrix of graphs.

source

MissingLinks.identify_and_score — Function

identify_and_score(decay_function, Matrix{GraphPartition}, G, Gs; max_link_distance=100, min_net_distance=1000, sphere_of_influence_radius=100,
    cutoff_distance=missing, origin_weights=missing, dest_weights=missing, mmap=false, deduplicate=true)

Run the full process of routing, identifying, and scoring links on a partitioned graph. Optionally deduplicate, though keep in mind that which links are retained in deduplication depends slightly on the partitioning structure.

source

MissingLinks.partition_weights — Function

partition_weights(G, Gsub, weights)

Given weights weights that go with graph G, extract a weight array that goes with GraphPartition Gsub.

source

MissingLinks.merge_links — Function

merge_links(G, Gs, links, scores)

Merge links from partitioned graphs Gs back into a single list referencing graph G. Return a tuple of (links, scores).

source

Base.contains — Function

contains(p, l)

Is link l in the non-buffer area of graph partition p? Link l is assumed to be identified from this graph partition.

source

Internal

MissingLinks.new_graph — Function

Create a new graph with appropriately set types

source

MissingLinks.for_each_geom — Function

Run the provided function on each constituent geometry of a multigeometry

source

MissingLinks.get_first_point — Function

get the first point of a line

source

MissingLinks.get_geometry — Function

Get the geometry column from a GeoDataFrame

source

MissingLinks.CandidateLink — Type

A CandidateLink represents a link between two existing edges. It stores the vertex codes (not labels) from each end of each edge, as well as the distances from each vertex at the point where the link is.

source

MissingLinks.add_geom_to_graph! — Function

This adds edge(s) representing geom to the graph. It may add multiple edges in cases where two edges would otherwise be duplicates because they connect the same intersections (see #2).

Returns a tuple of tuples (frv, tov) - usually just one, but some edges may be split before adding to graph.

source

MissingLinks.compute_net_distance — Function

Compute the network distance between the two points on links, by computing between from ends and adding fractions of the edge.

source

MissingLinks.add_unless_typemax — Function

Add y to x, but only if x is not already typemax (used to indicate unreachable). This assumes that if x < typemax(typeof(T)), x + y < typemax(typeof(T)), which is generally reasonable as x and y are in meters and anything we're adding is well less than 65km, unless it's unreachable, which should only be the case for x (coming from the distance matrix). However, we do use a checked_add to throw an error if that assumption is violated.

If x == typemax(T), return x If x < typemax(T) and x + y <= typemax(T), return x + y If x < typemax(T) and x + y > typemax(T), throw OverflowError

source

MissingLinks.break_long_line — Function

Break long lines into pieces of at most length

source

MissingLinks.index_graph_nodes — Function

Create a spatial index from Graphs.jl vertex codes.

source

MissingLinks.compute_link_score — Function

Score a potential missing link

source

MissingLinks.split_link! — Function

split_link!(G, fr, to, dist)

Split the link from fr to to at dist, and return the new vertex number.

source

MissingLinks.collapse_realized_graph! — Function

collapse_realized_graph!(G)

Remove degree-2 nodes and combine the associated edges. Please note that this graph has not been tested with link identification code and unexpected things may happen because of vertex removal changing the order of nodes.

source

MissingLinks.get_xy — Function

get_xy(geom)

Convert a geometry into a Vector{Vector{Float64}} with the xy coordinates (used in testing).

source

MissingLinks.link_points! — Function

link_point!(G, pts, idx; tol=20, min_split_length=1, create=false)

Link points pts (Vector of ArchGDAL objects) into graph destructively by splitting edges.

tol represents the tolerance (in graph units) for linking, min_split_length represents the tolerance for splitting (if the closest point is closer to the end of the edge than minsplitlength, we just connect it to the end), and if create is true we will create an orphaned edge to connect the point to if there is no nearby edge. Since the algorithm only connects edges to edges, this allows missing links to the point to be identified even if there is no nearby edge.

source

MissingLinks.write_tntp — Function

write_tntp(filename, G)

Write a graph out in TNTP format for use in other software

source

MissingLinks.realize_graph — Function

realize_graph(G, links)

"Realize" the graph G and candidate links links.

In identifying, deduplicating, and scoring missing links, we never actually put those links into the graph. Instead, each link stores information about where it connects to the network, and scoring is based only on the distance matrix, not on the graph itself.

However, for some applications, we need to "realize" the graph—i.e. create a graph that has edges for all the candidate links, labeled as such. This function does that.

source

MissingLinks.index_candidate_links — Function

index_candidate_links(links)

Return a NamedTuple with three members:

links: Vector of RealizedCandidateLinks, one for each link in links
src: Dict mapping (VertexID, VertexID) (i.e. an edge ID) to a vector of RealizedCandidateLinks starting from that edge
dest: Similar dict mapping edge to links ending on that edge

G should be the original graph the links were built with (although since the indices are the same, the realized graph should also work).

The RealizedCandidateLinks in src and dest are references to those in links, so modifying a link in any of these will modify it in all of them.

source

MissingLinks.index_graph_edges — Function

index_graph_edges(G)

Index the edges of a graph. Return a tuple with an RTree and a vector of edge IDs (themselves tuples of VertexID) that go with each integer index in the RTree.

source

MissingLinks.closest_edge — Function

Get the closest edge to a location (tuple/vector of coordinates)

source

Base.reverse — Function

Create a reversed version of a candidate link, used in evaluating accessibility to calculate accessibility in both directions.

source