API Documentation
Graph building and troubleshooting
MissingLinks.get_example_data — Function
get_example_data()Get the example dataset used in the Quickstart and save it into the current working directory, in a folder called "example_data".
MissingLinks.semi_to_fully_noded — Function
semi_to_fully_noded(data...; snap_tolerance=1e-6, split_tolerance=1e-6)Convert GDAL data sources where the sources are "semi-noded" - that is, coincident ends, or an end coincident with a line segment, count as intersections - into a fully noded graph with nodes at each intersection, ready to be built into a graph.
Data is a vararg, so arbitrarily many layers may be supplied. They will be merged into a single output layer.
The snap tolerance is how close an end has to be to a line to be considered connected. The split tolerance is how close two splits must be to each other to be considered the same split, and should generally be much smaller, to avoid a situation where two ends snap to a line at points near each other, the line is split at one of them, and the split is more than snap_tolerance distance from the other one. I further recommend setting the tolerance when building the graph to snap_tolerance + split_tolerance. These should both be quite small; larger gaps in the network should be addressed with add_short_edges!.
MissingLinks.graph_from_gdal — Function
graph_from_gdal(layers...; tolerance=1e-6, max_edge_length=250)Build a graph from one or more GeoDataFrame layers.
This builds a graph from a fully-noded GIS network. You can pass in an arbitrary number of GeoDataFrame layers, which will be combined into a single graph. They should all be in the same coordinate system.
The tolerance specifies how far apart nodes can be and still be considered connected. This should be smaller; larger gaps should be closed by add_short_edges!; see discussion there for why.
We assume that potential links are always where two edges pass closest to one another. As the length of the edges increases, this assumption gets worse. max_edge_length controls how long edges can be; any edges longer than this will be split.
Both tolerance and max_edge_length are in the same units as the underlying data.
MissingLinks.graph_to_graphml — Function
graph_to_graphml(outfile, graph; pretty=false)Export the graph to a graphml file specified in outfile, for visualization or manipulation with other tools.
If pretty is set to true, the XML file will be pretty-printed.
MissingLinks.graph_to_gis — Function
graph_to_gis(fn, G; crs=nothing, layer_name="edges")Write graph G to GIS file fn.
The file type is determined by extension; I recommend .gpkg for GeoPackage output. The format must support multiple layers. You can specify a CRS (e.g. GFT.EPSG(32119)) so that the output has CRS information, which will make it easier to load into GIS. You can also specify a layer_name if you plan to have multiple layers in the output file.
MissingLinks.nodes_to_gis — Function
nodes_to_gis(G)Extract the nodes of graph G into a GeoDataFrame.
MissingLinks.remove_tiny_islands — Function
remove_tiny_islands(graph, min_vertices_to_retain)Remove islands from graph with fewer than min_vertices_to_retain, and return a modified graph.
Often, due to bad data, there will be small islands isolated from the rest of the network. This identifies all components of the graph, and returns a copy of the graph with only those components with at least min_vertices_to_retain.
This should already work for directed graphs, as for directed graphs it uses strongly connected components to determine vertices to remove. See extensive discussion here to see why this is relevant.
MissingLinks.add_short_edges! — Function
add_short_edges!(graph, max_edge_length)Add edges to graph in-place between all nodes that are closer than max_edge_length to one another, and currently more than 2 * maxedgelength apart via the network.
You might think, as I did, that snapping nodes during graph build would be an easier solution, but that can be prone to errors because it results in nodes being actually moved. Consider this situation:
---a (0.3 m) b--3.1m---c (0.2m) d----Assuming the vertices are read in a b c d order, b and c will get snapped to a, and d will not get snapped to anything.
Instead, we recommend only snapping a very small distance during graph build (default 1e-6 meters). We then add edges between nodes close together.
MissingLinks.find_disconnected_crossings — Function
find_disconnected_crossings(G, dmat; tolerance=10)Find locations where edges of graph G cross without intersecting, and where the network distance between the intersecting points is more than tolerance.
These are often graph errors, but may also be overpasses, tunnels, etc. Returns a GeoDataFrames-compatible point DataFrame with the locations, for further examination in GIS.
MissingLinks.find_dead_ends — Function
find_dead_ends(G)Find locations in graph G where there is a dead end.
Returns a GeoDataFrame. This is useful for network data creation as by opening this layer in GIS you can easily see where there are existing dead ends and determine if they are correct or not.
MissingLinks.remove_elevation! — Function
remove_elevation!(geom)Remove the Z component of a geometry, and change the ArchGDAL wrapper type to match.
This does mutate its argument, but in order to re-wrap in a new ArchGDAL type a new wrapper must be returned. So you should use the return value of the function.
Workaround for https://github.com/yeesian/ArchGDAL.jl/issues/333
MissingLinks.TraversalPermissionSettings — Type
Settings about what streets are considered walkable. Contains sets walkabletags, notwalkabletags, and overridewalkabletags. Each of these is a pair "key"=>"value" (e.g. "highway"=>"footway"). A way is considered walkable if it (has a tag that is in walkabletags and does not have a tag in notwalkabletags) or (has a tag that is in overridewalkabletags). overridewalkabletags exists to handle roads that normally would not be considered walkable, but have e.g. "foot"="yes".
The default tags are below. You can modify these after constructing a TraversalPermissionsSettings, by directly adding/removing from the sets, or you can replace them completely by constructing the settings with arguments, e.g.
TraversalPermissionSettings(
walkable_tags=Set(["highway"=>"footway"]),
not_walkable_tags=Set(["foot"=>"no"]),
override_walkable_tags=Set(["foot"=>"designated"])
)If even that is not enough control, you can create your type and define MissingLinks.is_traversable to determine if a link should be included using arbitrary Julia code.
Walkable tags
- highway=road
- highway=footway
- sidewalk=yes
- highway=path
- highway=sidewalk
- highway=cycleway
- highway=service
- highway=residential
- sidewalk=both
- sidewalk=left
- sidewalk:left=yes
- sidewalk:right=yes
- sidewalk:both=yes
- highway=track
- highway=pedestrian
- sidewalk=right
- highway=crossing
- highway=steps
Non-walkable tags
- foot=no
- access=no
Override walkable tags
- foot=yes
- foot=designated
MissingLinks.graph_from_osm — Function
graph_from_osm(pbf, settings, projection)Build a graph from OpenStreetMap PBF data. pbf should be the path to a PBF file.
settings is a an object that determines what edges are considered traversable. It should have a method defined MissingLinks.is_traversable(settings, way::Way) where way is a Way object from OpenStreetMapPBF.jl, which returns true if a way should be included in the graph. The simplest is to use a [TraversalPermissionSettings]{@ref} object.
MissingLinks always works with Euclidean coordinates, so coordinates need to be projected. projection is a GeoFormatTools coordinate system to project the OSM data to. e.g. for North Carolina I use GeoFormatTools.EPSG(32119).
MissingLinks.is_traversable — Function
is_traversable(settings, way)Based on settings, should way be included in the graph? Override if you are creating custom traversal permissions settings objects.
Opportunity data
MissingLinks.create_graph_weights — Function
create_graph_weights(graph, geodata, weightcols, distance)Using a graph and a spatial data frame, assign weights to graph nodes.
Each row in the spatial data frame should have a weight associated with it (if they are all equivalent, the weight can be set to a column of ones). For each row, all graph edges within distance of the geometry of that row will be identified (the spatial data frame should be in the same projection as the network data used to create the graph). An equal amount of weight will be assigned to the nodes at each end of all identified edges. (In a corner, twice as much weight will be assigned to the node at the corner, as it will receive weight from each of the links adjacent to the property. The effect may be even more pronounced if the streets on the other side of the intersection from the row are also within distance).
weightcols specifies which columns to use as weights. It must be a vector. Multiple weight columns can be specified. The function will return an n x w matrix, where n is the number of nodes in the graph, and w is the number of weightcols specified. This will contain the weight for each node for each weightcol.
Link identification, deduplication, and scoring
MissingLinks.fill_distance_matrix! — Function
fill_distance_matrix!(G, mtx::AbstractMatrix{T}; maxdist=5000, origins=1:nv(G))Fill an already created distance matrix mtx with shortest-path distances based on graph G.
The units are the same as the underlying data, and will be rounded to the resolution of whatever the element type T of mtx is. We usually use UInt16 meters as these can represent quite long trips with reasonable accuracy while minimizing memory consumption. Using other data types is not tested.
To make this faster, you can set a maximum distance maxdist; for destinations beyond this distance (or destinations that are unreachable altogether) the matrix will contain typemax(T).
Will use multiple threads if Julia is started with multiple threads.
MissingLinks.identify_potential_missing_links — Function
identify_potential_missing_links(graph, distance_matrix, max_link_dist, min_net_dist)Identify locations in graph that are less than max_link_dist apart geographically, but at least min_net_dist apart in network distance (or disconnected entirely).
dmat should be a distance matrix for all nodes in the graph, generally created by fill_distance_matrix!
MissingLinks.deduplicate_links — Function
deduplicate_links(graph, list, distance_matrix, sphere_of_influence_radius)Deduplicate links using a heuristic.
Arguments are a list of links (e.g. as produced by identify_potential_missing_links), a distance matrix between nodes for the graph used to identify those links, and the radius of the sphere of influence (see below).
Consider the situation below. Lines are existing roads. Here, you have the ends of two blocks in different subdivisions, that do not connect.
--a--o o--d--
| |
c e
| |
--b--o o--f-- Since there are three edges in each subdivision in the graph above, all of which pass within (say) 100m of each other, there are nine possible candidate links: ad, ae, af, bd, be, bf, cd, ce, cf. Clearly, all of these provide essentially the same level of access, so we want to deduplicate them.
We consider two candidate links to be duplicates if they connect pairs of locations within a sphere_of_influence_radius of one another (we use 100m in the paper). To identify these links, we use the following greedy algorithm. Note that since it is a greedy algorithm, the order the links are identified in matters. Currently, link identification is not multithreaded, so the order of the links should be deterministic, and therefore you should get the same results from this deduplication whenever it is run.
For the first candidate link, we identify the "sphere of influence" of that link; the sphere of influence is all nodes that are within 100m of either end of the candidate link. We calculate this using the node-to-node distance matrix and the distances of the link from the start and end of the edges it connects. We do not need to define separate spheres of influence for the start and end of the link—as long as the radius of the sphere of influence is less than half the minimum network distance for proposing a link, any candidate link that connects two nodes in the sphere of influence must have one end near each end of the original link that defined the sphere of influence.
For subsequent candidate links, we determine if the link is within an already-identified sphere of influence. We check to see if one of the ends of each of the edges the candidate link connects is in an already identified sphere of influence. If it is, we further check to see if the network distances between the ends of this candidate link and the corresponding ends of candidate link that defined the sphere of influence are less than the threshold for both ends of the links (note that links are undirected, so we explicitly consider links duplicates even if the start of one is near the end of the other and vice versa). If these network distances are both less than the radius of the sphere of influence, we add the link to this sphere of influence and continue to the next link. If they are not, we continue to the next sphere of influence. If a link is not within any existing sphere of influence, we define a new sphere of influence based on the link.
We then return the link with the shortest geographic distance from each sphere of influence. Note that there may still be links that are close to one another; if two links define adjacent or even overlapping spheres of influence, but with neither link in either sphere of influence, the best links in the two spheres of influence may be very similar ones in between the original two links. You could iterate the algorithm if you wanted to to prevent this, but this could result in links being supplanted by ones more than 100m away.
MissingLinks.score_links — Function
score_links(decay_function, graph, links, distance_matrix, origin_weights, dest_weights, decay_cutoff_m)Score the contribution of each link in links to aggregate accessibility, using the decay function and weights specified.
decay_function is a Julia function that takes a single argument, a distance, and returns a weight; smaller weights mean less influence. For a two-mile cumulative opportunities function, this could just be e.g. x -> x < 3218. It is the first argument to allow use of Julia do-block syntax.
links is the vector of links (e.g. from deduplicate_links). The distance matrix is the same one used throughout the analysis. origin_weights and dest_weights are vectors of weights associated with each node, e.g. computed by create_graph_weight.
decay_cutoff_m is the distance at which decay_function goes to zero (for a smooth function, we recommend reimplementing as piecewise with a dropoff to zero at some point where additional access is largely immaterial). It is in meters if the input data were in meters (which we recommend as we have not tested with other units to ensure units are not hardcoded anywhere. Just use meters. Be a scientist.)
For each link, we identify how much making this connection would increase the aggregate number of origin-weighted destinations (i.e. the sum across origins of the number of destinations accessible within a threshold distance). We do this in several steps:
- Identify all origins and destinations within the threshold of the start of the proposed new link
- For each origin and destination: a. generate distances from the origin to the destination via the new link by summing - Distance from origin to link - Length of link - Distance from link to each destination b. if the distance via the new link is longer than the existing distance, continue to the next pair c. otherwise, compute the distance weight using the weighting function in the accessibility metric for the distance via the new link and the existing distance d. take the difference of these weights e. multiply by the origin and destination weights
- Sum the results
- Reverse the start and end of the link, and repeat
- Sum the results
MissingLinks.links_to_gis — Function
links_to_gis(graph, links, pairs...)Convenience function that converts a vector of links (and associated graph) into a GeoDataFrame, suitable for export to a GIS file.
pairs` should be pairs of :col_name=>values that will be added to the output. For example, you will often want to add scores to your output.
Routing and mapping of impacts of individual links
MissingLinks.route_one_to_one — Function
route_one_to_one(G, origin, dest; crs=nothing, index=nothing)Do a one-to-one route. If CRS is specified, origin and dest can be in lat-lon (not lon-lat) coordinates and will be reprojected to the specified CRS before routing.
If you're doing many routes, setting index to the result of MissingLinks.index_graph_edges` will speed up results.
If you want to do a route on a graph with candidate link(s) included, see realize_graph`.
MissingLinks.distance_surface — Function
distance_surface(G, origin; maxdist=5000, resolution=25, crs=nothing)Get a Raster.jl raster of the travel times from origin.
MissingLinks.regional_access — Function
regional_access(decay_func, G, dmat, link, weights, decay_cutoff; resolution=50)Perform a regional access analysis (change in access to weights for each point) as a result of the link. Return a Rasters.jl raster with the values at each point in the area around the link. decay_cutoff is the distance (in coordinate system units) after which additional destinations have no impact on accessibility.
Internal
MissingLinks.new_graph — Function
Create a new graph with appropriately set types
MissingLinks.for_each_geom — Function
Run the provided function on each constituent geometry of a multigeometry
MissingLinks.get_first_point — Function
get the first point of a line
MissingLinks.get_geometry — Function
Get the geometry column from a GeoDataFrame
MissingLinks.CandidateLink — Type
A CandidateLink represents a link between two existing edges. It stores the vertex codes (not labels) from each end of each edge, as well as the distances from each vertex at the point where the link is.
MissingLinks.add_geom_to_graph! — Function
This adds edge(s) representing geom to the graph. It may add multiple edges in cases where two edges would otherwise be duplicates because they connect the same intersections (see #2).
Returns a tuple of tuples (frv, tov) - usually just one, but some edges may be split before adding to graph.
MissingLinks.compute_net_distance — Function
Compute the network distance between the two points on links, by computing between from ends and adding fractions of the edge.
MissingLinks.add_unless_typemax — Function
Add y to x, but only if x is not already typemax (used to indicate unreachable). This assumes that if x < typemax(typeof(T)), x + y < typemax(typeof(T)), which is generally reasonable as x and y are in meters and anything we're adding is well less than 65km, unless it's unreachable, which should only be the case for x (coming from the distance matrix). However, we do use a checked_add to throw an error if that assumption is violated.
If x == typemax(T), return x If x < typemax(T) and x + y <= typemax(T), return x + y If x < typemax(T) and x + y > typemax(T), throw OverflowError
MissingLinks.break_long_line — Function
Break long lines into pieces of at most length
MissingLinks.index_graph_nodes — Function
Create a spatial index from Graphs.jl vertex codes.
MissingLinks.compute_link_score — Function
Score a potential missing link
MissingLinks.split_link! — Function
split_link!(G, fr, to, dist)Split the link from fr to to at dist, and return the new vertex number.
MissingLinks.collapse_realized_graph! — Function
collapse_realized_graph!(G)Remove degree-2 nodes and combine the associated edges. Please note that this graph has not been tested with link identification code and unexpected things may happen because of vertex removal changing the order of nodes.
MissingLinks.get_xy — Function
get_xy(geom)Convert a geometry into a Vector{Vector{Float64}} with the xy coordinates (used in testing).
MissingLinks.link_points! — Function
link_point!(G, pts, idx; tol=20, min_split_length=1, create=false)Link points pts (Vector of ArchGDAL objects) into graph destructively by splitting edges.
tol represents the tolerance (in graph units) for linking, min_split_length represents the tolerance for splitting (if the closest point is closer to the end of the edge than minsplitlength, we just connect it to the end), and if create is true we will create an orphaned edge to connect the point to if there is no nearby edge. Since the algorithm only connects edges to edges, this allows missing links to the point to be identified even if there is no nearby edge.
MissingLinks.write_tntp — Function
write_tntp(filename, G)Write a graph out in TNTP format for use in other software
MissingLinks.realize_graph — Function
realize_graph(G, links)"Realize" the graph G and candidate links links.
In identifying, deduplicating, and scoring missing links, we never actually put those links into the graph. Instead, each link stores information about where it connects to the network, and scoring is based only on the distance matrix, not on the graph itself.
However, for some applications, we need to "realize" the graph—i.e. create a graph that has edges for all the candidate links, labeled as such. This function does that.
MissingLinks.index_candidate_links — Function
index_candidate_links(links)Return a NamedTuple with three members:
- links: Vector of RealizedCandidateLinks, one for each link in
links - src: Dict mapping (VertexID, VertexID) (i.e. an edge ID) to a vector of RealizedCandidateLinks starting from that edge
- dest: Similar dict mapping edge to links ending on that edge
G should be the original graph the links were built with (although since the indices are the same, the realized graph should also work).
The RealizedCandidateLinks in src and dest are references to those in links, so modifying a link in any of these will modify it in all of them.
MissingLinks.index_graph_edges — Function
index_graph_edges(G)Index the edges of a graph. Return a tuple with an RTree and a vector of edge IDs (themselves tuples of VertexID) that go with each integer index in the RTree.
MissingLinks.closest_edge — Function
Get the closest edge to a location (tuple/vector of coordinates)
Base.reverse — Function
Create a reversed version of a candidate link, used in evaluating accessibility to calculate accessibility in both directions.