New version 1.1!

Introduction

PATHMATRIX is a tool to compute matrices of effective geographic distances among samples, based on a least-cost path algorithm. Point locations or zones encompassing sample data points (polygons) are used in conjunction with a species-specific friction map representing the cost of movement through the landscape. Three different types of distances can be computed: 1) least-cost distance, 2) length of the least-cost path, and 3) Euclidean distance. Matrices of effective distances can then be exported to other software to test, for example, for isolation by distance. The use of effective distances makes it possible to investigate the role of the environment in the spatial genetic structuring of populations. Especially for habitat specialists, least-cost distances may give a more realistic measure of spatial isolation (or its inverse, connectivity) than standard Euclidean distance (e.g. Chardon et al. 2003; Coulon et al. 2004).

PATHMATRIX is an extension to the Geographical Information System (GIS) software ARCVIEW 3.x, and is written in the language Avenue. It needs to be used in conjunction with the ARCVIEW module Spatial Analyst.

The goal of this online manual is to describe the user interface and the formats of input and output files.


Related articles

This software was developed by Nicolas Ray, and the reference to cite is:

Ray N. (2005) PATHMATRIX: a GIS tool to compute effective distances among samples, Molecular Ecology Notes, 5: 177-180

For a list of papers having used PathMatrix, click on this Google Scholar link.


What's new in version 1.1

Version 1.1 adds two functionalities:

- It is now possible to use unprojected points or polygons in a projected View. The distances are then obtained in map units (e.g. meters, instead of decimal degrees as in version 1.0) and least-cost paths can be correctly displayed as graphs in the View or saved in an unprojected shapefile. See Important notes on projections below.

- It is no longer required to have a grid selected in the View when only Euclidean distances are computed.


Download and installation

Download:

pathmatrix1.1.zip : PATHMATRIX 1.1 ArcView extension (version 1.0 can still be downloaded here if needed)

 

pathmatrix_data.zip : Zip file containing a friction grid (and its legend friction.avl), a point shapefile and a polygon shapefile

Installation:

1. If you have already installed PATHMATRIX ver. 1.0, remove it from the extension folder (Ext32) of your ARCVIEW installation. This folder is typically located under: C:\ESRI\AV_GIS30\ARCVIEW\EXT32

2. Unzip pathmatrix1.1.zip, and place a copy of pathmatrix1.1.avx in your extension folder

3. In ARCVIEW, go to ‘File/Extensions…’, and load the PATHMATRIX extension. Make sure that the Spatial Analyst extension is also loaded. Once in a View, the PATHMATRIX user interface is accessed by clicking the PATHMATRIX button that appears on the right side of the list of buttons:


User interface

Clicking the PATHMATRIX button opens the user interface, which is described below.
 


Figure 1 PATHMATRIX user interface.

- Inputs
[  1  ] Sample file. It can either be chosen from a shapefile (points or polygons) displayed in the current View, or from a text file describing point coordinates (see input files)
[  2  ] Friction grid. It can either be the selected grid in the current View, or a set of grids on disk.
[  3  ] Maximum accumulative cost distance. If a maximum cost distance is set, the accumulative cost map will stop expanding once the maximum accumulative cost is reached. If a point or a polygon is not reached by the accumulative cost map, its corresponding least-cost distance (or length of the least-cost path) will be set to -1 in the output table.

- Outputs
[  4  ] Change current directory. It is useful to use a dedicated working directory for PATHMATRIX computation, because a lot of temporary files are created and cannot always be automatically deleted.
[  5  ] Base name for output table. This is the base name that any output table will have. Depending on the output format (see below), a complement to this name will indicate the output format.
[  6  ] Option of saving accumulative cost grids. If this option is selected, each accumulative cost grid (one per point or polygon) will be saved in the working directory. The base name of these grids can also be indicated.
[  7  ] Option to display least-cost paths. If this option is selected, the whole set of computed least-cost paths (n*(n-1)/2 paths) will be displayed in the View as graphics. This option is not available if the View is projected, but see the note on projections below for a way to overcome this limitation.
[ 8  ] Option to save least-cost paths in a shapefile. If this option is selected, the whole set of computed least-cost paths (n*(n-1)/2 paths) will be saved as a polyline shapefile in the working directory, and displayed in the View. This option is not available if the View is projected, but see the note on projections below for a way to overcome this limitation.
[ 9  ] Choice of computed distances. Either the raw distance or its natural logarithm (Ln) can be chosen. The three different types of distances are discussed below. For each type of distance, a suffix (_lcd, _apd, _geo) is added to the file name of the corresponding output table.
[ 10 ] Choice of output formats. The standard output format is dBase (*.dbf), which is always written. Six additional text formats can be selected; they are discussed below.


Input files

Sample coordinates can be described either through an existing point shapefile displayed in the current View, or through a simple text coordinates file in SPLATCHE format (Currat et al. 2004). If an existing point shapefile is used, it MUST have an “Id” field (numbers or strings) with unique identifiers.

The format of the simple coordinate file is described below.

Sample coordinates file (SPLATCHE format, *.asc)
A file with the extension ".sam" specifies the coordinates of the samples, as well as the number of genes sampled in each population (see the SPLATCHE online help for more information).
On the first line of this file, the user must specify the number (integer) of samples. The second line is reserved for the legend. Then, each line defines a sample with 4 fields separated by tabs (not spaces!).

Example of sample coordinates file:

3

#name #size #lat  #long

spl1  40    35.8  12.3

spl2  25    40.1  13.1

spl3  35    34.1  14.9

where #name is the name of the sample, #size is the sample size, #lat is the latitude, and #long is the longitude

The field #size is not used in PATHMATRIX and is not compulsory, so the file can be simplified as follows:

 

3

#name #lat  #long

spl1  35.8  12.3

spl2  40.1  13.1

spl3  34.1  14.9

 

In the above example, latitudes and longitudes are in decimal degrees. It is possible to use coordinates in other units, provided that 1) your friction grid(s) are in the same units and projection, and 2) you are using PATHMATRIX in an unprojected View (see the important note on projections).

An example with coordinate units in meters:

 

3

#name #lat  #long

spl1  5889000     712000

spl2  5895000     717000

spl3  5935000     738000

 

The use of a Sample coordinates file will generate a new point shapefile (splatchepoints.shp) that will be saved in the current working directory. Because PATHMATRIX must use an “ID” field in the input and output files, each input sample will be assigned a numeric ID starting at 1.
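The file layout described above can be read with a few lines of code. The following Python sketch is only an illustration and is not part of PATHMATRIX (which is written in Avenue): the names `Sample` and `parse_sample_file` are hypothetical, and skipping blank lines is an assumption made for robustness.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    sample_id: int          # numeric ID assigned in file order, starting at 1
    name: str
    size: Optional[int]     # None when the optional #size field is absent
    lat: float
    lon: float


def parse_sample_file(text: str) -> List[Sample]:
    """Parse the content of a tab-delimited sample coordinates file."""
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    expected = int(lines[0].strip())      # first line: number of samples
    rows = lines[2:]                      # second line is the legend
    samples = []
    for i, line in enumerate(rows, start=1):
        fields = [f.strip() for f in line.split("\t") if f.strip()]
        if len(fields) == 4:
            name, size, lat, lon = fields
            samples.append(Sample(i, name, int(size), float(lat), float(lon)))
        elif len(fields) == 3:
            name, lat, lon = fields
            samples.append(Sample(i, name, None, float(lat), float(lon)))
        else:
            raise ValueError(f"sample {i}: expected 3 or 4 tab-separated fields")
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, found {len(samples)}")
    return samples


# The first example file of this section, with tabs written as \t.
example = (
    "3\n"
    "#name\t#size\t#lat\t#long\n"
    "spl1\t40\t35.8\t12.3\n"
    "spl2\t25\t40.1\t13.1\n"
    "spl3\t35\t34.1\t14.9\n"
)
samples = parse_sample_file(example)
```

A file that separates fields with spaces instead of tabs raises a `ValueError` in this sketch, which mirrors the requirement that fields be tab-separated.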

 

Friction grid(s)

One or several friction grid files can be used. These grids can either be integer or float, but keep in mind that computation is usually slower with float grids.

 

The first option is to use the selected grid in the current View. In this case, the base name for the output distance table(s) can be set. Another option makes it possible to save the individual accumulative cost maps. There is one accumulative cost map per sample, and these are the maps from which the least-cost distances are computed. Saving and visualizing these maps is often useful to check that adequate friction values have been chosen. If this option is chosen, the base name of these grids can be set. Each grid name will be composed of the base name plus an increasing ID number. If grids with a similar base name already exist in the current working directory, successive grids will have higher ID numbers.

 

The second option is to use several friction grids, in a way similar to a batch process. In this case, the user chooses a set of grids on disk. It is not possible to save the accumulative cost grids in this case, and each output distance table is named after its corresponding friction grid.


Computed distances

Two types of effective distances and one type of Euclidean distance can be computed with PATHMATRIX. The distances are computed among all input samples (points or polygons) in a pairwise fashion. With polygons, distances are computed between the closest edges. Alternatively, the natural logarithm of the distance can be computed. Although the log of a Euclidean distance is often used when comparing genetic and geographic distances, it is not yet clear what the log of a least-cost distance represents. Users should be aware of this if they use it in their work.

It is possible in PATHMATRIX to display the computed least-cost paths in the View, or to save them in a polyline shapefile. There are always n*(n-1)/2 paths, where n is the number of points or polygons. By definition, the least-cost distance (and the length of a least-cost path) is the same whether it is computed from A to B or from B to A. Therefore, PATHMATRIX only computes one least-cost path between any given pair of points (or polygons). However, the least-cost path that is drawn in the View can differ slightly between A to B and B to A, and this can be visible. This happens because, in a zone with uniform friction, there are sometimes several equally good ways to draw the least-cost path, and ArcView uses a deterministic method to choose to go “right” or “left”, which produces minor visual differences between paths. Despite these alternative drawings, the total least-cost distance and the length of the least-cost path are identical.
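As a small illustration of the n*(n-1)/2 count, the following Python sketch enumerates the unordered pairs of sample IDs. It is not taken from the PATHMATRIX source; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def sample_pairs(ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Return every unordered pair of sample IDs exactly once."""
    return list(combinations(ids, 2))


def n_paths(n: int) -> int:
    """Number of least-cost paths among n points or polygons."""
    return n * (n - 1) // 2


pairs = sample_pairs([1, 2, 3])   # [(1, 2), (1, 3), (2, 3)]
many = n_paths(150)               # 150 samples -> 11175 paths
```

Because the count grows quadratically, doubling the number of samples roughly quadruples the number of paths to compute.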

Individual least-cost paths can be selected in a View, so that it is possible to display or save only a subset of the whole set of paths. After selection, a subset of paths can be saved by using the tool “Theme/Convert to Shapefile…”.

Description of the three available distances follows.


Least-cost distance
This distance is the accumulative cost distance of the least-cost path. It is the minimum distance in cost units to reach the target point (or polygon) from the source point (or polygon).
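To make the notion of accumulative cost concrete, here is a minimal Python sketch using Dijkstra's algorithm on a small friction grid. It is not the Spatial Analyst implementation that PATHMATRIX calls. The cost rule used here (eight neighbours, each step costing the step length times the mean friction of the two cells) is a common raster convention and is an assumption, as are the function names. Returning -1 for unreached targets follows the maximum accumulative cost option of the user interface.

```python
import heapq
import math
from typing import List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def accumulated_cost(friction: List[List[float]], source: Cell,
                     cell_size: float = 1.0,
                     max_cost: Optional[float] = None) -> List[List[float]]:
    """Accumulative cost from `source` to every cell (inf if not reached)."""
    n_rows, n_cols = len(friction), len(friction[0])
    acc = [[math.inf] * n_cols for _ in range(n_rows)]
    acc[source[0]][source[1]] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if cost > acc[r][c]:
            continue  # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < n_rows and 0 <= nc < n_cols):
                    continue
                # Assumed rule: step length times mean friction of both cells.
                step = cell_size * math.hypot(dr, dc)
                new_cost = cost + step * (friction[r][c] + friction[nr][nc]) / 2.0
                if max_cost is not None and new_cost > max_cost:
                    continue  # the cost surface stops expanding here
                if new_cost < acc[nr][nc]:
                    acc[nr][nc] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return acc


def least_cost_distance(friction: List[List[float]], source: Cell, target: Cell,
                        cell_size: float = 1.0,
                        max_cost: Optional[float] = None) -> float:
    """Least-cost distance in cost units, or -1.0 if the target is not reached."""
    d = accumulated_cost(friction, source, cell_size, max_cost)[target[0]][target[1]]
    return -1.0 if math.isinf(d) else d


# A costly cell in the middle forces the path around it.
friction = [[1, 1, 1],
            [1, 9, 1],
            [1, 1, 1]]
d_ab = least_cost_distance(friction, (0, 0), (2, 2))
d_ba = least_cost_distance(friction, (2, 2), (0, 0))
d_capped = least_cost_distance(friction, (0, 0), (2, 2), max_cost=3.0)
```

Under the assumed cost rule the step cost is symmetric, so the distance from A to B equals the distance from B to A.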


Along least-cost path distance
This distance is the length (in map units, e.g. meters) of the least-cost path. It is equivalent to “walking” along the least-cost path and recording the total distance. The map units of the View must be defined to compute these distances.
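The “walking” analogy can be sketched in Python as follows. The sketch assumes that a path is given as a sequence of grid cells and that its length is measured between cell centres; this is an illustration only and not necessarily how ArcView measures the path polyline. The name `path_length` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def path_length(cells: Sequence[Tuple[int, int]], cell_size: float) -> float:
    """Length in map units of a path given as consecutive (row, column) cells."""
    total = 0.0
    for (r0, c0), (r1, c1) in zip(cells, cells[1:]):
        # Straight steps measure one cell size, diagonal steps sqrt(2) cell sizes.
        total += cell_size * math.hypot(r1 - r0, c1 - c0)
    return total


# Two straight steps and one diagonal step on a grid with 100 m cells.
length_m = path_length([(0, 0), (0, 1), (1, 2), (2, 2)], cell_size=100.0)
```

Unlike the least-cost distance, this value ignores friction entirely: it depends only on the geometry of the path.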


Euclidean distance
This distance is the standard Euclidean distance (“as the crow flies”) between the target point (or polygon) and the source point (or polygon). The map units of the View must be defined to compute these distances.
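For point samples in projected coordinates, the pairwise Euclidean matrix can be sketched in Python as below. This is an illustration with a hypothetical function name; it does not cover polygons (for which the closest edges are used) or coordinates in decimal degrees.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_matrix(points: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Symmetric matrix of straight-line distances among (x, y) points."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            (x0, y0), (x1, y1) = points[i], points[j]
            matrix[i][j] = matrix[j][i] = math.hypot(x1 - x0, y1 - y0)
    return matrix


# The sample coordinates in meters from the input file example, as (x, y).
geo = euclidean_matrix([(712000, 5889000), (717000, 5895000), (738000, 5935000)])
```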


Output formats

Apart from the standard dBase format (*.dbf) that can be opened in Excel, several output text formats can be chosen. Below are examples of output matrices computed among 3 sample locations.

Simple tab-delimited text diagonal matrix

0.000000               

158934.328125     0.000000         

257079.937500     190815.437500     0.000000

 

Simple tab-delimited text square matrix

0.000000    158934.328125     257079.937500    

158934.328125     0.000000    190815.437500    

257079.937500     190815.437500     0.000000

 

Single column text format

158934.328125

257079.937500

190815.437500

 

IBD single column text format (Bohonak 2002)

GEOGRAPHIC_DISTANCE

1     2     158934.328125

1     3     257079.937500

2     3     190815.437500

SPAGeDi matrix format (Hardy & Vekemans 2002)

M3    ID1   ID2   ID3  

ID1   0.000000    158934.328125     257079.937500    

ID2   158934.328125     0.000000    190815.437500    

ID3   257079.937500     190815.437500     0.000000   

END

FSTAT single column text format (Goudet 1995)

Effective distances computed with PATHMATRIX

3

pathdist    (out_fstat_lcd.dis)

158934.328125

257079.937500

190815.437500
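The relation between the square matrix and the simpler text formats above can be sketched in Python as follows. The function names are hypothetical and the code is not the PATHMATRIX writer. With only three samples the ordering of the single-column values is ambiguous, so the row-by-row ordering used here for larger matrices is an assumption, as is treating the IBD header line as a parameter.

```python
from typing import List


def lower_triangle(matrix: List[List[float]]) -> str:
    """Tab-delimited diagonal (lower-triangular) matrix, diagonal included."""
    return "\n".join(
        "\t".join(f"{matrix[i][j]:.6f}" for j in range(i + 1))
        for i in range(len(matrix))
    )


def single_column(matrix: List[List[float]]) -> str:
    """One pairwise distance per line: pairs 1-2, 1-3, 2-3 for three samples."""
    n = len(matrix)
    return "\n".join(
        f"{matrix[i][j]:.6f}" for i in range(n) for j in range(i + 1, n)
    )


def ibd_column(matrix: List[List[float]], header: str = "GEOGRAPHIC_DISTANCE") -> str:
    """Header line, then 'ID1<tab>ID2<tab>distance' with IDs starting at 1."""
    n = len(matrix)
    lines = [header]
    for i in range(n):
        for j in range(i + 1, n):
            lines.append(f"{i + 1}\t{j + 1}\t{matrix[i][j]:.6f}")
    return "\n".join(lines)


# The square matrix used in the examples of this section.
square = [
    [0.0, 158934.328125, 257079.9375],
    [158934.328125, 0.0, 190815.4375],
    [257079.9375, 190815.4375, 0.0],
]
column_text = single_column(square)
ibd_text = ibd_column(square)
triangle_text = lower_triangle(square)
```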

 

An important note on projections

In ARCVIEW 3.x, it is possible to work either in a projected or in an unprojected View, and with projected and/or unprojected data. Working with projections is often required, but is sometimes cumbersome when lots of data in different projections are used. The user should familiarize himself or herself with how to efficiently work with projections in ARCVIEW.

The following table shows how to handle each specific case when using projected View/data with PATHMATRIX.

 

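Case | View | Sample shapefile (points or polygons) | Friction grids | Computation of distance matrices in PATHMATRIX | Display of least-cost path graphs and saved least-cost paths in PATHMATRIX
A | unprojected | unprojected | unprojected | correct | correct
B | unprojected | unprojected | projected | invalid (data do not match) | invalid (data do not match)
C | unprojected | projected | projected | correct | correct
D | unprojected | projected | unprojected | invalid (data do not match) | invalid (data do not match)
E | projected | unprojected | projected | correct | correct
F | projected | unprojected | unprojected | invalid (data do not match) | invalid (data do not match)
G | projected | projected | projected/unprojected | invalid (data do not match) | invalid (data do not match)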

 

- Cases A and C are likely to be encountered by most users. PATHMATRIX handles distances and the display of least-cost paths correctly.

- With any of the four mismatched cases (B, D, F and G), vector data (points or polygons) and grids do not match spatially and are wrongly displayed in the View. To avoid erroneous analyses, ArcView users should never allow such cases to occur.

- With case E, PATHMATRIX 1.1 correctly handles the computation of distances in map units (e.g. meters), and is also able to display or save least-cost paths (in an unprojected shapefile). As an alternative to this case, the user can work with a projected sample shapefile in an unprojected View, which is equivalent to case C. To obtain a projected sample shapefile from an unprojected one, use the tool “Theme/Convert to Shapefile” while in the original projected View. Answering “yes” to the question “Do you want the new shapefile to be saved in the projected units?” projects the shapefile, which can then be used in an unprojected View (case C).
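The cases can also be expressed as a small lookup, sketched here in Python. This is a reading of the table and notes above, not code from PATHMATRIX: it assumes that cases A to D use an unprojected View and cases E to G a projected View, and the names `CASES`, `SUPPORTED` and `projection_case` are hypothetical.

```python
from typing import Dict, Tuple

# Key: (view projected?, sample shapefile projected?, friction grid projected?)
CASES: Dict[Tuple[bool, bool, bool], str] = {
    (False, False, False): "A",
    (False, False, True): "B",
    (False, True, True): "C",
    (False, True, False): "D",
    (True, False, True): "E",
    (True, False, False): "F",
    (True, True, True): "G",
    (True, True, False): "G",
}

# Cases in which vector data and grids match spatially.
SUPPORTED = {"A", "C", "E"}


def projection_case(view_projected: bool, samples_projected: bool,
                    grid_projected: bool) -> Tuple[str, bool]:
    """Return the case letter and whether PATHMATRIX can be used safely."""
    case = CASES[(view_projected, samples_projected, grid_projected)]
    return case, case in SUPPORTED


# Unprojected samples and a projected grid in a projected View.
case_e = projection_case(True, False, True)
```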

Limitations and other notes

- When creating a new folder for the working directory, be sure to click again on the newly created directory to select it.

- When using a sample shapefile with a large number of points or polygons, or when using several large friction grids in the batch process, you might encounter an error message.

This message is typically caused by limitations in the Spatial Analyst engine. During an ArcView session, Spatial Analyst can only be called about 32,000 times. After that, ArcView becomes unstable and eventually crashes with this type of error message. When you use a lot of points in Pathmatrix (typically more than 150), this limit is usually reached, and ArcView crashes. The same situation can also arise when you use far fewer points (or polygons) in batch mode with several cost grids: the Spatial Analyst calls accumulate across grids until the limit is reached, and ArcView crashes. Although there is no way to avoid the first type of crash (one grid and too many points), for the second type (few points, but many grids in batch mode) you can always close ArcView after the crash, restart your project and launch Pathmatrix on the remaining set of grids. Remember that restarting ArcView resets the number of calls to the Spatial Analyst engine to zero.

Another, less likely, cause of this message is a lack of available memory (RAM). Check that you are not using virtual memory (swapping) by looking at the Windows Task Manager. If you are, you can either add RAM or try to resample your grid to a lower resolution.

I'm currently working on a solution to this problem, which will be done by using another piece of software to compute least-cost paths (no calls to the Spatial Analyst) and/or by recoding Pathmatrix in ArcGIS.


Acknowledgment

I am grateful to Thomas Broquet for feedback on an earlier version of this extension. The development of PATHMATRIX was possible through a Swiss postdoctoral NSF grant n° PBGEA-101314 while I was working at the Environmental Science Lab of the University of Melbourne, Australia.


References

Bohonak AJ (2002) IBD (Isolation by Distance): a program for analyses of isolation by distance. Journal of Heredity 93, 153-154.

Chardon JP, Adriaensen F, Matthysen E (2003) Incorporating landscape elements into a connectivity measure: a case study for the Speckled wood butterfly (Pararge aegeria L.). Landscape Ecology 18, 561-573.

Coulon A, Cosson JF, Angibault JM, et al. (2004) Landscape connectivity influences gene flow in a roe deer population inhabiting a fragmented landscape: an individual-based approach. Molecular Ecology 13, 2841-2850.

Currat M, Ray N, Excoffier L (2004) SPLATCHE: a program to simulate genetic diversity taking into account environmental heterogeneity. Molecular Ecology Notes 4, 139-142.

Goudet J (1995) FSTAT: a computer program to calculate F-statistics. Journal of Heredity 86, 485-486.

Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Molecular Ecology Notes 2, 618-620.


For any question or bug report on PATHMATRIX, please contact Nicolas Ray, now working at the enviroSPACE Lab at the University of Geneva.

Last edited on July 25th, 2008