Paper:
Chen, Q. and Poorthuis, A. (2021) Identifying home locations in human mobility data: an open-source R package for comparison and reproducibility. International Journal of Geographical Information Science, Pages 1425-1448, https://doi.org/10.1080/13658816.2021.1887489
The goal of homelocator
is to improve the consistency and replicability of algorithms used for identifying home locations in human mobility data. The easy-to-use homelocator
package provide a consistent framework and interface for the adoption of different algorithms for location inference. With the package, you are able to write structured, algorithmic ‘recipes’ to identify home locations according to your research requirements. The package also has a number of built-in ‘recipes’ that have been translated from approaches in the existing literature.
# Install development version from GitHub
install_github("spatialnetworkslab/homelocator")
These are some basic examples that show you how to use common functions in the package.
You need to make sure the input dataset includes three essential attributes:
You can use validate_dataset()
to validate your input dataset before starting identifying meaningful locations. In this function, you need to specify the names of three essential attribute that used in your dataset.
# Load homelocator library
library(homelocator)
#> Welcome to homelocator package!
# load test sample dataset
data("test_sample", package = "homelocator")
df_validated <- validate_dataset(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id")
#> 🎉 Congratulations!! Your dataset has passed validation.
#> 👤 There are 100 unique users in your dataset.
#> 🌏 Now start your journey identifying their meaningful location(s)!
#> 👏 Good luck!
#>
head(df_validated)
#> # A tibble: 6 x 3
#> u_id grid_id created_at
#> <int> <int> <dttm>
#> 1 92298565 1581 2016-04-17 22:43:06
#> 2 33908340 1461 2014-10-03 16:29:48
#> 3 92298565 1136 2014-02-07 07:26:15
#> 4 11616678 1375 2014-07-18 10:08:21
#> 5 11616678 1375 2013-11-24 23:16:24
#> 6 47727539 736 2016-06-21 15:59:49
To speed up computing progress, you can nest the validated dataset by user so that the subsequent location inference can be applied to each user at the same time.
df_nested <- nest_verbose(df_validated, c("created_at", "grid_id"))
#> 🛠 Start nesting...
#> ✅ Finish nesting!
#> ⌛ Nesting time: 0.192 secs
#>
head(df_nested)
#> # A tibble: 6 x 2
#> u_id data
#> <int> <list>
#> 1 92298565 <tibble [1,291 × 2]>
#> 2 33908340 <tibble [1,170 × 2]>
#> 3 11616678 <tibble [938 × 2]>
#> 4 47727539 <tibble [307 × 2]>
#> 5 54875363 <tibble [903 × 2]>
#> 6 40153763 <tibble [1,688 × 2]>
head(df_nested$data[[1]])
#> # A tibble: 6 x 2
#> created_at grid_id
#> <dttm> <int>
#> 1 2016-04-17 22:43:06 1581
#> 2 2014-02-07 07:26:15 1136
#> 3 2012-08-18 19:26:31 1038
#> 4 2014-11-25 19:39:00 1699
#> 5 2014-12-23 01:54:55 1499
#> 6 2015-06-01 19:48:18 740
Add additional needed varialbes derived from the timestamp column. These are often used/needed as intermediate variables in home location algorithms, such as year, month, day, day of the week and hour of the day, etc.
df_enriched <- enrich_timestamp(df_nested, timestamp = "created_at")
#> 🛠 Enriching variables from timestamp...
#>
#> ✅ Finish enriching! New added variables: year, month, day, wday, hour, ymd.
#> ⌛ Enriching time: 0.708 secs
#>
head(df_enriched$data[[1]])
#> # A tibble: 6 x 8
#> created_at grid_id year month day wday hour ymd
#> <dttm> <int> <dbl> <dbl> <int> <dbl> <int> <date>
#> 1 2016-04-17 22:43:06 1581 2016 4 17 1 22 2016-04-17
#> 2 2014-02-07 07:26:15 1136 2014 2 7 6 7 2014-02-07
#> 3 2012-08-18 19:26:31 1038 2012 8 18 7 19 2012-08-18
#> 4 2014-11-25 19:39:00 1699 2014 11 25 3 19 2014-11-25
#> 5 2014-12-23 01:54:55 1499 2014 12 23 3 1 2014-12-23
#> 6 2015-06-01 19:48:18 740 2015 6 1 2 19 2015-06-01
Current available recipes, where HMLC
is the default recipe used in identify_location
:
HMLC
:
FREQ
OSNA
: Efstathiades et al.2015
APDM
: Ahas et al. 2010
# default recipe: homelocator -- HMLC
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id", show_n_loc = 1, recipe = "HMLC")
# recipe: Frequency -- FREQ
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "FREQ")
# recipe: Online Social Network Activity -- OSNA
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "OSNA")
# recipe: Online Social Network Activity -- APDM
## APDM recipe strictly returns the most likely home location
## It is important to create your location neighbors table before you use the recipe!!
## example: st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
## neighbors <- st_queen(df_sf) ===> convert result to dataframe
data("df_neighbors", package = "homelocator")
identify_location(test_sample, user = "u_id", timestamp = "created_at", location = "grid_id",
show_n_loc = 1, recipe = "APDM")