recitR.Rmd
library(recitR)
library(WikidataR) # provides find_item() and find_property()
To find the identifiers of items and properties of interest, you can use the WikidataR package (functions find_item() and find_property()). The recitR functions can then be used to start exploring the data.
Imagine you are interested in the data available on Wikidata about the Lyon Metro network.
Let’s check whether Wikidata has an item for it:
find_item("Metro Lyon")
#>
#> Wikidata item search
#>
#> Number of results: 1
#>
#> Results:
#> 1 Lyon Metro (Q1552) - public transportation network in Lyon, France
So you’d be interested, for instance, in all the subway stations that are part of this network.
find_property("part of")
#>
#> Wikidata property search
#>
#> Number of results: 10
#>
#> Results:
#> 1 part of (P361) - object of which the subject is a part (if this subject is already part of object A which is a part of object B, then please only make the subject part of object A). Inverse property of "has part" (P527, see also "has parts of the class" (P2670)).
#> 2 parent organization (P749) - parent organization of an organization, opposite of subsidiaries (P355)
#> 3 published in (P1433) - larger work that a given work was published in, like a book, journal or music album
#> 4 constellation (P59) - the area of the celestial sphere of which the subject is a part (from a scientific standpoint, not an astrological one)
#> 5 part of the series (P179) - series which contains the subject
#> 6 member of sports team (P54) - sports teams or clubs that the subject currently represents or formerly represented
#> 7 on focus list of Wikimedia project (P5008) - property to indicate that an item is of particular interest for a Wikimedia project. This property does not add notability. Items should not be created with this property if they are not notable for Wikidata. See also P6104, P972, P2354.
#> 8 highway system (P16) - system (or specific country specific road type) of which the highway is a part
#> 9 partially coincident with (P1382) - object that is partially part of, but not fully part of (P361), the subject
#> 10 diaspora (P3833) - diaspora that a cultural group belongs to
So you’re looking for all the items that are part of (“wdt:P361”) the Lyon Metro network (“wd:Q1552”).
You could access this information through:
stations=get_triplets(subject="?items",verb="wdt:P361",object="wd:Q1552")
stations %>% head()
#> # A tibble: 6 x 1
#> items
#> <chr>
#> 1 http://www.wikidata.org/entity/Q2944
#> 2 http://www.wikidata.org/entity/Q2965
#> 3 http://www.wikidata.org/entity/Q2969
#> 4 http://www.wikidata.org/entity/Q2976
#> 5 http://www.wikidata.org/entity/Q5298
#> 6 http://www.wikidata.org/entity/Q599865
Notice that we do not have values yet for the stations (that’s what we’re looking for), hence the use of “?” at the beginning of the subject string.
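To make the role of the “?” prefix concrete, here is a minimal sketch of how a single triple pattern could be turned into a SPARQL query string. The helper triple_query() is hypothetical and is not part of recitR; the query that get_triplets() actually sends may well differ. Only the two standard Wikidata prefixes (wd: for entities, wdt: for direct properties) are assumed.

```r
# Hypothetical helper (NOT part of recitR): build a one-pattern SPARQL query.
triple_query <- function(subject, verb, object, limit = NULL) {
  parts <- c(subject, verb, object)
  # Elements starting with "?" are unknowns, so they go in the SELECT clause
  vars <- unique(parts[startsWith(parts, "?")])
  query <- paste0(
    "PREFIX wd: <http://www.wikidata.org/entity/>\n",
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n",
    "SELECT ", paste(vars, collapse = " "), "\n",
    "WHERE {\n  ", paste(parts, collapse = " "), " .\n}"
  )
  # Optionally cap the number of rows returned
  if (!is.null(limit)) query <- paste0(query, "\nLIMIT ", limit)
  query
}

q <- triple_query("?items", "wdt:P361", "wd:Q1552")
cat(q)
```

Here “?items” is the only unknown, so it is the only column requested, which matches the single-column tibble shown above.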
To also get the labels for stations, we can use the argument label:
parts_metro_Lyon=get_triplets(subject="?items",verb="wdt:P361",object="wd:Q1552", label="?items")
parts_metro_Lyon %>% head()
#> # A tibble: 6 x 2
#> items itemsLabel
#> <chr> <chr>
#> 1 http://www.wikidata.org/entity/Q2944 Lyon Metro Line A
#> 2 http://www.wikidata.org/entity/Q2965 Lyon Metro Line B
#> 3 http://www.wikidata.org/entity/Q2969 Lyon Metro Line C
#> 4 http://www.wikidata.org/entity/Q2976 Lyon Metro Line D
#> 5 http://www.wikidata.org/entity/Q5298 Bellecour
#> 6 http://www.wikidata.org/entity/Q599865 Place Guichard - Bourse du Travail
For now, we get 50 items, not only stations but also other types of items such as metro lines. Let’s have a look at the item “Place Guichard - Bourse du Travail” (Q599865), which we know corresponds to a station.
The function get_claims() of package recitR enables you to see all the direct Wikidata properties (and their values) based on an item’s id.
get_claims("wd:Q599865")
#> # A tibble: 16 x 7
#> property propertyLabel value valueLabel propertyType propertyDescript…
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 http://w… country http:/… France http://wikib… "sovereign state…
#> 2 http://w… image http:/… http://comm… http://wikib… "image of releva…
#> 3 http://w… instance of http:/… metro stati… http://wikib… "that class of w…
#> 4 http://w… instance of http:/… station loc… http://wikib… "that class of w…
#> 5 http://w… connecting li… http:/… Lyon Metro … http://wikib… "railway line(s)…
#> 6 http://w… located in th… http:/… 3rd arrondi… http://wikib… "the item is loc…
#> 7 http://w… adjacent stat… http:/… Saxe - Gamb… http://wikib… "the stations ne…
#> 8 http://w… adjacent stat… http:/… Gare Part-D… http://wikib… "the stations ne…
#> 9 http://w… station code 43 43 http://wikib… "generic identif…
#> 10 http://w… part of http:/… Lyon Metro http://wikib… "object of which…
#> 11 http://w… Commons categ… Place … Place Guich… http://wikib… "name of the Wik…
#> 12 http://w… inception 1981-0… 1981-09-14T… http://wikib… "date or point i…
#> 13 http://w… coordinate lo… Point(… Point(4.847… http://wikib… "geocoordinates …
#> 14 http://w… number of pla… 2.0 2 http://wikib… "number of track…
#> 15 http://w… date of offic… 1981-0… 1981-09-14T… http://wikib… "date or point i…
#> 16 http://w… Google Knowle… /g/1s0… /g/1s04bfz18 http://wikib… "identifier for …
#> # … with 1 more variable: propertyAltLabel <chr>
Property “wdt:P31” (“instance of”) should enable us to collect only metro stations (“wd:Q928830”) instead of all parts of the Lyon Metro network.
We can enrich and refine our query incrementally with add_triplets() before sending it (whereas get_triplets() was a shortcut to build and send simple requests):
stations_metro_Lyon=add_triplets(subject="?stations",verb="wdt:P361",object="wd:Q1552", label="?stations") %>%
add_triplets(subject="?stations",verb="wdt:P31",object="wd:Q928830") %>%
build_sparql() %>%
send_sparql()
head(stations_metro_Lyon)
#> # A tibble: 6 x 2
#> stations stationsLabel
#> <chr> <chr>
#> 1 http://www.wikidata.org/entity/Q5298 Bellecour
#> 2 http://www.wikidata.org/entity/Q599865 Place Guichard - Bourse du Travail
#> 3 http://www.wikidata.org/entity/Q613893 Hôtel de Ville - Louis Pradel
#> 4 http://www.wikidata.org/entity/Q776088 Cordeliers
#> 5 http://www.wikidata.org/entity/Q934869 Gare de Vénissieux
#> 6 http://www.wikidata.org/entity/Q1847502 Stade de Gerland
We now get 42 stations that are part of the Lyon metro network.
If we wanted to get other properties and associated values for these stations (for instance their location (“wdt:P625”)) we could proceed this way:
stations_metro_Lyon=add_triplets(subject="?stations",verb="wdt:P361",object="wd:Q1552", label="?stations") %>%
add_triplets(subject="?stations",verb="wdt:P31",object="wd:Q928830") %>%
add_triplets(subject="?stations",verb="wdt:P625",object="?coords") %>%
build_sparql() %>%
send_sparql()
head(stations_metro_Lyon)
#> # A tibble: 6 x 3
#> stations stationsLabel coords
#> <chr> <chr> <chr>
#> 1 http://www.wikidata.org/ent… Bellecour Point(4.83405 45.757…
#> 2 http://www.wikidata.org/ent… Place Guichard - Bourse du… Point(4.847308333 45…
#> 3 http://www.wikidata.org/ent… Hôtel de Ville - Louis Pra… Point(4.836022222 45…
#> 4 http://www.wikidata.org/ent… Cordeliers Point(4.835894444 45…
#> 5 http://www.wikidata.org/ent… Gare de Vénissieux Point(4.88804 45.705…
#> 6 http://www.wikidata.org/ent… Stade de Gerland Point(4.83038 45.727…
recitR provides functions to clean and transform “raw” Wikidata tibbles into tibbles that are easier to use in R.
Function clean_wikidata_table() lightens some columns with Wikidata-formatted URIs:
stations_metro_Lyon=clean_wikidata_table(stations_metro_Lyon)
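As an illustration of what “lightening” a URI column can mean, the sketch below strips the common entity prefix with base R so that only the Q identifier remains. This is an assumption about the general idea, not the implementation of clean_wikidata_table(), whose exact output format may differ.

```r
# Two entity URIs taken from the results shown earlier
uris <- c("http://www.wikidata.org/entity/Q5298",
          "http://www.wikidata.org/entity/Q599865")

# Drop the common entity prefix, keeping only the Q identifier
ids <- sub("^http://www\\.wikidata\\.org/entity/", "", uris)
ids
```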
Function transform_wikidata_coords() extracts the coordinates as longitude (lng) and latitude (lat) from the Wikidata WKT formatting of spatial coordinates (“Point(lng lat)”).
stations_metro_Lyon=stations_metro_Lyon %>%
transform_wikidata_coords("coords")
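The parsing step itself is simple enough to sketch in base R. The helper parse_wkt_point() below is hypothetical (it is not the recitR function) and assumes every value has exactly the “Point(lng lat)” shape; the two sample strings are taken from query results shown in this document.

```r
# Hypothetical helper (NOT part of recitR): split "Point(lng lat)" strings
# into numeric longitude and latitude columns.
parse_wkt_point <- function(wkt) {
  # Capture the two numbers inside "Point(lng lat)"
  pattern <- "^Point\\(([-0-9.]+) ([-0-9.]+)\\)$"
  data.frame(
    lng = as.numeric(sub(pattern, "\\1", wkt)),
    lat = as.numeric(sub(pattern, "\\2", wkt))
  )
}

coords <- c("Point(4.83879 45.7834)", "Point(-11.6236 20.9289)")
parsed <- parse_wkt_point(coords)
parsed
```

Note that WKT puts longitude first, which is the reverse of the usual “lat, long” order used in speech.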
The resulting table may then be used easily with (for instance) package leaflet:
leaflet::leaflet(stations_metro_Lyon) %>%
leaflet::addTiles() %>%
leaflet::addCircles(popup=~stationsLabel)
Now, let’s imagine that we are interested in the cities in a 200km radius around Lyon.
find_item("Lyon")
#>
#> Wikidata item search
#>
#> Number of results: 10
#>
#> Results:
#> 1 Lyon (Q456) - commune in the metropolis of Lyon, France
#> 2 Olympique Lyonnais (Q704) - association football club in Lyon, France
#> 3 Lyon (Q30102635) - family name
#> 4 Marcus Ward Lyon, Jr. (Q3290305) - U.S. mammalogist, bacteriologist, and pathologist (1875–1942)
#> 5 Lyon (Q294688) - Wikimedia disambiguation page
#> 6 Lyon (Q867818) - town in Coahoma County, Mississippi, United States
#> 7 9381 Lyon (Q1192758) - asteroid
#> 8 Lyon (Q30014648) - male given name
#> 9 Harold Lloyd Lyon (Q18911234) - botanist (1879-1957)
#> 10 Lion (Q3833172) - rock band from the United States
find_item("city")
#>
#> Wikidata item search
#>
#> Number of results: 10
#>
#> Results:
#> 1 city (Q515) - large permanent human settlement
#> 2 Manchester City F.C. (Q50602) - association football club in Manchester, England
#> 3 Leicester City F.C. (Q19481) - association football club in Leicester, England
#> 4 Stoke City F.C. (Q18736) - association football club in Stoke-on-Trent, England
#> 5 Birmingham City F.C. (Q19444) - association football club in Birmingham, England
#> 6 Cardiff City F.C. (Q18662) - association football club in Cardiff, Wales
#> 7 Norwich City F.C. (Q18721) - association football club in Norwich, England
#> 8 Swansea City A.F.C. (Q18659) - association football club in Swansea, Wales
#> 9 Hull City A.F.C. (Q19477) - association football club in Kingston upon Hull, England
#> 10 Bradford City A.F.C. (Q48879) - association football club in Bradford, England
We could start exploring Wikidata with this query, which finds all items that are instances (“wdt:P31”) of “city” or of any subclass (“wdt:P279”) of “city”. This query might return many items, so it seems reasonable to limit the number of items retrieved for now with the argument limit:
add_triplets(subject="?city", verb="wdt:P31/wdt:P279*", object="wd:Q515", label="?city",limit=10) %>%
build_sparql() %>%
send_sparql()
#> # A tibble: 10 x 2
#> city cityLabel
#> <chr> <chr>
#> 1 http://www.wikidata.org/entity/Q309436 Ksar of Aït Benhaddou
#> 2 http://www.wikidata.org/entity/Q817274 Beni Isguen
#> 3 http://www.wikidata.org/entity/Q2670896 Ksar Ouled Soltane
#> 4 http://www.wikidata.org/entity/Q3200131 Ksar Ifegh
#> 5 http://www.wikidata.org/entity/Q3200135 Ksar of Lamaarka
#> 6 http://www.wikidata.org/entity/Q3818705 Ksar Nalut
#> 7 http://www.wikidata.org/entity/Q11736787 Ksar Hallouf
#> 8 http://www.wikidata.org/entity/Q12233025 Ksar Beni Barka
#> 9 http://www.wikidata.org/entity/Q16593169 Ksar of Taourirt
#> 10 http://www.wikidata.org/entity/Q21029712 Ksar of Rgabi N'Ait Hassou
Now, let’s get the location (“wdt:P625”) of the cities:
add_triplets(subject="?city",
verb="wdt:P31/wdt:P279*",
object="wd:Q515",
label=c("?city"),
limit=10) %>%
add_triplets(subject="?city",
verb="wdt:P625",
object="?coords") %>%
build_sparql() %>%
send_sparql()
#> # A tibble: 10 x 3
#> city cityLabel coords
#> <chr> <chr> <chr>
#> 1 http://www.wikidata.org/ent… Ksar of Tamoussa ou Ali Point(-4.446111111 32.…
#> 2 http://www.wikidata.org/ent… Ksar of Tatiouine Point(-4.455555555 32.…
#> 3 http://www.wikidata.org/ent… Ksar of Tissouit Ait Se… Point(-4.431944444 32.…
#> 4 http://www.wikidata.org/ent… Ksar of Tissouit Sidi H… Point(-4.443611111 32.…
#> 5 http://www.wikidata.org/ent… Ksar El Atteuf Point(3.7449 32.4755)
#> 6 http://www.wikidata.org/ent… Ksar Bounoura Point(3.7039 32.4827)
#> 7 http://www.wikidata.org/ent… Ksar Melika Point(3.6867 32.4831)
#> 8 http://www.wikidata.org/ent… Ksar of Ouadane Point(-11.6236 20.9289)
#> 9 http://www.wikidata.org/ent… Ksar of Chinguetti Point(-12.3667 20.4633)
#> 10 http://www.wikidata.org/ent… Ksar of Tichit Point(-9.4692 18.4194)
We can refine this query to keep only items located within a radius of 5 km around Lyon (latitude ≈ 45.76, longitude ≈ 4.84). Note that the class used below is “wd:Q486972” (human settlement), which is broader than “city”. We will use the argument within_distance:
cities_around_Lyon=add_triplets(subject="?city",
verb="wdt:P31/wdt:P279*",
object="wd:Q486972", label="?city") %>%
add_triplets(subject="?city",
verb="wdt:P625",
object="?coords",
within_distance=list(center=c(long=4.84,lat=45.76),
radius=5)) %>%
build_sparql() %>%
send_sparql()
head(cities_around_Lyon)
#> # A tibble: 6 x 3
#> city cityLabel coords
#> <chr> <chr> <chr>
#> 1 http://www.wikidata.org/entity/Q29047… Bissardon Point(4.83879 45.7834)
#> 2 http://www.wikidata.org/entity/Q21209… Q21209234 Point(4.885679 45.761305)
#> 3 http://www.wikidata.org/entity/Q30078… Cusset Point(4.90338 45.7668)
#> 4 http://www.wikidata.org/entity/Q31157… Gratte-ciel Point(4.87947 45.7679)
#> 5 http://www.wikidata.org/entity/Q20987… Q20987614 Point(4.89223385 45.77660…
#> 6 http://www.wikidata.org/entity/Q16507… Croix-Luizet Point(4.88347 45.7808)
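As a sanity check on the 5 km radius, the great-circle distance between the chosen centre and a returned point can be computed with the haversine formula. The sketch below uses base R only; haversine_km() is a hypothetical helper, the Earth radius of 6371 km is an assumed mean value, and the coordinates are those of Bissardon and Gratte-ciel from the output above.

```r
# Hypothetical helper: great-circle distance in km (haversine formula).
# Inputs are in decimal degrees; radius_km is an assumed mean Earth radius.
haversine_km <- function(lng1, lat1, lng2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(sqrt(a))
}

places <- data.frame(
  label = c("Bissardon", "Gratte-ciel"),
  lng = c(4.83879, 4.87947),
  lat = c(45.7834, 45.7679)
)

# Distance from the centre used in the query (long = 4.84, lat = 45.76)
places$dist_km <- haversine_km(4.84, 45.76, places$lng, places$lat)
places
```

Both distances come out below 5 km, which is consistent with the within_distance filter, although the query service may compute distances slightly differently.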
Actually, rather than getting Lyon’s long-lat coordinates “by hand”, we could get them directly through our query, which means we have to add a triplet about Lyon (“wd:Q456”) and its coordinates:
add_triplets(subject="wd:Q456",
verb="wdt:P625",
object="?coordLyon") %>%
add_triplets(subject="?city",
verb="wdt:P625",
object="?coords",
within_distance=list(center="?coordLyon", radius=2)) %>%
add_triplets(subject="?city",
verb="wdt:P31/wdt:P279*",
object="wd:Q486972",
label="?city") %>%
build_sparql() %>%
send_sparql()
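For readers curious about the SPARQL side, the Wikidata Query Service offers a wikibase:around service that accepts either a literal point or a variable as its centre, with the radius given in kilometres. The sketch below only assembles such a clause as a string; around_clause() is a hypothetical helper, and whether recitR generates exactly this clause for within_distance is an assumption.

```r
# Hypothetical helper (NOT part of recitR): write a wikibase:around clause.
# `center` may be a SPARQL variable such as "?coordLyon".
around_clause <- function(item_var, coords_var, center, radius_km) {
  paste0(
    "SERVICE wikibase:around {\n",
    "  ", item_var, " wdt:P625 ", coords_var, " .\n",
    "  bd:serviceParam wikibase:center ", center, " .\n",
    "  bd:serviceParam wikibase:radius \"", radius_km, "\" .\n",
    "}"
  )
}

clause <- around_clause("?city", "?coords", "?coordLyon", 2)
cat(clause)
```

Because the centre is a variable, it must be bound elsewhere in the query, which is the role of the triplet about “wd:Q456” above.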