library(recitR)

Find items and properties to build your query

To find the identifiers of items and properties of interest, you can:

  • browse Wikidata
  • use the WikidataR package (functions find_item() and find_property()).

You can then use recitR functions to start exploring the data.

Example 1: Lyon Metro

Imagine you are interested in the data available on Wikidata about the Lyon Metro network.

Let’s check whether Wikidata has an item for it:

find_item("Metro Lyon")
#> 
#>  Wikidata item search
#> 
#> Number of results:    1 
#> 
#> Results:
#> 1     Lyon Metro (Q1552) - public transportation network in Lyon, France

Suppose you want, for instance, all the subway stations that are part of this network. To express “part of” in a query, look up the corresponding property:

find_property("part of")
#> 
#>  Wikidata property search
#> 
#> Number of results:    10 
#> 
#> Results:
#> 1     part of (P361) - object of which the subject is a part (if this subject is already part of object A which is a part of object B, then please only make the subject part of object A). Inverse property of "has part" (P527, see also "has parts of the class" (P2670)). 
#> 2     parent organization (P749) - parent organization of an organization, opposite of subsidiaries (P355) 
#> 3     published in (P1433) - larger work that a given work was published in, like a book, journal or music album 
#> 4     constellation (P59) - the area of the celestial sphere of which the subject is a part (from a scientific standpoint, not an astrological one) 
#> 5     part of the series (P179) - series which contains the subject 
#> 6     member of sports team (P54) - sports teams or clubs that the subject currently represents or formerly represented 
#> 7     on focus list of Wikimedia project (P5008) - property to indicate that an item is of particular interest for a Wikimedia project. This property does not add notability. Items should not be created with this property if they are not notable for Wikidata. See also P6104, P972, P2354. 
#> 8     highway system (P16) - system (or specific country specific road type) of which the highway is a part 
#> 9     partially coincident with (P1382) - object that is partially part of, but not fully part of (P361), the subject 
#> 10    diaspora (P3833) - diaspora that a cultural group belongs to

So you’re looking for all the stations that are part of (“wdt:P361”) the Lyon Metro network (“wd:Q1552”).

You could access this information through:

stations=get_triplets(subject="?items",verb="wdt:P361",object="wd:Q1552")
stations %>% head()
#> # A tibble: 6 x 1
#>   items                                 
#>   <chr>                                 
#> 1 http://www.wikidata.org/entity/Q2944  
#> 2 http://www.wikidata.org/entity/Q2965  
#> 3 http://www.wikidata.org/entity/Q2969  
#> 4 http://www.wikidata.org/entity/Q2976  
#> 5 http://www.wikidata.org/entity/Q5298  
#> 6 http://www.wikidata.org/entity/Q599865

Notice that we do not have values yet for the stations (that’s what we’re looking for), hence the “?” at the beginning of the subject string, which marks it as a variable.
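As a rough illustration of what such a triplet amounts to, the base R snippet below assembles a SPARQL query string from a subject, a verb and an object. The helper sketch_triple_query() is hypothetical and is not part of recitR; the query that get_triplets() actually sends may differ (for instance in how it handles prefixes or labels).

```r
# Hypothetical helper (not part of recitR): turn one triplet into a SPARQL query string.
sketch_triple_query <- function(subject, verb, object) {
  terms <- c(subject, verb, object)
  # Terms starting with "?" are variables: they are what the query returns.
  vars <- terms[startsWith(terms, "?")]
  sprintf("SELECT %s WHERE { %s %s %s . }",
          paste(vars, collapse = " "), subject, verb, object)
}

query <- sketch_triple_query("?items", "wdt:P361", "wd:Q1552")
cat(query, "\n")
```

Only “?items” appears in the SELECT clause here, because it is the only term left unknown.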

To also get the labels for stations, we can use the argument label:

parts_metro_Lyon=get_triplets(subject="?items",verb="wdt:P361",object="wd:Q1552", label="?items")
parts_metro_Lyon %>% head()
#> # A tibble: 6 x 2
#>   items                                  itemsLabel                        
#>   <chr>                                  <chr>                             
#> 1 http://www.wikidata.org/entity/Q2944   Lyon Metro Line A                 
#> 2 http://www.wikidata.org/entity/Q2965   Lyon Metro Line B                 
#> 3 http://www.wikidata.org/entity/Q2969   Lyon Metro Line C                 
#> 4 http://www.wikidata.org/entity/Q2976   Lyon Metro Line D                 
#> 5 http://www.wikidata.org/entity/Q5298   Bellecour                         
#> 6 http://www.wikidata.org/entity/Q599865 Place Guichard - Bourse du Travail

For now, we get 50 items: not only stations but also other types of items, such as metro lines. Let’s have a look at the item “Place Guichard - Bourse du Travail” (Q599865), which we know corresponds to a station.

The function get_claims() of package recitR lists all the direct Wikidata properties (and their values) of an item, given the item’s id.

get_claims("wd:Q599865")
#> # A tibble: 16 x 7
#>    property  propertyLabel  value   valueLabel   propertyType  propertyDescript…
#>    <chr>     <chr>          <chr>   <chr>        <chr>         <chr>            
#>  1 http://w… country        http:/… France       http://wikib… "sovereign state…
#>  2 http://w… image          http:/… http://comm… http://wikib… "image of releva…
#>  3 http://w… instance of    http:/… metro stati… http://wikib… "that class of w…
#>  4 http://w… instance of    http:/… station loc… http://wikib… "that class of w…
#>  5 http://w… connecting li… http:/… Lyon Metro … http://wikib… "railway line(s)…
#>  6 http://w… located in th… http:/… 3rd arrondi… http://wikib… "the item is loc…
#>  7 http://w… adjacent stat… http:/… Saxe - Gamb… http://wikib… "the stations ne…
#>  8 http://w… adjacent stat… http:/… Gare Part-D… http://wikib… "the stations ne…
#>  9 http://w… station code   43      43           http://wikib… "generic identif…
#> 10 http://w… part of        http:/… Lyon Metro   http://wikib… "object of which…
#> 11 http://w… Commons categ… Place … Place Guich… http://wikib… "name of the Wik…
#> 12 http://w… inception      1981-0… 1981-09-14T… http://wikib… "date or point i…
#> 13 http://w… coordinate lo… Point(… Point(4.847… http://wikib… "geocoordinates …
#> 14 http://w… number of pla… 2.0     2            http://wikib… "number of track…
#> 15 http://w… date of offic… 1981-0… 1981-09-14T… http://wikib… "date or point i…
#> 16 http://w… Google Knowle… /g/1s0… /g/1s04bfz18 http://wikib… "identifier for …
#> # … with 1 more variable: propertyAltLabel <chr>

The property “wdt:P31” (instance of) should enable us to collect only metro stations (“wd:Q928830”) instead of all parts of the Lyon Metro network.

We can enrich and refine our query incrementally with add_triplets() before sending it (whereas get_triplets() is a shortcut that builds and sends a simple request in one step):

stations_metro_Lyon=add_triplets(subject="?stations",verb="wdt:P361",object="wd:Q1552", label="?stations") %>% 
  add_triplets(subject="?stations",verb="wdt:P31",object="wd:Q928830") %>% 
  build_sparql() %>% 
  send_sparql()

head(stations_metro_Lyon)
#> # A tibble: 6 x 2
#>   stations                                stationsLabel                     
#>   <chr>                                   <chr>                             
#> 1 http://www.wikidata.org/entity/Q5298    Bellecour                         
#> 2 http://www.wikidata.org/entity/Q599865  Place Guichard - Bourse du Travail
#> 3 http://www.wikidata.org/entity/Q613893  Hôtel de Ville - Louis Pradel     
#> 4 http://www.wikidata.org/entity/Q776088  Cordeliers                        
#> 5 http://www.wikidata.org/entity/Q934869  Gare de Vénissieux                
#> 6 http://www.wikidata.org/entity/Q1847502 Stade de Gerland

We now get 42 stations that are part of the Lyon metro network.
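The idea of accumulating triplets and only building the query at the end can be sketched in base R as follows. The functions add_pattern() and build_query() are hypothetical stand-ins written for this illustration; they are not recitR’s implementation of add_triplets() and build_sparql(), and they ignore labels and prefixes.

```r
# Hypothetical sketch (not recitR's code): collect triple patterns, then build one query.
add_pattern <- function(query = list(), subject, verb, object) {
  c(query, list(c(subject, verb, object)))
}

build_query <- function(query) {
  terms <- unlist(query)
  # Every distinct "?variable" used in any pattern is selected once.
  vars <- unique(terms[startsWith(terms, "?")])
  patterns <- vapply(query,
                     function(p) paste0(paste(p, collapse = " "), " ."),
                     character(1))
  paste0("SELECT ", paste(vars, collapse = " "),
         " WHERE { ", paste(patterns, collapse = " "), " }")
}

q <- add_pattern(subject = "?stations", verb = "wdt:P361", object = "wd:Q1552")
q <- add_pattern(q, "?stations", "wdt:P31", "wd:Q928830")
sparql <- build_query(q)
cat(sparql, "\n")
```

Because both patterns share the variable “?stations”, an item must satisfy both of them to be returned, which is why the second triplet narrows the result.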

If we wanted to get other properties and associated values for these stations (for instance their location (“wdt:P625”)) we could proceed this way:

stations_metro_Lyon=add_triplets(subject="?stations",verb="wdt:P361",object="wd:Q1552", label="?stations") %>% 
  add_triplets(subject="?stations",verb="wdt:P31",object="wd:Q928830") %>% 
  add_triplets(subject="?stations",verb="wdt:P625",object="?coords") %>% 
  build_sparql() %>% 
  send_sparql()

head(stations_metro_Lyon)
#> # A tibble: 6 x 3
#>   stations                     stationsLabel               coords               
#>   <chr>                        <chr>                       <chr>                
#> 1 http://www.wikidata.org/ent… Bellecour                   Point(4.83405 45.757…
#> 2 http://www.wikidata.org/ent… Place Guichard - Bourse du… Point(4.847308333 45…
#> 3 http://www.wikidata.org/ent… Hôtel de Ville - Louis Pra… Point(4.836022222 45…
#> 4 http://www.wikidata.org/ent… Cordeliers                  Point(4.835894444 45…
#> 5 http://www.wikidata.org/ent… Gare de Vénissieux          Point(4.88804 45.705…
#> 6 http://www.wikidata.org/ent… Stade de Gerland            Point(4.83038 45.727…

recitR provides functions to clean and transform “raw” Wikidata tibbles into tibbles that are easier to use in R.

Function clean_wikidata_table() lightens the columns that contain Wikidata-formatted URIs:

stations_metro_Lyon=clean_wikidata_table(stations_metro_Lyon)
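As an illustration of what “lightening” a URI column could look like, the base R sketch below replaces the Wikidata entity prefix with the short form “wd:”. This is an assumption about the kind of cleaning involved, and shorten_uri() is a hypothetical helper, not a recitR function; the exact behaviour of clean_wikidata_table() may differ.

```r
# Assumption: "lightening" means replacing the entity URI prefix with "wd:".
# shorten_uri() is a hypothetical helper, not a recitR function.
shorten_uri <- function(x) {
  sub("^http://www\\.wikidata\\.org/entity/", "wd:", x)
}

uris <- c("http://www.wikidata.org/entity/Q5298",
          "http://www.wikidata.org/entity/Q599865",
          "Bellecour")  # values without the prefix are left unchanged
short <- shorten_uri(uris)
print(short)
```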

Function transform_wikidata_coords() extracts the coordinates as longitude (lng) and latitude (lat) from the WKT format that Wikidata uses for spatial coordinates (“Point(lng lat)”).

stations_metro_Lyon=stations_metro_Lyon %>% 
  transform_wikidata_coords("coords")
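To make the “Point(lng lat)” format concrete, here is a minimal base R sketch of such a conversion. The helper parse_wkt_point() is hypothetical and only illustrates the parsing step; it is not how transform_wikidata_coords() is implemented.

```r
# Hypothetical helper (not recitR's code): split "Point(lng lat)" into numeric columns.
parse_wkt_point <- function(x) {
  inner <- sub("^Point\\((.*)\\)$", "\\1", x)   # drop the "Point(" ... ")" wrapper
  parts <- strsplit(inner, " ", fixed = TRUE)    # WKT puts longitude first
  data.frame(lng = as.numeric(vapply(parts, `[`, character(1), 1)),
             lat = as.numeric(vapply(parts, `[`, character(1), 2)))
}

coords <- parse_wkt_point(c("Point(4.84 45.76)", "Point(3.7449 32.4755)"))
print(coords)
```

Note the order: WKT stores longitude before latitude, the reverse of the usual “lat-long” convention.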

The resulting table may then be used easily with (for instance) package leaflet:

leaflet::leaflet(stations_metro_Lyon) %>%
  leaflet::addTiles() %>%
  leaflet::addCircles(popup=~stationsLabel)

Example 2: cities around Lyon

Now, let’s imagine that we are interested in the cities located within a given radius around Lyon.

find_item("Lyon")
#> 
#>  Wikidata item search
#> 
#> Number of results:    10 
#> 
#> Results:
#> 1     Lyon (Q456) - commune in the metropolis of Lyon, France 
#> 2     Olympique Lyonnais (Q704) - association football club in Lyon, France 
#> 3     Lyon (Q30102635) - family name 
#> 4     Marcus Ward Lyon, Jr. (Q3290305) - U.S. mammalogist, bacteriologist, and pathologist (1875–1942) 
#> 5     Lyon (Q294688) - Wikimedia disambiguation page 
#> 6     Lyon (Q867818) - town in Coahoma County, Mississippi, United States 
#> 7     9381 Lyon (Q1192758) - asteroid 
#> 8     Lyon (Q30014648) - male given name 
#> 9     Harold Lloyd Lyon (Q18911234) - botanist (1879-1957) 
#> 10    Lion (Q3833172) - rock band from the United States

We also need the identifier of the “city” class:

find_item("city")
#> 
#>  Wikidata item search
#> 
#> Number of results:    10 
#> 
#> Results:
#> 1     city (Q515) - large permanent human settlement 
#> 2     Manchester City F.C. (Q50602) - association football club in Manchester, England 
#> 3     Leicester City F.C. (Q19481) - association football club in Leicester, England 
#> 4     Stoke City F.C. (Q18736) - association football club in Stoke-on-Trent, England 
#> 5     Birmingham City F.C. (Q19444) - association football club in Birmingham, England 
#> 6     Cardiff City F.C. (Q18662) - association football club in Cardiff, Wales 
#> 7     Norwich City F.C. (Q18721) - association football club in Norwich, England 
#> 8     Swansea City A.F.C. (Q18659) - association football club in Swansea, Wales 
#> 9     Hull City A.F.C. (Q19477) - association football club in Kingston upon Hull, England 
#> 10    Bradford City A.F.C. (Q48879) - association football club in Bradford, England

We could start exploring Wikidata with the query below, which finds all items that are instances (“wdt:P31”) of “city” or of any subclass (“wdt:P279”) of “city”. This query might return many items, so it seems reasonable to cap the number of items retrieved for now with the argument limit:

add_triplets(subject="?city", verb="wdt:P31/wdt:P279*", object="wd:Q515", label="?city",limit=10) %>% 
  build_sparql() %>% 
  send_sparql()
#> # A tibble: 10 x 2
#>    city                                     cityLabel                 
#>    <chr>                                    <chr>                     
#>  1 http://www.wikidata.org/entity/Q309436   Ksar of Aït Benhaddou     
#>  2 http://www.wikidata.org/entity/Q817274   Beni Isguen               
#>  3 http://www.wikidata.org/entity/Q2670896  Ksar Ouled Soltane        
#>  4 http://www.wikidata.org/entity/Q3200131  Ksar Ifegh                
#>  5 http://www.wikidata.org/entity/Q3200135  Ksar of Lamaarka          
#>  6 http://www.wikidata.org/entity/Q3818705  Ksar Nalut                
#>  7 http://www.wikidata.org/entity/Q11736787 Ksar Hallouf              
#>  8 http://www.wikidata.org/entity/Q12233025 Ksar Beni Barka           
#>  9 http://www.wikidata.org/entity/Q16593169 Ksar of Taourirt          
#> 10 http://www.wikidata.org/entity/Q21029712 Ksar of Rgabi N'Ait Hassou
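The property path “wdt:P31/wdt:P279*” reads as: follow one “instance of” link, then zero or more “subclass of” links. The base R sketch below mimics that traversal on a toy graph. All class and item names are made up for the illustration, and each class has at most one parent here, which is a simplification compared to Wikidata.

```r
# Toy data (entirely made up): parent class of each class, and class of each item.
subclass_of <- c(big_city = "city", capital = "big_city", village = "settlement")
instance_of <- c(item_a = "city", item_b = "capital", item_c = "village")

# Walk up the subclass chain; TRUE if `target` is reached in zero or more steps.
reaches <- function(cls, target) {
  while (!is.na(cls)) {
    if (cls == target) return(TRUE)
    cls <- unname(subclass_of[cls])   # NA once there is no parent class
  }
  FALSE
}

matches <- vapply(instance_of, reaches, logical(1), target = "city")
print(names(matches)[matches])
```

Here item_b matches even though it is not a direct instance of “city”, which is the same mechanism by which items of more specific classes show up in the results of the query.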

Now, let’s get the location (“wdt:P625”) of the cities:

add_triplets(subject="?city",
             verb="wdt:P31/wdt:P279*",
             object="wd:Q515",
             label=c("?city"),
             limit=10) %>%
  add_triplets(subject="?city",
               verb="wdt:P625",
               object="?coords") %>% 
  build_sparql() %>% 
  send_sparql()
#> # A tibble: 10 x 3
#>    city                         cityLabel                coords                 
#>    <chr>                        <chr>                    <chr>                  
#>  1 http://www.wikidata.org/ent… Ksar of Tamoussa ou Ali  Point(-4.446111111 32.…
#>  2 http://www.wikidata.org/ent… Ksar of Tatiouine        Point(-4.455555555 32.…
#>  3 http://www.wikidata.org/ent… Ksar of Tissouit Ait Se… Point(-4.431944444 32.…
#>  4 http://www.wikidata.org/ent… Ksar of Tissouit Sidi H… Point(-4.443611111 32.…
#>  5 http://www.wikidata.org/ent… Ksar El Atteuf           Point(3.7449 32.4755)  
#>  6 http://www.wikidata.org/ent… Ksar Bounoura            Point(3.7039 32.4827)  
#>  7 http://www.wikidata.org/ent… Ksar Melika              Point(3.6867 32.4831)  
#>  8 http://www.wikidata.org/ent… Ksar of Ouadane          Point(-11.6236 20.9289)
#>  9 http://www.wikidata.org/ent… Ksar of Chinguetti       Point(-12.3667 20.4633)
#> 10 http://www.wikidata.org/ent… Ksar of Tichit           Point(-9.4692 18.4194)

We can refine this query, stating that we want items within a radius of 5km around Lyon (which has lat-long coordinates ~ 45.76 and 4.84). Note that the query below targets the broader class “wd:Q486972” (human settlement) and its subclasses rather than “wd:Q515” (city). We will use the argument within_distance:

cities_around_Lyon=add_triplets(subject="?city",
                                verb="wdt:P31/wdt:P279*",
                                object="wd:Q486972",
                                label="?city") %>% 
  add_triplets(subject="?city",
               verb="wdt:P625",
               object="?coords",
               within_distance=list(center=c(long=4.84,lat=45.76),
                                    radius=5)) %>% 
  build_sparql() %>%
  send_sparql()
head(cities_around_Lyon)
#> # A tibble: 6 x 3
#>   city                                   cityLabel    coords                    
#>   <chr>                                  <chr>        <chr>                     
#> 1 http://www.wikidata.org/entity/Q29047… Bissardon    Point(4.83879 45.7834)    
#> 2 http://www.wikidata.org/entity/Q21209… Q21209234    Point(4.885679 45.761305) 
#> 3 http://www.wikidata.org/entity/Q30078… Cusset       Point(4.90338 45.7668)    
#> 4 http://www.wikidata.org/entity/Q31157… Gratte-ciel  Point(4.87947 45.7679)    
#> 5 http://www.wikidata.org/entity/Q20987… Q20987614    Point(4.89223385 45.77660…
#> 6 http://www.wikidata.org/entity/Q16507… Croix-Luizet Point(4.88347 45.7808)
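To give a sense of what “within 5km” means, the base R sketch below computes great-circle distances from Lyon with the haversine formula and keeps the points inside the radius. This is only an illustration run locally on coordinates that appear in the outputs above; the filtering requested through within_distance happens in the query itself, and how recitR expresses it is not shown here.

```r
# Great-circle (haversine) distance in km; an illustration only, not recitR's code.
haversine_km <- function(lng1, lat1, lng2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Two points taken from the outputs above: one near Lyon, one far away.
places <- data.frame(label = c("Bissardon", "Ksar El Atteuf"),
                     lng = c(4.83879, 3.7449),
                     lat = c(45.7834, 32.4755))
places$dist_km <- haversine_km(4.84, 45.76, places$lng, places$lat)
nearby <- places[places$dist_km <= 5, ]
print(nearby$label)
```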

Actually, rather than entering Lyon’s long-lat coordinates “by hand”, we could get them directly through our query. This means adding a triplet about Lyon (“wd:Q456”) and its coordinates, and using the resulting variable as the center (here with a radius of 2):

add_triplets(subject="wd:Q456",
             verb="wdt:P625",
             object="?coordLyon") %>% 
  add_triplets(subject="?city",
               verb="wdt:P625",
               object="?coords",
               within_distance=list(center="?coordLyon", radius=2)) %>%
  add_triplets(subject="?city",
               verb="wdt:P31/wdt:P279*",
               object="wd:Q486972",
               label="?city") %>%
  build_sparql() %>% 
  send_sparql()