Import/Catalogue/Hotels-RAFVG

From OpenStreetMap Wiki

About

This page is about importing the Hotels dataset published by Regione Autonoma Friuli Venezia Giulia (RAFVG), Italy. The dataset will be adapted to generate OSM files suitable for import into planet.osm. This will not be a blind import: mappers will check the source data through an audit support map.

The import is being discussed on the regional OSM mailing list. The import will be the result of consensus there.

Goals

This import aims to provide an up-to-date, RAFVG-certified set of POIs (OSM tourism=hotel) covering the RAFVG territory (OSM admin_level=4).

Schedule

The import will be performed after a community audit of the shared JSON preview files. Audit progress will be tracked on the project page. The import is expected to take 30-60 days to complete.

Import Data

Background

The source dataset contains 738 point objects (as of October 2017) without geographic coordinates. As stated in the metadata page, the records are "directly managed" by the municipality in which each POI is registered. Since regional OSM addresses were imported from a RAFVG dataset in 2014, the POIs could be geocoded against them. Dataset addresses are provided by municipalities and compiled by hotel operators, so some POIs could not be geocoded correctly: the final result is a subset of the source data.
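The geocoding step can be pictured as one structured search request per record sent to Nominatim. The following is a minimal Python sketch, not the csvgeocode configuration actually used: the helper name `nominatim_query_url`, the use of the public Nominatim endpoint and the assumption that INDIRIZZO has the form "STREET, HOUSENUMBER" are all illustrative. It only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode

# Public Nominatim endpoint (assumption: any Nominatim instance could be used instead)
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_query_url(indirizzo: str, comune: str, provincia: str) -> str:
    """Build a structured Nominatim search URL for one dataset record.

    Hypothetical helper; INDIRIZZO is assumed to look like "VIA PIAVE, 29/3".
    """
    # Split "STREET, HOUSENUMBER" at the first comma; housenumber may be empty
    street, _, housenumber = (part.strip() for part in indirizzo.partition(","))
    params = {
        # Structured search expects "<housenumber> <streetname>" in "street"
        "street": f"{housenumber} {street}".strip(),
        "city": comune,
        "county": provincia,
        "state": "Friuli Venezia Giulia",
        "country": "Italy",
        "format": "json",
        "limit": 1,
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(nominatim_query_url("VIA PIAVE, 29/3", "Cormons", "Gorizia"))
```

With this approach, a record whose address Nominatim cannot resolve simply yields an empty JSON array, which is consistent with only a subset of the source data being placed. Bulk runs against the public instance must respect its usage policy (at most one request per second).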

Metadata

As defined in the RAFVG metadata page, the dataset has the following features:

  • provider: Anagrafe regionale delle strutture turistico-ricettive (ARSTR)
  • update frequency: 12 months
  • last update: 10/10/2017
  • refer to: Servizio turismo
  • license and attribution: IODL

Legal

Record format and tagging plan

The RAFVG dataset table structure will be pruned and adapted through OpenRefine; fields will be mapped following the tourism=hotel wiki page.

Input data record format

# | Field name | Description (it) | Description (en) | Sample | Tagged as
1 | CODICE ESERCIZIO | codice univoco scheda hotel | unique hotel code | 40080 | ref
2 | provincia | nome provincia | province name | Gorizia | -
3 | comune | Nome Comune | municipality name | Cormons | -
4 | categoria | categoria | category | comfort | TBD (stars=3)
5 | denominazione | nome + "di" + operatore | name + "run by" + operator | B&B MORENA di PICECH MORENA | name + operator
6 | indirizzo | indirizzo | address | VIA PIAVE, 29/3 | -
7 | TELEFONO | numero telefonico | phone number | 0481 630370 | phone
8 | FAX | numero fax | fax number | 0481 630371 | fax
9 | EMAIL | indirizzo email | email address | name@domain | email
10 | SITO | url sito | website URL | mybb.com | website
11 | N_CAMERE | numero di camere | number of rooms | 3 | rooms
12 | N_POSTI_LETTO | numero di posti letto | number of beds | 9 | beds
13 | PERIODO_APERTURA | periodi di apertura | opening periods | 01.I - 11.VIII, 13.VIII - 31.XII | opening_hours
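The "tagged as" column can be read as a plain field-to-key lookup. Below is a minimal Python sketch under stated assumptions: `FIELD_TO_TAG`, `split_denominazione` and `record_to_tags` are hypothetical names; splitting denominazione at its last " di " is a heuristic that can fail on names that themselves contain " di "; and categoria (still TBD) and PERIODO_APERTURA (which needs format conversion first) are deliberately left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical sketch, not the OpenRefine project used for the import.
# Direct field -> OSM key pairs from the "tagged as" column.
# categoria (stars, still TBD) and PERIODO_APERTURA (needs conversion) are omitted.
FIELD_TO_TAG = {
    "CODICE ESERCIZIO": "ref",
    "TELEFONO": "phone",
    "FAX": "fax",
    "EMAIL": "email",
    "SITO": "website",
    "N_CAMERE": "rooms",
    "N_POSTI_LETTO": "beds",
}


def split_denominazione(value: str) -> Tuple[str, Optional[str]]:
    """Split 'NAME di OPERATOR' into (name, operator) at the last ' di '."""
    name, sep, operator = value.rpartition(" di ")
    if not sep:
        # No separator found: the whole value is the name, operator unknown
        return value.strip(), None
    return name.strip(), operator.strip()


def record_to_tags(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one pruned dataset record into a dict of OSM tags."""
    # tourism=hotel as stated in Goals (assumption: some B&Bs may need another value)
    tags = {"tourism": "hotel"}
    for field, key in FIELD_TO_TAG.items():
        value = (record.get(field) or "").strip()
        if value:  # skip empty cells instead of emitting empty tags
            tags[key] = value
    name, operator = split_denominazione(record.get("denominazione") or "")
    if name:
        tags["name"] = name
    if operator:
        tags["operator"] = operator
    return tags


sample = {
    "CODICE ESERCIZIO": "40080",
    "denominazione": "B&B MORENA di PICECH MORENA",
    "TELEFONO": "0481 630370",
    "N_CAMERE": "3",
}
print(record_to_tags(sample))
```

Keeping the mapping in a single dictionary makes it easy to compare against the table and to add the stars tag once the categoria mapping is decided.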

Import Type

The dataset will be imported on a regional basis (OSM admin_level=4). Prior to upload, the candidate .osm file will be published so that the local team can check it manually.

Team Approach

The import will be managed by the following OSM users:

  • Cascafico

Workflow

Step by step operations:

  1. dataset download
  2. OpenRefine operations
  3. geocoding with csvgeocode via Nominatim
  4. OpenRefine JSON export
  5. conflator run
  6. audit map announcement & publication
  7. wait for community validation
  8. conflation re-run
  9. removal of support tags from the .osm file
  10. removal of fixme tags where applicable
  11. .osm file publication
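Steps 9 and 10 amount to deleting a few tag keys from the conflated file before publication. A minimal Python sketch using the standard library's ElementTree follows. Which keys count as "support tags" is not specified on this page, so the `audit:note` key in the example is purely hypothetical; fixme is simply passed as a key to remove from every object, which is coarser than the per-object "where applicable" judgment the workflow implies.

```python
import xml.etree.ElementTree as ET
from typing import Set


def strip_tags(osm_xml: str, keys: Set[str]) -> str:
    """Return the OSM XML with every <tag> whose key is in `keys` removed."""
    root = ET.fromstring(osm_xml)
    for element in root:  # top-level node / way / relation elements
        # findall returns a list, so removing children while looping is safe
        for tag in element.findall("tag"):
            if tag.get("k") in keys:
                element.remove(tag)
    return ET.tostring(root, encoding="unicode")


# Made-up conflated object; "audit:note" stands in for a hypothetical support tag
sample_osm = """<osm version="0.6">
  <node id="-1" lat="45.957" lon="13.471">
    <tag k="tourism" v="hotel"/>
    <tag k="name" v="B&amp;B MORENA"/>
    <tag k="audit:note" v="checked"/>
    <tag k="fixme" v="verify house number"/>
  </node>
</osm>"""

cleaned = strip_tags(sample_osm, {"audit:note", "fixme"})
print(cleaned)
```

Other attributes on each object (id, coordinates, any action attribute written by the conflator) are left untouched, so the file stays loadable in an editor for the final review.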

In case of import problems, the changesets involved will be reverted using an appropriate revert tool.

Data Preparation

The data is provided as a CSV (comma-separated values) file containing one point element for each B&B.
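Reading the file and dropping the columns that are not needed for tagging can also be sketched outside OpenRefine. This Python sketch assumes a comma delimiter (as stated above), a header row using the field names of the input data record format, and a hypothetical `KEEP` list; any other column is discarded, and kept columns missing from the file become empty strings.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Hypothetical column selection: the fields of the input data record format
KEEP = [
    "CODICE ESERCIZIO", "provincia", "comune", "categoria", "denominazione",
    "indirizzo", "TELEFONO", "FAX", "EMAIL", "SITO",
    "N_CAMERE", "N_POSTI_LETTO", "PERIODO_APERTURA",
]


def read_records(fileobj: TextIO, keep: Iterable[str] = KEEP) -> List[Dict[str, str]]:
    """Read the CSV and keep only the listed columns (missing ones become "")."""
    columns = list(keep)
    reader = csv.DictReader(fileobj)
    return [{col: (row.get(col) or "") for col in columns} for row in reader]


# Tiny made-up excerpt; the quoted address contains the "," later used to split it
raw = io.StringIO(
    "CODICE ESERCIZIO,provincia,comune,denominazione,indirizzo,EXTRA\n"
    '40080,Gorizia,Cormons,B&B MORENA di PICECH MORENA,"VIA PIAVE, 29/3",x\n'
)
records = read_records(raw)
print(records[0]["indirizzo"])
```

When reading the real file, open it with `newline=""`, as the csv module documentation recommends, so that quoted fields containing line breaks are parsed correctly.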

Refining

Prior to OSM JSON conversion, some issues require refining operations. The actions performed through OpenRefine are summarized below:

  • Remove unused columns
  • Standardize TELEFONO, CELLULARE and FAX
  • Convert column PERIODO_APERTURA to the OSM opening_hours format
  • Split DENOMINAZIONE into name and operator
  • Split column INDIRIZZO by the separator ","
  • Expand some INDIRIZZO 1 abbreviations (e.g. Loc. > Località, Fraz. > Frazione)
  • Reconcile cells in column INDIRIZZO 1 against the authoritative dataset
  • Match each cell in column INDIRIZZO 1 to its best reconciliation candidate
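Two of the steps above, converting PERIODO_APERTURA and standardizing phone numbers, are small string transformations. The Python sketch below shows one way to do them, not the OpenRefine expressions actually used. It assumes PERIODO_APERTURA always has the form "DD.ROMAN - DD.ROMAN" with comma-separated periods, and that numbers without an international prefix are Italian (the leading 0 of Italian area codes is kept after +39). All function names are hypothetical.

```python
# Hypothetical helpers; input patterns are assumed from the sample record
ROMAN_MONTHS = ["I", "II", "III", "IV", "V", "VI",
                "VII", "VIII", "IX", "X", "XI", "XII"]
OSM_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def _monthday(token: str) -> str:
    """Turn '13.VIII' into the opening_hours month-day form 'Aug 13'."""
    day, roman = token.strip().split(".")
    month = OSM_MONTHS[ROMAN_MONTHS.index(roman.upper())]
    return f"{month} {int(day):02d}"


def periodo_to_opening_hours(value: str) -> str:
    """Convert e.g. '01.I - 11.VIII, 13.VIII - 31.XII' to an opening_hours value."""
    ranges = []
    for period in value.split(","):
        start, end = period.split("-")
        ranges.append(f"{_monthday(start)}-{_monthday(end)}")
    return ",".join(ranges)


def standardize_phone(raw: str) -> str:
    """Normalize spacing and add the +39 prefix to Italian numbers."""
    number = " ".join(raw.replace("/", " ").split())
    if number.startswith("+"):
        return number
    if number.startswith("00"):  # e.g. 0039 ... -> +39 ...
        return "+" + number[2:]
    return "+39 " + number


print(periodo_to_opening_hours("01.I - 11.VIII, 13.VIII - 31.XII"))
print(standardize_phone("0481 630370"))
```

Values that do not match the assumed pattern raise ValueError (from the tuple unpacking, `list.index` or `int`), which makes them easy to route to manual review instead of producing a malformed tag.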

Intermediate files

Here you can find all files used to set up the import: source dataset, refining operations, export template and conflation profile.

Conflation

Conflation is performed by OSM Conflator. Objects tagged as tourism=guest_house will be extracted from OSM within a bounding box defined by the source dataset. Existing OpenStreetMap objects within a given distance of a dataset point are merged with it, and tags are added or replaced according to the conflator profile file profile.py.
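To illustrate what "within a given distance" means, here is a simplified Python sketch of distance-based matching. It is not OSM Conflator's implementation: it greedily pairs dataset points and OSM objects one-to-one, closest pairs first, as long as they are nearer than a hypothetical `max_distance` of 50 m, using the haversine formula. Points are assumed to be (id, lat, lon) tuples, and the coordinates in the example are made up.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[str, float, float]  # (id, latitude, longitude)
EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_points(dataset: List[Point], osm: List[Point],
                 max_distance: float = 50.0) -> Dict[str, str]:
    """Greedy one-to-one matching, closest pairs first, within max_distance."""
    candidates = sorted(
        (haversine_m(d_lat, d_lon, o_lat, o_lon), d_id, o_id)
        for d_id, d_lat, d_lon in dataset
        for o_id, o_lat, o_lon in osm
    )
    matches: Dict[str, str] = {}
    used_osm = set()
    for dist, d_id, o_id in candidates:
        if dist > max_distance:
            break  # candidates are sorted, so no closer pair remains
        if d_id in matches or o_id in used_osm:
            continue  # each side may be matched only once
        matches[d_id] = o_id
        used_osm.add(o_id)
    return matches


dataset = [("40080", 45.9570, 13.4710), ("40081", 45.9000, 13.5000)]
osm = [("n1", 45.9571, 13.4711), ("n2", 46.1000, 13.2000)]
print(match_points(dataset, osm))
```

Dataset points left without a partner would then be added as new objects, which corresponds to the "Adding ... unmatched dataset points" line in the conflator output example. Comparing all pairs is quadratic, but at a few hundred points per side that is negligible.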

Conflator output example

pi@raspberrypi:~/OSM conflate -i AlberiMonumentali-FVG-csv.json  -o AM.osm -c previewAM.json profile.py
15:11:33 Downloaded 116 objects from OSM 
15:11:34 Matched 11 points 
15:11:34 Removed 17 unmatched duplicates 
15:11:34 Adding 410 unmatched dataset points 
15:11:35 Deleted 0 and retagged 0 unmatched objects from OSM
pi@raspberrypi:~/OSM

Upload

Dedicated upload account

The account attilaimport will be used to upload the community-revised .osm file.

Changeset Tags

The changeset will be tagged with:

Upload candidate

BBconflated.osm

Uploaded

changeset | objects | notes
x

QA

If problems are detected after upload:

Widespread:

  • TBD

Limited:

  • TBD