Imagine you are in the monitoring and control room of your city, where all relevant data coming from sensors, social media streams, IoT devices, vehicles and so on is displayed on multifunctional dashboards. Imagine you can read predictions and user behaviour analyses on those dashboards, which can support your decisions. As IT professionals, we know this is the work of separate, interlaced technologies, not magic, and we like to move from the initial sense of wonder to understanding and confidence.
Snap4City, an open-source platform that allows us to create solutions for smart cities, is the best testbed to learn how this magic happens. Snap4City is developed by DISIT Lab (Distributed Systems and Internet Technologies Lab) of the University of Florence and is currently in use to aggregate open and private data from and to some local administrations.
Because it is a framework that must produce value from data, Snap4City has at its core the ability to aggregate and integrate data from different providers, in different protocols.
The 5 Vs of Smart Cities Data
The aim and the challenge for Snap4City is to turn data from disparate sources into actionable information. Snap4City provides a platform able to ingest and take advantage of large amounts of scattered data, exploiting data integration and reasoning to deliver new services and applications to citizens and administrators.
Img 1: Private and public data, from static and real-time sources
This data can be provided in many different protocols and formats, and from many different sources. Moreover, the data is not aligned: for example, the same street names, dates or tags may differ when provided by different sources.
Snap4City must be able to manage a diversity of data types – variety – that are being created with rapidly increasing speed by technological advances – velocity. This huge amount of information – volume – can change its exact flow from time to time – variability – and the infrastructure required to collect and interpret it must produce insights – value.
Those are the five relevant characteristics of a big data problem/solution, a.k.a. “the 5 Vs of big data”.
Modalities and Strategies for Data Ingestion
The Snap4City platform gathers information from several sources. Data can be provided in any format and via any protocol. This means that data can be both structured and unstructured, and can arrive as static or as real-time data, with or without metadata descriptors.
Static data can be imported into Snap4City via event-driven, stream, sporadic and/or periodic processes. Static data is typically ingested with DataGate for automated ingestion, while both static and real-time data can be ingested through Node-RED, Apache NiFi, and custom procedures in the form of ETL processes (Extract, Transform, Load). Several different formats and structures can be addressed by creating specific ETL processes for each data source family.
Img 2: Schematic of semantic data aggregation in Snap4City/Km4City
The mined or acquired data is subsequently stored in NoSQL storage. From there, the information can be exploited for data analytics, dashboards, etc.
ETL Processes for Data Ingestion
ETL processes are used for data gathering, collecting files over HTTP and FTP. Some ETL processes that have been developed and are now in place to manage data ingestion for Florence and Tuscany are accessible on the DISIT Lab GitHub project page.
These include processes for data from traffic sensors, parking lots, weather forecasts, fuel costs, environmental data, etc. Examples of data ingestion processes built with ETL and Snap4City applications include crawling public web pages to collect hospital triage statuses, periodically reading GTFS data about public transport schedules, stops and paths, and integrating civic number locations from OpenStreetMap into the Snap4City Knowledge Base.
Img 3: Example of batch processing in Snap4City integrated ETL development environment for dynamic data ingestions
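The extract, transform and load steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the Snap4City ETL tooling: the sample rows are invented, they follow the `stops.txt` column layout of GTFS, the function names are hypothetical, and a plain dictionary stands in for the real storage. A production process would fetch the file over HTTP or FTP instead of reading an inline string.

```python
import csv
import io
import json

# Invented sample in the GTFS stops.txt layout (assumption: not real city data).
SAMPLE_STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,Stazione Centrale ,43.7765,11.2480
S2,piazza duomo,43.7731,11.2560
S3,Broken Row,,
"""


def extract(text):
    """Extract: parse the CSV payload into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert coordinates, drop incomplete rows."""
    cleaned = []
    for row in rows:
        try:
            lat = float(row["stop_lat"])
            lon = float(row["stop_lon"])
        except (TypeError, ValueError):
            continue  # skip rows without usable coordinates
        cleaned.append({
            "id": row["stop_id"],
            "name": row["stop_name"].strip().title(),
            "lat": lat,
            "lon": lon,
        })
    return cleaned


def load(records, store):
    """Load: write records into a dict that stands in for a document store."""
    for record in records:
        store[record["id"]] = record
    return store


store = load(transform(extract(SAMPLE_STOPS)), {})
print(json.dumps(store, indent=2))
```

Keeping the three steps as separate functions makes it easy to write one `extract` per data source family while reusing the same `load`.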
Node-RED Blocks for Data Ingestion
Img 04: Example of Node-RED flow in Snap4City for event-driven, real-time data ingestion
Detailed information about how to create ETL processes and Node-RED flows in Snap4City can be found on the project website.
Semantic Modeling and City Knowledge Base
Because it is strictly related to data moving to and from sensors and actuators placed across the territory, Snap4City must deal with complex and interrelated geospatial information. For example, it must be able to manage restricted traffic zone gates, environmental sensors for air quality, pollution or rain, and public light pillars, and it must know public bus routes or the POIs inside a defined area.
Any entity managed by Snap4City is mapped into a dedicated smart-city ontology. This ontology and the related knowledge model enable the description of these entities. The ontology is not small and can be viewed as consisting of various macro classes or macro categories:
- Administration – PA, Municipality, Province, Region, Resolution,
- Street guide – Road, RoadElement, AdministrativeRoad, Milestone, StreetNumber, RoadLink, Junction, Entry, Node, EntryRule and Maneuver,
- Points-of-Interest – includes all services and activities that may be useful to citizens and that they may need to reach
- Local public transport – Ride, Route, RouteSection, BusStopForecast, Lot, BusStop, RouteLink,
- Sensors – macro-class relative to data coming from sensors, from parking lot status to weather
- Temporal – includes concepts related to time (time instants and time intervals), so that a timeline can be associated with recorded events and predictions can be made
- Metadata – set of triples associated with the context of each dataset, useful to the ingestion process
After ingestion, data flows through several phases in order to become semantically interoperable. The first is data quality improvement, which resolves inconsistencies and incompleteness. Typical problems in this phase concern locations and street names, and the normalisation of dates, times and numbers.
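A data quality step of this kind can be illustrated with two small normalisers, one for street names and one for dates. This is only a sketch under stated assumptions: the abbreviation table and the list of accepted date formats are invented for the example and are not taken from Snap4City.

```python
from datetime import datetime

# Hypothetical abbreviation table; a real one would be built per region.
ABBREVIATIONS = {"v.": "via", "p.za": "piazza", "v.le": "viale"}

# Date layouts this sketch accepts (assumption); the output is always ISO 8601.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_street(name):
    """Lower-case, collapse whitespace and expand known abbreviations."""
    tokens = name.lower().split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def normalise_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(normalise_street("V.  Roma"), "|", normalise_street("VIA ROMA"))
print(normalise_date("03/05/2019"), "|", normalise_date("2019-05-03"))
```

Returning `None` for an unrecognised date, rather than guessing, lets a later step flag the record as incomplete instead of storing a wrong value.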
A subsequent phase is data mapping, where the data is mapped into triples. The mapped triples then have to be uploaded to (and indexed in) an RDF store, where a dataset may be connected with the others if their entities refer to the same triples.
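To make the idea of triples concrete, the sketch below maps two records into subject, predicate, object tuples and shows how they become connected once they are in the same store. It uses plain Python tuples rather than a real RDF library. The namespace and the property names (`name`, `placedOn`, `freeSlots`) are hypothetical, and it assumes that datasets are linked because they use the same identifier for the same entity, which is the usual RDF mechanism.

```python
# Hypothetical namespace for the example; not the real ontology URI.
BASE = "http://example.org/city/"


def to_triples(entity_id, entity_class, properties):
    """Map one record to a list of (subject, predicate, object) triples."""
    subject = BASE + entity_id
    triples = [(subject, "rdf:type", BASE + entity_class)]
    for predicate, value in properties.items():
        triples.append((subject, BASE + predicate, value))
    return triples


# Two records that could come from different datasets.
roads = to_triples("road/via-roma", "Road", {"name": "via roma"})
sensors = to_triples(
    "sensor/42",
    "Sensor",
    {"placedOn": BASE + "road/via-roma", "freeSlots": 17},
)

# A set of tuples stands in for the RDF store.
store = set(roads) | set(sensors)


def linked_subjects(triples, target):
    """Find every subject that points at the given entity."""
    return sorted(s for s, _, o in triples if o == target)


print(linked_subjects(store, BASE + "road/via-roma"))
```

Because the sensor record points at the same identifier the road record uses as its subject, a single lookup connects the two datasets without any explicit join table.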
Img 05: Data ingested by Snap4City displayed on the Linked Open Graph interface
Applications can access all managed data using a dedicated SPARQL endpoint. Examples of applications accessing the information collected by Snap4City are ServiceMap (http://servicemap.disit.org) for map-based access and Linked Open Graph (http://log.disit.org) for browsing the data directly from SPARQL/Linked Data sources.
Snap4City platforms and solutions are available on the Snap4city.org website, where you can register and start exploring. But if you want to really challenge your skills and propose your innovative solutions for connected cities, you can join the upcoming Snap4City Hack, the big online hackathon on topics including IoT.
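As an illustration of how an application might talk to a SPARQL endpoint, the sketch below builds a query request following the standard SPARQL protocol (a GET with a `query` parameter and a JSON results media type). The endpoint URL is a placeholder, since the actual address is not given here, and the query is a generic one that makes no assumption about the ontology. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption); replace with the real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

# Generic query: fetch any ten triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

To execute the query, the request can be passed to `urllib.request.urlopen` and the response body decoded with `json.load`.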
Snap4City has launched a big online hackathon. Using Snap4City tools to manage IoT, big data and analytics, the hackathon challenges will cover different themes (Ecological Watch, Social and Service Evolution, Stimulating Business Growth, City Aware) and real data from cities such as Helsinki or Antwerp.
Further information on the hackathon is available on the Snap4City.org/hackathon website.