Imagine you are in the monitoring and control room of your city, where all the relevant data coming from sensors, mobile devices, social media streams, IoT devices, vehicles and so on is displayed on multifunctional dashboards. Imagine that those dashboards also show predictions and user behaviour analyses that can support your decisions and guide your strategies. As IT professionals, we know this is not magic but the work of separate, interlaced technologies, and we like to move from the initial sense of wonder to understanding and confidence.
Snap4City, an open-source platform for building smart city solutions, is an excellent testbed for learning how this magic happens. Snap4City is developed by DISIT Lab (Distributed Systems and Internet Technologies Lab) at the University of Florence and is currently used to aggregate open and private data from, and deliver it to, several local administrations in Italy.
Because Snap4City is a framework that must extract value from data, its core capability is aggregating and integrating data from different providers, delivered via different protocols.
The 5 Vs of Smart Cities Data
Snap4City's aim, and its main challenge, is to turn data from disparate sources into actionable information. To that end, Snap4City provides a platform able to ingest and exploit large amounts of scattered data, using data integration and reasoning to deliver new services and applications to citizens and administrators.
Img 1: Private and public data, from static and real-time sources
This data can be delivered via many different protocols, in many different formats and from many different sources. Moreover, the data is not aligned: for example, the same street names, dates or tags may be written differently by different sources.
Snap4City must be able to manage a diversity of data types – variety – that are being created with rapidly increasing speed by technological advances – velocity. This huge amount of information – volume – can change its exact flow from time to time – variability – and the infrastructure required to collect and interpret data must produce insights – value.
Those are the five relevant characteristics of a big data problem/solution, a.k.a. “the 5 Vs of big data”.
Modalities and Strategies for Data Ingestion
The Snap4City platform gathers information from several sources, in any format and via any protocol. This means that data can be structured or unstructured, can arrive as static or real-time data, and may or may not come with metadata descriptors.
Static data can be imported into Snap4City via data-driven, stream, sporadic and/or periodic processes. It is typically ingested with DataGate for automated ingestion, while both static and real-time data can be ingested through Node-RED, Apache NiFi and custom procedures implemented as ETL (Extract, Transform, Load) processes. Different formats and structures are handled by creating a specific ETL process for each family of data sources.
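To make the Extract, Transform, Load split concrete, here is a minimal Python sketch of one ETL step for a single, hypothetical source family. It is not Snap4City code: the semicolon-separated export, its column names (`car_park`, `free_slots`, `updated`) and the sample values are invented, and a plain list stands in for the target database.

```python
import csv
import io
from datetime import datetime

# Hypothetical semicolon-separated export from one data source family
# (invented sample values, not real Snap4City data).
RAW_CSV = """car_park;free_slots;updated
Park A;120;05/03/2020 10:15
Park B;;05/03/2020 10:16
"""


def extract(text: str) -> list[dict]:
    """Extract: parse the raw export into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: skip incomplete rows, cast types, convert timestamps to ISO 8601."""
    records = []
    for row in rows:
        if not row["free_slots"]:
            continue  # the measured value is missing, so the row is unusable
        records.append({
            "name": row["car_park"].strip(),
            "free_slots": int(row["free_slots"]),
            "updated": datetime.strptime(row["updated"], "%d/%m/%Y %H:%M").isoformat(),
        })
    return records


def load(records: list[dict], store: list[dict]) -> int:
    """Load: append to the target store (a list standing in for the database)."""
    store.extend(records)
    return len(records)


if __name__ == "__main__":
    store: list[dict] = []
    print(load(transform(extract(RAW_CSV)), store))  # 1
    print(store[0])
```

Writing one such process per source family keeps the source-specific parsing in `extract` and `transform`, while `load` can stay the same for every family.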
Img 2: Schematic of semantic data aggregation in Snap4City/Km4City
The mined or acquired data is then stored in a NoSQL database, from which it can be exploited for data analytics, dashboards and other applications.
ETL Processes for Data Ingestion
ETL processes are used for data gathering, for example collecting files via HTTP or FTP. Some of the ETL processes developed and now in place to manage data ingestion for the Smart City of Florence and Tuscany are available on the DISIT Lab GitHub project page.
These include processes for data from traffic sensors, parking lots, weather forecasts, fuel prices, environmental monitoring and more. Examples of data sources and ingestion processes implemented with ETL and Snap4City applications include crawling public web pages to collect hospital triage statuses; periodically reading GTFS data about public transport schedules, stops, paths and so on from a web server; and integrating civic number locations from OpenStreetMap into the Snap4City Knowledge Base.
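As an illustration of the GTFS case, the sketch below parses the `stops.txt` file of a GTFS feed (feeds are published as zip archives of CSV files, and `stops.txt` carries fields such as `stop_id`, `stop_name`, `stop_lat` and `stop_lon`) and keeps the stops inside a bounding box, a simple stand-in for "POIs inside a defined area". The excerpt and coordinates are invented; a real process would download and unzip the feed on a schedule before calling `parse_stops`.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    stop_id: str
    name: str
    lat: float
    lon: float


# Invented excerpt of a GTFS stops.txt file; real feeds have more columns and rows.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Stop One,43.7760,11.2480
S2,Stop Two,43.7800,11.2600
"""


def parse_stops(text: str) -> list[Stop]:
    """Parse stops.txt content (header row plus CSV records) into Stop objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Stop(row["stop_id"], row["stop_name"], float(row["stop_lat"]), float(row["stop_lon"]))
        for row in reader
    ]


def stops_in_box(stops: list[Stop], south: float, west: float,
                 north: float, east: float) -> list[Stop]:
    """Keep only the stops whose coordinates fall inside a lat/lon bounding box."""
    return [s for s in stops if south <= s.lat <= north and west <= s.lon <= east]


if __name__ == "__main__":
    stops = parse_stops(STOPS_TXT)
    print([s.stop_id for s in stops_in_box(stops, 43.77, 11.24, 43.778, 11.25)])  # ['S1']
```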
Img 3: Example of batch processing in the Snap4City integrated ETL development environment for dynamic data ingestion
Node-RED Blocks for Data Ingestion
Img 4: Example of a Node-RED flow in the Snap4City integrated development environment for event-driven, real-time data ingestion
Detailed information about how to create ETL processes and Node-RED flows in Snap4City can be found on the project website.
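Unlike the periodic batch jobs of the ETL environment, a Node-RED flow reacts to each message as it arrives; Node-RED messages are JavaScript objects that conventionally carry a `payload` and a `topic`. The Python sketch below only mimics that event-driven pattern with a tiny publish/subscribe dispatcher. It is an analogy, not Node-RED code, and the topic name and payload shape are hypothetical.

```python
from collections import defaultdict
from typing import Callable

# A message mimics the shape of a Node-RED msg object: a "topic" and a "payload".
Message = dict
Handler = Callable[[Message], None]


class EventBus:
    """Minimal publish/subscribe dispatcher: handlers run as soon as a message arrives."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, msg: Message) -> int:
        """Deliver msg to every handler registered for its topic; return how many ran."""
        handlers = self._handlers.get(msg["topic"], [])
        for handler in handlers:
            handler(msg)
        return len(handlers)


if __name__ == "__main__":
    readings: list[float] = []
    bus = EventBus()
    # Hypothetical topic name for an air-quality sensor feed.
    bus.subscribe("sensors/air", lambda msg: readings.append(msg["payload"]["pm10"]))
    bus.publish({"topic": "sensors/air", "payload": {"pm10": 18.0}})
    print(readings)  # [18.0]
```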
Semantic Modeling and City Knowledge Base
Because it deals with data moving to and from sensors and actuators deployed across the territory, Snap4City must handle complex, interrelated geospatial information. For example, it must be able to manage restricted traffic zone gates, environmental sensors for air quality, pollution or rain, and public lighting poles, and it must know public bus routes, the POIs inside a defined area and so on.
Every entity managed by Snap4City is mapped into an ontology dedicated to smart cities. This ontology and the related knowledge model enable the description of smart cities. The ontology is fairly large and can be viewed as a set of macro classes, or macro categories:
- Administration – PA, Municipality, Province, Region, Resolution
- Street guide – Road, RoadElement, AdministrativeRoad, Milestone, StreetNumber, RoadLink, Junction, Entry, Node, EntryRule and Maneuver
- Points-of-Interest – all services and activities that may be useful to citizens and that they may need to reach
- Local public transport – Ride, Route, RouteSection, BusStopForecast, Lot, BusStop, RouteLink
- Sensors – data coming from sensors, from parking lot status to weather
- Temporal – concepts related to time (time instants and time intervals), so that a timeline can be associated with recorded events and predictions can be made
- Metadata – the set of triples associated with the context of each dataset, useful to the ingestion process
After ingestion, data goes through several phases to become semantically interoperable. The first is data quality improvement, which resolves inconsistencies and incompleteness. Typical problems in this phase involve locations and street names, and the normalisation of dates, times and numbers.
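A minimal sketch of this data quality step, assuming three common fixes: expanding street-name abbreviations, converting dates to ISO 8601, and parsing numbers written with a decimal comma. The abbreviation table, the accepted date formats (day-first, as is usual in Italy) and the comma rule are illustrative assumptions, not Snap4City's actual rules.

```python
import re
from datetime import datetime

# Hypothetical abbreviation table; a real pipeline would maintain one per source.
STREET_ABBREVIATIONS = {"V.": "VIA", "P.ZA": "PIAZZA", "V.LE": "VIALE"}

# Accepted input date formats, tried in order (day-first is an assumption).
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")


def normalise_street(name: str) -> str:
    """Upper-case, collapse whitespace and expand known abbreviations."""
    tokens = re.sub(r"\s+", " ", name.strip().upper()).split(" ")
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)


def normalise_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each accepted format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def normalise_number(value: str) -> float:
    """Parse '1.234,5' (comma as decimal separator) as well as '1234.5'.

    Assumption: if a comma is present it is the decimal separator and dots are
    thousands separators; without a comma the dot is read as the decimal point.
    """
    if "," in value:
        value = value.replace(".", "").replace(",", ".")
    return float(value)


if __name__ == "__main__":
    print(normalise_street("  v.   Roma "))   # VIA ROMA
    print(normalise_date("05/03/2020"))       # 2020-03-05
    print(normalise_number("1.234,5"))        # 1234.5
```

After this step, two sources that wrote "V. Roma" and "Via Roma" produce the same string, which is what makes the later linking of datasets possible.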
The next phase is data mapping, in which data is transformed into RDF triples (a triple is the atomic data entity in the Resource Description Framework data model and encodes a statement about semantic data in the form of a subject–predicate–object expression).
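The mapping step can be sketched in a few lines: one cleaned record becomes a handful of subject–predicate–object triples, serialised here as N-Triples text. The `http://example.org/` namespace and the `Sensor`/`name` terms are hypothetical placeholders for the real smart-city ontology IRIs; `rdf:type`, the W3C WGS84 `geo:lat`/`geo:long` properties and `xsd:float` are standard vocabularies.

```python
from typing import Optional

# Hypothetical namespace: the real Snap4City ontology uses its own IRIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: object, datatype: Optional[str] = None) -> str:
    """Quote a value as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def sensor_to_triples(record: dict) -> list[tuple[str, str, str]]:
    """Map one cleaned sensor record to subject-predicate-object triples."""
    subject = iri(f"{EX}sensor/{record['id']}")
    return [
        (subject, iri(RDF_TYPE), iri(f"{EX}ontology/Sensor")),
        (subject, iri(f"{EX}ontology/name"), literal(record["name"])),
        (subject, iri(GEO + "lat"), literal(record["lat"], XSD_FLOAT)),
        (subject, iri(GEO + "long"), literal(record["lon"], XSD_FLOAT)),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise triples one per line, each terminated by ' .' as N-Triples requires."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    record = {"id": "s42", "name": "Air quality 1", "lat": 43.776, "lon": 11.248}
    print(to_ntriples(sensor_to_triples(record)))
```

In practice an RDF library or a mapping language would handle escaping and datatypes; the hand-written version only shows what a triple looks like on the wire.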
The mapped triples are then uploaded to (and indexed in) an RDF store, where datasets become connected with one another when their triples refer to the same entities.
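Why shared entities matter can be shown with a toy in-memory store: two independently ingested datasets become linked as soon as they use the same IRI, and a query can then follow that link. The Python sets below merely stand in for an RDF store, and the parking/road data and `example.org` IRIs are invented.

```python
EX = "http://example.org/"  # hypothetical namespace

Triple = tuple[str, str, str]


def merge(*datasets: set[Triple]) -> set[Triple]:
    """Load several datasets into one store; identical triples collapse into one."""
    store: set[Triple] = set()
    for dataset in datasets:
        store |= dataset
    return store


def iris(dataset: set[Triple]) -> set[str]:
    """IRIs used as subjects or objects (literals, which start with a quote, are excluded)."""
    return {s for s, _, _ in dataset} | {o for _, _, o in dataset if o.startswith("<")}


def shared_entities(a: set[Triple], b: set[Triple]) -> set[str]:
    """Entities mentioned by both datasets: the points where they become linked."""
    return iris(a) & iris(b)


# Invented example: a parking dataset and a street-guide dataset.
PARKING = {(f"<{EX}parking/1>", f"<{EX}ontology/locatedIn>", f"<{EX}road/7>")}
ROADS = {(f"<{EX}road/7>", f"<{EX}ontology/name>", '"Via Roma"')}


def road_names_for_parkings(store: set[Triple]) -> dict[str, str]:
    """Follow locatedIn links to the road names: a join on the shared road IRI."""
    names = {s: o for s, p, o in store if p == f"<{EX}ontology/name>"}
    return {s: names[o] for s, p, o in store
            if p == f"<{EX}ontology/locatedIn>" and o in names}


if __name__ == "__main__":
    print(shared_entities(PARKING, ROADS))
    print(road_names_for_parkings(merge(PARKING, ROADS)))
```

Neither dataset alone can answer "which street is this car park on?"; the merged store can, because both use `<http://example.org/road/7>`.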
Img 5: Data ingested by Snap4City displayed on the Linked Open Graph interface
Applications can access all managed data through a dedicated SPARQL endpoint. Examples of applications that access the information collected by Snap4City are ServiceMap (http://servicemap.disit.org), for map-based access, and Linked Open Graph (http://log.disit.org), for browsing the data directly from SPARQL/Linked Data sources.
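An application talks to a SPARQL endpoint over plain HTTP. The sketch below builds a SPARQL 1.1 Protocol "query via GET" request (the query goes in the `query` URL parameter, and the `Accept` header asks for SPARQL JSON results) and flattens a JSON results document into dictionaries. The endpoint URL and the class/property IRIs in the query are placeholders, not Snap4City's real ones, and the request is built but not sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Placeholder endpoint: substitute the SPARQL endpoint URL documented by the platform.
ENDPOINT = "https://example.org/sparql"

# Hypothetical class and property IRIs, matching no real ontology.
QUERY = """SELECT ?s ?name WHERE {
  ?s a <http://example.org/ontology/Sensor> ;
     <http://example.org/ontology/name> ?name .
} LIMIT 10"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results_json: str) -> list[dict]:
    """Flatten a SPARQL JSON results document into plain {variable: value} dictionaries."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_request(ENDPOINT, QUERY)
    # Round-trip check: the encoded URL decodes back to the original query text.
    print(parse_qs(urlparse(request.full_url).query)["query"][0] == QUERY)  # True
    # urllib.request.urlopen(request) would send it; not executed here.
```

Variables that are unbound in a result row are simply absent from that row's dictionary, mirroring how the JSON results format omits them.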
Snap4City platforms and solutions are available on the Snap4City.org website, where you can register and start exploring their features. But if you really want to challenge your skills and propose innovative solutions for connected cities, you can join the upcoming Snap4City Hack, a big online hackathon on Smart City and IoT topics.
Snap4City launched this big online hackathon, whose challenges are to be tackled with Snap4City tools for managing IoT, Big Data and analytics. The challenges will cover four themes – Ecological Watch, Social and Service Evolution, Stimulating Business Growth and City Aware – and use real data from cities such as Helsinki and Antwerp.
Further information on the hackathon is available on the Snap4City.org/hackathon website.