
Before the COVID-19 outbreak, the call for papers for Codemotion Rome 2020 had closed with 646 proposals submitted. After all these months, there are still some important lessons that we can learn from those proposals. The following article represents a way to look back at those submissions.
If you want to know more about how modern technologies and tools can support you in organising – and during – a virtual event, don’t miss this article showcasing the best of the tools used to host our online conferences since the COVID-19 outbreak.
Loading the data
Each author who submitted a talk to a Codemotion conference filled in a form that included their employer (company) and related communities. While some fields are mandatory – the company field, for example – others, like communities, are not, so we knew beforehand that some authors wouldn't fill in the communities field, as it wasn't strictly necessary.
Let’s move on with the data loading process and leave remaining thoughts on the data to the analysis. The data was exported in a JSON file ready to be imported to a database.
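The exact export schema isn't shown here, but judging from the fields referenced by the import query later in this article, each record presumably looks roughly like this (field names are taken from the query; all values are invented placeholders):

```json
{
  "topics": [
    { "idtopic": 1, "nametopic": "DevOps" }
  ],
  "papers": [
    {
      "idcall4paper": 42,
      "title": "An example talk",
      "abstract": "An example abstract",
      "level": "Intermediate",
      "performed": false,
      "length": 45,
      "language": "English",
      "idtopic": 1,
      "submission_date": "2020-01-15",
      "company": "Example Corp",
      "community": "Community A, Community B",
      "speaker": {
        "idspeaker": 7,
        "name": "Jane",
        "surname": "Doe",
        "biography": "An example bio"
      },
      "cospeakers": []
    }
  ]
}
```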

To perform the analysis, I used a graph database – specifically, Neo4j in its Desktop version – which provides an easy management utility for quick-and-dirty analysis of datasets.
Once the Codemotion Rome 2020 database was created in the application, the next task was to import the data, which required setting up the right environment, i.e.:
- give the database the permission to read a file from disk (it’s disabled by default)
- put the file into the import folder
- enable the APOC plugin (a set of utility functions for the database)
- write the cypher query to ingest the data
The first point simply requires adding a single line to the neo4j.conf file. This can be done from the Settings tab in Neo4j Desktop:
// append it to neo4j.conf to give permission to read from disk
apoc.import.file.enabled=true
Next, move to the Plugins tab and install the APOC plugin:

When APOC has completed the installation process, click on the “Open Folder” button and then on the “Import” folder: copy the JSON file with the data into the folder.

Finally, restart the database by pressing the STOP and PLAY buttons on the Desktop application to make the new configuration take effect, and when ready, click on the “Open Browser” button.
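Before running the full import, a quick sanity check (an optional step I'm adding here, not part of the original walkthrough) confirms that APOC can actually read the file – if the configuration or file location is wrong, this call fails immediately:

```cypher
// should return the top-level keys of the export, e.g. ["topics", "papers"]
WITH "file:///codemotion-rome-2020-papers.json" AS url
CALL apoc.load.json(url) YIELD value
RETURN keys(value)
```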
In the query box, paste the import query for the data:
// load the file content into a variable name "value"
WITH "file:///codemotion-rome-2020-papers.json" AS url
CALL apoc.load.json(url) YIELD value
// now iterate on each topic and create a node for each topic
FOREACH (t IN value.topics |
MERGE (topic:Topic {id: t.idtopic})
ON CREATE SET
topic.name = t.nametopic
)
// from the value, now pick one paper at a time with UNWIND
WITH value AS v
UNWIND v.papers AS p
// create a node for each Paper (talk/submission)
MERGE (paper:Paper {id: p.idcall4paper})
ON CREATE SET
paper.title = p.title,
paper.abstract = p.abstract,
paper.level = p.level,
paper.performedAlready = p.performed,
paper.length = p.length,
paper.language = p.language
// Now create a node for each speaker who submitted
MERGE (speaker:Speaker {id: p.speaker.idspeaker})
ON CREATE SET
speaker.name = p.speaker.name,
speaker.surname = p.speaker.surname,
speaker.bio = p.speaker.biography
// And create a :PRESENTED link between the speaker and their paper
MERGE (paper)<-[:PRESENTED {submitted: p.submission_date}]-(speaker)
// Create a company node and link it to the current speaker
MERGE (company:Company {name: p.company})
MERGE (speaker)-[:WORKS_AT]->(company)
// pick the "community" field and split it by comma,
// if any, create a node and link it to the current speaker
FOREACH(comm in SPLIT(p.community, ', ') |
MERGE (community:Community {name: comm})
MERGE (speaker)-[:BELONGS_TO]->(community)
)
// if any co-speaker declared, create a node and link it with the speaker
FOREACH (co IN p.cospeakers |
MERGE (cospeaker:Speaker {id: co.idspeaker})
ON CREATE SET
cospeaker.name = co.name,
cospeaker.surname = co.surname,
cospeaker.bio = co.biography
MERGE (paper)<-[:PRESENTED {submitted: p.submission_date}]-(cospeaker)
// Look at the company of the co-speaker and link them to it
MERGE (company:Company {name: co.company})
MERGE (cospeaker)-[:WORKS_AT]->(company)
// Same for the communities
FOREACH(comm in SPLIT(co.community, ', ') |
MERGE (community:Community {name: comm})
MERGE (cospeaker)-[:BELONGS_TO]->(community)
)
)
// last, connect the paper with the topic based on its Id
WITH paper, p
MATCH (t:Topic{id: p.idtopic})
MERGE (paper)-[:IS_ABOUT]->(t)
The model created for the data is as follows:

While the model looks pretty basic and simple, it is enough to perform some interesting analysis of the domain.
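As a quick way to verify that the import produced the expected model, counting the nodes per label gives an overview (a generic sketch – the exact counts depend on the dataset):

```cypher
// count nodes grouped by their first label
MATCH (n)
RETURN labels(n)[0] AS label, count(*) AS nodes
ORDER BY nodes DESC
```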
Codemotion Rome 2020 – explore the data: Communities
Codemotion has always been a strong supporter of the IT community, initially via the conference only, later evolving into the #Aperitech brand for local Italian meetups – a brand that has recently extended across the whole of Europe with its own Codemotion meetup platform.
Consequently, this was a good opportunity to gain an insight into the impact of communities on the Codemotion Rome 2020 conference, starting from analysis of the submissions.
The first question: is it possible to somehow quantify the number of communities involved in the dataset, perhaps via the speaker submissions? How many communities will be (indirectly) involved in the next edition of the Codemotion conference?
// how many communities are there in the dataset?
MATCH (c:Community) RETURN count(c)
492 communities have been named by the various speakers.
This is quite a basic query, and it's hard to derive much value from a single number: as an analyst, I'm interested in more interesting questions that can provide insight into the conference's speaker base.
Analysis of clusters of communities could provide one interesting insight, starting with the paper association: given a submission and the communities related to it, is it possible to see a cluster of communities?
// Community -> Paper
MATCH (community:Community)<-[:BELONGS_TO]-(:Speaker)-[:PRESENTED]->(paper:Paper)
WHERE NOT community.name = ""
WITH community, paper, apoc.create.vRelationship(paper, "RELATED", {}, community) AS r1
RETURN community, paper, r1
Because in the model above the ‘Paper’ node and the ‘Community’ node are indirectly connected, it is necessary to create a virtual edge using an APOC utility as a way to create a direct relationship between the two, inferred from the indirect one.
The result given is the following:

From a 10,000-foot view, while it is not possible to read the text in each node and link of the graph, it's already possible to detect some clusters of communities (in the centre) and filter out some small graph components (those at the border).
In particular, there’s a sizable cluster in the center, with quite a few communities involved: what are those communities? Why are they so connected? Is there a shared property?
Zooming in on the particular graph component, it is possible to read the community names:

On closer inspection, it looks like a cluster of DotNet communities (with some exceptions), so the first guess of shared property worked quite well. In this case, all of these communities share the Microsoft framework foundation, even though they are geographically distributed across Italy and Europe in general.
But of these communities, which is the most ‘popular’ – the one that contributes the highest number of speakers?
To answer this question we can count the ‘degree’ of each community node – thereby counting all the ‘RELATED’ relationships stemming from each community node – and show it in a table:
// Community degree sorted by degree
MATCH (community:Community)
WHERE NOT community.name = ""
WITH community, size((community)<-[:BELONGS_TO]-()-[:PRESENTED]->(:Paper)) as degree
ORDER BY degree DESC
RETURN community.name, degree
These are the top five results:

In the first three positions, we can see DotNet communities, related to twelve, eleven and ten submissions respectively. By comparison, the first Google community, GDG, ranks fifth, with nine related submissions – still quite a good result!
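Note that the degree above counts community-to-paper links, so a community whose single speaker submitted many papers ranks high. A variant (my addition, not part of the original analysis) counts distinct speakers instead, which answers "who contributes the most speakers" more directly:

```cypher
// top communities by number of distinct speakers
MATCH (community:Community)<-[:BELONGS_TO]-(speaker:Speaker)
WHERE NOT community.name = ""
RETURN community.name AS community, count(DISTINCT speaker) AS speakers
ORDER BY speakers DESC
LIMIT 5
```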