
Codemotion Rome 2020: what data science can tell us
Before the COVID-19 outbreak, Codemotion was organising its next big conference. The Call for Papers for Codemotion Rome 2020 closed with a staggering number of submissions received: 646 proposals!
The planned conference was then cancelled, and of course, a lot has changed since then – so have you, most likely. But we can still benefit from this past experience by taking a deeper look into those submissions.
If you want to know more about how modern technologies and tools can support you in organising – and during – a virtual event, don’t miss our article showcasing the best of the tools we have used to host our online conferences since the start of the COVID-19 outbreak.
Deep dive into the data: Communities and Topics
In the first post of the series I analysed only a portion of the data imported into Neo4j, in particular the data about Papers (which we also call “submissions”) and Communities. But the ingested data contains more information, such as Topics and Companies.
In this second post of the series, I’m going to explore how the conference topics and the communities behind the submissions relate to each other, to see whether a pattern emerges.
Let’s start by investigating the distribution of submissions by topic: what are the top ten topics in terms of number of submissions?
// Topic -> Submissions
MATCH (topic:Topic)
WITH topic, size((topic)<-[:IS_ABOUT]-(:Paper)) as degree
ORDER BY degree DESC
RETURN topic.name, degree
LIMIT 10
This retrieves the following results:

‘Software Architectures’ is the most popular topic for authors, with 97 submissions, followed closely by the catch-all topic ‘Inspirational’ with 86 submissions.
‘Front-end Dev’ and ‘Cloud’ share third position with 74 submissions each. Both are quite broad topics as well, although perhaps a bit less so than the first two.
It’s interesting to note that ‘Mobile’ and ‘IoT’, topics that were a focus for the conference and were also quite popular in 2019, did not rank in the top ten this year.
Following this quantitative analysis, let’s move on to a qualitative analysis.
As mentioned in the previous post, speakers declare all the communities they belong to when applying, so one may see some strange (interesting? unusual?) associations, but this effect should be diluted by the number of submissions involved. I’d expect a frontend community to sit closer to frontend topics in the graph, simply because more people from that community should have submitted talks on the subject.
Is this guess accurate? Does the data confirm this?
To provide a reference for the model adopted, this is the schema visualization:

Bearing this in mind, the request to extract the information I’m interested in is expressed with this Cypher query:
// Communities -> Topics
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = ""
WITH community, topic, apoc.create.vRelationship(topic, "RELATED", {}, community) AS r1
RETURN community, topic, r1
In this query, I’m taking advantage of an APOC virtual relationship utility to infer a direct relationship between a community and a topic whenever there’s a path between the two nodes in the graph. This projection is a lightweight way to surface interesting patterns in the visualization quickly, compared with rendering the full data from the physical model.
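To make the idea of the virtual relationship concrete outside the database, here is a minimal Python sketch of the same projection. The sample paths, author names and paper ids are made up for illustration and are not taken from the conference data. Unlike the Cypher query, which returns one virtual relationship per matching path, this sketch also deduplicates the resulting pairs.

```python
# Hypothetical sample data (not the real submissions). Each tuple is one path:
# (community) <-BELONGS_TO- (author) -PRESENTED-> (paper) -IS_ABOUT-> (topic)
paths = [
    ("GraphRM", "alice", "paper-1", "Cloud"),
    ("GraphRM", "alice", "paper-2", "Cloud"),
    ("GraphRM", "bob", "paper-3", "IoT"),
    ("OWASP", "carol", "paper-4", "Cybersecurity"),
    ("", "dave", "paper-5", "Cloud"),  # empty community name, skipped below
]


def virtual_edges(paths):
    """Collapse every community-to-topic path into a direct (topic, community) pair."""
    edges = set()
    for community, _author, _paper, topic in paths:
        if community != "":  # mirrors WHERE NOT community.name = ""
            edges.add((topic, community))
    return sorted(edges)


edges = virtual_edges(paths)
for topic, community in edges:
    print(f"{topic} -RELATED-> {community}")
```

The authors and papers disappear from the result: only the two endpoints of each path survive, which is what makes the projected graph so much easier to read.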

Graph Data Visualization
It’s worth noting that data displayed in the browser as a graph is positioned by a layout algorithm, which decides where to place the nodes on the screen based on forces acting between them.
This is a sort of small-scale physics simulation that provides a neat positioning for each node that is easy for our brain to navigate.
One classic property that this kind of algorithm emphasises is clustering: nodes end up close to each other according to the graph topology. Put simply, two nodes that are connected to the same third node are likely to remain closer than others which have nothing in common.
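The behaviour described above can be reproduced with a few lines of Python. This is only a minimal sketch in the spirit of a spring layout (repulsion between all nodes, attraction along edges), not the algorithm the browser actually uses. The function name, the parameters and the tiny sample graph are assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, max_step=0.05, seed=42):
    """Toy force-directed layout: all nodes repel, connected nodes attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, stronger at short distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along each edge, stronger at long distance.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, capped to keep the system stable.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0.0:
                scale = min(length, max_step) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return pos


def distance(layout, a, b):
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


# Hypothetical graph: a1 and a2 share the neighbour A, b1 and b2 share B.
nodes = ["A", "a1", "a2", "B", "b1", "b2"]
edges = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")]
layout = force_layout(nodes, edges)
print("a1-a2:", round(distance(layout, "a1", "a2"), 2))
print("a1-b1:", round(distance(layout, "a1", "b1"), 2))
```

In this toy graph, a1 and a2 should settle near each other because both are pulled towards A, while b1, which shares no neighbour with them, is only pushed away. Real layouts typically add a cooling schedule and a centring force so that disconnected groups do not drift apart indefinitely; both are left out here to keep the sketch short.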
Relying on this property of the algorithm, we can quickly navigate the visualization looking for topics that we expect to be close: for instance, ‘Design/UX’, ‘Mobile’ and ‘Frontend’ are all in the same area, as we might expect.

Where do we see violations of this (assumed) rule?
Probably the most surprising of these is the proximity of the topics ‘IoT’ and ‘Games’. Experts in these subjects may have expected the two to be really close, but I was surprised, for instance, to see ‘Cybersecurity’ so far removed from this hub.
// Communities (filtered) -> Topics
MATCH (community)<-[:BELONGS_TO]-()-[:PRESENTED]->()-[:IS_ABOUT]->(topic:Topic)
WHERE NOT community.name = "" AND topic.name IN ['IoT', 'Game Dev', 'AI/Machine Learning', 'Cloud', 'Cybersecurity']
WITH community, topic, apoc.create.vRelationship(topic, "RELATED", {}, community) AS r1
RETURN community, topic, r1
By filtering on just these five topics, one can notice a specific distribution of communities in the visualization: some topics like ‘Cybersecurity’ or ‘AI/Machine Learning’ have a tight cluster of communities that focus primarily on that topic.

It was predictable that both the ‘OWASP’ and ‘ISACA’ communities would be connected to the ‘Cybersecurity’ topic, as they are specifically cybersecurity-oriented communities. The same holds for the Tensorflow and Machine Learning Meetup communities, which connect to the ‘AI/Machine Learning’ topic.
The ‘Cloud’ topic cluster seems the most popular of the selection. Several communities appear to have submitted talks on it, most of them directly related to the topic and others less so.
But what about the communities in the center? Those that are highlighted in the picture form a central cluster of communities that submitted talks on several of the selected topics.
It would be surprising to find ‘vertical’ communities here (in this context meaning communities focused only on a single topic): in fact, these communities are mostly either technology-oriented (for instance Java or .NET focused) or have a wider focus (for example, GraphRM).
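The distinction between ‘vertical’ and cross-topic communities can also be checked by counting distinct topics per community instead of reading it off the picture. The following is a minimal Python sketch of that check. The (community, topic) pairs are made up for illustration and are not the real query results.

```python
from collections import defaultdict

# Hypothetical (community, topic) pairs, shaped like the output of the
# filtered query; the real data is not reproduced here.
pairs = [
    ("OWASP", "Cybersecurity"),
    ("ISACA", "Cybersecurity"),
    ("GraphRM", "Cloud"),
    ("GraphRM", "IoT"),
    ("GraphRM", "AI/Machine Learning"),
]


def topics_per_community(pairs):
    """Group the distinct topics each community submitted on."""
    topics = defaultdict(set)
    for community, topic in pairs:
        topics[community].add(topic)
    return dict(topics)


def split_vertical(pairs):
    """Separate single-topic ('vertical') communities from cross-topic ones."""
    by_community = topics_per_community(pairs)
    vertical = sorted(c for c, t in by_community.items() if len(t) == 1)
    cross_topic = sorted(c for c, t in by_community.items() if len(t) > 1)
    return vertical, cross_topic


vertical, cross_topic = split_vertical(pairs)
print("vertical:", vertical)
print("cross-topic:", cross_topic)
```

Communities in the second list are the ones a force-based layout would tend to pull towards the center, since they are attached to several topic nodes at once.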