I shouldn’t spend time on blogging, because that is taking away time from doing “real research”.
But maybe the line between what is “real research” (publishable as a thesis or a journal paper) and academic blogging as the intersection of talking about research and social activism is very thin. Since we who do research on political phenomena need data to do such research, getting data is important. Having access to open data makes part of our life much easier, but for now, getting open data that is interesting for our research still seems to be as much activism as it is part of research.
Take, for example, the image below. What you see there is an image of a network of 84 EU Commission expert groups and subgroups (a link means a minimum of 3 joint members)*.
This network image is based on Rob’s data set on “organisations that are members of expert groups” (available as a CSV download). Rob scraped that data from the EU Commission’s expert group register and posted it online.
Now why is this interesting for an academic blog post?
- First, I’m doing research using network analysis, so using, visualising and analysing relational data is what I am trying to become good at. And as I indicated not long ago, I’m interested in affiliation and what Breiger (1974) has called “the duality of persons and groups”. The data set used here is such affiliation data (organisations being affiliated to expert groups). Furthermore, without focussing on expert groups as a subject, I’m also dealing with one or two of these groups in my empirical research.
- Second, I recently listened to a podcast here on Ideasoneurope.eu with Mark Field, who is doing research on EU Commission expert groups for his PhD. One of the problems Mark talked about was the problem of getting data (including through the Commission register). Having spoken about and discussed that subject at an ALTER-EU event not long ago (at their invitation, representing TI EU, for whom I was volunteering until June), I thought it was timely to make the link between activism and academic research now.
But what does this have to do with open data?
When Mark Field talked about the problem of getting data on EU expert groups, I was reminded that Rob had done the scraping. It also reminded me that I had talked with the Commission officials responsible for the expert group register before I went to the ALTER-EU workshop and asked them whether it was possible to publish the register data as an open data set.
The visualisation above, though just an imperfect proof-of-concept, shows that access to a full open data set (in this case on Commission expert groups) allows a quick and more complete study of the expert group system. In the image above, we can see for example that groups on education and vocational training are linked to the rest of the (sub-)system through one group on the single market dialogue.
Seeing how groups interrelate can tell us more than just looking at individual groups. It can tell us how policy fields are linked and which public or private organisations are actually responsible for this connection. From a different perspective, seeing how some groups or organisations are positioned within the whole system (e.g. by calculating their centrality), we may be able to make a more objective or more pertinent choice about which of them to look at in more detail.
Yet to get such data, it still took somebody like Rob to scrape it from the Commission website, although in principle the data already exists as a database (it has to, in order to feed the publicly searchable register).
For political scientists like Mark and me, this means that in order to study a system like the EU Commission expert groups, to discover positive and negative dynamics, processes of interrelation and influence, we first have to find somebody who provides us with such data before we can start analysing, although the database as such already exists somewhere within the Commission.
In other words, publicly financed institutions like the Commission have data that they do not make available in a useful format, so that mostly publicly financed researchers need to spend a good share of their research time gathering such data before they can start analysing it. I think this is quite an inefficient way of making use of public resources.
And if public institutions made more databases available in usable data formats, especially when they already publish the information, political scientists (and others) would not waste time duplicating work but would be able to provide good and new research, e.g. complementing public data sets with information that is not already gathered by institutions.
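To make the idea of “calculating their centrality” a bit more concrete, here is a minimal sketch in plain Python. It is not the analysis behind the image: the group names and weights are invented for illustration, the function name `degree_centrality` is my own, and degree is only one of many possible centrality measures.

```python
from collections import defaultdict

def degree_centrality(weighted_edges):
    """Return (degree, strength) per group.

    degree   = number of other groups a group is linked to
    strength = total number of shared organisations across those links
    """
    degree = defaultdict(int)
    strength = defaultdict(int)
    for (group_a, group_b), shared_orgs in weighted_edges.items():
        for group in (group_a, group_b):
            degree[group] += 1
            strength[group] += shared_orgs
    return dict(degree), dict(strength)

# Hypothetical group-by-group links; the weight is the number of shared organisations.
weighted_edges = {
    ("Education", "Single market"): 3,
    ("Single market", "Transport"): 5,
    ("Single market", "Energy"): 4,
}

degree, strength = degree_centrality(weighted_edges)
most_central = max(degree, key=degree.get)
print(most_central, degree[most_central], strength[most_central])
```

In this toy example the bridging group ends up with the highest degree, which is the kind of signal one might use to decide which groups deserve a closer look.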
Yet, if we don’t discuss these issues publicly, this situation won’t change.
If we as researchers, instead of wasting time scraping data or approaching institutions individually in order to get data for our “private” research, asked for data that is of interest for our research to be made available to everyone, our academic efforts might yield much more for the general interest (and for other researchers) than just developing and testing our own theories.
Thus, sometimes one also needs to spend a good amount of time blogging today about these issues instead of doing “proper research” – hoping that this might enable more public data and then “proper research” in the future. At the same time, one can talk about relevant research questions and methods to a public that goes beyond the academic community.
In the worst case, one loses a little time for one’s own research through academic blogging. In the best case, this creates a win-win-win situation.
* The image is just a proof-of-concept, and the underlying data is both partially outdated and incomplete (because the register is undergoing an update) and only slightly cleaned up by me. Much more cleaning is needed to be sure the image represents the actual network. Cleaning is needed because the same organisations are sometimes listed under slightly different names throughout the Commission expert group register, which makes it difficult to construct an accurate organisation-by-group network.
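As an illustration of the kind of cleaning meant here, the following is a rough sketch in plain Python, not the clean-up I actually did. The function `normalise_name` and the example names are hypothetical, and such crude rules would still miss abbreviations, translations and reordered names.

```python
import re
import unicodedata

def normalise_name(name):
    """Crude normalisation: strip accents, lowercase, drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = without_accents.lower()
    without_punctuation = re.sub(r"[^a-z0-9 ]+", " ", lowered)
    return " ".join(without_punctuation.split())

# Hypothetical spelling variants of one and the same organisation.
variants = [
    "Example Association e.V.",
    "EXAMPLE  ASSOCIATION E.V",
    "Exämple Association, e.V.",
]

normalised = {normalise_name(v) for v in variants}
print(normalised)
```

After normalisation the three variants collapse into a single key, so they would be counted as one organisation when building the organisation-by-group network.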
The image you see is the transposed group-by-group network (built using igraph for R) that I’ve reduced so that a link between two expert groups indicates that at least 3 organisations are members of both groups. The thicker a line between two groups, the more organisations with exactly the same name are co-members.
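The step from organisation-by-group data to a group-by-group network can be sketched as follows. This is a plain-Python illustration of the idea, not the igraph code I used; the membership records and the function name `project_groups` are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical affiliation records: (organisation, expert group) pairs.
memberships = [
    ("Org A", "G1"), ("Org A", "G2"),
    ("Org B", "G1"), ("Org B", "G2"),
    ("Org C", "G1"), ("Org C", "G2"), ("Org C", "G3"),
    ("Org D", "G2"), ("Org D", "G3"),
]

def project_groups(memberships, min_shared=3):
    """Return {(group_a, group_b): shared_count} for pairs sharing >= min_shared organisations."""
    groups_by_org = {}
    for org, group in memberships:
        groups_by_org.setdefault(org, set()).add(group)

    shared = Counter()
    for groups in groups_by_org.values():
        # Every pair of groups this organisation sits in gains one shared member.
        for pair in combinations(sorted(groups), 2):
            shared[pair] += 1

    return {pair: count for pair, count in shared.items() if count >= min_shared}

edges = project_groups(memberships)
print(edges)
```

The counts play the role of the line thickness in the image, and `min_shared=3` corresponds to the threshold of at least 3 joint members. Because matching is done on exact names, uncleaned name variants lower these counts.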
The whole data sample included 487 groups and sub-groups; the total number of expert groups of the Commission is somewhere below 1000 according to estimates. I’ve selected one interesting connected component of 84 groups and subgroups that spans several policy areas and includes one expert group on higher education, which Mark seems to be interested in. The colouring of the nodes is done automatically with the Conductance Cutting Clustering algorithm (granularity 0.5) implemented in visone, the program I used for the visualisation of the network.
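Picking out a connected component from the thresholded network can be sketched like this. Again this is only an illustration in plain Python under my own assumptions: the edge list is invented and `connected_components` is a hypothetical helper, not a function from igraph or visone.

```python
from collections import deque

def connected_components(edges):
    """Return the connected components of an undirected edge list, largest first."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from this group.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)

    return sorted(components, key=len, reverse=True)

# Hypothetical links that survived the "at least 3 shared organisations" threshold.
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]
components = connected_components(edges)
print(components)
```

Groups without any link above the threshold never appear in the edge list, so they drop out of the picture entirely, which is one reason why the selected component is much smaller than the full sample.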