Friday, 3 October 2014

The rules of the game: the evaluation of Portuguese research units

The latest of the periodic evaluations of Portuguese research units, which are the responsibility of the Portuguese Science Foundation (FCT), was outsourced to the European Science Foundation (ESF).

Given the crucial importance of such a process, researchers expected a well-planned and well-executed exercise, whose main aim would be to create the conditions for the best Portuguese science to keep developing at least at the rate it has sustained over the last 20 years.

Instead, we were faced with an operation whose (undisclosed) purpose was the elimination of half of the research units in the country, as is clear from the contract between FCT and ESF, made public under legal pressure after the results of the first stage came out.

Furthermore, the whole process is riddled with a lack of transparency and with irregularities ranging from the curious (a strong geographic bias in the panels towards the UK) to the very serious (rules changed after the application deadline, when the evaluation was already in full swing).

Below is a short summary of the differences between the evaluation of Portuguese research units as announced by FCT/ESF and as it is actually being carried out. For a full description, including the context of such evaluations, see here for a version in English and here for a more complete version in Portuguese, including some data.


The evaluation as announced by FCT/ESF versus the evaluation as actually carried out by FCT/ESF:

As announced: Evaluation based on quality and merit alone.
As carried out: A hidden quota of 50% of units making it to the second stage was imposed. This prevailed over quality and merit considerations whenever necessary for the panels to produce the shortlist with half the units, as requested by the work plan.

As announced: Reports by up to 5 reviewers, one from the panel.
As carried out: Reports by 3 reviewers in all cases, one from the panel; the reviewer from the panel was not necessarily an expert in the area under consideration. Besides not being in line with international best practice, such a small number of reviewers posed problems due to the large discrepancies between marks (see the next point).

As announced: Robust.
As carried out: Large discrepancies between the highest and lowest marks of the 3 reviewers. From a sample of 40% of all the units evaluated, it is estimated that in 50% of the cases this difference was greater than or equal to 5 points out of a possible maximum of 17. One implication is that in about 50% of the cases the referees disagreed on whether or not a unit should make it to the second stage.

The results are also at odds with the bibliometric study commissioned by FCT from Elsevier specifically for the purpose of this evaluation, and in many cases they contradict the results of previous evaluations and of international rankings.

An error in the bibliometrics file used by the panels during the first stage means that the scientific production figures seen by the evaluators were about half their real values. This file was made available to the units only after the results of the first stage became known; the error was then detected by some units and immediately pointed out to FCT. Some first-stage reports explicitly criticised "low productivity rates".

As announced: Peer review.
As carried out: The mixed panels did not provide proper global coverage of the areas, with disparate subjects such as Chemistry, Physics, Materials Science, Mathematics, and Nanosciences and Nanotechnologies all packed into an 11-member panel called Exact Sciences.

According to FCT/ESF, this was not a problem, since that coverage would be ensured by the external referees. However, the opinions of these experts were ignored in many cases whenever it became necessary to fulfil the 50% quota. In situations where the internal reviewer's mark and the average of the external (expert) reviewers' marks disagreed on whether a unit should make it to the second stage, the internal reviewer's opinion prevailed in 2/3 of the cases.

As announced: The referee indicated by each research unit was to participate in the final consensus report of the first stage.
As carried out: This simply did not happen.

As announced: Geographic and gender balance (from the evaluation guide, page 12): "The constitution of the evaluation panels will take into consideration the number of applications for each scientific domain, a good gender balance as well as a fair geographic and institutional distribution of evaluators."

As carried out, the actual distribution was:

Geographic:
UK: 17 (4 chairs)
Italy: 11 (1 chair)
France: 5
Germany: 5
Belgium: 4 (1 chair)
the Netherlands: 4
Denmark: 3
Finland: 3
Ireland: 3
Spain: 3
Austria: 1
Croatia: 1
Czech Republic: 1
Cyprus: 1
Estonia: 1
Greece: 1
Hungary: 1
Israel: 1
Luxembourg: 1
Norway: 1
Poland: 1
Sweden: 1
Switzerland: 1
Turkey: 1
USA: 1
Gender:
M: 61 (84%); F: 12 (16%)
Note: three of the panels are all male.

As announced: Feedback to centres (from the evaluation guide, page 15): "In accordance with the Portuguese law, the candidates will also have the right to submit a prior hearing, within 10 days after notification of the results, which should be answered before the beginning of the 2nd stage of the evaluation process."

As carried out: The second stage began in July, was interrupted during the holiday season in August, and started again on the 3rd of September. The results were communicated to the units on the 2nd of October.


[Chart: actual geographic distribution of evaluators]