Open data in Italy: open on paper, unusable in practice

Direct answer: in Italy, public-interest data — when it exists — is often badly kept: «open» on paper but technically hard to use. Integrating it for Open·Parlamento, we hit, source after source, barriers that shouldn’t exist: portals returning zero bytes to programs, empty «official» endpoints, broken security certificates, malformed links even where an API exists, and civic projects frozen for years. This article documents them one by one — not as a rant, but because the gap between «published» and «usable» is huge.
TL;DR
- Constitutional Court: a client that doesn't present a browser User-Agent gets zero bytes; data arrives in nested ZIPs with 1990s encodings (cp1252), and the «official» SPARQL endpoint answers HTTP 200 but empty.
- Court of Cassation: no documented API; the .giustizia.it domains have incomplete TLS certificates that break standard connections.
- ISTAT: first unreachable (HTTP 000), then back but limited to 5 requests per minute (with IP ban if exceeded).
- data.europa.eu: the API exists, but the dataset links come back broken (an «identifier» field returned as a list).
- Abandoned civic projects: municipal budgets frozen at 2021/22, Open Municipio with no updates since 2017.
- CKAN: there are ~950–2000 instances worldwide, but many are dated; the real value is in a few well-kept exceptions.
«Open» doesn’t mean «usable»
There’s a difference the open-data rhetoric tends to hide: a dataset can be published — ticking the transparency box — without being usable by a program. It’s the distance between a scanned PDF and a documented API. In Italy that distance is often huge, and it falls on those who’d actually use the data: researchers, journalists, civic developers.
What follows isn’t an opinion: it’s the log of real problems we hit building the connectors, with the workarounds we had to implement.
Doors shut to programs (bot-gating)
The Constitutional Court’s open-data portal, to a client not sending a browser User-Agent, answers zero bytes. To get the data we had to pretend to be a browser (Mozilla/5.0). Once inside, data arrives in nested ZIP archives (zip inside zip), with 1990s cp1252 text encodings: it takes recursive unzipping and multi-encoding decoding.
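The three workarounds in the paragraph above (browser User-Agent, recursive unzipping, multi-encoding decoding) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the connector's actual code: the function names are hypothetical, and the decoding order (UTF-8 first, then cp1252, then latin-1 as a last resort) is an assumption.

```python
import io
import urllib.request
import zipfile

BROWSER_UA = "Mozilla/5.0"  # the article reports a browser User-Agent was needed


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL while presenting a browser-like User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def iter_zip_members(blob: bytes, prefix: str = ""):
    """Yield (path, bytes) for every file, descending into ZIPs inside ZIPs."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(data)):
                # A member that is itself a ZIP: recurse into it.
                yield from iter_zip_members(data, prefix=path + "/")
            else:
                yield path, data


def decode_text(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Try strict decodings in order; latin-1 comes last because it never fails."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")
```

Trying UTF-8 before cp1252 matters: cp1252 accepts almost any byte sequence, so putting it first would silently mangle files that are already UTF-8.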
It’s paradoxical: public data, formally open, but effectively locked against the only sensible way to consume it at scale — a program.
The dead official engines
More than one Italian «official» SPARQL endpoint, queried from a script, answers 200 but empty. The Constitutional Court is the example: «the official SPARQL answers 200 empty from a script → unusable». The fix is to fall back to raw JSON datasets instead of the endpoint built precisely for queries. When the official infrastructure is a façade, the real work is done by whoever downloads the dumps.
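A fallback of this kind might look like the sketch below. It assumes the endpoint is asked for the standard SPARQL 1.1 JSON results format (`results.bindings`); `run_query` and `load_dump` are hypothetical callables standing in for the HTTP call and the dump loader, which the article does not describe.

```python
import json


def sparql_bindings(status: int, body: bytes) -> list:
    """Return result rows from a SPARQL JSON response, or [] if unusable."""
    if status != 200 or not body.strip():
        return []
    try:
        doc = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return []
    results = doc.get("results") if isinstance(doc, dict) else None
    bindings = results.get("bindings") if isinstance(results, dict) else None
    return bindings if isinstance(bindings, list) else []


def query_with_fallback(run_query, load_dump):
    """Prefer the endpoint; fall back to the raw dump when it yields nothing.

    run_query() -> (status, body) and load_dump() -> list are hypothetical.
    """
    status, body = run_query()
    rows = sparql_bindings(status, body)
    if rows:
        return "endpoint", rows
    return "dump", load_dump()
```

Treating an empty result as a failure only makes sense for a probe query that is known to have matches; a genuinely empty answer to a narrow query would otherwise trigger the fallback too.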
The «APIs» that aren’t there, and broken certificates
Searching Cassation rulings has no documented public interface: the real backend (Solr) is undocumented. On top of that, the .giustizia.it sites present an incomplete TLS certificate chain, so any standard connection fails with CERTIFICATE_VERIFY_FAILED. Just to read public, read-only data we had to handle the TLS exception explicitly, something a State service should never require.
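The article does not say how the TLS exception was handled, so the following is only one plausible sketch with Python's standard `ssl` module. The `extra_ca_file` path (a PEM file holding the missing intermediate certificate) and the function names are assumptions. Supplying the missing intermediate keeps verification on; disabling verification is shown only as an explicit last resort.

```python
import ssl
import urllib.request


def make_tls_context(extra_ca_file=None, insecure=False):
    """Build a TLS context for a host that serves an incomplete chain."""
    ctx = ssl.create_default_context()
    if extra_ca_file:
        # Preferred: add the missing intermediate(s) so the chain can be
        # completed locally while certificate verification stays enabled.
        ctx.load_verify_locations(cafile=extra_ca_file)
    elif insecure:
        # Last resort, opt-in only: no verification at all.
        # check_hostname must be switched off before verify_mode.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, ctx, timeout=30.0):
    """Download a URL using the given TLS context."""
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.read()
```

Scoping the relaxed context to the one affected host, rather than changing a global default, keeps the exception from leaking into connections that verify correctly.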
Hidden limits, even where the API exists
- ISTAT. The statistics endpoint (SDMX) first returned HTTP 000 (unreachable), then came back but limited to 5 requests per minute, with an IP ban for 1–2 days if you exceed it. Practical result: unusable for serious integrations — we used Eurostat, which exposes the same key indicators without these barriers.
- data.europa.eu (the EU portal). The API exists, but dataset links came back broken: the identifier field returned as a list instead of a string, producing malformed URLs we had to reconstruct connector-side.
- Undeclared time series. A «clean» CSV on migrant reception actually contained 7 overlapping annual snapshots (2018–2024): if you don’t deduplicate, you count everything seven times. Nothing in the schema flagged it.
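The three defenses implied by this list can be sketched in a few lines each. Everything here is illustrative: the class and function names are hypothetical, the throttle simply spaces calls evenly (12 seconds apart for 5 per minute), and the deduplication assumes each row carries a snapshot marker and that the most recent snapshot wins, which the article does not specify.

```python
import time


class Throttle:
    """Space calls at least period / max_calls seconds apart."""

    def __init__(self, max_calls=5, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self.clock, self.sleep = clock, sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


def first_identifier(value):
    """Return one string whether the field arrives as a str or a list."""
    if isinstance(value, (list, tuple)):
        value = next((v for v in value if v), "")
    return str(value or "").strip()


def latest_snapshot(rows, key_fields, snapshot_field):
    """Keep, for each key, only the row from the most recent snapshot."""
    latest = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in latest or row[snapshot_field] > latest[key][snapshot_field]:
            latest[key] = row
    return list(latest.values())
```

Calling `Throttle().wait()` before each request keeps a client at the stated limit; with an IP ban at stake, a slightly lower `max_calls` leaves a safety margin.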
The abandoned civic projects
The best initiatives often stall. Open municipal budgets were updated only to 2021/22, with no API, accessible only via dump or scraping. Open Municipio, a platform built for city-council transparency, has been frozen since 2017 and must be installed municipality by municipality. Precious data that, without maintenance, slowly becomes unusable.
And CKAN? The standard is there, the care isn’t
CKAN is the de facto standard for open-data catalogs: there are ~950–2000 active instances worldwide. That’s good news — a single connector opens hundreds of portals. The bad news is that having a CKAN portal doesn’t mean keeping it updated: many Italian catalogs are dated, populated once «for transparency» and then left to age. The real value concentrates in a few well-maintained exceptions (some regional and city portals, and the European portal for harmonized datasets). The rest is often a shop window: open on paper, frozen in practice.
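Because every CKAN portal exposes the same action API, a freshness probe can be written once and pointed at any of them. The sketch below builds a `package_search` URL sorted by `metadata_modified` and flags a catalog as stale from the JSON response. The function names and the 365-day threshold are assumptions for illustration, and timestamps without a timezone are assumed to be UTC.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def package_search_url(portal, query="", rows=10, sort="metadata_modified desc"):
    """Build a CKAN package_search URL for any portal base address."""
    base = portal.rstrip("/") + "/api/3/action/package_search"
    return base + "?" + urlencode({"q": query, "rows": rows, "sort": sort})


def parse_stamp(stamp):
    """Parse an ISO timestamp; naive values are treated as UTC (assumption)."""
    dt = datetime.fromisoformat(stamp)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def newest_update(response):
    """Return the latest metadata_modified among returned datasets, or None."""
    packages = response.get("result", {}).get("results", [])
    stamps = [p["metadata_modified"] for p in packages if p.get("metadata_modified")]
    return max(map(parse_stamp, stamps)) if stamps else None


def is_stale(response, now, max_age_days=365):
    """True when nothing in the response was modified within max_age_days."""
    newest = newest_update(response)
    return newest is None or (now - newest).days > max_age_days
```

Requesting a single row sorted by most recent modification is enough for the probe, since only the newest timestamp matters.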
Why it matters
It’s not a technical detail. It’s why data that should belong to everyone remains, in practice, accessible only to those with the time and skills to «tame» it. Declared transparency isn’t enough: without maintenance, documentation and usable formats, open data stays a label. Open·Parlamento’s job is exactly this middle layer — making queryable the sources that, on their own, aren’t — always citing the original.
FAQ
Isn’t Italian public data open by law?
Much of it formally is: published, with an open license. But «published» doesn’t mean «usable by a program»: between an endpoint that answers empty, broken certificates and malformed links, real accessibility is often far worse than declared.
What’s the most common problem?
There isn’t just one: bot-gating against non-browser clients, dead official SPARQL, broken TLS, legacy encodings, nested ZIPs, hidden rate limits, broken links even where an API exists, and abandoned civic projects. The underlying pattern is a lack of maintenance.
How do you make them actually usable?
You need a layer that probes and «tames» each source: correct User-Agents, fallback to dumps when the endpoint is empty, encoding handling, deduplication of time series, URL reconstruction. It’s the unglamorous but decisive work that separates published data from usable data.
We turn hard public sources into truly queryable data and knowledge graphs — always citing the original. See how Open·Parlamento works and its open data, or let’s talk.