Open data in Italy: open on paper, unusable in practice

Direct answer: in Italy, public-interest data — when it exists — is often badly kept: «open» on paper but technically hard to use. Integrating it for Open·Parlamento, we hit, source after source, barriers that shouldn’t exist: portals returning zero bytes to programs, empty «official» endpoints, broken security certificates, malformed links even where an API exists, and civic projects frozen for years. This article documents them one by one — not as a rant, but because the gap between «published» and «usable» is huge.

TL;DR

Constitutional Court: the portal returns zero bytes to any client that doesn’t present itself as a browser, and the data arrives in nested ZIPs with 1990s encodings (cp1252). The «official» SPARQL endpoint also returns HTTP 200 with nothing in it.
Court of Cassation: there is no documented API, and the .giustizia.it domains serve incomplete TLS certificate chains that break standard connections.
ISTAT: first unreachable (HTTP 000), then back online but capped at 5 requests per minute, with an IP ban if you exceed the cap.
data.europa.eu: the API exists, but dataset links come back broken (the «identifier» field is returned as a list).
Abandoned civic projects: municipal budgets frozen at 2021/22, Open Municipio with no updates since 2017.
CKAN: there are ~950–2000 instances worldwide, but many are dated; the real value is in a few well-kept exceptions.

«Open» doesn’t mean «usable»

There’s a difference the open-data rhetoric tends to hide: a dataset can be published — ticking the transparency box — without being usable by a program. It’s the distance between a scanned PDF and a documented API. In Italy that distance is often huge, and it falls on those who’d actually use the data: researchers, journalists, civic developers.

What follows isn’t an opinion: it’s the log of real problems we hit building the connectors, with the workarounds we had to implement.

Doors shut to programs (bot-gating)

The Constitutional Court’s open-data portal returns zero bytes to any client that doesn’t send a browser User-Agent. To get the data we had to present ourselves as a browser (Mozilla/5.0). Once past that gate, the data arrives in nested ZIP archives (zip inside zip) with 1990s-era cp1252 text encoding, so reading it takes recursive unzipping plus decoding that tries more than one encoding.
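The three steps above (browser User-Agent, recursive unzip, encoding fallback) can be sketched in Python with the standard library alone. This is a minimal sketch, not our connector: the full User-Agent string, the `!/` separator used to mark nested members, and the encoding order are assumptions, and no portal URL is given.

```python
import io
import urllib.request
import zipfile
from typing import Iterator, Tuple

# Assumption: any browser-like User-Agent will do; the portal appeared to gate on "Mozilla/5.0".
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)"

# Tried in order: utf-8 rejects most cp1252 bytes loudly, cp1252 covers the legacy files,
# latin-1 is the last resort because it maps every byte and therefore never fails.
ENCODINGS = ("utf-8", "cp1252", "latin-1")


def fetch_as_browser(url: str, timeout: float = 30.0) -> bytes:
    """Download url presenting a browser User-Agent; treat an empty body as an error."""
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    if not body:
        raise RuntimeError(f"{url} returned zero bytes (bot-gated?)")
    return body


def decode_text(raw: bytes) -> str:
    """Decode raw with the first encoding in ENCODINGS that succeeds."""
    for enc in ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("undecodable bytes")  # unreachable while latin-1 is in the list


def walk_zip(data: bytes, prefix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (path, payload) for every file, descending recursively into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, no content
            payload = zf.read(name)
            path = prefix + name
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # A zip inside the zip: recurse, marking the nesting with "!/".
                yield from walk_zip(payload, prefix=path + "!/")
            else:
                yield path, payload
```

In use, the pieces chain as `for path, payload in walk_zip(fetch_as_browser(url)): text = decode_text(payload)`. The encoding order matters. UTF-8 goes first because it fails visibly on cp1252 bytes. Latin-1 goes last because it accepts any byte, so it would silently turn a cp1252 `€` (byte 0x80) into a control character.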

It’s paradoxical: public data, formally open, but effectively locked against the only sensible way to consume it at scale — a program.

The dead official engines

More than one Italian «official» SPARQL endpoint, queried from a script, returns HTTP 200 with an empty response. The Constitutional Court is the clearest example: its official SPARQL endpoint answers a script with 200 and nothing else, which makes it unusable. The workaround is to fall back to the raw JSON datasets instead of the endpoint built precisely for queries. When the official infrastructure is a façade, the real work is done by whoever downloads the dumps. It’s worth remembering that the graph is the door and the asset is the engine: an endpoint that answers guarantees nothing, and what counts is what’s actually behind it.
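The fallback described above might look like this in Python, using only the standard library. `run_sparql` follows the SPARQL 1.1 Protocol (GET with a `query` parameter, JSON results). Treating «200 but no rows» as a failure is an assumption, and it only makes sense for a probe query you know should return data. The `query_endpoint` and `load_dump` callables are hypothetical hooks, not part of any official client.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional, Tuple


def run_sparql(endpoint: str, query: str, timeout: float = 30.0) -> bytes:
    """GET a SPARQL query (SPARQL 1.1 Protocol) asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def parse_bindings(body: bytes) -> Optional[List[dict]]:
    """Return the result rows, or None when the 200 response carries nothing usable."""
    if not body.strip():
        return None  # HTTP 200 with an empty body
    try:
        doc = json.loads(body)
    except ValueError:
        return None  # not JSON (an HTML page, for instance)
    if not isinstance(doc, dict) or not isinstance(doc.get("results"), dict):
        return None
    bindings = doc["results"].get("bindings", [])
    return bindings or None  # valid JSON but zero rows


def query_or_dump(query_endpoint: Callable[[], bytes],
                  load_dump: Callable[[], List[dict]]) -> Tuple[str, List[dict]]:
    """Try the endpoint first; fall back to the raw dump if it comes back empty."""
    rows = parse_bindings(query_endpoint())
    if rows is not None:
        return "sparql", rows
    return "dump", load_dump()
```

Returning the source label along with the rows makes it easy to log how often the «official» path actually delivers.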

The «APIs» that aren’t there, and broken certificates

There is no documented public interface for searching Cassation rulings: the actual backend (Solr) is undocumented. On top of that, the .giustizia.it sites serve an incomplete TLS certificate chain, so any standard connection fails with CERTIFICATE_VERIFY_FAILED. To read public, read-only data we had to handle the TLS error explicitly, something a State service should never require.
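The article doesn’t say exactly how the TLS error is handled. One option that keeps verification on is to add the missing intermediate certificate to the trust store and retry once when verification fails. Here is a Python sketch of that approach. `extra_ca_file` is a hypothetical path to the intermediate in PEM format. It can usually be downloaded from the CA Issuers URL in the server certificate’s Authority Information Access extension and then converted to PEM. Switching verification off would also «work», but it would accept an impostor just as readily.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def is_cert_verify_error(exc: BaseException) -> bool:
    """True if exc is, or wraps, a certificate-verification failure."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        return True
    return (isinstance(exc, urllib.error.URLError)
            and isinstance(exc.reason, ssl.SSLCertVerificationError))


def context_with_extra_ca(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """System trust store, plus an optional PEM file with the missing intermediate."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and hostname checking stay on
    if extra_ca_file:
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


def fetch_public(url: str, extra_ca_file: str, timeout: float = 30.0) -> bytes:
    """Fetch url; on a chain-verification failure, retry once with the extra intermediate."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context_with_extra_ca()) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        if not is_cert_verify_error(exc):
            raise
    # Still fully verified: we only add a certificate, we never switch checking off.
    with urllib.request.urlopen(url, timeout=timeout,
                                context=context_with_extra_ca(extra_ca_file)) as resp:
        return resp.read()
```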

Hidden limits, even where the API exists

ISTAT. The statistics endpoint (SDMX) first returned HTTP 000 (no response at all), then came back but capped at 5 requests per minute, with a 1–2 day IP ban if you exceed the cap. In practice it is unusable for serious integrations, so we switched to Eurostat, which exposes the same key indicators without these barriers.
data.europa.eu (the EU portal). The API exists, but dataset links came back broken: the identifier field was returned as a list instead of a string, producing malformed URLs that we had to rebuild in the connector.
Undeclared time series. A «clean» CSV on migrant reception actually contained 7 overlapping annual snapshots (2018–2024): without deduplication, everything is counted seven times. Nothing in the schema flagged it.
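With a hard cap like ISTAT’s 5 requests per minute and a ban that lasts days, it is cheaper to throttle on the client side than to discover the limit by tripping it. Here is a minimal sliding-window limiter in Python. The article doesn’t say how the minute is measured, so the sketch assumes a rolling window, which is the conservative choice. `clock` and `sleep` are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of window_s seconds."""

    def __init__(self, max_calls: int = 5, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have slid out of the window.
            while self.calls and now - self.calls[0] >= self.window_s:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.window_s - (now - self.calls[0]))
```

Call `limiter.acquire()` before every request. Setting `max_calls` a little below the published cap (4, say) may be worth it, since it leaves some margin for timing differences between client and server.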
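The data.europa.eu identifier bug can be absorbed at the connector boundary by coercing the field to a single string before building any URL. Formatting a list straight into a URL yields paths ending in `['abc']`. This is a Python sketch. Picking the first non-empty list element is an assumption, and `url_template` is a hypothetical placeholder for whatever link pattern the connector builds.

```python
from typing import Any
from urllib.parse import quote


def normalize_identifier(value: Any) -> str:
    """Coerce an identifier that may arrive as a string or as a list into one string."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    if isinstance(value, (list, tuple)):
        for item in value:
            if isinstance(item, str) and item.strip():
                return item.strip()  # assumption: the first non-empty entry is the right one
    raise ValueError(f"no usable identifier in {value!r}")


def dataset_link(record: dict, url_template: str) -> str:
    """Build a dataset link from a search-API record, tolerating list-valued identifiers."""
    ident = normalize_identifier(record.get("identifier"))
    if ident.startswith(("http://", "https://")):
        return ident  # already a full URI, use as-is
    return url_template.format(quote(ident, safe=""))
```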
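For overlapping snapshots, one remedy is to keep only the row from the most recent snapshot for each entity. This Python sketch uses hypothetical column names (`snapshot_year` plus an entity key such as `centre_id`). It assumes that later snapshots supersede earlier ones. If you need one figure per year instead, filter to a single snapshot rather than summing across all of them.

```python
import csv
import io
from typing import Dict, List, Sequence


def latest_snapshot(rows: List[dict], snapshot_col: str, key_cols: Sequence[str]) -> List[dict]:
    """Keep one row per entity key: the one from the most recent snapshot."""
    best: Dict[tuple, dict] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        # Assumption: a later snapshot supersedes earlier ones for the same entity.
        if key not in best or int(row[snapshot_col]) > int(best[key][snapshot_col]):
            best[key] = row
    return list(best.values())


def load_csv(text: str) -> List[dict]:
    """Parse CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))
```

Counting the distinct values of the snapshot column (`len({r["snapshot_year"] for r in rows})`) is a cheap first check that a «clean» CSV isn’t really several files stacked together. On a toy file with centre A in 2018 (10 places) and 2019 (12), plus centre B in 2019 (5), the naive sum is 27 places and the deduplicated one is 17.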

The abandoned civic projects

The best initiatives often stall. Open municipal budget data stops at 2021/22 and has no API, so it’s reachable only via dumps or scraping. Open Municipio, a platform built for city-council transparency, has been frozen since 2017 and has to be installed municipality by municipality. Without maintenance, this valuable data slowly becomes unusable.

And CKAN? The standard is there, the care isn’t

CKAN is the de facto standard for open-data catalogs: there are ~950–2000 active instances worldwide. That’s good news — a single connector opens hundreds of portals. The bad news is that having a CKAN portal doesn’t mean keeping it updated: many Italian catalogs are dated, populated once «for transparency» and then left to age. The real value concentrates in a few well-maintained exceptions (some regional and city portals, and the European portal for harmonized datasets). The rest is often a shop window: open on paper, frozen in practice.
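Because every CKAN instance exposes the same Action API, one freshness probe works across portals. You ask `package_search` for the single most recently modified dataset and read its `metadata_modified`. This is a Python sketch, and the portal URL in the test is a placeholder. A recent newest dataset shows only that something in the catalog is maintained, not that the catalog as a whole is.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def newest_dataset_url(base_url: str) -> str:
    """CKAN Action API call returning only the most recently modified dataset."""
    params = urllib.parse.urlencode({"rows": 1, "sort": "metadata_modified desc"})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def newest_modified(body: bytes) -> Optional[datetime]:
    """Read metadata_modified of the newest dataset in a package_search response."""
    doc = json.loads(body)
    if not doc.get("success"):
        return None
    results = doc["result"]["results"]
    if not results:
        return None
    ts = datetime.fromisoformat(results[0]["metadata_modified"])
    if ts.tzinfo is not None:  # CKAN usually returns naive UTC; normalise if an offset appears
        ts = ts.astimezone(timezone.utc).replace(tzinfo=None)
    return ts


def portal_age_days(base_url: str, timeout: float = 30.0) -> Optional[float]:
    """Days since anything in the catalog was last modified (None if it reports nothing)."""
    with urllib.request.urlopen(newest_dataset_url(base_url), timeout=timeout) as resp:
        newest = newest_modified(resp.read())
    if newest is None:
        return None
    now = datetime.now(timezone.utc).replace(tzinfo=None)  # naive UTC, like newest
    return (now - newest).total_seconds() / 86400
```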

Why it matters

It’s not a technical detail. It’s why data that should belong to everyone remains, in practice, accessible only to those with the time and skills to «tame» it. Declared transparency isn’t enough: without maintenance, documentation and usable formats, open data stays a label. Open·Parlamento’s job is exactly this middle layer: making sources that can’t be queried on their own queryable, always citing the original.

FAQ

Isn’t Italian public data open by law?

Much of it formally is: published, with an open license. But «published» doesn’t mean «usable by a program»: between an endpoint that answers empty, broken certificates and malformed links, real accessibility is often far worse than declared.

What’s the most common problem?

There isn’t just one: bot-gating against non-browser clients, dead official SPARQL, broken TLS, legacy encodings, nested ZIPs, hidden rate limits, broken links even where an API exists, and abandoned civic projects. The underlying pattern is a lack of maintenance.

How do you make them actually usable?

You need a layer that probes and «tames» each source: correct User-Agents, fallback to dumps when the endpoint is empty, encoding handling, deduplication of time series, URL reconstruction. It’s the unglamorous but decisive work that separates published data from usable data.

We turn hard-to-use public sources into truly queryable data and knowledge graphs, always citing the original. See how Open·Parlamento works and explore its open data, or discover how agents that read open data help understand a country through its past. Or let’s talk.