Senate on GitHub, Chamber SPARQL-only: two opposite ways to open data

6/17/2026 · 4 min read

Direct answer: the Italian Chamber and Senate both publish their data as open data, but with opposite strategies. The Chamber exposes everything only through a SPARQL endpoint (dati.camera.it/sparql, running on Virtuoso). The Senate complements its SPARQL endpoint with a GitHub repository, SenatoDellaRepubblica/AkomaNtosoBulkData, that holds bills and amendments in Akoma Ntoso format, licensed CC BY 4.0 and updated via CI. Integrating both for Open·Parlamento, we felt firsthand why this difference matters.

TL;DR

  • Chamber: data via SPARQL only. No public versioned dumps. If the endpoint is slow or down, you have no plan B.
  • Senate: SPARQL + bulk data on GitHub (AKN, CC BY 4.0, CI). You can clone, version and process the texts offline.
  • Fragile «official» SPARQL: more than one Italian public endpoint answers HTTP 200 with an empty result when queried from a script, so you have to fall back to raw dumps.
  • Lesson: a versioned bulk dump is more robust than a live endpoint for anyone building data pipelines.

Why two different approaches?

The Chamber and Senate are autonomous institutions that opened their data at different times and with different technical choices. The Chamber invested in a SPARQL triple store (Virtuoso 07.10.3207). It is elegant for ad-hoc queries but a single point of failure: if the endpoint slows down or returns empty results, there is no official alternative.
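For reference, querying the Chamber's endpoint from a script is a plain SPARQL Protocol request over HTTP. Below is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a GET with a `query` parameter and honors `Accept: application/sparql-results+json` (standard SPARQL 1.1 Protocol behavior; we have not checked every parameter Virtuoso accepts there), and that it is reachable over https. The query is a generic placeholder, not a query against the Chamber's ontology, and the function names are hypothetical:

```python
import urllib.parse
import urllib.request

# Endpoint named in the article; the https scheme is an assumption.
CHAMBER_ENDPOINT = "https://dati.camera.it/sparql"

# Placeholder query: replace with a real query against the Chamber's ontology.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


def run_sparql(endpoint: str, query: str, timeout: float = 30.0) -> bytes:
    """Send the query and return the raw response body (this hits the network)."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `run_sparql(CHAMBER_ENDPOINT, SAMPLE_QUERY)` is, for the Chamber, the only access path: if it times out, the script has nothing else to try.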

The Senate made a more «developer-minded» choice: besides SPARQL, it puts legislative texts in a GitHub repository updated via continuous integration. It’s a pattern data people know well: a versioned bulk dump is easy to clone, batch-process and reproduce, and doesn’t depend on a query server’s uptime.
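Working from the dump starts with a clone. Below is a minimal sketch of the batch side. It assumes the repository is cloned from GitHub under the name SenatoDellaRepubblica/AkomaNtosoBulkData and that the texts are stored as `.xml` files. It makes no assumption about the folder layout, and `iter_akn_files` is a hypothetical helper:

```python
# Assumed one-time setup, outside Python:
#   git clone https://github.com/SenatoDellaRepubblica/AkomaNtosoBulkData
from pathlib import Path
from typing import Iterator


def iter_akn_files(clone_dir: str) -> Iterator[Path]:
    """Yield every XML file in a local clone, in a stable (sorted) order."""
    root = Path(clone_dir)
    for path in sorted(root.rglob("*.xml")):
        # Skip anything under .git; the repository's layout is otherwise not assumed.
        if ".git" not in path.relative_to(root).parts:
            yield path
```

Sorting the paths keeps batch runs deterministic, which matters once you want to compare the output of two runs.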

| Aspect | Chamber | Senate |
|---|---|---|
| SPARQL endpoint | dati.camera.it/sparql | dati.senato.it/sparql |
| Public bulk dump | ❌ | ✅ GitHub (AKN, CC BY 4.0) |
| Versioned / CI | ❌ | ✅ CI-updated |
| Offline / batch work | hard | easy |
| Plan B if the endpoint is down | none | the dump |

The problem of «official» SPARQL endpoints that answer empty

Building the connectors, we found a recurring pattern in Italian public data: official SPARQL endpoints that answer HTTP 200 with zero results when called from a script, even though the same endpoint works from a browser. It happens, for example, with the Constitutional Court's open-data portal («the official SPARQL answers 200 empty from a script → unusable»); the fix was to use its JSON datasets instead of the live endpoint.
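One way to make that failure visible instead of silent is to treat an empty result as an error rather than trusting the status code. Below is a minimal sketch. It assumes the endpoint returns the standard SPARQL 1.1 JSON results format when it works; `EmptySparqlResult` and `bindings_or_raise` are hypothetical names:

```python
import json


class EmptySparqlResult(Exception):
    """Raised when an endpoint answers successfully but returns no usable rows."""


def bindings_or_raise(body: bytes) -> list:
    """Return the result rows of a SPARQL JSON response, or raise if there are none.

    A 200 status alone proves nothing: the body may be an HTML page or a
    well-formed result set with zero bindings.
    """
    try:
        payload = json.loads(body)
    except ValueError as exc:  # not JSON at all, e.g. an HTML landing page
        raise EmptySparqlResult("response is not SPARQL JSON") from exc
    results = payload.get("results") if isinstance(payload, dict) else None
    bindings = results.get("bindings") if isinstance(results, dict) else None
    if not bindings:
        raise EmptySparqlResult("HTTP 200 but zero bindings")
    return bindings
```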

This is exactly why the Senate's choice to also publish a versioned dump is valuable: when the live endpoint lets you down, you still have a copy of the data you can process. Publishing only SPARQL leaves users without a safety net.
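The safety net reduces to a simple control-flow pattern. Below is a minimal sketch. It assumes you already have two loader functions: `live` queries SPARQL, and `dump` reads a local clone. Both names are hypothetical. For the Chamber, the `dump` branch has nothing to call:

```python
from typing import Callable, List, Tuple


def load_with_fallback(
    live: Callable[[], List[dict]],
    dump: Callable[[], List[dict]],
) -> Tuple[str, List[dict]]:
    """Try the live endpoint first; on any failure or empty result, use the dump.

    Returns the name of the source actually used, so it can be logged.
    """
    try:
        rows = live()
        if rows:
            return "live", rows
    except Exception:  # deliberately broad: timeouts, HTTP errors, empty-result errors
        pass
    return "dump", dump()
```

Returning the source name lets a pipeline report how often it actually had to fall back.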

What it means for anyone building on this data

  1. A SPARQL endpoint isn’t enough. Great for exploratory queries, fragile as the only access path. Serious pipelines need a dump.
  2. Versioning is a feature, not a detail. Knowing which version of the data you processed (a GitHub commit) makes analyses reproducible. A live endpoint doesn’t guarantee that.
  3. Normalize the diversity. Chamber via SPARQL, Senate via SPARQL + dump: our connector layer reads both and maps them to a single model before the knowledge graph. (We also covered the two ontologies OCD and OSR.)
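Points 2 and 3 combine naturally: each normalized record can carry the version of the data it came from. Below is a minimal sketch of what a shared model might look like. `Bill`, its fields, the mapping functions and the SPARQL variable names `?bill` and `?title` are hypothetical, not our actual schema. For dump data, the commit can be read with `git -C <clone> rev-parse HEAD`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bill:
    """Hypothetical common model that both connectors map into."""
    chamber: str                   # "camera" or "senato"
    uri: str                       # identifier taken from the source
    title: str
    source_version: Optional[str]  # git commit for dump data; None for a live endpoint


def from_sparql_binding(binding: dict) -> Bill:
    """Map one SPARQL JSON row; the variable names bill and title are assumptions."""
    return Bill(
        chamber="camera",
        uri=binding["bill"]["value"],
        title=binding["title"]["value"],
        source_version=None,  # a live endpoint exposes no version we can pin
    )


def from_akn_record(record: dict, commit: str) -> Bill:
    """Map fields already extracted from an AKN file, pinned to the dump commit."""
    return Bill(
        chamber="senato",
        uri=record["frbr_this"],
        title=record["title"],
        source_version=commit,
    )
```

The `None` in `source_version` makes the asymmetry explicit: Chamber records cannot be tied to a reproducible snapshot.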

FAQ

Does the Chamber have a downloadable data dump?

As far as we found, no: the Chamber exposes its data publicly only through the SPARQL endpoint dati.camera.it/sparql. There’s no public versioned-dump repository equivalent to the Senate’s.

What is the Senate’s AkomaNtosoBulkData?

It’s an official GitHub repository of the Senate of the Republic with legislative texts (bills, amendments) in Akoma Ntoso format, under a CC BY 4.0 license and updated via continuous integration. It’s especially useful for amendment texts.
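Because the files are Akoma Ntoso XML, the Python standard library is enough to start reading them. Below is a minimal sketch that extracts the document type and the `FRBRthis` work identifier from the AKN metadata block. It does not assume which other elements the Senate's files populate, or which AKN namespace version they declare. `summarize_akn` is a hypothetical name:

```python
import xml.etree.ElementTree as ET


def summarize_akn(data: bytes) -> dict:
    """Extract the document type and the FRBR work identifier from an AKN file.

    The {*} wildcard (Python 3.8+) matches any namespace, so the sketch does not
    depend on the exact Akoma Ntoso namespace a file declares.
    """
    root = ET.fromstring(data)
    doc = next(iter(root), None)  # first child of <akomaNtoso>, e.g. <bill>
    if doc is None:
        raise ValueError("empty akomaNtoso document")
    frbr_this = root.find(".//{*}FRBRWork/{*}FRBRthis")
    return {
        "doc_type": doc.tag.split("}")[-1],  # strip the namespace prefix
        "frbr_this": frbr_this.get("value") if frbr_this is not None else None,
    }
```

Typical use is `summarize_akn(path.read_bytes())` for each XML file in a local clone. Reading bytes rather than text lets the parser honor each file's own encoding declaration.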

Why is a dump more reliable than a SPARQL endpoint?

A versioned dump is cloned once and processed offline: it doesn’t depend on server uptime, it’s reproducible (you know which version you used) and it doesn’t suffer timeouts or rate limits. A live endpoint is handy for ad-hoc queries but becomes a bottleneck for pipelines.


We build AI agents and knowledge graphs on real data — handling source diversity instead of hiding it. See how Open·Parlamento works, or let’s talk.

Open Data · Knowledge Graph · SPARQL · Civic Tech

Written by Giulio Garofalo