Data Publishing
Revision as of 21:30, 18 September 2020
Introduction
Mozilla’s history is steeped in openness and transparency - it’s simply core to what we do and how we see ourselves in the world. We are always looking for ways to bring our mission to life in ways that help create a healthy internet and support the Mozilla Manifesto. One of our commitments says “We are committed to an internet that elevates critical thinking, reasoned argument, shared knowledge, and verifiable facts”.
To this end, we have spent a good amount of time considering how we can publicly share our Mozilla telemetry data sets. Sharing data is one of the simplest and most effective ways we can enable collaboration and share knowledge, but only if it can be done safely and in a privacy-protecting, principled way. We believe we’ve designed a way to do this and we are excited to outline our approach here.
Dataset Publishing Process
We want our data publishing review process, as well as our review decisions, to be public and understandable, similar to our Mozilla Data Collection program. To that end, our full dataset publishing policy and the considerations we weigh before determining what is safe to publish can be found below, including a summary of the critical pieces of that process.
The goals of our data publishing process are to:
- Reduce friction for data publishing requests with low privacy risk to users;
- Have a review system of checks and balances that considers both data aggregations and data level sensitivities to determine privacy risk prior to publishing, and;
- Create a public record of these reviews, including making the data and the queries that generate it publicly available and putting a link to the dataset and its metadata on a public-facing Mozilla property.
This page defines all of the factors that must be taken into consideration before publicly sharing Mozilla’s telemetry data. It describes:
- The levels of possible dataset aggregations using Mozilla’s data
- The levels of publishing sensitivity
- What dimensions are sensitive, and at which level
- What metrics are sensitive, and at which level
- How we characterize the levels of aggregation
How we characterize the levels of aggregation
The table below describes the aggregation levels we are defining.
Level | Aggregation | Examples |
---|---|---|
1 | Statistical / ML models: a model built or trained using real data | TAAR, federated learning models, forecasting models |