Building on a year of open data: progress and promise
One year ago, Microsoft launched an Open Data Campaign to help close the data divide between those countries and companies that have the data they need to innovate and those that do not. We learned quickly that this continued divide risks leaving some people behind, without the ability to put data to work and without the ability to generate economic wealth and opportunity more broadly.
To address the challenges of tomorrow, we need to make it easier to open, share and collaborate around data today. That’s why we’re sharing 10 key lessons from the first year of our campaign to help other organizations of all sizes unlock the power of data.
Before I dive into the lessons, I want to share an update on the progress we’ve made toward our goals, as well as our focus areas for the year ahead.
- We helped to launch nine data collaborations toward our goal of 20 data collaborations by 2022, addressing challenges in the areas of sustainability, health, and equity and inclusion. In some cases, we opened our own data to seed these collaborations, including new United States broadband usage data and Bing Maps aerial and streetside imagery data. These datasets help take on challenges ranging from more effectively targeting educational resources for remote learners to knowing where to build electric vehicle (EV) charging points so as to increase the adoption of EVs over time. These collaborations are detailed in our new report, Open Data Campaign: Year one in review.
- We are announcing our priorities for the year ahead to: scale data stewardship; foster new data collaborations; grow data analyst skills and data literacy; make data sharing easier; and advance policy discussions. More information on these focus areas is detailed below.
We’ve made steady progress toward closing the data divide, but we still have a long way to go. The data divide persists and continues to threaten the democratization of data – but we also see a lot of reason for hope and optimism. Microsoft President Brad Smith and I recently discussed the challenges, what we’ve learned since the campaign launched and our ambitions going forward. I encourage you to watch our conversation, where we also heard from thought leaders who are advancing the cause of open data.
Learning by doing
The biggest takeaway from our work this past year – and the one thing I hope any reader of this post will take away – is that data collaboration is a spectrum. The presence (or absence) of data, how open that data is and the trust level among the collaboration participants may lead to different configurations and different goals, but they can all lead to more open data and innovative insights and discoveries.
Here are a few other lessons we have learned over the last year:
- Principles set the foundation for stakeholder collaboration: When we launched the Open Data Campaign, we adopted five principles that guide our contributions and commitments to trusted data collaborations: Open, Usable, Empowering, Secure and Private. These principles underpin our participation, but importantly, organizations can build on them to establish responsible ways to share and collaborate around their data. The London Data Commission, for example, established a set of data sharing principles for public- and private-sector organizations to ensure alignment and to guide the participating groups in how they share data.
- There is value in pilot projects: Traditionally, data collaborations with several stakeholders require time – often including a long runway for building the collaboration, plus the time needed to execute on the project and learn from it. However, our experience shows that short-term projects that experiment with and test data collaborations can provide valuable insights. The London Data Commission did exactly that with the launch of four short-term pilot projects. Due to the success of the pilots, the partners are exploring how they can be expanded.
- Open data doesn’t require new data: Data identified for sharing does not always have to be newly shared; sometimes data that was narrowly shared can be shared more broadly, made more accessible or analyzed for a different purpose. Microsoft’s environmental indicator data is an example of data that was already disclosed in certain venues, but was then made available to the Linux Foundation’s OS-Climate Initiative to be consumed through analytics, thereby extending its reach and impact.
- Data collaborations can start without data: Not all data collaborations are about sharing existing datasets; some are about partnering to create new ones. For example, The Data Foundation recognized that there is a long-standing gap in systematically understanding the American people’s views of the criminal justice system and police forces. They invited partners to remedy this gap, leading to the Policing in America Survey, which aims to collect new, original data in select cities and then make this data open.
- It’s not just the data that can be made more open: The value of “open” can come in a variety of forms by leveraging synergies between open data and other domains of open innovation, such as open source. This also includes sharing open frameworks, open tools, APIs to access data and more. One goal of the OS-Climate Initiative is to build a publicly available platform of modeling and technical infrastructure that decision-makers can use to model different scenarios. Another example is the Digital Public Goods Registry, which increases access to open source software, open data, open AI models and open standards that help attain the UN’s Sustainable Development Goals.
- Data visualizations go a long way in helping to show the value of collaboration: Data visualizations help tell the story of a data collaboration in a tangible way – and, in some cases, visualizations may be the primary way of sharing information, especially if the raw data informing the visualization is sensitive and cannot be shared publicly. As part of the London Data Commission’s EV Infrastructure pilot, a dashboard was created to show how sensitive infrastructure data and traffic camera data can be combined with other datasets to produce insights on optimal EV charging station locations.
- Be ready to pivot: The data streams from The Alan Turing Institute’s London Air Quality Project were quickly repurposed to also look at the effects of lockdown easing during Covid-19 to better understand how people were responding to those changes. It’s not a use case that could have ever been anticipated during the planning stages of the Air Quality Project but, with the data streams in place, new insights could be drawn at a time when they were most needed.
- Data is synergistic: Without embarking on a data collaboration, you may never realize the positive unexpected benefits that can result along the way. Data is synergistic and can become considerably more valuable when combined with other datasets. This came through in the Purdue Food and Agricultural Vulnerability Index, which draws on vastly different open datasets to generate new insights into the impact of Covid-19 on farm production and the health of farmers and farmworkers.
- “Open” shouldn’t get in the way of “more open”: There is a spectrum of data sharing – from publicly available data that can be supported through open data use agreements to trusted data sharing scenarios involving data with privacy or commercial sensitivities, where the governance model and supporting technologies will operate differently. Considering this range, data doesn’t need to be completely open to be useful, and being “open” with data shouldn’t get in the way of “more open.”
- Data collaborations evolve, so start by managing trust: The data collaboration problem statement, assumptions held at the start of the collaboration and stakeholder opinions may all change over time. In some instances, we learned that we didn’t build in enough flexibility upfront to allow for change. And change can affect trust. We learned that data collaborations need to establish a process for managing change – and trust. This might include leveraging the Open Data Institute’s Data Ecosystem Mapping tool to map value exchanges. Because at the end of the day, trust – in the data, frameworks, governance, technology and more – is the key factor that enables data collaborations to take off.
To get started, we suggest that emerging data collaborations make use of the wealth of existing resources. When embarking on data collaborations, we leveraged many of the definitions, toolkits and guides from leading organizations in this space. For example, the Open Data Institute’s Data Ethics Canvas is extremely useful as a framework to develop ethical guidance. Additionally, The GovLab’s Open Data Policy Lab and Executive Course on Data Stewardship, both supported by Microsoft, highlight important case studies, governance considerations and frameworks when sharing data. If you want to learn more about the exciting work our partners are doing, check out the latest posts from the Open Data Institute and GovLab.
Moving forward
Using these lessons learned and building on the progress we’ve made alongside our partners, we plan to spend the next year focused on the practical aspects of data sharing and making the process easier.
Scaling data stewardship
One key insight we gained this past year is the strong interest in, and need for, guidance when it comes to data stewardship. Many organizations want to do more around open data and data sharing, but when it comes to the practical aspects of how to do it, they often don’t know where to start. Building on the success of this year’s Data Stewardship Executive Course, the Open Data Policy Lab is today publishing its course materials so that organizations everywhere can use these resources to guide their data reuse strategies.
Additionally, the Open Data Policy Lab will focus on scaling data stewardship guidance for the public and private sectors. First, to help address these needs, a new Data Stewardship Academy will be designed for a much broader reach. Second, the Open Data Policy Lab will develop an Open Cities initiative to build community and share insights among cities that are opening and using data to innovate and drive change. The Open Data Policy Lab will also continue to drive new research on open data and data reuse, including a closer look at the value of open data.
Fostering new data collaborations
We’ll continue to identify and help launch data collaborations to address societal issues focused on sustainability, health, and equity and inclusion.
Together with the Open Data Institute, we are committed to launching and supporting three data collaborations to address climate change. These three data collaborations will each focus on one of the six priority areas that we identified in the report, Accelerating Progress on Tackling the Climate Crisis Through Data Collaboration.
Additionally, in partnership with the Open Data Institute, we’ll be announcing an open call for a new Peer Learning Network in which data collaborations can participate and learn from each other. Another focus area with the Open Data Institute is to build momentum through case studies across sectors that highlight the value that opening and sharing data unlocks and that would otherwise not be realized.
Growing data analyst skills and data literacy
We also want to help connect those who currently work with data or would like to explore a career in data analysis with related, in-demand skills, in partnership with the Microsoft skills initiative. This work includes sharing opportunities to obtain Microsoft Certifications on data and AI fundamentals and courses for data analysts.
Making data sharing easier
We must make data sharing easier through scalable tools and technologies. Technologies such as differential privacy are critical to preserving privacy when sharing data, and they have been made more accessible through the first-ever open source differential privacy platform, SmartNoise. Additionally, continuing work on legal and licensing tools – such as the Open Use of Data Agreement (O-UDA) and Computational Use of Data Agreement (C-UDA), both initiated by Microsoft in 2019 and now stewarded by the Linux Foundation – will encourage and simplify broader data sharing. We will continue to focus on development of these frameworks, resources and technologies that make data sharing more accessible and achievable.
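For readers who have not met differential privacy before, the short Python sketch below may help make the idea concrete. It is a generic textbook illustration of the Laplace mechanism applied to a counting query, not the SmartNoise API, and the function names (`laplace_noise`, `private_count`) and the sample data are hypothetical, chosen only for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random = None) -> float:
    """Return a noisy count of the values matching `predicate`.

    Hypothetical helper for illustration only. A counting query has
    sensitivity 1 (adding or removing one record changes the count by at
    most 1), so Laplace noise with scale 1 / epsilon is added.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Made-up household broadband speeds in Mbps, purely illustrative.
    speeds = [12, 48, 3, 95, 7, 150, 22, 5]
    noisy = private_count(speeds, lambda s: s < 25, epsilon=0.5)
    print(f"Noisy count of households under 25 Mbps: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and therefore gives stronger privacy at the cost of accuracy. Production tools handle many details this sketch ignores, such as tracking a privacy budget across repeated queries.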
Advancing policy discussions
A robust data reuse regime necessitates good governance frameworks. Increasingly, policymakers are taking measures to improve existing open data initiatives and are exploring data governance mechanisms.
To fully realize the benefits of data, policymakers must work with industry, academia and civil society to develop incentives, infrastructure and mechanisms to responsibly share public and private sector data within – and across – organizational and national boundaries that are in line with the rule of law and safeguard human rights, while allowing for effective data re-use for innovation. In addition to properly maintained and funded national open data programs, data governance frameworks create trust in the integrity of the data sharing ecosystem by ensuring that the benefits of data are equitably shared and by providing adequate safeguards to protect cybersecurity, human rights and privacy.
At Microsoft, our mission is to empower every person and organization on the planet to achieve more. Closing the data divide won’t happen overnight, but if we continue to build a bold, diverse movement committed to this work, we know the impact will benefit future generations in pursuit of a safer, healthier world.
You can read all insights from techUK's AI Week here