02 Aug 2022

Event round-up - Future Visions: Synthetic Data

The Future Visions series explores the next-generation technologies at the cutting edge of research and development that are set to disrupt industries, challenge incumbents, and act as a catalyst for growth. It offers the opportunity to learn about the latest advances in technology from those at the heart of its development, and what this might mean for your business.

This webinar explored the creation and use of synthetic data, where it could be revolutionary, and whether it will live up to its hype.

The panel of industry experts included:

  • Dr. Martin O'Reilly, Director of Research Engineering, The Alan Turing Institute

  • Jeremy Poulter, Business Development & Solutions Director, Defence and National Security, Microsoft UK 

  • Alexandra Ebert, Chief Trust Officer, Mostly AI 

  • Emanuele Haerens, Commercial Director, Hazy 

  • Laura Foster, Head of Tech & Innovation, techUK 

The conversation began with the importance of understanding what synthetic data is and how it is used. Martin O’Reilly explained that synthetic data is artificially generated data that contains key properties which differentiate it from other forms of data. Alexandra Ebert added that synthetic data’s main utility lies in safe access to data, and Emanuele Haerens crucially noted that whilst the uses of synthetic data are key to the future of data technology, there are issues of slower innovation and a high level of risk around data and privacy.

Jeremy Poulter continued the discussion by focusing on the use of synthetic data for augmentation in the defence industry. Jeremy made clear that, if synthetic data is used to augment data in defence, there should be a high priority on enhancing AI reliability and responsibility, and a strong emphasis on safety. He illustrated this with the example of satellite imagery: synthetic data could be used to improve current satellite systems, where factors such as weather can affect the imagery and cloud the accuracy of intelligence and data.

Alexandra Ebert then shifted the discussion to the fidelity of data and concluded that synthetic data is a viable, and better, replacement for the data sets currently in use. Alexandra also pointed out that her work has shown successful outcomes from using synthetic data, which some attendees found both promising and surprising. Following on from this point, Martin O’Reilly raised the subject of narrative fidelity, and attendees were curious to know how successful narrative fidelity had been in producing datasets.

The audience then asked questions, starting with one about the processes involved in generating synthetic data, how costly and time-consuming they can be, and how to determine when a synthesised data set is 'good enough'. Alexandra Ebert responded by explaining the use of granular insights and learnt patterns, and that the synthetic data process generates data randomly using multiple data sets. Martin O’Reilly added that synthetic data can also capture different combinations of data, but noted that there is a need to explain these models and how they affect privacy concerns.

Jeremy Poulter then directed the discussion towards how synthetic data offers a progressive approach to driving inclusiveness. Alexandra Ebert echoed Jeremy’s point and highlighted the bias at hand with regard to current anti-discrimination laws, which do not allow the use of AI. Alexandra added that, for synthetic data to drive inclusivity, third parties will need access to the current data sets in order to use synthetic data productively and effectively. Emanuele Haerens also agreed with Jeremy’s point but added that, in conjunction with this, there should be an understanding of the concept of fairness and a consistent definition of it. Alexandra responded that data could play a facilitating role in addressing Emanuele’s point.

The discussion then shifted after an attendee directed a question at Jeremy Poulter: would it be feasible to develop a synthetic data set that is fully representative of a military classified data set, so that non-security-cleared sub-contractors can test their algorithms and have confidence that they will work reliably on the classified data? Jeremy answered that there will always be a blend of datasets and that there will always be an important role for real data. Jeremy added that there will most likely be extreme scenarios where synthetic data will need to be used, and illustrated this with the example of the UK’s defence strategy: for the UK to be successful in defence, synthetic data would be needed to help make use of declassified and unclassified data to aid current and future defence models. He also added that this would help to refine the current data sets, and Alexandra Ebert agreed with Jeremy’s point. Alexandra added that, in the privacy context, real data should not be included with synthetic data. Emanuele also agreed with Jeremy’s point, adding that there is a good chance that data scientists and data engineers will use synthetic data in their own areas of work.

An integral part of this discussion was Alexandra Ebert’s contribution on democratising data: synthetic data would be a game-changer for a multitude of businesses. Following on from this, Martin O’Reilly described the different variations of synthetic data. Martin also raised the need for data sharing agreements and for access to data, and questioned whether the data should be made public beforehand. Ultimately, Martin highlighted that a discussion is needed to work out what works and what does not, and that the biggest challenge at present is data availability. Emanuele Haerens echoed Martin’s point, and Alexandra added that there should be a focus on both privacy and access, keeping in mind that it would be difficult to manage the two together. Jeremy said that the discussion of where synthetic data is heading is integral, particularly in light of the global shift towards the metaverse.

To finish up the meeting, the panellists were asked what advice they would give to attendees. Emanuele began by advising that any business starting to use synthetic data needs to understand exactly what it plans to use synthetic data for and to have a pre-determined expectation of the results. Alexandra advised that there will be a strong emphasis on synthetic data in the future and that it will be important to democratise it. Jeremy’s advice highlighted the importance of safety, transparency, inclusivity and validation of synthetic data. Martin echoed Jeremy’s point and added that the datasets should be explicit.

Thank you to all of those who attended techUK’s event on synthetic data. Please do reach out if you would like to learn more about techUK’s synthetic data campaign. 



Laura Foster

Associate Director - Technology and Innovation, techUK

Laura is techUK’s Associate Director for Technology and Innovation.

She supports the application and expansion of emerging technologies, including Quantum Computing, High-Performance Computing, AR/VR/XR and Edge technologies, across the UK. As part of this, she works alongside techUK members and UK Government to champion long-term and sustainable innovation policy that will ensure the UK is a pioneer in science and technology.

Before joining techUK, Laura worked internationally as a conference researcher and producer covering enterprise adoption of emerging technologies. This included being part of the strategic team at London Tech Week.

Laura has a degree in History (BA Hons) from Durham University, focussing on regional social history. Outside of work she loves reading, travelling and supporting rugby team St. Helens, where she is from.

LinkedIn:
www.linkedin.com/in/lauraalicefoster
