The increasingly digital nature of our world has seen a huge rise in data size and quantity. Now, new technologies such as the Square Kilometre Array are about to create an even fiercer deluge of data. Baz Macdonald investigates.
In 2011 NZ became one of the founding members of the SKA organisation, a not-for-profit company dedicated to creating the world’s largest radio telescope.
When completed, the Square Kilometre Array (SKA) will be the most sensitive telescope ever created, allowing us to see 10 times deeper into space than we ever have before. However, a telescope of this size and sensitivity also brings with it significant challenges, the foremost of which is the amount of data it will produce – with some estimates suggesting that in a single day the array will generate more data than the entire internet has created in the past 30 years.
The telescope will require thousands of dishes and aperture array telescopes, which will be installed in the deserts of Australia and South Africa.
While NZ may not be hosting the telescope infrastructure, we are playing an important role by helping to advance technologies that will support it. NZ researchers and research and development firms are working to find ways of dealing with the big data deluge the array will create.
This research is already underway and, although the SKA is not yet complete, we are starting to reap the benefits – with our country considered one of the world leaders in tackling this data challenge, as we continue to invest in artificial intelligence, machine learning and cloud computing.
The big data deluge
Nicolás Erdödy, the founder and CEO of Open Parallel – an R&D firm working alongside our universities and companies to develop the technology necessary to make the SKA a success – said they are continually discovering practical applications for industry.
“To process [SKA’s data] you need supercomputers, as well as new tools, operating systems, storage and applications. By working on that, we start to learn how we can use these new tools to optimise primary industries.”
Open Parallel demonstrated this by using an algorithm developed for SKA to show how satellite data could be used to read water levels on an area of pasture, making water and irrigation management easier for farmers.
“Essentially, by participating in SKA, we are improving our way of thinking, which is giving us better intelligence in improving problems that exist in NZ’s day-to-day system.”
The research also prepares us for impending issues – such as helping this country and the rest of the world stay one step ahead of growing data needs.
Our data levels have continued to climb exponentially over the past 30 years – with what was once a trickle quickly becoming a waterfall.
Increasingly, the question is not about how to store it, but how to process and glean pertinent information from such vast quantities of raw data.
Erdödy said humans were excellent models for dealing with big data. Our brains are capable of storing incredible amounts of information, but more importantly, we are capable of organising, prioritising and classifying information so it is applicable to any given situation – instead of constantly being bombarded with every single thing we know.
As the amount of data generated increases, we need to develop ways of parsing out that information into digestible quantities and for specific functions, as our brains do. However, instead of our biological intellect and intuition, big data requires computational tools such as AI, machine learning and cloud computing.
The AI solution
Artificial intelligence is a technology that has grown hand-in-hand with big data, and is considered by many to be the most effective and efficient way of extracting relevant information from raw data.
On the cutting edge of AI development is Christchurch-based geospatial company Orbica. Using satellite, drone and aerial imagery, Orbica founder Kurt Janssen and his team have created an AI algorithm which can quickly extract building outlines, roads and surface water types with more than 90% accuracy.
These features have previously been mapped manually by staff at organisations such as LINZ. Using Orbica’s technology, what once took hours or days now takes only minutes, with no manual effort.
Orbica do this by taking huge amounts of data and using an algorithm to distil it into only the relevant information – in this case, the outlines of buildings, roads and water sources.
“Our focus is turning that data into big intelligence – getting information out of this big data automatically, with certainty and precision. Information that people inside organisations can actually use to make real-world decisions daily,” Janssen said.
Orbica are using a sophisticated new sub-class of AI called deep learning to extract this information. The process mimics biological neural networks in the way it uses what it already knows to make judgments about new data. In Orbica’s case, this is how its algorithm isolates objects despite variations in their appearance – by learning the many visual permutations of each type of feature and adding them to a knowledge bank for future reference.
Once the data has been processed by the AI algorithm, it is put through a geographic information system engine to fine-tune it – with these corrections fed back to the algorithm so that the accuracy improves with each new dataset.
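The article does not describe Orbica’s actual model, but pixel-wise semantic segmentation is the standard deep-learning approach to pulling features such as buildings, roads and water out of imagery. The sketch below is a minimal, illustrative PyTorch example only – the class list, tile size and random training data are assumptions, not details from Orbica.

```python
# Minimal sketch (not Orbica's actual code) of pixel-wise semantic segmentation:
# a small fully convolutional network labels each pixel of an aerial tile as
# background, building, road or surface water. All names and sizes are
# illustrative placeholders.
import torch
import torch.nn as nn

NUM_CLASSES = 4  # background, building, road, surface water (assumed labels)

class TinySegmenter(nn.Module):
    """A deliberately small encoder-decoder, standing in for a real
    segmentation architecture such as a U-Net."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # halve resolution
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, num_classes, 1),         # per-pixel class scores
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Toy training step on random "imagery" and "labels" to show the shape of the loop.
model = TinySegmenter()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(2, 3, 128, 128)                    # RGB tiles
labels = torch.randint(0, NUM_CLASSES, (2, 128, 128))  # per-pixel ground truth

logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimiser.step()

# Predicted class per pixel; in practice these masks would be vectorised into
# building/road/water polygons, corrected in a GIS, and the corrections added
# back to the training data so accuracy improves over time.
predicted = logits.argmax(dim=1)
print(predicted.shape)  # torch.Size([2, 128, 128])
```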
Using the cloud
Alongside processing, one of the other major hurdles of big data is storage. The datasets produced by satellites and similar instruments only get bigger as their imagery becomes more detailed and accurate, making this information a challenge to download and store.
For many companies and researchers, this data size has acted as a barrier to entry in using earth observation data. However, the rise of cloud computing has begun to offer an easy and accessible solution.
GNS Science remote sensing scientist Rogier Westerhoff has been working on reading groundwater levels nationwide using satellite data. Only a few years ago, Westerhoff’s work was being dramatically slowed by the time and effort required to download and process satellite data – with it taking a month to download a year’s worth of data and several more months to process it.
Last year Westerhoff began using cloud computing service Google Earth Engine, which is designed specifically for the processing and analysis of earth science data. With this service, which is free for non-commercial use, Google offers scientists access to a cluster of computers which contain multi-petabyte storage servers.
“Computations which would have taken me months on my own computer, now all of a sudden take me only a few minutes,” Westerhoff said.
“I literally felt the world of possibilities open up [when I began using Google Earth Engine].”
Cloud computing removes the bottlenecks of in-house processing and download speeds by allowing users to access data online through a vast network of facilities. Each facility processes only a portion of the data, but at scale this adds up to an incredibly fast and efficient way to manage large amounts of information.
Cloud computing means data can be processed much more quickly without any need to download the entire data set. What may have been terabytes of raw data can now be analysed and refined down to only a few hundred megabytes of specific information to download.
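As an illustration of that pattern, here is a minimal sketch using the Google Earth Engine Python API – not Westerhoff’s actual analysis. A year of Sentinel-2 imagery is composited and reduced to a single water-index statistic entirely on Google’s servers, so only a few bytes come back to the user; the region, dates and dataset are assumptions chosen for illustration.

```python
# Minimal sketch of the cloud-side workflow: filtering and compositing a stack
# of Sentinel-2 images and computing a water index all happen on Google's
# servers; only a tiny summary statistic is returned.
import ee

ee.Initialize()  # requires an Earth Engine account (free for non-commercial use)

# Roughly the Waikato region of New Zealand (illustrative bounding box).
region = ee.Geometry.Rectangle([174.8, -38.6, 176.2, -37.2])

# Build a cloud-side median composite of a year of Sentinel-2 imagery.
composite = (
    ee.ImageCollection("COPERNICUS/S2")
    .filterBounds(region)
    .filterDate("2017-01-01", "2017-12-31")
    .median()
)

# Normalised Difference Water Index: (green - NIR) / (green + NIR).
ndwi = composite.normalizedDifference(["B3", "B8"]).rename("NDWI")

# Reduce terabytes of source imagery to a single number shipped back to us.
mean_ndwi = ndwi.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=region, scale=100, maxPixels=1e9
).getInfo()

print(mean_ndwi)  # e.g. {'NDWI': ...}
```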
Westerhoff said that, by freeing up this time and energy, cloud computing had opened up the world of scientific research and given scientists more room to explore cross-disciplinary work.
“You start to realise that you could integrate different scientific disciplines. In my case, for instance, integrating surface water science with ground water science.”
Another effect is that cloud computing has increased the scope of open-source information. With data now more open and accessible through the cloud, scientists can avoid recreating the same datasets and instead focus on analysing data for their own research. As creating datasets has previously been a large part of any research endeavour, this accessibility has the potential to dramatically increase the amount of research being done.
Westerhoff has been working to introduce this cloud technology to regional councils, so that they can better harness satellite data for the management of a range of regional features – including water levels, land use and coastal sediment flow. Westerhoff said the reaction from the councils he worked with had been enthusiastic, as they realised the possibilities that arise from analysing satellite data so efficiently.
This is the second in a series by Newsroom contributor Baz Macdonald on the impact satellites and space-age technology are having on New Zealand. The first is here.