Koichi Tanigawa [author], edited by DB Online, 2021/01/19 10:00
On December 8, 2020, Shoeisha held its conference "data tech 2020" online to reveal the current and future state of corporate data utilization. This year's theme was "Data Driven Update: True Data-Driven Management." In the data utilization infrastructure and data management track, Snowflake, whose cloud data platform has been attracting considerable attention, gave a session titled "DATA CLOUD: Snowflake's Data Collaboration Platform." The company presented the value of, and case studies for, "DATA CLOUD," a revolutionary architecture that supports data-driven management in two ways: companies store their data in Snowflake, and they collaborate on that data with other organizations.
Data is a mirror of the world, but scattered data cannot reflect the world well
"DATA CLOUD is about data collaboration across organizations, and that collaboration is said to be worth about 300 trillion yen a year," said Mr. KT, Senior Sales Engineer at Snowflake.
Snowflake Senior Sales Engineer KT
In-house data alone is not enough to achieve truly data-driven management. Making data available across organizations enables more valuable decisions to be made more quickly.
Before going into the details of DATA CLOUD, KT explained the current state of data utilization in the world. "Today, data has become a mirror that reflects the world. The world is not only what we see in front of us. By using information to fill in what lies 'in between,' the world we can imagine keeps expanding. Data allows us to see things that are not right in front of us," said KT.
Right now, people are confronting the many problems caused by COVID-19. Yet most people never see infected or affected people with their own eyes. By analyzing data, we can understand what the COVID-19 situation looks like around the world, and people then start to act on that understanding. In other words, the world of COVID-19 is revealed through data analysis, and new actions follow from it.
People around the world are working steadily to analyze data in order to reveal the world of COVID-19. However, data analysis takes a great deal of time and effort. For example, the data may be published as PDFs, or one day it may suddenly switch to images. Collecting it in a form that can be analyzed is laborious, and when so much time goes into collecting and cleaning data, it is hard to concentrate on the analysis itself.
Dispersed data is a problem not only for understanding the COVID-19 situation but for data-driven decision making in general. In other words, although data is a mirror of the world, KT points out that in many cases it is not well utilized, so that "data ≠ world."
Snowflake, with an architecture optimized for the cloud from the start
Snowflake was founded in 2012 to solve the problem of data being scattered and underutilized. This was just as the cloud was becoming popular and the first generation of cloud databases was appearing. These first-generation products were databases that had until then run on premises, moved to the cloud "so that they could use the cloud's unlimited resources," said KT. "Many on-premises databases were put into the cloud as-is and were not optimized for it," he explained.
In contrast, Snowflake was cloud-native from the beginning. "We started by thinking about how to use cloud resources efficiently," said KT. With the cloud-optimized architecture that grew out of this idea, Snowflake first launched its service publicly in 2014 as a cloud data warehouse.
It was subsequently used not only as a data warehouse but also as a database for AI, and by 2019 it had evolved into a cloud data platform supporting all workloads. By continuing to innovate on its own technology and adapt it to the needs of the times, it evolved further in 2020 into DATA CLOUD, which enables data collaboration between organizations.
A major obstacle to data-driven decision making in organizations is data silos. Data now comes from many places. An HR system generates HR information, and the same goes for accounting and customer relationship management systems. Each system processes data for its own purpose, so the data ends up siloed in each system. "Eliminating data silos is the starting point of Snowflake's architecture," said KT.
If data is stored in the smallest possible units, it can reflect the real world as closely as possible. But recording data at that level of detail makes the volume enormous. Gathering the huge volumes generated by each system in one place takes a lot of time and effort, and it is hard to obtain enough processing performance to handle the data properly. As a result, data ends up stored separately, system by system or purpose by purpose.
With Snowflake, data is stored in a single cloud storage location, and any number of compute resources can be assigned to process it as needed. The stored data serves many workloads, such as loading large amounts of data, ad hoc queries over huge datasets, and batch aggregation. In the past, infrastructure such as hardware was purchased, set up, and operated based on the peak processing capacity these workloads required.
Snowflake allocates the necessary compute resources separately for each workload. For example, "For an ETL workload, we allocate only the resources it needs. We start it up in seconds and use only what is really needed, and you are charged only for what you use," says KT. Also, because the data is stored in a single location, security and governance controls need to be applied only to that one location. Snowflake currently handles both structured and semi-structured data, and plans to actively support unstructured data in the future.
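The contrast between sizing hardware for peak demand and allocating compute per workload can be made concrete with a small calculation. The sketch below is a toy model, not Snowflake's pricing: the workload names, the "compute units," and the hours are hypothetical numbers chosen for illustration, and it assumes the fixed system must be sized for all workloads running at once and stays on all day.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    compute_units: int    # hypothetical size of the compute allocated while it runs
    hours_per_day: float  # how long the workload actually runs each day


def peak_provisioned_cost(workloads, hours_in_day=24.0):
    """Fixed capacity sized for every workload running at once, kept on all day."""
    peak_units = sum(w.compute_units for w in workloads)
    return peak_units * hours_in_day


def on_demand_cost(workloads):
    """Each workload gets its own compute only while it is running."""
    return sum(w.compute_units * w.hours_per_day for w in workloads)


# Hypothetical workloads mirroring the examples in the text
workloads = [
    Workload("etl_load", 8, 2.0),
    Workload("adhoc_queries", 4, 6.0),
    Workload("batch_aggregation", 16, 1.0),
]

print(peak_provisioned_cost(workloads))  # 672.0 unit-hours
print(on_demand_cost(workloads))         # 56.0 unit-hours
```

Under these assumed numbers, the peak-sized system pays for 672 unit-hours a day while per-workload allocation uses 56; the gap comes from idle capacity that a fixed system keeps running between jobs.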
Snowflake can handle any amount of data. "You can put in everything you want to put in, without worrying about storage size limits." Snowflake also supports multiple clouds (AWS, Azure, and Google Cloud), so you can choose from regions around the world.
Snowflake enables data sharing and decision-making for the future
By putting data into a single database in Snowflake, "we can immediately share it not only within our own company but also with other companies in the group," said KT. "There is no need to copy the data anywhere to share it. Having the data in Snowflake makes it easy to share it not only with group companies but also with external third-party organizations, and to combine it with publicly available open data. This is possible because of Snowflake's flexible, cloud-native architecture," he says.
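The idea of sharing without copying can be illustrated with a toy model: the provider stores a table once and grants other accounts read access to it, instead of sending them a replica. The class and method names below (`SingleStore`, `share`, `revoke`, `read`) are hypothetical and are not Snowflake's API; this is only a sketch of the concept, assuming an in-memory store.

```python
class Table:
    """A table whose rows are stored exactly once, in the provider's storage."""

    def __init__(self, name: str, rows: list):
        self.name = name
        self.rows = rows


class SingleStore:
    """Toy model: consumers are granted access to a table instead of receiving a copy."""

    def __init__(self):
        self.tables: dict[str, Table] = {}
        self.grants: dict[str, set[str]] = {}  # table name -> accounts allowed to read it

    def add_table(self, table: Table) -> None:
        self.tables[table.name] = table
        self.grants.setdefault(table.name, set())

    def share(self, table_name: str, account: str) -> None:
        # Sharing only records a grant; no rows are duplicated into the consumer's storage.
        self.grants[table_name].add(account)

    def revoke(self, table_name: str, account: str) -> None:
        self.grants[table_name].discard(account)

    def read(self, table_name: str, account: str) -> tuple:
        if account not in self.grants.get(table_name, set()):
            raise PermissionError(f"{account} cannot read {table_name}")
        # Every read is served from the single stored table, so consumers see current data.
        return tuple(self.tables[table_name].rows)


store = SingleStore()
store.add_table(Table("inventory", [("apples", 120)]))
store.share("inventory", "group_company_a")

store.tables["inventory"].rows.append(("pears", 80))  # the provider updates its data once
print(store.read("inventory", "group_company_a"))      # (('apples', 120), ('pears', 80))

store.revoke("inventory", "group_company_a")  # access ends without deleting any copies
```

Because the consumer never holds its own replica, there is nothing to keep in sync when the provider updates the data, and revoking a grant is enough to end access.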
This Snowflake DATA CLOUD initiative is not a story about the future. For example, Sainsbury's, a major U.K. supermarket chain, used DATA CLOUD to share data that had been scattered across its operating companies, and a query that used to take six hours now completes in three seconds. The company now combines its own inventory data with data it does not hold itself, such as COVID-19 economic impact data from Nielsen, consumer behavior history from Ibotta's mobile app, and weather information from Weather Source, to optimize inventory and make the right decisions.
Snowflake has a Data Marketplace where data can be shared immediately, either free or for a fee. "The data there is being used by governments to make decisions about the COVID-19 outbreak," said KT. The state of California, for example, uses it to understand the situation in real time and to decide on actual infection-control measures. KT also expects data-driven decision making to increase further in the future.
Snowflake already has about 3,000 customers worldwide. Many of those companies will be able to place their data in the Data Marketplace and collaborate to make decisions that shape the future. KT said he hoped the session would help attendees understand the value of sharing data and how collaboration leads to meaningful decisions. "Your data is a tremendous asset," said KT. "With Snowflake, you can leverage your data effortlessly and turn it into a real asset. We would like to make the world data-driven by sharing data through DATA CLOUD," he says.
Snowflake has evolved from a cloud data warehouse to a cloud data platform and now to DATA CLOUD, and this session offered a glimpse of a future in which that evolution accelerates data-driven management in the enterprise.