Wednesday, June 1, 2022

What is Snowflake and what can it do?

JAPANESE

https://neovisionconsulting.blogspot.com/2022/06/snowflakeit.html

ーーー

ENGLISH

https://www.itmedia.co.jp/enterprise/articles/2109/01/news010.html#_ga=2.168644834.25786175.1654077443-266040718.1651278889


What can Snowflake do? Basic Information Explained (1)

In recent years, Snowflake has become one of the most talked-about solutions when discussing data utilization infrastructures. We will look at its advanced design concept and data processing features that make it more than just a cloud DWH, and how it can be used to solve problems in existing corporate information systems.

Sep 10, 2021 08:00 AM Published.

[Hiroki Murayama, NTT Data Corporation]

 In this series of articles, we introduce Snowflake, a company that has been garnering a lot of attention in recent years. Over the course of several articles, we will look at how Snowflake technology can contribute to the challenges that have faced business and IT in the past and will face us in the future, from an architect's perspective.


 In this first installment, we'll focus on the following topics to provide some basic information on Snowflake and its recent trends.


What is Snowflake?

Why Snowflake is attracting attention and its three symbolic features

What is the "Data Cloud," the world Snowflake is aiming for


Author's Introduction: Hiroki Murayama (General Manager, Solutions, Data Management Division, Data&Intelligence Business Unit, NTT Data)


Mr. Murayama joined NTT Data in 2000. Since joining the company, he has been engaged in product and technology research related to data integration such as EAI/SOA/ESB/MSA and related projects. From around 2014, he participated in projects in the financial, utility, and manufacturing industries as an architect of big data infrastructure, and then from 2018, he has been engaged in company-wide cloud service-type data analysis infrastructure projects as a PM and architect. Currently, as the lead manager of Snowflake business at NTT Data, he is working on partnership with Snowflake, deployment, and support for customer adoption.


What is Snowflake? It is not just a DWH service.

Snowflake is a cloud service provided by Snowflake, a company founded in Silicon Valley in 2012. The service was also launched on AWS Tokyo in February 2020. NTT DATA, to which the author belongs, had been researching the product before the establishment of the Japanese subsidiary, and in February 2020, when the service was launched at AWS Tokyo, we concluded a partner agreement and began handling the product.



History of Snowflake since its establishment (Source: Snowflake)

 Originally a "cloud-based DWH (data warehouse) service," Snowflake has recently expanded beyond DWH functionality to include data sharing, a data marketplace, and multi-cloud replication, becoming a "Cloud Data Platform" and even a "Data Cloud." The term "Data Cloud," however, does not mean a data analysis platform service.


 Since "Data Cloud" is not yet a familiar concept, it is easier to start by thinking of a "cloud-native DWH" as Snowflake's basic function and basic character.


 Although Snowflake has a wide variety of functions, there are three basic features that should be grasped in order to understand the full scope of Snowflake.


Standard SQL-based data warehouse

Built on cloud technology

Offered as a service (Database as a Service)

 For readers who have been considering and implementing various DWH products for some time, it may be difficult to understand what makes Snowflake so different from conventional products.


Three symbolic features that make Snowflake stand out from the crowd

 Three key features of Snowflake have attracted the attention of many companies in their various data utilization and analysis efforts.


"Multi-cluster shared data architecture" that separates storage and compute

High scalability and flexibility for centralized data management

Near zero maintenance

 We will discuss these features in more detail in the next issue, but we believe Snowflake's arrival will have an impact on a wide range of cloud services and enterprise DX initiatives.



Positioning of Data Cloud in Snowflake and image of data platform configuration (Source: Snowflake, Inc.)

What is Snowflake's Data Cloud?

 We are often asked, "What makes Snowflake different from other services?" In fact, I believe the biggest difference is the concept of "Data Cloud" and the world it aims to create.


 Data Cloud refers to a world where live data can be shared with customers and business partners via the Internet, as well as easily connected via data to organizations and companies that operate as data consumers, data providers, and service providers.


 Companies aiming for "data-driven management," which emphasizes decision-making based on data analysis, may consider it necessary to collaborate with a variety of external partners because there are limits to the insights that can be obtained by limiting the target of analysis to only their own internal data.


 When promoting data collaboration with external parties, if file and API collaboration with external partners is assumed each time, data freshness and consistency, development/operation costs, etc. become issues. Snowflake's Data Cloud makes it very easy to securely share fresh data between companies, just as you would share files between individuals via a cloud storage service such as Google Drive. In addition to one-to-one data exchange, data can also be made widely available through the Snowflake Data Marketplace, a data marketplace for Snowflake users ("Data Marketplace").
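
 The contrast between file-based exchange and live sharing can be sketched with a toy model. The Python sketch below is not Snowflake's API: the class names `ProviderTable`, `FileExport`, and `LiveShare` and the sample weather rows are hypothetical, introduced only to illustrate why data read by reference stays fresh while an exported copy goes stale.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderTable:
    """Hypothetical stand-in for a table owned by the data provider."""
    rows: list = field(default_factory=list)

    def insert(self, row: dict) -> None:
        self.rows.append(row)


class FileExport:
    """File-style exchange: the consumer receives a snapshot copy."""

    def __init__(self, table: ProviderTable) -> None:
        # Copy the rows at export time; later provider updates are not seen.
        self._snapshot = [dict(r) for r in table.rows]

    def read(self) -> list:
        return list(self._snapshot)


class LiveShare:
    """Share-style exchange: the consumer reads the provider's data by reference."""

    def __init__(self, table: ProviderTable) -> None:
        self._table = table  # no copy is made when the share is created

    def read(self) -> list:
        # Return the provider's current rows; copies protect the provider's data.
        return [dict(r) for r in self._table.rows]


table = ProviderTable()
table.insert({"city": "Tokyo", "temp_c": 21})

export = FileExport(table)
share = LiveShare(table)

# The provider adds fresh data after the exchange was set up.
table.insert({"city": "Osaka", "temp_c": 23})

stale_count = len(export.read())  # the snapshot still holds the old row only
fresh_count = len(share.read())   # the share reflects the provider's current rows
print(stale_count, fresh_count)
```

 In this model the file export has to be regenerated and redistributed after every change, which is the freshness and operating-cost issue described above; the share needs no such step.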


 Japan-specific data had not been available, but on June 8, 2021, Weathernews announced that it would provide weather data on the Data Marketplace. Snowflake makes it easy to perform data analysis using this data.


 There are other services that support such data exchange platforms and data sharing capabilities, and marketplaces are emerging. Nevertheless, the authors are focusing on Snowflake because we believe that its architectural characteristics are so simple and superior that they make it cost-effective and convenient for users. If these advantages are appreciated, we expect that Snowflake's Data Marketplace will continue to provide a lot of data in the future.


Why the U.S. Stock Market Listing Attracted Great Attention and Technology Expectations


Snowflake's growth speed and market expectations (Source: Snowflake)

 Snowflake, a fast-growing company mainly in the U.S., had its initial public offering (IPO) on the New York Stock Exchange on September 16, 2020.




 At the time of the IPO, the company attracted a great deal of attention in Japan, with the business press reporting it as "Snowflake, the largest U.S. IPO in 2020, with a market capitalization of 7 trillion yen."




 Considering that there are only a dozen or so companies in Japan with a market capitalization of over 7 trillion yen based on outstanding shares, it is clear how attractive Snowflake is to the market.




 Companies with technology that claims to be a data center are nothing new. Yet Snowflake is attracting so much attention and growing so quickly because its service meets the needs of modern IT infrastructures.




 The author believes that this is because more and more companies that invest aggressively in IT are choosing Snowflake as a means to solve their current issues.




How Data Analysis Infrastructure Architecture Changes Before and After Snowflake

 The authors and NTT Data have been focusing on Snowflake since around 2019. The reason for this is that we felt that it is a "future-inspired" service and has the potential to significantly change existing architectures. In fact, there have been solutions with similar ideals in the past, but we have not been able to find a service that actually meets our expectations.


 Snowflake's basic concept is to make thorough use of cloud services. It has the flexibility, scalability, and agility that cloud services should have, while at the same time providing stability, high availability, and robustness. It was also a solution that sincerely tackled the theme of thoroughly reducing the system operation load when handling large amounts of data. The cost of the solution was also very small, and it was easy to start and easy to scale, an advantage not seen in the past.


 In addition, conventional DWHs excel at aggregate processing of large volumes of data and can be expected to perform extremely high-speed processing, but they were designed for decision-making analysis by a small number of users. They were not designed for simultaneous access by a large number of users, as symbolized by the "democratization of data," and their relatively high cost per data volume tended to be a problem when large amounts of data were stored.


 Snowflake was developed with the goal of separating compute and storage from the architectural design stage. As a result, Snowflake has succeeded in maintaining extremely high scalability in the face of increased data processing (multiple accesses and large volumes of data to be processed) and retained data volumes.
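
 The idea of separating compute from storage can be illustrated with a minimal sketch. This is a conceptual model only and says nothing about Snowflake's internal implementation; `SharedStorage`, `ComputeCluster`, and the sample `sales` table are hypothetical names chosen for the illustration. The point is that several independent compute clusters read one stored copy of the data, and that either side can grow without the other being touched.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical single copy of the data, held apart from any compute."""
    tables: dict = field(default_factory=dict)

    def size(self) -> int:
        return sum(len(rows) for rows in self.tables.values())


@dataclass
class ComputeCluster:
    """Hypothetical compute cluster that holds no data of its own."""
    name: str
    storage: SharedStorage
    nodes: int = 1

    def resize(self, nodes: int) -> None:
        # Scaling compute changes only the node count; no data is moved.
        self.nodes = nodes

    def total(self, table: str, column: str) -> int:
        # Every cluster reads the same stored rows.
        return sum(row[column] for row in self.storage.tables[table])


storage = SharedStorage({"sales": [{"amount": 100}, {"amount": 250}]})

# Two independent workloads attach to the same stored data.
etl = ComputeCluster("etl", storage, nodes=4)
bi = ComputeCluster("bi", storage, nodes=1)

bi.resize(8)  # scale the BI workload without touching storage or the ETL cluster
storage.tables["sales"].append({"amount": 50})  # storage grows independently

etl_total = etl.total("sales", "amount")
bi_total = bi.total("sales", "amount")
print(etl_total, bi_total, etl.nodes, bi.nodes, storage.size())
```

 Because neither cluster owns the data, adding users means adding or resizing clusters, and adding data means growing storage; the two decisions do not interfere with each other in this model.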


Maintaining Scalability and Data Consistency with Unconventional Characteristics of Distributed Transactions

 The authors were concerned about the issue of transaction processing and data consistency in a cloud-native environment. The CAP theorem states that a distributed system cannot guarantee "Consistency," "Availability," and "Partition-tolerance" all at the same time. In the authors' view, however, Snowflake avoids this problem through an innovative implementation built on cloud computing technology, so that the service as a whole satisfies all three properties of CAP.


 Understanding this technical characteristic, the authors saw the potential for this solution to be a disruptive and unconventional architectural solution.


 In our experience, when building a data analysis environment to promote data-driven management, such as democratization of data and AI, we used to build a system with a reference architecture like the one shown in the left figure below. In doing so, we had turned a blind eye to the risk of data siloing in order to prioritize agility, but this problem was becoming heavier and heavier on us day by day.


 The operational issues related to the data infrastructure, such as costs for new analysis themes, various adjustments, version upgrades, and analysis and restoration work in the event of a failure, were making it difficult to respond to new themes. We felt that by introducing Snowflake to these situations, we could simplify data management and expect a significant reduction in data management costs. We also appreciated the fact that the lower data management costs and high scalability would make it easier to develop new services and products, and we could expect to expand business opportunities through early market launches, as well as early implementation of various data-driven actions.



How will the architecture of the data analysis infrastructure change before and after Snowflake (Source: NTT Data)

The Unfortunate Reality of Past "Data Utilization Infrastructures" and the Potential of Cloud-Native Data Infrastructures

 While the state of data utilization infrastructures varies from company to company, there are probably very few companies that have built a perfect data utilization infrastructure. The author believes that many companies are facing the following issues:


Inability to cope with the increase in data volume

Inability to immediately respond to new utilization and analysis needs

No data to begin with (not much data to utilize)

Poor data quality

Lack of proper data management, etc.

 Most of these issues cannot be solved simply by introducing a solution, but some of them can in fact be solved by taking advantage of the technical characteristics of cloud computing.


 According to the NIST (National Institute of Standards and Technology) definition (Note 1), cloud computing has the following five characteristics.


Cloud characteristics as defined by NIST

On-demand self-service

Broad network access

Resource pooling

Rapid elasticity

Measured service

(Note 1) https://csrc.nist.gov/publications/detail/sp/800-145/final (Japanese translation of document by IPA: https://www.ipa.go.jp/files/000025366.pdf).



 What characteristics would the "ideal" data utilization infrastructure have if it were built with full use of these characteristics? The author believes that it is very important to aim for the following state in order not to impair the speed and agility of data utilization.


The ideal state of the data utilization infrastructure

Resources are used only when analysis is needed, including a very large amount of resources when required, and only for as long as they are actually used.

Immediate access to 2nd-party and 3rd-party data made available over the Internet.

Efficient and appropriate sharing of data (Storage) and compute resources (Server)

Scale-up and scale-out can be done in an instant without server stoppage or other operational impact or system operation load.

User usage can be measured and understood, and the usage of each user can be visualized.
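
 The last item above, measuring and visualizing usage per user, can be sketched as a small aggregation. The record format, the user names, and the figures in `usage_log` are hypothetical and are not taken from any Snowflake view or API; the sketch only shows the kind of per-user summary that a measured service makes possible.

```python
from collections import defaultdict

# Hypothetical usage records: (user, compute seconds consumed by one query).
usage_log = [
    ("analyst_a", 120),
    ("analyst_b", 30),
    ("analyst_a", 60),
    ("batch_job", 600),
]


def usage_by_user(log):
    """Sum the metered compute seconds for each user."""
    totals = defaultdict(int)
    for user, seconds in log:
        totals[user] += seconds
    return dict(totals)


def usage_share(totals):
    """Express each user's usage as a fraction of the overall total."""
    overall = sum(totals.values())
    return {user: seconds / overall for user, seconds in totals.items()}


totals = usage_by_user(usage_log)
shares = usage_share(totals)

# Print a simple text report, largest consumer first.
for user, seconds in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user:10s} {seconds:5d} s  {shares[user]:.0%}")
```

 A report like this is what allows resource use to be attributed to the users and analysis themes that caused it, rather than being spread over a fixed-size system.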

 As a data-driven culture is fostered in an organization, an avalanche of requests for "more data like this" and "more data like that" will occur one after another. The author believes that creating a situation in which those who prepare data can respond to such requests as quickly as possible is a condition for success when promoting data-driven management, DX, and the use of AI technology.


 The reality was that, measured against this ideal, there were many areas that conventional DBs and DWHs could not fully address. Probably more than a few companies have stumbled in their attempts to build a meaningful data infrastructure. However, this does not mean that we should give up and conclude that "ideals cannot be realized." Instead, we hope that companies will consider introducing new technologies such as Snowflake and new mechanisms, with the expectation that "new technologies are providing solutions for a more convenient world for people who have probably faced the same issues."


What you will learn in this series

 NTT DATA, to which the author belongs, has developed a "Digital Success Program" based on the know-how gained from supporting technology-driven digital transformation over the past decade or so, and has positioned Snowflake as a key technology in the Digital Success Program. In addition to developing a variety of know-how, the company intends to utilize Snowflake as a "disruptive solution" that rewrites conventional system architecture and know-how.


 In this series of articles, we will discuss, step by step, what kind of business and IT issues Snowflake can address with what kind of technology, what specific benefits it can offer, what kind of world Snowflake is aiming for with its "Data Cloud" concept, and what kind of applications it can be used for.


 We hope that readers will find this article useful as a hint for solving issues related to data infrastructures, or as a resource for considering what data infrastructures should look like in the cloud-native era.


Copyright © ITmedia, Inc.