Serverless Kubernetes Has Become Invaluable to Data Scientists

Summary: Kubernetes is a powerful platform for data scientists, and it can be even more useful when you run it in a serverless environment.

(Image credit: Shutterstock Photo License - Piotr Swat)

Data science is a growing profession. It offers more opportunities than ever, but it also brings more complications. Standards and expectations are changing rapidly, especially with regard to the technologies used to build data science projects.

Many data scientists now rely on some form of DevOps tooling, and one of the most popular platforms is Kubernetes. Kyle Gallatin recently recorded a Kubernetes tutorial, presented at the New York City Data Science Academy, which illustrates how important this platform has become to his profession.

There are many important nuances for data scientists using Kubernetes. One of the most important is the adoption of serverless Kubernetes.

In this post, we will look at how serverless is changing the traditional Kubernetes architecture. First, though, we will address the benefits of Kubernetes in data science.

Benefits of Kubernetes for Data Science

A Kubernetes cluster is built from a control node (the control plane) and multiple worker nodes. Workloads are distributed to the worker nodes, while the control node manages them. With the emergence of serverless technologies, there is growing interest in using serverless within Kubernetes, both to run workloads and to provide the cluster itself.
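To make the control-plane/worker split concrete, here is a toy Python model of a control node placing pods onto worker nodes by CPU capacity. It is a sketch under simplifying assumptions: the class names, the millicore capacities and the "most free CPU wins" rule are all hypothetical, and the real kube-scheduler weighs many more factors (memory, affinity, taints and so on).

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    """A worker node with a fixed CPU capacity, measured in millicores."""
    name: str
    cpu_capacity_m: int
    pods: list = field(default_factory=list)
    cpu_used_m: int = 0

    def fits(self, cpu_request_m: int) -> bool:
        return self.cpu_used_m + cpu_request_m <= self.cpu_capacity_m


class ControlPlane:
    """Toy control plane: assigns each pod to the node with the most free CPU."""

    def __init__(self, nodes):
        self.nodes = nodes

    def schedule(self, pod_name: str, cpu_request_m: int) -> str:
        candidates = [n for n in self.nodes if n.fits(cpu_request_m)]
        if not candidates:
            raise RuntimeError(f"no node can fit pod {pod_name!r}")
        # Hypothetical spreading heuristic: prefer the node with the most free CPU.
        node = max(candidates, key=lambda n: n.cpu_capacity_m - n.cpu_used_m)
        node.pods.append(pod_name)
        node.cpu_used_m += cpu_request_m
        return node.name


nodes = [WorkerNode("worker-1", 4000), WorkerNode("worker-2", 2000)]
control = ControlPlane(nodes)
# Workloads are submitted to the control plane, which distributes them to workers.
placements = {pod: control.schedule(pod, cpu)
              for pod, cpu in [("train-a", 1500), ("train-b", 1500), ("etl", 1000)]}
print(placements)
```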

It should be relatively clear why data scientists can benefit from this platform. Bob Laurent, Senior Director at Domino Data Lab, has discussed some of the biggest reasons. He points out that Kubernetes provides scalable access to GPUs and CPUs and abstracts away the underlying infrastructure. These features make data science projects more scalable, more cost-effective and easier to manage.
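As a sketch of what scalable access to GPUs and CPUs looks like in practice, the function below builds a Pod manifest as a plain Python dict that requests CPU, memory and, optionally, GPUs. It assumes the NVIDIA device plugin is installed on the cluster, which exposes GPUs as the `nvidia.com/gpu` resource; the function name, pod name and image are hypothetical placeholders.

```python
import json


def training_pod_manifest(name: str, image: str, cpus: str, memory: str, gpus: int = 0) -> dict:
    """Build a Pod manifest (as a dict) requesting CPU, memory and optional GPUs."""
    resources = {
        "requests": {"cpu": cpus, "memory": memory},
        "limits": {"cpu": cpus, "memory": memory},
    }
    if gpus:
        # GPUs are an extended resource set in limits; the name assumes the
        # NVIDIA device plugin is running on the GPU nodes.
        resources["limits"]["nvidia.com/gpu"] = gpus
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{"name": "trainer", "image": image, "resources": resources}],
        },
    }


# Hypothetical image name, for illustration only.
manifest = training_pod_manifest("train-model", "example.com/ds/trainer:latest", "4", "16Gi", gpus=1)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and submitted with `kubectl apply -f`; the scheduler then places the pod only on a node that has a free GPU.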

Why Serverless in Kubernetes?

Kubernetes is clearly a useful platform for data scientists. Once that is understood, it is worth looking at what it gains when used in a serverless environment.

First, it is important to dispel a misconception: serverless does not mean the absence of servers. It means that servers are abstracted to the point where users do not need to consider how or where their applications are executed. You simply provide your packaged application or container, and the serverless platform handles all of the underlying infrastructure. This means serverless can still be used to handle data projects at different levels of your infrastructure.

Even with all the advantages Kubernetes brings, users still need to manage the underlying servers. Managed Kubernetes services reduce this burden somewhat, but they do not remove servers from the equation entirely. The provider manages the control plane, yet you still have to provision and manage the worker nodes for the various data science projects you are working on.

Serverless implementations such as AWS Fargate eliminate the need for data scientists to manage worker nodes and move the workloads onto a serverless architecture. This approach shifts the responsibility for server (node) management from the user to the service provider. Serverless can also reduce costs, as users pay only for the resources they use. It also avoids overprovisioning while keeping the flexibility to scale as needed.
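The pay-for-what-you-use argument is easiest to see with some arithmetic. The sketch below compares always-on worker nodes with per-resource serverless billing for a bursty week of jobs. All prices are made-up placeholders, not actual AWS Fargate or EC2 rates, and the billing model is simplified (no minimum charges or per-second rounding).

```python
from dataclasses import dataclass

# Hypothetical prices for illustration only; these are NOT real AWS rates.
NODE_PRICE_PER_HOUR = 0.40   # one always-on worker node
VCPU_PRICE_PER_HOUR = 0.05   # serverless, per requested vCPU-hour
GIB_PRICE_PER_HOUR = 0.005   # serverless, per requested GiB-hour


@dataclass
class Job:
    vcpus: float
    memory_gib: float
    hours: float


def provisioned_cost(num_nodes: int, hours_in_period: float) -> float:
    """Nodes are billed for the whole period, whether or not jobs are running."""
    return num_nodes * hours_in_period * NODE_PRICE_PER_HOUR


def serverless_cost(jobs: list) -> float:
    """Only the resources each job requests are billed, and only while it runs."""
    return sum(j.hours * (j.vcpus * VCPU_PRICE_PER_HOUR + j.memory_gib * GIB_PRICE_PER_HOUR)
               for j in jobs)


# A hypothetical week: two 3-hour training jobs plus a daily 1-hour ETL job.
jobs = [Job(4, 16, 3), Job(4, 16, 3)] + [Job(2, 8, 1) for _ in range(7)]
print(f"provisioned (2 nodes): ${provisioned_cost(2, 24 * 7):.2f}")
print(f"serverless:            ${serverless_cost(jobs):.2f}")
```

With these hypothetical numbers the serverless total is much lower because the provisioned nodes sit idle most of the week. The comparison can flip for a cluster that is busy around the clock, so it is worth plugging in real prices and utilization before drawing conclusions.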

Kubernetes without Nodes?

Each worker node runs an agent called the kubelet, which connects it to the Kubernetes API. When a user interacts with the Kubernetes API through kubectl commands, the kubelet lets each node receive instructions from the API on how to manage the pods assigned to it. The kubelet works from PodSpecs: whenever a kubelet is running on a server and connected to the Kubernetes API, it ensures that the containers described in the PodSpecs for that node are running.
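The reconcile-to-PodSpec behavior described above can be sketched as a toy sync loop. This is a hypothetical, heavily simplified model: `PodSpec` here holds only a name and an image, and a dict stands in for the container runtime, whereas the real kubelet also handles probes, restart policies, volumes and status reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSpec:
    """Drastically simplified PodSpec: a pod name and its container image."""
    name: str
    image: str


class ToyKubelet:
    """Hypothetical model of one node's kubelet sync loop.

    `running` stands in for the container runtime: pod name -> running image.
    """

    def __init__(self, node_name: str):
        self.node_name = node_name
        self.running = {}

    def sync(self, desired: list) -> list:
        """Make the running containers match the PodSpecs assigned to this node."""
        actions = []
        wanted = {spec.name: spec.image for spec in desired}
        # Stop pods that are no longer assigned here, or whose image changed.
        for name, image in list(self.running.items()):
            if wanted.get(name) != image:
                del self.running[name]
                actions.append(f"stop {name}")
        # Start pods that are assigned here but not yet running.
        for name, image in wanted.items():
            if name not in self.running:
                self.running[name] = image
                actions.append(f"start {name} ({image})")
        return actions


kubelet = ToyKubelet("worker-1")
first = kubelet.sync([PodSpec("notebook", "jupyter:1"), PodSpec("etl", "etl:1")])
# The API now says: only the notebook should run, on a newer image.
second = kubelet.sync([PodSpec("notebook", "jupyter:2")])
print(first)
print(second)
```

Running `sync` again with an unchanged set of PodSpecs produces no actions, which is the point of a reconciliation loop: it only acts on the difference between desired and actual state.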

This opens many doors for data scientists trying to improve scalability and customize their projects. For data science projects, the biggest benefit comes down to virtualization.

In a serverless setting, this functionality is typically emulated by a virtual kubelet, which the Kubernetes API recognizes as a node within the cluster. However, the virtual kubelet schedules containers elsewhere, typically on supported backends such as AWS Fargate, AWS Batch and HashiCorp Nomad. Users interact with the Kubernetes cluster in the usual way, while the underlying containers are scheduled on serverless container services. With this implementation, users gain the advantages of serverless without sacrificing the functionality of Kubernetes. Best of all, virtual kubelets allow mixed configurations, in which actual worker nodes and virtual kubelets coexist within a cluster.
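A minimal sketch of the virtual kubelet idea, assuming a hypothetical `PodProvider` interface: the virtual node looks like any other node to the caller, but forwards pods to a backend instead of running them locally. `FakeServerlessBackend` stands in for a service such as AWS Fargate and makes no cloud calls. The actual Virtual Kubelet project is written in Go and defines its own provider interface; this Python version only illustrates the delegation pattern and a mixed cluster.

```python
from typing import Protocol


class PodProvider(Protocol):
    """Hypothetical backend interface that a virtual node delegates to."""

    def create_pod(self, name: str, image: str) -> None: ...
    def delete_pod(self, name: str) -> None: ...
    def list_pods(self) -> list: ...


class FakeServerlessBackend:
    """Stand-in for a serverless container service (no real cloud calls)."""

    def __init__(self):
        self.tasks = {}

    def create_pod(self, name: str, image: str) -> None:
        self.tasks[name] = image

    def delete_pod(self, name: str) -> None:
        self.tasks.pop(name, None)

    def list_pods(self) -> list:
        return sorted(self.tasks)


class VirtualNode:
    """Appears as a node in the cluster, but forwards pods to a provider."""

    def __init__(self, name: str, provider: PodProvider):
        self.name = name
        self.provider = provider

    def run(self, pod_name: str, image: str) -> None:
        self.provider.create_pod(pod_name, image)

    def pods(self) -> list:
        return self.provider.list_pods()


class RealNode:
    """An ordinary worker node whose kubelet runs containers locally."""

    def __init__(self, name: str):
        self.name = name
        self.local = {}

    def run(self, pod_name: str, image: str) -> None:
        self.local[pod_name] = image

    def pods(self) -> list:
        return sorted(self.local)


# A mixed cluster: a real GPU worker node alongside a virtual node.
gpu_node = RealNode("gpu-worker-1")
backend = FakeServerlessBackend()
virtual = VirtualNode("virtual-kubelet", backend)
cluster = [gpu_node, virtual]


def place(pod_name: str, image: str, needs_gpu: bool) -> str:
    """Hypothetical routing rule: GPU pods on the real node, the rest go serverless."""
    node = gpu_node if needs_gpu else virtual
    node.run(pod_name, image)
    return node.name


place("train-resnet", "trainer:1", needs_gpu=True)
place("nightly-etl", "etl:1", needs_gpu=False)
place("batch-score", "scorer:1", needs_gpu=False)
for node in cluster:
    print(node.name, node.pods())
```

In a real cluster, the routing rule in `place` would be expressed with node selectors, taints and tolerations on the virtual node rather than a Python function.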

Reposted from: Smart Data Collective

