Best ways to run Spark jobs on Kubernetes for development, data exploration and in production

Over the years of working on Apache Spark applications, I have always been switching my environment between development and production. I would use an IDE such as Visual Studio Code to write the Scala or PySpark code, test it locally against a small piece of the data, submit the Spark…

Address the challenges of managing and sharing GPUs and data in AI infrastructure with a couple of clicks.

How easy is it to access your AI infrastructure? Pure Storage has partnered with Lablup to make on-premises AI accessible and fast.

Lablup’s Backend.ai is an open-source computing resource orchestrator designed for AI/ML. It makes AI easier to access for your data scientists. Adding Pure Storage’s FlashBlade®, an industry-leading flash-based…

Making Sense of Big Data

Building a scalable and reliable logging solution for large Kubernetes clusters with scalable tools and fast object storage.

Building a basic logging solution for Kubernetes could be as easy as running a couple of commands. However, to support large-scale Kubernetes clusters, the logging solution itself needs to be scalable and reliable.

In my previous blog, I described an overview of my Kubernetes monitoring and logging solution. At that…

Building a reproducible and scalable deep learning system with fast S3 as the central data and model repository.

I know most data scientists do not care about storage, and they shouldn’t. However, having fast S3 object storage in the system would definitely help optimise our deep learning workflow. Let me explain why “storage actually matters in DL” in this blog.

Distributed Training with Fast NFS

In my previous blog, I explained why…

Someone asked me to help benchmark and compare the throughput of on-premises and cloud big data storage. Instead of just running the benchmark, I thought it might be worth writing a blog post about Hadoop TeraGen, which many people have been using as a basic benchmark tool to measure/compare underlying…

Yifeng Jiang

Software & solutions engineer, big data and machine learning, jogger, hiker, traveler, gamer.
