Yifeng Jiang
Generative AI, RAG and Data Infrastructure
A practical introduction to Generative AI, RAG and their data infrastructure
6 min read · 3 days ago

Yifeng Jiang
Benchmarking Storage for AI Workloads
Choose the right storage for your AI infrastructure
6 min read · Jan 19, 2024

Yifeng Jiang
Data and AI Skills, Better Together
View from a “Data Scientist” at a Storage Company
4 min read · Dec 5, 2023

Yifeng Jiang
Make Petabytes Searchable — Elasticsearch Data Tiering Made Simple and Fast
Elastic searchable snapshots with fast S3 object storage
5 min read · Mar 3, 2023

Yifeng Jiang
Accelerating Apache Spark with RAPIDS on GPU
Getting started, and benchmarking Spark RAPIDS on Kubernetes and fast S3
5 min read · Feb 13, 2023

Yifeng Jiang
2022 in Big Data and Machine Learning
A review from a field data and machine learning architect
6 min read · Dec 30, 2022

Yifeng Jiang
Smaller is Better — Big Data System in 2023
Consolidating and accelerating big data with fast S3, Kubernetes and Spark RAPIDS
4 min read · Nov 28, 2022

Yifeng Jiang
Build an Open Data Lakehouse with Spark, Delta and Trino on S3
Combining the strength of data lake and warehouse in a way that is open, simple, and runs anywhere
6 min read · Nov 7, 2022

Yifeng Jiang
Comparing Big Data Performance with Different Data Lake Storages
Big data benchmarks using TPC-DS and YCSB with HDFS, FlashBlade S3, and Amazon S3
8 min read · Jun 16, 2022

Yifeng Jiang
Metadata — Meet Big Data’s Little Brother
Understand, protect and leverage metadata in big data systems
8 min read · Mar 1, 2022