Hi, I’m Lingxiao

Researcher @ Systems for Machine Learning

I am Lingxiao Ma (马凌霄), a Senior Researcher in the Intelligent Cloud and Edge Group of the Systems and Networking Research Group at Microsoft Research Asia (MSRA). I obtained my Ph.D. in Computer Architecture from Peking University in 2020, under the supervision of Prof. Yafei Dai and Prof. Zhi Yang. Before that, I obtained my B.Sc. in Computer Science from Beijing Normal University in 2015. My research focuses on building efficient parallel systems for large-scale data analytics scenarios, such as deep learning, machine learning, and graph processing, by leveraging modern hardware such as GPUs.

Work Experience

Present

Senior Researcher @ Intelligent Cloud and Edge

NNFusion: Rammer (OSDI'20), Roller (OSDI'22), Welder (OSDI'23), Cocktailer (OSDI'23), T10 (SOSP'24)
SparTA: SparTA (OSDI'22), nmSPARSE (MLSys'23), PIT (SOSP'23)
BitBLAS: Ladder (OSDI'24)
T-MAC: T-MAC (EuroSys'25)

Nov. 2017 - Jan. 2020

Research Intern (Full-Time) @ Systems Research Group

Mentors: Dr. Jilong Xue, Dr. Ming Wu
Projects: Rammer (OSDI'20), NeuGraph (NGra, USENIX ATC'19), SeerNet (CVPR'19)

Dec. 2016 - Jul. 2020

System Administrator @ Institute of NC&IS

Managed the website of the Institute of Network Computing and Information Systems
Managed servers

Education

Sept. 2015 - Jul. 2020

Ph.D. @ Computer Architecture

Supervisors: Prof. Yafei Dai, Prof. Zhi Yang
Distributed Systems Group
Institute of Network Computing and Information Systems
School of Electronics Engineering and Computer Science

Sept. 2011 - Jul. 2015

Bachelor of Science @ Computer Science

Department of Computer Science and Technology
College of Information Science and Technology
Rank 1st in the 2013-2014 school year

Publications

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang
Accepted by the 2025 ACM European Conference on Computer Systems (EuroSys'25)

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10.
Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang
Accepted by the 30th ACM Symposium on Operating Systems Principles (SOSP'24)

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, Ziming Miao, Fan Yang, Ting Cao, Yuqing Yang, Mao Yang
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24)

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
arXiv preprint arXiv:2402.17764

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Best Paper Award
Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2024 (PPoPP'24)

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.
Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou
Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP'23)

BitNet: Scaling 1-bit Transformers for Large Language Models.
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
arXiv preprint arXiv:2310.11453

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.
Chen Zhang, Lingxiao Ma, Jilong Xue, Yining Shi, Ziming Miao, Fan Yang, Jidong Zhai, Zhi Yang, Mao Yang
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)

Welder: Scheduling Deep Learning Memory Access via Tile-graph.
Yining Shi, Zhi Yang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Ziming Miao, Yuxiao Guo, Fan Yang, Lidong Zhou
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)

Optimizing Dynamic Neural Networks with Brainstorm.
Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, Minyi Guo
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang
Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys'23)

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui
Proceedings of the 2023 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD'23)

Roller: Fast and Efficient Tensor Compilation for Deep Learning.
Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.
Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, Lidong Zhou
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)

Accelerating GNN Training with Locality-Aware Partial Execution.
Best Paper Award
Taehyun Kim, KyoungSoo Park, Changho Hwang, Peng Cheng, Youshan Miao, Lingxiao Ma, Zhiqi Lin, Yongqiang Xiong
Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys'21)

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.
Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui
Proceedings of the 2021 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD'21)

Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.
Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI'20)

CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs.
Xupeng Miao, Lingxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang
Proceedings of the 37th IEEE International Conference on Data Engineering (Extended Abstract) (ICDE'21)
IEEE Transactions on Knowledge and Data Engineering (TKDE)

Architectural Implications of Graph Neural Networks.
Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
IEEE Computer Architecture Letters (CAL)

PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network.
Chao Tian, Lingxiao Ma, Zhi Yang, Yafei Dai
Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS'20)

NeuGraph: Parallel Deep Neural Network Computation on Large Graphs.
Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC'19)

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization.
Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang
Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'19)

Towards Efficient Large-Scale Graph Neural Network Computing.
Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
arXiv preprint arXiv:1810.08403

Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication.
Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, Yafei Dai
Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC'17)

Patents

Graph Processing Method Using Auto-Replication Model.
一种基于自动选择副本因子模型的图计算方法. (ID: 201710533444.5)
Han Chen, Lingxiao Ma, Zhi Yang, Jilong Xue, Yafei Dai

Awards

Recent Awards

Best Paper Award, PPoPP'24	Mar. 2024
Best Paper Award, APSys'21	Aug. 2021
Award for Scientific Research, Peking University	Dec. 2019
Award for Scientific Research, Peking University	Dec. 2018
Zhi-Tang (智唐) Scholarship, Peking University	Dec. 2017
Award for Scientific Research, Peking University	Dec. 2017
Ph.D President Scholarship of Peking University	Jun. 2017
Miao-Zhen (秒针) Scholarship, Peking University	Dec. 2016

Selected Awards (Before Ph.D.)

Outstanding Graduate of Beijing Normal University	May 2015
First Award, The 12th Li-Yun (励耘) Outstanding Undergraduate Scholarship (6 of Beijing Normal University Undergraduates)	Dec. 2014
National Scholarship (Selected in book "Hope: Highlights of 2014 National Scholarship Winners" (《希望-2014年国家奖学金获奖学生风采录》) (ISBN: 9787301265581), 103 of 50000 Scholarship Winners in China, the only one in Beijing Normal University)	Oct. 2014
Silver Medal, The 39th ACM/ICPC Asia Regional Contest Anshan Site	Oct. 2014
Meritorious Winner, The 30th Mathematical Contest in Modeling	Feb. 2014
Bronze Medal, The 38th ACM/ICPC Asia Regional Contest Changchun Site	Dec. 2013
First Prize, China Undergraduate Mathematical Contest in Modeling, Beijing Regional Contest	Oct. 2013

Invited Talks

The 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), Santa Clara, USA	Jul. 2024
The 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI'20), Virtual Event	Nov. 2020
2019 USENIX Annual Technical Conference (USENIX ATC'19), Renton, USA	Jul. 2019
Alibaba Group, Beijing, China	Jan. 2019
Institute of Network Computing and Information Systems Forum (Rank 1st), Beijing, China	Dec. 2018
The 16th National Software Application Conference (NASAC'17), Harbin, China	Nov. 2017
2017 USENIX Annual Technical Conference (USENIX ATC'17), Santa Clara, USA	Jul. 2017