Work Experience


Present


Senior Researcher @ Intelligent Cloud and Edge

  • NNFusion: Rammer (OSDI'20), Roller (OSDI'22), Welder (OSDI'23), Cocktailer (OSDI'23), T10 (SOSP'24)
  • SparTA: SparTA (OSDI'22), nmSPARSE (MLSys'23), PIT (SOSP'23)
  • BitBLAS: Ladder (OSDI'24)
  • T-MAC: T-MAC (EuroSys'25)

Nov. 2017 - Jan. 2020


Research Intern (Full-Time) @ Systems Research Group

  • Mentors: Dr. Jilong Xue, Dr. Ming Wu
  • Projects: Rammer (OSDI'20), NeuGraph (NGra, USENIX ATC'19), SeerNet (CVPR'19)

Dec. 2016 - Jul. 2020


System Administrator @ Institute of NC&IS

  • Managed the website of the Institute of Network Computing and Information Systems
  • Managed servers

Education



Sept. 2015 - Jul. 2020


Ph.D. @ Computer Architecture



Sept. 2011 - Jul. 2015


Bachelor of Science @ Computer Science

Publications

  • T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.    
         Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang
         Accepted by the 2025 ACM European Conference on Computer Systems (EuroSys'25)
  • Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10.  
         Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang
         Accepted by the 30th ACM Symposium on Operating Systems Principles (SOSP'24)
  • Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.    
         Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, Ziming Miao, Fan Yang, Ting Cao, Yuqing Yang, Mao Yang
         Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24)
  • The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.  
         Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
         arXiv preprint arXiv:2402.17764
  • ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.    
         Best Paper Award
         Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang
         Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2024 (PPoPP'24)
  • PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.    
         Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou
         Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP'23)
  • BitNet: Scaling 1-bit Transformers for Large Language Models.  
         Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
         arXiv preprint arXiv:2310.11453
  • Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.    
         Chen Zhang, Lingxiao Ma, Jilong Xue, Yining Shi, Ziming Miao, Fan Yang, Jidong Zhai, Zhi Yang, Mao Yang
         Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)
  • Welder: Scheduling Deep Learning Memory Access via Tile-graph.    
         Yining Shi, Zhi Yang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Ziming Miao, Yuxiao Guo, Fan Yang, Lidong Zhou
         Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)
  • Optimizing Dynamic Neural Networks with Brainstorm.    
         Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, Minyi Guo
         Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23)
  • Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.    
         Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang
         Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys'23)
  • FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.  
         Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui
         Proceedings of the 2023 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD'23)
  • Roller: Fast and Efficient Tensor Compilation for Deep Learning.    
         Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko
         Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)
  • SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.    
         Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, Lidong Zhou
         Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)
  • Accelerating GNN Training with Locality-Aware Partial Execution.  
         Best Paper Award
         Taehyun Kim, KyoungSoo Park, Changho Hwang, Peng Cheng, Youshan Miao, Lingxiao Ma, Zhiqi Lin, Yongqiang Xiong
         Proceedings of the 12th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys'21)
  • Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.  
         Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui
         Proceedings of the 2021 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD'21)
  • Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.    
         Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, Lidong Zhou
         Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI'20)
  • CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs.  
         Xupeng Miao, Lingxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang
         Proceedings of the 37th IEEE International Conference on Data Engineering (Extended Abstract) (ICDE'21)
         IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • Architectural Implications of Graph Neural Networks.  
         Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
         IEEE Computer Architecture Letters (CAL)
  • PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network. 
         Chao Tian, Lingxiao Ma, Zhi Yang, Yafei Dai
         Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS'20)
  • NeuGraph: Parallel Deep Neural Network Computation on Large Graphs.      
         Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
         Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC'19)
  • SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization.  
         Shijie Cao, Lingxiao Ma, Wencong Xiao, Chen Zhang, Yunxin Liu, Lintao Zhang, Lanshun Nie, Zhi Yang
         Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'19)
  • Towards Efficient Large-Scale Graph Neural Network Computing.    
         Lingxiao Ma, Zhi Yang, Youshan Miao, Jilong Xue, Ming Wu, Lidong Zhou, Yafei Dai
         arXiv preprint arXiv:1810.08403
  • Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication.  
         Lingxiao Ma, Zhi Yang, Han Chen, Jilong Xue, Yafei Dai
         Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC'17)

Patents

  • Graph Processing Method Using Auto-Replication Model.  
         一种基于自动选择副本因子模型的图计算方法. (ID: 201710533444.5)
         Han Chen, Lingxiao Ma, Zhi Yang, Jilong Xue, Yafei Dai

Awards

    Recent Awards

    Best Paper Award, PPoPP'24 Mar. 2024
    Best Paper Award, APSys'21 Aug. 2021
    Award for Scientific Research, Peking University Dec. 2019
    Award for Scientific Research, Peking University Dec. 2018
    Zhi-Tang (智唐) Scholarship, Peking University Dec. 2017
    Award for Scientific Research, Peking University Dec. 2017
    Ph.D. President Scholarship of Peking University Jun. 2017
    Miao-Zhen (秒针) Scholarship, Peking University Dec. 2016

    Selected Awards (Before Ph.D.)

    Outstanding Graduate of Beijing Normal University May 2015
    First Award, The 12th Li-Yun (励耘) Outstanding Undergraduate Scholarship Dec. 2014
    (6 of Beijing Normal University Undergraduates)
    National Scholarship Oct. 2014
    (Selected for the book "Hope: Highlights of 2014 National Scholarship Winners" (《希望-2014年国家奖学金获奖学生风采录》) (ISBN: 9787301265581), 103 of 50,000 scholarship winners in China, the only one from Beijing Normal University)
    Silver Medal, The 39th ACM/ICPC Asia Regional Contest Anshan Site Oct. 2014
    Meritorious Winner, The 30th Mathematical Contest in Modeling Feb. 2014
    Bronze Medal, The 38th ACM/ICPC Asia Regional Contest Changchun Site Dec. 2013
    First Prize, China Undergraduate Mathematical Contest in Modeling, Beijing Regional Contest Oct. 2013

Invited Talks

    The 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), Santa Clara, USA Jul. 2024
    The 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI'20), Virtual Event Nov. 2020
    2019 USENIX Annual Technical Conference (USENIX ATC'19), Renton, USA Jul. 2019
    Alibaba Group, Beijing, China Jan. 2019
    Institute of Network Computing and Information Systems Forum (Ranked 1st), Beijing, China Dec. 2018
    The 16th National Software Application Conference (NASAC'17), Harbin, China Nov. 2017
    2017 USENIX Annual Technical Conference (USENIX ATC'17), Santa Clara, USA Jul. 2017