Xiaohui Nie (聂晓辉)
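Master's Supervisor
Position: Associate Researcher
Administrative Post:
Department: Open Laboratory for Frontier Research on Informatization Technologies
Education: Ph.D.
Email: xhnie@cnic.cn
Mailing Address: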

Discipline (Academic / Professional Degree): Computer Architecture / Computer Technology

Research Directions for Student Admission:

AI for Networking; AIOps (intelligent IT operations); monitoring and governance of Internet infrastructure resources


Research Projects

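He leads Chinese Academy of Sciences projects on the monitoring and governance of Internet infrastructure resources. His research applies AI to the monitoring, alarm correlation analysis, fault localization, and performance optimization of Internet and IT systems, with the goal of improving their stability and security. He has published 25 papers in leading international and domestic conferences and journals. Systems developed under his leadership have been deployed at numerous banks, securities firms, telecom carriers, and Internet companies with good results, and received the First Prize of the 2023 Chinese Institute of Electronics Science Progress Award.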


Selected Publications

[1] Guanglei He, Xiaohui Nie*, Ruming Tang, Kun Wang, Zhaoyang Yu, Xidao Wen, Kanglin Yin, Dan Pei. Guardian of the Resiliency: Detecting Erroneous Software Changes Before They Make Your Microservice System Less Fault-Resilient. IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 2024. (CCF B)
[2] Shenglin Zhang, Jun Zhu, Bowen Hao, Yongqian Sun, Xiaohui Nie, Jingwen Zhu, Xilin Liu, Xiaoqian Li, Yuchi Ma, Dan Pei. Fault Diagnosis for Test Alarms in Microservices Through Multi-source Data. Proceedings of the 32nd ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2024. (CCF A)
[3] Zhe Xie, Shenglin Zhang, Yitong Geng, Yao Zhang, Xiaohui Nie, Zhenhe Yao, Longlong Xu, Yongqian Sun, Wentao Li, Dan Pei. Microservice Root Cause Analysis With Limited Observability Through Intervention Recognition in the Latent Space. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024. (CCF A)
[4] Zhenhe Yao, Changhua Pei, Wenxiao Chen, Hanzhang Wang, Liangfei Su, Huai Jiang, Zhe Xie, Xiaohui Nie, Dan Pei. Chain-of-Event: Interpretable Root Cause Analysis for Microservices through Automatically Learning Weighted Event Causal Graph. Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024. (CCF A)
[5] Shenglin Zhang, Yongxin Zhao, Xiao Xiong, Yongqian Sun, Xiaohui Nie, Jiacheng Jiang, Fenglai Wang, Xian Zheng, Yuzhi Zhang, Dan Pei. Illuminating the Gray Zone: Non-intrusive Gray Failure Localization in Server Operating Systems. Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024. (CCF A)
[6] Yiran Cheng, Bo Cheng, Pengxiang Jin, Yongqian Sun, Xiaohui Nie, Nengwen Zhao, Shenglin Zhang, Dan Pei. Effective Attribute Selection for Multi-dimensional Root Cause Analysis. IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 2022. (CCF B)
[7] Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022. (CCF A)
[8] Zeyan Li, Nengwen Zhao, Mingjie Li, Xianglin Lu, Lixin Wang, Dongdong Chang, Xiaohui Nie, Li Cao, Wenzhi Zhang, Kaixin Sui, Yanhua Wang, Xu Du, Guoqiang Duan, Dan Pei. Actionable and Interpretable Fault Localization for Recurring Failures in Online Service Systems. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2022. (CCF A)
[9] Canhua Wu, Nengwen Zhao, Lixin Wang, Xiaoqin Yang, Shining Li, Ming Zhang, Xing Jin, Xidao Wen, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. Identifying root-cause metrics for incident diagnosis in online service systems. IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), 2021. (CCF B)
[10] Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, Dan Pei. Jump-Starting Multivariate Time Series Anomaly Detection for Online Service Systems. USENIX ATC, 2021. (CCF A)
[11] Yuchao Zhang, Xiaohui Nie#, Junchen Jiang, Wendong Wang, Ke Xu, Youjian Zhao, Martin J. Reed, Kai Chen. BDS+: A Centralized Near-Optimal Network System for Inter-Datacenter Data Replication. IEEE/ACM Transactions on Networking (ToN), 2021, 29(2): 918-934. (CCF A)
[12] Nengwen Zhao, Junjie Chen, Zhou Wang, Xiao Peng, Gang Wang, Yong Wu, Fang Zhou, Zhen Feng, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. Real-time incident prediction for online service systems. Proceedings of the 28th ACM ESEC/FSE, 2020. (CCF A)
[13] Nengwen Zhao, Junjie Chen, Xiao Peng, Honglin Wang, Xinya Wu, Yuanzong Zhan, Zikai Chen, Xiangzhong Zheng, Xiaohui Nie, et al. Understanding and Handling Alert Storm for Online Service Systems. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE), 2020. (CCF A)
[14] Ping Liu, Yu Chen, Xiaohui Nie, Jing Zhu, Shenglin Zhang, Kaixin Sui, Ming Zhang, Dan Pei. FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Web Service Failure Mitigation. IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), 2019. (CCF B)
[15] Xiaohui Nie, Youjian Zhao, Zhihan Li, Guo Chen, Kaixin Sui, Jiyang Zhang, Zijie Ye, Dan Pei. Dynamic TCP Initial Windows and Congestion Control Schemes Through Reinforcement Learning. IEEE Journal on Selected Areas in Communications (JSAC), 2019, 37(6): 1231-1247. (CCF A)
[16] Yuchao Zhang, Junchen Jiang, Ke Xu, Xiaohui Nie, et al. BDS: A Centralized Near-Optimal Overlay Network for Inter-Datacenter Data Replication. Proceedings of the Thirteenth EuroSys Conference (EuroSys), ACM, 2018. (CCF A)
[17] Xiaohui Nie, Youjian Zhao, Dan Pei, Guo Chen, Kaixin Sui, Jiyang Zhang. Reducing Web Latency through Dynamically Setting TCP Initial Window with Reinforcement Learning. IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), 2018. (CCF B)