Chapter 108 Chips
On the third day after Shen Yiming joined the company, Zhang Lei threw a chip research report on Zuo Cheng's desk.
"Brother Cheng, the research report is out, and the situation isn't very optimistic." Zhang Lei pulled up a chair and sat down. "There are mainly two domestic AI chip manufacturers worth considering: Cambricon and Horizon Robotics. Cambricon's MLU270 has good computing power, reaching a peak of 128 TOPS, but its power consumption is relatively high, exceeding 15 watts. Horizon Robotics' Sunrise 3 has good power consumption control, only 2 watts, but its computing power is only 5 TOPS, which isn't enough to run large models."
Zuo Cheng flipped through the report, his brows furrowing deeper and deeper.
"What about imported ones?"
"Nvidia's Jetson Nano has a good balance between computing power and power consumption, but the purchase price of a single chip is three times that of domestically produced ones. Moreover, with the escalating trade frictions and unstable supply chains, it would be troublesome if we were to be cut off from supplies one day." Zhang Lei tapped the table. "Brother Cheng, my suggestion is to use domestically produced chips in the short term, and try to develop our own in the long term."
"Develop our own?" Zuo Cheng looked up. "Make AI chips ourselves? Are you confident we can pull that off?"
"Not now," Zhang Lei said. "Fang Ze and I have discussed this. The core of AI chips is architecture design, not manufacturing. We can do the design, and we can handle the tape-out. The key is the architecture, which requires deep coupling between algorithms and hardware, and we happen to have both an algorithm team and a hardware team."
Zuo Cheng remained silent for a while. Developing their own AI chips would be a long and expensive road, but if they didn't, 402's AI business would forever be at the mercy of chip suppliers.
"Let's finalize the short-term plan first," Zuo Cheng said. "For edge AI inference scenarios, we'll use Cambricon's MLU270; although its power consumption is a bit higher, its computing power is sufficient. For federated learning scenarios, we'll use Horizon Robotics' Sunrise 3, which has low power consumption and is suitable for large-scale deployment. For high-end training scenarios, we'll use NVIDIA initially, while simultaneously negotiating a customized solution with Cambricon."
"Okay, I'll follow up on the procurement. By the way, Cambricon said they could provide a batch of engineering samples for us to test, free of charge." Zhang Lei stood up, then remembered something, "Fang Ze said someone at Cambricon wants to see you. They're looking for partners in vertical applications for chip verification, and they're very interested in our IoT platform running AI."
"Let's set a time," Zuo Cheng said.
After Zhang Lei left, Zuo Cheng opened the system panel and flipped to the list of leaves for the AI branch.
Model compression optimization. This leaf's capability was to significantly compress the size of AI models while maintaining accuracy, allowing large models to run on resource-constrained edge devices. If model compression were pushed to the extreme, Cambricon's MLU270 could run models that would otherwise require high-end NVIDIA chips, while also reducing power consumption.
Thinking of this, Zuo Cheng summoned Shen Yiming.
"Yiming, can the adaptive compression ratio you mentioned in your previous paper be combined with the model compression optimization in the system panel?"
Shen Yiming had only been with the company for three days and was still familiarizing himself with the 402 technology stack. Hearing Zuo Cheng's question, he paused for a moment, "Model compression optimization? You mean general model compression technology?"
Zuo Cheng realized he had almost let something slip and quickly corrected himself: "What I meant was, could your adaptive compression ratio algorithm be combined more deeply with mainstream model compression technologies in the industry? For example, knowledge distillation plus quantization plus your adaptive compression, a three-pronged approach."
Shen Yiming thought for a moment, then adjusted his glasses, his eyes lighting up: "Theoretically, it's possible. Knowledge distillation transfers knowledge from a large model to a smaller one, quantization lowers numerical precision, and adaptive compression dynamically adjusts communication and computation. Simply stacking the three, the compression ratio could potentially exceed fifty times. But if the three paths are deeply coupled, it can go even further, because the losses from quantization and distillation can be compensated for in adaptive compression."
"Fifty times?" Zuo Cheng's heart raced.
"A conservative estimate." Shen Yiming picked up the whiteboard marker from Zuo Cheng's desk and drew a flowchart on the small whiteboard next to him. "You see, the traditional approach is a three-step sequential process: distillation, quantization, and compression. The error from each step accumulates. But if we run the three steps together, making the distillation loss function include quantization constraints and using adaptive compression to search the quantization parameter space, the errors won't accumulate; instead, they can compensate for each other."
He wrote several formulas on the whiteboard. Although the writing was messy, the logic was clear.
"However, this requires deep coupling of code in three directions, which is a considerable amount of work." Shen Yiming put down his pen. "It will take at least three people three months."
"You do it," Zuo Cheng said. "I'll provide you with whatever resources you need. You and Ma Hao will work together on the algorithm, and Fang Ze's hardware team will provide support for the engineering. Give me a technical solution within two weeks."
Shen Yiming took a deep breath: "Two weeks is a bit tight, but we can try."
Zuo Cheng patted him on the shoulder: "It's not about trying, it's about doing."
Shen Yiming paused for a moment, then nodded vigorously.
That afternoon, Zuo Cheng checked the passive effects of the tech tree on the system panel. After the AI branch was activated, the technology boost of all fused leaves increased from 1.2 times to 1.25 times. In other words, when Shen Yiming's model compression scheme was implemented in 402, its actual efficiency would be higher than theoretically expected.
But he couldn't tell Shen Yiming this.
Zuo Cheng closed the system panel, picked up his phone, and dialed Yu Ying's number.
"Kongkong, are you free tonight?"
"Yes, what's wrong?"
"I'd like to invite you to a good show," Zuo Cheng smiled. "People from Cambricon are coming to the company tomorrow to discuss cooperation. Would you like to come and sit in? They're very interested in edge AI federated learning, which is a perfect fit for your research."
Yu Ying was silent for two seconds on the other end of the phone: "You want me to be a technical consultant, right?"
"I mainly miss you. Being a consultant is just a side thing."
"Hmph." Yu Ying laughed. "Alright, what time?"
"At 10 a.m. I'll have Han Lu send you the location."
After hanging up the phone, Zuo Cheng leaned back in his chair and looked out the window. As dusk fell, the streetlights in the science park gradually came on. Office 402 was brightly lit. Shen Yiming was engrossed in writing a proposal at his workstation, Fang Ze was testing chip power consumption in the lab, and Chen Hao was adjusting computing power allocation in the server room.
402's journey with AI chips began with selecting suppliers and would proceed step by step. But Zuo Cheng knew this was only the first step. The real trump card was model compression optimization combined with federated learning. Once these two technologies were properly integrated, 402 would be able to achieve high-end chip performance using domestically produced mid-range chips. At that point, chips would no longer be a bottleneck; they would be 402's cost advantage.
That's the real moat.