On January 2, 2025, the Provisions Pertaining to U.S. Investments in Certain National Security Technologies and Products in Countries of Concern (the so-called Outbound Investment Program, or “OIP”) became effective.[1] Under the rule, U.S. persons are prohibited from engaging in transactions concerning the development of any AI system trained using a quantity of computing power greater than 10^25 computational operations (e.g., integer or floating-point operations), or greater than 10^24 computational operations where the training uses biological sequence data. Further, a U.S. person must file a notification with the Department of the Treasury for any transaction concerning the development of an AI system trained using a quantity of computing power greater than 10^23 computational operations.
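The three thresholds can be summarized in a short sketch. This is merely an illustration of the figures quoted above, a simplification of the rule rather than a compliance tool or legal advice:

```python
# Illustrative sketch of the OIP's computing-power thresholds as quoted above.
# Not a legal compliance tool; the rule contains many other conditions.

PROHIBITED_GENERAL = 1e25  # training compute (integer or floating-point ops)
PROHIBITED_BIO = 1e24      # training compute using biological sequence data
NOTIFIABLE = 1e23          # triggers a notification filing with Treasury

def classify_transaction(training_ops: float, uses_bio_data: bool = False) -> str:
    """Classify a transaction by the AI system's total training compute."""
    if training_ops > PROHIBITED_GENERAL:
        return "prohibited"
    if uses_bio_data and training_ops > PROHIBITED_BIO:
        return "prohibited"
    if training_ops > NOTIFIABLE:
        return "notifiable"
    return "outside scope"

print(classify_transaction(3.7e25))                       # prohibited
print(classify_transaction(2.4e24))                       # notifiable
print(classify_transaction(2.4e24, uses_bio_data=True))   # prohibited
print(classify_transaction(1e19))                         # outside scope
```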
The Treasury Department’s outbound investment restrictions have cast a long shadow over U.S. investment in China’s AI sector, creating a widespread reluctance that exceeds the actual scope of the regulations. While many investors have retreated entirely from Chinese AI opportunities, treating the rules as an effective blanket ban, a careful analysis of the regulatory framework reveals well-defined parameters that still permit significant investment activity. The disconnect between the rules’ actual requirements and their perceived severity has created an unintended “chilling effect,” where investors’ overcautious interpretation has led to a self-imposed restriction. This gap between perception and reality suggests that opportunities exist for investors who take the time to understand the technical specifications and compliance requirements, rather than following the market’s overly conservative reaction.
This article aims to carefully explain the widely misconstrued “computing power” thresholds, and argues that U.S. funds should continue to invest in valuable Chinese AI companies without violating the OIP.
Rule-Making History Regarding Computing Power
The Treasury Department’s decision to define prohibited and notifiable transactions using precise technical parameters like computing power thresholds marks a striking departure from traditional financial and investment regulations, which typically rely on broader, more qualitative criteria (like CFIUS review). During the rulemaking process, various stakeholders proposed alternative approaches, such as using end-use restrictions or focusing on specific AI applications. However, despite extensive dialogue with industry experts and careful consideration of different frameworks, Treasury ultimately opted for clear-cut technical specifications. This choice of concrete metrics like 10^25 computing power thresholds, while unusual for a financial regulator, reflects Treasury’s determination to create objective, measurable standards.
The Treasury Department collected several notable pieces of feedback regarding computing power during the rule-making process; a summary is set out below:
One commenter requested the Treasury Department clarify that “the computing power thresholds involving an AI system pertain to the combined computing power required to train a given AI system, including computing power used to train relevant sub-models or generate inputs to inform such an AI system.” The Treasury clarified that “the computing power thresholds refer to the aggregate or combined computing power required to train a given AI system…” and “developing an AI model based on the transfer of knowledge from one model to another would include the computing power required to train both models.” The Treasury further clarified that “different versions of an AI system, including adaptations, derivatives, subsequent generations, or successor systems, should be assessed as distinct AI systems since the designed end-use or capabilities of a successor system could vary from a prior version.”
Several commenters expressed concerns about using computing power (e.g., total floating-point operations) to assess risks. Regarding the 10^23 computational operations threshold, the Treasury explained that this threshold was selected “based on the current number of publicly known AI models originating from the PRC.” The Treasury Department reserves the right to adjust and improve benchmarks for evaluating AI capabilities.
With respect to the 10^24 technical threshold concerning bio-sequence data, one commenter cited the “inconclusive relationship between AI training compute and bio-related risks, the distinct characteristics and open-source nature of life sciences research, and the value of a notification regime towards better understanding this sub-category of AI models.” The Treasury Department insisted upon using computing power as the technical threshold despite such commentary.
One commenter recommended that the Treasury Department add a prohibition focused on the computing clusters required to train frontier AI systems. The commenter provided specific recommendations for technical criteria related to such clusters, including networking of over 100 Gbit/s and a calculation of theoretical maximum computing capacity. The Treasury tacitly rejected the proposal, merely noting that the suggestion “aligns conceptually with a reporting requirement for AI clusters.” Note that using computing clusters as a technical criterion appears in another executive order issued by President Biden relating to domestic AI companies’ reporting (see fn. 3).
One commenter recommended that the Treasury Department publish a list of AI applications “authorized” for investment regardless of computing power. In response, the Treasury Department stated that “there are no restrictions on outbound investment involving AI applications that do not meet the relevant definitions and thresholds set forth in the Final Rule, even if there is not a definitive list of such applications.”
Based on the rule-making history outlined above, it is clear that the Treasury Department had the opportunity to consider other parameters or thresholds, and was informed that computing power might be an inconclusive factor for measuring an AI system’s performance, before deciding to use computing power as the key metric for measuring the risks posed by controlled AI systems. Although the Treasury agreed to periodically assess, together with other agencies, whether the quantity of computing power remains effective in addressing threats to U.S. national security, the author learned from an anonymous former U.S. government official that once technical parameters are set out in a rule, they are very difficult to change at a later point, given the complexities of the rule-making process.
Why Computing Power Might Not Be A Good Metric
When the OIP rule was at its rule-making stage, some industry experts resisted the idea of using computing power as a metric to measure an AI system’s performance. In fact, reducing computing power while achieving optimal performance in large language models (“LLMs”) is a critical area of research aimed at addressing the escalating computational demands of these advanced AI systems. In this section, I shall explain why the Treasury Department chose computing power as a technical criterion in the OIP, and opine on how model developers can utilize different methods to bring their models beneath that threshold.
Scaling Law
The theoretical basis for the U.S. government’s choice of computing power as a regulatory threshold is rooted in the Scaling Law – an empirical law, akin to Moore’s Law in the semiconductor world, that relates computing power, model size, data size, and model performance. The Scaling Law suggests that larger models trained on more extensive datasets generally exhibit improved performance.
The most influential paper on computing power and model performance might be DeepMind’s 2022 paper entitled “Training Compute-Optimal Large Language Models” by Jordan Hoffmann, Sebastian Borgeaud, and Arthur Mensch.[2] The paper investigated the optimal model size and number of tokens for training a transformer language model under a given compute budget, and found that “current large language models are significantly undertrained.” The paper concluded that as the compute budget increases, model size and the amount of training data should be increased in approximately equal proportions. The Hoffmann paper has since become a classic piece of research supporting the Scaling Law. As a result, one heuristic for estimating the FLOPs used to train a model is as follows:
FLOPs = 6ND (where N is parameters, and D is tokens)
In the example of the Llama 3 400B model, which has 400B parameters and was trained on 15.6T tokens, its computing power measured in FLOPs is approximately 3.7 × 10^25.[3]
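The 6ND heuristic is simple enough to run as a one-liner; the sketch below reproduces the Llama 3 400B figure quoted above:

```python
# The 6ND heuristic: training FLOPs ≈ 6 × parameters × training tokens.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

# Llama 3 400B example from the text: 400B parameters, 15.6T tokens.
flops = training_flops(400e9, 15.6e12)
print(f"{flops:.2e}")  # 3.74e+25 — roughly the ~10^25-scale figure cited above
```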
Rebuttals to Use Computing Power as A Threshold
While the Scaling Law suggests that larger models trained on more extensive datasets generally exhibit improved performance, the relationship is not necessarily linear. This leaves room to improve a model’s performance by focusing on high-quality data and efficient training techniques that maximize resource utilization and reduce the computing power needed to train the model.
Ingrid Stevens published on May 1, 2024 an article entitled “Regulating AI: the Limits of FLOPs as a Metric,” which beautifully laid out why she believed using computing power (e.g., FLOPs) is not a good regulatory measure and fails to encapsulate the complexity and potential risks associated with AI models.[4] Stevens’ first argument is that using computing power as a regulatory proxy overlooks other critical factors that influence a model’s performance, such as the quality of training data. One can obtain a very capable model by spending fewer FLOPs training on very good data. Further, one can improve a trained model by fine-tuning it and combining it with an external knowledge base (so-called “Retrieval-Augmented Generation,” or RAG). As a result, Stevens believes that very strong LLMs can be developed without fitting exactly into the regulatory math (i.e., the 10^23 or 10^25 thresholds).
Stevens is right that there are indeed many other techniques for reducing computing power while reaching the same level of model performance, such as using high-quality data, model compression, quantization, and innovative architectural solutions. At the very least, to build a model for a specific domain or task (in lieu of a ChatGPT-type LLM), you can start by designing a model with fewer parameters and use high-quality data to train it, which is achievable with far less computing power. If you are interested in developing a foundational LLM, however, there are many techniques you can employ to combat computing power constraints:
Using high-quality data. Having a smaller but high-quality dataset can sometimes be more effective than a larger, lower-quality one. High-quality data helps in better generalization, where the model learns to apply learned patterns to new, unseen data more effectively. This can lead to a reduction in the need for extensive training data, thus saving compute resources. One example of high-quality data is synthesized data. In fact, while the author was in Silicon Valley last year, one of the most hotly discussed topics was using synthesized data to pretrain models to achieve better performance.
Data utilization. Relatedly, efficient data utilization is essential to mitigate the extensive power consumption and lengthy training times associated with large datasets. Strategies such as data filtering can enhance training focus by directing attention toward more informative samples while discarding less useful data. Also note that this technique is particularly useful at the inference stage.[5]
Model compression. Model compression involves techniques that aim to reduce the size and computational requirements of models without severely impacting their performance. This includes methods such as pruning, where less significant weights are removed from the model. However, under the Treasury Department’s interpretation, knowledge distillation, which transfers knowledge from a larger, more complex model to a smaller, more efficient one, would not be an effective way to reduce regulatory compute, as the larger model’s computing power will be included in the calculation.
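The aggregation point can be made concrete with a small sketch. Under the Treasury Department’s interpretation described above, a distilled model inherits its teacher’s training compute; the figures below are hypothetical illustrations:

```python
# Under Treasury's aggregation interpretation, a model distilled from a
# larger teacher is attributed the teacher's training compute as well.
def effective_training_ops(own_ops: float, teacher_ops: float = 0.0) -> float:
    """Total compute attributed to a model, including any teacher model."""
    return own_ops + teacher_ops

student_only = 5e22   # hypothetical compute to train a small student model
teacher = 2e25        # hypothetical compute used to train the teacher

print(effective_training_ops(student_only))           # 5e+22: below 1e23 alone
print(effective_training_ops(student_only, teacher))  # ~2e+25: over the 1e25 bar
```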
Sparse Matrix LLM. A sparse matrix LLM refers to a model where its weights are represented using a sparse matrix instead of a traditional dense matrix. In a sparse matrix, most of the elements are zeroed out or irrelevant, while only a small percentage of the elements are non-zero and contain significant values. This reduces the memory footprint and computational overhead.[6]
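A minimal sketch of the storage idea, using a toy coordinate-format representation (this is an illustration of sparse storage in general, not of any particular LLM architecture):

```python
# Toy sketch of sparse weight storage: keep only the nonzero entries
# (coordinate format) instead of the full dense matrix.
def to_sparse(dense):
    """Return [(row, col, value)] for the nonzero entries of a 2-D list."""
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0.0]

dense = [
    [0.0, 0.0, 1.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, -2.0, 0.0, 0.0],
]
sparse = to_sparse(dense)
print(sparse)                 # [(0, 2, 1.5), (2, 1, -2.0)]
print(len(sparse), "of", 12)  # only 2 of 12 entries need to be stored
```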
Quantization. Quantization involves reducing the precision of the parameters used in a model, which can significantly decrease the amount of memory and computing power needed. For instance, a model that uses 64-bit precision for each parameter can be converted to 16-bit or even 8-bit precision. The most aggressive work I have seen uses a transformer architecture with just three possible values for each parameter (i.e., log₂ 3 ≈ 1.58 bits). The authors of that research reported that their architecture “sav[ed] 71.4 times the arithmetic operations energy for matrix multiplication compared to the Llama baseline.”[7] Additionally, one can also use lower-precision calculation formats such as INT8 or INT4 in lieu of the traditional FP32 or FP16 formats.[8]
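As a rough illustration of the idea, here is a toy symmetric int8 scheme: float weights are mapped into [-127, 127] with a single scale factor and can later be approximately reconstructed. This is a pedagogical sketch, not any production quantization method:

```python
# Toy symmetric int8 quantization: map float weights into [-127, 127]
# with one scale, then dequantize to approximate the originals.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)  # small integers storable in 8 bits instead of 32/64-bit floats
print(max(abs(a - b) for a, b in zip(w, w_hat)))  # error bounded by scale/2
```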
Efficient Use of Memory Hierarchy. Another optimization approach involves restructuring computation orders to better leverage the memory hierarchy of hardware, particularly GPUs. A model that has a large memory requirement will naturally be slower than one with a small memory need since data has to travel a longer distance to reach the computation block of the system. By fusing computations across layers, models can minimize the frequency of memory access, resulting in better performance. This technique not only accelerates computation but also reduces the overall workload on memory bandwidth, which is critical for improving the efficiency of LLMs.[9]
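The fusion idea can be shown with a deliberately tiny analogy: computing two elementwise “layers” in one pass instead of materializing an intermediate buffer between them. Real kernel fusion happens on GPUs, but the memory-traffic intuition is the same:

```python
# Toy analogy for operator fusion: instead of a full pass per "layer" with a
# materialized intermediate, fuse both steps into a single pass per element.
def unfused(xs):
    doubled = [x * 2 for x in xs]    # pass 1: intermediate fully stored
    return [d + 1 for d in doubled]  # pass 2: second full traversal

def fused(xs):
    return [x * 2 + 1 for x in xs]   # one pass, no intermediate buffer

xs = list(range(5))
assert unfused(xs) == fused(xs)      # identical result, fewer memory touches
print(fused(xs))  # [1, 3, 5, 7, 9]
```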
This list of computing power-reducing techniques is far from exhaustive. By employing these techniques, developers can significantly reduce the computing power required for large language models, enabling broader adoption and more efficient use of resources in various applications. Most importantly, they would allow a Chinese startup to develop an LLM that falls below the OIP’s thresholds.
Examples: DeepSeek V3 Model and MiniCPM
We now provide two exemplar models developed by Chinese companies and teams that demonstrate strong capabilities while falling under the prohibited- and notifiable-transaction thresholds. The core question is whether a U.S. person can invest in extremely promising Chinese AI companies like these two without violating the OIP’s computing power thresholds.
DeepSeek V3
The first example is DeepSeek. On December 27, 2024, DeepSeek released its widely praised V3 model, whose engineering efficiency was publicly lauded by Andrej Karpathy, a founding member of OpenAI.
DeepSeek-V3 is a Mixture-of-Experts (“MoE”) language model with 671B total parameters, pretrained on 14.8 trillion diverse and high-quality tokens, followed by supervised fine-tuning and reinforcement learning. DeepSeek-V3 uses low-precision training, i.e., FP8 mixed-precision training, and validates its effectiveness on an extremely large-scale model. DeepSeek also designed its own algorithm for efficient pipeline parallelism, developed efficient communication kernels to fully utilize connection bandwidths, and optimized the memory footprint. DeepSeek-V3 required only 2.788M H800 GPU hours for its full training, at a rough cost of only USD 5.576 million (assuming a USD 2 per GPU hour rental cost).[10] To recap, DeepSeek utilized many of the techniques outlined above to reduce the computing power required to train such a large-scale model.
DeepSeek-V3’s technical report did not detail how much computing power was used in training the model. Let’s do a simple calculation:[11]
If we consider the peak performance of the H800 in FP32 (59.3 × 10^12 FLOPS),[12] and knowing that FP8 throughput can quadruple that of FP32, the H800’s theoretical FP8 performance would be roughly 4 × 59.3 × 10^12 FLOPS per GPU = 2.37 × 10^14 FLOPS.
Total operations across all GPUs would be 2.37 × 10^14 FLOPS × 3,600 seconds/hour × 2,788,000 GPU hours ≈ 2.38 × 10^24 FLOPs.
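This back-of-the-envelope arithmetic can be reproduced in a few lines. The FP8-quadruples-FP32 multiplier is the assumption made above, not a measured figure:

```python
# Back-of-the-envelope reproduction of the DeepSeek-V3 estimate above.
H800_FP32_FLOPS = 59.3e12  # peak FP32 throughput per GPU (FLOPS)
FP8_MULTIPLIER = 4         # assumption: FP8 throughput ≈ 4 × FP32
GPU_HOURS = 2_788_000      # reported H800 GPU hours for full training

fp8_flops = H800_FP32_FLOPS * FP8_MULTIPLIER  # ≈ 2.37e14 FLOPS per GPU
total_ops = fp8_flops * 3600 * GPU_HOURS      # seconds/hour × GPU hours
print(f"{total_ops:.2e}")  # 2.38e+24 — below the 1e25 prohibition threshold
```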
What makes DeepSeek’s approach particularly innovative is their holistic optimization strategy. Unlike companies that focus on a single optimization technique, DeepSeek has combined multiple approaches - from low-precision training to efficient pipeline parallelism and optimized memory footprint - creating a synergistic effect that dramatically reduces computing power requirements while maintaining model performance. This comprehensive approach to efficiency demonstrates that Chinese AI companies can achieve state-of-the-art results while staying well within regulatory thresholds.
Because the computing power required to train DeepSeek V3 is less than the regulatory threshold for prohibited transactions, a U.S. investor will be able to invest in DeepSeek – one of the most promising AI companies in China – with a filing to the Treasury Department. Note that the OIP does not specify whether the Treasury Department will selectively block certain notifiable transactions in the way the CFIUS review process does – for now, the requirements surrounding notifiable transactions appear to be procedural rather than calling for substantive review.
MiniCPM
MiniCPM is another Chinese model, developed by researchers from Tsinghua University and ModelBest. MiniCPM is a series of multimodal language models that use far fewer parameters than state-of-the-art LLMs. For example, its MiniCPM-Llama3-V2.5 model has achieved outstanding results on a comprehensive evaluation suite covering 11 popular multimodal benchmarks.[13] Similarly, MiniCPM efficiently employs an array of optimizations to enable deployment on resource-constrained devices, ranging from quantization to integration with Qualcomm’s QNN framework to accelerate deployment on mobile platforms.[14]
MiniCPM represents an important direction for Chinese startups (frankly, all startups) that do not have the resources to compete with industry giants in procuring massive computational resources: developing efficient small language models (“SLMs”) for a specific domain or purpose. For example, MiniCPM-V 2.8B is a strong multimodal model for efficient end-side deployment that achieves state-of-the-art performance. Using the over-inclusive 6ND formula for calculating computing power, MiniCPM’s computing power is roughly on the scale of 10^19, far smaller than the threshold for notifiable transactions.[15]
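As a hedged illustration of how an SLM clears the notification bar, the 6ND heuristic can be applied to small-model figures. The parameter and token counts below are hypothetical placeholders, not MiniCPM’s actual configuration; see its technical report for the real values:

```python
# Hedged illustration of the 6ND heuristic for a small language model.
# The parameter/token figures are hypothetical, not MiniCPM's actual config.
def training_ops_6nd(params: float, tokens: float) -> float:
    return 6 * params * tokens

NOTIFIABLE_THRESHOLD = 1e23

ops = training_ops_6nd(2.4e9, 1.1e12)  # e.g., a ~2.4B-param SLM, ~1.1T tokens
print(f"{ops:.1e}")                    # 1.6e+22
print(ops < NOTIFIABLE_THRESHOLD)      # True: below the notification threshold
```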
As a result, a U.S. investor will be able to invest in ModelBest without even needing to file a notification to the Treasury Department.
Can Chinese AI Models Continue to Attract U.S. Funding?
Despite the unprecedented scope of the Treasury Department’s outbound investment restrictions and the chilling effect on U.S. investment in China’s AI sector, significant opportunities remain for Chinese startups operating below the specified technical thresholds. The regulation’s precise parameters, particularly the computing power threshold of 10^25 floating-point or integer operations, create a clear pathway for continued U.S. investment in AI companies that focus on commercial applications without reaching these high-performance computing levels. This technological bright line allows Chinese startups to structure their operations and growth strategies to maintain access to U.S. capital markets, while investors who understand the technical specifications can confidently navigate these new boundaries. In my view, rather than representing an insurmountable barrier, the regulations effectively create a framework within which U.S.-China AI investment can continue, albeit with more defined technological limitations.
Further, opportunities exist for investment in certain Chinese AI startups. To recap, the OIP is heavily focused on training.[16] If a startup only utilizes foundational models at the inference stage, it is likely outside of the OIP’s remit, because inference typically requires far less computing power than training. On the other hand, if a startup customizes, configures, or fine-tunes an existing third-party AI model that exceeds the technical computing power threshold, there is an arguable case that the OIP applies (see fn. 16). Consider the examples below:
Perplexity, a leading U.S. AI-powered search engine, reportedly just closed a USD 500 million funding round at a USD 9 billion valuation. Perplexity does not develop its own proprietary LLMs; rather, it relies on external LLMs (e.g., OpenAI and Anthropic models) and customizes them for its own use. If Perplexity were a Chinese company, an arguable case exists that a U.S. investor could not invest in it, because it fine-tunes existing models that exceed the technical threshold.
POE (Platform for Open Exploration), a product developed by Quora, aims to provide an API that makes it easy for AI developers to plug their models into the platform. As a platform that aggregates models to let users perform inference more easily, a U.S. investor’s investment in POE would likely fall outside the OIP’s remit, had POE been a Chinese company.
Also consider a Chinese healthcare AI startup developing diagnostic tools using computer vision. While the company might utilize existing foundation models for initial development, their focus on specific medical imaging applications and custom datasets means they’re likely to operate well below the computing power thresholds. The company could develop highly sophisticated diagnostic capabilities through transfer learning and domain-specific optimization without approaching the regulatory limits. This type of focused, application-specific AI development represents a clear opportunity for continued U.S. investment in Chinese AI innovation.
Looking Ahead: Adaptation and Innovation
As Chinese AI companies adapt to this new regulatory environment, we’re likely to see several emerging trends. First, companies may increasingly focus on domain-specific AI applications where sophisticated capabilities can be achieved with lower computing requirements. Second, we may see accelerated innovation in model efficiency techniques, with Chinese companies potentially leading global development in areas like model compression and optimization. Third, companies might adopt hybrid approaches, combining smaller, highly efficient proprietary models with selective use of larger third-party models for specific applications.
These adaptations could ultimately drive positive innovation in the AI field, pushing companies to develop more efficient and focused AI solutions. While the OIP sets certain boundaries, it may inadvertently accelerate developments in AI efficiency that benefit the entire industry. Companies that successfully navigate these constraints while maintaining innovation will likely emerge as attractive investment opportunities for U.S. investors seeking exposure to China’s AI sector. And U.S. investors that do not shy away from the world’s second-largest AI market will reap benefits in the long term.
Conclusion
Chinese AI companies should maintain their innovative momentum and technological advancement while carefully navigating the new landscape of U.S. investment restrictions. Rather than reflexively withdrawing from international capital markets due to the Treasury Department’s outbound investment program, these companies would benefit from developing a nuanced understanding of the regulatory parameters. The clear technical thresholds provide a roadmap for structuring operations and development paths that remain open to U.S. investment. By combining continued innovation in proprietary technologies with a sophisticated grasp of compliance requirements, Chinese AI companies can pursue growth strategies that leverage available funding opportunities while respecting regulatory boundaries. This thoughtful approach to development and fundraising will be crucial for companies seeking to maintain their competitive edge in the global AI landscape.
[1] The full set of the rule is available at https://home.treasury.gov/system/files/206/TreasuryDepartmentOutboundInvestmentFinalRuleWEBSITEVERSION_0.pdf
[2] Hoffmann, Borgeaud, and Mensch, “Training Compute-Optimal Large Language Models,” arXiv:2203.15556v1 [cs.CL] 29 May 2022, available at https://arxiv.org/pdf/2203.15556
[3] Note that Llama 3’s computing power is just under the threshold set out in President Biden’s Executive Order on “the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence,” under which the Secretary of Commerce shall require companies developing or demonstrating an intent to develop potential dual-use foundation models to provide the Federal Government with certain information regarding their models or computing clusters if (1) their model was trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and a quantity of computing power greater than 10^23 integer or floating-point operations; or (2) their computing cluster has a set of machines physically collocated in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and a theoretical maximum computing capacity of 10^20 integer or floating-point operations per second for training AI. See https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
[4] https://medium.com/@ingridwickstevens/regulating-ai-the-limits-of-flops-as-a-metric-41e3b12d5d0c
[5] https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/
[6] https://roundtable.datascience.salon/optimizing-large-language-models-techniques-and-future-directions-for-efficiency
[7] https://huggingface.co/blog/1_58_llm_extreme_quantization
[8] I would like to thank my husband for explaining this point to me.
[9] https://roundtable.datascience.salon/optimizing-large-language-models-techniques-and-future-directions-for-efficiency
[10] https://arxiv.org/pdf/2412.19437
[11] It’s important to note that calculating actual computing power usage involves several variables and potential uncertainties. While our calculation based on GPU specifications provides a reasonable approximation, actual computing power consumption may vary due to factors such as: (1) Hardware utilization efficiency: The theoretical peak performance of GPUs may differ from actual utilization rates during training, as factors like memory bandwidth, cooling requirements, and system overhead can affect real-world performance. (2) Training optimization methods: Different implementation choices in distributed training, gradient accumulation, and memory management can lead to variations in actual computing power consumption. (3) Dynamic scaling: Modern AI training often involves dynamic batch sizes and adaptive learning rates, which can cause fluctuations in computing power usage throughout the training process. Therefore, while our estimate on the order of 2.4 × 10^24 computational operations for DeepSeek V3 is based on reasonable assumptions, actual computing power usage might vary within a certain range of this figure. However, even accounting for potential variations, DeepSeek V3’s computing power requirements would still fall well below the OIP’s regulatory threshold.
[12] https://www.techpowerup.com/gpu-specs/h800-sxm5.c3975
[13] https://medium.com/@simeon.emanuilov/minicpm-llama3-v-2-5-review-a-game-changing-open-source-multimodal-language-model-109d2e68989f
[14] Ibid.
[15] Please refer to MiniCPM’s technical report available at https://arxiv.org/pdf/2404.06395 for its parameter information and token size information.
[16] The OIP does not differentiate between pretraining and fine-tuning stages in the training process. In fact, the Treasury Department explicitly explained in the rule that:“[C]ustomizing, configuring, or fine-tuning a third-party AI model or machine-based system strictly for internal, non-commercial use would not itself trigger the prohibition … for covered transaction involving AI systems unless such activity has a government intelligence, mass-surveillance, or military end use, or is for digital forensics tools, penetration testing tools, or the control of robotic system. The effect of this is that a person customizing, configuring, or fine-tuning a third-party AI model or machine-based system strictly for its own internal, non-commercial use for cybersecurity applications, or other end uses or applications not listed… would not implicate a prohibition solely on that basis.”