On Deepseek

Andrew Bolster

Senior R&D Manager (Data Science) at Black Duck Software and Treasurer @ Bsides Belfast and NI OpenGovernment Network

"I can't believe ChatGPT lost its job to AI"

Note: The continuing adventures of ‘a dozen people asked what I thought about a new AI model in work so I put them together and republished it a few months later when I got a quiet weekend’…

So, DeepSeek stripped billions from the market on Monday. Do we care?

My 2c is that this is a fantastic series of innovations on the core design of LLMs, and based on those innovations, I wouldn't be surprised if the quoted training costs of mid-to-high single-digit millions of dollars are about the right order of magnitude (assuming you already had the in-house team expertise of a PhD-fuelled quant hedge fund and didn't pay them SV salaries).

Yes, it is hilarious that ChatGPT lost its job to AI.

No, I have no sympathy for OpenAI claiming that DeepSeek used their model outputs for training; ask the New York Times how they feel about OpenAI's 'hunger'.

No, it shouldn't be surprising that Intellectual Property, Salaries, or Import Regimes are 'relaxed' in China; that's 40 years of globalisation policies coming home to roost.

No, China isn't taking over the world or hiding illicit H100s; it's genuinely a clever architectural twist on the training and inference paradigms (I'm particularly entertained by the FP8 linearisation / quantisation coupled with the paired-token inference and the reduced indexing implications of that; see here for that story, and the short sketch after this list).

No, NVIDIA isn't 'doomed', but Sam might be a little warm under the collar.

Yes, Apple called it with the 'don't get in a model arms race'.

Yes, Meta is probably paying 10-50x the 'proposed' training costs just for the salaries of a couple of the leads on their team, but hey, that's capitalism.
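
For the curious, the FP8 part of that twist is easier to see in toy form. The sketch below is plain numpy and emphatically not DeepSeek's kernels (which, as I understand it, pair fine-grained FP8 scaling with higher-precision accumulation); it approximates the idea with per-block int8 instead, which is enough to show why giving each small block of values its own scale lets a narrow 8-bit format cover tensors whose entries span several orders of magnitude.

```python
import numpy as np

# A toy sketch of block-wise low-precision quantisation, in the spirit of (but
# much simpler than) DeepSeek-V3's fine-grained FP8 scheme. Real FP8 (E4M3)
# keeps a floating-point exponent per value; here each 128-value block is
# mapped to int8 with its own scale.

BLOCK = 128

def quantise_blockwise(x: np.ndarray):
    """Quantise a 1-D float32 tensor to int8 with one scale per block."""
    pad = (-x.size) % BLOCK
    blocks = np.pad(x, (0, pad)).reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                       # avoid divide-by-zero
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantise(q: np.ndarray, scales: np.ndarray, n: int):
    """Reconstruct an approximation of the original tensor."""
    return (q.astype(np.float32) * scales).reshape(-1)[:n]

# Values spanning several orders of magnitude, as activations often do.
x = (np.random.randn(1_000) * np.logspace(-2, 2, 1_000)).astype(np.float32)
q, s = quantise_blockwise(x)
err = np.abs(x - dequantise(q, s, x.size))
print(f"max abs error: {err.max():.3f}  mean abs error: {err.mean():.4f}")
```

One scale per whole tensor would have to stretch across both the tiny and the huge values at once; per-block scaling is the cheap trick that keeps the quantisation error proportional to each block's own magnitude.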

On the 'open source'-ness of all this: the release of DeepSeek undeniably showcases the immense potential of open-source LLM innovation (somewhat ironically following in the footsteps of "Open"AI releasing the GPT-2 weights in the open back in 2019, which arguably kicked off this whole mess). By making such a powerful model available under an MIT license, it not only democratizes access to cutting-edge technology but also fosters innovation and collaboration across the global AI community.

DeepSeek's (rumoured) use of OpenAI-originated "Chain of Thought" data for its initial training (which I commented on before) highlights the importance of transparency and shared resources in advancing AI. In the context of 'Open Source AI', it's crucial that the underlying training and evaluation data are open, not just the architecture, the evaluations, and the resultant model weights. DeepSeek's achievement in AI efficiency (leveraging a clever Reinforcement Learning-based multi-stage training approach, rather than the current trend of using ever larger datasets for ever bigger models) signals a future where AI is accessible beyond the billionaire classes. Open-source AI, with its transparency and collective development, often outpaces closed-source alternatives in terms of adaptability and trust.
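
On that RL-based training: the published piece of it is GRPO (group relative policy optimisation, described in DeepSeek's own papers), and the core trick is small enough to sketch. The toy below is not their implementation, just the group-relative advantage idea: sample several completions per prompt, score them with a (here hypothetical) rule-based reward, and standardise each reward against its own group, so no separate value/critic model is needed. Those advantages then weight a clipped, PPO-style policy-gradient update.

```python
import numpy as np

# A toy sketch of the group-relative advantage idea behind GRPO, not
# DeepSeek's implementation. Each completion's advantage is its reward
# standardised against the other completions sampled for the same prompt.

def group_relative_advantages(rewards_per_prompt):
    """rewards_per_prompt: one 1-D array of completion rewards per prompt."""
    advantages = []
    for rewards in rewards_per_prompt:
        r = np.asarray(rewards, dtype=np.float64)
        advantages.append((r - r.mean()) / (r.std() + 1e-8))
    return advantages

# Example: two prompts, four sampled completions each, scored by some
# hypothetical rule-based reward (e.g. "did the generated code pass its tests").
rewards = [np.array([1.0, 0.0, 0.0, 1.0]), np.array([0.0, 0.0, 0.0, 1.0])]
for adv in group_relative_advantages(rewards):
    print(np.round(adv, 3))
```

The appeal is economic as much as algorithmic: dropping the critic model roughly halves the memory and compute of the RL stage, which is very much in keeping with the rest of the efficiency story.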

Note: This is a little ‘inside baseball’ but retained for posterity.

Naturally, several people have been asking me when we're putting DeepSeek on the LLM Gateway, and the short answer is no, we're not deploying any self-hosted / open models at the moment. (We did later put it on the internal LLM Gateway once it was supported in Azure AI Foundry as a self-service model.)

As for running it locally with ollama, that's totally fine under our LLM Guidelines, and DeepSeek-Coder is one of the more performant models on my 32GB 2021 M1 Pro, so go have fun (a minimal local example is sketched below); but no, we've no current plans to self-host a model through the LLM Gateway.
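
If you do want to poke at it locally, this is roughly all it takes (stdlib only), assuming ollama is installed and serving on its default port, and the deepseek-coder tag has already been pulled; the endpoint and payload shape are ollama's standard generate API, and the prompt is just an illustration:

```python
import json
import urllib.request

# A minimal sketch of querying a locally-running ollama instance. Assumes
# `ollama pull deepseek-coder` has already been run and the server is
# listening on its default port (11434).
payload = {
    "model": "deepseek-coder",
    "prompt": "Write a Python function that reverses a linked list.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```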

As a related side note, to me this all shows that while innovations are still happening in the foundational and reasoning modelling space, the best place we can invest our time is in domain-specific data collection, representation, and modelling to make the most of these natural language systems, and that's where the Data Science group are spending the majority of our resources.

Published: March 16 2025