DeepSeek's products, and the processes used to build them, are the result of the cost-reduction and productivity-improvement orientation that industrial engineering promotes in engineering activities and tasks.
Modern Industrial Engineering - A Book of Online Readings.
365+ Lessons and articles and 100+ Case Studies on Industrial Engineering.
https://www.academia.edu/126612353/Modern_Industrial_Engineering_A_Book_of_Online_Readings
29.1.2025
🎉 DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1. Available on web, app, and API.
On 29 November 2023, DeepSeek released the DeepSeek-LLM series of models, with 7B and 67B parameters in both Base and Chat forms. It was developed to compete with other LLMs available at the time. The paper claimed benchmark results higher than most open-source LLMs at the time, especially Llama 2. Like DeepSeek Coder, the code for the model was released under the MIT license, while the model itself was under the DeepSeek license.
7 May 2024
DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.
It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
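The KV-cache compression behind MLA can be sketched, very roughly, as caching a single low-rank latent per token and reconstructing keys and values from it on demand. This is a minimal illustration only: all dimensions below are made up, and the sketch omits RoPE handling and the query-side compression of the real MLA design described in the DeepSeek-V2 paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 1024       # hidden size (illustrative)
n_heads = 8
d_head = 128         # per-head dim; full K and V are n_heads * d_head each
d_latent = 64        # compressed latent dim, much smaller than 2 * n_heads * d_head

# Down-projection: compress each token's hidden state into one small latent.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
# Up-projections: rebuild keys and values from the cached latent.
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)

seq_len = 512
h = rng.standard_normal((seq_len, d_model))  # token hidden states

# During generation only the latent is cached...
latent_cache = h @ W_down                    # (seq_len, d_latent)
# ...and keys/values are reconstructed from it when attention is computed.
K = latent_cache @ W_up_k                    # (seq_len, n_heads * d_head)
V = latent_cache @ W_up_v

full_cache_floats = seq_len * 2 * n_heads * d_head   # standard KV cache
mla_cache_floats = latent_cache.size                 # latent-only cache
print(f"cache reduction: {1 - mla_cache_floats / full_cache_floats:.1%}")
# → cache reduction: 96.9%
```

With these toy dimensions the latent cache is about 3% of a full KV cache, which is the same kind of saving the paper reports (a 93.3% KV-cache reduction) at DeepSeek-V2's real scale.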
https://arxiv.org/abs/2405.04434
GitHub page on DeepSeek-V2
https://github.com/deepseek-ai/DeepSeek-V2
Model Architecture for Lower Cost
DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference:
MLA (Multi-head Latent Attention), which utilizes low-rank key-value union compression to eliminate the bottleneck of inference-time key-value cache, thus supporting efficient inference.
For Feed-Forward Networks (FFNs), we adopt DeepSeekMoE architecture, a high-performance MoE architecture that enables training stronger models at lower costs.
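The sparse-computation idea behind the MoE FFN can be sketched as top-k expert routing: each token is scored by a router and only its top-k experts run, so only a fraction of the FFN parameters are active per token. This is a simplified illustration with invented sizes; it does not reproduce DeepSeekMoE's fine-grained and shared-expert design or its load-balancing losses.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
n_experts, top_k = 8, 2          # each token activates only top_k of n_experts

# Each expert is a small two-layer FFN; a router scores experts per token.
W1 = rng.standard_normal((n_experts, d_model, d_ff)) / np.sqrt(d_model)
W2 = rng.standard_normal((n_experts, d_ff, d_model)) / np.sqrt(d_ff)
W_router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_ffn(x):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ W_router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # chosen expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        gates = np.exp(scores - scores.max())
        gates /= gates.sum()                         # softmax over chosen experts
        for gate, e in zip(gates, top[t]):
            hidden = np.maximum(x[t] @ W1[e], 0.0)   # ReLU FFN for expert e
            out[t] += gate * (hidden @ W2[e])
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_ffn(tokens)
# Only top_k / n_experts of the expert parameters run per token.
print(f"active expert fraction per token: {top_k / n_experts:.0%}")
# → active expert fraction per token: 25%
```

The same mechanism, at scale, is how DeepSeek-V2 activates only 21B of its 236B parameters for each token.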
In December 2024, they released a base model DeepSeek-V3-Base and a chat model DeepSeek-V3. The model architecture is essentially the same as V2.
26.12.2024
🚀 Introducing DeepSeek-V3
Biggest leap forward yet
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers
https://api-docs.deepseek.com/news/news1226
Cost Reduction of DeepSeek Products
The Steps
[D] How exactly did DeepSeek-R1 achieve massive training cost reductions? Most posts I read are about its performance, RL, chain of thought, etc., but it's not clear how the cost of training the model was brought down so drastically.
DeepSeek’s AI Cuts $95M in Costs and 98% of GPUs
https://www.linkedin.com/pulse/deepseeks-ai-cuts-95m-costs-98-gpusthe-disruption-big-tech-lfhof/
DeepSeek’s Optimization Strategy: Redefining AI Cost and Efficiency.
Posted on Jan 29
By focusing on cost reduction, open-source collaboration, and efficient model architectures, DeepSeek is redefining what’s possible in AI—democratizing access and challenging the status quo.
As AI continues to evolve, one thing is clear: the future belongs to those who can do more with less. And DeepSeek is leading the way.
Technical Report
Analysis of Efficiency Enhancement through DEEPSEEK Technology and DIKWP Semantic Space Transformation Interaction
January 2025
DOI:10.13140/RG.2.2.29761.67684
Authors:
Yucong Duan, Hainan University
Zhendong Guo, Hainan University
6.2.2025
Event - Deploying DeepSeek V3 and DeepSeek-R1 on Amazon SageMaker
Speakers:
Supreeth S Angadi | GenAI/ML Startups Solution Architect, AWS,
Pradipta Dash | Senior Startups Solutions Architect, AWS,
Sourabh Jain | Sr. GenAI Startups Account Manager, AWS
Language:
English
Address:
Bagmane Constellation Business Park Block-7, Bagmane Constellation Service Rd, Ferns City, Doddanekkundi, Bengaluru, Karnataka 560048, IN
Event details
Day Thursday, February 6, 2025
Time 10:00 AM - 4:00 PM India Time
Type IN PERSON
Are you a startup founder or machine learning (ML) engineer looking to effectively deploy and manage AI models while optimizing costs?
Join us for an intensive hands-on workshop exploring Amazon SageMaker Studio's unified ML development environment and learn production-ready strategies for model deployment.
DeepSeek is a cutting-edge family of large language models that has gained significant attention in the AI community for its impressive performance, cost-effectiveness, and open-source nature. DeepSeek offers a range of models including the powerful DeepSeek-V3, the reasoning-focused DeepSeek-R1, and various distilled versions. These models stand out for their innovative architecture, utilizing techniques like Mixture-of-Experts and Multi-Head Latent Attention to achieve high performance with lower computational requirements.
In this hands-on workshop, you'll learn about Amazon SageMaker Studio's comprehensive toolkit to self-host large language models from DeepSeek while maintaining cost efficiency.
Who is this for? This workshop is ideal for:
Startup founders and technical leaders creating AI solutions
ML Engineers and Data Scientists
DevOps professionals managing GenAI/ML infrastructure
Technical decision-makers evaluating GenAI/ML platforms
Developers interested in self-hosting open-source LLMs
Engineers looking to optimize their GenAI/ML infrastructure costs
During this hands-on workshop, you'll learn how to leverage Amazon SageMaker Studio's unified environment to streamline your ML workflows and implement cost-effective model deployment strategies.
Key highlights:
Master Amazon SageMaker Studio's unified interface and development environment
Hands-on implementation of self-hosting DeepSeek and similar models
Deploy cost-optimization strategies including scale-to-zero capabilities
Enhance inference performance using Fast Model Loader and container caching
Best practices for managing GenAI/ML development lifecycle
Real-world examples of production GenAI/ML infrastructure optimization
Interactive troubleshooting and optimization sessions
This workshop is specifically designed for startup teams who want to productionize GenAI/ML infrastructure while maintaining cost efficiency. You'll gain hands-on experience with Amazon SageMaker's advanced features and learn practical strategies for managing GenAI/ML workloads.
Prerequisites:
Laptop with adequate specifications for hands-on exercises
Basic understanding of machine learning concepts
Familiarity with Python programming
AWS account access
Basic knowledge of container technologies
Understanding of ML deployment concepts
https://aws.amazon.com/startups/events/deploying-deepseek-v3-and-deepseek-r1-on-amazon-sagemaker-q1
https://en.wikipedia.org/wiki/DeepSeek