Course Outline:
Module 1: On-Premises Data Center Design for LLM Inferencing Systems
Objective: Design an on-premises data center for LLM inferencing with Cisco and NVIDIA technologies.
Topics:
- Cisco UCS and NVIDIA GPUs for high-performance compute.
- Network design and automation with Cisco Nexus Dashboard.
- Storage solutions for large-scale data management.
Module 2: On-Premises Data Center Implementation for LLM Inferencing Systems
Objective: Implement and configure an LLM inferencing data center using NVIDIA and Cisco technologies.
Topics:
- Physical setup: NVIDIA GPUs in Cisco AI PODs and Secure AI Factories, plus Cisco Nexus networking configuration.
- Performance testing and validation of inferencing pipelines.
Module 3: Large Language Model (LLM) Foundations
Objectives:
- Understand the architecture and mathematical principles of LLMs.
- Learn design trade-offs for scalability and performance.
- Explore emerging innovations in LLM development.
Topics:
- Transformer architecture, self-attention mechanism, and positional encoding.
- Types of LLMs: Encoder-only, decoder-only, and encoder-decoder.
- Training objectives: Masked language modeling (MLM), causal language modeling (CLM), and sequence-to-sequence modeling.
- Scaling laws and challenges: Parameter size, dataset size, and compute.
- Emerging architectures: Reformer, Longformer, and multi-modal LLMs.
Module 4: Deployment of LLMs for Inferencing
Objectives:
- Deploy LLMs for production inferencing with high performance and scalability.
- Use NVIDIA TensorRT and Cisco Nexus Dashboard for optimized deployment.
Topics:
- Deployment architectures: On-premises, cloud, and hybrid.
- Optimizing inferencing with NVIDIA TensorRT: Precision calibration, layer fusion, and batching.
- Traffic management and load balancing with Cisco Nexus Dashboard.
- Exposing LLM APIs: RESTful and gRPC endpoints with security mechanisms.
Module 5: Monitoring, Logging, and Maintenance for LLM Systems
Objectives:
- Monitor and maintain LLM deployments using NVIDIA and Cisco tools.
Topics:
- Key metrics: Latency, throughput, GPU utilization, and memory usage.
- Monitoring tools: NVIDIA DCGM and Cisco Nexus Dashboard Insights.
- Maintenance workflows for hardware and software reliability.
Module 6: Optimizing LLMs and Their Performance for Inferencing
Objectives:
- Optimize LLM inferencing pipelines for low latency and high throughput.
- Learn model compression techniques such as quantization, pruning, and knowledge distillation.
Topics:
- Quantization: FP16, INT8, and mixed precision.
- Pruning and knowledge distillation for lightweight models.
- TensorRT optimization: Dynamic batching and asynchronous execution.
- Benchmarking tools: NVIDIA Triton Inference Server (Perf Analyzer) and TensorRT Profiler.
Module 7: Data Collection, Retrieval, and Preparation for LLM Applications
Objectives:
- Understand data requirements for LLMs and their impact on performance.
- Learn techniques for sourcing, cleaning, and managing large-scale datasets.
- Explore NVIDIA and Cisco tools for efficient data handling.
Topics:
- Data sourcing: Open-source, proprietary, and domain-specific datasets.
- Preprocessing: Cleaning, deduplication, tokenization, and filtering.
- Data management: Sharding, scalable storage, and high-speed data transfer.
- Ethical considerations: Bias detection, privacy compliance, and fairness.
Module 8: Scalable Pipeline Design for LLM Inferencing
Objectives:
- Build robust, scalable, and fault-tolerant pipelines for inferencing.
- Use batching, caching, and dynamic scaling for efficient pipelines.
Topics:
- Pipeline components: Batching, caching, and queuing.
- Load balancing with Cisco Nexus Dashboard for traffic optimization.
- Fault tolerance: Automatic failover and disaster recovery plans.
- Monitoring pipeline performance with NVIDIA DCGM and Cisco Nexus Dashboard.
Module 9: Hybrid Operations for LLM Inference
Objective: Implement a hybrid LLM inference approach that integrates on-premises and cloud-based models for flexible, secure, and scalable AI service delivery.
Topics:
- Deploying LLM inference endpoints on-premises and in the cloud.
- Configuring hybrid model access and request-routing policies.
- Managing security, visibility, and operational consistency across hybrid AI services.
Module 10: Security and Privacy Considerations in LLM Training and Inferencing
Objectives:
- Secure LLM pipelines using Cisco Nexus Dashboard, Cisco XDR, and NVIDIA tools.
Topics:
- NVIDIA runtime encryption and secure boot.
- Cisco Robust Intelligence for adversarial defense and vulnerability detection.
- Cisco XDR for unified threat detection and automated response.
- Traffic segmentation and endpoint authentication.
Lab Outline:
- Designing a Complete Data Center Architecture for LLM Inferencing
- Deploying Transformer Model Inference Services on Red Hat OpenShift AI
- Tokenization
- Building LLM-Powered Applications Using Inference Services
- Configuring GPU Monitoring for LLM Inference Service
- Implementing Observability for LLM Systems with Splunk
- Benchmarking LLM Inference on Cisco AI PODs
- Implementing a Retrieval-Augmented Generation Pipeline
- Deploying Multi-Model Routing
- Configuring API Gateway and Rate Limiting for LLM Inference Services
- Implementing Multi-Model Hybrid Access for LLM Inference
- Enabling AI Runtime Protection
- Discovering AI Access
- Enforcing AI Guardrails
- Operating AI Security