Microsoft AI Releases Phi 3.5 mini, MoE and Vision with 128K context, Multilingual and MIT License

👋 Hello,

Happy Friday! 🌟 Welcome to DataPro #108: Your Weekly Data Science & ML Digest! 🚀

This week, we’re diving into exciting new advancements, including Snowflake Arctic’s debut on Amazon SageMaker JumpStart, the Jamba 1.5 Model Family on Vertex AI, and Mistral-NeMo-Minitron's game-changing efficiency. Plus, we’ve handpicked top resources for big data processing, extraction, and modeling just for you!

⚡ Quick Bytes: Stay Ahead of the Curve!

AWS Gets a Boost
Snowflake Arctic Now on Amazon SageMaker JumpStart: Elevate your models with this latest addition.
Optimize with AI: Explore Amazon Redshift Serverless for smarter scaling.

Google's ML Powerhouse
Jamba 1.5 on Vertex AI: Unleash AI21 Labs' latest models.
Airflow Mastery: Tackle Apache Airflow with new Cloud Composer updates.

📚 Must-Read Resources

Essential Data Science Guide
Data Science Fundamentals Pocket Primer: Your go-to manual for key concepts.

Unlock Looker’s Potential
Mastering Looker and LookML: Become a pro in views, dashboards, and databases.

AI Techniques Demystified
Artificial Intelligence and Expert Systems: Dive deep into problem-solving with AI.

🔍 LLMs & GPTs: What's New?

DaRec Framework
Plug-and-Play Alignment: Revolutionize your models with DaRec.

Tinygrad Insights
Simplified Deep Learning: Experiment with this lightweight framework.

NVIDIA’s Latest
Mistral-NeMo-Minitron: Redefining performance with advanced techniques.

Microsoft AI Update
Phi 3.5 Mini: Multilingual, scalable, and open-source.

Innovative Projects
OpenResearcher: AI-driven research acceleration.
DeepSeek-Prover: The new leader in formal theorem proving.

E-commerce Advancements
Marqo Fashion Models: Tailored embeddings for retail success.

Compact AI Solutions
Answer.AI's ColBERT: Faster and smarter search models.

✨ Spotlight: What’s Trending

GenAI’s Document Extraction Revolution: Transforming the way we process information.
AI-Driven Prosperity: The future of work and
universal basic income.
Machine Unlearning: A crucial skill for modern data scientists.
Protecting Speaker Privacy: New tools for DNN-based speech processing.
Azure Cloud Platforms: Building robust data solutions with Azure Landing Zones.

Stay inspired and ahead of the curve! 🌐

DataPro Newsletter is not just a publication; it’s a complete toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today!

Calling Data & ML Enthusiasts!

Want to share your insights and build your online reputation? Contribute to our new Packt DataPro column! Discuss tools, share experiences, or ask questions. Gain recognition among 128,000+ data professionals and boost your CV. Simply reply with your Google Docs link or use our feedback form. Whether you’re looking for visibility or a discreet approach, we’re here to support you.

Share your content today and engage with our vibrant community! We’re excited to hear from you!

Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!

Share Your Insights and Shine! 🌟💬

📚 Expert Insights from Packt Community

Did you know? “Books are the quietest, most constant friends, holding the world’s treasured wisdom. They offer gentle guidance and timeless lessons, passing their rich inheritance from one generation to the next.”

We’re thrilled to bring you this week’s hottest new releases, straight from the experts to your bookshelf!
Whether you’re aiming to upskill or explore something new, now’s the perfect time to grab these invaluable resources.

As a special thank you to our newsletter readers, enjoy an exclusive 30% off all eBooks at Packtpub.com. Crafted by industry professionals, these books offer unique insights you won’t find elsewhere. Don’t miss out on these Packt-exclusive deals: your chance to learn from the best at a fantastic price!

Data Science Fundamentals Pocket Primer: An Essential Guide to Data Science Concepts and Techniques
By Mercury Learning and Information, Oswald Campesato

Imagine having a go-to guide that gently walks you through the essentials of data science, making complex concepts feel accessible. This book does just that. With a blend of practical exercises and real-world examples, it simplifies the vast world of data science. Here’s what you’ll love:
- A clear introduction to data science fundamentals.
- Hands-on learning with practical examples.
- Mastery of tools like Python, NumPy, Pandas, and R.
- Techniques for data visualization to bring your data to life.

Whether you're just starting or looking to sharpen your skills, this book is your companion on the journey to mastering data science.

Get your copy now for $41.98 (originally $59.99).

Mastering Looker and LookML - Complete Looker Guide for Developers: Master Looker and LookML to create views, dashboards, and databases with this guide [Video]
By HHN Automate Book Inc.

Embark on a journey to unlock the full potential of Looker with our all-encompassing course.
Whether you’re new to Looker or looking to deepen your skills, this course guides you step-by-step through everything you need to know. Here’s what you can expect:
- Hands-on tutorials for setting up your environment and connecting data.
- In-depth exploration of LookML fields, parameters, and joins.
- Advanced techniques for creating and managing impactful dashboards.

By the end, you’ll have the confidence to create dynamic, data-driven insights that can drive meaningful decisions in your organization.

Get the full video course now for $104.99 (MP4 download available).

Artificial Intelligence and Expert Systems: Techniques and Applications for Problem Solving
By Mercury Learning and Information, I. Gupta, G. Nagpal

Dive into the world of AI with a guide that makes complex concepts approachable and practical. This book is your gateway to mastering AI, offering:
- In-depth coverage of AI and expert systems.
- Clear explanations paired with real-world applications.
- Exploration of advanced topics like neural networks and fuzzy logic.

From understanding the basics of AI to applying expert systems and neural networks, this book equips you with the tools to solve real-world problems.
Perfect for anyone eager to enhance their knowledge of intelligent systems.

Grab your copy now for $34.98 (originally $49.99).

🔰 Data Science Tool Kit

➤ SeldonIO/alibi: Alibi is a Python library focused on machine learning model inspection, offering diverse explanation methods for classification and regression models.
➤ Trusted-AI/AIX360: AI Explainability 360 offers an open-source Python toolkit for detailed model interpretability across various data types, supporting diverse explanation methods.
➤ dssg/aequitas: Aequitas is an open-source toolkit for bias auditing and Fair ML, aiding data scientists and researchers in assessing and correcting model biases.
➤ albermax/innvestigate: iNNvestigate is a Python library providing a unified interface for various methods to analyze neural networks' predictions and understand their internal workings.
➤ mindsdb/lightwood: Lightwood is an AutoML framework simplifying machine learning pipelines with JSON-AI syntax, allowing customization and automation across diverse data types.

Access 100+ data tools in this specially curated blog, covering everything from data analytics to business intelligence, all in one place. Check out "Top 100+ Essential Data Science Tools & Repos: Streamline Your Workflow Today!" on PacktPub.com.

⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz!

AWS

➤ Snowflake Arctic models are now available in Amazon SageMaker JumpStart: Snowflake Arctic Instruct, an enterprise-grade LLM by Snowflake, is now available on Amazon SageMaker JumpStart. It offers exceptional capabilities in SQL querying, coding, and instruction following, optimized for cost-efficiency and performance. The post guides deploying and using the model for enterprise-focused tasks through SageMaker.
➤ Optimize your workloads with Amazon Redshift Serverless AI-driven scaling and optimization: Amazon Redshift Serverless now features AI-driven scaling, optimizing compute resources based on query complexity, data volume, and more, beyond just query queuing.
This enhances performance and cost management, enabling better efficiency in handling varied workloads, as demonstrated through detailed use cases.

Google

➤ Jamba 1.5 Model Family from AI21 Labs is now available on Vertex AI: AI21 Labs has launched the Jamba 1.5 Model Family on Google Cloud's Vertex AI Model Garden. The models, Jamba 1.5 Mini and Jamba 1.5 Large, are designed for enterprise applications like customer service and financial analysis. These models feature a 256K context window, Mamba-Transformer architecture, and advanced developer tools, supporting high-quality, efficient AI solutions on a fully managed infrastructure.
➤ Apache Airflow hierarchy and alerting options with Cloud Composer: This guide discusses the importance of robust logging and alerting for Google Cloud's managed Airflow service, Cloud Composer. It outlines the alerting hierarchy, explains different alerting options, including log-based alerting policies, and provides sample code to set up alerts for monitoring DAGs and tasks effectively.

🔍 From Bits to BERT: Keeping Up with LLMs & GPTs

➤ DaRec: A Novel Plug-and-Play Alignment Framework for LLMs and Collaborative Models. This blog discusses the development and evaluation of DaRec, an innovative framework designed to align large language models (LLMs) with collaborative filtering models in recommender systems. By disentangling representations and employing dual-level structure alignment, DaRec overcomes challenges in integrating LLMs, demonstrating superior performance across various datasets.
➤ Tinygrad: A Simplified Deep Learning Framework for Hardware Experimentation. This blog discusses Tinygrad, a new deep learning framework designed for simplicity and flexibility, making it easier for developers to experiment with and add support for new hardware accelerators.
Despite its simplicity, Tinygrad can run popular models and offers promising potential for innovation.
➤ MegaAgent: A Practical AI Framework Designed for Autonomous Cooperation in Large-Scale LLM Agent Systems. This blog discusses MegaAgent, a new framework for LLM-powered multi-agent systems (LLM-MA), designed to enhance autonomy and scalability. By enabling dynamic task splitting, parallel execution, and real-time coordination among many agents, MegaAgent overcomes the limitations of traditional sequential models, making it highly effective for complex, large-scale tasks.
➤ Mistral-NeMo-Minitron 8B Released: NVIDIA's Latest AI Model Redefines Efficiency and Performance Through Advanced Pruning and Knowledge Distillation Techniques. This blog discusses NVIDIA's Mistral-NeMo-Minitron 8B, an advanced large language model created using width-pruning and knowledge distillation. It outperforms similar models in its size class, showcasing impressive efficiency and accuracy, and setting a new standard in natural language processing.
➤ Microsoft AI Releases Phi 3.5 mini, MoE and Vision with 128K context, Multilingual and MIT License: This blog discusses Microsoft's introduction of three advanced AI models (Phi 3.5 Mini Instruct, Phi 3.5 MoE, and Phi 3.5 Vision Instruct), each designed for specific tasks in natural language processing, multimodal AI, and high-performance computing, showcasing significant advancements in efficiency and capability.
➤ OpenResearcher: An Open-Source Project that Harnesses AI to Accelerate Scientific Research. This blog discusses the introduction of OpenResearcher, an open-source AI tool designed to assist researchers by offering a unified solution for scientific queries.
It outperforms existing industry tools by actively guiding users, leveraging Retrieval-Augmented Generation, and delivering accurate, elaborate answers.
➤ DeepSeek-AI Open-Sources DeepSeek-Prover-V1.5: A Language Model with 7 Billion Parameters that Outperforms all Open-Source Models in Formal Theorem Proving in Lean 4. This blog discusses DeepSeek-Prover-V1.5, a language model designed to tackle formal theorem proving challenges in systems like Lean and Isabelle. By integrating proof-step and whole-proof generation with advanced techniques like Monte-Carlo tree search, the model significantly improves formal proof generation accuracy and efficiency.
➤ Marqo Releases Marqo-FashionCLIP and Marqo-FashionSigLIP: A Family of Embedding Models for E-Commerce and Retail. This blog discusses the release of two advanced multimodal models, Marqo-FashionCLIP and Marqo-FashionSigLIP, for fashion search and recommendation. These models improve search accuracy and personalization by merging visual and textual data, outperforming previous models in various benchmarks and offering faster inference times.
➤ Answer.AI Releases answerai-colbert-small: A Proof of Concept for Smaller, Faster, Modern ColBERT Models. AnswerAI's answerai-colbert-small-v1 is a compact 33 million parameter model that outperforms larger models in multi-vector retrieval tasks. Built on ColBERT architecture and enhanced by JaColBERTv2.5, it excels in out-of-domain generalization, demonstrating impressive efficiency and future compatibility.

✨ On the Radar: Catch Up on What's Fresh

➤ Document Extraction Is GenAI’s Killer App: The blog discusses the challenges of understanding and standardizing job titles and seniority from résumés, a task that remained difficult even for LinkedIn's data team. However, large language models like GPT-4 can now easily tackle these tasks, highlighting the potential for LLMs in automating complex document analysis and extraction processes.
The author and their cofounder created Docupanda.io to address text extraction challenges from complex documents, offering a solution where existing tools fall short.
➤ The End of Required Work: Universal Basic Income and AI-Driven Prosperity. The blog discusses the inevitability of AI taking over most jobs, emphasizing the need for society to adapt by implementing solutions like taxing AI work to fund Universal Basic Income (UBI). This approach aims to fairly distribute AI-generated wealth, ensuring societal well-being and avoiding dystopian inequity.
➤ Learning to Unlearn: Why Data Scientists and AI Practitioners Should Understand Machine Unlearning. The article discusses the widespread digital footprint of over 5.9 billion people, primarily due to social media, and the challenges of data privacy in AI. It introduces concepts like Machine Unlearning and the SISA framework to address privacy concerns by enabling the removal of specific data points from AI models without retraining the entire model.
➤ Speaker’s Privacy Protection in DNN-Based Speech Processing Tools: This post introduces "Privacy-PORCUPINE," a privacy-preserving technique for speech processing, addressing potential privacy threats from vector quantization in deep neural network bottlenecks.
It proposes Space-Filling Vector Quantization (SFVQ) with resampling to ensure equal codebook element occurrences, minimizing private information leakage.
➤ The Azure Landing Zone for a Data Platform in the Cloud: This post discusses designing a secure Azure cloud infrastructure for data platforms, emphasizing the importance of implementing Azure landing zones, networking, naming conventions, and Infrastructure as Code (IaC) to ensure security and consistency across environments, especially when handling sensitive data.

See you next time!