👋 Hey,
Happy Thursday! 🍃
Welcome to BIPro #72—Your Weekly Dose of Data Brilliance!
This week’s newsletter is packed with insights to boost your data game:
🔮 Data Insights & Tools
◉ Pipeline Perfection: Build Efficient Data Pipelines with Prefect
◉ Taming Outliers: Handle Dataset Outliers Using Pandas
◉ Regex Wizardry: 5 Tips for Data Cleaning with Regular Expressions
◉ NumPy Secrets: Solve Nonlinear Equations with NumPy
◉ Docker Essentials: Use Docker Volumes for Persistent Storage
◉ Python Library Creation: A Beginner’s Guide to Pip Install YOU
⚡ Industry Highlights
◉ Microsoft Fabric: August 2024 Update, Advanced Anomaly Detection, Custom Sparklens JARs, and CI/CD Capabilities
◉ AWS BI: Batch Data Processing with AWS Lambda, Zero Copy Data Sharing, and Unified Analytics
◉ Google Cloud: Trusted Metrics with Looker, AI-Powered Data Clean Rooms
◉ Tableau: New Features in Tableau 2023.1, Creative Collaboration with Deloitte, Salesforce CRM Integration, and Tableau Portals
✨ Fresh Reads
◉ Microsoft Power BI Cookbook by Greg Deckler & Brett Powell: Master data transformation
◉ Big Data on Kubernetes by Neylson Crepalde: Build scalable data solutions
◉ Big Data Using Hadoop and Hive by Nitin Kumar: Advanced Hadoop & Hive techniques
◉ Tableau Masterclass 2024 by Nikolai Schuler: From Basics to Advanced Analytics
💡 BI Community Scoop
◉ Microsoft Fabric Security: Object, Column, and Row Level
◉ SQL Performance: Local Variables and Metadata Insights
◉ Data Replication & Integration: Change Data Capture Strategies
◉ SQL Server Logs: How to Access Error Logs
Stay sharp and keep innovating with BIPro!
Calling All Data & BI Enthusiasts!
Do you dream of sharing your insights and building your reputation in the Data & BI community? Contribute to our new column in the Packt BIPro newsletter! Share your experiences, discuss new BI tools, or ask questions. Gain recognition among 37,000 BI professionals. Reply with your Google Docs article or use our weekly feedback form. Enjoy a free PDF of "Interactive Data Visualization with Python - Second Edition" for participating. Click reply or share your content today!
Cheers,
Merlyn Shelley
Editor-in-Chief, Packt
We’re thrilled to bring you this week’s must-have new releases, straight from the experts to your bookshelf! Whether you're eager to enhance your skills or explore new horizons, now is the perfect moment to add these invaluable resources to your collection.
For a limited time, enjoy 30% off all eBooks at Packtpub.com. These books are thoughtfully crafted by industry insiders with hands-on experience, offering unique insights you won’t find anywhere else.
Don’t let these Packt-exclusive deals slip away—seize the opportunity to learn from the best at an unbeatable price!
If you're looking to elevate your data game, the latest edition of the Power BI Cookbook is your perfect guide. Whether you're a seasoned BI developer or just diving into data analytics, this updated resource offers:
◾ Deeper insights through Microsoft Fabric for robust data strategies.
◾ Simplified creation of Hybrid tables, scorecards, and shared cloud connections.
◾ Enhanced visualization tools to turn complex data into clear, actionable insights.
With step-by-step guidance, this book equips you to navigate the evolving landscape of Power BI and stay ahead with the latest innovations. Perfect for refining skills or mastering new ones. Grab the Power BI Cookbook 3rd Edition now for just $29.99!
Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions
If you're navigating the complexities of big data in a cloud environment, Big Data on Kubernetes is your guide to mastering scalable, resilient data pipelines. This book offers practical insights to:
◾ Seamlessly integrate Kubernetes with popular tools
◾ Optimize big data pipelines for peak performance
◾ Build end-to-end solutions with Spark, Airflow, and Kafka
Whether you're just starting out or looking to enhance your expertise, this resource will empower you to handle real-world data challenges confidently. Grab it now at just $21.99 – save $10 on your essential big data guide!
Big Data Using Hadoop and Hive: Master Big Data Solutions with Hadoop and Hive
By Mercury Learning and Information, Nitin Kumar
Looking to master big data? This new release is your go-to guide for diving deep into Hadoop 3 and Hive 3.x. You’ll get:
◾ Comprehensive coverage of Hadoop 3 and Hive 3.x
◾ Real-world examples and sample code for practical applications
◾ Advanced insights into YARN, MapReduce, and data compression
Perfect for developers and engineers, this book takes you from the basics of big data to advanced data management techniques, ensuring you can confidently set up, configure, and optimize Hadoop and Hive to tackle big data challenges. Ready to level up your data game? This guide is your essential companion. Unlock the power of big data for just $37.99—save $17 on this must-have first edition!
Tableau Masterclass - 2024: Master Tableau: From Basics to Advanced Analytics [Video]
Ready to transform your data into compelling stories? This new release is your guide to mastering Tableau—from basic connections to advanced visualizations. Learn to blend data, craft dynamic dashboards, and publish your insights with confidence. Whether you're starting out or leveling up, this course equips you with everything you need to excel in data visualization.
Key Benefits:
◾ Master Tableau from basics to advanced features
◾ Learn effective data visualization and storytelling techniques
◾ Create, design, and publish interactive dashboards
Get ready to tackle any data visualization challenge with this detailed video guide! Become a Tableau Pro: 8-Hour Course for $109.99—Watch Now!
➤ PrefectHQ/prefect: Prefect simplifies Python data pipeline orchestration, transforming scripts into dynamic workflows that react to changes and ensure resilience.
➤ airbytehq/airbyte: Airbyte, an open-source data integration platform, offers 300+ connectors for seamless ELT pipelines between diverse data sources and destinations.
➤ argoproj/argo-workflows: Argo Workflows orchestrates parallel jobs on Kubernetes via container-native workflows, supporting DAGs and accelerating compute-intensive tasks like ML and data processing.
➤ dagster-io/dagster: Dagster is a cloud-native data pipeline orchestrator with integrated lineage, observability, declarative programming, and robust testability across the lifecycle.
➤ Avaiga/taipy: Taipy simplifies web app development for data scientists and ML engineers, who can build in Python alone and stay focused on their AI algorithms, with no extra languages to learn.
Access 100+ data tools in this specially curated blog, covering everything from data analytics to business intelligence—all in one place. Check out "Top 100+ Essential Data Science Tools & Repos: Streamline Your Workflow Today!" on PacktPub.com.
➤ Building Data Pipeline with Prefect: The tutorial introduces Prefect, a modern workflow orchestration tool, by building a data pipeline with Pandas and comparing it to a Prefect workflow. It covers task orchestration, deployment, and monitoring of workflows, demonstrating Prefect's features for efficient workflow management and observability in MLOps.
➤ How to Handle Outliers in Dataset with Pandas? This blog discusses the detection and handling of outliers in datasets, explaining their impact on data analysis and models, and explores various techniques such as removal, capping, imputation, and transformation to manage outliers effectively.
➤ 5 Tips for Using Regular Expressions in Data Cleaning: This blog explains how to use regular expressions in Python for text processing and data cleaning, covering tasks like removing unwanted characters, extracting specific patterns, replacing text, validating data formats, and splitting strings, including examples with Pandas.
➤ How to Use NumPy to Solve Systems of Nonlinear Equations? This blog explains nonlinear equations, their importance in modeling real-world problems, and how to solve them using Python's NumPy and SciPy. It covers defining equations, making initial guesses, solving systems, and visualizing results in 2D and 3D.
➤ How to Use Docker Volumes for Persistent Data Storage? This blog explains how to use Docker volumes to persist data in PostgreSQL containers. It covers creating a Docker volume, running a PostgreSQL container with the volume, verifying data persistence, and ensuring data remains intact after stopping and restarting the container.
➤ Pip Install YOU: A Beginner’s Guide to Creating Your Python Library. This blog provides a step-by-step guide for creating, structuring, and distributing custom Python libraries. It covers everything from project initialization, module creation, and adding tests to publishing the library on PyPI for others to use.
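To make the regular-expression item above a little more concrete, here is a minimal sketch of the five kinds of cleaning task it lists (removing characters, replacing text, extracting patterns, validating formats, and splitting strings). It uses only Python's standard `re` module. The function names, the sample record, and the patterns themselves are illustrative assumptions, not code from the linked blog, and the email and date patterns are deliberately simple rather than production-grade.

```python
import re

# Illustrative patterns (assumptions, intentionally simple).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """Validate the *shape* YYYY-MM-DD only; does not check calendar validity."""
    return ISO_DATE_RE.fullmatch(value) is not None


def clean_record(raw: str) -> dict:
    """Apply a few common regex cleaning steps to one free-text record."""
    # 1. Remove unwanted characters: keep word characters, whitespace, @ . , : -
    text = re.sub(r"[^\w\s@.,:\-]", "", raw)
    # 2. Replace runs of whitespace with a single space and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # 3. Extract specific patterns (first email and first date, if present).
    email = EMAIL_RE.search(text)
    date = ISO_DATE_RE.search(text)
    # 4. Split the record into fields on commas, ignoring surrounding spaces.
    fields = re.split(r"\s*,\s*", text)
    return {
        "text": text,
        "email": email.group() if email else None,
        "date": date.group() if date else None,
        "fields": fields,
    }


if __name__ == "__main__":
    sample = "Jane   Doe!!, jane.doe@example.com , signup: 2024-08-29 ##"
    print(clean_record(sample))
    print(is_iso_date("2024-08-29"), is_iso_date("29/08/2024"))
```

The same patterns can usually be applied column-wise in Pandas through the `Series.str` methods (for example `str.replace(..., regex=True)` and `str.extract`), which is the style the blog's Pandas examples are described as using.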
Microsoft Fabric
➤ Microsoft Fabric August 2024 Update: The August 2024 Fabric update introduces key features: managing V-Order in Fabric Warehouses, ML experiment monitoring from the Monitor Hub, and streamlined Azure connectivity in Data Pipeline. It highlights the European Fabric Community Conference, new Copilot features, visual-level format strings in Power BI, and the Fabric Influencers Spotlight. Additionally, it stresses the importance of browser upgrades for Power BI and offers new certification and community engagement opportunities.
➤ Advanced Time Series Anomaly Detector in Fabric: Azure's Anomaly Detector, retiring in October 2026, enabled time series anomaly detection using advanced algorithms. This blog outlines a migration strategy to Microsoft Fabric, leveraging similar algorithms with added benefits like easier model management, seamless data integration, and expanded detection capabilities using Fabric's native tools and the new time-series-anomaly-detector package.
➤ Building a Custom Sparklens JAR for Microsoft Fabric: This blog explains how to build a Sparklens JAR compatible with Spark 3.X for profiling Microsoft Fabric Spark Notebooks. It covers modifying build and configuration files, updating code for Spark 3.X compatibility, and compiling and packaging the JAR for use in Microsoft Fabric.
➤ Exploring CI/CD Capabilities in Microsoft Fabric: A Focus on Data Pipelines. This blog explores Microsoft Fabric's CI/CD features, focusing on automating and managing data integration and analytics processes. It highlights Git integration, deployment pipelines, and workspace setup for streamlined continuous integration and deployment. The blog also provides a step-by-step guide for setting up and operating CI/CD processes in Microsoft Fabric using Azure DevOps and Git.
AWS BI
➤ Efficiently processing batched data using parallelization in AWS Lambda: This post explains how to optimize AWS Lambda functions for efficient message processing by using techniques like batching and parallelization, enhancing resource utilization, reducing invocation times, and improving overall performance in high-volume data processing scenarios.
➤ Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics: This article discusses how Salesforce and Amazon have collaborated to enable seamless, bidirectional Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift. It details how this integration allows analytics teams to access and analyze unified customer data without the need for traditional ETL processes, enhancing efficiency and accelerating insights.
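For readers curious what the batching and parallelization idea in the AWS Lambda post above might look like, here is a minimal sketch in Python. It assumes an SQS-style event (a `Records` list whose items carry `messageId` and `body`) and fans the batch out over a thread pool. `process_message`, `MAX_WORKERS`, and the `order_id` field are hypothetical placeholders, and the original post may use a different parallelization mechanism; threads mainly help when the per-message work is I/O-bound.

```python
import json
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # hypothetical value; tune to the function's memory and workload


def process_message(body: dict) -> None:
    """Hypothetical per-message work; replace with the real (I/O-bound) logic."""
    if "order_id" not in body:
        raise ValueError("missing order_id")


def _handle_record(record: dict):
    """Process one record; return its messageId on failure, otherwise None."""
    try:
        process_message(json.loads(record["body"]))
        return None
    except Exception:
        return record["messageId"]


def handler(event, context):
    """Process the whole batch in parallel and report only the failed items."""
    records = event.get("Records", [])
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # pool.map preserves input order, so failures come back in batch order.
        failed = [mid for mid in pool.map(_handle_record, records) if mid is not None]
    # Partial batch response shape used by SQS event source mappings.
    return {"batchItemFailures": [{"itemIdentifier": mid} for mid in failed]}


if __name__ == "__main__":
    demo_event = {
        "Records": [
            {"messageId": "m1", "body": json.dumps({"order_id": 1})},
            {"messageId": "m2", "body": "not json"},
        ]
    }
    print(handler(demo_event, None))
```

Returning `batchItemFailures` only has an effect when partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping; otherwise a single failure typically causes the entire batch to be retried.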
Google Cloud Data
➤ Grounding Analytical AI Agents with Looker’s Trusted Metrics: This article explores how organizations can integrate AI, particularly Large Language Models (LLMs) like Gemini, with Google Cloud's data tools like Looker to enhance Business Intelligence (BI). By combining AI with Looker's semantic layer, companies can offer users intuitive, AI-driven insights, simplifying data access and decision-making processes. The article highlights the ease and effectiveness of training AI models to align with specific business needs, lowering barriers to data-driven decision-making and enabling faster, more accurate analytics.
➤ Modern Marketer’s Strategic Advantage: AI-Powered Data Clean Rooms. This article explains how businesses can use Google BigQuery data clean rooms to securely analyze and share sensitive customer data across organizations, driving insights and collaboration. It highlights the importance of AI-powered data clean rooms for modern marketers to unlock insights, fuel innovation, and enhance customer experiences while maintaining data privacy and security.
Tableau
➤ What's New in Tableau 2023.1? The Tableau 2023.1 feature update includes significant enhancements such as improved Tableau-Slack integration, dynamic axis titles, Accelerator Data Mapping, and advanced management features like Identity Pools and RMT improvements. It also introduces new tools for developers, web authoring improvements, expanded data connectivity options, and enhanced data preparation and management capabilities.
➤ Building a Culture of Creative Collaboration with Deloitte and Tableau: This article discusses the growing data and analytical skills gap in the AI-driven business landscape. It highlights the collaboration between Salesforce and Deloitte to bridge this gap through innovative talent development programs like Deloitte Viz Games, which foster a data-driven culture and enhance data literacy, analytics, and creative collaboration among employees.
➤ Salesforce Embeds Tableau Pulse into its CRM: Salesforce has introduced Pulse for Salesforce, a version of Tableau Pulse integrated into Salesforce CRM, starting with Sales Cloud. Built on the Einstein 1 Platform, it leverages generative AI to provide personalized, contextual insights and metrics within users' workflows, enhancing data-driven decision-making and supporting daily business activities securely.
➤ What Is a Tableau Portal? Tools, Benefits & Case Study. This article discusses Tableau Portals, customized web interfaces that integrate Tableau’s data visualization into a centralized, branded platform. It highlights the benefits of self-service analytics, centralized access, enhanced security, and improved customer experience. The article also details features like content management, alerts, and a case study showcasing the effectiveness of implementing Tableau Portals for client reporting.
➤ Microsoft Fabric Warehouse Security: Object, Column and Row Level. The article addresses the challenge of implementing granular access control in Microsoft Fabric's data warehouse. It explores various security mechanisms like object-level, column-level, and row-level security to restrict sensitive data access. Additionally, it highlights limitations when users access data through Spark or OneLake, bypassing these controls.
➤ SQL Local Variables and Performance Issues: The article discusses the potential negative impact of using local variables in T-SQL queries on performance. It explores how local variables can lead to inefficient query plans and offers solutions and workarounds to mitigate these issues, ultimately improving query performance.
➤ SQL Metadata in sys.databases, sys.objects, sys.tables and sys.columns: The article explains how SQL Server metadata, which includes data about databases, tables, columns, and keys, can be accessed using sys schema catalog views like sys.databases, sys.objects, and sys.columns. It provides T-SQL examples to query and understand metadata, helping users efficiently manage and utilize SQL Server metadata for various database objects.
➤ Data Replication and Change Data Capture for Data Integration: The article discusses the challenge of replicating real-time data from production databases to data products without impacting database performance. It introduces Integrate.io as a low-code platform offering high-velocity data replication using Change Data Capture (CDC) and ETL. The platform supports seamless data pipeline automation, scalability, and security, ensuring efficient real-time data integration for business intelligence applications.
➤ Microsoft Fabric OneLake Role Based Access Control (RBAC): The article discusses how to implement granular access control in a Microsoft Fabric lakehouse using Role-Based Access Control (RBAC) in OneLake. This feature allows administrators to restrict access to specific folders or tables within a lakehouse, ensuring that users only access the data they are permitted to see.
➤ How to Access the SQL Server Error Log? The article explains how to view SQL Server and SQL Agent error logs. It highlights three primary methods, including the SQL Server Management Studio (SSMS) Log File Viewer and the system stored procedure `sp_readerrorlog`, and discusses when to use each based on the need for speed, filtering, and custom log analysis.
See you next time!
Copyright (C) 2024 Packt Publishing. All rights reserved.
Our mailing address is:
Packt Publishing Grosvenor House 11 St Paul's Square Birmingham, West Midlands B3 1RB United Kingdom