Ans: Azure Data Factory is a cloud-based data integration service that lets you create data-driven workflows (pipelines) in the cloud to orchestrate and automate data movement and data transformation.
Ans:
Ans:
This individual process is an activity.
For example: Consider SQL Server. You need a connection string to connect to an external source, and you need to specify both the source and the destination of your data.
Ans: A Data Warehouse is a traditional way of storing data that is still widely used. A Data Lake is complementary to a Data Warehouse: data that sits in a data lake can also be loaded into the data warehouse, but certain rules need to be followed.
| DATA LAKE | DATA WAREHOUSE |
| --- | --- |
| Complementary to the data warehouse | May be sourced from the data lake |
| Data is detailed or raw data and can be in any form; you simply take the data and dump it into the data lake | Data is filtered, summarised, and refined |
| Schema on read (not structured; you can define your schema in any number of ways) | Schema on write (data is written in a structured form, i.e. a particular schema) |
| One language to process data of any format (U-SQL) | Uses SQL |
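To make the schema-on-read versus schema-on-write row above concrete, here is a small, purely illustrative Python sketch (the records and field names are made up): raw JSON lines are dumped into the "lake" untouched, and a schema is only imposed when the data is read back.

```python
import json

# Schema on write (warehouse-style): the structure is fixed before loading,
# and anything that does not fit the columns is rejected up front.
WAREHOUSE_COLUMNS = ("customer_id", "amount", "country")

def load_into_warehouse(row: dict) -> tuple:
    # Raises KeyError if a required column is missing.
    return tuple(row[col] for col in WAREHOUSE_COLUMNS)

# Schema on read (lake-style): raw records are stored untouched, and each
# consumer decides at read time which fields it cares about.
raw_lake = [
    '{"customer_id": 1, "amount": 9.5, "country": "DE", "device": "mobile"}',
    '{"customer_id": 2, "amount": 12.0}',
]

def read_from_lake(records, fields):
    for line in records:
        doc = json.loads(line)
        # Missing fields simply come back as None instead of failing the load.
        yield {f: doc.get(f) for f in fields}

print(load_into_warehouse(json.loads(raw_lake[0])))           # fits the fixed schema
print(list(read_from_lake(raw_lake, ["customer_id", "country"])))  # schema chosen at read time
```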
Ans: The integration runtime is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities (such as data movement, activity dispatch, and SSIS package execution) across different network environments.
There are 3 types of integration runtimes:
- Azure Integration Runtime: used for data movement and activity dispatch between cloud data stores over the public network.
- Self-hosted Integration Runtime: installed on a machine in a private network, used to move data between cloud data stores and data stores on-premises or in a private network.
- Azure-SSIS Integration Runtime: used to natively execute SSIS packages in Azure.
Ans: Microsoft Azure (formerly Windows Azure) is a cloud platform developed by Microsoft that enables businesses to run their workloads entirely in the cloud.
Cloud computing is web-based computing that allows businesses and individuals to consume computing resources such as virtual machines, databases, processing, memory, services, storage, or even a number of calls or events, on a pay-as-you-go basis. The pay-as-you-go model charges only for the resources you actually use. Unlike traditional computing, if you do not use any resources, you do not pay. It is similar to having a water connection or an electricity line: a meter keeps track of your monthly usage, and you pay for that usage at a given rate.
Cloud computing is a culmination of numerous attempts at large-scale computing with seamless access to virtually limitless resources.
Here are some key advantages of cloud computing:
- Cost: you pay only for the resources you use, with no upfront hardware investment.
- Scalability and elasticity: resources can be scaled up or down on demand.
- Global reach: workloads can be deployed close to users around the world.
- Reliability: backup, disaster recovery, and high availability are easier and cheaper to achieve.
- Reduced maintenance: the provider manages the underlying infrastructure.
Ans: Azure Table storage is a widely used service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design. Because no schema is enforced, Table storage is fast and cost-effective for many types of applications.
Another advantage of Table storage is that you can store flexible datasets, such as user data for a web application, device information, or other kinds of metadata that your service requires.
You can store any number of entities in a table, and one storage account may contain any number of tables, up to the capacity limit of the storage account.
Azure Table storage can also hold large amounts of structured data. The service is a NoSQL data store that accepts authenticated calls from inside and outside the Azure cloud.
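As an illustration, here is a minimal sketch using the azure-data-tables Python SDK; the connection string, the UserProfiles table name, and the entity fields are placeholder assumptions, not part of the original answer.

```python
# pip install azure-data-tables
from azure.data.tables import TableServiceClient

# Placeholder connection string for a storage account (assumption).
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"

service = TableServiceClient.from_connection_string(CONN_STR)
table = service.create_table_if_not_exists("UserProfiles")  # hypothetical table name

# Entities are schemaless: only PartitionKey and RowKey are required;
# any other properties can vary from entity to entity.
table.create_entity({
    "PartitionKey": "web-app",
    "RowKey": "user-001",
    "DisplayName": "Alice",
    "SignupYear": 2024,
})

# Query by partition; the filter syntax is OData-style.
for entity in table.query_entities("PartitionKey eq 'web-app'"):
    print(entity["RowKey"], entity.get("DisplayName"))
```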
Ans: As discussed above, the companies that provide cloud services are called cloud providers. There are many cloud providers, and Microsoft Azure is one of them; it is used to access Microsoft's cloud infrastructure.
Ans: Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or binary data. You can use Blob Storage to expose data publicly to the world or to store application data privately. Common uses of Blob Storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Writing to log files
- Storing data for backup and restore, disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service
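Below is a minimal sketch of uploading and downloading a blob with the azure-storage-blob Python SDK; the connection string, container name, and blob path are placeholders, and the container is assumed to already exist.

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Placeholder connection string (assumption); use your storage account's value.
CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"

service = BlobServiceClient.from_connection_string(CONN_STR)
container = service.get_container_client("app-logs")  # hypothetical, pre-existing container

# Upload unstructured data (text, images, backups, ...) as a block blob.
container.upload_blob(name="2024/06/run.log", data=b"pipeline started\n", overwrite=True)

# Download it back.
content = container.download_blob("2024/06/run.log").readall()
print(content.decode())
```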
Ans: Suppose we are extracting data from an Azure SQL Server database; if anything needs to be processed, it is processed along the way and the result is stored in the Data Lake Store.
Steps for creating the ETL (a condensed sketch follows the steps):
1. Create a linked service for the source data store (the SQL Server database).
2. Create a linked service for the destination data store (Azure Data Lake Store).
3. Create datasets that represent the source data and the data to be saved.
4. Create a pipeline and add a Copy activity.
5. Schedule the pipeline by adding a trigger.
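The following is a condensed sketch of those steps using the azure-mgmt-datafactory Python SDK, loosely following the Azure quickstart pattern; the subscription, resource group, factory, linked-service, dataset, and pipeline names (and the connection string) are placeholders, and exact model constructors can differ slightly between SDK versions.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService, AzureDataLakeStoreLinkedService,
    LinkedServiceResource, CopyActivity, DatasetReference,
    AzureSqlSource, AzureDataLakeStoreSink, PipelineResource, SecureString,
)

# All names and secrets below are placeholders (assumptions).
SUB_ID, RG, DF = "<subscription-id>", "<resource-group>", "<data-factory>"
adf = DataFactoryManagementClient(DefaultAzureCredential(), SUB_ID)

# Steps 1-2: linked services for the source (Azure SQL) and sink (Data Lake Store).
sql_ls = AzureSqlDatabaseLinkedService(
    connection_string=SecureString(value="<sql-connection-string>"))
adls_ls = AzureDataLakeStoreLinkedService(
    data_lake_store_uri="<adls-uri>")  # authentication settings omitted for brevity
adf.linked_services.create_or_update(RG, DF, "SourceSqlLS", LinkedServiceResource(properties=sql_ls))
adf.linked_services.create_or_update(RG, DF, "SinkAdlsLS", LinkedServiceResource(properties=adls_ls))

# Step 3: datasets "SourceDataset" / "SinkDataset" are assumed to be defined
# separately against those linked services.

# Step 4: a pipeline with a Copy activity moving source rows into the lake.
copy = CopyActivity(
    name="CopySqlToLake",
    inputs=[DatasetReference(reference_name="SourceDataset")],
    outputs=[DatasetReference(reference_name="SinkDataset")],
    source=AzureSqlSource(),
    sink=AzureDataLakeStoreSink(),
)
adf.pipelines.create_or_update(RG, DF, "SqlToLakePipeline", PipelineResource(activities=[copy]))

# Step 5: run it on demand (a schedule trigger could be attached instead).
run = adf.pipelines.create_run(RG, DF, "SqlToLakePipeline")
print("Started run:", run.run_id)
```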
Ans: Pipeline: It acts as a carrier in which we have various processes taking place.
This individual process is an activity.
For example: Consider SQL Server. You need a connection string to connect to an external source, and you need to specify both the source and the destination of your data.
| HDInsight (PaaS) | ADLA (SaaS) |
| --- | --- |
| HDInsight is Platform as a Service. | Azure Data Lake Analytics is Software as a Service. |
| If we want to process a data set, we first have to configure a cluster with predefined nodes and then use a language such as Pig or Hive to process the data. | We simply submit the queries written to process the data, and Azure Data Lake Analytics creates the necessary compute nodes on demand, as instructed, and processes the data set. |
| Since we configure the cluster in HDInsight, we can create and control it as we want. All Hadoop subprojects, such as Spark and Kafka, can be used without limitation. | Azure Data Lake Analytics does not give much flexibility in provisioning the cluster, but Azure takes care of that for us. We don't need to worry about cluster creation; nodes are assigned based on the instructions we pass. In addition, we can make use of U-SQL, taking advantage of .NET, for processing data. |
Ans: The Mapping Data Flow feature currently supports Azure SQL Database, Azure SQL Data Warehouse, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 natively as source and sink.
Ans: You can use a schedule trigger or a tumbling window trigger to schedule a pipeline. A schedule trigger uses a wall-clock calendar schedule, which can run pipelines periodically or in calendar-based recurrent patterns (for example, every Monday at 6:00 PM).
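As a sketch of the schedule-trigger option, the snippet below creates a daily trigger for a hypothetical SqlToLakePipeline with the azure-mgmt-datafactory Python SDK; all names are placeholders, and model constructors may differ slightly between SDK versions.

```python
# pip install azure-identity azure-mgmt-datafactory
from datetime import datetime, timedelta
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ScheduleTrigger, ScheduleTriggerRecurrence, TriggerResource,
    TriggerPipelineReference, PipelineReference,
)

# Placeholder identifiers (assumptions).
SUB_ID, RG, DF = "<subscription-id>", "<resource-group>", "<data-factory>"
adf = DataFactoryManagementClient(DefaultAzureCredential(), SUB_ID)

# Fire the (hypothetical) pipeline once a day, starting a few minutes from now.
recurrence = ScheduleTriggerRecurrence(
    frequency="Day",
    interval=1,
    start_time=datetime.utcnow() + timedelta(minutes=5),
)
trigger = ScheduleTrigger(
    recurrence=recurrence,
    pipelines=[TriggerPipelineReference(
        pipeline_reference=PipelineReference(reference_name="SqlToLakePipeline"))],
)
adf.triggers.create_or_update(RG, DF, "DailyTrigger", TriggerResource(properties=trigger))
# Note: a trigger must be started before it begins firing.
```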
Ans: Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices. Service Fabric also addresses the significant challenges in developing and managing cloud applications. Developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric represents the next-generation middleware platform for building and managing these enterprise-class, tier-1, cloud-scale applications.
Ans: You will no longer have to bring your own Azure Databricks clusters.
Ans: Traffic Manager is used to control how user traffic is distributed across deployed cloud service endpoints. The benefits of Traffic Manager include:
- Improved performance, because users are routed to the closest or best-performing endpoint
- Automatic failover to a healthy endpoint when an endpoint goes down, improving availability
- The ability to carry out maintenance or upgrades without downtime
Ans: Redis is an open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Azure Redis Cache is based on the popular open-source Redis cache. It gives you access to a secure, dedicated Redis cache, managed by Microsoft, and accessible from any application within Azure. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, and geospatial indexes with radius queries.
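Here is a minimal sketch of talking to an Azure Cache for Redis instance with the redis-py client; the host name, access key, and keys used are placeholders.

```python
# pip install redis
import redis

# Placeholder host and access key (assumptions); Azure Cache for Redis
# accepts TLS connections on port 6380.
r = redis.Redis(
    host="<cache-name>.redis.cache.windows.net",
    port=6380,
    password="<access-key>",
    ssl=True,
)

# Cache a value with a 60-second time-to-live, then read it back.
r.set("session:user-001", "alice", ex=60)
print(r.get("session:user-001"))  # b'alice'

# A simple sorted-set leaderboard, one of the data structures Redis supports.
r.zadd("leaderboard", {"alice": 120, "bob": 95})
print(r.zrevrange("leaderboard", 0, 1, withscores=True))
```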
Ans: You can use the @coalesce construct in expressions to handle null values gracefully; it returns the first non-null value among its arguments. For example, @coalesce(pipeline().parameters.folderPath, 'default/input') falls back to a default path when the (hypothetical) folderPath parameter is not supplied.
Ans: IaaS, PaaS, and SaaS are the three major service models of Azure and cloud computing.
Infrastructure as a Service (IaaS):
With IaaS, you rent IT infrastructure – servers and virtual machines (VMs), storage, networks, operating systems – from a cloud provider on a pay-as-you-go basis.
Platform as a Service (PaaS):
Platform as a service (PaaS) refers to cloud computing services that supply an on-demand environment for developing, testing, delivering, and managing software applications.
Software as a Service (SaaS):
Software as a service (SaaS) is a method for delivering software applications over the Internet, on-demand, and typically on a subscription basis. With SaaS, cloud providers host and manage the software application and underlying infrastructure and handle any maintenance, such as software upgrades and security patching.
Learn more here: Introduction to Cloud Computing.