All projects and assignments for the course COMP9313 Big Data Management will be pushed here.
- Data Processing and Management
  - Volume/ Velocity/ Variety
  - Veracity/ Visibility/ Value
  - Big Data Processes
    - Data Management
      - Acquisition and Recording
      - Extraction, Cleaning and Annotation
      - Integration, Aggregation and Representation
    - Analytics
      - Modelling and Analysis
      - Interpretation
- Data Management
  - Architecture: Cloud Computing (SaaS/ PaaS/ IaaS)
  - Hadoop
    - MapReduce
    - Data Access (HBase, Hive, Pig, Mahout)/ Tools (Hue, Sqoop)
  - Data Curation
    - Ingestion/ Validation/ Transformation/ Correction/ Consolidation/ Visualization
    - Tools: Data Tamer/ ZenCrowd/ CrowdDB/ Talend/ Pentaho Data Integration
- Hadoop Security
  - Authentication
    - Kerberos (TGT/ TGS)
  - Authorization
  - Encryption
  - Monitoring and Auditing
    - Jobs on NameNodes and JobTrackers/ Authorization Failures/ Authentication Failures
- Spark
  - Spark SQL/ Spark Streaming/ GraphX/ MLlib
  - Spark Workflow
    - SparkContext
    - Cluster manager
    - Spark executor
  - RDDs (Resilient Distributed Datasets)
    - Traits: In-Memory/ Immutable/ Lazily evaluated/ Cacheable/ Parallel/ Typed/ Partitioned
    - RDD Operations
      - Transformation (returns a new RDD; evaluated lazily)
      - Action (triggers evaluation and returns a value to the driver or writes it to storage)
    - Lineage Graph
    - RDD Persistence: Cache/ Persist
  - DAG of operators
    - Narrow/ Wide Transformations
- Apache Pig
  - Architecture: Parser/ Optimizer/ Compiler/ Execution Engine
  - Execution Modes: Local Mode/ MapReduce Mode/ Tez Mode/ Spark Mode
  - Pig Data Model: Atom/ Tuple/ Bag/ Map/ Relation
  - Grunt/ Pig Latin
- NoSQL/ ElasticSearch
  - CAP Theorem: Consistency, Availability, Partition tolerance
  - NoSQL Taxonomy
    - Key-Value stores: DynamoDB
    - Column stores: BigTable (Google), HBase (Apache)
    - Document stores: MongoDB, ElasticSearch (documents in JSON, XML, etc.)
    - Graph databases: Neo4j, FlockDB
  - ElasticSearch
    - ElasticSearch Elements: Cluster, Node, Shard, Index, Type, Mapping, Document, Replicas
    - Search APIs
- Process Mining
  - Petri nets/ BPMN
  - Event logs, alpha-algorithm, conformance checking
  - Decision trees
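As a quick refresher on the MapReduce topic, here is a minimal single-process Python sketch of the classic word-count job. It only simulates the map, shuffle and reduce phases in memory; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum all the counts emitted for a single word.
    return key, sum(values)


def word_count(lines):
    # Run map over every line, shuffle by key, then reduce each group.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data big ideas", "data management"]
    print(word_count(docs))
```

In a real Hadoop job, the map and reduce calls run in parallel on different nodes, and the shuffle moves data across the network; this sketch keeps everything in one process to show only the data flow.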
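For the process mining topic, the first step of the alpha-algorithm derives ordering relations between activities from an event log: causality (`->`), reverse causality (`<-`), parallelism (`||`) and no direct relation (`#`), all based on which activity directly follows which in some trace. The sketch below computes only this "footprint"; it does not build the Petri net, and the example log and function names are hypothetical.

```python
from itertools import product


def directly_follows(log):
    # (a, b) is in the relation if some trace has a immediately followed by b.
    pairs = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            pairs.add((a, b))
    return pairs


def footprint(log):
    # Classify every ordered pair of activities from the directly-follows relation.
    activities = sorted({a for trace in log for a in trace})
    df = directly_follows(log)
    matrix = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in df, (b, a) in df
        if ab and not ba:
            rel = "->"  # causality
        elif ba and not ab:
            rel = "<-"  # reverse causality
        elif ab and ba:
            rel = "||"  # parallel
        else:
            rel = "#"   # never directly follow each other
        matrix[(a, b)] = rel
    return activities, matrix


if __name__ == "__main__":
    log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
    acts, fp = footprint(log)
    # Print the footprint as a small table: row activity vs. column activity.
    print("   " + " ".join(f"{b:>2}" for b in acts))
    for a in acts:
        print(f"{a:>2} " + " ".join(f"{fp[(a, b)]:>2}" for b in acts))
```

In this log, `b` and `c` appear in both orders, so they are classified as parallel, while `b` and `e` never follow each other and are classified as a choice.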