Repository for Digital Talent Scholarship 2018 | Big Data | UGM
This repository holds work from the Big Data Analytics specialization class of the Digital Talent Scholarship 2018, a program held by the Ministry of Communication and Information Technology of the Republic of Indonesia and facilitated by Universitas Gadjah Mada (UGM).
Python Notebook program to calculate Body Mass Index (BMI), a measure of body fat based on height and weight that applies to adult men and women. The program then categorizes the result according to the BMI table.
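A minimal sketch of the BMI calculation described above. The function names are hypothetical, and the category cut-offs are an assumption (the commonly used WHO adult ranges); the notebook's own BMI table may differ.

```python
def bmi(weight_kg, height_m):
    """Return BMI: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to a category.

    Assumption: WHO adult cut-offs, not necessarily the table used in the notebook.
    """
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"


if __name__ == "__main__":
    value = bmi(weight_kg=70, height_m=1.75)
    print(f"BMI = {value:.1f} ({bmi_category(value)})")
```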
Python Notebook program to compute common statistical functions. A statistical function, such as the mean, median, variance, or standard deviation, summarizes a sample of values with a single value. By default, these functions expect their parameter(s) to be a probabilistic value represented by a random sample of values over the Run index.
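As an illustration, the four functions named above can be computed with Python's standard `statistics` module. The sample values are made up, and the choice of the population variants (`pvariance`, `pstdev`) is an assumption; the notebook may use the sample variants (`variance`, `stdev`) instead.

```python
import statistics

# Made-up sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    # Population variance and standard deviation (divide by n, not n - 1).
    "variance": statistics.pvariance(sample),
    "std_dev": statistics.pstdev(sample),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```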
Word Count is a technique that produces a sorted list of words and their frequencies from a data source. In this Python Notebook program, the word count is generated from a sample paragraph taken from a news description: the text is split into words, and each word's frequency is counted.
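One possible way to sketch this word count uses `collections.Counter`. The `word_count` helper and the sample paragraph are hypothetical; the notebook works on its own news text and may tokenize differently.

```python
import re
from collections import Counter


def word_count(text):
    """Return (word, frequency) pairs sorted from most to least frequent."""
    # Lowercase the text and keep runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common()


# Made-up paragraph, used only for illustration.
paragraph = "Big data is big. Data about data is called metadata."
counts = word_count(paragraph)

for word, frequency in counts:
    print(word, frequency)
```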
Analysis of data from data.go.id on the percentage of Senior High School (SMA) students. The initial dataset contains the percentage of high school students by place of residence in each province of Indonesia. The data is sourced from the Data and Statistics Center of Education and Culture of Indonesia.
Web scraping uses a program or algorithm to extract and process large amounts of data from the web. All tasks related to scraping are placed in the scrapping folder.
Scraping articles from the kompas.com website. Technology-related articles are fetched from kompas.com, and the results are put in the scrapping/scrap-images and scrapping/scrap-data folders.
- All of the images are placed in the scrapping/scrap-images folder
- All of the article data is saved in the scrapping/scrap-data/data_berita.csv file
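The extract-then-save flow can be sketched with the standard library alone. This is not the repository's scraper: the HTML snippet, the `article` class name, the example.com URLs, and the `ArticleLinkParser` class are all hypothetical, and the CSV is written to an in-memory buffer rather than to data_berita.csv.

```python
import csv
import io
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a class="article"> tags (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._href = None   # href of the article link currently being read
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "article":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.articles.append(("".join(self._text).strip(), self._href))
            self._href = None


# Stand-in for a downloaded page; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li><a class="article" href="https://example.com/a1">First article</a></li>
  <li><a class="nav" href="/home">Home</a></li>
  <li><a class="article" href="https://example.com/a2">Second article</a></li>
</ul>
"""

parser = ArticleLinkParser()
parser.feed(SAMPLE_HTML)

# Write the extracted rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "url"])
writer.writerows(parser.articles)
print(buffer.getvalue())
```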
Scraping tweets from the Twitter website. Tweets related to a specific term are fetched from Twitter and saved in the scrapping/scrap-data folder.
- All of the tweets are saved in the scrapping/scrap-data/data_twitter.xlsx file
Scraping articles from the Tirto.id website. Articles are fetched from this website and saved in the scrapping/scrap-data folder.
- First, get the list of articles from the 1st index of the website and save it in the scrapping/scrap-data/tirto.xlsx file
- Second, get the list of articles from the 1st to the 14th index of the website and save it in the scrapping/scrap-data/tirtoArticles.xlsx file
Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records in a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. All tasks related to data cleaning are placed in the cleaningData folder.
- Missing Value
- Encoding Data Category
- Binarizing
- Feature Scaling
- Feature Extraction (Count Vectorizer, Dict Vectorizer, TF-IDF Vectorizer)
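The steps listed above can be sketched in plain Python to show what each one does. This is a stand-in for illustration, not the notebooks' code, which more likely relies on a library such as scikit-learn; every function name and data value below is hypothetical.

```python
from collections import Counter


def impute_mean(values):
    """Missing value: replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def encode_categories(labels):
    """Encoding: map each category to an integer code (alphabetical order)."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def binarize(values, threshold):
    """Binarizing: 1 if the value is above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def min_max_scale(values):
    """Scaling: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def count_vectorize(documents):
    """Feature extraction: count each vocabulary word per document."""
    vocabulary = sorted({w for doc in documents for w in doc.lower().split()})
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows


ages = impute_mean([20, None, 40])
codes, mapping = encode_categories(["red", "blue", "red"])
flags = binarize(ages, threshold=25)
scaled = min_max_scale(ages)
vocabulary, matrix = count_vectorize(["big data", "data cleaning data"])
```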
SQL (Structured Query Language) is a standard language for storing, manipulating, and retrieving data in databases. All tasks related to SQL are placed in the SQL folder.
Before starting the notebooks in this folder, please complete the following steps:
- Install XAMPP
- Start the MySQL database and the Apache web server
- Install the PyMySQL package in this notebook; it contains a pure-Python MySQL client library
- Import the SQL/mysqlsampledatabase.sql database into the MySQL database
- SQL Query
- SQL Join
- SQL SubQuery
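The three topics above (query, join, subquery) can be tried without a MySQL server by using Python's built-in `sqlite3` module as a stand-in. The tables, columns, and rows below are made up and are not taken from mysqlsampledatabase.sql. Note also that `sqlite3` uses `?` placeholders, while PyMySQL uses `%s`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES
    (1, 'Ani', 'Indonesia'), (2, 'Budi', 'Indonesia'), (3, 'Chen', 'Singapore');
INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 3, 300.0);
""")

# SQL Query: filter rows with a parameterized WHERE clause.
indonesians = conn.execute(
    "SELECT name FROM customers WHERE country = ? ORDER BY name",
    ("Indonesia",),
).fetchall()

# SQL Join: total order amount per customer who has at least one order.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

# SQL SubQuery: customers with an order above the average order amount.
big_spenders = conn.execute("""
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        WHERE amount > (SELECT AVG(amount) FROM orders)
    )
    ORDER BY name
""").fetchall()

conn.close()
print(indonesians, totals, big_spenders)
```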
Statistics is the science of collecting, analyzing, and understanding data, and accounting for the relevant uncertainties. As such, it permeates the physical, natural, and social sciences; public health; medicine; business; and policy. Statistics is fundamental to ensuring that meaningful, accurate information is extracted from Big Data. All tasks related to statistics are placed in the statistics folder.
A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics.
Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between two variables: whether an association exists and how strong it is, or whether there are differences between the two variables and how significant those differences are.
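One common measure of the strength of association between two numeric variables is the Pearson correlation coefficient. The sketch below is an illustration, not necessarily the method used in the notebook; the `pearson_r` helper and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the root sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear association, while a value near 0 indicates little or no linear association.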
This repository requires Python 3 (for example, Python 3.7) to run.
Install Jupyter Notebook to run the .ipynb files.
$ git clone https://github.com/t4f1d/digital-talent.git






