digital-talent

Repository for Digital Talent Scholarship 2018 | Big Data | UGM

Big Data Analytics specialization class facilitated by Universitas Gadjah Mada (UGM), part of an event held by the Ministry of Communication and Information Technology of the Republic of Indonesia.

- Tugas BMI

A Python notebook that calculates body mass index (BMI), a measure of body fat based on height and weight that applies to adult men and women, and then categorizes the result according to the BMI table.
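The calculation can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names `bmi` and `bmi_category` are hypothetical, and the category thresholds assume the commonly used WHO adult cut-offs, which may differ from the BMI table used in the notebook.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to a category.

    Assumption: thresholds follow the common WHO adult classification;
    the notebook's own BMI table may use different boundaries.
    """
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal weight"
    if value < 30.0:
        return "Overweight"
    return "Obese"


value = bmi(weight_kg=70, height_m=1.75)
print(f"BMI = {value:.1f} ({bmi_category(value)})")
```

Height is taken in metres here; a notebook that reads centimetres would need to divide by 100 first.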

- Tugas List Nilai

A Python notebook that computes common statistical functions. A statistical function, such as the mean, median, variance, or standard deviation, summarizes a sample of values by a single value. By default, these functions expect their parameter(s) to be a probabilistic value represented by a random sample of values over the Run index.
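As a rough sketch of these summaries, the standard-library `statistics` module covers all four. The list `nilai` (scores) below is made-up sample data, and the notebook itself may compute the values differently, for example by hand or with NumPy.

```python
import statistics

# Hypothetical list of scores; not taken from the notebook.
nilai = [70, 85, 90, 65, 80]

mean = statistics.mean(nilai)          # arithmetic mean
median = statistics.median(nilai)      # middle value of the sorted list
variance = statistics.variance(nilai)  # sample variance (divides by n - 1)
stdev = statistics.stdev(nilai)        # sample standard deviation

print(f"mean={mean}, median={median}, variance={variance}, stdev={stdev:.2f}")
```

`statistics.pvariance` and `statistics.pstdev` give the population versions (dividing by n) if the list is the whole population rather than a sample.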

- Tugas Word Count

Word Count is a technique that produces a sorted list of words and their frequencies from a data source. In this Python notebook, the word count is generated from a sample paragraph taken from a news description: the text is split into words, and the occurrences of each word are counted.
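One way to sketch this is with `collections.Counter`. The sample text and the function name `word_count` are placeholders, and the tokenization rule (lowercase, letters, digits, and apostrophes) is an assumption; the notebook may split words differently.

```python
import re
from collections import Counter


def word_count(text: str) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs sorted from most to least frequent."""
    # Lowercase the text and keep runs of letters, digits, and apostrophes as words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common()


# Placeholder paragraph; the notebook uses a news description instead.
sample = "Big data is big. Data drives decisions, and data needs cleaning."

for word, freq in word_count(sample):
    print(f"{word}: {freq}")
```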

- Analisis Data Siswa SMA

Analysis of data from data.go.id on the percentage of senior high school (SMA) students. The initial dataset contains the percentage of high school students by place of residence in each province of Indonesia. The data comes from the Data and Statistics Center of Education and Culture of Indonesia.

Scrapping

Web scraping uses a program or algorithm to extract and process large amounts of data from the web. All tasks related to scraping are placed in the scrapping folder.
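The general idea can be illustrated without any network access by parsing a static HTML snippet with the standard library. Everything here is an assumption for illustration: the HTML, the `article-link` class name, the `ArticleLinkParser` class, and the CSV columns are invented, and the notebooks may use different libraries and selectors against the real sites.

```python
import csv
import io
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect the title and URL of every <a class="article-link"> element."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # text fragments found inside that link

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("class") == "article-link":
            self._href = attr_map.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.articles.append({"title": title, "url": self._href})
            self._href = None


# Invented markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="list">
  <a class="article-link" href="https://example.com/a1">First article</a>
  <a class="nav" href="https://example.com/next">Next page</a>
  <a class="article-link" href="https://example.com/a2">Second article</a>
</div>
"""

parser = ArticleLinkParser()
parser.feed(SAMPLE_HTML)

# Write the extracted rows as CSV into memory; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "url"])
writer.writeheader()
writer.writerows(parser.articles)
print(buffer.getvalue())
```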

- Scrapping Kompas

Scrapes technology-related articles from the kompas.com website and puts the results in the scrapping/scrap-images and scrapping/scrap-data folders.

All images are placed in the scrapping/scrap-images folder
All article data is saved in the scrapping/scrap-data/data_berita.csv file

- Scrapping Twitter

Scrapes tweets related to a specific term from the Twitter website. All tweets are saved in the scrapping/scrap-data folder.

All tweets are saved in the scrapping/scrap-data/data_twitter.xlsx file

- Scrapping Tirto

Scrapes articles from the Tirto.id website and puts all of them in the scrapping/scrap-data folder.

First, get the list of articles from the 1st index of the website and save it to the scrapping/scrap-data/tirto.xlsx file
Second, get the list of articles from the 1st to the 14th index of the website and save it to the scrapping/scrap-data/tirtoArticles.xlsx file

Data Cleaning

Data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. All tasks related to data cleaning are placed in the cleaningData folder.

Missing Value
Encoding Data Category
Binarizing
Scaling feature
Feature Extraction (Count Vectorizer, Dict Vectorizer, TfIdf Vectorizer)
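The first few steps in this list can be sketched in plain Python. This is only an illustration under stated assumptions: the data is invented, mean imputation and min-max scaling are one choice among several, and the notebooks may well rely on a library such as scikit-learn instead of hand-written code.

```python
from statistics import mean

# Invented sample data with one missing value.
ages = [20, None, 30, 40]
cities = ["Jakarta", "Yogyakarta", "Jakarta"]
docs = ["data big data", "big talent"]

# Missing value: replace None with the mean of the observed values.
fill_value = mean(a for a in ages if a is not None)
imputed = [fill_value if a is None else a for a in ages]

# Encoding a categorical column: map each distinct label to an integer code.
codes = {label: i for i, label in enumerate(sorted(set(cities)))}
encoded = [codes[c] for c in cities]

# Binarizing: 1 if the value is above the threshold, else 0.
threshold = 30
binarized = [1 if a > threshold else 0 for a in imputed]

# Scaling: min-max scaling to the range [0, 1].
lo, hi = min(imputed), max(imputed)
scaled = [(a - lo) / (hi - lo) for a in imputed]

# Feature extraction (count vectors): one column per vocabulary word.
vocabulary = sorted({word for doc in docs for word in doc.split()})
count_vectors = [[doc.split().count(word) for word in vocabulary] for doc in docs]

print(imputed, encoded, binarized, scaled, vocabulary, count_vectors, sep="\n")
```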

SQL

SQL (Structured Query Language) is a standard language for storing, manipulating, and retrieving data in databases. All tasks related to SQL are placed in the SQL folder.

Before running the notebooks in this folder, complete the following steps:

Install XAMPP
Start the MySQL database and the Apache web server
Install the PyMySQL package in the notebook environment; it is a pure-Python MySQL client library
Import the SQL/mysqlsampledatabase.sql database into MySQL

SQL List Notebook

SQL Query
SQL Join
SQL SubQuery
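The three notebook topics (query, join, subquery) can be tried without a MySQL server. The sketch below deliberately swaps in the standard-library `sqlite3` module with an in-memory database instead of PyMySQL, so it runs anywhere; the `customers` and `orders` tables and their rows are invented and are not the schema of mysqlsampledatabase.sql.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL setup described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema and rows, for illustration only.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ani'), (2, 'Budi'), (3, 'Citra');
INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 200.0);
""")

# Query: select and sort rows from a single table.
names = cur.execute("SELECT name FROM customers ORDER BY name").fetchall()

# Join: total order amount per customer (customers without orders drop out).
totals = cur.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Subquery: customers that have no orders at all.
no_orders = cur.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

conn.close()
print(names, totals, no_orders, sep="\n")
```

The SQL statements themselves are standard enough to carry over to MySQL; only the connection setup would change when using PyMySQL.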

Statistics

Statistics is the science of collecting, analyzing, and understanding data, and accounting for the relevant uncertainties. As such, it permeates the physical, natural, and social sciences; public health; medicine; business; and policy. Statistics is fundamental to ensuring that meaningful, accurate information is extracted from Big Data. All tasks related to statistics are placed in the statistics folder.

- Descriptive Statistics

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics.

- Bivariate analysis

Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between two variables: whether an association exists and how strong it is, or whether the two variables differ and how significant those differences are.
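One common measure of the strength of a linear association is the Pearson correlation coefficient, sketched below in plain Python. The choice of Pearson's r, the function name `pearson_r`, and the study-hours data are assumptions for illustration; the notebook may use other measures or tests.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Invented data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]

r = pearson_r(hours, scores)
print(f"Pearson r = {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear association; it says nothing about causation.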

Installation

This repository requires Python 3 (version 3.7 or another 3.x release) to run.

Install Jupyter Notebook to run the .ipynb files.

$ git clone https://github.com/t4f1d/digital-talent.git

