Analysis Report

written by: Toluwalase Tawak

Project Goal

The goal of this project is to practice and improve my data wrangling skills, which include gathering, assessment, transformation, cleaning, visualisation and analysis. These activities are carried out on the Twitter archive of the @RateDogs account, which rates people's dogs with humorous comments about them.
This README summarises how I approached the data wrangling for this project and presents some of the visualisations that led to my insights.

Introduction

For this project, I worked with three datasets provided by Udacity. Each contained different information needed to carry out the analysis and reporting.

The first dataset was a csv file named twitter_archive_enhanced. It contained information about 2356 tweets and was downloaded manually.

The second dataset was a tsv file named image_prediction. Udacity provided a URL to the file on its server, which I used to download it programmatically. It contained 2075 predictions classifying dogs by breed, based on the pictures attached to the tweets.
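A programmatic download of this kind can be sketched with the Python standard library alone. This is a minimal illustration, not necessarily the code used in the notebook; the function name download_file is hypothetical, and the real Udacity URL is not reproduced here.

```python
from pathlib import Path
from urllib.request import urlopen


def download_file(url: str, destination: Path) -> Path:
    """Fetch `url` and write the response body to `destination`.

    Hypothetical helper for illustration; the URL is supplied by the caller.
    """
    with urlopen(url) as response:
        destination.write_bytes(response.read())
    return destination
```

Calling download_file with the URL provided by Udacity and a local path such as Path("image_prediction.tsv") would save the file, after which it can be loaded as tab-separated data.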

The third dataset was a txt file, downloaded manually, containing tweet information in JSON format. It held extra information for 2357 tweets, such as the retweet count and favorite count each received.
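Reading such a file might look like the sketch below. It assumes the txt file stores one JSON object per line and that each object carries the keys id, retweet_count and favorite_count; both the layout and the helper name read_tweet_counts are assumptions made for illustration, and the sample records are made up.

```python
import json
from io import StringIO


def read_tweet_counts(lines):
    """Parse one JSON object per line, keeping the id and engagement counts."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append(
            {
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            }
        )
    return records


# Made-up sample standing in for the real file contents.
sample = StringIO(
    '{"id": 1, "retweet_count": 10, "favorite_count": 25}\n'
    '{"id": 2, "retweet_count": 3, "favorite_count": 7}\n'
)
counts = read_tweet_counts(sample)
```

With the real file, an open file handle can be passed in place of the StringIO sample.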

The datasets were assessed visually and programmatically for quality and tidiness issues. After these issues were corrected, the three dataframes were merged into one master dataset, which I then used to carry out my investigative analysis.
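The merge into a master dataset can be illustrated with pandas as follows. The tiny dataframes, the column names and the choice of an inner join on tweet_id are assumptions for this sketch; the report does not state which join type was used.

```python
import pandas as pd

# Made-up stand-ins for the three cleaned dataframes.
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "rating_numerator": [12, 13, 10]})
predictions = pd.DataFrame({"tweet_id": [1, 2], "breed": ["samoyed", "golden_retriever"]})
counts = pd.DataFrame(
    {"tweet_id": [1, 2, 3], "retweet_count": [10, 3, 8], "favorite_count": [25, 7, 20]}
)

# Keep only tweets present in all three sources (inner join is an assumption).
master = archive.merge(predictions, on="tweet_id", how="inner").merge(
    counts, on="tweet_id", how="inner"
)
```

An inner join drops tweets that lack an image prediction, which suits breed-level analysis; a left join would keep them with missing values instead.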

Cleaning Data

This section of the wrangling process was broken down into three parts:

Define: The cleaning step to be carried out was described.

Code: The code needed to achieve the defined cleaning goal was written.

Test: Code was run to confirm that the cleaning goal had been achieved.
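One pass through the Define, Code, Test cycle might look like this sketch. The dataframe, the timestamp column and its values are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Work on a copy-like stand-in for the archive, with made-up values.
archive_clean = pd.DataFrame(
    {"tweet_id": [1, 2], "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03"]}
)

# Define: convert the timestamp column from text to a datetime type.

# Code: perform the conversion.
archive_clean["timestamp"] = pd.to_datetime(archive_clean["timestamp"])

# Test: confirm the column now has a datetime dtype.
assert pd.api.types.is_datetime64_any_dtype(archive_clean["timestamp"])
```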

To begin, copies of the three datasets were created. These copies were used to carry out the cleaning activities.
Some of the cleaning processes carried out on the datasets are as follows:

Some rows and columns containing null values were dropped.
Attribute types were converted to the appropriate types.
Some columns were concatenated or unpivoted to form single columns.
The three datasets were merged.
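The unpivoting step above can be sketched with pandas melt. This example assumes the dog classifications are spread over one column per class; the column names doggo and pupper, the stage column and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Made-up rows: each classification sits in its own column.
tweets = pd.DataFrame(
    {
        "tweet_id": [1, 2, 3],
        "doggo": ["doggo", None, None],
        "pupper": [None, "pupper", None],
    }
)

# Unpivot the classification columns into a single "stage" column,
# then drop the empty cells and the helper column created by melt.
stages = (
    tweets.melt(id_vars="tweet_id", value_vars=["doggo", "pupper"], value_name="stage")
    .dropna(subset=["stage"])
    .drop(columns="variable")
)

# Re-attach so tweets without any classification are kept with a missing stage.
tidy = tweets[["tweet_id"]].merge(stages, on="tweet_id", how="left")
```

A tweet tagged with two classifications would appear twice after this step, so such cases would need their own rule.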

Analysis and Visualisations

We can see from the chart that some attributes of our data appear to be correlated. The strongest correlation is between favourite count and retweet count.
We can also see that the date column seems to have a relationship with the favourite count, rating numerator and to a much lesser extent retweet count and length of tweets.
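Pairwise correlations like these can be computed with pandas. The sketch below uses made-up numbers and assumed column names; it only shows the mechanics and does not reproduce the figures behind the chart.

```python
import pandas as pd

# Made-up values standing in for the master dataset.
master = pd.DataFrame(
    {
        "favorite_count": [100, 200, 300, 400],
        "retweet_count": [10, 22, 29, 41],
        "rating_numerator": [10, 12, 11, 13],
    }
)

# Pearson correlation between every pair of numeric columns.
corr = master.corr()
strongest = corr.loc["favorite_count", "retweet_count"]
```

The resulting matrix is what a correlation heatmap would display.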

The visual above shows us the popularity of breeds or occurrence of breeds rated by the twitter account.
Terriers were the most talked-about dogs.
Investigation showed that 6 of these breeds were small-sized dogs.

The highest rated of the most popular breeds (breeds that occur more than 14 times) is the Samoyed.
Golden Retriever, which happens to be the most commonly tweeted about dog, is the second highest rated dog.
5 of our most common breeds within the time frame of our entire data, also happen to be among the Top 10 rated breeds.
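Ranking the popular breeds by rating can be sketched as a group-and-filter step. The data and column names below are made up, and the toy threshold MIN_TWEETS is set to 1 so the small sample has results; the report's own cut-off is more than 14 occurrences.

```python
import pandas as pd

# Made-up ratings for three breeds.
master = pd.DataFrame(
    {
        "breed": ["samoyed", "samoyed", "samoyed", "pug", "pug", "saluki"],
        "rating_numerator": [13, 12, 14, 10, 11, 14],
    }
)

MIN_TWEETS = 1  # toy threshold; the report keeps breeds occurring more than 14 times

# Count tweets and average the rating per breed.
summary = master.groupby("breed")["rating_numerator"].agg(["count", "mean"])

# Keep only the popular breeds, highest average rating first.
popular = summary[summary["count"] > MIN_TWEETS].sort_values("mean", ascending=False)
```

Filtering on count first keeps a breed with a single highly rated tweet from topping the ranking.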

The French Bulldog and the Cocker Spaniel, both small-sized dogs, are two of the top 3 most popular breeds by Twitter user engagement.

Insights

These include other insights not shown in this summarised report. These can be found in the Jupyter notebook used to carry out the wrangling and analysis process.

The handler of this Twitter account likes to use the abbreviation "af" a lot. We can assume that they used this abbreviation to emphasize their description of, or sentiments about, the dog they were rating.
Of the four different classifications of dogs, Pupper was the handler's favourite description. According to the 'dogtionary', these are physically small or young dogs.
In the time period under consideration, the maximum length of a tweet was 140 characters. The handler wrote long tweets, with most between 100 and 140 characters.
The handler was very generous with ratings, giving most dogs a numerator of 10 or more.
The two most popular breeds were both Retrievers (Golden and Labrador). If people sent in their dogs' pictures to be rated, then most dog owners who were aware of the account owned Retrievers. If the handler found the pictures on their own, is it easier to find pictures of Retrievers than of other breeds?
It appears the handler has a bias towards the Golden Retriever. Of the most popular dogs, it possessed the second highest rating. The Labrador Retriever is also in the top 10 for popular dogs with the highest ratings.
The most popular dogs also had the most total engagements in terms of Favourites and retweets.
Followers of the account seemed to have lots of love for the French Bulldog, Cocker Spaniel and Samoyed, as these were the popular breeds that received the most likes or retweets on average.
The Bedlington Terrier and Saluki are probably the breeds that followers of the account liked the most. They received the most engagements on average despite their lack of popularity. (Although these engagement values could be skewed by just one of their pictures receiving a very large number of engagements.)
