DevJupyHUB/supreme-carnival


supreme-carnival

This and that with Python

City grids

Beliefs

Cosy reading corner

More parks, please!

Merry Xmas!

Icebergs

Oh, Rats (Dataset is available on Kaggle)

Gen AI

Stocks


Diplomatic missions of Portugal

Goal:

To examine which types of Portuguese diplomatic missions exist around the world and where they are located.

Method:

The list of diplomatic missions was scraped from Wikipedia using the pandas read_html() function. The data was then cleaned thoroughly and joined with a country codes dataset from Kaggle. Finally, the data was visualized with a waffle chart and two choropleth maps.
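The join and the aggregation behind the charts can be sketched in plain Python. This is only an illustration of the idea, not the notebook's code: it uses the standard library as a stand-in for the pandas join, and the rows, the `country_codes` lookup and the helper names `join_codes` and `count_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical cleaned rows: (host country, mission type).
# These are made-up examples, not the scraped Wikipedia data.
missions = [
    ("Spain", "Embassy"),
    ("Spain", "Consulate-General"),
    ("Brazil", "Embassy"),
    ("Brazil", "Consulate-General"),
    ("Brazil", "Consulate-General"),
    ("Japan", "Embassy"),
]

# Hypothetical excerpt of a country codes lookup (ISO 3166-1 alpha-3).
country_codes = {"Spain": "ESP", "Brazil": "BRA", "Japan": "JPN"}


def join_codes(rows, codes):
    """Left-join mission rows with ISO codes; unmatched countries get None."""
    return [(country, kind, codes.get(country)) for country, kind in rows]


def count_by(rows, index):
    """Count rows by the value found at the given tuple position."""
    return Counter(row[index] for row in rows)


joined = join_codes(missions, country_codes)
missions_per_code = count_by(joined, 2)  # one value per country, as a choropleth needs
missions_per_type = count_by(joined, 1)  # one value per category, as a waffle chart needs
```

Keeping unmatched countries as `None` instead of dropping them makes spelling mismatches between the two datasets easy to spot during cleaning.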

image

image

image


Predicting US tornados' magnitude

Goal:

The main goal is to explore the US tornado data through exploratory data analysis and to predict tornado magnitude with machine learning models. This is a multi-class classification problem, and balanced accuracy is the selected metric.
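Balanced accuracy is the mean of the per-class recall, so a model cannot score well by predicting only the majority class. The sketch below computes it by hand on made-up labels to show the difference from plain accuracy. The function and the toy data are illustrative assumptions, not code from the notebook.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy imbalanced labels (hypothetical): eight weak tornados, two strong ones.
y_true = [0] * 8 + [1] * 2
# A model that always predicts the majority class.
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 0.8 while balanced accuracy is 0.5, because the recall for the minority class is zero.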

Data:

The data is from NOAA and can be found on Kaggle.

Summary:

Tornados occur in all US states, but most of them occur in the central and eastern states. The number of tornado occurrences seems to be increasing over time, but further analysis revealed that only the number of weak tornados shows a strong upward trend, while the number of strong tornados doesn't seem to change. Tornados are most likely to occur in May and June. They happen at any time of day, but most seem to occur in the afternoon and evening hours. Texas is the state most heavily hit by tornados, but the most devastating tornados occur in Oklahoma. The more powerful the tornado, the longer and wider its path, but most tornados are neither long nor wide; big and powerful tornados are rare. 2011 stood out in terms of injuries, fatalities and losses in USD, and Texas, Alabama and Oklahoma suffered the most.

image

The Random Forest model performed best of the models compared. However, its balanced accuracy was only about 0.003619 higher than that of the Decision Tree cv model and 0.039984 higher than that of the baseline Decision Tree model.

image

None of these models performed satisfactorily, as balanced accuracy remained under 0.5. Future work should consider other techniques for handling imbalanced data, including over- and/or undersampling, and more hyperparameter tuning.
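As one possible starting point for that future work, random oversampling can be sketched as follows. It duplicates minority-class rows until every class is as large as the largest one. The function name, the feature tuples and the labels are hypothetical. In practice the resampling would be applied to the training split only, so that duplicated rows do not leak into the evaluation data.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equally large."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy training data (hypothetical): (path length, path width) per tornado.
rows = [(0.5, 30), (1.0, 50), (0.2, 20), (0.8, 40), (12.0, 400), (25.0, 900)]
labels = [0, 0, 0, 0, 1, 2]  # magnitude class, heavily skewed toward class 0

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_sizes = Counter(balanced_labels)
```

After resampling, every class has as many rows as the original majority class. The new rows are exact copies, so this adds no new information and can encourage overfitting, which is why it is usually compared against undersampling or class weighting.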
