diff --git a/analyses/2019_03_aliamcami_value_analyses/README.md b/analyses/2019_03_aliamcami_value_analyses/README.md new file mode 100644 index 0000000..0039d9f --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/README.md @@ -0,0 +1,87 @@ +# Overview + +## JSON +All the greatest values are JSON, but they represent a very small percentage of the whole data. + +### Most of the data have small value_len + (mean = 1356 for the 10% sample) +- 95.58% of the data have value_len smaller than the mean +- 4.42% are bigger than the mean +- 9.35% are valid JSON + +### Values above the mean: +- 61.54% are NOT valid JSON +- 38.46% are valid JSON + +### Values that are 1 standard deviation (std) above the mean + (std = 26310 for 10% sample): +- 0.11% are NOT valid JSON +- 99.88% are valid JSON +- The bigger the value, the greater the chance of it being valid JSON + +### Values 4 std above the mean +- 100% are valid JSON +- The biggest non-JSON value has a length of 104653 + +## +The 46745 greatest value_len values are all valid JSON; that is 9.35% of the filtered sample (value_len > mean) and 0.41% of the original 10% sample. + +--- +## Correlation of location_domain and value + +- One domain can produce a single type of output (31%). +- 99% of the domains with a single type of output do not produce JSON. + + +- 31% of all domains can produce JSON. +- Only 0.016% of all the domains will always have JSON as output, and fewer than half of them will always have the same JSON. + + +- One JSON is usually (83.09%) produced by a single script domain. + + +--- + +# Future questions + +## About JSONs: +- **The JSON values are always from the same location or related domains?*** +- **Is there a set of location domains that always produces JSON?*** +- Do the JSON values follow a structure pattern? What pattern? +- What data does the JSON hold? Is there any pattern in the content? +- Do they have nested JSON? CSS? HTML? JavaScript? Recursive study on JSON properties. 
+ +- Is a JSON's structure for a single script_url domain always the same? +- Is every JSON with the same structure produced by the same script_url domain? + + *See notebook 'isJson_Quantitative_Comparasion.ipynb' for more information + +## General +I think some of these questions may only require investigating the crawler or reading the wiki, since someone may have already described and explained them. I just need to find, read and understand it. + +- Are there other valid data types like HTML, CSS... in the values column, or just JSON? +- Where does the value come from? What is it used for? + +## Small: value_len < mean +- What are the small values? +- Do the smaller values have any pattern? +- What is the majority data type? + +## Medium: mean < value_len < (mean + std) +- How many rows are there in the intersection of *“no JSON”* and *“everything is JSON”* ? +- What are they? Are they from a specific script_url domain? Or related domains? + +## Big: value_len > (mean + std) +- What are the big non-JSON values? + +## Security and data sharing: +- Do the value columns have any JavaScript? Nested JavaScript? +- Do the scripts in the dataset contain known malicious behaviors? +- Can they collect data that threatens users' privacy? + +## Statistical knowledge / coincidence: +The **mean** of the original 10% sample is pretty similar to the **std** of the sample taken after filtering for values above the mean +- Why? +- Is it a coincidence? +- Is it always like this? +- Is it a statistical pattern? 
\ No newline at end of file diff --git a/analyses/2019_03_aliamcami_value_analyses/isJson_Identify_Source.ipynb b/analyses/2019_03_aliamcami_value_analyses/isJson_Identify_Source.ipynb new file mode 100644 index 0000000..23bae7d --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/isJson_Identify_Source.ipynb @@ -0,0 +1,350 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start Dask" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/anaconda3/envs/overscripted/lib/python3.6/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.\n", + "  data = yaml.load(f.read()) or {}\n" + ] + } + ], + "source": [ + "import dask.dataframe as dd\n", + "from dask.diagnostics import ProgressBar\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import numpy as np\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Objective\n", + "\n", + "The objective of this notebook is to answer:  \n", + " - \"The JSON values are always from the same location or related domains?\"  \n", + "\n", + "To answer this we will use the sample data set produced by the notebook \"isJson_dataPrep.ipynb\":\n", + "- 's0_domains_isJson_jsonKeys_md5_TLD_JSON_ONLY.parquet'\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Findings\n", + "To answer the question \"The JSON values are always from the same location or related domains?\"  \n", + "No, not always, but usually. 83.09% of the JSONs are produced by a single script domain. 
\n", + "\n", + "---\n", + "\n", + "About 71% of the JSONs are seen more than once across the data set, which means that they MAY have different origins.\n", + "- Most JSONs are from a single script domain. \n", + "- Almost 17% of the JSONs have multiple origins[1]; most of these have 2 to 3 origins, and very few have more. \n", + "- They may be related, since 40% of them have the same TLD[2]. \n", + "- Some of the ones that have multiple script domains have the same location domain (41%), calling different scripts but producing the same JSON[3]. \n", + "- They may have some similarities in usage: 99% of them have a single symbol across the different domains[4]\n", + "\n", + "---\n", + " For further investigation: \n", + "  1. Are these JSONs any different? Are they big/small JSONs? It may be that they have the same top keys but are in reality very different? \n", + "  2. Do the script domains that produce the same JSON have any relation between them? How can I relate domains?\n", + "  3. What does it mean for different scripts to get the same JSON for a single location? \n", + "  4. Are they used for the same purpose? Can we really say that based on the symbol? 
" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "DIR = 'sample_0_prep/'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['symbol', 'location_domain', 'script_domain', 'json_keys', 'keys_md5',\n", + " 'script_tld', 'value_len'],\n", + " dtype='object')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = dd.read_parquet(DIR + 's0_domains_isJson_jsonKeys_md5_TLD_JSON_ONLY.parquet',\n", + " engine='pyarrow',\n", + " columns=['symbol', 'location_domain', 'script_domain', 'json_keys', 'keys_md5', 'script_tld', 'value_len'])\n", + "df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# The JSON values are always from the same location or related domains?\n", + "How many locations one JSON has?\n", + "All bigger json have the same locations?\n", + "what is \"related domains\"?\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 1.8s\n", + "The total number of different keys_md5 is 14374\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + " group_by_keys_md5 = df.compute().groupby(['keys_md5'])\n", + " group_by_keys_md5_number_of_different_keys = len(group_by_keys_md5)\n", + " print(\"The total number of different {} is {}\".format('keys_md5', group_by_keys_md5_number_of_different_keys))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "agg = group_by_keys_md5.agg(['nunique'])" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are a total of 10222(71.11%) 
JSONs that appear in multiple rows\n" + ] + } + ], + "source": [ + "\n", + "json_multiple_appearances = agg['symbol'][group_by_keys_md5['symbol'].count() > 1]\n", + "json_multiple_appearances_len = len(json_multiple_appearances)\n", + "agg_len = len(agg['symbol'])\n", + "print('There are a total of {0}({1:0.2f}%) JSONs that appear in multiple rows'.format(\n", + " json_multiple_appearances_len, \n", + " json_multiple_appearances_len*100/agg_len))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def get_multiple(agg, column, title=''):\n", + " agg_len = len(agg[column])\n", + " x = agg[agg[column]['nunique'] > 1]\n", + " x_len = len(x)\n", + " print(title + '{0} ({1:0.2f}%) multiple {2},\\n{3} ({4:0.2f}%) unique {2}'.format(\n", + " x_len,\n", + " x_len*100/agg_len,\n", + " column, \n", + " agg_len - x_len,\n", + " (agg_len - x_len) * 100 / agg_len\n", + " ))\n", + " return x" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### JSONs origin: script domain\n", + "\n", + "Plot that shows that most JSONs are originated from a single script domain" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SCRIPT DOMAIN data: from the total of json\n", + "2430 (16.91%) multiple script_domain,\n", + "11944 (83.09%) unique script_domain\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "multiple_script_domain = get_multiple(agg, 'script_domain', 'SCRIPT DOMAIN data: from the total of json\\n')\n", + "pd.DataFrame([[len(multiple_script_domain)/agg_len], \n", + " [(agg_len - len(multiple_script_domain))/agg_len]], \n", + " ['multiple', 'single']).plot(kind='bar', title='JSONs origin: Script domain')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "count 2430.000000\n", + "mean 2.483128\n", + "std 1.213823\n", + "min 2.000000\n", + "25% 2.000000\n", + "50% 2.000000\n", + "75% 3.000000\n", + "max 34.000000\n", + "Name: nunique, dtype: float64" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "multiple_script_domain.script_domain['nunique'].describe()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7 (0.29%) multiple symbol,\n", + "2423 (99.71%) unique symbol\n" + ] + } + ], + "source": [ + "# Out of the multiple_script_domain\n", + "multiple_script_domain_symbol = get_multiple(multiple_script_domain, 'symbol')" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1413 (58.15%) multiple script_tld,\n", + "1017 (41.85%) unique script_tld\n" + ] + } + ], + "source": [ + "# Out of the multiple_script_domain\n", + "multiple_script_domain_location_tld = get_multiple(multiple_script_domain, 'script_tld')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "98 (4.03%) multiple location_domain,\n", + "2332 (95.97%) unique location_domain\n" + ] + } + ], + "source": [ + "# Out of 
the multiple_script_domain\n", + "multiple_script_domain_location_tld = get_multiple(multiple_script_domain, 'location_domain')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/analyses/2019_03_aliamcami_value_analyses/isJson_Occurrence_of_operation_symbols_domains.ipynb b/analyses/2019_03_aliamcami_value_analyses/isJson_Occurrence_of_operation_symbols_domains.ipynb new file mode 100644 index 0000000..cd64c26 --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/isJson_Occurrence_of_operation_symbols_domains.ipynb @@ -0,0 +1,1279 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start Dask" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/anaconda3/envs/overscripted/lib/python3.6/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. 
Please read https://msg.pyyaml.org/load for full details.\n", + "  data = yaml.load(f.read()) or {}\n" + ] + } + ], + "source": [ + "import dask.dataframe as dd\n", + "from dask.diagnostics import ProgressBar\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import numpy as np\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook uses the parquet file produced by \"isJson_dataPrep.ipynb\":\n", + "- 's0_domains_isJson_jsonKeys_md5_TLD.parquet'\n", + "\t- It contains the entire original 10% sample, with extra columns.\n", + "\n", + "# Objective\n", + "Show and compare, between samples, the presence and occurrence of operations, symbols, domains and TLDs. \n", + "\n", + "Each graph is drawn twice to show the difference between the whole data and the data filtered to rows whose value_len is above the mean.\n", + "\n", + "# Overview\n", + "### Operation\n", + "The most used operation across the entire sample is GET.  \n", + "99.67% of the valid JSONs have GET as their operation. When filtered to value_len above the mean, 100% of the valid JSONs are GET. \n", + "\n", + "### Symbols\n", + "The count of unique symbols differs greatly between the whole sample and the filtered one. The one thing I can say is that 'window.localStorage' produces the most JSONs (65%) and 'window.document.cookie' is responsible for 34% of the non-JSON values; anything else may require further investigation and understanding. \n", + "\n", + "### Domain\n", + "'Baidu' has the most occurrences for valid JSON values (15%) but it's only in the 5th position when it comes to the values above the mean (5.9%).\n", + "'Google.Analytics' is the top one for the non-JSON values, both for all values and for the bigger values.\n", + "\n", + "\n", + "### TLD\n", + "The TLD is more balanced between the non-JSON and JSON values, and the top ones remain the same for the filtered data. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "DIR = 'sample_0_prep/'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 6.8s\n" + ] + } + ], + "source": [ + "columns=['operation', 'symbol', 'script_domain', 'is_json', 'keys_md5', 'script_tld', 'value_len']\n", + "df = dd.read_parquet(DIR + 's0_domains_isJson_jsonKeys_md5_TLD.parquet',\n", + " engine='pyarrow',\n", + " columns=columns)\n", + "with ProgressBar():\n", + " mean = df['value_len'].mean().compute()\n", + "\n", + "df_a = df[df.value_len > mean]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Support code\n", + "This section is where some support code is placed. \n", + "Some of the code here is where the math actually happen and the other section uses it. " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 7.1s\n", + "[########################################] | 100% Completed | 7.3s\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + " df_json = df[df.is_json == True].compute()\n", + " df_other = df[df.is_json == False].compute()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 7.1s\n", + "[########################################] | 100% Completed | 6.8s\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + " df_a_json = df_a[df_a.is_json == True].compute()\n", + " df_a_other = df_a[df_a.is_json == False].compute()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": 
{}, + "outputs": [], + "source": [ + "def calcUniquePercentual(df, column):\n", + " v = df[column].value_counts()\n", + " l = df[column].count()\n", + " return v/l" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def plotUsageComparation(df_json, df_other, column):\n", + " nonjsons = calcUniquePercentual(df_other, column=column)\n", + " jsons = calcUniquePercentual(df_json, column=column)\n", + " p1 = pd.DataFrame({'json': jsons,'other':nonjsons}).sort_values('json', ascending=False)\n", + " p1.plot(kind='bar')\n", + " return p1" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def plotTopUsageComparation(df_json, df_other, column, top):\n", + " nonjsons = calcUniquePercentual(df_other, column=column)\n", + " jsons = calcUniquePercentual(df_json, column=column)\n", + " \n", + " p1 = pd.DataFrame({'json': jsons,'other':nonjsons})\n", + " top_json = p1.sort_values('json', ascending=False).head(top)\n", + " top_other = p1.sort_values('other', ascending=False).head(top)\n", + " tops = pd.concat([top_json, top_other]).drop_duplicates()\n", + " tops.plot(kind='bar')\n", + " return tops" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def plotUniqueValuesComparation(df_json, df_other, column):\n", + " nonjsons = calcUniquePercentual(df_other, column=column)\n", + " jsons = calcUniquePercentual(df_json, column=column)\n", + " \n", + " #Value counts\n", + " count_nonjson = len(nonjsons)\n", + " count_json = len(jsons)\n", + " p1 = pd.DataFrame([count_json, count_nonjson], \n", + " index= [ 'Json', 'Other' ], \n", + " columns=['Value Counts'])\n", + " p1.plot(kind='bar')\n", + " print(\"There are {} unique {} present on the non-json dataset and {} on the JSONs\".format(count_nonjson,\n", + " column,\n", + " count_json))\n", + " return p1" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "# OPERATION:\n", + "\n", + "The operation columns can have 3 different values \n", + " - GET\n", + " - SET\n", + " - CALL\n", + "\n", + "We can see below that pretty much all[1] JSONs have the operation GET when the whole sample is analysed and ALL JSONs have GET when we filter the sample to values above the mean. \n", + "The GET operation is the most common among the non-json values as well. \n", + "\n", + "---\n", + " For futher investigation: \n", + "1. Are the JSONs that have SET as operation really JSON? Are they false positives? Why are they different? " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Full sample:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEDCAYAAADOc0QpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEh1JREFUeJzt3X+QVeV9x/H3NwjSqMEE11ZZCExKVNwkK6yAoQOY6ChqAY1NtCapEyOTSdQ6/qhEWsfaZPLD1MTOaBpNNDWpqDGNMpaOk4mKvy2gkAEdHWK0bmASJEIVawTn2z/ulazryp5dLnt3H96vmZ2955znnvPdubuffe5zz3lOZCaSpLK8q9kFSJIaz3CXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFWivZh34gAMOyPHjxzfr8JI0JK1cufLFzGzprV3Twn38+PGsWLGiWYeXpCEpIp6v0s5hGUkqkOEuSQUy3CWpQE0bc5ekKrZt20ZnZyevvfZas0sZUCNHjqS1tZXhw4f36/m9hntE3ACcBPwuM9t62B7A1cAJwKvAmZn5eL+qkaRuOjs72W+//Rg/fjy1uClfZrJp0yY6OzuZMGFCv/ZRZVjmh8DxO9k+B5hY/1oAfLdflUhSD1577TVGjx69xwQ7QEQwevToXXq30mu4Z+b9wO930mQecFPWPArsHxEH9bsiSepmTwr2N+3qz9yID1THAC90We6sr5MkNUkjPlDt6d9LjzdmjYgF1IZuGDduXAMOXd34hf85oMd77usnDujxpD1Fo/+Wq/6tfvSjH+Xhhx9u6LF3p0b03DuBsV2WW4H1PTXMzOsysyMzO1paer16VpIGjaEU7NCYcF8CfDZqpgNbMnNDA/YrSYPGvvvuy4YNG5g5cybt7e20tbXxwAMPALB48WI+9KEP0dbWxiWXXPKW5yxatIiPfOQjTJ8+nd/+9rcDVm+v4R4Ri4FHgEMiojMizoqIL0TEF+pNlgLPAuuA64Ev7rZqJamJbr75Zo477jhWrVrF6tWraW9vZ/369VxyySXcc889rFq1iuXLl3PHHXcAsHXrVqZPn87q1auZOXMm119//YDV2uuYe2ae3sv2BL7UsIokaZA68sgj+dznPse2bduYP38+7e3t3HPPPcyeP
Zs3h5rPOOMM7r//fubPn8+IESM46aSTAJgyZQo///nPB6xWpx+QpIpmzpzJ/fffz5gxY/jMZz7DTTfdRK1/27Phw4fvOKVx2LBhbN++faBKNdwlqarnn3+eAw88kLPPPpuzzjqLxx9/nGnTprFs2TJefPFF3njjDRYvXsysWbOaXapzy0gaWpp1mnFEcN9993HllVcyfPhw9t13X2666SYOOuggvva1r3H00UeTmZxwwgnMmzevKTW+pd6dvaXYnTo6OnIgb9bhee7S0PTUU09x2GGHNbWGTZs2MXnyZJ5/vtJ9Mhqmp589IlZmZkdvz3VYRpJ2Yv369Rx11FFcdNFFzS6lTxyWkaSdOPjgg3nmmWeaXUaf2XOXpAIZ7pJUIMNdkgpkuEtSgfxAVdLQcvmoBu9vS7+etnnzZm6++Wa++MXadFr33Xcf3/rWt7jrrrsaWV2/2XOXpH7YvHkz1157bcP21+ipCQx3Sargqquuoq2tjba2Nr7zne+wcOFCfvWrX9He3s7FF18MwCuvvMKpp57KoYceyhlnnLFj3pmVK1cya9YspkyZwnHHHceGDbVZ0WfPns2ll17KrFmzuPrqqxtar8MyktSLlStXcuONN/LYY4+RmUybNo0f//jHrFmzhlWrVgG1YZknnniCtWvXcvDBBzNjxgweeughpk2bxrnnnsudd95JS0sLt956K4sWLeKGG24Aau8Ali1b1vCaDXdJ6sWDDz7IySefzD777APAKaecsuNGHV1NnTqV1tZWANrb23nuuefYf//9WbNmDcceeywAb7zxBgcddNCO53zqU5/aLTUb7pLUi6pzcO299947Hr85xW9mcvjhh/PII4/0+Jw3/2E0mmPuktSLmTNncscdd/Dqq6+ydetWfvaznzFjxgxefvnlXp97yCGHsHHjxh3hvm3bNtauXbu7S7bnLmmI6eepi7ti8uTJnHnmmUydOhWAz3/+80yZMoUZM2bQ1tbGnDlzOPHEnmeCHTFiBLfffjvnnXceW7ZsYfv27Zx//vkcfvjhu7Vmp/zdTZzyV2qMwTDlb7M45a8k6S0Md0kqkOEuadBr1vBxM+3qz2y4SxrURo4cyaZNm/aogM9MNm3axMiRI/u9D8+WkTSotba20tnZycaNG5tdyoAaOXLkjgui+sNwlzSoDR8+nAkTJjS7jCHHYRlJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgSqFe0QcHxFPR8S6iFjYw/ZxEXFvRDwREb+MiBMaX6okqapewz0ihgHXAHOAScDpETGpW7O/B27LzCOA04BrG12oJKm6Kj33qcC6zHw2M18HbgHmdWuTwHvqj0cB6xtXoiSpr6qE+xjghS7LnfV1XV0OfDoiOoGlwLk97SgiFkTEiohYsadNAiRJA6lKuEcP67rPvXk68MPMbAVOAH4UEW/bd2Zel5kdmdnR0tLS92olSZVUCfdOYGyX5VbePuxyFnAbQGY+AowEDmhEgZKkvqsS7suBiRExISJGUPvAdEm3Nv8DfBwgIg6jFu6Ou0hSk/Qa7pm5HTgHuBt4itpZMWsj4oqImFtvdiFwdkSsBhYDZ+aedNsUSRpkKt2sIzOXUvugtOu6y7o8fhKY0djSJEn95RWqklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpALtVaVRRBwPXA0MA76fmV/voc0ngcuBBFZn5l83sM6h5/JRA3y8LQN7PEmDWq/hHhHDgGuAY4FOYHlELMnMJ7u0mQh8GZiRmS9FxIG7q2BJUu+qDMtMBdZl5rOZ+TpwCzCvW5uzgWsy8yWAzPxdY8uUJPVFlXAfA
7zQZbmzvq6rDwIfjIiHIuLR+jCOJKlJqoy5Rw/rsof9TARmA63AAxHRlpmb37KjiAXAAoBx48b1uVhJUjVVeu6dwNguy63A+h7a3JmZ2zLz18DT1ML+LTLzuszsyMyOlpaW/tYsSepFlXBfDkyMiAkRMQI4DVjSrc0dwNEAEXEAtWGaZxtZqCSpul7DPTO3A+cAdwNPAbdl5tqIuCIi5tab3Q1siogngXuBizNz0+4qWpK0c5XOc8/MpcDSbusu6/I4gQvqX5KkJvMKVUkqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKlClcI+I4yPi6YhYFxELd9Lu1IjIiOhoXImSpL7qNdwjYhhwDTAHmAScHhGTemi3H3Ae8Fiji5Qk9U2VnvtUYF1mPpuZrwO3APN6aPdPwDeB1xpYnySpH6qE+xjghS7LnfV1O0TEEcDYzLxrZzuKiAURsSIiVmzcuLHPxUqSqqkS7tHDutyxMeJdwLeBC3vbUWZel5kdmdnR0tJSvUpJUp9UCfdOYGyX5VZgfZfl/YA24L6IeA6YDizxQ1VJap4q4b4cmBgREyJiBHAasOTNjZm5JTMPyMzxmTkeeBSYm5krdkvFkqRe9RrumbkdOAe4G3gKuC0z10bEFRExd3cXKEnqu72qNMrMpcDSbusue4e2s3e9LEnSrvAKVUkqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kq0F7NLkCSGuryUQN8vC0De7yK7LlLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklSgSuEeEcdHxNMRsS4iFvaw/YKIeDIifhkRv4iI9ze+VElSVb2Ge0QMA64B5gCTgNMjYlK3Zk8AHZn5YeB24JuNLlSSVF2VnvtUYF1mPpuZrwO3APO6NsjMezPz1frio0BrY8uUJPVFlXAfA7zQZbmzvu6dnAX8V08bImJBRKyIiBUbN26sXqUkqU+qhHv0sC57bBjxaaADuLKn7Zl5XWZ2ZGZHS0tL9SolSX1SZeKwTmBsl+VWYH33RhFxDLAImJWZf2hMeZKk/qjSc18OTIyICRExAjgNWNK1QUQcAXwPmJuZv2t8mZKkvug13DNzO3AOcDfwFHBbZq6NiCsiYm692ZXAvsBPImJVRCx5h91JkgZApfncM3MpsLTbusu6PD6mwXVJknaBV6hKUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SClTpIiZpj3L5qAE+3paBPZ72CPbcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSpQpXCPiOMj4umIWBcRC3vYvndE3Frf/lhEjG90oZKk6noN94gYBlwDzAEmAadHxKRuzc4CXsrMPwe+DXyj0YVKkqqr0nOfCqzLzGcz83XgFmBetzbzgH+rP74d+HhEROPKlCT1xV4V2owBXuiy3AlMe6c2mbk9IrYAo4EXuzaKiAXAgvriKxHxdH+KHgoCDqDbz79b/aP/SxvI125oK/31e3+VRlXCvafKsx9tyMzrgOsqHHPIi4gVmdnR7DrUd752Q5uvX02VYZlOYGyX5VZg/Tu1iYi9gFHA7xtRoCSp76qE+3JgYkRMiIgRwGnAkm5tlgB/U
398KnBPZr6t5y5JGhi9DsvUx9DPAe4GhgE3ZObaiLgCWJGZS4AfAD+KiHXUeuyn7c6ih4g9YvipUL52Q5uvHxB2sCWpPF6hKkkFMtwlqUCGuyQVyHCXqM2PVGWdNFQY7g0SEROqrNOg9UjFddKQUOUKVVXzU2Byt3W3A1OaUIsqiog/ozZ9xp9ExBH88Wrr9wDvblphqiQiLtjZ9sy8aqBqGWwM910UEYcChwOjIuKULpveA4xsTlXqg+OAM6lded01CP4XuLQZBalP9mt2AYOV57nvooiYB8wH5vLWK3dfBm7JzIebUpj6JCI+kZk/bXYdUqMY7g0SEUdlpmO0Q1R9eOarwMGZOad+z4KjMvMHTS5NOxER/7Kz7Zl53kDVMtg4LNM4myLiF8CfZmZbRHwYmJuZX2l2YarkxvrXovryM8Ct1KbW0OC1stkFDFb23BskIpYBFwPfy8wj6uvWZGZbcytTFRGxPDOPjIgnurx+qzKzvdm1Sf1hz71x3p2Z/93tBlTbm1WM+mxrRIymfh+CiJgObGluSaoqIlqAS6jdCnTHiQyZ+bGmFdVkhnvjvBgRH+CP4XAqsKG5JakPLqD2gfgHIuIhoIXa9NUaGv6d2jDaicAXqE1BvrGpFTWZ4d44X6I21eihEfEb4NfAGc0tSX3wAWo3gR8LfILarST9+xg6RmfmDyLibzNzGbCsPlS6x/KXt3HmA0uBe6ld+bsVOCYiVmbmqqZWpir+ITN/EhHvBY4B/hn4Lm+/X7AGp2317xsi4kRqd4trbWI9Tef0A43TQe3t4HuB/andCHw2cH1E/F0T61I1b9S/nwj8a2beCYxoYj3qm69ExCjgQuAi4PvA+c0tqbkM98YZDUzOzIsy80JqYd8CzKR2BaQGt99ExPeATwJL65OG+fcxdPwVtbP/1mTm0cCxwMlNrqmp/OVtnHHA612WtwHvz8z/A/7QnJLUB5+kdivJ4zNzM/A+aqe2amj4cP11AyAzfw8c0cR6ms4x98a5GXg0Iu6sL/8lsDgi9gGebF5ZqiIzXwX+o8vyBjzbaSh5V0S8NzNfAoiI97GH55sXMTVQREwB/oLazIIPZuaKJpck7REi4rPAl6nNxJrU3ol9NTN/1NTCmshwl1SE+nxAH6PWufpFZu7R75gNd0kqkB+oSlKBDHdJKpDhLkkFMtwlqUD/D8IypErrsxOtAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p1 = plotUsageComparation(df_json, df_other, 'operation')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Above the mean sample:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAECCAYAAAAFL5eMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEUZJREFUeJzt3X+M1/V9wPHnq/zwWkHd4MzE00I6quK1IlwBSwOYagTtZFrT6lw3EytpWjXGlkm1Mc5l61Y72y7RbbhqZxtQZ1ckysKSWrX+LEdFCxoNtTivkPa8VqZYK7jX/vieeJ6H97njy33v3vd8JCb3+Xzf970Xhnvyuc99vp9vZCaSpLK8p9EDSJLqz7hLUoGMuyQVyLhLUoGMuyQVyLhLUoGMuyQVyLhLUoGMuyQVaGyjvvDkyZNz6tSpjfrykjQibdy48cXMbO5vXcPiPnXqVNrb2xv15SVpRIqI56us87SMJBXIuEtSgYy7JBWoYefcJamK3bt309HRwWuvvdboUYZUU1MTLS0tjBs3blCfb9wlDWsdHR1MnDiRqVOnEhGNHmdIZCZdXV10dHQwbdq0QT1Hv6dlIuLmiPh1RGzex+MREf8UEVsj4smImDWoSSSpD6+99hqTJk0aNWEHiAgmTZq0Xz+tVDnn/h1g8bs8vgSY3v3fMuCfBz2NJPVhNIX9Tfv7Z+437pn5APCbd1myFLg1ax4FDouII/ZrKknSfqnHOfcjgRd6bHd079vRe2FELKN2dM/RRx9dhy994E1dcU+jR6hk29+f0egRpCFR7+/Jqt87H/3oR3n44Yfr+rUPpHrEva+fHfp81+3MXAmsBGhra/OduevpmkMbPUE11+xs9ATSoIyksEN9rnPvAI7qsd0CbK/D80rSsDFhwgR27NjBggULmDlzJq2trfz4xz8GYPXq1XzoQx+itbWVK6644m2fc9VVV3HCCScwb948fvWrXw3ZvPWI+1rgL7qvmpkH7MzMd5ySkaSRbtWqVZx22mls2rSJJ554gpkzZ7J9+3auuOIK7r33XjZt2sSGDRtYs2YNALt27WLevHk88cQTLFiwgJtuumnIZq1yKeRq4BHgmIjoiIgLI+JzEfG57iXrgOeArcBNwOcP2LSS1EAf+chHuOWWW7jmmmv42c9+xsSJE9mwYQOLFi2iubmZsWPHcv755/PAAw8AMH78eD7xiU8AMHv2bLZt2zZks/Z7zj0zz+vn8QS+ULeJJGmYWrBgAQ888AD33HMPn/nMZ1i+fDmHHHLIPtePGzdu7yWNY8aMYc+ePUM1qveWkaSqnn/+eQ4//HAuuugiLrzwQn76058yd+5c7r//fl588UXeeOMNVq9ezcKFCxs9qrcfkDSyNOqy34jgvvvu47rrrmPcuHFMmDCBW2+9lSOOOIKvfvWrnHzyyWQmp59+OkuXLm3IjG+bt3ZWZei1tbXlSHizjhFznXvTnzV6hGq8FFID9PTTT3Pcccc1dIauri5mzZrF889Xep+Muunrzx4RGzOzrb/P9bSMJL2L7du3c9JJJ/GlL32p0aMMiKdlJOldTJkyhWeffbbR
YwyYR+6SVCDjLkkFMu6SVCDjLkkF8heqkkaWet8BdZCX57700kusWrWKz3++dseV++67j69//evcfffd9Zxu0Dxyl6RBeOmll7jxxhvr9nz1vjWBcZekCq6//npaW1tpbW3lm9/8JitWrODnP/85M2fOZPny5QC88sornHPOORx77LGcf/75vPki0Y0bN7Jw4UJmz57Naaedxo4dtRvnLlq0iCuvvJKFCxfyrW99q67zelpGkvqxceNGbrnlFh577DEyk7lz5/K9732PzZs3s2nTJqB2Wubxxx9ny5YtTJkyhfnz5/PQQw8xd+5cLrnkEu666y6am5u5/fbbueqqq7j55puB2k8A999/f91nNu6S1I8HH3yQs846i4MPPhiAs88+e+8bdfQ0Z84cWlpaAJg5cybbtm3jsMMOY/PmzZx66qkAvPHGGxxxxFtvM/3pT3/6gMxs3CWpH1XvwXXQQQft/fjNW/xmJscffzyPPPJIn5/z5j8Y9eY5d0nqx4IFC1izZg2vvvoqu3bt4gc/+AHz58/n5Zdf7vdzjznmGDo7O/fGfffu3WzZsuVAj+yRu6QRpgF3Fp01axYXXHABc+bMAeCzn/0ss2fPZv78+bS2trJkyRLOOKPvWxGPHz+eO++8k0svvZSdO3eyZ88eLrvsMo4//vgDOrO3/O2Ht/ytM2/5qwEaDrf8bRRv+StJehvjLkkFMu6Shr1GnT5upP39Mxt3ScNaU1MTXV1doyrwmUlXVxdNTU2Dfg6vlpE0rLW0tNDR0UFnZ2ejRxlSTU1Ne18QNRjGXdKwNm7cOKZNm9boMUYcT8tIUoGMuyQVyLhLUoGMuyQVyLhLUoGMuyQVyLhLUoGMuyQVqFLcI2JxRDwTEVsjYkUfjx8dET+KiMcj4smIOL3+o0qSquo37hExBrgBWALMAM6LiBm9ln0FuCMzTwTOBW6s96CSpOqqHLnPAbZm5nOZ+TpwG7C015oEDun++FBge/1GlCQNVJV7yxwJvNBjuwOY22vNNcB/R8QlwMHAKXWZTpI0KFWO3KOPfb3vvXke8J3MbAFOB74bEe947ohYFhHtEdE+2u7wJklDqUrcO4Cjemy38M7TLhcCdwBk5iNAEzC59xNl5srMbMvMtubm5sFNLEnqV5W4bwCmR8S0iBhP7Rema3ut+R/g4wARcRy1uHtoLkkN0m/cM3MPcDGwHnia2lUxWyLi2og4s3vZF4GLIuIJYDVwQY6mt02RpGGm0pt1ZOY6YF2vfVf3+PgpYH59R5MkDZavUJWkAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAlWKe0QsjohnImJrRKzYx5pPRcRTEbElIlbVd0xJ0kCM7W9BRIwBbgBOBTqADRGxNjOf6rFmOvBlYH5m/jYiDj9QA0uS+lflyH0OsDUzn8vM14HbgKW91lwE3JCZvwXIzF/Xd0xJ0kBUifuRwAs9tju69/X0QeCDEfFQRDwaEYv7eqKIWBYR7RHR3tnZObiJJUn9qhL36GNf9toeC0wHFgHnAf8WEYe945MyV2ZmW2a2NTc3D3RWSVJFVeLeARzVY7sF2N7Hmrsyc3dm/gJ4hlrsJUkNUCXuG4DpETEtIsYD5wJre61ZA5wMEBGTqZ2mea6eg0qSqus37pm5B7gYWA88DdyRmVsi4tqIOLN72XqgKyKeAn4ELM/MrgM1tCTp3fV7KSRAZq4D1vXad3WPjxO4vPs/SVKD+QpVSSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZek
Ahl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAhl3SSqQcZekAlWKe0QsjohnImJrRKx4l3XnRERGRFv9RpQkDVS/cY+IMcANwBJgBnBeRMzoY91E4FLgsXoPKUkamCpH7nOArZn5XGa+DtwGLO1j3d8AXwNeq+N8kqRBqBL3I4EXemx3dO/bKyJOBI7KzLvrOJskaZCqxD362Jd7H4x4D/AN4Iv9PlHEsohoj4j2zs7O6lNKkgakStw7gKN6bLcA23tsTwRagfsiYhswD1jb1y9VM3NlZrZlZltzc/Pgp5Ykvasqcd8ATI+IaRExHjgXWPvmg5m5MzMnZ+bUzJwKPAqcmZntB2RiSVK/+o17Zu4BLgbWA08Dd2Tmloi4NiLOPNADSpIGbmyVRZm5DljXa9/V+1i7aP/HkiTtD1+hKkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFMu6SVCDjLkkFqhT3iFgcEc9ExNaIWNHH45dHxFMR8WRE/DAi3l//USVJVfUb94gYA9wALAFmAOdFxIxeyx4H2jLzw8CdwNfqPagkqboqR+5zgK2Z+Vxmvg7cBiztuSAzf5SZr3ZvPgq01HdMSdJAVIn7kcALPbY7uvfty4XAf/X1QEQsi4j2iGjv7OysPqUkaUCqxD362Jd9Loz4c6ANuK6vxzNzZWa2ZWZbc3Nz9SklSQMytsKaDuCoHtstwPbeiyLiFOAqYGFm/r4+40mSBqPKkfsGYHpETIuI8cC5wNqeCyLiROBfgTMz89f1H1OSNBD9xj0z9wAXA+uBp4E7MnNLRFwbEWd2L7sOmAD8R0Rsioi1+3g6SdIQqHJahsxcB6zrte/qHh+fUue5JEn7wVeoSlKBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFci4S1KBjLskFahS3CNicUQ8ExFbI2JFH48fFBG3dz/+WERMrfegkqTq+o17RIwBbgCWADOA8yJiRq9lFwK/zcw/Br4B/EO9B5UkVVflyH0OsDUzn8vM14HbgKW91iwF/r374zuBj0dE1G9MSdJAjK2w5kjghR7bHcDcfa3JzD0RsROYBLzYc1FELAOWdW++EhHPDGZovVPAZHr9/x6W/tp/80ehkfF3c+R4f5VFVeLe13djDmINmbkSWFnha2qAIqI9M9saPYfUm383G6PKaZkO4Kge2y3A9n2tiYixwKHAb+oxoCRp4KrEfQMwPSKmRcR44Fxgba81a4G/7P74HODezHzHkbskaWj0e1qm+xz6xcB6YAxwc2ZuiYhrgfbMXAt8G/huRGyldsR+7oEcWn3ydJeGK/9uNkB4gC1J5fEVqpJUIOMuSQUy7pJUIOMuqe4i4qAq+3TgGPcRKiKmVdknNcgjFffpAKnyClUNT98HZvXadycwuwGzSABExB9Rux3JeyPiRN569fohwPsaNtgoZNxHmIg4FjgeODQizu7x0CFAU2OmkvY6DbiA2ivZr++x/3+BKxsx0Gjlde4jTEQsBf4UOJO3v1L4ZeC2zHy4IYNJPUTEJzPz+42eYzQz7iNURJyUmZ7D1LDUfXrmb4Epmbmk+z0gTsrMbzd4tFHDX6iOXF0R8cOI2AwQER+OiK80eiip2y3UblkypXv7WeCyxo0z+hj3kesm4MvAboDMfBLv6aPhY3Jm3gH8H9TuUQW8
0diRRhfjPnK9LzN/0mvfnoZMIr3TroiYRPf7OkTEPGBnY0caXbxaZuR6MSI+wFvfPOcAOxo7krTX5dR+4f+BiHgIaKZ2O3ANEeM+cn2B2q1Uj42IXwK/AM5v7EjSXh8AllB7E59PUntrTnszhLxaZoSKiMu7P3wvtdNru6j92LsxMzc1bDAJiIgnM/PDEfEx4O+AfwSuzMze77+sA8Rz7iNXG/A54A+Aw6i98fgi4KaI+KsGziXBW788PQP4l8y8CxjfwHlGHY/cR6iIWA98MjNf6d6eQO32A2dRO3qf0cj5NLpFxN3AL4FTqN0S43fATzLzhIYONop45D5yHQ283mN7N/D+zPwd8PvGjCTt9Slq17kvzsyXgD8Eljd2pNHFX3CMXKuARyPiru7tPwFWR8TBwFONG0uCzHwV+M8e2zvwaq4h5WmZESwiZgMfo3bnvQczs73BI0kaJoy7JBXIc+6SVCDjLkkFMu6SVCDjLkkF+n9G2kb/7zCg+wAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p2 = plotUsageComparation(df_a_json, df_a_other, 'operation')" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
jsonotherjsonother
get0.9967190.6167281.00.999047
set0.0032810.131672NaN0.000953
callNaN0.251601NaNNaN
\n", + "
" + ], + "text/plain": [ + " json other json other\n", + "get 0.996719 0.616728 1.0 0.999047\n", + "set 0.003281 0.131672 NaN 0.000953\n", + "call NaN 0.251601 NaN NaN" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEDCAYAAADOc0QpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFhZJREFUeJzt3X2QVfWd5/H3NzzIjqBGbXfB1kBlHUWa2EoLKFsNzmgh6sro9EaFdbDKh0o56rqJLIzsptjsTJkZXUenSmejiUyMC+o4K1LKllsVA8SHuDTajCDRQgfXHiiDJLCK6wSo7/5xW9JpWvr05dKXPr5fVZT3nPs79366Ln4499fnITITSVK5fKneASRJtWe5S1IJWe6SVEKWuySVkOUuSSVkuUtSCVnuklRClrsklZDlLkklNLReb3ziiSfm2LFj6/X2kjQorVu37sPMbOhrXN3KfezYsbS3t9fr7SVpUIqI94qMc1pGkkrIcpekErLcJamE6jbnLklF7Nmzh87OTj799NN6RxlQI0aMoLGxkWHDhlW1veUu6YjW2dnJqFGjGDt2LBFR7zgDIjPZsWMHnZ2djBs3rqrX6HNaJiIeiYhfRMSGz3k+IuKvImJzRPx9RJxTVRJJ6sWnn37KCSec8IUpdoCI4IQTTjikbytF5tz/Brj4IM/PAk7r+nMT8NdVp5GkXnyRiv0zh/oz91numbkG+OVBhswGHs2KnwHHRcToQ0olSToktZhzPxl4v9tyZ9e6bT0HRsRNVPbuOfXUU2vw1keuiT+cWNV2b8x7o8ZJpHIZu/C5mr7elu9eWmjc+eefz8svv1zT9z6calHuvX136PWu25n5EPAQQEtLy4DembvavxBFP/ha2XTG+Kq2G//zTTVOIqm7wVTsUJvj3DuBU7otNwJba/C6knTEGDlyJNu2baO1tZXm5maampr46U9/CsCyZcuYOHEiTU1NLFiw4Le2WbRoEWeddRZTp07lgw8+GLC8tSj3FcAfdR01MxXYlZkHTMlI0mC3dOlSZs6cSUdHB+vXr6e5uZmtW7eyYMECXnjhBTo6Oli7di3Lly8HYPfu3UydOpX169fT2trKww8/PGBZixwKuQx4BTg9Ijoj4vqI+EZEfKNryErgXWAz8DBw82FLK0l1dO6557JkyRIWL17MG2+8wahRo1i7di0zZsygoaGBoUOHMnfuXNasWQPA8OHDueyyywCYNGkSW7ZsGbCsfc65Z+Y1fTyfwB/XLJEkHaFaW1tZs2YNzz33HNdeey3z58/nmGOO+dzxw4YN239I45AhQ9i7d+9ARfXaMpJU1HvvvcdJJ53EjTfeyPXXX89rr73GlClTWL16NR9++CH79u1j2bJlTJ8+vd5RvfyApMFloI9g+0xEsGrVKu6++26GDRvGyJEjefTRRxk9ejR33XUXF1xwAZnJJZdcwuzZs+uSsTvLXZL6sGPHDo4//njmzZvHvHnzDnh+zpw5zJkz54D1H3/88f7HbW1ttLW1Hdac3TktI0kHsXXrVs477zzuuOOOekfpF/fcJekgxowZw9tvv13vGP3mnrsklZDlLkklZLlLUglZ7pJUQv5CVdLgsvjYGr/erqo227lzJ0uXLuXmmytXXFm1ahX33HMPzz77bC3TVc09d0mqws6dO3nwwQdr9nq1vjSB5S5JBdx77700NTXR
1NTEfffdx8KFC3nnnXdobm5m/vz5QOWkpba2Ns444wzmzp1L5dJbsG7dOqZPn86kSZOYOXMm27ZVLpw7Y8YM7rzzTqZPn879999f07xOy0hSH9atW8eSJUt49dVXyUymTJnCY489xoYNG+jo6AAq0zKvv/46GzduZMyYMUybNo2XXnqJKVOmcOutt/LMM8/Q0NDAE088waJFi3jkkUeAyjeA1atX1zyz5S5JfXjxxRe54oorOProowG48sor99+oo7vJkyfT2NgIQHNzM1u2bOG4445jw4YNXHTRRQDs27eP0aN/c5vpq6666rBkttwlqQ+fTa/05aijjtr/+LNL/GYmEyZM4JVXXul1m8/+wag159wlqQ+tra0sX76cTz75hN27d/P0008zbdo0Pvrooz63Pf3009m+ffv+ct+zZw8bN2483JHdc5c0yFR56OKhOOecc7juuuuYPHkyADfccAOTJk1i2rRpNDU1MWvWLC69tPdLEQ8fPpynnnqK2267jV27drF3715uv/12JkyYcFgzR9GvG7XW0tKS7e3tA/Z+Yxc+V9V21V47euIPJ1a13ZN3VXc41Pifb6pqO+lIt2nTJsaPH1/vGHXR288eEesys6WvbZ2WkaQSstwlqYQsd0kqIctdkkrIcpekErLcJamEPM5d0qBS7WHGn+eNeW8UGnf++efz8ssv1/S9Dyf33CWpgMFU7GC5S1IhI0eOZNu2bbS2ttLc3ExTU9P+i4ctW7aMiRMn0tTUxIIFC35rm0WLFnHWWWcxdepUPvjggwHLa7lLUkFLly5l5syZdHR0sH79epqbm9m6dSsLFizghRdeoKOjg7Vr17J8+XIAdu/ezdSpU1m/fj2tra08/PDDA5bVcpekgs4991yWLFnC4sWLeeONNxg1ahRr165lxowZNDQ0MHToUObOncuaNWuAynVlLrvsMgAmTZrEli1bBiyr5S5JBbW2trJmzRpOPvlkrr32Wh599NGDXg542LBhRATwm0sADxTLXZIKeu+99zjppJO48cYbuf7663nttdeYMmUKq1ev5sMPP2Tfvn0sW7aM6dOn1zuqh0JKGlyKHrpYaxHBqlWruPvuuxk2bBgjR47k0UcfZfTo0dx1111ccMEFZCaXXHIJs2fPrkvG7gqVe0RcDNwPDAG+n5nf7fH8qcAPgeO6xizMzJU1zipJdbFjxw6OP/545s2bx7x58w54fs6cOcyZM+eA9R9//PH+x21tbbS1tR3WnN31OS0TEUOAB4BZwJnANRFxZo9h/xF4MjPPBq4GHqx1UEmqh61bt3Leeedxxx131DtKvxTZc58MbM7MdwEi4nFgNvBmtzEJHNP1+Fhgay1DSlK9jBkzhrfffrveMfqtSLmfDLzfbbkTmNJjzGLgf0XErcDRwIU1SSdJqkqRo2Wil3U9j/25BvibzGwELgF+FBEHvHZE3BQR7RHRvn379v6nlSQVUqTcO4FTui03cuC0y/XAkwCZ+QowAjix5wtl5kOZ2ZKZLQ0NDdUlliT1qUi5rwVOi4hxETGcyi9MV/QY83+A3weIiPFUyt1dc0mqkz7n3DNzb0TcAjxP5TDHRzJzY0R8B2jPzBXAt4CHI+LfU5myuS4PdtqWJFVp0xnja/p643++qartdu7cydKlS7n55psBWLVqFffccw/PPvtsLeNVrdBx7l3HrK/sse7b3R6/CUyrbTRJOnLt3LmTBx98cH+5H6q9e/cydGjtziv18gOSVMC9995LU1MTTU1N3HfffSxcuJB33nmH5uZm5s+fD1ROWmpra+OMM85g7ty5+687s27dOqZPn86kSZOYOXMm27ZtA2DGjBnceeedTJ8+nfvvv7+meb38gCT1Yd26dSxZsoRXX32VzGTKlCk89thjbNiwgY6ODqAyLfP666+zceNGxowZw7Rp03jppZeYMmUKt956K8888wwNDQ088cQTLFq0iEceeQSofANYvXp1zTNb7pLUhxdffJErrriC
o48+GoArr7xy/406ups8eTKNjY0ANDc3s2XLFo477jg2bNjARRddBMC+ffsYPXr0/m2uuuqqw5LZcpekPhQ9PuSoo47a//izS/xmJhMmTOCVV17pdZvP/sGoNefcJakPra2tLF++nE8++YTdu3fz9NNPM23aND766KM+tz399NPZvn37/nLfs2cPGzduPNyR3XOXNLhUe+jioTjnnHO47rrrmDx5MgA33HADkyZNYtq0aTQ1NTFr1iwuvfTSXrcdPnw4Tz31FLfddhu7du1i79693H777UyYMOGwZo56HY7e0tKS7e3tA/Z+Yxc+V9V2W77b+wfWl4k/nFjVdk/eVd2dWurxF14aCJs2bWL8+Noe2z5Y9PazR8S6zGzpa1unZSSphCx3SSohy13SEe+LeDWTQ/2ZLXdJR7QRI0awY8eOL1TBZyY7duxgxIgRVb+GR8tIOqI1NjbS2dnJF+0eECNGjNh/QlQ1LHdJR7Rhw4Yxbty4escYdJyWkaQSstwlqYQsd0kqIctdkkrIcpekErLcJamELHdJKiHLXZJKyHKXpBKy3CWphCx3SSohy12SSshyl6QSstwlqYQsd0kqIctdkkrIcpekErLcJamELHdJKiHLXZJKqNANsiPiYuB+YAjw/cz8bi9jvg4sBhJYn5lzapizfhYfW912406tbQ5J6oc+yz0ihgAPABcBncDaiFiRmW92G3Ma8CfAtMz8VUScdLgCS5L6VmRaZjKwOTPfzcxfA48Ds3uMuRF4IDN/BZCZv6htTElSfxQp95OB97std3at6+53gd+NiJci4mdd0zgHiIibIqI9Itq3b99eXWJJUp+KlHv0si57LA8FTgNmANcA34+I4w7YKPOhzGzJzJaGhob+ZpUkFVSk3DuBU7otNwJbexnzTGbuycx/AN6iUvaSpDooUu5rgdMiYlxEDAeuBlb0GLMcuAAgIk6kMk3zbi2DSpKK67PcM3MvcAvwPLAJeDIzN0bEdyLi8q5hzwM7IuJN4CfA/MzccbhCS5IOrtBx7pm5EljZY923uz1O4JtdfyRJdeYZqpJUQpa7JJWQ5S5JJWS5S1IJWe6SVEKWuySVkOUuSSVkuUtSCVnuklRClrsklZDlLkklZLlLUglZ7pJUQpa7JJWQ5S5JJWS5S1IJWe6SVEKWuySVkOUuSSVkuUtSCVnuklRClrsklZDlLkklZLlLUglZ7pJUQpa7JJWQ5S5JJWS5S1IJWe6SVEKWuySVkOUuSSVkuUtSCRUq94i4OCLeiojNEbHwIOPaIiIjoqV2ESVJ/dVnuUfEEOABYBZwJnBNRJzZy7hRwG3Aq7UOKUnqnyJ77pOBzZn5bmb+GngcmN3LuP8C/AXwaQ3zSZKqUKTcTwbe77bc2bVuv4g4GzglM5+tYTZJUpWKlHv0si73PxnxJeAvgW/1+UIRN0VEe0S0b9++vXhKSVK/FCn3TuCUbsuNwNZuy6OAJmBVRGwBpgIrevulamY+lJktmdnS0NBQfWpJ0kEVKfe1wGkRMS4ihgNXAys+ezIzd2XmiZk5NjPHAj8DLs/M9sOSWJLUpz7LPTP3ArcAzwObgCczc2NEfCciLj/cASVJ/Te0yKDMXAms7LHu258zdsahx5IkHQrPUJWkErLcJamELHdJKiHLXZJKyHKXpBKy3CWphCx3SSohy12SSshyl6QSstwlqYQsd0kqIctdkkrIcpekErLcJamELHdJKiHLXZJKyHKXpBKy3CWphCx3SSohy12SSshyl6QSstwlqYQsd0kqIctdkkrIcpekEhpa7wCS1KvFx1a53a7a5hik3HOXpBKy3CWphCx3SSohy12SSshyl6QSstwlqYQsd0kqoULlHhEXR8RbEbE5Ihb28vw3I+LNiPj7iPhxRHyl9lElSUX1We4RMQR4AJgFnAlcExFn9hj2OtCSmV8DngL+otZBJUnFFdlznwxszsx3M/PXwOPA7O4DMvMnmflJ1+LPgMbaxpQk9UeR
cj8ZeL/bcmfXus9zPfA/e3siIm6KiPaIaN++fXvxlJKkfilS7tHLuux1YMS/BVqAu3t7PjMfysyWzGxpaGgonlKS1C9FLhzWCZzSbbkR2NpzUERcCCwCpmfmP9UmniSpGkX23NcCp0XEuIgYDlwNrOg+ICLOBr4HXJ6Zv6h9TElSf/RZ7pm5F7gFeB7YBDyZmRsj4jsRcXnXsLuBkcDfRkRHRKz4nJeTJA2AQtdzz8yVwMoe677d7fGFNc4lSToEnqEqSSVkuUtSCVnuklRClrsklZDlLkklZLlLUglZ7pJUQpa7JJVQoZOYpEFr8bFVbLOr9jmkAeaeuySVkOUuSSVkuUtSCVnuklRClrsklZDlLkklZLlLUglZ7pJUQpa7JJWQ5S5JJWS5S1IJWe6SVEKWuySVkOUuSSVkuUtSCVnuklRClrsklZDlLkklZLlLUglZ7pJUQpa7JJWQ5S5JJWS5S1IJFSr3iLg4It6KiM0RsbCX54+KiCe6nn81IsbWOqgkqbg+yz0ihgAPALOAM4FrIuLMHsOuB36Vmf8S+Evgz2sdVJJUXJE998nA5sx8NzN/DTwOzO4xZjbww67HTwG/HxFRu5iSpP4YWmDMycD73ZY7gSmfNyYz90bELuAE4MPugyLiJuCmrsWPI+KtakIPpOr/hdpwIj1+/iJ6fiUqzH9La+c/R1WfnY4Q5f/8vlJkUJFy7601sooxZOZDwEMF3nPQi4j2zGypdw71n5/d4ObnV1FkWqYTOKXbciOw9fPGRMRQ4Fjgl7UIKEnqvyLlvhY4LSLGRcRw4GpgRY8xK4B5XY/bgBcy84A9d0nSwOhzWqZrDv0W4HlgCPBIZm6MiO8A7Zm5AvgB8KOI2Exlj/3qwxl6kPhCTD+VlJ/d4ObnB4Q72JJUPp6hKkklZLlLUglZ7pJUQpa7ROX6SEXWSYOF5V4jETGuyDodsV4puE4aFIqcoapi/g44p8e6p4BJdciigiLiX1C5fMY/i4iz+c3Z1scAv1O3YCokIr55sOcz896BynKksdwPUUScAUwAjo2IK7s9dQwwoj6p1A8zgeuonHndvQj+L3BnPQKpX0bVO8CRyuPcD1FEzAb+ALic3z5z9yPg8cx8uS7B1C8R8YeZ+Xf1ziHViuVeIxFxXmY6RztIdU3P/BkwJjNndd2z4LzM/EGdo+kgIuKvDvZ8Zt42UFmONE7L1M6OiPgx8M8zsykivgZcnpl/Wu9gKmRJ159FXctvA09QubSGjlzr6h3gSOWee41ExGpgPvC9zDy7a92GzGyqbzIVERFrM/PciHi92+fXkZnN9c4mVcM999r5ncz83z1uQLW3XmHUb7sj4gS67kMQEVOBXfWNpKIiogFYQOV+N/sPZMjM36tbqDqz3Gvnw4j4Kr8phzZgW30jqR++SeUX4l+NiJeABiqXr9bg8N+pTKNdCnyDyiXIt9c1UZ1Z7rXzx1QuNXpGRPwj8A/A3PpGUj98lcpN4E8B/pDKrST9/2PwOCEzfxAR/y4zVwOru6ZKv7D8y1s7fwCsBH5C5czf3cCFEbEuMzvqmkxF/KfM/NuI+DJwIfBfgb/mwPsF68i0p+u/2yLiUip3i2usY5668/IDtdNC5evgl4HjqNwIfAbwcET8hzrmUjH7uv57KfDfMvMZYHgd86h//jQijgW+BdwBfB+4vb6R6styr50TgHMy847M/BaVsm8AWqmcAakj2z9GxPeArwMruy4a5v8fg8e/oXL034bMvAC4CLiizpnqyr+8tXMq8Otuy3uAr2Tm/wP+qT6R1A9fp3IryYszcydwPJVDWzU4fK3rcwMgM38JnF3HPHXnnHvtLAV+FhHPdC3/a2BZRBwNvFm/WCoiMz8B/ke35W14tNNg8qWI+HJm/gogIo7nC95vnsRUQxExCfhXVK4s+GJmttc5kvSFEBF/BPwJlSuxJpVvYn+WmT+qa7A6stwllULX9YB+j8rO1Y8z8wv9jdlyl6QS8heqklRC
lrsklZDlLkklZLlLUgn9fwMN6ewEpG0IAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p3 = pd.concat([p1, p2], axis=1, sort=False).drop_duplicates()\n", + "p3.plot(kind='bar')\n", + "p3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# SYMBOLS\n", + "There is a pretty big difference[1] of unique values that appear on the whole sample and the filtered sample. \n", + "For the whole sample there is as much as 245 different symbols on the non json values, but it is drastically reduced to 2 symbols[2] for the filtered sample with only value_len above the mean, most being 'window.document.cookie' (99%). \n", + "For the valid JSONs there are only 12 symbols total reduced to 5 symbols[3]. \n", + "\n", + "---\n", + " For futher investigation: \n", + "1. Why is the difference so big? does it have any meaning?\n", + "2. What are the meaning of the 2 symbols of the non-json? are they special? why only 2? Why 'window.document.cookie'?\n", + "3. Why these 5 symbols? what do they do? 
what do they represent?\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Full Sample:" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 245 unique symbol present on the non-json dataset and 12 on the JSONs\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEOCAYAAACHE9xHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE8NJREFUeJzt3X9wXWWdx/H3FxqoClYshQVaCDhFobaEEll2caVLcalF5cfKjw7ajnWtzoDKLhXbsjP+WMXKoLLsrJ3BUSlM15atMjCIbKFTBpzxV1MjtFS0owVCawmIFQRqW777R04wtGmTJrm9yZP3a+bOOee5zznnmyb95OS5zz03MhNJUrkOqHcBkqTaMuglqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhRtR7wIADj/88GxsbKx3GZI0pLS0tDyTmWN66jcogr6xsZHVq1fXuwxJGlIi4vHe9HPoRpIK12PQR8S4iFgVEesjYl1EfKpq/1xEPBURrdVjepd95kfEhoh4LCLOreUXIEnau94M3ewArs7MNRFxKNASEfdVz309M2/o2jkiTgYuAyYARwP3R8SJmblzIAuXJPVOj0GfmZuBzdX68xGxHjhmL7ucDyzNzG3A7yJiA3A68ON9KWz79u20tbXx8ssv78tuqqGRI0cyduxYGhoa6l2KpH2wTy/GRkQjcCrwU+BM4MqImAmspuOq/zk6fgn8pMtubXTziyEi5gBzAI499tjdztXW1sahhx5KY2MjEbEvZaoGMpNnn32WtrY2jj/++HqXI2kf9PrF2Ig4BPgecFVm/glYBLwFaKLjiv+rnV272X23TzfJzJszszkzm8eM2X120Msvv8zo0aMN+UEiIhg9erR/YUlDUK+CPiIa6Aj5JZn5fYDM3JKZOzPzFeCbdAzPQMcV/Lguu48FNvWlOEN+cPH7IQ1NvZl1E8C3gPWZ+bUu7Ud16XYhsLZavwu4LCIOjojjgfHAzwauZEnSvujNGP2ZwIeARyKitWpbAMyIiCY6hmU2Ah8DyMx1EXE78CgdM3auGIgZN43zftDfQ7zGxoXn7fX5KVOmMH/+fM4996+zQ2+88UZ+/etf841vfGOP+x1yyCG88MILA1LjrbfeyvXXX09mkpnMnj2buXPnDsixO1133XUsWLBgQI+p/W+g/38Mdz3lw1DT4xV9Zv4oMyMzJ2VmU/W4JzM/lJkTq/b3V7NzOvf5Uma+JTPfmpk/rO2XUBszZsxg6dKlr2lbunQpM2bM2C/n/+EPf8iNN97IihUrWLduHWvWrGHUqFEDfp7rrrtuwI8paXDxnbF78IEPfIC7776bbdu2AbBx40Y2bdrEO9/5Tl544QWmTp3K5MmTmThxInfeeedu+z/wwAO8973vfXX7yiuv5JZbbgGgpaWFs846i9NOO41zzz2XzZs377b/l7/8ZW644QaOPvpooGNq40c/+lEAWltbOeOMM5g0aRIXXnghzz33HNDxV0jnrSSeeeYZOu8fdMstt3DRRRcxbdo0xo8fzzXXXAPAvHnzeOmll2hqauLyyy/nz3/+
M+eddx6nnHIKb3/721m2bNkA/EtKqjeDfg9Gjx7N6aefzr333gt0XM1feumlRAQjR47kjjvuYM2aNaxatYqrr76azN0mFnVr+/btfOITn2D58uW0tLQwe/Zsrr322t36rV27ltNOO63bY8ycOZOvfOUrPPzww0ycOJHPf/7zPZ63tbWVZcuW8cgjj7Bs2TKefPJJFi5cyOte9zpaW1tZsmQJ9957L0cffTS//OUvWbt2LdOmTevV1yRpcDPo96Lr8E3XYZvMZMGCBUyaNIlzzjmHp556ii1btvTqmI899hhr167l3e9+N01NTXzxi1+kra2t1zVt3bqVP/7xj5x11lkAzJo1iwcffLDH/aZOncqoUaMYOXIkJ598Mo8/vvu9kCZOnMj999/PZz7zGR566KGaDBVJ2v8M+r244IILWLlyJWvWrOGll15i8uTJACxZsoT29nZaWlpobW3lyCOP3G1++YgRI3jllVde3e58PjOZMGECra2ttLa28sgjj7BixYrdzj1hwgRaWlr2qd6u59y1noMPPvjV9QMPPJAdO3bstv+JJ55IS0sLEydOZP78+XzhC1/Yp/NLGpwM+r045JBDmDJlCrNnz37Ni7Bbt27liCOOoKGhgVWrVnV7dXzcccfx6KOPsm3bNrZu3crKlSsBeOtb30p7ezs//nHHHSG2b9/OunXrdtt//vz5XHPNNfz+978HYNu2bdx0002MGjWKww47jIceegiA22677dWr+8bGxld/OSxfvrxXX2NDQwPbt28HYNOmTbz+9a/ngx/8IHPnzmXNmjW9OoakwW1Q3I++N+o13WnGjBlcdNFFr5mBc/nll/O+972P5uZmmpqaeNvb3rbbfuPGjeOSSy5h0qRJjB8/nlNPPRWAgw46iOXLl/PJT36SrVu3smPHDq666iomTJjwmv2nT5/Oli1bOOecc8hMIoLZs2cDsHjxYj7+8Y/z4osvcsIJJ/Cd73wHgLlz53LJJZdw2223cfbZZ/fq65szZw6TJk1i8uTJzJw5k09/+tMccMABNDQ0sGjRoj79m0kaXKK3LyLWUnNzc+76wSPr16/npJNOqlNF2hO/L4OT8+gH1lCZRx8RLZnZ3FM/h24kqXAGvSQVblAH/WAYVtJf+f2QhqZBG/QjR47k2WefNVwGic770Y8cObLepUjaR4N21s3YsWNpa2ujvb293qWo0vkJU5KGlkEb9A0NDX6SkSQNgEE7dCNJGhgGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klS4HoM+IsZFxKqIWB8R6yLiU1X7myPivoj4TbU8rGqPiLgpIjZExMMRMbnWX4Qkac96c0W/A7g6M08CzgCuiIiTgXnAyswcD6ystgHeA4yvHnOARQNetSSp13oM+szcnJlrqvXngfXAMcD5wOKq22Lggmr9fODW7PAT4E0RcdSAVy5J6pV9GqOPiEbgVOCnwJGZuRk6fhkAR1TdjgGe7LJbW9UmSaqDXgd9RBwCfA+4KjP/tLeu3bRlN8ebExGrI2J1e3t7b8uQJO2jXgV9RDTQEfJLMvP7VfOWziGZavl01d4GjOuy+1hg067HzMybM7M5M5vHjBnT1/olST3ozaybAL4FrM/Mr3V56i5gVrU+C7izS/vMavbNGcDWziEeSdL+N6IXfc4EPgQ8EhGtVdsCYCFwe0R8BHgCuLh67h5gOrABeBH48IBWLEnaJz0GfWb+iO7H3QGmdtM/gSv6WZckaYD4zlhJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQV
zqCXpMIZ9JJUOINekgpn0EtS4Qx6SSpcj0EfEd+OiKcjYm2Xts9FxFMR0Vo9pnd5bn5EbIiIxyLi3FoVLknqnd5c0d8CTOum/euZ2VQ97gGIiJOBy4AJ1T7fiIgDB6pYSdK+6zHoM/NB4A+9PN75wNLM3JaZvwM2AKf3oz5JUj/1Z4z+yoh4uBraOaxqOwZ4skuftqpNklQnfQ36RcBbgCZgM/DVqj266ZvdHSAi5kTE6ohY3d7e3scyJEk96VPQZ+aWzNyZma8A3+SvwzNtwLguXccCm/ZwjJszszkzm8eMGdOXMiRJvdCnoI+Io7psXgh0zsi5C7gsIg6OiOOB8cDP+leiJKk/RvTUISK+C0wBDo+INuCzwJSIaKJjWGYj8DGAzFwXEbcDjwI7gCsyc2dtSpck9UaPQZ+ZM7pp/tZe+n8J+FJ/ipIkDRzfGStJhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcD0GfUR8OyKejoi1XdreHBH3RcRvquVhVXtExE0RsSEiHo6IybUsXpLUs95c0d8CTNulbR6wMjPHAyurbYD3AOOrxxxg0cCUKUnqqx6DPjMfBP6wS/P5wOJqfTFwQZf2W7PDT4A3RcRRA1WsJGnf9XWM/sjM3AxQLY+o2o8BnuzSr61q201EzImI1RGxur29vY9lSJJ6MtAvxkY3bdldx8y8OTObM7N5zJgxA1yGJKlTX4N+S+eQTLV8umpvA8Z16TcW2NT38iRJ/dXXoL8LmFWtzwLu7NI+s5p9cwawtXOIR5JUHyN66hAR3wWmAIdHRBvwWWAhcHtEfAR4Ari46n4PMB3YALwIfLgGNUuS9kGPQZ+ZM/bw1NRu+iZwRX+LkiQNHN8ZK0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhRvRn50jYiPwPLAT2JGZzRHxZmAZ0AhsBC7JzOf6V6Ykqa8G4or+HzOzKTObq+15wMrMHA+srLYlSXVSi6Gb84HF1fpi4IIanEOS1Ev9DfoEVkRES0TMqdqOzMzNANXyiO52jIg5EbE6Ila3t7f3swxJ0p70a4weODMzN0XEEcB9EfGr3u6YmTcDNwM0NzdnP+uQJO1Bv67oM3NTtXwauAM4HdgSEUcBVMun+1ukJKnv+hz0EfGGiDi0cx34J2AtcBcwq+o2C7izv0VKkvquP0M3RwJ3RETncf4nM++NiJ8Dt0fER4AngIv7X6Ykqa/6HPSZ+VvglG7anwWm9qcoSdLA8Z2xklQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCjei3gUMJY3zflDvEoqyceF59S5BGha8opekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFq1nQR8S0iHgsIjZExLxanUeStHc1CfqIOBD4b+A9wMnAjIg4uRbnkiTtXa2u6E8HNmTmbzPzL8BS4PwanUuStBe1CvpjgCe7bLdVbZKk/axWt0CIbtryNR0i5gBzqs0XIuKxGtUyHB0OPFPvInoSX6l3BaoDfzYH1nG96VSroG8DxnXZHgts6toh
M28Gbq7R+Ye1iFidmc31rkPalT+b9VGroZufA+Mj4viIOAi4DLirRueSJO1FTa7oM3NHRFwJ/B9wIPDtzFxXi3NJkvauZrcpzsx7gHtqdXztlUNiGqz82ayDyMyee0mShixvgSBJhTPoJalwBr2kmomIAyLi7+tdx3DnGP0QFxFjgI8CjXR5cT0zZ9erJqmriPhxZv5dvesYzvxw8KHvTuAh4H5gZ51rkbqzIiL+Gfh+emVZF17RD3ER0ZqZTfWuQ9qTiHgeeAMdFyIv0XGLlMzMN9a1sGHEMfqh7+6ImF7vIqQ9ycxDM/OAzGzIzDdW24b8fuQV/RDX5WrpL8D2qtmrJQ0aERHA5cDxmfkfETEOOCozf1bn0oYNg15STUXEIuAV4OzMPCkiDgNWZOY76lzasOGLsQWIiPcD76o2H8jMu+tZj7SLv83MyRHxC4DMfK662aH2E8foh7iIWAh8Cni0enyqapMGi+3Vx4smvDol+JX6ljS8OHQzxEXEw0BTZr5SbR8I/CIzJ9W3MqlDRFwOXApMBhYDHwD+PTP/t66FDSMO3ZThTcAfqvVR9SxE2lVmLomIFmAqHVMrL8jM9XUua1gx6Ie+LwO/iIhVdPwnehcwv74lSbv5DfAnqsyJiGMz84n6ljR8OHRTgIg4CngHHUH/08z8fZ1Lkl4VEZ8APgtsoeNNU51vmHJ4cT8x6Ie4iDgTaM3MP0fEB+kYB/3PzHy8zqVJAETEBjpm3jxb71qGK2fdDH2LgBcj4hTg08DjwK31LUl6jSeBrfUuYjhzjH7o25GZGRHnAzdl5rciYla9i5Ii4t+q1d8CD0TED4Btnc9n5tfqUtgwZNAPfc9HxHzgQ8A/VNMrG+pckwRwaLV8onocVD2gmlOv/cMx+iEuIv4GmAH8PDN/VN1HZEpm3lbn0iQAIuLiXefMd9em2jHoh6jqZmad37yollmtbwM2ANdm5so6lCe9KiLWZObkntpUOw7dDFGZeeienquGb94OLKmW0n4XEe8BpgPHRMRNXZ56I7CjPlUNTwZ9gTJzJ/DLiPiveteiYW0TsBq4GPg1HX9x7qRjPv2/1rGuYcehG0k1ERENwJeAfwE20jGsOA74DrAgM7fveW8NJOfRS6qV64HDgOMyc3JmngqcQMf9mG6oa2XDjFf0kmoiIn4DnLjrB4JXryH9KjPH16ey4ccrekm1kruGfNW4E+fR71cGvaRaeTQiZu7aWN2T6Vd1qGfYcuhGUk1ExDHA94GXgBY6ruLfAbwOuDAzn6pjecOKQS+ppiLibGACHbNu1vkmvv3PoJekwjlGL0mFM+glqXAGvSQVzqCXpMIZ9JJUuP8HBP+YLYpPQ7EAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p1 = plotUniqueValuesComparation(df_json, df_other, 'symbol')" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
jsonother
window.localStorage0.6530110.020276
window.sessionStorage0.3077430.009565
HTMLCanvasElement.style0.0208590.000780
window.document.cookieNaN0.342406
window.navigator.userAgentNaN0.149935
window.Storage.getItemNaN0.101712
\n", + "
" + ], + "text/plain": [ + " json other\n", + "window.localStorage 0.653011 0.020276\n", + "window.sessionStorage 0.307743 0.009565\n", + "HTMLCanvasElement.style 0.020859 0.000780\n", + "window.document.cookie NaN 0.342406\n", + "window.navigator.userAgent NaN 0.149935\n", + "window.Storage.getItem NaN 0.101712" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotTopUsageComparation(df_json, df_other, 'symbol', 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Above the mean Sample:" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 2 unique symbol present on the non-json dataset and 5 on the JSONs\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEOCAYAAACpVv3VAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEKNJREFUeJzt3X9s1HWex/HXGxitLsgarHciYDXBX0gtpXLe6QkHKAi6q5yL2+Bi0jsbk/PXKSLFixdzHqIxG9bLaY7crihLFu44jRdUjtWDiImny9Si/FDXGFwrioXVrih0i7zvj04JlMJ8kZl+590+H0nTmel3vvOGlme+fOY7U3N3AQDi6Jf2AACAY0O4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEM6AYOz3ttNO8oqKiGLsGgF4pm83udPfyJNsWJdwVFRXasGFDMXYNAL2SmX2UdFuWSgAgGMINAMEQbgAIpihr3ABKQ3t7u5qbm7V37960R0FOWVmZhg0bpkwm8533QbiBXqy5uVmDBg1SRUWFzCztcfo8d9euXbvU3Nyss88++zvvJ1G4zWybpK8kfStpn7vXfOdHBNBj9u7dS7RLiJlpyJAhamlpOa79HMsR91+5+87jejQAPY5ol5ZCfD94chIAgkl6xO2S1piZS/o3d1/cdQMzq5dUL0kjRowo3IRFUjHvhbRH6FW2LZye9ghIoNA/9/m+7xMmTFBDQ4OmTJly4LZFixbp/fff1xNPPHHE+w0cOFC7d+8uyIzPPPOMHn30Ubm73F11dXWaM2dOQfbdacGCBZo/f35B93k0SY+4L3P3aklXS/o7M7ui6wbuvtjda9y9prw80as2AfRytbW1Wr58+SG3LV++XLW1tT3y+C+99JIWLVqkNWvWaPPmzWpsbNTgwYML/jgLFiwo+D6PJlG43X177vPnkp6TNK6YQwHoHW644QatWrVKbW1tkqRt27Zp+/btuvzyy7V7925NmjRJ1dXVGj16tJ5//vnD7r9u3Tpdc801B67fdtttWrJkiSQpm81q/PjxGjt2rKZMmaJPP/30sPs//PDDeuyxxzR06FBJHafi3XLLLZKkpqYmXXrppaqsrNT111+vL774QlLH/xI637Jj586d6nzfpSVLlmjGjBmaOnWqRo4cqblz50qS5s2bpz179qiqqkqzZs3S119/renTp+viiy/WRRddpBUrVhTgb/JQecNtZt8zs0GdlyVdJWlTwScB0OsMGTJE48aN0+rVqyV1HG3feOONMjOVlZXpueeeU2Njo9auXat77rlH7p5ov+3t7br99tu1cuVKZbNZ1dXV6f777z9su02bNmns2LHd7mP27Nl65JFH9Pbbb2v06NF68MEH8z5uU1OTVqxYoXfeeUcrVqzQxx9/rIULF+q
kk05SU1OTli1bptWrV2vo0KHauHGjNm3apKlTpyb6Mx2LJEfcfyLpNTPbKOlNSS+4++qCTwKgVzp4ueTgZRJ31/z581VZWanJkyfrk08+0Y4dOxLt87333tOmTZt05ZVXqqqqSg899JCam5sTz9Ta2qovv/xS48ePlyTdfPPNevXVV/Peb9KkSRo8eLDKysp04YUX6qOPDn9fqNGjR+vll1/Wfffdp/Xr1xdlaSbvk5Pu/qGkiwv+yAD6hOuuu0533323GhsbtWfPHlVXV0uSli1bppaWFmWzWWUyGVVUVBz2Cs8BAwZo//79B653ft3dNWrUKL3++utHfexRo0Ypm81q4sSJiec9+DG7znPiiSceuNy/f3/t27fvsPufe+65ymazevHFF9XQ0KCrrrpKDzzwQOLHT4LTAQEU1cCBAzVhwgTV1dUd8qRka2urTj/9dGUyGa1du7bbo9ezzjpLW7ZsUVtbm1pbW/XKK69Iks477zy1tLQcCHd7e7s2b9582P0bGho0d+5cffbZZ5KktrY2Pf744xo8eLBOPfVUrV+/XpK0dOnSA0ffFRUVymazkqSVK1cm+jNmMhm1t7dLkrZv366TTz5ZN910k+bMmaPGxsZE+zgWvOQd6EPSOm2ztrZWM2bMOOQMk1mzZunaa69VTU2NqqqqdP755x92v+HDh2vmzJmqrKzUyJEjNWbMGEnSCSecoJUrV+qOO+5Qa2ur9u3bp7vuukujRo065P7Tpk3Tjh07NHnyZLm7zEx1dXWSpKefflq33nqrvvnmG51zzjl66qmnJElz5szRzJkztXTp0sRH6vX19aqsrFR1dbVmz56te++9V/369VMmk9GTTz75nf7OjsaSPhlwLGpqarzUf5EC53EXFudxl6atW7fqggsuSHsMdNHd98XMsknfToSlEgAIhnADQDCEG+jlirEciu+uEN8Pwg30YmVlZdq1axfxLhGd78ddVlZ2XPvhrBKgFxs2bJiam5uP+/2fUTidvwHneBBuoBfLZDLH9ZtWUJpYKgGAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgkkcbjPrb2ZvmdmqYg4EADi6YznivlPS1mINAgBIJlG4zWyYpOmS/r244wAA8kl6xL1I0lxJ+4s4CwAggbzhNrNrJH3u7tk829Wb2QYz29DS0lKwAQEAh0pyxH2ZpB+Y2TZJyyVNNLNfdt3I3Re7e42715SXlxd4TABAp7zhdvcGdx/m7hWSfizpf939pqJPBgDoFudxA0AwA45lY3dfJ2ldUSYBACTCETcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEkzfcZlZmZm+a2UYz22xmD/bEYACA7g1IsE2bpInuvtvMMpJeM7OX3P3/ijwbAKAbecPt7i5pd+5qJvfhxRwKAHBkida4zay/mTVJ+lzSr939jW62qTezDWa2oaWlpdBzAgByEoXb3b919ypJwySNM7OLutlmsbvXuHtNeXl5oecEAOQc01kl7v6lpHWSphZlGgBAXknOKik3s+/nLp8kabKkd4s9GACge0nOKjlD0tNm1l8dof8Pd19V3LEAAEeS5KyStyWN6YFZAAAJ8MpJAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAi
GcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDB5A23mQ03s7VmttXMNpvZnT0xGACgewMSbLNP0j3u3mhmgyRlzezX7r6lyLMBALqR94jb3T9198bc5a8kbZV0ZrEHAwB075jWuM2sQtIYSW8UYxgAQH5JlkokSWY2UNJ/SbrL3f/QzdfrJdVL0ogRIwo2INAXVcx7Ie0RepVtC6enPUJBJTriNrOMOqK9zN2f7W4bd1/s7jXuXlNeXl7IGQEAB0lyVolJ+rmkre7+0+KPBAA4miRH3JdJ+omkiWbWlPuYVuS5AABHkHeN291fk2Q9MAsAIAFeOQkAwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDB5A23mf3CzD43s009MRAA4OiSHHEvkTS1yHMAABLKG253f1XS73tgFgBAAgVb4zazejPbYGYbWlpaCrVbAEAXBQu3uy929xp3rykvLy/UbgEAXXBWCQAEQ7gBIJgkpwP+StLrks4zs2Yz+5vijwUAOJIB+TZw99qeGAQAkAxLJQAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgkkUbjObambvmdkHZjav2EMBAI4sb7jNrL+kf5V0taQLJdWa2YXFHgwA0L0kR9zjJH3g7h+6+x8lLZf0w+KOBQA4kiThPlPSxwddb87dBgBIwYAE21g3t/lhG5nVS6rPXd1tZu8dz2A44DRJO9MeIh97JO0JkBJ+PgvnrKQbJgl3s6ThB10fJml7143cfbGkxUkfGMmY2QZ3r0l7DqA7/HymI8lSyW8kjTSzs83sBEk/lvTfxR0LAHAkeY+43X2fmd0m6X8k9Zf0C3ffXPTJAADdSrJUInd/UdKLRZ4F3WP5CaWMn88UmPthzzMCAEoYL3kHgGAINwAEQ7gBJGJm/czsL9KeA6xxlxwzK5d0i6QKHfTksbvXpTUT0MnMXnf3P097jr4u0Vkl6FHPS1ov6WVJ36Y8C9DVGjP7a0nPOkd9qeGIu8SYWZO7V6U9B9AdM/tK0vfUcVCxRx1vieHufkqqg/UxrHGXnlVmNi3tIYDuuPsgd+/n7hl3PyV3nWj3MI64S8xBRzR/lNSeu5kjGpQEMzNJsySd7e7/ZGbDJZ3h7m+mPFqfQrgBJGZmT0raL2miu19gZqdKWuPul6Q8Wp/Ck5MlyMx+IOmK3NV17r4qzXmAg/yZu1eb2VuS5O5f5N58Dj2INe4SY2YLJd0paUvu487cbUApaM/9OkOXDpy+uj/dkfoelkpKjJm9LanK3ffnrveX9Ja7V6Y7GSCZ2SxJN0qqlvS0pBsk/YO7/2eqg/UxLJWUpu9L+n3u8uA0BwEO5u7LzCwraZI6TgW8zt23pjxWn0O4S8/Dkt4ys7Xq+IdxhaSGdEcCDvFbSX9Qrh9mNsLdf5fuSH0LSyUlyMzOkHSJOsL9hrt/lvJIgCTJzG6X9I+SdqjjRTidL8BhKa8HEe4SY2aXSWpy96/N7CZ1rCX+zN0/Snk0QGb2gTrOLNmV9ix9GWeVlJ4nJX1jZhdLulfSR5KeSXck4ICPJbWmPURfxxp36dnn7m5mP5T0uLv/3MxuTnso9G1mdnfu4oeS1pnZC5LaOr/u7j9NZbA+inCXnq/MrEHSTyT9Ze50wEzKMwGDcp9/l/s4Ifch5c7pRs9hjbvEmNmfSqqV9Bt3fy33XhAT3H1pyqM
BMrMfdT1nu7vbUFyEu0Tk3lyq85thuc+eu9wm6QNJ97v7KymMB0iSzKzR3avz3YbiYqmkRLj7oCN9LbdccpGkZbnPQI8ys6slTZN0ppk9ftCXTpG0L52p+i7CHYC7fytpo5n9S9qzoM/aLmmDpB9Jel8d/xv8Vh3nc/99inP1SSyVAMjLzDKS/lnS30rapo4lvOGSnpI0393bj3xvFBrncQNI4lFJp0o6y92r3X2MpHPU8V46j6U6WR/EETeAvMzst5LO7foLgnPPv7zr7iPTmaxv4ogbQBLe3W91zz3/wtFfDyPcAJLYYmazu96Yez+dd1OYp09jqQRAXmZ2pqRnJe2RlFXHUfYlkk6SdL27f5LieH0O4QaQmJlNlDRKHWeVbOYFYekg3AAQDGvcABAM4QaAYAg3AARDuAEgGMINAMH8P/RhZ+yV/TDFAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p2 = plotUniqueValuesComparation(df_a_json, df_a_other, 'symbol')" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
jsonother
window.localStorage0.822601NaN
window.sessionStorage0.171297NaN
HTMLCanvasElement.ownerDocument0.004006NaN
window.name0.0020760.000182
HTMLCanvasElement.style0.000021NaN
window.document.cookieNaN0.999818
\n", + "
" + ], + "text/plain": [ + " json other\n", + "window.localStorage 0.822601 NaN\n", + "window.sessionStorage 0.171297 NaN\n", + "HTMLCanvasElement.ownerDocument 0.004006 NaN\n", + "window.name 0.002076 0.000182\n", + "HTMLCanvasElement.style 0.000021 NaN\n", + "window.document.cookie NaN 0.999818" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAGxCAYAAACZcfZXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3XmcXFWZ//HPF0gIOyMEBQIkIoshsoZFgoALyjYwKi4Irgg/V0AFQXEQcVcGFxQVB1AcAREHQUBhBBJ2TQKBBBBBFokghMiOSALP749zK6lUOulKcrvPrdPf9+vVr9S9dbt5iq5+6tyzPEcRgZmZlWW53AGYmVn9nNzNzArk5G5mViAndzOzAjm5m5kVyMndzKxATu5mZgVycjczK5CTu5lZgVbI9R9ee+21Y/To0bn+82ZmPWnq1KmPRsTI/q7LltxHjx7NlClTcv3nzcx6kqT7u7nO3TJmZgVycjczK5CTu5lZgbL1ufdlzpw5zJw5k+eeey53KINqxIgRjBo1imHDhuUOxcwK0ajkPnPmTFZbbTVGjx6NpNzhDIqIYPbs2cycOZMxY8bkDsfMCtFvt4ykMyQ9ImnGIp6XpO9KulvSrZK2XdpgnnvuOdZaa60hk9gBJLHWWmsNubsVMxtY3fS5/wTYczHP7wVsUn0dBvxgWQIaSom9ZSi+ZjMbWP0m94i4GvjHYi7ZHzgrkhuBNSWtW1eAZma25Oroc18feKDteGZ17qFl/cGjj71kWX/EAu772j5dXbfzzjtz/fXX1/rfNjMbTHUk9776FPrcdVvSYaSuGzbccMMa/tMDw4ndzLp2whpL+X1P1BtHhzrmuc8ENmg7HgU82NeFEXFaRIyPiPEjR/ZbGiGbVVddlYceeohdd92VrbfemnHjxnHNNdcAcM455/CqV72KcePGccwxxyzwPccddxxbbbUVO+20Ew8//HCu8M3MaknuFwHvqWbN7AQ8ERHL3CWT29lnn82b3vQmpk2bxi233MLWW2/Ngw8+yDHHHMOVV17JtGnTmDx5Mr/+9a8BeOaZZ9hpp5245ZZb2HXXXfnxj3+c+RWY2VDWzVTIc4AbgM0kzZR0iKQPSfpQdcmlwD3A3cCPgY8MWLSDaPvtt+fMM8/khBNOYPr06ay22mpMnjyZ3XffnZEjR7LCCitw0EEHcfXVVwMwfPhw9t13XwC222477rvvvozRm9lQ12+fe0Qc2M/zAXy0togaYtddd+Xqq6/mkksu4d3vfjdHH300q6+++iKvHzZs2Lwpjcsvvzxz584drFDNzBbi2jKLcP/997POOutw6KGHcsghh3DTTTex4447MmnSJB599FFeeOEFzjnnHHbbbbfcoZqZLaRR5Qc6dTt1sW6SmDhxIt/85jcZNmwYq666KmeddRbrrrsuX/3qV3nta19LRLD33nuz//77Z4nRzGxxlHpVBt/48eOjc7OOO+64g1e+8pVZ4mmZPXs22267Lfff31U9/No04bWb2VIY5KmQkqZGxPj+rnO3TJsHH3yQV7/61Rx11FG5QzEzWyaN7pYZbOuttx
5//vOfc4dhZrbM3HI3MyuQk7uZWYGc3M3MCuTkbmZWoGYPqC7tFKNF/rylm3r0+OOPc/bZZ/ORj6TKChMnTuSkk07i4osvrjM6M7PauOXehccff5xTTz21tp/n0gRmNtCc3Ptw8sknM27cOMaNG8e3v/1tjj32WP7yl7+w9dZbc/TRRwPw9NNPc8ABB7D55ptz0EEH0VoMNnXqVHbbbTe222473vSmN/HQQ6lA5u67785nP/tZdtttN77zne9ke21mNjQ0u1smg6lTp3LmmWfyhz/8gYhgxx135H/+53+YMWMG06ZNA1K3zM0338xtt93Geuutx4QJE7juuuvYcccd+fjHP86FF17IyJEj+cUvfsFxxx3HGWecAaQ7gEmTJuV8eWY2RDi5d7j22mt585vfzCqrrALAW97ylnkbdbTbYYcdGDVqFABbb7019913H2uuuSYzZsxgjz32AOCFF15g3XXnbyf7jne8YxBegZmZk/tCuq21s+KKK8573CrxGxFsscUW3HDDDX1+T+sDw8xsoLnPvcOuu+7Kr3/9a5599lmeeeYZLrjgAiZMmMBTTz3V7/duttlmzJo1a15ynzNnDrfddttAh2xmtpBmt9wHeAPZvmy77ba8733vY4cddgDggx/8INtttx0TJkxg3Lhx7LXXXuyzT9+liIcPH87555/P4YcfzhNPPMHcuXM58sgj2WKLLQbzJZiZueRvUwzl127W01zy18zMBouTu5lZgRqX3HN1E+U0FF+zmQ2sRiX3ESNGMHv27CGV7CKC2bNnM2LEiNyhmFlBGjVbZtSoUcycOZNZs2blDmVQjRgxYt6CKDOzOjQquQ8bNowxY8bkDsPMrOc1qlvGzMzq4eRuZlYgJ3czswI5uZuZFcjJ3cysQE7uZmYFcnI3MytQo+a5d2P0sZcs1ffd97W+y/SamZWoq5a7pD0l3SnpbknH9vH8hpKuknSzpFsl7V1/qGZm1q1+k7uk5YHvA3sBY4EDJY3tuOxzwHkRsQ3wTuDUugM1M7PuddNy3wG4OyLuiYjngXOB/TuuCWD16vEawIP1hWhmZkuqmz739YEH2o5nAjt2XHMCcLmkjwOrAG+oJTozM1sq3bTc1ce5zpq8BwI/iYhRwN7AzyQt9LMlHSZpiqQpQ63yo5nZYOomuc8ENmg7HsXC3S6HAOcBRMQNwAhg7c4fFBGnRcT4iBg/cuTIpYvYzMz61U1ynwxsImmMpOGkAdOLOq75K/B6AEmvJCV3N83NzDLpN7lHxFzgY8BlwB2kWTG3STpR0n7VZZ8CDpV0C3AO8L4YStspmZk1TFeLmCLiUuDSjnPHtz2+HZhQb2hmZra0XH7AzKxATu5mZgVycjczK5CTu5lZgZzczcwK5ORuZlYgJ3czswI5uZuZFcjJ3cysQE7uZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBXJyNzMrkJO7mVmBnNzNzArk5G5mViAndzOzAjm5m5kVyMndzKxATu5mZgVycjczK5CTu5lZgZzczcwK5ORuZlYgJ3czswI5uZuZFcjJ3cysQF0ld0l7SrpT0t2Sjl3ENW+XdLuk2ySdXW+YZma2JFbo7wJJywPfB/YAZgKTJV0UEbe3XbMJ8BlgQkQ8JmmdgQrYzMz6103LfQfg7oi4JyKeB84F9u+45lDg+xHxGEBEPFJvmGZmtiS6Se7rAw+0Hc+szrXbFNhU0nWSbpS0Z18/SNJhkqZImjJr1qyli9jMzPrVTXJXH+ei43gFYBNgd+BA4L8lrbnQN0WcFhHjI2L8yJEjlzRWMzPrUjfJfSawQdvxKODBPq65MCLmRMS9wJ2kZG9mZhl0k9wnA5tIGiNpOPBO4KKOa34NvBZA0tqkbpp76gzUzMy6129yj4i5wMeAy4A7gPMi4jZJJ0rar7rsMmC2pNuBq4CjI2L2QAVtZmaL1+9USICIuBS4tOPc8W2PA/hk9W
VmZpl5haqZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBXJyNzMrkJO7mVmBnNzNzArk5G5mViAndzOzAjm5m5kVyMndzKxATu5mZgVycjczK5CTu5lZgZzczcwK5ORuZlYgJ3czswI5uZuZFcjJ3cysQE7uZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBXJyNzMrUFfJXdKeku6UdLekYxdz3QGSQtL4+kI0M7Ml1W9yl7Q88H1gL2AscKCksX1ctxpwOPCHuoM0M7Ml003LfQfg7oi4JyKeB84F9u/jui8C3wCeqzE+MzNbCt0k9/WBB9qOZ1bn5pG0DbBBRFxcY2xmZraUuknu6uNczHtSWg74FvCpfn+QdJikKZKmzJo1q/sozcxsiXST3GcCG7QdjwIebDteDRgHTJR0H7ATcFFfg6oRcVpEjI+I8SNHjlz6qM3MbLG6Se6TgU0kjZE0HHgncFHryYh4IiLWjojRETEauBHYLyKmDEjEZmbWr36Te0TMBT4GXAbcAZwXEbdJOlHSfgMdoJmZLbkVurkoIi4FLu04d/wirt192cMyM7Nl4RWqZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBXJyNzMrkJO7mVmBnNzNzArk5G5mViAndzOzAjm5m5kVyMndzKxATu5mZgVycjczK5CTu5lZgZzczcwK5ORuZlYgJ3czswI5uZuZFcjJ3cysQE7uZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBeoquUvaU9Kdku6WdGwfz39S0u2SbpV0haSN6g/VzMy61W9yl7Q88H1gL2AscKCksR2X3QyMj4gtgfOBb9QdqJmZda+blvsOwN0RcU9EPA+cC+zffkFEXBURz1aHNwKj6g3TzMyWRDfJfX3ggbbjmdW5RTkE+G1fT0g6TNIUSVNmzZrVfZRmZrZEuknu6uNc9HmhdDAwHvhmX89HxGkRMT4ixo8cObL7KM3MbIms0MU1M4EN2o5HAQ92XiTpDcBxwG4R8a96wjMzs6XRTct9MrCJpDGShgPvBC5qv0DSNsCPgP0i4pH6wzQzsyXRb3KPiLnAx4DLgDuA8yLiNkknStqvuuybwKrALyVNk3TRIn6cmZkNgm66ZYiIS4FLO84d3/b4DTXHZWZmy6Cr5G6DZ/SxlyzV9933tX1qjsTMepnLD5iZFcjJ3cysQE7uZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWICd3M7MCObmbmRXIyd3MrEBO7mZmBXJyNzMrkJO7mVmBnNzNzArk5G5mViAndzOzAjm5m5kVyMndzKxATu5mZgVycjczK5CTu5lZgZzczcwK5ORuZlYgJ3czswI5uZuZFcjJ3cysQE7uZmYFcnI3MyuQk7uZWYGc3M3MCuTkbmZWoK6Su6Q9Jd0p6W5Jx/bx/IqSflE9/wdJo+sO1MzMutdvcpe0PPB9YC9gLHCgpLEdlx0CPBYRrwC+BXy97kDNzKx73bTcdwDujoh7IuJ54Fxg/45r9gd+Wj0+H3i9JNUXppmZLYkVurhmfeCBtuOZwI6LuiYi5kp6AlgLeLSOIK0co4+9ZKm+776v7VNzJPUr+bVZ7+kmuffVAo+luAZJhwGHVYdPS7qzi/9+LfR11qbgDxu/vt5V8mur+PX15QtL3bmxUTcXdZPcZwIbtB2PAh5cxDUzJa0ArAH8o/MHRcRpwGndBFY3SVMiYnyO//Zg8OvrXSW/NvDry6WbPvfJwCaSxkgaDrwTuKjjmouA91aPDwCujIiFWu5mZjY4+m25V33oHwMuA5YHzoiI2ySdCEyJiIuA04GfSbqb1GJ/50AGbWZmi9dNtwwRcSlwace549sePwe8rd7QapelO2gQ+fX1rpJfG/j1ZSH3npiZlcflB8zMCuTkbm
ZWICd3M7MCObn3KCUHSzq+Ot5Q0g6546qTpCu6OWfNJGklSZvljmMgSNpI0huqxytJWi13TJ2KTe6SXirpdEm/rY7HSjokd1w1OhV4NXBgdfwUqcBbz5M0QtJLgLUl/Zukl1Rfo4H18kZXj9Lfn5L+HZgG/K463lpS5/qYniTpUFINrR9Vp0YBv84XUd+KTe7AT0hz81vJ4M/Akdmiqd+OEfFR4DmAiHgMGJ43pNr8P2AqsHn1b+vrQgr5AKP89+cJpKKDjwNExDRgdMZ46vRRYALwJEBE3AWskzWiPpSc3NeOiPOAFyEtxgJeyBtSreZU5ZgDQNJIqtfa6yLiOxExBjgqIl4eEWOqr60i4nu546tJ6e/PuRHxRO4gBsi/qgq5AFQlVxo3p7yrRUw96hlJazE/+e0ElPRm+y5wAbCOpC+Tyj58Lm9I9YqIUyTtTGrxrdB2/qxsQdWn9PfnDEnvApaXtAlwOHB95pjqMknSZ4GVJO0BfAT4TeaYFlLsIiZJ2wKnAOOAGcBI4ICIuDVrYDWStDnwelJVzisi4o7MIdVK0s+AjUl9t61WbUTE4fmiqkfp709JKwPHAW8kvT8vA75YrWbvaZKWI21Q1P7a/rtp9bSKTe4w73ZpM9Iv4M6ImJM5pNpUA46dnirsNd4BjG3aH01dSn5/Wn7FdstIekvHqU2rTUSmR8QjOWKq2U2kMsuPkZLDmsBDkh4BDo2IqTmDq8kM4GXAQ7kDqVs1XrI387uc3iiJiDg5a2DLSNJvWEz/c0TsN4jh1ErSeRHxdknT6eM1RsSWGcJapGKTO+m26dXAVdXx7sCNpCR/YkT8LFdgNfkdcEFEXAYg6Y3AnsB5pGmSnbtl9aK1gdsl/RH4V+tkLyeINr8hzXSaTiED4ZWTcgcwgI6o/t03axRdKrZbpmpBfDAiHq6OXwr8APggcHVEjMsZ37Lqa4OA1jlJ0yJi61yx1UXSbn2dj4hJgx1L3STd2rSWXp0k7QtcGhElfXABIGmviPhtx7kPRcQPc8XUl5KnQo5uJfbKI8CmEfEPoIS+zX9IOqZaKbeRpE8Dj1W3+0X8QVVJ/D5gWPV4Mqk7qgS/re62SvVO4C5J35D0ytzB1Ow/Jb2udSDpGGD/jPH0qeRumWskXQz8sjp+K3C1pFWoFlb0uHcBnyetjBNwbXVueeDtGeOqTbUS8DDgJaRZM+sDPyTNEOp1NwIXVDMv5pB+hxERq+cNqx4RcbCk1UkrqM+UFMCZwDkR8VTe6JbZfsDFko4mdYVuXp1rlJK7ZURK6BOYn/x+VerMixJJmkZa5fiHiNimOjc9Il6VN7JlJ+ke4D9IA/zFviclrQ0cTFp9ewfwCuC7EXFK1sCWkaR1gN+TVk5/oIm/w2Jb7tX/7POrr+JUK1I/DWwBjGidj4jXLfKbes+/IuL59Dnd3JWAS+kuYEYTk0IdJO0HvJ90x/UzYIeIeKSa/34HaY5/T5H0FAu+/4YDLwcOkNS4u65ik3u14u8U4JWkX8LywDNN+wUsg58DvyCN3H+ItEH5rKwR1a8nVgIupYeAiVXhsPaZQD09FbLNAcC3IuLq9pMR8aykD2SKaZlEROMqPy5OyQOq3yP1990FrESaJdNzrYXFWCsiTgfmRMSkiPgAsFPuoGp2LOkDazqpmNillFNi4V7gClLDY7W2r1I81JnYJX0dICJ6vmyzpP0knVR9NXJqZMl97q1pgfOmnEm6PiJ2zh1bHSTdGBE7SbqMVGfmQeD8iNg4c2hmSLopIrbtOFfE9E9JXwO2J909Q2pETo2IY/NFtbBiu2WAZyUNB6ZJ+gbpNniVzDHV6UuS1gA+RbojWR34RN6Q6lW1iL4IbER6rxYzo6TUMRNJHyZ1n20sqb1OzmrAdXmiqt3ewNatOfySfgrcTLrTbIySk/u7Sd1OHyMlvQ1Is2d6XjWXfZOIuJhUSfC1mUMaKN8G3kKZM0pKHTM5G/gt8FUWTHZPVWtMSrEm0Ho9a+
QMZFGK7Japkt9PI+Lg3LEMFElXRUSpSR1IrxF4faGrHKdGxHYd3YaTIqLPVbm9RtLGwMyI+Jek3YEtgbMioufXmEg6EPgaqbSJgF2Bz0TEuVkD61Bkyz0iXpA0UtLw9qL6hble0vdIrb9nWicjopQVnJC6LS6VNInyZpS0Vkk/JGkf0pjJqIzx1O1XwHhJrwBOBy4iter3zhpVDSLiHEkTSf3uAo6JiL/njWphRSb3yn3AddW+je3Jr4TEANAaGD6x7VwAPd1n2+HLwNOkPulSthBsKX3M5MWImFtVZ/12tfHKzbmDqtH2pBY7pHIfjZuiW3Jyf7D6Wo6yppgBUHqXTOUlEVFk/ZVqvATKHTOZU3VfvAf49+rcsIzx1KaP2TKHS9o5Ij6TMayFFNnn3k7SaqQZFk/njqVOVavv88xvPUwCTixp38rqj+jKiLg8dyx1q2bLHMrCWwj25AKfTpLGkgaKb6i6McYA74iIr2UObZlVs4DaZ8ssD9zctGmexSZ3SeNIy55bOxY9CrwnIm7LF1V9JP2KtJnFT6tT7wa2iojOTUp6VrXcexVSf3tRxbUkXQ9cQ6pNMm9j7Ij4VbagrCtVct+9Nfun2hVtopP7IKn+eI6LiKuq492BrxS0iGmhmu2l1HEfCobi70rSCRFxQu44lpVny+S3SiuxA0TExKrcbyn+KWmXiLgWQNIE4J+ZY6qVpF37Ot+5rL1HXSxp74i4NHcgg6iErR97ZrZMyS33C0gbO7S20zsYGB8R/5EvqvpI2go4i/kLKB4D3hsRty76u3pLtZtWywhS+d+pvb6KE8rucoLU2IiI6/o714skvZk0FvREdbwmqZvm13kjW1DJyf3fgC8Au1SnrgZOKGERBYCkMRFxb7UhAhHxZOtc7tgGiqQNgG9ExIG5Y7HFW0RtmYXO9aJFdIne3NpzoClK7pZ5Q0Qc3n5C0tuYvzNTr/sVsG1EPNl27nxgu0zxDIaZQE/vfVs6Sa8mrcEYKemTbU+tTiq7XYK+quk2Lpc2LqAafYaFE3lf53qKpM1JxabWqBaItKxOWwGqEkg6hfmbIywHbA3cki+igVVIy3Y4sCopt7SvL3mSVOO9BFMknQx8n/T+/DgNHE8orltG0l6kJc5vJy3Nb1kdGBsRO2QJrCaS9idtz7YfaUl3y1PAuRFxfZbABoCk97YdzgXuK6HPdiiQtFFE3J87joFQTcz4T+AN1anLgS9HxDOL/q7BV2Jy34rUwjsROL7tqaeAqyLisSyB1UzSqyPihtxxDKTqj+i5iHihOl4eWDEins0b2bKrdiO6JiLuyh3LQJC0KXAUCy/S6vnB8F5RXHJvkTQsIuZIGkbqp/1bRDySO65lJelQ0oKJu5Q2Fz2dVMr4fuB9JRUOk3Qjaezk6ep4VeDyEtYqSDqRNNi/EemW/hpSsp+WNbCaSLoF+CELL9JqXPdFHSQdFhGn5Y6jXXF97pJ+CJwSEbdVS/RvIL25XiLpqIg4J2+Ey+wI4CfV4wOBrUib9G4DfAd4TZ6wBsSI9rIREfG00gbLPS8ijgeQtBKpDMHRpPr1pQw6zo2IH+QOYhApdwCdStxD9TVtJQbeD/w5Il5FmkXy6Xxh1WZuRLTKxe5LqpE9OyJ+T1k7TQE8I2neAKOk7ShkoZakz1WbY18OvILUhVFSyd/fSPqIpHUlvaT1lTuoOlR1cjo1rv5RcS13oL1++x5Us2Mi4u+pF6PnvShpXdKipdeTyuK2rJQnpAFzJPBLSQ9Wx+sC78gYT53eQhokvoRU9O3GiHgub0i1ag2GH912Lkh3mb3uV0DnrKbGTUMuMbk/Xu29+TdgAnAIgKQVKCP5HQ9MId2+X9S6S5G0G3BPzsDqFhGTq6mfm5Fue//UdtfS0yJi26pi6S6kRsiPJT0cEbv08609ISL6at32tF6bhlxicv9/wHeBlwFHttV8eD2pldTTIuJiSRsBq3XM/JlCOa1aIA2KAx9mflnjiZJ+VEKCr6
qWvgbYDRgPPEAaVC1CNTbySWDDiDhM0ibAZm117HvRZqSu0DWZX6Me0ky8Q7NEtBjFzpYZCiTtzMJTzc7KFlDNJP03aYOH9rLGL0TEB/NFVQ9Jl5BKYlwDTC7hA6udpF+QZsq8JyLGVQPHN5RQCbNXpiEXl9w7VjUupLMkQa+S9DNgY2Aa86eaRSmvD9J0uojYqr9zvUrScGDT6vDOkhK8pCkRMb695kopv7te2WilxG6ZKbkDGCTjSStuy/p0XtALkjaOiL8ASHo5bXOme1k1RnIWaa9fARtIem8h5YwBnq9a6wEgaWPaNjnvcReS7rh+T4Pfj8Ul94j4af9XFWEGaVzhodyBDKCjgask3UNKgBuRpreW4GTgjRFxJ8xb0XkODZtxsQw+D/yO9KH1c9Lkhvdljag+K0fEMbmD6E9x3TIt1a3TMcBY2kayS1n+LOkqUpmFP9LWIoqI/bIFNQAkrciCs2WKaP1JurVzW7a+zvUySWsBO5F+dzdGxKOZQ6qFpC8B1zd9o5XiWu5tfk4qHLYPaaPe9wKzskZUrxNyBzCQqsTwLmDz6tQdpBklRSR3UmXB05m/mcxBNLCy4DJanzRldwVgV0lExP9mjqkORwCflfQ8aV1NIzdaKbnlPjUitmtvDUmaFBG75Y6tLpJeStrqC+CPJdTOAZD0SuBK4DLgZtIfzzak+eCvi4g/ZQyvFtUdyUdJ89xFmjlzakF3JmcAWwK3AS9Wp6Npg44lKzm53xgRO0m6jDTv/UHg/IjYOHNotZD0duCbwERScngNcHREnJ8zrjpIOh84LyLO6zj/VuBdEfHWPJFZtyTdHhFjc8cxEKqCfQcBYyLii9UOYetGxB8zh7aAkpP7vqQR7Q2AU0iryL4QERct9ht7RFV1b49Wa70aY/h9IVPN7oyIzZb0uV4gaTqLn6pbRJ971eX0XxFxe+5Y6ibpB6S7kddFxCuVtvS8PCK27+dbB1Wxfe5tK+GeAF6bM5YBslxHN8xsyikEt7hNDxq1IcJS2Lf696PVv+197j1fp77NT4EbJP2dNE7S6pcu4cNrx6p8xM0AEfFYtWahUYpN7pJ+ChwR1YbY1afrfxXU5/e7qsupVcL4HUCjR++XwDod+2+2CBg52MHUqbU7kaQJETGh7aljJV1H2mSmBGeQVhRPZ36feynmVBvHtObwj6SBr7HY5A5s2UrsMO/TtVG7ky+LiDi66oOeQEp6p0XEBZnDqsuPWXD/zXb/PZiBDKBVJO0SEdfCvFISJZVs/mspXaB9+C5wAakR8mXS3rCfyxvSwkruc78F2L1VXKuqJT2pqu1uDVe1jA6PiG/ljmUgVLXpzwDWqE49DnyglJ20JJ1KKrD1GxZch1HCVMhWhcjXkxpWV0TEHZlDWkjJyf09wGdIdZYB3kbaxPZni/6u5pN0bUTsIukpFhyYa+Rc22Uh6aqIKHG8ZB5Jq5P+Dp/IHUudJJ3Zx+lipkJW3bwbsGBtmUZ9MBeb3AEkjQVaK1KvLHHkvmTVLe8apMVo8wZSm/ZHtDSqee6e+py4AAAUY0lEQVRvZeHiU6X0uRdL0hdJpRT+wvwGVjRt9XvJfe6QysWK9AsYljmWWlWFmGZGxL8k7U5aMHJW+zhDAVobYbcnvGD+B3Yvu5A0k2sq5ay6naeqlfMD4KVVyd8tgf0i4kuZQ6vD24GNI+L5fq/MqNiWu6QjSGU5f0VK8G8mDTqekjWwmkiaRqoMOZq0kvMi0mYIe+eMy7ojaUZEjMsdx0CRNIlU+O1HbSV/i3jNkn4FfLjpK8JLbrkfQpqP+gyApK8DN5AWNJXgxYiYK+nNwLcj4pTWvNtSVOUVvgKsFxF7Vd1sr46I0zOHVofrJb0qIqbnDmSArBwRf+zYt3hurmBq9lXgZkkzaHDRvpKTu1iw1vIL1blSzJF0IKkgWmvLr6K6noCfAGcCx1XHfyb1v5eQ3HcB3ifpXspb5APwaNV12JoLfgDllKf+KfB1Gj6Hv+
TkfibwB0mtud//QRlJoeX9pGqXX46IeyWNAf4nc0x1WzsizpP0GYDqTqWxmyMsob1yBzDAPgqcBmwu6W/AvcDBeUOqzaMR8d3cQfSn2D53AEnb0lZ1LyKK6rZoaU3Liohbc8dSJ0kTSTNK/q9a7r0T8PVeruwpafWIeLJad7GQiPjHYMc0kCStQiqV8VTuWOoi6WTS3dZFLNgt06hZXMUl90X90bSU8sdTJb79SHdf00i16idFRF/L9ntS9eF8CjCOtPPUSOCAXv4Qk3RxROxbdccEC3YVRkS8PFNotVhE2Yh5IuLkwYploFQb5XTyVMhBMJUF/2han16tKZE9/cfTZo2qBfhB4MyI+Lyknk16fYmIm6q9Rls7MfX8JtIR0Socdi2phvs1JdSnb7OoshHF6JWFdcW13IeKqnTsG0mDO8dFxOTStmmDeTVXRrPgQp+zsgVUE0mvI3UZvobU4LiZlOi/kzUw65ek4/s637QFaKWUiB2KTiTNb/9LldhfDtyVOaZaSfoZcBIpCW5ffY3PGlRNIuJK4MvAf5KKoW0PfDhrUDWQdF7b4693PHf54Ec0IJ5p+3qBNDg+OmdAfRlSLXdJN0XEtrnjsO5IugMYGwW+SSVdQaoCeQNpU5lrm74ophuSbm5btLTA31v7cyWpSklcFBFvyh1LuyHVci8psUvaVNIV1UIKJG0pqXFlR5fRDOBluYMYILeSNlceRyodMU7SSnlDqsXiPoiL+5CurEwDx/JKHFAFQNIHSH2YRXVVtPkx1fJugIi4VdLZQAm1O1rWBm6X9EcavBJwaUTEJwAkrUpas3Am6YNsxZxx1WDlat+E5YCVqseqvkr48OrcKnF50iyuRvW3Q8HJndQHdrCkjUgzaK4hJftpWaOqT8nLu1tOyB3AQJH0MdJg6nbA/aTa7tdkDaoeDwGt6Y5/b3vcOi7Bvm2P5wIPR0Tj/vaKTe4RcTxAdat7KKmV+23SJ20JSl7e3bIx5d59rURKfFObmBiWVq9ME1xG6wK3tRZmSVpV0hYR8YfMcS2g2AHVqv95ArAqaZrZtaREUUQCrGbHnEYqi/sY1fLuiLgvZ1x1knQiaaZMqXdfxZL0NuB3EfFU9be4LfDFElaJVwX6tm0N9EtaDpjStDG9kpP7TaRbpkuAScCNEfFc3qjqV+Ly7k5td19HAetHRCl3X8VqrbmQtAupiuJJwGcjYsfMoS0zSdMiYuuOc41bY1LsbJnqU/T1wB+BPYDpkq7NG1V9JB1RbdH2LPAtSTdJemPuuOok6XOSfgtcDryClNxH5Y3KutQq8LYP8IOIuBAYnjGeOt0j6XBJw6qvI4B7cgfVqdjkLmkcqQrde4F3ADOBK7MGVa8PRMSTpFWq65BmXHwtb0i1ewuwFvB74H9Jc4mL6FYbAv4m6UekXYsureaCl5JvPkTqDv1b9bUjcFjWiPpQcrfMJVS1O4DJvV6TpFPbbe93gIkRcUGJi0QkrUbqd9+FlCgejohd8kZl/ZG0MrAnMD0i7pK0LvCqiChllWrjFZvcASQNBzatDnu+6FS7anf59YExwFakWUATI2K7rIHVqLr7eg2wG6nswAOkAdU+a3tY80haBxjROo6Iv2YMpxaSRpGqlU4gzVa7FjgiImZmDaxDscm9qiZ4FnAfaQHFBsB7I+LqnHHVpRqh3xq4JyIel7QWabCxmMqQpd99lUzSfsB/AesBjwAbAn+KiC2yBlYDSf8HnA38rDp1MHBQROyRL6qFlZzcpwLviog7q+NNgXNKatkOBSXffZVM0i3A64DfR8Q2kl4LHBgRjeubXlKLmC2z0LncShng6MuwVmIHiIg/U94eowuopn8Wo7r7ugv4PnAq8GdJu+aNyro0JyJmA8tJWi4iriLdaZbgUUkHS1q++joYmJ07qE7FrlAFpkg6nfm3TgeRFsIUq2mLKGpwMvDGzrsv0pJ9a7bHq7o5VwM/l/QI5ZTH+ADwPeBbpD7366tzjVJyt8yKpE
165+2hCpwaEf9a7Df2iCFQGK3PhSFNXCxiC6sW1/2T1DtwELAG8POqNW+DoNjkXrqhsDRf0hmkllH73dcKEfH+fFFZNyR9Avhl02aQLAtJp7CYssURcfgghtOv4pJ7RznOhZTW6it5aX7pd18lk/R50rqEfwDnAudHxMN5o1o2kt5bPZwAjAV+UR2/jVQA7hNZAluEEpP7RtXDj1b/trf6nm3aPodLq/TCaFYGSVuSVoi/FZgZEW/IHNIyk3QVaSxoTnU8DLi8aRUxi5stExH3R8T9wISI+HRETK++jgUatQ3WMhqSS/MlnZA7Blsij5DquM8mlckowXrAam3Hq1bnGqW45N5mlaoiHQCSdibtWVmE0gujLUbRM55KIenDkiYCV5B21Dq0oC7RrwE3S/qJpJ8ANwFfyRvSwkqeCnkIcIakNarjx2ngdKWltail+VmDqpmkCRFxXcfpf2QJxpbURsCRJQ3wt0TEmVW10lb54mMjonG7TBXX596pKouriHgidyx1GgpL8yXd1Dl3v69z1kySlgdeSlsjsoTaMr2i2JZ7NdPiraS9VFdo7TVayoBqROzTtjR/M0nFLM2X9GpSSdWRkj7Z9tTqlLNNYtGqPWJPAB4GXqxOB1BK18wCmtjoKDa5AxcCT5D6aIubOtdXYTRJpRRGG04apFqBBQeungQOyBKRLakjgc2GyqKlpiV2KLhbRtKMiBiXO46BMhQKo0naqJr5ZD2mmi64R0mbf7f0yurwklvu10t6VURMzx3IAFmoMFo137YkK0o6japrrXUyIl6XLSLr1j3AxGpsaN6dc0ScnC+k2owGDq7W1DR2dXjJLffbSftu3kt6cwmIUqZjDYWl+VXZ2B+S/oBae3ISEZ4O2XDVCtWFRMQXBjuWgdL01eElJ/eN+jpfym3+UFiaL2lqSd1MVoZeWR1eXHKXtHpEPCnpJX09HxGeJ90jqtWojwAXsOCtvX+HDSdpJPBpYAsW3Gav57vUqn0T5gKXAJOAGyPiubxRLazE5H5xROwr6V5St4Xano6IeHmm0GoxlAqjVb/DTj3/OxwKJF1OKqx1FPAh4L3ArIg4JmtgNemFjduLG1CNiH2rh9dSLfKJiD9lDKlurdfXZ2G0wQ9n4ETEmNwx2FJbKyJOl3REREwCJkmalDuoOvTK6vDiWu4tkl5H+lR9DfByUt/YNRHxnayB1UTSdRExob9zvUzSysAngQ0j4jBJm5DmTl+cOTTrh6QbI2InSZcB3wUeJJX93ThzaMusV1aHF5vcYd7y5+2B15JuDf8ZEZvnjaoekqYBH4uIa6vjnUkDqqXsU4mkX5BmyrwnIsZVsxNuKOk1lkrSvqTktwFwCml18Rci4qKsgdWkFzZuLza5S7qCVAXyBtKb7NqIeCRvVPWRtB1wBmn7MqgKo0VEMZtkS5oSEeMl3RwR21TnbomIrXLHZkNXX6vDgcatDi+uz73NraSNlMeRyhA8LumGiPhn3rDqUc313qrUwmiV56vWegBI2pgCS0mURNI3gHsi4ocd5z8BvKyQAdWe2Li92JZ7S7UD+/tJo/Yvi4gVM4dUi87CaK3zpRRGA5C0B/A50pZml5PmFr8vIibmjMsWrVo8OC4iXuw4vxxwawklQXpl4/ZiW+5VVbrXkD5N7yd1YTRuRHsZFF0YDSAi/q+aU7wT6fb3iIh4NHNYtnjRmdirky+qVZq1902RdDoLzlRr3KrpYpM7sBLp9mlqicWLgFERsWfuIAbB+qQyvysAu0oiIv43c0y2aM9K2qSzqFY106mILlHgw6SpyIfTtjo8a0R9KL5bplRVQa1TCi6M1qqfsyVwG201wSOimB21SiNpL9LsmC8xvzU7HvgMaWemS3PFNtQ4ufeo0gujQXqNETE2dxy2ZKpFPkeTJjMAzABO6vWGSK+tDi+5W6Z0e+UOYBDcIGlsRNyeOxDrXkTMIJUbWICkkyLiqAwh1aWnVoe75d5jhlJhNEm7Ar8B/k6hdy
dDiaS/RsSGueNYVr2yOtwt995zNqkFMZU+CqORSi2U4gzg3cB05ve5W+8qZbbMKpJ26VgdvkrmmBbi5N5jhkBhtHZ/LWW5+lCxqDtKUmIvJbkfApwhaYHV4Rnj6ZO7ZXpU6YXRACSdCqxJ6pppr+fuqZANtYhS2/OUVOmz6avDndx7WMmF0QAkndnHaU+FtKx6ZXW4u2V6VB+F0bYvqTAaQEn7wQ4VkrZd3POFFLbridXhTu69q+jCaDCvINMPgJdWJX+3BPaLiC9lDs0WbQpp0dms6rhzwL/nt9mjR1aHu1umx5VaGA2g2rnnaOBHbSV/Z5RQfKpUVfXHt5IaHOcCF0TE03mjqlevrA53cu9RfRRGa82cuTJrYDWSNDkitu+o5z7Nm3U0n6QxwIHA/qT351ciYlreqOrRK6vD3S3Tu0ovjAbwaFXDvVXP/QDgobwhWTci4l5JF5Lep+8m7VpURHKnR1aHu+VujSXp5cBpwM7AY6SW0sERcV/OuGzRqt/ZO0kt9gdIXTMXR8RzWQOrQa+tDndyt8aTtAqwXEQ8lTsWWzxJL5IG+y8EnqSj0FZEnJwjrjpIujgi9l3EXP6IiEatDne3jDWOpE8u4jzQ2wliCPhC2+NVs0UxAHptdbiTuzXRarkDsKU2OyK+lzuIAXYmaXX4KVU3VCNXh7tbxsxqI+mmiFjsQqYS9MLq8OVyB2DWSdJ5bY+/3vHc5YMfkdl81erw64B3AHeSVoc3KrGDk7s10yZtj/foeG7kYAZiS2xLSU/28fWUpCdzB1eTW4HnSavDtwTGSVopb0gLc5+7NdHi+grdj9hs01sLzkoVEZ+ABVaHnwm8DGjU6nAnd2uilSVtQ7qzXKl63KoH3rgWkg0tfawOP4NUvK9RnNytiR4irb6FtMVe+9THvw9+OLYEfpk7gEHQE6vDPVvGzGoj6RQW03UWEYcPYjhDmgdUrbEkvU3SatXjz0n636qLxpprCqnO+VRgv7bHrS8bJG65W2NJujUitpS0C/BV4CTgsxGxY+bQrAvt1Txt8Lnlbk32QvXvPsAPIuJCYHjGeGzJuOWYkZO7NdnfJP0IeDtwabV3pd+zZl1wt4w1lqSVgT1Jc6fvkrQu8KqI8CrVhpL0FPNb7CsDz7aeIlVOXD1LYEOQk7s1nqR1gBGt44j4a8ZwzHqCb3GtsSTtJ+ku0iYdk6p/f5s3KrPe4ORuTfZFYCfgzxExBngDqWCTmfXDyd2abE5EzAaWk7RcRFwFeHNssy64/IA12eNVcaargZ9LegRo7HJvsybxgKo1VrV36j9Jd5gHAWsAP69a82a2GE7u1liSPgH8MiJm5o7FrNe4z92abHXgMknXSPqopJfmDsisV7jlbo0naUvSlmZvBWZGxBsyh2TWeG65Wy94hFTHfTawTuZYzHqCk7s1lqQPS5oIXAGsDRwaEVvmjcqsN3gqpDXZRsCRETEtdyBmvcZ97tZokpYHXkpbQ8S1Zcz655a7NVa1EfEJwMPAi9XpANw1Y9YPt9ytsSTdDezoRUtmS84DqtZkDwBP5A7CrBe5W8aa7B5goqRLgH+1TkbEyflCMusNTu7WZH+tvobjvVPNloj73M3MCuSWuzWWpJHAp4EtWHCbvddlC8qsR3hA1Zrs58CfgDHAF4D7gMk5AzLrFe6WscaSNDUitpN0a6vsgKRJEbFb7tjMms7dMtZkc6p/H5K0D/AgMCpjPGY9w8ndmuxLktYAPgWcQqrv/om8IZn1BnfLmJkVyAOq1jiSviHpQ32c/4Skr+eIyazXuOVujSPpdmBcRLzYcX454NaIGJcnMrPe4Za7NVF0Jvbq5IuAMsRj1nOc3K2JnpW0SefJ6tw/M8Rj1nM8W8aa6Hjgt5K+BEytzo0HPgMcmS0qsx7iPndrJEnjgKOBVv/6DOCkiJieLyqz3uHkbj1F0kkRcVTuOMyazsndeoqkv0bEhrnjMGs6D6har/FsGbMueEDVGkfSSxb1FE7uZl
1xcrcmmgoEfSfyOX2cM7MO7nM3MyuQW+7WOJK2XdzzEXHTYMVi1qvccrfGkfQicBswq3Wq7enwNntm/XPL3ZroU8BbSaUGzgUuiIin84Zk1lvccrfGkjQGOBDYH7gf+EpETMsblVlv8Dx3a6yIuBe4ELgc2AHYNG9EZr3DLXdrHEkvB95JarE/QOqauTginssamFkPcXK3xqkGVG8ltdqfJM15nyciTs4Rl1kv8YCqNdEX2h6vmi0Ksx7m5G5NNDsivpc7CLNe5gFVa6IP5A7ArNc5uZuZFcgDqtY4kuYCz/b1FGmF6uqDHJJZz3GfuzXR9IjYJncQZr3M3TJmZgVycrcm+mXuAMx6nfvcrXEknULHwqV2EXH4IIZj1pPc525NNKXt8ReAz+cKxKxXueVujSbpZg+umi0597lb07n1YbYUnNzNzArkbhlrHElPMb/FvjLzFzR5EZNZl5zczcwK5G4ZM7MCObmbmRXIyd3MrEBO7mZmBXJyNzMr0P8HbOML3SVICEgAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotUsageComparation(df_a_json, df_a_other, 'symbol')" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Value CountsValue Counts
Json125
Other2452
\n", + "
" + ], + "text/plain": [ + " Value Counts Value Counts\n", + "Json 12 5\n", + "Other 245 2" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEOCAYAAACHE9xHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFCRJREFUeJzt3X+QnVWd5/H3F2hoHTBiCCyQSEcrKMSEJrQsuzhDhuASgw4/RpAUmtTENWMVqOwSMYlTJTqKaDEOy9ZKFVMqgcpMwmSkoBCZSCoUWOWvdGwhIaIpJ0CTGJqILQhkk/CdP/ppbJJO/7653affr6pb97nnnud5vp3u/vSTc889NzITSVK5Dqt3AZKk2jLoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYU7ot4FABx33HHZ1NRU7zIkaUxpbW19PjMn9ddvVAR9U1MTGzZsqHcZkjSmRMRTA+nn0I0kFa7foI+IKRGxPiK2RMTmiPhM1X5DRDwbEW3VbV6PfZZFxNaIeDIiLqzlFyBJ6ttAhm72Atdl5saIOAZojYgfVM/9Y2be3LNzRJwOXAlMB04CHoqIUzNz30gWLkkamH6DPjN3ADuq7RcjYgtwch+7XAysyszdwH9ExFbgbOBHgylsz549tLe38+qrrw5mN9VQY2MjkydPpqGhod6lSBqEQb0YGxFNwJnAT4BzgWsiYgGwga6r/hfo+iPw4x67tdPLH4aIWAwsBnj7299+wLna29s55phjaGpqIiIGU6ZqIDPZtWsX7e3tTJ06td7lSBqEAb8YGxFHA/8GXJuZfwBuA94JNNN1xf8P3V172f2ATzfJzNszsyUzWyZNOnB20KuvvsrEiRMN+VEiIpg4caL/w5LGoAEFfUQ00BXyKzPzuwCZuTMz92Xma8A/0TU8A11X8FN67D4Z2D6U4gz50cXvhzQ2DWTWTQDfArZk5jd6tJ/Yo9ulwKZq+z7gyog4KiKmAtOAn45cyZKkwRjIGP25wMeAxyOirWpbDsyPiGa6hmW2AX8LkJmbI+Ju4Am6ZuxcPRIzbpqWfm+4h3iDbTdd1Ofzs2fPZtmyZVx44Z9mh95yyy386le/4pvf/OZB9zv66KN56aWXRqTGO++8k69//etkJpnJokWLWLJkyYgcu9uNN97I8uXLR/SYOvRG+vejVvr7vVNt9HtFn5k/zMzIzJmZ2VzdHsjMj2XmjKr9r6rZOd37fCUz35mZ78rM79f2S6iN+fPns2rVqje0rVq1ivnz5x+S83//+9/nlltuYe3atWzevJmNGzcyYcKEET/PjTfeOOLHlDS6+M7Yg/jwhz/M/fffz+7duwHYtm0b27dv533vex8vvfQSc+bMYdasWcyYMYN77733gP0ffvhhPvjBD77++JprruGOO+4AoLW1lfPOO4+zzjqLCy+8kB07dhyw/1e/+lVuvvlmTjrpJKBrauMnPvEJANra2jjnnHOYOXMml156KS+88ALQ9b+Q7qUknn/+ebrXD7rjjju47LLLmDt3LtOmTeP6668HYOnSpbzyyis0Nzdz1VVX8cc//pGLLrqIM844g/e85z2sXr16BP4lJdWbQX8QEydO5Oyzz+bBBx8Euq7mP/KRjxARNDY2cs8997Bx40bWr1/PddddR+YBE4t6tWfPHj71qU+xZs0aWltbWbRoEZ///OcP6Ldp0ybOOuusXo+xYMECvva1r/HYY48xY8YMvvjFL/Z73ra2NlavXs3jjz/O6tWre
eaZZ7jpppt405veRFtbGytXruTBBx/kpJNO4he/+AWbNm1i7ty5A/qaJI1uBn0feg7f9By2yUyWL1/OzJkzueCCC3j22WfZuXPngI755JNPsmnTJt7//vfT3NzMl7/8Zdrb2wdcU2dnJ7///e8577zzAFi4cCGPPPJIv/vNmTOHCRMm0NjYyOmnn85TTx24FtKMGTN46KGH+NznPsejjz5ak6EiSYeeQd+HSy65hHXr1rFx40ZeeeUVZs2aBcDKlSvp6OigtbWVtrY2TjjhhAPmlx9xxBG89tprrz/ufj4zmT59Om1tbbS1tfH444+zdu3aA849ffp0WltbB1Vvz3PuX89RRx31+vbhhx/O3r17D9j/1FNPpbW1lRkzZrBs2TK+9KUvDer8kkYng74PRx99NLNnz2bRokVveBG2s7OT448/noaGBtavX9/r1fEpp5zCE088we7du+ns7GTdunUAvOtd76Kjo4Mf/ahrRYg9e/awefPmA/ZftmwZ119/Pb/97W8B2L17N7feeisTJkzg2GOP5dFHHwXgrrvuev3qvqmp6fU/DmvWrBnQ19jQ0MCePXsA2L59O29+85v56Ec/ypIlS9i4ceOAjiFpdBsV69EPRL2mZc2fP5/LLrvsDTNwrrrqKj70oQ/R0tJCc3Mz7373uw/Yb8qUKVxxxRXMnDmTadOmceaZZwJw5JFHsmbNGj796U/T2dnJ3r17ufbaa5k+ffob9p83bx47d+7kggsuIDOJCBYtWgTAihUr+OQnP8nLL7/MO97xDr7zne8AsGTJEq644gruuusuzj///AF9fYsXL2bmzJnMmjWLBQsW8NnPfpbDDjuMhoYGbrvttiH9m0kaXWKgLyLWUktLS+7/wSNbtmzhtNNOq1NFOhi/L6OT8+jHp4hozcyW/vo5dCNJhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKN2bm0XPDCL8d/4bOPp92mWJJpfCK/iBcplhSKQz6g3CZYpcplkph0B+EyxS7TLFUCoO+Dy5T7DLFUgkM+j64TLHLFEslMOj74DLFLlMslWAMTa/sezpkrbhMsaSxzmWKNSh+X0Ynlyken1ymWJIEGPSSVLxRHfSjYVhJf+L3QxqbRm3QNzY2smvXLsNllMhMdu3aRWNjY71LkTRIo3bWzeTJk2lvb6ejo6PepajS2NjI5MmT612GpEEatUHf0NDA1KlT612GJI15o3boRpI0Mgx6SSqcQS9JhTPoJalwBr0kFa7foI+IKRGxPiK2RMTmiPhM1f62iPhBRPy6uj+2ao+IuDUitkbEYxExq9ZfhCTp4AZyRb8XuC4zTwPOAa6OiNOBpcC6zJwGrKseA3wAmFbdFgMugShJddRv0GfmjszcWG2/CGwBTgYuBlZU3VYAl1TbFwN3ZpcfA2+NiBNHvHJJ0oAMaow+IpqAM4GfACdk5g7o+mMAHF91Oxl4psdu7VWbJKkOBhz0EXE08G/AtZn5h7669tJ2wII1EbE4IjZExAaXOZCk2hlQ0EdEA10hvzIzv1s17+wekqnun6va24EpPXafDGzf/5iZeXtmtmRmy6RJk4ZavySpHwOZdRPAt4AtmfmNHk/dByysthcC9/ZoX1DNvjkH6Owe4pEkHXoDWdTsXOBjwOMR0Va1LQduAu6OiI8DTwOXV889AMwDtgIvA38zohVLkgal36DPzB/S+7g7wJxe+idw9TDrkiSNEN8ZK0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9Jhes36CPi2xHxXERs6tF2Q0Q8GxFt1W1ej+eWRcTWiHgyIi6sVeGSpIEZyBX9HcDcXtr/MTObq
9sDABFxOnAlML3a55sRcfhIFStJGrx+gz4zHwF+N8DjXQysyszdmfkfwFbg7GHUJ0kapuGM0V8TEY9VQzvHVm0nA8/06NNetUmS6mSoQX8b8E6gGdgB/EPVHr30zd4OEBGLI2JDRGzo6OgYYhmSpP4MKegzc2dm7svM14B/4k/DM+3AlB5dJwPbD3KM2zOzJTNbJk2aNJQyJEkDMKSgj4gTezy8FOiekXMfcGVEHBURU4FpwE+HV6IkaTiO6K9DRPwLMBs4LiLagS8AsyOima5hmW3A3wJk5uaIuBt4AtgLXJ2Z+2pTuiRpIPoN+syc30vzt/ro/xXgK8MpSpI0cnxnrCQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TC9Rv0EfHtiHguIjb1aHtbRPwgIn5d3R9btUdE3BoRWyPisYiYVcviJUn9G8gV/R3A3P3algLrMnMasK56DPABYFp1WwzcNjJlSpKGqt+gz8xHgN/t13wxsKLaXgFc0qP9zuzyY+CtEXHiSBUrSRq8oY7Rn5CZOwCq++Or9pOBZ3r0a6/aDhARiyNiQ0Rs6OjoGGIZkqT+jPSLsdFLW/bWMTNvz8yWzGyZNGnSCJchSeo21KDf2T0kU90/V7W3A1N69JsMbB96eZKk4Rpq0N8HLKy2FwL39mhfUM2+OQfo7B7ikSTVxxH9dYiIfwFmA8dFRDvwBeAm4O6I+DjwNHB51f0BYB6wFXgZ+Jsa1CxJGoR+gz4z5x/kqTm99E3g6uEWJUkaOb4zVpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCnfEcHaOiG3Ai8A+YG9mtkTE24DVQBOwDbgiM18YXpmSpKEaiSv6v8zM5sxsqR4vBdZl5jRgXfVYklQntRi6uRhYUW2vAC6pwTkkSQM03KBPYG1EtEbE4qrthMzcAVDdH9/bjhGxOCI2RMSGjo6OYZYhSTqYYY3RA+dm5vaIOB74QUT8cqA7ZubtwO0ALS0tOcw6JEkHMawr+szcXt0/B9wDnA3sjIgTAar754ZbpCRp6IYc9BHxZxFxTPc28D+ATcB9wMKq20Lg3uEWKUkauuEM3ZwA3BMR3cf558x8MCJ+BtwdER8HngYuH36ZkqShGnLQZ+ZvgDN6ad8FzBlOUZKkkeM7YyWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXuiHoXMJY0Lf1evUsYkG03XVTvEiSNIl7RS1LhDHpJKpxBL0mFM+glqXAGvSQVzlk3JbphQr0rGJgbOutdgTQu1OyKPiLmRsSTEbE1IpbW6jySpL7VJOgj4nDg/wEfAE4H5kfE6bU4lySpb7Uaujkb2JqZvwGIiFXAxcATNTqfpLHAYcW6qNXQzcnAMz0et1dtkqRDrFZX9NFLW76hQ8RiYHH18KWIeLJGtYw7AccBz9e7jn59sbcfE5XMn80Rd8pAOtUq6NuBKT0eTwa29+yQmbcDt9fo/ONaRGzIzJZ61yHtz5/N+qjV0M3PgGkRMTUijgSuBO6r0bkkSX2oyRV9Zu6NiGuAf
wcOB76dmZtrcS5JUt9q9oapzHwAeKBWx1efHBLTaOXPZh1EZvbfS5I0ZrnWjSQVzqCXpMIZ9JJqJiIOi4j/Xu86xjvH6Me4iJgEfAJooseL65m5qF41ST1FxI8y87/Vu47xzGWKx757gUeBh4B9da5F6s3aiPhr4LvplWVdeEU/xkVEW2Y217sO6WAi4kXgz+i6EHmFriVSMjPfUtfCxhHH6Me++yNiXr2LkA4mM4/JzMMysyEz31I9NuQPIa/ox7geV0v/H9hTNXu1pFEjIgK4CpiamX8fEVOAEzPzp3Uubdww6CXVVETcBrwGnJ+Zp0XEscDazHxvnUsbN3wxtgAR8VfAX1QPH87M++tZj7Sf/5qZsyLi5wCZ+UK12KEOEcfox7iIuAn4DF2f3vUE8JmqTRot9lQfL5rw+pTg1+pb0vji0M0YFxGPAc2Z+Vr1+HDg55k5s76VSV0i4irgI8AsYAXwYeDvMvNf61rYOOLQTRneCvyu2h4jH8qp8SIzV0ZEKzCHrqmVl2TmljqXNa4Y9GPfV4GfR8R6un6J/gJYVt+SpAP8GvgDVeZExNsz8+n6ljR+OHRTgIg4EXgvXUH/k8z8bZ1Lkl4XEZ8CvgDspOtNU91vmHJ48RAx6Me4iDgXaMvMP0bER+kaB/0/mflUnUuTAIiIrXTNvNlV71rGK2fdjH23AS9HxBnAZ4GngDvrW5L0Bs8AnfUuYjxzjH7s25uZGREXA7dm5rciYmG9i5Ii4n9Xm78BHo6I7wG7u5/PzG/UpbBxyKAf+16MiGXAx4A/r6ZXNtS5JgngmOr+6ep2ZHWDak69Dg3H6Me4iPgvwHzgZ5n5w2odkdmZeVedS5MAiIjL958z31ubasegH6Oqxcy6v3lR3We1vRvYCnw+M9fVoTzpdRGxMTNn9dem2nHoZozKzGMO9lw1fPMeYGV1Lx1yEfEBYB5wckTc2uOptwB761PV+GTQFygz9wG/iIj/W+9aNK5tBzYAlwO/out/nPvomk//v+pY17jj0I2kmoiIBuArwP8EttE1rDgF+A6wPDP3HHxvjSTn0Uuqla8DxwKnZOaszDwTeAdd6zHdXNfKxhmv6CXVRET8Gjh1/w8Er15D+mVmTqtPZeOPV/SSaiX3D/mqcR/Ooz+kDHpJtfJERCzYv7Fak+mXdahn3HLoRlJNRMTJwHeBV4BWuq7i3wu8Cbg0M5+tY3njikEvqaYi4nxgOl2zbjb7Jr5Dz6CXpMI5Ri9JhTPoJalwBr0kFc6gl6TCGfSSVLj/BI9aQOe2deiMAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "p3 = pd.concat([p1, p2], axis=1, sort=False).drop_duplicates()\n", + "p3.plot(kind='bar')\n", + "p3" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# DOMAINS\n", + "\n", + "Again, the difference for unique values for the whole sample and the filtered one is really big. Only 7.2%[1] of the values remain on the filtered sample for the non-JSON values and 30% for the JSON’s. \n", + "\n", + "---\n", + " For futher investigation: \n", + "1. Only few of domains produce bigger values, why? Do they have something in common? Does that mean that some domains only produce bigger values? \n", + " \n", + "2. What are the top domains commonly used for?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Full Sample" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 11166 unique script_domain present on the non-json dataset and 3507 on the JSONs\n" + ] + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAEOCAYAAABiodtuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFdNJREFUeJzt3X+MV/Wd7/HnWxhFqyJF9CpwHczSVhFEnFp321uNWEFtF+uqldiFFLekN9a2e2ut2E1MrVptzNb15mpi1h9oyILLttFY649STN3EWhmcKkgtxKqMKI6CbP1FQd/3jzm4I59BcL4jZ+D7fCST7znv8zln3sOPec35nHO+E5mJJEk97VF3A5KkgcdwkCQVDAdJUsFwkCQVDAdJUsFwkCQVDAdJUsFwkCQVDAdJUmFw3Q301YEHHpitra11tyFJu4z29vZXMnPEjozdZcOhtbWVJUuW1N2GJO0yIuK5HR3rtJIkqWA4SJIKhoMkqbDLXnPozaZNm+js7OTtt9+uuxVVhgwZwqhRo2hpaam7FUkfwm4VDp2dney33360trYSEXW30/Qyk1dffZXOzk7GjBlTdzuSPoTdalrp7bffZvjw4QbDABERDB8+3DM5aRe0W4UDYDAMMP59SLum3S4cJEmN262uOWyt9ZJf9Ovxnr369A/cfuKJJzJnzhymTJnyXu26667jj3/8IzfccMM299t33315/fXX+6XH22+/nZ/85CdkJpnJrFmzuOiii/rl2FtcddVVXHrppf16TO18/f3/o9lt7/vDrsYzh340ffp05s+f/77a/PnzmT59+k75/L/85S+57rrreOCBB1i+fDlLly5l6NCh/f55rrrqqn4/pqSBxXDoR2eddRb33HMPGzduBODZZ59lzZo1fO5zn+P1119n8uTJTJo0ifHjx3PXXXcV+z/00EN88YtffG/9m9/8JrfddhsA7e3tnHDCCRx77LFMmTKFF198sdj/xz/+Mddeey2HHnoo0H0b6de//nUAOjo6OP7445kwYQJf/vKXWb9+PdB9trPlbUheeeUVtrxf1W233caZZ57J1KlTGTt2LBdffDEAl1xyCW+99RYTJ07kvPPO44033uD000/n6KOP5qijjmLBggX98CcpqW6GQz8aPnw4xx13HPfddx/Qfdbwla98hYhgyJAh/PznP2fp0qUsXryY7373u2TmDh1306ZNXHjhhSxcuJD29nZmzZrFD37wg2LcsmXLOPbYY3s9xowZM7jmmmt44oknGD9+PD/84Q+3+3k7OjpYsGABTz75JAsWLGD16tVcffXV7L333nR0dDBv3jzuu+8+Dj30UH7/+9+zbNkypk6dukNfk6SBzXDoZz2nlnpOKWUml156KRMmTODkk0/mhRdeYO3atTt0zKeffpply5bxhS98gYkTJ3LFFVfQ2dm5wz1t2LCB1157jRNOOAGAmTNn8pvf/Ga7+02ePJmhQ4cyZMgQjjzySJ57rnzPrvHjx/OrX/2K73//+zz88MMfyTSWpJ3PcOhnZ5xxBosWLWLp0qW89dZbTJo0CYB58+bR1dVFe3s7HR0dHHzwwcX9/4MHD+bdd999b33L9sxk3LhxdHR00NHRwZNPPskDDzxQfO5x48bR3t7+ofrt+Tm37mevvfZ6b3nQoEFs3ry52P8Tn/gE7e3tjB8/njlz5nD55Zd/qM8vaWAyHPrZvvvuy4knnsisWbPedyF6w4YNHHTQQbS0tLB48eJefwo/7LDDeOqpp9i4cSMbNmxg0aJFAHzyk5+kq6uLRx55BOieZlq+fHmx/5w5c7j44ot56aWXANi4cSPXX389Q4cOZdiwYTz88MMA3HHHHe+dRbS2tr4XKAsXLtyhr7GlpYVNmzYBsGbNGvbZZx+++tWvctFFF7F06dIdOoakgW23vpW1rlvLpk+fzplnnvm+O5fOO+88vvSlL9HW1sbEiRP51Kc+Vew3evRozjnnHCZ
MmMDYsWM55phjANhzzz1ZuHAh3/rWt9iwYQObN2/mO9/5DuPGjXvf/qeddhpr167l5JNPJjOJCGbNmgXA3Llz+cY3vsGbb77J4Ycfzq233grARRddxDnnnMMdd9zBSSedtENf3+zZs5kwYQKTJk1ixowZfO9732OPPfagpaWFG2+8sU9/ZpIGltjRi6IDTVtbW279y35WrFjBEUccUVNH2hb/XgYmn3PoX7vCcw4R0Z6ZbTsy1mklSVLBcJAkFXa7cNhVp8l2V/59SLum3SochgwZwquvvuo3pAFiy+9zGDJkSN2tSPqQdqu7lUaNGkVnZyddXV11t6LKlt8EJ2nXsluFQ0tLi79xTJL6wW41rSRJ6h+GgySpYDhIkgqGgySpsN1wiIhbIuLliFjWo/bxiHgwIlZWr8OqekTE9RGxKiKeiIhJPfaZWY1fGREze9SPjYgnq32uD38jvSTVbkfOHG4Dtv4NLpcAizJzLLCoWgc4FRhbfcwGboTuMAEuAz4DHAdctiVQqjGze+znb4uRpJptNxwy8zfAuq3K04C51fJc4Iwe9duz22+BAyLiEGAK8GBmrsvM9cCDwNRq2/6Z+Uh2P7l2e49jSZJq0tdrDgdn5osA1etBVX0ksLrHuM6q9kH1zl7qvYqI2RGxJCKW+KCbJH10+vuCdG/XC7IP9V5l5k2Z2ZaZbSNGjOhji5Kk7elrOKytpoSoXl+u6p3A6B7jRgFrtlMf1UtdklSjvobD3cCWO45mAnf1qM+o7lo6HthQTTvdD5wSEcOqC9GnAPdX2/4cEcdXdynN6HEsSVJNtvveShHxb8CJwIER0Un3XUdXA3dGxPnA88DZ1fB7gdOAVcCbwNcAMnNdRPwIeKwad3lmbrnI/b/pviNqb+CX1YckqUbbDYfMnL6NTZN7GZvABds4zi3ALb3UlwBHba8PSdLO4xPSkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKjQUDhHxjxGxPCKWRcS/RcSQiBgTEY9GxMqIWBARe1Zj96rWV1XbW3scZ05VfzoipjT2JUmSGtXncIiIkcC3gLbMPAoYBJwLXAP8NDPHAuuB86tdzgfWZ+ZfAT+txhERR1b7jQOmAjdExKC+9iVJalyj00qDgb0jYjCwD/AicBKwsNo+FzijWp5WrVNtnxwRUdXnZ+bGzPwTsAo4rsG+JEkN6HM4ZOYLwLXA83SHwgagHXgtMzdXwzqBkdXySGB1te/mavzwnvVe9nmfiJgdEUsiYklXV1dfW5ckbUcj00rD6P6pfwxwKPAx4NRehuaWXbaxbVv1sph5U2a2ZWbbiBEjPnzTkqQd0si00snAnzKzKzM3AT8D/gY4oJpmAhgFrKmWO4HRANX2ocC6nvVe9pEk1aCRcHgeOD4i9qmuHUwGngIWA2dVY2YCd1XLd1frVNt/nZlZ1c+t7mYaA4wFftdAX5KkBg3e/pDeZeajEbEQWApsBh4HbgJ+AcyPiCuq2s3VLjcDd0TEKrrPGM6tjrM8Iu6kO1g2Axdk5jt97UuS1Lg+hwNAZl4GXLZV+Rl6udsoM98Gzt7Gca4ErmykF0lS//EJaUlSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUMB0lSwXCQJBUaCoeIOCAiFkbEHyJiRUT8dUR8PCIejIiV1euwamxExPURsSoinoiIST2OM7MavzIiZjb6RUmSGtPomcO/APdl5qeAo4EVwCXAoswcCyyq1gFOBcZWH7O
BGwEi4uPAZcBngOOAy7YEiiSpHn0Oh4jYH/g8cDNAZv4lM18DpgFzq2FzgTOq5WnA7dntt8ABEXEIMAV4MDPXZeZ64EFgal/7kiQ1rpEzh8OBLuDWiHg8Iv41Ij4GHJyZLwJUrwdV40cCq3vs31nVtlUvRMTsiFgSEUu6uroaaF2S9EEaCYfBwCTgxsw8BniD/55C6k30UssPqJfFzJsysy0z20aMGPFh+5Uk7aBGwqET6MzMR6v1hXSHxdpquojq9eUe40f32H8UsOYD6pKkmvQ5HDLzJWB1RHyyKk0GngLuBrbccTQTuKtavhuYUd21dDywoZp2uh84JSKGVReiT6lqkqSaDG5w/wuBeRGxJ/AM8DW6A+fOiDgfeB44uxp7L3AasAp4sxpLZq6LiB8Bj1XjLs/MdQ32JUlqQEPhkJkdQFsvmyb3MjaBC7ZxnFuAWxrpRZLUf3xCWpJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSQXDQZJUMBwkSYWGwyEiBkXE4xFxT7U+JiIejYiVEbEgIvas6ntV66uq7a09jjGnqj8dEVMa7UmS1Jj+OHP4NrCix/o1wE8zcyywHji/qp8PrM/MvwJ+Wo0jIo4EzgXGAVOBGyJiUD/0JUnqo4bCISJGAacD/1qtB3ASsLAaMhc4o1qeVq1TbZ9cjZ8GzM/MjZn5J2AVcFwjfUmSGtPomcN1wMXAu9X6cOC1zNxcrXcCI6vlkcBqgGr7hmr8e/Ve9nmfiJgdEUsiYklXV1eDrUuStqXP4RARXwRezsz2nuVehuZ2tn3QPu8vZt6UmW2Z2TZixIgP1a8kaccNbmDfzwJ/GxGnAUOA/ek+kzggIgZXZwejgDXV+E5gNNAZEYOBocC6HvUteu4jSapBn88cMnNOZo7KzFa6Lyj/OjPPAxYDZ1XDZgJ3Vct3V+tU23+dmVnVz63uZhoDjAV+19e+JEmNa+TMYVu+D8yPiCuAx4Gbq/rNwB0RsYruM4ZzATJzeUTcCTwFbAYuyMx3PoK+JEk7qF/CITMfAh6qlp+hl7uNMvNt4Oxt7H8lcGV/9CJJapxPSEuSCoaDJKlgOEiSCoaDJKnwUdytpErrJb+ou4XdyrNXn153C1LT8MxBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklQwHCRJBcNBklToczhExOiIWBwRKyJieUR8u6p/PCIejIiV1euwqh4RcX1ErIqIJyJiUo9jzazGr4yImY1/WZKkRjRy5rAZ+G5mHgEcD1wQEUcClwCLMnMssKhaBzgVGFt9zAZuhO4wAS4DPgMcB1y2JVAkSfXoczhk5ouZubRa/jOwAhgJTAPmVsPmAmdUy9OA27Pbb4EDIuIQYArwYGauy8z1wIPA1L72JUlqXL9cc4iIVuAY4FHg4Mx8EboDBDioGjYSWN1jt86qtq16b59ndkQsiYglXV1d/dG6JKkXDYdDROwL/Afwncz8rw8a2kstP6BeFjNvysy2zGwbMWLEh29WkrRDGgqHiGihOxjmZebPqvLaarqI6vXlqt4JjO6x+yhgzQfUJUk1aeRupQBuBlZk5j/32HQ3sOWOo5nAXT3qM6q7lo4HNlTTTvcDp0TEsOpC9ClVTZJUk8EN7PtZ4O+BJyOio6pdClwN3BkR5wPPA2dX2+4FTgNWAW8CXwPIzHUR8SPgsWrc5Zm5roG+JEkN6nM4ZOZ/0vv1AoDJvYxP4IJtHOsW4Ja+9iJJ6l8+IS1JKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SB
JKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqSC4SBJKhgOkqTCgAmHiJgaEU9HxKqIuKTufiSpmQ2IcIiIQcD/A04FjgSmR8SR9XYlSc1rQIQDcBywKjOfycy/APOBaTX3JElNa6CEw0hgdY/1zqomSarB4LobqEQvtSwGRcwGZlerr0fE0x9pV83jQOCVupvYnrim7g5UE/999p/DdnTgQAmHTmB0j/VRwJqtB2XmTcBNO6upZhERSzKzre4+pN7477MeA2Va6TFgbESMiYg9gXOBu2vuSZKa1oA4c8jMzRHxTeB+YBBwS2Yur7ktSWpaAyIcADLzXuDeuvtoUk7VaSDz32cNIrO47itJanID5ZqDJGkAMRwkSQXDQdKAERF7RMTf1N2HvObQlCJiBPB1oJUeNyVk5qy6epK2iIhHMvOv6+6j2Q2Yu5W0U90FPAz8Cnin5l6krT0QEX8H/Cz96bU2njk0oYjoyMyJdfch9SYi/gx8jO4fXN6i++11MjP3r7WxJuM1h+Z0T0ScVncTUm8yc7/M3CMzWzJz/2rdYNjJPHNoQj1+MvsLsKkq+5OZBoSICOA8YExm/igiRgOHZObvam6tqRgOkgaUiLgReBc4KTOPiIhhwAOZ+emaW2sqXpBuUhHxt8Dnq9WHMvOeOvuRevhMZk6KiMcBMnN99Yac2om85tCEIuJq4NvAU9XHt6uaNBBsqn51cMJ7t16/W29LzcdppSYUEU8AEzPz3Wp9EPB4Zk6otzMJIuI84CvAJGAucBbwT5n577U21mScVmpeBwDrquWhdTYi9ZSZ8yKiHZhM922sZ2TmiprbajqGQ3P6MfB4RCym+z/f54E59bYkvc9K4L+ovkdFxP/MzOfrbam5OK3UpCLiEODTdIfDo5n5Us0tSQBExIXAZcBauh+E2/IQnNOeO5Hh0IQi4rNAR2a+ERFfpXtu918y87maW5OIiFV037H0at29NDPvVmpONwJvRsTRwPeA54Db621Jes9qYEPdTTQ7rzk0p82ZmRExDbg+M2+OiJl1N6XmFhH/p1p8BngoIn4BbNyyPTP/uZbGmpTh0Jz+HBFzgL8H/ld1K2tLzT1J+1Wvz1cfe1YfUD3zoJ3Haw5NKCL+BzAdeCwz/7N675oTM/OOmluTiIizt36mobeaPlqGQxOp3nBvy194VK9ZLW8EVgE/yMxFNbQnARARSzNz0vZq+mg5rdREMnO/bW2rppaOAuZVr9JOFRGnAqcBIyPi+h6b9gc219NV8zIcBEBmvgP8PiL+b929qGmtAZYAZwN/pPus9h26n3f4xxr7akpOK0kaECKiBbgS+AfgWbqnO0cDtwKXZuambe+t/uZzDpIGip8Aw4DDMnNSZh4DHE73e39dW2tnTcgzB0kDQkSsBD6RW31Tqq6H/SEzx9bTWXPyzEHSQJFbB0NVfAefc9jpDAdJA8VTETFj62L1/l9/qKGfpua0kqQBISJGAj8D3gLa6T5b+DSwN/DlzHyhxvaajuEgaUCJiJOAcXTfrbTchzLrYThIkgpec5AkFQwHSVLBcJAkFQwHSVLBcJAkFf4/Ts5bWohEuncAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotUniqueValuesComparation(df_json, df_other, 'script_domain')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
jsonother
baidu.com0.1561620.029125
cloudfront.net0.0823070.031655
rambler.ru0.0482660.010561
google-analytics.com0.0000050.121869
yandex.ru0.0219040.030423
\n", + "
" + ], + "text/plain": [ + " json other\n", + "baidu.com 0.156162 0.029125\n", + "cloudfront.net 0.082307 0.031655\n", + "rambler.ru 0.048266 0.010561\n", + "google-analytics.com 0.000005 0.121869\n", + "yandex.ru 0.021904 0.030423" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAFcCAYAAAAkiW7CAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3X24XeOd//H3Rx7ECNFGdEhoMqXVCFJ5ojS0RlElZYJgprTadKajakxVWq2qzlworer8mKl6qFBPNaUpGRkPg3qoJiFEpCoywZloRUhINEh8f3+sdXL22dnJWfucnbPOzv15XVcue6291j7fc5zz2fe+173uWxGBmZmlYbOyCzAzs+7j0DczS4hD38wsIQ59M7OEOPTNzBLi0DczS4hD38wsIQ59M7OEOPTNzBLSu+wCqm277bYxdOjQssswM2sqs2fPfiUiBnV0XI8L/aFDhzJr1qyyyzAzayqSni9yXKHuHUmHSHpG0gJJU2o8P17SY5JWS5pY9dxOkv5b0nxJT0saWuRrmplZ43UY+pJ6AZcChwLDgeMkDa867AXgJOD6Gi8xFbgwIj4MjAVe7krBZmbWeUW6d8YCCyJiIYCkG4EJwNOtB0TEovy5dytPzN8cekfEXflxKxpTtpmZdUaR0B8MvFix3QKMK/j6HwSWSfolMAy4G5gSEWsqD5I0GZgMsNNOOxV8aTNL3TvvvENLSwurVq0qu5Ru069fP4YMGUKfPn06dX6R0FeNfUUn4e8NfAz4CFkX0E1k3UBXtnuxiMuBywFGjx7tCf7NrJCWlha22morhg4dilQrqjYtEcHSpUtpaWlh2LBhnXqNIhdyW4AdK7aHAIsLvn4L8HhELIyI1cBtwF71lWhmVtuqVasYOHBgEoEPIImBAwd26ZNNkdCfCewiaZikvsAkYFrB158JvEdS69jRT1BxLcDMrKtSCfxWXf1+Owz9vIV+CjADmA/cHBHzJJ0r6Yi8iDGSWoCjgZ9Impefuwb4GnCPpLlkXUU/7VLFZmbWaYVuzoqI6cD0qn1nVzyeSdbtU+vcu4A9ulBj3YZOuaPLr7Ho/MMaUImZdadG/O1XKpoDH/3oR3n44Ycb+rU3Fs+9Y2bWRc0S+ODQNzPrsv79+/PSSy8xfvx4Ro4cyYgRI/jNb34DwA033MDuu+/OiBEjOPPMM9udc9ZZZ7Hnnnuy995786c//albanXom5k1wPXXX8/BBx/MnDlzeOKJJxg5ciSLFy/mzDPP5N5772XOnDnMnDmT2267DYCVK1ey995788QTTzB+/Hh++tPuudzp0Dcza4AxY8Zw9dVXc8455zB37ly22morZs6cyQEHHMCgQYPo3bs3J5xwAg888AAAffv25dOf/jQAo0aNYtGiRd1Sp0PfzKwBxo8fzwMPPMDgwYP5u7/7O6ZOnUrE+u817dOnz9rhl7169WL16tXdUqdD38ysAZ5//nm22247vvjFL3LyySfz2GOPMW7cOO6//35eeeUV1qxZww033MD+++9fap09bj59M7POKmuotSTuu+8+LrzwQvr06UP//v2ZOnUq22+/Peeddx4f//jHiQg+9alPMWHChFJqXFvrhj5+lGH06NHR1UVUPE7fLA3z
58/nwx/+cKk1LF26lL322ovnny+0hklD1Pq+Jc2OiNEdnevuHTOzTlq8eDH77LMPX/va18oupTB375iZddIOO+zAH/7wh7LLqItb+mZmCXHom5klxKFvZpYQh76ZWUJ8IdfMNh3nDGjw6y2v+5Rly5Zx/fXX8+UvfxmA++67j4suuojbb7+9sbV1klv6ZmYNtGzZMi677LKGvV6jp2dw6JuZdcEPf/hDRowYwYgRI/jRj37ElClTeO655xg5ciRnnHEGACtWrGDixInsuuuunHDCCWvn5Jk9ezb7778/o0aN4uCDD+all14C4IADDuCb3/wm+++/P5dccklD6y3UvSPpEOASoBdwRUScX/X8eOBHZCtkTYqIW6qe35psqcVbI+KURhRuZla22bNnc/XVV/Poo48SEYwbN47rrruOp556ijlz5gBZ987jjz/OvHnz2GGHHdh333156KGHGDduHF/5ylf41a9+xaBBg7jppps466yzuOqqq4DsE8P999/f8Jo7DH1JvYBLgYOAFmCmpGkRUbnA+QvASWTr4dbyPaDx1ZuZlejBBx/kyCOPZMsttwTgqKOOWrt4SqWxY8cyZEi2ouzIkSNZtGgR22yzDU899RQHHXQQAGvWrGH77bdfe86xxx67UWou0tIfCyyIiIUAkm4EJgBrQz8iFuXPvVt9sqRRwPuAO4EO54UwM2sWRecu23zzzdc+bp1GOSLYbbfdeOSRR2qe0/pG0mhF+vQHAy9WbLfk+zokaTPgB8AZ9ZdmZtazjR8/nttuu40333yTlStXcuutt7LvvvvyxhtvdHjuhz70IZYsWbI29N955x3mzZu3sUsu1NJXjX1Fp+b8MjA9Il5sXSyg5heQJgOTAXbaaaeCL21mVqUTQyy7Yq+99uKkk05i7NixAHzhC19g1KhR7LvvvowYMYJDDz2Uww6rPWNv3759ueWWWzj11FNZvnw5q1ev5rTTTmO33XbbqDV3OLWypH2AcyLi4Hz7GwARcV6NY38G3N56IVfSz4GPAe8C/YG+wGURMWV9X89TK5tZUT1hauUydGVq5SIt/ZnALpKGAf8HTAKOL1JYRJxQUdBJwOgNBb6ZmW1cHfbpR8Rq4BRgBtmwy5sjYp6kcyUdASBpjKQW4GjgJ5I2fseUmZnVrdA4/YiYDkyv2nd2xeOZwJAOXuNnwM/qrtDMbAMigg1dM9zUdHW1Q9+Ra2ZNq1+/fixdurTLQdgsIoKlS5fSr1+/Tr+GJ1wzs6Y1ZMgQWlpaWLJkSdmldJt+/fqtvdGrMxz6Zta0+vTpw7Bhw8ouo6m4e8fMLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwSUij0JR0i6RlJCySts7C5pPGSHpO0WtLEiv0jJT0iaZ6kJyUd28jizcysPh0uoiKpF3ApcBDQAsyUNC0inq447AXgJOBrVae/CXw2Ip6VtAMwW9KMiFjWkOrNrDznDGjAayzv+mtYXYqsnDUWWBARCwEk3QhMANaGfkQsyp97t/LEiPhDxePFkl4GBgEOfTOzEhTp3hkMvFix3ZLvq4uksUBf4Lkaz02WNEvSrJTWujQz625FQl819tW19Lyk7YFrgc9FxLvVz0fE5RExOiJGDxo0qJ6XNjOzOhQJ/RZgx4rtIcDiol9A0tbAHcC3IuK39ZVnZmaNVCT0ZwK7SBomqS8wCZhW5MXz428FpkbELzpfppmZNUKHoR8Rq4FTgBnAfODmiJgn6VxJRwBIGiOpBTga+ImkefnpxwDjgZMkzcn/jdwo34mZmXWoyOgdImI6ML1q39kVj2eSdftUn3cdcF0XazQzswbxHblmZglx6JuZJcShb2aWEIe+mVlCHPpmZglx6JuZJcShb2aWEIe+mVlCHPpmZglx6JuZJcShb2aWEIe+mVlCHPpmZglx6JuZJcShb2aW
EIe+mVlCHPpmZglx6JuZJaRQ6Es6RNIzkhZImlLj+fGSHpO0WtLEqudOlPRs/u/ERhVuZmb16zD0JfUCLgUOBYYDx0kaXnXYC8BJwPVV574X+A4wDhgLfEfSe7petpmZdUaRlv5YYEFELIyIt4EbgQmVB0TEooh4Eni36tyDgbsi4tWIeA24CzikAXWbmVknFAn9wcCLFdst+b4iCp0rabKkWZJmLVmypOBLm5lZvYqEvmrsi4KvX+jciLg8IkZHxOhBgwYVfGkzM6tXkdBvAXas2B4CLC74+l0518zMGqxI6M8EdpE0TFJfYBIwreDrzwA+Kek9+QXcT+b7zMysBB2GfkSsBk4hC+v5wM0RMU/SuZKOAJA0RlILcDTwE0nz8nNfBb5H9sYxEzg332dmZiXoXeSgiJgOTK/ad3bF45lkXTe1zr0KuKoLNZqZWYP4jlwzs4Q49M3MEuLQNzNLiEPfzCwhDn0zs4Q49M3MEuLQNzNLiEPfzCwhDn0zs4Q49M3MEuLQNzNLiEPfzCwhDn0zs4Q49M3MEuLQNzNLiEPfzCwhDn0zs4QUWjlL0iHAJUAv4IqIOL/q+c2BqcAoYClwbEQsktQHuALYK/9aUyPivAbWbx0YOuWOLr/GovMPa0AlZtYTdNjSl9QLuBQ4FBgOHCdpeNVhJwOvRcTOwMXABfn+o4HNI2J3sjeEL0ka2pjSzcysXkW6d8YCCyJiYUS8DdwITKg6ZgJwTf74FuBASQIC2FJSb2AL4G3g9YZUbmZmdSsS+oOBFyu2W/J9NY+JiNXAcmAg2RvASuAl4AXgooh4tfoLSJosaZakWUuWLKn7mzAzs2KKhL5q7IuCx4wF1gA7AMOAf5b0V+scGHF5RIyOiNGDBg0qUJKZmXVGkdBvAXas2B4CLF7fMXlXzgDgVeB44M6IeCciXgYeAkZ3tWgzM+ucIqE/E9hF0jBJfYFJwLSqY6YBJ+aPJwL3RkSQdel8Qpktgb2B3zemdDMzq1eHoZ/30Z8CzADmAzdHxDxJ50o6Ij/sSmCgpAXA6cCUfP+lQH/gKbI3j6sj4skGfw9mZlZQoXH6ETEdmF617+yKx6vIhmdWn7ei1n4zMyuH78g1M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCGFQl/SIZKekbRA0pQaz28u6ab8+UclDa14bg9Jj0iaJ2mupH6NK9/MzOrRYehL6kW2wPmhwHDgOEnDqw47GXgtInYGLgYuyM/tDVwH/H1E7AYcALzTsOrNzKwuRVr6Y4EFEbEwIt4GbgQmVB0zAbgmf3wLcKAkAZ8EnoyIJwAiYmlErGlM6WZmVq8ioT8YeLFiuyXfV/OYiFgNLAcGAh8EQtIMSY9J+nqtLyBpsqRZkmYtWbKk3u/BzMwKKhL6qrEvCh7TG9gPOCH/75GSDlznwIjLI2J0RIweNGhQgZLMzKwzioR+C7BjxfYQYPH6jsn78QcAr+b774+IVyLiTWA6sFdXizYzs84pEvozgV0kDZPUF5gETKs6ZhpwYv54InBvRAQwA9hD0l/kbwb7A083pnQzM6tX744OiIjVkk4hC/BewFURMU/SucCsiJgGXAlcK2kBWQt/Un7ua5J+SPbGEcD0iLhjI30vZmbWgQ5DHyAippN1zVTuO7vi8Srg6PWcex3ZsE0zMyuZ78g1M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0uIQ9/MLCEOfTOzhDj0zcwS4tA3M0tIoTtyzTYFQ6d0fQaQRecf1oBKzMrjlr6ZWUIc+mZmCXHom5klxKFvZpYQh76ZWUIc+mZmCXHom5klpFDoSzpE0jOSFkiaUuP5zSXdlD//qKShVc/vJGmFpK81pmwzM+uMDkNfUi/gUuBQYDhwnKThVYedDLwWETsDFwMXVD1/MfBf
XS/XzMy6okhLfyywICIWRsTbwI3AhKpjJgDX5I9vAQ6UJABJnwEWAvMaU7KZmXVWkdAfDLxYsd2S76t5TESsBpYDAyVtCZwJfHdDX0DSZEmzJM1asmRJ0drNzKxORUJfNfZFwWO+C1wcESs29AUi4vKIGB0RowcNGlSgJDMz64wiE661ADtWbA8BFq/nmBZJvYEBwKvAOGCipO8D2wDvSloVEf+vy5WbmVndioT+TGAXScOA/wMmAcdXHTMNOBF4BJgI3BsRAXys9QBJ5wArHPhmZuXpMPQjYrWkU4AZQC/gqoiYJ+lcYFZETAOuBK6VtICshT9pYxZtZmadU2g+/YiYDkyv2nd2xeNVwNEdvMY5najPzMwayHfkmpklxKFvZpYQh76ZWUIc+mZmCXHom5klpNDoHUvcOQMa8BrLu/4aZtZlbumbmSXELX0zs65qok/DbumbmSXELf31aaJ3bjOzotzSNzNLiEPfzCwhDn0zs4Q49M3MEuLQNzNLiEPfzCwhDn0zs4Q49M3MElLo5ixJhwCXkK2Re0VEnF/1/ObAVGAUsBQ4NiIWSToIOB/oC7wNnBER9zawfjOzLhk65Y4uv8aifg0opJt02NKX1Au4FDgUGA4cJ2l41WEnA69FxM7AxcAF+f5XgMMjYnfgRODaRhVuZmb1K9K9MxZYEBELI+Jt4EZgQtUxE4Br8se3AAdKUkQ8HhGL8/3zgH75pwIzMytBkdAfDLxYsd2S76t5TESsBpYDA6uO+Rvg8Yh4q/oLSJosaZakWUuWLClau5mZ1alI6KvGvqjnGEm7kXX5fKnWF4iIyyNidESMHjRoUIGSzMysM4qEfguwY8X2EGDx+o6R1BsYALyabw8BbgU+GxHPdbVgMzPrvCKhPxPYRdIwSX2BScC0qmOmkV2oBZgI3BsRIWkb4A7gGxHxUKOKNjOzzukw9PM++lOAGcB84OaImCfpXElH5IddCQyUtAA4HZiS7z8F2Bn4tqQ5+b/tGv5dmJlZIYXG6UfEdGB61b6zKx6vAo6ucd6/AP/SxRrNzKxBfEeumVlCHPpmZglx6JuZJcShb2aWEIe+mVlCHPpmZgkpNGTTzHLnDGjAayzv+muYdZJb+mZmCXHom5klxKFvZpYQh76ZWUIc+mZmCXHom5klxKFvZpYQh76ZWUIc+mZmCXHom5klpFDoSzpE0jOSFkiaUuP5zSXdlD//qKShFc99I9//jKSDG1e6mZnVq8PQl9QLuBQ4FBgOHCdpeNVhJwOvRcTOwMXABfm5w8kWUt8NOAS4LH89MzMrQZGW/lhgQUQsjIi3gRuBCVXHTACuyR/fAhwoSfn+GyPirYj4X2BB/npmZlaCIrNsDgZerNhuAcat75iIWC1pOTAw3//bqnMHV38BSZOByfnmCknPFKp+IxJsC7zSpRf5rhpTTMn8s2jjn0U7/lnkesjvxfuLHFQk9GtVEgWPKXIuEXE5cHmBWrqNpFkRMbrsOnoC/yza+GfRxj+LNs30syjSvdMC7FixPQRYvL5jJPUGBgCvFjzXzMy6SZHQnwnsImmYpL5kF2anVR0zDTgxfzwRuDciIt8/KR/dMwzYBfhdY0o3M7N6ddi9k/fRnwLMAHoBV0XEPEnnArMiYhpwJXCtpAVkLfxJ+bnzJN0MPA2sBv4xItZspO+l0XpUd1PJ/LNo459FG/8s2jTNz0JZg9zMzFLgO3LNzBLi0DczS4hD38wsIQ59M7OEOPRtHZK+WmRfKiTtVOtf2XWZdYZH71SQtA3wWWAoFcNZI+LUsmoqg6THImKvqn2PR8RHyqqpTJLm0naHeT9gGPBMROxWamHdTNJo4Cyy2/17k/08IiL2KLWwEki6mtqzC3y+hHLqUmQahpRMJ5sraC7wbsm1dDtJxwHHA8MkVd6AtxWwtJyqyhcRu1duS9oL+FJJ5ZTp58AZJPr3UeX2isf9gCNpktkG3NKvUKuFmxJJ7ydrxZ4HVK6b8AbwZESsLqWw
HijF3xVJD0bEfmXX0RNJ2gy4OyI+UXYtHXHoV5D0T8AKsnfxt1r3R8SrpRVVkvwNYJeIuFvSFkDviHij7LrKIOn0is3NgL2AgRGR1KJAkg4EjgPuof3fxy9LK6qHkPQh4I58TZEezd077b0NXEjWb9n6bhjAX5VWUQkkfZFsquv3Ah8gmyjvP4ADy6yrRFtVPF4N3AH8Z0m1lOlzwK5AH9q6dwJILvQlvUH7Pv0/AmeWVE5dHPrtnQ7sHBFdmxe7+f0j2WI3jwJExLOStiu3pHLkK731j4gzyq6lB9iz+vpGivIFonaLiBfKrqUzPGSzvXnAm2UX0QO8la+SBqydLjvJfsB8gsCk+u434Lc1lkpNTj6D8K1l19FZbum3twaYI+l/aN9nmdSQTeB+Sd8EtpB0EPBl4Ncl11SmOflopl8AK1t3JtiXvR9woqT/Jfv7SHbIJtkb4JiImFl2IfXyhdwKkk6stT8irqm1f1OVj0Q4Gfgk2R/2DOCKSPSXJR+TXS2aYUx2I+UX99cREc93dy1lk/Q08EHgebKGQNO8ATr0q+QLxXww33wmIt4psx6znkTSnsDH8s3fRMQTZdZTlmZ+A3ToV5B0AHANsIjsnXtH4MSIeKDEsrqdpH2Bc1j3zsukRjFtiKRPR8TtHR+56cin4vgibaN1jgQuj4h/K68qq5dDv4Kk2cDxEfFMvv1B4IaIGFVuZd1L0u+BfwJmk13nACAikr0rt5qk70bEd8quoztJehLYJyJW5ttbAo80Q5dGd5B0e0R8uuw6OuILue31aQ18gIj4g6Q+ZRZUkuUR8V9lF9ET5Nc39o6Ihyv3pxb4OVHRCMgfq6RaeqIvll1AEQ799mZJuhK4Nt8+gay1m5r/kXQh2cf4ylFMj5VXUjki4l1JPwD2KbuWHuBq4FFJrcMVP0O2PnZyJG0XES9X7d4aeKmMeurh7p0KkjYnuzFpP7IWzAPAZRHx1gZP3MTkQ1arRTPMK7IxSPou8CTwy1RHMLXKJ5tb+/cREY+XXFIpJD0DfDsibs63/xk4OSJ6/H0MDv0KeR/lqvyGnNa7MTePCN+wlbD8lvstyboz/kzbhe2tSy2sm0naG5jXOgeTpK2A4RHxaLmVdT9J2wOXA6uA9wHzgX+OiBWlFlaA78ht7x5gi4rtLYC7S6qlR8lbeEmKiK0iYrOI6BMRW+fbSQV+7t/JJiRstTLfl5yIeAm4k6zbbygwtRkCHxz61fpV/o/LH/9FifX0JP9QdgFlUeZvJX07395R0tiy6yqBKru3IuJdEr0uKOkuYBwwAvgUcLGki8qtqhiHfnsrK1u0kkaRfZxPXkQ0xciEjeQyshbd8fn2CuDS8sopzUJJp0rqk//7KrCw7KJKcmlEfDYilkXEU8BHgeVlF1WE+/QrSBoD3EjbCjjbA8dGRFIjeCTdExEHdrQvFa0LplQuGSnpiYjYs+zaulM+0+qPgU+QTcB3D3BajVEsSWjWNSeS/Gi2PhExU9KuwIfILtb9PqVpGCT1I+vO2lbSe2gbg701sENphZXvnfyifgBIGkSCywXm4T6p7Dp6gmZec8KhXyUP+adatyX9ZUT8scSSutOXgNPIAn42baH/Oml2Z7T6MdlUuttJ+ldgIvCtckvqGVKcjiLXtGtOOPQ7diVwWNlFdIeIuAS4RNJXPJ9Km4j4eT5Fx4Fkb4SfiYj5JZfVU4yh/SLhqXgrIt7O1lNprjUn3KdvNUn6KNlQtLUNg4iYWlpBJZD03g09n+LayZaR9H1gGfBZ4Ctka048HRFnlVpYAQ79CpJ2qrW/WZdF6yxJ15L1U86hba6VSG0xmXyxkKD2/DLJzToq6Wjgzoh4Q9K3yFYU+16Kd+U285oTDv0KkubS9kfeDxhGNqf+bqUW1s0kzSe709K/HLaWpCcjYg9J+wHnARcB34yIcSWXZnVwn36F6kWf8zH7XyqpnDI9BfwlTTB5VHeRdBTZnDNBtnjIbSWXVIbWT32HAf8e
Eb+SdE6J9XS7ioZhTc0wzbRDfwMi4rF87H5qtgWelvQ72s+yeUR5JZVH0mXAzsAN+a6/l3RQRPxjiWWV4f8k/QT4a+CCfILC1G7wbJ0vv/X/feWMvE0xR5e7dypIOr1iczOyPsuBEXFwSSWVQtL+tfZHxP3dXUtPIGkeMKK1uyvvz52bYLffXwCHkH3vz+aTju0eEf9dcmndTtJDEbFvR/t6otTepTuyVcW/zYE7gAmlVlSCPNx/T9vPYn6qgZ97Bqi8yL8j2VTLqdkDuCsins23V9AkUw9sBFvm1zaAtaPdtiyxnsLc0rd1SDoGuBC4j+yi9seAMyLiljLr6m6Sfk3WfzuAbDz67/LtccDDEfHXJZbX7SQ9DuxV9YlnVkQkNwNrPi/XVWS/G5AN3/x8Myw05D592v1x15RgX/ZZwJjWOVXyaQfuBpIKfbLRKdZmnVk285uSkpPPx7WnpK3Jfi5N84knyf9hNbT+cR9FNmrlunz7OGBRGQWVbLOqSbSWkmBXYHWXVv4HnvLfzEJJp9I2h/6XSXSWzfwi9t+Q38DYemduRJxbYlmFuHungqQHImJ8R/s2dfn6uHvQNlrlWODJiDizvKrKI2ky8D2yabbfpW3lrNRuzvIsmzlJd5Jdz5hNxWLxEfGD0ooqyKFfIb8p6bCIWJhvDwOmR8SHy62s+1WMS29dC/XWDk7ZZEl6FtgnIl4puxbrGSQ9FREjyq6jM1L+qFrLPwH3SWr9yDqUxG7OyqcQnpFfpPxl2fX0EM/RJGOwNwZJX4+I70v6N2pc+0pteo7cw5J2j4i5ZRdSL4d+hYi4U9IuwK75rt9HxFsbOmdTExFrJL0paUAzXZzayL5B9kf+KO1vVksl7FpnFJ1VahU9y37ASfn8TG/R1uXnO3KbgaRPRMS9eZdGpQ9IIiJSa/GuAubm64CubN2ZUMhV+wlwLzCXNBdP+XX+8M2I+EXlc/kkbCk6tOwCOsuhn9mf7I/68BrPBel1c9yR/7PM6og4vePDNnnfAH5RYN8mLyKeh7UXt/uVXE5dfCHX1mpdB1fSBamO1KklXy3reeDXtO/eSWI+fUmHAp8CjgFuqnhqa7LZWMeWUliJJB0B/IBslbmXgfeT3bne46fmcEu/iqTDgN2oePduhrG3DbJ9Pu/OEZJupGoe+Wa423AjOT7/7zcq9gWQypDNxWT9+UeQDVFs9QbZ4IcUfQ/YG7g7Ij4i6eNk9/X0eA79CpL+g2xh8I8DV5Cthfq7UovqXmcDU8gWef5h1XNBNj47ORExrOwayhQRTwBPSHoVuCMikruuUcM7EbFU0maSNouI/5F0QdlFFeHunQoVi0S0/rc/8MuI+GTZtXUnSd+OiO+VXUdPImkEMJz2nwBTWz7yOmAf4D+Bq1NeJ1jS3cBngPOBgWRdPGMi4qOlFlaAW/rt/Tn/75uSdiCbfiCZVl6+aAzAHRWP10q1e0fSd4ADyEIACfEaAAAG0ElEQVR/OtnIjQeBpEI/Iv42n4riOOBqSQFcDdwQEW+UW123+xWwBXAa2Vz6A4Cm6AZ26Ld3u6RtgO/T1nd5RYn1dLfWW8j7AaOBJ8j69fcAHiUbm5yiicCewOMR8TlJ7yOt34u1IuJ1Sf9JW+AdCZwh6ccR8W/lVtet3gc8AjxGNtvm1GZZXtTdOxUkbQH8A9lUwgH8hmxZuFWlFtbN8ou4/9p6t2HetfG1iDip1MJKImlmRIyRNJvses8bwFPNMFKjkSQdDnwe+ADZilHXRMTL+eIq8yPi/aUW2M2UzbL2SeBzZI2km4ErI+K5UgvrgFv67V1D9gf943z7OLKP8MeUVlE5dq28vTwinpI0ssyCypL/YT+ZfwL8KdknwBWkdYG/1dHAxRHxQOXOiHhT0udLqqk0ERGS/gj8EVgNvAe4RdJdEfH1cqtbP7f0K0h6IiL27Gjfpk7SDWR34l5H9onnb4H+EdEUQ9IaTdLsiBiVPx4KbB0RKa6cZbl8iukTgVfI
uvpui4h38oVlno2ID5Ra4Aa4pd/e45L2jojfAkgaBzxUck1l+BxZN9dX8+0HaJtDPUW/lTQmImZGxKKyi+lukt6g/URryrdb55vZupTCyrUtcFTrnbmt8oVlPr2ec3oEt/QBSXPJfon7AB8CXsi33w883axTqFpjSHoa+CDZXbkraaLJtcyqOfQBSRu8AFX9br6py2cOrDWFbip3oLazvt+P1H4vWlXPNxMRL5RYjtXJ3Tuk+8e7AaMrHvcju4D33pJqKZ1/PzLrm2+GbNoSaxJu6Vshkh6MiFTH6RvZoAayqTjazTcTEZNLLs3q4Ja+raPqbtzNyFr+W5VUjvUcTTvfjLVx6FstlYs7rwYWkd69CrauZfl8VA8AP5f0MtnvhzURd++YWSGStiRbVU20zTfz84hYWmphVheHvq1D0gDgO8D4fNf9wLleM9es+W1WdgHWI11FNh3FMfm/18lmU7SESTpK0rOSlkt6XdIbkl4vuy6rj1v6tg5JcyJiZEf7LC2SFgCHpzyP/qbALX2r5c+S1g7PlLQvbWsNWLr+5MBvfm7p2zryGTWvIbtQJ+BV4KR82TxLlKRLgL8EbqP9AvG/LK0oq5tD39YrXyWJiHC/rSGp1nWdiIjkplVuZg59W0vS6Rt6PiKqF0s3sybjm7OsUutdt63T5lZy6yBxkvoBJ5PNtVM54Zpb+k3EoW9rRcR3ASRdA3w1Ipbl2++h/V26lqZrgd8DB5MtAn4C2YRr1kQ8esdq2aM18AEi4jXgIyXWYz3DzhHxbWBlRFwDHAbsXnJNVieHvtWyWd66B0DSe/GnQoN38v8ukzSCbHTX0PLKsc7wH7LV8gPgYUm3kPXlHwP8a7klWQ9wed4Y+BYwDegPfLvckqxeHr1jNUkaTjZ3uoB7IuLpkksyswZw6JtZ3STdHhE9egFwq819+mbWGYPLLsA6x6FvZp3xeNkFWOe4e8fMCpO0BbBTRDxTdi3WOW7pm1khkg4H5gB35tsjJU0rtyqrl0PfzIo6BxgLLAOIiDl4nH7TceibWVGrvWRm8/PNWWZW1FOSjgd6SdoFOBV4uOSarE5u6ZtZUV8hm2HzLeAGsrWTTyu1IqubR++YmSXE3TtmtkGSfs0G1lOIiCO6sRzrIoe+mXXkorILsMZx946ZWULc0jezQiTNZd1unuXALOBfImJp91dl9XLom1lR/wWsAa7PtyeRTb29HPgZcHg5ZVk93L1jZoVIeigi9q21T9LciPDSiU3A4/TNrKj+ksa1bkgaS7Z6FsDqckqyerl7x8yK+gJwlaT+ZN06rwMnS9oSOK/Uyqwwd++YWV0kDSDLjmVl12L1c/eOmRUiaYCkHwL3AHdL+kH+BmBNxKFvZkVdBbwBHJP/ex24utSKrG7u3jGzQiTNiYiRHe2zns0tfTMr6s+S9mvdkLQv8OcS67FOcEvfzAqRNBK4BhhANnrnVeDEiHiy1MKsLg59M6uLpK0BIuL1smux+rl7x8wKqRi9cy9wr0fvNCeHvpkV5dE7mwB375hZIR69s2lwS9/MivLonU2AW/pmVoikPYGpZKN3AF7Do3eajkPfzAqRdHr+sHVmzRVkc+nPjog55VRl9XL3jpkVNRr4e2Brstb+ZOAA4KeSvl5iXVYHt/TNrBBJM4C/iYgV+XZ/4BbgSLLW/vAy67Ni3NI3s6J2At6u2H4HeH9E/Bl4q5ySrF5eRMXMiroe+K2kX+XbhwM35IuoPF1eWVYPd++YWWGSRgH7kc2982BEzCq5JKuTQ9/MLCHu0zczS4hD38wsIQ59M7OEOPTNzBLy/wGi3eFs8DD3DQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotTopUsageComparation(df_json, df_other, 'script_domain', 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Above the mean Sample" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 811 unique script_domain present on the non-json dataset and 1051 on the JSONs\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEOCAYAAACAfcAXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFHtJREFUeJzt3X+QVeWd5/H3V2lFo0GC6CqwttaQRBFE7BhnkomWmFXRDMT1RygysMENlapokjFEwUxVKpmsQcuaGKcmbFljFC1qIMsmhWWMY0Kw4lQlbmjsKEiMlKvSotgq9kRFAvE7f9yDaaGh277tvdDP+1XVdc95znPO+V66uZ97nnPOvZGZSJLKc1CzC5AkNYcBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSrUsGYXsC9HH310tra2NrsMSTqgtLe3v5SZo/vqt18HQGtrK2vWrGl2GZJ0QImIZ/rTzyEgSSqUASBJhTIAJKlQ+/U5AEn7jx07dtDZ2cmbb77Z7FJUGT58OGPHjqWlpWVA6xsAkvqls7OTI488ktbWViKi2eUULzN5+eWX6ezs5MQTTxzQNhwCktQvb775JqNGjfLFfz8REYwaNaquIzIDQFK/+eK/f6n392EASFKhPAcwCFoX/KTZJQwpTy+6qNklqB8G++++r9/7Oeecw8KFCzn//PPfbrvlllv4/e9/z/e///29rnfEEUfw2muvDUqNd911FzfddBOZSWYyd+5c5s+fPyjb3uWGG27g+uuvH9Rt7o1HAJIOCDNnzmTZsmXvaFu2bBkzZ85syP5/+tOfcsstt/DAAw+wfv161q5dy4gRIwZ9PzfccMOgb3NvDABJB4RLL72Ue++9l+3btwPw9NNPs3nzZj7+8Y/z2muvMXXqVKZMmcLEiRNZuXLlHus/+OCDXHzxxW/PX3XVVdx5550AtLe3c/bZZ3PGGWdw/vnn8/zzz++x/ne+8x1uvvlmjj/+eKB2CebnP/95ADo6OjjrrLOYNGkSn/70p9m6dStQO2rZ9XE2L730Ers+2+zOO+/kkksu4YILLmD8+PFce+21ACxYsIBt27YxefJkZs2axeuvv85FF13Eaaedxqmnnsry5csH4V/yzwwASQeEUaNGceaZZ3L//fcDtXf/V1xxBRHB8OHD+fGPf8zatWtZvXo1X/3qV8nMfm13x44dXH311axYsYL29nbmzp3L17/+9T36rVu3jjPOOKPXbcyePZsbb7yRRx99lIkTJ/LNb36zz/12dHSwfPlyHnvsMZYvX86mTZtYtGgRhx12GB0dHSxdupT777+f448/nt/+9resW7eOCy64oF/Pqb8MAEkHjJ7DQD2HfzKT66+/nkmTJnH
eeefx3HPPsWXLln5t84knnmDdunV88pOfZPLkyXz729+ms7Oz3zV1d3fz6quvcvbZZwMwZ84cfvnLX/a53tSpUxkxYgTDhw/nlFNO4Zln9vz8tokTJ/Lzn/+c6667joceemjQh5wMAEkHjBkzZrBq1SrWrl3Ltm3bmDJlCgBLly6lq6uL9vZ2Ojo6OPbYY/e4Pn7YsGG89dZbb8/vWp6ZTJgwgY6ODjo6Onjsscd44IEH9tj3hAkTaG9vf1f19tzn7vUceuihb08ffPDB7Ny5c4/1P/jBD9Le3s7EiRNZuHAh3/rWt97V/vtiAEg6YBxxxBGcc845zJ079x0nf7u7uznmmGNoaWlh9erVvb6bPuGEE3j88cfZvn073d3drFq1CoAPfehDdHV18atf/QqoDQmtX79+j/UXLlzItddeywsvvADA9u3bufXWWxkxYgQjR47koYceAuDuu+9++2igtbX17dBYsWJFv55jS0sLO3bsAGDz5s0cfvjhfPazn2X+/PmsXbu2X9vorz4vA42IHwAXAy9m5qlV2weA5UAr8DRweWZujdpdCd8DpgFvAP8jM9dW68wB/r7a7Lczc8mgPhNJDdWsy3VnzpzJJZdc8o4rgmbNmsWnPvUp2tramDx5Mh/+8If3WG/cuHFcfvnlTJo0ifHjx3P66acDcMghh7BixQq+9KUv0d3dzc6dO/nKV77ChAkT3rH+tGnT2LJlC+eddx6ZSUQwd+5cAJYsWcIXvvAF3njjDU466STuuOMOAObPn8/ll1/O3Xffzbnnntuv5zdv3jwmTZrElClTmD17Nl/72tc46KCDaGlpYfHixQP6N9ub6OtESUR8AngNuKtHANwEvJKZiyJiATAyM6+LiGnA1dQC4KPA9zLzo1VgrAHagATagTMyc+u+9t3W1pYHwhfCeB/A4PI+gP3Thg0bOPnkk5tdhnbT2+8lItozs62vdfscAsrMXwKv7NY8Hdj1Dn4JMKNH+11Z82vgqIg4Djgf+FlmvlK96P8MGNzT2ZKkd2Wg5wCOzcznAarHY6r2McCmHv06q7a9tUuSmmSwTwL39slEuY/2PTcQMS8i1kTEmq6urkEtTlJ9+nttvRqj3t/HQANgSzW0Q/X4YtXeCYzr0W8ssHkf7XvIzNsysy0z20aP7vNL7SU1yPDhw3n55ZcNgf3Eru8DGD58+IC3MdAPg7sHmAMsqh5X9mi/KiKWUTsJ3J2Zz0fEvwE3RMTIqt9/AxYOuGpJDTd27Fg6OzvxyHz/sesbwQaqP5eB/itwDnB0RHQC36D2wv/DiLgSeBa4rOp+H7UrgDZSuwz0cwCZ+UpE/APwm6rftzJz9xPLkvZjLS0tA/7mKe2f+gyAzNzbR+1N7aVvAl/cy3Z+APzgXVUnSXrPeCewJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSqUASBJhTIAJKlQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFqisAIuLvImJ9RKyLiH+NiOERcWJEPBwRT0bE8og4pOp7aDW/sVreOhhPQJI0MMMGumJEjAG+BJySmdsi4ofAZ4BpwHczc1lE/G/gSmBx9bg1M/8iIj4D3AhcUfczkLRPrQt+0uwShoynF13U7BIGVb1DQMOAwyJiGHA48DxwLrCiWr4EmFFNT6/mqZZPjYioc/+SpAEacABk5nPAzcCz1F74u4F24NXM3Fl16wTGVNNjgE3Vujur/qN2325EzIuINRGxpqura6DlSZL6MOAAiIiR1N7VnwgcD7wPuLCXrrlrlX0s+3ND5m2Z2ZaZbaNHjx5oeZKkPtQzBHQe8P8zsyszdwA/Av4KOKoaEgIYC2yupjuBcQDV8hHAK3XsX5JUh3oC4FngrIg4vBrLnwo8Dqw
GLq36zAFWVtP3VPNUy3+RmXscAUiSGqOecwAPUzuZuxZ4rNrWbcB1wDURsZHaGP/t1Sq3A6Oq9muABXXULUmq04AvAwXIzG8A39it+SngzF76vglcVs/+JEmDxzuBJalQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSqUASBJhTIAJKlQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqHqCoCIOCoiVkTE7yJiQ0T8ZUR8ICJ+FhFPVo8jq74REbdGxMaIeDQipgzOU5AkDUS9RwDfA+7PzA8DpwEbgAXAqswcD6yq5gEuBMZXP/OAxXXuW5JUhwEHQES8H/gEcDtAZv4xM18FpgNLqm5LgBnV9HTgrqz5NXBURBw34MolSXWp5wjgJKALuCMiHomIf4mI9wHHZubzANXjMVX/McCmHut3Vm3vEBHzImJNRKzp6uqqozxJ0r7UEwDDgCnA4sw8HXidPw/39CZ6acs9GjJvy8y2zGwbPXp0HeVJkvalngDoBDoz8+FqfgW1QNiya2inenyxR/9xPdYfC2yuY/+SpDoMOAAy8wVgU0R8qGqaCjwO3APMqdrmACur6XuA2dXVQGcB3buGiiRJjTeszvWvBpZGxCHAU8DnqIXKDyPiSuBZ4LKq733ANGAj8EbVV5LUJHUFQGZ2AG29LJraS98EvljP/iRJg8c7gSWpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSqUASBJhTIAJKlQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKh6g6AiDg4Ih6JiHur+RMj4uGIeDIilkfEIVX7odX8xmp5a737liQN3GAcAXwZ2NBj/kbgu5k5HtgKXFm1Xwlszcy/AL5b9ZMkNUldARARY4GLgH+p5gM4F1hRdVkCzKimp1fzVMunVv0lSU1Q7xHALcC1wFvV/Cjg1czcWc13AmOq6THAJoBqeXfVX5LUBAMOgIi4GHgxM9t7NvfSNfuxrOd250XEmohY09XVNdDyJEl9qOcI4GPA30TE08AyakM/twBHRcSwqs9YYHM13QmMA6iWjwBe2X2jmXlbZrZlZtvo0aPrKE+StC8DDoDMXJiZYzOzFfgM8IvMnAWsBi6tus0BVlbT91TzVMt/kZl7HAFIkhrjvbgP4DrgmojYSG2M//aq/XZgVNV+DbDgPdi3JKmfhvXdpW+Z+SDwYDX9FHBmL33eBC4bjP1JkurnncCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSqUASBJhTIAJKlQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBVqwAEQEeMiYnVEbIiI9RHx5ar9AxHxs4h4snocWbVHRNwaERsj4tGImDJYT0KS9O7VcwSwE/hqZp4MnAV8MSJOARYAqzJzPLCqmge4EBhf/cwDFtexb0lSnQYcAJn5fGaurab/AGwAxgDTgSVVtyXAjGp6OnBX1vwaOCoijhtw5ZKkugzKOYCIaAVOBx4Gjs3M56EWEsAxVbcxwKYeq3VWbbtva15ErImINV1dXYNRniSpF3UHQEQcAfxf4CuZ+R/76tpLW+7RkHlbZrZlZtvo0aPrLU+StBd1BUBEtFB78V+amT+qmrfsGtqpHl+s2juBcT1WHwtsrmf/kqSBq+cqoABuBzZk5j/2WHQPMKeangOs7NE+u7oa6Cyge9dQkSSp8YbVse7
HgL8FHouIjqrtemAR8MOIuBJ4FrisWnYfMA3YCLwBfK6OfUuS6jTgAMjMf6f3cX2Aqb30T+CLA92fJGlweSewJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKpQBIEmFMgAkqVAGgCQVygCQpEIZAJJUKANAkgplAEhSoQwASSqUASBJhTIAJKlQBoAkFcoAkKRCGQCSVCgDQJIKZQBIUqEMAEkqlAEgSYUyACSpUAaAJBXKAJCkQhkAklQoA0CSCmUASFKhDABJKlTDAyAiLoiIJyJiY0QsaPT+JUk1DQ2AiDgY+GfgQuAUYGZEnNLIGiRJNY0+AjgT2JiZT2XmH4FlwPQG1yBJovEBMAbY1GO+s2qTJDXYsAbvL3ppy3d0iJgHzKtmX4uIJ97zqspxNPBSs4voS9zY7ArUBP5tDq4T+tOp0QHQCYzrMT8W2NyzQ2beBtzWyKJKERFrMrOt2XVIu/NvszkaPQT0G2B8RJwYEYcAnwHuaXANkiQafASQmTsj4irg34CDgR9k5vpG1iBJqmn0EBCZeR9wX6P3K8ChNe2//NtsgsjMvntJkoYcPwpCkgplAEhSoQwASQ0XEQdFxF81u47SeQ5gCIuI0cDngVZ6nPDPzLnNqknaJSJ+lZl/2ew6Stbwq4DUUCuBh4CfA39qci3S7h6IiP8O/Ch9J9oUHgEMYRHRkZmTm12H1JuI+APwPmpvTrZR+6iYzMz3N7WwgngOYGi7NyKmNbsIqTeZeWRmHpSZLZn5/mreF/8G8ghgCOvxDuuPwI6q2XdY2i9ERACzgBMz8x8iYhxwXGb+vyaXVgwDQFJTRMRi4C3g3Mw8OSJGAg9k5keaXFoxPAk8xEXE3wCfqGYfzMx7m1mP1MNHM3NKRDwCkJlbqw+JVIN4DmAIi4hFwJeBx6ufL1dt0v5gR/U1sQlvX7b8VnNLKotDQENYRDwKTM7Mt6r5g4FHMnNScyuTICJmAVcAU4AlwKXA32fm/2lqYQVxCGjoOwp4pZoe0cxCpJ4yc2lEtANTqV0COiMzNzS5rKIYAEPbd4BHImI1tf9gnwAWNrck6R2eBP6D6rUoIv5rZj7b3JLK4RDQEBcRxwEfoRYAD2fmC00uSQIgIq4GvgFsoXYz2K4bwRyibBADYAiLiI8BHZn5ekR8ltpY6/cy85kmlyYRERupXQn0crNrKZVXAQ1ti4E3IuI04GvAM8BdzS1JetsmoLvZRZTMcwBD287MzIiYDtyambdHxJxmF6WyRcQ11eRTwIMR8RNg+67lmfmPTSmsQAbA0PaHiFgI/C3w19VloC1Nrkk6snp8tvo5pPqB6p4ANYbnAIawiPgvwEzgN5n579VnrZyTmXc3uTSJiLhs92v+e2vTe8cAGIKqD4Hb9YuN6jGr6e3ARuDrmbmqCeVJAETE2syc0leb3jsOAQ1BmXnk3pZVw0CnAkurR6mhIuJCYBowJiJu7bHo/cDO5lRVJgOgMJn5J+C3EfFPza5FxdoMrAEuA35P7ej0T9TuB/i7JtZVHIeAJDVURLQA/wv4n8DT1IYmxwF3ANdn5o69r63B5H0AkhrtJmAkcEJmTsnM04GTqH1W1c1NrawwHgFIaqiIeBL44O5fBF+dn/pdZo5vTmXl8QhAUqPl7i/+VeOf8D6AhjIAJDXa4xExe/fG6vOqfteEeorlEJCkhoqIMcCPgG1AO7V3/R8BDgM+nZnPNbG8ohgAkpoiIs4FJlC7Cmi9NyY2ngEgSYXyHIAkFcoAkKRCGQCSVCgDQJIKZQBIUqH+E63PxydaKaS5AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotUniqueValuesComparation(df_a_json, df_a_other, 'script_domain')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
jsonother
sociaplus.com0.0899090.006558
tiqcdn.com0.0818660.037636
twimg.com0.079437NaN
google-analytics.com0.0000050.100417
adobedtm.com0.0084220.050673
yoox.biz0.0014980.041437
\n", + "
" + ], + "text/plain": [ + " json other\n", + "sociaplus.com 0.089909 0.006558\n", + "tiqcdn.com 0.081866 0.037636\n", + "twimg.com 0.079437 NaN\n", + "google-analytics.com 0.000005 0.100417\n", + "adobedtm.com 0.008422 0.050673\n", + "yoox.biz 0.001498 0.041437" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAFcCAYAAAAkiW7CAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xu8VXWd//HXO0Ao7yHNKKRQOhmikiBqGGqOt7wwNpqalZVlM+btV97Kn2ba/MoyzcrGzCTN8TbOqKikWeb9EqAg4mVERvOEFaKgoijo5/fHdx3YHA+ezTn7nLX3+r6fjwcPzlp7rXM+m31477W/63tRRGBmZnl4V9kFmJlZ33Hom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGelfdgEdbbDBBjF8+PCyyzAzaynTp09/PiKGdHVc04X+8OHDmTZtWtllmJm1FEnP1HOcm3fMzDLi0Dczy4hD38wsI03Xpm9mVq+lS5fS1tbGkiVLyi6lzwwaNIhhw4YxYMCAbp3v0DezltXW1sbaa6/N8OHDkVR2Ob0uIliwYAFtbW2MGDGiW9/DzTtm1rKWLFnC4MGDswh8AEkMHjy4R59s6gp9SXtKekLSHEknd/L4BEkPSlom6YAOjx0m6cniz2HdrtTMrBO5BH67nj7fLkNfUj/gfGAvYCRwiKSRHQ77E/B54PIO574X+BawHTAO+Jak9XtUsZmZdVs9bfrjgDkRMRdA0pXARODR9gMi4unisbc6nLsHcGtEvFA8fiuwJ3BFjys3y8Hp63bzvEWNraNFDD/5poZ+v6e/t3ddx330ox/l3nvvbejP7i31NO8MBZ6t2W4r9tWjrnMlHSFpmqRp8+fPr/Nbm5k1h1YJfKgv9DtrQIo6v39d50bEhRExNiLGDhnS5dQRZmZNZa211uK5555jwoQJjB49mlGjRnHXXXcBcMUVV7DlllsyatQoTjrppJXOOeWUU9h6663Zfvvt+etf/9ontdYT+m3A+2u2hwHz6vz+PTnXzKxlXH755eyxxx7MmDGDmTNnMnr0aObNm8dJJ53EbbfdxowZM5g6dSrXXXcdAIsXL2b77bdn5syZTJgwgV/84hd9Umc9oT8V2EzSCElrAAcDk+v8/rcAu0tav7iBu3uxz8ysUrbddlsmTZrE6aefzqxZs1h77bWZOnUqO++8M0OGDKF///4ceuih3HnnnQCsscYa7LPPPgCMGTOGp59+uk/q7DL0I2IZcBQprB8Dro6I2ZLOkLQfgKRtJbUBBwI/lzS7OPcF4EzSG8dU4Iz2m7pmZlUyYcIE7rzzToYOHcpnP/tZLr30UiJW3RI+YMCA5d0v+/Xrx7Jly/qkzrpG5EbEFGBKh32n1Xw9ldR009m5FwMX96BGM7Om98wzzzB06FC+/OUvs3jxYh588EFOOukkjj32WJ5//nnWX399rrjiCo4++uhS6/Q0DGZWGfV2sWw0Sdx+++384Ac/YMCAAay11lpceumlbLjhhnz3u99ll112ISL4xCc+wcSJE0upcXmt7/Txowxjx44NL6JiVnA//Xf02GOP8eEPf7jUGhYsW
MA222zDM8/UtYZJQ3T2vCVNj4ixXZ3ruXfMzLpp3rx57LDDDhx//PFll1I3N++YmXXTRhttxP/8z/+UXcZq8ZW+mVlGHPpmZhlx6JuZZcShb2aWEd/INbPq6G4X11V+v9Xv+rpw4UIuv/xyjjzySABuv/12zj77bG688cbG1tZNvtI3M2ughQsX8rOf/axh36/R0zM49M3MeuCcc85h1KhRjBo1ih/96EecfPLJPPXUU4wePZoTTjgBgFdeeYUDDjiAzTffnEMPPXT5nDzTp09np512YsyYMeyxxx4899xzAOy8885885vfZKedduK8885raL1u3jEz66bp06czadIkHnjgASKC7bbbjssuu4xHHnmEGTNmAKl556GHHmL27NlstNFGjB8/nnvuuYftttuOo48+muuvv54hQ4Zw1VVXccopp3DxxWmqsoULF3LHHXc0vGaHvplZN919993sv//+rLnmmgB88pOfXL54Sq1x48YxbFiak3L06NE8/fTTrLfeejzyyCPstttuALz55ptsuOGGy8856KCDeqVmh76ZWTfVO3fZwIEDl3/dPo1yRLDFFltw3333dXpO+xtJo7lN38ysmyZMmMB1113Hq6++yuLFi7n22msZP348L7/8cpfnfuhDH2L+/PnLQ3/p0qXMnj27t0uuzpX+8JNv6tZ5ZU3Fama9oI9nF91mm234/Oc/z7hx4wD40pe+xJgxYxg/fjyjRo1ir732Yu+9O8+YNdZYg2uuuYZjjjmGRYsWsWzZMo477ji22GKLXq25MlMrO/Stkjy18jtqhqmVy+Cplc3MrC4OfTOzjDj0zaylNVsTdW/r6fN16JtZyxo0aBALFizIJvgjggULFjBo0KBuf4/K9N4xs/wMGzaMtrY25s+fX3YpfWbQoEHLB3p1h0PfzFrWgAEDGDFiRNlltBQ375iZZcShb2aWEYe+mVlGHPpmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhnx4KwW4amjzawR6rrSl7SnpCckzZF0ciePD5R0VfH4A5KGF/sHSLpE0ixJj0n6RmPLNzOz1dFl6EvqB5wP7AWMBA6RNLLDYYcDL0bEpsC5wFnF/gOBgRGxJTAG+Er7G4KZmfW9eq70xwFzImJuRLwBXAlM7HDMROCS4utrgF0lCQhgTUn9gXcDbwAvNaRyMzNbbfWE/lDg2ZrttmJfp8dExDJgETCY9AawGHgO+BNwdkS80PEHSDpC0jRJ03KaLc/MrK/VcyNXnezrOHn1qo4ZB7wJbASsD9wl6XcRMXelAyMuBC6EtEZuHTVZxfhGtVnfqOdKvw14f832MGDeqo4pmnLWBV4APg3cHBFLI+JvwD1Alwv3mplZ76gn9KcCm0kaIWkN4GBgcodjJgOHFV8fANwWaSmbPwEfV7ImsD3weGNKNzOz1dVl6Bdt9EcBtwCPAVdHxGxJZ0jarzjsl8BgSXOArwHt3TrPB9YCHiG9eUyKiIcb/BzMzKxOdQ3OiogpwJQO+06r+XoJqXtmx/Ne6Wy/mZmVw9MwmJllxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGXHom5llxKFvZpYRh76ZWUYc+mZmGakr9CXtKekJSXMkndzJ4wMlXVU8/oCk4TWPbSXpPkmzJc2SNKhx5ZuZ2eroMvQl9QPOB/YCRgKHSBrZ4bDDgRcjYlPgXOCs4tz+wGXAv0TEFsDOwNKGVW9mZqulniv9ccCciJgbEW8AVwITOxwzEbik+PoaYFdJAnYHHo6ImQARsSAi3mxM6WZmtrrqCf2hwLM1223Fvk6PiYhlwCJgMPAPQEi6RdKDkk7s7AdIOkLSNEnT5
s+fv7rPwczM6lRP6KuTfVHnMf2BHYFDi7/3l7Tr2w6MuDAixkbE2CFDhtRRkpmZdUc9od8GvL9mexgwb1XHFO346wIvFPvviIjnI+JVYAqwTU+LNjOz7qkn9KcCm0kaIWkN4GBgcodjJgOHFV8fANwWEQHcAmwl6T3Fm8FOwKONKd3MzFZX/64OiIhlko4iBXg/4OKImC3pDGBaREwGfgn8WtIc0hX+wcW5L0o6h/TGEcCUiLipl56LmZl1ocvQB4iIKaSmmdp9p9V8vQQ4cBXnXkbqtmlmZiXziFwzs4w49M3MMuLQNzPLiEPfzCwjDn0zs4w49M3MMuLQNzPLiEPfzCwjDn0zs4w49M3MMuLQNzPLiEPfzCwjDn0zs4w49M3MMlLX1MpmZg13+rrdPG9RY+vIjK/0zcwy4tA3M8uIQ9/MLCMOfTOzjDj0zcwy4tA3M8uIQ9/MLCMOfTOzjDj0zcwy4tA3M8uIQ9/MLCMOfTOzjDj0zcwy4tA3M8uIQ9/MLCMOfTOzjDj0zcwy4tA3M8tIXcslStoTOA/oB1wUEd/r8PhA4FJgDLAAOCginq55fGPgUeD0iDi7MaWbmTWxJl0OsssrfUn9gPOBvYCRwCGSRnY47HDgxYjYFDgXOKvD4+cCv+l5uWZm1hP1NO+MA+ZExNyIeAO4EpjY4ZiJwCXF19cAu0oSgKR/AuYCsxtTspmZdVc9oT8UeLZmu63Y1+kxEbEMWAQMlrQmcBLw7Xf6AZKOkDRN0rT58+fXW7uZma2mekJfneyLOo/5NnBuRLzyTj8gIi6MiLERMXbIkCF1lGRmZt1Rz43cNuD9NdvDgHmrOKZNUn9gXeAFYDvgAEnfB9YD3pK0JCJ+2uPKzcxstdUT+lOBzSSNAP4MHAx8usMxk4HDgPuAA4DbIiKAj7UfIOl04BUHvplZeboM/YhYJuko4BZSl82LI2K2pDOAaRExGfgl8GtJc0hX+Af3ZtFmZtY9dfXTj4gpwJQO+06r+XoJcGAX3+P0btRnPdWkfYXNrBwekWtmlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWXEoW9mlhGHvplZRhz6ZmYZceibmWWkrtCXtKekJyTNkXRyJ48PlHRV8fgDkoYX+3eTNF3SrOLvjze2fDMzWx1dhr6kfsD5wF7ASOAQSSM7HHY48GJEbAqcC5xV7H8e2DcitgQOA37dqMLNzGz11XOlPw6YExFzI+IN4EpgYodjJgKXFF9fA+wqSRHxUETMK/bPBgZJGtiIws3MbPXVE/pDgWdrttuKfZ0eExHLgEXA4A7H/DPwUES83vEHSDpC0jRJ0+bPn19v7WZmtprqCX11si9W5xhJW5CafL7S2Q+IiAsjYmxEjB0yZEgdJZmZWXfUE/ptwPtrtocB81Z1jKT+wLrAC8X2MOBa4HMR8VRPCzYzs+6rJ/SnAptJGiFpDeBgYHKHYyaTbtQCHADcFhEhaT3gJuAbEXFPo4o2M7Pu6TL0izb6o4BbgMeAqyNitqQzJO1XHPZLYLCkOcDXgPZunUcBmwKnSppR/Hlfw5+FmZnVpX89B0XEFGBKh32n1Xy9BDiwk/O+A3ynhzWamVmDeESumVlGHPpmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhmpa7lEM7NVGX7yTd067+lBDS7E6uIrfTOzjDj0zcwy4tA3M8uI2/RPX7eb5y1qbB1mZn3AV/pmZhlx6JuZZcShb2aWEYe+mVlGHPpmZhlx6JuZZ
cShb2aWEYe+mVlGHPpmZhmpa0SupD2B84B+wEUR8b0Ojw8ELgXGAAuAgyLi6eKxbwCHA28Cx0TELQ2r3sysl1VtFtEur/Ql9QPOB/YCRgKHSBrZ4bDDgRcjYlPgXOCs4tyRwMHAFsCewM+K72dmZiWop3lnHDAnIuZGxBvAlcDEDsdMBC4pvr4G2FWSiv1XRsTrEfG/wJzi+5mZWQnqad4ZCjxbs90GbLeqYyJimaRFwOBi//0dzh3a8QdIOgI4oth8RdITdVXfAIINgOdX+8Rvq/HF9ILKP7+zuvn8WkdlX7/K/272/fPbpJ6D6gn9ziqIOo+p51wi4kLgwjpqaThJ0yJibBk/uy/4+bW2Kj+/Kj83aN7nV0/zThvw/prtYcC8VR0jqT+wLvBCneeamVkfqSf0pwKbSRohaQ3SjdnJHY6ZDBxWfH0AcFtERLH/YEkDJY0ANgP+2JjSzcxsdXXZvFO00R8F3ELqsnlxRMyWdAYwLSImA78Efi1pDukK/+Di3NmSrgYeBZYBX42IN3vpuXRXKc1KfcjPr7VV+flV+blBkz4/pQtyMzPLgUfkmpllxKFvZpYRh76ZWUYc+mZmGalrwrWqkrQONf8GEfFCieVYHYq5m/YGhrPya3dOWTXZ6pG0FW9//f67tIIaSNJTwA8i4oKafTdGxD4llrWSLENf0leAM4DXWDFCOIAPlFZUg0jaBziTNCS7P2lUdETEOqUW1jg3AEuAWcBbJdfScJLGAqfw9tdvq1ILaxBJFwNbAbNZ8foFUInQB5YCu0jaDvhKMV/Z26aeKVOWoQ8cD2wREVWcs+VHwCeBWVHN/rjDqhKAq/AfwAlU9E0N2D4iOs7SWyWvRsRBkk4E7pL0KTqZeqZMuYb+U8CrZRfRS54FHqlo4AP8RtLuEfHbsgvpJfOLAY9VdZ+kkRHxaNmF9BIBRMT3JU0nDWp9b7klrSzLwVmSPgJMAh4AXm/fHxHHlFZUg0jaltS8cwcrP7dKtHlL2h+4jNQJYSkVa76StCtwCPB7Vn79KtH8IWkCqYnuL6TnV7Xmq30j4oaa7U2AwyLijBLLWkmuV/o/B26jmh+h/w14BRgErFFyLb3hh8AOVLf56gvA5sAAqtnmfTHwWSr2f0/S5hHxOPBnSdt0ePjGMmpalVxDf1lEfK3sInrJeyNi97KL6EVPUu3mq60jYsuyi+hFf6po89XXSGuC/LCTxwL4eN+Ws2q5hv4fioVbbmDlj9BV6LL5u4q3eT8H3C7pN1Sw+Qq4v+Jt3o9Lupy3/99r6U8yEXFE8fcuZdfSlVzb9P+3k90REVXosvkysCbwBqnNG6rV5v2tzvZHxLf7upbeIOkx4IPA/1LNNu9JneyOiPhinxfTCyQNAo4EdiRd4d8FXBARS0otrEaWoW+tT9LapLB4pexaGqm48fc2EfFMX9diq6+YSv5lUmcDSDfl14+IA8uramVZhr6kAcC/AhOKXbcDP4+Ipas8qYVI2o+a5xYRTXUjqSckjQJ+zYpucM8Dn4uI2eVV1ViStgY+VmzeFREzy6ynkSQNA34CjCddCd8NHBsRbaUW1iCSZkbE1l3tK1Ouc+/8OzAG+FnxZ0yxr+VJ+h5wLGnhmkeBY4t9VXEh8LWI2CQiNgG+Dvyi5JoaRtKxpAFa7yv+XCbp6HKraqhJpBX1NiKNVL2h2FcVD0navn2jGJl7T4n1vE2uV/pN/27cXZIeBkZHxFvFdj/goQq1CVf2tYPlr98OEbG42F4TuK9Cr9+MiBjd1b5WI2kW6ZPLAOBDwJ+K7U2ARyNiVInlrSTX3jtvSvpgRDwFIOkDQLMt49gT65GWrYS0SH2VzJV0KqmJB+AzpJueVSFW/l18s9hXFc9L+gxwRbF9CLCgxHoapWkmVOtKrqF/Aqnb5lzSf6hNSINiquC7pI+YfyA9twnAN8otqaG+CHybFYOV7qQ6rx0UI8UlXVts/xNpDeqq+CLwU+Bc0pXwvcW+ltZKN9qzb
N4BkDSQ9DFMwOMR8XoXp7QMSRsC25Ke2wMR8ZeSS7LVUIzo3JH0+t0ZEQ+VXJL1QLNNrZzljVxJXwXeHREPFz0j3iPpyLLraoRibppXI2JyRFwPLJH0T2XX1SiSbpW0Xs32+pJuKbOmRipuAj4ZET+OiPOAOcXNwEqQdEknr9/FZdbUB75cdgG1sgx94MsRsbB9IyJepMlemB74VkQsat8onmenA5pa1AadvHbvK7GeRvt30txJ7RZTkZ5lha06ef0+UmI9DSWps9/FphoYmWvov0vS8ptjRQ+XqkxO1tlrWqV7N29J2rh9oxjMVKU2StXOK1T0wqrS6/cuSeu3b0h6L9V6fu1z6AMg6evAte9wfJ+r0j/26rgFuFrSBaTA+Bfg5nJLaphpks4Bzic9t6OB6eWW1FCnAHdLuqPYnkCa6Koq5ko6hhVX90cCc0usp9F+CNwr6RrS7+enSDPDVsXOwIWSDgT+DngMGFdqRR1keSNX0rtIQfGPpJtlvwUuioiW77ZZ9Os+lfTcID23f2vv910FkjYAtie9dvdVaQW0onngx6RZGYM0r/5xEfG3UgtrIEkjSc9PwO+rNrlccc/wG6Spow+JCA/OMjOrIkm3kmaCPQYYRlo/4M6IOL7Uwmrk2qb/NpJOL7uG3lJMI11Zkh4su4beVCx2X1mSKjM3FHB+RHwuIhZGxCPAR4FFXZ3Ulxz6K1Sp3bujKo3ofJuI6LhSUdVsW3YBvawqPeeIiOsk/Z2kfYo36/dGxJll11XLzTtmZg1S9Nz5AWnmXpFmSz0hIq4ps65aWYa+pO8D3wFeI/Xa2Zp0s+yydzyxBRQjjf8ZGE5N76xmWpi5J4pFYjr+0i4CpgFfj4iW7ulS9Pq4OSJelvR/gW2AM6syKre4+j2TNPVJf6q3sP1MYLf2G++ShgC/a6YJAXNt3tk9Il4iTZLUBvwDaT6eKrgemAgsIw3saf9TFeeQXquhpBtlx5OmVr6SdNOs1Z1aBP6OwB7AJcAFJdfUSD8CDgMGR8Q6EbF2VQK/8K4OPa0W0GQ5m2s//QHF358AroiIF2rGarW6YRGxZ9lF9KI9I6J2WoILJd0fEWdI+mZpVTVOe7fhvYF/j4jrK9bJ4FmqvbD9zcW0IO2ziB4ETCmxnrfJNfRvkPQ4qXnnyOIjWNOsYdlD90raMiJmlV1IL3mraDdtbyM9oOaxKgTJnyX9nDTO4qyiua6prhR76ERgSjG4rnIL20fECZI+yYoJ8y6MiKYakZtlmz6kiZ6AlyLiTUnvAdapwmyUkh4FNqW6C2t/ADgP2IEU8vcD/wf4MzAmIu4usbweK34X9wRmRcSTxYypW0bEb0surSEk/ZY0t9As0uAloDoL2wNI+jvSKNwA/thsA+uyDH1Jn+tsf0Rc2te1NJoX1m5txSybsyPi5WJ7bWBkRDxQbmWNIWlaRIwtu47e4t47TUrST2o2BwG7Ag9GxAGrOKXpFRNXrVJEvPBOj7cKSSNI8wkNZ+XeSfuVVVMjSXoI2Ka9zbuYMmRaVcYiFOs131aVTy4dtULvnSzb9CNipYWmJa3LiuX3WtV00sdJARsDLxZfr0dar3NEeaU11HWklaRuoKZ5oELeNsumpCr9P/0qcKKk14GlVKzLJu690zJeBTYru4ieiIgRAMXMoZMjYkqxvRcrJl+rgiUR8eOyi+hFlZ5lMyLWLruGXvabZu+9k2vzzg2s6OnxLmAkcHVEnFxeVY0haXpEjOmwrzLtqJI+TXqD/i0r9/6oxPw7VZ9lU9LvI2LXrva1quIN+1lSW377cpdN1Xsn1yv9s2u+XgY8ExFtZRXTYM8XIzkvI4XGZ0gfMatiS+CzpFBsb96JYrvlFeF+cNl1NJqkQcB7gA2KnnPtA2PWATYqrbDGex9phs0HSYMFm24pzyyv9KusuKH7LdLiIgHcCZxRoRu5j5OW3Huj7FoaSdKJEfH9opPB2/5TRsQxJZTVMJKOBY4jBfy8modeAn4RE
T8tpbBeUKzKtzvwBWAscDXwy4h4qtTCClld6a9i3hao0M2kItyPLbuOXjSTdHO6Es0dNR4r/p5WahW9pFjk/TxJR0fET7o8oYVFREj6C/AXUkvC+sA1km6NiBPLrc5X+pVTLOJwYPvi08VH6SsjYo9yK2sMSbcDWwFTWblNvypdNg+MiP/sal+rKUaprlJE/Hdf1dKbijb9w4DngYuA6yJiadH19smI+GCpBZLZlX4tSduQhkoHcHdVZjEENmgPfICIeLG4OVgV3yq7gF72DaBjwHe2r9XsW/z9PtLCIrcV27uQBjJVIvSBDYBPdhwMWXS9bYrFcLIMfUmnAQey4hftV5L+MyK+U2JZjfKWpI0j4k+wfIRuZT7ORcQdXR/VeoqutZ8Ahkqq7ZK6DqmJoKVFxBdg+SpZIyPiuWJ7Q+D8MmtrpIg47R0ee2xVj/WlLEMfOAT4SEQsgeWjBB8kzbHf6k4B7i4mtIJ0Q7fll0uUdHdE7NjJfZmq3I+ZR2rP34+VV3F7mTS3UFUMbw/8wl9JU5tbH8k19J8mTb/QPrPmQKAp7qz3VETcXDRdbU8KxP8TEc+XXFaPRcSOxd+VHNwTETOBmZJeAG6KiCqONga4vWbwUpC6p/6h3JLykuWNXEnXkdYdvZX0i7cbcDdFj5BW7h5XBP4qtfogJklnkLqh3hcRVVocBgBJl5FmEP0vYFKzNAk0kqT9SZ9AoQkHL1VdrqF/2Ds9HhGX9FUtjSbpftISew+TrvS3BP5ImuckIqKlBzFJ+iLpBvwOpKaPu0jBcX2phTWQpHVITZBfIF2UTCIt9vNyqYU1SHGfabOI+F0xlXS/qjy3VpBl6FeZpCuBf2tfREXSKOD4iPh8qYU1mKS/Bz5FWi5x/ao1+0jagDSa+jhSH/5NgR+3eh93SV8m3WN6b0R8UNJmwAVVmYahFTTV7G99RdJmkq6R9Kikue1/yq6rQTavXTUrIh4BRpdYT0NJukjSvaQJyfqTVs5av9yqGkfSvpKuJXVpHACMi4i9gK1Jb3Ct7qvAeNJIXCLiSVI3Tusjud7InUTq730uqZ/wF1gxF0ire0zSRaw8906V2oUHA/2AhcALwPMR0fJdGmscCJwbEXfW7oyIV4umrVb3ekS80b4mdTFttJsb+lCWzTvtM1FKmhURWxb77oqIj5VdW08VE1v9KzU3ykgLbFdlDWAAJH0Y2IPUnbFfRAwruSSrg6Tvk96wP0daDOdI4NGIOKXUwjKSa+jfQ5r69BrSx+g/A9+LiA+VWph1qRjV+DHSm9r6wH3AXRFxcamF9dCqxh+0/12BcQjA8pXADidNSCbSLJQXRY5BVJJcQ39bUpPHesCZwLrA9yPi/lIL6wFJV0fEpyTNovNZGquyMHr7dLV3RcS8Yt9ZEXFSuZVZvSStAWxO+j19omozpja7LEO/iiRtGBHPSboaOKH2IdIb2qdKKq2hJD3Ycb1YSQ9X5U2tXTFf0qD27fZpNVqdpL2BC0iDIUV3F3DaAAAGbUlEQVRaxvMrEfGbUgvLSFY3ciX9KCKO67By1nKtPFNjzdD2TTtO9iRp8xJKaihJ/0pq//2ApIdrHlobuKecqhpP0n7AD0nzzv8N2IT0qXSLMutqoB8Cu0TEHABJHwRuAhz6fSSr0GfF4udnv+NRLSiDULycFAzfBWqXtXy5KgvEFM4kTaHxu4j4iKRdSAO1quJv7YFfmEv11kZoalk270haE3itfX4TSf2AgRHxarmVdZ+kdUk3NqseipXWvp6xpJmkSQHfkvTHiBhXdm09UTOf/m6kTy9Xkz5tH0hq1/96WbXlJrcr/Xa/B/4ReKXYfjdpoe2PllZRD0XEImAR1boqzNFCSWuRutr+h6S/UYGplVkxnz6kmTV3Kr6eT4UG17WCXK/0Z0TE6K72mfW14lPoEtJNzkNJPcv+IyKqtLi9lSjLaRiAxbWzUUoaA7xWYj1mAETE4oh4MyKWRcQlEfHjKgW+pGGSrpX0N
0l/lfRfkjywrg/leqW/LXAlaeEKgA2BgyJi+qrPMut9Rdv3WaT5aET1BmfdSrop396p4jPAoRGxW3lV5SXL0AeQNAD4EOk/1eMRsbTkksyQNAfYt4rz6IObVptBljdyi8CvnZ/mdkk/d/BbE/hrVQO/8Lykz5BWzoLU8aAyzVetIMsr/WIWygFA+2IpnwXejIgvlVeVGUg6D/h74Drg9fb9EfHfpRXVQJI2Bn5KWgQngHuBYzsOKLTek2voz4yIrbvaZ9bXJE3qZHdERBWmVbYmkGXzDvCmpA9GxFMAkj4AvFlyTWZExBfKrqE3SPoJ7zBvfiuvS91qcg3944E/1KyWNZy0kIpZqYr1EA4nzbVTO+Faq1/pTyv+Hg+MBK4qtg8E3GuuD+Ua+oOBUaSwn0gaibuozILMCr8GHictEHMGaYBWy9/YjYhLACR9njTh2tJi+wLSaHjrI7kOzjo1Il4C1iHNBXIBac1Vs7JtGhGnAouLoNwb2LLkmhppI9IkgO3WKvZZH8k19Nvb7/cGLoiI64E1SqzHrF17t+GFkkaRpmEYXl45Dfc94CFJv5L0K+BB4P+VW1Jecu29cyNpicR/BNqnYPije+9Y2SR9Cfgv0tX9r0hXwqdGxM/LrKuRJG1E6ib9GPAeYF7HheCt9+Qa+u8B9gRmRcSTkjYEtowIty2a9aLiTe1YYBgwg7R2wH0R8fFSC8tIlqFv1gok3RgR+5RdRyMVazhvC9wfEaOLVd2+HREHlVxaNnJt0zdrBUPLLqAXLImIJQCSBkbE46Q5sKyP5Npl06wVPFR2Ab2gTdJ6pGkmbpX0Iitmu7U+4OYdsyYj6d3AxhHxRNm19CZJO5F6J90cEW+UXU8u3Lxj1kQk7Uu6wXlzsT1a0uRyq+odEXFHREx24Pcth75ZczkdGAcsBIiIGVSrn76VzKFv1lyWFYvcm/UK38g1ay6PSPo00E/SZsAxpDnnzRrCV/pmzeVo0gybr5NWl3oJOK7UiqxS3HvHzCwjbt4xawKSbuCdFxnZrw/LsQpz6Js1h7PLLsDy4OYdM7OM+ErfrIkUE5J1vBJbRFpu8DsRsaDvq7IqceibNZffkBb5ubzYPhgQKfh/BexbTllWFW7eMWsiku6JiPGd7ZM0KyKqtHSilcD99M2ay1qStmvfkDSOtHoWwLJySrIqcfOOWXP5EnCxpLVIzTovAYdLWhP4bqmVWSW4ecesCUlal/T/c2HZtVi1uHnHrIlIWlfSOcDvgd9J+mHxBmDWEA59s+ZyMfAy8Kniz0vApFIrskpx845ZE5E0IyJGd7XPrLt8pW/WXF6TtGP7hqTxwGsl1mMV4yt9syYiaTRwCWntWAEvAIdFxMOlFmaV4dA3a0KS1gGIiJfKrsWqxc07Zk2kpvfObcBt7r1jjebQN2su7r1jvcrNO2ZNxL13rLf5St+subj3jvUqX+mbNRFJWwOXknrvALyIe+9YAzn0zZqIpK8VX7bPrPkKaS796RExo5yqrErcvGPWXMYC/wKsQ7raPwLYGfiFpBNLrMsqwlf6Zk1E0i3AP0fEK8X2WsA1wP6kq/2RZdZnrc9X+mbNZWPgjZrtpcAmEfEa8Ho5JVmVeBEVs+ZyOXC/pOuL7X2BK4pFVB4tryyrCjfvmDUZSWOAHUlz79wdEdNKLskqxKFvZpYRt+mbmWXEoW9mlhGHvplZRhz6ZmYZ+f/0/wq8Ey4tqgAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotTopUsageComparation(df_a_json, df_a_other, 'script_domain', 3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# TLD\n", + "The top 3 TLDs are the same for both valid JSON and non-JSON values, and they remain in the filtered data. But there are some TLDs that appear only in the whole sample, where they produce only smaller values[1].\n", + "\n", + "---\n", + " For further investigation: \n", + "1. Why are there TLDs that only produce smaller values? What are they? Are there also TLDs that only produce bigger values? " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 248 unique script_tld present on the non-json dataset and 141 on the JSONs\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEOCAYAAACHE9xHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE8JJREFUeJzt3X+QnVWd5/H3F9IQHTDGEBhIIo1WUIgJTWhZdnWHDMElBh1+rCApNKmJS9YqUJklYhK3ytFVRIpxGLbWVGVKJVDZSdiMFBQiE0lBgVWOmo5tSIhoygmkSQwNYgSBTBK++0c/HZuk079vbvfp96vq1n2ec895nm+nO59++txz743MRJJUrmPqXYAkqbYMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhxtS7AICTTjopGxsb612GJI0oLS0tL2TmxN76DYugb2xsZMOGDfUuQ5JGlIh4pi/9nLqRpMIZ9JJUOINekgrX6xx9REwB7gb+HHgDWJGZ/xARfwtcB7RXXZdl5kPVmKXAp4ADwGcz81/6W9i+fftoa2vj9ddf7+9Q1cjYsWOZPHkyDQ0N9S5FUj/05cnY/cBNmbkxIk4EWiLih9Vjf5+Zt3ftHBFnA9cA04DTgEci4szMPNCfwtra2jjxxBNpbGwkIvozVDWQmbz44ou0tbVxxhln1LscSf3Q69RNZu7KzI3V9svAVmBSD0MuA1Zn5t7M/DdgG3B+fwt7/fXXmTBhgiE/TEQEEyZM8C8saQTq1xx9RDQC5wI/qZpuiIhNEfGdiBhftU0CdnQZ1kbPvxh6Ot9AhqlG/H5II1Ofgz4iTgD+GbgxM/8ALAfeDTQBu4C/6+zazfDDPq8wIhZFxIaI2NDe3t7NEEnSUOjTC6YiooGOkF+Vmd8DyMzdXR7/R+DBarcNmNJl+GRg56HHzMwVwAqA5ubmXj+4tnHJ9/tSap9tv/XSHh+fNWsWS5cu5ZJLLjnYdscdd
/CrX/2Kb33rW0ccd8IJJ/DKK68MSY133303t912G5lJZrJw4UIWL148JMfudMstt7Bs2bIhPaaOvqH+/zHa9ZYPI02vV/TR8ff6t4GtmfnNLu2ndul2BbC52n4AuCYijo+IM4CpwE+HruSjY968eaxevfpNbatXr2bevHlH5fw/+MEPuOOOO1i3bh1btmxh48aNjBs3bsjPc8sttwz5MSUNL32ZuvkA8EngoohorW5zgdsi4smI2AT8JfA3AJm5BbgXeAp4GLi+vytuhoOPfexjPPjgg+zduxeA7du3s3PnTj74wQ/yyiuvMHv2bGbOnMn06dO5//77Dxv/2GOP8ZGPfOTg/g033MBdd90FQEtLCxdeeCHnnXcel1xyCbt27Tps/Ne//nVuv/12TjvtNKBjaeN1110HQGtrKxdccAEzZszgiiuu4KWXXgI6/grpfCuJF154gc73D7rrrru48sormTNnDlOnTuXmm28GYMmSJbz22ms0NTVx7bXX8sc//pFLL72Uc845h/e9732sWbNmCP4lJdVbX1bd/CgzIzNnZGZTdXsoMz+ZmdOr9r/KzF1dxnwtM9+dme/JzB/U9kuojQkTJnD++efz8MMPAx1X8x//+MeJCMaOHct9993Hxo0befTRR7npppvI7HX2Ceh4fcBnPvMZ1q5dS0tLCwsXLuSLX/ziYf02b97Meeed1+0x5s+fzze+8Q02bdrE9OnT+fKXv9zreVtbW1mzZg1PPvkka9asYceOHdx666285S1vobW1lVWrVvHwww9z2mmn8Ytf/ILNmzczZ86cPn1NkoY3Xxnbg67TN12nbTKTZcuWMWPGDC6++GKee+45du/e3dOhDnr66afZvHkzH/rQh2hqauKrX/0qbW1tfa5pz549/P73v+fCCy8EYMGCBTz++OO9jps9ezbjxo1j7NixnH322TzzzOHvhTR9+nQeeeQRvvCFL/DEE0/UZKpI0tFn0Pfg8ssvZ/369WzcuJHXXnuNmTNnArBq1Sra29tpaWmhtbWVU0455bD15WPGjOGNN944uN/5eGYybdo0WltbaW1t5cknn2TdunWHnXvatGm0tLT0q96u5zy0nuOPP/7g9rHHHsv+/fsPG3/mmWfS0tLC9OnTWbp0KV/5ylf6dX5Jw5NB34MTTjiBWbNmsXDhwjc9Cbtnzx5OPvlkGhoaePTRR7u9Oj799NN56qmn2Lt3L3v27GH9+vUAvOc976G9vZ0f//jHQMdUzpYtWw4bv3TpUm6++WZ++9vfArB3717uvPNOxo0bx/jx43niiScAuOeeew5e3Tc2Nh785bB27do+fY0NDQ3s27cPgJ07d/LWt76VT3ziEyxevJiNGzf26RiShrdh8X70fVGv5U7z5s3jyiuvfNMKnGuvvZaPfvSjNDc309TUxHvf+97Dxk2ZMoWrr76aGTNmMHXqVM4991wAjjvuONauXctnP/tZ9uzZw/79+7nxxhuZNm3am8bPnTuX3bt3c/HFF5OZRAQLFy4EYOXKlXz605/m1Vdf5V3vehff/e53AVi8eDFXX30199xzDxdddFGfvr5FixYxY8YMZs6cyfz58/n85z/PMcccQ0NDA8uXLx/Qv5mk4SX6+iRiLTU3N+ehHzyydetWzjrrrDpVpCPx+zI8uY5+aI2UdfQR0ZKZzb31c+pGkgpn0EtS4YZ10A+HaSX9id8PaWQatkE/duxYXnzxRcNlmOh8P/qxY8fWuxRJ/TRsV91MnjyZtrY2fGfL4aPzE6YkjSzDNugbGhr8JCNJGgLDdupGkjQ0DHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwvUa9BExJSIejYitEbElIj5Xtb8jIn4YEb+u7sdX7RERd0bEtojYFBEza/1FSJKOrC9X9PuBm
zLzLOAC4PqIOBtYAqzPzKnA+mof4MPA1Oq2CFg+5FVLkvqs16DPzF2ZubHafhnYCkwCLgNWVt1WApdX25cBd2eHfwXeHhGnDnnlkqQ+6dccfUQ0AucCPwFOycxd0PHLADi56jYJ2NFlWFvVJkmqgz4HfUScAPwzcGNm/qGnrt20ZTfHWxQRGyJiQ3t7e1/LkCT1U5+CPiIa6Aj5VZn5vap5d+eUTHX/fNXeBkzpMnwysPPQY2bmisxszszmiRMnDrR+SVIv+rLqJoBvA1sz85tdHnoAWFBtLwDu79I+v1p9cwGwp3OKR5J09I3pQ58PAJ8EnoyI1qptGXArcG9EfAp4FriqeuwhYC6wDXgV+OshrViS1C+9Bn1m/oju590BZnfTP4HrB1mXJGmI+MpYSSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqXK9BHxHfiYjnI2Jzl7a/jYjnIqK1us3t8tjSiNgWEU9HxCW1KlyS1Dd9uaK/C5jTTfvfZ2ZTdXsIICLOBq4BplVjvhURxw5VsZKk/us16DPzceB3fTzeZcDqzNybmf8GbAPOH0R9kqRBGswc/Q0Rsama2hlftU0CdnTp01a1HSYiFkXEhojY0N7ePogyJEk9GWjQLwfeDTQBu4C/q9qjm77Z3QEyc0VmNmdm88SJEwdYhiSpNwMK+szcnZkHMvMN4B/50/RMGzClS9fJwM7BlShJGowBBX1EnNpl9wqgc0XOA8A1EXF8RJwBTAV+OrgSJUmDMaa3DhHxT8As4KSIaAO+BMyKiCY6pmW2A/8dIDO3RMS9wFPAfuD6zDxQm9IlSX3Ra9Bn5rxumr/dQ/+vAV8bTFGSpKHjK2MlqXAGvSQVrtepG/1J45Lv17uEomy/9dJ6lyCNCl7RS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpML1GvQR8Z2IeD4iNndpe0dE/DAifl3dj6/aIyLujIhtEbEpImbWsnhJUu/6ckV/FzDnkLYlwPrMnAqsr/YBPgxMrW6LgOVDU6YkaaB6DfrMfBz43SHNlwErq+2VwOVd2u/ODv8KvD0iTh2qYiVJ/TfQOfpTMnMXQHV/ctU+CdjRpV9b1XaYiFgUERsiYkN7e/sAy5Ak9Waon4yNbtqyu46ZuSIzmzOzeeLEiUNchiSp00CDfnfnlEx1/3zV3gZM6dJvMrBz4OVJkgZroEH/ALCg2l4A3N+lfX61+uYCYE/nFI8kqT7G9NYhIv4JmAWcFBFtwJeAW4F7I+JTwLPAVVX3h4C5wDbgVeCva1CzJKkfeg36zJx3hIdmd9M3gesHW5Qkaej4ylhJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4cYMZnBEbAdeBg4A+zOzOSLeAawBGoHtwNWZ+dLgypQkDdRQXNH/ZWY2ZWZztb8EWJ+ZU4H11b4kqU5qMXVzGbCy2l4JXF6Dc0iS+miwQZ/AuohoiYhFVdspmbkLoLo/eZDnkCQNwqDm6IEPZObOiDgZ+GFE/LKvA6tfDIsA3
vnOdw6yDEnSkQzqij4zd1b3zwP3AecDuyPiVIDq/vkjjF2Rmc2Z2Txx4sTBlCFJ6sGAgz4i/iwiTuzcBv4LsBl4AFhQdVsA3D/YIiVJAzeYqZtTgPsiovM4/zczH46InwH3RsSngGeBqwZfpiRpoAYc9Jn5G+CcbtpfBGYPpihJ0tDxlbGSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klQ4g16SCmfQS1LhDHpJKpxBL0mFM+glqXAGvSQVzqCXpMIZ9JJUOINekgpn0EtS4Qx6SSqcQS9JhTPoJalwBr0kFc6gl6TCGfSSVDiDXpIKZ9BLUuEMekkqnEEvSYUz6CWpcAa9JBXOoJekwhn0klS4mgV9RMyJiKcjYltELKnVeSRJPatJ0EfEscD/AT4MnA3Mi4iza3EuSVLPanVFfz6wLTN/k5n/DqwGLqvRuSRJPahV0E8CdnTZb6vaJElH2ZgaHTe6acs3dYhYBCyqdl+JiKdrVMtodBLwQr2L6E18o94VqA782Rxap/elU62Cvg2Y0mV/MrCza4fMXAGsqNH5R7WI2JCZzfWuQzqUP5v1Uaupm58BUyPijIg4DrgGeKBG55Ik9aAmV/SZuT8ibgD+BTgW+E5mbqnFuSRJPavV1A2Z+RDwUK2Orx45Jabhyp/NOojM7L2XJGnE8i0QJKlwBr0kFc6gl1QzEXFMRPynetcx2jlHP8JFxETgOqCRLk+uZ+bCetUkdRURP87M/1jvOkazmq260VFzP/AE8AhwoM61SN1ZFxH/FfheemVZF17Rj3AR0ZqZTfWuQzqSiHgZ+DM6LkReo+MtUjIz31bXwkYR5+hHvgcjYm69i5COJDNPzMxjMrMhM99W7RvyR5FX9CNcl6ulfwf2Vc1eLWnYiIgArgXOyMz/FRFTgFMz86d1Lm3UMOgl1VRELAfeAC7KzLMiYjywLjPfX+fSRg2fjC1ARPwV8BfV7mOZ+WA965EO8R8yc2ZE/BwgM1+q3uxQR4lz9CNcRNwKfA54qrp9rmqThot91ceLJhxcEvxGfUsaXZy6GeEiYhPQlJlvVPvHAj/PzBn1rUzqEBHXAh8HZgIrgY8B/zMz/19dCxtFnLopw9uB31Xb4+pZiHSozFwVES3AbDqWVl6emVvrXNaoYtCPfF8Hfh4Rj9Lxn+gvgKX1LUk6zK+BP1BlTkS8MzOfrW9Jo4dTNwWIiFOB99MR9D/JzN/WuSTpoIj4DPAlYDcdL5rqfMGU04tHiUE/wkXEB4DWzPxjRHyCjnnQf8jMZ+pcmgRARGyjY+XNi/WuZbRy1c3Itxx4NSLOAT4PPAPcXd+SpDfZAeypdxGjmXP0I9/+zMyIuAy4MzO/HREL6l2UFBH/o9r8DfBYRHwf2Nv5eGZ+sy6FjUIG/cj3ckQsBT4J/OdqeWVDnWuSAE6s7p+tbsdVN6jW1OvocI5+hIuIPwfmAT/LzB9V7yMyKzPvqXNpEgARcdWha+a7a1PtGPQjVPVmZp3fvKjus9reC2wDvpiZ6+tQnnRQRGzMzJm9tal2nLoZoTLzxCM9Vk3fvA9YVd1LR11EfBiYC0yKiDu7PPQ2YH99qhqdDPoCZeYB4BcR8b/rXYtGtZ3ABuAq4Fd0/MV5gI719H9Tx7pGHaduJNVERDQAXwP+G7CdjmnFKcB3gWWZue/IozWUXEcvqVZuA8YDp2fmzMw8F3gXHe/HdHtdKxtlvKKXVBMR8WvgzEM/ELx6DumXmTm1PpWNPl7RS6qVPDTkq8YDuI7+qDLoJdXKUxEx/9DG6j2ZflmHekYtp24k1URETAK+B7wGtNBxFf9+4C3AFZn5XB3LG1UMekk1FREXAdPoWHWzxRfxHX0GvSQVzjl6SSqcQS9JhTPoJalwBr0kFc6gl6TC/X/qB5qnoz9HUQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotUniqueValuesComparation(df_json, df_other, 'script_tld')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
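<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>json</th>\n", + "      <th>other</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>com</th>\n", + "      <td>0.646485</td>\n", + "      <td>0.650045</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>net</th>\n", + "      <td>0.143621</td>\n", + "      <td>0.170509</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>ru</th>\n", + "      <td>0.082325</td>\n", + "      <td>0.053212</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>fr</th>\n", + "      <td>0.023078</td>\n", + "      <td>0.006061</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>cn</th>\n", + "      <td>0.006060</td>\n", + "      <td>0.014413</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>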
" + ], + "text/plain": [ + " json other\n", + "com 0.646485 0.650045\n", + "net 0.143621 0.170509\n", + "ru 0.082325 0.053212\n", + "fr 0.023078 0.006061\n", + "cn 0.006060 0.014413" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEHCAYAAABV4gY/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFCRJREFUeJzt3X+QXWWd5/H3l/wgrkSYgcaFNNgZJ4IhUpE0CZgyiZYMRByyOswAC9Qwi6RmnAiWK0XULYpld3T8xWjtRNc4E0a0QnCZESJmh9qRBeSHTHcwDCRMNGIYeuLshMgPgULozHf/uJ1wbZr06fTtPt3Pfb+qqLrnuU+f/vSF+nDuc885NzITSVJZDqk7gCSp9Sx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoGm1vWLjzrqqOzq6qrr10vSpLR58+YnM7NjuHm1lXtXVxe9vb11/XpJmpQi4vEq81yWkaQCWe6SVCDLXZIKVNuauyRV8fLLL9PX18eLL75Yd5RxNWPGDDo7O5k2bdpB/bzlLmlC6+vrY+bMmXR1dRERdccZF5nJnj176OvrY/bs2Qe1D5dlJE1oL774IkceeWTbFDtARHDkkUeO6t2K5S5pwmunYt9ntH+z5S5JBXLNXdKk0rX6uy3d384/PbvSvHe84x3cd999Lf3dY8lyv+bwUf78M63JIWlCm0zFDpO83Fvxf/CdM1oQRFLxDjvsMH784x9z3nnn8eyzz9Lf389XvvIV3vnOd3LjjTfyqU99iszk7LPP5jOf+cz+n7niiiu47bbbeN3rXsett97KG9/4xnHJ65q7JFW0fv16zjzzTLZs2cJDDz3E/Pnz2bVrF1dddRV33HEHW7Zsoaenh1tuuQWA559/ntNOO42HHnqIJUuW8LWvfW3cslruklTRqaeeyvXXX88111zDww8/zMyZM+np6WHZsmV0dHQwdepULrzwQu6++24Apk+fzvve9z4AFixYwM6dO8ctq+UuSRUtWbKEu+++m1mzZnHxxRdzww03kJmvOX/atGn7T2mcMmUK/f394xXVcpekqh5//HGOPvpoLrvsMi699FIefPBBFi1axF133cWTTz7J3r17ufHGG1m6dGndUSf3B6qS2k/VUxdbLSK48847+dznPse0adM47LDDuOGGGzjmmGP49Kc/zbve9S4yk/e+972sWLGiloy/kvdAbynGUnd3d472yzpac7bMfxzdDjwVUhpTjz76KG9961trzbBnzx5OOeUUHn+80vdktMxQf3tEbM7M7uF+1mUZSTqAXbt2cfrpp/Oxj32s7igj4rKMJB3Asccey49+9KO6Y4yYR+6SVCDLXZIKVKncI+KsiNgeETsiYvVrzPm9iNgWEVsjYn1rY0qSRmLYNfeImAKsAc4A+oCeiNiYmdua5swBPg4szsynIuLosQosSRpelQ9UFwI7MvMxgIjYAKwAtjXNuQxYk5lPAWTmv7Y6qCQBo7+T66v2d3CnMz/99NOsX7+eD33oQwDceeedfP7zn+e2225rZbqDVmVZZhbwRNN238BYs7cAb4mIeyPiBxFxVqsCStJE9PTTT/PlL3+5Zftr9a0JqpT7UN/1NPjKp6nAHGAZcAHwFxFxxKt2FLEyInojonf37t0jzSpJtbnuuuuYN28e8+
bN44tf/CKrV6/mJz/5CfPnz+fKK68E4LnnnuPcc8/lxBNP5MILL9x/35nNmzezdOlSFixYwJlnnsnPfvYzAJYtW8YnPvEJli5dype+9KWW5q2yLNMHHNe03QnsGmLODzLzZeCnEbGdRtn3NE/KzLXAWmhcoXqwoSVpPG3evJnrr7+eBx54gMxk0aJFfPOb3+SRRx5hy5YtQGNZ5oc//CFbt27l2GOPZfHixdx7770sWrSID3/4w9x66610dHRw00038clPfpJ169YBjXcAd911V8szVyn3HmBORMwG/hk4Hxh8zf4tNI7Y/yoijqKxTPNYK4NKUl3uuece3v/+9/P6178egA984AN8//vff9W8hQsX0tnZCcD8+fPZuXMnRxxxBI888ghnnHEGAHv37uWYY47Z/zPnnXfemGQettwzsz8iVgG3A1OAdZm5NSKuBXozc+PAc78VEduAvcCVmblnTBJL0jireg+uQw89dP/jfbf4zUxOOukk7r///iF/Zt//MFqt0nnumbkpM9+SmW/OzD8ZGLt6oNjJho9m5tzMfFtmbhiTtJJUgyVLlnDLLbfwwgsv8Pzzz/Ptb3+bxYsX84tf/GLYnz3hhBPYvXv3/nJ/+eWX2bp161hH9t4ykiaZGu7Eesopp3DJJZewcOFCAD74wQ+yYMECFi9ezLx581i+fDlnnz30rYinT5/OzTffzOWXX84zzzxDf38/H/nIRzjppJPGNLO3/PWWv9KENhFu+VsXb/krSfoVlrskFchylzTh1bV8XKfR/s2Wu6QJbcaMGezZs6etCj4z2bNnDzNmzDjofXi2jKQJrbOzk76+PtrtliUzZszYf0HUwbDcJU1o06ZNY/bs2XXHmHRclpGkAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlco9Is6KiO0RsSMiVg/x/CURsTsitgz888HWR5UkVTXsl3VExBRgDXAG0Af0RMTGzNw2aOpNmblqDDJKkkaoypH7QmBHZj6WmS8BG4AVYxtLkjQaVcp9FvBE03bfwNhgvxMR/xARN0fEcUPtKCJWRkRvRPS22/chStJ4qlLuMcTY4K8h/w7QlZknA38HfH2oHWXm2szszszujo6OkSWVJFVWpdz7gOYj8U5gV/OEzNyTmb8c2PwasKA18SRJB6NKufcAcyJidkRMB84HNjZPiIhjmjbPAR5tXURJ0kgNe7ZMZvZHxCrgdmAKsC4zt0bEtUBvZm4ELo+Ic4B+4OfAJWOYWZI0jGHLHSAzNwGbBo1d3fT448DHWxtNknSwvEJVkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJVKveIOCsitkfEjohYfYB550ZERkR36yJKkkZq2HKPiCnAGmA5MBe4ICLmDjFvJnA58ECrQ0qSRqbKkftCYEdmPpaZLwEbgBVDzPtvwGeBF1uYT5J0EKqU+yzgiabtvoGx/SLi7cBxmXnbgXYUESsjojcienfv3j3isJKkaqqUewwxlvufjDgE+DPgPw+3o8xcm5ndmdnd0dFRPaUkaUSqlHsfcFzTdiewq2l7JjAPuDMidgKnARv9UFWS6lOl3HuAORExOyKmA+cDG/c9mZnPZOZRmdmVmV3AD4BzMrN3TBJLkoY1bLlnZj+wCrgdeBT4VmZujYhrI+KcsQ4oSRq5qVUmZeYmYNOgsatfY+6y0ceSJI2GV6hKUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVK
BK5R4RZ0XE9ojYERGrh3j+DyPi4YjYEhH3RMTc1keVJFU1bLlHxBRgDbAcmAtcMER5r8/Mt2XmfOCzwHUtTypJqqzKkftCYEdmPpaZLwEbgBXNEzLz2abN1wPZuoiSpJGaWmHOLOCJpu0+YNHgSRHxx8BHgenAu4faUUSsBFYCHH/88SPNKkmqqMqRewwx9qoj88xck5lvBq4C/stQO8rMtZnZnZndHR0dI0sqSaqsSrn3Acc1bXcCuw4wfwPwH0YTSpI0OlXKvQeYExGzI2I6cD6wsXlCRMxp2jwb+HHrIkqSRmrYNffM7I+IVcDtwBRgXWZujYhrgd7M3Aisioj3AC8DTwG/P5ahJUkHVuUDVTJzE7Bp0NjVTY+vaHEuSdIoeIWqJBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqUKVyj4izImJ7ROyIiNVDPP/RiNgWEf8QEd+LiDe1Pqokqaphyz0ipgBrgOXAXOCCiJg7aNoPge7MPBm4Gfhsq4NKkqqrcuS+ENiRmY9l5kvABmBF84TM/L+Z+cLA5g+AztbGlCSNRJVynwU80bTdNzD2Wi4F/vdoQkmSRmdqhTkxxFgOOTHiIqAbWPoaz68EVgIcf/zxFSNKkkaqypF7H3Bc03YnsGvwpIh4D/BJ4JzM/OVQO8rMtZnZnZndHR0dB5NXklRBlXLvAeZExOyImA6cD2xsnhARbwe+SqPY/7X1MSVJIzFsuWdmP7AKuB14FPhWZm6NiGsj4pyBaZ8DDgP+V0RsiYiNr7E7SdI4qLLmTmZuAjYNGru66fF7WpxLkjQKXqEqSQWy3CWpQJa7JBXIcpekAlX6QFVt4prDR/nzz7Qmh6RR88hdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCAvYipE1+rvjnofO2e0IIikCcEjd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKlClco+IsyJie0TsiIjVQzy/JCIejIj+iDi39TElSSMxbLlHxBRgDbAcmAtcEBFzB037J+ASYH2rA0qSRq7KvWUWAjsy8zGAiNgArAC27ZuQmTsHnvu3McgoSRqhKssys4Anmrb7BsYkSRNUlXKPIcbyYH5ZRKyMiN6I6N29e/fB7EKSVEGVcu8Djmva7gR2Hcwvy8y1mdmdmd0dHR0HswtJUgVVyr0HmBMRsyNiOnA+sHFsY0mSRmPYcs/MfmAVcDvwKPCtzNwaEddGxDkAEXFqRPQBvwt8NSK2jmVoSdKBVfompszcBGwaNHZ10+MeGss1kqQJwK/ZU3FG+5WDO//07BYlkerj7QckqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAXqEqDXbN4S3YxzOj34c0Ch65S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgrkRUxSwfzKwfblkbskFchyl6QCuSwjSVVMsnsOVTpyj4izImJ7ROyIiNVDPH9oRNw08PwDEdHV6qCSpOqGLfeImAKsAZYDc4ELImLuoGmXAk9l5m8CfwZ8ptVBJUnVVVmWWQjsyMzHACJiA7AC2NY0ZwVwzcDjm4E/j4jIzGxhVknjbZItRRzIqM8cmtGiIOOkyrLMLOCJpu2+gbEh52RmP/AMcGQrAkqSRq7KkXsMMTb4iLzKHCJiJbByYPO5iNhe4fePqYCjgCcPegf/dag/fXLytWgY9esAvhbNfC1e0ZrX4k1VJlUp9z7guKbtTmDXa8zpi4ipwOHAzwfvKDPXAmurBBsvEdGbmd1155gIfC0afB1e4Wvxisn2WlRZlukB5kTE7IiYDpwPbBw0ZyPw+wOPzwXucL1dkuoz7JF7ZvZHxCrgdm
AKsC4zt0bEtUBvZm4E/hL4RkTsoHHEfv5YhpYkHVili5gycxOwadDY1U2PXwR+t7XRxs2EWiaqma9Fg6/DK3wtXjGpXotw9USSyuO9ZSSpQJa7JBXIcpekArXlXSEj4mSgi6a/PzP/prZANYmI2Zn50+HG1D4G7iX19cy8qO4sGp22K/eIWAecDGwF/m1gOIG2K3fgr4FTBo3dDCyoIUutIuJ6hriqOjP/Uw1xapOZeyOiIyKmZ+ZLdeepW0R8gMaNEI+mcSV+AJmZb6g1WAVtV+7AaZk5+K6WbSUiTgROAg4f+I93nzcAk+z2SC1zW9PjGcD7efWV2O1iJ3BvRGwEnt83mJnX1ZaoPp8FfjszH607yEi1Y7nfHxFzM3Pb8FOLdQLwPuAI4Lebxn8BXFZLoppl5l83b0fEjcDf1RSnFhHxjcy8GDiPxq27DwFm1puqdv9vMhY7tOF57hGxBPgO8C/AL3nlbdbJtQarQUScnpn3151jIoqIE4DvDnxHQVuIiG00vrfhO8Cywc9n5qvuF1W6iPgS8O+BW2j0BTA5PqNrxyP3dcDFwMO8subervZExPeAN2bmvIEPms/JzP9ed7DxFBEB7AWeaxr+F+CqehLV5n8CfwvMBnqbxoPG5xG/UUeomr0BeAH4raaxSfEZXTseud+Rme+uO8dEEBF3AVcCX83Mtw+MPZKZ8+pNNv4i4sHMHPzhcluKiK9k5h/VnWMiiIivA1dk5tMD278GfGEyfNDejkfu/xgR62m89ZxUb7PGwL/LzL9vHLju119XmJrdFxGnZmZP3UHqZrH/ipP3FTtAZj4VEW+vM1BV7Vjur6NR6pPubdYYeDIi3szAKYARcS7ws3oj1ebdwB9FxE4aZ4i07Wcx+hWHRMSvZeZTABHx60yS3pwUIVspM/+g7gwTyB/TuNPdiRHxz8BPgQvrjVSb5XUH0IT0BRrv6m6mcRD0e8Cf1BupmnZcc+8E/gewmMa/rHtorKn11RqsBhFxKI0vV+kCfh14lsbR6rV15pImkoiYS+OdXQDfmyynUbdjuf8fYD3wjYGhi4ALM/OM+lLVIyL+FngaeJDG2SIAZOYXagslqSXasdy3ZOb84cbaQbueGSO1g3a8K+STEXFRREwZ+OciYE/doWpyX0S8re4QklqvHY/cjwf+HDidxpr7fcDlmflPtQarwcAVib9J44PUtr5aVypNO5b714GPDDq16fOT4aKEVouINw01npmPj3cWSa3VdqdC0rgo4al9G5n588lyUUKrWeJSudpxzf2QgUuIgcl1UYIkVdWOpTZpL0qQpKrabs0dJu9FCZJUVVuWuySVrh3X3CWpeJa7JBXIcpekAlnuklQgy12SCvT/AbRxrevuS2knAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotTopUsageComparation(df_json, df_other, 'script_tld', 4)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 52 unique script_tld present on the non-json dataset and 89 on the JSONs\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEOCAYAAACZ2uz0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEc1JREFUeJzt3XuMlfWZwPHvIwwdrYoU0VVxHUzwhiDi1LWXrUZstGorul5KsJLFlTRZb62ogJs07Xa9xbTWZmti6ioastJlbWxs61JZTG1i3DLjKCD1EtfLiNLRVVoVKeizf8yBcBmcw9xe5jffT0Jmzjvve84DM3x5+c17zkRmIkka/PaoegBJUt8w6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYUYPpAPtv/++2dTU9NAPqQkDXotLS1vZeaY7vYb0KA3NTWxfPnygXxISRr0IuKVevZzyUWSCmHQJakQBl2SCjGga+iSdg8bN26kvb2dDz/8sOpRtJXGxkbGjh1LQ0NDj4436NIQ1N7ezj777ENTUxMRUfU4AjKTt99+m/b2dsaNG9ej+3DJRRqCPvzwQ0aPHm3MdyMRwejRo3v1vyaDLg1Rxnz309vPiUGXpEK4ht6Fprm/rHqEYrx881lVj6A69PXXfHef91NOOYV58+Zx+umnb9l2++238/zzz/OTn/xkp8ftvffevPfee30y43333cett95KZpKZzJo1izlz5vTJfW924403Mn/+/D69z0/iGbqkATd9+nQeeOCBbbY98MADTJ8+fUAe/9e//jW33347S5YsYdWqVbS2tjJy5Mg+f5wbb7yxz+/zkxh0SQPu/PPP5+GHH2bDhg0AvPzyy6xZs4YvfvGLvPfee0ydOpUpU6YwceJEHnrooR2Of+yxxzj77LO33L788su59957AWhpaeHkk0/mhBNO4PTTT+eNN97Y4fibbrqJ2267jYMPPhjovFzwsssuA6CtrY2TTjqJSZMmce655/LOO+8Anf+r2PzSJW+99RabX5fq3nvv5bzzzuOMM85g/PjxXHfddQDMnTuX9evXM3nyZGbMmMH777/PWWedxXHHHcexxx7LokWL+uBPclsGXdKAGz16NCeeeCKPPPII0Hl2ftFFFxERNDY28vOf/5zW1laWLVvGNddcQ2bWdb8bN27kiiuuYPHixbS0tDBr1ixuuOGGHfZbuXIlJ5xwQpf3cckll3DLLbfwzDPPMHHiRL773e92+7htbW0sWrSIFStWsGjRIl577TVuvvlm9txzT9ra2li4cCGPPPIIBx98ME8//TQrV67kjDPOqOv3tCsMuqRKbL3ssvVyS2Yyf/58Jk2axGmnncbrr7/O2rVr67rP5557jpUrV/LlL3+ZyZMn8/3vf5/29va6Z1q3bh3vvvsuJ598MgAzZ87kt7/9bbfHTZ06lZEjR9LY2MgxxxzDK6/s+FpaEydO5NFHH+X666/n8ccf75
clHoMuqRLTpk1j6dKltLa2sn79eqZMmQLAwoUL6ejooKWlhba2Ng488MAdrs0ePnw4H3/88Zbbmz+emUyYMIG2tjba2tpYsWIFS5Ys2eGxJ0yYQEtLyy7Nu/Vjbj/Ppz71qS3vDxs2jE2bNu1w/BFHHEFLSwsTJ05k3rx5fO9739ulx6+HQZdUib333ptTTjmFWbNmbfPN0HXr1nHAAQfQ0NDAsmXLujzbPeyww3j22WfZsGED69atY+nSpQAceeSRdHR08MQTTwCdSzCrVq3a4fh58+Zx3XXX8eabbwKwYcMG7rjjDkaOHMmoUaN4/PHHAbj//vu3nK03NTVt+Udg8eLFdf0eGxoa2LhxIwBr1qxhr7324uKLL2bOnDm0trbWdR+7wssWJVV2een06dM577zztrniZcaMGXz1q1+lubmZyZMnc9RRR+1w3KGHHsqFF17IpEmTGD9+PMcffzwAI0aMYPHixVx55ZWsW7eOTZs2cfXVVzNhwoRtjj/zzDNZu3Ytp512GplJRDBr1iwAFixYwDe/+U0++OADDj/8cO655x4A5syZw4UXXsj999/PqaeeWtfvb/bs2UyaNIkpU6ZwySWXcO2117LHHnvQ0NDAnXfe2aM/s08S9X6zoS80NzfnYPgBF16H3ne8Dn33tHr1ao4++uiqx1AXuvrcRERLZjZ3d6xLLpJUCIMuSYUw6NIQNZDLrapPbz8nBl0aghobG3n77beN+m5k8+uhNzY29vg+vMpFGoLGjh1Le3s7HR0dVY+irWz+iUU9ZdClIaihoaHHPxVHuy+XXCSpEAZdkgph0CWpEHUFPSK+FRGrImJlRPx7RDRGxLiIeDIiXoiIRRExor+HlSTtXLdBj4hDgCuB5sw8FhgGfB24BfhhZo4H3gEu7c9BJUmfrN4ll+HAnhExHNgLeAM4Fdj8kmMLgGl9P54kqV7dBj0zXwduA16lM+TrgBbg3czc/KK/7cAhXR0fEbMjYnlELPeaV0nqP/UsuYwCzgHGAQcDnwa+0sWuXT7lLDPvyszmzGweM2ZMb2aVJH2CepZcTgP+NzM7MnMj8CDweWC/2hIMwFhgTT/NKEmqQz1BfxU4KSL2iogApgLPAsuA82v7zAR2/NHckqQBU88a+pN0fvOzFVhRO+Yu4Hrg2xHxIjAauLsf55QkdaOu13LJzO8A39lu80vAiX0+kSSpR3ymqCQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiEMuiQVwqBLUiHqCnpE7BcRiyPiDxGxOiI+FxGfiYjfRMQLtbej+ntYSdLO1XuG/iPgkcw8CjgOWA3MBZZm5nhgae22JKki3QY9IvYFvgTcDZCZf8nMd4FzgAW13RYA0/prSElS9+o5Qz8c6ADuiYinIuKnEfFp4MDMfAOg9vaArg6OiNkRsTwilnd0dPTZ4JKkbdUT9OHAFODOzDweeJ9dWF7JzLsyszkzm8eMGdPDMSVJ3akn6O1Ae2Y+Wbu9mM7Ar42IgwBqb//YPyNKkurRbdAz803gtYg4srZpKvAs8AtgZm3bTOChfplQklSX4XXudwWwMCJGAC8Bf0/nPwY/i4hLgVeBC/pnRElSPeoKema2Ac1dfGhq344jSeopnykqSYUw6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYUw6JJUCIMuSYWo9+VzJe0Gmub+suoRivLyzWdVPUKf8gxdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEA
Zdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgph0CWpEAZdkgpRd9AjYlhEPBURD9duj4uIJyPihYhYFBEj+m9MSVJ3duUM/Spg9Va3bwF+mJnjgXeAS/tyMEnSrqkr6BExFjgL+GntdgCnAotruywApvXHgJKk+tR7hn47cB3wce32aODdzNxUu90OHNLVgRExOyKWR8Tyjo6OXg0rSdq5boMeEWcDf8zMlq03d7FrdnV8Zt6Vmc2Z2TxmzJgejilJ6s7wOvb5AvC1iDgTaAT2pfOMfb+IGF47Sx8LrOm/MSVJ3en2DD0z52Xm2MxsAr4O/HdmzgCWAefXdpsJPNRvU0qSutWb69CvB74dES/SuaZ+d9+MJEnqiXqWXLbIzMeAx2rvvwSc2PcjSZJ6wmeKSlIhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFaLboEfEoRGxLCJWR8SqiLiqtv0zEfGbiHih9nZU/48rSdqZes7QNwHXZObRwEnAP0bEMcBcYGlmjgeW1m5LkirSbdAz843MbK29/2dgNXAIcA6woLbbAmBafw0pSereLq2hR0QTcDzwJHBgZr4BndEHDtjJMbMjYnlELO/o6OjdtJKknao76BGxN/CfwNWZ+ad6j8vMuzKzOTObx4wZ05MZJUl1qCvoEdFAZ8wXZuaDtc1rI+Kg2scPAv7YPyNKkupRz1UuAdwNrM7MH2z1oV8AM2vvzwQe6vvxJEn1Gl7HPl8AvgGsiIi22rb5wM3AzyLiUuBV4IL+GVGSVI9ug56ZvwNiJx+e2rfjSJJ6ymeKSlIhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhDLokFcKgS1IhehX0iDgjIp6LiBcjYm5fDSVJ2nU9DnpEDAP+FfgKcAwwPSKO6avBJEm7pjdn6CcCL2bmS5n5F+AB4Jy+GUuStKt6E/RDgNe2ut1e2yZJqsDwXhwbXWzLHXaKmA3Mrt18LyKe68Vjalv7A29VPcQniVuqnkAV2e2/NmFQfX0eVs9OvQl6O3DoVrfHAmu23ykz7wLu6sXjaCciYnlmNlc9h7Q9vzar0Zsll98D4yNiXESMAL4O/KJvxpIk7aoen6Fn5qaIuBz4L2AY8G+ZuarPJpMk7ZLeLLmQmb8CftVHs2jXuZSl3ZVfmxWIzB2+jylJGoR86r8kFcKgS1IhDLqkXomIPSLi81XPIdfQB42IGANcBjSx1TezM3NWVTNJm0XEE5n5uarnGOp6dZWLBtRDwOPAo8BHFc8ibW9JRPwd8GB6llgZz9AHiYhoy8zJVc8hdSUi/gx8ms6TjfV0vjRIZua+lQ42xLiGPng8HBFnVj2E1JXM3Ccz98jMhszct3bbmA8wz9AHia3OgP4CbKxt9gxIu4WICGAGMC4z/zkiDgUOysz/qXi0IcWgS+q1iLgT+Bg4NTOPjohRwJLM/GzFow0pflN0EImIrwFfqt18LDMfrnIeaSt/k5lTIuIpgMx8p/aifRpArqEPEhFxM3AV8Gzt11W1bdLuYGPtx1ImbLnM9uNqRxp6XHIZJCLiGWByZn5cuz0MeCozJ1U7mQQRMQO4CJgCLADOB/4pM/+j0sGGGJdcBpf9gP+rvT+yykGkrWXmwohoAabSec
nitMxcXfFYQ45BHzxuAp6KiGV0/oX5EjCv2pGkbbwA/IlaVyLirzPz1WpHGlpcchlEIuIg4LN0Bv3JzHyz4pEkACLiCuA7wFo6n1y0+YlFLgkOIIM+SETEF4C2zHw/Ii6mc63yR5n5SsWjSUTEi3Re6fJ21bMMZV7lMnjcCXwQEccB1wKvAPdVO5K0xWvAuqqHGOpcQx88NmVmRsQ5wB2ZeXdEzKx6KA1tEfHt2rsvAY9FxC+BDZs/npk/qGSwIcqgDx5/joh5wDeAv61dtthQ8UzSPrW3r9Z+jaj9gto16Ro4rqEPEhHxV8B04PeZ+bvaa2Wckpn3VzyaRERcsP01511tU/8y6Lu52otybf4kRe1t1t7fALwI3JCZSysYTwIgIlozc0p329S/XHLZzWXmPjv7WG3Z5VhgYe2tNKAi4ivAmcAhEXHHVh/aF9hUzVRDl0EfxDLzI+DpiPhx1bNoyFoDLAcuAJ6n83+PH9F5Pfq3KpxrSHLJRVKPRUQD8C/APwAv07kUeChwDzA/Mzfu/Gj1Na9Dl9QbtwKjgMMyc0pmHg8cTudrDd1W6WRDkGfoknosIl4Ajtj+B0PXvr/zh8wcX81kQ5Nn6JJ6I7ePeW3jR3gd+oAz6JJ649mIuGT7jbXXG/pDBfMMaS65SOqxiDgEeBBYD7TQeVb+WWBP4NzMfL3C8YYcgy6p1yLiVGACnVe5rPKJbtUw6JJUCNfQJakQBl2SCmHQJakQBl2SCmHQJakQ/w93tLJrAKtZgwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotUniqueValuesComparation(df_a_json, df_a_other, 'script_tld')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
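<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>json</th>\n", + "      <th>other</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>com</th>\n", + "      <td>0.705027</td>\n", + "      <td>0.627121</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>net</th>\n", + "      <td>0.150878</td>\n", + "      <td>0.190486</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>ru</th>\n", + "      <td>0.055486</td>\n", + "      <td>0.002650</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>biz</th>\n", + "      <td>0.001498</td>\n", + "      <td>0.041437</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>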
" + ], + "text/plain": [ + " json other\n", + "com 0.705027 0.627121\n", + "net 0.150878 0.190486\n", + "ru 0.055486 0.002650\n", + "biz 0.001498 0.041437" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEHCAYAAABV4gY/AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFK9JREFUeJzt3X+QndV93/H31/phNYZCAusOaGWkJgpYLFigRRJWRwLHFDCOFDvEloodM8WoqSsT1zVFNh1KlXbs2K5/NJEzlhNpTDyScJQaZKyOZmLMb0x3BSIgEbmyIsxGxFnWgA0MRiLf/rEr5frqSvvs7l1d3aP3a2Zn7jnPuc/97pXms+ee58eNzESSVJY3tLoASVLzGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAk1s1QuffvrpOX369Fa9vCS1pW3btj2XmR3DjWtZuE+fPp3e3t5WvbwktaWIeLrKOJdlJKlAhrskFchwl6QCtWzNXZKq2L9/P319fbz66qutLuWYmjJlCp2dnUyaNGlUzzfcJR3X+vr6OPnkk5k+fToR0epyjonMZGBggL6+PmbMmDGqfbgsI+m49uqrr3LaaaedMMEOEBGcdtppY/q0YrhLOu6dSMF+0Fh/50rhHhFXRMSuiNgdESsbbP9iRGwf+vlBRLwwpqokSWMy7Jp7REwAVgOXAX1AT0RszsydB8dk5n+sGf9R4IJxqFWSmL7yO03d397PXFVp3Nvf/nYeeuihpr72eKpyQHUusDsz9wBExEZgCbDzCOOXAf+1OeWNTLP/0aH6P7yksrVTsEO1ZZmpwDM17b6hvsNExFnADODuI2xfHhG9EdHb398/0lolqWVOOukknn32WRYuXMjs2bPp6uri/vvvB2DDhg2cd955dHV1cdNNN/3Cc26++Wbe9ra3MX/+fH784x8fs3qrhHujVf08wtilwKbMfL3Rxsxck5ndmdnd0THsfW8k6biyfv16Lr/8crZv387jjz/O7Nmz2bdvHzfddBN3330327dvp6enhzvuuAOAl19+mfnz5/P444+zcOFCvva1rx2zWquEex8wrabdCew7wtilwIaxFiVJx6OLLrqIdevWceutt/LEE09w8skn09PTwyWXXEJHRwcTJ07kmmuu4b777gNg8uTJvPvd7wZgzpw57N2795jVWiXce4CZETEjIiYzGOCb6wdFxNnALwMPN7dESTo+LFy4kPvuu4+pU6fywQ9+kNtuu43MIy1kwKRJkw6d0jhhwgQOHDhwrEodPtwz8wCwAtgKPAV8MzN3RMSqiFhcM3QZsDGP9ptKUht7+umnefOb38z111/Pddddx6OPPsq8efO49957ee6553j99dfZsGEDixYtanWp1W4/kJlbgC11fbfUtW9tXlmS1FirzmCLCO655x4+97nPMWnSJE466SRuu+02zjjjDD796U9z6aWXkpm8613vYsmSJS2p8RfqbdVEu7u7O5v9ZR2eCimV56mnnuKtb31rS2sYGBjgwgsv5OmnK31PRtM0+t0jYltmdg/3XG8/IElHsW/fPi6++GI+8YlPtLqUEfGukJJ0FGeeeSY/+MEPWl3GiDlzl6QCGe6SVCDDXZIKZLhLUoE8oCqpvdx6SpP39+KonvbCCy+wfv16PvKRjwBwzz338PnPf5677rqrmdWNmjN3SRqFF
154ga985StN21+zb01guEtSBV/4whfo6uqiq6uLL33pS6xcuZIf/vCHzJ49mxtvvBGAl156iauvvppzzjmHa6655tB9Z7Zt28aiRYuYM2cOl19+Oc8++ywAl1xyCZ/61KdYtGgRX/7yl5tar8sykjSMbdu2sW7dOh555BEyk3nz5vGNb3yDJ598ku3btwODyzKPPfYYO3bs4Mwzz2TBggU8+OCDzJs3j49+9KPceeeddHR0cPvtt3PzzTezdu1aYPATwL333tv0mg13SRrGAw88wHve8x7e9KY3AfDe97730Bd11Jo7dy6dnZ0AzJ49m71793Lqqafy5JNPctlllwHw+uuvc8YZZxx6zvvf//5xqdlwl6RhVL0H1xvf+MZDjw/e4jczOffcc3n44cZ3Qz/4B6PZXHOXpGEsXLiQO+64g1deeYWXX36Zb33rWyxYsICf/exnwz737LPPpr+//1C479+/nx07dox3yc7cJbWZUZ66OBYXXngh1157LXPnzgXgwx/+MHPmzGHBggV0dXVx5ZVXctVVje8gO3nyZDZt2sQNN9zAiy++yIEDB/jYxz7GueeeO641e8vfYeyd8m+avs9W/OeU2tXxcMvfVvGWv5KkX2C4S1KBDHdJx70T8auZx/o7G+6SjmtTpkxhYGDghAr4zGRgYIApU6aMeh+VzpaJiCuALwMTgD/NzM80GPM+4FYggcczcxyOREo60XR2dtLX10d/f3+rSzmmpkyZcuiCqNEYNtwjYgKwGrgM6AN6ImJzZu6sGTMT+CSwIDOfj4g3j7oiSaoxadIkZsyY0eoy2k6VZZm5wO7M3JOZrwEbgSV1Y64HVmfm8wCZ+Q/NLVOSNBJVwn0q8ExNu2+or9avA78eEQ9GxPeHlnEOExHLI6I3InpPtI9YknQsVQn3aNBXf2RjIjATuARYBvxpRJx62JMy12Rmd2Z2d3R0jLRWSVJFVcK9D5hW0+4E9jUYc2dm7s/MvwV2MRj2kqQWqBLuPcDMiJgREZOBpcDmujF3AJcCRMTpDC7T7GlmoZKk6oYN98w8AKwAtgJPAd/MzB0RsSoiFg8N2woMRMRO4HvAjZk5MF5FS5KOrtJ57pm5BdhS13dLzeMEPj70I0lqMa9QlaQCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgSqFe0RcERG7ImJ3RKxssP3aiOiPiO1DPx9ufqmSpKomDjcgIiYAq4HLgD6gJyI2Z+bOuqG3Z+aKcahRkjRCVWbuc4HdmbknM18DNgJLxrcsSdJYVAn3qcAzNe2+ob56vx0Rfx0RmyJiWlOqkySNSpVwjwZ9Wdf+NjA9M88H/gr4esMdRSyPiN6I6O3v7x9ZpZKkyqqEex9QOxPvBPbVDsjMgcz8+VDza8CcRjvKzDWZ2Z2Z3R0dHaOpV5JUQZVw7wFmRsSMiJgMLAU21w6IiDNqmouBp5pXoiRppIY9WyYzD0TECmArMAFYm5k7ImIV0JuZm4EbImIxcAD4CXDtONYsSRrGsOEOkJlbgC11fbfUPP4k8MnmliZJGi2vUJWkAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqUKVwj4grImJXROyOiJVHGXd1RGREdDevREnSSA0b7hExAVgNXAnMApZFxKwG404GbgAeaXaRkqSRqTJznwvszsw9mfkasBFY0mDcHwCfBV5tYn2SpFGoEu5TgWdq2n1DfYdExAXAtMy862g7iojlEdEbEb39/f0jLlaSVE2VcI8GfXloY8QbgC8C/2m4HWXmmszszszujo6O6lVKkkakSrj3AdNq2p3Avpr2yUAXcE9E7
AXmA5s9qCpJrVMl3HuAmRExIyImA0uBzQc3ZuaLmXl6Zk7PzOnA94HFmdk7LhVLkoY1bLhn5gFgBbAVeAr4ZmbuiIhVEbF4vAuUJI3cxCqDMnMLsKWu75YjjL1k7GVJksbCK1QlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklSgSuEeEVdExK6I2B0RKxts/72IeCIitkfEAxExq/mlSpKqGjbcI2ICsBq4EpgFLGsQ3usz87zMnA18FvhC0yuVJFVWZeY+F9idmXsy8zVgI7CkdkBm/rSm+SYgm1eiJGmkJlYYMxV4pqbdB8yrHxQR/wH4ODAZeEdTqpMkjUqVmXs06DtsZp6ZqzPzV4GbgP/ScEcRyyOiNyJ6+/v7R1apJKmyKuHeB0yraXcC+44yfiPwW402ZOaazOzOzO6Ojo7qVUqSRqRKuPcAMyNiRkRMBpYCm2sHRMTMmuZVwP9rXomSpJEads09Mw9ExApgKzABWJuZOyJiFdCbmZuBFRHxTmA/8DzwofEsWpJ0dFUOqJKZW4AtdX231Dz+/SbXJUkaA69QlaQCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgSqFe0RcERG7ImJ3RKxssP3jEbEzIv46Ir4bEWc1v1RJUlXDhntETABWA1cCs4BlETGrbthjQHdmng9sAj7b7EIlSdVVmbnPBXZn5p7MfA3YCCypHZCZ38vMV4aa3wc6m1umJGkkqoT7VOCZmnbfUN+RXAf8n7EUJUkam4kVxkSDvmw4MOIDQDew6AjblwPLAd7ylrdULFGSNFJVZu59wLSadiewr35QRLwTuBlYnJk/b7SjzFyTmd2Z2d3R0TGaeiVJFVQJ9x5gZkTMiIjJwFJgc+2AiLgA+CqDwf4PzS9TkjQSwy7LZOaBiFgBbAUmAGszc0dErAJ6M3Mz8DngJOAvIgLgR5m5eBzrVru69ZRx2OeLzd+n1OaqrLmTmVuALXV9t9Q8fmeT65IkjYFXqEpSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEqXaGqE9P0ld9p+j73Tmn6LiU14MxdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqUKVwj4grImJXROyOiJUNti+MiEcj4kBEXN38MiVJIzFsuEfEBGA1cCUwC1gWEbPqhv0IuBZY3+wCJUkjV+WukHOB3Zm5ByAiNgJLgJ0HB2Tm3qFt/zgONUqSRqjKssxU4Jmadt9Q34hFxPKI6I2I3v7+/tHsQpJUQZVwjwZ9OZoXy8w1mdmdmd0dHR2j2YUkqYIq4d4HTKtpdwL7xqccSVIzVAn3HmBmRMyIiMnAUmDz+JYlSRqLYcM9Mw8AK4CtwFPANzNzR0SsiojFABFxUUT0Ab8DfDUidoxn0ZKko6v0HaqZuQXYUtd3S83jHgaXayRJxwGvUJWkAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCVznOXNDbTV36n6fvc+5mrmr5PlcOZuyQVyHCXpAIZ7pJUIMNdkgrkAVVJArj1lHHY54vN32dFztwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSpQpXCPiCsiYldE7I6IlQ22vzEibh/a/khETG92oZKk6oYN94iYAKwGrgRmAcsiYlbdsOuA5zPz14AvAn/Y7EIlSdVVmbnPBXZn5p7MfA3YCCypG7ME+PrQ403Ab0RENK9MSdJIVLn9wFTgmZp2HzDvSGMy80BEvAicBjzXjCIlNVDY5fIjMS73x5/S9F22VJVwbzQDz
1GMISKWA8uHmi9FxK4Kr99SAafT7D9S/+3E/VDj+9k8vpfN1Ubv51lVBlUJ9z5gWk27E9h3hDF9ETEROAX4Sf2OMnMNsKZKYceLiOjNzO5W11EK38/m8b1srtLezypr7j3AzIiYERGTgaXA5roxm4EPDT2+Grg7Mw+buUuSjo1hZ+5Da+grgK3ABGBtZu6IiFVAb2ZuBv4M+POI2M3gjH3peBYtSTq6Svdzz8wtwJa6vltqHr8K/E5zSztutNUyUhvw/Wwe38vmKur9DFdPJKk83n5AkgpkuEtSgQx3SSqQ4d5ARJwfEYsj4r0Hf1pdUzuLiBlV+qRjLSJ+GBG/V9d3V6vqaaZKZ8ucSCJiLXA+sAP4x6HuBP53y4pqf38JXFjXtwmY04Ja2lpErKPB1d+Z+W9bUE4J9gOXRsQ84N8N3T9raotragrD/XDzM7P+rpcahYg4BzgXOKXu088/Bwq7k8cxUzurnAK8h8OvGFd1r2Tm+yPiPwP3R8T7aPDHsx0Z7od7OCJmZebOVhdSgLOBdwOnAr9Z0/8z4PqWVNTmMvMva9sRsQH4qxaVU4IAyMzPRsQ2Bi/W/JXWltQcnudeJyIWAt8G/h74OYP/+JmZ57e0sDYWERdn5sOtrqNEEXE28J2h71LQCEXEb2bmt2vaZwEfysxVLSyrKZy5H24t8EHgCf5pzV1jMxAR3wX+RWZ2RcT5wOLM/O+tLqydDH1HwuvASzXdfw/c1JqK2ldEnJOZfwP8XUTUHw8q4oCqM/c6EXF3Zr6j1XWUJCLuBW4EvpqZFwz1PZmZXa2trP1ExKOZWR9GGqGIWJOZyyPie/ziGvvBT+ptnwHO3A/3NxGxnsGlmZ8f7MxMz5YZvV/KzP9b9+VcB1pVTJt7KCIuysyeVhfSzjLz4PdKvAv4CPCvGAz5+4E/aVVdzWS4H+6fMRjq/7qmz1Mhx+a5iPhVhmZIEXE18GxrS2pb7wD+fUTsBV7GY0Jj9XXgp8D/GmovA24D3teyiprEZRmNu4j4lwzece/twPPA3wLXZObTLS2sDQ0d8DuM7+XoRMTjmfm24frakTP3OhHRCfwRsIDBmeYDwO9nZl9LC2tvfwesA77H4GlmP2Xwy13a/oyEY80Qb7rHImJ+Zn4fYOhipgdbXFNTGO6HWwes55/uT/+Bob7LWlZR+7sTeAF4FC+40XEgIp5gcPI2CfjdiPjRUPssoIhrXFyWqRMR2zNz9nB9qs4zY3S8OdLy1kElfEJy5n645yLiA8CGofYyYKCF9ZTgoYg4LzOfaHUhEpQR3sNx5l4nIt4C/DFwMYMf0x4CbsjMH7W0sDYWETuBX2PwQKpX/UrHgOFeJyK+DnwsM58fav8K8Hnvujd6nuEhHXsuyxzu/IPBDpCZP4mIC1pZULszxKVjzy/rONwbIuKXDzaGZu7+EZTUVgytw/1PBg8AbmJwzf19wP9obUmSNDKuuTcQEbMYvMw7gO96b3dJ7cZwl6QCueYuSQUy3CWpQIa7JBXIcJekAhnuklSg/w9NYc34abuLYwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotTopUsageComparation(df_a_json, df_a_other, 'script_tld', 3)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/analyses/2019_03_aliamcami_value_analyses/isJson_Script_Domain_Output.ipynb b/analyses/2019_03_aliamcami_value_analyses/isJson_Script_Domain_Output.ipynb new file mode 100644 index 0000000..64b75e3 --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/isJson_Script_Domain_Output.ipynb @@ -0,0 +1,245 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start Dask" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/anaconda3/envs/overscripted/lib/python3.6/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. 
Please read https://msg.pyyaml.org/load for full details.\n", + " data = yaml.load(f.read()) or {}\n" + ] + } + ], + "source": [ + "import dask.dataframe as dd\n", + "from dask.diagnostics import ProgressBar\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import numpy as np\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Objective\n", + "\n", + "The objective of this notebook is to answer the following question: \n", + " - \"Are there a set of location domains that always produces a JSON?\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion\n", + "To answer \"Are there a set of location domains that always produces a JSON?\": yes, but such domains account for only about 0.17% of all script domains (19 of 11185).\n", + "About 31% of all the domains produce both types of values, JSON and non-JSON, so roughly 31% of the domains are capable of producing JSON; almost all of the other 69% never produce a JSON value. \n", + "\n", + "---\n", + "\n", + "There are 11185 different script domains, and 93.60% of them appear in multiple rows across the dataset. \n", + "\n", + "Most domains have only one type of value output: either JSON or non-JSON.\n", + "For the domains that have one type of output, 99% of the time they get the same keys_md5, but this figure is misleading because every non-JSON value has an empty key, so all of them compare equal. After filtering to valid JSON, only 19 domains have a single output type and valid JSON, and 63% of those always produce JSON with the same keys_md5." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "DIR = 'sample_0_prep/'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['is_json', 'location_domain', 'script_domain', 'keys_md5'], dtype='object')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = dd.read_parquet(DIR + 's0_domains_isJson_jsonKeys_md5_TLD.parquet',\n", + " engine='pyarrow',\n", + " columns=['is_json', 'location_domain', 'script_domain', 'keys_md5'])\n", + "df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "# Are there a set of location domains that always produces a JSON?\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 3.5s\n", + "The total number of different script_domain is 11185\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + " group_by_script_domain = df.compute().groupby(['script_domain'])\n", + " group_by_script_domain_len = len(group_by_script_domain)\n", + " print(\"The total number of different {} is {}\".format('script_domain', group_by_script_domain_len))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "agg = group_by_script_domain.agg({'is_json': ['nunique', 'sum'],\n", + " 'location_domain': ['nunique'],\n", + " 'keys_md5': ['nunique']})" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are a total of 10469(93.60%) script domains that appear in multiple rows\n" + ] + } + ], + "source": [ + "appear_multiple_times = 
agg['is_json'][group_by_script_domain['is_json'].count() > 1]\n", + "appear_multiple_times_len = len(appear_multiple_times)\n", + "agg_len = len(agg['is_json'])\n", + "print('There are a total of {0}({1:0.2f}%) script domains that appear in multiple rows'.format(\n", + " appear_multiple_times_len, \n", + " appear_multiple_times_len*100/agg_len))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def get_unique(agg, column, title=''):\n", + " agg_len = len(agg[column])\n", + " x = agg[agg[column]['nunique'] == 1]\n", + " x_len = len(x)\n", + " print(title + '{0} ({1:0.2f}%) unique {2},\n{3} ({4:0.2f}%) multiple {2}'.format(\n", + " x_len,\n", + " x_len*100/agg_len,\n", + " column, \n", + " agg_len - x_len,\n", + " (agg_len - x_len) * 100 / agg_len\n", + " ))\n", + " return x" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "JSON data:\n", + "7697 (68.82%) unique is_json,\n", + "3488 (31.18%) multiple is_json\n", + "\n", + "KEYS data: out of the unique jsons\n", + "7690 (99.91%) unique keys_md5,\n", + "7 (0.09%) multiple keys_md5\n" + ] + } + ], + "source": [ + "unique_is_json = get_unique(agg, 'is_json', 'JSON data:\\n')\n", + "unique_json_key = get_unique(unique_is_json, 'keys_md5', '\\nKEYS data: out of the unique jsons\\n')\n", + "#it may not be very accurate because every non-json value has an empty key, so they all compare equal" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are only 19 domains with unique valid json output\n", + "\n", + "KEYS data: out of the unique valid jsons\n", + "12 (63.16%) unique keys_md5,\n", + "7 (36.84%) multiple keys_md5\n" + ] + } + ], + "source": [ + "#Filter to only valid jsons\n", + "unique_is_json_jsons = unique_is_json[unique_is_json['is_json']['sum'] > 0]\n", + 
"print(\"There are only {} domains with unique valid json output\".format(len(unique_is_json_jsons)))\n", + "unique_key_jsons = get_unique(unique_is_json_jsons, 'keys_md5', '\\nKEYS data: out of the unique valid jsons\\n')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/analyses/2019_03_aliamcami_value_analyses/isJson_Value_Distribution.ipynb b/analyses/2019_03_aliamcami_value_analyses/isJson_Value_Distribution.ipynb new file mode 100644 index 0000000..f7b718a --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/isJson_Value_Distribution.ipynb @@ -0,0 +1,671 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start dask" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/anaconda3/envs/overscripted/lib/python3.6/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. 
Please read https://msg.pyyaml.org/load for full details.\n", + " data = yaml.load(f.read()) or {}\n" + ] + } + ], + "source": [ + "import dask.dataframe as dd\n", + "from dask.diagnostics import ProgressBar\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "# from dask.distributed import Client\n", + "# #Initializing client\n", + "# client = Client()\n", + "# client\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Parquet\n", + "Sample used: sample_0_prep/s0_domains_isjson.parquet\n", + " * This is the 10% sample with an added \"is_json\" column, which records whether the 'value' column holds valid JSON. \n", + " * This sample can be obtained by running 'jsJson_dataPrep.ipynb'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
value_lenis_json
08False
\n", + "
" + ], + "text/plain": [ + " value_len is_json\n", + "0 8 False" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = dd.read_parquet('sample0_prep/s0_domains_isjson.parquet', engine=\"pyarrow\", columns=['value_len', 'is_json'])\n", + "df.head(1)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 1.7s\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + " df = df.compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Values distribution: " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The vast majority of the values are small, as seen in the graph below." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "%matplotlib inline\n", + "df['value_len'].plot(kind='hist', legend=True, logy=True, bins=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Type distribution\n", + "The non-json values are found mainly withing the smaller values.\n", + " - Orange bar: non-json values\n", + " - Blue bars: json values" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'density')" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEMCAYAAADu7jDJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFvpJREFUeJzt3X2MZXd93/H3J2sGmoSYZ+p6vdl1xnHjUlK8IwNNQSSpwSQ7WFCr8WIUirde8eCqUVRF60aqiVq0PKgJD7YCm2KMo8TGcRx1F5ZuEZAaVRbsGgjsylm8cU08eJM1eTAOSAGTb/+4Z+HuMA9359wzd+bM+yVdzT2/OQ/fe7zXn/n9fueem6pCkqSV+qFJFyBJWt8MEklSKwaJJKkVg0SS1IpBIklqxSCRJLVikEiSWjFIJEmtrMsgSfKyJJ9J8v4kL5t0PZK0ka16kCS5JcmpJEfntV+R5HiSE0n2LLObAv4OeAow11WtkqTlZbVvkZLkpQxC4Laqel7Ttgn4CnA5g2A4DOwENgF75+3iWuDrVfUPSZ4L/GZVXbPUMZ/1rGfV1q1bx/o6JKnv7rvvvq9X1bOXW++c1ShmWFXdk2TrvObLgBNV9SBAkjuAK6tqL7Bjid39DfDk5Y65detWjhw5srKCJWmDSvLVUdZb9SBZxPnAw0PLc8ALF1s5yWuAVwBPA25aZJ3dwG6ALVu2jK1QSdKZ1kqQZIG2Rcfcqupu4O6ldlhV+5KcBGanpqa2t6xPkrSItXLV1hxwwdDyZuCRtjutqgNVtfvcc89tuytJ0iLWSpAcBi5Ksi3JFHA1sL/tTpPMJtn32GOPtS5QkrSwSVz+eztwL3Bxkrkku6rqCeB64BBwP3BnVR1reyx7JJLUvUlctbVzkfaDwMFxHivJLDA7PT09zt1KkoaslaGtTtgjkaTu9TpInCORpO6tlct/O1FVB4ADMzMz1411x29doIfzVsNK0sbU6yAZl617PnbG8kNPmVAhkrQGObQlSWql10HiZLskda/XQSJJ6l6vg8ShLUnqXq+DxKEtSeper4NEktQ9g0SS1Eqvg8Q5EknqXq+DxDkSSeper4NEktQ9g0SS1IpBIklqxSCRJLXS6yDxqi1J6l6vg8SrtiSpe70OEklS9wwSSVIrBokkqRWDRJLUyrr8zvYkPwT8V+DHgCNV9eEJlyRJG9aq90iS3JLkVJKj89qvSHI8yYkke5bZzZXA+cB3gLmuapUkLW8SPZJbgZ
uA2043JNkE3AxcziAYDifZD2wC9s7b/lrgYuDeqvpAkruAT65C3ZKkBax6kFTVPUm2zmu+DDhRVQ8CJLkDuLKq9gI75u8jyRzw7Wbxu91VK0lazlqZbD8feHhoea5pW8zdwCuSvA+4Z6EVkuxOciTJkUcffXR8lUqSzrBWJtuzQFsttnJVfQvYtdQOq2pfkpPA7NTU1PaW9UmSFrFWeiRzwAVDy5uBR9ru1FukSFL31kqQHAYuSrItyRRwNbC/7U69aaMkdW8Sl//eDtwLXJxkLsmuqnoCuB44BNwP3FlVx9oeyx6JJHVvEldt7Vyk/SBwcJzHSjILzE5PT49zt5KkIWtlaKsT9kgkqXu9DhLnSCSpe70OEnskktS9XgeJJKl7vQ4Sh7YkqXu9DhKHtiSpe70OEklS93odJA5tSVL3eh0kDm1JUvd6HSSSpO4ZJJKkVnodJM6RSFL3eh0kzpFIUvd6HSSSpO4ZJJKkVgwSSVIrBokkqZVeB4lXbUlS93odJF61JUnd63WQSJK6Z5BIkloxSCRJrRgkkqRWzpl0ASuR5CXANQzqv6Sq/uWES5KkDWvVeyRJbklyKsnRee1XJDme5ESSPUvto6o+U1VvBD4KfLjLeiVJS5tEj+RW4CbgttMNSTYBNwOXA3PA4ST7gU3A3nnbX1tVp5rnrwX+fdcFS5IWt+pBUlX3JNk6r/ky4ERVPQiQ5A7gyqraC+xYaD9JtgCPVdU3OixXkrSMtTLZfj7w8NDyXNO2lF3Ahxb7ZZLdSY4kOfLoo4+OoURJ0kLWymR7FmirpTaoqhuX+f2+JCeB2ampqe1tipMkLW6t9EjmgAuGljcDj7TdqbdIkaTurZUgOQxclGRbkingamB/251600ZJ6t6qD20luR14GfCsJHPAjVX1wSTXA4cYXKl1S1Uda3usqjoAHJiZmbmu7b7OxtY9H/uBtofe/ourWYIkrZpJXLW1c5H2g8DBcR4rySwwOz09Pc7dSpKGrJWhrU44RyJJ3et1kDhHIknd63WQ2CORpO71OkgkSd3rdZA4tCVJ3et1kDi0JUnd63WQSJK61+sgcWhLkrrX6yBxaEuSurdW7v67oXlLFUnrWa97JJKk7vU6SJwjkaTu9TpInCORpO71OkgkSd0zSCRJrRgkkqRWDBJJUiu9DhKv2pKk7vU6SLxqS5K6N1KQJNmRpNehI0lamVHD4WrggSTvTPJTXRYkSVpfRgqSqnod8ALgz4APJbk3ye4kT+20OknSmjfycFVVfQP4Q+AO4Dzg1cDnk/yHjmqTJK0Do86RvCrJHwGfAp4EXFZVrwR+GvhPHda3WD1bkuxPckuSPat9fEnS943aI7kK+K2qen5VvauqTgFU1beAa8/mgM3//E8lOTqv/Yokx5OcGCEcfhL4WFVdC1xyNseXJI3XqEFysqruGW5I8g6AqvrkWR7zVuCKefvaBNwMvJJBMOxMckmSf57ko/MezwG+AFyd5FPAp8/y+JKkMRo1SC5foO2VKzlgE0h/Pa/5MuBEVT1YVd9mMA9zZVV9uap2zHucAt4A3FhVPwf4DVCSNEFLfkNikjcBbwZ+IsmXhn71VOD/jrGO84GHh5bngBcusf7/At6a5LXAQwutkGQ3sBtgy5Yt46lSkvQDlvuq3d8HPg7sBYbnLR6vqvm9ijayQFsttnJVHWUwb7OoqtqX5CQwOzU1tb1lfZKkRSw3tFVV9RDwFuDxoQdJnjHGOuaAC4aWNwOPtN2pt0iRpO6N0iPZAdzHoIcw3HMo4MIx1XEYuCjJNuBrDD5J/9q2O00yC8xOT0+33dW6t3XPx36g7aG3O70kqb0leyRVtaP5ua2qLmx+nn6sKESS3A7cC1ycZC7Jrqp6ArgeOATcD9xZVcdWsv959dsjkaSOLdcjASDJzwBfrKpvJnkdcCnw7qr687M9YFXtXKT9IHDwbPe3FHskktS9US///W3gW0l+Gvg14KvA73ZW1ZjYI5Gk7o3UIw
GeqKpKciXwnqr6YJLXd1nYOPS1R+J8h6S1ZNQeyeNJbgBeB3ys+ST6k7orazzskUhS90YNkl8C/h7YVVV/weADhO/qrCpJ0rox0tBWEx6/ObT858BtXRU1Ln0d2pKktWTU28i/JskDSR5L8o0kjyf5RtfFteXQliR1b9TJ9ncCs1V1f5fFSJLWn1HnSP5yPYZIktkk+x577LFJlyJJvTVqkBxJ8pEkO5thrtckeU2nlY2BQ1uS1L1Rh7Z+DPgW8PKhtgLuHntFkqR1ZdSrtt7QdSGSpPVp1Htt/SSD26Q8t6qel+T5wKuq6r91Wl1LXv67cn56XtKoRp0j+R3gBuA7AFX1JQa3el/TnCORpO6NGiQ/XFWfm9f2xLiLkSStP6MGydeT/ATN198muQo42VlVkqR1Y9Srtt4C7AP+aZKvAf8PuKazqiRJ68aSQZLkV4cWDwKfZtCL+Sbwbxi6/5YkaWNabmjrqc1jBngT8HTgacAbgUu6La09P9kuSd1bskdSVb8BkOR/A5dW1ePN8luBP+i8upaq6gBwYGZm5rpJ16IzeXmx1B+jTrZvAb49tPxtYOvYq5EkrTujTrb/LvC5JH/E4MqtVwMf7qwqrTv2MKSNa9RbpLwtyceBlzRNb6iqL3RXliRpvRi1R0JVfR74fIe1SJLWoVHnSNaUJJckuTPJbzcfjpQkTcjIPZJxSXILsAM4VVXPG2q/AngPsAn4H1X19iV280rgfVX1mST7gbu6rFlrg/Mw0tq06kEC3ArcBNx2uiHJJuBm4HJgDjjcBMQmYO+87a9lMPl/Y5JXAc9chZolSYtY9SCpqnuSbJ3XfBlwoqoeBEhyB3BlVe1l0HtZyFuaAPLLtSRpgibRI1nI+cDDQ8tzwAsXW7kJov8M/AjwrkXW2Q3sBtiyZcuYypQkzbdWgiQLtNViK1fVQzQhscQ6+5KcBGanpqa2tytPkrSYtXLV1hxwwdDyZuCRtjv1i60kqXtrJUgOAxcl2ZZkisG3L+5vu1Nv2ihJ3Vv1IElyO3AvcHGSuSS7quoJ4HrgEHA/cGdVHWt7LHskktS9SVy1tXOR9oMMvvNkbJLMArPT09Pj3K0kachaGdrqhD0SSeper4PEORJJ6l6vg8QeiSR1r9dBIknqXq+DxKEtSeper4PEoS1J6l6vg0SS1L1eB4lDW5LUvbVy08ZOVNUB4MDMzMx1k65Fa59fnLU2+N9h/el1j0SS1D2DRJLUSq+DxDkSSeper4PEy38lqXu9DhJJUvcMEklSKwaJJKkVg0SS1Eqvg8SrtiSpe70OEq/akqTu9TpIJEnd6/W9tiRtDN6fa7LskUiSWjFIJEmtrPkgSXJhkg8muWuo7UeSfDjJ7yS5ZpL1SdJG12mQJLklyakkR+e1X5HkeJITSfYstY+qerCqds1rfg1wV1VdB7xqzGVLks5C15PttwI3AbedbkiyCbgZuByYAw4n2Q9sAvbO2/7aqjq1wH43A19unn93zDWrJ5yAlVZHp0FSVfck2Tqv+TLgRFU9CJDkDuDKqtoL7Bhx13MMwuSLrIPhOUnqs0lc/ns+8PDQ8hzwwsVWTvJM4G3AC5Lc0ATO3cBNSX4ROLDIdruB3QBbtmwZU+mSNiJ7t0ubRJBkgbZabOWq+ivgjfPavgm8YamDVNW+JCeB2ampqe0rKVSStLxJDAvNARcMLW8GHuniQN4iRZK6N4kgOQxclGRbkingamB/Fwfypo2S1L2uL/+9HbgXuDjJXJJdVfUEcD1wCLgfuLOqjnVxfHskktS9rq/a2rlI+0HgYJfHhkGPBJidnp7u+lCStGH1+tJZeySS1L1eB4lzJJLUvV7fRr6qDgAHZmZmrpt0Leqn+Z8v8LMF2ojskUiSWrFHIq1hfqJa60GveySSpO4ZJJKkVnodJM6RSFL3eh0kfo5EkrrX6yCRJHXPIJEktdLrIHGORJK61+sgcY5EkrrX6yCRJHXPIJEktWKQSJJaMUgkSa30Ok
i8akuSutfrIPGqLUnqXq9vIy9ped6qXm31ukciSeqeQSJJasUgkSS1suaDJMmFST6Y5K6l2iRJk9FpkCS5JcmpJEfntV+R5HiSE0n2LLWPqnqwqnYt1yZJmoyur9q6FbgJuO10Q5JNwM3A5cAccDjJfmATsHfe9tdW1amOa5RWjVdIaTHr+d9Gp0FSVfck2Tqv+TLgRFU9CJDkDuDKqtoL7OiyHknS+E1ijuR84OGh5bmmbUFJnpnk/cALktywWNsC2+1OciTJkUcffXSM5UuShk3iA4lZoK0WW7mq/gp443JtC2y3L8lJYHZqamr7SgqVJC1vEj2SOeCCoeXNwCNdHMhbpEhS9yYRJIeBi5JsSzIFXA3s7+JA3rRRkrrX9eW/twP3AhcnmUuyq6qeAK4HDgH3A3dW1bEujm+PRJK61/VVWzsXaT8IHOzy2DDokQCz09PTXR9KkjasNf/J9jbskUhS93odJM6RSFL3ev19JFV1ADgwMzNz3aRrkaRxWkufhLdHIklqpddB4hyJJHWv10EiSeqeQSJJaqXXQeIciSR1r9dB4hyJJHWv10EiSeqeQSJJaqXXQeIciSR1L1WLfqdUbyR5FPjqCjZ9FvD1MZez3nlOzuT5OJPn40zr/Xz8eFU9e7mVNkSQrFSSI1U1M+k61hLPyZk8H2fyfJxpo5yPXg9tSZK6Z5BIkloxSJa2b9IFrEGekzN5Ps7k+TjThjgfzpFIklqxRyJJasUgWUSSK5IcT3IiyZ5J17MSSW5JcirJ0aG2ZyT5RJIHmp9Pb9qT5L3N6/1SkkuHtnl9s/4DSV4/1L49yZebbd6bJCs9xmpIckGSTye5P8mxJP9xI5+TJE9J8rkkf9Kcj99o2rcl+WxT60eSTDXtT26WTzS/3zq0rxua9uNJXjHUvuD7aCXHWC1JNiX5QpKPrrTWPp2PkVSVj3kPYBPwZ8CFwBTwJ8Alk65rBa/jpcClwNGhtncCe5rne4B3NM9/Afg4EOBFwGeb9mcADzY/n948f3rzu88BL262+TjwypUcYxXPx3nApc3zpwJfAS7ZqOekOeaPNs+fBHy2qeFO4Oqm/f3Am5rnbwbe3zy/GvhI8/yS5j3yZGBb897ZtNT76GyPscr/Tn4V+H3goyuptW/nY6RzNukC1uKj+R/BoaHlG4AbJl3XCl/LVs4MkuPAec3z84DjzfMPADvnrwfsBD4w1P6Bpu084E+H2r+33tkeY4Ln5n8Cl3tOCuCHgc8DL2TwAbpzmvbvvReAQ8CLm+fnNOtl/vvj9HqLvY+abc7qGKt4HjYDnwR+DvjoSmrt0/kY9eHQ1sLOBx4eWp5r2vrguVV1EqD5+ZymfbHXvFT73ALtKznGqmuGCF7A4K/wDXtOmmGcLwKngE8w+Iv5b6vqiQXq+V6tze8fA57J2Z+nZ67gGKvl3cCvAf/QLK+k1j6dj5EYJAvLAm19v7xtsdd8tu0rOcaqSvKjwB8Cv1JV31hq1QXaenVOquq7VfUvGPwlfhnwU0vUM67zsdRrntj5SLIDOFVV9w03L1FPr8/H2TBIFjYHXDC0vBl4ZEK1jNtfJjkPoPl5qmlf7DUv1b55gfaVHGPVJHkSgxD5vaq6u2ne0OcEoKr+FvhjBnMkT0tyzgL1fK/W5vfnAn/N2Z+nr6/gGKvhZ4BXJXkIuIPB8Na7V1BrX87HyAyShR0GLmqupJhiMMm1f8I1jct+4PRVRq9nME9wuv2Xm6uIXgQ81gzBHAJenuTpzZVGL2cwfnsSeDzJi5ork3553r7O5hiroqnzg8D9VfWbQ7/akOckybOTPK15/o+Afw3cD3wauGqRWk+/hquAT9Vg8H4/cHVzhdE24CIGFx0s+D5qtjnbY3Suqm6oqs1VtbWp9VNVdc0Kau3F+Tgrk56kWasPBlfTfIXBmPGvT7qeFb6G24GTwHcY/GWzi8H46ieBB5qfz2jWDXBz83q/DMwM7eda4ETzeMNQ+wxwtNnmJr7/Ad
ezPsYqnY9/xWBY4EvAF5vHL2zUcwI8H/hCcz6OAv+lab+Qwf/4TgB/ADy5aX9Ks3yi+f2FQ/v69eY1HKe5Um2p99FKjrHK/1Zexvev2trw52O5h59slyS14tCWJKkVg0SS1IpBIklqxSCRJLVikEiSWjFIJEmtGCTSmCX5uzHv79YkVy2/pjQZBokkqRWDRFpGknckefPQ8luT3Jjkk0k+n8EXWV25wHYvO/3lSM3yTUn+XfN8e5L/k+S+JIdO34drhFoW3C7JHzd1fi7JV5K8pPULl0ZkkEjLuwP4paHlfwt8CHh1VV0K/Czw35v7ay2ruXHk+4Crqmo7cAvwtjFsd05VXQb8CnDjKLVI43DO8qtIG1tVfSHJc5L8E+DZwN8wuIfZbyV5KYPvrjgfeC7wFyPs8mLgecAnmuzZ1Oyv7Xan72Z8H4MvNJNWhUEijeYuBndf/ccMeijXMAiV7VX1nebW40+Zt80TnNnrP/37AMeq6sVnWcNy2/198/O7+N7WKnJoSxrNHQxu+30Vg1A5l8GXIH0nyc8CP77ANl8FLmluJ34u8PNN+3Hg2UleDIMhqyT/bIQaVrqd1Cn/apFGUFXHkjwV+FpVnUzye8CBJEcY3I7+TxfY5uEkdzK4TfsDDG7ZTlV9u7mc971NwJzD4AuUji1Tw4q2k7rmbeQlSa04tCVJasWhLWmNSHIzg+8NH/aeqvrQJOqRRuXQliSpFYe2JEmtGCSSpFYMEklSKwaJJKkVg0SS1Mr/B63LLKd83LcbAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.hist(\n", + " (df[df.is_json==True].value_len, df[df.is_json==False].value_len),\n", + " bins=25,\n", + " density=True,\n", + " label=['true', 'false'],\n", + "# color=['teal','orange'],\n", + ")\n", + "plt.yscale('log')\n", + "plt.xlabel('value_len')\n", + "plt.ylabel('density')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0, 0.5, 'density')" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.hist(\n", + " (df[df.is_json==True].value_len, df[df.is_json==False].value_len),\n", + " bins=25,\n", + " density=True,\n", + " label=['true', 'false'],\n", + "# color=['teal','orange'],\n", + ")\n", + "plt.xlabel('value_len')\n", + "plt.ylabel('density')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# JSON percentual by group\n", + "Here the orange is the percentual of non-JSON values found in each group, and the blue is the percentual of JSON values. \n", + "We can see that as we filter the data to bigger values the percentual of JSON values also increases. \n", + "\n", + "The gorups are: \n", + "- Original: all original data (sample 10%)\n", + "- Above_mean: original data filtered to only values above the mean\n", + "- Above_std: original data filtered to only values 1 std above the mean" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "def count_json(df):\n", + " trues = df.is_json[df.is_json == True].count()\n", + " falses = df.is_json[df.is_json == False].count()\n", + " total = df.is_json.count()\n", + " return trues/total, falses/total\n", + " \n", + "total_count = count_json(df)\n", + "total_mean = df.value_len.mean()\n", + "total_std = df.value_len.std()\n", + "\n", + "above_mean_count = count_json(df[df['value_len'] > total_mean])\n", + "above_std_count = count_json(df[df['value_len'] > (total_mean + total_std)])\n", + "\n", + "p1 = pd.DataFrame([total_count, above_mean_count, above_std_count],\n", + " columns= [ 'json', 'other'],\n", + " index=[ 'original', 'above_mean', 'above_std'])\n", + "plot = p1.plot(kind='bar')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# JSON percentual by bins" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "#Helper code to separate and calculate what needed\n", + "import math\n", + "def percetangeData(df):\n", + " bins=[]\n", + " trues=[]\n", + " falses=[]\n", + " \n", + " nbins = 10\n", + " minimum_value = min(df.value_len)\n", + " range_value = max(df.value_len) - minimum_value\n", + " step = math.ceil(range_value/nbins)\n", + " bin_max_range = minimum_value\n", + " def count_in_range(df):\n", + " f1 = df.value_len >= bin_max_range - step\n", + " f2 = df.value_len < bin_max_range\n", + " return len(df[f1 & f2])\n", + "\n", + " for x in range(0, nbins):\n", + " bin_max_range += step\n", + " range_count = count_in_range(df)\n", + " bins.append(str(bin_max_range)) #superior margin for each bin\n", + " if range_count == 0:\n", + " #If range_count is 0 then there is no counting to do for trues or falses, all 0. 
\n", + " trues.append(0)\n", + " falses.append(0)\n", + " else:\n", + " trues.append(count_in_range(df[df.is_json == True]) / range_count)\n", + " falses.append(count_in_range(df[df.is_json == False]) / range_count)\n", + "\n", + "# print('Bins:', bins)\n", + "# print('Trues: ', trues)\n", + "# print('Falses: ', falses)\n", + "# print(pd.DataFrame([bins, trues, falses], index= ['up to value', 'json%', 'non json%']))\n", + " return (bins, trues, falses)\n", + "\n", + "def plotPercentualComparison(df, title='Value type: Json X Other'):\n", + " bins, trues, falses = percetangeData(df)\n", + " width = 0.95\n", + " p1 = plt.bar(bins, trues, width=width)\n", + " p2 = plt.bar(bins, falses, bottom=trues, width=width)\n", + "\n", + " plt.ylabel('Scores')\n", + " plt.xlabel('Value_len')\n", + " plt.title(title)\n", + " plt.legend((p1[0], p2[0]), ('JSON', 'Other'))\n", + " idx = np.round(np.linspace(0, 10 - 1, 4)).astype(int)\n", + " plt.xticks(idx, [bins[i] for i in idx])\n", + "\n", + " return plt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## All values\n", + "If all data is divided in 10 bins and the percentage of NON-JSON values in each bin is painted orange, we have the following graph:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYUAAAEXCAYAAABCjVgAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzt3Xm8HfP9x/HXOzebLYKEkhtJLEGKWmKrfY8tqkWjlvjZWrUUtUQXW0vtaq+0KGpLQ9sglSolqqWiIkSEiOAKEgSNCEnu5/fHzB2Tk7ucXGfuieT9fDzOI7N8Z+Z75tyc95nvzHxHEYGZmRlAu2pXwMzMFh0OBTMzyzgUzMws41AwM7OMQ8HMzDIOBTMzyzgUzAomabykHapdj0qQdLikf1a7HlYch4KVTdKjkmZI6lQy/feSftnEMiFprRbWe5CkKZJUMr29pGmS9s5N6yOpXtJ15W5L0jmS/tBc+fS9zZY0M/e6L1f2J5JeS6fXSbq7ufeUFxFfj4hHyy2/qJDUO91H7atdF2s7DgUri6TewLZAAAMrvPo/AV2B7UumD0i392Bu2mHADGBQaThVwPERsWzutQ+ApMHAocAuEbEs0B94uMLbXuI4bBZNDgUr12HAk8DvgcGVXHFEzAaGpdso3ebtETG3ZNrPgDnAPpWsRzM2A0ZFxKsAEfFORAwtd+H0KGiXdHhzSWMkfSzpXUmX58oNTJuaPkyPXNYrWcepksZJ+kjS3ZI6N7G96yUNz41fJOnh0iOxdF47ST+T9Hp6VHarpOXT2aPTfz9Mj5C2yi13aXrU+JqkPXLTl5d0o6S3Jb0l6ZeSatJ5h0t6QtIVkj4Azil3H1rbcShYuQ4Dbk9fu0tapcLrvwXYX9JSkHy5kHzp39pQQNK2QC1wF42HSFGeBA6TdJqk/g1fcrl6DZF0f5nruhK4MiK6AGuSvA8k9QXuBE4CugMjgfskdcwteyDJ0VMfYEPg8Ca28WNgw/RLeFvgSGBwNN6nzeHpa0dgDWBZ4Jp03nbpv13TI6d/p+NbABOBbsDFwI25wLkFmAusBWwM7AYcldveFsBkYGXg/Cbqb1XkULAWSdoG6AUMi4hngFeB71VyGxHxBPAusF866UDg5YgYmys2GPhrRMwA7gD2kLRyBatxVforveH1i7RufwBOAHYHHgOmSRqSq/uFEbF346tcwBxgLUndImJmRDyZTv8u8EBEPBQRc4BLgaWAb+brFxFTI+ID4D5go8Y2EBGzgEOAy4E/ACdERF0T9TkYuDwiJkfETOBMkqa55pp2Xo+I30bEPJIQWBVYJf2hsAdwUkR8EhHTgCuAQbllp0bE1RExNyI+bWYbViUOBSvHYOBvEfFeOn4HFW5CSt3KF7/+DyX5wgEgPYI4gORIhfRX6xuUF05zgQ75CZIaxufkJp8YEV1zr583zIiI2yNiF5JzHz8AzpO0+8K8udSRQF/gJUlP506irwa8nttePfAm0CO37Du54Vkkv+obFRH/IflFLtKjkSbMt910uD3Q3JFgVo80gEjr0otkP7/dEKzADSRHBQ3ebGa9tghwKFiz0i/jA4HtJb0j6R3gZOAbkr5R4c3dCuyctl1vSRI+DfYDugDX5erRg/KakN4AepdM6wPMA95amApGxJyI+CMwDlh/YZZNl38lIg4i+aK8CBguaRlgKsmXKgBpc0zPha1fbvnjgE7pek9vpuh82wVWJwnRd0lO8i+MN4HPgG65YO0SEV/PlXG3zIs4h4K15FskX579SJorNgLWAx5n/i/kGkmdc698W3jHknnztck3iIjXgX+StK0/FBH5X8aDgZuADXL12BrYSNIGLWzrQWAdSYdK6iBpReACYHjJSexGpW3ze0laLj0xuwfwdeCplpZtZF2HSOqeHgl8mE6eR/Jrfi9JO6dHMT8m+YL9Vyu20Rf4JUkT0qHA6ZIabWoi2dcnK7nUd1mS/XJ3ul+mA/Uk5xpaFBF
vA38DLpPUJd1Xa0oqvarMFmEOBWvJYODmiHgjvermnfTL+hrg4Fzb8xDg09zrkdw6xpfM+79mtncLyS/X/AnmHsDOwK/zdUjPbzzI/E1ZC2wrbdveE/g+MA14AfgIOLZk29do/vsUnkmnfwz8hOSI40OSk6vHRsQ/0/r9RNJfm3lPeQOA8ZJmkpx0HhQRsyNiIsmX+NXAeyQn2feJiM/LXC9pXdqTnEe4KCKei4hX0rrfpsYv4b0JuI3kSqPXgNkk508amobOB55Im4O2LKMKhwEdgRdJLh0eTnLOwb4i5IfsmBVL0hvAIRExusXCZlXmIwWzAknqTnKJ6ZQqV8WsLIWFgqSb0pthXmhiviRdJWmSkhtyNimqLmbVIGkz4BXg6oh4o9r1MStHYc1HkrYDZgK3RsQCV2lI2pOk7XJPkhtaroyILQqpjJmZlaWwI4W0/fSDZorsSxIYkd7A01WST0iZmVVRNTuk6sH8N7LUpdPeLi0o6RjgGIBllllm03XXXbd1W5z6bOuW+zJW2xiA59/6qM03vUGP5auy3Wpue4MeSbc9S9L+9nte8rbdGs8888x7EdG9pXLVDIUFOueiiRtb0s7HhgL0798/xowZ07otntP6Hdpq5yR17T3kgTbf9JgL96rKdqu57TEX7gUsWfvb73nJ23ZrSHq95VLVvfqojuSOzQa1JHdXmplZlVQzFEaQ9Dyp9KaYj9I7Is3MrEoKaz6SdCewA9BNUh1wNmmnZBHxG5KugfcEJpF07tXcXa5mZtYGCguFtNOv5uYHcFxR2zezJVeXTu04YYsV6NW1A2r09GXrTZgwAYDfDmz7iyUbtt2czp07U1tbS4cOHVos2xg/Ds/MFjsnbLECm6y5Gu2XXg4t+MC5L2W92q4AzKn7sIWSldew7aZEBO+//z51dXX06dOnVdtwNxdmttjp1bVDIYGwqJPESiutxOzZs1u9DoeCmS12hJa4QGjwZd+3Q8HMzDI+p2Bmi72B1zxR0fVNKeMmsi3XqeVfE97g4nPO5D9PjEYSnTp15uLrb6Z29V787+OPuPCsMxj7dPKspo0224Ih513Ecl2W560332DPb36DM867iO/93zEAXPCz0xiww9YcfvjhFX0vpXykYGZWkFEj7mX6u+8w/KEnuOfv/+KK391Gly5JzwrnnHYitav35oEnnuWBJ56lR89enHv6j7JlV+zWnTtu/A1zPl+o5yx9aQ4FM7OCTJ/2Lt1WXoV27ZKv2lVW7UGXrl1547XJvPj8WI750WlZ2e+fdDrjxz3Lm1NeA2CFFVdi8222Y8TwO9u0zg4FM7OC7L7Ptxj99wc5cPdtufS8nzHhhXEATH7lJdbptwE1NV88rrympoZ1+m3Aqy9/cS/CET88mVuHXsu8efParM4OBTOzgqyyag/+8ujTnDjkLNq1E8cM2pen/vkYEU1cJZTMyEZrV+/F+httwsg//7HN6uwTzWZmBerYqRPb7Lgr2+y4Kyt1W5lHRj3AwUf8gJfGj6O+vj5rWqqvr2fihBdYY6115lv+qONP4cffH8ymW3yzTerrIwUzs4JMeP45pr2T9PNZX1/Pyy+NZ7UePVm9zxqs+/UNGXrVpVnZoVddynrrf4PV+6wx3zr6rNWXNfuuy+iHR7VJnX2kYGaLvRHHb12xdW3YQlcTAHPnzqVjx4588N50zj3jR3z+2WcArL/Rpgw6/GgAzr3kan511unsvc0mRAQbbroZ51xydaPrO+qEU/jugO0r9h6a41AwM6uwV1+eQG2vPmy94y5sveMujZbp0rUrv7pqaKPzevRcnXsf/nc2vk6/DRj7xgdlBdKX5VAwM6ugYbfdxJ03D+W0sy+odlVaxaFgZlZBBx56BAceekS1q9FqPtFsZmYZh4KZmWUcCmZmlnEomJlZxieazWyxt+HvelV2hed81GKRd99+iwt+ehqTX5lIfX092+2yO6f89DxefWUi0999m2132g2A6y+/kKWXXobBPzi
hsnVsJR8pmJlVWERw8tGHsePue3Hf488wYvQYZn3yCVdf/Asmjn+exx95qGLbqnRneT5SMDOrsP88MZpOnTrxre8eDCQ9oJ529vkM2GpD2rfvABGMffpJjjjuZABefWUiRx6wN29PrePgI4/l4CO+D8D9997NHTcNZe6cz1l/4025+5YbqampYdlll+WUU05h1KhRXHbZZWyzzTYVq7uPFMzMKmzSyy/Rb4ON5pu27HJdWK12dY4+8VR222c/ho16nAEDvw3AlFdf5vo/3MPt9z3MDVdcxJw5c5j8ykRG3fcnbvnTgwwb9Tg17Wq4/fbbAfjkk09Yf/31eeqppyoaCOAjBTOzyivpAjs/vbEus7fdaTc6dupEx06dWLFbdz54bxpPPfEYE8Y9x8F77wTA7NmzWW+NnkBy5PGd73ynkKo7FMzMKmzNvuvy95Ej5ps2838f887Ut6hpt2ADTceOnbLhdu3aMXfuPCJgnwMG8aMhZ2fzGvo+6ty583wP6KkkNx+ZmVXYFttsz+xPP+W+4XcBycngy37xcwYe8D1W6r4ysz6Z2fI6tt6Ovz8wgvffmw7ARzNm8Prrrxdab/CRgpktAcYdVbkv03J6KpXEFb+7jfN/eipDr7yE+vp6ttlpV0484+d8OmsWN137aw7cfdvsRHNj1uy7Lsed9lOOPfjb1NfX075DB24a+ht69arw5bUlHApmZgX42mq1XH3zXQtM79ipE3c88EiTy+W7zB4w8NvZyWj4IpBmzmz5SKO13HxkZmYZh4KZmWUcCma22AmCiKh2Nariy75vh4KZLXZe/3AOc2d9vMQFQ0Tw/vvv07lz51avwyeazWyxc/VTMzgB6NX1PUQjN5F9CRP+txQA7874tKLrXZhtN6dz587U1ta2ehsOBTNb7Hz8WT3nj36/kHVPuXAvAPYY8kAh6y9n20Vy85GZmWUKDQVJAyRNlDRJ0pBG5q8u6R+SnpU0TtKeRdbHzMyaV1goSKoBrgX2APoBB0nqV1LsZ8CwiNgYGARcV1R9zMysZUUeKWwOTIqIyRHxOXAXsG9JmQC6pMPLA1MLrI+ZmbWgyFDoAbyZG69Lp+WdAxwiqQ4YCTT6PDpJx0gaI2nM9OnTi6irmZlRbCg0dh1Y6UXDBwG/j4haYE/gNkkL1CkihkZE/4jo37179wKqamZmUGwo1AE9c+O1LNg8dCQwDCAi/g10BroVWCczM2tGkaHwNLC2pD6SOpKcSB5RUuYNYGcASeuRhILbh8zMqqSwUIiIucDxwChgAslVRuMlnSdpYFrsx8DRkp4D7gQOjyXtvnQzs0VIoXc0R8RIkhPI+Wln5YZfBLYusg5mZlY+39FsZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgZmYZh4KZmWUKDQVJAyRNlDRJ0pAmyhwo6UVJ4yXdUWR9zMysee2LWrGkGuBaYFegDnha0oiIeDFXZm3gTGDriJghaeWi6mNmZi0r8khhc2BSREyOiM+Bu4B9S8ocDVwbETMAImJagfUxM7MWFBkKPYA3c+N16bS8vkBfSU9IelLSgMZWJOkYSWMkjZk+fXpB1TUzsyJDQY1Mi5Lx9sDawA7AQcDvJHVdYKGIoRHRPyL6d+/eveIVNTOzRJGhUAf0zI3XAlMbKfOXiJgTEa8BE0lCwszMqqDIUHgaWFtSH0kdgUHAiJIyfwZ2BJDUjaQ5aXKBdTIzs2YUFgoRMRc4HhgFTACGRcR4SedJGpgWGwW8L+lF4B/AaRHxflF1MjOz5hV2SSpARIwERpZMOys3HMAp6cvMzKrMdzS
bmVnGoWBmZhmHgpmZZcoKBUkHSFouHf6ZpHslbVJs1czMrK2Ve6Tw84j4n6RtgN2BW4Dri6uWmZlVQ7mhMC/9dy/g+oj4C9CxmCqZmVm1lBsKb0m6ATgQGCmp00Isa2ZmXxHlfrEfSHKj2YCI+BBYETitsFqZmVlVlBUKETELmAZsk06aC7xSVKXMzKw6yr366GzgDJIH4gB0AP5QVKXMzKw6ym0+2g8YCHwCEBFTgeWKqpSZmVVHuaHwedpPUQBIWqa4KpmZWbWUGwrD0quPuko6Gvg78NviqmVmZtVQVi+pEXGppF2Bj4F1gLMi4qFCa2ZmZm2uxVCQVAOMiohdAAeBmdlirMXmo4iYB8yStHwb1MfMzKqo3IfszAael/QQ6RVIABFxYiG1MjOzqig3FB5IX2Zmthgr90TzLZI6An3TSRMjYk5x1TIzs2ooKxQk7UDSXfYUQEBPSYMjYnRxVTMzs7ZWbvPRZcBuETERQFJf4E5g06IqZmZmba/cm9c6NAQCQES8TNL/kZmZLUbKPVIYI+lG4LZ0/GDgmWKqZGZm1VJuKBwLHAecSHJOYTRwXVGVMjOz6ig3FNoDV0bE5ZDd5dypsFqZmVlVlHtO4WFgqdz4UiSd4pmZ2WKk3FDoHBEzG0bS4aWLqZKZmVVLuaHwiaRNGkYk9Qc+LaZKZmZWLeWeUzgJ+KOkqSQP2lkN+G5htTIzs6po9khB0maSvhYRTwPrAncDc4EHgdfaoH5mZtaGWmo+ugH4PB3eCvgJcC0wAxhaYL3MzKwKWmo+qomID9Lh7wJDI+Ie4B5JY4utmpmZtbWWjhRqJDUEx87AI7l55Z6PMDOzr4iWvtjvBB6T9B7J1UaPA0haC/io4LqZmVkbazYUIuJ8SQ8DqwJ/i4hIZ7UDTii6cmZm1rbKeUbzkxHxp4jIP4bz5Yj4b0vLShogaaKkSZKGNFNuf0mR3v9gZmZVUu7Nawst7R/pWmAPoB9wkKR+jZRbjqSjvaeKqouZmZWnsFAANgcmRcTkiPgcuAvYt5FyvwAuBmYXWBczMytDkaHQA3gzN16XTstI2hjoGRH3N7ciScdIGiNpzPTp0ytfUzMzA4oNBTUyLbKZUjvgCuDHLa0oIoZGRP+I6N+9e/cKVtHMzPKKDIU6oGduvBaYmhtfDlgfeFTSFGBLYIRPNpuZVU+RofA0sLakPpI6AoOAEQ0zI+KjiOgWEb0jojfwJDAwIsYUWCczM2tGYaEQEXOB44FRwARgWESMl3SepIFFbdfMzFqv0K4qImIkMLJk2llNlN2hyLqYmVnLimw+MjOzrxiHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZQoNBUkDJE2UNEnSkEbmnyLpRUnjJD0sqVeR9TEzs+YVFgqSaoBrgT2AfsBBkvqVFHsW6B8RGwLDgYuLqo+ZmbWsyCOFzYFJETE5Ij4H7gL2zReIiH9ExKx09EmgtsD6mJlZC4oMhR7Am7nxunRaU44E/trYDEnHSBojacz06dMrWEUzM8srMhTUyLRotKB0CNAfuKSx+RExNCL6R0T/7t27V7CKZmaW177AddcBPXPjtcDU0kKSdgF+CmwfEZ8VWB8zM2tBkUcKTwNrS+ojqSMwCBiRLyBpY+AGYGBETCuwLmZmVobCQiEi5gLHA6OACcCwiBgv6TxJA9NilwDLAn+UNFbSiCZWZ2ZmbaDI5iMiYiQwsmTaWbnhXYrcvpmZLRzf0WxmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpm
ZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWcSiYmVnGoWBmZhmHgpmZZRwKZmaWKfQZzYua3rPvaPNtTmnzLZqZtZ6PFMzMLONQMDOzjEPBzMwyDgUzM8s4FMzMLONQMDOzjEPBzMwyDgUzM8s4FMzMLONQMDOzjEPBzMwyDgUzM8s4FMzMLONQMDOzjEPBzMwyhYaCpAGSJkqaJGlII/M7Sbo7nf+UpN5F1sfMzJpXWChIqgGuBfYA+gEHSepXUuxIYEZErAVcAVxUVH3MzKxlRR4pbA5MiojJEfE5cBewb0mZfYFb0uHhwM6SVGCdzMysGYqIYlYs7Q8MiIij0vFDgS0i4vhcmRfSMnXp+KtpmfdK1nUMcEw6ug4wsZBKN68b8F6LpaxSvL/bjvd126rW/u4VEd1bKlTkM5ob+8VfmkDllCEihgJDK1Gp1pI0JiL6V7MOSxLv77bjfd22FvX9XWTzUR3QMzdeC0xtqoyk9sDywAcF1snMzJpRZCg8DawtqY+kjsAgYERJmRHA4HR4f+CRKKo9y8zMWlRY81FEzJV0PDAKqAFuiojxks4DxkTECOBG4DZJk0iOEAYVVZ8KqGrz1RLI+7vteF+3rUV6fxd2otnMzL56fEezmZllHApmZpZZIkJBUo2kZyXdXzL9akkzc+O9JD0saZykRyXV5uatLulvkiZIerGhSw5JO0v6r6Sxkv4paa10+hXptLGSXpb0Ydu827Yn6SZJ09L7Thqm3Z17/1MkjU2nd5R0s6TnJT0naYfcMgel08dJelBSt+bWlVtudUkzJZ2am3aypPGSXpB0p6TOhe+INiCps6T/pPtuvKRz0+nHp93FRMN+K1luM0nz0vuHkLRjbp+OlTRb0rfSeU39TTfaLU1zn+niYlH7DpE0WNIr6WswlRQRi/0LOAW4A7g/N60/cBswMzftj8DgdHgn4LbcvEeBXdPhZYGl0+GXgfXS4R8Cv29k+yeQnGiv+r4oaP9uB2wCvNDE/MuAs9Lh44Cb0+GVgWdIfpy0B6YB3dJ5FwPnNLeu3LR70s/u1HS8B/AasFQ6Pgw4vNr7qUL7WsCy6XAH4ClgS2BjoDcwpWEf5papAR4BRgL7N7LOFUku9Gj2bzod/k06PAi4u7nPtNr7qsL7fZH5Dkk/r8npvyukwytU6r0u9kcKaVLvBfwuN60GuAQ4vaR4P+DhdPgfpN1yKOmzqX1EPAQQETMjYlZaLoAu6fDyLHgvBsBBwJ1f+s0soiJiNE3cXyJJwIF88f6zfRwR04APSf5zKX0tky7ThZJ92ci6SH/dTgbGl2y6PbCUkvtfli5d11dVJBp+mXZIXxERz0bElCYWO4EkOKc1MX9/4K9l/E031S1NU5/pYmER/A7ZHXgoIj6IiBnAQ8CAVr/BEot9KAC/Jvng6nPTjgdGRMTbJWWfA76TDu8HLCdpJaAv8KGke9NDyEvSPwqAo4CRkuqAQ4EL8yuU1AvoQ/JLbUm0LfBuRLySjj8H7CupvaQ+wKZAz4iYAxwLPE/yn6IfySXLTa5L0jLAGcC5+UIR8RZwKfAG8DbwUUT8rYg3Vw1pU8ZYki/5hyLiqWbK9iD5W/5NM6scxPw/Wpr6m+4BvAnJJefAR8BKNPGZtua9LaIWte+Q7HNI1aXTKmKxDgVJewPTIuKZ3LTVgAOAqxtZ5FRge0nPAtsDbwFzSX51bpvO3wxYAzg8XeZkYM+IqAVuBi4vWecgYHhEzKvQ2/qqKT1Kuonkj3gMyX+2fwFzJXUgCYWNgdWAccCZLazrXOCK3C9nACStQPILrU+6rmUkHVKpN1RtETEvIjYi6SVgc0nrN1P818AZTf39SVoV2IDkfqIGTf1NN9UtTaOfafnvaNG1iH6HlNU9UKtVu62uyBfwK5I/1inAO8AsYEY6PCV91ZP05lq67LJAXTq8JfBobt6hJN2CdwdezU1fHXixZD3PAt+s9r5og33dm5JzCiT/Ed4FaptZ7l8kRwW
bAQ/npm8HjGxuXcDjuc/xQ5ImrONJ/sPemCt3GHBdtfdRQfv9bNJzKen4FHLnFEjOrTTso5kkRxffys3/ETA0N97k3zRJcGyV+zzeI73XqbHPtNr7pkL7d5H7DiH5cXRDbvwG4KBKvefF+kghIs6MiNqI6E2Sto9ExAoR8bWI6J1OnxXJ8xyQ1E1Swz45k+QXECRddqwgqaGHwZ2AF0n+OJaX1DedviswoWH7ktYhORH078Le5KJtF+ClSHvBBZC0dNrsg6RdgbkR8SLJL6p+uX08375sbF0RsW3uc/w1cEFEXEPSbLRlui0BO5es6ytLUndJXdPhpUj3S1PlI6JPbh8NB34YEX/OFSk9+mrub7rRbmma+Uy/8hbR75BRwG6SVkiPindj/iO9L6XIXlK/inYAfiUpgNEkV1UQEfOUXO74cPol8wzw20i68jgauEdSPckHfERufQcBd0Ua54srSXeS7Ltuabvo2RFxIwu2VUNydcqodH+9RfKLiYiYquTyytGS5gCv88XhNU2sq1ER8ZSk4cB/SQ7dn2UR71pgIawK3JK2R7cDhkXE/ZJOJGn3/howTtLISLutb0p6SWRP4LGGaS38TTfVLU2jn+kSagcK/g6JiA8k/YIkaADOi4iKdSTqbi7MzCyzWDcfmZnZwnEomJlZxqFgZmYZh4KZmWUcCmZmlnEomJlZxqFgS4y0K+PdS6adJOm6ZpaZ2dS8Vtbh90q7rzZbFDkUbElyJws+B7zsm+LMlgQOBVuSDAf2ltQJsjt6VwPGKnkwyn+VPChm39IFJe2g3ANWJF0j6fB0eFNJj0l6RtKotJO5FjW1XHpEc5GSh+m8LGnbL/vGzcrlULAlRkS8D/yHL/qeHwTcDXwK7BcRmwA7ApelXRG0KO3d9WqSh9dsStLXzfkVWK59RGwOnETS6Z1Zm3DfR7akaWhC+kv67xEkXRFfIGk7kh4vewCrkPSE2ZJ1gPWBh9IcqSF5hsOXXe7e9N9nSHqgNWsTDgVb0vwZuFzSJiSP6/xv2gzUHdg0IuZImgKUPtN5LvMfWTfMFzA+IrZayHq0tNxn6b/z8P9Ta0NuPrIlSiQP5HmUpLmm4QTz8iQPUpkjaUegVyPGiHSvAAAAn0lEQVSLvk7StXcnScuTdMcNMBHoLmkrSJqFJH29jKq0djmzQvkXiC2J7iRpnmm4Eul24D5JY4CxNPJ8goh4U9IwkifCvULSHTcR8Xl6ielVaVi0J3m2Q+kzo0vX16rlzIrmrrPNzCzj5iMzM8u4+cisAJKuBbYumXxlRNxcjfqYlcvNR2ZmlnHzkZmZZRwKZmaWcSiYmVnGoWBmZpn/B2N/LHgmX1d2AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plotPercentualComparison(df, title='ALL VALUES: json x other')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This graph proves that all bigger values are JSON and the non-json types only appear on the smaller values." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conclusion\n", + "\n", + "There is absolute no value greater than 104653 (max value for non-json) that represents a valid JSON in this 10% sample. \n", + "\n", + "This implies that all the greater values are JSON but they represent very low percentage of the whole data (6.76%). " + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The top (0.30% - whole sample) / (6.76% - values above the mean) is guarantee to be a valid JSON\n" + ] + } + ], + "source": [ + "max_non_json_value_len = df[df.is_json == False].value_len.max()\n", + "allJson = df[df['value_len'] > max_non_json_value_len ]\n", + "length = allJson.is_json.count()\n", + "print(\"The top ({0:0.2f}% - whole sample) / ({1:0.2f}% - values above the mean) is guarantee to be a valid JSON\".format(\n", + " length / df.is_json.count() * 100, length / df[df.value_len > df.value_len.mean()].is_json.count() * 100))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "--- \n", + "\n", + "# Out of Curiosity: small values\n", + "This is not exactly relevant to the issue 22 ('What's in the really large values?') but I was curious to know how was the distribution of the smaller values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Distribution of non-json values\n", + "That concentration of the non-json values made me curious: how is the distribution of NON-JSON values among the smaller values? 
\n", + "To answer this I filtered the data to only the values below the largest non-JSON value and plotted the same graph.\n", + "- What I got is, again, that the absolute majority of these non-JSON values are in the first bin, so they really do tend to be small. \n", + " \n", + " *TODO: what is that really small portion of non-JSON values present in the 9th bin? Are they any different from the smaller ones?" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Largest non-JSON value_len: 104653\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "print(\"Largest non-JSON value_len: \", max_non_json_value_len)\n", + "plotPercentualComparison(df[df.value_len < (max_non_json_value_len)], title=\"Up to the largest NON-JSON: json X other\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Still, what about that first bin (the first 1/10th)?" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "first 1/10th: 10465.3\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "print(\"first 1/10th: \", max_non_json_value_len/10)\n", + "plotPercentualComparison(df[df.value_len < (max_non_json_value_len/10)], title=\"First 1/10th: json X other\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Distribution for the values below the mean\n", + "This is where 95% of the rows are. Some of them are JSON but, as we can see, most of this data is of some other type, not JSON, and it is evenly distributed over this value range. Why is that?" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The values smaller than the mean represent 95.57% of the whole sample\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "m = df.value_len.mean()\n", + "bellow_mean = df[df.value_len <= (m)]\n", + "print('The values smaller than the mean represent {0:0.2f}% of the whole sample'.format(bellow_mean.is_json.count()/df.is_json.count()*100))\n", + "plotPercentualComparison(bellow_mean, title='Below the mean: json X other')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are more JSON values below the mean than above it, but that's not surprising since the data below the mean is 95% of everything." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Jsons below the mean are 80.93% of all jsons\n" + ] + } + ], + "source": [ + "bellow_mean_count = bellow_mean[bellow_mean.is_json == True].is_json.count()\n", + "above_mean_count = df[(df.value_len > m) & (df.is_json == True)].is_json.count()\n", + "total = bellow_mean_count + above_mean_count\n", + "print(\"Jsons below the mean are {0:.2f}% of all jsons\".format(bellow_mean_count/total * 100))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "I identified lots of false positives for \"valid_json\". They were all small values, like a bare number passing as 'valid json'. They did not make much of a difference in the overall analysis, but they made me wonder: are there more false positives? How can I eliminate them? 
" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/analyses/2019_03_aliamcami_value_analyses/isJson_dataPrep.ipynb b/analyses/2019_03_aliamcami_value_analyses/isJson_dataPrep.ipynb new file mode 100644 index 0000000..7577bfa --- /dev/null +++ b/analyses/2019_03_aliamcami_value_analyses/isJson_dataPrep.ipynb @@ -0,0 +1,1452 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/anaconda3/envs/overscripted/lib/python3.6/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.\n", + " data = yaml.load(f.read()) or {}\n" + ] + } + ], + "source": [ + "import dask.dataframe as dd\n", + "from dask.diagnostics import ProgressBar\n", + "import json\n", + "import pandas as pd\n", + "import os\n", + "import tldextract\n", + "import hashlib\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All subsamples and new samples with new columns/data will be saved under the \"DIR\" directory to keep things organized. \n", + "As such, the functions \"save_parquet\" and \"read_parquet\" add this directory to every parquet name, and I'm using these functions instead of calling dd.read_parquet/dd.to_parquet directly to ensure the same read and write settings across the notebook. 
\n", + "\n", + "NOTE: each section appends its name to 'FILE_NAME' and saves the new parquet under that name. Because of that, you can run the sections in any order to get the output you need. " + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "#Initializing client / distributed\n", + "# client = Client()\n", + "# client\n", + "\n", + "#Create folder to save/read new data\n", + "DIR = 'sample_0_prep/'\n", + "FILE_NAME = 's0'\n", + "\n", + "if not os.path.exists(DIR):\n", + "    os.makedirs(DIR)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If \"recalculate_partition\" is not passed, the partitions are not recalculated. Recalculating is not mandatory, but it is a good idea if you are significantly reducing the size of the data. " + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "#Save a DF to a parquet\n", + "def save_parquet(df, name, recalculate_partition=False):\n", + "    with ProgressBar():\n", + "        #df.repartition strategy copied from: https://stackoverflow.com/questions/44657631/strategy-for-partitioning-dask-dataframes-efficiently\n", + "        if recalculate_partition:\n", + "            n = 1+df.memory_usage(deep=True).sum().compute() // (1000 * 1000 * 100)\n", + "            print(\"Npartition: \", n)\n", + "            df.repartition(npartitions= n).to_parquet(DIR + name + '.parquet', engine=\"pyarrow\")\n", + "        else:\n", + "            df.to_parquet(DIR + name + '.parquet', engine=\"pyarrow\")\n", + "\n", + "\n", + "def read_parquet(name):\n", + "    return dd.read_parquet(DIR + name + '.parquet', engine='pyarrow')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data\n", + "Using the 10% sample and samples derived from it\n", + " - The 10% sample has 11292867 rows\n", + " - The sample filtered by value_len > mean has 499805 rows" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ +
"Index(['argument_0', 'argument_1', 'argument_2', 'argument_3', 'argument_4',\n", + "       'argument_5', 'argument_6', 'argument_7', 'argument_8', 'arguments',\n", + "       'arguments_n_keys', 'call_stack', 'crawl_id', 'file_name', 'func_name',\n", + "       'in_iframe', 'location', 'operation', 'script_col', 'script_line',\n", + "       'script_loc_eval', 'script_url', 'symbol', 'time_stamp', 'value',\n", + "       'value_1000', 'value_len'],\n", + "      dtype='object')" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Original sample: 'sample_0.parquet'\n", + "df = dd.read_parquet('sample_0.parquet', \n", + "                     engine='pyarrow', )\n", + "# columns=['value_1000', 'value', 'value_len', 'symbol', 'script_url', 'location', 'operation'])\n", + "\n", + "# df.astype({'value_1000': str, 'value': str,'value_len': int,'symbol': int,'script_url': str})\n", + "df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## DF overview\n", + "An overview of value_len in the sample: \n", + "- Mean: 1356.97\n", + "- Min: 0\n", + "- Max: 4496861\n", + "- Std: 26310.62" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 1min 19.3s\n", + "MEAN: 1356.9776628910975,\n", + "MIN: 0,\n", + "MAX: 4496861,\n", + "std: 26310.62140481331,\n", + "LEN: 11292867\n" + ] + } + ], + "source": [ + "with ProgressBar():\n", + "    df_mean = df['value_len'].mean()\n", + "    df_min = df['value_len'].min()\n", + "    df_max = df['value_len'].max()\n", + "    df_std = df['value_len'].std()\n", + "    df_len = df['value_len'].count()\n", + "    (df_mean, df_min, df_max, df_std, df_len) = dd.compute(df_mean, df_min, df_max, df_std, df_len)\n", + "    print(\"MEAN: {},\\nMIN: {},\\nMAX: {},\\nstd: {},\\nLEN: {}\".format(df_mean, df_min, df_max, df_std, df_len))" + ] + }, + { + "cell_type": "markdown", + 
"metadata": {}, + "source": [ + "# Add Column: Domains\n", + "The following code is copied from this same project: ~/analyses/hello_world.ipynb\n", + "\n", + "It uses the data from the previous section.\n", + "This section extracts the domain from the columns \"location\" and \"script_url\" and adds the results as the new columns \"location_domain\" and \"script_domain\"" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains\n" + ] + } + ], + "source": [ + "FILE_NAME += '_domains'\n", + "print('Notebook name: ', FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def extract_domain(url):\n", + "    \"\"\"Use tldextract to return the base domain from a url\"\"\"\n", + "    try:\n", + "        extracted = tldextract.extract(url)\n", + "        return '{}.{}'.format(extracted.domain, extracted.suffix)\n", + "    except Exception:\n", + "        return 'ERROR'" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "df.astype({'value_1000': str, 'value': str,'value_len': int,'symbol': int,'script_url': str, 'location': str})\n", + "df['location_domain'] = df.location.apply(extract_domain, meta='O')\n", + "df['script_domain'] = df.script_url.apply(extract_domain, meta='O')" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 7min 22.3s\n" + ] + } + ], + "source": [ + "#save\n", + "save_parquet(df=df, name=FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>location_domain</th>\n", + "      <th>location</th>\n", + "      <th>script_domain</th>\n", + "      <th>script_url</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/widget_comments.php?app=2297596...</td>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/js/api/xdm.js?1449919642</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/widget_comments.php?app=2297596...</td>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/js/api/xdm.js?1449919642</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>2</th>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/widget_comments.php?app=2297596...</td>\n", + "      <td>vk.com</td>\n", + "      <td>https://vk.com/js/al/aes_light.js?592436914</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>3</th>\n", + "      <td>baidu.com</td>\n", + "      <td>https://pos.baidu.com/s?hei=70&amp;wid=670&amp;di=u313...</td>\n", + "      <td>baidustatic.com</td>\n", + "      <td>https://cpro.baidustatic.com/cpro/ui/noexpire/...</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>4</th>\n", + "      <td>serienjunkies.org</td>\n", + "      <td>http://serienjunkies.org/smilf/smilf-season-1-...</td>\n", + "      <td>google.com</td>\n", + "      <td>https://apis.google.com/js/plusone.js?_=151338...</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " location_domain location \\\n", + "0 vk.com https://vk.com/widget_comments.php?app=2297596... \n", + "1 vk.com https://vk.com/widget_comments.php?app=2297596... \n", + "2 vk.com https://vk.com/widget_comments.php?app=2297596... \n", + "3 baidu.com https://pos.baidu.com/s?hei=70&wid=670&di=u313... \n", + "4 serienjunkies.org http://serienjunkies.org/smilf/smilf-season-1-... \n", + "\n", + " script_domain script_url \n", + "0 vk.com https://vk.com/js/api/xdm.js?1449919642 \n", + "1 vk.com https://vk.com/js/api/xdm.js?1449919642 \n", + "2 vk.com https://vk.com/js/al/aes_light.js?592436914 \n", + "3 baidustatic.com https://cpro.baidustatic.com/cpro/ui/noexpire/... \n", + "4 google.com https://apis.google.com/js/plusone.js?_=151338... " + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read\n", + "df = read_parquet(FILE_NAME)\n", + "df[['location_domain', 'location', 'script_domain', 'script_url']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Add Column: is_json\n", + "\n", + "After an initial manual analysis, I suspected that the huge values are JSON-structured. To validate that, I added a new boolean column recording whether each value is valid JSON.\n", + "\n", + "Each value is checked for being JSON, and the boolean result is saved in a new column named \"is_json\"\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson\n" + ] + } + ], + "source": [ + "FILE_NAME += '_isJson'\n", + "print('Notebook name: ', FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def is_json(myjson):\n", + "    if (myjson == '{}'):\n", + "        #would be counted as valid, but it's an empty json\n", + "        return False\n", + "    try:\n", + "        #Eliminate false positives: only JSON objects (dicts) count, not bare numbers or strings\n", + "        return isinstance(json.loads(myjson), dict)\n", + "    except ValueError:\n", + "        return False" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "df['is_json'] = df['value'].apply(is_json, meta='O')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 5min 12.2s\n" + ] + } + ], + "source": [ + "#save\n", + "save_parquet(df=df, name=FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>value_1000</th>\n", + "      <th>is_json</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>fXDcab74</td>\n", + "      <td>False</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>fXDcab74</td>\n", + "      <td>False</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>2</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td>False</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>3</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td>False</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>4</th>\n", + "      <td>_ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17...</td>\n", + "      <td>False</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " value_1000 is_json\n", + "0 fXDcab74 False\n", + "1 fXDcab74 False\n", + "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... False\n", + "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... False\n", + "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... False" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read\n", + "df = read_parquet(FILE_NAME)\n", + "df[['value_1000', 'is_json']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Add json keys" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Extract the top-level keys, sort them, and save them as a stringified list in a new column named 'json_keys'\n", + "The plan is to use \"https://github.com/rnd0101/json_schema_inferencer\" to infer the JSON schema and save it in another column called \"json_schema\"" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys\n" + ] + } + ], + "source": [ + "FILE_NAME += '_jsonKeys'\n", + "print('Notebook name: ', FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "def jsonKeys(r):\n", + "    if(r['is_json']):\n", + "        try:\n", + "            dct = json.loads(r['value'])\n", + "            keys = list(dct.keys())\n", + "            keys.sort()\n", + "            return str(keys)\n", + "        except ValueError:\n", + "            return ''\n", + "    else:\n", + "        return ''" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 8min 32.7s\n" + ] + } + ], + "source": [ + "df['json_keys'] = df.apply(jsonKeys,axis=1, meta='O')\n", + "save_parquet(df=df, name=FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 
19, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>value_1000</th>\n", + "      <th>is_json</th>\n", + "      <th>json_keys</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>fXDcab74</td>\n", + "      <td>False</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>fXDcab74</td>\n", + "      <td>False</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>2</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td>False</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>3</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td>False</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>4</th>\n", + "      <td>_ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17...</td>\n", + "      <td>False</td>\n", + "      <td></td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " value_1000 is_json json_keys\n", + "0 fXDcab74 False \n", + "1 fXDcab74 False \n", + "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... False \n", + "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... False \n", + "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... False " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read \n", + "df = read_parquet(FILE_NAME)\n", + "df[['value_1000', 'is_json', 'json_keys']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Add Column: keys_md5\n", + "Add a new column called \"keys_md5\" that holds the MD5 hash of the json_keys column" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys_md5\n" + ] + } + ], + "source": [ + "FILE_NAME += '_md5'\n", + "print('Notebook name: ', FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "def md5(value):\n", + "    if (value == ''):\n", + "        return ''\n", + "    else:\n", + "        return hashlib.md5(value.encode('utf-8')).hexdigest()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "df['keys_md5'] = df['json_keys'].apply(md5, meta='O')" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 3min 49.6s\n" + ] + } + ], + "source": [ + "#save\n", + "save_parquet(df=df, name=FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>value_1000</th>\n", + "      <th>keys_md5</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>fXDcab74</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>fXDcab74</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>2</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>3</th>\n", + "      <td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td>\n", + "      <td></td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>4</th>\n", + "      <td>_ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17...</td>\n", + "      <td></td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " value_1000 keys_md5\n", + "0 fXDcab74 \n", + "1 fXDcab74 \n", + "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... \n", + "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... \n", + "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read\n", + "df = read_parquet(FILE_NAME)\n", + "df[['value_1000', 'keys_md5']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# TLD\n", + "Add a new column called \"script_tld\" that holds the TLD of the script_domain" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys_md5_TLD\n" + ] + } + ], + "source": [ + "FILE_NAME += '_TLD'\n", + "print('Notebook name: ', FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "def extractTLD(domain):\n", + "    return domain.split('.')[-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "df['script_tld'] = df['script_domain'].apply(extractTLD, meta='O')" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 3min 59.4s\n" + ] + } + ], + "source": [ + "#save\n", + "save_parquet(df=df, name=FILE_NAME)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>script_domain</th>\n", + "      <th>script_tld</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr>\n", + "      <th>0</th>\n", + "      <td>vk.com</td>\n", + "      <td>com</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>1</th>\n", + "      <td>vk.com</td>\n", + "      <td>com</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>2</th>\n", + "      <td>vk.com</td>\n", + "      <td>com</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>3</th>\n", + "      <td>baidustatic.com</td>\n", + "      <td>com</td>\n", + "    </tr>\n", + "    <tr>\n", + "      <th>4</th>\n", + "      <td>google.com</td>\n", + "      <td>com</td>\n", + "    </tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>" + ], + "text/plain": [ + " script_domain script_tld\n", + "0 vk.com com\n", + "1 vk.com com\n", + "2 vk.com com\n", + "3 baidustatic.com com\n", + "4 google.com com" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read\n", + "df = read_parquet(FILE_NAME)\n", + "df[['script_domain', 'script_tld']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Saving other possibly useful filtered samples for future analyses" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## value_len > df_mean\n", + "1356 is the value_len mean\n", + "\n", + "To narrow the data down to something more relevant to this task, I decided to work only with values that are above the mean.\n", + "\n", + "The values above the mean add up to 499805 rows. That is only about 4.4% of the whole sample, and a lot easier to work with. " + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys_md5_TLD_above_mean\n" + ] + } + ], + "source": [ + "name = FILE_NAME + '_above_mean'\n", + "print('Notebook name: ', name)" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 2min 23.6s\n" + ] + } + ], + "source": [ + "#Save\n", + "save_parquet(df= df[df['value_len'] > df_mean], name= name)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['argument_0', 'argument_1', 'argument_2', 'argument_3', 'argument_4',\n", + " 'argument_5', 'argument_6', 'argument_7', 'argument_8', 'arguments',\n", + " 'arguments_n_keys', 'call_stack', 'crawl_id', 'file_name', 'func_name',\n", + " 
'in_iframe', 'location', 'operation', 'script_col', 'script_line',\n", + " 'script_loc_eval', 'script_url', 'symbol', 'time_stamp', 'value',\n", + " 'value_1000', 'value_len', 'location_domain', 'script_domain',\n", + " 'is_json', 'json_keys', 'keys_md5', 'script_tld'],\n", + " dtype='object')" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Read\n", + "df = read_parquet(name)\n", + "df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Filter to parquet containing only JSON " + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys_md5_TLD_JSON_ONLY\n" + ] + } + ], + "source": [ + "name = FILE_NAME + '_JSON_ONLY'\n", + "print('Notebook name: ', name)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 1min 20.0s\n" + ] + } + ], + "source": [ + "save_parquet(df=df[df['is_json'] == True], name=name)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>value_1000</th><th>is_json</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>{\"im-settings\":\"{\\\"val\\\":{\\\"settings\\\":{\\\"Site...</td><td>True</td></tr>\n",
+ "    <tr><th>1</th><td>{\"APLUS_S_CORE_0.17.12_20171214163401_2ee09a0c...</td><td>True</td></tr>\n",
+ "    <tr><th>2</th><td>{\"APLUS_S_CORE_0.17.12_20171214163401_2ee09a0c...</td><td>True</td></tr>\n",
+ "    <tr><th>3</th><td>{\"dueljs_channel_comm\":\"[{\\\"id\\\":4734405521279...</td><td>True</td></tr>\n",
+ "    <tr><th>4</th><td>{\"dueljs_channel_comm\":\"[{\\\"id\\\":4734405521279...</td><td>True</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " value_1000 is_json\n", + "0 {\"im-settings\":\"{\\\"val\\\":{\\\"settings\\\":{\\\"Site... True\n", + "1 {\"APLUS_S_CORE_0.17.12_20171214163401_2ee09a0c... True\n", + "2 {\"APLUS_S_CORE_0.17.12_20171214163401_2ee09a0c... True\n", + "3 {\"dueljs_channel_comm\":\"[{\\\"id\\\":4734405521279... True\n", + "4 {\"dueljs_channel_comm\":\"[{\\\"id\\\":4734405521279... True" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read all_json_above_mean\n", + "df = read_parquet(name)\n", + "df[['value_1000', 'is_json']].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## All NON json above the mean" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Notebook name: s0_domains_isJson_jsonKeys_md5_TLD_nonJSON_ONLY\n" + ] + } + ], + "source": [ + "name = FILE_NAME + '_nonJSON_ONLY'\n", + "df = read_parquet(FILE_NAME)\n", + "print('Notebook name: ', name)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[########################################] | 100% Completed | 4min 34.1s\n", + "Npartition: 285\n", + "[########################################] | 100% Completed | 2min 11.3s\n" + ] + } + ], + "source": [ + "save_parquet(df=df[df['is_json'] == False], name=name, recalculate_partition=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\"><th></th><th>argument_0</th><th>argument_1</th><th>argument_2</th><th>argument_3</th><th>argument_4</th><th>argument_5</th><th>argument_6</th><th>argument_7</th><th>argument_8</th><th>arguments</th><th>...</th><th>time_stamp</th><th>value</th><th>value_1000</th><th>value_len</th><th>location_domain</th><th>script_domain</th><th>is_json</th><th>json_keys</th><th>keys_md5</th><th>script_tld</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>{}</td><td>...</td><td>2017-12-16 19:02:31.406</td><td>fXDcab74</td><td>fXDcab74</td><td>8</td><td>vk.com</td><td>vk.com</td><td>False</td><td></td><td></td><td>com</td></tr>\n",
+ "    <tr><th>1</th><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>{}</td><td>...</td><td>2017-12-16 19:02:31.407</td><td>fXDcab74</td><td>fXDcab74</td><td>8</td><td>vk.com</td><td>vk.com</td><td>False</td><td></td><td></td><td>com</td></tr>\n",
+ "    <tr><th>2</th><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>{}</td><td>...</td><td>2017-12-16 19:02:31.659</td><td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td><td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td><td>68</td><td>vk.com</td><td>vk.com</td><td>False</td><td></td><td></td><td>com</td></tr>\n",
+ "    <tr><th>3</th><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>{}</td><td>...</td><td>2017-12-16 00:24:09.355</td><td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td><td>Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko...</td><td>68</td><td>baidu.com</td><td>baidustatic.com</td><td>False</td><td></td><td></td><td>com</td></tr>\n",
+ "    <tr><th>4</th><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>None</td><td>{}</td><td>...</td><td>2017-12-16 01:24:30.372</td><td>_ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17...</td><td>_ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17...</td><td>288</td><td>serienjunkies.org</td><td>google.com</td><td>False</td><td></td><td></td><td>com</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "<p>5 rows × 33 columns</p>\n",
+ "</div>
" + ], + "text/plain": [ + " argument_0 argument_1 argument_2 argument_3 argument_4 argument_5 \\\n", + "0 None None None None None None \n", + "1 None None None None None None \n", + "2 None None None None None None \n", + "3 None None None None None None \n", + "4 None None None None None None \n", + "\n", + " argument_6 argument_7 argument_8 arguments ... time_stamp \\\n", + "0 None None None {} ... 2017-12-16 19:02:31.406 \n", + "1 None None None {} ... 2017-12-16 19:02:31.407 \n", + "2 None None None {} ... 2017-12-16 19:02:31.659 \n", + "3 None None None {} ... 2017-12-16 00:24:09.355 \n", + "4 None None None {} ... 2017-12-16 01:24:30.372 \n", + "\n", + " value \\\n", + "0 fXDcab74 \n", + "1 fXDcab74 \n", + "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... \n", + "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... \n", + "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... \n", + "\n", + " value_1000 value_len \\\n", + "0 fXDcab74 8 \n", + "1 fXDcab74 8 \n", + "2 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... 68 \n", + "3 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko... 68 \n", + "4 _ga=GA1.2.1529583939.1513387469; _gid=GA1.2.17... 
288 \n", + "\n", + " location_domain script_domain is_json json_keys keys_md5 script_tld \n", + "0 vk.com vk.com False com \n", + "1 vk.com vk.com False com \n", + "2 vk.com vk.com False com \n", + "3 baidu.com baidustatic.com False com \n", + "4 serienjunkies.org google.com False com \n", + "\n", + "[5 rows x 33 columns]" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#read \n", + "df = read_parquet(name)\n", + "df.head()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}