Tutorials

Short Tutorial in the Retention Class.

Let’s import the class and see insert a some retention numbers.

Lets see what what we have in here:

                 Retention
DaysSinceInstall
1                    50.0%
7                    15.0%
30                    5.0%

Lets fit an equation to this data, default is a power function and as of right now the only available functions build in are

  • Power function: \(f(x) = k_1x^{k_2}\) identifier: power

Lets see how well it was fitted:

                 Retention RetentionFit
DaysSinceInstall
1                    50.0%        50.1%
2                     nan%        32.2%
3                     nan%        24.9%
4                     nan%        20.7%
5                     nan%        18.0%
6                     nan%        16.0%
7                    15.0%        14.5%
8                     nan%        13.3%
9                     nan%        12.3%
10                    nan%        11.5%
11                    nan%        10.9%
12                    nan%        10.3%
13                    nan%         9.8%
14                    nan%         9.3%
15                    nan%         8.9%
16                    nan%         8.6%
17                    nan%         8.2%
18                    nan%         7.9%
19                    nan%         7.7%
20                    nan%         7.4%
21                    nan%         7.2%
22                    nan%         7.0%
23                    nan%         6.8%
24                    nan%         6.6%
25                    nan%         6.4%
26                    nan%         6.3%
27                    nan%         6.1%
28                    nan%         6.0%
29                    nan%         5.9%
30                    5.0%         5.7%
_images/Retention_Example_8_0.png

Let’s try something interesting - what is the sum of the retention values from the retention array?

7.987575025935071

Hmm, that is not the same as the 8.7 calculated using the integrated function of the power equation. At its core the reason for this is that when performing Riemann integration we look at a smooth curve, whereas simply summing retention values from an array we assume a certain linear relationship between point \(R(i)\) and \(R(i+1)\).

Custom Functions

In this example, a power function was used, but any callable function can be used.

NOTE: It is recommended to use the Fitfunction class to create new functions, this class simply needs two functions, the equation used to fit and the integrated equation from 1 to b. The integrated function is only used to calculate the summed retention and can be left out. In that case, just pass the callable function directly. Anyway, example with power function and its integrated companion:

\[R(t) = k_1t^{k_2}.\]

The integral from a to b is simple

\[\int_a^b R(t) dt = \frac{k_1}{k_2+1}\left( b^{ k_2 + 1} - a^{k_2 + 1}\right)\]

Well… Turns out that fitting is not always 100 % similar to real world data, to not get into some bad situations lets set \(a=1\), we can do this, since we know that retention for day 0 is always 1!

\[\int_1^b R(t) dt + 1= \frac{k_1}{k_2+1}\left( b^{ k_2 + 1} - 1^{k_2 + 1}\right) + 1.\]

Lets create a fitter class for our power function:

Lets fit with this function:

_images/Retention_Example_16_0.png

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

_images/Retention_Example_18_0.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

DaysSinceInstall
1       0.51+/-0.04
2     0.322+/-0.018
3     0.246+/-0.013
4     0.204+/-0.010
5     0.176+/-0.009
6     0.156+/-0.008
7     0.140+/-0.008
8     0.129+/-0.008
9     0.119+/-0.007
10    0.111+/-0.007
11    0.104+/-0.007
12    0.098+/-0.007
13    0.093+/-0.006
14    0.089+/-0.006
15    0.085+/-0.006
16    0.081+/-0.006
17    0.078+/-0.006
18    0.075+/-0.006
19    0.072+/-0.006
20    0.070+/-0.006
21    0.068+/-0.006
22    0.066+/-0.006
23    0.064+/-0.005
24    0.062+/-0.005
25    0.060+/-0.005
26    0.059+/-0.005
27    0.057+/-0.005
28    0.056+/-0.005
29    0.055+/-0.005
30    0.054+/-0.005
Name: RetentionFit, dtype: object
array([0.51007967, 0.32223477, 0.24631593, 0.20356673, 0.17558734,
       0.1556062 , 0.14049706, 0.12860006, 0.11894522, 0.11092453,
       0.10413587, 0.09830176, 0.09322396, 0.0887568 , 0.0847906 ,
       0.08124106, 0.07804224, 0.07514176, 0.07249742, 0.07007482,
       0.06784561, 0.06578619, 0.06387677, 0.06210058, 0.06044333,
       0.05889276, 0.05743829, 0.05607071, 0.054782  , 0.05356512])
array([0.03564148, 0.01775511, 0.01252601, 0.01027747, 0.00909227,
       0.00837123, 0.00788221, 0.00752197, 0.00723962, 0.00700784,
       0.00681092, 0.00663929, 0.00648677, 0.00634921, 0.0062237 ,
       0.00610815, 0.00600099, 0.005901  , 0.00580724, 0.00571896,
       0.00563553, 0.00555645, 0.00548129, 0.00540969, 0.00534132,
       0.00527593, 0.00521327, 0.00515313, 0.00509533, 0.00503971])

So why bother with this integral function anyway? Well, let’s see what happens when we create a data set consisting of retention values from days since install 1 to 180 and sum that using the same power function with the same parameters:

It is also possible to save and load instances of classes (using pickle):

my_retention.save('myretention.pkl')

Loading the data is simply done with

import pyfreya
LoadedRetentionClass = pyfreya.load('myretention.pkl')

Short Tutorial in the Cohort Class.

Retention

Let’s import the class and see insert a some retention numbers along with the amount of new users in the cohort.

To get more info on retenion see retention tutorial.

                        1
DaysSinceInstall
0                     100
1                 50.0629
2                 32.1914
3                 24.8632
4                 20.6996
5                 17.9566
6                 15.9875
7                 14.4921
8                 13.3102
9                  12.348
10                11.5464
11                10.8662
12                10.2802
13                9.76917
14                9.31867
15                8.91795
16                8.55872
17                8.23447
18                7.94001
19                7.67118
20                7.42456
21                7.19733
22                6.98716
23                6.79206
24                6.61038
25                6.44069
26                6.28175
27                6.13252
28                5.99207
29                 5.8596
30                 5.7344
31                5.61586

Note: That the cohort class can also take a retention profile instead of actual retention data points. The name given is not of any particular importance now, but when plotting various aggregates from multiple cohorts easily identifiable names are nice to have - is no name given a random one will be applied.

Daily Active Users

Maybe similar cohorts comes in multiple days in a row. It is moddeled like this:

                       1        2        3        4        5        6        7        8        9        10
DaysSinceInstall
0                     100      100      100      100      100      100      100      100      100      100
1                 50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629
2                     NaN  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914
3                     NaN      NaN  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632
4                     NaN      NaN      NaN  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996
5                     NaN      NaN      NaN      NaN  17.9566  17.9566  17.9566  17.9566  17.9566  17.9566
6                     NaN      NaN      NaN      NaN      NaN  15.9875  15.9875  15.9875  15.9875  15.9875
7                     NaN      NaN      NaN      NaN      NaN      NaN  14.4921  14.4921  14.4921  14.4921
8                     NaN      NaN      NaN      NaN      NaN      NaN      NaN  13.3102  13.3102  13.3102
9                     NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN   12.348   12.348
10                    NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN  11.5464

Well - its nice to see this user distribution, but how many daily active users do we have ? (also note the type is a pandas DataFrame)

<class 'pandas.core.frame.DataFrame'>
dau
Date
1 100
2 150.063
3 182.254
4 207.117
5 227.817
6 245.774
7 261.761
8 276.253
9 289.563
10 301.912

Since users are still active after the influx of 10 days lets see what it looks like after 30 days (10 days of user influx and 20 days of waiting):

dau
Date
1 100
2 150.063
3 182.254
4 207.117
5 227.817
6 245.774
7 261.761
8 276.253
9 289.563
10 301.912
11 213.458
12 174.261
13 152.35
14 137.256
15 125.875
16 116.836
17 109.408
18 103.15
19 97.7799
20 93.103
21 88.9812
22 85.3123
23 82.0192
24 79.0421
25 76.3338
26 73.8566
27 71.5796
28 69.4776
29 67.5297
30 65.7181

Enough numbers, lets plot some of this. First, lets plot the retention - maybe it fitted the data incorrectly:

_images/Cohort_Example_9_0.png

How about dau?

_images/Cohort_Example_11_0.png

If you wonder how long time it takes to reach a certain amount of dau it can be calculated. This does assume a steady influx of users given in new_users and with the retention profile calculated earlier.

21
_images/Cohort_Example_14_0.png

Datetime

What kind of date is this anyway? Lets use proper human dates from the Gregorian calendar:

_images/Cohort_Example_16_0.png

Revenue

Well how much money did we earn? A premade revenue profile class called ARPDAU is imported and is initialized by setting the ARPDAU to a value.

dau revenue
Date
2019-03-14 100 210
2019-03-15 150.063 315.132
2019-03-16 182.254 382.734
2019-03-17 207.117 434.947
2019-03-18 227.817 478.416
2019-03-19 245.774 516.125
2019-03-20 261.761 549.698
2019-03-21 276.253 580.132
2019-03-22 289.563 608.083
2019-03-23 301.912 634.014
2019-03-24 313.458 658.262
2019-03-25 324.324 681.081
2019-03-26 334.604 702.669
2019-03-27 344.374 723.184
2019-03-28 353.692 742.754
2019-03-29 362.61 761.481
2019-03-30 371.169 779.455
2019-03-31 379.403 796.747
2019-04-01 387.343 813.421
2019-04-02 395.015 829.531
2019-04-03 402.439 845.122
2019-04-04 409.636 860.237
2019-04-05 416.624 874.91
2019-04-06 323.416 679.173
2019-04-07 279.963 587.923
2019-04-08 254.212 533.846
2019-04-09 235.631 494.825
2019-04-10 221.064 464.234
2019-04-11 209.099 439.109
2019-04-12 198.971 417.84
2019-04-13 190.214 399.449
2019-04-14 182.519 383.291
2019-04-15 175.675 368.917

This can be plotted too!

_images/Cohort_Example_20_0.png

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

_images/Cohort_Example_22_0.png _images/Cohort_Example_22_1.png _images/Cohort_Example_22_2.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
1      100+/-5
2      151+/-8
3     183+/-11
4     208+/-12
5     228+/-14
6     246+/-15
7     261+/-16
8     275+/-17
9     288+/-17
10    300+/-18
11    311+/-19
12    322+/-20
13    331+/-20
14    341+/-21
15    350+/-22
16    358+/-22
17    366+/-23
18    374+/-23
19    382+/-24
20    389+/-24
21    396+/-25
22    403+/-26
23    409+/-26
24    316+/-23
25    271+/-21
26    245+/-19
27    226+/-19
28    211+/-18
29    199+/-17
30    189+/-17
31    181+/-16
32    173+/-16
33    166+/-16
Name: dau, dtype: object
array([100.        , 151.00796688, 183.23144399, 207.86303692,
       228.21970943, 245.77844367, 261.3390633 , 275.38876963,
       288.24877584, 300.14329757, 311.23575048, 321.64933796,
       331.47951354, 340.80190942, 349.6775894 , 358.15664906,
       366.28075478, 374.08497899, 381.59915514, 388.84889718,
       395.85637904, 402.64093973, 409.21955908, 315.60723627,
       270.8093275 , 244.6301836 , 225.88786696, 211.27502302,
       199.32335936, 189.24093946, 180.54774509, 172.92912575,
       166.16687945])
array([ 5.        ,  8.34935123, 10.56825389, 12.20066331, 13.51786087,
       14.6418741 , 15.63646161, 16.53872917, 17.37200759, 18.15183883,
       18.88905055, 19.59146105, 20.26488384, 20.91374916, 21.54150443,
       22.1508812 , 22.74407817, 23.32289008, 23.88880016, 24.44304807,
       24.98668034, 25.52058868, 26.04553928, 22.86365271, 20.63415691,
       19.40634919, 18.5255104 , 17.82742722, 17.24464372, 16.74232945,
       16.29990806, 15.90409195, 15.54574024])

It is possible to save a cohort class instance (using pickle) and loading it.

facebook.save('facebook_revenue.pkl')
import pyfreya
facebook_loaded = pyfreya.load('facebook_revenue.pkl')

Short Tutorial in the Revenue Classes.

Let’s import the class and insert a some retention numbers along with the amount of new users in a cohort. Additional information about Retention and Cohort classes can be found at retention tutorial and cohort tutorial. Let’s also import a predefined revenue spending class (profile) ARPDAU.

                        1
DaysSinceInstall
0                     100
1                 50.0629
2                 32.1914
3                 24.8632
4                 20.6996
5                 17.9566
6                 15.9875
7                 14.4921
8                 13.3102
9                  12.348
10                11.5464
11                10.8662
12                10.2802
13                9.76917
14                9.31867
15                8.91795
16                8.55872
17                8.23447
18                7.94001
19                7.67118
20                7.42456
21                7.19733
22                6.98716
23                6.79206
24                6.61038
25                6.44069
26                6.28175
27                6.13252
28                5.99207
29                 5.8596
30                 5.7344
31                5.61586
dau revenue
Date
1 100 210
2 50.0629 105.132
3 32.1914 67.6019
4 24.8632 52.2127
5 20.6996 43.4692
6 17.9566 37.7089
7 15.9875 33.5737
8 14.4921 30.4334
9 13.3102 27.9515
10 12.348 25.9309
11 11.5464 24.2475
12 10.8662 22.819
13 10.2802 21.5885
14 9.76917 20.5153
15 9.31867 19.5692
16 8.91795 18.7277
17 8.55872 17.9733
18 8.23447 17.2924
19 7.94001 16.674
20 7.67118 16.1095
21 7.42456 15.5916
22 7.19733 15.1144
23 6.98716 14.673
24 6.79206 14.2633
25 6.61038 13.8818
26 6.44069 13.5254
27 6.28175 13.1917
28 6.13252 12.8783
29 5.99207 12.5833
30 5.8596 12.3052
31 5.7344 12.0422

As can be seen, there is also room for adding uncertainty. Let’s import the base revenue class and define ARPDAU but with uncertainty. It does make use of some properties of the Cohort class, it is recommended to be a bit familiar with that.

dau revenue
Date
1 100 210
2 50.0629 105.132
3 32.1914 67.6019
4 24.8632 52.2127
5 20.6996 43.4692
6 17.9566 37.7089
7 15.9875 33.5737
8 14.4921 30.4334
9 13.3102 27.9515
10 12.348 25.9309
11 11.5464 24.2475
12 10.8662 22.819
13 10.2802 21.5885
14 9.76917 20.5153
15 9.31867 19.5692
16 8.91795 18.7277
17 8.55872 17.9733
18 8.23447 17.2924
19 7.94001 16.674
20 7.67118 16.1095
21 7.42456 15.5916
22 7.19733 15.1144
23 6.98716 14.673
24 6.79206 14.2633
25 6.61038 13.8818
26 6.44069 13.5254
27 6.28175 13.1917
28 6.13252 12.8783
29 5.99207 12.5833
30 5.8596 12.3052
31 5.7344 12.0422
_images/Revenue_Example_5_0.png

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

_images/Revenue_Example_7_0.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
1       210+/-23
2       107+/-14
3         68+/-8
4         52+/-6
5         43+/-5
6         37+/-4
7         33+/-4
8         30+/-4
9     27.0+/-3.3
10    25.0+/-3.1
11    23.3+/-2.9
12    21.9+/-2.8
13    20.6+/-2.6
14    19.6+/-2.5
15    18.6+/-2.4
16    17.8+/-2.3
17    17.1+/-2.2
18    16.4+/-2.2
19    15.8+/-2.1
20    15.2+/-2.0
21    14.7+/-2.0
22    14.2+/-1.9
23    13.8+/-1.9
24    13.4+/-1.8
25    13.0+/-1.8
26    12.7+/-1.8
27    12.4+/-1.7
28    12.1+/-1.7
29    11.8+/-1.7
30    11.5+/-1.6
31    11.2+/-1.6
Name: revenue, dtype: object
array([210.        , 107.11673044,  67.66930193,  51.72634515,
        42.74901227,  36.8733419 ,  32.67730124,  29.50438329,
        27.00601304,  24.97849563,  23.29415112,  21.86853369,
        20.64336873,  19.57703135,  18.63892795,  17.80602528,
        17.06062202,  16.38887082,  15.77976993,  15.22445827,
        14.71571191,  14.24757745,  13.81510062,  13.41412211,
        13.04112202,  12.69309974,  12.36748022,  12.06203999,
        11.77484823,  11.50421944,  11.24867512])
array([22.588714  , 13.73966474,  8.17827559,  6.15442801,  5.07962406,
        4.40195366,  3.9300445 ,  3.57937697,  3.3066122 ,  3.08712987,
        2.90585349,  2.75300466,  2.62194609,  2.5080031 ,  2.40778109,
        2.31875073,  2.23898552,  2.16699004,  2.1015841 ,  2.04182268,
        1.9869392 ,  1.93630468,  1.88939763,  1.84578168,  1.80508855,
        1.76700502,  1.73126286,  1.69763082,  1.66590837,  1.63592067,
        1.60751445])

It is possible to save a revenue class instance (using pickle) and loading it.

facebook.save('facebook_revenue.pkl')
facebook_loaded = pyfreya.load('facebook_revenue.pkl')

Short Tutorial Working With Multiple Cohorts

When working with multiple cohorts plotting can’t be performed with the Cohort class methods, instead, import multi plot functions. Let’s create some cohorts first.

Lets plot their retention

tutorials/multicohort/Multi_Cohort_Example_3_0.png

If copies of the Facebook cohort comes in each day in 10 days and copies of the Google cohort comes in 6 days in a row - what does DAU look like?

tutorials/multicohort/Multi_Cohort_Example_5_0.png tutorials/multicohort/Multi_Cohort_Example_6_0.png

How does revenue look across this? Lets apply a revenue profile and take a look.

tutorials/multicohort/Multi_Cohort_Example_8_0.png tutorials/multicohort/Multi_Cohort_Example_9_0.png

If we are interested in uncertainties the Uncertainties package have been implemented. This can be used the following way:

tutorials/multicohort/Multi_Cohort_Example_11_0.png tutorials/multicohort/Multi_Cohort_Example_12_0.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with functions nominal_values and std_devs, respectively:

Date
1      100+/-6
2     152+/-10
3     185+/-13
4     209+/-16
5     230+/-17
6     247+/-19
7     262+/-20
8     276+/-21
9     288+/-22
10    300+/-23
Name: dau, dtype: object
array([100.        , 152.2302569 , 184.70997132, 209.30960342,
       229.50731853, 246.84077882, 262.13821507, 275.9019684 ,
       288.46204483, 300.04808586])
array([ 6.        , 10.49918186, 13.4983251 , 15.64446572, 17.31653098,
       18.69011938, 19.85918453, 20.87961789, 21.78730166, 22.60661843])