pyfreya.cohort package

Submodules

pyfreya.cohort.cohort module

Short tutorial on the Cohort class.

Retention

Let’s import the class and insert some retention numbers along with the number of new users in the cohort.

For more information on retention, see the retention tutorial.

                        1
DaysSinceInstall
0                     100
1                 50.0629
2                 32.1914
3                 24.8632
4                 20.6996
5                 17.9566
6                 15.9875
7                 14.4921
8                 13.3102
9                  12.348
10                11.5464
11                10.8662
12                10.2802
13                9.76917
14                9.31867
15                8.91795
16                8.55872
17                8.23447
18                7.94001
19                7.67118
20                7.42456
21                7.19733
22                6.98716
23                6.79206
24                6.61038
25                6.44069
26                6.28175
27                6.13252
28                5.99207
29                 5.8596
30                 5.7344
31                5.61586
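The table above is consistent with a power law, r(t) = a·t^b for t ≥ 1 with a ≈ 50.06 and b ≈ -0.637, plus r(0) = 100 % on the install day. This is presumably what the default retention_function='power' fits. As a minimal sketch (not pyfreya's own fitting code; fit_power_law and power_retention are hypothetical names), such a curve can be fitted by least squares in log-log space:

```python
import math


def fit_power_law(days, retention):
    """Fit retention(t) = a * t**b by least squares on log(retention) vs log(t).

    Day 0 is skipped because log(0) is undefined; the install day is
    100 % retention by definition.
    """
    pairs = [(math.log(t), math.log(r)) for t, r in zip(days, retention) if t >= 1]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def power_retention(t, a, b):
    """Retention in percent t days after install."""
    return 100.0 if t == 0 else a * t ** b


# First six rows of the retention table above.
days = [0, 1, 2, 3, 4, 5]
retention = [100, 50.0629, 32.1914, 24.8632, 20.6996, 17.9566]
a, b = fit_power_law(days, retention)
```

Fitting in log-log space turns the power law into a straight line, so ordinary least squares is enough; a nonlinear fit on the raw percentages would weight the early days more heavily.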

Note: The Cohort class can also take a retention profile instead of actual retention data points. The name given is not important here, but when plotting aggregates from multiple cohorts, easily identifiable names are nice to have. If no name is given, a random one will be applied.

Daily Active Users

Similar cohorts may arrive on several days in a row. This is modeled as follows:

                       1        2        3        4        5        6        7        8        9        10
DaysSinceInstall
0                     100      100      100      100      100      100      100      100      100      100
1                 50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629  50.0629
2                     NaN  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914  32.1914
3                     NaN      NaN  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632  24.8632
4                     NaN      NaN      NaN  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996  20.6996
5                     NaN      NaN      NaN      NaN  17.9566  17.9566  17.9566  17.9566  17.9566  17.9566
6                     NaN      NaN      NaN      NaN      NaN  15.9875  15.9875  15.9875  15.9875  15.9875
7                     NaN      NaN      NaN      NaN      NaN      NaN  14.4921  14.4921  14.4921  14.4921
8                     NaN      NaN      NaN      NaN      NaN      NaN      NaN  13.3102  13.3102  13.3102
9                     NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN   12.348   12.348
10                    NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN  11.5464

Well, it's nice to see this user distribution, but how many daily active users (DAU) do we have? (Note that the result is a pandas DataFrame.)

<class 'pandas.core.frame.DataFrame'>
          dau
Date
1         100
2     150.063
3     182.254
4     207.117
5     227.817
6     245.774
7     261.761
8     276.253
9     289.563
10    301.912

Since users remain active after the 10-day influx, let's look at 30 days (10 days of user influx followed by 20 days of waiting):

          dau
Date
1         100
2     150.063
3     182.254
4     207.117
5     227.817
6     245.774
7     261.761
8     276.253
9     289.563
10    301.912
11    213.458
12    174.261
13     152.35
14    137.256
15    125.875
16    116.836
17    109.408
18     103.15
19    97.7799
20     93.103
21    88.9812
22    85.3123
23    82.0192
24    79.0421
25    76.3338
26    73.8566
27    71.5796
28    69.4776
29    67.5297
30    65.7181
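Behind both DAU tables is a simple sum: on calendar day d, every cohort that started on a day s ≤ d (within the influx period) contributes new_users · r(d - s) / 100 active users. A minimal sketch, assuming retention is given in percent as above and new_users = 100 (daily_active_users is a hypothetical helper, not part of pyfreya), reproduces the first 11 days of the 10-day-influx table:

```python
def daily_active_users(retention, new_users, n_influx_days, post_influx_duration=0):
    """DAU per calendar day for identical cohorts starting on days 1..n_influx_days.

    retention[k] is the retention in percent k days after install, and must
    cover at least n_influx_days + post_influx_duration days. Index 0 of the
    returned list is day 1.
    """
    n_days = n_influx_days + post_influx_duration
    dau = []
    for day in range(1, n_days + 1):
        # Sum the retained users of every cohort that has started by `day`.
        total = sum(
            new_users * retention[day - start] / 100.0
            for start in range(1, min(day, n_influx_days) + 1)
        )
        dau.append(total)
    return dau


# Retention (percent) for days since install 0..10, from the retention table.
retention = [100, 50.0629, 32.1914, 24.8632, 20.6996, 17.9566,
             15.9875, 14.4921, 13.3102, 12.348, 11.5464]
dau = daily_active_users(retention, new_users=100, n_influx_days=10, post_influx_duration=1)
```

Once the influx stops, the newest cohort drops out of the sum, which is why DAU falls sharply on day 11.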

Enough numbers, let's plot some of this. First, the retention, to check whether the fit matches the data:

_images/Cohort_Example_9_0.png

How about the DAU?

_images/Cohort_Example_11_0.png

If you wonder how long it takes to reach a certain number of DAU, it can be calculated. This assumes a steady daily influx of new_users users and the retention profile calculated earlier.

21
_images/Cohort_Example_14_0.png
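With a new cohort every day and no end to the influx, DAU on day d is new_users times the sum of r(0), ..., r(d - 1) in percent, so the search reduces to a running sum. A sketch assuming the power-law parameters read off the retention table (a ≈ 50.0629, b ≈ -0.63707); dau_goal_day is a hypothetical name, not pyfreya's implementation:

```python
def power_retention(t, a=50.0629, b=-0.63707):
    """Retention in percent t days after install (parameters read off the table)."""
    return 100.0 if t == 0 else a * t ** b


def dau_goal_day(goal, new_users=100, max_days=360, retention=power_retention):
    """First day (1-based) on which DAU >= goal under a steady daily influx.

    Returns None if the goal is not reached within max_days.
    """
    dau = 0.0
    for day in range(1, max_days + 1):
        # DAU(day) = DAU(day - 1) + the oldest cohort, now day - 1 days old.
        dau += new_users * retention(day - 1) / 100.0
        if dau >= goal:
            return day
    return None
```

For example, dau_goal_day(250) returns 7 and dau_goal_day(400) returns 21, consistent with the dau column of the revenue table further down. The max_days cap matters because a power-law tail never reaches zero, so DAU keeps growing slowly and an unreachable goal would otherwise loop for a very long time.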

Datetime

What kind of date is this anyway? Let's use proper human dates from the Gregorian calendar:

_images/Cohort_Example_16_0.png
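The revenue table further down starts on 2019-03-14, so day 1 evidently corresponds to that date. Converting a 1-based day index to a calendar date is a plain offset. A minimal sketch using the standard library (day_to_date is a hypothetical helper; that pyfreya takes the date through start_date is an assumption):

```python
from datetime import date, timedelta


def day_to_date(day, start_date):
    """Map a 1-based day index to a calendar date; day 1 is start_date."""
    return start_date + timedelta(days=day - 1)


start = date(2019, 3, 14)
dates = [day_to_date(day, start) for day in range(1, 34)]
```

Day 23, the last day of influx in the revenue table, maps to 2019-04-05, and day 33 maps to 2019-04-15.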

Revenue

Well, how much money did we earn? A premade revenue profile class called ARPDAU (average revenue per daily active user) is imported and initialized by setting the ARPDAU to a value; the table below corresponds to an ARPDAU of 2.1.

                dau    revenue
Date
2019-03-14      100        210
2019-03-15  150.063    315.132
2019-03-16  182.254    382.734
2019-03-17  207.117    434.947
2019-03-18  227.817    478.416
2019-03-19  245.774    516.125
2019-03-20  261.761    549.698
2019-03-21  276.253    580.132
2019-03-22  289.563    608.083
2019-03-23  301.912    634.014
2019-03-24  313.458    658.262
2019-03-25  324.324    681.081
2019-03-26  334.604    702.669
2019-03-27  344.374    723.184
2019-03-28  353.692    742.754
2019-03-29   362.61    761.481
2019-03-30  371.169    779.455
2019-03-31  379.403    796.747
2019-04-01  387.343    813.421
2019-04-02  395.015    829.531
2019-04-03  402.439    845.122
2019-04-04  409.636    860.237
2019-04-05  416.624     874.91
2019-04-06  323.416    679.173
2019-04-07  279.963    587.923
2019-04-08  254.212    533.846
2019-04-09  235.631    494.825
2019-04-10  221.064    464.234
2019-04-11  209.099    439.109
2019-04-12  198.971     417.84
2019-04-13  190.214    399.449
2019-04-14  182.519    383.291
2019-04-15  175.675    368.917
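Every row above satisfies revenue = 2.1 × dau, i.e. a constant ARPDAU of 2.1. Under that reading, applying an ARPDAU profile is an element-wise multiplication, and the cumulative revenue that plot_revenue shows on its right axis is the running sum. A minimal sketch (apply_arpdau is a hypothetical helper, not the ARPDAU class itself):

```python
from itertools import accumulate


def apply_arpdau(dau, arpdau):
    """Return (daily revenue, cumulative revenue) for a constant ARPDAU."""
    revenue = [arpdau * users for users in dau]
    cumulative = list(accumulate(revenue))
    return revenue, cumulative


# First three days of the table above.
revenue, cumulative = apply_arpdau([100, 150.063, 182.254], arpdau=2.1)
```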

This can be plotted too!

_images/Cohort_Example_20_0.png

If you are interested in uncertainties, support for the uncertainties package has been implemented. It can be used as follows:

_images/Cohort_Example_22_0.png _images/Cohort_Example_22_1.png _images/Cohort_Example_22_2.png

When working with uncertainties, the nominal values and the uncertainty values can be obtained with the functions nominal_values and std_devs, respectively:

Date
1      100+/-5
2      151+/-8
3     183+/-11
4     208+/-12
5     228+/-14
6     246+/-15
7     261+/-16
8     275+/-17
9     288+/-17
10    300+/-18
11    311+/-19
12    322+/-20
13    331+/-20
14    341+/-21
15    350+/-22
16    358+/-22
17    366+/-23
18    374+/-23
19    382+/-24
20    389+/-24
21    396+/-25
22    403+/-26
23    409+/-26
24    316+/-23
25    271+/-21
26    245+/-19
27    226+/-19
28    211+/-18
29    199+/-17
30    189+/-17
31    181+/-16
32    173+/-16
33    166+/-16
Name: dau, dtype: object
array([100.        , 151.00796688, 183.23144399, 207.86303692,
       228.21970943, 245.77844367, 261.3390633 , 275.38876963,
       288.24877584, 300.14329757, 311.23575048, 321.64933796,
       331.47951354, 340.80190942, 349.6775894 , 358.15664906,
       366.28075478, 374.08497899, 381.59915514, 388.84889718,
       395.85637904, 402.64093973, 409.21955908, 315.60723627,
       270.8093275 , 244.6301836 , 225.88786696, 211.27502302,
       199.32335936, 189.24093946, 180.54774509, 172.92912575,
       166.16687945])
array([ 5.        ,  8.34935123, 10.56825389, 12.20066331, 13.51786087,
       14.6418741 , 15.63646161, 16.53872917, 17.37200759, 18.15183883,
       18.88905055, 19.59146105, 20.26488384, 20.91374916, 21.54150443,
       22.1508812 , 22.74407817, 23.32289008, 23.88880016, 24.44304807,
       24.98668034, 25.52058868, 26.04553928, 22.86365271, 20.63415691,
       19.40634919, 18.5255104 , 17.82742722, 17.24464372, 16.74232945,
       16.29990806, 15.90409195, 15.54574024])

A Cohort instance can be saved (using pickle) and loaded again:

import pyfreya

facebook.save('facebook_revenue.pkl')  # facebook: the Cohort instance from this tutorial
facebook_loaded = pyfreya.load('facebook_revenue.pkl')
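The documentation only states that pickle is used. Assuming save and pyfreya.load are thin wrappers around the standard library's pickle module, the round trip amounts to the sketch below (save_object and load_object are hypothetical names; a plain dict stands in for a Cohort). As with any pickle file, only load files from trusted sources.

```python
import os
import pickle
import tempfile


def save_object(obj, filename):
    """Serialize obj to filename with pickle."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def load_object(filename):
    """Deserialize and return the object stored in filename."""
    with open(filename, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "facebook_revenue.pkl")
    save_object({"name": "facebook", "new_users": 100}, path)
    restored = load_object(path)
```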
Cohort Class
class pyfreya.cohort.cohort.Cohort(new_users, days_since_install=None, retention_values=None, retention_function='power', retention_profile=None, start_date=1, revenue_profile=None, name='')

Bases: object

Cohort class. The new_users parameter must be provided. To add retention, either supply days_since_install and retention_values, or supply a pre-made retention_profile (see Retention).

apply_revenue(revenue_profile=None)

Given a revenue profile and a cohort apply the revenue profile to get revenue and revenue uncertainty.

Parameters:revenue_profile (Optional[BaseRevenue]) – The revenue profile to use. If none is provided, it is assumed that one was provided earlier.
Returns:

days_to_dau(goal, max_days=360)

Calculates the number of days until a given DAU count has been reached. To avoid continuing into infinity (and beyond), the search is limited to max_days days.

Parameters:
  • goal (int) – The amount of DAU that is the goal.
  • max_days – The maximum number of days to look through.
Returns:

days_to_rev(goal, max_days=360)

Calculates the number of days until the revenue of a single day reaches or exceeds goal. To avoid continuing into infinity (and beyond), the search is limited to max_days days.

Parameters:
  • goal (float) – Daily revenue goal.
  • max_days – The maximum number of days to look through.
Returns:

days_to_total_rev(goal, max_days=360)

Calculates the number of days until the cumulative revenue reaches goal. To avoid continuing into infinity (and beyond), the search is limited to max_days days.

Parameters:
  • goal (float) – Cumulative revenue goal.
  • max_days – The maximum number of days to look through.
Returns:
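The search behind days_to_rev and days_to_total_rev can be sketched as a scan over daily revenue values, using the running sum for the cumulative variant. This is a minimal standalone illustration assuming a precomputed list of daily revenue; first_day_reaching, rev_goal_day and total_rev_goal_day are hypothetical names, not the Cohort methods themselves.

```python
from itertools import accumulate


def first_day_reaching(values, goal, max_days=360):
    """1-based index of the first value >= goal, looking at most max_days ahead."""
    for day, value in enumerate(values[:max_days], start=1):
        if value >= goal:
            return day
    return None


def rev_goal_day(daily_revenue, goal, max_days=360):
    """Mirrors days_to_rev: first day whose own revenue reaches goal."""
    return first_day_reaching(daily_revenue, goal, max_days)


def total_rev_goal_day(daily_revenue, goal, max_days=360):
    """Mirrors days_to_total_rev: first day on which cumulative revenue reaches goal."""
    return first_day_reaching(list(accumulate(daily_revenue)), goal, max_days)


# First four days of daily revenue from the tutorial's revenue table.
revenue = [210, 315.132, 382.734, 434.947]
```

With these values, a daily goal of 400 is first met on day 4, while a cumulative goal of 900 is met on day 3 (210 + 315.132 + 382.734 = 907.866).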

plot_dau()

Plot daily active users.

Returns:
plot_retention()

Plots the retention.

Returns:
plot_revenue()

Plot the revenue with uncertainty (left y-axis) and the cumulative revenue (right y-axis). The cumulative revenue could also have an uncertainty, though it is not obvious how to calculate it; the best bet is probably error propagation.

Returns:
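One way to approach the cumulative-revenue uncertainty mentioned above is first-order error propagation for a sum. If the daily values are independent, variances add; if they are fully correlated, standard deviations add linearly. Daily revenues derived from the same fitted retention are likely correlated, so for non-negative correlations the true value lies between the two. A sketch (cumulative_std is a hypothetical helper, not part of pyfreya):

```python
import math
from itertools import accumulate


def cumulative_std(stds, correlated=False):
    """Standard deviation of each running sum of daily values.

    correlated=False: independent days, variances add (quadrature).
    correlated=True: fully correlated days, standard deviations add.
    """
    if correlated:
        return list(accumulate(stds))
    return [math.sqrt(var) for var in accumulate(s * s for s in stds)]


independent = cumulative_std([3.0, 4.0])
correlated = cumulative_std([3.0, 4.0], correlated=True)
```

For two days with standard deviations 3 and 4, the cumulative uncertainty is 5 if they are independent and 7 if they are fully correlated.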
replicate_cohort(n_days_since_install, post_influx_duration=0)

Replicate the cohort over multiple days. An equivalent cohort starts on each of n_days_since_install consecutive days. To follow the cohorts after the influx has stopped, set post_influx_duration to the number of additional days.

Parameters:
  • n_days_since_install (int) – Number of consecutive days on which a new (equivalent) cohort starts.
  • post_influx_duration – The number of days to wait after the last cohort has been added.
Returns:

save(filename)

Saves the cohort as a pickle file.

Parameters:filename (str) – Filename for the cohort.
Returns:

Module contents

inits cohort