TL;DR
A trimmed version of this analysis (without code) is published in medium.
Republica published a story that Lawmakers in Nepal are planning to put several sectors in negative list for FDI. They might or mightn’t have done some analysis on what the FDI data says. In case they haven’t, i thought i might take a look at that.
Fortunately data is maintained in the website of Department of Industry, but unfortunately most of their data is in PDF as usual. I created CSV file from PDF and cleaned that up - the scrapping and cleanup process is for another day.
First we need to load the clean data and do basic cleanup and transformation. The clean data set already has segregated the Local and Foreign share in the company into separate columns.
df.companies <- read.csv(here::here("doi","data","doi.companies-clean.csv"), stringsAsFactors = FALSE)
df.companies$total_capital <- as.numeric(df.companies$total_capital)Total industries: 6519
We have the year column ready for quick charts. So first we can quickly see in which year industries are registering.
Lets create a table with the count of registration based on year and month.
df.companies %>% 
  group_by(register_year, register_month) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(register_month = as.character(register_month)) %>% 
  bind_rows(group_by(., register_year) %>% 
              summarise(register_month = "Total", 
                        n = sum(n)))%>%   
  mutate(register_month = factor(register_month, levels = c(1:12,"Total"))) %>% 
  spread(register_month, n) %>% 
  gt() %>% 
  
  fmt_missing(columns = 1:13,
              missing_text = "") %>% 
  cols_label(
    register_year = "Registration Year (Nepali)",
  ) %>% 
  tab_spanner(
    label = "Month (Nepali)",
    columns = vars(1,2,3,4,5,6,7,8,9,10,11,12)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )| Registration Year (Nepali) | Month (Nepali) | Total | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ||
| 2018 | 1 | 1 | |||||||||||
| 2022 | 1 | 1 | 2 | ||||||||||
| 2030 | 1 | 2 | 1 | 4 | |||||||||
| 2032 | 1 | 1 | |||||||||||
| 2033 | 1 | 1 | 2 | ||||||||||
| 2034 | 1 | 1 | 2 | ||||||||||
| 2035 | 1 | 1 | 2 | ||||||||||
| 2037 | 1 | 1 | 2 | ||||||||||
| 2038 | 1 | 1 | |||||||||||
| 2039 | 2 | 1 | 1 | 1 | 5 | ||||||||
| 2040 | 1 | 1 | |||||||||||
| 2041 | 1 | 1 | 2 | ||||||||||
| 2042 | 1 | 1 | 1 | 3 | |||||||||
| 2043 | 1 | 1 | 1 | 2 | 5 | ||||||||
| 2044 | 1 | 1 | 1 | 1 | 4 | ||||||||
| 2045 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | ||||||
| 2046 | 1 | 1 | 1 | 1 | 4 | ||||||||
| 2047 | 9 | 12 | 7 | 7 | 20 | 16 | 8 | 11 | 6 | 96 | |||
| 2048 | 10 | 18 | 11 | 8 | 13 | 12 | 13 | 12 | 13 | 20 | 11 | 36 | 177 | 
| 2049 | 105 | 75 | 118 | 133 | 124 | 47 | 216 | 17 | 23 | 14 | 3 | 3 | 878 | 
| 2050 | 3 | 11 | 12 | 7 | 6 | 12 | 8 | 12 | 13 | 14 | 11 | 18 | 127 | 
| 2051 | 13 | 12 | 21 | 21 | 20 | 15 | 11 | 16 | 16 | 16 | 12 | 20 | 193 | 
| 2052 | 10 | 26 | 30 | 24 | 24 | 15 | 27 | 36 | 40 | 33 | 34 | 36 | 335 | 
| 2053 | 35 | 38 | 31 | 20 | 22 | 32 | 10 | 21 | 20 | 19 | 23 | 19 | 290 | 
| 2054 | 18 | 20 | 18 | 9 | 13 | 8 | 9 | 15 | 5 | 7 | 12 | 13 | 147 | 
| 2055 | 7 | 5 | 9 | 9 | 10 | 4 | 4 | 3 | 5 | 13 | 9 | 17 | 95 | 
| 2056 | 6 | 12 | 19 | 11 | 7 | 16 | 2 | 8 | 14 | 19 | 11 | 21 | 146 | 
| 2057 | 19 | 18 | 13 | 13 | 9 | 17 | 11 | 16 | 9 | 15 | 11 | 11 | 162 | 
| 2058 | 12 | 9 | 12 | 10 | 10 | 23 | 8 | 17 | 9 | 16 | 8 | 10 | 144 | 
| 2059 | 8 | 9 | 11 | 13 | 13 | 7 | 9 | 6 | 11 | 6 | 7 | 4 | 104 | 
| 2060 | 11 | 21 | 6 | 10 | 13 | 15 | 7 | 14 | 11 | 13 | 15 | 13 | 149 | 
| 2061 | 8 | 11 | 12 | 23 | 9 | 8 | 7 | 7 | 4 | 8 | 5 | 6 | 108 | 
| 2062 | 11 | 10 | 9 | 12 | 11 | 12 | 3 | 14 | 13 | 5 | 10 | 11 | 121 | 
| 2063 | 7 | 9 | 13 | 10 | 17 | 12 | 9 | 16 | 13 | 11 | 13 | 10 | 140 | 
| 2064 | 24 | 20 | 22 | 18 | 17 | 28 | 9 | 17 | 11 | 18 | 25 | 15 | 224 | 
| 2065 | 13 | 26 | 30 | 20 | 34 | 24 | 29 | 23 | 21 | 13 | 25 | 18 | 276 | 
| 2066 | 20 | 30 | 44 | 21 | 27 | 21 | 19 | 28 | 13 | 20 | 13 | 19 | 275 | 
| 2067 | 16 | 35 | 24 | 27 | 23 | 22 | 9 | 17 | 17 | 14 | 18 | 19 | 241 | 
| 2068 | 16 | 28 | 32 | 30 | 24 | 19 | 21 | 27 | 10 | 15 | 12 | 26 | 260 | 
| 2069 | 22 | 36 | 37 | 46 | 34 | 51 | 25 | 32 | 24 | 30 | 29 | 27 | 393 | 
| 2070 | 46 | 61 | 41 | 38 | 25 | 34 | 15 | 16 | 19 | 26 | 19 | 34 | 374 | 
| 2071 | 48 | 54 | 43 | 31 | 43 | 48 | 35 | 32 | 36 | 36 | 24 | 41 | 471 | 
| 2072 | 19 | 60 | 58 | 32 | 47 | 40 | 29 | 26 | 18 | 5 | 23 | 33 | 390 | 
| 2073 | 36 | 61 | 59 | 156 | |||||||||
Lets quickly draw a chart showing the trend of industry registration.
registration_years <- unique(df.companies$register_year)
df.companies %>% 
  group_by(register_year) %>% 
  count() %>% 
  modelr::add_predictions(loess(n ~ register_year, data = .), "prediction") %>% 
  mutate(prediction = ifelse(register_year<2046,NA,prediction)) %>% 
  ggplot(aes(x=register_year, y=n)) +
  geom_bar(stat="identity") +
  scale_x_continuous(labels = registration_years, breaks = registration_years) +
  geom_line(aes(y=prediction), linetype = "longdash", color = "blue", size = 0.3) +
  labs(x="Registration Year",y="") +
  theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 7.5))We see that there’s spark spike of company registration in 2049 and declines in subsequent years. It would be interesting to see why there’s such a remarkable increase in the registration. I would be happy to see some researcher looking into the year 2049 BS and see what happened in Nepal.
The trend is in increase mode since 2064.
Our plan is to explore FDI. We have already segregated the FDI share for each industry. Lets see the trend of FDI share in the registered industries.
df.companies %>% 
  filter(total_capital > 0) %>% 
  mutate(has_fdi = ifelse(foreign_percent>0,"Yes","No")) %>% 
  group_by(has_fdi, register_year) %>% 
  count() %>% 
  ggplot(aes(x = register_year, y = n, fill = has_fdi)) +
  geom_bar(stat= "identity") +
  scale_x_continuous(labels = registration_years, breaks = registration_years) +
  labs(x="Registration Year",y="",fill="Has FDI?") +
  theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 7.5),
        legend.position = c(0.92,0.85))We see that FDI started flowing in from 2069 only and significant number of industries are FDI invested. It’s not clear from the data whether Nepal received FDI before that or not.
But we will limit our discussion to this data set only. It doesn’t give full picture of FDI, but we want to see whether the Lawmakers thoughts on banning FDI justifies or not.
So we will limit the analysis from 2069 onwards only.
The following chart considers the industry data from 2069 onwards. Here we are trying to see where each industry (of either small, medium or large scale) resides in the chart of employment against total capital.
df.companies %>% 
  filter(register_year >=2069) %>% 
  filter(total_capital > 0) %>% 
  mutate(has_fdi = ifelse(foreign_percent>0,"Yes","No")) %>% 
  ggplot(aes(x=total_capital, y=employment, color=scale)) +
  geom_point(alpha = 0.2) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  facet_grid(has_fdi ~ .) +
  labs(x = "Total Capital", y = "Employment")At quick glance, we see that most of the FDI industries are of small scale. Though some of the small scaled industries (blue dots) are also on the region of large scale industries (red dots). This raises the question on data authenticity, on the definition of scale.
Further analysis will be carried to those industries with 100% FDI.
df.companies.mean <- df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(scale) %>% 
  summarise(
    employment = mean(employment, na.rm = TRUE),
    total_capital = mean(total_capital, na.rm = TRUE)
  )
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x = total_capital, y = employment, color = scale)) +
  geom_point(alpha = 0.2, size = 1) +
  geom_point(data = df.companies.mean, size = 3, alpha = 1) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  labs(x = "Total Capital", y = "Employment")The big dot is the mean for each category.
The red dot is the mean employment for different scale. To interpret the boxplot below, read the following.
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x=scale, y=employment)) +
  geom_boxplot(alpha=0.5) +
  geom_jitter(alpha=0.05) +
  stat_summary(fun.y=mean, colour="darkred", geom="point", 
               shape=18, size=3,show_guide = FALSE) +
  scale_y_log10()The red dot is the mean capital for different scale.
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x=scale, y=total_capital)) +
  geom_boxplot(alpha=0.5) +
  geom_jitter(alpha=0.05) +
  stat_summary(fun.y=mean, colour="darkred", geom="point", 
               shape=18, size=3,show_guide = FALSE) +
  scale_y_log10(label = comma)Again we are limiting the analysis to 100% FDI industries. Lets see the summary of sector wise breakdown of 100% FDI industries.
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(category) %>% 
  summarise(industry = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  mutate(category = tolower(category)) %>% 
  bind_rows(group_by(.) %>% 
              summarise(category = "Total",
                        industry = sum(industry),
                        employment = sum(employment),
                        capital = sum(capital))
            ) %>% 
  mutate(capital_per_capita = capital/industry,
         employment_per_capita = employment/industry) %>% 
  gt() %>% 
  cols_label(
    category = "Sector",
    industry = "No of Industry",
    employment = "Total",
    employment_per_capita = "Per industry",
    capital = "Total",
    capital_per_capita = "Per industry"
  ) %>% 
  cols_move(
    columns = vars(employment_per_capita),
    after = vars(employment)
  ) %>% 
  tab_spanner(
    label = "Employment",
    columns = vars(employment, employment_per_capita)
  ) %>% 
  tab_spanner(
    label = "Capital",
    columns = vars(capital, capital_per_capita)
  ) %>% 
  fmt_number(
    columns = vars(capital_per_capita, capital, employment, employment_per_capita)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )  | Sector | No of Industry | Employment | Capital | ||
|---|---|---|---|---|---|
| Total | Per industry | Total | Per industry | ||
| agro and forestry based | 130 | 4,113.00 | 31.64 | 2,792,600,000.00 | 21,481,538.46 | 
| construction | 1 | 25.00 | 25.00 | 100,000,000.00 | 100,000,000.00 | 
| energy based | 4 | 213.00 | 53.25 | 336,966,193.00 | 84,241,548.25 | 
| manufacturing | 122 | 5,842.00 | 47.89 | 3,509,162,358.00 | 28,763,625.89 | 
| mineral | 10 | 1,096.00 | 109.60 | 270,000,000.00 | 27,000,000.00 | 
| service | 347 | 12,755.00 | 36.76 | 16,883,380,240.00 | 48,655,274.47 | 
| tourism | 305 | 9,468.00 | 31.04 | 8,044,602,000.00 | 26,375,744.26 | 
| Total | 919 | 33,512.00 | 36.47 | 31,936,710,791.00 | 34,751,589.54 | 
The big dot is the mean for each category.
df.companies.category.mean <- df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(category) %>% 
  summarise(
    employment = mean(employment, na.rm = TRUE),
    total_capital = mean(total_capital, na.rm = TRUE)
  )
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x = total_capital, y = employment, color = tolower(category))) +
  geom_point(alpha = 0.1, size = 1) +
  geom_point(data = df.companies.category.mean, size = 3) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  labs(x = "Total Capital", y = "Employment", color = "Sector")Here’s district-wise summary for 100% FDI industries.
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(district) %>% 
  summarise(industries = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  arrange(desc(employment), desc(capital)) %>% 
  gt() %>% 
  cols_label(
    district = "District",
    industries = "No of Industries",
    employment = "Total Employment",
    capital = "Total Capital"
  ) %>% 
  fmt_number(
    columns = vars(capital)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )| District | No of Industries | Total Employment | Total Capital | 
|---|---|---|---|
| KATHMANDU | 478 | 16519 | 15,691,144,433.00 | 
| LALITPUR | 238 | 9026 | 9,577,971,368.00 | 
| KASKI | 83 | 2038 | 865,100,000.00 | 
| DANG | 7 | 653 | 169,000,000.00 | 
| BHAKTAPUR | 10 | 598 | 2,538,400,000.00 | 
| PARSA | 6 | 564 | 189,234,000.00 | 
| KAVRE | 11 | 459 | 474,209,000.00 | 
| CHITWAN | 9 | 337 | 257,500,000.00 | 
| KAILALI | 6 | 273 | 104,500,000.00 | 
| BARA | 5 | 227 | 355,200,000.00 | 
| PYUTHAN | 2 | 200 | 90,000,000.00 | 
| BARDIYA | 2 | 188 | 135,000,000.00 | 
| KANCHANPUR | 5 | 187 | 50,000,000.00 | 
| NUWAKOT | 7 | 185 | 108,615,000.00 | 
| MORANG | 4 | 173 | 251,236,990.00 | 
| BAITADI | 3 | 161 | 46,000,000.00 | 
| MAKWANPUR | 5 | 155 | 105,000,000.00 | 
| BANKE | 2 | 150 | 15,000,000.00 | 
| SINDHUPALCH OWK | 4 | 144 | 104,000,000.00 | 
| SOLUKHUMBU | 3 | 137 | 235,000,000.00 | 
| SANKHUWASA BHA | 1 | 130 | 0.00 | 
| RUPANDEHI | 4 | 129 | 165,000,000.00 | 
| SURKHET | 1 | 110 | 10,000,000.00 | 
| ILAM | 1 | 103 | 100,000,000.00 | 
| DHADING | 3 | 101 | 45,000,000.00 | 
| DOLKHA | 3 | 95 | 25,000,000.00 | 
| NAWALPARASI | 2 | 92 | 60,000,000.00 | 
| JHAPA | 2 | 69 | 37,000,000.00 | 
| TANAHU | 2 | 64 | 14,000,000.00 | 
| RAUTAHAT | 1 | 50 | 10,000,000.00 | 
| GORKHA | 2 | 47 | 25,000,000.00 | 
| DADELDHURA | 2 | 47 | 15,000,000.00 | 
| JAJARKOT | 1 | 40 | 30,000,000.00 | 
| HUMLA | 1 | 21 | 10,000,000.00 | 
| MUGU | 1 | 18 | 8,600,000.00 | 
| SUNSARI | 1 | 13 | 5,000,000.00 | 
| KAPILBASTU | 1 | 9 | 15,000,000.00 | 
Here we want to see how much industries are concentrated in valley and the rest of Nepal.
df.companies %>% 
  filter(foreign_percent == 100) %>% 
  mutate(district_group = ifelse(tolower(district) %in% c("kathmandu","lalitpur","bhaktapur"), "Kathmandu Valley", "Rest")) %>% 
  group_by(district_group) %>% 
  summarise(industries = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  arrange(desc(employment), desc(capital)) %>% 
  mutate(capital_percent = capital/sum(capital),
         employment_percent = employment/sum(employment),) %>% 
  gt() %>% 
  cols_label(
    district_group = "Valley and Rest",
    industries = "No of Industries",
    employment = "Total",
    employment_percent = "% Share",
    capital = "Total",
    capital_percent = "% Share"
  ) %>% 
  cols_move(
    columns = vars(employment_percent),
    after = vars(employment)
  ) %>% 
  tab_spanner(
    label = "Employment",
    columns = vars(employment, employment_percent)
  ) %>% 
  tab_spanner(
    label = "Capital",
    columns = vars(capital, capital_percent)
  ) %>% 
  fmt_number(
    columns = vars(capital)
  ) %>% 
  fmt_percent(
    columns = vars(capital_percent, employment_percent)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )| Valley and Rest | No of Industries | Employment | Capital | ||
|---|---|---|---|---|---|
| Total | % Share | Total | % Share | ||
| Kathmandu Valley | 726 | 26143 | 78.01% | 27,807,515,801.00 | 87.07% | 
| Rest | 193 | 7369 | 21.99% | 4,129,194,990.00 | 12.93% | 
 As expected, valley represents bulk of FDI. 
Thank you for reading. Any questions or suggestions? Please ask me @twitter 
A trimmed version of this analysis (without code) is published in medium.