TL;DR

A trimmed version of this analysis (without code) is published in medium.

1 Background

Republica published a story that Lawmakers in Nepal are planning to put several sectors in negative list for FDI. They might or mightn’t have done some analysis on what the FDI data says. In case they haven’t, i thought i might take a look at that.

Fortunately data is maintained in the website of Department of Industry, but unfortunately most of their data is in PDF as usual. I created CSV file from PDF and cleaned that up - the scrapping and cleanup process is for another day.

2 Data Loading

First we need to load the clean data and do basic cleanup and transformation. The clean data set already has segregated the Local and Foreign share in the company into separate columns.

df.companies <- read.csv(here::here("doi","data","doi.companies-clean.csv"), stringsAsFactors = FALSE)
df.companies$total_capital <- as.numeric(df.companies$total_capital)

Total industries: 6519

3 Data analysis

3.1 Companies Registration Trend

We have the year column ready for quick charts. So first we can quickly see in which year industries are registering.

3.1.1 Year and Monthwise Registration

Lets create a table with the count of registration based on year and month.

df.companies %>% 
  group_by(register_year, register_month) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(register_month = as.character(register_month)) %>% 
  bind_rows(group_by(., register_year) %>% 
              summarise(register_month = "Total", 
                        n = sum(n)))%>%   
  mutate(register_month = factor(register_month, levels = c(1:12,"Total"))) %>% 
  spread(register_month, n) %>% 
  gt() %>% 
  
  fmt_missing(columns = 1:13,
              missing_text = "") %>% 
  cols_label(
    register_year = "Registration Year (Nepali)",
  ) %>% 
  tab_spanner(
    label = "Month (Nepali)",
    columns = vars(1,2,3,4,5,6,7,8,9,10,11,12)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )

Registration Year (Nepali)	Month (Nepali)												Total
Registration Year (Nepali)	1	2	3	4	5	6	7	8	9	10	11	12	Total
2018								1					1
2022		1	1										2
2030			1				2					1	4
2032		1											1
2033						1		1					2
2034	1			1									2
2035						1			1				2
2037				1						1			2
2038					1								1
2039				2		1		1			1		5
2040										1			1
2041				1					1				2
2042			1	1								1	3
2043				1	1	1			2				5
2044					1		1		1		1		4
2045		1			1		1	1	1			1	6
2046		1	1	1					1				4
2047				9	12	7	7	20	16	8	11	6	96
2048	10	18	11	8	13	12	13	12	13	20	11	36	177
2049	105	75	118	133	124	47	216	17	23	14	3	3	878
2050	3	11	12	7	6	12	8	12	13	14	11	18	127
2051	13	12	21	21	20	15	11	16	16	16	12	20	193
2052	10	26	30	24	24	15	27	36	40	33	34	36	335
2053	35	38	31	20	22	32	10	21	20	19	23	19	290
2054	18	20	18	9	13	8	9	15	5	7	12	13	147
2055	7	5	9	9	10	4	4	3	5	13	9	17	95
2056	6	12	19	11	7	16	2	8	14	19	11	21	146
2057	19	18	13	13	9	17	11	16	9	15	11	11	162
2058	12	9	12	10	10	23	8	17	9	16	8	10	144
2059	8	9	11	13	13	7	9	6	11	6	7	4	104
2060	11	21	6	10	13	15	7	14	11	13	15	13	149
2061	8	11	12	23	9	8	7	7	4	8	5	6	108
2062	11	10	9	12	11	12	3	14	13	5	10	11	121
2063	7	9	13	10	17	12	9	16	13	11	13	10	140
2064	24	20	22	18	17	28	9	17	11	18	25	15	224
2065	13	26	30	20	34	24	29	23	21	13	25	18	276
2066	20	30	44	21	27	21	19	28	13	20	13	19	275
2067	16	35	24	27	23	22	9	17	17	14	18	19	241
2068	16	28	32	30	24	19	21	27	10	15	12	26	260
2069	22	36	37	46	34	51	25	32	24	30	29	27	393
2070	46	61	41	38	25	34	15	16	19	26	19	34	374
2071	48	54	43	31	43	48	35	32	36	36	24	41	471
2072	19	60	58	32	47	40	29	26	18	5	23	33	390
2073	36	61	59										156

3.1.2 Registration Trend

Lets quickly draw a chart showing the trend of industry registration.

registration_years <- unique(df.companies$register_year)
df.companies %>% 
  group_by(register_year) %>% 
  count() %>% 
  modelr::add_predictions(loess(n ~ register_year, data = .), "prediction") %>% 
  mutate(prediction = ifelse(register_year<2046,NA,prediction)) %>% 
  ggplot(aes(x=register_year, y=n)) +
  geom_bar(stat="identity") +
  scale_x_continuous(labels = registration_years, breaks = registration_years) +
  geom_line(aes(y=prediction), linetype = "longdash", color = "blue", size = 0.3) +
  labs(x="Registration Year",y="") +
  theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 7.5))

We see that there’s spark spike of company registration in 2049 and declines in subsequent years. It would be interesting to see why there’s such a remarkable increase in the registration. I would be happy to see some researcher looking into the year 2049 BS and see what happened in Nepal.

The trend is in increase mode since 2064.

3.2 Registratration Trend with FDI

Our plan is to explore FDI. We have already segregated the FDI share for each industry. Lets see the trend of FDI share in the registered industries.

df.companies %>% 
  filter(total_capital > 0) %>% 
  mutate(has_fdi = ifelse(foreign_percent>0,"Yes","No")) %>% 
  group_by(has_fdi, register_year) %>% 
  count() %>% 
  ggplot(aes(x = register_year, y = n, fill = has_fdi)) +
  geom_bar(stat= "identity") +
  scale_x_continuous(labels = registration_years, breaks = registration_years) +
  labs(x="Registration Year",y="",fill="Has FDI?") +
  theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 7.5),
        legend.position = c(0.92,0.85))

We see that FDI started flowing in from 2069 only and significant number of industries are FDI invested. It’s not clear from the data whether Nepal received FDI before that or not.

But we will limit our discussion to this data set only. It doesn’t give full picture of FDI, but we want to see whether the Lawmakers thoughts on banning FDI justifies or not.

So we will limit the analysis from 2069 onwards only.

3.3 FDI and Non-FDI industry exploration

The following chart considers the industry data from 2069 onwards. Here we are trying to see where each industry (of either small, medium or large scale) resides in the chart of employment against total capital.

df.companies %>% 
  filter(register_year >=2069) %>% 
  filter(total_capital > 0) %>% 
  mutate(has_fdi = ifelse(foreign_percent>0,"Yes","No")) %>% 
  ggplot(aes(x=total_capital, y=employment, color=scale)) +
  geom_point(alpha = 0.2) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  facet_grid(has_fdi ~ .) +
  labs(x = "Total Capital", y = "Employment")

At quick glance, we see that most of the FDI industries are of small scale. Though some of the small scaled industries (blue dots) are also on the region of large scale industries (red dots). This raises the question on data authenticity, on the definition of scale.

3.4 FDI Share breakdown

There are industries with as little as 2% FDI as well with 100% FDI. Lets see the breakdown based on FDI percentage share.

df.companies %>% 
  filter(foreign_percent > 0) %>% 
  mutate(foreign_percent_range = cut(foreign_percent, 
                                     breaks = c(0,11,49,99.99,100),
                                     labels = c("10 or less than 10%","11 to 49%", "50 to less than 100%", "100%"),
                                     include.lowest = TRUE)) %>% 
  group_by(foreign_percent_range) %>% 
  count() %>% 
  mutate(percent = n/sum(.$n)) %>% 
  ungroup() %>% 
  gt() %>% 
  cols_label(
    foreign_percent_range = "FDI % share ",
    n = "Number of Registered Industries",
    percent = "Percentage of Industries"
  ) %>% 
  fmt_percent(
    columns = vars(percent)
  )

FDI % share	Number of Registered Industries	Percentage of Industries
10 or less than 10%	3	0.27%
11 to 49%	41	3.65%
50 to less than 100%	159	14.17%
100%	919	81.91%

82% of registered industries are 100% FDI. So we can further narrow our analysis to 100% FDI invested data only.

3.5 Exploring 100% FDI industries

Further analysis will be carried to those industries with 100% FDI.

3.5.1 Scatterplot of industries based on scale

df.companies.mean <- df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(scale) %>% 
  summarise(
    employment = mean(employment, na.rm = TRUE),
    total_capital = mean(total_capital, na.rm = TRUE)
  )

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x = total_capital, y = employment, color = scale)) +
  geom_point(alpha = 0.2, size = 1) +
  geom_point(data = df.companies.mean, size = 3, alpha = 1) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  labs(x = "Total Capital", y = "Employment")

The big dot is the mean for each category.

3.5.2 Spread of employment numbers for 100% FDI industries

The red dot is the mean employment for different scale. To interpret the boxplot below, read the following.

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x=scale, y=employment)) +
  geom_boxplot(alpha=0.5) +
  geom_jitter(alpha=0.05) +
  stat_summary(fun.y=mean, colour="darkred", geom="point", 
               shape=18, size=3,show_guide = FALSE) +
  scale_y_log10()

3.5.3 Spread of total capital for 100% FDI industries

The red dot is the mean capital for different scale.

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x=scale, y=total_capital)) +
  geom_boxplot(alpha=0.5) +
  geom_jitter(alpha=0.05) +
  stat_summary(fun.y=mean, colour="darkred", geom="point", 
               shape=18, size=3,show_guide = FALSE) +
  scale_y_log10(label = comma)

3.6 Sector-wise analysis

Again we are limiting the analysis to 100% FDI industries. Lets see the summary of sector wise breakdown of 100% FDI industries.

3.6.1 Sector-wise summary table

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(category) %>% 
  summarise(industry = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  mutate(category = tolower(category)) %>% 
  bind_rows(group_by(.) %>% 
              summarise(category = "Total",
                        industry = sum(industry),
                        employment = sum(employment),
                        capital = sum(capital))
            ) %>% 
  mutate(capital_per_capita = capital/industry,
         employment_per_capita = employment/industry) %>% 
  gt() %>% 
  cols_label(
    category = "Sector",
    industry = "No of Industry",
    employment = "Total",
    employment_per_capita = "Per industry",
    capital = "Total",
    capital_per_capita = "Per industry"
  ) %>% 
  cols_move(
    columns = vars(employment_per_capita),
    after = vars(employment)
  ) %>% 
  tab_spanner(
    label = "Employment",
    columns = vars(employment, employment_per_capita)
  ) %>% 
  tab_spanner(
    label = "Capital",
    columns = vars(capital, capital_per_capita)
  ) %>% 
  fmt_number(
    columns = vars(capital_per_capita, capital, employment, employment_per_capita)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )

Sector	No of Industry	Employment		Capital
Sector	No of Industry	Total	Per industry	Total	Per industry
agro and forestry based	130	4,113.00	31.64	2,792,600,000.00	21,481,538.46
construction	1	25.00	25.00	100,000,000.00	100,000,000.00
energy based	4	213.00	53.25	336,966,193.00	84,241,548.25
manufacturing	122	5,842.00	47.89	3,509,162,358.00	28,763,625.89
mineral	10	1,096.00	109.60	270,000,000.00	27,000,000.00
service	347	12,755.00	36.76	16,883,380,240.00	48,655,274.47
tourism	305	9,468.00	31.04	8,044,602,000.00	26,375,744.26
Total	919	33,512.00	36.47	31,936,710,791.00	34,751,589.54

3.7 Sector-wise scatterplot with mean

The big dot is the mean for each category.

df.companies.category.mean <- df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(category) %>% 
  summarise(
    employment = mean(employment, na.rm = TRUE),
    total_capital = mean(total_capital, na.rm = TRUE)
  )

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  ggplot(aes(x = total_capital, y = employment, color = tolower(category))) +
  geom_point(alpha = 0.1, size = 1) +
  geom_point(data = df.companies.category.mean, size = 3) +
  scale_x_log10(label = comma) +
  scale_y_log10() +
  labs(x = "Total Capital", y = "Employment", color = "Sector")

3.8 District-wise summary

Here’s district-wise summary for 100% FDI industries.

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  group_by(district) %>% 
  summarise(industries = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  arrange(desc(employment), desc(capital)) %>% 
  gt() %>% 
  cols_label(
    district = "District",
    industries = "No of Industries",
    employment = "Total Employment",
    capital = "Total Capital"
  ) %>% 
  fmt_number(
    columns = vars(capital)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )

District	No of Industries	Total Employment	Total Capital
KATHMANDU	478	16519	15,691,144,433.00
LALITPUR	238	9026	9,577,971,368.00
KASKI	83	2038	865,100,000.00
DANG	7	653	169,000,000.00
BHAKTAPUR	10	598	2,538,400,000.00
PARSA	6	564	189,234,000.00
KAVRE	11	459	474,209,000.00
CHITWAN	9	337	257,500,000.00
KAILALI	6	273	104,500,000.00
BARA	5	227	355,200,000.00
PYUTHAN	2	200	90,000,000.00
BARDIYA	2	188	135,000,000.00
KANCHANPUR	5	187	50,000,000.00
NUWAKOT	7	185	108,615,000.00
MORANG	4	173	251,236,990.00
BAITADI	3	161	46,000,000.00
MAKWANPUR	5	155	105,000,000.00
BANKE	2	150	15,000,000.00
SINDHUPALCH OWK	4	144	104,000,000.00
SOLUKHUMBU	3	137	235,000,000.00
SANKHUWASA BHA	1	130	0.00
RUPANDEHI	4	129	165,000,000.00
SURKHET	1	110	10,000,000.00
ILAM	1	103	100,000,000.00
DHADING	3	101	45,000,000.00
DOLKHA	3	95	25,000,000.00
NAWALPARASI	2	92	60,000,000.00
JHAPA	2	69	37,000,000.00
TANAHU	2	64	14,000,000.00
RAUTAHAT	1	50	10,000,000.00
GORKHA	2	47	25,000,000.00
DADELDHURA	2	47	15,000,000.00
JAJARKOT	1	40	30,000,000.00
HUMLA	1	21	10,000,000.00
MUGU	1	18	8,600,000.00
SUNSARI	1	13	5,000,000.00
KAPILBASTU	1	9	15,000,000.00

3.8.1 FDI registered in Valley and elsewhere

Here we want to see how much industries are concentrated in valley and the rest of Nepal.

df.companies %>% 
  filter(foreign_percent == 100) %>% 
  mutate(district_group = ifelse(tolower(district) %in% c("kathmandu","lalitpur","bhaktapur"), "Kathmandu Valley", "Rest")) %>% 
  group_by(district_group) %>% 
  summarise(industries = n(), 
            employment = sum(employment, na.rm = TRUE), 
            capital = sum(total_capital, na.rm = TRUE)) %>% 
  arrange(desc(employment), desc(capital)) %>% 
  mutate(capital_percent = capital/sum(capital),
         employment_percent = employment/sum(employment),) %>% 
  gt() %>% 
  cols_label(
    district_group = "Valley and Rest",
    industries = "No of Industries",
    employment = "Total",
    employment_percent = "% Share",
    capital = "Total",
    capital_percent = "% Share"
  ) %>% 
  cols_move(
    columns = vars(employment_percent),
    after = vars(employment)
  ) %>% 
  tab_spanner(
    label = "Employment",
    columns = vars(employment, employment_percent)
  ) %>% 
  tab_spanner(
    label = "Capital",
    columns = vars(capital, capital_percent)
  ) %>% 
  fmt_number(
    columns = vars(capital)
  ) %>% 
  fmt_percent(
    columns = vars(capital_percent, employment_percent)
  ) %>% 
  tab_style(
    style = cells_styles(
      text_size = px(12)
    ),
    locations = cells_data(
      columns = everything()
    )
  )

Valley and Rest	No of Industries	Employment		Capital
Valley and Rest	No of Industries	Total	% Share	Total	% Share
Kathmandu Valley	726	26143	78.01%	27,807,515,801.00	87.07%
Rest	193	7369	21.99%	4,129,194,990.00	12.93%

As expected, valley represents bulk of FDI.

Thank you for reading. Any questions or suggestions? Please ask me @twitter