2 Data Processing

RHoMIS data processing takes place in three stages.

  1. Unit Extraction: We search through a survey for all of the units, crops, and livestock which were encountered. The user can examine these units and provide conversion factors so that everything is standardised. For crops and livestock, the user can check all of the entries and standardise (e.g. ensure that there are no duplicates with mispelling, such as “maize” and “maze”).

  2. Secondary Cleaning: Now that crop and livestock names are standardised (in step 1), we identify all of the calorie conversions and prices we need for a specific survey (e.g. based on crops identified in step one we need calorie and prices for “maize”). We use the units converted in step 1 to calculate crop and livestock prices (see Indicators Explained).

  3. Indicator Calculation: Finally, with all conversions and units identified, we calculate all of the indicators possible for the survey.

A full reproducible example that you can work through, along with a sample dataset can be found here

2.1 Installation

Currently RHoMIS is not available on CRAN, but we hope one day it will be. In the meantime, to install the RHoMIS R-package, you will need to install devtools, you can that by entering the following into the R console:

install.packages("devtools")

Then to install the most recent version of the R-package you can enter the following command

library(devtools)
devtools::install_github("https://github.com/RHoMIS/rhomis-R-package", force = TRUE)

Then make sure to load the RHoMIS library:

library(rhomis)

2.2 Initial Setup

You should then ensure you have a folder containing the data you would like to process. If using Rstudio interactively, set your working directory to this folder using the setwd() command. Your directory structure should look something like this:

📂 project
┃
┗📂 raw-data
  ┃
  ┗📜 raw-data.csv

2.3 Stage 1: Unit Extraction

2.3.1 Extracting Units

To extract new units from the data you are working with, run the code below. Set the dataFilePath to the path of your RHoMIS raw-data file.

extract_units_and_conversions_csv(
    file_path="raw-data/raw-data.csv",
    id_type = "string",
    proj_id = "test_prj",
    form_id = "test_frm")

If you now go back to your project directory, you should be able to see that more files have been added:

📂 project
┃
┣📂 raw-data
┃ ┗📜 sample_project.csv
┃
┣📂 conversions_stage_1
┃ ┣📜 country.csv
┃ ┗📜 ...
┃
┗📂 .original_stage_1_conversions
  ┣📜 crop_yield_units.csv
  ┣📜 country.csv
  ┗📜 ...

The conversions_stage_1 folder should will include the units which you will verify and change. The .original_stage_1_conversions folder includes an original copy of the units pulled straight from the data. This folder will likely be hidden unless you change the settings on your file browser.

2.3.2 Converting Units

Please note, while you do not have to convert units for the next step in the calculations to run, uncoverted units will be ignored and will lead to unnecessary missing values

In the project/conversions_stage_1 directory, you will find a series of csv files with conversion factors that need to be filled in. Each file will look like the table below. The survey_value column indicates a value which was found in the survey. The conversion column shows the conversion factor which ought to be used. Where the conversion factor is unknown, the column will display NA. Where the conversion factor is known it will be provided. Known conversion factors are already embedded in the R-package.

unit_type id_rhomis_dataset survey_value conversion
crop_unit x foo NA
crop_unit x bar NA

There are a range of conversions to enter and it is important to be aware which units are needed. Each file is described in more detail below:

File Description Example Survey Value Example Conversion
honey_amount_to_l.csv The conversion factor should be used to convert to kilograms/litres. buckets_12_litre 12
country_to_iso2.csv The conversion factor should be used to convert country names to two-letter ISO country codes. This is important as it is used to find currency conversion factors uganda UG
crop_name_to_std.csv The name of crops entered in the survey. Often enumerators may specify “other” crops in a free-text-entry field. Sometimes these crops can be mispelled, or in a language other than English. Here we correct misspellings and translations into standard forms (all lower case) maze maize
crop_price_to_lcu_per_kg.csv The price units for crops which were sold. Please note that only one unit will be converted into a string, “total_income_per_year” will be converted to “total_income_per_year” as this is treated as a special case in the analysis scripts. price_per_50kg_sack 0.02
crop_amount_to_kg.csv The unit of crops which have been collected. This needs to be converted to kilograms cart_250kg 250
eggs_price_to_lcu_per_year.csv The amount of money made per unit time for selling eggs. This needs to be converted into an income per year income for 3 months 4
eggs_amount_to_pieces_per_year.csv The number of eggs collected per year pieces/day 365
fertiliser_amount_to_kg.csv The amount of fertiliser in kg sacks_25kg 25
livestock_name_to_std.csv The names of livestock entered in the survey. As with crop names, enumerators can also enter extra livestock names which are in a different language or mispelt. This conversion is used to standardise livestock names entered in the survey catel cattle
milk_price_to_lcu_per_l.csv The amount of money made per unit volume or per unit time. For unit time, the options are “day”, “week”, “month”, “year”. These time units must be entered as strings. For the unit volume, numeric conversions must be entered month month
0.3litre 0.3
milk_amount_to_l.csv The amount of milk collected in litres per year. There are a number of exceptions which must be kept as text strings, and are dealt with in the analysis scripts (e.g. “per animal per day” and “l/animal/day” l/day 365
land_area_to_ha.csv The amount of land in hectares acres 0.4

2.4 Stage 2: Calculating Prices, Preparing Calorie Conversions, and Extracting Secondary Units

As with unit extraction, ensure you are in the project directory. Then enter the commands below:

calculate_prices_csv(
    file_path="raw-data/raw-data.csv",
    id_type = "string",
    proj_id = "my_project_id",
    form_id = "my_form_id"
)

You will now see that the directory structure again looks quite different:

📂 project
┃
┣📂 raw-data
┃
┃ ┗📜 sample_project.csv
┣📂 conversions_stage_1
┃ ┣📜 country.csv
┃ ┗📜 ...
┃
┗📂 .original_stage_1_conversions
┃ ┣📜 crop_yield_units.csv
┃ ┣📜 country.csv
┃ ┗📜 ...
┃
┣📂 conversions_stage_2
┃ ┣📜 crop_calories.csv
┃ ┣📜 mean_crop_pice_lcu_per_kg
┃ ┣📜 eggs_calories.csv
┃ ┣📜 mean_eggs_price_per_kg
┃ ┗📜 ...
┃
┗📂 .original_stage_2_conversions
  ┣📜 crop_calories.csv
  ┣📜 mean_crop_pice_lcu_per_kg
  ┣📜 eggs_calories.csv
  ┣📜 mean_eggs_price_per_kg
  ┗📜 ...

2.4.1 Converting Calories, Prices, and Other Units

2.4.1.1 Calories and Prices

As with the units which were extracted, you will also need to verify prices and calorie values (in the conversions_stage_2 folder). Prices will be in lcu/kg for crops, meat, and eggs (where lcu is local currency units). Prices will be in lcu/l for milk and honey. And prices will be in lcu/animal for whole livestock sales. For calorie values, conversions will be in kcal/kg or kcal/l.

2.4.1.2 Secondary Units

Aside from prices and calories, there are other conversion tables to be verified, these units depend on conversions from stage 1. For example livestock_count_tlu requires us to know the standard livestock names (identified in stage 1). The secondary units currently processed in the R-package are:

File Description Example Survey Value Example Conversion
livestock_weight_kg.csv The conversion factor used to estimate the amount of meat collected from a whole animal. sheep 25
livestock_count_tlu.csv “Tropical Livestock Units” (TLU) for individual animals. cattle 0.7

Please note any folder with a dot before it will not be visible by default in most file explorers

2.5 Stage 3: Calculating Indicators

Using the calories and prices you have converted, you will then be able to produce the final indicators. Either run the code below, or run the 03-calculate-final-indicators.R file.

calculate_indicators_local(
    file_path="raw-data/raw-data.csv",
    id_type = "string",
    proj_id = "test_prj",
    form_id = "test_frm")

You will now see the directory includes some extra folders:

You will now see that the directory structure again looks quite different. Extra folders and files have been created. There is a lot of information here, for more explanation, please see an explanation of indicators and outputs

📂 project
┃
┣📂 raw-data
┃ ┗📜 sample_project.csv
┃
┣📂 conversions_stage_1
┃ ┣📜 country.csv
┃ ┗📜 ...
┃
┗📂 .original_stage_1_conversions
┃ ┣📜 crop_yield_units.csv
┃ ┣📜 country.csv
┃ ┗📜 ...
┃
┣📂 conversions_stage_2
┃ ┣📜 crop_calories.csv
┃ ┣📜 mean_crop_pice_lcu_per_kg
┃ ┣📜 eggs_calories.csv
┃ ┣📜 mean_eggs_price_per_kg
┃ ┗📜 ...
┃
┣📂 .original_stage_2_conversions
┃ ┣📜 crop_calories.csv
┃ ┣📜 mean_crop_pice_lcu_per_kg
┃ ┣📜 eggs_calories.csv
┃ ┣📜 mean_eggs_price_per_kg
┃ ┗📜 ...
┃
┣📂 crop_data
┃ ┣📜 crop_consumed_kg_per_year.csv
┃ ┣📜 crop_harvest_kg_per_year.csv
┃ ┗📜 ...
┃
┣📂 indicator_data
┃ ┗📜 indicators.csv
┃
┣📂 livestock_data
┃ ┣📜 crop_consumed_kg_per_year.csv
┃ ┣📜 crop_consumed_kg_per_year.csv
┃ ┗📜 ...
┃
┣📂 off_farm_data
┃ ┣📜 offfarm_income_name.csv
┃ ┣📜 offfarm_who_control_revenue.csv
┃ ┗📜 ...
┃
┗📂 processed_data
┃  ┗📜 processed_data.csv
┃
┗📂 extra_outputs
    ┃
    ┣📂 consumption_calorie_values
    ┃ ┣📜 crop_calories_consumed_kcal.csv
    ┃ ┣📜 milk_calories_consumed_kcal.csv
    ┃ ┗📜 ...
    ┣📂 consumption_lcu_values
    ┃ ┣📜 value_crop_consumed_lcu.csv
    ┃ ┣📜 value_eggs_consumed_lcu.csv
    ┃ ┗📜 ... 
    ┗📂 gender_control
        ┃
        ┣📂 gender_control_amounts_consumed
        ┃
        ┣📂 gender_control_income
        ┃
        ┗📂 ...