Loading Configuration and Data Files in Python Projects
This section explains how to load configuration files (such as config.yml) and data files (such as amazon_alexa.tsv) in Python projects. The instructions apply when the configuration files are located in a config directory and the data files are located in a data directory at the root level of the project. You may want to check the setup.py section to understand the importance of this section.
Your files could be structured as below:
.
├── config
│   ├── config.yml
│   └── __init__.py
├── data
│   └── text
│       └── amazon_alexa.tsv
├── helpers
│   ├── helper.py
│   └── __init__.py
├── notebooks
│   └── test_code.ipynb
├── setup.py
└── src
    └── test_code.py
Loading Configuration Files
Loading using importlib.resources
Configuration files (like config.yml) can be loaded as a package resource using the importlib.resources module in Python.
Here's a sample code snippet:
import yaml
import importlib.resources

def read_config():
    # Read the resource as text
    resource_string = importlib.resources.read_text('config', 'config.yml')
    # Load the yaml file
    config = yaml.safe_load(resource_string)
    return config
In this code, 'config' is the name of the package that contains the config.yml file, and 'config.yml' is the name of the resource, which must be a file inside the config package.
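On Python 3.9 and later, the same resource can also be read through the importlib.resources.files() API. The sketch below is a minimal illustration rather than the project's own helper: read_resource_text is a hypothetical name, and the returned text would still need to be passed to yaml.safe_load as in the snippet above.

```python
import importlib.resources


def read_resource_text(package: str, resource: str) -> str:
    """Return the text of a resource file that lives inside a package.

    read_resource_text is a hypothetical helper name used for illustration.
    """
    # files() returns a traversable object rooted at the package directory
    resource_path = importlib.resources.files(package).joinpath(resource)
    return resource_path.read_text(encoding="utf-8")
```

For the layout shown earlier, the call would be read_resource_text('config', 'config.yml').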
To load the configuration file as a package resource, you need to include it in your setup.py file. Here's how you can do it:
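from setuptools import find_packages, setup

setup(
    name="Test-Project",
    version="1.0",
    packages=find_packages(),
    # added these lines
    package_data={'config': ['*.yml']},
    include_package_data=True,
)
In this code, package_data={'config': ['*.yml']} tells setuptools to include all .yml files found in the config package directory. Patterns in package_data are resolved relative to each package's own directory, so a pattern such as 'config/*.yml' under the '' key would look for a config subdirectory inside every package and would not match config/config.yml. include_package_data=True additionally tells setuptools to include data files inside package directories that are matched by MANIFEST.in.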
Caution
Loading config.yml this way treats it as part of the installed project package. If you change some values in config.yml and re-run the code, you might still get the previous values, because the resource is read from wherever the config package is imported from, which can be a stale installed copy rather than the file you just edited. Even re-running the pipenv install -e . command did not refresh the stale config.yml file. This method is therefore strongly discouraged for configuration files that change often; reserve it for files that are not expected to change much.
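When stale values appear, it can help to check which copy of config.yml is actually being read. The sketch below is one possible diagnostic, assuming Python 3.9 or later and a package installed on the regular file system; resource_location is a hypothetical helper name, not part of the project.

```python
import importlib.resources


def resource_location(package: str, resource: str) -> str:
    """Return the location of a packaged resource as a string.

    Hypothetical diagnostic helper: for a package on the regular file
    system this is the path of the file that will be read.
    """
    return str(importlib.resources.files(package).joinpath(resource))
```

Printing resource_location('config', 'config.yml') shows whether the path points into your project directory or into an installed copy elsewhere, such as site-packages.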
Loading as a Regular File
Alternatively, you can load the configuration file as a regular file without including it as a package resource.
Here's how you can do it:
import yaml
import os

def read_config():
    # Get the current working directory
    current_dir = os.getcwd()
    # Get the config file path
    config_file_path = os.path.join(current_dir, '..', 'config', 'config.yml')
    # Resolve the relative path to an absolute path
    config_file_path = os.path.abspath(config_file_path)
    # Load the yaml file
    with open(config_file_path, 'r') as file:
        config = yaml.safe_load(file)
    return config
In this code, os.getcwd() gets the current working directory, os.path.join(current_dir, '..', 'config', 'config.yml') constructs the path to the config file relative to the current working directory, os.path.abspath(config_file_path) resolves the relative path to an absolute path, and yaml.safe_load(file) loads the yaml file. The '..' component assumes the code runs from a directory one level below the project root, such as notebooks.
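To avoid depending on how many levels below the root the code runs, one option is to walk upward from the current directory until a marker file is found. The following is a minimal sketch: setup.py is assumed to be the marker because it sits at the root in the layout shown earlier, and find_project_root is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "setup.py") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each of its ancestors
    for candidate in (current, *current.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in or above {current}")
```

With this helper, the config path could be built as find_project_root() / 'config' / 'config.yml', regardless of which subdirectory the code is launched from.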
Loading with pkg_resources (deprecated)
You can also load the configuration file using pkg_resources, but please note that pkg_resources is deprecated in favor of importlib.resources. Here's how you can do it:
import pkg_resources
import yaml

def read_config():
    config_path = pkg_resources.resource_filename('config', 'config.yml')
    with open(config_path, 'r') as file:
        config = yaml.safe_load(file)
    return config
Loading Data Files
Data files (like amazon_alexa.tsv) can be loaded using the pandas library in Python. Here's a sample code snippet:
import pandas as pd
import os

def read_data():
    # Get the current working directory
    current_dir = os.getcwd()
    # Get the data file path
    data_file_path = os.path.join(current_dir, '..', 'data', 'text', 'amazon_alexa.tsv')
    # Resolve the relative path to an absolute path
    data_file_path = os.path.abspath(data_file_path)
    # Read the data file
    data = pd.read_csv(data_file_path, sep='\t')
    return data
In this code, os.getcwd() gets the current working directory, os.path.join(current_dir, '..', 'data', 'text', 'amazon_alexa.tsv') constructs the path to the data file relative to the current working directory, os.path.abspath(data_file_path) resolves the relative path to an absolute path, and pd.read_csv(data_file_path, sep='\t') reads the data file into a pandas DataFrame.
Please replace '..', 'data', 'text', and 'amazon_alexa.tsv' with the actual relative path from your script or notebook to amazon_alexa.tsv if it's different in your project.
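Because a wrong relative path only surfaces once pandas tries to open the file, it can help to validate the path first. The following is a minimal sketch using only the standard library; resolve_data_path is a hypothetical helper, and passing the base directory explicitly is an assumption about how you choose to anchor the path.

```python
from pathlib import Path


def resolve_data_path(base_dir, *parts: str) -> Path:
    """Join `parts` onto `base_dir` and fail early if the file is missing."""
    path = Path(base_dir, *parts).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Data file not found: {path}")
    return path
```

For example, resolve_data_path(os.getcwd(), '..', 'data', 'text', 'amazon_alexa.tsv') returns an absolute path that can then be passed to pd.read_csv(..., sep='\t').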
Running Python Scripts from the Root of the Project
Loading config file
When you run a Python script from the root of the project (for example, python src/test_code.py), the current working directory is the root of the project, not the directory where the script is located. In this case, you can use the __file__ variable to get the directory of the script, and then construct the path to the configuration or data file relative to the script directory. Here's how you can do it:
import yaml
import os

def read_config():
    # Get the directory of the script
    script_dir = os.path.dirname(os.path.abspath(__file__))
    # Get the config file path
    config_file_path = os.path.join(script_dir, '..', 'config', 'config.yml')
    # Load the yaml file
    with open(config_file_path, 'r') as file:
        config = yaml.safe_load(file)
    return config
In this code, os.path.dirname(os.path.abspath(__file__)) gets the directory of the script, os.path.join(script_dir, '..', 'config', 'config.yml') constructs the path to the config file relative to the script directory, and yaml.safe_load(file) loads the yaml file.
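The same script-relative lookup can also be written with pathlib. This is a sketch under the layout shown earlier (script in src, config one level up); config_path_for is a hypothetical helper that takes the script path as an argument so that it can be called with __file__.

```python
from pathlib import Path


def config_path_for(script_file) -> Path:
    """Return <project root>/config/config.yml for a script one level below the root."""
    script_dir = Path(script_file).resolve().parent
    # script_dir.parent is the project root when the script lives in src/
    return script_dir.parent / "config" / "config.yml"
```

Inside src/test_code.py this would be used as config_path_for(__file__), and the result can be passed straight to open().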
Loading data files
import pandas as pd
import os

def read_data():
    # Get the directory of the script
    script_dir = os.path.dirname(os.path.abspath(__file__))
    # Get the data file path
    data_file_path = os.path.join(script_dir, '..', 'data', 'text', 'amazon_alexa.tsv')
    # Read the data file
    data = pd.read_csv(data_file_path, sep='\t')
    return data
In this code, os.path.dirname(os.path.abspath(__file__)) gets the directory of the script, os.path.join(script_dir, '..', 'data', 'text', 'amazon_alexa.tsv') constructs the path to the data file relative to the script directory, and pd.read_csv(data_file_path, sep='\t') reads the data file into a pandas DataFrame.
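Note that __file__ is not defined inside a Jupyter notebook, so the script-relative approach cannot be used there directly. One possible way to share a single helper between scripts and notebooks is sketched below; base_dir is a hypothetical name, and falling back to the current working directory is an assumption that matches the os.getcwd() examples earlier in this section.

```python
import os
from typing import Optional


def base_dir(script_file: Optional[str] = None) -> str:
    """Return the script's directory, or the working directory if no script path is known."""
    if script_file:
        return os.path.dirname(os.path.abspath(script_file))
    # No script path available (for example, in a notebook): use the working directory
    return os.getcwd()
```

A caller could write script_dir = base_dir(globals().get('__file__')), which yields the script's directory in src/test_code.py and the working directory in notebooks/test_code.ipynb.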