Getting Started

Prerequisites

Activate the conda environment:
```
conda activate ts
```

Sample Data

Download the example structures Here

We are interested in expanding the dataset from 5 given mother structures. First, each mother structure needs to be optimized to serve as a good starting point for GSM. This can be achieved by performing geometry optimization using GFN2-xTB. Ensure that all the prepared mother structures are in specific input_path and saved in .xyz file format.

Your input structures should be organized as follows:

mother_strucs/
├── Cl7138/
│   └── ClGeom-m7138-i1-c1-opt/
│       └── struc.xyz
├── Cl7164/
│   └── ClGeom-m7164-i1-c1-opt/
│       └── struc.xyz
├── ...

Command Reference

1. Sampling the reaction pathways

Basic command structure:

$ dand sample [-h] -i INPUT_PATH -o OUTPUT_PATH -n MAX_WORKERS

Parameters:

Parameter	Description
`-h`, `--help`	Displays the help message and exits the program.
`-i`, `--input_path`	Specifies the directory where the mother structures are stored.
`-o`, `--output_path`	Specifies the directory where Dandelion output will be saved.
`-n`, `--max_workers`	Specifies the number of worker processes for parallel execution.

Assuming your mother structures are saved as ‘struc.xyz’ in /home/pekora/example/mother_strucs, you can initiate sampling process with the following command:

$ dand sample -i /home/pekora/example/mother_strucs -o /home/pekora/example/outputs -n 30

Output files will be stored in /home/pekora/example/outputs.

The following six steps will be executed automatically:

Dandelion first generates possible driving coordinates(seeds) from each mother structure.

╔════════════════════════════════════════════════════════════════════╗
║                          1. Creating GSM                           ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/mother_strucs
  output_path: /home/pekora/example/outputs/1_gsm
  maxbreak: 2
  maxform: 2
  maxchange: 3
  minbreak: 0
  minform: 0
  minchange: 1
  ignore_single_change: True
  equiv_Hs: False

280 Seeds were generated from ClGeom-m7138-i1-c1-opt
276 Seeds were generated from ClGeom-m7164-i1-c1-opt
...

Creating GSM finished!

Based on generated GSM jobs, GSM can be executed. Some jobs can fail to converge or reach the product within the predefined maximum number of nodes and should be filtered out.

╔════════════════════════════════════════════════════════════════════╗
║                           2. Running GSM                           ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/output/1_gsm
  max_workers: 30

GSM on seeds: 100%|████████████████████████| 1406/1406 [4:57:13<00:00]
GSM finished!

Dandelion excludes some trivial pathways with strictly uphill energy trajectories, negligible energy variations, unfeasible structures, or those that are repetitive.

╔════════════════════════════════════════════════════════════════════╗
║                          3. Filtering GSM                          ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/outputs/1_gsm
  output_path: /home/pekora/example/outputs/2_gsm_filtered
  barrier_min: 5
  barrier_max: 200
  delta_e_min: 5

◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢◤◢
   mother: ClGeom-m7138-i1-c1-opt
Initial seeds:                   280
GSM success reactions:           115
Profile filtered reactions:       41
Structure filtered reactions:     38
Unique reactions:                 28
...

Filtering GSM finished!

Using the outputs of gsm, Dandelion runs climbing-image NEB. NEB optimizes the reaction pathways, relaxing the force of the bands of structures.

╔════════════════════════════════════════════════════════════════════╗
║                           4. Running NEB                           ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/output/2_gsm_filtered
  output_path: /home/pekora/example/output/3_neb
  max_workers: 30
  n_images: 10
  neb_fmax: 0.5
  cineb_fmax: 0.05
  steps: 500

Seeds: 100%|███████████████████████████████████| 144/144 [04:34<00:00]
xTB-NEB completed!

In the fifth step, data is filtered based on specific criteria. For example, non-convergent reactions and those without a single negative eigenvalue in their Hessian matrices are excluded. This ensures focus on structures near valid paths. NEB results are further refined to avoid redundant structural data. A new band iteration is chosen only when the cumulative Fmax exceeds 0.1 eV/Å, saving DFT calculations and preventing overfitting to narrow PES regions.

╔════════════════════════════════════════════════════════════════════╗
║                          5. Filtering NEB                          ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/output/3_neb
  output_path: /home/pekora/example/output/4_neb_filtered

Mothers: 100%|█████████████████████████████████████| 5/5 [00:20<00:00]

40/53 rxns were saved to /home/pekora/example/output/4_neb_filtered/reactions.json
Filtering NEB finished!

The sixth step is to compile samples in Hierarchical Data Format:

╔════════════════════════════════════════════════════════════════════╗
║                        6. Compiling samples                        ║
╚════════════════════════════════════════════════════════════════════╝

Arguments provided:
  input_path: /home/pekora/example/output/4_neb_filtered/reactions.json
  output_path: /home/pekora/example/output/xtb.h5
  fmax_threshold: 0.1

Compiling reactions: 100%|███████████████████████| 40/40 [00:03<00:00]
Compiling finished!

And there will be a newly generated file in your output path, the xtb.h5 file.

2. Refining Energy and Force Labels

Structure sampling is now finished. The next step is to refine the energy and force labels at the DFT level.

$ dand refine [-h] -i INPUT_PATH -n MAX_WORKERS --orca ORCA

Parameter	Description
`-h`, `--help`	Displays the help message and exits the program.
`-i`, `--input_path`	Specifies the path of working directory containing `xtb.h5`.
`-n`, `--max_workers`	Specifies the number of worker processes for parallel execution.
`--orca`	Specifies the path of the orca executable file

The ORCA path must point to the executable file.

If you enter like this:

$ dand refine -i /home/pekora/example/outputs -n 15 --orca /home/pekora/package/orca/orca_5_0_4/orca

The refinement process includes two steps:

In this phase, we use DFT calculations with Orca 5.0.4. The default setting uses wB97X functional and 6-31(d) basis set, but these settings can be adjusted as needed.

╔════════════════════════════════════════════════════════════════════╗
║                         7. Refining forces                         ║
╚════════════════════════════════════════════════════════════════════╝
Arguments provided:
  input_path: /home/pekora/example/output/xtb.h5
  output_path: /home/pekora/example/output/wb97x.db
  max_workers: 15
  orca: /home/pekora/package/orca/orca_5_0_4/orca

Restarting calculation from /home/pekora/example/output/wb97x.db
640 points are skipped.

Formulas: 100%|██████████████████████| 2/2 [85:50:01<00:00, ? hour/it]
wB97X calculation finished!

You can check your compiled database wb97x.db using ASE:

$ ase db wb97x.db
id|age|user   |formula |calculator|    energy|natoms| fmax|pbc|charge|   mass
 1| 5d|pekora|C4ClH4NO|orca      |-20266.198|    11|5.874|FFF| 0.000|117.532
 2| 5d|pekora|C4ClH4NO|orca      |-20269.074|    11|0.470|FFF| 0.000|117.532
 3| 5d|pekora|C4ClH4NO|orca      |-20268.333|    11|5.994|FFF| 0.000|117.532
 4| 5d|pekora|C4ClH4NO|orca      |-20268.047|    11|1.195|FFF| 0.000|117.532
...
Rows: 53842 (showing first 20)

Finally, it compiles our wb97x.db sample in Hierarchical Data Format :

╔════════════════════════════════════════════════════════════════════╗
║                     8. Compiling final samples                     ║
╚════════════════════════════════════════════════════════════════════╝
Arguments provided:
  input_path: /home/pekora/example/output/wb97x.db
  output_path: /home/pekora/example/output/wb97x.h5

Compiled successfully!

Checking Results

ase db file

You can check how to access the ase db file in the ase manual.

Launch visualization interface:
ase db wb97x.db -w

h5 file

You can check how to access the h5 file in the hdfgroup webpage

The data structure of h5 files can be easily visualized using VS Code with H5Web extension.

Continue: Modules