Overall Framework
Our DARG framework we first construct the reasoning graphs for data points in given benchmarks using LLMs (e.g., computational reasoning graphs for solving a math problem are shown in the following figure). Next, we perform fine-grained graph perturbations based on various dimensions of the reasoning graph. Afterwards, we convert the reasoning graph back into the description that adapts the linguistic diversity as the original data. In order to ensure the correctness of the reasoning graph construction and graph-to-text generation, we use tool-augmented LLMs to verify the quality of reasoning graphs and generated text to produce valid test examples.