Rashid, M ORCID: https://orcid.org/0000-0001-5852-1296, Khan, S ORCID: https://orcid.org/0000-0001-8342-6928, Sonbul, OS ORCID: https://orcid.org/0000-0003-1029-7568 and Hwang, SO ORCID: https://orcid.org/0000-0003-4240-6255 (2024) A Flexible and Parallel Hardware Accelerator for Forward and Inverse Number Theoretic Transform. IEEE Access, 12. pp. 181351-181361.
Published Version
Available under License Creative Commons Attribution.
Abstract
This paper presents an efficient and flexible hardware accelerator for polynomial multiplication using the number theoretic transform (NTT). The proposed architecture, FP-NTT, addresses flexibility and performance requirements together. Flexibility comes from supporting three modes of operation: (i) forward NTT only, using a Cooley-Tukey butterfly unit (CT-BFU); (ii) inverse NTT only, using a Gentleman-Sande butterfly unit (GS-BFU); and (iii) forward and inverse NTT simultaneously. Performance is improved by operating one CT-BFU, one GS-BFU, and four block RAMs in parallel. A dedicated control unit coordinates these components so that the FP-NTT design remains both flexible and parallel. The design is evaluated using a throughput/area metric. Implementation results are reported after place and route on various Xilinx field-programmable gate array (FPGA) devices. On a Virtex-7 FPGA, FP-NTT operates at 250 MHz, utilises 1026 slices, and requires 4.61 μs and 5.12 μs for the forward and inverse NTT computations, respectively. The resulting throughput/area is 211.41 and 190.36 for the forward and inverse computations, respectively. A comparison with state-of-the-art designs emphasises the suitability of the FP-NTT accelerator for high-speed cryptographic applications.
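For readers unfamiliar with the two butterfly types named in the abstract, the following is a minimal software sketch of a forward NTT built from Cooley-Tukey butterflies and an inverse NTT built from Gentleman-Sande butterflies. It is not the paper's hardware design. The toy parameters (modulus 17, length 8, root of unity 2), the use of a plain cyclic transform, and all function names are assumptions chosen for illustration only.

```python
# Illustrative software model only; parameters and names are assumptions,
# not taken from the paper. Requires Python 3.8+ for pow(x, -1, q).

Q = 17      # toy prime modulus (assumed)
N = 8       # transform length, a power of two (assumed)
OMEGA = 2   # primitive N-th root of unity modulo Q: 2**8 % 17 == 1, 2**4 % 17 == 16


def bit_reverse_permute(a):
    """Return a copy of `a` with its elements in bit-reversed index order."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, f"0{bits}b")[::-1], 2)
        out[r] = a[i]
    return out


def ct_butterfly(u, v, w, q):
    """Cooley-Tukey butterfly: multiply by the twiddle first, then add/subtract."""
    t = (w * v) % q
    return (u + t) % q, (u - t) % q


def gs_butterfly(u, v, w, q):
    """Gentleman-Sande butterfly: add/subtract first, then multiply by the twiddle."""
    return (u + v) % q, ((u - v) * w) % q


def ntt_forward(a, omega=OMEGA, q=Q):
    """Forward cyclic NTT using decimation-in-time stages of CT butterflies."""
    n = len(a)
    a = bit_reverse_permute(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # primitive `length`-th root of unity
        for start in range(0, n, length):
            w = 1
            for j in range(length // 2):
                i0, i1 = start + j, start + j + length // 2
                a[i0], a[i1] = ct_butterfly(a[i0], a[i1], w, q)
                w = (w * w_len) % q
        length *= 2
    return a


def ntt_inverse(a, omega=OMEGA, q=Q):
    """Inverse cyclic NTT using decimation-in-frequency stages of GS butterflies."""
    n = len(a)
    a = list(a)
    omega_inv = pow(omega, -1, q)
    length = n
    while length >= 2:
        w_len = pow(omega_inv, n // length, q)
        for start in range(0, n, length):
            w = 1
            for j in range(length // 2):
                i0, i1 = start + j, start + j + length // 2
                a[i0], a[i1] = gs_butterfly(a[i0], a[i1], w, q)
                w = (w * w_len) % q
        length //= 2
    a = bit_reverse_permute(a)      # GS stages leave the result bit-reversed
    n_inv = pow(n, -1, q)           # final scaling by 1/n modulo q
    return [(x * n_inv) % q for x in a]


def cyclic_multiply(a, b, omega=OMEGA, q=Q):
    """Multiply two polynomials modulo (x^n - 1, q) via pointwise NTT products."""
    fa, fb = ntt_forward(a, omega, q), ntt_forward(b, omega, q)
    return ntt_inverse([(x * y) % q for x, y in zip(fa, fb)], omega, q)


if __name__ == "__main__":
    coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
    print(ntt_inverse(ntt_forward(coeffs)) == coeffs)
```

The only difference between the two butterflies is where the twiddle multiplication sits relative to the addition and subtraction, which is why an accelerator can reasonably provide one unit of each kind. Lattice-based schemes often use a negacyclic variant with different moduli and lengths; this sketch does not model that, nor the memory banking or control logic described in the abstract.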