Constantine: modular, high-performance, zero-dependency cryptography stack for verifiable computation, proof systems and blockchain protocols.

[GPU] GPU / LLVM IR Elliptic curves implementation plan#465

Open
Opened 8/27/20241 commentsby mratsim
mratsim

This issue replaces #92 with a more concrete plan of action The current Constantine has good foundations to build and test LLVM IR primitives for x86, ARM, Nvidia, AMDGPU and all other ISAs that support add-with-carry after: - #452 - #453 - #456 - #464 The ultimate goal is to port multi-scalar multiplication to Nvidia GPUs, this will require the following steps: ## Field arithmetic Currently only add/sub/mul are implemented, we need to implement all primitives needed for shortweierstrass jacobian doubling/addition and shortweierstrass projective doubling/addition. There is no need initially for primitives for serialization/deserialization/validation as those can be done on CPU and copied to GPU, for example sqrt is not on the critical path https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/constantine/math/elliptic/ec_shortweierstrass_affine.nim#L105-L128 Meaning at the very least: - ccopy - square - neg - cneg I think `square` can be implemented as `nsqr` with "virtual" signature `proc nsqr(r: var Fp, a: Fp, count: int)` that loops from the count to 0 excluded as overhead is minimal and it will be helpful for primitives that need square_repeated (deserialization). ## Naming see this https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/constantine/math_compiler/README.md > Naming convention for internal procedures: > - _big_add_u64x4 > - _finalsub_mayo_u64x4 -> final substraction may overflow > - _finalsub_noo_u64x4 -> final sub no overflow > - _mod_add_u64x4 > - _mod_add2x_u64x8 -> FpDbl backend > - _mty_mulur_u64x4b2 -> unreduced Montgomery multiplication (unreduced result valid iff 2 spare bits) > - _mty_mul_u64x4b1 -> reduced Montgomery multiplication (result valid iff at least 1 spare bit) > - _mty_mul_u64x4 -> reduced Montgomery multiplication > - _mty_nsqrur_u64x4b2 -> unreduced square n times > - _mty_nsqr_u64x4b1 -> reduced square n times > - _mty_sqr_u64x4 -> square > - _mty_red_u64x4 -> reduction u64x4 <- u64x8 > - _pmp_red_mayo_u64x4 -> Pseudo-Mersenne Prime partial reduction may overflow (secp256k1) > - _pmp_red_noo_u64x4 -> Pseudo-Mersenne Prime partial reduction no overflow > - _secp256k1_red -> special reduction > - _fp2x_sqr2x_u64x4 -> Fp2 complex, Fp -> FpDbl lazy reduced squaring > - _fp2g_sqr2x_u64x4 -> Fp2 generic/non-complex (do we pass the mul-non-residue as parameter?) > - _fp2_sqr_u64x4 -> Fp2 (pass the mul-by-non-residue function as parameter) > - _fp4o2_mulnr1pi_u64x4 -> Fp4 over Fp2 mul with (1+i) non-residue optimization > - _fp4o2_mulbynr_u64x4 > - _fp12_add_u64x4 > - _fp12o4o2_mul_u64x4 -> Fp12 over Fp4 over Fp2 > - _ecg1swjac_adda0_u64x4 -> Shortweierstrass G1 jacobian addition a=0 > - _ecg1swjac_add_u64x4_var -> Shortweierstrass G1 jacobian vartime addition > - _ectwprj_add_u64x4 -> Twisted Edwards Projective addition ## Elliptic curve arithmetic Similar to the field descriptor, an EcDescriptor will likely be needed https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/constantine/math_compiler/ir.nim#L142-L160 The following algebraic object configurations in SageMath and Nim Macro and the curve-specific calls will help guide what the descriptor should hold: https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/sage/curves.sage#L123-L137 https://github.com/mratsim/constantine/blob/65147ed815d96fa94a05d307c1d9980877b7d0e8/constantine/named/config_fields_and_curves.nim#L252-L270 Jacobian coordinates are probably slightly easier as they don't depends on the curve equation parameters (a, b) in y² = x³ + ax + b. It is fine to focus on curves with a == 0 as all curves used for Ethereum (secp256k1, BN254_Snarks and BLS12_381) are in that case. When there is a dependency on the curve equation, I am undecided yet whether to materialize b as a global in LLVM IR and let the optimizer do its job or doing directly in the code generator. Addition, mixed addition and doublings are the priority. Due to GPU constraint, will start with the constant-time versions.

AI Analysis

This issue appears to be discussing a feature request or bug report related to the repository. Based on the content, it seems to be still under discussion. The issue was opened by mratsim and has received 1 comments.

Add a comment
Comment form would go here