LCODE 3D documentation

If you are interested not just in what LCODE does, but also in how it does it, and you are reading this documentation for the first time, the recommended procedure is to glance over the first chapter and then dive straight into the source code, switching back and forth using the [source] and [docs] cross-links. Trust us, at 500 lines of code it’s even shorter than the docs, and if you get stuck, you always have a link back to the explanations.

Problem

Objective

LCODE 3D calculates the plasma response to an ultrarelativistic charged beam.

Simulating particle beams is definitely planned for the future.

Geometry

Quasistatic approximation is employed, with time-space coordinate \(\xi = z - ct\).

From the perspective of the beam, \(\xi\) is a space coordinate. The head of the beam corresponds to \(\xi = 0\), with its tail extending into lower, negative values of \(\xi\).

From the perspective of a plasma layer, penetrated by the beam, \(\xi\) is a time coordinate. At \(\xi = 0\) the plasma layer is unperturbed; as the beam passes through it, \(\xi\) values decrease.

The remaining two coordinates \(x, y\) are way more boring [Simulation window and grids].

The problem geometry is thus \(\xi, x, y\).

Beam

The beam is currently simulated as a charge density function \(\rho_b(\xi, x, y)\), and not with particles [Beam].

Plasma

Only the electron motion is simulated; the ions are represented with a static background charge density [Background ions].

\[ \begin{align}\begin{aligned}\frac{d \vec{p}}{d \xi} &= -\frac{q}{1-v_z} \left( \vec{E} + \left[ \vec{v} \times \vec{B} \right]\right)\\\frac{d x}{d \xi} &= -\frac{v_x}{1-v_z}\\\frac{d y}{d \xi} &= -\frac{v_y}{1-v_z}\\\vec{v} &= \frac{\vec{p}}{\sqrt{M^2+p^2}}\end{aligned}\end{align} \]
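
For reference, the last relation above translates directly into code; a plain-numpy sketch (the function name is hypothetical):

import numpy as np

def velocity(px, py, pz, m):
    """v = p / sqrt(M^2 + p^2); M here is the particle mass."""
    gamma_m = np.sqrt(m**2 + px**2 + py**2 + pz**2)
    return px / gamma_m, py / gamma_m, pz / gamma_m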

The plasma is simulated using a PIC method with an optional twist: only a ‘coarse’ grid of plasma (think 1 particle per 9 cells) is stored and evolved, while ‘fine’ particles (think 4 per cell) are bilinearly interpolated from it during the deposition [Coarse and fine plasma]. The plasma is effectively made not from independent particles, but from a fabric of ‘fine’ TSC-2D shaped particles.

Fields

Both the plasma movement and the ‘external’ beam contribute to the charge density/currents \(\rho, j_x, j_y, j_z\) [Deposition].

The fields are calculated from their derivatives. Theoretically, the equations are

\[ \begin{align}\begin{aligned}\Delta_\perp E_z &= \frac{\partial j_x}{\partial x} - \frac{\partial j_y}{\partial y}\\\Delta_\perp B_z &= \frac{\partial j_x}{\partial y} - \frac{\partial j_y}{\partial x}\\\Delta_\perp E_x &= \frac{\partial \rho}{\partial x} - \frac{\partial j_x}{\partial \xi}\\\Delta_\perp E_y &= \frac{\partial \rho}{\partial y} - \frac{\partial j_y}{\partial \xi}\\\Delta_\perp B_x &= \frac{\partial j_y}{\partial \xi} - \frac{\partial j_z}{\partial y}\\\Delta_\perp B_y &= \frac{\partial j_z}{\partial x} - \frac{\partial j_x}{\partial \xi}\\\Delta_\perp &= \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}\\\rho &= \rho_e + \rho_i + \rho_b\\j &= j_e + j_i + j_b\end{aligned}\end{align} \]

where indices \(e, i, b\) represent electrons, ions and beam respectively.

Note

In reality, things are not that simple.

\(E_z\) and \(B_z\) calculations are relatively straightforward and boil down to solving the Laplace equation with Dirichlet and Neumann boundary conditions respectively.

The transverse fields are actually obtained by solving the Helmholtz equation with mixed boundary conditions, and then doing some more magic on top of that (so refer to Ez, Ex, Ey, Bx, By and Bz for the equations that we really solve).

Step

The \(\xi\)-cycle idea consists of looping these three actions:

  • depositing plasma particles (and adding the beam density/current),
  • calculating the new fields and
  • moving plasma particles,

executed several times for each step in a predictor-corrector scheme [Looping in xi].

Trickery index

Geometry

Todo

DOCS: write a separate page on the topic or link somewhere from the overview.

Numerical stability

Simplifications

  • Reflection boundary is closer than the field calculation boundary to simplify boundary handling.

Simulation window and grids

Fields and densities grid

_images/cell_grid.png

The grid on which the fields and densities are calculated.

Fields and densities (\(\rho\), \(j\)) are calculated on a config.grid_steps x config.grid_steps-sized grid. This number must be odd in order to have an on-axis cell for on-axis diagnostics.

config_example.grid_steps = 641

Transverse grid size in cells

config_example.grid_step_size = 0.025

Transverse grid step size in plasma units

The fields are calculated at the centers of the grid cells.
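
For illustration, the cell-center coordinates of such a grid can be constructed like this (a plain-numpy sketch using the config_example values, not the literal LCODE 3D code):

import numpy as np

grid_steps, grid_step_size = 641, 0.025                 # values from config_example
half = grid_steps // 2                                  # odd grid => one cell centered on the axis
xs = (np.arange(grid_steps) - half) * grid_step_size    # cell-center coordinates, symmetric around 0
x, y = np.meshgrid(xs, xs, indexing='ij')               # 2D coordinates of every cell center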

Note

The muddy concept of ‘window width’ is no longer referenced in LCODE 3D, in order to avoid the inevitable confusion about what it actually means and how it relates to config.grid_step_size. Please refrain from thinking in these terms and head over to the following subsection for more useful ones.

Reflect and ‘plasma’ boundaries

_images/boundaries.png

The reflect and plasma boundaries illustrated.

config_example.reflect_padding_steps = 5

Plasma reflection <-> field calculation boundaries

config_example.plasma_padding_steps = 10

Plasma placement <-> field calculation boundaries

The plasma particles are not allowed to enter the outermost cells in order to simplify the treatment of boundary regions during interpolation, deposition and field calculation [Zero special boundary treatment]. In order to achieve that, the reflection boundary is placed config.reflect_padding_steps steps deeper into the simulation area.

Note

When choosing the value of this parameter, one should take the particle size into account. Even a single fine particle [Coarse/fine plasma] is three cells wide during deposition [Plasma], so the gap should be wide enough to cover the entire coarse particle cloud. Failure to meet this condition may result in a memory violation error. This could be solved by introducing fine particle reflection, but that’d be more resource-intensive.

Note that while the reflection boundary defines the area where plasma is allowed to be present, this area must be larger than the one where the plasma is initially placed. The size of the latter is controlled by config.plasma_padding_steps, which puts a cut-off limit on the placement of both coarse and fine plasma particles.
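
One plausible way to express these boundaries in coordinate terms is sketched below; the exact expressions live in lcode.init(), which also monkey-patches reflect_boundary back into config, so treat this purely as an illustration.

import config  # the user-supplied configuration module

window_half_width = config.grid_step_size * config.grid_steps / 2
reflect_boundary = window_half_width - config.grid_step_size * config.reflect_padding_steps
plasma_boundary = window_half_width - config.grid_step_size * config.plasma_padding_steps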

Coarse and fine plasma grids

Finally, the area inside the plasma boundary hosts two grids of particles, the coarse grid and the fine grid [more info in Coarse/fine plasma approach], which are coarser and finer than the field grid respectively.

Units

Todo

DOCS (Lotov)

Installation

Common

LCODE 3D requires an NVIDIA GPU with CUDA support. CUDA Compute Capability 6+ is strongly recommended for accelerated atomic operations support.

On the Python front, it needs Python 3.6+ and the packages listed in requirements.txt:

cupy>=5.1
matplotlib>=1.4
numba>=0.41
numpy>=1.8
scipy>=0.14

Most of them are extremely popular and the only one that may be slightly problematic to obtain due to its ‘teen age’ is cupy.

Linux, distribution Python

All the dependencies, except for, probably, cupy, should be easily installable with your package manager.

Install NVIDIA drivers and CUDA packages according to your distribution documentation.

Install cupy according to the official installation guide, unless 5.1 or newer is already packaged by your distribution.

Linux, Anaconda

All dependencies, including cupy, are available from the official conda channels.

conda install cupy

or, if you are a miniconda user or a thorough kind of person,

while read req; do conda install --yes $req; done < requirements.txt

You probably still need to install NVIDIA drivers and CUDA packages, follow your distribution documentation.

Linux, NixOS

nix-shell

In case it’s not enough, consider either switching to the NVIDIA driver, or simply adding

boot.kernelPackages = pkgs.linuxPackages;  # explicitly require stable kernel
boot.kernelModules = [ "nvidia-uvm" ];  # should bring just enough kernel support for CUDA userspace

to /etc/nixos/configuration.nix and rebuilding the system.

Linux, locked-down environment

If you want to, e.g., run LCODE 3D on a cluster without permissions to install software the proper way, please contact the administrator first and refer them to this page.

If you are sure about CUDA support and you absolutely want to install the dependencies yourself, then make sure you have Python 3.6+ and try to install cupy using the official installation guide. If you succeed, install all the other missing requirements with pip’s ‘User Installs’ feature. Your mileage may vary. You’re responsible for the cleanup.

Windows, Anaconda

If cupy 5.1 or newer has already hit the channels, you’re in luck. Just conda install cupy, and you should be good to go.

If https://anaconda.org/anaconda/cupy still shows ‘win-64’ at v4.1.0, please accept our condolences and proceed to the next subsection.

Windows, the hard way

  • Ensure that you have Python 3.6+.
  • Free up some 10 GiB of disk space or more.
  • Verify that you’re on good terms with the deity of your choice.
  • Install Visual Studio (Community Edition is fine) with C++ support.
  • Install NVIDIA CUDA Toolkit.
  • Follow the cupy installation guide.
  • Prefer installing precompiled packages, but you might also try installing from source.
  • Verify that it works by executing import cupy; (cupy.asarray([2])**2).get() in a Python shell.
  • Install the other dependencies.

Known to work

As of early 2019, LCODE 3D is developed and known to work under:

  • NixOS 19.03 “Koi”
  • Debian 10 “Buster” + Anaconda 2019.03
  • Windows 10 1903 + Anaconda 2019.03

Running and embedding

Only two files are required

LCODE 3D is a single-file module and you only need two files to execute it: lcode.py and config.py.

Installing LCODE into PYTHONPATH with the likes of pip install . is possible, but is not officially supported.

Configuration

LCODE 3D is configured by placing a file config.py into the current working directory. An example is provided as config_example.py.

The file gets imported by the standard Python importing mechanism, the resulting module is passed around internally as config.

One can use all the features of Python inside the configuration file, from arithmetic expressions and functions to other modules and metaprogramming.
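
For reference, a minimal config.py might look like this; the parameter names and values follow config_example.py, while the beam profile is purely illustrative:

import numpy as np

grid_steps = 641                  # transverse cells per direction, must be odd
grid_step_size = 0.025            # transverse cell size in plasma units
xi_step_size = 0.005              # longitudinal step size
xi_steps = 599999                 # amount of xi steps
plasma_coarseness, plasma_fineness = 3, 2
reflect_padding_steps, plasma_padding_steps = 5, 10
field_solver_subtraction_trick = 1
field_solver_variant_A = True
gpu_index = 0

def beam(xi_i, x, y):             # beam charge density, see the Beam section
    xi = -xi_i * xi_step_size
    return 0.05 * np.exp(-(x**2 + y**2) / 2) * np.exp(-(xi + 3)**2 / 2)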

Execution

python3 lcode.py, python lcode.py or ./lcode.py

Todo

CODE: embedding

Ez

Equations

We want to solve

\[\Delta_\perp E_z = \frac{\partial j_x}{\partial x} - \frac{\partial j_y}{\partial y}\]

with Dirichlet (zero) boundary conditions.

Method

The algorithm can be succinctly written as iDST2D(dirichlet_matrix * DST2D(RHS)), where DST2D and iDST2D are Type-1 Forward and Inverse Discrete Sine 2D Transforms respectively, RHS is the right-hand side of the equation above, and dirichlet_matrix is a ‘magical’ matrix that does all the work.

lcode.dirichlet_matrix(grid_steps, grid_step_size)[source]

Calculate a magical matrix that solves the Laplace equation if you elementwise-multiply the RHS by it “in DST-space”. See Samarskiy-Nikolaev, p. 187.

In addition to the magic values, it also hosts the DST normalization multiplier.

Todo

DOCS: expand with method description (Kargapolov, Shalimova)

lcode.calculate_Ez(config, jx, jy)[source]

Calculate Ez as iDST2D(dirichlet_matrix * DST2D(djx/dx + djy/dy)).

Note that the outer cells do not participate in the calculations, and the result is simply padded with zeroes in the end.

DST2D

lcode.dst2d(a)[source]

Calculate DST-Type1-2D, jury-rigged from anti-symmetrically-padded rFFT.

As cupy currently ships no readily available function for calculating the DST2D on the GPU, we roll out our own FFT-based implementation.

We don’t need to make a separate iDST2D function as (for Type-1) it matches DST2D up to the normalization multiplier, which is taken into account in dirichlet_matrix().
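
As an illustration of the trick, a plain-numpy sketch of a Type-1 2D DST built from an anti-symmetrically padded FFT could look like this (the real lcode.dst2d() operates on cupy arrays, uses rFFT and leaves the normalization to dirichlet_matrix()):

import numpy as np

def dst2d_sketch(a):
    """Type-1 2D DST of a square array via an anti-symmetrically padded FFT."""
    N = a.shape[0]
    p = np.zeros((2 * N + 2, 2 * N + 2))
    p[1:N + 1, 1:N + 1] = a                     # the original block
    p[N + 2:, 1:N + 1] = -a[::-1, :]            # anti-symmetric continuation along one direction
    p[1:N + 1, N + 2:] = -a[:, ::-1]            # ... and along the other
    p[N + 2:, N + 2:] = a[::-1, ::-1]           # both flips, the signs cancel
    return -np.fft.fft2(p).real[1:N + 1, 1:N + 1] / 4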

Ex, Ey, Bx, By

Theory

We want to solve

\[ \begin{align}\begin{aligned}\Delta_\perp E_x &= \frac{\partial \rho}{\partial x} - \frac{\partial j_x}{\partial \xi}\\\Delta_\perp E_y &= \frac{\partial \rho}{\partial y} - \frac{\partial j_y}{\partial \xi}\\\Delta_\perp B_x &= \frac{\partial j_y}{\partial \xi} - \frac{\partial j_z}{\partial y}\\\Delta_\perp B_y &= \frac{\partial j_z}{\partial x} - \frac{\partial j_x}{\partial \xi}\end{aligned}\end{align} \]

with mixed boundary conditions.

Todo

DOCS: specify the boundary conditions.

Unfortunately, what we actually solve is no less than three steps away from these.

Helmholtz equations

The harsh reality of numerical stability forces us to solve these Helmholtz equations instead:

\[ \begin{align}\begin{aligned}\Delta_\perp E_x - E_x &= \frac{\partial \rho}{\partial x} - \frac{\partial j_x}{\partial \xi} - E_x\\\Delta_\perp E_y - E_y &= \frac{\partial \rho}{\partial y} - \frac{\partial j_y}{\partial \xi} - E_y\\\Delta_\perp B_x - B_x &= \frac{\partial j_y}{\partial \xi} - \frac{\partial j_z}{\partial y} - B_x\\\Delta_\perp B_y - B_y &= \frac{\partial j_z}{\partial x} - \frac{\partial j_x}{\partial \xi} - B_y\end{aligned}\end{align} \]

Note

The behaviour is actually configurable with config.field_solver_subtraction_trick (what a mouthful). 0 or False corresponds to the Laplace equation, and while any floating-point values or whole matrices of them should be accepted, it’s recommended to simply use 1 or True instead.

config_example.field_solver_subtraction_trick = 1

0 for Laplace eqn., Helmholtz otherwise

Method

The algorithm can be succinctly written as iMIX2D(mixed_matrix * MIX2D(RHS)), where MIX2D and iMIX2D are Type-1 Forward and Inverse Discrete Transforms, Sine in one direction and Cosine in the other, RHS is the right-hand side of the equation above, and mixed_matrix is a ‘magical’ matrix that does all the work.

lcode.mixed_matrix(grid_steps, grid_step_size, subtraction_trick)[source]

Calculate a magical matrix that solves the Helmholtz or Laplace equation (subtraction_trick=True and subtraction_trick=False respectively) if you elementwise-multiply the RHS by it “in DST-DCT-transformed-space”. See Samarskiy-Nikolaev, p. 189 and around.

Todo

DOCS: expand with method description (Kargapolov, Shalimova)

lcode.calculate_Ex_Ey_Bx_By(config, Ex_avg, Ey_avg, Bx_avg, By_avg, beam_ro, ro, jx, jy, jz, jx_prev, jy_prev)[source]

Calculate transverse fields as iDST-DCT(mixed_matrix * DST-DCT(RHS.T)).T, with and without transposition depending on the field component.

Note that some outer cells do not participate in the calculations, and the result is simply padded with zeroes in the end. We don’t define separate functions for separate boundary condition types and simply transpose the input and output data.

DST-DCT Hybrid

lcode.mix2d(a)[source]

Calculate a DST-DCT-hybrid transform (DST in first direction, DCT in second one), jury-rigged from padded rFFT (anti-symmetrically in first direction, symmetrically in second direction).

As cupy currently ships no readily available function for calculating even 1D DST/DCT on the GPU, we, once again, roll out our own FFT-based implementation.

We don’t need a separate function for the inverse transform, as it matches the forward one up to the normalization multiplier, which is taken into account in mixed_matrix().

Variant B

But wait, the complications don’t stop here.

While we do have a successfully implemented \((\Delta_\perp - 1)\) inverse operator, there’s still an open question of supplying an unknown value to the RHS.

The naive version (internally known as ‘Variant B’) is to pass the best known substitute to date, i.e. previous layer fields at the predictor phase and the averaged fields at the corrector phase. \(\xi\)-derivatives are taken at half-steps, transverse derivatives are averaged at half-steps or taken from the previous layer if not available.

\[ \begin{align}\begin{aligned}(\Delta_\perp - 1) E_x^{next} &= \frac{\partial \rho^{prev}}{\partial x} - (\frac{\partial j_x}{\partial \xi})^{\mathrm{halfstep}} - E_x^{avg}\\(\Delta_\perp - 1) E_y^{next} &= \frac{\partial \rho^{prev}}{\partial y} - (\frac{\partial j_y}{\partial \xi})^{\mathrm{halfstep}} - E_y^{avg}\\(\Delta_\perp - 1) B_x^{next} &= (\frac{\partial j_y}{\partial \xi})^{\mathrm{halfstep}} - \frac{\partial j_z^{prev}}{\partial y} - B_x^{avg}\\(\Delta_\perp - 1) B_y^{next} &= \frac{\partial j_z^{prev}}{\partial x} - (\frac{\partial j_x}{\partial \xi})^{\mathrm{halfstep}} - B_y^{avg}\end{aligned}\end{align} \]

Variant A

The more correct version (known as ‘Variant A’) mutates the equations once again to take everything at half-steps:

\[ \begin{align}\begin{aligned}(\Delta_\perp - 1) E_x^{\mathrm{halfstep}} &= \frac{\partial \rho^{\mathrm{avg}}}{\partial x} - (\frac{\partial j_x}{\partial \xi})^{\mathrm{halfstep}} - E_x^{\mathrm{avg}}\\(\Delta_\perp - 1) E_y^{\mathrm{halfstep}} &= \frac{\partial \rho^{\mathrm{avg}}}{\partial y} - (\frac{\partial j_y}{\partial \xi})^{\mathrm{halfstep}} - E_y^{\mathrm{avg}}\\(\Delta_\perp - 1) B_x^{\mathrm{halfstep}} &= (\frac{\partial j_y}{\partial \xi})^{\mathrm{halfstep}} - \frac{\partial j_z^{\mathrm{avg}}}{\partial y} - B_x^{\mathrm{avg}}\\(\Delta_\perp - 1) B_y^{\mathrm{halfstep}} &= \frac{\partial j_z^{\mathrm{avg}}}{\partial x} - (\frac{\partial j_x}{\partial \xi})^{\mathrm{halfstep}} - B_y^{\mathrm{avg}}\end{aligned}\end{align} \]

and calculates the fields at the next step in the following fashion: \(E_x^{next} = 2 E_x^{avg} - E_x^{prev}\), etc.

Solving these is equivalent to solving Variant B equations with averaged fields, \(\rho\) and \(j_z\) and applying the above transformation to the result. See lcode.step() for the wrapping code that does that.
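
In code, the final Variant A extrapolation amounts to the following (variable names are illustrative):

Ex_next = 2 * Ex_avg - Ex_prev
Ey_next = 2 * Ey_avg - Ey_prev
Bx_next = 2 * Bx_avg - Bx_prev
By_next = 2 * By_avg - By_prev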

config_example.field_solver_variant_A = True

Use Variant A or Variant B for Ex, Ey, Bx, By

Bz

Equations

We want to solve

\[\Delta_\perp B_z = \frac{\partial j_x}{\partial y} - \frac{\partial j_y}{\partial x}\]

with Neumann boundary conditions (derivative = 0).

Method

The algorithm can be succinctly written as iDCT2D(neumann_matrix * DCT2D(RHS)), where DCT2D and iDCT2D are Type-1 Forward and Inverse Discrete Cosine 2D Transforms respectively, RHS is the right-hand side of the equation above, and neumann_matrix is a ‘magical’ matrix that does all the work.

lcode.neumann_matrix(grid_steps, grid_step_size)[source]

Calculate a magical matrix that solves the Laplace equation if you elementwise-multiply the RHS by it “in DCT-space”. See Samarskiy-Nikolaev, p. 187.

In addition to the magic values, it also hosts the DCT normalization multiplier.

Todo

DOCS: expand with method description (Kargapolov, Shalimova)

lcode.calculate_Bz(config, jx, jy)[source]

Calculate Bz as iDCT2D(neumann_matrix * DCT2D(djx/dy - djy/dx)).

Note that the outer cells once again do not participate in the calculations; this time it is the RHS derivatives that get padded with zeroes beforehand.

DCT2D

lcode.dct2d(a)[source]

Calculate DCT-Type1-2D, jury-rigged from symmetrically-padded rFFT.

As cupy currently ships no readily available function for calculating the DCT2D on the GPU, we roll out our own FFT-based implementation.

We don’t need to make a separate iDCT2D function as (for Type-1) it matches DCT2D up to the normalization multiplier, which is taken into account in neumann_matrix().

Plasma

Characteristics

A plasma particle has these characteristics according to our model:

  • Coordinates \(x\) and \(y\), stored as x_init + x_offt and y_init + y_offt [Offset-coordinate separation].
  • Momenta \(p_x\), \(p_y\) and \(p_z\), stored as px, py and pz.
  • Charge \(q\), stored as q.
  • Mass \(m\), stored as m.

Shape

From the interpolation/deposition perspective, a plasma particle represents not a point in space, but a 2D Triangular-Shaped Cloud (TSC2D).

These clouds always (partially) cover an area of \(3 \times 3\) cells: the one where their center lies and the eight neighbouring ones.

_images/tsc2d.png

Todo

DOCS: WRITE: write a nicer formula for the weights of each cell.

lcode.weights(x, y, grid_steps, grid_step_size)[source]

Calculate the indices of a cell corresponding to the coordinates, and the coefficients of interpolation and deposition for this cell and the 8 surrounding cells. The weights correspond to a 2D triangular-shaped cloud (TSC2D).

The same coefficients are used both for depositing the particle characteristics onto the grid [Deposition] and for interpolating the fields at the particle center positions [Plasma pusher].
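
For reference, the standard 1D TSC weight formulas look like this (a sketch of the convention; the exact index handling and weight naming in lcode.weights() may differ):

import numpy as np

def tsc_weights_1d(coord, grid_step_size):
    """Quadratic-spline (TSC) weights for the nearest cell and its two neighbours."""
    i = int(np.floor(coord / grid_step_size + 0.5))  # nearest cell index (origin convention assumed)
    d = coord / grid_step_size - i                   # offset from that cell's center, in [-0.5, 0.5]
    wM = 0.5 * (0.5 - d) ** 2                        # 'previous' cell
    w0 = 0.75 - d ** 2                               # central cell
    wP = 0.5 * (0.5 + d) ** 2                        # 'next' cell
    return i, (wM, w0, wP)                           # the three weights sum to 1

# The 2D weights are products of the 1D ones, e.g. w0P = w0(x) * wP(y)
# (the mapping of the wMP..wPM names to x/y neighbours is an assumption here).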

lcode.deposit9(a, i, j, val, wMP, w0P, wPP, wM0, w00, wP0, wMM, w0M, wPM)[source]

Deposit value into a cell and 8 surrounding cells (using weights output).

lcode.interp9(a, i, j, wMP, w0P, wPP, wM0, w00, wP0, wMM, w0M, wPM)[source]

Collect value from a cell and 8 surrounding cells (using weights output).

The concept is orthogonal to the coarse plasma particle shape [Coarse and fine plasma]. While a coarse particle may be considered to be a component of an elastic cloud of fine particles, each individual fine particle sports the same TSC2D shape.

Coarse and fine plasma

Concept

In order to increase stability and combat transverse grid noise, LCODE 3D utilises a dual plasma approach.

_images/virtplasma.png

Positioning of the coarse and fine particles in dual plasma approach.

Coarse particles are the ones that get tracked throughout the program and pushed by the pusher. The coarse plasma grid is many times sparser than the field grid, think \(\frac{1}{9}\) particles per cell.

config_example.plasma_coarseness = 3

Square root of the amount of cells per coarse particle

Fine particles only exist inside the deposition phase. There are several fine particles per cell, think \(4\) or more. Their characteristic values are neither stored nor evolved; instead they are interpolated from the coarse particle grid as part of the deposition process (and they don’t ‘exist’ in any form outside of it).

config_example.plasma_fineness = 2

Square root of the amount of fine particles per cell

Initialization

lcode.make_coarse_plasma_grid(steps, step_size, coarseness=3)[source]

Create initial coarse plasma particles coordinates (a single 1D grid for both x and y).

lcode.make_fine_plasma_grid(steps, step_size, fineness=2)[source]

Create initial fine plasma particles coordinates (a single 1D grid for both x and y).

Avoids positioning particles at the cell edges and boundaries.

  • fineness=3 (and coarseness=2):

    +-----------+-----------+-----------+-----------+
    | .   .   . | .   .   . | .   .   . | .   .   . |
    |           |           |           |           |   . - fine particle
    | .   .   . | .   *   . | .   .   . | .   *   . |
    |           |           |           |           |   * - coarse+fine particle
    | .   .   . | .   .   . | .   .   . | .   .   . |
    +-----------+-----------+-----------+-----------+
    
  • fineness=2 (and coarseness=2):

    +-------+-------+-------+-------+-------+
    | .   . | .   . | .   . | .   . | .   . |           . - fine particle
    |       |   *   |       |   *   |       |
    | .   . | .   . | .   . | .   . | .   . |           * - coarse particle
    +-------+-------+-------+-------+-------+
    
lcode.make_plasma(steps, cell_size, coarseness=3, fineness=2)[source]

Make coarse plasma initial state arrays and the arrays needed to interpolate coarse plasma into fine plasma (virt_params).

Coarse is the one that will evolve and fine is the one to be bilinearly interpolated from the coarse one based on the initial positions (using 1 to 4 coarse plasma particles that initially were the closest).

Initializing coarse particles is pretty simple: coarse_x_init and coarse_y_init are the broadcast output of make_coarse_plasma_grid(), while coarse_x_offt and coarse_y_offt are zeros, and so are coarse_px, coarse_py and coarse_pz. coarse_m and coarse_q are constants divided by the squared coarseness-by-fineness factor, as fine particles represent smaller macroparticles.

Initializing fine particles boils down to calculating the interpolation coefficients (influence_prev and influence_next) and the indices of the coarse particles (indices_prev, indices_next) that the characteristics will be interpolated from.

influence_prev and influence_next are linear interpolation coefficients based on the initial closest coarse particle positioning. Note that these are constant and do not change in \(\xi\). The edges get special treatment later.

indices_next happens to be pretty much equal to np.searchsorted(coarse_grid, fine_grid) and indices_prev is basically indices_next - 1, except at the edges, where a fine particle can have fewer than four ‘parent’ coarse particles. For such ‘outer’ particles, existing coarse particles are used instead, so the indices are clipped and the influence arrays are fixed up accordingly.
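
A plain-numpy sketch of that construction (the real make_plasma() may differ in the details of the edge fix-ups):

import numpy as np

# coarse_grid and fine_grid are the 1D outputs of make_coarse_plasma_grid()
# and make_fine_plasma_grid() respectively.
indices_next = np.searchsorted(coarse_grid, fine_grid)
indices_prev = indices_next - 1
# Edge fine particles reuse existing coarse particles: clip the indices...
indices_prev = np.clip(indices_prev, 0, coarse_grid.size - 1)
indices_next = np.clip(indices_next, 0, coarse_grid.size - 1)
# ... and compute linear interpolation coefficients, clipped at the edges as well.
coarse_step = coarse_grid[1] - coarse_grid[0]
influence_next = np.clip((fine_grid - coarse_grid[indices_prev]) / coarse_step, 0, 1)
influence_prev = 1 - influence_next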

Note that these arrays are 1D for memory considerations [Memory considerations].

The function returns the coarse particles and virt_params: a GPUArrays instance that conveniently groups the fine-particle-related arrays, which only matter during deposition, under a single name.

Coarse-to-fine interpolation

_images/virtplasma_moved.png
lcode.mix(coarse, A, B, C, D, pi, ni, pj, nj)[source]

Bilinearly interpolate fine plasma properties from four historically-neighbouring plasma particle property values:

B    D   #  y ^         A - bottom-left  neighbour, indices: pi, pj
   .     #    |         B - top-left     neighbour, indices: pi, nj
         #    +---->    C - bottom-right neighbour, indices: ni, pj
A    C   #         x    D - top-right    neighbour, indices: ni, nj

See the rest of the deposition and plasma creation for more info.

This is just a shorthand for the characteristic value mixing for internal use in coarse_to_fine.

_images/coarse_to_fine.png
lcode.coarse_to_fine(fi, fj, c_x_offt, c_y_offt, c_m, c_q, c_px, c_py, c_pz, virtplasma_smallness_factor, fine_grid, influence_prev, influence_next, indices_prev, indices_next)[source]

Bilinearly interpolate fine plasma properties from four historically-neighbouring plasma particle property values.

The internals are pretty straightforward once you wrap your head around the indexing.

A single fine particle with the indices [fi, fj] (in the fine particles’ virt_params 1D arrays) is interpolated from four particles with indices [pi, pj], [pi, nj], [ni, pj], [ni, nj] (in the coarse particles’ c_* arrays) and four weights A, B, C, D respectively. The weights are, in turn, composed as products of values from the influence_prev and influence_next arrays, indexed, once again, with [fi, fj]. It would be convenient to calculate them beforehand, but they are recalculated instead as a result of a time-memory tradeoff [Memory considerations].

Finally, momenta, charge and mass are scaled according to the coarse-to-fine macrosity coefficient discussed above.
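
Spelled out for a single fine particle and a single characteristic, the mixing looks roughly like this (the corner-to-index mapping follows the mix() diagram above, variable names as in the coarse_to_fine() signature):

# Neighbouring coarse particle indices for the fine particle [fi, fj]:
pi, ni = indices_prev[fi], indices_next[fi]
pj, nj = indices_prev[fj], indices_next[fj]
# Corner weights, products of the 1D influence coefficients:
A = influence_prev[fi] * influence_prev[fj]    # bottom-left,  [pi, pj]
B = influence_prev[fi] * influence_next[fj]    # top-left,     [pi, nj]
C = influence_next[fi] * influence_prev[fj]    # bottom-right, [ni, pj]
D = influence_next[fi] * influence_next[fj]    # top-right,    [ni, nj]
# Interpolate, then scale momenta/charge/mass by the smallness factor:
px_fine = (A * c_px[pi, pj] + B * c_px[pi, nj] +
           C * c_px[ni, pj] + D * c_px[ni, nj]) * virtplasma_smallness_factor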

Alternative illustration

(Source code)

_images/coarse_and_fine_plasma-1.png

Deposition

Deposition operates on fine particles. Once the coarse-to-fine interpolation is out of the picture, there isn’t much left to discuss.

lcode.deposit_kernel(grid_steps, grid_step_size, virtplasma_smallness_factor, c_x_offt, c_y_offt, c_m, c_q, c_px, c_py, c_pz, fine_grid, influence_prev, influence_next, indices_prev, indices_next, out_ro, out_jx, out_jy, out_jz)[source]

Interpolate coarse plasma into fine plasma and deposit it on the charge density and current grids.

First, the fine particle characteristics are interpolated from the coarse ones. Then the total contribution of the particles to the density and the currents is calculated and deposited onto the grid in a 3x3 cell square centered at i, j, according to the weights calculated by weights(). Finally, the ion background density is added to the resulting array.

The strange incantation at the top and the need to modify the output arrays instead of returning them are dictated by the fact that this is actually not a function, but a CUDA kernel (for more info, refer to CUDA kernels with numba.cuda). It is launched in parallel for each fine particle, determines its 2D index (fi, fj), interpolates its characteristics from coarse particles and proceeds to deposit it.

lcode.deposit(config, ro_initial, x_offt, y_offt, m, q, px, py, pz, virt_params)[source]

Interpolate coarse plasma into fine plasma and deposit it on the charge density and current grids. This is a convenience wrapper around the deposit_kernel CUDA kernel.

This function allocates the output arrays, unpacks the arguments from config and virt_params, calculates the kernel dispatch parameters (for more info, refer to CUDA kernels with numba.cuda), and launches the kernel.

Todo

DOCS: explain deposition contribution formula (Lotov)

Background ions

LCODE 3D currently simulates only the movement of plasma electrons. The ions are modelled as a constant charge density distribution component that is calculated from the initial electron placement during the initialization.

lcode.initial_deposition(config, x_offt, y_offt, px, py, pz, m, q, virt_params)[source]

Determine the background ion charge density by depositing the electrons with their initial parameters and negating the result.

For this initial deposition invocation, the ion density argument is specified as 0.

The result is stored as const.ro_initial and passed to every subsequent lcode.deposit() invocation.

Plasma pusher

Without fields

The coordinate-evolving equations of motion are as follows:

\[ \begin{align}\begin{aligned}\frac{d x}{d \xi} &= -\frac{v_x}{1-v_z}\\\frac{d y}{d \xi} &= -\frac{v_y}{1-v_z}\\\vec{v} &= \frac{\vec{p}}{\sqrt{M^2+p^2}}\end{aligned}\end{align} \]
lcode.move_estimate_wo_fields(config, m, x_init, y_init, prev_x_offt, prev_y_offt, px, py, pz)[source]

Move coarse plasma particles as if there were no fields. Also reflect the particles from +-reflect_boundary.

This is used at the beginning of the xi step to roughly estimate the half-step positions of the particles.

The reflection here flips the coordinate, but not the momenta components.
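
A sketch of that field-free update for a single particle (reflection omitted; the same px / (gamma_m - pz) expression reappears in [Offset-coordinate separation]):

import numpy as np

gamma_m = np.sqrt(m**2 + px**2 + py**2 + pz**2)
# v_x / (1 - v_z) == px / (gamma_m - pz); the sign is positive because
# the simulation marches towards decreasing xi in steps of xi_step_size.
x_offt = prev_x_offt + px / (gamma_m - pz) * xi_step_size
y_offt = prev_y_offt + py / (gamma_m - pz) * xi_step_size
x, y = x_init + x_offt, y_init + y_offt        # full coordinates, used for the reflection check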

With fields

The coordinate-evolving equation of motion is as follows:

\[\frac{d \vec{p}}{d \xi} = -\frac{q}{1-v_z} \left( \vec{E} + \left[ \vec{v} \times \vec{B} \right]\right)\]

As the particle momentum is present at both sides of the equation (as \(p\) and \(v\) respectively), an iterative predictor-corrector scheme is employed.

The alternative is to use a symplectic solver that solves the resulting matrix equation (not mainlined at the moment, look for an alternative branch in t184256’s fork).

lcode.move_smart_kernel(xi_step_size, reflect_boundary, grid_step_size, grid_steps, ms, qs, x_init, y_init, prev_x_offt, prev_y_offt, estimated_x_offt, estimated_y_offt, prev_px, prev_py, prev_pz, Ex_avg, Ey_avg, Ez_avg, Bx_avg, By_avg, Bz_avg, new_x_offt, new_y_offt, new_px, new_py, new_pz)[source]

Update plasma particle coordinates and momenta according to the field values interpolated halfway between the previous plasma particle location and the best estimate of its next location currently available to us. Also reflect the particles from +-reflect_boundary.

The function serves as the coarse particle loop, fusing together midpoint calculation, field interpolation with interp9() and particle movement for performance reasons.

The equations for half-step momentum are solved twice, with more precise momentum for the second time.

The particle coordinates are advanced using the half-step momentum, and afterwards the momentum is advanced to the next step.

The reflection is more involved this time, affecting both the coordinates and the momenta.

Note that the reflected particle offsets are mixed with the positions, resulting in a possible float precision loss [Offset-coordinate separation]. This effect is probably negligible, as such a particle has already travelled at least several cell sizes by that point. The only place where the separation really matters is the (final) coordinate addition (x_offt += ... and y_offt += ...).

The strange incantation at the top and the need to modify the output arrays instead of returning them are dictated by the fact that this is actually not a function, but a CUDA kernel (for more info, refer to CUDA kernels with numba.cuda). It is launched in parallel for each coarse particle, determines its 1D index k, interpolates the fields at its position and proceeds to move and reflect it.

lcode.move_smart(config, m, q, x_init, y_init, x_prev_offt, y_prev_offt, estimated_x_offt, estimated_y_offt, px_prev, py_prev, pz_prev, Ex_avg, Ey_avg, Ez_avg, Bx_avg, By_avg, Bz_avg)[source]

Update plasma particle coordinates and momenta according to the field values interpolated halfway between the previous plasma particle location and the best estimate of its next location currently available to us. This is a convenience wrapper around the move_smart_kernel CUDA kernel.

This function allocates the output arrays, unpacks the arguments from config, calculates the kernel dispatch parameters (for more info, refer to CUDA kernels with numba.cuda), flattens the input and output arrays of particle characteristics (as the pusher does not care about the particle 2D indices) and launches the kernel.

Beam

The beam is currently simulated as a charge density function \(\rho_b(\xi, x, y)\), and not with particles.

In the future there will certainly be a way to define a beam with particles and simulate beam-plasma interaction both ways, but for now only simulating the plasma response to a rigid beam is possible.

config_example.beam(xi_i, x, y)[source]

The user should specify the beam charge density as a function in the configuration file.

xi_i is not the value of the \(\xi\) coordinate, but the step index. Please use something along the lines of xi = -xi_i * xi_step_size + some_offset, depending on where exactly in \(\xi\) you define the beam density slices [Integer xi steps].

x and y are numpy arrays, so one should use vectorized numpy operations to calculate the desired beam charge density, like numpy.exp(-numpy.sqrt(x**2 + y**2)).

The function should ultimately return an array with the same shape as x and y.
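
For example, a hypothetical Gaussian beam could be defined in config.py like this (the amplitude, width and \(\xi\) offset are arbitrary):

import numpy as np

def beam(xi_i, x, y):
    xi = -xi_i * xi_step_size          # step index -> xi coordinate (xi_step_size from the same config)
    r2 = x**2 + y**2
    return 0.05 * np.exp(-r2 / 2) * np.exp(-(xi + 3)**2 / 2)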

Todo

CODE: Simulate the beam with particles and evolve it according to the plasma response.

Looping in xi

Finally, here’s the function that binds it all together, and currently makes up half of LCODE 3D API.

In short it: moves, deposits, estimates fields, moves, deposits, recalculates fields, moves and deposits.

config_example.xi_step_size = 0.005

Step size in time-space coordinate xi

config_example.xi_steps = 599999

Amount of xi steps

lcode.step(config, const, virt_params, prev, beam_ro)[source]

Calculate the next iteration of plasma evolution and response. Returns the new state with the following attributes: x_offt, y_offt, px, py, pz, Ex, Ey, Ez, Bx, By, Bz, ro, jx, jy, jz. Pass the returned value as prev for the next iteration. Wrap it in GPUArraysView if you want transparent conversion to numpy arrays.

Input parameters

The beam density array \(\rho_b\) (beam_ro) is copied to the GPU with cupy.asarray, as it is calculated with numpy in the config-residing beam().

All the other arrays come packed in GPUArrays objects [GPU array conversion], which ensures that they reside in the GPU memory. These objects are:

  • const and virt_params, which are constant at least for the \(\xi\)-step duration and defined during the initialization, and
  • prev, which is usually obtained as the return value of the previous step() invocation, except for the very first step.

Initial half-step estimation

  1. The particles are advanced according to their current momenta only (lcode.move_estimate_wo_fields()).

Field prediction

While we don’t know the fields on the next step:

  2. The particles are advanced with the fields from the previous step, using the coordinates estimated at 1. to calculate the half-step positions at which the previous-step fields are interpolated (lcode.move_smart()).
  3. The particles from 2. are deposited onto the charge/current density grids (lcode.deposit()).
  4. The fields at the next step are calculated using densities from 3. (lcode.calculate_Ez(), lcode.calculate_Ex_Ey_Bx_By(), lcode.calculate_Bz()) and averaged with the previous fields.

This phase gives us an estimate of the fields at the half-step and a coordinate estimate at the next step, while all other intermediate results are ultimately discarded.

Field correction

  5. The particles are advanced with the averaged fields from 4., using the coordinates from 2. to calculate the half-step positions at which the averaged fields from 4. should be interpolated (lcode.move_smart()).
  6. The particles from 5. are deposited onto the charge/current density grids (lcode.deposit()).
  7. The fields at the next step are calculated using densities from 6. (lcode.calculate_Ez(), lcode.calculate_Ex_Ey_Bx_By(), lcode.calculate_Bz()) and averaged with the previous fields.

The resulting fields are far more precise than the ones from the prediction phase, but the coordinates and momenta are still pretty low-quality until we recalculate them using the new fields. Iterating the algorithm more times improves the stability, but it currently doesn’t bring much to the table as the transverse noise dominates.

Final plasma evolution and deposition

  8. The particles are advanced with the averaged fields from 7., using the coordinates from 5. to calculate the half-step positions at which the averaged fields from 7. should be interpolated (lcode.move_smart()).
  9. The particles from 8. are deposited onto the charge/current density grids (lcode.deposit()).

The result, or the ‘new prev’

The fields from 7., coordinates and momenta from 8., and densities from 9. make up the new GPUArrays collection that would be passed as prev to the next iteration of step().

Initialization

lcode.init(config)[source]

Initialize all the arrays needed for step and config.beam.

This function performs quite a boring sequence of actions, outlined here for interlinking purposes:

  • validates the oddity of config.grid_steps [Fields and densities grid],
  • validates that config.reflect_padding_steps is large enough [Reflect and ‘plasma’ boundaries],
  • calculates the reflect_boundary and monkey-patches it back into config,
  • initializes the x and y arrays for use in config_example.beam(),
  • calculates the plasma placement boundary,
  • immediately passes it to make_plasma(), leaving it oblivious to the padding concerns,
  • performs the initial electron deposition to obtain the background ion charge density [Background ions],
  • groups the constant arrays into a GPUArrays instance const, and
  • groups the evolving arrays into a GPUArrays instance state.

GPU calculation peculiarities

LCODE 3D performs most of the calculations on GPU using a mix of two approaches.

CUDA kernels with numba.cuda

One can use CUDA from Python more or less directly by writing and launching CUDA kernels with numba.cuda.

An example would be:

import numba.cuda

@numba.cuda.jit
def add_two_arrays_kernel(arr1, arr2, result):
    i = numba.cuda.grid(1)        # index of the element this thread is responsible for
    if i >= arr1.shape[0]:        # surplus threads of the last warp simply bail out
        return
    result[i] = arr1[i] + arr2[i]

This function represents a loop body, launched in parallel with many threads at once. Each of them starts with obtaining the array index it is ‘responsible’ for with cuda.grid(1) and then proceeds to perform the required calculation. As it is optimal to launch them in 32-threaded ‘warps’, one also has to handle the case of having more threads than needed by making them skip the calculation.

No fancy Python operations are supported inside CUDA kernels; it is basically a way to write C-like bodies for hot loops without having to write actual C/CUDA code. You can only use simple types for kernel arguments, and you cannot return anything from them.

To rub it in, this isn’t even a directly callable function yet. To conceal the limitations and the calling complexity, it is convenient to write a wrapper for it.

from math import ceil
import cupy as cp  # see 'Array-wise operations with cupy' below

def add_two_arrays(arr1, arr2):
    result = cp.zeros_like(arr1)  # allocate the output array on the GPU with cupy
    warp_count = int(ceil(arr1.size / WARP_SIZE))  # WARP_SIZE is defined below
    add_two_arrays_kernel[warp_count, WARP_SIZE](arr1, arr2, result)
    return result

A pair of numbers (warp_count, WARP_SIZE) is required to launch the kernel. warp_count is chosen this way so that warp_count * WARP_SIZE would be larger than the problem size.

lcode.WARP_SIZE = 32

Should be detectable with newer cupy (>6.0.0b2) as WARP_SIZE = cp.cuda.Device(config.gpu_index).attributes['WarpSize']. As of 2019 it’s equal to 32 for all CUDA-capable GPUs. It’s even a hardcoded value in cupy.

Array-wise operations with cupy

cupy is a GPU array library that aims to implement a numpy-like interface to GPU arrays. It allows one to, e.g., add up two GPU arrays with a simple and terse a + b. Most of the functions in LCODE use vectorized operations and cupy. All memory management is done with cupy for consistency.

It’s hard to overestimate the convenience of this approach, but sometimes expressing algorithms in vectorized notation is too hard or suboptimal. The only two places where we actually resort to writing CUDA kernels are deposit() (our fine particle loop) and move_smart() (our coarse particle loop).

Copying is expensive

If the arrays were copied between GPU RAM and host RAM, the PCI-E bandwidth would become a bottleneck. The two most useful strategies to minimize excessive copying are

  1. churning for several consecutive \(\xi\)-steps with no copying and no CPU-side data processing (with a notable exception of beam() and the resulting beam_ro); and
  2. copying only the subset of the arrays that the outer diagnostics code needs.

GPU array conversion

In order for a + b to work in cupy, both arrays have to be copied to GPU (cupy.asarray(a)) and, in case you want the results back as numpy arrays, you have to explicitly copy them back (gpu_array.get()).

While for LCODE 3D itself it’s easier and quicker to stick to using GPU arrays exclusively, this means that the only time we want to convert to numpy is when returning the results back to the external code.

There are two classes that assist in copying the arrays back and forth as conveniently as possible. The implementation looks a bit nightmarish, but using them is simple.

class lcode.GPUArrays(**kwargs)[source]

A convenient way to group several GPU arrays and access them with a dot. x = GPUArrays(something=numpy_array, something_else=another_array) will create x with x.something and x.something_else stored on GPU.

Do not add more attributes later, specify them all at construction time.

class lcode.GPUArraysView(gpu_arrays)[source]

This is a magical wrapper around GPUArrays that handles GPU-RAM data transfer transparently. Accessing view.something will automatically copy the array to host RAM; setting view.something = ... will copy the changes back to GPU RAM.

Usage: view = GPUArraysView(gpu_arrays); view.something

Do not add more attributes later, specify them all at construction time.

NOTE: repeatedly accessing an attribute will result in repeated copying!

This way we can wrap everything we need in GPUArrays with, e.g., const = GPUArrays(x_init=x_init, y_init=y_init, ...) and then access them as const.x_init from GPU-heavy code. For the outer code that does not care about GPU arrays at all, we can return a wrapped const_view = GPUArraysView(const) and access the arrays as const_view.x_init.

Copying will happen on-demand during the attribute access, intercepted by our __getattr__ implementation, but beware!

Note

Repeatedly accessing const_view.x_init will needlessly perform the copying again, so one should bind it to the variable name (x_init = const_view.x_init) once and reuse the resulting numpy array.
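
Putting the two classes together, a usage sketch:

import numpy as np
import lcode

# Group arrays and push them to the GPU...
const = lcode.GPUArrays(x_init=np.zeros((641, 641)), y_init=np.zeros((641, 641)))
# ... GPU-heavy code accesses const.x_init directly as a GPU array ...

# ... and expose them to numpy-only code through a view:
const_view = lcode.GPUArraysView(const)
x_init = const_view.x_init         # one GPU -> host copy; bind it once and reuse it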

Todo

CODE: wrap the returned array with GPUArraysView by default

Selecting GPU

config_example.gpu_index = 0

Index of the GPU that should perform the calculations

LCODE 3D currently does not support utilizing several GPUs for one simulation, but once we switch to beam evolution calculation, processing several consecutive \(t\)-steps in a pipeline of several GPUs should be a low hanging fruit.

Optimal transverse grid sizes

FFT works best for grid sizes that are factorizable into small numbers. Any size will work, but the performance may vary dramatically.

FFTW documentation quotes the optimal size for their algorithm as \(2^a 3^b 5^c 7^d 11^e 13^f\), where \(e+f\) is either \(0\) or \(1\), and the other exponents are arbitrary.

While LCODE 3D does not use FFTW (it uses cufft instead, wrapped by cupy), the formula is still quite a good rule of thumb for calculating performance-friendly config_example.grid_steps values.

The employed FFT sizes for a grid sized \(N\) are \(2N-2\) for both DST (dst2d(), mix2d()) and DCT transforms (dct2d(), mix2d()) when we take padding and perimeter cell stripping into account.

This leaves us with finding an \(N\) such that \(N-1\) satisfies the small-factor condition.

If you don’t mind arbitrary grid sizes, we suggest using

  1. \(N=2^K + 1\), they always perform the best, or
  2. one of the roundish \(201\), \(301\), \(401\), \(501\), \(601\), \(701\), \(801\), \(901\), \(1001\), \(1101\), \(1201\), \(1301\), \(1401\), \(1501\), \(1601\), \(1801\), \(2001\), \(2101\), \(2201\), \(2401\), \(2501\), \(2601\), \(2701\), \(2801\), \(3001\), \(3201\), \(3301\), \(3501\), \(3601\), \(3901\), \(4001\).

The code to check for the FFTW criteria above and some of the matching numbers are listed below.

def factorize(n, a=[]):
    if n <= 1:
        return a
    for i in range(2, n + 1):
        if n % i == 0:
            return factorize(n // i, a + [i])

def good_size(n):
    # n is 'good' if it is odd and n - 1 factorizes into 2, 3, 5, 7, 11, 13
    # with at most one factor of 11 or 13 in total.
    factors = factorize(n - 1)
    return (all(f in [2, 3, 5, 7, 11, 13] for f in factors)
            and factors.count(11) + factors.count(13) < 2
            and n % 2)

', '.join([str(a) for a in range(20, 4100) if good_size(a)])

\(21\), \(23\), \(25\), \(27\), \(29\), \(31\), \(33\), \(37\), \(41\), \(43\), \(45\), \(49\), \(51\), \(53\), \(55\), \(57\), \(61\), \(65\), \(67\), \(71\), \(73\), \(79\), \(81\), \(85\), \(89\), \(91\), \(97\), \(99\), \(101\), \(105\), \(109\), \(111\), \(113\), \(121\), \(127\), \(129\), \(131\), \(133\), \(141\), \(145\), \(151\), \(155\), \(157\), \(161\), \(163\), \(169\), \(177\), \(181\), \(183\), \(193\), \(197\), \(199\), \(201\), \(209\), \(211\), \(217\), \(221\), \(225\), \(235\), \(241\), \(251\), \(253\), \(257\), \(261\), \(265\), \(271\), \(281\), \(289\), \(295\), \(301\), \(309\), \(313\), \(321\), \(325\), \(331\), \(337\), \(351\), \(353\), \(361\), \(365\), \(379\), \(385\), \(391\), \(393\), \(397\), \(401\), \(417\), \(421\), \(433\), \(441\), \(449\), \(451\), \(463\), \(469\), \(481\), \(487\), \(491\), \(501\), \(505\), \(513\), \(521\), \(529\), \(541\), \(547\), \(551\), \(561\), \(577\), \(589\), \(595\), \(601\), \(617\), \(625\), \(631\), \(641\), \(649\), \(651\), \(661\), \(673\), \(687\), \(701\), \(703\), \(705\), \(721\), \(729\), \(751\), \(757\), \(769\), \(771\), \(781\), \(785\), \(793\), \(801\), \(811\), \(833\), \(841\), \(865\), \(881\), \(883\), \(897\), \(901\), \(911\), \(925\), \(937\), \(961\), \(973\), \(981\), \(991\), \(1001\), \(1009\), \(1025\), \(1041\), \(1051\), \(1057\), \(1079\), \(1081\), \(1093\), \(1101\), \(1121\), \(1135\), \(1153\), \(1171\), \(1177\), \(1189\), \(1201\), \(1233\), \(1249\), \(1251\), \(1261\), \(1275\), \(1281\), \(1297\), \(1301\), \(1321\), \(1345\), \(1351\), \(1373\), \(1387\), \(1401\), \(1405\), \(1409\), \(1441\), \(1457\), \(1459\), \(1471\), \(1501\), \(1513\), \(1537\), \(1541\), \(1561\), \(1569\), \(1585\), \(1601\), \(1621\), \(1639\), \(1651\), \(1665\), \(1681\), \(1729\), \(1751\), \(1761\), \(1765\), \(1783\), \(1793\), \(1801\), \(1821\), \(1849\), \(1873\), \(1891\), \(1921\), \(1945\), \(1951\), \(1961\), \(1981\), \(2001\), \(2017\), \(2049\), \(2059\), \(2081\), \(2101\), \(2107\), \(2113\), \(2157\), \(2161\), \(2185\), \(2201\), \(2241\), \(2251\), \(2269\), \(2305\), \(2311\), \(2341\), \(2353\), \(2377\), \(2401\), \(2431\), \(2451\), \(2465\), \(2497\), \(2501\), \(2521\), \(2549\), \(2561\), \(2593\), \(2601\), \(2641\), \(2647\), \(2689\), \(2701\), \(2731\), \(2745\), \(2751\), \(2773\), \(2801\), \(2809\), \(2817\), \(2881\), \(2913\), \(2917\), \(2941\), \(2971\), \(3001\), \(3025\), \(3073\), \(3081\), \(3121\), \(3137\), \(3151\), \(3169\), \(3201\), \(3235\), \(3241\), \(3251\), \(3277\), \(3301\), \(3329\), \(3361\), \(3403\), \(3431\), \(3457\), \(3501\), \(3511\), \(3521\), \(3529\), \(3565\), \(3585\), \(3601\), \(3641\), \(3697\), \(3745\), \(3751\), \(3781\), \(3823\), \(3841\), \(3851\), \(3889\), \(3901\), \(3921\), \(3961\), \(4001\), \(4033\), \(4051\), \(4097\)

Offset-coordinate separation

Float precision loss

When floating point numbers of different magnitudes get added up, there is an inherent precision loss that grows with the magnitude disparity.

If a particle has a large coordinate (think 5.223426), but moves for a small distance (think 7.139152e-4) due to a low xi_step_size and a small momentum projection, calculating the sum of these numbers suffers from precision loss due to the finite significand size:

An oversimplified illustration in decimal notation:

 5.223426
+0.0007139152
=5.224139LOST
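
The same effect can be reproduced in single precision with numpy:

import numpy as np

x = np.float32(5.223426)
dx = np.float32(7.139152e-4)
print(x + dx)              # the tail of dx is rounded away in the sum
print((x + dx) - x, dx)    # recovering dx from the sum is no longer exact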

We have not conducted extensive research on how detrimental this round-off accumulation is to LCODE 3D numerical stability in \(\xi\). Currently the transverse noise dominates, but in order to make our implementation a bit more future-proof, we store the plasma particle coordinates separated into two floats: initial position (x_init, y_init) and accumulated offset (x_offt, y_offt) and do not mix them.

Mixing them all the time…

OK, we do mix them. Each and every function involving them adds them up at some point and even has code like this:

x = x_init + x_offt
...
x_offt = x - x_init

to reconstruct x_offt from the ‘dirty’ sum values x.

We do that because we’re fine with singular round-off errors as long as they don’t propagate to the next step and accumulate over millions of \(\xi\)-steps (‘Test 1’ simulations were conducted for up to 1.5 million steps).

… but not where it really matters

This way, the only place where the separation must be preserved is the path from prev.x_offt to new_state.x_offt. Several x_offt additions are performed and rolled back at each \(\xi\)-step, but only two kinds of them persist, both residing in move_smart():

  1. x_offt += px / (gamma_m - pz) * xi_step_size does no mixing with the coordinate values, and
  2. x = +2 * reflect_boundary - x and a similar one for the left boundary only happen during particle reflection, which presumably occurs rarely and only affects particles that have already deviated at least several cells away from their initial position.

This way most particles won’t experience this kind of rounding issues with their coordinates. On the flip side, splitting the coordinates makes working with them quite unwieldy.

Design decisions

Codebase complexity

The code strives to be readable and vaguely understandable by a freshman student with only some basic background in plasma physics and numerical simulations, to the point of being comfortable with modifying it.

Given the complexity of the physical problem behind it, this goal is, sadly, unattainable, but the authors employ several avenues to get as close as possible:

  1. Abstaining from using advanced programming concepts. Cool things like aspect-oriented programming are neat, but keeping the code well under 1000 SLOC is even neater. The two classes we currently have are two classes more than the ideal amount.
  2. Preferring less code over extensibility.
  3. Picking simpler code over performance tweaking.
  4. Choosing malleability over user convenience.
  5. Creating external modules or branches over exhaustive featureset.
  6. Not shying away from external dependencies or unpopular technologies, even if this means sacrificing the portability.
  7. Appointing a physics student with modest programming background as the maintainer and primary code reviewer.

Codebase size

LCODE 3D wasn’t always around 500 SLOC. In fact, even as it got rewritten from C to Cython to numba to numba.cuda to cupy, it peaked at around 5000 SLOC, twice. And we don’t even count its Fortran days.

In order to objectively curb the complexity, scope and malleability of the codebase, its size is limited to 1000 SLOC.

David Wheeler’s SLOCCount is used for obtaining the metric. Empty lines, docstrings and comments don’t count towards that limit.

Zero special boundary treatment

[Not allowing the particles to reach the outer cells of the simulation window] slightly modifies the physical problem itself, but, in return blesses us with the ability to forego special boundary checks during deposition, interpolation and field calculation, simplifying the code and boosting the performance.

Memory considerations

LCODE 3D is observed to consume roughly the same amount of host and GPU RAM, hovering around 500 MiB for a 641x641 grid, coarseness=3 and fineness=2.

The size of the arrays processed by LCODE 3D depends on these parameters.

Let’s label the field/densities grid size in a single direction as \(N\), the coarse plasma grid size as \(N_c \approx \frac{N}{\text{coarseness}}\) and the fine plasma grid size as \(N_f \approx N \cdot \text{fineness}\).

With about the same amount of arrays in scope for each of these three sizes, it is clear that the \(N_f^2\)-sized arrays would dominate the memory consumption. Fortunately, the arrays that contain fine plasma characteristics would be transient and only used during the deposition, while the interpolation indices and coefficients grouped under virt_params can be reduced to 1D arrays by exploiting the \(x/y\) symmetry of the coarse/fine plasma grids.

This way LCODE 3D stores only \(N_c^2\)- and \(N^2\)-sized arrays, with the \(N_f\)-sized ones barely taking up any space thanks to being 1D.
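
A rough back-of-the-envelope check with the default parameters (assuming float64 arrays):

N = 641                        # grid_steps
N_c = N // 3                   # coarse particles per direction (coarseness=3)
N_f = N * 2                    # fine particles per direction (fineness=2)
print(N**2 * 8 / 2**20)        # ~3.1 MiB per N x N field/density array
print(N_c**2 * 8 / 2**20)      # ~0.35 MiB per coarse-particle array
print(N_f * 8 / 2**10)         # ~10 KiB per 1D fine-grid helper array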

Also, all previous attempts to micromanage the GPU memory allocations have been scrapped in favor of blindly trusting the cupy on-demand allocation. Not only is it extremely convenient, it’s even more performant than our own solutions.

Integer xi steps

\(\xi\)-steps are integer for the purpose of bypassing float precision-based errors. The task of converting it into the \(\xi\)-coordinate is placed within the usage context.

Contributing

When contributing to this repository, please discuss the change you wish to make via email (team@lcode.info), issue tracker (https://github.com/lotov/lcode3d/issues), personal communication or any other method with our team.

The suggested followup workflow for the implementor would be:

  • choose the most suitable parent branch;
  • fork https://github.com/lotov/lcode3d or its fork;
  • check it out locally;
  • install dependencies (see requirements.txt);
  • verify that LCODE runs as-is;
  • implement, test and commit changes;
  • check that the code is still under 1000 SLOC;
  • try to strip all the complex programming concepts and clever hacks;
  • rebase it if the parent branch advances;
  • submit a pull request;
  • wait for it to be rebased-and-merged.

By submitting patches to this project, you agree that they may be redistributed under the project’s license according to the normal forms and usages of the open-source community.