Internals

Data Consistency

When creating a LASDataset or writing a tabular point cloud to a file, the header information we provide must be consistent with the point cloud and with any VLRs and user-defined bytes. Internally, this is done by the function make_consistent_header!, which compares a LasHeader against the LAS data and sets the appropriate data offsets, flags and other metadata in the header. For example, it makes sure that the numbers of points, VLRs and EVLRs recorded in the header match the data provided, so the header of your LASDataset always agrees with its contents.
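To make the idea concrete, here is a minimal sketch of this kind of consistency pass. It is not the package's implementation: ToyHeader, its fields and toy_make_consistent! are hypothetical stand-ins, and a real LasHeader tracks far more metadata (flags, EVLR offsets, bounding box and so on).

```julia
# Hypothetical, heavily simplified stand-in for a LAS header.
mutable struct ToyHeader
    header_size::Int      # bytes occupied by the header block itself
    n_vlrs::Int           # number of VLRs the header claims
    data_offset::Int      # byte offset at which point records start
    record_length::Int    # bytes per point record
    n_points::Int         # number of point records the header claims
end

# Overwrite the derived fields so that they agree with the supplied data.
function toy_make_consistent!(
    header::ToyHeader,
    points::AbstractVector{<:NamedTuple},
    vlr_sizes::Vector{Int},
    user_defined_bytes::Vector{UInt8},
)
    header.n_vlrs = length(vlr_sizes)
    header.n_points = length(points)
    # Point data starts after the header, all VLRs and any user-defined bytes
    header.data_offset =
        header.header_size + sum(vlr_sizes; init = 0) + length(user_defined_bytes)
    return header
end

toy_points = [(x = 1.0, y = 2.0, z = 3.0), (x = 4.0, y = 5.0, z = 6.0)]
toy_header = ToyHeader(375, 0, 0, 30, 0)   # stale counts and offset
toy_make_consistent!(toy_header, toy_points, [54 + 192, 54 + 40], UInt8[0x00, 0x01])
```

After the call, the counts and the data offset are derived from the data rather than from whatever the header held before, which is the guarantee described above.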

LASDatasets.make_consistent_header! (Function)
make_consistent_header!(
    header::LasHeader,
    pointcloud::AbstractVector{<:NamedTuple},
    vlrs::Vector{<:LasVariableLengthRecord},
    evlrs::Vector{<:LasVariableLengthRecord},
    user_defined_bytes::Vector{UInt8}
)

Ensure that a LAS header is consistent with a given pointcloud data coupled with sets of vlrs, evlrs and user_defined_bytes

LASDatasets.make_consistent_header (Function)
make_consistent_header(
    pointcloud::AbstractVector{<:NamedTuple},
    point_format::Type{TPoint<:LasPoint},
    vlrs::Vector{<:LasVariableLengthRecord},
    evlrs::Vector{<:LasVariableLengthRecord},
    user_defined_bytes::Vector{UInt8},
    scale::Union{Real, AxisInfo, StaticArraysCore.SVector{3, <:Real}}
) -> LasHeader

Construct a LAS header that is consistent with a given pointcloud data in a specific LAS point_format, coupled with sets of vlrs, evlrs and user_defined_bytes


Third Party Packages

This package relies heavily upon PackedReadWrite.jl to speed up the reading and writing of LasPoints and some of our VLRs.

We also use BufferedStreams.jl to drastically reduce I/O overhead.

Point Records

As outlined in the User Fields section, in order to fully support "extra point data" in our LAS files, we treat a LAS point record as a point together with optional extra user fields and an optional set of undocumented bytes. Internally, however, this is split into four separate concrete types, each a subtype of the abstract type LasRecord. These correspond to the four combinations of a point with or without user fields and with or without undocumented bytes.

LASDatasets.LasRecord (Type)
abstract type LasRecord

An abstract form of a LAS record. These are points with some additional information possibly included

LASDatasets.PointRecord (Type)
struct PointRecord{TPoint} <: LASDatasets.LasRecord

A LAS record that only has a point

  • point::Any: The LAS point stored in this record

LASDatasets.ExtendedPointRecord (Type)
struct ExtendedPointRecord{TPoint, Names, Types} <: LASDatasets.LasRecord

A LAS record that has a LAS point and extra user-defined point fields. Note that these must be documented as ExtraBytes VLRs in the LAS file

  • point::Any: The LAS point stored in this record

  • user_fields::LASDatasets.UserFields: Extra user fields associated with this point

LASDatasets.UndocPointRecord (Type)
struct UndocPointRecord{TPoint, N} <: LASDatasets.LasRecord

A LAS record that has a point as well as additional undocumented bytes (i.e. that don't have an associated ExtraBytes VLR)

  • point::Any: The LAS point stored in this record

  • undoc_bytes::StaticArraysCore.SVector{N, UInt8} where N: Array of extra bytes after the point that haven't been documented in the VLRs

LASDatasets.FullRecord (Type)
struct FullRecord{TPoint, Names, Types, N} <: LASDatasets.LasRecord

A LAS record that has a LAS point, extra user-defined fields and additional undocumented extra bytes

  • point::Any: The LAS point stored in this record

  • user_fields::LASDatasets.UserFields: Extra user fields associated with this point

  • undoc_bytes::StaticArraysCore.SVector{N, UInt8} where N: Array of extra bytes after the point that haven't been documented in the VLRs


This was done largely to speed up the reading of point records. A single record type would need extra conditional checks on every read to decide whether certain extra fields must be read from the file, which slows down the read process. Instead, we rely on Julia's multiple dispatch and define Base.read and Base.write methods for each record type. This avoids those checks and also reduces type inference time when reading records into a vector.
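The dispatch pattern can be illustrated with a small sketch. The types below (DemoRecord, DemoPointRecord, DemoUndocRecord) are hypothetical and use a two-byte "point" purely for brevity; they are not the package's record types or read methods.

```julia
abstract type DemoRecord end

# A record holding only a point (here reduced to a single UInt16)
struct DemoPointRecord <: DemoRecord
    point::UInt16
end

# A record holding a point plus N undocumented trailing bytes
struct DemoUndocRecord{N} <: DemoRecord
    point::UInt16
    undoc_bytes::NTuple{N, UInt8}
end

# One read method per concrete type: no "are there extra bytes?" branch inside.
Base.read(io::IO, ::Type{DemoPointRecord}) = DemoPointRecord(ltoh(read(io, UInt16)))

function Base.read(io::IO, ::Type{DemoUndocRecord{N}}) where {N}
    point = ltoh(read(io, UInt16))            # LAS data is little-endian
    extra = ntuple(_ -> read(io, UInt8), N)   # N is known from the type
    return DemoUndocRecord{N}(point, extra)
end

demo_io = IOBuffer(UInt8[0x01, 0x00, 0xaa, 0xbb])
demo_rec = read(demo_io, DemoUndocRecord{2})
```

Because the record type is fixed before the read loop starts, the compiler can specialise the loop body for that one layout instead of branching on every record.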

Writing Optimisations

Typically, Julia is slower at performing many consecutive small writes to an IO stream than at performing one much larger write. For this reason, when writing point records to a LAS file, we first construct a vector of bytes from the records and then write that whole vector to the file in one go. This is possible because for each point record we know:

  • How many bytes the point format occupies,
  • How many user fields are in the record and their data size in bytes, and
  • How many undocumented bytes there are.

This is done using LASDatasets.get_record_bytes, which takes a collection of LAS records and writes each LAS field, user field and extra bytes collection into its correct location in the final byte vector.
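As a rough illustration of the "pack first, write once" approach, the sketch below fills a preallocated byte vector at known offsets and then issues a single write. PackedRecord, PACKED_RECORD_SIZE and toy_record_bytes are hypothetical names, and the four-byte layout is invented for the example; the real get_record_bytes handles full LAS point formats and user fields.

```julia
# Hypothetical fixed-size record: a 2-byte point value plus 2 undocumented bytes.
struct PackedRecord
    point::UInt16
    undoc_bytes::NTuple{2, UInt8}
end

const PACKED_RECORD_SIZE = 4   # bytes per record, known before writing starts

function toy_record_bytes(records::Vector{PackedRecord})
    bytes = Vector{UInt8}(undef, PACKED_RECORD_SIZE * length(records))
    for (i, rec) in enumerate(records)
        offset = (i - 1) * PACKED_RECORD_SIZE
        # Store the point value little-endian, low byte first
        bytes[offset + 1] = rec.point % UInt8
        bytes[offset + 2] = (rec.point >> 8) % UInt8
        bytes[offset + 3] = rec.undoc_bytes[1]
        bytes[offset + 4] = rec.undoc_bytes[2]
    end
    return bytes
end

packed_records = [PackedRecord(0x0102, (0xaa, 0xbb)), PackedRecord(0x0304, (0xcc, 0xdd))]
packed_bytes = toy_record_bytes(packed_records)

packed_io = IOBuffer()
n_written = write(packed_io, packed_bytes)   # one write call for the whole point block
```

Since every record has the same known size, each field's position in the output is a simple function of the record index, so no intermediate IO calls are needed while packing.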

In order to do this, we need to frequently access each field in a (potentially huge) list of records, which is slow when the records are stored one after another as whole structs. We instead first convert our records into a StructVector using StructArrays.jl, which stores each field as its own column and vastly increases the speed at which we can access these fields and broadcast over them.
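The layout change behind this can be shown by hand in plain Julia. StructArrays.jl builds and maintains this column layout automatically; the sketch below only mimics the idea with a NamedTuple of vectors, and LayoutRecord is a hypothetical type.

```julia
struct LayoutRecord
    x::Float64
    intensity::UInt16
end

layout_records = [
    LayoutRecord(1.0, 0x000a),
    LayoutRecord(2.0, 0x0014),
    LayoutRecord(3.0, 0x001e),
]

# Array-of-structs access: each field read has to step over whole records
aos_intensity = [r.intensity for r in layout_records]

# Struct-of-arrays layout: one contiguous vector per field, built once up front
layout_columns = (
    x = [r.x for r in layout_records],
    intensity = [r.intensity for r in layout_records],
)

# Field access is now a plain vector, and broadcasting works column by column
scaled_x = layout_columns.x .* 0.5
```

With the columns in hand, writing all values of one field into the byte vector becomes a pass over a single dense array.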

LASDatasets.get_record_bytes (Function)
get_record_bytes(
    records::StructArrays.StructArray{TRecord<:LASDatasets.LasRecord, 1},
    vlrs::Vector{LasVariableLengthRecord}
) -> Any

Construct an array of bytes that correctly encodes the information stored in a set of LAS records according to the spec


Automatic Support for User Fields

In order for the system to automatically handle a user supplying their own custom fields in a point cloud table, we make some checks on field types and have processes in place that ensure each column has an ExtraBytes VLR associated to it.

Firstly, the LAS 1.4 spec directly supports the following data types: UInt8, Int8, UInt16, Int16, UInt32, Int32, UInt64, Int64, Float32 and Float64.

This means that every ExtraBytes VLR must have a data type among these values (note that vectors are not directly supported). LASDatasets.jl supports static vectors (static sizing is essential) as user fields as well by internally separating out vector components and adding an ExtraBytes VLR for each component following the naming convention in the spec. That is, for a user field with N entries, the individual component names that are documented in the VLRs are "col [0]", "col [1]", ..., "col [N - 1]".
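The naming convention is simple enough to sketch directly. The helper name component_names is hypothetical and is not part of the package's API.

```julia
# Names for the per-component ExtraBytes VLRs of a vector-valued user field.
# Components are numbered from 0, following the convention described above.
function component_names(col::AbstractString, n::Integer)
    return ["$(col) [$(i)]" for i in 0:(n - 1)]
end

normal_names = component_names("normal", 3)
```

Here a three-component field called "normal" is documented as "normal [0]", "normal [1]" and "normal [2]".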

When a user passes a custom field to the system, it first checks that the data type for this field is either one of the above types or an SVector of one of them. If it is a vector, it constructs the list of component field names as above. Then, it extracts all ExtraBytes VLRs and checks whether any of them have matching names. Those that match are updated so that their data type matches the new type supplied. For names with no existing match, a new ExtraBytes VLR is added per field name. Finally, the header is updated to reflect the new number of VLRs, the new data offsets and the new point record lengths.
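The check-then-update-or-add flow can be sketched as follows. This is an illustration under simplifying assumptions, not the package's code: a Dict from field name to data type stands in for the set of ExtraBytes VLRs, the component count n is passed explicitly instead of being read from an SVector type, and SUPPORTED_TYPES and register_field! are hypothetical names. The final header update is omitted.

```julia
const SUPPORTED_TYPES =
    (UInt8, Int8, UInt16, Int16, UInt32, Int32, UInt64, Int64, Float32, Float64)

# `extra_bytes` maps each documented field name to its data type and stands in
# for the collection of ExtraBytes VLRs.
function register_field!(
    extra_bytes::Dict{String, DataType},
    name::String,
    ::Type{T},
    n::Int = 1,
) where {T}
    # Step 1: reject element types the spec does not support
    T in SUPPORTED_TYPES || throw(ArgumentError("unsupported user field type $T"))
    # Step 2: scalar fields keep their name, vector fields get one name per component
    field_names = n == 1 ? [name] : ["$name [$i]" for i in 0:(n - 1)]
    # Step 3: update the entry if the name already exists, otherwise add it
    for field_name in field_names
        extra_bytes[field_name] = T
    end
    return extra_bytes
end

demo_extra_bytes = Dict{String, DataType}("thing" => Float64)
register_field!(demo_extra_bytes, "thing", Float32)       # existing name: type updated
register_field!(demo_extra_bytes, "normal", Float64, 3)   # new field: one entry per component
```

In the real system each new or changed entry also alters the VLR count, the data offset and the point record length, which is why the header is refreshed as the last step.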