The PDS4 Format

Introduction

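PDS4 is version 4 of NASA's Planetary Data System (PDS) archiving standard, designed for storing and describing data from planetary exploration missions. The data comes with two kinds of suffix: 2BL and 2B (or 2CL and 2C). The 2BL file is an XML file (XML is designed to structure, store, and transport information). It records information such as the instrument, the acquisition time, and the exposure parameters. Most importantly, it stores a file path, namely the 2B file with the same base name; the actual data (images, or radar echoes in the case covered here) lives in that 2B file.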

Main structure of the Lunar Penetrating Radar data

Simplified file structure

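<?xml version="1.0" encoding="UTF-8"?>  # Declares that the document is XML and uses UTF-8 encoding
<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1" xmlns:pds="http://pds.nasa.gov/pds4/pds/v1">   # PDS4-specific root element: the container that describes an observational data product; also declares the namespaces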

    <Identification_Area>
        # Identification_Area: the product's unique identifier, version, title, and so on.
    </Identification_Area>

    <Observation_Area>
    # Observation_Area: details of the observation, including time coordinates, purpose, processing level, and science parameters.
    </Observation_Area>

    <File_Area_Observational>
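    # The core part to understand. It describes how the observational data file is organized: file name, file attributes, record format, field attributes, and group structure. With this information a user knows how to access and interpret the data in the file.
        <File> # Name of the file that holds the actual data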
            <file_name>CE4_GRAS_LPR-1_SCI_N_20201119112501_20201120143000_0155_A.2B</file_name>
            <local_identifier>None</local_identifier>
            <creation_date_time>2020-12-23T02:36:00.000Z</creation_date_time>
            <file_size unit="byte">19565385</file_size>
            <records>2380</records>
        </File>
        <Table_Binary> # Binary data
            <offset unit="byte">0</offset>
            <records>2380</records>
            <Record_Binary> # Describes the structure of a single record
                <fields>28</fields>
                <groups>1</groups>
                <record_length unit="byte">32883</record_length>
                <Field_Binary> # Describes the attributes and structure of a single field stored in the binary file
                    <name>FRAME_IDENTIFICATION</name> # Field name, a descriptive identifier for the field's content
                    <field_number>1</field_number>
                    <field_location unit="byte">1</field_location>
                    <data_type>UnsignedByte</data_type>
                    <field_length unit="byte">4</field_length>
                    <field_format>None</field_format>
                    <unit>none</unit>
                    <description>Frame Identifier, 0x146F1111: Channel 1 data</description>
                </Field_Binary>
                <Group_Field_Binary>  # Group structure in a binary table: describes a set of fields that repeats in a fixed pattern
                    <name>ECHO_DATA</name>
                    <group_number>1</group_number>
                    <repetitions>8192</repetitions>
                    <fields>1</fields>
                    <groups>0</groups>
                    <group_location unit="byte">115</group_location>  # Byte at which the group starts within the record (counted from 1)
                    <group_length unit="byte">32768</group_length>  # 8192 * 4 bytes

                    <Field_Binary>
                        <name>ECHO_DATA</name>   # Radar echo data
                        <field_number>1</field_number>
                        <field_location unit="byte">1</field_location>
                        <data_type>IEEE754LSBSingle</data_type> # Data format: little-endian IEEE 754 single-precision float
                        <field_length unit="byte">4</field_length> # Size in bytes of each repeated value
                        <field_format>F.6</field_format>
                        <unit>none</unit>
                        <description>There are 8192 echoes, with a sample interval of 2.5ns.</description>
                    </Field_Binary>
                </Group_Field_Binary>
            </Record_Binary>
        </Table_Binary>
    </File_Area_Observational>
</Product_Observational>
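
As a quick check on the layout above, the byte arithmetic can be worked through with the numbers from the simplified label. This is only a sketch: the variable names are mine, and the values are copied by hand from the listing rather than read from a real label.

```python
# Values copied by hand from the simplified label above.
record_length = 32883   # <record_length>, bytes per record
group_location = 115    # <group_location>, 1-based byte position in the record
repetitions = 8192      # <repetitions>, number of echo samples per record
field_length = 4        # <field_length> of the ECHO_DATA field, in bytes

# The group length is the number of samples times the size of one sample.
group_length = repetitions * field_length

# Bytes that come before the echo samples in each record (the other fields).
bytes_before_group = group_location - 1

# Bytes left in the record after the echo samples.
bytes_after_group = record_length - bytes_before_group - group_length

print(group_length, bytes_before_group, bytes_after_group)
```

The computed group length agrees with the `group_length` of 32768 bytes given in the label, so the echo samples make up almost the whole record.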

Parsing the data

1. Use the xml.etree.ElementTree library to look up elements and fields in the XML file.

2. Get the File_Area_Observational element to obtain the name of the binary file that stores the data

file_area = root.find('pds:File_Area_Observational', namespaces)
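
Steps 1 and 2 can be tried without any data files. The sketch below inlines a cut-down label (my own reduction of the listing above, with everything but the file name removed) and looks up `file_name` through the `pds` namespace prefix. The constant `LABEL` is hypothetical and exists only for this example.

```python
import xml.etree.ElementTree as ET

# A cut-down label, inlined so that the example runs without any files.
LABEL = """<Product_Observational xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <File_Area_Observational>
    <File>
      <file_name>CE4_GRAS_LPR-1_SCI_N_20201119112501_20201120143000_0155_A.2B</file_name>
    </File>
  </File_Area_Observational>
</Product_Observational>"""

# Every element lives in the default PDS4 namespace, so each path step
# needs the prefix that this dictionary maps to the namespace URI.
namespaces = {'pds': 'http://pds.nasa.gov/pds4/pds/v1'}

root = ET.fromstring(LABEL)
file_area = root.find('pds:File_Area_Observational', namespaces)
file_name = file_area.find('pds:File/pds:file_name', namespaces).text

print(file_name)
```

Without the `pds:` prefix, `find` returns `None`, because the bare tag name does not match a namespaced element.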

3. Open the binary file and read all of its data

    with open(binary_file, 'rb') as bin_file:
        data = bin_file.read()

4. Work out the structure of Table_Binary and what each Field_Binary means

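5. Use find to walk down the structure level by level and look up the definitions you need: the byte at which a field starts, how many bytes it occupies, and its format. Store these values in variables.
For example: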

    # Get the field_length element
    field_length_element = field_binary.find('pds:field_length', namespaces)
    if field_length_element is not None:
        field_length = int(field_length_element.text)
        print(f"Field_length: {field_length}")
    else:
        print("field_length_element not found.")

This yields the key values.
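
The find-then-check pattern repeats for every attribute, so it could be wrapped in a small helper. The sketch below is one way to do that; `child_text` and `describe_field` are hypothetical names of my own, and the inlined `FIELD_XML` is a reduced copy of the first field in the listing above.

```python
import xml.etree.ElementTree as ET

NS = {'pds': 'http://pds.nasa.gov/pds4/pds/v1'}

# A single Field_Binary element, inlined so that the example is runnable.
FIELD_XML = """<Field_Binary xmlns="http://pds.nasa.gov/pds4/pds/v1">
  <name>FRAME_IDENTIFICATION</name>
  <field_location unit="byte">1</field_location>
  <data_type>UnsignedByte</data_type>
  <field_length unit="byte">4</field_length>
</Field_Binary>"""


def child_text(parent, tag, default=None):
    """Return the stripped text of child `tag`, or `default` if it is missing."""
    element = parent.find(f'pds:{tag}', NS)
    if element is None or element.text is None:
        return default
    return element.text.strip()


def describe_field(field_binary):
    """Collect the layout attributes of one Field_Binary element in a dict."""
    location = child_text(field_binary, 'field_location')
    length = child_text(field_binary, 'field_length')
    return {
        'name': child_text(field_binary, 'name'),
        'location': int(location) if location is not None else None,
        'length': int(length) if length is not None else None,
        'data_type': child_text(field_binary, 'data_type'),
    }


field = describe_field(ET.fromstring(FIELD_XML))
print(field)
```

Missing elements come back as `None` instead of raising, which mirrors the `is not None` checks in the original snippet.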

6. Use these variables to determine how many times to loop and what the offset is on each iteration. Parse and store the data, then plot it
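
To make the offset arithmetic in step 6 concrete, here is a minimal sketch that reads the echo group from every record. The function `read_echoes` is a hypothetical helper, and the input is synthetic data built in the example itself, not a real 2B file.

```python
import numpy as np


def read_echoes(data, records, record_length, group_location, repetitions, dtype):
    """Return a (records, repetitions) float32 array of echo samples.

    group_location is the 1-based byte position from the label, so the
    zero-based start of the group inside record i is
    i * record_length + (group_location - 1).
    """
    echoes = np.empty((records, repetitions), dtype=np.float32)
    for i in range(records):
        start = i * record_length + (group_location - 1)
        echoes[i] = np.frombuffer(data, dtype=dtype, count=repetitions, offset=start)
    return echoes


# Synthetic stand-in for a 2B file: 3 records, each with 2 header bytes
# followed by 4 little-endian float32 samples (18 bytes per record).
records, repetitions, header = 3, 4, 2
samples = np.arange(records * repetitions, dtype='<f4').reshape(records, repetitions)
data = b''.join(b'\xAA' * header + row.tobytes() for row in samples)

echoes = read_echoes(data, records, record_length=header + 4 * repetitions,
                     group_location=header + 1, repetitions=repetitions, dtype='<f4')
print(echoes.shape)
```

Each row of the result is one trace, so plotting a single row gives one echo and displaying the whole array gives the radar image.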

7. The results are as follows:
A single 2B data image from Chang'e 4:

One echo trace from Chang'e 5:

A single 2B data image from Chang'e 5:

Problems encountered and things to watch out for

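1. Chang'e 4 data is stored little-endian, while Chang'e 5 data is stored big-endian. Most desktop machines are little-endian natively, so do not rely on the default byte order; choose it from the data_type declared in the 2BL file (for example, IEEE754LSBSingle in the Chang'e 4 label above).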

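2. Positions in the 2BL file (field_location) are counted from 1 (1 means the field starts at the first byte of the record). Python reads the 2B binary file into a one-dimensional byte sequence indexed from 0, so the index to use is field_location - 1. At first I saw offset = 0 and used field_location directly as the index, and the resulting data looked wrong.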

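3. Little data is available. The Chang'e 4 data has to be stitched together from multiple 2B files using the X and Y position information, but many of the download links are dead.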
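
The first two notes can be illustrated together. The sketch below is not taken from the repository: `PDS4_TO_NUMPY` is a partial, hand-written mapping from PDS4 data_type names to NumPy dtype strings, and `read_field` is a hypothetical helper that applies the `field_location - 1` rule.

```python
import struct

import numpy as np

# Partial, hand-written mapping from PDS4 data_type names to NumPy dtypes.
# "LSB" means little-endian and "MSB" means big-endian.
PDS4_TO_NUMPY = {
    'UnsignedByte': 'u1',
    'IEEE754LSBSingle': '<f4',
    'IEEE754MSBSingle': '>f4',
    'IEEE754LSBDouble': '<f8',
    'IEEE754MSBDouble': '>f8',
}


def read_field(record, field_location, field_length, data_type):
    """Decode one field from a record, given its 1-based field_location."""
    start = field_location - 1  # convert the 1-based position to a 0-based index
    raw = record[start:start + field_length]
    return np.frombuffer(raw, dtype=PDS4_TO_NUMPY[data_type])


# Synthetic record: 2 padding bytes, then the value 1.5 stored big-endian.
record = b'\x00\x00' + struct.pack('>f', 1.5)

# The float starts at the third byte, so field_location is 3.
right = read_field(record, 3, 4, 'IEEE754MSBSingle')[0]
wrong_order = read_field(record, 3, 4, 'IEEE754LSBSingle')[0]

print(right, wrong_order)
```

Decoding with the wrong byte order does not raise an error; it silently produces a different number, which is why the data_type in the label should drive the choice.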

Parsing code

I have packaged my parsing code and test data in the repository below. After downloading it, run the Python file directly to see the result. Stars are welcome.

GitHub

zzkuner/CE-LPR

Parsing Chang'e 4 data:

import xml.etree.ElementTree as ET
import numpy as np
import matplotlib.pyplot as plt
# Path to the XML label
# xml_file = 'CE4_GRAS_LPR-2B_SCI_N_20201219044001_20201220084500_0163_A.2BL'
xml_file = 'CE4_GRAS_LPR-2A_SCI_N_20201119112501_20201120143000_0155_A.2BL'

# Parse the XML file
tree = ET.parse(xml_file)
root = tree.getroot()

# Define the namespaces
namespaces = {
    'pds': 'http://pds.nasa.gov/pds4/pds/v1',
    'sp': 'http://pds.nasa.gov/pds4/sp/v1'
}

# Read the Identification_Area
identification_area = root.find('pds:Identification_Area', namespaces)
logical_identifier = identification_area.find('pds:logical_identifier', namespaces).text
version_id = identification_area.find('pds:version_id', namespaces).text
title = identification_area.find('pds:title', namespaces).text
print(f"Logical Identifier: {logical_identifier}")
print(f"Version ID: {version_id}")
print(f"Title: {title}")

# Read the Observation_Area
observation_area = root.find('pds:Observation_Area', namespaces)
start_date_time = observation_area.find('pds:Time_Coordinates/pds:start_date_time', namespaces).text
stop_date_time = observation_area.find('pds:Time_Coordinates/pds:stop_date_time', namespaces).text
print(f"Start Date Time: {start_date_time}")
print(f"Stop Date Time: {stop_date_time}")

# Read the Mission_Area
mission_area = observation_area.find('pds:Mission_Area', namespaces)
product_id = mission_area.find('pds:product_id', namespaces).text
product_version = mission_area.find('pds:product_version', namespaces).text
product_level = mission_area.find('pds:product_level', namespaces).text
instrument_name = mission_area.find('pds:instrument_name', namespaces).text
print(f"Product ID: {product_id}")
print(f"Product Version: {product_version}")
print(f"Product Level: {product_level}")
print(f"Instrument Name: {instrument_name}")

# Read the Lander_Location
lander_location = mission_area.find('pds:Lander_Location', namespaces)
longitude = lander_location.find('pds:longitude', namespaces).text
latitude = lander_location.find('pds:latitude', namespaces).text
print(f"Lander Longitude: {longitude}")
print(f"Lander Latitude: {latitude}")

# Read the File_Area_Observational
file_area = root.find('pds:File_Area_Observational', namespaces)
# Read the file information
file = file_area.find('pds:File', namespaces)
file_name = file.find('pds:file_name', namespaces).text
creation_date_time = file.find('pds:creation_date_time', namespaces).text
file_size = file.find('pds:file_size', namespaces).text
print(f"File Name: {file_name}")
print(f"Creation Date Time: {creation_date_time}")
print(f"File Size: {file_size}")

binary_file = file_name
# Open the binary file and read all of its data
with open(binary_file, 'rb') as bin_file:
    data = bin_file.read()
# Parse the description of the binary records
table_binary = file_area.find('pds:Table_Binary', namespaces)
if table_binary is not None:
    record_binary = table_binary.find('pds:Record_Binary', namespaces)
    if record_binary is not None:
        # Iterate over every Field_Binary element directly under Record_Binary
        for field_binary in record_binary.findall('pds:Field_Binary', namespaces):
            # Get the name element
            name_element = field_binary.find('pds:name', namespaces)
            if name_element is not None:
                field_name = name_element.text
                print(f"Field Name: {field_name}")
            else:
                print("Name element not found.")

            # Get the field_location element
            field_location_element = field_binary.find('pds:field_location', namespaces)
            if field_location_element is not None:
                field_location = int(field_location_element.text)
                print(f"field_location: {field_location}")
            else:
                print("field_location_element not found.")

            # Get the data_type element
            data_type_element = field_binary.find('pds:data_type', namespaces)
            if data_type_element is not None:
                data_type = data_type_element.text
                print(f"data_type: {data_type}")
            else:
                print("data_type_element not found.")

            # Get the field_length element
            field_length_element = field_binary.find('pds:field_length', namespaces)
            if field_length_element is not None:
                field_length = int(field_length_element.text)
                print(f"Field_length: {field_length}")
            else:
                print("field_length_element not found.")

        group_field = record_binary.find('pds:Group_Field_Binary', namespaces)
        if group_field is not None:
            group_name = group_field.find('pds:name', namespaces).text
            repetitions = int(group_field.find('pds:repetitions', namespaces).text)
            fields = int(group_field.find('pds:fields', namespaces).text)
            groups = int(group_field.find('pds:groups', namespaces).text)
            group_location = int(group_field.find('pds:group_location', namespaces).text)
            group_length = int(group_field.find('pds:group_length', namespaces).text)
            group_field_binary =group_field.find('pds:Field_Binary', namespaces)
            group_field_length =int(group_field_binary.find('pds:field_length', namespaces).text)
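            # Array for the parsed data
            science_data = np.zeros(repetitions, dtype=np.float32)
            # Loop over the number of repetitions given by the group.
            # No per-record offset is added, so this reads the first record only.
            for i in range(repetitions):
                sample_start = group_location - 1 + i * group_field_length  # group_location is 1-based; each sample is 4 bytes
                sample = np.frombuffer(data[sample_start:sample_start + group_field_length], dtype='<f4')  # little-endian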
                science_data[i] = sample[0]
            plt.plot(science_data[:])
            plt.show()
        else:
            print("Group_Field_Binary element not found.")
    else:
        print("Record_Binary element not found.")
else:
    print("Table_Binary element not found.")

Parsing Chang'e 5 data:

import xml.etree.ElementTree as ET
import numpy as np
import matplotlib.pyplot as plt

def parse_pds4_file(file_path):
    # Load and parse the XML file
    tree = ET.parse(file_path)
    root = tree.getroot()

    # Set up the namespaces
    ns = {
        'pds': 'http://pds.nasa.gov/pds4/pds/v1',
        'sp': 'http://pds.nasa.gov/pds4/sp/v1'
    }

    # Read the File Area
    file_area = root.find('pds:File_Area_Observational', ns)
    file = file_area.find('pds:File', ns)
    file_name = file.find('pds:file_name', ns).text

    # Read the binary data
    with open(file_name, 'rb') as f:
        data = f.read()

    # Parse the science data
    # Hard-coded values, based on group_location and group_length in the XML
    offset = 311-1
    num_records = 132
    record_length = 12311
    num_samples = 3000
    sample_size = 4  # each sample takes 4 bytes

    # Allocate the output array
    science_data = np.zeros((num_records, num_samples), dtype=np.float32)

    # Read the science data
    for i in range(num_records):
        record_start = offset + i * record_length
        for j in range(num_samples):
            sample_start = record_start + j * sample_size
            sample = np.frombuffer(data[sample_start:sample_start + sample_size], dtype='>f4')
            # sample=sample.astype('>f4')
            science_data[i, j] = sample[0]

    # Plot the science data
    plt.figure(figsize=(10, 6))
    # plt.imshow(science_data.T, cmap='viridis', aspect='auto')
    # plt.colorbar(label='Data Value')
    # plt.title('Science Data Image')
    # plt.xlabel('X Axis')
    # plt.ylabel('Y Axis')
    # plt.savefig('science_data_image.png')
    plt.plot(science_data[1,:])
    plt.show()

if __name__ == "__main__":
    file_path = 'CE5-L_GRAS_LRPR-A_SCI_N_20201201171559_20201201173326_0001_A.2BL'  # replace with the path to your own file
    parse_pds4_file(file_path)
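
The nested sample-by-sample loop can also be replaced by a single read through a NumPy structured dtype that places the echo group at its byte offset inside each record. This is an alternative sketch, not the code in the repository: `read_group` is a hypothetical helper, and the demonstration uses synthetic big-endian data.

```python
import numpy as np


def read_group(data, records, record_length, group_location, repetitions, sample_dtype):
    """Read one repeated group from every record in a single frombuffer call."""
    record_dtype = np.dtype({
        'names': ['echo'],
        'formats': [(sample_dtype, (repetitions,))],  # sub-array of samples
        'offsets': [group_location - 1],              # 1-based to 0-based
        'itemsize': record_length,                    # full size of one record
    })
    table = np.frombuffer(data, dtype=record_dtype, count=records)
    # Copy into a native-order float32 array of shape (records, repetitions).
    return table['echo'].astype(np.float32)


# Synthetic stand-in for a 2B file: 3 records of 3 header bytes,
# 2 big-endian float32 samples, and 1 trailing byte (12 bytes per record).
samples = np.arange(6, dtype='>f4').reshape(3, 2)
data = b''.join(b'\x00' * 3 + row.tobytes() + b'\xff' for row in samples)

science_data = read_group(data, records=3, record_length=12,
                          group_location=4, repetitions=2, sample_dtype='>f4')
print(science_data.shape)
```

With the values hard-coded in the script above, the equivalent call would be `read_group(data, 132, 12311, 311, 3000, '>f4')`.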
