From 7cd01f1defe2413504188802bdf2c72a2523361a Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Sat, 17 Jan 2026 13:50:38 -0500 Subject: [PATCH 01/16] deprecated: loadData to load_data, and move it --- docs/source/examples/parsers_example.rst | 24 +-- docs/source/examples/resample_example.rst | 6 +- docs/source/utilities/parsers_utility.rst | 4 +- news/depr-tests.rst | 23 +++ src/diffpy/utils/_deprecator.py | 12 +- src/diffpy/utils/parsers/loaddata.py | 15 +- src/diffpy/utils/tools.py | 180 ++++++++++++++++++++++ tests/test_loaddata.py | 75 +++++++++ tests/test_serialization.py | 20 +-- 9 files changed, 327 insertions(+), 32 deletions(-) create mode 100644 news/depr-tests.rst diff --git a/docs/source/examples/parsers_example.rst b/docs/source/examples/parsers_example.rst index db97e0ed..19e607eb 100644 --- a/docs/source/examples/parsers_example.rst +++ b/docs/source/examples/parsers_example.rst @@ -13,13 +13,13 @@ Using the parsers module, we can load file data into simple and easy-to-work-wit Our goal will be to extract the data, and the parameters listed in the header, from this file and load it into our program. -2) To get the data table, we will use the ``loadData`` function. The default behavior of this +2) To get the data table, we will use the ``load_data`` function. The default behavior of this function is to find and extract a data table from a file. .. code-block:: python - from diffpy.utils.parsers.loaddata import loadData - data_table = loadData('') + from diffpy.utils.tools import load_data + data_table = load_data('') While this will work with most datasets, on our ``data.txt`` file, we got a ``ValueError``. The reason for this is due to the comments ``$ Phase Transition Near This Temperature Range`` and ``--> Note Significant Jump in Rw <--`` @@ -27,9 +27,9 @@ embedded within the dataset. To fix this, try using the ``comments`` parameter. .. 
code-block:: python - data_table = loadData('', comments=['$', '-->']) + data_table = load_data('', comments=['$', '-->']) -This parameter tells ``loadData`` that any lines beginning with ``$`` and ``-->`` are just comments and +This parameter tells ``load_data`` that any lines beginning with ``$`` and ``-->`` are just comments and more entries in our data table may follow. Here are a few other parameters to test out: @@ -39,7 +39,7 @@ Here are a few other parameters to test out: .. code-block:: python - loadData('', comments=['$', '-->'], delimiter=',') + load_data('', comments=['$', '-->'], delimiter=',') returns an empty list. * ``minrows=50``: Only look for data tables with at least 50 rows. Since our data table has much less than that many @@ -47,7 +47,7 @@ returns an empty list. .. code-block:: python - loadData('', comments=['$', '-->'], minrows=50) + load_data('', comments=['$', '-->'], minrows=50) returns an empty list. * ``usecols=[0, 3]``: Only return the 0th and 3rd columns (zero-indexed) of the data table. For ``data.txt``, this @@ -55,14 +55,14 @@ returns an empty list. .. code-block:: python - loadData('', comments=['$', '-->'], usecols=[0, 3]) + load_data('', comments=['$', '-->'], usecols=[0, 3]) -3) Next, to get the header information, we can again use ``loadData``, +3) Next, to get the header information, we can again use ``load_data``, but this time with the ``headers`` parameter enabled. .. code-block:: python - hdata = loadData('', comments=['$', '-->'], headers=True) + hdata = load_data('', comments=['$', '-->'], headers=True) 4) Rather than working with separate ``data_table`` and ``hdata`` objects, it may be easier to combine them into a single dictionary. We can do so using the ``serialize_data`` function. @@ -116,8 +116,8 @@ The returned value, ``parsed_file_data``, is the dictionary we just added to ``s .. 
code-block:: python - data_table = loadData('') - hdata = loadData('', headers=True) + data_table = load_data('') + hdata = load_data('', headers=True) serialize_data('', hdata, data_table, serial_file='') The serial file ``serialfile.json`` should now contain two entries: ``data.txt`` and ``moredata.txt``. diff --git a/docs/source/examples/resample_example.rst b/docs/source/examples/resample_example.rst index ba28390b..5af42e73 100644 --- a/docs/source/examples/resample_example.rst +++ b/docs/source/examples/resample_example.rst @@ -16,9 +16,9 @@ given enough datapoints. .. code-block:: python - from diffpy.utils.parsers.loaddata import loadData - nickel_datatable = loadData('') - nitarget_datatable = loadData('') + from diffpy.utils.tools import load_data + nickel_datatable = load_data('') + nitarget_datatable = load_data('') Each data table has two columns: first is the grid and second is the function value. To extract the columns, we can utilize the serialize function ... diff --git a/docs/source/utilities/parsers_utility.rst b/docs/source/utilities/parsers_utility.rst index ffaf768e..954405b5 100644 --- a/docs/source/utilities/parsers_utility.rst +++ b/docs/source/utilities/parsers_utility.rst @@ -5,7 +5,7 @@ Parsers Utility The ``diffpy.utils.parsers`` module allows users to easily and robustly load file data into a Python project. -- ``loaddata.loadData()``: Find and load a data table/block from a text file. This seems to work for most datafiles +- ``loaddata.load_data()``: Find and load a data table/block from a text file. This seems to work for most datafiles including those generated by diffpy programs. Running only ``numpy.loadtxt`` will result in errors for most these files as there is often excess data or parameters stored above the data block. 
Users can instead choose to load all the parameters of the form `` = `` into a dictionary @@ -17,7 +17,7 @@ The ``diffpy.utils.parsers`` module allows users to easily and robustly load fil - ``serialization.deserialize_data()``: Load data from a serial file format into a Python dictionary. Currently, the only supported serial format is ``.json``. -- ``serialization.serialize_data()``: Serialize the data generated by ``loadData()`` into a serial file format. Currently, the only +- ``serialization.serialize_data()``: Serialize the data generated by ``load_data()`` into a serial file format. Currently, the only supported serial format is ``.json``. For a more in-depth tutorial for how to use these parser utilities, click :ref:`here `. diff --git a/news/depr-tests.rst b/news/depr-tests.rst new file mode 100644 index 00000000..74f7c173 --- /dev/null +++ b/news/depr-tests.rst @@ -0,0 +1,23 @@ +**Added:** + +* + +**Changed:** + +* load_data now takes a Path or a string for the file-path + +**Deprecated:** + +* diffpy.utils.parsers.loaddata.loadData replaced by diffpy.utils.tools.load_data + +**Removed:** + +* + +**Fixed:** + +* + +**Security:** + +* diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 72172cae..c504604a 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -20,7 +20,7 @@ def deprecated(message, *, category=DeprecationWarning, stacklevel=1): .. code-block:: python - from diffpy._deprecations import deprecated + from diffpy.utils._deprecator import deprecated import warnings @deprecated("old_function is deprecated; use new_function instead") @@ -39,7 +39,6 @@ def new_function(x, y): .. 
code-block:: python from diffpy._deprecations import deprecated - import warnings warnings.simplefilter("always", DeprecationWarning) @@ -83,7 +82,9 @@ def wrapper(*args, **kwargs): return decorator -def deprecation_message(base, old_name, new_name, removal_version): +def deprecation_message( + base, old_name, new_name, removal_version, new_base=None +): """Generate a deprecation message. Parameters @@ -102,7 +103,10 @@ def deprecation_message(base, old_name, new_name, removal_version): str A formatted deprecation message. """ + if new_base is None: + new_base = base return ( f"'{base}.{old_name}' is deprecated and will be removed in " - f"version {removal_version}. Please use '{base}.{new_name}' instead." + f"version {removal_version}. Please use '{new_base}.{new_name}' " + f"instead." ) diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py index 05d37497..7de4204e 100644 --- a/src/diffpy/utils/parsers/loaddata.py +++ b/src/diffpy/utils/parsers/loaddata.py @@ -18,8 +18,21 @@ import numpy from diffpy.utils import validators +from diffpy.utils._deprecator import deprecated, deprecation_message +base = "diffpy.utils.parsers.loaddata" +removal_version = "4.0.0" +loaddata_deprecation_msg = deprecation_message( + base, + "loadData", + "load_data", + removal_version, + new_base="diffpy.utils.tools", +) + + +@deprecated(loaddata_deprecation_msg) def loadData( filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs ): @@ -254,7 +267,7 @@ def readfp(self, fp, append=False): File details include: * File name. - * All data blocks findable by loadData. + * All data blocks findable by load_data. * Headers (if present) for each data block. (Generally the headers contain column name information). 
""" diff --git a/src/diffpy/utils/tools.py b/src/diffpy/utils/tools.py index 19f8e03b..4d4e19fb 100644 --- a/src/diffpy/utils/tools.py +++ b/src/diffpy/utils/tools.py @@ -8,6 +8,7 @@ from scipy.signal import convolve from xraydb import material_mu +from diffpy.utils import validators from diffpy.utils.parsers.loaddata import loadData @@ -396,3 +397,182 @@ def compute_mud(filepath): key=lambda pair: pair[1], ) return best_mud + + +def load_data( + filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs +): + """Find and load data from a text file. + + The data block is identified as the first matrix block of at least + minrows rows and constant number of columns. This seems to work for most + of the datafiles including those generated by diffpy programs. + + Parameters + ---------- + filename: Path or string + Name of the file we want to load data from. + minrows: int + Minimum number of rows in the first data block. All rows must have + the same number of floating point values. + headers: bool + when False (default), the function returns a numpy array of the data + in the data block. When True, the function instead returns a + dictionary of parameters and their corresponding values parsed from + header (information prior the data block). See hdel and hignore for + options to help with parsing header information. + hdel: str + (Only used when headers enabled.) Delimiter for parsing header + information (default '='). e.g. using default hdel, the line ' + parameter = p_value' is put into the dictionary as + {parameter: p_value}. + hignore: list + (Only used when headers enabled.) Ignore header rows beginning with + any elements in hignore. e.g. hignore=['# ', '['] causes the + following lines to be skipped: '# qmax=10', '[defaults]'. + kwargs: + Keyword arguments that are passed to numpy.loadtxt including the + following arguments below. (See numpy.loadtxt for more details.) Only + pass kwargs used by numpy.loadtxt. 
+ + Useful kwargs + ============= + comments: str, sequence of str + The characters or list of characters used to indicate the start of a + comment (default '#'). Comment lines are ignored. + delimiter: str + Delimiter for the data in the block (default use whitespace). For + comma-separated data blocks, set delimiter to ','. + unpack: bool + Return data as a sequence of columns that allows tuple unpacking such + as x, y = load_data(FILENAME, unpack=True). Note transposing the + loaded array as load_data(FILENAME).T has the same effect. + usecols: + Zero-based index of columns to be loaded, by default use all detected + columns. The reading skips data blocks that do not have the usecols- + specified columns. + + Returns + ------- + data_block: ndarray + A numpy array containing the found data block. (This is not returned + if headers is enabled.) + hdata: dict + If headers are enabled, return a dictionary of parameters read from + the header. + """ + from numpy import array, loadtxt + + # for storing header data + hdata = {} + # determine the arguments + delimiter = kwargs.get("delimiter") + usecols = kwargs.get("usecols") + # required at least one column of floating point values + mincv = (1, 1) + # but if usecols is specified, require sufficient number of columns + # where the used columns contain floats + if usecols is not None: + hiidx = max(-min(usecols), max(usecols) + 1) + mincv = (hiidx, len(set(usecols))) + + # Check if a line consists of floats only and return their count + # Return zero if some strings cannot be converted. 
+ def countcolumnsvalues(line): + try: + words = line.split(delimiter) + # remove trailing blank columns + while words and not words[-1].strip(): + words.pop(-1) + nc = len(words) + if usecols is not None: + nv = len([float(words[i]) for i in usecols]) + else: + nv = len([float(w) for w in words]) + except (IndexError, ValueError): + nc = nv = 0 + return nc, nv + + # Check if file exists before trying to open + filename = Path(filename) + if not filename.is_file(): + raise IOError( + ( + f"File {str(filename)} cannot be found. " + "Please rerun the program specifying a valid filename." + ) + ) + + # make sure fid gets cleaned up + with open(filename, "rb") as fid: + # search for the start of datablock + start = ncvblock = None + fpos = (0, 0) + nrows = 0 + for line in fid: + # decode line + dline = line.decode() + # find header information if requested + if headers: + hpair = dline.split(hdel) + flag = True + # ensure number of non-blank arguments is two + if len(hpair) != 2: + flag = False + else: + # ignore if an argument is blank + hpair[0] = hpair[0].strip() # name of data entry + hpair[1] = hpair[1].strip() # value of entry + if not hpair[0] or not hpair[1]: + flag = False + else: + # check if row has an ignore tag + if hignore is not None: + for tag in hignore: + taglen = len(tag) + if ( + len(hpair[0]) >= taglen + and hpair[0][:taglen] == tag + ): + flag = False + # add header data + if flag: + name = hpair[0] + value = hpair[1] + # check if data value should be stored as float + if validators.is_number(hpair[1]): + value = float(hpair[1]) + hdata.update({name: value}) + # continue search for the start of datablock + fpos = (fpos[1], fpos[1] + len(line)) + line = dline + ncv = countcolumnsvalues(line) + if ncv < mincv: + start = None + continue + # ncv is acceptable here, require the same number of columns + # throughout the datablock + if start is None or ncv != ncvblock: + ncvblock = ncv + nrows = 0 + start = fpos[0] + nrows += 1 + # block was found here! 
+ if nrows >= minrows: + break + + # Return header data if requested + if headers: + return hdata # Return, so do not proceed to reading datablock + + # Return an empty array when no data found. + # loadtxt would otherwise raise an exception on loading from EOF. + if start is None: + data_block = array([], dtype=float) + else: + fid.seek(start) + # always use usecols argument so that loadtxt does not crash + # in case of trailing delimiters. + kwargs.setdefault("usecols", list(range(ncvblock[0]))) + data_block = loadtxt(fid, **kwargs) + return data_block diff --git a/tests/test_loaddata.py b/tests/test_loaddata.py index 82d947ee..95ac009f 100644 --- a/tests/test_loaddata.py +++ b/tests/test_loaddata.py @@ -6,6 +6,7 @@ import pytest from diffpy.utils.parsers.loaddata import loadData +from diffpy.utils.tools import load_data def test_loadData_default(datafile): @@ -80,3 +81,77 @@ def test_loadData_headers(datafile): loaddatawithheaders, headers=True, hdel=delimiter, hignore=hignore ) assert hdata == expected + + +def test_load_data_default(datafile): + """Check load_data() with default options.""" + loaddata01 = datafile("loaddata01.txt") + d2c = np.array([[3, 31], [4, 32], [5, 33]]) + + with pytest.raises(IOError) as err: + load_data("doesnotexist.txt") + assert str(err.value) == ( + "File doesnotexist.txt cannot be found. " + "Please rerun the program specifying a valid filename." 
+ ) + + # The default minrows=10 makes it read from the third line + d = load_data(loaddata01) + assert np.array_equal(d2c, d) + + # The usecols=(0, 1) would make it read from the third line + d = load_data(loaddata01, minrows=1, usecols=(0, 1)) + assert np.array_equal(d2c, d) + + # Check the effect of usecols + d = load_data(loaddata01, usecols=(0,)) + assert np.array_equal(d2c[:, 0], d) + + d = load_data(loaddata01, usecols=(1,)) + assert np.array_equal(d2c[:, 1], d) + + +def test_load_data_1column(datafile): + """Check loading of one-column data.""" + loaddata01 = datafile("loaddata01.txt") + d1c = np.arange(1, 6) + + # Assertions using pytest's assert + d = load_data(loaddata01, usecols=[0], minrows=1) + assert np.array_equal(d1c, d) + + d = load_data(loaddata01, usecols=[0], minrows=2) + assert np.array_equal(d1c, d) + + d = load_data(loaddata01, usecols=[0], minrows=3) + assert not np.array_equal(d1c, d) + + +def test_load_data_headers(datafile): + """Check load_data() with headers options enabled.""" + expected = { + "wavelength": 0.1, + "dataformat": "Qnm", + "inputfile": "darkSub_rh20_C_01.chi", + "mode": "xray", + "bgscale": 1.2998929285, + "composition": "0.800.20", + "outputtype": "gr", + "qmaxinst": 25.0, + "qmin": 0.1, + "qmax": 25.0, + "rmax": "100.0r", + "rmin": "0.0r", + "rstep": "0.01r", + "rpoly": "0.9r", + } + + loaddatawithheaders = datafile("loaddatawithheaders.txt") + hignore = ["# ", "// ", "["] # ignore lines beginning with these strings + delimiter = ": " # what our data should be separated by + + # Load data with headers + hdata = load_data( + loaddatawithheaders, headers=True, hdel=delimiter, hignore=hignore + ) + assert hdata == expected diff --git a/tests/test_serialization.py b/tests/test_serialization.py index 049d325c..33adb4ee 100644 --- a/tests/test_serialization.py +++ b/tests/test_serialization.py @@ -7,7 +7,7 @@ ImproperSizeError, UnsupportedTypeError, ) -from diffpy.utils.parsers.loaddata import loadData +from
diffpy.utils.parsers.loaddata import load_data from diffpy.utils.parsers.serialization import deserialize_data, serialize_data @@ -21,9 +21,9 @@ def test_load_multiple(tmp_path, datafile): generated_data = None for headerfile in tlm_list: - # gather data using loadData - hdata = loadData(headerfile, headers=True) - data_table = loadData(headerfile) + # gather data using load_data + hdata = load_data(headerfile, headers=True) + data_table = load_data(headerfile) # check path extraction generated_data = serialize_data( @@ -60,8 +60,8 @@ def test_exceptions(datafile): loadfile = datafile("loadfile.txt") warningfile = datafile("generatewarnings.txt") nodt = datafile("loaddatawithheaders.txt") - hdata = loadData(loadfile, headers=True) - data_table = loadData(loadfile) + hdata = load_data(loadfile, headers=True) + data_table = load_data(loadfile) # improper file types with pytest.raises(UnsupportedTypeError): @@ -123,15 +123,15 @@ def test_exceptions(datafile): assert numpy.allclose(r_extract[data_name]["r"], r_list) assert numpy.allclose(gr_extract[data_name]["gr"], gr_list) # no datatable - nodt_hdata = loadData(nodt, headers=True) - nodt_dt = loadData(nodt) + nodt_hdata = load_data(nodt, headers=True) + nodt_dt = load_data(nodt) no_dt = serialize_data(nodt, nodt_hdata, nodt_dt, show_path=False) nodt_data_name = list(no_dt.keys())[0] assert numpy.allclose(no_dt[nodt_data_name]["data table"], nodt_dt) # ensure user is warned when columns are overwritten - hdata = loadData(warningfile, headers=True) - data_table = loadData(warningfile) + hdata = load_data(warningfile, headers=True) + data_table = load_data(warningfile) with pytest.warns(RuntimeWarning) as record: serialize_data( warningfile, From 0b46822b2c7b4fb721b9976c8c121cb11d1e3560 Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Sat, 17 Jan 2026 14:06:34 -0500 Subject: [PATCH 02/16] fix: bad imports so tests pass --- src/diffpy/utils/parsers/serialization.py | 2 +- src/diffpy/utils/tools.py | 3 +-- 
tests/test_serialization.py | 2 +- 3 files changed, 3 insertions(+), 4 deletions(-) diff --git a/src/diffpy/utils/parsers/serialization.py b/src/diffpy/utils/parsers/serialization.py index b8ed0c60..5acbbe41 100644 --- a/src/diffpy/utils/parsers/serialization.py +++ b/src/diffpy/utils/parsers/serialization.py @@ -37,7 +37,7 @@ def serialize_data( into a serial language file. Dictionary is formatted as {filename: data}. - Requires hdata and data_table (can be generated by loadData). + Requires hdata and data_table (can be generated by load_data). Parameters ---------- diff --git a/src/diffpy/utils/tools.py b/src/diffpy/utils/tools.py index 4d4e19fb..94611a96 100644 --- a/src/diffpy/utils/tools.py +++ b/src/diffpy/utils/tools.py @@ -9,7 +9,6 @@ from xraydb import material_mu from diffpy.utils import validators -from diffpy.utils.parsers.loaddata import loadData def _stringify(string_value): @@ -391,7 +390,7 @@ def compute_mud(filepath): mu*D : float The best-fit mu*D value. """ - z_data, I_data = loadData(filepath, unpack=True) + z_data, I_data = load_data(filepath, unpack=True) best_mud, _ = min( (_compute_single_mud(z_data, I_data) for _ in range(20)), key=lambda pair: pair[1], diff --git a/tests/test_serialization.py b/tests/test_serialization.py index 33adb4ee..e5bc8e1d 100644 --- a/tests/test_serialization.py +++ b/tests/test_serialization.py @@ -7,8 +7,8 @@ ImproperSizeError, UnsupportedTypeError, ) -from diffpy.utils.parsers.loaddata import load_data from diffpy.utils.parsers.serialization import deserialize_data, serialize_data +from diffpy.utils.tools import load_data def test_load_multiple(tmp_path, datafile): From 4a1ee184a07be3642d37b7655e7016114435fec2 Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Sun, 18 Jan 2026 08:25:25 -0500 Subject: [PATCH 03/16] change: change structure so load_data is back in parsers --- src/diffpy/utils/parsers/__init__.py | 10 +- src/diffpy/utils/parsers/loaddata.py | 351 ++++++++++++++------------- 
src/diffpy/utils/tools.py | 181 +------------- tests/test_loaddata.py | 2 +- tests/test_serialization.py | 2 +- 5 files changed, 191 insertions(+), 355 deletions(-) diff --git a/src/diffpy/utils/parsers/__init__.py b/src/diffpy/utils/parsers/__init__.py index a0278e27..956f2e6d 100644 --- a/src/diffpy/utils/parsers/__init__.py +++ b/src/diffpy/utils/parsers/__init__.py @@ -6,10 +6,18 @@ # (c) 2010 The Trustees of Columbia University # in the City of New York. All rights reserved. # -# File coded by: Chris Farrow +# File coded by: Simon Billinge # # See AUTHORS.txt for a list of people who contributed. # See LICENSE_DANSE.txt for license information. # ############################################################################## """Various utilities related to data parsing and manipulation.""" + +# this allows load_data to be imported from diffpy.utils.parsers +# it is needed during deprecation of the old loadData structure +# when we remove loadData we can move all the parser functionality +# to a parsers.py module (like tools.py) and remove this if we want +from .loaddata import load_data + +__all__ = ["load_data"] diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py index 7de4204e..e48090e2 100644 --- a/src/diffpy/utils/parsers/loaddata.py +++ b/src/diffpy/utils/parsers/loaddata.py @@ -13,7 +13,7 @@ # ############################################################################## -import os +from pathlib import Path import numpy @@ -35,6 +35,178 @@ @deprecated(loaddata_deprecation_msg) def loadData( filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs +): + return load_data(filename, minrows, headers, hdel, hignore, **kwargs) + + +class TextDataLoader(object): + """Smart loading of a text data with possibly multiple datasets. + + Parameters + ---------- + minrows: int + Minimum number of rows in the first data block. (Default 10.) + usecols: tuple + Which columns in our dataset to use. Ignores all other columns.
If + None (default), use all columns. + skiprows + Rows in dataset to skip. (Currently not functional.) + """ + + def __init__(self, minrows=10, usecols=None, skiprows=None): + if minrows is not None: + self.minrows = minrows + if usecols is not None: + self.usecols = tuple(usecols) + # FIXME: implement usage in _findDataBlocks + if skiprows is not None: + self.skiprows = skiprows + # data items + self._reset() + return + + def _reset(self): + self.filename = "" + self.headers = [] + self.datasets = [] + self._resetvars() + return + + def _resetvars(self): + self._filename = "" + self._lines = None + self._splitlines = None + self._words = None + self._linerecs = None + self._wordrecs = None + return + + def read(self, filename): + """Open a file and run readfp. + + Use if file is not already open for read byte. + """ + with open(filename, "rb") as fp: + self.readfp(fp) + return + + def readfp(self, fp, append=False): + """Get file details. + + File details include: + * File name. + * All data blocks findable by load_data. + * Headers (if present) for each data block. (Generally the headers + contain column name information). 
+ """ + self._reset() + # try to read lines from fp first + self._lines = fp.readlines() + # and if good, assign filename + self.filename = getattr(fp, "name", "") + self._words = "".join(self._lines).split() + self._splitlines = [line.split() for line in self._lines] + self._findDataBlocks() + return + + def _findDataBlocks(self): + mincols = 1 + if self.usecols is not None and len(self.usecols): + mincols = max(mincols, max(self.usecols) + 1) + mincols = max(mincols, abs(min(self.usecols))) + nlines = len(self._lines) + nwords = len(self._words) + # idx - line index, nw0, nw1 - index of the first and last word, + # nf - number of words, ok - has data + self._linerecs = numpy.recarray( + (nlines,), + dtype=[ + ("idx", int), + ("nw0", int), + ("nw1", int), + ("nf", int), + ("ok", bool), + ], + ) + lr = self._linerecs + lr.idx = numpy.arange(nlines) + lr.nf = [len(sl) for sl in self._splitlines] + lr.nw1 = lr.nf.cumsum() + lr.nw0 = lr.nw1 - lr.nf + lr.ok = True + # word records + lw = self._wordrecs = numpy.recarray( + (nwords,), + dtype=[ + ("idx", int), + ("line", int), + ("col", int), + ("ok", bool), + ("value", float), + ], + ) + lw.idx = numpy.arange(nwords) + n1 = numpy.zeros(nwords, dtype=bool) + n1[lr.nw1[:-1]] = True + lw.line = n1.cumsum() + lw.col = lw.idx - lr.nw0[lw.line] + lw.ok = True + values = nwords * [0.0] + for i, w in enumerate(self._words): + try: + values[i] = float(w) + except ValueError: + lw.ok[i] = False + # prune lines that have a non-float values: + lw.values = values + if self.usecols is None: + badlines = lw.line[~lw.ok] + lr.ok[badlines] = False + else: + for col in self.usecols: + badlines = lw.line[(lw.col == col) & ~lw.ok] + lr.ok[badlines] = False + lr1 = lr[lr.nf >= mincols] + okb = numpy.r_[lr1.ok[:1], lr1.ok[1:] & ~lr1.ok[:-1], False] + oke = numpy.r_[False, ~lr1.ok[1:] & lr1.ok[:-1], lr1.ok[-1:]] + blockb = numpy.r_[True, lr1.nf[1:] != lr1.nf[:-1], False] + blocke = numpy.r_[False, blockb[1:-1], True] + beg = numpy.nonzero(okb 
| blockb)[0] + end = numpy.nonzero(oke | blocke)[0] + rowcounts = end - beg + assert not numpy.any(rowcounts < 0) + goodrows = rowcounts >= self.minrows + begend = numpy.transpose([beg, end - 1])[goodrows] + hbeg = 0 + for dbeg, dend in begend: + bb1 = lr1[dbeg] + ee1 = lr1[dend] + hend = bb1.idx + header = "".join(self._lines[hbeg:hend]) + hbeg = ee1.idx + 1 + if self.usecols is None: + data = numpy.reshape(lw.value[bb1.nw0 : ee1.nw1], (-1, bb1.nf)) + else: + tdata = numpy.empty( + (len(self.usecols), dend - dbeg), dtype=float + ) + for j, trow in zip(self.usecols, tdata): + j %= bb1.nf + trow[:] = lw.value[bb1.nw0 + j : ee1.nw1 : bb1.nf] + data = tdata.transpose() + self.headers.append(header) + self.datasets.append(data) + # finish reading to a last header and empty dataset + if hbeg < len(self._lines): + header = "".join(self._lines[hbeg:]) + data = numpy.empty(0, dtype=float) + self.headers.append(header) + self.datasets.append(data) + return + + +def load_data( + filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs ): """Find and load data from a text file. @@ -44,7 +216,7 @@ def loadData( Parameters ---------- - filename + filename: Path or string Name of the file we want to load data from. minrows: int Minimum number of rows in the first data block. All rows must have @@ -79,8 +251,8 @@ def loadData( comma-separated data blocks, set delimiter to ','. unpack: bool Return data as a sequence of columns that allows tuple unpacking such - as x, y = loadData(FILENAME, unpack=True). Note transposing the - loaded array as loadData(FILENAME).T has the same effect. + as x, y = load_data(FILENAME, unpack=True). Note transposing the + loaded array as load_data(FILENAME).T has the same effect. usecols: Zero-based index of columns to be loaded, by default use all detected columns. 
The reading skips data blocks that do not have the usecols- @@ -128,10 +300,11 @@ def countcolumnsvalues(line): return nc, nv # Check if file exists before trying to open - if not os.path.exists(filename): + filename = Path(filename) + if not filename.is_file(): raise IOError( ( - f"File {filename} cannot be found. " + f"File {str(filename)} cannot be found. " "Please rerun the program specifying a valid filename." ) ) @@ -209,169 +382,3 @@ def countcolumnsvalues(line): kwargs.setdefault("usecols", list(range(ncvblock[0]))) data_block = loadtxt(fid, **kwargs) return data_block - - -class TextDataLoader(object): - """Smart loading of a text data with possibly multiple datasets. - - Parameters - ---------- - minrows: int - Minimum number of rows in the first data block. (Default 10.) - usecols: tuple - Which columns in our dataset to use. Ignores all other columns. If - None (default), use all columns. - skiprows - Rows in dataset to skip. (Currently not functional.) - """ - - def __init__(self, minrows=10, usecols=None, skiprows=None): - if minrows is not None: - self.minrows = minrows - if usecols is not None: - self.usecols = tuple(usecols) - # FIXME: implement usage in _findDataBlocks - if skiprows is not None: - self.skiprows = skiprows - # data items - self._reset() - return - - def _reset(self): - self.filename = "" - self.headers = [] - self.datasets = [] - self._resetvars() - return - - def _resetvars(self): - self._filename = "" - self._lines = None - self._splitlines = None - self._words = None - self._linerecs = None - self._wordrecs = None - return - - def read(self, filename): - """Open a file and run readfp. - - Use if file is not already open for read byte. - """ - with open(filename, "rb") as fp: - self.readfp(fp) - return - - def readfp(self, fp, append=False): - """Get file details. - - File details include: - * File name. - * All data blocks findable by load_data. - * Headers (if present) for each data block. 
(Generally the headers - contain column name information). - """ - self._reset() - # try to read lines from fp first - self._lines = fp.readlines() - # and if good, assign filename - self.filename = getattr(fp, "name", "") - self._words = "".join(self._lines).split() - self._splitlines = [line.split() for line in self._lines] - self._findDataBlocks() - return - - def _findDataBlocks(self): - mincols = 1 - if self.usecols is not None and len(self.usecols): - mincols = max(mincols, max(self.usecols) + 1) - mincols = max(mincols, abs(min(self.usecols))) - nlines = len(self._lines) - nwords = len(self._words) - # idx - line index, nw0, nw1 - index of the first and last word, - # nf - number of words, ok - has data - self._linerecs = numpy.recarray( - (nlines,), - dtype=[ - ("idx", int), - ("nw0", int), - ("nw1", int), - ("nf", int), - ("ok", bool), - ], - ) - lr = self._linerecs - lr.idx = numpy.arange(nlines) - lr.nf = [len(sl) for sl in self._splitlines] - lr.nw1 = lr.nf.cumsum() - lr.nw0 = lr.nw1 - lr.nf - lr.ok = True - # word records - lw = self._wordrecs = numpy.recarray( - (nwords,), - dtype=[ - ("idx", int), - ("line", int), - ("col", int), - ("ok", bool), - ("value", float), - ], - ) - lw.idx = numpy.arange(nwords) - n1 = numpy.zeros(nwords, dtype=bool) - n1[lr.nw1[:-1]] = True - lw.line = n1.cumsum() - lw.col = lw.idx - lr.nw0[lw.line] - lw.ok = True - values = nwords * [0.0] - for i, w in enumerate(self._words): - try: - values[i] = float(w) - except ValueError: - lw.ok[i] = False - # prune lines that have a non-float values: - lw.values = values - if self.usecols is None: - badlines = lw.line[~lw.ok] - lr.ok[badlines] = False - else: - for col in self.usecols: - badlines = lw.line[(lw.col == col) & ~lw.ok] - lr.ok[badlines] = False - lr1 = lr[lr.nf >= mincols] - okb = numpy.r_[lr1.ok[:1], lr1.ok[1:] & ~lr1.ok[:-1], False] - oke = numpy.r_[False, ~lr1.ok[1:] & lr1.ok[:-1], lr1.ok[-1:]] - blockb = numpy.r_[True, lr1.nf[1:] != lr1.nf[:-1], False] - blocke = 
numpy.r_[False, blockb[1:-1], True] - beg = numpy.nonzero(okb | blockb)[0] - end = numpy.nonzero(oke | blocke)[0] - rowcounts = end - beg - assert not numpy.any(rowcounts < 0) - goodrows = rowcounts >= self.minrows - begend = numpy.transpose([beg, end - 1])[goodrows] - hbeg = 0 - for dbeg, dend in begend: - bb1 = lr1[dbeg] - ee1 = lr1[dend] - hend = bb1.idx - header = "".join(self._lines[hbeg:hend]) - hbeg = ee1.idx + 1 - if self.usecols is None: - data = numpy.reshape(lw.value[bb1.nw0 : ee1.nw1], (-1, bb1.nf)) - else: - tdata = numpy.empty( - (len(self.usecols), dend - dbeg), dtype=float - ) - for j, trow in zip(self.usecols, tdata): - j %= bb1.nf - trow[:] = lw.value[bb1.nw0 + j : ee1.nw1 : bb1.nf] - data = tdata.transpose() - self.headers.append(header) - self.datasets.append(data) - # finish reading to a last header and empty dataset - if hbeg < len(self._lines): - header = "".join(self._lines[hbeg:]) - data = numpy.empty(0, dtype=float) - self.headers.append(header) - self.datasets.append(data) - return diff --git a/src/diffpy/utils/tools.py b/src/diffpy/utils/tools.py index 94611a96..42e43bc8 100644 --- a/src/diffpy/utils/tools.py +++ b/src/diffpy/utils/tools.py @@ -8,7 +8,7 @@ from scipy.signal import convolve from xraydb import material_mu -from diffpy.utils import validators +from diffpy.utils.parsers import load_data def _stringify(string_value): @@ -396,182 +396,3 @@ def compute_mud(filepath): key=lambda pair: pair[1], ) return best_mud - - -def load_data( - filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs -): - """Find and load data from a text file. - - The data block is identified as the first matrix block of at least - minrows rows and constant number of columns. This seems to work for most - of the datafiles including those generated by diffpy programs. - - Parameters - ---------- - filename: Path or string - Name of the file we want to load data from. - minrows: int - Minimum number of rows in the first data block. 
All rows must have - the same number of floating point values. - headers: bool - when False (default), the function returns a numpy array of the data - in the data block. When True, the function instead returns a - dictionary of parameters and their corresponding values parsed from - header (information prior the data block). See hdel and hignore for - options to help with parsing header information. - hdel: str - (Only used when headers enabled.) Delimiter for parsing header - information (default '='). e.g. using default hdel, the line ' - parameter = p_value' is put into the dictionary as - {parameter: p_value}. - hignore: list - (Only used when headers enabled.) Ignore header rows beginning with - any elements in hignore. e.g. hignore=['# ', '['] causes the - following lines to be skipped: '# qmax=10', '[defaults]'. - kwargs: - Keyword arguments that are passed to numpy.loadtxt including the - following arguments below. (See numpy.loadtxt for more details.) Only - pass kwargs used by numpy.loadtxt. - - Useful kwargs - ============= - comments: str, sequence of str - The characters or list of characters used to indicate the start of a - comment (default '#'). Comment lines are ignored. - delimiter: str - Delimiter for the data in the block (default use whitespace). For - comma-separated data blocks, set delimiter to ','. - unpack: bool - Return data as a sequence of columns that allows tuple unpacking such - as x, y = load_data(FILENAME, unpack=True). Note transposing the - loaded array as load_data(FILENAME).T has the same effect. - usecols: - Zero-based index of columns to be loaded, by default use all detected - columns. The reading skips data blocks that do not have the usecols- - specified columns. - - Returns - ------- - data_block: ndarray - A numpy array containing the found data block. (This is not returned - if headers is enabled.) - hdata: dict - If headers are enabled, return a dictionary of parameters read from - the header. 
- """ - from numpy import array, loadtxt - - # for storing header data - hdata = {} - # determine the arguments - delimiter = kwargs.get("delimiter") - usecols = kwargs.get("usecols") - # required at least one column of floating point values - mincv = (1, 1) - # but if usecols is specified, require sufficient number of columns - # where the used columns contain floats - if usecols is not None: - hiidx = max(-min(usecols), max(usecols) + 1) - mincv = (hiidx, len(set(usecols))) - - # Check if a line consists of floats only and return their count - # Return zero if some strings cannot be converted. - def countcolumnsvalues(line): - try: - words = line.split(delimiter) - # remove trailing blank columns - while words and not words[-1].strip(): - words.pop(-1) - nc = len(words) - if usecols is not None: - nv = len([float(words[i]) for i in usecols]) - else: - nv = len([float(w) for w in words]) - except (IndexError, ValueError): - nc = nv = 0 - return nc, nv - - # Check if file exists before trying to open - filename = Path(filename) - if not filename.is_file(): - raise IOError( - ( - f"File {str(filename)} cannot be found. " - "Please rerun the program specifying a valid filename." 
- ) - ) - - # make sure fid gets cleaned up - with open(filename, "rb") as fid: - # search for the start of datablock - start = ncvblock = None - fpos = (0, 0) - nrows = 0 - for line in fid: - # decode line - dline = line.decode() - # find header information if requested - if headers: - hpair = dline.split(hdel) - flag = True - # ensure number of non-blank arguments is two - if len(hpair) != 2: - flag = False - else: - # ignore if an argument is blank - hpair[0] = hpair[0].strip() # name of data entry - hpair[1] = hpair[1].strip() # value of entry - if not hpair[0] or not hpair[1]: - flag = False - else: - # check if row has an ignore tag - if hignore is not None: - for tag in hignore: - taglen = len(tag) - if ( - len(hpair[0]) >= taglen - and hpair[0][:taglen] == tag - ): - flag = False - # add header data - if flag: - name = hpair[0] - value = hpair[1] - # check if data value should be stored as float - if validators.is_number(hpair[1]): - value = float(hpair[1]) - hdata.update({name: value}) - # continue search for the start of datablock - fpos = (fpos[1], fpos[1] + len(line)) - line = dline - ncv = countcolumnsvalues(line) - if ncv < mincv: - start = None - continue - # ncv is acceptable here, require the same number of columns - # throughout the datablock - if start is None or ncv != ncvblock: - ncvblock = ncv - nrows = 0 - start = fpos[0] - nrows += 1 - # block was found here! - if nrows >= minrows: - break - - # Return header data if requested - if headers: - return hdata # Return, so do not proceed to reading datablock - - # Return an empty array when no data found. - # loadtxt would otherwise raise an exception on loading from EOF. - if start is None: - data_block = array([], dtype=float) - else: - fid.seek(start) - # always use usecols argument so that loadtxt does not crash - # in case of trailing delimiters. 
- kwargs.setdefault("usecols", list(range(ncvblock[0]))) - data_block = loadtxt(fid, **kwargs) - return data_block diff --git a/tests/test_loaddata.py b/tests/test_loaddata.py index 95ac009f..92c53571 100644 --- a/tests/test_loaddata.py +++ b/tests/test_loaddata.py @@ -5,8 +5,8 @@ import numpy as np import pytest +from diffpy.utils.parsers import load_data from diffpy.utils.parsers.loaddata import loadData -from diffpy.utils.tools import load_data def test_loadData_default(datafile): diff --git a/tests/test_serialization.py b/tests/test_serialization.py index e5bc8e1d..eeab5307 100644 --- a/tests/test_serialization.py +++ b/tests/test_serialization.py @@ -3,12 +3,12 @@ import numpy import pytest +from diffpy.utils.parsers import load_data from diffpy.utils.parsers.custom_exceptions import ( ImproperSizeError, UnsupportedTypeError, ) from diffpy.utils.parsers.serialization import deserialize_data, serialize_data -from diffpy.utils.tools import load_data def test_load_multiple(tmp_path, datafile): From 8d11075c07d46594572da1c7ed2d20918f5a12ea Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Sun, 18 Jan 2026 08:46:37 -0500 Subject: [PATCH 04/16] docs: change docstring so API docs are correctly updated --- src/diffpy/utils/_deprecator.py | 7 +++++++ src/diffpy/utils/parsers/loaddata.py | 5 +++++ 2 files changed, 12 insertions(+) diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index c504604a..8cc50cd7 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -110,3 +110,10 @@ def deprecation_message( f"version {removal_version}. Please use '{new_base}.{new_name}' " f"instead." ) + + +_DEPRECATION_DOCSTRING_TEMPLATE = ( + "This function has been deprecated and will be " + "removed in version {removal_version}. Please use" + "{new_base}.{new_name} instead." 
+) diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py index e48090e2..da422058 100644 --- a/src/diffpy/utils/parsers/loaddata.py +++ b/src/diffpy/utils/parsers/loaddata.py @@ -36,6 +36,11 @@ def loadData( filename, minrows=10, headers=False, hdel="=", hignore=None, **kwargs ): + """This function has been deprecated and will be removed in version + 4.0.0. + + Please use diffpy.utils.parsers.load_data instead. + """ return load_data(filename, minrows, headers, hdel, hignore, **kwargs) From 34bfb90816314d9da2bd306427ddb13eff0be48a Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Sun, 18 Jan 2026 09:03:02 -0500 Subject: [PATCH 05/16] docs; fix typos in examples text that had load_data in tools --- docs/source/examples/parsers_example.rst | 2 +- docs/source/examples/resample_example.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/examples/parsers_example.rst b/docs/source/examples/parsers_example.rst index 19e607eb..747d0c4f 100644 --- a/docs/source/examples/parsers_example.rst +++ b/docs/source/examples/parsers_example.rst @@ -18,7 +18,7 @@ Using the parsers module, we can load file data into simple and easy-to-work-wit .. code-block:: python - from diffpy.utils.tools import load_data + from diffpy.utils.parsers import load_data data_table = load_data('') While this will work with most datasets, on our ``data.txt`` file, we got a ``ValueError``. The reason for this is diff --git a/docs/source/examples/resample_example.rst b/docs/source/examples/resample_example.rst index 5af42e73..32e3e02a 100644 --- a/docs/source/examples/resample_example.rst +++ b/docs/source/examples/resample_example.rst @@ -16,7 +16,7 @@ given enough datapoints. .. 
code-block:: python - from diffpy.utils.tools import load_data + from diffpy.utils.parsers import load_data nickel_datatable = load_data('') nitarget_datatable = load_data('') From ed72acace036b31f38c2f0d1326e8a0e46a2382e Mon Sep 17 00:00:00 2001 From: stevenhua0320 Date: Tue, 20 Jan 2026 23:17:15 -0500 Subject: [PATCH 06/16] build: Add support for python 3.14 and drop for python 3.11 --- news/python-version.rst | 23 +++++++++++++++++++++++ pyproject.toml | 4 ++-- 2 files changed, 25 insertions(+), 2 deletions(-) create mode 100644 news/python-version.rst diff --git a/news/python-version.rst b/news/python-version.rst new file mode 100644 index 00000000..a037a121 --- /dev/null +++ b/news/python-version.rst @@ -0,0 +1,23 @@ +**Added:** + +* Support for Python 3.14 + +**Changed:** + +* + +**Deprecated:** + +* + +**Removed:** + +* Support for Python 3.11 + +**Fixed:** + +* + +**Security:** + +* diff --git a/pyproject.toml b/pyproject.toml index 9799bbf8..2e3da67d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -14,7 +14,7 @@ maintainers = [ description = "General utilities for analyzing diffraction data" keywords = ['text data parsers', 'wx grid', 'diffraction objects'] readme = "README.rst" -requires-python = ">=3.11, <3.14" +requires-python = ">=3.12, <3.15" classifiers = [ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', @@ -25,9 +25,9 @@ classifiers = [ 'Operating System :: Microsoft :: Windows', 'Operating System :: POSIX', 'Operating System :: Unix', - 'Programming Language :: Python :: 3.11', 'Programming Language :: Python :: 3.12', 'Programming Language :: Python :: 3.13', + 'Programming Language :: Python :: 3.14', 'Topic :: Scientific/Engineering :: Physics', 'Topic :: Scientific/Engineering :: Chemistry', ] From 7ac6862a0157abfe3da876077988b0acfdc0357d Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Thu, 22 Jan 2026 07:51:51 -0800 Subject: [PATCH 07/16] fix: caught a couple of tools typos fixed to parsers --- 
news/depr-tests.rst | 2 +- src/diffpy/utils/parsers/loaddata.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/news/depr-tests.rst b/news/depr-tests.rst index 74f7c173..12559e13 100644 --- a/news/depr-tests.rst +++ b/news/depr-tests.rst @@ -8,7 +8,7 @@ **Deprecated:** -* diffpy.utils.parsers.loaddata.loadData replaced by diffpy.utils.tools.load_data +* diffpy.utils.parsers.loaddata.loadData replaced by diffpy.utils.parsers.load_data **Removed:** diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py index da422058..18601258 100644 --- a/src/diffpy/utils/parsers/loaddata.py +++ b/src/diffpy/utils/parsers/loaddata.py @@ -28,7 +28,7 @@ "loadData", "load_data", removal_version, - new_base="diffpy.utils.tools", + new_base="diffpy.utils.parsers", ) From b90f56ad83ff9e75aeb55339470d387d6ff76d52 Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Tue, 3 Feb 2026 09:50:25 -0500 Subject: [PATCH 08/16] fix import bug --- src/diffpy/utils/_deprecator.py | 39 +-------------------------------- 1 file changed, 1 insertion(+), 38 deletions(-) diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 560c9530..317c224e 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -96,7 +96,7 @@ def wrapper(*args, **kwargs): return decorator -def build_deprecation_message( +def deprecation_message( old_base, old_name, new_name, removal_version, new_base=None ): """Generate a deprecation message. @@ -131,40 +131,3 @@ def build_deprecation_message( f"version {removal_version}. Please use '{new_base}.{new_name}' " f"instead." ) - - -def generate_deprecation_docstring(new_name, removal_version, new_base=None): - """Generate a docstring for copy-pasting into a deprecated function. 
- - this function will print the text to the terminal for copy-pasting - - usage: - python - >>> import diffpy.utils._deprecator.generate_deprecation_docstring as gdd - >>> gdd("new_name", "4.0.0") - - The message looks like: - This function has been deprecated and will be removed in version - {removal_version}. Please use {new_base}.{new_name} instead. - - Parameters - ---------- - new_name: str - The name of the new function or class to replace the existing one - removal_version: str - The version when the deprecated item is targeted for removal, - e.g., 4.0.0 - new_base: str Optional. Defaults to old_base. - The new base for importing. The new import statement would look like - "from new_base import new_name" - - Returns - ------- - None - """ - print( - f"This function has been deprecated and will be " - f"removed in version {removal_version}. Please use" - f"{new_base}.{new_name} instead." - ) - return From e4d3c6621b5b3d8b9339419a297b4f63a33c7dbd Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Tue, 3 Feb 2026 10:02:00 -0500 Subject: [PATCH 09/16] add deprecator test --- tests/test_deprecator.py | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) create mode 100644 tests/test_deprecator.py diff --git a/tests/test_deprecator.py b/tests/test_deprecator.py new file mode 100644 index 00000000..8743474c --- /dev/null +++ b/tests/test_deprecator.py @@ -0,0 +1,37 @@ +import pytest + +from diffpy.utils._deprecator import deprecated, deprecation_message + +old_base = "diffpy.utils" +old_name = "oldFunction" +new_name = "new_function" +removal_version = "4.0.0" + +dep_msg = deprecation_message(old_base, old_name, new_name, removal_version) + + +@deprecated(dep_msg) +def oldFunction(print_msg): + """This function is deprecated and will be removed in version 4.0.0. 
+ + Please use newFunction instead + """ + return new_function(print_msg) + + +def new_function(print_msg): + print(print_msg) + return + + +def test_deprecated(capsys): + # Case: user deprecates a function with the deprecated decorator + # Expected: DeprecationWarning is raised when the function is called + # and the function executes correctly + expected_print_msg = "Testing deprecated function" + with pytest.deprecated_call(match=dep_msg): + oldFunction(expected_print_msg) + + captured = capsys.readouterr() + actual_print_msg = captured.out.strip() + assert actual_print_msg == expected_print_msg From 28b607423b0252c94755329e87961cd3b4d43f05 Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Tue, 3 Feb 2026 10:02:38 -0500 Subject: [PATCH 10/16] precommit --- tests/test_deprecator.py | 1 - 1 file changed, 1 deletion(-) diff --git a/tests/test_deprecator.py b/tests/test_deprecator.py index 8743474c..f6d4c912 100644 --- a/tests/test_deprecator.py +++ b/tests/test_deprecator.py @@ -31,7 +31,6 @@ def test_deprecated(capsys): expected_print_msg = "Testing deprecated function" with pytest.deprecated_call(match=dep_msg): oldFunction(expected_print_msg) - captured = capsys.readouterr() actual_print_msg = captured.out.strip() assert actual_print_msg == expected_print_msg From 1c6bcda4e8964a512ac0cbb12b95f5fa73d7968e Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Tue, 3 Feb 2026 12:03:45 -0500 Subject: [PATCH 11/16] change function to build_deprecation_message --- news/dep-msg-helper.rst | 2 +- src/diffpy/utils/_deprecator.py | 11 +++++++---- src/diffpy/utils/parsers/loaddata.py | 4 ++-- tests/test_deprecator.py | 6 ++++-- 4 files changed, 14 insertions(+), 9 deletions(-) diff --git a/news/dep-msg-helper.rst b/news/dep-msg-helper.rst index 733801e6..620a532a 100644 --- a/news/dep-msg-helper.rst +++ b/news/dep-msg-helper.rst @@ -1,6 +1,6 @@ **Added:** -* Add ``deprecation_message`` helper for printing consistent deprecation messages. 
+* Add ``build_deprecation_message`` helper for printing consistent deprecation messages. **Changed:** diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 317c224e..6eeb2225 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -20,7 +20,9 @@ def deprecated(message, *, category=DeprecationWarning, stacklevel=1): .. code-block:: python - from diffpy.utils._deprecator import deprecated, deprecation_message + from diffpy.utils._deprecator import ( + deprecated, build_deprecation_message + ) deprecation_warning = build_deprecation_message("diffpy.utils", "old_function", @@ -44,8 +46,9 @@ def new_function(x, y): .. code-block:: python - from diffpy.utils._deprecator import deprecated, deprecation_message - + from diffpy.utils._deprecator import ( + deprecated, build_deprecation_message + ) deprecation_warning = build_deprecation_message("diffpy.utils", "OldAtom", "NewAtom", @@ -96,7 +99,7 @@ def wrapper(*args, **kwargs): return decorator -def deprecation_message( +def build_deprecation_message( old_base, old_name, new_name, removal_version, new_base=None ): """Generate a deprecation message. 
diff --git a/src/diffpy/utils/parsers/loaddata.py b/src/diffpy/utils/parsers/loaddata.py index 18601258..58a49a46 100644 --- a/src/diffpy/utils/parsers/loaddata.py +++ b/src/diffpy/utils/parsers/loaddata.py @@ -18,12 +18,12 @@ import numpy from diffpy.utils import validators -from diffpy.utils._deprecator import deprecated, deprecation_message +from diffpy.utils._deprecator import build_deprecation_message, deprecated base = "diffpy.utils.parsers.loaddata" removal_version = "4.0.0" -loaddata_deprecation_msg = deprecation_message( +loaddata_deprecation_msg = build_deprecation_message( base, "loadData", "load_data", diff --git a/tests/test_deprecator.py b/tests/test_deprecator.py index f6d4c912..c4f2d25b 100644 --- a/tests/test_deprecator.py +++ b/tests/test_deprecator.py @@ -1,13 +1,15 @@ import pytest -from diffpy.utils._deprecator import deprecated, deprecation_message +from diffpy.utils._deprecator import build_deprecation_message, deprecated old_base = "diffpy.utils" old_name = "oldFunction" new_name = "new_function" removal_version = "4.0.0" -dep_msg = deprecation_message(old_base, old_name, new_name, removal_version) +dep_msg = build_deprecation_message( + old_base, old_name, new_name, removal_version +) @deprecated(dep_msg) From 2a79ecf3438ef5512196c715d1e14ff02c2e9c0d Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Wed, 4 Feb 2026 10:03:07 -0500 Subject: [PATCH 12/16] readd docstring generator with a simple cli tool --- src/diffpy/utils/_deprecator.py | 71 +++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 6eeb2225..47d2d6ef 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -1,3 +1,4 @@ +import argparse import functools import warnings @@ -134,3 +135,73 @@ def build_deprecation_message( f"version {removal_version}. Please use '{new_base}.{new_name}' " f"instead." 
) + + +def generate_deprecation_docstring(new_name, removal_version, new_base=None): + """Generate a docstring for copy-pasting into a deprecated function. + + This function will print the text to the terminal for copy-pasting. + + Usage + ----- + >>> from diffpy.utils._deprecator import generate_deprecation_docstring + >>> generate_deprecation_docstring("new_name", "4.0.0") + + The message looks like: + This function has been deprecated and will be removed in version + {removal_version}. Please use {new_base}.{new_name} instead. + + Parameters + ---------- + new_name : str + The name of the new function or class to replace the existing one. + removal_version : str + The version when the deprecated item is targeted for removal, + e.g., 4.0.0. + new_base : str, optional + The new base for importing. The new import statement would look like + "from new_base import new_name". Defaults to None. + + Returns + ------- + None + """ + print( + f"This function has been deprecated and will be removed in version " + f"{removal_version}.\n" + f"Please use {new_base}.{new_name} instead." + ) + return + + +def main(): + parser = argparse.ArgumentParser( + description="""\ +Print a docstring for copy-pasting into a deprecated function. + +Example usage +------------- + python -m diffpy.utils._deprecator load_data 4.0.0 -n diffpy.utils.parsers +""", + formatter_class=argparse.RawTextHelpFormatter, + ) + parser.add_argument("new_name", help="Name of the new function.") + parser.add_argument( + "removal_version", + help="Version when the deprecated item will be removed.", + ) + parser.add_argument( + "-n", "--new-base", default=None, help="New import base." 
+ ) + + args = parser.parse_args() + + generate_deprecation_docstring( + new_name=args.new_name, + removal_version=args.removal_version, + new_base=args.new_base, + ) + + +if __name__ == "__main__": + main() From 106b419417c1306623a194634c691f2163df5c57 Mon Sep 17 00:00:00 2001 From: Simon Billinge Date: Wed, 4 Feb 2026 08:05:08 -0800 Subject: [PATCH 13/16] Update news for Python 3.14 support and changes Added command line interface for generating template docstring for deprecated functions and updated references for loadData. --- news/python-version.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/news/python-version.rst b/news/python-version.rst index a037a121..a5d08f5b 100644 --- a/news/python-version.rst +++ b/news/python-version.rst @@ -1,6 +1,7 @@ **Added:** * Support for Python 3.14 +* Command line interface for generating a template docstring for deprecated functions **Changed:** @@ -16,7 +17,7 @@ **Fixed:** -* +* All references to ``loadData`` changed to ``load_data`` due to that deprecation **Security:** From 77db094c81311bd7cf44bebedf1427bf56a978d8 Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Wed, 4 Feb 2026 11:08:10 -0500 Subject: [PATCH 14/16] reorder items in docstring for proper rendering --- src/diffpy/utils/_deprecator.py | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 47d2d6ef..1fcf3f00 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -142,15 +142,6 @@ def generate_deprecation_docstring(new_name, removal_version, new_base=None): This function will print the text to the terminal for copy-pasting. - Usage - ----- - >>> from diffpy.utils._deprecator import generate_deprecation_docstring - >>> generate_deprecation_docstring("new_name", "4.0.0") - - The message looks like: - This function has been deprecated and will be removed in version - {removal_version}. 
Please use {new_base}.{new_name} instead. - Parameters ---------- new_name : str @@ -162,6 +153,16 @@ def generate_deprecation_docstring(new_name, removal_version, new_base=None): The new base for importing. The new import statement would look like "from new_base import new_name". Defaults to None. + Example + ------- + >>> from diffpy.utils._deprecator import generate_deprecation_docstring + >>> generate_deprecation_docstring("new_name", "4.0.0") + + The message looks like: + This function has been deprecated and will be removed in version + {removal_version}. Please use {new_base}.{new_name} instead. + + Returns ------- None From ce61f514ce4f0368f39fc15d12708f61e138b635 Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Wed, 4 Feb 2026 11:23:26 -0500 Subject: [PATCH 15/16] rm cli, we will put it in scikit-package --- src/diffpy/utils/_deprecator.py | 34 --------------------------------- 1 file changed, 34 deletions(-) diff --git a/src/diffpy/utils/_deprecator.py b/src/diffpy/utils/_deprecator.py index 1fcf3f00..0d5ce770 100644 --- a/src/diffpy/utils/_deprecator.py +++ b/src/diffpy/utils/_deprecator.py @@ -1,4 +1,3 @@ -import argparse import functools import warnings @@ -173,36 +172,3 @@ def generate_deprecation_docstring(new_name, removal_version, new_base=None): f"Please use {new_base}.{new_name} instead." ) return - - -def main(): - parser = argparse.ArgumentParser( - description="""\ -Print a docstring for copy-pasting into a deprecated function. - -Example usage -------------- - python -m diffpy.utils._deprecator load_data 4.0.0 -n diffpy.utils.parsers -""", - formatter_class=argparse.RawTextHelpFormatter, - ) - parser.add_argument("new_name", help="Name of the new function.") - parser.add_argument( - "removal_version", - help="Version when the deprecated item will be removed.", - ) - parser.add_argument( - "-n", "--new-base", default=None, help="New import base." 
- ) - - args = parser.parse_args() - - generate_deprecation_docstring( - new_name=args.new_name, - removal_version=args.removal_version, - new_base=args.new_base, - ) - - -if __name__ == "__main__": - main() From efd094d61e35d2d45a511e8950e7c48d442a49b9 Mon Sep 17 00:00:00 2001 From: Caden Myers Date: Wed, 4 Feb 2026 11:37:36 -0500 Subject: [PATCH 16/16] clean news items to reflect new changes --- news/dep-msg-helper.rst | 2 ++ news/deprecator.rst | 1 - news/python-version.rst | 1 - 3 files changed, 2 insertions(+), 2 deletions(-) diff --git a/news/dep-msg-helper.rst b/news/dep-msg-helper.rst index 620a532a..91496c2b 100644 --- a/news/dep-msg-helper.rst +++ b/news/dep-msg-helper.rst @@ -1,6 +1,8 @@ **Added:** * Add ``build_deprecation_message`` helper for printing consistent deprecation messages. +* Add ``generate_deprecation_docstring`` for generating a template docstring for deprecated functions. + **Changed:** diff --git a/news/deprecator.rst b/news/deprecator.rst index 01b88147..dabbfdf1 100644 --- a/news/deprecator.rst +++ b/news/deprecator.rst @@ -1,6 +1,5 @@ **Added:** -* added a function in _deprecator to generate a deprecation message for copy pasting * Add ``@deprecated`` decorator. **Changed:** diff --git a/news/python-version.rst b/news/python-version.rst index a5d08f5b..03d0f371 100644 --- a/news/python-version.rst +++ b/news/python-version.rst @@ -1,7 +1,6 @@ **Added:** * Support for Python 3.14 -* Command line interface for generating a template docstring for deprecated functions **Changed:**
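The deprecation pattern this series applies to ``loadData`` can be sketched in isolation as follows. This is a minimal, standalone illustration and not the actual ``diffpy.utils._deprecator`` code: the names ``sketch_build_message``, ``sketch_deprecated``, ``load_data_sketch`` and the ``pkg.parsers`` module path are hypothetical, and the message wording and the ``stacklevel=2`` default are assumptions rather than values taken from the patches.

```python
import functools
import warnings


def sketch_build_message(
    old_base, old_name, new_name, removal_version, new_base=None
):
    """Build a deprecation message (hypothetical helper, assumed wording)."""
    # Assumption: when no new base is given, the replacement lives
    # under the same import base as the deprecated name.
    if new_base is None:
        new_base = old_base
    return (
        f"'{old_base}.{old_name}' is deprecated and will be removed in "
        f"version {removal_version}. Please use '{new_base}.{new_name}' "
        f"instead."
    )


def sketch_deprecated(message, *, category=DeprecationWarning, stacklevel=2):
    """Return a decorator that warns each time the wrapped function runs."""

    def decorator(func):
        @functools.wraps(func)  # keep the wrapped name and docstring
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, category=category, stacklevel=stacklevel)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def load_data_sketch(filename, minrows=10):
    """Stand-in for the new function; it only echoes its arguments."""
    return (filename, minrows)


MSG = sketch_build_message(
    "pkg.parsers.loaddata",
    "loadData",
    "load_data_sketch",
    "4.0.0",
    new_base="pkg.parsers",
)


@sketch_deprecated(MSG)
def loadData(filename, minrows=10):
    """Deprecated alias that forwards to the new function."""
    return load_data_sketch(filename, minrows)
```

Calling ``loadData("data.txt")`` in this sketch emits one ``DeprecationWarning`` carrying ``MSG`` and then returns whatever ``load_data_sketch`` returns, so existing callers keep working until the old name is removed.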