import logging
import re
from typing import Optional, Union

from .enums import LanguageFilter, ProbingState

INTERNATIONAL_WORDS_PATTERN = re.compile(
    b"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


class CharSetProber:

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter: LanguageFilter = LanguageFilter.NONE) -> None:
        self._state = ProbingState.DETECTING
        self.active = True
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self) -> None:
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self) -> Optional[str]:
        return None

    @property
    def language(self) -> Optional[str]:
        raise NotImplementedError

    def feed(self, byte_str: Union[bytes, bytearray]) -> ProbingState:
        raise NotImplementedError

    @property
    def state(self) -> ProbingState:
        return self._state

    def get_confidence(self) -> float:
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf: Union[bytes, bytearray]) -> bytes:
        # Collapse every run of ASCII bytes (0x00-0x7F) into a single space.
        buf = re.sub(b"([\x00-\x7F])+", b" ", buf)
        return buf

    @staticmethod
    def filter_international_words(buf: Union[bytes, bytearray]) -> bytearray:
        """
We define three types of bytes:
alphabet: English letters [a-zA-Z]
international: international characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be thought to contain a series of words delimited
by markers. This function keeps only the words that contain at
least one international character. All contiguous sequences of markers
are replaced by a single ASCII space character.
This filter applies to all scripts which do not use English characters.
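
As a rough illustration of the filtering described above, the following minimal sketch keeps only the words that contain at least one byte in the range 0x80-0xFF and turns a trailing marker byte into a space. The names WORDS and keep_international_words are hypothetical, chosen for this example only; this is a sketch of the idea, not a statement of how the library must be used.

```python
import re

# The three byte classes described above: ASCII letters, high bytes
# (0x80-0xFF), and everything else ("markers"). One optional trailing
# marker is captured together with each word.
WORDS = re.compile(rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?")


def keep_international_words(buf: bytes) -> bytearray:
    # Hypothetical helper: collect words with at least one high byte.
    kept = bytearray()
    for word in WORDS.findall(buf):
        kept.extend(word[:-1])
        last = word[-1:]
        # A trailing marker (not an ASCII letter, below 0x80) becomes a space.
        if not last.isalpha() and last < b"\x80":
            last = b" "
        kept.extend(last)
    return kept


sample = b"plain caf\xe9, na\xefve text"
print(keep_international_words(sample))
```

With this Latin-1 sample, the pure-ASCII words "plain" and "text" are dropped, while the two words containing a high byte are kept, each followed by one space.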
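        """
        filtered = bytearray()

        # Find every word that contains at least one international character.
        # A word may carry one trailing marker byte.
        words = INTERNATIONAL_WORDS_PATTERN.findall(buf)

        for word in words:
            filtered.extend(word[:-1])

            # If the last byte of the word is a marker, replace it with a space.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b"\x80":
                last_char = b" "
            filtered.extend(last_char)

        return filtered

    @staticmethod
    def remove_xml_tags(buf: Union[bytes, bytearray]) -> bytes:
        """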
Returns a copy of ``buf`` that retains only the sequences of English
alphabet and high byte characters that are not between <> characters.
This filter can be applied to all scripts which contain both English
characters and extended ASCII characters, but is currently only used by
``Latin1Prober``.
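
The tag-skipping behaviour described above can be sketched as follows. This is a minimal, assumption-laden example: the name strip_tags is hypothetical, and the sketch only drops the bytes that sit between < and >, emitting one space wherever a run of text is interrupted by a tag. It does not filter any other bytes.

```python
def strip_tags(buf: bytes) -> bytearray:
    # Hypothetical helper: copy everything that is not inside <...>.
    kept = bytearray()
    in_tag = False
    prev = 0  # index where the current run of text outside a tag begins

    for curr in range(len(buf)):
        ch = buf[curr:curr + 1]
        if ch == b">":
            # Leaving a tag: the next text run starts after this byte.
            prev = curr + 1
            in_tag = False
        elif ch == b"<":
            if curr > prev and not in_tag:
                # Keep the text seen since the last tag, then mark the break.
                kept.extend(buf[prev:curr])
                kept.extend(b" ")
            in_tag = True

    # Keep any trailing text, unless the buffer ends inside an open tag.
    if not in_tag:
        kept.extend(buf[prev:])
    return kept


print(strip_tags(b"<b>abc</b>def"))
```

Text that follows an unclosed < is discarded, because the sketch never leaves the tag state before the buffer ends.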
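        """
        filtered = bytearray()
        in_tag = False
        prev = 0
        buf = memoryview(buf).cast("c")

        for curr, buf_char in enumerate(buf):
            # Check whether we are leaving or entering a tag.
            if buf_char == b">":
                prev = curr + 1
                in_tag = False
            elif buf_char == b"<":
                if curr > prev and not in_tag:
                    # Keep the text since the last tag, then output a single
                    # space to indicate a break in the sequence.
                    filtered.extend(buf[prev:curr])
                    filtered.extend(b" ")
                in_tag = True

        # If we are not inside a tag, keep the remaining text.
        if not in_tag:
            filtered.extend(buf[prev:])

        return filtered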