Ë ïò3j5,ãóÒ—UdZddlZddlmZmZmZdZdZdZdZ dZ d Zd ZdezZ eed<d ededefd„Zd ededzfd„Zd ededzfd„Zd ededzfd„Zdedefd„Zddededefd„Zy)a«Stage 1a+: UTF-16/UTF-32 detection for data without BOM. This stage runs after BOM detection but before binary detection. UTF-16 and UTF-32 encoded text contains characteristic null-byte patterns that would otherwise cause binary detection to reject the data. Note: ``from __future__ import annotations`` is intentionally omitted because this module is compiled with mypyc, which does not support PEP 563 string annotations. éN)ÚASCII_TEXT_BYTESÚDETERMINISTIC_CONFIDENCEÚDetectionResultiéé g¸…ëQ¸ž?çà?gffffffæ?g333333Ã?óÚ_NULL_SEPARATOR_ALLOWEDÚdataÚ null_fracÚreturncóD—|tk\ry|jdt«S)u‹Return True if the data looks like ASCII with null byte separators. :param data: The raw byte sample to examine. :param null_frac: The positional null fraction for this UTF-16 candidate (i.e. fraction of null bytes in even positions for BE, or odd positions for LE) â€” not the total null fraction across all bytes. Checks two conditions: 1. The positional null fraction is below ``_NULL_SEPARATOR_MAX_FRACTION`` 2. Every non-null byte is printable ASCII or common whitespace When both conditions are met, the nulls are likely field separators (e.g. ``find -print0``), not UTF-16 encoding artifacts. FN)Ú_NULL_SEPARATOR_MAX_FRACTIONÚ translater )rrs úE/DATA/.local/lib/python3.12/site-packages/chardet/pipeline/utf1632.pyÚ_is_null_separator_patternr6s%€ðÔ0Ò0ØØ~‰~˜dÔ$;Ó<Ð<Ð<ócón—|dt}t|«tkryt|«}||St |«S)aDetect UTF-32 or UTF-16 encoding from null-byte patterns. UTF-32 is checked before UTF-16 since UTF-32 patterns are more specific. :param data: The raw byte data to examine. :returns: A :class:`DetectionResult` if a strong pattern is found, or ``None``. N)Ú_SAMPLE_SIZEÚlenÚ_MIN_BYTES_UTF16Ú_check_utf32Ú_check_utf16)rÚsampleÚresults rÚdetect_utf1632_patternsrJsD€ð-”<Ð €Fä ˆ6ƒ{Ô%Ò%Øô˜&Ó !€FØ ÐØˆ ô˜ÓÐrc óÔ‡—t‰«t‰«dzz }|tkry‰d|Š|dz}tˆfd„tdt‰«d«D««}tˆfd„tdt‰«d«D««}||k(r8||zdkDr0 ‰j d«}t|«rt dtd¬«S tˆfd „td t‰«d«D««}tˆfd„tdt‰«d«D««}||k(r9||zdkDr1 ‰j d «}t|«rt d td¬«S yy#t$rYŒšwxYw#t$rYywxYw)a’Check for UTF-32 encoding based on 4-byte unit structure. For valid Unicode (U+0000 to U+10FFFF = 0x0010FFFF): - UTF-32-BE: the first byte of each 4-byte unit is always 0x00 - UTF-32-LE: the last byte of each 4-byte unit is always 0x00 For BMP characters (U+0000 to U+FFFF), additionally: - UTF-32-BE: the second byte is also 0x00 - UTF-32-LE: the third byte is also 0x00 éNc3ó4•K—|]}‰|dk(sŒd–—Œyw©réN©©Ú.0Úirs €rÚ z_check_utf32..tóøèø€ÐJÑ#9˜a¸TÀ!¹WÈ»\œÑ#9ùóƒ ‘rc3ó:•K—|]}‰|dzdk(sŒd–—Œyw)r!rNr"r#s €rr&z_check_utf32..vs#øèø€ÐOÑ$:˜q¸dÀ1ÀqÁ5¹kÈQÓ>NœÑ$:ùsƒ”rz utf-32-be©ÚencodingÚ confidenceÚlanguagec3ó4•K—|]}‰|dk(sŒd–—Œywr r"r#s €rr&z_check_utf32..…søèø€ÐIÑ"8˜Q¸DÀ¹GÀq»L”qÑ"8ùr(éc3ó4•K—|]}‰|dk(sŒd–—Œywr r"r#s €rr&z_check_utf32..‡r'r(éz utf-32-le) rÚ_MIN_BYTES_UTF32ÚsumÚrangeÚdecodeÚ_looks_like_textrrÚUnicodeDecodeError)rÚtrimmed_lenÚ num_unitsÚ be_first_nullÚbe_second_nullÚtextÚle_last_nullÚ le_third_nulls` rrr`sxø€ôd“)œs 4›y¨1™}Ñ-€KØÔ%Ò%ØØÐ€Dà˜qÑ €IôÓJ¤5¨¬C°«I°qÔ#9ÓJÓJ€MäÓO¤E¨!¬S°«Y¸Ô$:ÓOÓO€Nà˜ Ò! n°yÑ&@À3Ò&Fð Ø—;‘;˜{Ó+ˆDÜ Ô%Ü&Ø(Ü7Ø!ôðð&ôÓI¤%¨¬3¨t«9°aÔ"8ÓIÓI€LäÓJ¤5¨¬C°«I°qÔ#9ÓJÓJ€MàyÒ ]°YÑ%>ÀÒ%Dð Ø—;‘;˜{Ó+ˆDÜ Ô%Ü&Ø(Ü7Ø!ôðð&ñøô)"ò Ùð ûô""ò Øàð ús$Â-EÄ-EÅ EÅEÅ E'Å&E'có&‡—tt‰«t«}||dzz}|tkry|dz}t ˆfd„td|d«D««}t ˆfd„td|d«D««}||z}||z}g}|tk\r"t‰d||«s|jd|f«|tk\r"t‰d||«s|jd|f«|syt|«dk(r<|dd} ‰d|j|«} t| «rt|td¬ «S yd} d }|D]/\}} ‰d|j|«} t| «} | |kDsŒ,| }|} Œ1| |tk\rt| td¬ «Sy#t$rYywxYw#t$rYŒjwxYw)aýCheck for UTF-16 via null-byte patterns in alternating positions. UTF-16 encodes each BMP character as two bytes. For characters whose code-point high byte is 0x00 (Latin, digits, basic punctuation, many control structures), one of the two bytes in each unit will be a null. Even for non-Latin scripts (Arabic, CJK, Cyrillic, etc.) a significant fraction of code units still contain at least one null byte. Non-UTF-16 single-byte encodings never contain null bytes, so even a small null-byte fraction in alternating positions is a strong signal. When both endiannesses show null-byte patterns (e.g., Latin text where every other byte is null), we disambiguate by decoding both ways and comparing text-quality scores. r1Nc3ó4•K—|]}‰|dk(sŒd–—Œywr r"r#s €rr&z_check_utf16..°óøèø€ÐKÑ#:˜a¸dÀ1¹gÈ»lœÑ#:ùr(rc3ó4•K—|]}‰|dk(sŒd–—Œywr r"r#s €rr&z_check_utf16..²rAr(r!z utf-16-lez utf-16-ber*çð¿)Úminrrrr3r4Ú_UTF16_MIN_NULL_FRACTIONrÚappendr5r6rrr7Ú _text_qualityÚ_MIN_TEXT_QUALITY)rÚ sample_lenr9Ú be_null_countÚ le_null_countÚbe_fracÚle_fracÚ candidatesr+r<Ú best_encodingÚbest_qualityÚ_Úqualitys` rrr˜søø€ô ”S˜“Y¤Ó-€JØ*˜q‘.Ñ €JØÔ$Ò$Øà˜a‘€IôÓK¤5¨¨J¸Ô#:ÓKÓK€MäÓK¤5¨¨J¸Ô#:ÓKÓK€Mà˜iÑ'€GØ˜iÑ'€Gà*,€JØÔ*Ò*Ô3MØˆ[ˆjÐ˜7ô4ð ×Ñ˜;¨Ð0Ô1ØÔ*Ò*Ô3MØˆ[ˆjÐ˜7ô4ð ×Ñ˜;¨Ð0Ô1áØôˆ:ƒ˜!ÒØ˜a‘= Ñ#ˆð Ø˜˜Ð$×+Ñ+¨HÓ5ˆDÜ Ô%Ü&Ø%Ü7Ø!ôðð&ðð!%€MØ€Lã!‰ˆ!ð Ø˜˜Ð$×+Ñ+¨HÓ5ˆDô Ó%ˆØ\Ó!Ø"ˆLØ$‰Mð"ðÐ \Ô5FÒ%FÜØ"Ü/Øô ð ðøô5"ò ØØð ûô"ò Ùð ús$Ã,0E5Ä,FÅ5 FÆFÆ FÆFr<có`—|sy|dd}td„|D««}|t|«ztkDS)z9Quick check: is decoded text mostly printable characters.FNéôc3óJK—|]}|j«s|dvsŒd–—Œyw)ú r!N)Úisprintable)r$Úcs rr&z#_looks_like_text..ñsèø€ÐJ™v˜!¨¯©¬¸AÀºM”A™vùs‚#œ#)r3rÚ_MIN_PRINTABLE_FRACTION)r<rÚ printables rr6r6ìs8€áØØ $3ˆZ€FÜÑJ™vÓJÓJ€IØ”s˜6“{Ñ"Ô%<Ñ<Ð10% control characters or >20% combining marks (category ``M*``). NrrCÚLr!é€ÚMÚZsrVÚCgš™™™™™¹?gš™™™™™É?ré)rÚunicodedataÚcategoryÚord)r<r[rÚnÚlettersÚmarksÚspacesÚcontrolsÚ ascii_lettersrXÚcatÚscores rrGrGõs€ð(&5ˆ\€FÜˆF‹€AØˆA‚vØà€GØ €EØ €FØ€HØ€Mã ˆÜ×"Ñ" 1Ó%ˆØˆq‰6SŠ=Øq‰LˆGÜ1‹v˜‹|Ø Ñ"‘ Ø ‰VsŠ]ØQ‰J‰EØ DŠ[˜A ™MØa‰K‰FØ ‰Vs‹]Ø˜‰M‰Hðð!|cÒØØˆqy3‚Øàa‰K€Eà ˆm˜aÑ 3Ñ &Ñ&€Eàˆ2‚v&˜1’*Ø ‰ˆà€Lr)rT)Ú__doc__rcÚchardet.pipelinerrrrr2rrErHrYrr ÚbytesÚ__annotations__ÚfloatÚboolrrrrÚstrr6ÚintrGr"rrÚrvsñðò óçXÑXð€ðÐØÐð ÐðÐðÐð $Ðð")Ð+;Ñ!;Ð˜Ó;ð= Uð=°uð=Àó=ð( %ð ¨O¸dÑ,Bó ð,5uð5 °4Ñ!7ó5ðpQuðQ °4Ñ!7óQðh=˜3ð= 4ó=ñ9˜ð9 Cð9°%ô9r