
Convert a string containing ISO 8859-1 hex character codes to UTF-8 in Java

Java

慕運(yùn)維8079593 2023-03-23 13:52:15

I have a string that I believe contains some ISO-8859-1 hex character codes:

String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";

I want to turn it into this:

Áo thun bé gái cột dây xanh biển

I tried the following, without luck:

byte[] isoBytes = doc.getBytes("ISO-8859-1");
System.out.println(new String(isoBytes, "UTF-8"));

What is the correct way to convert it? Thank you very much for your help!
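One way to see why the attempt in the question has no effect: the string holds the literal ASCII characters `#`, `x`, `C`, `1`, `;` rather than the byte 0xC1, and ASCII text has the same bytes in ISO-8859-1 and UTF-8. The sketch below only illustrates that point; the class name `RoundTripDemo` and the helper `roundTrip` are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encodes the text as ISO-8859-1 bytes and decodes those bytes as UTF-8,
    // mirroring the attempt in the question.
    static String roundTrip(String s) {
        byte[] isoBytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(isoBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String doc = "#xC1;o thun b#xE9; g#xE1;i";
        // Every character here is plain ASCII, so the round trip changes nothing.
        System.out.println(roundTrip(doc).equals(doc)); // prints "true"
    }
}
```

So the `#x...;` sequences have to be parsed and replaced explicitly; no charset conversion will interpret them.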


3 Answers

米琪卡哇伊

Contributed 1998 experience points · received more than 6 likes

Assuming the #xNNNN; sequences are plain old Unicode character representations, I suggest the following approach.

class Cvt {
    static String convert(String in) {
        String str = in;
        int curPos = 0;
        while (curPos < str.length()) {
            int j = str.indexOf("#x", curPos);
            if (j < 0) { // no more #x
                curPos = str.length();
            } else {
                // look for the terminator after this "#x", not after curPos
                int k = str.indexOf(';', j + 2);
                if (k < 0) { // unterminated #x
                    curPos = str.length();
                } else { // convert #xNNNN;
                    int n = Integer.parseInt(str.substring(j + 2, k), 16);
                    // toChars also handles code points above U+FFFF (surrogate pairs)
                    String ch = new String(Character.toChars(n));
                    str = str.substring(0, j) + ch + str.substring(k + 1);
                    curPos = j + ch.length(); // after ch
                }
            }
        }
        return str;
    }

    public static void main(String... args) {
        String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
        System.out.println(convert(doc));
    }
}

This is very similar to the approach in the earlier answer, except that it assumes the characters are Unicode code points rather than 8859-1 code points.

The output is

Áo thun bé gái cột dây xanh biển
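The distinction drawn above between Unicode code points and 8859-1 code points only matters for values above 0xFF: ISO-8859-1 bytes map one-to-one onto U+0000 through U+00FF, while something like U+1ED9 has no ISO-8859-1 encoding at all. The following is a small sketch of that fact, not part of the answer; the class `Latin1VsUnicode` and its two helpers are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Latin1VsUnicode {
    // Decodes a single ISO-8859-1 byte and returns the resulting Unicode code point.
    static int latin1ToCodePoint(int b) {
        return new String(new byte[] {(byte) b}, StandardCharsets.ISO_8859_1).codePointAt(0);
    }

    // Reports whether ISO-8859-1 can encode the given char.
    static boolean fitsInLatin1(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(latin1ToCodePoint(0xC1))); // prints "c1"
        System.out.println(fitsInLatin1('\u00C1')); // prints "true"
        System.out.println(fitsInLatin1('\u1ED9')); // prints "false"
    }
}
```

Since the sample string contains #x1ED9; and #x1EC3;, treating the values as Unicode code points appears to be the only reading that covers every sequence in it.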

2023-03-23

米脂

Contributed 1836 experience points · received more than 3 likes

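In this case the code really does obscure the requirement. The requirement is somewhat uncertain, but it appears to be decoding ad hoc Unicode character entity references, similar to those in HTML and XML, as noted in the comments.

It is also one of the relatively rare cases where the advantages of a regular expression engine outweigh whatever study is needed to understand the pattern language.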

String input = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
// Hex digits between "#x" and ";" are a Unicode codepoint value
String text = java.util.regex.Pattern.compile("(#x([0-9A-Fa-f]+);)")
    .matcher(input)
    // group 2 is the matched input between the 2nd ( in the pattern and its paired )
    // quoteReplacement keeps a decoded '$' or '\' from being read as replacement syntax
    // (replaceAll with a function requires Java 9 or later)
    .replaceAll(x -> java.util.regex.Matcher.quoteReplacement(
        new String(Character.toChars(Integer.parseInt(x.group(2), 16)))));
System.out.println(text);

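The matcher finds the candidate substrings that match the pattern, and the replaceAll function replaces each of them with the character for the computed Unicode code point. Because a Unicode code point may be encoded as two char (UTF-16) values, the replacement string has to be built from a char[].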
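To illustrate the point about two char values, here is a minimal sketch that feeds a code point above U+FFFF through the same kind of replacement. The input `#x1F600;` is an assumed example that does not appear in the question, the class `SupplementaryDemo` and method `decode` are hypothetical names, and `replaceAll` with a function needs Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SupplementaryDemo {
    private static final Pattern REF = Pattern.compile("#x([0-9A-Fa-f]+);");

    // Replaces each #xHEX; sequence with the character for that code point.
    static String decode(String input) {
        return REF.matcher(input).replaceAll(m ->
            Matcher.quoteReplacement(
                new String(Character.toChars(Integer.parseInt(m.group(1), 16)))));
    }

    public static void main(String[] args) {
        String s = decode("#x1F600;");
        System.out.println(s.length());                      // prints "2": a surrogate pair
        System.out.println(s.codePointCount(0, s.length())); // prints "1": one code point
    }
}
```

A plain `(char) n` cast would silently truncate such a value to 16 bits, which is why `Character.toChars` is the safer choice here.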

2023-03-23

MM們

Contributed 1886 experience points · received more than 2 likes

Java strings have no such hex syntax. If you need to support that string format, I would write a helper function that parses the format and builds a byte array, and then decode that array as ISO-8859-1.

import java.io.ByteArrayOutputStream;

public class translate {
    private static byte[] parseBytesWithHexLiterals(String s) throws Exception {
        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        while (!s.isEmpty()) {
            if (s.startsWith("#x")) {
                s = s.substring(2);
                // Each pair of hex digits becomes one byte, so a value wider than
                // two digits (e.g. #x1ED9;) is split into two ISO-8859-1 bytes
                // rather than decoded as a single character.
                while (s.charAt(0) != ';') {
                    int i = Integer.parseInt(s.substring(0, 2), 16);
                    baos.write(i);
                    s = s.substring(2);
                }
            } else {
                baos.write(s.substring(0, 1).getBytes("US-ASCII")[0]);
            }
            s = s.substring(1); // skip the ';' or the character just copied
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String doc = "#xC1;o thun b#xE9; g#xE1;i c#x1ED9;t d#xE2;y xanh bi#x1EC3;n";
        byte[] parsedAsISO88591 = parseBytesWithHexLiterals(doc);
        doc = new String(parsedAsISO88591, "ISO-8859-1");
        System.out.println(doc); // Print out the string, which is in Unicode internally.
        byte[] asUTF8 = doc.getBytes("UTF-8"); // Get a UTF-8 version of the string.
    }
}
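The last step above, getting a UTF-8 version of the string, can be made more concrete by printing the bytes. This is only an illustrative sketch: the class `Utf8BytesDemo` and the helper `toHex` are hypothetical, and the sample strings are written with Unicode escapes rather than taken from the parser above.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesDemo {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = "b\u00E9"; // the letter b followed by U+00E9
        System.out.println(toHex(decoded.getBytes(StandardCharsets.ISO_8859_1))); // prints "62 E9"
        System.out.println(toHex(decoded.getBytes(StandardCharsets.UTF_8)));      // prints "62 C3 A9"
        System.out.println(toHex("\u1ED9".getBytes(StandardCharsets.UTF_8)));     // prints "E1 BB 99"
    }
}
```

Inside a Java `String` the text is always UTF-16; a specific encoding such as UTF-8 only comes into play when the string is turned into bytes, for example when writing it to a file or a socket.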

2023-03-23
