
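As suggested in the comments, you can use the PdfCanvasEditor from this answer to filter operations from the content stream as needed. Actually, I extended that class a bit so that it properly supports the ' and " text showing operators. You can find that class here.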
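Why ' and " matter: both show text just like Tj, so a filter that only looks at Tj and TJ would miss text drawn with them. Per the PDF specification, `string '` is equivalent to `T* string Tj` (move to the next line, then show), and `aw ac string "` is equivalent to `aw Tw ac Tc string '` (set word and character spacing, then do the same). The following minimal sketch spells out that expansion; it models operations as plain strings of content-stream syntax and is a hypothetical illustration, not iText API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QuoteOperators {
    /**
     * Expands a text showing operation into equivalent basic operations.
     * Operands and results are plain strings in content-stream syntax (illustration only).
     */
    static List<String> expand(String operator, List<String> operands) {
        List<String> result = new ArrayList<>();
        switch (operator) {
            case "'":
                // string '  is equivalent to  T* string Tj
                result.add("T*");
                result.add(operands.get(0) + " Tj");
                break;
            case "\"":
                // aw ac string "  is equivalent to  aw Tw ac Tc string '
                result.add(operands.get(0) + " Tw");
                result.add(operands.get(1) + " Tc");
                result.addAll(expand("'", operands.subList(2, 3)));
                break;
            default:
                // Any other operation stays as it is
                result.add(operands.isEmpty() ? operator : String.join(" ", operands) + " " + operator);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("\"", Arrays.asList("2", "0.5", "(Calendar:)")));
        // prints [2 Tw, 0.5 Tc, T*, (Calendar:) Tj]
    }
}
```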
Just like in your approach, the lines to clear are determined in a first pass; I use a RegexBasedLocationExtractionStrategy instance for this.
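Of each regex hit, only the vertical extent is used afterwards: any text whose origin y lies between the bottom and top of a hit rectangle counts as being on that line. This is what makes the rest of the line (e.g. the name after "Created by:") disappear as well, not only the matched label. A minimal sketch of that test, assuming a hit's rectangle spans its line's baseline; the Band class is a hypothetical stand-in for iText's Rectangle (getBottom/getTop).

```java
import java.util.Arrays;
import java.util.List;

final class LineBands {
    /** Hypothetical stand-in for a hit rectangle; only its vertical extent matters here. */
    static final class Band {
        final float bottom;
        final float top;

        Band(float bottom, float top) {
            this.bottom = bottom;
            this.top = top;
        }
    }

    /** True if y lies on any line marked by a regex hit (bounds inclusive). */
    static boolean onTriggerLine(List<Band> bands, float y) {
        return bands.stream().anyMatch(b -> b.bottom <= y && y <= b.top);
    }

    public static void main(String[] args) {
        List<Band> bands = Arrays.asList(new Band(700f, 712f), new Band(650f, 662f));
        System.out.println(onTriggerLine(bands, 700f)); // true
        System.out.println(onTriggerLine(bands, 680f)); // false
    }
}
```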
Afterwards, in the PdfCanvasEditor step, the instructions that draw text on those lines are changed to draw only empty strings.
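The operation itself is kept and only its text is emptied, because ' and " also move to the next line (and " also sets spacing); dropping them entirely could shift the text that follows. In iText 7 the operand list handed to the editor also contains the operator itself as its last element, which is why the code writes to index size() - 2. A minimal sketch of the blanking step under that convention, with PDF strings and arrays modeled as Java String and List (a hypothetical illustration, not iText types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlankText {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

    /**
     * Blanks the text of a text showing operation in place. The operator is the last element:
     *   [string, Tj]   [string, ']   [aw, ac, string, "]   [array, TJ]
     */
    static void blank(List<Object> operands) {
        String operator = (String) operands.get(operands.size() - 1);
        if (!TEXT_SHOWING_OPERATORS.contains(operator))
            return; // not a text showing operation, leave it alone
        if ("TJ".equals(operator))
            operands.set(0, new ArrayList<Object>()); // empty TJ array
        else
            operands.set(operands.size() - 2, "");    // the string is the second to last element
    }

    public static void main(String[] args) {
        List<Object> op = new ArrayList<>(Arrays.<Object>asList(2, 0.5, "Calendar: Work", "\""));
        blank(op);
        System.out.println(op); // prints [2, 0.5, , "]
    }
}
```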
As it is not events but the more basic operators and their operands that are inspected here, though, the exact mechanism does not derive from IEventFilter. Conceptually, however, it is similar to your approach.
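Without events there is no ready-made text position, so the code computes it from the graphics state: the text matrix multiplied by the current transformation matrix, whose I32 entry is the y coordinate of the current text origin (text rise and font size are ignored, which suffices for a line test). A minimal sketch of that arithmetic with plain 3x3 arrays in the PDF row-vector layout (a b 0 / c d 0 / e f 1); the helper names are hypothetical, not iText's Matrix class.

```java
final class TextOrigin {
    /** Multiplies two 3x3 matrices (m x n), rows as in PDF: a b 0 / c d 0 / e f 1. */
    static float[][] multiply(float[][] m, float[][] n) {
        float[][] r = new float[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i][j] += m[i][k] * n[k][j];
        return r;
    }

    /** Builds a matrix from the six PDF matrix values a b c d e f. */
    static float[][] of(float a, float b, float c, float d, float e, float f) {
        return new float[][] { { a, b, 0 }, { c, d, 0 }, { e, f, 1 } };
    }

    public static void main(String[] args) {
        float[][] tm = of(1, 0, 0, 1, 100, 700);    // text positioned at (100, 700)
        float[][] ctm = of(0.5f, 0, 0, 0.5f, 0, 0); // content scaled by 0.5
        float[][] origin = multiply(tm, ctm);
        System.out.println(origin[2][1]); // prints 350.0, the entry iText calls Matrix.I32
    }
}
```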
```java
try (PdfDocument pdfDocument = new PdfDocument(SOURCE_PDF_READER, TARGET_PDF_WRITER)) {
    // Rectangles of the lines to blank out; refilled for each form XObject
    List<Rectangle> triggerRectangles = new ArrayList<>();
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        {
            // The text matrix is private in PdfCanvasProcessor, so it is read via reflection
            Field field = PdfCanvasProcessor.class.getDeclaredField("textMatrix");
            field.setAccessible(true);
            textMatrixField = field;
        }

        @Override
        protected void nextOperation(PdfLiteral operator, List<PdfObject> operands) {
            // Remember the current text matrix for use in write()
            try {
                recentTextMatrix = (Matrix)textMatrixField.get(this);
            } catch (IllegalArgumentException | IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();
            if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            {
                // Text origin: text matrix multiplied by the current transformation matrix
                Matrix matrix = recentTextMatrix.multiply(getGraphicsState().getCtm());
                float y = matrix.get(Matrix.I32);
                if (triggerRectangles.stream().anyMatch(rect -> rect.getBottom() <= y && y <= rect.getTop())) {
                    // Blank the text but keep the operation (line advance, spacing)
                    if ("TJ".equals(operatorString))
                        operands.set(0, new PdfArray());
                    else
                        // The operator is the last list element, so the string is second to last
                        operands.set(operands.size() - 2, new PdfString(""));
                }
            }
            super.write(processor, operator, operands);
        }

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        final Field textMatrixField;
        Matrix recentTextMatrix;
    };

    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        PdfPage page = pdfDocument.getPage(i);
        Set<PdfName> xobjectNames = page.getResources().getResourceNames(PdfName.XObject);
        for (PdfName xobjectName : xobjectNames) {
            PdfFormXObject xobject = page.getResources().getForm(xobjectName);
            if (xobject == null)
                continue; // not a form XObject (e.g. an image)
            byte[] content = xobject.getPdfObject().getBytes();
            PdfResources resources = xobject.getResources();

            // First pass: locate the lines to remove
            RegexBasedLocationExtractionStrategy regexLocator = new RegexBasedLocationExtractionStrategy("Created by:|Calendar:");
            new PdfCanvasProcessor(regexLocator).processContent(content, resources);
            triggerRectangles.clear();
            triggerRectangles.addAll(regexLocator.getResultantLocations().stream().map(loc -> loc.getRectangle()).collect(Collectors.toSet()));

            // Second pass: rewrite the content stream, blanking the text on those lines
            PdfCanvas pdfCanvas = new PdfCanvas(new PdfStream(), resources, pdfDocument);
            editor.editContent(content, resources, pdfCanvas);
            xobject.getPdfObject().setData(pdfCanvas.getContentStream().getBytes());
        }
    }
}
```
(EditPageContent test testRemoveSpecificLinesCalendar)
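Please note that this is a proof of concept, specifically tailored to the OP's use case: PdfCanvasEditor is only used here to inspect and edit the first-level form XObjects of each page, because PDFs created by Google Calendar in Agenda format have all their page content in such form XObjects, which in turn are drawn in the page content stream. Furthermore, the text is expected to be parallel to the top of the page.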
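If you want to reuse the approach on other documents, the last assumption could at least be checked: comparing a single y value against line bands only makes sense for text that is drawn horizontally and unmirrored, i.e. when the combined matrix has b and c (iText's Matrix.I12 and Matrix.I21) near zero and positive a and d (I11 and I22). A minimal sketch of such a guard; the helper and its tolerance are hypothetical, not part of the answer's code.

```java
final class UprightCheck {
    /** Tolerance for treating a matrix entry as zero (hypothetical choice). */
    static final float EPSILON = 1e-3f;

    /**
     * True if a text matrix already multiplied by the CTM draws text horizontally and unmirrored:
     * b and c are (nearly) zero, a and d are positive.
     */
    static boolean isUpright(float a, float b, float c, float d) {
        return Math.abs(b) < EPSILON && Math.abs(c) < EPSILON && a > 0 && d > 0;
    }

    public static void main(String[] args) {
        System.out.println(isUpright(1, 0, 0, 1));  // true
        System.out.println(isUpright(0, 1, -1, 0)); // false: rotated by 90 degrees
    }
}
```

Text failing this check could simply be left untouched, or handled with a different test.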