Regular Expressions in Java
Published: 2019-06-12


In my data mining project, I need to use regular expressions to process a large amount of HTML text.

I have used regular expressions on Linux (with grep) before and found them an efficient way to process text, especially large volumes of it.

Introduction

Regular expressions are a way to describe a set of strings based on common characteristics shared by each string in the set. They can be used to search, edit, or manipulate text and data. You must learn a specific syntax to create regular expressions — one that goes beyond the normal syntax of the Java programming language. Regular expressions vary in complexity, but once you understand the basics of how they're constructed, you'll be able to decipher (or create) any regular expression.
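As a quick illustration of the three uses named above (search, edit, manipulate), here is a minimal sketch built on Pattern/Matcher and on the regex-aware String methods replaceAll and split. The class RegexIntro, its helper methods, and the sample strings are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexIntro {
    // Search: return the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Edit: collapse every run of whitespace into a single space.
    static String normalizeSpaces(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Manipulate: split a comma-separated list, ignoring spaces around the commas.
    static String[] splitList(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Order 42 shipped"));       // 42
        System.out.println(normalizeSpaces("a   b\t\tc"));         // a b c
        System.out.println(Arrays.toString(splitList("x , y,z"))); // [x, y, z]
    }
}
```

Note the doubled backslashes: inside a Java string literal, `"\\d"` is what passes the regex `\d` to the engine.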

 

The java.util.regex Package

It consists primarily of three classes:

Pattern: a compiled representation of a regular expression.

Matcher: the engine that interprets a Pattern and performs match operations against an input string.

PatternSyntaxException: an unchecked exception that indicates a syntax error in a regular expression pattern.
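The following sketch shows how the three classes typically work together: a Pattern is compiled once and reused, a Matcher is created per input string, and a malformed regex surfaces as a PatternSyntaxException from Pattern.compile. The class RegexClassesDemo and its helpers words and syntaxError are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {
    // Pattern: compiled once; Pattern instances are immutable and safe to share between threads.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Matcher: holds the match state for one input string (Matchers are not thread-safe).
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // PatternSyntaxException: thrown by Pattern.compile for a malformed regex.
    // Returns null when the regex compiles, otherwise a short description of the error.
    static String syntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(words("cat, dog and 3 birds")); // [cat, dog, and, birds]
        System.out.println(syntaxError("[a-z]"));         // null
        System.out.println(syntaxError("[a-z"));          // an error description (unclosed class)
    }
}
```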

 

A Simple Regular Expression Program

```java
package regexTestHarness;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTestHarness {
    public static void main(String[] args) {
        // Use a single reader for all input: a second BufferedReader on System.in
        // could miss characters that the first one has already buffered.
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        try {
            // printf (not println) is needed for %n to become a line separator.
            System.out.printf("%nEnter your regex: ");
            String regex = br.readLine();
            if (regex == null) {
                return; // end of input
            }
            Pattern pattern = Pattern.compile(regex);

            System.out.printf("%nEnter your text: ");
            String text = br.readLine();
            if (text == null) {
                return; // end of input
            }
            Matcher matcher = pattern.matcher(text);

            boolean found = false;
            // find() advances to the next match; start() is inclusive, end() is exclusive.
            while (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                found = true;
            }
            if (!found) {
                System.out.println("No match found.");
            }
        } catch (PatternSyntaxException e) {
            // Pattern.compile rejects a malformed regex with this unchecked exception.
            System.out.println("Invalid regex: " + e.getDescription());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Character Classes and Predefined Character Classes

| Construct | Description |
|---|---|
| `[abc]` | a, b, or c (simple class) |
| `[^abc]` | Any character except a, b, or c (negation) |
| `[a-zA-Z]` | a through z, or A through Z, inclusive (range) |
| `[a-d[m-p]]` | a through d, or m through p: `[a-dm-p]` (union) |
| `[a-z&&[def]]` | d, e, or f (intersection) |
| `[a-z&&[^bc]]` | a through z, except for b and c: `[ad-z]` (subtraction) |
| `[a-z&&[^m-p]]` | a through z, and not m through p: `[a-lq-z]` (subtraction) |
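To see what the union, intersection, and subtraction forms actually accept, the sketch below tests each lowercase letter against a character class and keeps the ones that match. The class CharClassDemo and its method lettersMatching are hypothetical helpers for this illustration.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns, in order, the letters 'a'..'z' that the given character class matches.
    static String lettersMatching(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersMatching("[abc]"));         // abc
        System.out.println(lettersMatching("[^abc]"));        // defghijklmnopqrstuvwxyz
        System.out.println(lettersMatching("[a-d[m-p]]"));    // abcdmnop
        System.out.println(lettersMatching("[a-z&&[def]]"));  // def
        System.out.println(lettersMatching("[a-z&&[^bc]]"));  // adefghijklmnopqrstuvwxyz
        System.out.println(lettersMatching("[a-z&&[^m-p]]")); // abcdefghijklqrstuvwxyz
    }
}
```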

 

| Construct | Description |
|---|---|
| `.` | Any character (may or may not match line terminators) |
| `\d` | A digit: `[0-9]` |
| `\D` | A non-digit: `[^0-9]` |
| `\s` | A whitespace character: `[ \t\n\x0B\f\r]` |
| `\S` | A non-whitespace character: `[^\s]` |
| `\w` | A word character: `[a-zA-Z_0-9]` |
| `\W` | A non-word character: `[^\w]` |
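Two practical points about these classes in Java: the backslash must itself be escaped in a string literal (the regex `\d+` is written `"\\d+"`), and whether `.` matches a line terminator depends on the `Pattern.DOTALL` flag. A minimal sketch, where PredefinedClassDemo, findAll, dotMatchesNewline, and the sample text are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassDemo {
    // Collects every match of regex in input, in the order found.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // By default '.' does not match a line terminator; the DOTALL flag changes that.
    static boolean dotMatchesNewline(boolean dotall) {
        int flags = dotall ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    public static void main(String[] args) {
        String text = "Room 101, floor 3_b";
        System.out.println(findAll("\\d+", text)); // [101, 3]
        System.out.println(findAll("\\w+", text)); // [Room, 101, floor, 3_b]
        System.out.println(findAll("\\S+", text)); // [Room, 101,, floor, 3_b]
        System.out.println(dotMatchesNewline(false)); // false
        System.out.println(dotMatchesNewline(true));  // true
    }
}
```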

 

Quantifiers

| Greedy | Reluctant | Possessive | Meaning |
|---|---|---|---|
| `X?` | `X??` | `X?+` | X, once or not at all |
| `X*` | `X*?` | `X*+` | X, zero or more times |
| `X+` | `X+?` | `X++` | X, one or more times |
| `X{n}` | `X{n}?` | `X{n}+` | X, exactly n times |
| `X{n,}` | `X{n,}?` | `X{n,}+` | X, at least n times |
| `X{n,m}` | `X{n,m}?` | `X{n,m}+` | X, at least n but not more than m times |
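The three columns differ in how much input the quantifier tries to consume. Roughly: a greedy quantifier first takes as much as it can and then backs off one character at a time; a reluctant one starts with as little as possible and grows; a possessive one takes as much as it can and never backs off. The sketch below runs all three on the same input; QuantifierDemo, findAll, and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Collects every match of regex in input, in the order found.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* swallows the whole input, then backs off until "foo" fits.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? grows one character at a time, so it stops at the first "foo".
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ swallows everything and never gives any back, so "foo" can't match.
        System.out.println(findAll(".*+foo", input)); // []
        // Bounded quantifiers follow the same rules.
        System.out.println(findAll("a{2,3}", "aaaaaaa"));  // [aaa, aaa]
        System.out.println(findAll("a{2,3}?", "aaaaaaa")); // [aa, aa, aa]
    }
}
```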

Chinese Characters

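[\u4e00-\u9fa5]

This character class matches a single character in the range U+4E00 to U+9FA5, which covers the commonly used Chinese characters of the CJK Unified Ideographs block. It does not match Chinese punctuation (such as the full-width comma U+FF0C) or characters from the CJK extension blocks.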
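A minimal sketch of using that class in Java, for example to pull the Chinese text out of mixed-language input. The class ChineseCharDemo, its helper methods, and the sample strings are hypothetical; the source file is assumed to be compiled as UTF-8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChineseCharDemo {
    // The doubled backslash hands the \\uXXXX escapes to the regex engine rather than javac.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    // Concatenates every run of Chinese characters found in the input.
    static String extractChinese(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = CHINESE.matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    // True only if the whole (non-empty) input consists of characters in the range.
    static boolean isAllChinese(String input) {
        return CHINESE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(extractChinese("Java正则表达式 regex 教程")); // 正则表达式教程
        System.out.println(isAllChinese("中文"));    // true
        System.out.println(isAllChinese("中文abc")); // false
    }
}
```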

Reposted from: https://www.cnblogs.com/johnpher/archive/2012/07/02/2573865.html
