This post was written in a hurry. Please feel free to contact me using the details at the bottom of the page.
Some time ago, I was introduced to the concept of RPA, which stands for Robotic Process Automation. This is a new type of robot that uses artificial intelligence technologies to learn by itself. Instead of a programmer telling it what to do through lines of code, the robot learns by observing and imitating an operator. It is a very interesting technology because it is much more flexible. An example is the car built by George Hotz, which uses the same principle to learn how to drive.
I am very interested in this kind of technology, and I wondered how I could have heard about it earlier. I then saw that the corresponding Wikipedia article dated from last summer.
I then had the idea of automatically parsing Wikipedia to list the new articles on a topic, since these could potentially indicate a new technology, a new ecosystem actor…
The first step of the project is to download two Wikipedia dump extracts taken a few months apart. Dumps can be found at this URL: https://dumps.wikimedia.org/enwiki/.
The second step is to parse both Wikipedia dumps and get the titles of the articles that mention a chosen theme, given as a sequence of keywords. In the example below,
The last step is to isolate the new article titles, i.e. those not included in the older dump.
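As a rough illustration of the second and third steps, here is a minimal Python sketch. It assumes the dumps are bz2-compressed MediaWiki XML exports (the `pages-articles` files), and that matching the keywords as a plain case-insensitive substring of the article text is good enough. The names `titles_mentioning` and `new_titles`, and the file names in the usage comment, are made up for this example.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit('}', 1)[-1]


def titles_mentioning(dump_file, keywords):
    """Yield the title of every page whose text contains `keywords`.

    `dump_file` is a binary file object over MediaWiki export XML.
    The match is a simple case-insensitive substring test (an assumption).
    """
    needle = keywords.lower()
    title, text = None, ''
    # Stream the file element by element instead of loading it whole.
    for _event, elem in ET.iterparse(dump_file, events=('end',)):
        name = local_name(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'text':
            text = elem.text or ''
        elif name == 'page':
            if title is not None and needle in text.lower():
                yield title
            title, text = None, ''
            elem.clear()  # release the page content we no longer need


def new_titles(old_dump, new_dump, keywords):
    """Titles matching in the newer dump but not in the older one."""
    old = set(titles_mentioning(old_dump, keywords))
    new = set(titles_mentioning(new_dump, keywords))
    return sorted(new - old, key=str.lower)


# Hypothetical usage, with placeholder file names:
# with bz2.open('enwiki-old.xml.bz2', 'rb') as old, \
#         bz2.open('enwiki-new.xml.bz2', 'rb') as new:
#     new_articles = new_titles(old, new, 'deep learning')
```

Because the comparison is a set difference on matching titles, the result contains both brand-new articles and older articles that only recently started mentioning the topic.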
Around the topic deep learning, between 2015-12-01, the following articles emerged or started mentioning deep learning:
- Adam Gibson (computer scientist)
- AI takeover
- Alexey Grigorevich Ivakhnenko
- AMAX Information Technologies
- Apple electric car project
- Brighton High School (Rochester, New York)
- Catalina Foothills Unified School District
- Cognitive architecture
- Deep machine learning
- GeForce 1000 series
- GeForce 900 series
- General video game playing
- Google DeepMind
- Google Translate
- Group method of data handling
- H2O (software)
- Language model
- List of important publications in computer science
- List of Italian Nobel laureates
- Long short-term memory
- Loss functions for classification
- MANIC (Cognitive Architecture)
- Massimiliano Versace
- Neural Designer
- Pathway Genomics
- Problem-based learning
- Rob Ryan (entrepreneur)
- Sentiment analysis
- Solvent models
- Sparse distributed memory
- Sunbury Downs College
- Thomas Huang
- Trajectory Inc.
- Truncated Newton method
- Virtual screening
- Xeon Phi
- Yoshua Bengio
The parsing is very basic. State-of-the-art text mining offers much more sophisticated tools than a simple
A classification of articles by category (people / company / technology / …) could be implemented. In this example we only remove Wikipedia special articles:
new_articles = [x for x in new_articles if ':' not in x]
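One caveat with the colon test is that it also drops ordinary articles whose titles happen to contain a colon. A slightly more careful variant, sketched below, only drops titles whose prefix is a known namespace. The prefix list is a partial one that I am assuming for illustration, and `is_special` and `drop_special` are hypothetical helper names.

```python
# Partial list of English Wikipedia namespace prefixes (an assumption,
# not an exhaustive list).
NAMESPACE_PREFIXES = {
    'Wikipedia', 'Talk', 'User', 'File', 'Template', 'Category',
    'Portal', 'Help', 'MediaWiki', 'Draft', 'Module',
}


def is_special(title):
    """True if the title starts with a known namespace prefix and a colon."""
    prefix, sep, _rest = title.partition(':')
    if not sep:
        return False  # no colon at all: a regular article
    # Treat 'Template talk:' and similar the same way as 'Template:'.
    if prefix.endswith(' talk'):
        prefix = prefix[:-len(' talk')]
    return prefix in NAMESPACE_PREFIXES


def drop_special(titles):
    """Keep only titles that are not in a special namespace."""
    return [t for t in titles if not is_special(t)]
```

With this version, a title such as "Star Trek: The Next Generation" is kept, while "Category:Deep learning" is removed.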