Open
Changes from 47 commits
Commits
89 commits
5d63c1d
Init commit of first task
Nov 10, 2019
21e3713
Base and yahoo classes are comleted
Nov 12, 2019
ded173c
Added tut and default bots classes
Nov 12, 2019
cb1c1b6
Remade whole architecture
Nov 13, 2019
e61350b
Begin tests implementation
Nov 14, 2019
a32fcda
Cod is covered with test over 90%
Nov 15, 2019
42e239d
readme file with project description
Nov 15, 2019
d155454
Script to check code checks using pycodestyle module
Nov 15, 2019
f2606f8
First iteration is completed
Nov 15, 2019
1b4df02
gitignore update after merging into remote repo
Nov 16, 2019
c3fb53a
init second iteration commit - create a draft of setup.py
Nov 17, 2019
342b0c4
Rebase whole project to move rss_reader into a separete pasckage
Nov 17, 2019
c3bd27c
Fixed paths in pycodestyle.sh script and corrected README.md
Nov 17, 2019
b864ad1
Change test to tests
Nov 17, 2019
be08d70
fixed requirements.txt
Nov 17, 2019
d93137f
added a link to wheel package
Nov 17, 2019
08bf226
Added the dist directory to provide distributed packages
Nov 17, 2019
ace616e
Added the 'dist' directory to provide distributed packages
Nov 17, 2019
42f02cd
Merge branch 'master' of github.com:Nenu1985/PythonHomework
Nov 17, 2019
1d42004
Restructured news to pass mypy cheks.
Nov 18, 2019
8d8fe31
Restructured news to pass mypy cheks.
Nov 18, 2019
e184bdd
Merge pull request #1 from Nenu1985/fix/structured-news
Nenu1985 Nov 18, 2019
335b165
Merge branch 'fix/structured-news' of github.com:Nenu1985/PythonHomew…
Nov 18, 2019
59806d7
Merge branch 'fix/structured-news'
Nov 18, 2019
c78bcae
Some mypy`s annotations optimization
Nov 20, 2019
25d9379
Implement feature to store news via decorators, json and pickle
Nov 20, 2019
95ba589
Move exceptions and json patcher into another modules
Nov 20, 2019
2dcf471
Implemented feature for loading news from storage by date
Nov 20, 2019
b9f29bd
Implement SQlite3 DB to store news
Nov 21, 2019
6adcdab
Split news storage procedure into couple of methods
Nov 22, 2019
c984129
Implemented news loading from DB by input --date
Nov 23, 2019
b98fbe2
Implemented pdf converter
Nov 24, 2019
266a8ed
Add logs, converter interface, docstrings
Nov 24, 2019
62ec18c
Implemented html converter
Nov 25, 2019
9ffbb34
Fixed mypy errors
Nov 25, 2019
7ca3efa
Docstrings added to bots
Nov 25, 2019
f7cfbcc
Updated tests due to code changed. Coverd 83%
Nov 25, 2019
c0e8996
Update readme and version after merging with forth-iteration branch
Nov 26, 2019
4bfc53d
Fixed argparse error with escape symbols. Updatet dist and readme
Nov 26, 2019
578978b
fixed issue related with loading stored news and printed via tut.py bot
Nov 26, 2019
67c31d2
Added an exception when there is no news found by date
Nov 26, 2019
a3556a3
Adde python-dateutil to requerements and setup py to correct load fro…
Nov 26, 2019
dcdd116
Fixed setup
Nov 26, 2019
583c72e
Changed verison of python-dateutil package
Nov 26, 2019
8d6d84c
Update setup
Nov 26, 2019
3913528
Update setup
Nov 26, 2019
dbfa52c
`Update setup
Nov 26, 2019
674f027
`Update setup
Nov 26, 2019
8cbbdd4
Make changes in readme related to launching utility
Nov 28, 2019
45a2d3e
Merge branch 'master' of github.com:Nenu1985/PythonHomework
Nov 28, 2019
94e3ca7
Merge branch 'master' of github.com:Nenu1985/PythonHomework
Nov 28, 2019
63c96fe
Merge branch 'master' of github.com:Nenu1985/PythonHomework
Nov 28, 2019
daa2991
Fifth iteration init
Nov 28, 2019
5edd6ba
Corol shemes, Flask server
Nov 29, 2019
f5836e6
Fixed tests and requirements
Nov 29, 2019
2373d13
Home tasks
Nov 29, 2019
b65b4ed
Forgot to delete ./dist/ - deleted
Nov 29, 2019
1671e30
Add some dependencies into requirements
Nov 29, 2019
15ae496
html templates, news generation
Nov 29, 2019
77d380a
delete *htmls from .gitignore
Nov 29, 2019
9bb41c4
six-iteration functionality comleted
Nov 30, 2019
3bf6e91
Add Colors class and color initialisation procedure
Nov 30, 2019
bddc888
Replace server dir to the root, change root dir for rss_reader
Nov 30, 2019
2378724
Tested all functionality, made some fixes
Nov 30, 2019
44a1abe
Make server as launchable package
Nov 30, 2019
3b557f5
Docker unstall updates
Dec 1, 2019
3372edd
install script update
Dec 1, 2019
3f90284
install script update
Dec 1, 2019
2bdb35b
last edits
Dec 1, 2019
1691ff5
Fix error in install.sh file
Dec 1, 2019
f2c96ca
update version in readme
Dec 1, 2019
cd955cf
extra tasks
Jun 15, 2021
4ce1088
task for NY ecommmerce team
Jul 14, 2021
b49bf09
updates
Aug 3, 2021
88ceba0
1
Aug 3, 2021
782e986
aiohttp polls example init
Aug 4, 2021
d6e9835
db creation, connection, filling sample data
Aug 4, 2021
c47d019
async request to the db
Aug 4, 2021
6358126
wiring up templates
Aug 4, 2021
844fbae
вап
Aug 4, 2021
a01bd48
Merge branch 'master' into aiohttp
Aug 5, 2021
62bcd79
Merge pull request #2 from Nenu1985/aiohttp
Nenu1985 Aug 5, 2021
6aed1c2
stone game DP
Aug 5, 2021
eac4dc9
add two linked lists
Aug 5, 2021
864b587
zigzag_conversion
Aug 5, 2021
a865257
str to int myAtoi
Aug 5, 2021
8ce91fd
longest palindrome
Aug 6, 2021
354dc43
Merge pull request #3 from Nenu1985/leetcode
Nenu1985 Aug 6, 2021
a558e17
Merge pull request #4 from Nenu1985/leetcode
Nenu1985 Aug 6, 2021
28 changes: 26 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ __pycache__/
.Python
build/
develop-eggs/
dist/
#dist/ Need for the final's second iteration task
downloads/
eggs/
.eggs/
@@ -35,7 +35,7 @@ MANIFEST
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
# Unit tests / coverage reports
htmlcov/
.tox/
.coverage
@@ -102,3 +102,27 @@ venv.bak/

# mypy
.mypy_cache/

# MAC filesystem's files
.DS_Store

# pycharm files
.idea/

# tests data files
cover/
tests/data/help.txt
tests/data/yahoo.txt

# news storage
rss_reader/storage
*sqlite3.db

# pdf & html files
*.pdf
*.html

# Unused files
rss_reader/utils/dejavu_font/DejaVuSansCondensed.cw127.pkl
rss_reader/utils/dejavu_font/DejaVuSansCondensed.pkl
rss_reader/utils/no_imagepng.png
19 changes: 19 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,19 @@
Copyright (c) 2019 The Python Packaging Authority (PyPA)

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
10 changes: 10 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,10 @@
include requirements.txt
include *.sh
include *.txt
include config.cfg
include mypy.ini
recursive-include dist *.gz
recursive-include dist *.whl
recursive-include rss_reader *.ttf
recursive-include tests *.py
recursive-include tests *.xml
112 changes: 112 additions & 0 deletions README.md
@@ -0,0 +1,112 @@
# Rss reader hometask for EpamTrainee
Python RSS-reader.

Url for cloning:
`https://github.com/Nenu1985/PythonHomework.git`

Version 4
```shell
usage: rss_reader [-h] [--verbose] [--limit LIMIT] [--json] [-v]
                  [--width WIDTH] [--date DATE] [--to_pdf TO_PDF]
                  [--to_html TO_HTML]
                  url

Rss reader. Just enter rss url from your favorite site and app will print
latest news.

positional arguments:
  url                url of rss

optional arguments:
  -h, --help         show this help message and exit
  --verbose          Outputs verbose status messages
  --limit LIMIT      Limit news topics if this parameter provided
  --json             Print result as JSON in stdout
  -v, --version      Print version info
  --width WIDTH      Define a screen width to display news
  --date DATE        Date of stored news you want to see. Format: %Y%m%d
  --to_pdf TO_PDF    Convert and store news you are looking for to pdf
  --to_html TO_HTML  Convert and store news you are looking for to html


```
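
The `--date` option expects the `%Y%m%d` format. A minimal sketch of how such an argument could be validated with `argparse` follows. The `parse_date` and `build_parser` helpers are hypothetical and cover only a subset of the flags above; this is not the project's actual parser.

```python
import argparse
from datetime import date, datetime


def parse_date(value: str) -> date:
    """Convert a %Y%m%d string (e.g. 20191125) into a date (hypothetical helper)."""
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # argparse turns this into a readable error instead of a traceback
        raise argparse.ArgumentTypeError(f"invalid date {value!r}, expected format %Y%m%d")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a subset of the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="rss_reader")
    parser.add_argument("url", help="url of rss")
    parser.add_argument("--limit", type=int, help="Limit news topics if this parameter provided")
    parser.add_argument("--json", action="store_true", help="Print result as JSON in stdout")
    # A literal '%' in argparse help text must be escaped as '%%'
    parser.add_argument("--date", type=parse_date, help="Date of stored news. Format: %%Y%%m%%d")
    return parser


args = build_parser().parse_args(["https://example.com/rss", "--limit", "3", "--date", "20191125"])
print(args.limit, args.date)
```

Passing the converter through `type=` keeps validation inside `argparse`, so a malformed date produces a usage error rather than an exception traceback.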

## Code description
* Code uses the `argparse` module.
* Codebase is covered with unit tests at 90% coverage.
* Errors are printed as human-readable explanations.
Exception tracebacks in stdout are prohibited.
* Docstrings are mandatory for all methods, classes, functions and modules.
* Code conforms to `pep8` (the `pycodestyle` utility is used for self-checking).
* The Feedparser module is used for RSS parsing.
* There are several bots for parsing news: a default bot for unimplemented RSS urls and
custom bots (yahoo, tut) with a detailed approach to parsing.
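
One way the choice between the default bot and the custom bots could work is a lookup by URL host. The sketch below is only an illustration: the class names, the `BOTS` mapping, the domain keys and `pick_bot` are all hypothetical and are not taken from the project's code.

```python
from urllib.parse import urlparse


class DefaultBot:
    """Stand-in for the default bot used for unimplemented RSS urls (hypothetical)."""
    name = "default"


class YahooBot(DefaultBot):
    """Stand-in for the custom yahoo bot (hypothetical)."""
    name = "yahoo"


class TutBot(DefaultBot):
    """Stand-in for the custom tut bot (hypothetical)."""
    name = "tut"


# Assumed mapping from a site domain to its bot class
BOTS = {
    "yahoo.com": YahooBot,
    "tut.by": TutBot,
}


def pick_bot(url: str) -> type:
    """Return the bot class registered for the url's host, or DefaultBot."""
    host = urlparse(url).netloc.lower()
    for domain, bot in BOTS.items():
        # Match the domain itself and any of its subdomains
        if host == domain or host.endswith("." + domain):
            return bot
    return DefaultBot


print(pick_bot("https://news.tut.by/rss/index.rss").name)
```

With a mapping like this, supporting another site would only need a new entry in `BOTS`.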

## Code self-checking
Run `./pycodestyle.sh` to check that the code conforms to `pep8`
(the pycodestyle package must be installed).

## Testing
Tests are available in the repository: `https://github.com/Nenu1985/PythonHomework`
To run the tests with coverage:
```
./make_tests.sh
```
(the nose and coverage packages must be installed)

## Version 2: Distribution
The utility is wrapped into a distribution package with setuptools.
This package exports a CLI utility named rss-reader.

To generate the distribution package (setuptools and wheel must be installed),
run:

``` python3 setup.py sdist bdist_wheel```

In the ./dist directory you'll find .tar.gz and .whl files.

Wheel package for the second iteration task
(possibly outdated, but it works) on Google Drive:
```https://drive.google.com/file/d/1RbMYxvpEXTx77Dk61xPkwSChD_jTf0jf/view?usp=sharin```

The current packages are available in the './dist' directory if you don't want to build them manually.

Installing:

```python3 -m pip install ./dist/rss_reader-4.0-py3-none-any.whl```

OR
```
python3 -m pip install -r requirements.txt
pip install ./dist/rss-reader-4.0.tar.gz
```
## Version 3: News caching
News caching is implemented with an SQLite3 DB. The DB consists of 4 related tables: feed, news_item, links, imgs.
The implementation is in the rss_reader/utils/sqlite.py file, which contains the RssDB class. The built-in sqlite3
library is used.
The base RssParser class imports the RssDB class and uses it for storing and loading data. RssParser's method
print_news() is decorated with call_save_news_after_method() (rss_parser/utils/decorators), which calls the
appropriate function for storing news data (_store_news()).
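
A minimal sketch of the pattern described above: a decorator that runs the wrapped method and then stores the news in SQLite. Only the names `call_save_news_after_method`, `RssParser`, `print_news` and `_store_news` come from the text; the bodies, the constructor arguments and the reduced two-table schema are assumptions made for illustration.

```python
import functools
import sqlite3


def call_save_news_after_method(method):
    """Run the wrapped method, then persist the news via self._store_news()."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self._store_news()
        return result
    return wrapper


class RssParser:
    """Simplified stand-in: keeps only the feed and news_item tables (assumed schema)."""

    def __init__(self, feed, items):
        self.feed = feed
        self.items = items  # list of (title, link) tuples
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE feed (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")
        self.db.execute(
            "CREATE TABLE news_item (id INTEGER PRIMARY KEY, "
            "feed_id INTEGER REFERENCES feed(id), title TEXT, link TEXT UNIQUE)"
        )

    def _store_news(self):
        # The connection context manager commits on success and rolls back on error
        with self.db:
            self.db.execute("INSERT OR IGNORE INTO feed (title) VALUES (?)", (self.feed,))
            feed_id = self.db.execute(
                "SELECT id FROM feed WHERE title = ?", (self.feed,)
            ).fetchone()[0]
            # UNIQUE link plus INSERT OR IGNORE keeps repeated runs from duplicating news
            self.db.executemany(
                "INSERT OR IGNORE INTO news_item (feed_id, title, link) VALUES (?, ?, ?)",
                [(feed_id, title, link) for title, link in self.items],
            )

    @call_save_news_after_method
    def print_news(self):
        return "\n".join(title for title, _ in self.items)


parser = RssParser("Demo feed", [("First", "http://example.com/1"), ("Second", "http://example.com/2")])
print(parser.print_news())
```

Keeping the storage call in a decorator leaves `print_news()` free of database code, and the same decorator can be applied to other output methods.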

## Version 4: Converters
The utility converts news to pdf and html formats. See the corresponding files: rss_reader/utils/pdf.py and
rss_reader/utils/html_writer.py.
The pdf converter uses the pyFPDF package. To print Cyrillic symbols correctly, DejaVu fonts are imported. The
Html2Pdf method isn't used because it doesn't support UTF-8 encoding. That's why I had to parse the html and
generate the pdf object manually.
The html converter uses the lxml.html library to parse and generate html content.



## Docker deployment
docker run -it python /bin/bash
git clone https://github.com/Nenu1985/PythonHomework.git
cd PythonHomework
pip install -r requirements.txt
pip install .
rss-reader --help




13 changes: 13 additions & 0 deletions config.cfg
@@ -0,0 +1,13 @@
[metadata]
# This includes the license file(s) in the wheel.
# https://wheel.readthedocs.io/en/stable/user_guide.html#including-license-files-in-the-generated-wheel-file
license_files = LICENSE.txt

[bdist_wheel]
# This flag says to generate wheels that support both Python 2 and Python
# 3. If your code will not run unchanged on both Python 2 and 3, you will
# need to generate separate wheels for each Python version that you
# support. Removing this line (or setting universal to 0) will prevent
# bdist_wheel from trying to make a universal wheel. For more see:
# https://packaging.python.org/guides/distributing-packages-using-setuptools/#wheels
universal=1
Binary file added dist/rss-reader-4.0.tar.gz
Binary file not shown.
Binary file added dist/rss_reader-4.0-py3-none-any.whl
Binary file not shown.
4 changes: 4 additions & 0 deletions make_tests.sh
@@ -0,0 +1,4 @@
#!/bin/bash

# Run this script to launch the tests
nosetests --with-coverage --cover-erase --cover-package=rss_reader --cover-html --traverse-namespace
38 changes: 38 additions & 0 deletions mypy.ini
@@ -0,0 +1,38 @@
# Global options:
[mypy]
# Logistics of what code to check and how to handle the data.
scripts_are_modules = False
show_traceback = True

[mypy-bs4]
ignore_missing_imports = True

[mypy-feedparser]
ignore_missing_imports = True

[mypy-fpdf]
ignore_missing_imports = True

[mypy-lxml]
ignore_missing_imports = True

[mypy-lxml.html]
ignore_missing_imports = True

[mypy-rss_reader]
ignore_missing_imports = True

[mypy-setuptools]
ignore_missing_imports = True

[mypy-terminaltables]
ignore_missing_imports = True

[mypy-urllib]
ignore_missing_imports = True

[mypy-utils]
ignore_missing_imports = True

[mypy-utils.RssInterface]
ignore_missing_imports = True
20 changes: 20 additions & 0 deletions pycodestyle.sh
@@ -0,0 +1,20 @@
#!/bin/bash

YELLOW='\033[0;33m'
RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

echo -e "${YELLOW}Starting static code analysis ${NC}"

# utils/
python3 -m pycodestyle rss_reader/utils/ --max-line-length=120
echo -e "${YELLOW}utils/ ${GREEN}PASSED${NC}"

# rss.py
python3 -m pycodestyle rss_reader/rss.py --max-line-length=120
echo -e "${YELLOW}rss.py/ ${GREEN}PASSED${NC}"

# bots/
python3 -m pycodestyle rss_reader/bots/ --max-line-length=120
echo -e "${YELLOW}bots/ ${GREEN}PASSED${NC}"
9 changes: 9 additions & 0 deletions requirements.txt
@@ -0,0 +1,9 @@
attrs==19.3.0
bs4==0.0.1
coverage==4.5.4
feedparser==5.2.1
fpdf==1.7.2
nose==1.3.7
python-dateutil==2.8.1
terminaltables==3.1.0
lxml==4.4.1
1 change: 1 addition & 0 deletions rss_reader/__init__.py
@@ -0,0 +1 @@
from . import bots, utils
2 changes: 2 additions & 0 deletions rss_reader/__main__.py
@@ -0,0 +1,2 @@
from rss_reader import rss
rss.main()
Empty file added rss_reader/bots/__init__.py
Empty file.
7 changes: 7 additions & 0 deletions rss_reader/bots/default.py
@@ -0,0 +1,7 @@
"""Default (not specified) rss parser bot"""
from ..utils.rss_interface import BaseRssBot


class Bot(BaseRssBot):
    """Default (not specified) rss parser bot"""
    pass
94 changes: 94 additions & 0 deletions rss_reader/bots/tut.py
@@ -0,0 +1,94 @@
"""Tut.by specified rss parser bot"""
import attr
import bs4
import feedparser
import typing

from rss_reader.utils.rss_interface import BaseRssBot
from ..utils.data_structures import NewsItem, News


@attr.s(frozen=True)
class TutNewItem(NewsItem):
    """Extended NewsItem class to store tags and authors"""
    tags: typing.List[str] = attr.ib()
    authors: typing.List[str] = attr.ib()


class Bot(BaseRssBot):
    """Tut.by specified rss parser bot"""

    def _feed_to_news(self, feed: feedparser.FeedParserDict) -> News:
        """
        Converts a parsed feed into a News object with TutNewItem entries

        :param feed: feedparser result to convert
        :return: News object with the feed title, link and news items
        """
        news_items = []

        # Empty list/dict defaults keep iteration and chained .get() calls safe on missing keys
        for item in feed.get('items', [])[:self.limit]:

            news_items.append(TutNewItem(
                title=item.get('title', ''),
                link=item.get('link', ''),
                published=item.get('published', ''),
                imgs=[img.get('url', '') for img in item.get('media_content', [])],
                links=[link.get('href', '') for link in item.get('links', [])],
                html=item.get('summary', ''),
                authors=[author.get('name', '') for author in item.get('authors', [])],
                tags=[tag.get('term', '') for tag in item.get('tags', [])],
            ))

        news = News(
            feed=feed.get('feed', {}).get('title', ''),
            link=feed.get('feed', {}).get('link', ''),
            items=news_items,
        )
        self.logger.info('Feedparser object is converted into news_item obj with TUT news')

        return news

    def _parse_news_item(self, news_item: TutNewItem) -> str:
        """
        Forms a human readable string from news_item
        :param news_item: news_item content
        :return: human readable news content
        """
        self.logger.info(f'_parse_news_item_tut.by Extending {news_item.title}')

        out_str = ''
        out_str += f"\nTitle: {news_item.title}\n" \
                   f"Date: {news_item.published}\n" \
                   f"Link: {news_item.link}\n"
        # Items loaded as plain NewsItem objects have no authors or tags
        if isinstance(news_item, TutNewItem):
            out_str += f"Authors: {', '.join(news_item.authors)}\n"
            out_str += f"Tags: {', '.join(news_item.tags)}\n"

        html = bs4.BeautifulSoup(news_item.html, "html.parser")

        links = news_item.links
        imgs = news_item.imgs

        for tag in html.descendants:
            if tag.name == 'a':
                pass
            elif tag.name == 'img':
                src = tag.attrs.get('src')
                # src = src.replace('thumbnails/', '')
                # Image references are numbered after all link references
                if src in imgs:
                    img_idx = imgs.index(src) + len(links) + 1
                else:
                    imgs.append(src)
                    img_idx = len(imgs) + len(links)

                out_str += f'\n[image {img_idx}: {tag.attrs.get("alt")}][{img_idx}]'
            elif tag.name == 'p':
                out_str += '\n' + tag.text
            elif tag.name == 'br':
                out_str += '\n'
        out_str += f'\n{html.getText()}\n'
        out_str += 'Links:\n'
        out_str += '\n'.join([f'[{i + 1}]: {link} (link)' for i, link in enumerate(links)]) + '\n'
        out_str += '\n'.join([f'[{i + len(links) + 1}]: {link} (image)' for i, link in enumerate(imgs)]) + '\n'

        return out_str